I think there is a flaw in the methodology: the first and second datasets do not seem comparable. The first is a set of papers that were downloaded from SciHub between two dates, with no explanation as to why they were chosen to be downloaded. The second comprises papers from the same journals that were published in the same date range. It's not clear whether these papers are in SciHub or not (since ~85% of paywalled papers are in SciHub, I suspect most of them are).