The Evolution of Preprints as an Open Access Format for Scholarly Publishing: Market Forces and Recent Developments

Whereas a comprehensive overview of the history of preprint servers indicates their explosive growth over recent decades, empirical findings also show that, in the almost frictionless market of preprint publishing, concentration and convergence dynamics are at play.


In their preprint paper, posted to the arXiv server on February 17, 2021, Boya Xie, Zhihong Shen and Kuansan Wang offer a broad diachronic and analytic perspective on the evolution of preprints from the early 1990s to the 2020s. Their findings on the share of different paper repositories in the preprint sector show that physics (31%), mathematics (19%), computer science (12%) and economics (11%) predominate in the overall preprint output. Incidentally, they also demonstrate that this is a highly concentrated market with only a few dominant players. These are arXiv, SSRN and bioRxiv, with an estimated 1.8 million, 950,733 and 98,301 papers respectively. Moreover, as this paper indicates, whilst the total output of scholarly papers stands at 69.36 million, the overall output of preprints, defined in this study as including both postprints and working papers, such as in economics, only amounts to 2.83 million. Yet, out of the total preprint volume, only around 41% (1.15 million) were found to be associated with published papers, so that the majority of preprints (59% or 1.68 million) have not undergone journal-level peer review and text editing procedures. Likewise, only about 1.66% of published scholarly papers were found to have preprint versions available at repository servers, which is likely due to the relative recency of this phenomenon (Xie, Shen and Wang, 2021, pp. 1-2).
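The percentages quoted above follow from the reported volumes, which can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the variable and function names are hypothetical, the input figures are the rounded ones quoted in this post rather than the study's underlying data, and the last ratio is derived here instead of being quoted from the paper.

```python
# Volumes as quoted in this post (Xie, Shen and Wang, 2021, pp. 1-2), in millions.
# These are rounded figures, so the shares below are approximate.
TOTAL_PAPERS_M = 69.36           # all scholarly papers
TOTAL_PREPRINTS_M = 2.83         # preprints, incl. postprints and working papers
PREPRINTS_WITH_PAPER_M = 1.15    # preprints matched to a published paper


def share(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Preprints with no associated published paper.
unpublished_m = TOTAL_PREPRINTS_M - PREPRINTS_WITH_PAPER_M

# Split of the preprint volume into published and unpublished.
published_share = share(PREPRINTS_WITH_PAPER_M, TOTAL_PREPRINTS_M)
unpublished_share = share(unpublished_m, TOTAL_PREPRINTS_M)

# Share of all scholarly papers that have a preprint version.
papers_with_preprint_share = share(PREPRINTS_WITH_PAPER_M, TOTAL_PAPERS_M)

# Derived here, not quoted in the post: preprints relative to all papers.
preprints_vs_papers = share(TOTAL_PREPRINTS_M, TOTAL_PAPERS_M)

print(f"unpublished preprints (millions): {unpublished_m:.2f}")
print(f"published share of preprints: {published_share:.1f}%")
print(f"unpublished share of preprints: {unpublished_share:.1f}%")
print(f"papers with a preprint version: {papers_with_preprint_share:.2f}%")
print(f"preprints relative to all papers: {preprints_vs_papers:.2f}%")
```

After rounding, the computed shares agree with the 41%, 59% and 1.66% figures cited above.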

Consequently, despite the media attention lavished upon preprints, e.g., in the pandemic period, preprints, as a scholarly output format, account for only a minor share of the knowledge production market. Historically, they have demonstrated consistent growth in their yearly numbers, from near zero in 1991 to 226,861 preprints in 2019. However, their yearly share of the total scholarly paper output grew more steeply in the 1990s, with the emergence of the technical preconditions for their circulation, such as personal computers, networking protocols and file formats. Recent years have also seen accelerated growth in the share of preprints in total output, which was projected to attain 6.4% in 2020. Yet their de facto output peak has likely been attained in 2019, with a share of yearly paper output standing at around 5.5%. Moreover, as this paper suggests, the two pioneering fields of scholarly research, i.e., physics and mathematics, have registered marked declines of around 3% in their shares of yearly preprint output, which fell to around 34% and 32% respectively in 2020. The share of preprints in computer and biological sciences has grown rapidly, in the last decade for the former and in the last five years for the latter, to attain close to 17% and 8% of the yearly total of all papers respectively in 2020. Except for economics, however, no significant growth in the share of preprints has been found in other scholarly fields in recent years or over longer time spans (Xie, Shen and Wang, 2021, p. 4).

One may, thus, argue that as the number of preprint servers and papers grew in recent decades, their competitive advantage in the academic market has declined. This may be related to the steep declines in the periods between paper submission and publication at major preprint servers, such as NBER, SSRN and arXiv, especially between 2015 and 2020. Likewise, after an absolute peak in the early 1990s and a more recent, relative peak in 2005, the citation advantage of preprint deposition for both published articles and standalone, unpublished papers declined continuously between around 2005 and 2020. For all disciplines researched in this study, the average citation impact of preprints was stagnant, fluctuating or declining between 2000 and 2015, as in physics, biology and economics respectively. It has also decreased rapidly in the last five years across disciplines, in step with falling publication rates in recent years across scholarly fields and major repositories (Xie, Shen and Wang, 2021, pp. 5-8).

By Pablo Markin


Xie, Boya, Zhihong Shen, and Kuansan Wang. "Is Preprint the Future of Science? A Thirty-Year Journey of Online Preprint Services." arXiv preprint arXiv:2102.09066 (2021).


Featured Image Credits:  La Jolla Tidepools with Reflection, San Diego, CA, USA, January 23, 2016 | © Courtesy of Photos By Clark/Flickr.

Pablo Markin

Community Manager, Open Research Community



Though the article analyzed in this post did not provide clear-cut indications on whether preprint repositories can be considered platforms that model or represent the future of Open Access publishing, e.g., for scholarly papers, it can be tentatively suggested that models that do not involve author-facing fees or broadly-based funding are likely to reach limits to their growth, as their trajectory arrives at an inflection point, and to remain niche phenomena. Yet this may be due not only to the internal limitations of the respective models but also to external market forces, such as the competitive pressures that preprint platforms experience and the important role of impact factor performance in author-level publication decisions.