Bagga, Sunyam, and Andrew Piper. "HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust." Journal of Open Humanities Data 8 (2022).
This paper presents a new dataset, built on prior work, consisting of 1,671,370 randomly sampled pages of English-language prose, roughly divided between fictional and non-fictional writing and published between 1800 and 2000.