Bagga, Sunyam, and Andrew Piper. "HATHI 1M: Introducing a Million Page Historical Prose Dataset in English from the Hathi Trust." Journal of Open Humanities Data 8 (2022).

This paper presents a new dataset, built on prior work, consisting of 1,671,370 randomly sampled pages of English-language prose published between 1800 and 2000 and roughly divided between fictional and non-fictional writing.