
Text Mining With the HathiTrust Research Center


Text mining and text analysis are blanket terms for analyzing documents with software tools. Text analysis is a methodological approach and is discipline agnostic. It is performed on a group of materials selected to answer specific questions. For example, the materials can be all works by a certain author, all works in a subgenre, or all works by a certain set of authors. The HathiTrust Research Center (HTRC) is the research arm of HathiTrust. The HTRC facilitates scholarly research by providing mechanisms for researchers to access content and study it using computational tools for text analysis.
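As a rough illustration of what "analyzing documents with software tools" can look like, the sketch below counts word frequencies across a tiny group of texts. It is a minimal example using only the Python standard library, not an HTRC tool or API; the `corpus` dictionary, its volume names, and the helper functions `tokenize` and `term_frequencies` are all hypothetical stand-ins for a real collection.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequencies(documents):
    """Count how often each word appears across all documents."""
    counts = Counter()
    for text in documents.values():
        counts.update(tokenize(text))
    return counts


# Hypothetical stand-in corpus; a real project would load full volumes.
corpus = {
    "volume_1": "The whale surfaced. The crew watched the whale.",
    "volume_2": "A whale is a mammal, not a fish.",
}

frequencies = term_frequencies(corpus)

# Show the most frequent words across the whole group of texts.
print(frequencies.most_common(3))
```

The result is a set of aggregate counts rather than readable passages, which is the kind of output a researcher might use to compare authors or subgenres.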

Most HTRC services require an account. Register for an account by going to the Portal and choosing "Sign up" from the menu. Anyone with an email address from a nonprofit institution of higher education is allowed to register, including those whose institutions are not HathiTrust members. (UNH is a HathiTrust member.)

You can create a workset of books in the HathiTrust Digital Library and import it into HTRC to run basic algorithms. It is also possible to work with HTRC to gain access to the entire HathiTrust corpus, including materials still in copyright, for use in non-consumptive research activities. The 2010 Authors Guild vs. Google amended settlement agreement states: "'Non-Consumptive Research' means research in which computational analysis is performed on one or more Books, but not research in which a researcher reads or displays substantial portions of a Book to understand the intellectual content presented within the Book." Non-consumptive analytics includes image analysis, text extraction, textual analysis and information extraction, linguistic analysis, automated translation, and indexing and search. More information is available in HathiTrust's Non-Consumptive Use Research Policy.


HTRC documentation link

Getting Started Guide                
HTRC's documentation and FAQ to get you started.

HTRC provides extensive documentation on its tools, including instructional videos, tutorials, presentations, examples, and Getting Started FAQs.