Text mining can be used to address questions such as:
The HathiTrust Research Center (HTRC), the research arm of HathiTrust, supports scholarly research by giving researchers access to content and providing computational tools for text analysis.
Most HTRC services require an account. To register, go to the HTRC Analytics portal at analytics.hathitrust.org and choose "Sign up" from the menu. Anyone with an email address from a nonprofit institution of higher education may register, including those whose institutions are not HathiTrust members. (UNH is a HathiTrust member.)
You can create a workset of books in the HathiTrust Digital Library and import it into HTRC to run basic algorithms. It is also possible to work with HTRC to gain access to the entire HathiTrust corpus, including materials still in copyright, for non-consumptive research. The 2010 amended settlement agreement in Authors Guild v. Google states: "'Non-Consumptive Research' means research in which computational analysis is performed on one or more Books, but not research in which a researcher reads or displays substantial portions of a Book to understand the intellectual content presented within the Book." Non-consumptive analytics include image analysis, text extraction, textual analysis and information extraction, linguistic analysis, automated translation, and indexing and search. More information is available in HathiTrust's Non-Consumptive Use Research Policy.
Getting Started Guide
HTRC's documentation and FAQ to get you started.
Graphically explore language trends over time in millions of volumes in the HathiTrust Digital Library.