Research Guides: Text Mining With the Hathi Trust Research Center: Additional Text Analysis Tools

Tools

Voyant
An easy-to-use, web-based text analysis tool. Voyant is free and allows users to upload or paste text. The program determines word frequencies and collocates and displays them graphically.
Wordle
Creates a word cloud from your own text.
MALLET
MALLET (MAchine Learning for LanguagE Toolkit) is a collection of tools that facilitate document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), for inference in general graphical models.
WordSeer
WordSeer is a collection of text analysis tools targeted at humanities scholars that includes side-by-side comparison, grammatical search, and document/sentence/word-set features.
Google Books Ngram Viewer FREE
Charts the frequency of any word or short phrase using yearly counts of n-grams found in sources printed from 1500 to the present. If you are interested in performing large-scale analysis on the underlying data, the corpora are available for download.
Google Books BYU View FREE
Compares the Corpus of Historical American English (COHA), Google Books (Standard), and the Google Books (BYU / Advanced) corpus in NGrams.
Culturomics Bookworm Viewer FREE
Developed by Culturomics at Harvard, it is an interface tool for queries in the Google Books corpus. Users can run queries in highly selective corpora based on subject (books on world history, American books on science, etc.) though these corpora are much smaller than those in the full Google Books collection.
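
Several of the tools above report word frequencies, n-gram counts, and collocates. As a rough illustration of what those measures mean, the following Python sketch computes them for a short sample passage. It is not how Voyant, the Ngram Viewer, or any other tool listed here is implemented; the function names (tokenize, word_frequencies, ngrams, collocates), the window size, and the sample text are all assumptions made up for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)

def ngrams(tokens, n=2):
    """Count every run of n consecutive words (n=2 gives bigrams)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def collocates(tokens, target, window=2):
    """Count words appearing within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbor in the window except the target itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts

# Invented sample text, used only for illustration.
sample = ("The whale swam. The white whale swam past the ship, "
          "and the ship followed the whale.")
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(2))
print(ngrams(tokens, 2).most_common(3))
print(collocates(tokens, "whale").most_common(3))
```

Real tools typically add steps this sketch leaves out, such as removing stopwords, normalizing counts by corpus size, or applying statistical measures of association to rank collocates.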

JSTOR Data for Research
Data for Research is a free data mining tool for journal content on JSTOR, available to the public. It supports bulk downloads of data sets and includes a faceted search interface, online viewing of document-level data, and downloadable datasets (including word frequencies, citations, key terms, and n-grams).

Internet Archive and Open Library
The Internet Archive and Open Library offer over 6,000,000 fully accessible public domain eBooks.
Oxford Text Archive
Collection of more than 5,000 texts, more than 2,000 of which have been marked up and keyed in by hand. Includes a large number of early English texts from the ECCO-TCP collection as well as all of Shakespeare and other works.
Hathi Trust Research Center
The Hathi Trust Research Center provides access for non-profit and academic users to the data behind the millions of books within the Hathi Trust.
Chronicling America
Full text of hundreds of pre-1923 American newspapers, made available by the Library of Congress.
Mark Davies' Corpora Site
Mark Davies at BYU hosts several large corpora, including a 100+ million word corpus of Time Magazine (1923-2006). His corpora (in English, Spanish, and Portuguese) are widely used.
MONK Project
MONK at the University of Illinois provides access to the full text of 525 works of pre-1900 American literature as well as many of the works of William Shakespeare.
Open Culture
Free cultural and educational media, including ebooks.
Humanities Data from Michigan State University
Full text of Sunday school books, cookbooks, and a college newspaper.
Project Gutenberg
Thousands of out-of-copyright books and digital texts.
Open Library
"One web page for every book." Browse millions of book titles, many of which are available to read online or download.