One of the easiest-to-use web-based text analysis tools. Voyant is free and lets users upload or paste text. The program determines word frequencies and collocates and displays them graphically.
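To make the two measures concrete, here is a minimal Python sketch of what counting word frequencies and collocates involves. It is not Voyant's implementation; the function names, the simple regular-expression tokenizer, the window size, and the sample sentence are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(tokens):
    """Count how many times each token occurs."""
    return Counter(tokens)

def collocates(tokens, keyword, window=2):
    """Count words found within `window` tokens of each occurrence of `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            # Neighbors on both sides, excluding the keyword itself.
            counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts

# Made-up sample text for illustration.
sample = "The whale swam. The white whale dived, and the whale rose again."
tokens = tokenize(sample)
print(word_frequencies(tokens).most_common(3))
print(collocates(tokens, "whale", window=1).most_common(3))
```

A wider window captures looser associations between words, while a narrow one stays close to adjacent phrases; real tools typically also filter out common stop words such as "the".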
Creates a word cloud from your own text.
MALLET (MAchine Learning for LanguagE Toolkit) is a collection of tools for document classification, sequence tagging, and topic modeling. There is also an add-on toolkit, GRMM (Graphical Models in MALLET), for inference in general graphical models.
WordSeer is a collection of text analysis tools targeted at humanities scholars that includes side-by-side comparison, grammatical search, and document/sentence/word-set features.
Google Books Ngram Viewer FREE
Charts the frequency of any word or short phrase using yearly counts of n-grams found in sources printed from 1500 to the present. If you are interested in performing a large-scale analysis of the underlying data, the corpora are available for download.
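For readers who download the corpora, the following Python sketch shows one way yearly n-gram counts can be turned into the relative frequencies that such a chart plots. It assumes tab-separated rows of n-gram, year, match count, and volume count; check the layout of the release you actually download, since it may differ. The sample rows, the per-year totals, and the function name are made up for illustration and are not real Google Books figures.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows (ngram, year, match_count, volume_count); not real data.
SAMPLE = "whale\t1850\t120\t40\nwhale\t1851\t300\t75\nship\t1850\t500\t90\n"

# Hypothetical total number of n-grams counted in each year.
TOTALS = {1850: 100000, 1851: 150000}

def yearly_relative_frequency(rows, totals, target):
    """Return {year: share of all n-grams in that year that match `target`}."""
    freq = defaultdict(float)
    for ngram, year, match_count, _volume_count in rows:
        if ngram == target:
            y = int(year)
            freq[y] += int(match_count) / totals[y]
    return dict(freq)

# QUOTE_NONE keeps quotation marks inside n-grams from being treated as CSV quoting.
rows = csv.reader(io.StringIO(SAMPLE), delimiter="\t", quoting=csv.QUOTE_NONE)
result = yearly_relative_frequency(rows, TOTALS, "whale")
for year in sorted(result):
    print(year, f"{result[year]:.4%}")
```

Dividing by the yearly total matters because far more text survives from recent years than from early ones, so raw counts alone would make almost every word appear to grow over time.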
Google Books BYU View FREE
Compares n-gram frequencies across the Corpus of Historical American English (COHA), Google Books (Standard), and the Google Books (BYU / Advanced) corpus.
Culturomics Bookworm Viewer FREE
Developed by Culturomics at Harvard, this is an interface for querying the Google Books corpus. Users can run queries in highly selective corpora based on subject (books on world history, American books on science, etc.), though these corpora are much smaller than the full Google Books collection.
Data for Research is a free data mining tool for journal content on JSTOR, available to the public. It offers a faceted search interface, online viewing of document-level data, and bulk downloads of datasets (including word frequencies, citations, key terms, and n-grams).
The Internet Archive and Open Library offer over 6,000,000 fully accessible public domain eBooks.
Collection of more than 5,000 texts, more than 2,000 of which have been marked up and keyed in by hand. Includes a large number of early English texts from the ECCO-TCP collection as well as all of Shakespeare and other works.
The HathiTrust Research Center gives non-profit and academic users access to the data behind the millions of books in HathiTrust.
Full text of hundreds of pre-1923 American newspapers, made available by the Library of Congress.
Mark Davies at BYU hosts several large corpora, including a 100+ million word corpus of Time Magazine (1923-2006). His corpora (in English, Spanish, and Portuguese) are widely used.
Monk at the University of Illinois provides access to the full text of 525 works of pre-1900 American literature as well as many of the works of William Shakespeare.
Free cultural and educational media, including ebooks.
Full text of Sunday school books, cookbooks, and a college newspaper.
Thousands of out-of-copyright books and digital texts.
"One web page for every book." Browse millions of book titles, many of which are available to read online or download.