Two Methods of Indexing Books:
For a Definition of OCR (Optical Character Recognition):
The accuracy of OCR scanning depends on the OCR software used and especially on the quality of the image being OCR’d. Many search engines for OCR text (for example, Ancestry.com’s search of historical newspapers) frequently return as matches pages that contain the given name anywhere on the page and the surname anywhere on the page, not necessarily the given name and surname together.
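To illustrate the difference between "both names somewhere on the page" and "both names together," here is a minimal Python sketch. It is not how any particular search engine works; the function names, the sample page, and the three-word window are all assumptions chosen for illustration.

```python
import re


def words_of(page_text: str) -> list[str]:
    """Split OCR text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", page_text.lower())


def names_anywhere(page_text: str, given: str, surname: str) -> bool:
    """True if both names occur somewhere on the page, however far apart."""
    words = words_of(page_text)
    return given.lower() in words and surname.lower() in words


def names_near(page_text: str, given: str, surname: str, window: int = 3) -> bool:
    """True only if the two names occur within `window` words of each other."""
    words = words_of(page_text)
    given_at = [i for i, w in enumerate(words) if w == given.lower()]
    surname_at = [i for i, w in enumerate(words) if w == surname.lower()]
    return any(abs(i - j) <= window for i in given_at for j in surname_at)


# Hypothetical page text, invented for this example.
page = "John Smith sold land to his neighbor Mary Gourley in 1884."

print(names_anywhere(page, "John", "Gourley"))  # a loose match
print(names_near(page, "John", "Gourley"))      # the names are not together
print(names_near(page, "Mary", "Gourley"))      # the names are adjacent
```

A page-level match like the first one is why a search for John Gourley can return pages that mention only John Smith and Mary Gourley.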
You often get better search results when you search for a combination of surnames together. This is especially important when searching for common surnames. If a keyword search is available, consider listing geographic or other identifiers to narrow your search. If the search engine you are using does not allow wildcard and/or Soundex searches, you will need to do repetitive searches on spelling variations of the names sought.
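For readers curious about what Soundex and wildcard searches do behind the scenes, the following Python sketch shows one way to code them. It follows the standard American Soundex rules and uses Python's built-in fnmatch module for wildcards; the function names are made up for this example, and real search engines may implement either feature differently.

```python
import fnmatch

# American Soundex digit groups; vowels, H, W, and Y have no digit.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
_CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run, so a repeated code counts again
    return (result + "000")[:4]


def wildcard_match(pattern: str, name: str) -> bool:
    """Case-insensitive wildcard test: * is any run of letters, ? is one letter."""
    return fnmatch.fnmatchcase(name.upper(), pattern.upper())


# Spelling variations that would otherwise need separate searches.
for variant in ("Gourley", "Gorley", "Gourlay"):
    print(variant, soundex(variant), wildcard_match("GO*RL?Y", variant))
```

All three spellings share the Soundex code G640 and match the wildcard pattern GO*RL?Y, so a single Soundex or wildcard search can stand in for three exact-spelling searches.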
When considering the size of online (electronically imaged) book collections, remember the Library of Congress has a collection of over 100,000 local history books for the United States. The greatest numbers of county and municipal histories were published in the 1880s and 1890s. They are now out-of-copyright and can legally be scanned, OCR’d, and made searchable, viewable, and printable online. Many different organizations and institutions are doing so.
Scanned and OCR’d Books Online: Commercial Genealogical Services
1. Ancestry.com has 20,000+ books imaged and indexed
a. Click on the SEARCH tab at Ancestry.com
b. In the Browse Records column on the right side of the page, click on Stories, Memories & Histories (under Stories & Publications).
c. From the Search Family & Local Histories page you have the option to search using the template, to do an exact search (by clicking on the Exact Matches box in the search template), or browse titles of the books in this collection. Most of your results from these searches allow you to see images of the pages of a specific book.
d. An alternate method is to use the Browse by location option on the main search page and scroll down through the databases listed for that locality to the Stories, Memories & Histories topic. For example, there are 8333 stories & histories pertinent to Missouri available for browsing and every-name searching at Ancestry.com. Another method is to use Ancestry's Card Catalog search found under the Search tab to search for printed material about a specific place (example: Buchanan Missouri).
e. Ancestry.com’s Family & Local Histories Collection was gathered from the Newberry Library (Chicago), the Widener Library (at Harvard University), the New York Public Library, the University of Illinois at Urbana, genealogical society collections, and other sources. It includes works for the U.S. and all fifty states, Canada, the United Kingdom, and parts of Europe. Some works in this collection go back to the 1500s while others extend into the 20th century.
2. Genealogy.com has 16,000 books imaged and indexed. Genealogy.com is owned by the same parent company as Ancestry.com. There is a great deal of overlap between these collections.
3. HeritageQuest Online has 20,000+ books imaged and indexed from the Genealogy & Local History microfilm collection done by their parent company ProQuest. However, their list of source libraries is suspiciously similar to that of Ancestry.com and Genealogy.com with the addition of the Sutro Library in California. HQ Online’s collection contains compiled genealogies, local histories (county, church, and general histories), city and business directories, collective biographies, early vital records works (example: the NEHGS series of pre-1850 Massachusetts vital records), documentary collections (example: the 138 published volumes of the Colonial Records of Pennsylvania), and military records (example: Record of service of Michigan volunteers in the Civil War, 1861-1865).
4. GenealogyBank (www.genealogybank.com) has more than 11,700 books, pamphlets, and printed items (including genealogies, biographies, funeral sermons, local histories, etc.) published in the U.S. prior to 1900 that are every-word searchable. They also have a growing collection of historical newspapers (over 2,400 historical U.S. newspapers) and other historical documents (more than 133,000 reports, lists, and documents searchable online).
5. WorldVitalRecords.com is adding books regularly to several of its search
Scanned and OCR’d Books: Retailers of Electronic Books (in eBook and CD formats)
1. eBooks are books in an electronic format readable on your computer, PDA, or a special eBook reader. eBooks.com and eBookMall.com are online eBook retailers. While most of their inventory is fiction, at least a small percentage of their offerings are non-fiction topics of interest to genealogists and historians. Try a title search on your surnames and/or the geographic areas where your ancestors resided, then a keyword search on topics such as iron industry in early Tennessee. At present, there is no way to every-word search these electronic books without buying them and looking at their individual indexes (if they have one).
2. Genealogical.com sells genealogy books, genealogy books on CDs, and genealogy databases on CDs. About two years ago they added a name search feature to their Web site. This search engine searches an every-word index to about half of their books on CDs, with more titles being every-name indexed each month.
3. ArchiveCD Books USA
4. America's Book CDs
5. Arphax Books
Scanned and OCR’d Books Online: Library Digital Collections
1. The red links currently found in 5000+ entries in the Family History Library Catalog online are links to digital files at other libraries. The Family History Library anticipates more and more interlibrary sharing of digital collections and catalog entries in the future. Example: Do a surname search in the FHLC online for GOURLEY. Select the entry for Paul Gourley: a pioneer father and notice in this catalog entry the red link that says, “To view a digital version of this book click here.” This takes you to the Family History Archives site which includes a Search This Object tool for an every-name search for that one book.
2. The BYU Family History Archives (www.lib.byu.edu/fhc/) has easy basic search features and a more advanced search template. It provides a list of scanned & OCR’d titles containing your search terms and within each title shows in red the numbers of pages that contain those search terms.
3. The Kenton County [Kentucky] Public Library Web site (www.kenton.lib.ky.us/) is typical of many such sites in that it is now offering both a digital library (digital collections and databases accessible through that library’s Web site) and eBooks (for download). Click on the Genealogy link at this site to find useful electronic resources.
4. Historical Society of Washington County, Virginia
5. Alexandria [City, VA] Library
6. Library of Congress American Memory Project
7. Early Americas Digital Archive at the Univ. of Maryland
8. University of Michigan Digital Library Projects finder
9. The North Carolina Collection Biographical Index at the University of NC at Chapel Hill
10. Documenting the American South at the University of NC at Chapel Hill
11. Online Books Page at the University of Pennsylvania
12. Pa's Past: Digital Bookshelf at Penn State
13. Perseus Digital Library at Tufts University
14. Making of America at Cornell University
15. Making of America at University of Michigan
Scanned and OCR’d Books Online: Portal/Search Engine Sites
1. Google Books is in beta at books.google.com and is built from several sources. From Google's Partner Project, authors and publishers submit books for scanning and Google displays as much information (sometimes the whole book) as permitted. In Google’s Library Project (whose goal is to build an enhanced card catalog of the world’s books plus make available online rare and out-of-copyright books) information is submitted by libraries and subject to copyright restrictions. For books that are no longer under copyright restrictions (or have author permission from the Partner Project), you see an option to "Search within this book." Another option is "Find this book in a library." Google has also entered limited metadata about still other books which are therefore not every-word searchable.
2. Google Scholar is also in beta at scholar.google.com and is good at finding books, theses, and other written works from an academic or library environment.
3. Project Gutenberg at www.gutenberg.org
4. Live Search Books at http://books.live.com
5. The Open Library, an offshoot project of the Internet Archive
6. 250+ Killer Digital Libraries & Archives at the Online Education Database site
7. Librarian's Internet Index
8. Digital Collections at the Digital Library Federation
9. World Digital Library
10. Open Content Alliance
12. Use the About.com search engine to search on a category of books such as "Kentucky Historical Gazetteer"
13. Cyndi's List > Books > Books Online
Human Indexing Projects: NSDAR Library in Washington, D.C.
1. Genealogical Records Committee Reports (GRCs), begun in the 1910s by the NSDAR, contain previously unpublished records of genealogical value and now fill more than 17,000 volumes from across the country. These transcripts contain a tremendous amount of unique genealogical material from such sources as family Bibles and gravestones which may no longer exist or no longer be readable. These un-indexed GRC books are now being every-name indexed by DAR volunteers. This index currently contains 27,422,030 names and is searchable for free at the DAR Library Web site (www.dar.org/library/). Copies from the GRC volumes may be obtained from the DAR Library for a nominal charge.
2. The Analytical Card Catalog (now called the Analytical Index), also done by DAR volunteers, is a unique index to un-indexed books, especially large older local history books. It is located and searchable only in the DAR Library in Washington, D.C. On the outside it looks like any other 3x5 paper catalog in wooden drawers (recently moved to cardboard boxes in metal drawers on the main floor of the NSDAR Library). A typical card entry contains the following:
GROCE family of Cumberland Co., KY
Portrait & Biographical Album of Morgan & Scott Counties, Illinois 1889
ILL Counties MOR POR p. 517 [tells where to locate that book in the DAR Library]
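If you copy entries from these cards into your own research notes, the three lines of the sample card can be kept as one small record. The Python sketch below is only an illustration: the AnalyticalCard record and the parse_card function are hypothetical names, and the sketch assumes every card follows the three-line layout of the sample, which may not hold for all cards.

```python
import re
from dataclasses import dataclass


@dataclass
class AnalyticalCard:
    subject: str         # family and place, e.g. "GROCE family of Cumberland Co., KY"
    source_title: str    # the un-indexed book in which the family appears
    shelf_location: str  # where the book is shelved in the DAR Library
    page: int            # page of the book on which the family appears


def parse_card(lines: list[str]) -> AnalyticalCard:
    """Build a record from the three lines of a card (assumed layout)."""
    subject, title, locator = (line.strip() for line in lines)
    locator = locator.split("[")[0].strip()  # drop any bracketed explanatory note
    match = re.fullmatch(r"(.*?)\s+p\.\s*(\d+)", locator)
    if match is None:
        raise ValueError(f"unrecognized locator line: {locator!r}")
    return AnalyticalCard(subject, title, match.group(1), int(match.group(2)))


card = parse_card([
    "GROCE family of Cumberland Co., KY",
    "Portrait & Biographical Album of Morgan & Scott Counties, Illinois 1889",
    "ILL Counties MOR POR p. 517",
])
print(card.shelf_location, card.page)
```

Keeping the shelf location and page number as separate fields makes it easier to sort your notes by book before requesting copies.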
Human Indexing Projects: FamilySearch Indexing online from the LDS Church (www.familysearchindexing.org)
Volunteer to help index the contents (2.5 million rolls of microfilm and 1.7 million microfiche) of the Granite Mountain Vault (Scanstone Project) or participate in projects sponsored by genealogical and/or historical societies. For example, the Indiana Genealogical Society has completed its Indiana Marriages indexing project covering 1870 to April 1905 and is now working on Indiana Marriages since 1790. Literally hundreds of thousands of names from records are being indexed daily. Check this site for links to lists of current projects, upcoming projects, and completed projects.