
Google Books



Presentation Transcript


  1. Present and Future: Google Books. James Crawford, Engineering Director, Google Books

  2. Overview: Why and how Google scans books; Challenges; The Future. Google Confidential and Proprietary

  3. Why and How Google Scans Books

  4. Google’s mission: To organize the world’s information and make it universally accessible and useful. Online content: billions of web pages. Offline content: billions of items becoming indexed.

  5. Book Team's Mission: To organize the world’s books and make them universally accessible and useful.

  6. Organize the world's books – About the Book

  7. Make accessible – Limited previews from publishers & authors (30,000 publisher partners)

  8. Make accessible – Embedded Viewer for Libraries

  9. Vital stats. Number of books scanned: fifteen million. Number of pages: five billion. Number of words: over two trillion. Libraries: forty. Publishers: thirty thousand.

  10. Google Books in a nutshell

  11. Challenges

  12. 478 languages. Kabardian: 16; Khasi: 78; Khoisan: 53; Khotanese: 21; Kikuyu: 48; Kinyarwanda: 77; Kyrgyz: 702; Kimbundu: 14; Konkani: 83; Komi: 48; Kongo: 134; Korean: 35905; Kosraean: 10; Kpelle: 6; Karachay-Balkar: 17; Karelian: 28; Kru: 26; Kurukh: 30; Kuanyama: 9; Kumyk: 16; Kurdish: 220; Kutenai: 0; Klingon: 3; Kalmyk: 26; Kashubian: 14; Kara-Kalpak: 102; Kabyle: 50; Kachin: 18; Kalaallisut: 82; Kamba: 29; Kannada: 2600; Karen: 50; Kashmiri: 289; Kanuri: 25; Kawi: 106; Kazakh: 1871

  13. A diversity of dates • 18?? • [196-?] • 1957/8 • late 14th century • finita quarto nonas Januarias [1490] • mense Septembri: Anno Millesimo q[ui]ngentesimo decimonono • mense iulio, anno M.D.XXXX • התשנ״א (Hebrew year 5751 = Gregorian 1990/1 CE) • ١٣٧٣ (either Islamic year 1373 AH = Gregorian 1953/4 CE or Persian year 1373 AP = Gregorian 1994/5 CE)
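
The year arithmetic on the dates slide can be sketched in a few lines. This is only an illustrative sketch, not Google's date-normalization code: the function names are hypothetical, the Hebrew and Persian offsets are the standard whole-year offsets, and the Islamic conversion uses a common rule-of-thumb approximation for the shorter lunar year, so it can be off by one near year boundaries.

```python
import math

def hebrew_year_to_gregorian(year):
    """Hebrew years begin in autumn, so each one straddles two Gregorian years."""
    return (year - 3761, year - 3760)

def persian_year_to_gregorian(year):
    """Persian (solar Hijri, AP) years begin in March, so each straddles two Gregorian years."""
    return (year + 621, year + 622)

def islamic_year_to_gregorian(year):
    """Islamic (lunar Hijri, AH) years are about 0.97 solar years long.

    Rule-of-thumb approximation only; exact conversion needs a calendar table.
    """
    start = year * 0.970224 + 621.5774   # approximate fractional Gregorian year of 1 Muharram
    end = start + 0.970224               # one lunar year later
    return (math.floor(start), math.floor(end))

def candidate_readings(year):
    """A bare year such as 1373 is ambiguous: return both readings from the slide."""
    return {
        "AH": islamic_year_to_gregorian(year),
        "AP": persian_year_to_gregorian(year),
    }

# Python's int() accepts Eastern Arabic digits directly.
year = int("١٣٧٣")
print(candidate_readings(year))
print(hebrew_year_to_gregorian(5751))
```

A cataloguing pipeline would still need other evidence (language, place of publication) to choose between the two readings of 1373; the sketch only enumerates them.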

  14. Works, Expressions, Manifestations, and Items. Library of Congress record: title Lord of the Rings, v.1; author John Roland Reuel Tolkien; publisher Houghton Mifflin; year 1954. Books in Print record: title The Fellowship of the Ring; author J.R.R. Tolkien; publisher Ballantine Books; year 1994.
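
The two catalogue records on this slide differ in every field yet describe the same work. As a rough illustration of why matching them is hard, the sketch below holds both records in a small data structure and compares authors by a loose key. The `Record` class and `author_key` helper are hypothetical names invented here, and the normalization rule (surname plus forename initials) is an assumption for illustration, not a method described in the presentation.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """One bibliographic record as supplied by a single metadata source."""
    source: str
    title: str
    author: str
    publisher: str
    year: int

def author_key(name):
    """Reduce 'Forenames Surname' to (surname, initials) for loose comparison."""
    parts = [p for p in re.split(r"[.\s]+", name.strip()) if p]
    surname = parts[-1].lower()
    initials = "".join(p[0].lower() for p in parts[:-1])
    return (surname, initials)

# The two records from the slide.
loc = Record("Library of Congress", "Lord of the Rings, v.1",
             "John Roland Reuel Tolkien", "Houghton Mifflin", 1954)
bip = Record("Books in Print", "The Fellowship of the Ring",
             "J.R.R. Tolkien", "Ballantine Books", 1994)

# Exact string comparison fails on every field...
print(loc.title == bip.title, loc.author == bip.author)
# ...but the loose author key agrees.
print(author_key(loc.author), author_key(bip.author))
```

Author agreement alone is weak evidence; the titles here would still need series and volume knowledge to reconcile.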

  15. Annotations

  16. The Future

  17. Google Editions. Buy Anywhere: Purchase directly on Google Books, devices, retail partner sites, affiliates, and brick-and-mortar stores. Read Anywhere: Users can read eBooks on desktop, tablets, iPhone, Android phones, and eInk readers, with cloud storage and cloud sync. More to Read: Target is 400K+ paid books and over 2M free public-domain books.

  18. Google Book Settlement (US only) • If approved, resolves lawsuit brought against Google • Benefits: • Rightsholder control • Snippets => 20% • Library subscriptions • Free terminal in every US public library building • Downloadable books for purchase • Access for the print-disabled • Book Rights Registry: a non-profit organization to find and pay rightsholders • Research corpus

  19. Books as a corpus of human knowledge • Understand one book • Understand all books • Understand relations between books

  20. Linguistic analysis • "Research that performs linguistic analysis over the Research Corpus to understand language, linguistic use, semantics and syntax as they evolve over time and across different genres or other classifications of Books."

  21. Digital Humanities. Steven Abney and Terry Szymanski, University of Michigan. Automatic Identification and Extraction of Structured Linguistic Passages in Texts. Elton Barker, The Open University, Eric C. Kansa, University of California-Berkeley, Leif Isaksen, University of Southampton, United Kingdom. Google Ancient Places (GAP). Dan Cohen and Fred Gibbs, George Mason University. Reframing the Victorians. Gregory R. Crane, Tufts University. Classics in Google Books. Miles Efron, Graduate School of Library and Information Science, University of Illinois. Meeting the Challenge of Language Change in Text Retrieval with Machine Translation Techniques. Brian Geiger, University of California-Riverside, Benjamin Pauley, Eastern Connecticut State University. Early Modern Books Metadata in Google Books. David Mimno and David Blei, Princeton University. The Open Encyclopedia of Classical Sites. Alfonso Moreno, Magdalen College, University of Oxford. Bibliotheca Academica Translationum: link to Google Books. Todd Presner, David Shepard, Chris Johanson, James Lee, University of California-Los Angeles. Hypercities Geo-Scribe. Amelia del Rosario Sanz-Cabrerizo and José Luis Sierra-Rodríguez, Universidad Complutense de Madrid. Collaborative Annotation of Digitalized Literary Texts. Andrew Stauffer, University of Virginia. JUXTA Collation Tool for the Web. Timothy R. Tangherlini, University of California-Los Angeles, Peter Leonard, University of Washington. Tools & Techniques for Automated Literary Analysis, Based on the Scandinavian Corpus in Google Books.

  22. Insights into human progress. Old-fashioned trigrams: oxide of lead; may be thus; a heavy fire; a striking proof; miles distant from; terms of peace; presents the appearance; more than mortal; vexation of spirit; zeal and devotion. New-fangled trigrams: lesbian and gay; health care professionals; abuse and neglect; the overall process; shift away from; the power elite; a research project; the poor countries; probability of failure; increased awareness of. Google is preparing trigram data for release for research purposes. Source: Matthew Gray & Yuan K. Shen
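
For readers unfamiliar with the term, a trigram is a run of three consecutive words. The sketch below counts trigrams in a short string; it is a minimal illustration, not the pipeline behind the data mentioned on the slide. The tokenizer (lowercase runs of letters, apostrophes, and hyphens) is an assumption, and the sample sentence is invented from phrases on the slide.

```python
import re
from collections import Counter

def count_trigrams(text):
    """Count every run of three consecutive words in the text."""
    # Assumed tokenizer: lowercase words made of letters, apostrophes, hyphens.
    words = re.findall(r"[a-z'-]+", text.lower())
    # zip stops at the shortest input, yielding each sliding window of three words.
    return Counter(zip(words, words[1:], words[2:]))

sample = "The power elite and the power elite"
counts = count_trigrams(sample)
for trigram, n in counts.most_common(2):
    print(" ".join(trigram), n)
```

Comparing such counts between books published in different periods is one way a list of "old-fashioned" versus "new-fangled" trigrams could be derived.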

  23. Organize the world's books and make them universally accessible and useful

  24. Thank You!
