
Master’s Thesis Defense


Presentation Transcript


  1. Master’s Thesis Defense: Bibliographic Tools In The Context Of WWW And LaTeX. Munushree Thummala. Committee members: Dr. Prabhaker Mateti (Advisor), Dr. Thomas Hartrum, Dr. T.K. Prasad

  2. Agenda • Introduction • BiBTeX Primer • Bibliographic Tool Survey • Requirements for the BiBTeXTools • Design Discussion • Conclusion • Future Work • Questions & Answers Session • Demonstration

  3. Introduction • Preparing academic papers • Collecting bibliographic entries • Tools used to prepare the papers • Common problems

  4. BibTeX Primer • What is BibTeX? • Helps authors prepare the References section of their documents • Defines entry types and required/optional fields • Uses “style” files to define the format of references • Standards for publications are specified in style files • Used with LaTeX • LaTeX collects \cite{}s in the .tex file • BibTeX extracts corresponding references from .bib file • BibTeX formats and sorts according to the .bst style • Output of BibTeX program is LaTeX formatted text

  5. Sample BibTeX entry @mastersthesis{Thummala-2007, author = {Munushree Thummala}, title = {Bibliographic tools in the context of WWW and \latex}, month = {November}, year = {2007}, school = {Wright State University}, OPTkey = {}, OPTtype = {}, OPTaddress = {}, OPTnote = {}, OPTannote = {}, advisor = {Prabhaker Mateti} }
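The structure of the sample entry (an entry type, a citation key, then name = {value} pairs) can be illustrated with a short Python sketch. It is not taken from the thesis tools: the function name parse_entry is hypothetical, and the sketch assumes every value is brace-delimited with balanced braces, so it does not handle quoted values, bare numbers, or @String macros.

```python
# Shortened copy of the sample entry; a raw string keeps the backslash in \latex intact.
SAMPLE = r"""@mastersthesis{Thummala-2007, author = {Munushree Thummala},
  title = {Bibliographic tools in the context of WWW and \latex},
  month = {November}, year = {2007}, school = {Wright State University},
  OPTkey = {}, advisor = {Prabhaker Mateti} }"""


def parse_entry(text):
    """Split one brace-delimited BibTeX entry into (entry_type, key, fields)."""
    text = text.strip()
    if not text.startswith("@"):
        raise ValueError("a BibTeX entry must start with '@'")
    open_brace = text.index("{")
    entry_type = text[1:open_brace].strip().lower()
    # Everything between the first '{' and the last '}' is the entry body.
    body = text[open_brace + 1 : text.rindex("}")]
    key, _, rest = body.partition(",")
    fields = {}
    i = 0
    while i < len(rest):
        eq = rest.find("=", i)
        if eq == -1:
            break
        name = rest[i:eq].strip(" ,\n\t").lower()
        start = rest.find("{", eq)
        if start == -1:
            break
        # Walk forward to the brace that closes this value, tracking nesting depth.
        depth, j = 0, start
        while j < len(rest):
            if rest[j] == "{":
                depth += 1
            elif rest[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        fields[name] = rest[start + 1 : j]
        i = j + 1
    return entry_type, key.strip(), fields
```

Calling parse_entry(SAMPLE) yields the entry type, the citation key, and a dictionary of lower-cased field names, including the empty OPTkey placeholder.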

  6. Contribution Of Thesis • Evaluation of Bibliographic tools • BiBTeX to Database Suite of Tools • Database to store BibTeX entries • LoadBiBTeX • BibSearch • Discovery of Duplicate BiBTeX entries • Normalization of BiBTeX entries • Text to BiBTeX Translation • TextToBiBTeX command line tool & API • PDFrefsToBiBTeX command line tool • Integration of TextToBiBTeX into Aigaion

  7. Bibliographic Tools • There are 100+ tools • In this thesis: 87 are reviewed • Tools were evaluated for the following: • Formats supported • Navigating, Searching and Sorting capabilities • Ease of maintaining bibliographic entries • Duplicate discovery • Import/Export to other formats

  8. Bibliographic Tools • Web browser based tools • Aigaion, Bibsonomy, CiteULike, Zotero, BibORB, Basilic, PubsOnline, etc. • Desktop/Small scale tools • JabRef, KBibTeX, TkBibTeX, BibDB, BibEdit, Open Office Bibliographic Manager, Tellico, etc. • Commercial tools • Scholar’s Aid, Bookends, NotaBene, ProCite, etc. • Utilities • Bib2html, Bibclean, Bp, Bibdup, Sixpack, etc.

  9. A Few Notable Tools • Aigaion • Zotero • Bibsonomy • JabRef

  10. Aigaion • Web application, Open source • Easy to use • Supports basic editing features • Supports Multiple Users • Native format is BiBTeX • Organizes references by Topics & Sub Topics • Maintains a list of authors to eliminate duplication • Duplicate discovery present in import feature

  11. Aigaion (Contd. 2)

  12. Aigaion (Contd. 3)

  13. Aigaion (Contd. 4) Author Profile

  14. Zotero • Firefox browser extension • Easy to use • Organizes entries in collections • Captures bibliographic entries from websites automatically • Some drawbacks • Loses BiBTeX citation keys and custom fields while importing • Not well suited for managing BiBTeX bibliographies • Local storage

  15. Zotero (Contd. 2)

  16. Zotero (Contd. 3)

  17. Zotero (Contd. 4)

  18. Zotero (Contd. 5)

  19. Bibsonomy • Web browser based, hosted service • Easy to use • References • Users upload refs and bookmarks to Bibsonomy • Made available to other users • Tagged with keywords for categorization and search • Can be exported as BiBTeX • Browser shortcuts to capture entries from web

  20. Bibsonomy (Contd. 2)

  21. Bibsonomy (Contd. 3)

  22. Bibsonomy (Contd. 4)

  23. Bibsonomy (Contd. 5)

  24. JabRef • Desktop Application • Easy to use • Multiple bib files can be edited • Search online: • CiteSeer, Medline, IEEE Xplore, arXiv.org • Native format is BibTeX • Auto generate BiBTeX keys • Imports/Exports multiple formats

  25. JabRef (Contd. 2)

  26. JabRef (Contd. 3)

  27. JabRef (Contd. 4)

  28. CiteULike • Web browser based, hosted service • Easy to use • References • Users upload refs to CiteULike • Made available to other users • Tagged with keywords for categorization and search • Can be exported as BiBTeX • Browser shortcuts to • capture entries from web • cite the current article

  29. CiteULike (Contd. 2)

  30. CiteULike (Contd. 3)

  31. CiteULike (Contd. 4)

  32. Requirements for New Tools • Text to BiBTeX translation • Translating free style text into BibTeX • Customizing the translation • Certainty of Recognition measure • Extract references section from PDF papers • Provide an API for other developers to integrate free style translation into their applications • Command line invocation • GUI also • Normalized BiBTeX output

  33. Requirements (Contd. 2) • Database of Bibliographic entries • Database to store BiBTeX files • Tool to Detect duplicates • Command line invocation • Normalized BiBTeX output

  34. Requirements (Contd. 3) • Search and Generate BiBTeX files • Flexible searches • Command line invocation • Outputs BiBTeX format • Normalized BiBTeX output • Platform Independent

  35. Database on Local Machine • Tables to store • BiBTeX entries • lookup data for text to BiBTeX translation • search index data for fast and flexible searching

  36. Database Of BiBTeX Entries • A schema to store BiBTeX entries including string macros • Ability to specify a tag for each entry • Tag defaults to .bib filename

  37. Database Of Lookup Data • A database schema to store lookup tables • Lookup tables: • Author Sub Names • Journal Names • Publishers • Cities • States • Months • Organizations

  38. Database Of Search Indexes • A database schema to store BiBTeX search index data • Stores data as sequence of tokens • Provides ability to search • Any field(s) • Any keyword(s) • Citation key also stored as tokens
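A token-based search index of the kind described on this slide could look roughly like the following Python sketch, which uses the standard sqlite3 module. The slides do not show the actual schema, so the table names (entries, tokens), the column names, and the helper functions are all assumptions made for illustration. Each field value is split into lower-case tokens stored with their position, and the citation key is tokenized the same way.

```python
import re
import sqlite3

# Hypothetical schema: one row per entry, one row per token of each field.
SCHEMA = """
CREATE TABLE entries (id INTEGER PRIMARY KEY, tag TEXT, citekey TEXT);
CREATE TABLE tokens  (entry_id INTEGER, field TEXT, position INTEGER, token TEXT);
CREATE INDEX idx_tokens ON tokens (token, field);
"""


def open_index():
    """Create an in-memory database holding the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def tokenize(value):
    """Lower-case a field value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def add_entry(db, tag, citekey, fields):
    """Store an entry and index every field, plus the citation key, as tokens."""
    cur = db.execute("INSERT INTO entries (tag, citekey) VALUES (?, ?)", (tag, citekey))
    entry_id = cur.lastrowid
    for field, value in dict(fields, citekey=citekey).items():
        for position, token in enumerate(tokenize(value)):
            db.execute(
                "INSERT INTO tokens VALUES (?, ?, ?, ?)",
                (entry_id, field, position, token),
            )
    return entry_id


def search(db, tag, field, keywords):
    """Return citation keys under `tag` whose `field` contains every keyword."""
    wanted = sorted({tok for kw in keywords for tok in tokenize(kw)})
    if not wanted:
        return []
    marks = ",".join("?" * len(wanted))
    rows = db.execute(
        "SELECT e.citekey FROM entries e JOIN tokens t ON t.entry_id = e.id "
        f"WHERE e.tag = ? AND t.field = ? AND t.token IN ({marks}) "
        "GROUP BY e.id HAVING COUNT(DISTINCT t.token) = ? ORDER BY e.citekey",
        [tag, field, *wanted, len(wanted)],
    )
    return [row[0] for row in rows]
```

For simplicity the sketch searches one field at a time and requires all keywords to match; searching several fields at once would need a decision about whether keywords may match in different fields.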

  39. LoadBiBTeX Tool • Loads BiBTeX files into the database and updates the search index tables • Loads the lookup tables used by Text to BiBTeX tool • Detects duplicates

  40. LoadBiBTeX – Loads BiBTeX Files • Program Usage • LoadBiBTeX –loadentries –bibtag thesis2007 –bibfile thesis.bib • Any entries that have errors are not loaded and are shown in the output • Updates the index tables used by the BibSearch tool

  41. LoadBiBTeX – Populate Lookup Tables • Program Usage • LoadBiBTeX –loadauthors –loadpublishers –loadjournals –bibfile thesis.bib • Only new values are loaded • The above command does not load the BiBTeX entries

  42. LoadBiBTeX – Duplicate Discovery • Program Usage • LoadBiBTeX –dupdisc –bibtag thesis2007 –bibfile thesis.bib • The BiBTeX entries in thesis.bib are read and compared to the entries in the database corresponding to the bibtag thesis2007 • Any entries considered to be duplicates are displayed for the user
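The slides do not say how two entries are judged to be duplicates. As one plausible illustration only, the Python sketch below assumes that entries are duplicate candidates when their titles match after removing case, braces, and punctuation, and their years are equal. The function names fingerprint and find_duplicates are hypothetical.

```python
import re


def fingerprint(fields):
    """Reduce an entry to a comparison key, or None when it has no usable title.

    Assumption: title plus year is enough to flag a candidate duplicate.
    """
    title = re.sub(r"[^a-z0-9 ]", "", fields.get("title", "").lower())
    title = " ".join(title.split())
    if not title:
        return None
    return (title, fields.get("year", "").strip())


def find_duplicates(incoming, stored):
    """Pair each incoming citation key with stored keys sharing its fingerprint.

    Both arguments map citation keys to dictionaries of field values.
    """
    seen = {}
    for key, fields in stored.items():
        mark = fingerprint(fields)
        if mark is not None:
            seen.setdefault(mark, []).append(key)
    pairs = []
    for key, fields in incoming.items():
        for match in seen.get(fingerprint(fields), []):
            pairs.append((key, match))
    return pairs
```

A real tool would probably also compare authors and tolerate small spelling differences; the sketch only shows the overall shape of comparing a .bib file against entries already stored under a tag.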

  43. BibSearch – Searching The Database • Program Usage • BibSearch –bibtag thesis2007 –fields author –keywords Donald Knuth • The database is searched for entries with the tag “thesis2007” and the words “Donald” and “Knuth” in the “author” field • The resulting BiBTeX entries and any required @String constructs are normalized and written to the output

  44. Normalization • Make BiBTeX entries consistent • Some of the rules • Citation Keys are consistent • Fields are enclosed in {} to preserve formatting • Month field abbreviations are expanded • Missing required fields are indicated to the user appropriately • Fields are output in a consistent order • Where is it implemented? • In whichever tool a particular rule makes sense • Spread across TextToBiBTeX, LoadBiBTeX, BibSearch

  45. Normalization (Example 2) • Before: @mastersthesis{Thummala2007, title = "Bibliographic tools in the context of WWW and \latex", year = 2007, school = "Wright State University", month = "Nov", author = "Munushree Thummala", advisor = "Prabhaker Mateti",} • After: @MASTERSTHESIS{Thummala-2007, AUTHOR = {{Munushree} {Thummala}}, TITLE = {{Bibliographic} tools in the context of {WWW} and \latex}, MONTH = {November}, YEAR = {2007}, SCHOOL = {{Wright} {State} {University}}, ADVISOR = {{Prabhaker} {Mateti}},}

  46. Normalization (Example 3) • Before: @InCollection{lawrence01access, author = "Steve Lawrence", title = "Access to Scientific Literature", journal = "The {\it Nature} Yearbook of Science and Technology", editor = "Declan Butler", publisher = "Macmillan", address = "London, England", pages = "86-88", year = 2001 } • After: @INCOLLECTION{Lawrence-2001, AUTHOR = {{Steve} {Lawrence}}, TITLE = {{Access} to {Scientific} {Literature}}, BOOKTITLE = {}, YEAR = {2001}, JOURNAL = {The {\it Nature} {Yearbook} of {Science} and {Technology}}, EDITOR = {{Declan} {Butler}}, PUBLISHER = {{Macmillan}}, ADDRESS = {{London}, {England}}, PAGES = {86-88}, }
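A few of the normalization rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis implementation: the Lastname-Year key format is inferred from the two examples, the output order (required fields first, then the rest in input order) is a guess, the required-field lists cover only the two entry types shown, and the bracing of individual capitalized words seen in the examples is deliberately left out. The names normalize and citation_key are hypothetical.

```python
# Month abbreviations to expand; full month names pass through unchanged.
MONTHS = {
    "jan": "January", "feb": "February", "mar": "March", "apr": "April",
    "may": "May", "jun": "June", "jul": "July", "aug": "August",
    "sep": "September", "sept": "September", "oct": "October",
    "nov": "November", "dec": "December",
}

# Standard BibTeX required fields for the two entry types used in the examples.
REQUIRED = {
    "mastersthesis": ["author", "title", "school", "year"],
    "incollection": ["author", "title", "booktitle", "publisher", "year"],
}


def citation_key(fields):
    """Build a Lastname-Year key from the first author (format assumed from the examples)."""
    first_author = fields.get("author", "").split(" and ")[0].strip()
    if "," in first_author:
        last = first_author.split(",")[0].strip()
    else:
        last = first_author.split()[-1] if first_author else "Unknown"
    year = fields.get("year", "").strip() or "nodate"
    return f"{last}-{year}"


def normalize(entry_type, fields):
    """Return (normalized entry text, list of missing required field names)."""
    entry_type = entry_type.lower()
    clean = {name.lower(): str(value).strip() for name, value in fields.items()}
    month = clean.get("month")
    if month:
        clean["month"] = MONTHS.get(month.rstrip(".").lower(), month)
    required = REQUIRED.get(entry_type, [])
    missing = [name for name in required if not clean.get(name)]
    # Assumed order: required fields first, then the remaining fields as given.
    ordered = required + [name for name in clean if name not in required]
    lines = [f"@{entry_type.upper()}{{{citation_key(clean)},"]
    for name in ordered:
        # Missing required fields are emitted empty so the user can spot them.
        lines.append(f"  {name.upper()} = {{{clean.get(name, '')}}},")
    lines.append("}")
    return "\n".join(lines), missing
```

Emitting a missing required field as an empty value mirrors the BOOKTITLE = {} line in Example 3; returning the list of missing names as well lets a caller report them separately.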

  47. Text to BiBTeX Translation • What are Free Style References and where would authors find these? • References at the end of academic papers • References on Internet sites like CiteSeer • A jotted-down text description • How do authors benefit from this translation? • No need to manually convert to BiBTeX • Significantly better accuracy • Speeds the process of translating multiple references

  48. Text to BiBTeX Translation (Contd. 2) • Ways to translate free style text • Write a routine to analyze the strings and guess the fields • Develop • Language Grammar • Recursive Descent Parser • Which method did we pick? • Recursive Descent Parsing • Tried other methods with varying degrees of success

  49. Text to BiBTeX Translation (Contd. 3) • How does the parser work? • Extent: a sequence of tokens • Field type: an extent whose tokens all match the set of okTokens for that field; the extent ends when a notOkToken (including a delimiting token) is hit • Backtrack: if the current token in an extent does not match the field, the parser backtracks to the first token of the extent and tries the other field types • Unrecognized: if the current token matches no field type, it is appended to the unrecognized field list and the process repeats from the next token
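The extent, backtrack, and unrecognized steps described on this slide can be sketched in Python as below. This is a simplified illustration rather than the TextToBiBTeX parser: the field set is reduced to pages, volume, and year, the okToken tests and the validity checks are assumptions, delimiter tokens are simply skipped, and the names FIELD_TYPES and scan are hypothetical.

```python
import re

DELIMITERS = {",", ";"}   # delimiting tokens end every extent
DASHES = {"-", "--"}


def is_number(token):
    return re.fullmatch(r"[0-9]+", token) is not None


# Each field type: (name, okToken test for one token, validity test for a whole extent).
FIELD_TYPES = [
    ("pages",
     lambda t: t.lower() in {"pages", "pp.", "p."} or is_number(t) or t in DASHES,
     lambda ext: re.fullmatch(r"(pages|pp\.|p\.)[0-9]+(-{1,2}[0-9]+)?",
                              "".join(ext).lower()) is not None),
    ("volume",
     lambda t: t.lower() in {"vol.", "volume"} or is_number(t),
     lambda ext: re.fullmatch(r"(vol\.|volume)[0-9]+", "".join(ext).lower()) is not None),
    ("year",
     is_number,
     lambda ext: len(ext) == 1 and 1900 <= int(ext[0]) <= 2100),
]


def scan(tokens):
    """Assign extents of tokens to field types; return (fields, unrecognized)."""
    fields, unrecognized = {}, []
    i = 0
    while i < len(tokens):
        if tokens[i] in DELIMITERS:
            i += 1
            continue
        for name, ok, valid in FIELD_TYPES:
            # Grow the extent while tokens are acceptable for this field type.
            j = i
            while j < len(tokens) and ok(tokens[j]):
                j += 1
            extent = tokens[i:j]
            if extent and valid(extent):
                fields.setdefault(name, " ".join(extent))
                i = j
                break
            # Otherwise backtrack: the next field type starts again at token i.
        else:
            # No field type accepted the token, so record it and move on.
            unrecognized.append(tokens[i])
            i += 1
    return fields, unrecognized
```

In this sketch a bare number such as 1984 is first tried as pages and as volume, fails both validity checks, and is only then accepted as a year, which is the backtracking behaviour in miniature.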

  50. Text to BiBTeX Translation (Contd. 4) • How is a series of tokens recognized as a field? • Author, Journal fields: lookup table and heuristics • Title field: quoted strings or heuristics • Pages field: [PAGES.|PP.|P.] <number [–][–number]> • Year field: a four-digit number between 1900 and 2100 • Volume field: [VOL. | VOLUME] <number> • Number field: [NO. | NUMBER] <number> • Abbrev field: <volume>(<number>):<startpage>–[-]<endpage> • Edition field: EDITION <number> or <number> EDITION • Publisher, Place, State fields: lookup table
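The pattern-based fields listed above translate fairly directly into regular expressions. The Python sketch below is one possible reading of those patterns, not the thesis code: the exact expressions, the order in which they are tried, and the function name recognize are assumptions, and the lookup-table fields (author, journal, publisher, place, state) are not covered.

```python
import re

# <volume>(<number>):<startpage>-<endpage>, with one or two dashes between pages.
ABBREV = re.compile(r"\b(\d+)\((\d+)\):(\d+)[-–]{1,2}(\d+)")

# Fields introduced by a keyword, following the patterns on the slide.
LABELLED = {
    "pages": re.compile(r"\b(?:pages|pp\.|p\.)\s*(\d+)(?:\s*[-–]{1,2}\s*(\d+))?", re.I),
    "volume": re.compile(r"\b(?:vol\.|volume)\s*(\d+)", re.I),
    "number": re.compile(r"\b(?:no\.|number)\s*(\d+)", re.I),
    "edition": re.compile(r"\bedition\s*(\d+)|(\d+)\s*edition\b", re.I),
}

# A four-digit number between 1900 and 2100.
YEAR = re.compile(r"\b(19\d\d|20\d\d|2100)\b")


def recognize(text):
    """Extract the pattern-based fields from a free style reference string."""
    found = {}
    match = ABBREV.search(text)
    if match:
        found["volume"], found["number"] = match.group(1), match.group(2)
        found["pages"] = f"{match.group(3)}--{match.group(4)}"
        # Blank out the matched span so its numbers are not reused below.
        text = text[:match.start()] + " " + text[match.end():]
    for name, pattern in LABELLED.items():
        match = pattern.search(text)
        if not match:
            continue
        if name == "pages" and match.group(2) is not None:
            value = f"{match.group(1)}--{match.group(2)}"
        else:
            value = next(g for g in match.groups() if g is not None)
        found.setdefault(name, value)
        text = text[:match.start()] + " " + text[match.end():]
    # The year is searched last so page numbers such as 1984-1990 are not mistaken for it.
    match = YEAR.search(text)
    if match:
        found["year"] = match.group(1)
    return found
```

Trying the abbreviated form first and the year last is a design choice of the sketch: it keeps numbers that belong to a volume or page range from being reported as the year.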
