
In Progress Review

A demonstration and overview of the ongoing progress in the development of the LAMP Lab Natural Language Machine Translation (Chin-MT) project at the University of Maryland. Includes a technical presentation and discussion on future directions.



Presentation Transcript


  1. In Progress Review: LAMP Lab Chin-MT Project, University of Maryland, February 18, 1999

  2. I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT
     a. Demonstration and overview:
     9:30-9:45 Introduction to project, B. Onyshkevych
     9:45-9:55 Rationale and Overview of Progress in Development of System Components, A. Weinberg
     9:55-10:00 Overview of Demonstration, P. Resnik and W. Shen
     10:00-10:30 Demonstration and Questions, P. Resnik and W. Shen

  3. I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT Cont’d
     b. Technical Presentation/Future Directions
     10:45- Laboratory Management Issues
     20 Min. Parsing - Constructions Covered to Date, New Directions, A. Weinberg, P. Resnik
     20 Min. Lexicon - Scalability of Current Components, Creation of Grids, Automatic Acquisition, Mining, B. Dorr
     20 Min. Generation - Discussion of Current Algorithm, Future Directions, D. Traum

  4. THE LAMP LABORATORY - MT PROJECT
     Faculty: Dr. Bonnie Dorr, CS, UMIACS; Dr. Philip Resnik, Linguistics, UMIACS; Dr. Amy Weinberg, Linguistics, UMIACS
     Postdoctoral Researchers: Dr. Gina Levow, UMIACS*; Dr. Mari Olsen, UMIACS; Dr. David Traum, UMIACS
     Graduate Students: Joseph Garman, Linguistics; Scott Thomas, CS; Nizar Habash, CS*; Jin Tong, CS; Wade Shen, CS

  5. THE LAMP LABORATORY - MT PROJECT Cont’d
     NSA Visiting Scholars: Ron Dolan, Library of Congress; John Kovarik, DoD; MaryEllen Okurowski, DoD
     Visiting Scholars: Dekang Lin, 01/99-08/99

  6. OUR GOAL
     Automatically created, high-quality, broad-coverage machine translation.
     Example of Word to Word: <Ask David/Phil to provide>
     1. Example where generation output:
        - perfect
        - slightly degraded
        - generation degraded by CLCS
        - gloss ok

  7. OUR GOAL Example of CLCS output: Example of generated string:

  8. WORK ON CHIN - MT
     Work on Chin - MT began in Oct. 1997.
     1st Phase: Development of a small-scale end-to-end system on a representative (159 sentence) corpus of Chinese newspaper (Xinhua) articles.

  9. WORK ON CHIN - MT Cont’d
     Development of Broad Scale Static Resources:
     Lexicon: Optilex 250 entries augmented with appropriate argument structure (thematic role) grids and lexical conceptual structures.
     <Bonnie: current coverage of English lexicons - Chinese lexicons>

  10. WORK ON CHIN - MT Cont’d
      Parser: small scale; 217 grammar rules; Multipath REAP
      Generation: Add <David Traum>

  11. WORK ON CHIN - MT Cont’d
      Integration with Currently Existing or Simultaneously Built Resources from Other Institutions:
      - NMSU/Mikrokosmos interface
      - ISI/Nitrogen

  12. SYSTEM COMPONENTS AND COVERAGE
      Data flow, from final output down to input:
      - Output: English translated string
      - Shared ONI & ISI (Nitrogen)
      - Output: Composed LCS (CLCS) transformed to AMR (<David - Abstract meaning representation>)
      - Output: argument structure augmented syntactic string
      - Output: parsed corpus with appropriate argument structure features for lexical conceptual structure (LCS) composition
      - Output: segmented string with complex names identified as single segments
      - Input: unsegmented Chinese string
      Components and status:
      - Syntactic recoding and realization: translate LCS-based features to Nitrogen features. Feb: algorithm implemented
      - NMSU semantic ontologies: F(unctional) structure transducer, input to NMSU semantics (90 f-structures to NMSU for evaluation, Dec. 1998)
      - English lexical selection. Feb: algorithm implemented <David - coverage>
      - Lexical Conceptual Structure (LCS) composition. June: inefficiently composed LCS for -------- sentences. Feb: ------- handled by LCS composition
      - Parser. June: 404 fragments, 352 legal parses, 269 correct parses. Feb: 100 out of 150 full sentences with correct parse
      - Segmenter/name tagger. June: hand segmentation, hand tagging, 150 sentences
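
The parser figures on slide 12 are easier to compare as rates. The following is only an illustrative calculation from the counts on the slide; the variable names are assumptions, and the June counts are for fragments while the February count is for full sentences, so the two are not directly comparable.

```python
# Counts copied from slide 12; all names here are illustrative only.
june_fragments = 404   # fragments given to the parser (June)
june_legal = 352       # fragments that received a legal parse
june_correct = 269     # fragments that received a correct parse
feb_sentences = 150    # full sentences (Feb)
feb_correct = 100      # full sentences with a correct parse


def rate(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


june_legal_rate = rate(june_legal, june_fragments)
june_correct_rate = rate(june_correct, june_fragments)
june_correct_of_legal = rate(june_correct, june_legal)
feb_correct_rate = rate(feb_correct, feb_sentences)

print(f"June legal parses:        {june_legal_rate:.1f}% of fragments")
print(f"June correct parses:      {june_correct_rate:.1f}% of fragments")
print(f"June correct among legal: {june_correct_of_legal:.1f}%")
print(f"Feb correct parses:       {feb_correct_rate:.1f}% of full sentences")
```

With the slide's numbers this gives roughly 87% legal and 67% correct parses for the June fragments, and roughly 67% correct parses for the February full sentences.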

  13. Slide 4:
      • Intermediate Milestones/Next Steps:
      • Full end to end integration with NMSU:
        a. f-structure to TMR integration.
        b. f-structure to AMR-based generation
      • Evaluation of LCS as fail soft mechanism. Comparison of translations produced by LCS/Nitrogen.
      • Improvement of Coverage/Move towards Broad Scale Coverage of all components:
      • Parsing:
        - design/experimentation with
        - extension to Minipar (in cooperation with Dekang Lin, Univ. of Manitoba)
      • Lexicon
        - Broad coverage for adjectives and nouns, the latter of which will be automatically subdivided into simple and event-based nominals. Corresponding English refinements. Finish broad coverage for prepositions.
        - Finish English verb grid refinement and Chinese grid generation and checking. Speed up by dividing remaining verbs into Levin classes.
        - Port verb grids and refine composition algorithm for event-based nominals, include features from WordNet and will be assigned atomic LCSs. Event-based noun entries will be automatically associated with LCSs from their verbal counterparts (abduction derived from abduct) for event-based nouns in Optilex.
        - Broad coverage and representation refinement for functional elements (numbers, numerals, classifiers). These are LCSed by hand in the current iteration.
        - Port verb-based LCS entries into the noun lexicon for English and Chinese.
      • Discourse
      • Sept 1999:
        - Additional testing and improvement of LCS path. Debugging and testing more as the CLCSs become available.
        - Addition of NMSU path, then converting NMSU f-structures to English. The plan for that is to convert either to Nitrogen lattices, or perhaps AMRs, depending on what these f-structures actually look like.
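
The lexicon item on associating event-based nouns with the LCSs of their verbal counterparts (abduction derived from abduct) can be pictured as a two-step lookup. This is a minimal sketch under stated assumptions, not the project's code: the table names, the function name, and the placeholder LCS strings are all hypothetical, and the real Optilex entries and LCS notation are not shown in the slides.

```python
# Hypothetical verb lexicon: verb -> LCS (opaque placeholder strings,
# since the slides do not show the actual LCS notation).
verb_lcs = {
    "abduct": "<LCS for the verb 'abduct'>",
}

# Hypothetical derivation table: event-based noun -> verbal counterpart.
nominal_to_verb = {
    "abduction": "abduct",
}


def lcs_for_noun(noun):
    """Return the LCS inherited from the noun's verbal counterpart.

    Returns None when the noun has no recorded verbal counterpart
    (a simple nominal in the slide's terms) or when the verb has no
    LCS entry yet.
    """
    verb = nominal_to_verb.get(noun)
    if verb is None:
        return None
    return verb_lcs.get(verb)


print(lcs_for_noun("abduction"))
print(lcs_for_noun("table"))
```

Keeping the noun-to-verb table separate from the verb lexicon means a refinement to a verb's LCS is picked up by its nominal without editing the noun entry.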

  14. Laboratory management
      Problems: 1. Version control: too many copies of software, so code runs on one copy but not on the other. We need to roll back to a previous version of some piece of software, but it's not around unless someone has saved it.
      Solution: Installation of the Concurrent Versions System (CVS), check-in/check-out software. Static resources and running programs are checked in. They become the “official version”. Automatic consistency checking at check-in time: if there are differences from the previous version, permission from whoever made the previous check-in is needed to check in the new version, or the versions are merged.
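
The check-in rule on slide 14 can be illustrated with a toy model. This is a sketch of the idea only and does not use or imitate the CVS command set: the class, method names, and file name are hypothetical, and the "permission or merge" step is reduced to rejecting a check-in that was based on an outdated revision.

```python
class CheckInRejected(Exception):
    """Raised when a check-in is based on an outdated revision."""


class ToyRepository:
    """Toy check-in/check-out store that keeps every revision of a file."""

    def __init__(self):
        self._revisions = {}  # file name -> list of contents, oldest first

    def check_out(self, name):
        """Return (revision number, content) of the official version."""
        revs = self._revisions.get(name, [])
        return len(revs), (revs[-1] if revs else "")

    def check_in(self, name, base_revision, content):
        """Store content as the new official version.

        The consistency check: the caller must have started from the
        current official revision, otherwise a merge is needed first.
        """
        revs = self._revisions.setdefault(name, [])
        if base_revision != len(revs):
            raise CheckInRejected(
                f"{name}: based on revision {base_revision}, "
                f"official revision is {len(revs)}; merge first"
            )
        revs.append(content)
        return len(revs)

    def roll_back(self, name, revision):
        """Return the content of an earlier revision (1-based)."""
        return self._revisions[name][revision - 1]


repo = ToyRepository()
base_a, _ = repo.check_out("lexicon.txt")   # first user checks out
base_b, _ = repo.check_out("lexicon.txt")   # second user checks out
repo.check_in("lexicon.txt", base_a, "entries v1")
try:
    repo.check_in("lexicon.txt", base_b, "conflicting entries")
except CheckInRejected as err:
    print("rejected:", err)
```

Because every revision is kept, `roll_back` addresses the slide's other complaint: an earlier version is available even if nobody saved a private copy.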

  15. State of implementation:
      - Chinese/English lexicons under CVS
      - next: LCS programs (convert to shorthand/longhand)
      - then: parser, f-structure, generation programs
      Complete by June

  16. Problem 2: Operating and file system problems: a program works on machine A but not on machine B.
      Solution: All machines switched to Solaris 2.6, and AFS (a new networked file system manager) installed. AFS provides better management for large programs: shared file speedup, local caching, local control of protections and permissions. The improved environment will allow us to discourage work from home. Lower bandwidth for improved communication between members of the team.
