The LING 581 course focuses on advanced computational linguistics, providing students with hands-on project experience using natural language software packages. Students will learn data formatting, installation, and operation of software, while completing exercises that mimic real-world applications and build programming skills in languages like Perl, Java, and Lisp. The course aims to prepare students for employment opportunities by developing valuable computational abilities. Students are required to work in Unix or a similar environment, and will engage with resources like the Penn Treebank and with statistical parsers.
LING 581: Advanced Computational Linguistics Lecture Notes January 19th
Course • Webpage • http://dingo.sbs.arizona.edu/~sandiway/ling581-11/ • Enrollment
Course Objectives • Gain meaningful project experience • dealing with natural language software packages • installation • input data formatting • operation • project exercises • useful “real-world” computational experience • write small programs • abilities gained will be of value to employers
Computational Facilities • Advise using your own laptop/desktop • we can also make use of this computer lab • but you don’t have installation rights on these computers • Platforms • You need to run some variant of Unix… (your task #1 for this week) e.g. • Linux • de facto standard for advanced/research software • Cygwin on Windows • http://www.cygwin.com/ • Linux-like environment for Windows making it possible to port software running on POSIX systems (such as Linux, BSD, and Unix systems) to Windows. • MacOS X • Not quite Linux, some porting issues, especially with C programs
Theme • Language Understanding
Project Topics • PTB (Penn Treebank) search/lookup software (tgrep2) • Part-of-speech taggers • The use and modification of statistical parsers trained on Treebanks (Bikel-Collins, and others) • Ontologies and Semantic Networks: WordNet etc. • Question-Answering (QA) • Sentence Parsing using contemporary linguistic theory: Minimalist Program
Grading • Completion of all homework tasks will result in a satisfactory grade (A)
In the News recently… www.ibmwatson.com
Project 1: PTB • You will be exposed to • Perl • Java • Lisp s-exps • Bikel-Collins Parser • You will need to review concepts from LING 538 • regexp use • Penn POS tags
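As a warm-up for the regexp and Penn POS tag review, here is a minimal sketch in Python (chosen only for brevity; it is not one of the course languages). It pulls (tag, word) pairs out of one bracketed PTB sentence with a single regular expression and counts the tags. The names `LEAF` and `tagged_words` are made up for illustration, and the regex assumes every leaf is written as `(TAG word)` with exactly one space between tag and word.

```python
import re
from collections import Counter

# One sentence from 00/wsj_0001.mrg, flattened onto a single line.
TREE = ("( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD "
        "(NP (NN chairman) ) (PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) "
        "(, ,) (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) ))))) (. .) ))")

# A leaf is "(TAG word)": neither part may contain whitespace or parentheses.
# Phrasal nodes such as "(NP-SBJ (NNP ..." never match, because the label is
# followed by an opening parenthesis rather than a word.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def tagged_words(text):
    """Return the list of (tag, word) pairs found in a bracketed tree."""
    return LEAF.findall(text)

pairs = tagged_words(TREE)
tag_counts = Counter(tag for tag, _ in pairs)

for tag, word in pairs:
    print(f"{word}\t{tag}")
```

The same pattern carries over almost unchanged to Perl, since it only relies on character classes and two capture groups.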
PTB • Availability • Linguistic Data Consortium (LDC) • U. of Arizona is a (fee-paying) member of this consortium • Resources are made available to the community through the main library • URL • http://sabio.library.arizona.edu/search/X
PTB (V3) • Call Record
Task 1 • Install cygwin or ubuntu • Install the PTB • Borrow it from the library • Or use the cd I’ve brought with me • Familiarize yourself with the organization and layout of the files • e.g. the difference between mrg and prd formats • As is standard in the literature, we’ll be using the WSJ (Wall Street Journal) section of the PTB
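To get a feel for the file layout, the sketch below (Python, for brevity) maps a WSJ file id to its path relative to the WSJ directory, following the `00/wsj_0001.mrg` naming used in these notes: the first two digits of the file number are the section directory. The function name `wsj_relative_path` is hypothetical, and where the WSJ directory sits on your disc or install is not assumed here. As far as I understand the distribution, `.mrg` files combine POS tags with the parse while `.prd` files hold the parse without POS tags; check this against your own copy.

```python
import re
from pathlib import PurePosixPath

def wsj_relative_path(file_id, ext="mrg"):
    """Map e.g. 'wsj_0001' to '00/wsj_0001.mrg' (section dir = first two digits)."""
    match = re.fullmatch(r"wsj_(\d{2})(\d{2})", file_id)
    if match is None:
        raise ValueError(f"not a WSJ file id: {file_id!r}")
    section = match.group(1)
    return PurePosixPath(section) / f"{file_id}.{ext}"

mrg_path = wsj_relative_path("wsj_0001")          # merged: POS tags + parse
prd_path = wsj_relative_path("wsj_0001", "prd")   # parse only (assumed)
print(mrg_path)
print(prd_path)
```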
TreeBank Browsing
00/wsj_0001.mrg
( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NP (NN chairman) ) (PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) (, ,) (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) ))))) (. .) ))
00/wsj_0001.mrg
( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))
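The .mrg trees are plain s-expressions, so a tiny reader is enough to load them into a program. The following is a minimal sketch in Python (used here for brevity; the same idea works in Perl, Java, or Lisp). It tokenizes on parentheses and whitespace and builds nested lists of the form `[label, child, ...]`. The function names are made up for illustration, and the sketch assumes well-formed input containing a single tree.

```python
import re

TREE = ("( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) "
        "(NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) "
        "(NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) "
        "(NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))")

def tokenize(text):
    """Split into '(' , ')' and runs of non-space, non-parenthesis characters."""
    return re.findall(r"[()]|[^\s()]+", text)

def parse(text):
    """Read one bracketed tree and return it as nested Python lists."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok != "(":
            return tok                  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                        # consume the closing ')'
        return node

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree

def leaves(node):
    """Yield (tag, word) pairs from left to right."""
    if len(node) == 2 and all(isinstance(x, str) for x in node):
        yield node[0], node[1]
    else:
        for child in node:
            if isinstance(child, list):
                yield from leaves(child)

tree = parse(TREE)
sentence = " ".join(word for _, word in leaves(tree))
print(sentence)
```

Note that the outermost pair of parentheses has no label, so the parsed result is a one-element list whose only member is the `S` node.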
TreeBank Browsing • My out-dated tool (treebank viewer) • URL • http://dingo.sbs.arizona.edu/~sandiway/treebankviewer/
PTB Search Tools • Looking ahead • Google and install • tgrep2 • http://tedlab.mit.edu/~dr/Tgrep2/ • a fast command-line search tool for parse trees • C program (source, Makefile) • Tregex • http://nlp.stanford.edu/software/tregex.shtml • graphical Java version • Penn Treebank Online (tgrep interface) • http://www.ldc.upenn.edu/ldc/online/treebank/ • doesn’t seem to be working: tgrep search currently unavailable • tgrep • VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to)) • approximation to finding Verb Phrases headed by "believe" that have an infinitival complement with a non-null subject
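To unpack the tgrep pattern: `A << B` means A dominates B, `A < B` means A immediately dominates B, `!` negates a relation, and `/.../` matches a node label by regular expression (words count as leaf nodes). The sketch below spells the pattern out by hand in Python over nested-list trees of the form `[label, child, ...]`. It is only an illustration of what the pattern asks for, not how tgrep2 works internally; the function names and the two toy trees are invented for this example and are not PTB data.

```python
import re

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else node[1:]

def descendants(node):
    """All nodes properly dominated by node (the '<<' relation)."""
    for child in children(node):
        yield child
        yield from descendants(child)

def overt_subject(np):
    """/^NP/ !<< /[*]/ !< (-NONE- < T)"""
    if not re.search(r"^NP", label(np)):
        return False
    if any(re.search(r"[*]", label(d)) for d in descendants(np)):
        return False
    return not any(
        label(c) == "-NONE-" and any(label(g) == "T" for g in children(c))
        for c in children(np)
    )

def to_infinitive(vp):
    """VP|AUX << to"""
    return label(vp) in ("VP", "AUX") and any(
        label(d) == "to" for d in descendants(vp)
    )

def believe_with_infinitival(vp):
    """VP << /^believe/ < (S < (overt subject NP) < (VP|AUX << to))"""
    if label(vp) != "VP":
        return False
    if not any(re.search(r"^believe", label(d)) for d in descendants(vp)):
        return False
    return any(
        label(s) == "S"
        and any(overt_subject(c) for c in children(s))
        and any(to_infinitive(c) for c in children(s))
        for s in children(vp)
    )

# Toy trees (hand-built, hypothetical): overt subject vs. null subject.
OVERT = ["VP", ["VBP", "believe"],
         ["S", ["NP-SBJ", ["PRP", "him"]],
               ["VP", ["TO", "to"], ["VP", ["VB", "win"]]]]]
NULL = ["VP", ["VBP", "believe"],
        ["S", ["NP-SBJ", ["-NONE-", "*-1"]],
              ["VP", ["TO", "to"], ["VP", ["VB", "win"]]]]]

print(believe_with_infinitival(OVERT), believe_with_infinitival(NULL))
```

As the slide says, the pattern is only an approximation: `VP << /^believe/` accepts any VP that dominates a form of "believe" at any depth, not just VPs headed by it.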