Overview of GATE MAF API and Annotated Noun Extraction in LIRICS
This document gives a concise overview of the GATE MAF API for clients of the Morpho-syntactic Annotation Framework (MAF). It includes sample code that lists the strings annotated as nouns in a MAF document: the example retrieves all word forms, filters them by the feature ID "pos@noun", and prints the tokens underlying each matching word form.
Julien Nioche, Univ. Sheffield
LIRICS
MAF API: a quick overview
Lirics Barcelona Meeting, 21/06/05
[Architecture diagram. Labels: API Implementation for the GATE PRs, GATE, MAF API, Clients, XML Document]
// Sample code to list all strings that have been annotated
// as nouns in the text
MAF doc = liricsDocImpl.getMAF();
WordFormSet wfs = doc.getAllWordForms();
wfs = wfs.getWordFormsWithFeatureID("pos@noun");
Iterator<WordForm> iter = wfs.getIterator();
while (iter.hasNext()) {
    WordForm wf = iter.next();
    TokenSet tkSet = wf.getAllTokens();
    Iterator<Token> tkIter = tkSet.iterator();
    while (tkIter.hasNext()) {
        Token token = tkIter.next();
        System.out.print(token.getTokenString());
        System.out.print(" ");
    }
    System.out.println();
}
From LiricsDocumentImpl, obtain the MAF information:
    MAF doc = liricsDocImpl.getMAF();
All word forms in the MAF document have MAF information attached:
    WordFormSet wfs = doc.getAllWordForms();
Keep only those word forms that have the feature "pos" with the value "noun":
    wfs = wfs.getWordFormsWithFeatureID("pos@noun");
Obtain an iterator over those word forms:
    Iterator<WordForm> iter = wfs.getIterator();
Consider one word form at a time:
    WordForm wf = iter.next();
For each word form, find all underlying tokens:
    TokenSet tkSet = wf.getAllTokens();
Obtain an iterator over all tokens:
    Iterator<Token> tkIter = tkSet.iterator();
Consider one token at a time:
    Token token = tkIter.next();
Finally, print the token value:
    System.out.print(token.getTokenString());
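The steps above can be tried without the LIRICS libraries by using small stand-in types. The sketch below is only an illustration: Token, WordForm, withFeatureId, toStrings and sampleWordForms are hypothetical names invented here, not the MAF API classes, and the assumption that each word form carries a single feature ID string such as "pos@noun" is a simplification. Unlike the slide code, it joins tokens without a trailing space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; they are NOT the LIRICS MAF API classes.
class NounListingSketch {

    /** Hypothetical token: wraps its surface string. */
    static final class Token {
        private final String text;
        Token(String text) { this.text = text; }
        String getTokenString() { return text; }
    }

    /** Hypothetical word form: one feature ID (e.g. "pos@noun") plus its tokens. */
    static final class WordForm {
        private final String featureId;
        private final List<Token> tokens;
        WordForm(String featureId, Token... tokens) {
            this.featureId = featureId;
            this.tokens = Arrays.asList(tokens);
        }
        String getFeatureId() { return featureId; }
        List<Token> getAllTokens() { return tokens; }
    }

    /** Keeps only the word forms whose feature ID equals the requested one. */
    public static List<WordForm> withFeatureId(List<WordForm> all, String featureId) {
        List<WordForm> kept = new ArrayList<>();
        for (WordForm wf : all) {
            if (featureId.equals(wf.getFeatureId())) {
                kept.add(wf);
            }
        }
        return kept;
    }

    /** Builds one string per word form by joining its token strings with spaces. */
    public static List<String> toStrings(List<WordForm> wordForms) {
        List<String> lines = new ArrayList<>();
        for (WordForm wf : wordForms) {
            StringBuilder sb = new StringBuilder();
            for (Token token : wf.getAllTokens()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(token.getTokenString());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    /** Made-up sample data: four word forms, one of them spanning two tokens. */
    public static List<WordForm> sampleWordForms() {
        return Arrays.asList(
            new WordForm("pos@det", new Token("The")),
            new WordForm("pos@noun", new Token("cat")),
            new WordForm("pos@verb", new Token("visited")),
            new WordForm("pos@noun", new Token("New"), new Token("York")));
    }

    public static void main(String[] args) {
        // Filter by feature ID, then print one line per matching word form.
        for (String line : toStrings(withFeatureId(sampleWordForms(), "pos@noun"))) {
            System.out.println(line);
        }
    }
}
```

With the sample data, the program prints "cat" and "New York" on separate lines. The second result shows why the slide code iterates over tokens inside each word form: a single word form may span several tokens.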