
Machine Learning in GATE




  1. Machine Learning in GATE, by Valentin Tablan

  2. Machine Learning in GATE
     • Uses classification: [Attr1, Attr2, Attr3, … Attrn] → Class
     • Classifies annotations. (Documents can be classified as well using a simple trick.)
     • Annotations of a particular type are selected as instances.
     • Attributes refer to instance annotations.
     • Attributes have a position relative to the instance annotation they refer to.

  3. Attributes
     Attributes can be:
     • Boolean: the presence [or absence] of an annotation of a particular type [partially] overlapping the referred instance annotation.
     • Nominal: the value of a particular feature of the referred instance annotation. The complete set of acceptable values must be specified a priori.
     • Numeric: the numeric value (converted from String) of a particular feature of the referred instance annotation.
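The three attribute kinds can be illustrated with a small, self-contained sketch. It does not use the GATE API: the `Ann` class, the method names, and the `Lookup` and `length` examples are hypothetical stand-ins chosen for illustration, and the overlap test assumes half-open character spans.

```java
import java.util.List;
import java.util.Map;

public class AttributeKinds {
    // Toy stand-in for an annotation (hypothetical, not gate.Annotation):
    // a type, a half-open character span [start, end), and string features.
    static final class Ann {
        final String type;
        final int start;
        final int end;
        final Map<String, String> features;

        Ann(String type, int start, int end, Map<String, String> features) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.features = features;
        }
    }

    // Boolean attribute: true if any annotation of the given type
    // at least partially overlaps the instance annotation.
    static boolean booleanValue(Ann instance, List<Ann> all, String type) {
        for (Ann a : all) {
            if (a.type.equals(type) && a.start < instance.end && instance.start < a.end) {
                return true;
            }
        }
        return false;
    }

    // Nominal attribute: index of the feature value in the list of values
    // declared up front, or -1 if the value is missing or not declared.
    static int nominalValue(Ann referred, String feature, List<String> allowed) {
        String v = referred.features.get(feature);
        return v == null ? -1 : allowed.indexOf(v);
    }

    // Numeric attribute: the feature value converted from String.
    static double numericValue(Ann referred, String feature) {
        return Double.parseDouble(referred.features.get(feature));
    }

    public static void main(String[] args) {
        Ann token = new Ann("Token", 0, 5, Map.of("category", "NNP", "length", "5"));
        Ann lookup = new Ann("Lookup", 3, 9, Map.of());
        List<Ann> all = List.of(token, lookup);

        System.out.println(booleanValue(token, all, "Lookup"));
        System.out.println(nominalValue(token, "category", List.of("NN", "NNP", "NNPS")));
        System.out.println(numericValue(token, "length"));
    }
}
```

Encoding a nominal value as an index into the declared list is one reason the full value set has to be known before any data is collected.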

  4. Implementation
     Machine Learning PR (processing resource) in GATE. Has two functioning modes:
     • training
     • application
     Uses an XML file for configuration:
       <?xml version="1.0" encoding="windows-1252"?>
       <ML-CONFIG>
         <DATASET> … </DATASET>
         <ENGINE> … </ENGINE>
       </ML-CONFIG>

  5. <DATASET>
       <DATASET>
         <INSTANCE-TYPE>Token</INSTANCE-TYPE>
         <ATTRIBUTE>
           <NAME>POS_category(0)</NAME>
           <TYPE>Token</TYPE>
           <FEATURE>category</FEATURE>
           <POSITION>0</POSITION>
           <VALUES>
             <VALUE>NN</VALUE>
             <VALUE>NNP</VALUE>
             <VALUE>NNPS</VALUE>
             …
           </VALUES>
           [<CLASS/>]
         </ATTRIBUTE>
         …
       </DATASET>
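To make the structure of a <DATASET> block concrete, here is a minimal sketch that reads one with the JDK's built-in DOM parser and summarises each attribute. It is not how the Machine Learning PR parses its configuration (the MLEngine summary on the last slide mentions JDOM for options); the class name, the `describe` helper, and the two-attribute sample, with its context attribute at position -1 and its three-value list, are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DatasetConfigReader {
    // Illustrative sample only: a context attribute plus a class attribute.
    static final String SAMPLE =
        "<DATASET>"
        + "<INSTANCE-TYPE>Token</INSTANCE-TYPE>"
        + "<ATTRIBUTE><NAME>POS_category(-1)</NAME><TYPE>Token</TYPE>"
        + "<FEATURE>category</FEATURE><POSITION>-1</POSITION>"
        + "<VALUES><VALUE>NN</VALUE><VALUE>NNP</VALUE><VALUE>NNPS</VALUE></VALUES>"
        + "</ATTRIBUTE>"
        + "<ATTRIBUTE><NAME>POS_category(0)</NAME><TYPE>Token</TYPE>"
        + "<FEATURE>category</FEATURE><POSITION>0</POSITION>"
        + "<VALUES><VALUE>NN</VALUE><VALUE>NNP</VALUE><VALUE>NNPS</VALUE></VALUES>"
        + "<CLASS/>"
        + "</ATTRIBUTE>"
        + "</DATASET>";

    // Text of the first descendant element with the given tag, or null if absent.
    static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // One summary line per <ATTRIBUTE> element.
    static List<String> describe(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse dataset definition", e);
        }
        List<String> out = new ArrayList<>();
        NodeList attrs = doc.getElementsByTagName("ATTRIBUTE");
        for (int i = 0; i < attrs.getLength(); i++) {
            Element a = (Element) attrs.item(i);
            int valueCount = a.getElementsByTagName("VALUE").getLength();
            boolean isClass = a.getElementsByTagName("CLASS").getLength() > 0;
            out.add(text(a, "NAME")
                + " type=" + text(a, "TYPE")
                + " feature=" + text(a, "FEATURE")
                + " position=" + text(a, "POSITION")
                + " values=" + valueCount
                + (isClass ? " [class]" : ""));
        }
        return out;
    }

    public static void main(String[] args) {
        describe(SAMPLE).forEach(System.out::println);
    }
}
```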

  6. <ENGINE>
       <ENGINE>
         <WRAPPER>gate.creole.ml.weka.Wrapper</WRAPPER>
         <OPTIONS>
           <CLASSIFIER>weka.classifiers.j48.J48</CLASSIFIER>
           <CLASSIFIER-OPTIONS>-K 3</CLASSIFIER-OPTIONS>
           <CONFIDENCE-THRESHOLD>0.85</CONFIDENCE-THRESHOLD>
         </OPTIONS>
       </ENGINE>
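The slide does not say how CONFIDENCE-THRESHOLD is applied. One plausible reading, stated here purely as an assumption, is that a prediction is accepted only when the classifier's probability for the winning class reaches the threshold. The sketch below shows that reading with a hypothetical `decide` helper; it is not taken from the GATE or WEKA code.

```java
public class ConfidenceFilter {
    // Returns the index of the most probable class, or -1 when the
    // distribution is empty or the top probability is below the threshold.
    // (Assumed semantics; the slides do not define the threshold precisely.)
    static int decide(double[] distribution, double threshold) {
        if (distribution.length == 0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return distribution[best] >= threshold ? best : -1;
    }

    public static void main(String[] args) {
        System.out.println(decide(new double[] {0.10, 0.90}, 0.85)); // confident enough
        System.out.println(decide(new double[] {0.40, 0.60}, 0.85)); // rejected
    }
}
```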

  7. Attributes Position
     Instances type: Token
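A rough sketch of what a relative position means when the instance type is Token: position 0 is the instance itself, -1 the previous Token, +1 the next one. The sketch assumes positions are counted in instance annotations and that a position falling outside the document yields a missing value (null); both points, and all names used, are assumptions rather than documented GATE behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class RelativePosition {
    // Value of the attribute at the given offset from the instance,
    // or null when the offset falls outside the token sequence.
    static String valueAt(List<String> categories, int instanceIndex, int position) {
        int i = instanceIndex + position;
        return (i >= 0 && i < categories.size()) ? categories.get(i) : null;
    }

    // One row of attribute values for an instance, one value per position.
    static List<String> row(List<String> categories, int instanceIndex, int[] positions) {
        List<String> row = new ArrayList<>();
        for (int p : positions) {
            row.add(valueAt(categories, instanceIndex, p));
        }
        return row;
    }

    public static void main(String[] args) {
        // POS categories of three consecutive Tokens.
        List<String> categories = List.of("DT", "NN", "VBZ");
        int[] positions = {-1, 0, 1};
        for (int i = 0; i < categories.size(); i++) {
            System.out.println(row(categories, i, positions));
        }
    }
}
```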

  8. Machine Learning PR
     • Can save a learnt model to an external file for later use. Saves the actual model and the collected dataset.
     • Can export the collected dataset in .arff format.
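For readers unfamiliar with .arff, the sketch below writes a tiny dataset of nominal attributes in that text format: a relation name, one `@attribute` line per attribute with its allowed values, then comma-separated rows under `@data`, with `?` for a missing value. It is a hand-rolled illustration, not the PR's exporter; the class, method, relation, and attribute names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ArffSketch {
    // Serialises nominal attributes and rows as ARFF text.
    // A null cell is written as "?", ARFF's missing-value marker.
    static String toArff(String relation, List<String> names,
                         List<List<String>> allowedValues, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int i = 0; i < names.size(); i++) {
            sb.append("@attribute ").append(names.get(i))
              .append(" {").append(String.join(",", allowedValues.get(i))).append("}\n");
        }
        sb.append("\n@data\n");
        for (List<String> row : rows) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    line.append(',');
                }
                String v = row.get(i);
                line.append(v == null ? "?" : v);
            }
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tags = List.of("DT", "NN", "VBZ");
        List<List<String>> rows = Arrays.asList(
            Arrays.asList(null, "DT"),   // first token has no predecessor
            Arrays.asList("DT", "NN"));
        System.out.print(toArff("pos", List.of("pos_prev", "pos_cur"),
                                List.of(tags, tags), rows));
    }
}
```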

  9. Standard Use Scenario
     Application
     • Prepare data by enriching the documents with annotations for attributes (e.g. run Tokeniser, POS tagger, Gazetteer, etc.).
     • [ Load the previously saved model. ]
     • Run the ML PR in application mode.
     • [ Save the learnt model. ]
     Training
     • Prepare training data by enriching the documents with annotations for attributes (e.g. run Tokeniser, POS tagger, Gazetteer, etc.).
     • Run the ML PR in training mode.
     • Export the dataset as .arff and perform experiments using the WEKA interface in order to find the best attribute set / algorithm / algorithm options.
     • Update the configuration file accordingly.
     • Run the ML PR again to collect the actual data.
     • [ Save the learnt model. ]

  10. An Example
      Learn POS category from POS context.

  11. Using Other ML Libraries
      The MLEngine Interface. Method Summary:
      • void addTrainingInstance(List attributes): adds a new training instance to the dataset.
      • Object classifyInstance(List attributes): classifies a new instance.
      • void init(): this method will be called after an engine is created and has its dataset and options set.
      • void setDatasetDefinition(DatasetDefintion definition): sets the definition for the dataset used.
      • void setOptions(org.jdom.Element options): sets the options from an XML JDom element.
      • void setOwnerPR(ProcessingResource pr): registers the PR using the engine with the engine.
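To give a feel for what plugging in another learner involves, here is a minimal sketch of an engine that always predicts the most frequent class seen in training. It deliberately does not implement the real GATE interface, which would need GATE on the classpath plus the `setDatasetDefinition`, `setOptions`, and `setOwnerPR` methods; `EngineLike` is a local stand-in that mirrors three of the method names listed above, and the constructor argument giving the class attribute's index is an assumption (a real engine would presumably learn it from the dataset definition).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MajorityEngineSketch {
    // Local stand-in mirroring three method names from the summary above.
    // This is NOT the GATE MLEngine interface.
    interface EngineLike {
        void init();
        void addTrainingInstance(List<?> attributes);
        Object classifyInstance(List<?> attributes);
    }

    // Baseline learner: counts class values during training and
    // always answers with the most frequent one.
    static class MajorityEngine implements EngineLike {
        private final int classIndex; // assumed position of the class attribute
        private Map<Object, Integer> counts = new HashMap<>();

        MajorityEngine(int classIndex) {
            this.classIndex = classIndex;
        }

        @Override
        public void init() {
            counts = new HashMap<>();
        }

        @Override
        public void addTrainingInstance(List<?> attributes) {
            counts.merge(attributes.get(classIndex), 1, Integer::sum);
        }

        @Override
        public Object classifyInstance(List<?> attributes) {
            Object best = null;
            int bestCount = 0;
            for (Map.Entry<Object, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            return best; // null if nothing has been trained yet
        }
    }

    public static void main(String[] args) {
        MajorityEngine engine = new MajorityEngine(1);
        engine.init();
        // Training mode: each instance is [previous POS, current POS (class)].
        engine.addTrainingInstance(List.of("DT", "NN"));
        engine.addTrainingInstance(List.of("JJ", "NN"));
        engine.addTrainingInstance(List.of("NN", "VBZ"));
        // Application mode: the class value is what we want predicted.
        System.out.println(engine.classifyInstance(List.of("DT", "?")));
    }
}
```

The same two-phase shape matches the PR's two modes: instances are fed in one by one during training, and single instances are classified during application.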
