A Probabilistic Representation of Systemic Functional Grammar



  1. A Probabilistic Representation of Systemic Functional Grammar Robert Munro Department of Linguistics, SOAS, University of London

  2. Outline • Introduction • Functions in the nominal group • Machine learning • Testing framework • Classification vs unmarked function • Gradational realization • Delicacy • Conclusions

  3. Introduction • An exploration of the ability of machine learning to learn and represent functional categories as fundamentally probabilistic • Gauged in terms of the ability to: • computationally learn functions from labeled examples and apply to new texts. • represent functions probabilistically: a gradation of potential realization between categories. • explore finer layers of delicacy.

  4. Functions in the nominal group • Functions: • Deictic, Ordinative, Quantitative, Epithet, Classifier, Thing (Halliday 1994) • Gradations: • Here, ‘red’ also functions as an Epithet. • The uptake of such marked Classifiers will not be uniform. • Overlap does not necessarily limit significance. [Example: ‘The first three tasty red wines’, labeled Deictic: The | Ordinative: first | Quantitative: three | Epithet: tasty | Classifier: red | Thing: wines]

  5. Machine Learning • Machine learning: • computational inference from specific examples. • A learner named Seneschal was developed for the task here: • probabilistic • seeks sub-categories (improves both classification and analysis) • allows categories to overlap • not too dependent on the size of the data set

  6. Machine Learning • The task here: • Given categories with known values for attributes x and y, infer a probabilistic model (potentially with sub-categories) that can classify new examples [Figure: scatter plot of two labeled categories over attributes x and y, with unlabeled points to be classified]
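The probabilistic, sub-category-seeking, overlap-friendly behaviour described on these two slides can be illustrated with a per-function mixture model: each function is modeled by one or more Gaussian clusters, and a new example receives a graded posterior over all functions. This is a minimal sketch using scikit-learn, not the Seneschal learner itself; the two-attribute setup, the sub-category count, and the toy numbers are assumptions.

```python
# Minimal sketch of probabilistic classification with sub-categories,
# approximating the behaviour described above. NOT the Seneschal
# implementation: the mixture model and all parameters are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

class MixtureClassifier:
    def __init__(self, n_subcategories=2):
        self.n_sub = n_subcategories
        self.models = {}   # one mixture (with sub-categories) per function
        self.priors = {}

    def fit(self, X, y):
        y = np.asarray(y)
        for label in set(y):
            Xl = X[y == label]
            # sub-categories = mixture components within one function
            self.models[label] = GaussianMixture(
                n_components=min(self.n_sub, len(Xl))).fit(Xl)
            self.priors[label] = len(Xl) / len(X)
        return self

    def predict_proba(self, x):
        # graded posterior over functions -- categories may overlap
        scores = {label: np.exp(m.score_samples([x])[0]) * self.priors[label]
                  for label, m in self.models.items()}
        total = sum(scores.values())
        return {label: s / total for label, s in scores.items()}

# usage with toy values for attributes x and y:
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.15],
              [0.9, 0.8], [0.8, 0.9], [0.85, 0.85]])
y = ["Epithet"] * 3 + ["Classifier"] * 3
clf = MixtureClassifier().fit(X, y)
print(clf.predict_proba([0.5, 0.5]))   # graded membership in both functions
```

A word that sits between two functions, such as a marked Classifier like ‘red’, then receives non-trivial probability for both functions rather than being forced into one category.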

  7. Machine Learning • It is important that attributes (x, y, z...): • represent features that distinguish functions • can be discovered automatically (for large scales) • are meaningful for analysis…? • Compared to manually constructed parsers: • greater scales than are practical • more features/dimensions than are possible (hundreds are common)

  8. Testing Framework • The model was learned from 10,000 labeled words from 1996 Reuters sports newswires • 23 features: • part-of-speech and its context • punctuation • group / phrase contexts • collocational tendencies • probability of repetition
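The 23 features themselves are not enumerated on the slide; the sketch below only illustrates what one feature vector covering the five named families might look like for a single word. Every feature name, threshold, and tag here is an assumed stand-in, not the study's actual feature set.

```python
# Hypothetical feature extraction for one token, covering the five
# feature families named above (POS and its context, punctuation,
# group/phrase context, collocation, repetition). The real 23 features
# are not given on the slide, so these are illustrative assumptions.
import string

def extract_features(tokens, pos_tags, i, word_counts, total_words):
    return {
        # part-of-speech and its context
        "pos": pos_tags[i],
        "pos_prev": pos_tags[i - 1] if i > 0 else "<S>",
        "pos_next": pos_tags[i + 1] if i + 1 < len(tokens) else "</S>",
        # punctuation
        "precedes_punct": i + 1 < len(tokens)
                          and tokens[i + 1] in string.punctuation,
        # group / phrase context: final position in the nominal group
        "group_final": i + 1 == len(tokens)
                       or pos_tags[i + 1] not in ("NN", "NNS", "JJ"),
        # collocational tendency / probability of repetition,
        # estimated from raw corpus counts
        "repetition_prob": word_counts.get(tokens[i].lower(), 0)
                           / total_words,
    }
```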

  9. Testing Framework • Accuracy: • The ability to correctly identify the dominant function in 4 test corpora (1,000 words each): • Reuters Sports Newswires (1996) • Reuters Sports Newswires (2003) • Bio-informatics abstracts • Extract from Virginia Woolf’s ‘The Voyage Out’

  10. Testing Framework • Gradational model of realization: • calculated as the probability of a word realizing other functions, averaged across all clusters • Finer layers of delicacy: • Manual analysis of clusters found within a function.
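The gradational measure can be written out directly: take each cluster in which a word's instances fall and average the per-cluster probabilities of realizing each function. A minimal sketch, assuming per-cluster posteriors of the kind a probabilistic learner (such as the mixture sketch earlier) provides; the numbers in the usage example are invented.

```python
# Average a word's per-cluster function probabilities, as described
# above. The posterior dicts are assumed inputs from a probabilistic
# learner; they are not taken from the study's data.
from collections import defaultdict

def gradational_realization(cluster_posteriors):
    """cluster_posteriors: one {function: probability} dict per cluster
    in which the word occurs."""
    totals = defaultdict(float)
    for posterior in cluster_posteriors:
        for function, p in posterior.items():
            totals[function] += p
    n = len(cluster_posteriors)
    return {function: total / n for function, total in totals.items()}

# e.g. (invented numbers) 'red' realizing Classifier but partly Epithet:
print(gradational_realization([
    {"Classifier": 0.8, "Epithet": 0.2},
    {"Classifier": 0.6, "Epithet": 0.4},
]))  # ~ {'Classifier': 0.7, 'Epithet': 0.3}
```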

  11. Unmarked function • Unmarked function: a function determined only by part-of-speech (POS) and word order • eg: adjective = Epithet, non-final noun = Classifier • Previous functional parsers have assumed that most instances are unmarked: • POS taggers are almost 100% accurate • word order is trivial • …so the problem is solved?
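Stated as code, the unmarked-function assumption is just a pair of rules over POS tags and position, as in this sketch; the Penn Treebank tag set and the extra determiner/number rules are assumptions, not part of the slide.

```python
# The unmarked-function assumption above as simple rules over POS and
# word order within a nominal group. A sketch of the baseline, not a
# serious parser; tag set and the DT/CD rules are assumptions.
def unmarked_function(pos_tags, i):
    final = i + 1 == len(pos_tags)
    if pos_tags[i].startswith("JJ"):               # adjective -> Epithet
        return "Epithet"
    if pos_tags[i].startswith("NN"):               # noun
        return "Thing" if final else "Classifier"  # non-final -> Classifier
    if pos_tags[i] == "DT":                        # determiner -> Deictic
        return "Deictic"
    if pos_tags[i] == "CD":                        # number -> Quantitative
        return "Quantitative"
    return "Other"

# 'the three tasty red wines' -> DT CD JJ JJ NNS
print([unmarked_function(["DT", "CD", "JJ", "JJ", "NNS"], i)
       for i in range(5)])
# ['Deictic', 'Quantitative', 'Epithet', 'Epithet', 'Thing']
# note: 'red' comes out as Epithet although it functions as a
# Classifier here -- exactly the markedness the next slide quantifies
```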

  12. Unmarked function • This is a false assumption. • Across the corpora: • < 40% of non-final adjectives realized Epithets • < 50% of Classifiers were nouns • 44% of Classifiers were ‘marked’!

  13. Unmarked function • Learning the functions halved the classification error relative to the unmarked-function assumption [results chart omitted in transcript]

  14. Gradational Realization • Nominal functions are typically represented deterministically, although described as probabilistic • A gradational model instead has relationships existing between all functions [Figure: ‘The first three tasty red wines’ labeled Deictic, Ordinative, Quantitative, Epithet, Classifier, Thing, with gradational links between all of the functions]

  15. Delicacy [Figure: system network of finer delicacy within each function. Deictic: Demonstrative, Possessive; Numerative: Ordinative, Quantitative (Tabular, Discursive); Epithet; Classifier: Expansive, Hyponymic; Thing: Named Entity (First Name, Intermediary, Last Name), Nominal (Stated, Described), Nominative / non-Nominative, Group-Releasing]

  16. Delicacy • More delicate functions for Classifiers (Matthiessen 1995): • Hyponymic: describing a taxonomy or general ‘type-of’ relationship eg: ‘red wine’, ‘gold medal’, ‘neural network architecture’ • Expansive: expands the description of the Head eg: ‘knee surgery’, ‘optimization problems’, ‘sprint champion’

  17. Delicacy

  18. Delicacy • More delicate descriptions can be found: • more features • more instances / registers • other algorithms / parameters • Methodology can be applied to: • other parts of a grammar • other languages

  19. Conclusions • Gradational modeling of functional realization is desirable • Sophisticated methods are necessary for computationally modeling functions: • Markedness is common • Machine learning is a useful tool and participant in linguistic analysis.

  20. Thank you • Acknowledgments: • Geoff Williams • Sanjay Chawla • The slides and extended paper will be published at: • www.robertmunro.com/research/
