A Probabilistic Representation of Systemic Functional Grammar

A Probabilistic Representation of Systemic Functional Grammar • Robert Munro • Department of Linguistics, SOAS, University of London

Outline • Introduction • Functions in the nominal group • Machine learning • Testing framework • Classification vs unmarked function • Gradational realization • Delicacy • Conclusions

Introduction • An exploration of the ability of machine learning to learn and represent functional categories as fundamentally probabilistic • Gauged in terms of the ability to: • computationally learn functions from labeled examples and apply them to new texts • represent functions probabilistically: a gradation of potential realization between categories • explore finer layers of delicacy

Functions in the nominal group • Functions: • Deictic, Ordinative, Quantitative, Epithet, Classifier, Thing (Halliday 1994) • Example: The (Deictic) first (Ordinative) three (Quantitative) tasty (Epithet) red (Classifier) wines (Thing) • Gradations: • Here, ‘red’ also functions as an Epithet. • The uptake of such marked classifiers will not be uniform. • Overlap does not necessarily limit significance.

Machine Learning • Machine learning: • computational inference from specific examples. • A learner named Seneschal was developed for the task here: • probabilistic • seeks sub-categories (improves both classification and analysis) • allows categories to overlap • not too dependent on the size of the data set

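Machine Learning • The task here: • Given labeled examples of each category, with known values for the attributes x and y, infer a probabilistic model (potentially with sub-categories) that can classify new, unlabeled examples. [Figure: labeled and unlabeled (‘?’) examples plotted against x and y]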
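The task on this slide can be illustrated with a minimal sketch. It is not the Seneschal learner, whose internals are not given here; it assumes a much simpler stand-in (one independent Gaussian per category over two numeric attributes) purely to show what "infer a probabilistic model and classify new examples" can look like. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit(examples):
    """Fit one Gaussian per category. examples: list of ((x, y), label)."""
    grouped = defaultdict(list)
    for point, label in examples:
        grouped[label].append(point)
    total = len(examples)
    model = {}
    for label, points in grouped.items():
        n = len(points)
        means = [sum(p[d] for p in points) / n for d in range(2)]
        # A small floor keeps variances positive for tiny or degenerate groups.
        variances = [
            max(sum((p[d] - means[d]) ** 2 for p in points) / n, 1e-6)
            for d in range(2)
        ]
        model[label] = (n / total, means, variances)
    return model

def log_gaussian(value, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)

def posterior(model, point):
    """Return a probability for every category rather than a single label."""
    log_scores = {
        label: math.log(prior)
        + sum(log_gaussian(point[d], means[d], variances[d]) for d in range(2))
        for label, (prior, means, variances) in model.items()
    }
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}

# Hypothetical toy data: two categories in the (x, y) attribute space.
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.8, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.9, 3.3), "B"),
]
model = fit(training)
near_a = posterior(model, (1.0, 1.0))
near_b = posterior(model, (3.0, 3.0))
```

Because the output is a distribution over categories, an example that lies between two categories keeps some probability for each, which is the property the later slides on gradational realization rely on.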

Machine Learning • It is important that attributes (x,y,z...) : • represent features that distinguish functions • can be discovered automatically (for large scales) • are meaningful for analysis…? • Compared to manually constructed parsers: • greater scales than are practical • more features/dimensions than are possible (100’s are common)

Testing Framework • The model was learned from 10,000 labeled words from Reuters sports newswires from 1996 • 23 features: • part-of-speech and its context • punctuation • group / phrase contexts • collocational tendencies • probability of repetition

Testing Framework • Accuracy: • The ability to correctly identify the dominant function in 4 test corpora (1,000 words each): • Reuters Sports Newswires (1996) • Reuters Sports Newswires (2003) • Bio-informatics abstracts • Extract from Virginia Woolf’s ‘The Voyage Out’
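The accuracy measure described above can be sketched as follows, assuming each word comes with a predicted probability distribution over functions and one gold label. The helper names and the toy predictions are hypothetical and are not results from the study.

```python
def dominant_function(distribution):
    """Return the function with the highest predicted probability."""
    return max(distribution, key=distribution.get)

def accuracy(predicted_distributions, gold_labels):
    """Fraction of words whose dominant predicted function matches the gold label."""
    if len(predicted_distributions) != len(gold_labels):
        raise ValueError("predictions and gold labels must align one-to-one")
    if not gold_labels:
        return 0.0
    correct = sum(
        dominant_function(dist) == gold
        for dist, gold in zip(predicted_distributions, gold_labels)
    )
    return correct / len(gold_labels)

# Hypothetical toy predictions for four words.
predictions = [
    {"Epithet": 0.7, "Classifier": 0.3},
    {"Epithet": 0.4, "Classifier": 0.6},
    {"Classifier": 0.2, "Thing": 0.8},
    {"Epithet": 0.55, "Classifier": 0.45},
]
gold = ["Epithet", "Classifier", "Thing", "Classifier"]
score = accuracy(predictions, gold)  # 3 of 4 dominant functions match
```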

Testing Framework • Gradational model of realization: • calculated as the probability of a word realizing other functions, averaged across all clusters. • Finer layers of delicacy: • Manual analysis of clusters found within a function.
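One possible reading of the gradational measure is sketched below. It assumes each cluster belongs to one function and holds a probability distribution over functions for each of its words; the probability assigned to other functions is averaged within each cluster, then across the clusters of the same function. The exact calculation used in the study is not given in these slides, so the data layout, the function name, and the toy numbers are all assumptions.

```python
from collections import defaultdict

def gradation(clusters):
    """clusters: list of (function, [distribution, ...]).

    Returns {function: {other_function: mean probability}}, averaging first
    within each cluster and then across clusters of the same function.
    """
    per_function = defaultdict(list)
    for function, distributions in clusters:
        if not distributions:
            continue  # an empty cluster contributes nothing
        others = defaultdict(float)
        for dist in distributions:
            for label, prob in dist.items():
                if label != function:
                    others[label] += prob / len(distributions)
        per_function[function].append(dict(others))

    result = {}
    for function, cluster_means in per_function.items():
        labels = {label for means in cluster_means for label in means}
        result[function] = {
            label: sum(m.get(label, 0.0) for m in cluster_means) / len(cluster_means)
            for label in labels
        }
    return result

# Hypothetical toy clusters: two Classifier clusters and one Epithet cluster.
toy_clusters = [
    ("Classifier", [{"Classifier": 0.8, "Epithet": 0.2},
                    {"Classifier": 0.6, "Epithet": 0.4}]),
    ("Classifier", [{"Classifier": 0.9, "Thing": 0.1}]),
    ("Epithet", [{"Epithet": 0.7, "Classifier": 0.3}]),
]
overlap = gradation(toy_clusters)
```

In this toy input, the first Classifier cluster leans toward Epithet and the second toward Thing, so the averaged result shows a Classifier to Epithet gradation that is stronger than the Classifier to Thing one.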

Unmarked function • Unmarked function: function defined by only part-of-speech (POS) and word order. • eg: adjective = Epithet, non-final noun = Classifier • Previous functional parsers have assumed that most instances are unmarked: • POS taggers are almost 100% accurate • word order is trivial • …so the problem is solved?
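The unmarked baseline can be sketched as a lookup on part-of-speech and word order. Only two of the rules come from the slide (adjective = Epithet, non-final noun = Classifier); the remaining mappings, the coarse tag names (DET, ORD, NUM, ADJ, NOUN), and the function name are assumptions made for illustration.

```python
# Assumed mapping from coarse POS tags to unmarked functions.
# Only ADJ -> Epithet is stated on the slide; the rest are illustrative.
UNMARKED = {
    "DET": "Deictic",
    "ORD": "Ordinative",
    "NUM": "Quantitative",
    "ADJ": "Epithet",
}

def unmarked_functions(tagged_group):
    """Label one nominal group using only POS and word order.

    tagged_group: list of (word, pos) pairs in surface order.
    """
    noun_positions = [i for i, (_, pos) in enumerate(tagged_group) if pos == "NOUN"]
    last_noun = noun_positions[-1] if noun_positions else None
    labels = []
    for i, (word, pos) in enumerate(tagged_group):
        if pos == "NOUN":
            # Non-final noun = Classifier (from the slide); final noun = Thing (assumed).
            labels.append((word, "Thing" if i == last_noun else "Classifier"))
        else:
            labels.append((word, UNMARKED.get(pos, "Unknown")))
    return labels

wines = unmarked_functions([
    ("The", "DET"), ("first", "ORD"), ("three", "NUM"),
    ("tasty", "ADJ"), ("red", "ADJ"), ("wines", "NOUN"),
])
medal = unmarked_functions([("gold", "NOUN"), ("medal", "NOUN")])
```

Under these rules ‘red’ in ‘red wines’ comes out as an Epithet because it is an adjective, whereas the nominal group slide labels it a Classifier. That mismatch is an instance of a marked Classifier, which this kind of baseline cannot capture.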

Unmarked function • This is a false assumption. • Across the corpora: • < 40% of non-final adjectives realized Epithets • < 50% of Classifiers were nouns • 44% of Classifiers were ‘marked’!

Unmarked function • This task halved the classification error:

Gradational Realization • Nominal functions are typically represented deterministically, • although described as probabilistic, • with relationships existing between all functions. [Figure: the example ‘The first three tasty red wines’ labeled Deictic, Ordinative, Quantitative, Epithet, Classifier, Thing, alongside a diagram relating the functions Thing, Classifier, Epithet, Quantitative, Ordinative and Deictic]

Delicacy [Figure: tree of more delicate functions; node labels as extracted: Deictic (Demonstrative, Possessive), Ordinative, Numerative, Quantitative (Tabular, Discursive), Epithet, Classifier (Expansive, Hyponymic), Named Entity (First Name, Intermediary, Last Name), Thing, Nominative, Group-Releasing, non-Nom., Stated, Nominal, Described]

Delicacy • More delicate functions for Classifiers (Matthiessen 1995): • Hyponymic: describing a taxonomy or general ‘type-of’ relationship, eg: ‘red wine’, ‘gold medal’, ‘neural network architecture’ • Expansive: expands the description of the Head, eg: ‘knee surgery’, ‘optimization problems’, ‘sprint champion’

Delicacy • More delicate descriptions can be found with: • more features • more instances / registers • other algorithms / parameters • The methodology can be applied to: • other parts of a grammar • other languages

Conclusions • Gradational modeling of functional realization is desirable • Sophisticated methods are necessary for computationally modeling functions: • Markedness is common • Machine learning is a useful tool and participant in linguistic analysis.

Thank you • Acknowledgments: • Geoff Williams • Sanjay Chawla • The slides and extended paper will be published at: • www.robertmunro.com/research/
