Modelling the evolution of language for modellers and non-modellers • Benefits of modelling • Pitfalls • How to communicate your results? Modelling language origins and evolution IJCAI-05
Recapitulation • Computer simulations are a synthetic science (versus analytic science). • A theory is implemented as a model. • The model is simulated using a computer.
Advantages of computer modelling • CMs give us a view of processes that are difficult to study. • Old, complex or single-occurrence processes. • CMs allow us to study mathematically intractable problems. • Complex non-linear systems such as language. • CMs are explicit, detailed, consistent, and clear. • But that is also their weak point. More on that later… • CMs, through their relative simplicity, allow verification. • Experimental reproduction is rare in other disciplines.
More advantages of computer modelling • CMs produce falsifiable claims. • This is really conducting science in the Popperian tradition. • CMs produce quantitative predictions. • Allowing clear and unambiguous comparison with real data. • CMs allow exploring different parameter settings. • Evolutionary, environmental, individual and social factors can be easily varied. • CMs allow unethical experiments. • No permission is needed from your ethics committee to do language deprivation experiments on agents.
Caveats • Of course, to balance all these advantages, computer modelling also has some disadvantages. • Being aware of possible problems might enable us to dodge them.
Caveat 1: CMs are explicit, detailed, consistent, and clear • Computer models contain simplifications and abstractions which are immediately obvious because of their clear specification. • This makes models lightning rods for criticism.
Caveat 1: CMs are explicit, detailed, consistent, and clear • Solutions • Obfuscate your model so everyone is awed by its complexity and dares not criticise it. • Or better, justify every choice made during the construction of your model and stress the relevance for linguistics.
Caveat 2: Too far from reality • We want computer models to explain cognitive or linguistic phenomena. • Examples • A grammar is a symbol G with a learning probability. • An individual creates utterances consisting of strings drawn from an alphabet {a,b,c}, … • … • These abstractions make it hard for non-modellers to accept CM results.
Caveat 2: Too far from reality • The field should understand that abstraction is not necessarily bad. • Most scientific disciplines use abstraction. Think of physics or theoretical biology. • Verbal models and field research use abstraction and assumptions as well, but these are hardly ever doubted.
Caveat 3: CM is too much fun • Too often computer models are just run for the fun of it, and the goal of modelling is neglected. • It is all too tempting to try yet another variation of a simulation or add yet another neat feature. • Eventually you end up with too much data, making a proper analysis impossible.
Caveat 3: CM is too much fun • Solution • Define a hypothesis which you will test using CM, and work towards testing this hypothesis. • Demonstration is good, understanding is better. • Do exploratory data analysis: look beneath immediate results for explanations. • Look for variability: which parameters have an influence on the results? What you are looking for is a causal effect.
Caveat 4: CMs are not embedded in the field • Sometimes CMs and their results are “solitary”. • Models and results are not related to existing theories or existing empirical data.
Caveat 4: data should be related back to other disciplines • Solution • Start from a claim, and look for existing theories in the field. • Empirical data is wonderful if you can lay your hands on it. But be aware that making the link between empirical data and your results is often very difficult. • Explain how your results might shed new light on existing theories, but don’t be overconfident.
Caveat 5: magic numbers • When building models, one inescapably ends up introducing magic numbers. • Learning rate for a neural network, merging parameter for categories, number of possible grammars, … • Sometimes magic numbers are inherent to the phenomenon you are studying (as in physics).
Caveat 5: magic numbers • Solution • Try to avoid magic numbers (easier said than done). • Try to choose extreme values; this polarises your argument. • Learning rate is either 0 for a memory-less learner, or 1 for a batch learner (cf. Gold, 1967; Nowak, 2001; Zuidema, 2003). • Find optimal values for magic numbers. • Using some kind of optimisation (e.g. K. Smith, 2003). • Justify the magic numbers as well as possible. • Could the magic numbers be the important result of your research? • Try to make your results insensitive to them.
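One way to check whether results are insensitive to a magic number is a small parameter sweep over several random seeds. The sketch below is illustrative only: `run_model`, its `step_size` parameter, and the target value 0.7 are hypothetical stand-ins and are not taken from any of the cited models.

```python
import random
import statistics


def run_model(step_size, seed, steps=200):
    """Hypothetical toy model: a learner tracks a noisy binary signal.

    Returns the learner's final estimate of the signal's mean (true value 0.7).
    """
    rng = random.Random(seed)
    estimate = 0.5
    for _ in range(steps):
        observation = 1.0 if rng.random() < 0.7 else 0.0
        # Move the estimate towards the observation by a fraction step_size.
        estimate += step_size * (observation - estimate)
    return estimate


def sweep(step_sizes, seeds):
    """Mean and standard deviation of the outcome for each parameter value."""
    summary = {}
    for step_size in step_sizes:
        outcomes = [run_model(step_size, seed) for seed in seeds]
        summary[step_size] = (statistics.mean(outcomes), statistics.stdev(outcomes))
    return summary


if __name__ == "__main__":
    # Include the extreme values 0 and 1 alongside intermediate settings.
    for step_size, (mean, sd) in sweep([0.0, 0.01, 0.1, 0.5, 1.0], range(30)).items():
        print(f"step_size={step_size:<5} mean={mean:.3f} sd={sd:.3f}")
```

If the mean outcome barely moves across a wide range of values, the magic number probably matters little; if it swings strongly, the parameter itself may be part of the result and deserves justification.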
Caveat 6: reification • Your model is an abstraction of reality. • Even though it behaves like the real thing, are you allowed to make claims about the real thing based on an abstract model? • Are you sure that the dynamics of your model are similar to what goes on in the real world? Do submarines swim?
Caveat 6: reification • Solutions • Again, the field should understand that abstraction is not necessarily bad. • Make sure that you do not present simulation results as the truth and nothing but the truth. CMs do not provide proof! • CM is an exploratory tool, and should, if possible, be checked against hard data.
Some more practical advice • Good advice (that each of us neglected once upon a time) for doing computational modelling.
Control • A control is an experiment in which the hypothesized cause is left out. • So the hypothesized effect should not occur either. • Be aware that placebo effects might occur, rendering your control experiment worthless.
Control • Control experiments provide a baseline to check your results against. • How successful are agents at communicating if they randomly generate syntactic rules (instead of using grammatical induction)? • Are the results where agents use grammatical induction significantly better? • Without a baseline, your results are meaningless.
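Whether the experimental condition is "significantly better" than the baseline can be checked with a permutation test, which needs no distributional assumptions. This is a minimal sketch of one possible approach, not the method used in the tutorial; the function name and the success rates in the usage example are invented for illustration.

```python
import random
import statistics


def permutation_test(treatment, baseline, n_permutations=10_000, seed=0):
    """One-sided permutation test: is mean(treatment) greater than mean(baseline)?

    Returns an estimated p-value under the null hypothesis that the
    condition labels are interchangeable.
    """
    rng = random.Random(seed)
    observed = statistics.mean(treatment) - statistics.mean(baseline)
    pooled = list(treatment) + list(baseline)
    n_treatment = len(treatment)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign runs to the two conditions
        diff = (statistics.mean(pooled[:n_treatment])
                - statistics.mean(pooled[n_treatment:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_large + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented communicative success rates, one value per simulation run.
    induction_runs = [0.81, 0.78, 0.85, 0.80, 0.83, 0.79, 0.84, 0.82]
    random_rule_runs = [0.52, 0.49, 0.55, 0.50, 0.47, 0.53, 0.51, 0.48]
    print("estimated p-value:", permutation_test(induction_runs, random_rule_runs))
```

Each list should hold one summary value per independent run, so the control condition must be run as many times, and with as much care, as the main experiment.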
Hypothesis testing • Different ways to interpret results • Exploratory data analysis: looking for patterns in the data, often after filtering the data with statistical methods. • Hypothesis testing however remains superior.
Hypothesis testing • Example: toss a coin ten times, observe eight heads. Is the coin fair (i.e., what is its long-run behavior?) and what is your residual uncertainty? • You say, “If the coin were fair, then eight or more heads is pretty unlikely, so I think the coin isn’t fair.” • Proof by contradiction: assert the opposite (the coin is fair), show that the sample result (≥ 8 heads) has low probability p, and reject the assertion, with residual uncertainty related to p. • Estimate p with a sampling distribution. (From Cohen, Gent & Walsh)
Hypothesis testing • If the coin were fair (p = 0.5, the null hypothesis), what is the probability distribution of r, the number of heads obtained in N tosses of a fair coin? Get it analytically or estimate it by simulation (on a computer):
    Loop K times
        r := 0                        ;; r is the number of heads in N tosses
        Loop N times                  ;; simulate the tosses
            Generate a random 0 ≤ x ≤ 1.0
            If x < p increment r      ;; p is the probability of a head
        Push r onto sampling_distribution
    Print sampling_distribution
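For this small example the analytic route is short enough to write out. With N = 10 tosses of a fair coin, the number of heads r follows a binomial distribution, so the probability of eight or more heads is:

```latex
P(r \ge 8 \mid N = 10,\ p = 0.5)
  = \sum_{r=8}^{10} \binom{10}{r} \left(\tfrac{1}{2}\right)^{10}
  = \frac{45 + 10 + 1}{1024}
  = \frac{56}{1024}
  \approx 0.0547
```

A simulation with finite K will scatter around this value rather than hit it exactly.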
Hypothesis testing • 10,000 times 10 tosses produces this distribution • This is an estimated distribution using Monte Carlo sampling • Probability of 8 or more heads in N = 10 tosses is 0.057 • As this probability is very low, we can reject the null hypothesis (H0: the coin is fair). • p = 0.057 is the residual uncertainty.
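The Monte Carlo procedure described in these slides can be sketched in Python as follows. The function names and the fixed seed are choices made for this illustration; K, N and p follow the slide's notation.

```python
import random
from collections import Counter


def sampling_distribution(k=10_000, n=10, p=0.5, seed=0):
    """Simulate k experiments of n coin tosses; count how often each r occurs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(k):
        # r is the number of heads in n tosses; p is the probability of a head.
        r = sum(1 for _ in range(n) if rng.random() < p)
        counts[r] += 1
    return counts


def tail_probability(counts, threshold):
    """Estimated probability of observing at least `threshold` heads."""
    total = sum(counts.values())
    return sum(c for r, c in counts.items() if r >= threshold) / total


if __name__ == "__main__":
    dist = sampling_distribution()
    for r in range(11):
        print(f"{r:2d} heads: {dist[r]}")
    print("estimated P(r >= 8):", tail_probability(dist, 8))
```

Because the estimate depends on the random seed, repeated runs give slightly different tail probabilities, all scattered around the exact binomial value of 56/1024.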
Dos and don’ts… • Don’t throw away old code • When programming, keep a log of all program code and all parameter settings. • Use version control. • Don’t change two things at once in your simulation • You will never know which parameter caused what. • Do collect all your data • But be reasonable about this. Gigabyte-sized data files are often of little use.
Dos and don’ts… • Repeat your experiments • Using different settings, different random seeds, … • Make sure your experiments are reproducible (don’t end up with a “cold fusion” experience). • Don’t trust yourself on bugs • Time and time again tiny bugs are discovered in code that was thought to be flawless. • Do look at the raw data • Statistical measures often obfuscate results (e.g. outliers are averaged away).
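The advice on logging parameter settings and repeating runs with different random seeds can be combined in a few lines. In this sketch `run_experiment` and the parameter names `success_prob` and `n_games` are hypothetical placeholders for a real simulation; the point is that every outcome is stored together with the exact settings and seed that produced it.

```python
import json
import random


def run_experiment(params, seed):
    """Hypothetical stand-in for a real simulation; returns one outcome."""
    rng = random.Random(seed)  # private generator, so runs do not interfere
    successes = sum(rng.random() < params["success_prob"]
                    for _ in range(params["n_games"]))
    return successes / params["n_games"]


def run_batch(params, seeds):
    """Run once per seed and return records that fully describe each run."""
    return [
        {"params": params, "seed": seed, "outcome": run_experiment(params, seed)}
        for seed in seeds
    ]


if __name__ == "__main__":
    records = run_batch({"success_prob": 0.6, "n_games": 1000}, seeds=range(5))
    # Writing the records as JSON gives a log that can be re-run and re-checked.
    print(json.dumps(records, indent=2))
```

Re-running a record's parameters with its seed should reproduce the stored outcome exactly, which is a cheap check that an experiment is reproducible.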
Dos and don’ts • Make a fast implementation • When your program runs faster, you will do more experiments and explore more parameter settings.
Communication • Eventually you want to communicate your simulation results to others. How to do that? • Bridging the gap between modellers and non-modellers using communication.
Hallmarks of a good experimental paper • Clearly define your goals and claims • Perform a large scale test • Both in number and size of instances • Use a mixture of problems • Real-world, random, standard benchmarks, ... • Do a statistical analysis of results (source: Bernard Moret & David Johnson)
Hallmarks continued • Place your work in context • Compare your work to other work in the field. • Mention work by others • Ensure reproducibility • Forces you to be clear. • Adds support to your claims. • Publish code and data on the web. • Ensure comparability • Makes it easier for others to check your results. • Report all experimental settings. • Do not hide anomalous results.
Pitfalls • Result could be predicted by back-of-envelope calculation. • Bad experimental setup • Too few experiments. • Being happy with one “lucky run”. • Poor presentation of data • Lack of statistics. • No mention of a baseline. • Too much statistics, thus neglecting the raw data.
Pitfalls continued • Failing to report key implementation issues. • Extrapolating from tiny samples. • Drawing conclusions not supported by the data. • Ignoring the literature.
Resistance against modelling • Modellers often have to answer critical remarks from non-modellers. • A survey among 30 experienced researchers in the field has yielded the following themes.
“How can you validate this model?” • Often this rests on a mistaken assumption that simulation models must be realistic and hence “calibrated” against real data. • Or on neglect on the part of the modeller, who has not made the results falsifiable.
“You've built in the result” • Show that there are parameter settings for the model where the particular result in question does not emerge. • Be clear about what hypotheses the model is testing and maintain a clear distinction between data, model and theory.
“This model stands on its own and has no relation with any linguistic phenomenon” • This is only caused by neglecting the existing literature. • Always embed your model in the proper cognitive/linguistic context. • Often modellers do not start from empirical data. • An appeal for building models on existing research.
“It is possible to build models which come up with contrary results - how can you 'prove' which is correct?” • Every model hinges on its initial assumptions; these should be clearly defined and maintained throughout the model. • Your model is only as good as the initial assumptions it is based on.
“Your model uses evolutionary computing techniques, but language does not evolve - it is learned” • There often is confusion between the techniques used and the phenomena which are studied. • Optimizing some parameter with a genetic algorithm does not make the phenomenon itself evolutionary. • One should also realize that genetic algorithms are by no means a model of evolution, but rather an optimization technique.
“I liked your talk. I study Mayan grammatical constructions, can you incorporate this in your model?” • This is a misapprehension about simple idealised models - they are not intended to be exhaustive, but are instead directed at testing a specific hypothesis.
Where do modellers publish? • Journals sympathetic to computational modelling • Artificial Life • Adaptive Behavior • Journal of Artificial Societies and Social Simulation • Artificial Intelligence • Others • Complex Systems • Journal of Theoretical Biology • Connection Science • Studies in Language • Advances in Complex Systems • Proceedings of the Royal Society of London, Series B • Brain and Language • Cognitive Science • Trends in Cognitive Sciences • Verbum • Language Typology • Sprachtypologie und Universalienforschung • Language and Cognitive Processes • Cognitive Brain Research • Journal of Phonology • Acoustics Research Letters Online • Behavioral & Brain Sciences
Where do modellers gather? • Evolution of Language Conference • International Conference on Artificial Life • European Conference on Artificial Life • From animals to animats: Simulation of Adaptive Behavior conference • Emergence and Evolution of Linguistic Communication • …
What tools do modellers use? • Programming languages • C, C++, Lisp, Objective Caml, Prolog, Scheme, Perl, Java, … • Mathematical packages • Matlab, Maxima, … • Visualization tools • gnuplot, xfig, Grace (open source and free tools) • MS Excel (for graph plotting) • Miscellaneous • Tlearn (neural net package), PHYLIP (phylogenetic tree reconstruction) • NSL simulation environment (neural networks) • SPSS (statistics) • Praat (phonetics simulator) • gawk
Take home messages • Non-modellers have a hard time understanding your terminology and techniques. Explain and justify everything you do. • Non-modellers often fail to see the usefulness of modelling. Place your model in a context and place your results in that context. Demonstrate how your results provide insights that could not be obtained from pen-and-paper analysis. • Don’t do modelling for the sake of modelling. Take a concrete problem and tackle it.
Resources • Evolution of language resources: http://www.isrl.uiuc.edu/amag/langev • These slides, code and miscellaneous stuff: http://www.ling.ed.ac.uk/~paulv/tutorial.html