
Assessing the Frequency of Empirical Evaluation in Software Modeling Research

Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod), October 17, 2011. Jeffrey C. Carver, Eugene Syriani and Jeff Gray (presenter), University of Alabama.



Presentation Transcript


  1. Assessing the Frequency of Empirical Evaluation in Software Modeling Research
     Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod)
     October 17, 2011
     Jeffrey C. Carver, Eugene Syriani and Jeff Gray (presenter)
     University of Alabama, Department of Computer Science
     {carver, esyriani, gray}@cs.ua.edu

  2. Background
     • Many creative modeling ideas
     • Impression that the field has not followed the traditional scientific method
     • Most new techniques are not (thoroughly) evaluated
     • Investigate the prevalence of this phenomenon
       • Considered MODELS papers from 2006-2010
       • Also considered papers from an empirical conference (ESEM)

  3. Background: Empirical Studies
     (“models” is used more generally on this slide)
     • The understanding of a discipline evolves over time
     • We get more sophisticated in our methods
     • We are able to test and prove or disprove hypotheses
     • The empirical paradigm has been used in many other fields, e.g., physics, medicine, manufacturing

  4. Empirical Studies: Misconceptions
     • Empirical studies are not “one-shot deals,” and studies on live development projects are not the only ones that matter
     • Software engineering is a laboratory science
     • Understanding our discipline involves:
       • Observation, reflection, model building, experimentation
       • Followed by iteration
     • Symbiotic nature of research and development:
       • Research needs laboratories to observe and manipulate variables
       • Development needs to understand how to build systems better

  5. Empirical Studies: Misconceptions
     • Overall purpose: “We ran a study of technology X and now we know…”
       • “Technology X doesn’t work” (NO)
       • “Technology X performed worse than technology Y in our environment” (YES)
     • “Environment” includes people and their expertise, project goals, etc.
     • Measuring performance implies we decided on some metric that we felt was an important indicator
     • No solution is really expected to be better for all users under all conditions
     [Slide graphic: the goal is not a yes/no certification of a technology, but to yield insights and answers, find the appropriate environment, and assist in its evolution]

  6. Empirical Studies: Outputs
     • An empirical study can help provide information of interest to teams that might eventually adopt a technology:
       • Does it work better for certain types of people?
         • Novices: it’s a good solution for training
         • Experts: users need certain background knowledge…
       • Does it work better for certain types of systems?
         • Static/dynamic aspects, complexity
         • Familiar/unfamiliar domains
       • Does it work better in certain development environments?
         • Users [did/didn’t] have the right documentation, knowledge, amount of time, etc. to use it
     (Shull, 2004)

  7. Our Objective and Methodology
     • Goal: determine how many recent modeling papers had some type of empirical evaluation of their claims
     • Three-step methodology:
       1. Develop an initial characterization scheme
       2. Identify candidate papers
       3. Review the candidate papers and finalize the characterization

  8. Characterization Scheme
     • Formative Case Studies: papers that gather information about the use of a technique in practice
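
To make the characterization step concrete, the scheme can be read as assigning each candidate paper to exactly one evaluation category and then tallying the results. The following minimal Python sketch illustrates that idea; the category names echo this talk, but the function, sample paper IDs, and counts are hypothetical illustrations, not the authors' actual instrument or data.

    from collections import Counter
    from enum import Enum

    # Evaluation categories mentioned in the talk (illustrative encoding).
    class Evaluation(Enum):
        CONTROLLED_EXPERIMENT = "human-based controlled experiment"
        FORMATIVE_CASE_STUDY = "formative case study"
        DISCUSSION_ONLY = "discussion only"
        NONE = "no evaluation"

    def summarize(papers):
        # Tally papers by category and compute the share with some empirical evaluation.
        tally = Counter(category for _, category in papers)
        empirical = (tally[Evaluation.CONTROLLED_EXPERIMENT]
                     + tally[Evaluation.FORMATIVE_CASE_STUDY])
        return tally, empirical / len(papers)

    # Hypothetical reviewed papers: (paper id, assigned category).
    papers = [
        ("MODELS-2006-p12", Evaluation.NONE),
        ("MODELS-2007-p03", Evaluation.CONTROLLED_EXPERIMENT),
        ("MODELS-2009-p41", Evaluation.DISCUSSION_ONLY),
        ("MODELS-2010-p27", Evaluation.FORMATIVE_CASE_STUDY),
    ]
    tally, share = summarize(papers)
    print(f"{share:.0%} of papers had some empirical evaluation")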

  9. Results

  10. Results: Summary from 2006-2010
     • Only 17% of the papers included some type of empirical evaluation

  11. Results: Trends

  12. Results: Human-Based Controlled Experiments
     • Total of 12 in 5 years! There should be more
     • Observations:
       • Generally, a low level of detail was reported
       • Most had fewer than 25 participants; 2 had over 50, and 1 did not even report the number
       • Most participants were undergraduate students
       • A general misunderstanding in many papers: equating “discussion” with “evaluation”

  13. Results: Formative Case Studies
     • Total of 10; we need to see more
     • 4 did not involve humans
       • Analyzed existing source code to understand how various modeling tools would or would not work
     • 6 involved humans
       • Surveys to understand how existing tools were not meeting developer needs
       • Generally, a study of the output requirements for needed tools

  14. Results: ESEM Focus
     • The ESEM conference has three types of papers: Regular Papers, Short Papers, and Posters
     • Across the same 5-year period, we found only 17 modeling papers:
       • 4 were Regular Papers (10 pages, IEEE or ACM format) out of 178 Regular candidates
       • 10 were Short Papers (4 pages) out of a total of 118 Short Papers
       • 3 were Poster summaries
     • Even within the empirical area, modeling papers are not well represented (typically just short papers)
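
Worked out, those counts put modeling at 4/178 ≈ 2.2% of ESEM Regular Papers and 10/118 ≈ 8.5% of Short Papers over the period.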

  15. Conclusions
     • Summary:
       • The rigor of empirically validated research in software modeling is weak
       • A very large percentage of papers have no evaluation at all
       • We did not include technical reports or extended journal publications
       • We plan to repeat the analysis with SoSym
       • We would like to push the community to conduct more empirical evaluations
       • The paper has URLs pointing to the data from our observations
     • Recommendations:
       • Team up with empirical researchers
       • Venues should provide additional space for reporting empirical results (e.g., 2 extra pages of paper length for papers that include a clear evaluation)

  16. Questions or comments?
