Experimentation in Computer Science

Experimentation in Computer Science and Software Engineering
Kavi Khedo
Senior Lecturer
Department of Computer Science and Engineering
Faculty of Engineering
University of Mauritius
k.khedo@uom.ac.mu
http://khedo.wordpress.com

References • Tichy, W.F., “Should Computer Scientists Experiment More?”, IEEE Computer, May 1998. • Zelkowitz, M.V., and Wallace, D.R., “Experimental Models for Validating Technology”, IEEE Computer, May 1998.

Outline • Nature of computing • Why experiment? • Methods of experimentation • Issues and possible approaches • Looking ahead • Conclusion

Nature of Computing • Science or engineering? • Computers and programs are human creations. • CS is not a natural science in the traditional sense. • Computers and software: the subject of enquiry is not just technical issues, • but models of information and information processes.

Computer Science • “A science is any discipline in which the fool of this generation can go beyond the point reached by the genius of the last generation.” Max Gluckman • Computer science is a young and constantly evolving discipline. It is therefore viewed in different ways by different people, leading to different perceptions of whether it is a “science” at all.

Modeling information processes • Are information processes artificial? • Where and how do they occur? • Computer models compare poorly with information processes found in nature. • e.g., nervous systems, immune systems, genetic processes, brains of programmers and users, etc.

Why experiment? • Experiments don’t prove a thing! • The view of mathematicians: • No amount of experimentation provides proof with absolute certainty. • Experiments show the presence of errors but not their absence. • But a theory can be shot down by contrary evidence. • Experiments test theoretical predictions against reality. • A theory gets accepted if all known facts in its domain can be deduced from it and are verified by experiments. • e.g., astrophysics

Why experiment? • Example of a failed theory: • The failure probability of a multi-version program is the product of the failure probabilities of the individual versions. • Experiments by Knight and Leveson showed a significantly higher failure rate than predicted. • The experiment exposed the false assumption behind the theory: that faults in the different program versions are statistically independent.
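Why the independence assumption matters can be shown with a little arithmetic. The sketch below is a hypothetical model, not Knight and Leveson's data or method: inputs fall into classes (here "easy" and "hard"), and each version fails with a class-dependent probability. Versions are assumed independent only *within* a class, yet because every version finds the same inputs hard, their failures are correlated overall. The class weights and probabilities are invented for illustration.

```python
from math import prod

# Each input class is (probability that an input falls in this class,
#                      probability that any one version fails on such an input).
# All numbers used below are hypothetical, chosen only to illustrate the effect.
InputClasses = list[tuple[float, float]]


def marginal_failure(classes: InputClasses) -> float:
    """Failure probability of a single version, averaged over all inputs."""
    return sum(weight * q for weight, q in classes)


def predicted_if_independent(p_versions: list[float]) -> float:
    """Coincident-failure probability under the independence assumption."""
    return prod(p_versions)


def coincident_failure(classes: InputClasses, n_versions: int) -> float:
    """Probability that all n versions fail on the same input.

    Versions are independent only *given* the input class, so within
    a class the joint failure probability is q ** n_versions.
    """
    return sum(weight * q ** n_versions for weight, q in classes)


if __name__ == "__main__":
    classes = [(0.99, 0.0001), (0.01, 0.5)]  # mostly easy inputs, a few hard ones
    n = 3
    p = marginal_failure(classes)
    predicted = predicted_if_independent([p] * n)
    actual = coincident_failure(classes, n)
    print(f"single-version failure probability: {p:.6f}")
    print(f"predicted (independence):           {predicted:.3e}")
    print(f"with shared hard inputs:            {actual:.3e}")
    print(f"ratio actual / predicted:           {actual / predicted:.0f}x")
```

Whenever the per-input failure probability varies, the average of q**n exceeds (average q)**n (a consequence of Jensen's inequality), so the product formula underestimates coincident failures. The two agree only if every version fails at the same rate on every input.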

Why experiment? • Another example: • Artificial neural networks were originally discarded on theoretical grounds. • Experiments showed that they performed better than predicted. • Researchers have since developed better theories to explain what is observed.

Benefits of experimentation • Help build a reliable base of knowledge. • Reduce uncertainty about the adequacy of theories, methods and tools. • Lead to new, useful and unexpected insights. • Open new areas of investigation. • Accelerate progress by eliminating fruitless approaches, erroneous assumptions and fads.

How to experiment • General categories of experiments: • Scientific method. • Engineering method. • Empirical method.

Scientific method • Develop a theory to explain a phenomenon. • Propose a hypothesis and test alternative variations of it. • Collect data to verify or refute claims of the hypothesis.

Engineering method • Develop and test a solution to a hypothesis. • Based on the results of the test, improve the solution. • Iterate until no further improvement is needed.

Empirical method • A statistical method is proposed as a means to validate a hypothesis. • Unlike in the scientific method, there may be no formal model or theory describing the hypothesis. • Data is collected to verify the hypothesis.
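As one possible illustration of the empirical method, the sketch below uses an exact permutation test; this particular statistical technique is chosen here as an example and is not prescribed by the slides. It asks how often a random relabelling of the pooled data produces a difference in means at least as large as the observed one. The data (task-completion times in minutes for two hypothetical groups of developers) is invented for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a: list[float], b: list[float]) -> float:
    """Two-sided exact permutation test on the difference of group means.

    Enumerates every way of splitting the pooled observations into groups
    of the original sizes, so it is only practical for small samples.
    """
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        in_a = set(chosen)
        group_a = [pooled[i] for i in in_a]
        group_b = [x for i, x in enumerate(pooled) if i not in in_a]
        # Small tolerance so splits tying the observed statistic count as extreme.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical completion times (minutes) with and without some tool.
    with_tool = [31.0, 28.5, 35.0, 30.0, 27.0]
    without_tool = [38.0, 41.5, 36.0, 39.0, 44.0]
    p = permutation_p_value(with_tool, without_tool)
    print(f"p = {p:.4f}")
```

A small p-value says the observed difference would be unlikely if the group labels did not matter. In keeping with the rest of these slides, that is evidence for the hypothesis, not proof of it.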

A comparison of the scientific method (on the left) with the role of experimentation in system design (right).

Other important aspects • Replication • Other researchers must be able to reproduce the experiments. • Influence • Impact of experimental design on the result. • Temporal properties • Historical or current data? • Is any required information missing?

Lack of validation in CS and SE • In a sample of 400 papers published by the ACM in 1993, 40% of the papers requiring empirical evaluation had none. • The figure was 50% in software-related journals. • A study by Zelkowitz and Wallace (Computer, May 1998) found 40-50% of SE papers to be unvalidated. • The percentage is much smaller in disciplines such as physics, psychology and anthropology.

Argument: Experiments do not prove anything. • Response: • True, experiments only provide evidence for or against a theory; they cannot prove or disprove it. • However, experiments are used for theory testing, and for exploration leading to theory development. • A theory is accepted gradually by the community as evidence accumulates. • (Note the importance of repeatability.)

Argument: The traditional scientific method is not applicable. • Response: • It applies just as well; only the object or subject of study changes. • We are dealing partly with human processes and activities, and these have clearly been amenable to experimentation in other disciplines. • Likewise, encodings of processes (e.g., programs) can be investigated experimentally.

Argument: The current level of experimentation is sufficient. • Response: • Not when compared with other sciences. • Tichy: 50% of papers with unsupported claims, vs. 15% in other sciences. • Zelkowitz/Wallace: 40-50% of papers unvalidated. • Note: Tichy does not advocate replacing theory and engineering with experiment, but a balance among them.

Argument: Experiments are expensive. • Response: • So what!? • It depends on the importance of the research questions; some are clearly important enough. • There is a spectrum of experimental approaches, differing in cost, to choose from. • Benchmarks can amortize costs across many experiments. • Other scientific disciplines accept this.

Cost of experiments • Experiments require more resources than theory. • So what? • Example: • A significant segment of the software industry switched from C to C++ at substantial cost. • There is no solid evidence that C++ is superior to C for programmer productivity and software quality.

Benchmarks • A sample of the task domain • Effective and affordable way to experiment • Well-defined performance measurements • Used in several areas: • Speech understanding, information retrieval, pattern recognition, data warehousing and OLAP, etc. • Help to eliminate unpromising approaches and exaggerated claims
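A benchmark pairs a fixed sample of the task domain with a well-defined measurement. The minimal harness below is a sketch under stated assumptions: the task (sorting integer lists), the candidate implementations and the scoring rule (median wall-clock time over repeated runs, after a correctness check) are all hypothetical choices made for illustration. Fixing the random seed keeps the task sample identical across runs, which also supports replication.

```python
import random
import statistics
import time
from typing import Callable

Solver = Callable[[list[int]], list[int]]


def make_tasks(n_tasks: int, size: int, seed: int = 42) -> list[list[int]]:
    """Fixed, reproducible sample of the task domain (hypothetical workload)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 10_000) for _ in range(size)] for _ in range(n_tasks)]


def insertion_sort(xs: list[int]) -> list[int]:
    """Simple in-place insertion sort, used here as one benchmark candidate."""
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs


def run_benchmark(candidates: dict[str, Solver], tasks: list[list[int]],
                  repeats: int = 5) -> dict[str, float]:
    """Return the median total time (seconds) per candidate over `repeats` runs."""
    results: dict[str, float] = {}
    for name, solve in candidates.items():
        # Correctness first: a fast but wrong answer must not earn a score.
        for task in tasks:
            if solve(list(task)) != sorted(task):
                raise ValueError(f"{name} produced a wrong answer")
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            for task in tasks:
                # Copy so in-place solvers see unsorted input; the copy cost
                # is included for every candidate alike.
                solve(list(task))
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results


if __name__ == "__main__":
    tasks = make_tasks(n_tasks=20, size=300)
    scores = run_benchmark({"built-in sorted": sorted, "insertion sort": insertion_sort}, tasks)
    for name, seconds in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name:>16}: {seconds * 1000:.2f} ms")
```

The correctness gate reflects the point that benchmarks help eliminate exaggerated claims: a candidate is only timed once it solves every task in the sample. Whether 20 random lists are representative of real workloads is the representativeness question that any benchmark faces.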

Argument: Demos are sufficient • Response: • Demos provide proofs of concept in the engineering sense. • They illustrate a potential, but depend on observers’ imagination and extrapolation. • They do not produce solid evidence. • Not a substitute for the scientific process. • Satisfactory when presenting a radically new idea or a significant breakthrough. • e.g., the first compiler, time-sharing system, OO language, web browser, etc. • Demos do not investigate cause and effect, and do not provide statistically quantifiable results.

Examples of questions for experimentation • Introduce theories of how requirements are refined into programs, and test them. • A deeper understanding of what intelligence is. • Quality of human-computer interaction. • Relative merits of parallel machine models and algorithms. • Behavior of algorithms on typical problems.

Argument: Too much noise (too many variables to control) • Too many variables make experimentation hard. • Response: • There are no more variables than in other fields; this objection is just laziness. • Experiments with human subjects are particularly difficult, but other fields have developed many techniques for addressing these difficulties. • Benchmarking can simplify many questions in CS. • Benchmark development can help. • The composition of a benchmark is subjective, and so it is the weakest link. • Is the benchmark representative enough? • Benchmarks should evolve over time to stay close to what needs to be tested.

Argument: Progress will slow • (e.g., requiring experimentation in every paper will prevent ideas from emerging.) • Response: • We are wasting time on unproductive research and development, so productivity might actually improve with more experimentation. • There is no reason to prohibit conceptual papers and papers formulating new theories or hypotheses. (It is a question of balance.)

Argument: Technology changes too fast • Technology changes so fast that experiments are irrelevant by the time they have been completed. • Response: • Then the focus of the experiment is too narrow. • Consider instead the bigger picture (e.g., fundamental underlying questions, not ephemeral concerns).

Argument: You’ll never get it published. • Response: • Can be true, especially when you run into reviewers who don’t understand empirical science! • But this has been changing. Still, a painful process of education in empirical research methods continues to be needed.

Potential Substitutes for Experimentation • Feature comparison • Okay sometimes, but it isn’t science. • Intuition • There are plenty of examples of times when intuition has been wrong • Expert judgment • Get real. Science is built on skepticism.

Concepts vs. Experiments • Rapid publication of novel concepts and new hypotheses is important. • But questionable ideas need to be weeded out by meaningful validation. • Then scientists can concentrate on promising approaches. • Need for balance.

Problems with experiments • Unrealistic assumptions, manipulated data • Failure to provide details for repeating experiments • Results that are over-interpreted or do not generalise • The scientific process can self-correct errors, hoaxes and even fraud.

CS as a harder science • Most papers take small steps forward. • Scientists should create models, formulate hypotheses and test them using experiments. • Competing theories: a new theory replacing an old one leads to a paradigm shift. • Common in physics, but not so evident in CS. • Example: physical symbol system theory vs. knowledge processing theory in AI. • A theory is needed for the behavior of algorithms on typical problems.

Conclusion • CS research has relied far less on experiments than most other disciplines. • A good case exists for more experimentation. • Conventional scientific methods can make CS a ‘hard’ science. • A balance between theory, engineering and experimentation is needed.
