
CS 178H Introduction to Computer Science Research


cbramlett

Presentation Transcript


  1. CS 178H: Introduction to Computer Science Research • What is CS Research?

  2. What is CS Research? • Discovery of new knowledge of computing through mathematical analysis and experimental evaluation of algorithms and computer software.

  3. Epistemology (definitions from Wikipedia) • Epistemology (from Greek επιστήμη - episteme, "knowledge" + λόγος, "logos") or theory of knowledge is the branch of philosophy concerned with the nature and scope (limitations) of knowledge. It addresses the questions: • "What is knowledge?" • "How is knowledge acquired?" • "What do people know?" • "How do we know what we know?"

  4. Rationalism • Rationalism is "any view appealing to reason as a source of knowledge or justification" (Lacey 286). In more technical terms it is a method or a theory "in which the criterion of the truth is not sensory but intellectual and deductive" (Bourke 263). • Originated with Socrates (469 BC–399 BC) and Plato (428/427 BC – 348/347 BC).

  5. Empiricism • Empiricism is a theory of knowledge which asserts that knowledge arises from experience. Empiricism emphasizes the role of experience and evidence, especially sensory perception, in the formation of ideas. • Originated with Aristotle (384 BC – 322 BC).

  6. Rationalism in CS (Theoretical CS) • Programs are formal mathematical objects. • Therefore, important properties of algorithms/software can be proven mathematically. • Termination • Correctness (satisfies a formal specification) • Computational Complexity (time and space requirements)

  7. Theoretical CS Research • Algorithm Design and Analysis • Design a new (more efficient) algorithm for some well-defined problem (e.g. sorting, longest-common-subsequence) • Mathematically prove the correctness and improved complexity of the new algorithm. • Theoretical Analysis • Form a mathematical conjecture about a computational problem (e.g. graph isomorphism is NP-complete) • Mathematically prove the conjecture as a theorem.
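
The "design, then prove" workflow can be made concrete with the longest-common-subsequence problem named on the slide. Below is a minimal sketch of the standard textbook dynamic program, not a new result from this course. Its correctness follows by induction on the prefix lengths. Its O(mn) time and space bound can be read directly off the two nested loops. The function name `lcs_length` is our own choice for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    table[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    Two nested loops over (len(a) + 1) x (len(b) + 1) cells give
    O(len(a) * len(b)) time and space.
    """
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                # Matching last characters extend the LCS of both shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise the LCS drops the last character of one string or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (one LCS is "BCBA")
```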

  8. Limits of Rationalism in CS • Sometimes software is too complex to analyze theoretically. • Sometimes correctness cannot be characterized formally and depends on natural or human behavior. • Protein folding • Handwriting/speech recognition • Sometimes software behavior on real data depends on unknown natural properties of this data. • Locality affecting paging performance
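
The last bullet says paging performance depends on a property of real data (locality) that theory alone cannot supply. The hedged sketch below illustrates this by simulating an LRU page cache, one common replacement policy chosen here for illustration. It runs the cache on two synthetic reference traces. The trace generators and all parameters (window size, jump rate, number of frames) are hypothetical stand-ins for real workloads.

```python
import random
from collections import OrderedDict


def lru_page_faults(references, num_frames):
    """Count page faults for an LRU cache that holds num_frames pages."""
    frames = OrderedDict()
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # hit: mark page as most recently used
        else:
            faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
    return faults


def local_trace(length, num_pages, window, rng):
    """Hypothetical trace with locality: references cluster in a small window."""
    trace, base = [], 0
    for _ in range(length):
        if rng.random() < 0.01:
            base = rng.randrange(num_pages)  # occasionally jump to a new region
        trace.append((base + rng.randrange(window)) % num_pages)
    return trace


def uniform_trace(length, num_pages, rng):
    """Hypothetical trace without locality: every page is equally likely."""
    return [rng.randrange(num_pages) for _ in range(length)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, pages, frames = 10_000, 1_000, 16
    print("local:  ", lru_page_faults(local_trace(n, pages, 8, rng), frames))
    print("uniform:", lru_page_faults(uniform_trace(n, pages, rng), frames))
```

The same algorithm and cache size produce very different fault counts depending only on the input's locality. That is why such behavior is usually measured rather than derived.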

  9. Empiricism in CS (Experimental CS) • Behavior of software can be studied experimentally. • Anecdotal evidence (running a few sample cases) is insufficient. • Collect data (e.g. accuracy, run-time) on running programs many times on large, real-world benchmark collections. • Verify hypotheses about behavior using controlled experiments. • Statistically analyze results for significance.
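
"Run many times, collect data" can be sketched as a small timing harness. It runs each program repeatedly on every input and reports the mean and standard deviation of the run times. This is a minimal sketch under stated assumptions. The benchmark here is a hypothetical set of random lists rather than a real-world collection, and `insertion_sort` is just a slow comparison point.

```python
import random
import statistics
import time


def insertion_sort(xs):
    """Simple O(n^2) sort, used here only as a comparison point."""
    xs = list(xs)
    for i in range(1, len(xs)):
        key, j = xs[i], i - 1
        while j >= 0 and xs[j] > key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    return xs


def measure(fn, inputs, repeats=5):
    """Run fn on every input `repeats` times; return per-run wall-clock times (s)."""
    times = []
    for _ in range(repeats):
        for data in inputs:
            start = time.perf_counter()
            fn(data)
            times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical benchmark: 20 random lists; a real study would use real data.
    inputs = [[rng.random() for _ in range(500)] for _ in range(20)]
    for name, fn in [("sorted", sorted), ("insertion_sort", insertion_sort)]:
        t = measure(fn, inputs)
        print(f"{name:15s} mean={statistics.mean(t):.6f}s "
              f"stdev={statistics.stdev(t):.6f}s n={len(t)}")
```

Reporting the spread alongside the mean is what makes the later significance test possible. A single run would be exactly the anecdotal evidence the slide warns against.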

  10. Scientific Method (steps from Wikipedia) • 1) Define the question • 2) Gather information and resources (observe) • 3) Form hypothesis • 4) Perform experiment and collect data • 5) Analyze data • 6) Interpret data and draw conclusions that serve as a starting point for new hypotheses • 7) Publish results • 8) Retest (frequently done by other scientists)

  11. 1) Define the question • Example from my research: Search Query Disambiguation from Short Sessions • Can a web search engine disambiguate queries? [Figure: a search box containing the ambiguous query "scrubs"]

  12. 2) Gather information and resources • Obtained web search session data from Microsoft • Find instances of ambiguous queries • Find contextual clues that might help disambiguate queries

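  13. Context Can Aid Disambiguation • [Figure: two sessions that both end with the ambiguous query "scrubs". Session 1: "98.7 fm" → www.star987.com, "kroq" → www.kroq.com, "scrubs" → ???. Session 2: "huntsville hospital" → www.huntsvillehospital.com, "ebay.com" → www.ebay.com, "scrubs" → ???. Candidate results: scrubs.com, scrubs-tv.com.]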

  14. 3) Form Hypothesis • Previous queries and clicks in a session can help disambiguate queries by relating them to previous sessions involving the same query (where we know what result was clicked).

  15. 4) Perform Experiment and Collect Data • Build system that uses prior context and previous session data to predict clicked results for new user. • Reorder results from existing search engine based on predicted probability of clicking on a result. • Should reduce number of results user needs to examine before finding a relevant one. • Test on unseen data and compare predictions to actual results clicked.
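
The system described above predicts click probabilities from prior context and past sessions, then reorders the engine's results. To show that data flow, here is a deliberately simple stand-in. It is NOT the Markov Logic Network used in this research. It is a hypothetical weighted vote: each past session that issued the same query votes for the result it clicked, with extra weight for every context item it shares with the current session. The session format (`query`, `context`, `clicked`), the weighting, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def predict_click_probs(query, context, prior_sessions, candidates, smoothing=1.0):
    """Estimate P(click = result | query, context) from prior sessions.

    Hypothetical sketch, not the MLN from the talk: every prior session with
    the same query votes 1 + (number of shared context items) for the result
    it clicked. Additive smoothing keeps never-clicked candidates above zero.
    """
    scores = defaultdict(float)
    for session in prior_sessions:
        if session["query"] != query:
            continue
        overlap = len(set(session["context"]) & set(context))
        scores[session["clicked"]] += 1.0 + overlap
    total = sum(scores[c] + smoothing for c in candidates)
    return {c: (scores[c] + smoothing) / total for c in candidates}


def rerank(results, probs):
    """Reorder engine results by predicted click probability (stable for ties)."""
    return sorted(results, key=lambda r: -probs.get(r, 0.0))


if __name__ == "__main__":
    # Made-up sessions for illustration only (not data from the study).
    prior = [
        {"query": "scrubs", "context": ["huntsville hospital", "ebay.com"], "clicked": "scrubs.com"},
        {"query": "scrubs", "context": ["kroq"], "clicked": "scrubs-tv.com"},
        {"query": "scrubs", "context": ["98.7 fm"], "clicked": "scrubs-tv.com"},
    ]
    engine_order = ["scrubs-tv.com", "scrubs.com"]
    probs = predict_click_probs("scrubs", ["huntsville hospital", "ebay.com"],
                                prior, engine_order)
    print(rerank(engine_order, probs))  # ['scrubs.com', 'scrubs-tv.com']
```

Because the current session shares context with the hospital-related past session, scrubs.com moves above the engine's original first result.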

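  16. Using Relational Information with a Markov Logic Network (MLN) • [Figure: the current session ("huntsville hospital" → huntsvillehospital.org, "ebay" → ebay.com, "scrubs" → ???) is linked to previous sessions that also contain "scrubs"; in those sessions "scrubs" led to clicks on scrubs.com or scrubs-tv.com, alongside other queries and clicks such as "huntsville school", hospitallink.com, and ebay.com.]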

  17. Controlled Experiment • Performance of an experimental system must be compared to some baseline or control. • Controls are necessary to demonstrate that the system improves over some naïve method (strawman) or the current best system for a problem. • For example, in the old joke, someone claims that they are snapping their fingers "to keep the tigers away" and justifies this behavior by saying "see, it's working!" While this "experiment" does not falsify the hypothesis "snapping fingers keeps the tigers away", it does not really support the hypothesis either: not snapping your fingers keeps the tigers away just as well (Wikipedia: Experiment).

  18. Control for Query Disambiguation • Simple control is to order results from search engine randomly. • Another baseline is to just use ordering from existing (non-personalized) search engine.

  19. Performance Metrics • Need a quantitative measure of the system’s performance (runtime or accuracy). • Compare the quantitative performance of the experimental system to the baseline control system. • To measure the accuracy of an ordering of web search results, we use AUC-ROC: • Percentage of irrelevant results not seen by the user before finding a relevant result (when scanning results from the top)
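
The metric on this slide can be computed directly from a ranked list of 0/1 relevance labels. AUC-ROC of a ranking equals the fraction of (relevant, irrelevant) pairs in which the relevant result is ranked higher. With a single relevant result, this reduces to the slide's reading: the fraction of irrelevant results the user never has to look at. The minimal sketch below assumes a strict ranking with no ties. The function name `ranking_auc` is hypothetical.

```python
def ranking_auc(ranked_relevance):
    """AUC-ROC of a ranked list, given 0/1 relevance labels in rank order.

    Fraction of (relevant, irrelevant) pairs where the relevant result is
    ranked above the irrelevant one. Assumes a strict ranking (no ties).
    """
    pos = sum(ranked_relevance)
    neg = len(ranked_relevance) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one relevant and one irrelevant result")
    correctly_ordered = 0
    irrelevant_seen = 0
    for rel in ranked_relevance:
        if rel:
            # Every irrelevant result not yet seen is ranked below this relevant one.
            correctly_ordered += neg - irrelevant_seen
        else:
            irrelevant_seen += 1
    return correctly_ordered / (pos * neg)


if __name__ == "__main__":
    # One relevant result in position 3 of 10: the user skips 7 of 9 irrelevant ones.
    print(ranking_auc([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]))  # 7/9, about 0.778
```

A perfect ordering scores 1.0 and the worst ordering scores 0.0. A random ordering scores 0.5 in expectation, which is why random order is a natural first control.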

  20. 5) Analyze Data • Do results support the hypothesis? • Are differences statistically significant? • Use a statistical test to determine whether observed differences are unlikely to be due only to random variation, i.e. p-value < .05 (the probability of observing a difference at least this large if the null hypothesis were true).
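
The slides do not say which statistical test was used, so the following is one reasonable option rather than the study's method. It is a paired permutation (sign-flip) test on per-query scores from two systems. Under the null hypothesis that both systems perform equally, the sign of each per-query difference is arbitrary. Flipping signs at random therefore estimates how often a mean difference at least as large as the observed one would arise by chance. The function name and defaults are assumptions.

```python
import random


def paired_permutation_test(scores_a, scores_b, num_permutations=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test; returns an estimated p-value.

    scores_a[i] and scores_b[i] are the two systems' scores on the same query.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(num_permutations):
        # Randomly flip the sign of each per-query difference.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(permuted) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (num_permutations + 1)


if __name__ == "__main__":
    # Hypothetical per-query AUC values for two systems (not data from the study).
    system = [0.9, 0.8, 0.85, 0.7, 0.95, 0.75, 0.88, 0.82]
    baseline = [0.6, 0.7, 0.65, 0.72, 0.8, 0.5, 0.7, 0.77]
    p = paired_permutation_test(system, baseline)
    print(f"p = {p:.4f}", "(significant at .05)" if p < 0.05 else "(not significant)")
```

Pairing by query matters. Some queries are harder for every system, and comparing per-query differences removes that shared variation from the test.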

  21. Results (AUC-ROC) • [Chart: AUC-ROC of each system; * indicates a statistically significant improvement over the previous result.]

  22. 6) Interpret data and draw conclusions that serve as a starting point for new hypotheses • Is random ordering the best baseline to compare to? • What if we just order results by popularity (i.e. how many people clicked on a particular result after submitting a given ambiguous query)?
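
The popularity baseline proposed here is simple to state precisely. Count past clicks on each result for the same query, then sort by that count. The minimal sketch below assumes a hypothetical click log of (query, clicked URL) pairs. Results that were never clicked keep their original engine order at the end, because Python's `sorted` is stable.

```python
from collections import Counter


def popularity_ranking(query, click_log, results):
    """Order results by how often past users clicked them after this query.

    click_log is a hypothetical list of (query, clicked_url) pairs. Ties,
    including never-clicked results, keep their original order (stable sort).
    """
    counts = Counter(url for q, url in click_log if q == query)
    return sorted(results, key=lambda url: -counts[url])


if __name__ == "__main__":
    # Made-up log for illustration only (not data from the study).
    log = [("scrubs", "scrubs-tv.com"), ("scrubs", "scrubs-tv.com"),
           ("scrubs", "scrubs.com"), ("ebay", "ebay.com")]
    engine_order = ["scrubs.com", "example.org/scrubs", "scrubs-tv.com"]
    print(popularity_ranking("scrubs", log, engine_order))
    # ['scrubs-tv.com', 'scrubs.com', 'example.org/scrubs']
```

This baseline ignores the session context entirely. Beating it is therefore a stronger test of the context hypothesis than beating a random ordering.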

  23. New Baseline Results

  24. Refine System • Develop MLN that incorporates popularity information. • Rerun experiment to obtain results for revised version and verify the hypothesis that it performs better than the popularity baseline.

  25. Results for Revised System

  26. 7) Publish Results • Paper submitted to KDD-09, the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: Paris, June 28 – July 1, 2009
