
Strategies for solving scientific problems using computers


phiala

Presentation Transcript


  1. Strategies for solving scientific problems using computers

  2. Outline • Motivation • A standard framework • Which tool to use? • Critical considerations • Aftermath

  3. Motivation • Most (all?) problems in modern geoscience benefit strongly from computer methods • A good hypothesis warrants a clear analytical approach • Make large problems more tractable • Avoid a posteriori rationalizations as much as possible • Encourage predictions rather than diagnoses • Our scientific thought process must be defensible, and so should our methodology

  4. Overarching framework for a scientific problem • Review existing research • Formulate a hypothesis • Collect new data • Process this data • Interpretation • Evaluate and present your hypothesis • Where will most of your time/energy be spent?

  5. A sub-framework for computer-based problems • Load data within chosen work environment • Format this data • Process this data • Visualize results • Scientific interpretation

  6. What relevant tools exist? • Many options are free or free-ish • Overlapping functionality • Many are user-extendable

  7. Finding the best tool for the job • Are you already familiar with it? • Can it already do what you need it to do, or is it conceivable that it could do so after some effort? • Is it easy (enough) to learn? • Is it intuitive? • Is it fast enough? • Does it support the command line, a GUI or both? • Can you understand what it’s doing, or is it a black box? • Does it have sufficient mathematical functionality? • Does it have sufficient mapping functionality? • Can it easily generate reproducible output? • Is it popular within your field? • Can its output be shared easily? • Is it affordable and accessible?

  8. Which tool to use for geoscience? • MATLAB, Python, GMT and ArcGIS are the best current options

  9. Intuitive/explicit processing and data visualization

  10. Every platform is vulnerable but some more so

  11. A sub-framework for computer-based problems • Load data within chosen work environment • Format this data • Process this data • Visualize results • Scientific interpretation

  12. A directory structure for computer-based problems • research/code/current_project/ contains: • data (raw) • mat (formatted) • your code • fig (useful, not pretty) • old (no need to delete)
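A minimal sketch of how this layout could be created once and reused for every project. The function name `make_project_tree` is hypothetical, and the sketch assumes that "your code" lives directly in the project folder rather than in its own sub-folder.

```python
from pathlib import Path
import tempfile

# Sub-folder names follow the slide; the descriptions restate its annotations.
PROJECT_SUBDIRS = {
    "data": "raw data",
    "mat": "formatted data",
    "fig": "working figures (useful, not pretty)",
    "old": "retired code (no need to delete)",
}


def make_project_tree(root, project_name):
    """Create research/code/<project_name>/ with the sub-folders listed above."""
    project_dir = Path(root) / "research" / "code" / project_name
    for subdir in PROJECT_SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project_dir / subdir).mkdir(parents=True, exist_ok=True)
    return project_dir


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as scratch:
    project = make_project_tree(scratch, "current_project")
    print(sorted(p.name for p in project.iterdir()))
```

Keeping the layout in one function means every project starts from the same structure, which makes old work easier to find later.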

  13. Incidentally, a similar manuscript structure • research/manuscript/current_paper/ contains: • draft (versioned) • fig (pretty) • master document • revised (basically inevitable) • final (proofs, published)

  14. Loading data • Load all necessary data first • This step can be (but is rarely actually) a deal-breaker • If someone or something generated it, you can almost certainly read it • A question that will keep coming up: How often will you need to do this? • The answer is almost always: Much more often than you think • A valuable habit: Spend the time to record how the data were loaded and where they came from (i.e., in a script, not just ad hoc at the command line) • Save the MATLAB/etc.-formatted data before processing
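One way to follow the habit of recording the loading step and saving the formatted data before processing, sketched in Python with only the standard library. The function names, the file layout (comma-separated values with a header row, all columns numeric) and the choice of `pickle` as the saved format are assumptions made for illustration.

```python
import csv
import pickle
from pathlib import Path


def load_raw_table(raw_path):
    """Read a comma-separated file with a header row into a dict of float lists."""
    columns = {}
    with open(raw_path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, value in row.items():
                # Empty cells become NaN instead of being dropped silently.
                number = float(value) if value else float("nan")
                columns.setdefault(name, []).append(number)
    return columns


def save_formatted(columns, source, out_path):
    """Store the formatted data together with a note on where it came from."""
    record = {"source": str(source), "columns": columns}
    with open(out_path, "wb") as handle:
        pickle.dump(record, handle)


def load_formatted(path):
    """Reload a record written by save_formatted."""
    with open(Path(path), "rb") as handle:
        return pickle.load(handle)
```

Because loading is written down as functions, repeating it for the next file is a one-line call, and the saved record carries its own source path.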

  15. How often will you need to do this? • Only once, I swear: use the command line and save, or import data using the GUI and save • Every time I want to do this analysis: write it down and comment • Often and with lots of data: time to consider how to make it faster • So often that other people will have to do it for me: consider writing a GUI, which enforces standardization

  16. Format data • Data structures to use in descending order of preference: • scalar • vector • matrix • structure/object • cell

  17. Numeric vs. logical vs. string • Several different data types to consider • numeric (MATLAB defaults to double precision signed) • string • logical (true/false)

  18. Most (all?) data are imperfect • NaN: Not a Number
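A short illustration of why NaN needs explicit handling, written in Python. The values are made up, and `nan_mean` is a hypothetical helper, not a library function.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if every entry is NaN."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


temperatures_c = [4.5, math.nan, 5.1, 4.8]  # made-up values, one missing

# A single NaN contaminates an ordinary mean.
naive_mean = sum(temperatures_c) / len(temperatures_c)
clean_mean = nan_mean(temperatures_c)

print(math.isnan(naive_mean))   # True
print(math.isnan(clean_mean))   # False
print(math.nan == math.nan)     # False: test with math.isnan, not ==
```

Marking missing values as NaN keeps them visible, but every summary statistic then has to decide explicitly whether to skip them.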

  19. Poor variable names • data • index • constant • var • test • temp • i, j • any name identical to or confusingly similar to an existing function name • names that differ only by case (do not abuse case sensitivity) • names that are not descriptive: you will forget what “A” means
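The risk of reusing an existing function name can be shown in a few lines of Python. The function and variable names below are hypothetical and the depths are made up.

```python
def shadowing_demo(depths_m):
    """Show what happens when a variable reuses the built-in name `max`."""
    max = depths_m[0]          # poor: hides the built-in max() in this scope
    try:
        return max(depths_m)   # fails, because `max` is now a number
    except TypeError as error:
        return f"TypeError: {error}"


def deepest_m(depths_m):
    """A descriptive name leaves the built-in function available."""
    deepest_depth_m = max(depths_m)
    return deepest_depth_m


station_depths_m = [10.0, 20.0, 35.0]   # made-up values
print(shadowing_demo(station_depths_m))
print(deepest_m(station_depths_m))
```

Including the unit in the name (`_m`) is one inexpensive way to make a variable self-describing.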

  20. Processing data • Document what, not how, even if you think it’s just for you, because you are your own worst enemy • Re-use shamelessly, but avoid copy/paste • Is this a function or a script? Will you re-use it often? • The other kind of MATLAB cell
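A small sketch of re-use without copy/paste: a step needed more than once becomes a function whose docstring says what it does rather than how. The function name and the numbers are illustrative assumptions.

```python
def remove_mean(values):
    """Return the values as anomalies about their mean."""
    mean_value = sum(values) / len(values)
    return [v - mean_value for v in values]


# One function replaces two pasted copies of the same lines.
gravity_anomaly = remove_mean([981.2, 981.5, 980.9])    # made-up values
elevation_anomaly = remove_mean([120.0, 135.0, 150.0])  # made-up values

print(elevation_anomaly)
```

If the calculation later needs a fix, it is made in one place instead of in every pasted copy.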

  21. Visualizing data • Physically separate visualization code • Visualize as you’re writing, but not as you’re running • Again, use cells

  22. MATLAB is trying to help you (similar to Word) • Code could be better • Code is wrong

  23. Whitespace and indentation • Choose a style and stick with it [Figure: two code examples labelled “No” and “Better”]

  24. Order of operations • Forget it exists and use parentheses instead [Figure: two code examples labelled “Never” and “Better”]
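Two cases where relying on precedence is easy to misread, shown in Python with made-up numbers. The parenthesized forms state the intent outright.

```python
x = 2.0

# Exponentiation binds more tightly than unary minus.
ambiguous = -x ** 2          # evaluates as -(x ** 2)
negated_square = -(x ** 2)   # explicit
squared_negative = (-x) ** 2 # explicit, and a different answer

# Division and multiplication are evaluated left to right.
left_to_right = 8.0 / 2.0 * 4.0       # (8 / 2) * 4
divide_by_product = 8.0 / (2.0 * 4.0)

print(ambiguous, negated_square, squared_negative)  # -4.0 -4.0 4.0
print(left_to_right, divide_by_product)             # 16.0 1.0
```

The extra parentheses cost nothing at run time and spare the next reader from having to recall the precedence table.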

  25. Functions warrant error checking
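A sketch of what error checking at the top of a function might look like, in Python. The function `mean_of_valid` and its specific checks are assumptions chosen for illustration; the appropriate checks depend on what the function expects.

```python
import math
from numbers import Real


def mean_of_valid(values):
    """Return the mean of the non-NaN entries of a non-empty numeric sequence."""
    # Check the container first, so failures point at the real problem.
    if isinstance(values, str) or not hasattr(values, "__len__"):
        raise TypeError("values must be a list or tuple of numbers")
    if len(values) == 0:
        raise ValueError("values is empty")
    # Then check the contents.
    for v in values:
        if isinstance(v, bool) or not isinstance(v, Real):
            raise TypeError(f"non-numeric entry: {v!r}")
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        raise ValueError("values contains only NaN")
    return sum(valid) / len(valid)


print(mean_of_valid([1.0, 3.0]))  # 2.0
```

Failing early with a specific message is usually easier to debug than a cryptic error several calls deeper, or a silently wrong number.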

  26. Aftermath • In the long term, do not keep failed/commented code in a working function/script • Getting complicated and/or popular? Consider a versioning scheme or repository, e.g., GitHub or RunMyCode
