
Biostatistics 760

Presentation Transcript


  1. Biostatistics 760 Random Thoughts

  2. Upcoming Classes • Bios 761: Advanced Probability and Statistical Inference • Bios 767: Longitudinal Data Analysis • Bios 780: Theory and Methods for Survival Analysis • Bios 841: Statistical Consulting

  3. Bios 761 • Frequentist and Bayesian decision theory • Hypothesis testing: UMP tests, etc. • Bootstrap and other methods of inference • High dimensional data methods

  4. Bios 780 • Time-to-event data • Right censoring • Counting processes; martingales • Semiparametric approaches • Kaplan-Meier estimator • Log-rank statistic • Cox model • Data analysis
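The Kaplan-Meier estimator listed on this slide can be illustrated with a minimal sketch. This is not taken from the course materials; the function name `kaplan_meier` and the toy data are assumptions made for illustration. It computes the product-limit estimate S(t) = prod over event times t_j <= t of (1 - d_j / n_j), where d_j is the number of events at t_j and n_j is the number still at risk just before t_j, with right-censored subjects leaving the risk set without counting as events.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed follow-up times (event or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Number of observed events at each time, and total exits (events + censorings).
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)

    at_risk = len(times)  # everyone is at risk before the first observed time
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        # Subjects with an event or censoring at t leave the risk set afterwards.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data: five subjects, two of them right-censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t = {t}: S(t) = {s:.3f}")
```

Subjects censored at the same time as an event are kept in the risk set for that event, which is the usual convention.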

  5. Bios 841 • Consulting versus collaboration • Bringing it all together to solve problems • Communicating about statistics • Three real problems • Three journal style reports • One final oral presentation • Real time problem solving • What is the role of statistical theory?

  6. A Few War Stories • As a student: thesis on surrogates • As a postdoc: infectious diseases • As a new professor: cystic fibrosis (CF)* • Working on tenure: empirical processes • Empirical processes and cancer* • Chair of the DSMC for NICHD • Artificial intelligence and NSCLC

  7. CF Neonatal Screening • 1992: Joined Phil Farrell’s CF study team • 1997: Farrell, Kosorok, Laxova, et al., published in NEJM • 2004 (Oct. 15): CDC recommended CF newborn screening: the 1997 article was judged the only valid randomized trial • States offering CF newborn screening: 3 in 1997, 12 in 2004, all 50 today

  8. What Role Did “Theory” Play? • Used state-of-the-art statistical methods that were robust (GEE) • In other CF research we have used: • Current status methods (parametric, robust) • Constrained regression estimation • Semiparametric bootstrap inference • Martingale based survival analysis • New work using artificial intelligence

  9. Empirical Processes and Cancer • Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM) • Cox proportional hazards model employed to ascertain risks of 5 prognostic factors: Age, performance Status, serum lactate dehydrogenase Level, number of extranodal disease Sites, tumor Stage • Diagnostics show the model fits poorly

  10. What is the Problem? • Poor survival function prediction • Possibly incorrect interpretation of risk factor effects • A model that adds a single parameter to the Cox model was developed and fit • This new model fits well (Kosorok, Lee, and Fine, 2004) • Inference for the new model is complicated

  11. What Does Theory Tell Us? • We can derive valid inferential tools for the new model: estimation and bootstrap • Robustness was also studied: we learn theoretically that the Cox model is robust to this kind of model misspecification: • The direction of the regression coefficients is preserved • Should use robust variance for Cox model

  12. Theory Versus Applications • The title implies there is conflict between theory and applications • This isn’t true! • Theory provides a basis for correct thinking and problem solving for applications • Applications drive new theoretical development

  13. Theory Can Be Impractical • Law of the iterated logarithm: needs a sample size of 10^8 (“asymptopia”). • Sometimes higher order approximations are needed before it becomes useful. • Sometimes computational properties of asymptotically optimal estimators are poor. • Some hard problems take years to solve.
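For reference, the classical (Hartman-Wintner) form of the law of the iterated logarithm mentioned on this slide is stated below. The notation is an assumption, since the slide gives none: X_1, X_2, ... are i.i.d. with mean 0 and finite variance \sigma^2, and S_n is their partial sum.

```latex
\[
  \limsup_{n \to \infty} \frac{S_n}{\sqrt{2 n \log\log n}} = \sigma
  \quad \text{almost surely},
  \qquad S_n = \sum_{i=1}^{n} X_i .
\]
```

The normalizing factor grows extremely slowly: for n = 10^8, log log n is only about 2.9, which gives some sense of why the limiting behavior is hard to see at practical sample sizes.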

  14. Why Theory is Needed • Often it does work for practical sample sizes. • Can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated. • Theory can lead toward methodological solutions (Cook and Kosorok, 2004 JASA). • Theory can drive scientific discovery. • Some results are beautiful.

  15. Data Mining Versus Inference • Data mining is summarizing and representing data no matter how complicated • Inference is determining valid measures of uncertainty • Patterns obtained from data mining can be misleading • Inference without data mining may miss important structure

  16. The Core of Statistics • Statistics is the science of science • How do we learn from our world and draw meaningful and valid conclusions from it? • Need both data mining and valid inference • Requires a unique kind of intuition • Needs many different intellectual perspectives • One of the most challenging of all fields

  17. Everyone Needs Core Literacy • All statisticians need to know enough theory to have core literacy about statistics and to be able to problem solve • All statisticians need to know enough about applications to know what is important • All biostatisticians need to know enough statistical methods to be useful in practice • The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology

  18. Semiparametric Inference • The study of statistical models with parametric and/or nonparametric parts • Can achieve trade-off between scientific meaning and model “robustness” • Estimation and inference are often hard • There exists an efficiency bound for parametric and some nonparametric parts • NPMLE, testing and estimating equations

  19. Empirical Processes • Tools for complex model inference and high dimensional data • Can determine universal properties of semiparametric methods: • Consistency • Rate of convergence • Limiting distributions • Valid inference (empirical process bootstrap) • Empirical processes are everywhere

  20. The Road Ahead • Whatever you choose to do, the core statistical theory classes will help you. • Be patient as you learn. • Be willing to work hard (struggle is good). • It takes many different kinds of thinkers with different learning styles. • There are important discoveries to be made in both applications and theory.
