
Activist Data Mining (as Applied to Carbon:Nitrogen Sensing in Plants) Dennis Shasha

New York University: Courant Institute of Math & Computer Sciences; Department of Biology. Gloria Coruzzi, Mike Chou, Andrew Kouranov, Laurence Lejay, Dennis Shasha, Bud Mishra.


Presentation Transcript


  1. Activist Data Mining (as Applied to Carbon:Nitrogen Sensing in Plants) Dennis Shasha

  2. New York University: Courant Institute of Math & Computer Sciences; Department of Biology. Gloria Coruzzi, Mike Chou, Andrew Kouranov, Laurence Lejay, Dennis Shasha, Bud Mishra, Marco Antoniotti, Marc Rejali

  3. [Pathway diagram. Labels: LIGHT, Photosynthesis, Sugar, NH4+, Amino Acids (Glu, Gln, Asp, Asn)]

  4. Light, Carbon and Amino acids differentially regulate N-assimilation genes
     [Diagram. Labels: Light, Carbon, and Amino acids acting on GS2 and AS1; Gln (C5:N2); Asn (C4:N2); C:N]

  5. Goal: Figure out the Circuit for many genes
     • A Multi-factor Approach to C:N sensing in plants.
     • Identify how a combination of interactions of “inputs” (Light, Carbon, & Nitrogen) affects gene regulation using Combinatorial Design and Genome Chip analysis.
     • Identify Arabidopsis mutants defective in C:N sensing.
       Forward genetics: Selections for C:N sensing mutants.
       Reverse genetics: Mutants in candidate C:N signaling genes.
     Ultimate Goal: Virtual plant… (frankenfoods)

  6. A Combinatorial Approach to discovering interactions
     Inputs:
     • Light
     • Starvation to Various Nutrients
     • Carbon
     • Inorganic N (NO3/NH4)
     • Organic N (Glu)
     • Organic N (Gln)
     If the inputs take binary values (a first approximation), 6 binary (+/-) inputs give 2^6 = 64 input combinations (or treatments). Use combinatorial design to reduce the number of treatment combinations required to cover the experimental space effectively.
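
To make the size of the full experimental space concrete, the sketch below enumerates every +/- treatment for the six inputs on this slide. The input names are taken from the slide; encoding "+" and "-" as booleans, and the function name `full_factorial`, are assumptions made for illustration.

```python
from itertools import product

# Input names follow the slide; the True/False encoding of +/- is an assumption.
INPUTS = ["Light", "Starvation", "Carbon", "Inorganic N", "Glu", "Gln"]

def full_factorial(inputs):
    """Return every +/- treatment as a dict mapping input name -> bool."""
    return [dict(zip(inputs, levels))
            for levels in product([False, True], repeat=len(inputs))]

treatments = full_factorial(INPUTS)
print(len(treatments))  # 2**6 treatments in the full space
```

Running all of these treatments is what combinatorial design is meant to avoid.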

  7. ACTIVIST DATA MINING
     Don’t study the experiments (only). Change them.
     Combinatorial design generates a subset of the 64 treatments that gives a “good” approximation of the entire experimental space.
     For every pair of “inputs”, all four combinations of the two binary variables are tested. Example: NO3 and Carbon have four possible combinations: +NO3 +Carbon; +NO3 -Carbon; -NO3 +Carbon; -NO3 -Carbon.
     Each such pairwise combination of inputs is present in at least one of the treatments selected by combinatorial design.
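
The pairwise-coverage property described on this slide can be sketched with a simple greedy search. This is not necessarily the algorithm the authors used; it is one common way to build such a design, and the function names (`pair_requirements`, `pairwise_design`, `covers_all_pairs`) are hypothetical. Levels are encoded as 0 (-) and 1 (+).

```python
from itertools import combinations, product

def pair_requirements(treatment):
    """All (input i, input j, level of i, level of j) facts one treatment covers."""
    return {(i, j, treatment[i], treatment[j])
            for i, j in combinations(range(len(treatment)), 2)}

def pairwise_design(n_inputs, levels=(0, 1)):
    """Greedily add the treatment that covers the most still-uncovered pairs."""
    candidates = list(product(levels, repeat=n_inputs))
    uncovered = set().union(*(pair_requirements(t) for t in candidates))
    design = []
    while uncovered:
        best = max(candidates,
                   key=lambda t: len(pair_requirements(t) & uncovered))
        design.append(best)
        uncovered -= pair_requirements(best)
    return design

def covers_all_pairs(design, n_inputs, levels=(0, 1)):
    """Check that every pair of inputs shows every combination of levels."""
    needed = set().union(*(pair_requirements(t)
                           for t in product(levels, repeat=n_inputs)))
    covered = set().union(*(pair_requirements(t) for t in design))
    return needed <= covered

design = pairwise_design(6)
print(len(design), covers_all_pairs(design, 6))
```

A greedy search does not guarantee the smallest possible design, only one that satisfies the coverage property with far fewer than 64 treatments.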

  8. “Combinatorial design” predicts 12 conditions to test the effect of Light in all combinations of Starvation, Carbon, and Nitrogen

  9. “Pivot” analysis of gene expression data from C:N treatments
     Find “minimal pairs” of treatments that are the same except in one input (e.g. Light) to measure its effect on a dependent variable (a gene, e.g. AS1).
     Analyze a series of minimal-pair treatments using one input (e.g. Light) as a “pivot”, to determine the effect of light on a dependent variable (e.g. AS1) under a variety of carbon and nitrogen combinations.
     If the effect is consistent across all of these contexts, it likely always holds.
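
The pivot analysis can be sketched as follows. The data layout (a dict from treatment tuples to one expression value), the function names, and the numbers are all assumptions for illustration; the values are made up and are not measurements from the study.

```python
def pivot_effects(inputs, results, pivot):
    """Find minimal pairs that differ only in `pivot`.

    `results` maps a treatment (one 0/1 level per input, in the order of
    `inputs`) to a measured expression value. Returns a dict mapping each
    shared context to (value with pivot high) - (value with pivot low).
    """
    k = inputs.index(pivot)
    effects = {}
    for treatment, low_value in results.items():
        if treatment[k] != 0:
            continue
        partner = treatment[:k] + (1,) + treatment[k + 1:]
        if partner in results:  # only paired treatments are informative
            context = treatment[:k] + treatment[k + 1:]
            effects[context] = results[partner] - low_value
    return effects

def verdict(effects):
    """Summarize the pivot's effect across all contexts."""
    if effects and all(d > 0 for d in effects.values()):
        return "induces"
    if effects and all(d < 0 for d in effects.values()):
        return "represses"
    return "mixed"

# Illustrative (invented) AS1 expression values for (Light, Carbon, Glu).
inputs = ["Light", "Carbon", "Glu"]
as1 = {
    (0, 0, 0): 5.0, (1, 0, 0): 2.0,
    (0, 1, 0): 6.0, (1, 1, 0): 3.5,
    (0, 1, 1): 9.0, (1, 1, 1): 4.0,
    (1, 0, 1): 3.0,  # no matching Light=0 treatment, so it is ignored
}
light_effects = pivot_effects(inputs, as1, "Light")
print(verdict(light_effects))
```

With these invented numbers, Light lowers the value in every paired context, which is the pattern the slide calls consistent.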

  10. Light represses AS1 & induces GS2 under a variety of C:N conditions

  11. GLU induces AS1 & represses GS2 under a variety of conditions

  12. Underlying Method: combinatorial design
      Combinatorial design: Inspired by work in software testing by David Cohen, Siddhartha Dalal, Michael Fredman and Gardner Patton at Bellcore/Telcordia.
      Their problem: how to choose a good set of inputs to a program to discover whether it has any bugs. The aim is input coverage, not program coverage: not all input combinations, but all combinations of values for every pair of input variables.
      Hypothesis: every input combination should give the same output: no error. If that holds for the designed subset, then the program is taken to be ok.

  13. Underlying Method: combinatorial design 2
      Scientific question: does input X induce (resp. repress) the output? If so, then X should induce regardless of the other inputs.
      So, choose X = low and a combinatorial design of the other inputs. Then choose X = high and the same combinatorial design of the other inputs.
      If, for each context c in the design, (high, c) yields more output than (low, c), the two treatments forming a minimal pair, then X is inductive.
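
A minimal sketch of this crossed design, assuming binary levels (0 = low, 1 = high) and a hypothetical `measure` function that stands in for running an experiment. The four contexts form a hand-written pairwise design over three other inputs, and the response functions are invented for illustration.

```python
def crossed_design(contexts):
    """Pair each context c with X = low (0) and X = high (1)."""
    return [((0,) + c, (1,) + c) for c in contexts]

def is_inductive(contexts, measure):
    """True if (high, c) yields more output than (low, c) for every context c."""
    return all(measure(high) > measure(low)
               for low, high in crossed_design(contexts))

# Pairwise design over three other binary inputs: every pair of columns
# shows 00, 01, 10, and 11 at least once.
contexts = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Invented response functions standing in for measured gene expression.
def always_up(t):
    return 2.0 * t[0] + t[1] + 0.5 * t[2]

def context_dependent(t):
    # X raises the output except when the first context input is high.
    return t[0] * (1 - 2 * t[1]) + t[2]

print(is_inductive(contexts, always_up))
print(is_inductive(contexts, context_dependent))
```

Four contexts crossed with two levels of X give eight treatments in this toy case.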

  14. Underlying Methods: adaptive design
      What happens when X isn’t uniformly inductive or repressive? Suppose X normally shows induction, but occasionally shows repression. That is, for most c values, (low, c) vs. (high, c) shows induction, but for one c’, (low, c’) vs. (high, c’) shows repression.
      Then study the differences between c’ and the inducing c values closest to it, and design experiments that narrow those differences.
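
One way to sketch the adaptive step is to locate the exceptional context c’ and list the normal contexts nearest to it, together with the inputs in which they differ. Using Hamming distance as the notion of "closest" is an assumption, as are the function names and the invented effect values.

```python
def hamming(a, b):
    """Number of inputs in which two contexts differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_normal_contexts(effects):
    """`effects` maps context -> (output at X high) - (output at X low).

    The majority sign is treated as normal. For each exceptional context,
    return the closest normal contexts and the input positions that differ.
    """
    inducing = [c for c, d in effects.items() if d > 0]
    repressing = [c for c, d in effects.items() if d < 0]
    if len(inducing) >= len(repressing):
        normal, odd = inducing, repressing
    else:
        normal, odd = repressing, inducing
    report = {}
    for c_odd in odd:
        best = min(hamming(c_odd, c) for c in normal)
        report[c_odd] = [
            (c, [i for i, (x, y) in enumerate(zip(c_odd, c)) if x != y])
            for c in normal if hamming(c_odd, c) == best
        ]
    return report

# Invented effects: induction everywhere except in context (1, 1, 1).
effects = {
    (0, 0, 0): 1.2,
    (0, 0, 1): 0.9,
    (0, 1, 1): 0.8,
    (1, 1, 0): 1.1,
    (1, 1, 1): -0.6,
}
report = nearest_normal_contexts(effects)
print(report)
```

The differing positions in the report point at the inputs worth varying in follow-up experiments.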

  15. Conclusions About Methodology
      • Design/don’t wait: Use the data you are given, sure, but don’t be shy to ask for more.
      • Combinatorial Design can help test a hypothesis: e.g. 10 three-valued variables require 3^10 = 59,049 experiments to cover the whole space. Combinatorial design can reduce this to 27.
      • Adaptation is easy: Study differences between normal cases and abnormal ones to discover fine structure.
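
The arithmetic behind the 59,049 figure, along with a simple counting bound for pairwise coverage, can be checked directly. The bound is only a counting argument (an assumption about what "cover" means here, namely all value combinations for every pair of variables); it says how few experiments could possibly suffice, not that a design of that size exists.

```python
from math import comb

variables, levels = 10, 3

# Every combination of all ten three-valued variables.
full_space = levels ** variables

# Pairwise coverage needs each of the levels**2 value combinations
# for each of the comb(variables, 2) pairs of variables.
pair_requirements = comb(variables, 2) * levels ** 2

# One experiment fixes all variables, so it covers one value
# combination for each pair of variables.
covered_per_experiment = comb(variables, 2)

# Ceiling division: no pairwise design can be smaller than this.
lower_bound = -(-pair_requirements // covered_per_experiment)

print(full_space, pair_requirements, lower_bound)
```

The slide's figure of 27 lies between this counting bound and the full space.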
