
Mining for constant valued biclusters using RCB

Sean Landman, March 7th, 2011


Presentation Transcript


  1. Mining for constant valued biclusters using RCB Sean Landman March 7th 2011

  2. Outline • Review • Biclustering • Apriori • Range Support Patterns (RAP) • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results

  3. Review: Biclusters • Clustering along both dimensions • e.g., genes co-expressed across a subset of conditions rather than across all conditions • Different types of biclusters: Image: Atluri et al. (2009)

  4. Motivation • Constant valued biclusters are important for analyzing genetic interaction data • More later… • Problems with previous approaches: • Reliance on heuristics • e.g., top-down greedy search • Focus on different types of biclusters • Need a way to find constant valued biclusters without relying on heuristics

  5. Review: Apriori principle Image: Feb. 7 Lecture Slides

  6. Review: Apriori algorithm • Input: support threshold, transaction data • Start with the set of all 1-itemsets • Discard itemsets with support less than threshold • For k = 2 to N • Generate all candidate k-itemsets from the surviving (k-1)-itemsets • Discard k-itemsets with support less than threshold
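The level-wise procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `apriori`, the use of an absolute support count as the threshold, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    Support is an absolute count of transactions containing the itemset
    (an assumption; a fractional threshold would work the same way).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: all single items that meet the threshold.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {s: n for s, n in level.items() if n >= min_support}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: support(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy data, chosen only to exercise the function.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy, min_support=3)
```

With this toy input every pair appears in three transactions and survives, while the triple appears in only two and is discarded.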

  7. Review: RAP framework • Efficient and exhaustive discovery of all constant row/column biclusters • “Association analysis for real-valued data” • Range Support measure:

  8. Review: RAP framework • Range Support = 1.4 + 0.9 = 2.3 Image: Pandey et al. (2008)

  9. Review: RAP framework • Range Support measure is anti-monotonic • i.e., adding an additional item can only decrease the Range Support • Algorithm: • Apriori-like algorithm using the Range Support measure instead of Support count
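The Range Support formula itself was an image on the slide and is not in this transcript. The sketch below therefore rests on an assumed reading of the measure: for each row, if the values of the chosen items all have the same sign and their magnitudes lie within a relative range `alpha`, the row contributes its smallest magnitude; otherwise it contributes nothing. The function name, the parameter `alpha`, and the data are hypothetical; the data were picked so that the total mirrors the "1.4 + 0.9" pattern of the slide's example.

```python
def range_support(rows, items, alpha):
    """Sum the per-row contributions for the given item (column) indices.

    Assumed definition: a row contributes min(|values|) when all values share
    a sign and (max - min) / min of their magnitudes is at most alpha.
    """
    total = 0.0
    for row in rows:
        values = [row[i] for i in items]
        same_sign = all(v > 0 for v in values) or all(v < 0 for v in values)
        if not same_sign:
            continue
        magnitudes = [abs(v) for v in values]
        lo, hi = min(magnitudes), max(magnitudes)
        if (hi - lo) / lo <= alpha:
            total += lo
    return total


# Hypothetical data: rows are transactions, columns are items.
data = [
    [1.5, 1.4, 0.2],
    [0.9, 1.0, 3.0],
    [2.0, -2.0, 1.0],
    [0.5, 2.0, 0.4],
]
pair_support = range_support(data, items=(0, 1), alpha=0.5)
triple_support = range_support(data, items=(0, 1, 2), alpha=0.5)
```

Under this reading, only the first two rows qualify for items 0 and 1, and adding item 2 disqualifies both, which illustrates the anti-monotonic behavior stated on the slide.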

  10. Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results

  11. Range Constrained Blocks (RCB) • Similar idea to RAP • Association analysis framework • Exhaustive and efficient discovery of all (nearly-) constant valued biclusters • RAP : constant-row/column :: RCB : constant-value

  12. Range Constrained Blocks (RCB) • Definition: • i.e., submatrices whose values all lie within a relative range, Range = (max – min) / min • Range measure is monotonic • i.e., adding rows or columns to an RCB can never lower its Range score

  13. Range Constrained Blocks (RCB) • Range = (5 – 2) / 2 = 1.5 • Range = (45 - 30) / 30 = 0.5
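The two calculations on this slide follow the pattern Range = (max − min) / min over the values of a block. A minimal sketch, assuming strictly positive entries; the function name `block_range` and the matrix are hypothetical, with values chosen so that the two blocks have the same extremes as the slide's examples.

```python
def block_range(matrix, rows, cols):
    """Relative range (max - min) / min of the submatrix rows x cols.

    Assumes all selected values are strictly positive.
    """
    values = [matrix[r][c] for r in rows for c in cols]
    lo, hi = min(values), max(values)
    return (hi - lo) / lo


# Hypothetical matrix holding two 2x2 blocks side by side.
m = [
    [2, 3, 30, 45],
    [5, 4, 40, 35],
]
left = block_range(m, rows=[0, 1], cols=[0, 1])    # (5 - 2) / 2
right = block_range(m, rows=[0, 1], cols=[2, 3])   # (45 - 30) / 30
# Extending the left block by one column cannot lower its Range.
extended = block_range(m, rows=[0, 1], cols=[0, 1, 2])
```

The last call shows the monotonic behavior: growing a block can only widen the gap between its minimum and maximum, or leave it unchanged.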

  14. Why not post-process from RAP? • RCB is 2-dimensional, RAP is 1-dimensional • Combinatorial explosion of examining all submatrices of RAP patterns • Not all RCB patterns are contained within the RAP patterns

  15. Apriori approach? • Not quite… • Item sets are 1-dimensional • Evaluated with Support / Range Support measures • RCB blocks are 2-dimensional • Evaluated with Range measure • Thus, “item set lattice” is exponentially larger

  16. Apriori approach? Image: Feb. 7 Lecture Slides

  17. Algorithm outline • Two separate Apriori-like discovery steps: • 1 – Discover all square RCBs • 2 – Merge square RCBs to discover all RCBs • Examples: • 1.1: Find all 1x1 RCBs • 1.2: Find all 1xN or Nx1 RCBs (for all N) • 2.1: Find all 2x2 RCBs • 2.2: Find all 2xN or Nx2 RCBs (for all N) • etc… • Only keep RCBs of size 3x3 or larger
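The outline above depends on one property: once a block violates the Range threshold, no larger block containing it can satisfy it. The sketch below is not the two-step square-then-merge procedure of Atluri et al.; it is a much simpler exhaustive search, written only to show how that property lets whole branches be pruned. The names `find_rcbs` and `delta`, the assumption of positive entries, and the toy matrix are all hypothetical. It returns every qualifying block, not only maximal ones.

```python
def block_range(matrix, rows, cols):
    """Relative range (max - min) / min of a submatrix (positive values)."""
    values = [matrix[r][c] for r in rows for c in cols]
    lo, hi = min(values), max(values)
    return (hi - lo) / lo


def find_rcbs(matrix, delta, min_rows=3, min_cols=3):
    """List every (rows, cols) block with Range <= delta and enough size.

    min_rows and min_cols are expected to be at least 1.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []

    def grow_cols(rows, cols, start):
        if len(rows) >= min_rows and len(cols) >= min_cols:
            found.append((rows, cols))
        for c in range(start, n_cols):
            new_cols = cols + (c,)
            # Prune: a violating block can never be repaired by growing it.
            if block_range(matrix, rows, new_cols) <= delta:
                grow_cols(rows, new_cols, c + 1)

    def grow_rows(rows, start):
        for r in range(start, n_rows):
            new_rows = rows + (r,)
            # Prune: skip row sets that fail on every single column.
            if any(block_range(matrix, new_rows, (c,)) <= delta
                   for c in range(n_cols)):
                grow_cols(new_rows, (), 0)
                grow_rows(new_rows, r + 1)

    grow_rows((), 0)
    return found


# Hypothetical matrix; columns 0 and 1 are nearly constant, column 2 is not.
toy = [
    [10, 11, 50],
    [10, 12, 90],
    [11, 10, 10],
]
blocks = find_rcbs(toy, delta=0.25, min_rows=2, min_cols=2)
```

The default minimum size follows the slide's "3x3 or larger" rule; the toy call lowers it to 2x2 so that a 3x3 matrix yields some output. This search is still exponential in the worst case, which is the cost the square-then-merge organization on the slide is meant to manage.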

  18. 1 - Discovering all square RCBs Image: Atluri et al. (2009)

  19. 2 - Merging square RCBs • For each set of square RCBs of a particular size that share a common dimension: • Merge using an Apriori-like algorithm Image: Atluri et al. (2009)

  20. Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results

  21. Application: Genetic Interactions • Rows and columns both represent genes • Entries represent the level of genetic interaction between genes • Determined using gene knockout experiments • ε = F_AB – F_A · F_B • where F_A is the fitness after gene A is deleted, F_B the fitness after gene B is deleted, and F_AB the fitness after both are deleted

  22. Application: Genetic Interactions • ε = F_AB – F_A · F_B • Negative ε represents functional redundancy • Positive ε represents interactions within a functional pathway • Focus in this paper • Positive RCBs in this context represent a complex of functionally related genes
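A small worked example of the interaction score may help. It assumes that F_A and F_B are single-deletion fitness values and F_AB is the double-deletion fitness, all on the same scale; the function names, the tolerance, and the fitness numbers are hypothetical.

```python
def interaction_score(f_a, f_b, f_ab):
    """epsilon = F_AB - F_A * F_B (observed minus expected double-mutant fitness)."""
    return f_ab - f_a * f_b


def interpret(epsilon, tol=1e-9):
    """Map the sign of epsilon to the reading given on the slide."""
    if epsilon < -tol:
        return "negative: functional redundancy"
    if epsilon > tol:
        return "positive: interaction within a functional pathway"
    return "neutral"


# Hypothetical fitness values: expected double-mutant fitness is 0.8 * 0.5 = 0.4.
sicker_than_expected = interaction_score(f_a=0.8, f_b=0.5, f_ab=0.1)
healthier_than_expected = interaction_score(f_a=0.8, f_b=0.5, f_ab=0.5)
labels = (interpret(sicker_than_expected), interpret(healthier_than_expected))
```

A double mutant that is less fit than the product of the single-mutant fitnesses gives a negative ε, and one that is fitter gives a positive ε.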

  23. Application: Genetic Interactions Image: Costanzo et al. (2010)

  24. Application: Genetic Interactions • [Figure: between-pathway (compensatory) interactions and within-complex/pathway interactions, illustrated with genes REV1, REV3, REV7 and RAD51, RAD52, RAD54, RAD55, RAD57] Images: Kelly & Ideker (2005), Schuldiner et al. (2005)

  25. Results • Small biclusters • Low Range score Image: Atluri et al. (2009)

  26. Results • Mean functional evaluation (FE) score corresponds well with the Range measure used to define RCB blocks Image: Atluri et al. (2009)

  27. Results • RCB patterns tend to have a much tighter spread between minimum and maximum values than FP or RAP (i.e. better Range score) Image: Atluri et al. (2009)

  28. Conclusion • RCB framework is used to find constant valued biclusters… • Exhaustively • Efficiently • Used for discovering functionally related gene modules in GI data • Other applications: gene expression data?

  29. Questions?
