This presentation reviews a method for discovering constant valued biclusters using Range Constrained Blocks (RCB) and its relevance to analyzing genetic interaction data. Existing biclustering approaches are limited by their reliance on heuristics. RCB builds on the Apriori principle and the Range Support Patterns (RAP) framework, replacing heuristic search with efficient, exhaustive discovery of (nearly) constant valued biclusters. Experimental results show that RCB patterns identify functionally related gene modules in genetic interaction data.
Mining for constant valued biclusters using RCB Sean Landman March 7th 2011
Outline • Review • Biclustering • Apriori • Range Support Patterns (RAP) • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Review: Biclusters • Clustering along both dimensions • e.g., genes co-expressed across a subset of conditions rather than across all conditions • Different types of biclusters: Image: Atluri et al. (2009)
Motivation • Constant valued biclusters are important for analyzing genetic interaction data • More later… • Problems with previous approaches: • Reliance on heuristics • e.g., top-down greedy search • Focus on different types of biclusters • Need a way to find constant valued biclusters without relying on heuristics
Review: Apriori principle Image: Feb. 7 Lecture Slides
Review: Apriori algorithm • Input: support threshold, transaction data • Start with set of all 1-itemsets • Discard itemsets with support less than threshold • For k = 2 to N • Generate all possible k-itemsets from (k-1)-itemsets • Discard k-itemsets with support less than threshold
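The steps above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `apriori`, the basket data, and the use of an absolute support count as the threshold are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: keep the single items that meet the threshold.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join: union pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune (Apriori principle): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent

# Hypothetical market-basket data, used only to exercise the function.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
result = apriori(baskets, min_support=3)
```

With a threshold of 3, the candidate {bread, diapers, beer} is never counted against the data because its subset {bread, beer} is already infrequent, which is the pruning the Apriori principle provides.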
Review: RAP framework • Efficient and exhaustive discovery of all constant row/column biclusters • “Association analysis for real-valued data” • Range Support measure:
Review: RAP framework • Range Support = 1.4 + 0.9 = 2.3 Image: Pandey et al. (2008)
Review: RAP framework • Range Support measure is anti-monotonic • i.e., adding an item can only decrease the Range Support, never increase it • Algorithm: • Apriori-like algorithm using Range Support measure instead of Support count
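A small Python sketch may help make the measure and its anti-monotonicity concrete. The formula itself appears only as an image in the slides, so the form used here is an assumption about the measure in Pandey et al. (2008): a row contributes its smallest absolute value when all selected values share a sign and lie within a relative range `alpha`, and contributes nothing otherwise. The matrix `toy` is hypothetical and was chosen only so that one itemset reproduces the 1.4 + 0.9 = 2.3 arithmetic.

```python
def range_support(matrix, items, alpha):
    """Assumed Range Support: sum over rows of the smallest |value| among
    `items`, counting a row only if its values share a sign and their
    magnitudes lie within a relative range `alpha`."""
    total = 0.0
    for row in matrix:
        vals = [row[i] for i in items]
        same_sign = all(v > 0 for v in vals) or all(v < 0 for v in vals)
        if not same_sign:
            continue
        mags = [abs(v) for v in vals]
        lo, hi = min(mags), max(mags)
        if hi - lo <= alpha * lo:
            total += lo
    return total

# Hypothetical data: rows are transactions, columns are items 0..2.
toy = [
    [1.4, 1.5, 1.6],
    [0.9, 1.0, 0.2],
    [2.0, -2.0, 2.1],
    [0.5, 1.5, 0.6],
]
rs_pair = range_support(toy, [0, 1], alpha=0.5)       # rows 0 and 1 count
rs_triple = range_support(toy, [0, 1, 2], alpha=0.5)  # only row 0 counts
```

Adding item 2 removes row 1 from the sum, so the Range Support of the triple cannot exceed that of the pair. This is the property that lets an Apriori-like search prune supersets.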
Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Range Constrained Blocks (RCB) • Similar idea to RAP • Association analysis framework • Exhaustive and efficient discovery of all (nearly-) constant valued biclusters • RAP : constant-row/column :: RCB : constant-value
Range Constrained Blocks (RCB) • Definition: • i.e., submatrices with all values within a relative range • Range measure is monotonic • i.e., adding a row or column to a block can only raise its Range score, never lower it
Range Constrained Blocks (RCB) • Range = (5 – 2) / 2 = 1.5 • Range = (45 - 30) / 30 = 0.5
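The two worked values above are consistent with a relative range of the form (max - min) / min. The sketch below assumes that form and strictly positive entries; the definition in Atluri et al. (2009) may treat negative values differently. The blocks `small` and `large` are hypothetical, and only their minimum and maximum values come from the slide.

```python
def block_range(block):
    """Relative range (max - min) / min of a block of positive values.
    Assumed form, inferred from the two worked examples."""
    values = [v for row in block for v in row]
    lo, hi = min(values), max(values)
    return (hi - lo) / lo

small = [[2, 3], [4, 5]]        # min 2, max 5
large = [[30, 45], [36, 40]]    # min 30, max 45

range_small = block_range(small)    # (5 - 2) / 2
range_large = block_range(large)    # (45 - 30) / 30

# Monotonicity: appending a row can widen the spread but never narrow it.
range_grown = block_range(small + [[1, 5]])
```

Because the measure depends on the relative spread, the block with larger raw values scores lower (0.5) than the block with smaller raw values (1.5).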
Why not post-process from RAP? • RCB is 2-dimensional, RAP is 1-dimensional • Combinatorial explosion of examining all submatrices of RAP patterns • Not all RCB patterns are contained within the RAP patterns
Apriori approach? • Not quite… • Item sets are 1-dimensional • Evaluated with Support / Range Support measures • RCB blocks are 2-dimensional • Evaluated with Range measure • Thus, “item set lattice” is exponentially larger
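To get a feel for the difference in search space, the counts can be compared directly. This is a back-of-the-envelope sketch rather than a figure from the paper; the function names are made up for illustration.

```python
def count_itemsets(n_items):
    """Non-empty subsets of n_items items (the 1-dimensional lattice)."""
    return 2 ** n_items - 1

def count_submatrices(n_rows, n_cols):
    """Non-empty row subsets times non-empty column subsets."""
    return (2 ** n_rows - 1) * (2 ** n_cols - 1)

itemsets_10 = count_itemsets(10)          # 1023
blocks_10x10 = count_submatrices(10, 10)  # 1023 * 1023 = 1046529
```

For a square matrix, the number of candidate blocks is the square of the number of candidate itemsets over one dimension.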
Apriori approach? Image: Feb. 7 Lecture Slides
Algorithm outline • Two separate Apriori-like discovery steps: • 1 – Discover all square RCBs • 2 – Merge square RCBs to discover all RCBs • Examples: • 1.1: Find all 1x1 RCBs • 1.2: Find all 1xN or Nx1 RCBs (for all N) • 2.1: Find all 2x2 RCBs • 2.2: Find all 2xN or Nx2 RCBs (for all N) • etc… • Only keep RCBs of size 3x3 or larger
1 - Discovering all square RCBs Image: Atluri et al. (2009)
2 - Merging square RCBs • For each set of square RCBs of a particular size that share a common dimension: • Merge using an Apriori-like algorithm Image: Atluri et al. (2009)
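The first step, growing square blocks level by level, can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation from Atluri et al. (2009): it assumes strictly positive entries, a relative range of (max - min) / min, and a user-chosen threshold `delta`; the names `square_rcbs` and `block_range` are hypothetical, and the merging step is not shown. The search relies on the monotonic Range measure: every valid k x k block contains a valid (k-1) x (k-1) block, so extending only valid blocks misses nothing.

```python
from itertools import product

def block_range(matrix, rows, cols):
    """Relative range of the submatrix picked out by `rows` and `cols`."""
    vals = [matrix[r][c] for r in rows for c in cols]
    lo, hi = min(vals), max(vals)
    return (hi - lo) / lo

def square_rcbs(matrix, delta, max_size):
    """Return {k: set of (rows, cols)} for k x k blocks with range <= delta."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Level 1: every single cell has range 0 and is trivially valid.
    level = {(frozenset([r]), frozenset([c]))
             for r, c in product(range(n_rows), range(n_cols))}
    by_size = {1: level}
    for k in range(2, max_size + 1):
        nxt = set()
        for rows, cols in level:
            # Extend each valid block by one new row and one new column.
            for r in set(range(n_rows)) - rows:
                for c in set(range(n_cols)) - cols:
                    cand = (rows | {r}, cols | {c})
                    if cand not in nxt and block_range(matrix, *cand) <= delta:
                        nxt.add(cand)
        if not nxt:
            break  # no valid block at this size, so none at larger sizes
        by_size[k] = nxt
        level = nxt
    return by_size

# Hypothetical 3x3 matrix with one tight 2x2 block in the top-left corner.
m = [
    [10, 11, 50],
    [12, 10, 60],
    [90, 95, 11],
]
found = square_rcbs(m, delta=0.25, max_size=3)
```

On this toy matrix only rows {0, 1} with columns {0, 1} survive at size 2 (range 0.2), and the search stops before size 3.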
Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Application: Genetic Interactions • Rows and columns both represent genes • Entries represent the level of genetic interaction between genes • Determined using gene knockout experiments • $\varepsilon = F_{AB} - F_A F_B$ • i.e., $F_A$ is the fitness after gene A is deleted, and $F_{AB}$ is the fitness after both A and B are deleted
Application: Genetic Interactions • $\varepsilon = F_{AB} - F_A F_B$ • Negative $\varepsilon$ represents functional redundancy • Positive $\varepsilon$ represents interactions within a functional pathway • Focus in this paper • Positive RCBs in this context represent a complex of functionally related genes
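As a quick numeric illustration of the score, the sketch below evaluates it for made-up fitness values. The numbers and the function name `interaction_score` are hypothetical and are not taken from the data set.

```python
def interaction_score(f_ab, f_a, f_b):
    """Double-mutant fitness minus the product of single-mutant fitnesses."""
    return f_ab - f_a * f_b

# Hypothetical single-mutant fitnesses; their product is 0.4.
f_a, f_b = 0.8, 0.5

neutral = interaction_score(0.4, f_a, f_b)   # double mutant as expected
negative = interaction_score(0.1, f_a, f_b)  # sicker than expected
positive = interaction_score(0.5, f_a, f_b)  # healthier than expected
```

A double mutant that is sicker than the product of the single-mutant fitnesses gives a negative score, and one that is healthier than that product gives a positive score.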
Application: Genetic Interactions Image: Costanzo et al. (2010)
Application: Genetic Interactions • Between Pathway Interactions (compensatory) • Within Complex/Pathway Interactions • Example genes shown: RAD51, RAD52, RAD54, RAD55, RAD57; REV1, REV3, REV7 Images: Kelly & Ideker (2005), Schuldiner et al. (2005)
Results • Small biclusters • Low Range score Image: Atluri et al. (2009)
Results • Mean functional evaluation (FE) score corresponds well with the Range measure used to define RCB blocks Image: Atluri et al. (2009)
Results • RCB patterns tend to have a much tighter spread between minimum and maximum values than FP or RAP patterns (i.e., a lower Range score) Image: Atluri et al. (2009)
Conclusion • RCB framework is used to find constant valued biclusters… • Exhaustively • Efficiently • Used for discovering functionally related gene modules in GI data • Other applications: gene expression data?