Advanced Techniques in Binary Factor Analysis
Explore Blind Search and Formal Concepts methods for Binary Factor Analysis, including optimizations and quality checking. Discover efficient algorithms and future possibilities in binary data analysis.
Using Blind Search and Formal Concepts for Binary Factor Analysis • Aleš Keprt • ales.keprt@vsb.cz
Synopsis • Binary Factor Analysis (BFA) - introduction to BFA - exact solution of BFA - quality checking • Possible optimizations • Blind Search method • Method based on Formal Concepts • Tests and the comparison of methods • Possible future work
Binary Factor Analysis (BFA) • Factor analysis of binary data • Using Boolean arithmetic • Trying to express matrix X as a product of two matrices, X = F ⊙ A • ⊙ is binary matrix multiplication • Initial conditions: we know X, the dimensions of all matrices, and the number of ones per row of A
Exact BFA Solution (i.e. reference factorizer) • For checking other algorithms • Searches for the best (optimal) solution • Exact = the opposite of an approximate solution • The key: perform all possible optimizations to avoid checking all bit combinations
Boolean arithmetic • Classical arithmetic: • Boolean arithmetic:
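A minimal sketch of the difference between the two arithmetics, assuming the usual convention that Boolean addition is logical OR (so 1 + 1 = 1) and Boolean multiplication is logical AND. The function names and the small matrices are illustrative only, not taken from the presentation.

```python
def classical_product(F, A):
    """Ordinary matrix product: entries are sums of products."""
    return [[sum(f * a for f, a in zip(row, col)) for col in zip(*A)]
            for row in F]


def boolean_product(F, A):
    """Boolean matrix product: entry is 1 if any (f AND a) pair is 1."""
    return [[int(any(f and a for f, a in zip(row, col))) for col in zip(*A)]
            for row in F]


F = [[1, 1],
     [0, 1]]
A = [[1, 1, 0],
     [0, 1, 1]]

classical = classical_product(F, A)  # the middle entry of row 0 is 1 + 1 = 2
boolean = boolean_product(F, A)      # the same entry is 1 OR 1 = 1
```

The two results differ only where more than one factor contributes to the same cell, which is exactly where the Boolean sum saturates at 1.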
Quality check • Discrepancy (Czech: „odchylka”) • Our goal is to minimize the discrepancy • We distinguish between positive and negative discrepancy
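As a sketch of how the discrepancy could be counted: the code below compares the data matrix X with a reconstruction X_hat cell by cell. The sign convention is an assumption, taken from the sign of X − X_hat: a cell is counted as positive discrepancy when X has 1 and the reconstruction has 0, and as negative discrepancy when X has 0 and the reconstruction has 1. The names `discrepancy`, `X` and `X_hat` are illustrative.

```python
def discrepancy(X, X_hat):
    """Return (positive, negative, total) counts of mismatching cells.

    Assumed convention: positive = 1 in X but 0 in X_hat (uncovered one),
    negative = 0 in X but 1 in X_hat (overcovered zero).
    """
    positive = negative = 0
    for row_x, row_h in zip(X, X_hat):
        for x, h in zip(row_x, row_h):
            if x == 1 and h == 0:
                positive += 1
            elif x == 0 and h == 1:
                negative += 1
    return positive, negative, positive + negative


X = [[1, 0, 1],
     [0, 1, 1]]
X_hat = [[1, 1, 1],
         [0, 1, 0]]

pos, neg, total = discrepancy(X, X_hat)
```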
Possible optimizations • Empty rows and columns – skip • Duplicate rows and columns – merge • Order factor loadings (rows of A) • Constant number of ones per row of A • Use the knowledge of A to get F • Parallel (distributed) computation using multiple computers
Quality check • When merging duplicate rows/columns:
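One way to keep the quality check meaningful after merging, sketched under the assumption that each merged row carries a multiplicity (how many original rows it stands for) and that its mismatches are weighted by that count. The helper names are hypothetical; columns could be handled the same way on the transposed matrix.

```python
from collections import Counter


def merge_duplicate_rows(X):
    """Collapse identical rows; return (unique_rows, weights)."""
    counts = Counter(tuple(row) for row in X)  # keeps first-seen order
    rows = [list(row) for row in counts]
    weights = [counts[tuple(row)] for row in rows]
    return rows, weights


def weighted_discrepancy(rows, rows_hat, weights):
    """Total discrepancy where each merged row counts `weight` times."""
    return sum(w * sum(x != h for x, h in zip(rx, rh))
               for rx, rh, w in zip(rows, rows_hat, weights))


X = [[1, 0],
     [1, 0],
     [0, 1]]
rows, weights = merge_duplicate_rows(X)

# Hypothetical reconstruction of the two merged rows.
rows_hat = [[1, 1],
            [0, 1]]
d = weighted_discrepancy(rows, rows_hat, weights)
```

With these weights the result equals the discrepancy that would be measured on the original, unmerged matrix.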
Blind Search • Blind Search (Czech: „slepé hledání”) • The strategy: 1. Build up a particular candidate for A. 2. Find the best F for this A. 3. Compute the discrepancy. 4. Remember the best (A, F) pair so far. 5. Go back to step 1.
Blind Search (2.) • The key tasks: 1. Building up the candidates for A 2. How to get F when knowing A • Building up the candidates for A: - row by row (one row, one factor) - bit-coded matrices can be much faster - factors cannot repeat on several rows
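A possible way to generate the candidates, sketched under these assumptions: each row of A is bit-coded as an integer, every row has exactly k ones, rows are kept in increasing order, and no row repeats. `rows_with_k_ones` and `candidate_loadings` are illustrative names, not the author's code.

```python
from itertools import combinations


def rows_with_k_ones(p, k):
    """All p-bit masks with exactly k bits set, in increasing order."""
    return sorted(sum(1 << i for i in bits)
                  for bits in combinations(range(p), k))


def candidate_loadings(p, k, m):
    """Yield each candidate A as a tuple of m distinct, ordered row masks."""
    yield from combinations(rows_with_k_ones(p, k), m)


row_masks = rows_with_k_ones(4, 2)
n_candidates = sum(1 for _ in candidate_loadings(4, 2, 2))
```

For 4 columns, 2 ones per row and 2 factors there are 6 possible rows; ordering them and forbidding repeats leaves 15 candidate matrices instead of the 36 ordered pairs.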
Blind Search (3.) • How to get F when knowing X and A: - bit-coded matrices may help (yet again) - going row by row (better than the classical “rows x columns” multiplication algorithm) • How to get a particular row of F: (a) blind search (b) do some preprocessing, then blind search
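The strategy from the Blind Search slides can be sketched as follows. This is a toy version under stated assumptions: rows of X and A are bit-coded integers, every nonzero bit pattern is a candidate row of A (no constraint on the number of ones), and each row of F is found by trying all 2^m factor subsets, i.e. variant (a). It is only feasible for very small matrices, and all names are illustrative.

```python
from itertools import combinations


def best_scores_row(x, A):
    """Blind search for one row of F: try every subset of the m factors.

    Returns (discrepancy, subset_mask) for the first best subset found.
    """
    best = None
    for subset in range(1 << len(A)):
        recon = 0
        for k, a in enumerate(A):
            if (subset >> k) & 1:
                recon |= a                      # Boolean sum of chosen factors
        d = bin(recon ^ x).count("1")           # mismatching bits in this row
        if best is None or d < best[0]:
            best = (d, subset)
    return best


def blind_search(X, p, m):
    """Return (total_discrepancy, A, F) with the lowest discrepancy found."""
    best = None
    for A in combinations(range(1, 1 << p), m):  # step 1: candidate for A
        total, F = 0, []
        for x in X:                              # step 2: best F, row by row
            d, subset = best_scores_row(x, A)
            total += d                           # step 3: discrepancy
            F.append(subset)
        if best is None or total < best[0]:      # step 4: remember the best
            best = (total, A, F)
    return best                                  # step 5 is the outer loop


X = [0b110, 0b011, 0b111]
result = blind_search(X, p=3, m=2)
```

Here the third row of X is the Boolean sum of the other two, so two factors reproduce the data exactly.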
Using Formal Concepts • Concepts are our candidates for the rows of matrix A • Example: set “p3” - data matrix size 100x100 - 10 concepts (see picture) • Searching for 5 factors: - only 8 candidates - 56 possible combinations
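A small sketch of how concept intents could be enumerated and used as candidate rows, assuming bit-coded rows and a simple closure-under-intersection procedure (not necessarily what the author's tool does). The filter that drops concepts with an empty intent or empty extent is also an assumption. The last line only checks the arithmetic on the slide: choosing 5 factors out of 8 candidates gives 56 combinations.

```python
from math import comb


def concept_intents(rows, p):
    """Intents of all formal concepts: every intersection of row masks.

    Starts from the full attribute set (intent of the empty object set)
    and closes the collection under intersection with each row.
    """
    intents = {(1 << p) - 1}
    for r in rows:
        intents |= {r & i for i in intents}
    return sorted(intents)


def extent(intent, rows):
    """Indices of the rows that contain every attribute of the intent."""
    return [i for i, r in enumerate(rows) if (r & intent) == intent]


rows = [0b110, 0b011, 0b111]
intents = concept_intents(rows, 3)

# Assumed filter: keep concepts with a nonempty intent and nonempty extent.
candidates = [c for c in intents if c != 0 and extent(c, rows)]

n_combinations = comb(8, 5)
```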
Details • Concepts generate no negative discrepancy - higher computation error - higher semantic value (for us) • We can get negative discrepancy when computing F (as discussed before) • Performed tests gave promising results
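To illustrate the first point, here is a sketch assuming that negative discrepancy means a 1 in the reconstruction where X has a 0. If a factor is switched on for a row only when the whole factor is contained in that row (which is how a concept's extent relates to its intent), the reconstruction can leave ones uncovered but can never add extra ones. The function name and the data are illustrative.

```python
def containment_scores(x, A):
    """Switch factor k on only if its whole row A[k] is contained in x."""
    return [int((a & x) == a) for a in A]


A = [0b0011, 0b0110]
x = 0b1011

f = containment_scores(x, A)

# Boolean sum of the factors that were switched on.
recon = 0
for on, a in zip(f, A):
    if on:
        recon |= a

overcovered = recon & ~x   # ones added where x has zeros
uncovered = x & ~recon     # ones of x that no chosen factor explains
```

A search that minimizes total discrepancy is free to switch on a factor that overcovers a few zeros, which is why negative discrepancy can still appear when F is computed that way.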
Final Comparison • The presented algorithms are very similar • They give the same results in our tests • Computation times are very different - times for set “p2” (mentioned before): blind search: approx. 3×10^9 years; concepts: 7 seconds
Possible Future Work • The current implementation uses an independent application to compute concept lattices • Integration into a single application may speed up the computation • We don’t need to compute the whole concept lattice, nor even to know all concepts • A better algorithm is needed for binary matrix pseudo-division F = X/A