Explore the innovative realm of data-driven algorithms developed by Bernard Chazelle at Princeton University. This research delves into linear programming, dimension reduction, and optimization techniques vital for processing vast datasets in applications such as face and voice recognition. It highlights concepts like the Johnson-Lindenstrauss Transform, Fast Johnson-Lindenstrauss Transform, and property testing within graph theory. This comprehensive study bridges theoretical underpinnings and practical applications, offering insights into algorithmic efficiency, randomization, and approximation in tackling complex problems.
Data-Powered Algorithms
Bernard Chazelle, Princeton University
Dimension Reduction (figure labels: 25 and 10000)
Data: images (face recognition), signals (voice recognition), text (NLP), ...
Uses: nearest neighbor searching, clustering, ...
Dimension reduction: all pairwise distances nearly preserved
Johnson-Lindenstrauss Transform (JLT): multiply a vector $v \in \mathbb{R}^d$ by a random orthogonal matrix with $c\,\varepsilon^{-2}\log n$ rows and $d$ columns.
Friendly JLT: a $(c\,\varepsilon^{-2}\log n) \times d$ matrix whose entries are all N(0,1).
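A minimal sketch of a Gaussian projection of this kind, using only the Python standard library. The function names, the `seed` parameter, and the $1/\sqrt{k}$ scaling convention are assumptions made for illustration; the slide does not specify an implementation, and the target dimension `k` is passed in directly rather than derived from $c\,\varepsilon^{-2}\log n$.

```python
import math
import random

def gaussian_jlt(points, k, seed=0):
    """Project each point (a list of d floats) down to k dimensions.

    Uses one k x d matrix of independent N(0,1) entries, scaled by
    1/sqrt(k) so that squared lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(row[j] * p[j] for j in range(d)) for row in matrix]
            for p in points]

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Comparing `distance(p, q)` before the projection with the distance between the corresponding projected points gives a quick empirical look at the distortion for a chosen `k`.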
Friendlier JLT: a $(c\,\varepsilon^{-2}\log n) \times d$ matrix whose entries are all $\pm 1$. The matrix has $d \cdot c\,\varepsilon^{-2}\log n$ entries, that is, on the order of $d\,\varepsilon^{-2}\log n$.
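With $\pm 1$ entries the projection needs no multiplications inside the inner loop, only additions and subtractions. The sketch below illustrates this under assumptions: the name `sign_jlt`, the fair-coin choice of signs, and the $1/\sqrt{k}$ scaling are illustrative and not taken from the slides.

```python
import math
import random

def sign_jlt(point, k, seed=0):
    """Project one point (a list of d floats) to k dimensions with a
    random +1/-1 matrix; the same seed and d give the same matrix."""
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        acc = 0.0
        for x in point:
            # Entry is +1 or -1 with equal probability: add or subtract.
            if rng.random() < 0.5:
                acc += x
            else:
                acc -= x
        out.append(acc / math.sqrt(k))
    return out
```

The double loop still touches all $k \cdot d$ matrix entries per point, which is the dense cost the later slides set out to reduce.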
Sparse JLT? A $(c\,\varepsilon^{-2}\log n) \times d$ matrix that is mostly zeros, with only an o(1)-fraction of non-zero ($\pm 1$) entries.
Main Tool: Uncertainty Principle (Heisenberg), time vs. frequency
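Fast Johnson-Lindenstrauss Transform (FJLT): a product of three matrices applied to a vector in $\mathbb{R}^d$: a sparse $(c\,\varepsilon^{-2}\log n) \times d$ matrix whose entries are 0 or N(0,1), a $d \times d$ Discrete Fourier Transform, and a $d \times d$ diagonal matrix of $\pm 1$ entries.
Running time: $O(\varepsilon^{-2}\log^3 n + d \log d + d)$. Optimal??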
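The three-factor structure (random signs, then a fast transform, then a sparse Gaussian projection) can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: it swaps in a normalized Walsh-Hadamard transform as a real-valued stand-in for the Discrete Fourier Transform, it requires $d$ to be a power of two, and the sparsity parameter `q` (probability that an entry is non-zero) is a hypothetical knob whose value the slide does not give.

```python
import math
import random

def fwht(vec):
    """Normalized fast Walsh-Hadamard transform in O(d log d) time."""
    a = list(vec)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in a]

def fjlt_sketch(point, k, q, seed=0):
    """Apply (sparse Gaussian) * (Hadamard) * (random signs) to one point."""
    rng = random.Random(seed)
    # Diagonal matrix of random +1/-1 signs: O(d) work.
    signs = [1.0 if rng.random() < 0.5 else -1.0 for _ in point]
    # Fast transform spreads the mass of the vector over all coordinates.
    y = fwht([s * x for s, x in zip(signs, point)])
    # Sparse k x d matrix: each entry is N(0, 1/q) with probability q, else 0.
    # For clarity this loop visits every position; a real implementation
    # would store and touch only the non-zero entries.
    sigma = 1.0 / math.sqrt(q)
    out = []
    for _ in range(k):
        acc = 0.0
        for yj in y:
            if rng.random() < q:
                acc += rng.gauss(0.0, sigma) * yj
        out.append(acc / math.sqrt(k))
    return out
```

The sign flip and the transform together preserve length exactly; only the final sparse step introduces distortion, which is why flattening the vector first matters for a sparse matrix.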
Data-Powered Algorithms
theory, experimentation, and (1950...) computation
Input → output: most interesting problems are too hard!!
So, we change the model: randomization and approximation.
PTAS for ETSP.
Impossible to approximate chromatic number within a factor of…
Berkeley “school” (program checking & probabilistic proofs).
Property Testing [RS’96, GGR’96].
Edit distance: distance is 4
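Exact decision: yes if the graph is bipartite, no otherwise.
Property testing [GR’97]: yes if the graph is bipartite, no if it is far from bipartite; anything is allowed in between.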
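To make "far from bipartite" concrete, the sketch below computes the distance of a tiny graph from bipartiteness by brute force, assuming that distance is measured as the minimum number of edge deletions. That choice of distance, the function name, and the vertex labelling `0..n-1` are assumptions for illustration; the search is exponential in `n` and is only meant for very small examples.

```python
from itertools import product

def distance_to_bipartite(n, edges):
    """Minimum number of edge deletions that make the graph bipartite.

    Tries every 2-coloring of vertices 0..n-1 and counts the edges whose
    endpoints share a color; those are the edges that must be deleted.
    """
    best = len(edges)
    for coloring in product((0, 1), repeat=n):
        violated = sum(1 for u, v in edges if coloring[u] == coloring[v])
        best = min(best, violated)
    return best

triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(distance_to_bipartite(3, triangle))    # 1
print(distance_to_bipartite(4, four_cycle))  # 0
```

A tester in this setting must accept graphs at distance 0 and reject graphs whose distance exceeds a chosen threshold, and may answer either way for graphs in between.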
Mixing case (figure: bipartite! vs. non-bipartite!): polylog cycles, birthday paradox
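One simplified way to look for a witness of non-bipartiteness with random walks is sketched below. It is an illustration in the spirit of random-walk testers, not a reproduction of the algorithm in [GR’97]: the function name, the adjacency-list input, and the parameters `num_walks` and `walk_len` are assumptions. If some vertex is reached from the start vertex at both an even and an odd step, the two walks close an odd-length walk, so the graph contains an odd cycle and cannot be bipartite.

```python
import random

def random_walk_bipartite_test(adj, start, num_walks, walk_len, seed=0):
    """Return False if the walks certify an odd cycle, True otherwise.

    adj maps each vertex to a list of neighbors. A True answer only means
    no witness was found; a False answer is always correct.
    """
    rng = random.Random(seed)
    parities = {start: {0}}  # step parities at which each vertex was seen
    for _ in range(num_walks):
        v = start
        for step in range(1, walk_len + 1):
            if not adj[v]:
                break
            v = rng.choice(adj[v])
            seen = parities.setdefault(v, set())
            seen.add(step % 2)
            if len(seen) == 2:
                return False  # reached at even and odd steps: odd cycle
    return True
```

On a bipartite graph this never rejects, because every vertex is reachable from the start only at steps of one fixed parity. How many walks of what length are needed on a non-bipartite graph is exactly where the mixing and birthday-paradox arguments come in.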
Non-mixing case: nonmixing implies small cuts [M’89]
Dense graphs: is the graph k-colorable? [GGR98, AK99] (Hofstadter, Gödel, Escher, Bach)
Main tool: Szemerédi’s Regularity Lemma. Far from k-colorable implies lots of witnesses.
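The "lots of witnesses" idea suggests a sample-and-check test: pick a small random set of vertices and check whether the subgraph they induce is k-colorable. The sketch below illustrates that idea under assumptions; the function names, the dictionary-of-sets graph format, and the `sample_size` parameter are hypothetical, and the sample size needed for any guarantee is not stated on the slide.

```python
import random

def is_k_colorable(vertices, adj, k):
    """Exact backtracking check on the subgraph induced by `vertices`."""
    vertices = list(vertices)
    color = {}

    def assign(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            # Only already-colored (hence sampled) neighbors can conflict.
            if all(color.get(u) != c for u in adj[v]):
                color[v] = c
                if assign(i + 1):
                    return True
                del color[v]
        return False

    return assign(0)

def sample_colorability_test(adj, k, sample_size, seed=0):
    """Accept iff a random induced subgraph on sample_size vertices
    is k-colorable. adj maps each vertex to a set of neighbors."""
    rng = random.Random(seed)
    nodes = list(adj)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    return is_k_colorable(sample, adj, k)
```

A k-colorable graph is always accepted, since every induced subgraph of it is k-colorable; a rejection exhibits a small non-k-colorable subgraph, which is a witness in the sense of the slide.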
Property Testing
http://www.cs.princeton.edu/~chazelle/
• Graph algorithms: connectivity, acyclicity, k-way cuts, clique
• Distributions: independence, entropy, monotonicity, distances
• Geometry: convexity, disjointness, Delaunay, plane EMST