Effect of Linearization on Normalized Compression Distance
Effect of Linearization on Normalized Compression Distance Jonathan Mortensen Julia Wu DePaul University July 2009
Introduction • Kolmogorov Complexity is an emerging similarity metric • Transformation Distance • Universal Similarity Measure • Does not require feature identification and selection • How can it be applied to images? • CBIR, Classification • Investigate its effectiveness • Discovered some fundamentals have been overlooked thus far
Outline • Background • Kolmogorov Complexity and Complearn • Research Topics • Spatial Transformations • Intensity Transformations • Image Groupings • Conclusion • Future Work
Background • Li (2004): successful clustering of phylogeny trees, music, text files • 1D to 2D data? • Tran (2007): NCD not a good predictor of visual indistinguishability • Only one photograph used, one type of linearization (row-by-row) • Gondra (2008): CBIR using NCD produced statistically significant measures against H0 of random retrieval and other similarity measures • Test set of hundreds of images, inconsistent methods of compression and concatenation, linearization unclear
Kolmogorov Complexity • K(x): the length of the shortest program (or string) x* that produces x • K(x|y): the length of the shortest binary program that produces output x given input y • E(x,y) = max{K(x|y), K(y|x)} • Normalized Information Distance: NID(x,y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}
Kolmogorov Complexity • Universal, in that it captures all other semi-computable normalized distance measures • Therefore also semi-computable • Lossless compression shortens strings, so the compressed length C(x) is used as a computable approximation of K(x) “The human brain is incapable of creating anything which is really complex.” --Kolmogorov, A.N., Statistical Science, 6, p314, 1990
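Replacing K with a real compressor's output length C gives the Normalized Compression Distance of Li et al. (2004): NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. The following is a minimal sketch of that formula, not the CompLearn implementation; the function name `ncd` and the choice of zlib as the default compressor are assumptions made for illustration.

```python
import bz2
import zlib


def ncd(x: bytes, y: bytes, compress=zlib.compress) -> float:
    """Normalized Compression Distance between two byte strings.

    `compress` is any lossless compressor mapping bytes -> bytes;
    zlib is only an illustrative default (bz2.compress also works).
    """
    cx = len(compress(x))        # C(x)
    cy = len(compress(y))        # C(y)
    cxy = len(compress(x + y))   # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 50
    b = bytes(range(256)) * 10
    print("NCD(a, a) =", ncd(a, a))
    print("NCD(a, b) =", ncd(a, b))
    print("NCD(a, b) with bz2 =", ncd(a, b, bz2.compress))
```

Values near 0 indicate that one string adds little new information to the other; values near 1 indicate that the compressor finds almost nothing shared. The result depends on the compressor, so distances from different compressors are not directly comparable.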
CompLearn • Open-source package that approximates Kolmogorov-complexity-based distance (NCD) using compression • Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer • Uses basic Linux compression tools to build the comparison map
Initial Questions • Linearization Methods and Alternatives • How to Preserve a 2D signal • Linearization’s effect on NCD under spatial transformations and intensity shifts • Do additional feature images lower NCD? • CBIR: Can K-Complexity be used with feature vectors or image semantics?
Spatial Transformations • Applied 4 types of linearization to 800 images (original and 7 transformations) • Found that each linearization type produced distinctly different NCDs • Certain linearizations result in lower NCDs for certain transformations
Linearization Methods • Row Major • Column Major • Hilbert-Peano SPC: Images transformed to 128x128 • SCPO: Images transformed to 35% of original size
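To make the first three orderings concrete, here is a small sketch that turns a 2D image (a list of rows) into a 1D sequence. The function names are hypothetical, and the curve shown is the standard Hilbert curve on a square grid whose side is a power of two; the slides do not specify which Hilbert-Peano variant was used, and SCPO is not sketched here.

```python
def row_major(img):
    """Read pixels left to right, starting with the top row."""
    return [p for row in img for p in row]


def column_major(img):
    """Read pixels top to bottom, starting with the left column."""
    return [img[y][x] for x in range(len(img[0])) for y in range(len(img))]


def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. Standard iterative construction.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate/flip the current quadrant
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert(img):
    """Read pixels along a Hilbert curve (square, power-of-two side only)."""
    n = len(img)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in img):
        raise ValueError("image must be square with a power-of-two side")
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
```

The power-of-two restriction of this construction is one plausible reason for resizing images to 128x128 before applying a space-filling curve, though the slides do not state the reason.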
Spatial Transformations • Original Image • Down Shift • Left Shift • 180 rotation • 90 rotation • 270 rotation • Reflection Y Axis • Reflection X Axis
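The seven transformations can be sketched on a list-of-rows image as below. This is an illustration only: the function names are hypothetical, rotations are taken as clockwise, and the shifts are assumed to be circular (pixels pushed off one edge reappear on the opposite edge), none of which the slides specify.

```python
def rotate_90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate_180(img):
    return [row[::-1] for row in img[::-1]]


def rotate_270(img):
    """Rotate 270 degrees clockwise (90 degrees counterclockwise)."""
    return [list(row) for row in zip(*img)][::-1]


def reflect_y_axis(img):
    """Mirror left-right (reflection about the vertical axis)."""
    return [row[::-1] for row in img]


def reflect_x_axis(img):
    """Mirror top-bottom (reflection about the horizontal axis)."""
    return [list(row) for row in img[::-1]]


def shift_down(img, k):
    """Circularly shift rows down by k (assumed wrap-around)."""
    k %= len(img)
    return [list(row) for row in img[len(img) - k:] + img[:len(img) - k]]


def shift_left(img, k):
    """Circularly shift columns left by k (assumed wrap-around)."""
    k %= len(img[0])
    return [row[k:] + row[:k] for row in img]
```

Each transformed image would then be linearized and compared with the original using NCD.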
Intensity Transformations • Additive Constant • Three types of noise • Gaussian • Speckle • Salt and Pepper • Least Significant Bit (LSB) Steganography • Contrast Windowing
Additive Constant (Image 937.jpg shown with +32 and +64, respectively) • P = Intensity + Constant • Constants: +4, +8, +12… +100 • Three ways of handling values above 255: • 16 bit: 255 (+4) -> 259 • Truncation: 255 (+4) -> 255 • Wrap: 255 (+4) -> 4
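The three overflow treatments can be sketched as follows. The function name and mode strings are hypothetical. The wrap mode here uses ordinary 8-bit modular arithmetic (mod 256), which is an assumption: under it 255 + 4 wraps to 3, whereas the slide reports 4, so the authors' wrap convention may differ.

```python
def add_constant(pixels, constant, mode):
    """Add a constant to 8-bit intensities with a chosen overflow policy.

    mode "16bit":    keep the full sum (stored in a wider type)
    mode "truncate": saturate at 255
    mode "wrap":     wrap around modulo 256 (assumed convention)
    """
    if mode == "16bit":
        return [p + constant for p in pixels]
    if mode == "truncate":
        return [min(p + constant, 255) for p in pixels]
    if mode == "wrap":
        return [(p + constant) % 256 for p in pixels]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("16bit", "truncate", "wrap"):
        print(mode, add_constant([0, 128, 255], 4, mode))
```

The policy matters for NCD because truncation collapses many bright values onto 255 (making the string more compressible), while wrapping introduces abrupt jumps from bright to dark.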
Various Noise • Gaussian (Statistical) • Speckle (Multiplicative) • Salt and Pepper (Drop-off) [Figure: images with 0.32 and 0.64 variance/noise density, respectively]
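A rough sketch of the three noise models on a flat list of 8-bit intensities is given below. The function names are hypothetical, and the parameter conventions are assumptions: variance is taken relative to intensities normalized to [0, 1], and salt and pepper noise replaces a fraction `density` of pixels with 0 or 255. The slides do not say which tool or conventions generated the noise.

```python
import random


def _clip(value):
    """Round and clamp to the 8-bit range."""
    return max(0, min(255, round(value)))


def gaussian_noise(pixels, variance, rng):
    """Additive zero-mean Gaussian noise: p + n."""
    sigma = (variance ** 0.5) * 255   # variance assumed on a [0, 1] scale
    return [_clip(p + rng.gauss(0, sigma)) for p in pixels]


def speckle_noise(pixels, variance, rng):
    """Multiplicative noise: p + p * n, with n zero-mean Gaussian."""
    sigma = variance ** 0.5
    return [_clip(p + p * rng.gauss(0, sigma)) for p in pixels]


def salt_and_pepper(pixels, density, rng):
    """Replace roughly `density` of the pixels with pure black or white."""
    return [rng.choice((0, 255)) if rng.random() < density else p
            for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed for repeatability
    row = [100] * 10
    print(gaussian_noise(row, 0.01, rng))
    print(speckle_noise(row, 0.01, rng))
    print(salt_and_pepper(row, 0.3, rng))
```

Gaussian and speckle noise perturb every pixel, while salt and pepper noise alters only a selected fraction of them.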
Noise (cont.) • Gaussian and Speckle Noise don’t compress well • Gaussian and Salt and Pepper experience some posterior decay
Least Significant Bit Steganography • Hide4PGP • “Scrambles” message • Changes pixel bit to most similar color with opposite bit assignment • Spreads secret data over entire file • True Grayscale: Changes two bits per pixel [Figure: image with no text; image hiding “Gettysburg Address”]
Contrast Windowing • Computed Tomography image enhancement that increases contrast in certain structures • Brief Medical Exploration
Contrast Windowing (Patient 5; original image top left) • Lung Window (-200 HU, width 2000 HU) • Bone Window (300 HU, width 1500 HU) • Soft Tissue Window (50 HU, width 350 HU)
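A window is commonly described by a level (center) and a width in Hounsfield units (HU): values below center - width/2 map to black, values above center + width/2 map to white, and values in between are scaled linearly. The sketch below assumes the first number in each window above is the center and that the output is 8-bit; the function name is hypothetical.

```python
def apply_window(hu_values, center, width):
    """Map Hounsfield units to 8-bit gray levels using a linear window."""
    if width <= 0:
        raise ValueError("width must be positive")
    lo = center - width / 2
    hi = center + width / 2
    out = []
    for hu in hu_values:
        clamped = min(max(hu, lo), hi)            # saturate outside the window
        out.append(round((clamped - lo) / (hi - lo) * 255))
    return out


if __name__ == "__main__":
    sample = [-1000, -200, 0, 50, 300, 1000]      # air ... dense bone (illustrative)
    print("lung       ", apply_window(sample, -200, 2000))
    print("bone       ", apply_window(sample, 300, 1500))
    print("soft tissue", apply_window(sample, 50, 350))
```

A narrow window such as the soft tissue one saturates most of the HU range to pure black or white, which changes how much of the pixel string is redundant.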
[Figure: panels labeled P1, P3, and P5]
Conclusion: "How Many" vs "How Little" • NCD for Ordinal Comparisons • Numerical Redundancy [Figure: Gaussian, Speckle Noise, Salt and Pepper Noise, Steganography, Additive Constants, and Contrast Windowing arranged along two axes, "Selective" to "Entire Picture" and "Larger NCD" to "Smaller NCD"]
Feature Image Comparison and Grouping • Feature Image: Pixel-based values derived from the original image • 3 Main Types of Linearization • Expect Avg NCD inter (between groups) > Avg NCD intra (within a group) • The greater the difference inter - intra, the better NCD finds groupings
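The inter versus intra comparison can be sketched as follows. The function name is hypothetical, the distance function is passed in (NCD in this study), and the sketch averages over all pairs of items, which is an assumption: the slides describe comparing each group only with its neighbors.

```python
from itertools import combinations


def intra_inter_means(groups, dist):
    """Return (mean intra-group distance, mean inter-group distance).

    groups: dict mapping a group label to a list of items
    dist:   function taking two items and returning a number
    """
    labeled = [(label, item)
               for label, items in groups.items()
               for item in items]
    intra, inter = [], []
    for (label_a, a), (label_b, b) in combinations(labeled, 2):
        (intra if label_a == label_b else inter).append(dist(a, b))
    return sum(intra) / len(intra), sum(inter) / len(inter)


if __name__ == "__main__":
    toy = {"a": [0, 1], "b": [10, 11]}            # toy numeric items
    intra, inter = intra_inter_means(toy, lambda p, q: abs(p - q))
    print("intra:", intra, "inter:", inter, "gap:", inter - intra)
```

A larger gap (inter minus intra) means the distance separates the predefined groups more clearly.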
Feature Image Linearization • Image-At-Once – row-order one feature image at a time • Row Concatenation – Appends all images, then performs row-order linearization • Pixel Order – Selects value from same pixel of each feature image in row-order fashion • Gray Row-Major – Grayscales an image and follows row-order on intensities
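The first three orderings can be illustrated on equally sized feature images stored as lists of rows. This is a sketch under stated assumptions: the function names are hypothetical, and "appends all images" in Row Concatenation is read as placing the images side by side, so that row r of every image is emitted before row r + 1. The slides do not confirm that reading.

```python
def image_at_once(features):
    """Row-order each feature image fully before moving to the next."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features):
    """Place images side by side, then read the wide image row by row."""
    height = len(features[0])
    return [p for r in range(height) for img in features for p in img[r]]


def pixel_order(features):
    """For each pixel position (row order), emit its value in every image."""
    height, width = len(features[0]), len(features[0][0])
    return [img[r][c]
            for r in range(height)
            for c in range(width)
            for img in features]


if __name__ == "__main__":
    f1 = [[1, 2], [3, 4]]
    f2 = [[5, 6], [7, 8]]
    print(image_at_once([f1, f2]))
    print(row_concatenation([f1, f2]))
    print(pixel_order([f1, f2]))
```

All three produce the same multiset of values; only the order, and therefore what the compressor sees as nearby context, changes.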
Data Set and Methods • Corel Image Database with 10 predefined groupings • Linearized by 5 methods • NCDs were computed within each group (intra) and then against the groups to its left and to its right (inter)
Results • Nearly every linearization produced statistically different NCDs • Intra-group NCD was always less than inter-group NCD • Gray provided the greatest inter - intra difference • Initially attributed this to file size • Triple-concatenated gray, which equalizes file size, produced an even greater difference
Conclusion • NCD is a good model for predefined human groupings and linearization has little impact on this • Gray-Triple Row-Major may be the best form of linearization • Direction of concatenation does not matter • Defined a methodology for any number of feature images
Conclusion • Compressor Errors • Numerical Redundancy • Ordinal Variables vs Nominal Variables • Example: 195 195 195 195 <=> 198 198 198 198 gives NCD = 0.100000, while 199 199 199 199 <=> 202 202 202 202 gives NCD = 0.128205, although both pairs differ by the same amount (3) • NCD needs refinement • 2D image as a 1D string?
Future Work • Image Scaling and Normalization • Additional Feature Images • New Forms of Image concatenation • Investigate Compressors (Numeric?)
References • A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002. • M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004. • R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000. • R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/. • R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003. • N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007. • I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.