This talk studies "controlled rounding" of matrices. The motivating example is a statistical table from a rural village that counts animals by animal type and by age of owner. It recalls the classical result that every matrix has a controlled rounding, i.e., a rounding in which the errors in all row totals, all column totals, and the grand total are less than one. It then extends this in two directions. Unbiased (randomized) rounding keeps every entry correct in expectation. Strongly controlled rounding also keeps the errors in initial intervals of rows and columns below one. The talk shows that roundings with both properties exist and gives algorithms that generate them.
Unbiased Matrix Rounding
Tobias Friedrich (joint work with B. Doerr, C. Klein, R. Osbild)
Max-Planck-Institut für Informatik, Saarbrücken, Germany
Statistics of a rural village
[Table: #Animals by age of owner and animal]
Statistics of a rural village: Rounding to multiples of ten
• Totals not preserved!
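The village table itself is not recoverable here, so this minimal illustration uses made-up counts. Each entry and the row total are rounded to the nearest multiple of ten independently. After rounding, the entries no longer add up to the rounded total.

```python
def round10(v):
    # Round half up to the nearest multiple of ten (non-negative integer counts).
    return (v + 5) // 10 * 10

# Hypothetical counts for one row of the table (not the talk's data).
row = [14, 14, 14]
rounded_entries = [round10(v) for v in row]
print(rounded_entries, sum(rounded_entries), round10(sum(row)))  # prints [10, 10, 10] 30 40
```

The rounded entries sum to 30, while the true total 42 rounds to 40.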
Statistics of a rural village: Controlled Rounding
Basic Problem: “Controlled Rounding”
• Round a [0,1] matrix to a {0,1} matrix s.t.
  • rounding errors in row totals are less than one
  • rounding errors in column totals are less than one
  • rounding error in the grand total is less than one
• Classical result: All matrices have controlled roundings
  • Bacharach ’66, Cox & Ernst ’82: Statistics
  • Baranyai ’75: Hypergraph coloring
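The three conditions can be checked directly. The sketch below is a hypothetical helper; `is_controlled_rounding` is an illustrative name, not taken from the talk. It assumes the matrices are given as lists of rows. It uses exact `Fraction` values so that floating-point error cannot disturb the strict "< 1" comparisons.

```python
from fractions import Fraction

def is_controlled_rounding(X, Y):
    """Check that Y is a 0/1 matrix whose row totals, column totals and grand
    total each differ from those of X by strictly less than one."""
    m, n = len(X), len(X[0])
    if any(v not in (0, 1) for row in Y for v in row):
        return False
    row_ok = all(abs(sum(X[i]) - sum(Y[i])) < 1 for i in range(m))
    col_ok = all(
        abs(sum(X[i][j] for i in range(m)) - sum(Y[i][j] for i in range(m))) < 1
        for j in range(n)
    )
    grand_ok = abs(sum(map(sum, X)) - sum(map(sum, Y))) < 1
    return row_ok and col_ok and grand_ok

# Exact fractions avoid floating-point noise at the "< 1" boundary.
X = [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 2)]]
print(is_controlled_rounding(X, [[1, 0], [0, 1]]))  # True
print(is_controlled_rounding(X, [[1, 1], [0, 0]]))  # False: row totals off by exactly 1
```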
Extension 1: Unbiased Controlled Rounding
• “Unbiased” = Randomized:
  • Pr(y_ij = 1) = x_ij,
  • Pr(y_ij = 0) = 1 − x_ij.
• Result: Unbiased controlled roundings exist
  • Cox ’87
  • Follows also from GKPS (FOCS ’02)
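Unbiasedness also controls the totals on average. By linearity of expectation, each row total of the rounding equals the original row total in expectation. Columns and the grand total work the same way.

```latex
\mathbb{E}\Big[\sum_{j} y_{ij}\Big]
  = \sum_{j} \mathbb{E}[y_{ij}]
  = \sum_{j} \big(1 \cdot \Pr(y_{ij} = 1) + 0 \cdot \Pr(y_{ij} = 0)\big)
  = \sum_{j} x_{ij}
```

An unbiased controlled rounding therefore has totals that are exact in expectation and, in every outcome, off by less than one.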
Extension 2: Strongly Controlled Rounding
• Small errors in initial intervals of rows/columns:
$$\forall i\ \forall b:\ \Big|\sum_{j=1}^{b} (x_{ij} - y_{ij})\Big| < 1, \qquad \forall j\ \forall b:\ \Big|\sum_{i=1}^{b} (x_{ij} - y_{ij})\Big| < 1$$
• Observation: Errors less than two in arbitrary intervals:
$$\forall i\ \forall a \le b:\ \Big|\sum_{j=a}^{b} (x_{ij} - y_{ij})\Big| < 2, \qquad \forall j\ \forall a \le b:\ \Big|\sum_{i=a}^{b} (x_{ij} - y_{ij})\Big| < 2$$
  (the error over a..b is the initial-interval error up to b minus the one up to a − 1, and both lie in (−1, 1))
• Allows reliable range queries.
  • # of pigs owned by 20-59 year olds
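This is a sketch of how the initial-interval conditions and the factor-two observation can be checked. It assumes lists of rows with exact `Fraction` entries, and the helper names are hypothetical. The error of an arbitrary interval is computed as the difference of two initial-interval errors. That difference bounds the error by 2, which is what makes range queries such as the pig count reliable.

```python
from fractions import Fraction
from itertools import accumulate

def prefix_errors(xs, ys):
    """Errors of the initial intervals: sum_{k=1..b} (x_k - y_k) for b = 1..len(xs)."""
    return list(accumulate(x - y for x, y in zip(xs, ys)))

def is_strongly_controlled(X, Y):
    """Check |error| < 1 for every initial interval of every row and column."""
    m, n = len(X), len(X[0])
    lines = [(X[i], Y[i]) for i in range(m)]
    lines += [([X[i][j] for i in range(m)], [Y[i][j] for i in range(m)]) for j in range(n)]
    return all(abs(e) < 1 for xs, ys in lines for e in prefix_errors(xs, ys))

def interval_error(xs, ys, a, b):
    """Error on positions a..b (1-based, inclusive) as the difference of two
    initial-interval errors; under strong control it lies strictly in (-2, 2)."""
    pre = [0] + prefix_errors(xs, ys)
    return pre[b] - pre[a - 1]

# Hypothetical single row (e.g. one animal type over four age groups).
half = Fraction(1, 2)
X = [[half, half, half, half]]
print(is_strongly_controlled(X, [[1, 0, 1, 0]]))  # True
print(is_strongly_controlled(X, [[1, 1, 0, 0]]))  # False: error -1 after two entries
print(interval_error(X[0], [1, 0, 1, 0], 1, 3))   # -1/2
```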
Our Result
• Unbiased strongly controlled roundings exist, i.e., one can round a real matrix to an integer matrix s.t.
  • rounding errors in row/column/grand totals are less than one
  • rounding errors in initial row/column intervals are less than one
  • rounding is unbiased/randomized
• It can be generated in time
  • O((mn)^2)
  • O(mn l), if numbers have binary length at most l
  • O(mn b^2), if numbers are multiples of 1/b
Alternating Cycle Trick
• Simplifying assumptions:
  • Row/column sums integral
Alternating Cycle Trick
[Example: matrix X with fractional entries; the entries of an alternating cycle are marked + and −]
• Choose an alternating cycle (of non-zeroes)
• Compute possible modifications: εmin = −0.1, εmax = 0.3
• (a) Non-randomized: Modify with any ε in [εmin, εmax] (here: ε = εmax)
  (b) Unbiased: Suitable random choice
• At least one entry becomes 0 or 1
• Time complexity: One iteration O(mn), total O((mn)^2).
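The alternating cycle trick can be sketched in Python under the slide's simplifying assumption: entries in [0, 1] and integral row and column sums. Several details are assumptions of this sketch rather than taken from the talk:

- The cycle is found by walking along fractional entries without immediately reusing the entry just traversed. This always closes a cycle, because a row or column with an integral sum can never contain exactly one fractional entry.
- The "suitable random choice" is taken to be ε = εmax with probability −εmin/(εmax − εmin) and ε = εmin otherwise, which makes E[ε] = 0.
- All names are illustrative.

The sketch preserves every row and column sum, so it yields a controlled rounding in this special case. It does not attempt the cycle choice needed for the initial-interval guarantee, and its cycle search is not tuned to the O(mn) per-iteration bound.

```python
import random
from fractions import Fraction

def find_cycle(X):
    """Cells of a cycle of fractional entries, consecutive cells alternately
    sharing a row and a column; None if every entry is integral."""
    m, n = len(X), len(X[0])

    def frac(i, j):
        return X[i][j].denominator != 1

    start = next((i for i in range(m) for j in range(n) if frac(i, j)), None)
    if start is None:
        return None
    # Vertices are ("r", i) and ("c", j); every fractional cell (i, j) is an edge.
    path, edges, seen = [("r", start)], [], {("r", start): 0}
    while True:
        kind, k = path[-1]
        if kind == "r":
            cells = [(k, j) for j in range(n) if frac(k, j)]
        else:
            cells = [(i, k) for i in range(m) if frac(i, k)]
        # Integral sums rule out a single fractional cell, so another one exists.
        prev = edges[-1] if edges else None
        cell = next(c for c in cells if c != prev)
        nxt = ("c", cell[1]) if kind == "r" else ("r", cell[0])
        if nxt in seen:  # closed a cycle: keep only the part after the repeat
            return edges[seen[nxt]:] + [cell]
        seen[nxt] = len(path)
        path.append(nxt)
        edges.append(cell)

def cycle_step(X, rng):
    cycle = find_cycle(X)
    if cycle is None:
        return False
    plus, minus = cycle[0::2], cycle[1::2]  # signs alternate along the cycle
    # Largest and smallest eps keeping every modified entry inside [0, 1].
    eps_max = min([1 - X[i][j] for i, j in plus] + [X[i][j] for i, j in minus])
    eps_min = -min([X[i][j] for i, j in plus] + [1 - X[i][j] for i, j in minus])
    # Unbiased choice: p * eps_max + (1 - p) * eps_min = 0.
    p = -eps_min / (eps_max - eps_min)
    eps = eps_max if rng.random() < p else eps_min
    for i, j in plus:
        X[i][j] += eps
    for i, j in minus:
        X[i][j] -= eps
    return True

def unbiased_round(X, seed=None):
    """X: rows of numbers (ints, decimal strings or Fractions) in [0, 1]
    with integral row and column sums; returns a 0/1 matrix."""
    rng = random.Random(seed)
    Y = [[Fraction(v) for v in row] for row in X]
    while cycle_step(Y, rng):  # each step makes at least one entry 0 or 1
        pass
    return [[int(v) for v in row] for row in Y]
```

Replacing the random choice with `eps = eps_max` gives the non-randomized variant (a). Passing decimal strings such as `"0.25"` keeps the input exact, because `Fraction("0.25")` is exactly 1/4.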
Fast Alternating Cycle Trick
• Additional assumption:
  • All numbers have finite binary expansion
Fast Alternating Cycle Trick
[Example: matrix X with entries written as binary fractions; an alternating cycle through entries with last digit 1 is marked + and −]
• Choose an alternating cycle (with 1s in last digit)
• Allow only modifications ε1 = −0.001, ε2 = +0.001 (binary: one unit in the last digit)
• (a) Non-randomized: Modify with either value
  (b) Unbiased: Pick each value with 50% chance
• Bit-length in whole cycle reduces
• Time complexity: Amortized O(mn) to reduce by 1 bit, total O(mn l) with l denoting bit length
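This sketch of the fast variant rests on stated assumptions. Each entry is stored as an integer numerator a with x = a / 2^l. An entry therefore has "1 in the last digit" exactly when a is odd, and the smallest step ±2^(-l) (0.001 in the slide's binary example) becomes ±1 on the numerator. Every row and column sum is integral, so each row and column contains an even number of odd numerators, and the odd entries split into cycles. Once all of them are cleared, every numerator is even and one bit can be dropped. The cycle search and the function names are illustrative, not from the talk.

```python
import random

def find_cycle(A, active):
    """Cells of a cycle through active cells, consecutive cells alternately
    sharing a row and a column; None if no cell is active."""
    m, n = len(A), len(A[0])
    start = next((i for i in range(m) for j in range(n) if active(A[i][j])), None)
    if start is None:
        return None
    path, edges, seen = [("r", start)], [], {("r", start): 0}
    while True:
        kind, k = path[-1]
        if kind == "r":
            cells = [(k, j) for j in range(n) if active(A[k][j])]
        else:
            cells = [(i, k) for i in range(m) if active(A[i][k])]
        prev = edges[-1] if edges else None
        cell = next(c for c in cells if c != prev)  # never turn straight back
        nxt = ("c", cell[1]) if kind == "r" else ("r", cell[0])
        if nxt in seen:
            return edges[seen[nxt]:] + [cell]
        seen[nxt] = len(path)
        path.append(nxt)
        edges.append(cell)

def fast_unbiased_round(A, l, seed=None):
    """A[i][j] / 2**l are the entries in [0, 1]; integral row/column sums assumed."""
    rng = random.Random(seed)
    A = [row[:] for row in A]
    while l > 0:
        # Odd numerators come in cycles; clear them one cycle at a time.
        while (cycle := find_cycle(A, lambda a: a % 2 == 1)) is not None:
            eps = rng.choice((-1, 1))  # +-2**-l, each with probability 1/2
            for t, (i, j) in enumerate(cycle):
                A[i][j] += eps if t % 2 == 0 else -eps
        A = [[a // 2 for a in row] for row in A]  # all even now: drop the last bit
        l -= 1
    return A
```

For example, `fast_unbiased_round([[1, 1, 2], [1, 1, 2], [2, 2, 0]], 2)` rounds a matrix of quarters to a 0/1 matrix whose row and column sums are all 1.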
Multiples of 1/b (here b = 5)
[Example: matrix X with entries that are multiples of 1/5; an alternating cycle of non-zero entries is marked + and −]
• Choose an alternating cycle (of non-zeros)
• Allow only modifications ε1 = −1/b, ε2 = +1/b
• (a) Non-randomized: Derandomization
  (b) Unbiased: Pick each value with 50% chance
• Entries perform random walk in {0, 1/b, 2/b, ..., 1} (here {0, 1/5, 2/5, 3/5, 4/5, 1})
• Time complexity: Amortized O(b^2) to round one entry, total O(mn b^2)
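Here is a corresponding sketch for entries that are multiples of 1/b, storing numerators a with x = a/b. It makes two assumptions:

- A cycle uses only entries strictly between 0 and 1, since an entry equal to 1 cannot move up.
- Each move shifts the cycle by ±1/b, with probability 1/2 for each sign.

Viewed on its own, an entry a/b on the cycle then performs a symmetric random walk on {0, 1/b, ..., 1} that stops at 0 or 1. By the standard gambler's-ruin computation it makes a(b − a) ≤ b²/4 moves in expectation, which is consistent with the O(b²) per-entry bound on the slide. Names are illustrative.

```python
import random

def find_cycle(A, active):
    """Cells of a cycle through active cells, consecutive cells alternately
    sharing a row and a column; None if no cell is active."""
    m, n = len(A), len(A[0])
    start = next((i for i in range(m) for j in range(n) if active(A[i][j])), None)
    if start is None:
        return None
    path, edges, seen = [("r", start)], [], {("r", start): 0}
    while True:
        kind, k = path[-1]
        if kind == "r":
            cells = [(k, j) for j in range(n) if active(A[k][j])]
        else:
            cells = [(i, k) for i in range(m) if active(A[i][k])]
        prev = edges[-1] if edges else None
        cell = next(c for c in cells if c != prev)  # never turn straight back
        nxt = ("c", cell[1]) if kind == "r" else ("r", cell[0])
        if nxt in seen:
            return edges[seen[nxt]:] + [cell]
        seen[nxt] = len(path)
        path.append(nxt)
        edges.append(cell)

def walk_round(A, b, seed=None):
    """A[i][j] / b are the entries in [0, 1]; integral row/column sums assumed.
    Returns the 0/1 rounding and the number of cycle moves made."""
    rng = random.Random(seed)
    A = [row[:] for row in A]
    moves = 0
    # Entries strictly between 0 and 1 always lie on cycles (integral sums).
    while (cycle := find_cycle(A, lambda a: 0 < a < b)) is not None:
        eps = rng.choice((-1, 1))  # +-1/b, each with probability 1/2
        for t, (i, j) in enumerate(cycle):
            A[i][j] += eps if t % 2 == 0 else -eps
        moves += 1
    return [[a // b for a in row] for row in A], moves
```

Unlike the cycle trick with εmin and εmax, a single move need not make any entry integral. Termination therefore holds with probability 1 rather than after a fixed number of steps.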
Summary
• Unbiased strongly controlled roundings:
  • “randomized roundings”
  • rounding errors in initial intervals of rows/columns: < 1
• Result: Can be generated in time
  • O((mn)^2)
  • O(mn l), if numbers have binary length at most l
  • O(mn b^2), if numbers are multiples of 1/b
Have a good weekend!