On the Competitiveness of Self Organizing Linear Search

J. Ian Munro (University of Waterloo)

• Competitiveness: How well does an online algorithm do in comparison to one that is given the sequence of operations in advance?
• Self Organizing Search: Based on the requests received, continually reorganize the data structure to make frequently accessed elements cheaper to reach.
• Model of Computation: What operations can we perform? What do we count? In our case, amortized cost.

Self Organizing Linear Search
• Elements are kept in a linear list (array or linked).
• On an access we can reorder the elements inspected; we count the number of records inspected.
Likely approaches (each shown for a request for I):
• Move to Front
  before: S E L F O R G A N I Z I N G L I N E A R ..
  after:  I S E L F O R G A N Z I N G L I N E A R ..
• Simple exchange
  before: S E L F O R G A N I Z I N G L I N E A R ..
  after:  S E L F O R G A I N Z I N G L I N E A R ..
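The two heuristics can be sketched in a few lines of Python. This is only an illustration: the function names `move_to_front` and `simple_exchange` are made up here, the list is truncated to the first 14 letters of the slide's example, and cost is taken to be the number of records inspected.

```python
def move_to_front(lst, x):
    """Access x, then move it to the front. Returns the number of records inspected."""
    i = lst.index(x)              # 0-based position of the first match
    lst.insert(0, lst.pop(i))     # the accessed record becomes the new head
    return i + 1

def simple_exchange(lst, x):
    """Access x, then swap it with its predecessor, if any. Returns the cost."""
    i = lst.index(x)
    if i > 0:
        lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return i + 1

a = list("SELFORGANIZING")
b = list("SELFORGANIZING")
cost_a = move_to_front(a, "I")
cost_b = simple_exchange(b, "I")
print(cost_a, "".join(a))
print(cost_b, "".join(b))
```

Both calls inspect 10 records; only the position of I afterwards differs, as in the before/after rows on the slide.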

Known Bounds
• Expected case (long sequence, elements have fixed independent probabilities of access):
  Cost(SE) ≤ Cost(MtF) ≤ 2 Opt = 2 Σ i p_i (McCabe; Rivest)
• Replace p_i by f_i/m; then on an amortized basis Cost(MtF) ≤ 2 Σ i f_i/m (Bentley and McGeoch)
• Move to Front is 2-competitive (Sleator and Tarjan), i.e. within a factor of 2 of the cost of the offline optimal, which can move elements about as if it knows what will be accessed. But the model is crucial.
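The expected-case inequality for Move to Front can be checked numerically. The sketch below uses the standard closed form for the asymptotic expected cost of Move to Front, 1 + 2 Σ_{i<j} p_i p_j / (p_i + p_j), which is not stated on the slides; the Zipf-like distribution and the function names are arbitrary choices made for illustration.

```python
def opt_static_cost(p):
    """Expected cost of the best fixed order: sum of i * p_i with p sorted decreasing."""
    q = sorted(p, reverse=True)
    return sum(i * pi for i, pi in enumerate(q, start=1))

def mtf_expected_cost(p):
    """Asymptotic expected cost of Move to Front: 1 + 2 * sum_{i<j} p_i p_j / (p_i + p_j)."""
    n = len(p)
    return 1 + 2 * sum(p[i] * p[j] / (p[i] + p[j])
                       for i in range(n) for j in range(i + 1, n))

n = 50
w = [1 / i for i in range(1, n + 1)]   # Zipf-like weights (an arbitrary choice)
p = [x / sum(w) for x in w]            # normalise to access probabilities
opt = opt_static_cost(p)
mtf = mtf_expected_cost(p)
print(opt, mtf, mtf / opt)
```

The printed ratio lies between 1 and 2 for any strictly positive distribution, since each pairwise term 2 p_i p_j / (p_i + p_j) is between the smaller of the two probabilities and twice that value.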

That Model
• Scan to the element requested, and no further (charge 1 per element inspected)
• Put the requested element anywhere among the scanned values (free)
• Swap as many pairs of consecutive elements as you like (1 per exchange)
so swapping the front and back halves of the list costs Θ(n²):
  before: 1 2 3 4 5 6 .. n/2 (n/2+1) .. n
  after:  (n/2+1) .. n 1 2 3 4 5 6 .. n/2

A Realistic Change in the Model
• Scan as far as you like. Charge 1 per element inspected.
• KEY POINT: Can rearrange the portion inspected arbitrarily at no charge (in fact we will perform only very simple rearrangements)
so swapping the front and back halves of the list costs Θ(n):
  before: 1 2 3 4 5 6 .. n/2 (n/2+1) .. n
  after:  (n/2+1) .. n 1 2 3 4 5 6 .. n/2
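The gap between the two models on the half-swap example can be made concrete. The sketch below counts the adjacent exchanges needed under the exchange-based model as the number of inversions between the two orderings, and charges one full scan under the revised model; the helper name `adjacent_swaps_needed` and the choice n = 16 are illustrative assumptions.

```python
def adjacent_swaps_needed(perm):
    """Minimum number of adjacent exchanges that sort perm, i.e. its number of inversions."""
    return sum(1 for i in range(len(perm))
                 for j in range(i + 1, len(perm)) if perm[i] > perm[j])

n = 16
# The list with its halves swapped: (n/2+1) .. n followed by 1 .. n/2.
swapped = list(range(n // 2 + 1, n + 1)) + list(range(1, n // 2 + 1))
exchange_cost = adjacent_swaps_needed(swapped)   # exchange model: (n/2)^2 paid swaps
scan_cost = n                                    # revised model: inspect all n, rearrange free
print(exchange_cost, scan_cost)
```

For n = 16 the exchange model pays 64 = (n/2)² swaps, while the revised model pays 16 inspections.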

Order by Next Request
• Scan for the element requested; if it is found at position i, continue to position 2^⌈lg i⌉
• Reorder the elements scanned by NEXT REQUEST

Cost  New Ordering
16    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2     2 1 ..
4     3 4 1 2 ..
2     4 3 ..
8     5 6 7 8 1 2 3 4 ..
2     6 5 ..
4     7 8 5 6 ..
2     8 7 ..
16    9 10 11 12 13 14 15 16 1 2 3 4 5 6 7 8
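The cost column of this example can be reproduced with a small offline simulation. It is a sketch under stated assumptions rather than the talk's own code: the function name is invented, the scan is extended to the next power of two at or beyond the position found, and the requested element is placed first among the scanned elements (a tie-break inferred from the orderings shown, where the element just requested heads the list). The starting list 9..16 1..8 and the cyclic request sequence 1, 2, .., 16 are likewise inferred from the example.

```python
def order_by_next_request(lst, requests):
    """Serve requests offline; return the cost of each request. lst is reordered in place."""
    costs = []
    for t, x in enumerate(requests):
        i = lst.index(x) + 1              # 1-based position of the requested element
        k = 1
        while k < i:                      # k becomes the smallest power of two >= i
            k *= 2
        k = min(k, len(lst))              # never scan past the end of the list

        def next_use(e):
            # Index of the next request for e, counting the current request itself
            # (assumed tie-break: the requested element is placed first).
            for s in range(t, len(requests)):
                if requests[s] == e:
                    return s
            return len(requests)          # never requested again: goes last

        lst[:k] = sorted(lst[:k], key=next_use)
        costs.append(k)                   # one unit per element inspected
    return costs

n = 16
start = list(range(9, 17)) + list(range(1, 9))   # 9..16 1..8
requests = list(range(1, n + 1)) * 3             # three cycles; the first is reported
costs = order_by_next_request(start, requests)
print(costs[:9])
print(sum(costs[:n]))
```

Under these assumptions the first nine costs are 16, 2, 4, 2, 8, 2, 4, 2, 16, and one full cycle of 16 requests costs 80 = n(lg n + 1) inspections.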

Cost of a “Cycle” for Order by Next Request
• Cost (under our model) is n lg n, amortized lg n
• Lower bound
Theorem: Under our model, n ln n inspections are required to individually scan for each of n elements.
Proof idea: “Crossing sequence argument”. Access to elements in position i or later must occur at least n/i times, for a total cost of at least Σ n/i ≥ n ln n
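The sum in the proof idea is n times the n-th harmonic number, which exceeds n ln n because 1/i is at least the integral of 1/x over [i, i+1]. A short numeric check, with an illustrative function name and arbitrary values of n:

```python
import math

def crossing_lower_bound(n):
    """Sum over positions i = 1..n of n / i, i.e. n times the n-th harmonic number."""
    return sum(n / i for i in range(1, n + 1))

for n in (16, 1024):
    # Columns: n, the crossing-sequence sum, n ln n, and n lg n for comparison.
    print(n, crossing_lower_bound(n), n * math.log(n), n * math.log2(n))
```

For n = 16 the sum is about 54.1, against n ln n ≈ 44.4 and n lg n = 64, so the lower bound and the cost of a cycle differ only by a constant factor.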

More General Bound on Amortized Cost of OBNR
Main Theorem: Let r denote the number of distinct elements requested since the last request for a given value; then the amortized cost of a search for that value can be charged as at most 1 + 4 lg r.
Idea of Proof: “Soak the Middle Class”. For any request, the entire cost is borne by the penultimate block, of size 2^(⌈lg i⌉ − 2), so the cost is 4 for each of its elements.
Lemma: An element can be in the penultimate block at most once between its accesses.
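To make the quantity r concrete, the sketch below computes it for every request in a sequence and evaluates the charge 1 + 4 lg r. Two conventions are assumptions made here, not taken from the slides: r counts the requested value itself (so r ≥ 1 and an immediate repeat is charged 1), and a first request, which has no predecessor, is reported as None.

```python
import math

def distinct_since_last(requests):
    """For each request, the number of distinct values requested since the previous
    request for the same value, counting the value itself. None for first requests."""
    last_seen = {}
    out = []
    for t, x in enumerate(requests):
        if x in last_seen:
            out.append(len(set(requests[last_seen[x] + 1 : t + 1])))
        else:
            out.append(None)
        last_seen[x] = t                  # remember the most recent request for x
    return out

def charge(r):
    """Amortized charge 1 + 4 lg r from the Main Theorem."""
    return 1 + 4 * math.log2(r)

reqs = list("abacabaa")
rs = distinct_since_last(reqs)
print(rs)
print([None if r is None else charge(r) for r in rs])
```

For the sequence abacabaa the values of r are None, None, 2, None, 2, 3, 2, 1, so a value that is re-requested soon after its last access is charged little.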

Tweaking the Constant
• Blocks of size 2^k are not optimal. Make them of size powers of 4.24429..
• The preceding bound then becomes 2.66241.. lg r
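The slides do not show the expression being minimized. One candidate that reproduces both quoted constants is c(b) = b² / ((b − 1) lg b) for blocks whose sizes are powers of b: it gives 4 at b = 2 and has its minimum near b = 4.24429. Treat this formula as an assumption made for illustration; the sketch below only checks it numerically with a simple ternary search.

```python
import math

def constant(b):
    """Assumed cost factor for block sizes that are powers of b: b^2 / ((b - 1) lg b)."""
    return b * b / ((b - 1) * math.log2(b))

def minimize(f, lo, hi, iters=200):
    """Ternary search for the minimum of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2                       # the minimum cannot lie in (m2, hi]
        else:
            lo = m1                       # the minimum cannot lie in [lo, m1)
    return (lo + hi) / 2

b_star = minimize(constant, 1.5, 16.0)
print(constant(2.0))
print(b_star, constant(b_star))
```

The search interval [1.5, 16] is an arbitrary bracket around the minimum; the assumed expression grows without bound as b approaches 1 and as b becomes large.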

Corollaries
• Bounds look much like Splay Tree bounds
• Request Gap View: Total cost of a sequence in which element i occurs r_i times is at most n + 2.66241.. Σ lg (r_i + 1), or
• Entropy View: Let f_i denote the frequency of accesses to i in a sequence of length m; then the total cost is at most n + 2.66241.. Σ f_i lg (m/f_i), or, writing f_i/m = p_i, the amortized cost is O(1 + Σ p_i lg 1/p_i)
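The entropy-style bound is easy to evaluate for a concrete request sequence. In the sketch below the function name, the two sample sequences, and the use of the truncated constant 2.66241 are illustrative assumptions; elements that are never requested contribute nothing to the sum.

```python
import math
from collections import Counter

C = 2.66241  # constant quoted on the slide, truncated there as "2.66241.."

def entropy_bound(requests, n):
    """n + C * sum_i f_i lg(m / f_i), summed over the elements actually requested."""
    m = len(requests)
    freq = Counter(requests)
    return n + C * sum(f * math.log2(m / f) for f in freq.values())

n = 16
uniform = list(range(1, n + 1)) * 3          # every element requested equally often
skewed = [1] * 40 + list(range(2, 10))       # one hot element plus a few cold ones
print(entropy_bound(uniform, n))
print(entropy_bound(skewed, n))
```

Both sequences have length 48, but the skewed one yields a much smaller bound, reflecting its lower entropy.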

Linear Search Versus Binary Trees .. Off Line
• Bounds in terms of frequencies are of the same order.
• Splay trees can take advantage of key order (e.g. accessing values sequentially).
• Order by Next Request on trees .. a bad case:
  [Figure: binary tree on the keys 1..15; node labels as extracted: 1 2 3 4 6 5 7 8 12 10 14 9 13 11 15]
  Order by Next Request takes amortized time lg n and makes no changes in the tree!
