Some slides by Borys Bradel, some from the actual presentation.
How to Update? • Update 12: matches [0,255], [0,63], [12,13] • Tightest bound: [12,13] • Increment the counter for [12,13]
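A minimal sketch of the update step above, assuming the ranges are stored as a tree in which each child range lies inside its parent. The names `RangeNode` and `update` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RangeNode:
    """One counted range [lo, hi] (inclusive); hypothetical structure."""
    lo: int
    hi: int
    count: int = 0
    children: List["RangeNode"] = field(default_factory=list)

    def contains(self, value: int) -> bool:
        return self.lo <= value <= self.hi


def update(root: RangeNode, value: int) -> RangeNode:
    """Increment the counter of the tightest range that contains value."""
    if not root.contains(value):
        raise ValueError(f"{value} is outside [{root.lo}, {root.hi}]")
    node = root
    while True:
        # Descend while some child still contains the value.
        child = next((c for c in node.children if c.contains(value)), None)
        if child is None:
            break
        node = child
    node.count += 1
    return node


# Ranges from the slide: [0,255] contains [0,63] contains [12,13].
leaf = RangeNode(12, 13)
mid = RangeNode(0, 63, children=[leaf])
root = RangeNode(0, 255, children=[mid])

hit = update(root, 12)
print((hit.lo, hit.hi), hit.count)  # (12, 13) 1
```

Only the tightest match is incremented; the enclosing ranges [0,63] and [0,255] keep their own counters unchanged.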
When to Split? • Adaptive to stream size • Relative importance crucial • Bounds error • SplitThreshold = E*N/log(R) • E: error threshold • N: number of elements read so far • R: size of range (e.g. 2^32 for a 32-bit program counter, so log(R) = 32)
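The threshold formula can be sketched as a small helper. Two things here are assumptions rather than statements from the slides: the logarithm is taken base 2, and a node splits once its counter strictly exceeds the threshold. The function names are hypothetical.

```python
import math


def split_threshold(error: float, n_seen: int, range_size: int) -> float:
    """SplitThreshold = E * N / log(R), with log taken base 2 (assumption)."""
    return error * n_seen / math.log2(range_size)


def should_split(count: int, error: float, n_seen: int, range_size: int) -> bool:
    """Assumed rule: split once a node's counter exceeds the threshold."""
    return count > split_threshold(error, n_seen, range_size)


# 32-bit program counter: R = 2**32, so log2(R) = 32.
E, N, R = 0.01, 3_200_000, 2**32
print(split_threshold(E, N, R))    # roughly 1000
print(should_split(1500, E, N, R))
print(should_split(500, E, N, R))
```

Because the threshold scales with N, a counter value that triggers a split early in the stream no longer does so once many more elements have been read.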
Benefit of E*N/log(R) • Guaranteed error bound wrt N: E • Maximum per node value: SplitThreshold • Number of nodes on the path up to the root: log(R) • Normalization factor: N • SplitThreshold*log(R)/N = E • Guaranteed memory bound: O(log(R)/E) • Threshold doubles each time N doubles • 15 (1+2+4+8) elements -> 4 new nodes • Convergent series
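The error bound on this slide can be written out as a short worked derivation. It assumes that the count misattributed to any one value is at most what sits in the counters of its ancestors, that there are at most log(R) such nodes, and that each holds at most SplitThreshold. The second line is one reading of the "15 (1+2+4+8)" bullet, namely that successive splits at thresholds 1, 2, 4, 8 consume a geometric sum of elements.

```latex
\begin{align*}
\text{error}
  &\le \log(R)\cdot \text{SplitThreshold}
   = \log(R)\cdot\frac{E\,N}{\log(R)}
   = E\,N
  \quad\Longrightarrow\quad \frac{\text{error}}{N} \le E \\[4pt]
\sum_{i=0}^{k-1} 2^{i} &= 2^{k}-1
  \qquad\text{e.g. } k=4:\; 1+2+4+8 = 15 \text{ elements for 4 new nodes}
\end{align*}
```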
Batched Merges • Threshold doubles -> exponential back-off between merges • [Figure: theoretical graph]
Disadvantages • Error bounds guaranteed with respect to number of elements processed • Potentially 1%-10% of several billion • Guaranteed memory bound without merges is defined (i.e. O(log(R)/E)), but meaningless • Merge threshold is not defined • Probably similar to SplitThreshold • Proportional to N • E’*N/log(R) where E’<<E
• q: back-off factor; a bigger q implies more time between merges • If b is too small, the tree is too deep (is 3 best?)
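To illustrate how the back-off factor spaces out merges, here is a small sketch that lists the event counts at which a batched merge would fire. It assumes each merge is scheduled at q times the event count of the previous one, starting from event 1024 (the first-merge point quoted on the Stage 4 slide). The function name `merge_points` is hypothetical.

```python
from typing import List


def merge_points(first_merge: int, q: int, total_events: int) -> List[int]:
    """Event counts at which a merge fires, assuming geometric back-off by q."""
    if first_merge < 1 or q < 2:
        raise ValueError("need first_merge >= 1 and q >= 2")
    points = []
    at = first_merge
    while at <= total_events:
        points.append(at)
        at *= q  # the next merge waits q times longer (assumption)
    return points


print(merge_points(1024, 2, 10_000))       # [1024, 2048, 4096, 8192]
print(len(merge_points(1024, 4, 10_000)))  # 2
```

Under this assumption the number of merges grows only logarithmically with the number of events, and a larger q reduces it further.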
Stage 0: 1k buffer, okay for split/merge stalls • Is 1k really enough?
Stage 1: (4096 x 36) Ternary CAM • Without pipelining, bottleneck: 7ns • >4000 wires?
Stage 3: (16KB) SRAM - if set, increment counter • If the CAM is pipelined, bottleneck: 1.26ns • What indices?
Stage 4: Figure out if need to stall (split/merge) • 1st merge at event 1024: #merges = log(events) - 10
Issues • 4000+ possible subranges is aggressive • 4000+ wires? • Is 1000 entry buffer enough to stall while all nodes merged? • Indices are probably pointers to parent nodes (that’s my guess anyway)
High Accuracy? • May be okay for code regions
Value Profiles Are the bounds tight enough to be useful?