# Data flow analysis in the Processor Farmlet

##### Presentation Transcript

1. Data flow analysis in the Processor Farmlet: Transient Behavior of the M/M/1 Process. Gustavo Cancelo

2. Review of a stationary M/M/1 queue
   - The input and output to the queue are Poisson distributed.
   - λ: input data rate (w/s, pix/BCO)
   - μ: output data rate
   - ρ = λ/μ: traffic intensity
   - [Figure: state diagram with states S0, S1, S2, ..., followed by the steady-state equations.]
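
The steady-state equations on this slide are not preserved in the transcript. As a minimal sketch, the standard M/M/1 results for ρ < 1 are P_k = (1 - ρ)ρ^k for the state probabilities and ρ/(1 - ρ) for the mean number in the system. The function name `mm1_steady_state` and its parameters are illustrative and not taken from the slides.

```python
def mm1_steady_state(lam, mu, k_max=10):
    """Return (rho, [P_0..P_k_max], mean occupancy) for a stable M/M/1 queue.

    lam: input rate, mu: output (service) rate, in the same units.
    """
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("no steady state exists for rho >= 1")
    # Geometric steady-state distribution: P_k = (1 - rho) * rho**k
    probs = [(1.0 - rho) * rho**k for k in range(k_max + 1)]
    # Mean number in the system (queue plus server)
    mean = rho / (1.0 - rho)
    return rho, probs, mean


if __name__ == "__main__":
    rho, probs, mean = mm1_steady_state(lam=0.9, mu=1.0)
    print(f"rho={rho:.2f}  P0={probs[0]:.3f}  mean={mean:.2f}")
```

The mean grows quickly as ρ approaches 1, which is why the traffic intensity seen by the surviving nodes after a failure matters so much in the slides that follow.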

3. Problem statement
   - A Farmlet has N processing nodes. When one node fails, the remaining N-1 nodes see an increase in the average data flow they must process. Several questions come up:
   - How do the data flow dynamics behave in the remaining N-1 working processors?
   - Is the new system stable, or do we need to throttle data?
   - If the node's failure can be fixed by reinitializing the processor, how much time do we have before running into problems with the other nodes?
   - How much processing idle time do we need to provide to each processor to prevent data loss by instability (i.e., overflow) during a fault? Note that processing idle time translates almost linearly into \$\$\$.

4. o = i/N o = 0 o = i/N o = i/(N-1) Buffer Manager Buffer Manager i i . . . . . . o = i/N o = i/(N-1) o = i/N o = i/(N-1) Problem statement (cont.) Before failure (BF) at t-> After failure (AF) at t-> (1a) • If AF>1 the system becomes unstable and must throttle data in the long term. • Here the term “unstable” implies that N-1 nodes cannot process the average input rate and in the “long term” (i.e after 4 or 5 time constants of the system dynamics ) the buffers will grow unbounded. • Equations 1b, 1c, and 1d are valid only if the system remains stable. (1b) (1c) (1d) Gustavo Cancelo

5. Transient M/M/1
   - The M/M/1 process dynamics can be modeled by a differential-difference equation, Eq. (2).
   - This equation shows that the M/M/1 state space is continuous in the time domain and discrete in the state variable that describes the buffer size.
   - To solve (2) we must use both the Laplace transform and the Z-transform, which gives Eq. (3).
   - In (3), i represents the buffer size at t0 and P0*(s) is the Laplace transform of P0(t), the probability of state 0 (empty buffer).
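
The slide's equations (2) and (3) did not survive in the transcript. The standard textbook form of the M/M/1 forward equations and of their joint Z/Laplace transform, with the queue starting in state i, is sketched below. The notation P^*(z,s) is an assumption about how the slide wrote it.

```latex
% Differential-difference (Kolmogorov forward) equations, cf. Eq. (2)
\frac{dP_k(t)}{dt} = -(\lambda+\mu)\,P_k(t) + \lambda\,P_{k-1}(t) + \mu\,P_{k+1}(t), \qquad k \ge 1
\frac{dP_0(t)}{dt} = -\lambda\,P_0(t) + \mu\,P_1(t)

% Z-transform in k, Laplace transform in t, initial condition P_i(0) = 1, cf. Eq. (3)
P^*(z,s) = \frac{z^{\,i+1} - \mu\,(1-z)\,P_0^*(s)}{s\,z - (1-z)(\mu - \lambda z)}
```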

6. Transient M/M/1 (continued)
   - The solution to (3) is Eq. (4), where Ik(x) is the modified Bessel function of the first kind.
   - Equation (4) not only includes Bessel functions but also an infinite sum of them!
   - The increasing and decreasing exponential terms in Pk(t) generate numerical problems when we calculate Pk(t) for t → ∞.
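
Equation (4) itself is missing from the transcript. The classical closed-form transient solution found in queueing textbooks, for a queue that starts in state i, has the shape below. It is given here as a reference form and may differ in notation from the slide.

```latex
P_k(t) = e^{-(\lambda+\mu)t}\left[
    \rho^{(k-i)/2}\, I_{k-i}(at)
  + \rho^{(k-i-1)/2}\, I_{k+i+1}(at)
  + (1-\rho)\,\rho^{k} \sum_{j=k+i+2}^{\infty} \rho^{-j/2}\, I_j(at)
\right], \qquad a = 2\mu\sqrt{\rho}
```

The factor e^{-(λ+μ)t} decays while each I_j(at) grows roughly like e^{at}, which is the source of the numerical difficulty for large t mentioned on the slide.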

7. Transient M/M/1 (continued)
   - In Eq. (4), Pk(t) represents the entire distribution at each time instant.
   - If we let t → ∞, (4) provides the M/M/1 steady-state equations.
   - Now we'll focus on the first moment of Pk(t), Pmean(t).
   - There isn't a closed-form expression for Pmean(t). Standard methods use numerical integration in the time domain or in the transformed domain.
   - An alternative is to approximate Pmean(t) with simpler functions.
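
As an illustration of numerical integration in the time domain, the sketch below integrates the M/M/1 forward equations on a truncated state space with a classical fourth-order Runge-Kutta step and records the mean occupancy over time. The truncation at `n_states`, the step size, and the function name are assumptions chosen for the example, not values from the slides.

```python
def mm1_mean_queue(lam, mu, t_end, dt=0.05, n_states=60, start=0):
    """Return [(t, mean occupancy)] for an M/M/1 queue truncated to n_states."""

    def deriv(p):
        # dP_k/dt = -(outflow) P_k + lam P_{k-1} + mu P_{k+1}
        d = [0.0] * n_states
        for k in range(n_states):
            out_rate = (lam if k < n_states - 1 else 0.0) + (mu if k > 0 else 0.0)
            d[k] = -out_rate * p[k]
            if k > 0:
                d[k] += lam * p[k - 1]
            if k < n_states - 1:
                d[k] += mu * p[k + 1]
        return d

    p = [0.0] * n_states
    p[start] = 1.0  # the queue starts in state `start` with probability 1
    history = [(0.0, float(start))]
    for n in range(1, int(round(t_end / dt)) + 1):
        k1 = deriv(p)
        k2 = deriv([x + 0.5 * dt * a for x, a in zip(p, k1)])
        k3 = deriv([x + 0.5 * dt * a for x, a in zip(p, k2)])
        k4 = deriv([x + dt * a for x, a in zip(p, k3)])
        p = [x + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        history.append((n * dt, sum(k * pk for k, pk in enumerate(p))))
    return history


if __name__ == "__main__":
    hist = mm1_mean_queue(lam=0.5, mu=1.0, t_end=80.0)
    print("mean at t_end:", round(hist[-1][1], 3), "(steady state is rho/(1-rho) = 1)")
```

For ρ close to 1 the truncation size and the integration horizon both need to grow considerably, since the relaxation is slow and the distribution has a long tail.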

8. Optimal Least-Squares Approximation to Pmean(t)
   - Assuming that Pmean(t) is stable (i.e., ρ = λ/μ < 1), it can be demonstrated that Pmean(t) is a non-decreasing function of t with exponential behavior. Hence it makes sense to approximate it with a function qn(t) of the form given in Eq. (5).
   - The approximation is done using the L2 norm, Eq. (6).
   - The approximation involves inserting equation (5) into (6), differentiating, and setting the result to zero to solve for the optimal coefficients.
   - Since qn(t) is an infinite series, it must be truncated. The truncation defines our model: it sets the order of the approximation and the number of coefficients we need to solve for.
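
The exact form of Eq. (5) is not in the transcript. The sketch below assumes a first-order model q1(t) = qmean · (1 - exp(-b1 · t)) and finds the b1 that minimizes the sum of squared errors over sampled points, a discrete stand-in for the L2 norm. A golden-section search replaces the analytic differentiate-and-solve step described on the slide. All names are illustrative.

```python
import math


def fit_first_order(samples, q_mean, b_lo=1e-4, b_hi=10.0, iters=100):
    """Least-squares fit of b1 in q_mean * (1 - exp(-b1 * t)) to (t, q) samples."""

    def sse(b):
        # Discrete approximation of the squared L2 distance between model and data
        return sum((q - q_mean * (1.0 - math.exp(-b * t))) ** 2 for t, q in samples)

    phi = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio conjugate, about 0.618
    a, c = b_lo, b_hi
    x1, x2 = c - phi * (c - a), a + phi * (c - a)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:
            # Minimum lies in [a, x2]
            c, x2, f2 = x2, x1, f1
            x1 = c - phi * (c - a)
            f1 = sse(x1)
        else:
            # Minimum lies in [x1, c]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (c - a)
            f2 = sse(x2)
    return 0.5 * (a + c)


if __name__ == "__main__":
    # Synthetic data generated from the assumed model with b1 = 0.3
    data = [(0.5 * n, 2.0 * (1.0 - math.exp(-0.3 * 0.5 * n))) for n in range(61)]
    b1 = fit_first_order(data, q_mean=2.0)
    print("fitted b1:", round(b1, 4), " time constant:", round(1.0 / b1, 3))
```

In practice the samples would come from a numerical evaluation of Pmean(t) rather than from synthetic data.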

9. Optimal Least-Squares Approximation to Pmean(t) (cont.)
   - [Plot legend: Pk(t)/qmean, q1/qmean, q2/qmean]
   - Due to the exponential behavior of Pmean(t), 1st- and 2nd-order approximations work well.
   - The coefficient b1 in q1, and the coefficients a1, b1, and b2 in q2, are functions of ρ.
   - An important result of this approximation is that a time constant for the dynamic process can be obtained.
   - Let τ1 = 1/b1 be the time constant of the 1st-order approximation.

10. Optimal Least-Squares Approximation to Pmean(t) (cont.)
   - [Plot legend: Pk(t)/qmean, q1/qmean, q2/qmean]
   - Example: let τ1 = 1/b1, with ρ = 0.9, b1 = 0.0123μ, and μ = 3.3 events/ms. Then τ1 = 24.4 ms.
   - 24.4 ms => about 2.44 million clocks of a 100 MHz clock system (not bad!)
   - If we want only a moderate increase in queue sizes and processor workload, we should attempt to recover from a fault condition as fast as possible.
   - A fault recovery time of ~0.1τ1 will increase the queue by ~20%.
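
The arithmetic in this example can be checked with a few lines. The helper names are hypothetical. With the rounded inputs quoted on the slide, the product gives roughly 24.6 ms, close to the 24.4 ms stated; the small gap presumably comes from rounding of μ or b1.

```python
def time_constant_ms(b1_over_mu, mu_per_ms):
    """tau1 = 1 / b1, with b1 expressed as a multiple of mu (events per ms)."""
    return 1.0 / (b1_over_mu * mu_per_ms)


def clock_cycles(duration_ms, clock_hz):
    """Number of clock cycles elapsed in duration_ms at clock_hz."""
    return duration_ms * 1e-3 * clock_hz


if __name__ == "__main__":
    tau1 = time_constant_ms(b1_over_mu=0.0123, mu_per_ms=3.3)
    print(f"tau1 ~ {tau1:.1f} ms")
    print(f"24.4 ms at 100 MHz = {clock_cycles(24.4, 100e6):.3g} clocks")
    print(f"recovery budget 0.1*tau1 ~ {0.1 * tau1:.2f} ms")
```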

11. Summary
   - Upon the fault of a processing node, the remaining N-1 nodes see an increase in their input queue size and processing workload (Eq. 1a-1d).
   - The fault condition may bring the system to instability or saturation in the long term (Eq. 1a).
   - However, if we can recover fast enough from the fault condition, we may be able to keep the queue size and workload within reasonable bounds, even when the process is unstable in the long term. This allows a design in which the processors keep a low idle time.
   - A throttling system must be available for when the system cannot recover from a fault, such as faults caused by hardware problems.

12. Some Costing Considerations
   - Motherboard + 4 Daughterboards (M+4D) = \$2380
   - Motherboard + 6 Daughterboards (M+6D) = \$2820
   - Cost of idle time:
     - M+4D case: \$17K per 1% (10% idle time = \$170K)
     - M+6D case: \$13K per 1% (10% idle time = \$130K)
   - Cost of remaining stable after a processor fault:
     - M+4D case: ρ2 < 1 => ρ1 < 0.75: ~\$250K
     - M+6D case: ρ2 < 1 => ρ1 < 0.83: ~\$113K

13. The Triplet's File
   - Example of a triplet entry in the file (the entry is bracketed by `-888` markers):
     - `-888`
     - `-1 8 -0.35682 -0.68696 -34.17005 -0.39532 -0.83705 -38.42006 -0.38984 -1.00026 -42.67006`
     - `0 8 -0.35761 -0.68310 -33.77005 -0.38282 -0.81598 -38.02005 -0.41201 -0.99012 -42.27005`
     - `…`
     - `-888`
   - Slide annotations on the entry: inner/outer, left/right-going triplet; station number; bend view plane and non-bend view plane; pixel points given as x, y, z for the N-1 plane, the N plane, and the N+1 plane.
   - If, for instance, dim(x) = 16 bits, dim(y) = 16 bits, and dim(z) = 7 bits, each pixel point occupies 39 bits. Three pixel points occupy 117 bits. We can use 11 bits for tags => 128 bits = four 32-bit words per triplet line.
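
A minimal sketch of the bit budget described above: three pixel points of 16 + 16 + 7 bits each plus an 11-bit tag fill exactly 128 bits, i.e. four 32-bit words. The field order, the tag position, and the treatment of coordinates as already-quantized unsigned integers are assumptions made for illustration; the slides do not specify them.

```python
X_BITS, Y_BITS, Z_BITS, TAG_BITS = 16, 16, 7, 11
POINT_BITS = X_BITS + Y_BITS + Z_BITS  # 39 bits per pixel point


def pack_triplet(tag, points):
    """Pack an 11-bit tag and three (x, y, z) points into four 32-bit words."""
    if len(points) != 3:
        raise ValueError("a triplet has exactly three pixel points")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in 11 bits")
    value = tag  # tag occupies the most significant bits (assumed layout)
    for x, y, z in points:
        for field, width in ((x, X_BITS), (y, Y_BITS), (z, Z_BITS)):
            if not 0 <= field < (1 << width):
                raise ValueError("coordinate does not fit in its field")
            value = (value << width) | field
    # Split the 128-bit integer into four words, most significant first
    return [(value >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0)]


def unpack_triplet(words):
    """Inverse of pack_triplet: return (tag, [(x, y, z), ...])."""
    value = 0
    for w in words:
        value = (value << 32) | w
    points = []
    for _ in range(3):  # fields come back in reverse order
        z = value & ((1 << Z_BITS) - 1)
        value >>= Z_BITS
        y = value & ((1 << Y_BITS) - 1)
        value >>= Y_BITS
        x = value & ((1 << X_BITS) - 1)
        value >>= X_BITS
        points.append((x, y, z))
    points.reverse()
    return value, points


if __name__ == "__main__":
    words = pack_triplet(5, [(1, 2, 3), (65535, 0, 127), (1234, 4321, 64)])
    print([f"{w:08x}" for w in words], unpack_triplet(words))
```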

14. Triplet's Data Statistics
   - Average event size:
     - In number of triplets: 88.90 triplets
     - In number of words (four 32-bit words/triplet): 355.61 words (1.4 KB)
   - Standard deviation (σ) of the event size:
     - In number of triplets: 80.06
     - In number of words (four 32-bit words/triplet): 320.25 words (1.3 KB)
   - Largest event size in a sample of 2500 events:
     - In number of triplets: 633 triplets (7.12 times the average)
     - In number of words (four 32-bit words/triplet): 2532 words (~10 KB)
   - Average execution time:
     - In µs: 90.96 µs
     - In number of BCO clocks: 688.83 clocks
   - Standard deviation (σ) of the execution time:
     - In µs: 141.7 µs
     - In number of BCO clocks: 1073.8 clocks

15. Triplet's Data Statistics (2)
   - The throughput is based on the average execution time of 90.96 µs.
   - If we can execute at this speed we'd only need 690 processors!
   - Average data throughput to an M+4D:
     - In number of triplets: 3.52 million triplets/s
     - In number of words (four 32-bit words/triplet): 14.08 Mw/s
     - In bits/s: 450.56 Mb/s
   - Average data throughput to an M+6D:
     - In number of triplets: 5.28 million triplets/s
     - In number of words (four 32-bit words/triplet): 21.12 Mw/s
     - In bits/s: 675.84 Mb/s
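
The unit conversions on this slide follow directly from four 32-bit words per triplet. The sketch below reproduces them and also estimates the processor count, under the assumption (not stated explicitly in the transcript) that one new event arrives per BCO clock, so the average execution time in BCO clocks sets the number of processors needed. Function names are illustrative.

```python
import math

WORDS_PER_TRIPLET = 4
BITS_PER_WORD = 32


def throughput(triplets_per_s):
    """Convert a triplet rate into (words per second, bits per second)."""
    words_per_s = triplets_per_s * WORDS_PER_TRIPLET
    bits_per_s = words_per_s * BITS_PER_WORD
    return words_per_s, bits_per_s


def processors_needed(exec_time_bco_clocks):
    """Processors required to keep up, assuming one event per BCO clock."""
    return math.ceil(exec_time_bco_clocks)


if __name__ == "__main__":
    for label, rate in (("M+4D", 3.52e6), ("M+6D", 5.28e6)):
        words, bits = throughput(rate)
        print(f"{label}: {words / 1e6:.2f} Mw/s, {bits / 1e6:.2f} Mb/s")
    print("processors:", processors_needed(688.83))
```

The processor estimate rounds to the "690 processors" quoted on the slide.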

16. Triplet's Data Statistics (3)

17. Triplet's Data Statistics (4)

18. Farmlet simulation run
   - Simulates a processor failure:
     - The simulation ran for 800 BCOs.
     - Processor No. 4 "failed" at BCO = 100 and was operative again starting at BCO = 600.
   - Simulation parameters:
     - Each processor's internal queue has a maximum size of 2 events.
     - The Buffer Manager's individual queues are just one event deep.
     - The FIFO input buffer size is not restricted.
     - The data is moved around in 32-bit words, one word per clock cycle. The system clock is set at 106 MHz.
   - Simulation results:
     - Input FIFO queue size
     - Processor idle time

19. Simulation output (1)

20. Simulation output (2)

21. Simulation output (3)