Comprehensive Guide to Discrete Event Simulation Techniques
Learn the basics of Discrete Event Simulation, including Event Scheduling, smpl usage, Initialization Routine, and Facility Control concepts. Dive into simulation modeling and imitation techniques for real-world processes.
Comprehensive Guide to Discrete Event Simulation Techniques
E N D
Presentation Transcript
Simulation Modeling • Imitation of the operation of a real-world process or system over time • Objective: to collect data as if a real system were being observed • Data collected from the simulation are used to estimate the performance/dependability measures of the system
Discrete Event Simulation • modeling of the system as it evolves over time by a representation in which the state variables change only at a countable number of points in time • terminology: • simulation clock: a variable that gives the current value of the simulated time • event: an instantaneous occurrence which may change the state of the system
Simulation Terminology • event list: a list (data structure) consisting of event records with each record containing the time of occurrence of a particular event, e.g., the arrival time, the departure time of a client • timing routine: a subroutine which determines and removes the most imminent event record from the event list and advances the simulation clock to the time when the corresponding event is to occur • event routine: a subroutine which updates the state of the system when a particular type of event occurs • one event routine for each type of event
Event Scheduling • Determine the number of event types in the system, e.g., 1: arrival, 2: request for service, 3: service completion, 4: timer, etc. • Place one or more initial event records in the event list, each containing • event time, event type, customer class, etc. • Determine the most imminent event in the event list (by the timing routine) in a loop until a specified stopping rule is satisfied • update the simulation clock when an event record is removed from the event list
Event Scheduling (cont.) • Pass the control to the event routine corresponding to the event type • Update the state of the system • Gather the statistics if necessary • Report the simulation results when the simulation is completed • For example • the average response time per client • the loss probability of calls • the system throughput • the average number of clients served over a time period
Simulation using smpl • In the smpl view of systems, there are three types of entities: • resources: facilities • smpl provides functions to define, request, release and preempt (queueing) facilities • tokens: active entities of the systems, e.g., tasks, users (indistinguishable or distinguishable) • events: a change of state of any system entity is an event • smpl provides functions for scheduling and for selecting events in the order of event occurrence time
Initialization routine; timing control routine to select the most imminent event from the event list (event clock is updated implicitly) { event type 1: event routine for event type 1; event type 2: event routine for event type 2; . . event type n: event routine for event type n; } statistics reporting routine; Structure of An smpl Program
smpl(m, s) int m=0; /* always 0 */ char *s; smpl provides seeds for 15 streams for generating random numbers. To collect a set of 15 sample values of a particular performance measure, one can invoke smpl() 15 times: loop: repeat 15 times { smpl(0, “hw1”); } One can also use stream(1), stream(2), etc. to specify the stream number to be used in a simulation run Initialization Routine
fd = facility(s, n) char *s; int n; /* # of servers */ => define a queueing server with “n” servers; smpl automatically manages enqueueing/dequeueing activities r = request(fd_id, token_id, pri) intfd_id; inttoken_id; intpri; => request a server of facility “fd_id” be reserved for the token designated by “token_id” with priority “pri” (higher is better) r=0: facility is reserved r=1: facility is busy and the request is blocked in the queue ordered on priority Facility Definition and Control
r = preempt(fd_id, tkn_id, pri) intfd_id, tkn_id, pri; => same as request() except that it will preempt the server if it is busy serving a task with priority < “pri” => the event record corresponding to the preempted token (for the service completion event) is removed from the event list and a queue entry with the residual time is created r=0: facility is reserved r=1: facility is busy and the request is blocked in the queue ordered on priority release(fd_id, tkn_id) intfd_id; inttkn_id; => release the facility and if the queue is not empty, reschedule an event with the event occurrence time at NOW for a blocked task, and reschedule an event with the event occurrence time at NOW+ the residual time for a preempted task. Facility Definition and Control create an event of the same type and put it in the event list
schedule(event_id, te, tkn_id) int event_id; real te; /* time interval relative to the current time */ int tkn_id; => schedule the event with id “event_id” to occur at NOW+te => this essentially inserts an event record with the event occurrence time NOW+te into the event list => part of the information in the event record is event_id, tkn_id and the event occurrence time NOW+te Example: schedule(2, 0.0, token_id) => schedule event type #2 associated with token id “token_id” to occur NOW Scheduling Events
cause(event_id, tkn_id) int *event_id; int *tkn_id; => remove the most imminent event from the event list and automatically advance the simulation clock to the event occurrence time => return the event number (type) and token id to the caller Typically in the smpl program, we use a select statement on the event_id returned, so as to transfer the control to the appropriate event routine. Timing Routine
cancel(event_id) int event_id; => search the event list and remove the first event with the event number = event_id Canceling Events Get Current Simulation Time t = real time() => return the current simulation clock value => real is a predefined type; it is the same as double in C
n= int inq(fd); => returning # of tokens currently in queue (not including the ones in service) r = int status(fd) => r=0: facility is free; r=1: facility is busy u = real U(fd) => mean # of tokens in service n = real Lq(fd) => mean # of tokens in queue excluding the ones in service b = real B(fd) => mean busy period = accumulated busy time/release counts Status Functions
r = real drand48(); /* available on UNIX machines */ => return r in the range of (0,1) r = real expntl(x) double x; => return an exponentially distributed sample value with mean x r = real uniform(a,b) double a,b; => return a real number r in the range of (a,b) k = int random(i,j) int i, j; => return an integer k in the range of (i,j) r = real normal(x,s) => return a normally distributed sample value with mean x and standard deviation s Random Variate Generation (rand.c)
trace(n) int n; => generate trace messages when a facility is defined, requested, or released, or whenever an event is scheduled or caused n=0: trace is off n=1; free-running, i.e., trace messages are generated continuously n=2: screen by screen running (press any key to resume tracing) n=3: message by message running (press any key to resume) Traces and Debugging
#include “smpl.h” main() { real Ta=200, Ts=100, te=200000; int customer=1, event, server; smpl(0, “M/M/1 Queue”); server = facility(“server”,1); schedule(1, 0.0, customer); while (time()< te) { cause(&event, &customer); switch(event) { case 1: /* arrival */ schedule(2,0.0, customer); schedule(1, expntl(Ta), customer); break; case 2: /* request server */ if (request(server, customer,0)==0) schedule(3, expntl(Ts), customer); break; case 3: /* completion */ release(server, customer); break; } } report(); } M/M/1 smpl program
Confidence Interval and Level • Suppose we collect N sample values Y1, Y2, …, YN from N simulation runs • sample mean Y = (Y1 + Y2 + …+ YN )/N • true mean is m • Define 1-a as the probability that the absolute value of the difference between the Y and m is equal to or less than H • that is, prob[ Y-H <= m <= Y+H] = 1- a Confidence interval half-width Confidence interval Confidence level
Confidence Interval and Level (cont.) • When Y1, Y2, …, YN are independent random variables from a normal distribution with the mean m, H is defined by H = ta/2;N-1* s/sqrt(N) where t is the student’s t distribution and s2 is the sample variance given by s2 = Si (Yi - Y)2 /(N-1) (and thus sis the standard deviation).
Batch Mean Analysis by smpl m m samples to generate Y1 samples to generate Y2 • Use a batch size m around 2000 observations to collect a sample value Yito justify the normal distribution assumption (by central limit theorem). • Delete d = 0.1 m initial observations • Collect k = 10 batches and compute the confidence interval half-width H • If the desired accuracy has not been reached, collect another batch and compute H again. Repeat as necessary.
BMA: stat.c and bmeans.c • Based on 95% confidence level (a = 0.05) with 10% confidence accuracy (H/Y = 10%) • The following three routines are provided: • init_bm(d, m): d is number of initial observations to be discarded and m is the number of observations to collect one sample Yi • obs(y): y is the observation value generated out of a simulation run • if the returning value is 1, it means that the required confidence level and accuracy have been reached; otherwise, need to continue calling this function obs(y) • civals(Y, H, k): Y, H and k are passed in by reference. This function returns the final result.
M/M/1 smpl program with BMA case 1: /* arrival */ ts[customer] = time(); schedule(2,0.0, customer); if (++tk_id >= TOKENS) tk_id=0; schedule(1,expntl(Ta),tk_id); break; case 2: /* request server */ if (request(server, customer,0)==0) schedule(3,expntl(Ts),customer); break; case 3: /* release server */ release(server, customer); if (obs(time()-ts[customer])== 1) cont = FALSE; break; } } /* end while */ civals(&mean, &hw, &nb); printf(”Y= %f; H= %f after %d batches\n”, mean, hw, nb); } #include “smpl.h” #define TOKENS 1000 #define TRUE 1 #define FALSE 0 main() { real Ta=200.0,Ts=100.0,mean,hw; inttk_id=0,customer=0,event,server,nb; real ts[TOKENS]; /* start time stamp */ intcont=TRUE; smpl(0,"M/M/1 Queue with BMA"); init_bm(200,2000); /* d=200; m=2000 */ server=facility("server",1); schedule(1,0.0,tk_id); while (cont) { cause(&event,&customer); switch(event) {
Bmeans.c printf("batch %2d mean = %.3f",k,smy); smy=0.0; n=0; /* reset batch variables */ if (k>=10) then { /* compute grand mean & half width */ Y=smY/k; var=(smY2-k*Y*Y)/(k-1); h=T(0.025,k-1)*sqrt(var/k); printf(", rel. HW = %.3f",h/Y); if (h/Y<=0.1) then r=1; } printf("\n"); } return(r); } civals(mean,hw,nb) real *mean,*hw; int *nb; { /* return batch means analysis results */ *mean=Y; *hw=h; *nb=k; } #include "smpl.h" #include "stat.c" static intd,k,m,n; static real smy,smY,smY2,Y, h; init_bm(m0,mb) int m0,mb; { /* set deletion amount & batch size */ d=m0; m=mb; smy=smY=smY2=0.0; k=n=0; } obs(y) real y; { int r=0; real var; if (d) then {d--; return(r);} smy+=y; n++; if (n==m) then { /* batch complete: update sums & counts */ smy/=n; smY+=smy; smY2+=smy*smy; k++;
Chap 2: Reliability and Availability Models Reliability R(t) = prob{S is fully functioning in [0,t]} Suppose from [0,t] time period, we measure out of N components, of which N0 (t): # of components operating correctly at time t Nf (t): # of components which have failed at time t Physical meaning: instantaneous rate at which components are failing
Failing rate of a single component How many unfailing components are there at time t ? N Nf(t+dt) - Nf(t) dt t time t+dt (2.3) text p.28 Z(t) is also called the hazard rate.
For electronic components, Z(t)’s relationship with respect to time is a bathtub curve. Burn-in period Z(t) High failure rate due to faulty design, manufacturing or assembly. Useful time phase Wear-out phase (due to aging) Infant mortality phase Weak components are removed during the “burn-in” period. Failure rate can be assumed to be a constant during the useful time period, say time
Exponential Failure Law e.g. = 0.01 hr-1, what is R(t) at t = 100 hrs? Ans: e-0.01*100 For hardware components, exponential failure law is frequently assumed. For software components, the reliability may grow as the software’s design faults are removed during the testing/debugging phase.
In general, we can assume Z(t) = ( t )-1 Weibull dist. R(t) = e- (t ) Z(t) Use this to model software failure rate >1 <1 e.g. = -1 R(t) = =1 -1 t et This yields t , R(t) 1 t 0, R(t) 0 Reliability improves as a function of t
Formal Definition of R(t): Let x be a .. representing the life of a system and let F be the cumulative distribution function (CDF) of x. Then, For a component obeying the exponential failure law
Mean Time to Failure (MTTF): The expected time that a system will operate before the first failure occurs i first failing time 1 t1 Identical systems N tN Discrete case Continuous case failure time Q: what is the reliability of a system obeying the exponential failure law at t = MTTF? Ans:
MTTR (Mean Time to Repair) If also assume a failed system obeys “Exponential Repair Law”, then Relationship between MTBF (Mean time between failure) , MTTR & MTTF: time MTTF MTTR MTTF MTTR Time of 2nd failure Time of first failure MTBF = MTTR + MTTF If MTTF >> MTTR Then MTBF MTTF
Availability Instantaneous (or point) Availability A(t) = prob {the system is functioning at time t} Regardless of the # of times it may have failed & been repaired during [0, t] A(t) A time t 0 Steady-State Availability = MTTF MTTF+MTTR For a system without repair, A(t) = R(t)
Assume exponential “failure” & “repair” law See also page 67, text chapter 4 O F Time domain: Po’(t) = - Po(t) + PF(t) PF’(t) = Po(t) - PF(t) with initial state Po(0) = 1 & PF(0) = 0 Laplace domain: SPo(S) - 1 = - Po(S) + PF(S) SPF(S) - 0 = Po(S) - PF(S) Po(0) PF(0)
LT F(t) L(F(t)) = f(S) Inverse LT 1/S 1/S2 n!/Sn+1 1/(S-a) 1 t tn eat Po(t) = + e-(+)t + + Physical meaning: Po(t) = prob {the system is functioning at time t } PF(t) = - e-(+)t + + + e-(+)t A(t) = + + Po(S) Similarly PF(S) Inverse LT to return to time domain
e.g. = 0.01 & = 0.1 1.0 0.9 0.90909... A(t) t Q1: unavailability? Ans: PF(t) Q2: R(t)? Still e-t Q3: Steady-state availability? Ans: A( t) = +
Modeling: Series-Parallel Reliability Block Diagrams A series-parallel block diagram represents the logical structure of a system with regard to how the reliabilities of its components affect the system reliability. Components are combined into blocks in series parallel or k-of-n configurations
1 1 2 n A. Serial system: each element of the system is required to function correctly for the system to function correctly. 1 2 3 n B. Parallel system: only one of several elements must be operational for the system to be operational. Assumptions: independent random variables perfect coverage so up to n-1 failures can be tolerated. 2
Computer 1 Interface 1 Display 1 Bus 1 Computer 2 Interface 2 Display 2 Bus 2 Parallel 1 2 3 4 Where Rj,i is for jth component of ith unit C. Combination of series & parallel systems e.g. Numerical ex: R=0.9 then Rsystem = [1-(1-0.9)2]4 = 0.96 v.s. Rnon-redundant = (0.9)4 = 0.6561
e.g. R1 R1 1 R2 R2 R2 2 R3 3 Rparallel = 1 - ( 1 - Rseries, 1 )( 1 - Rseries, 2 )( 1 - Rseries, 3 ) = 1 - ( 1 - R12 )( 1 - R23 )( 1 - R3 ) Q: Prove the following theorem: Replication at the component level is more effective than replication at the system level in improving system reliability using the same # of components. Ans: Show that is better than Assume R=1/2 for each component
D. k-out-of-n e.g. TMR ( Triple Module Redundancy ) is a 2-out-of-3 system. RTMR(t) = R1(t) * R2(t) * R3(t) + R1(t) R2(t) ( 1 - R3(t) ) + R1(t) R3(t) ( 1 - R2(t) ) + R2(t) R3(t) ( 1 - R1(t) ) when R1(t) = R2(t) = R3(t) = R(t) R2-out-of-3(t) = 3R2(t) - 2R3(t) all are functioning 1 failed & 2 are functioning In general, 3 e.g.
3 2 R2 - R + 0.5 = 0 R = 1.0 or 0.5 n n ! å - l - l - = - t i t n i R ( t ) ( e ) ( 1 e ) system - ( n i )! i ! = i k ¥ æ ö 1 1 1 ò \ = = + + ç ÷ MTTF R ( t ) dt ... system l n k è ø 0 Rsystem RTMR(t) Q: Is RTMR(t) > Rsystem with a single component(t) ? 1.0 Let 3R2 - 2R3 = R R(t) single RTMR(t) = R(t) when the reliability of a single module is 1.0 or 0.5 More realistic region In fact, when R(t) < 0.5, R(t) > RTMR(t) R(t) 1.0 0.5 Q: what is the MTTF of a k-out-of-n system when each single module follows the exponential failure law with a failure rate of ? e.g. 2-out-of-3: MTTF= 2-out-of-5: MTTF=
m1 P1 m2 P2 m3 MTTF = 6 0.00903 3 0.01042 6 0.01667 3 0.01806 2 0.02431 1 0.0257 - - - = = + + 226.09 Q: what is the reliability & MTTF of the following structure? Fig. 2.5, p36 A parallel system too Also a parallel system m = 0.00139/day ( 1-out-of-3 ) p = 0.00764/day ( 1-out-of-2 ) Rsystem(t) = [ 1 - ( 1 - e-0.00764t )2 ][ 1 - ( 1 - e-0.00139t )3 ] = 6e-0.00903t-3e-0.01042t-6e-0.01667t+3e-0.01806t+2e-0.02431t-e-0.0257t Recall that Repair rate Failure rate of component i Ai(t) = Equations & obtained above can also be used to compute 1 2 3 the system availability by replacing Ri(t) with Ai(t)
4 5 6 Assuming For steady state availability i i + i Use Ai( t ) = into equations above & 4 5 6 Specifically,
C D A B C E D C block name {( param-list )} optional < block line > end can be one of the following: Chap 9: Reliability & Availability Modeling Reliability Block Diagramusing sharpe P.358 Appendix B
2) parallel name name-1 name-2 {name3 name4 …} Name 1 The parallel system is assigned to the first name. Name 2 3) series name name-1 name-2 {name3 name4 …} 4) kofn name expression-1, expression-2, component-name The series system is assigned to the first name. Name 1 Name 2 k n identical components 1) comp name exponential-polynomial Referring to the cumulative dist function ( cdf ) F(t) = 1-e-t R(t) = e-t exp (lambda) meaning cdf (component-name{,state}{;arg-list}) which has been defined before gen triple 1, triple 2, ... in the form of (aj, kj, bj) or See p. 352
Ex: m1 Sharing memory: a k-out-of-n device CPU1 Fig. 2.5 (p. 36) m2 CPU2 m3 Output CDF for system 1-6e-0.00903t+3e-0.0104t+… (1-out-of-3) k=1.00 mean(system;k,3,…) = 2.26*102 rel (10,k,3…) = 9.9981*10-1 rel (365,k,3…) = 8.33*10-1 k=2.00 in semi symbolic form 5) kofn name k-expression, n-expression, name1 name2 {name3…} representing a k-out-of-n system having possibly different components Components do not have identical failure-time dist. block system ( k, n, pfrate, mfrate ) comp CPU exp ( pfrate ) comp mem exp ( mfrate ) parallel CPUs CPU CPU kofn mems k, n, mem series subsystem CPUs mems end
Comments: any line starting with the symbol “*” printing output. printing F(t) in symbolic form cdf (system;1, 3, 0.00139, 0.00764) define reliability function at time t func rel(t, k, n, pf, mf)\ 1-value (t; system; k, n, pf, mf) \ means continuation to the next line Returning F(t) at time t numerically loop k, 1, 3, 1 exprmean (system; k, 3, 0.00139, 0.00764) exprrel (10, k, 3, 0.00139, 0.00764) exprrel (365, k, 3, 0.00139, 0.00764) end end
=0.3 =0.05 C =0.1 D A B C E =0.01 D C =0.25 block block1a comp A exp(0.05) comp B exp(0.01) comp C exp(0.3) comp D exp(0.25) comp E exp(0.1) parallel threeC C C C parallel twoD D D series sys1 A B threeC twoD E end Fig. 9.1 p. 156 printing: 5 decimal places format 5 cdf cdf (block1a) expr 1-value(10; block1a) end Print 1-F(t)=R(t) at t=10
Availability Modeling See p.354 text on a user-defined distribution syntax: poly name(param-list) dist. gen\ triple\ triple of the form (aj, kj, bj ) polyunavail(mu, lambda)\ gen\ 1, 0, 0\ -mu/(lambda+mu), 0, 0\ -lambda/(lambda+mu), 0, -(lambda+mu) block block1a comp A unavail(muA, lambdaA) comp B unavail(muB, lambdaB) . . . end bindmuA 1 bindlambdaA0.0001 . To define 1-Ai(t) print steady state availability A() expr pinf(block1a) print instantaneous availability at t=100 expr 1-value(100; block1a) end
e.g. failure or M1 M2 M3 P1 P2 AND AND M1 M2 M3 P1 P2 Fault Trees P.39, chap.2 A pictorial representation of events that can cause the occurrence of an undesirable event. An event at a high level is reduced to a combination of lower level events by means of logic gates AND: when all fail, the failure event occurs. OR: when one fails, the failure event occurs. K out of n: when at least k out of n components fail, the failure event occurs.