The Square Histogram Method
We are given a histogram, with vertical bars having heights proportional to the probability with which we want to produce a value indicated by the label at the base.

A simple such histogram, layed flat, might be:

a:********************************32

b:**************************27

c:************************26

d:************12

e:***3

The idea is to cut the bars into pieces then reassemble them into a square histogram, all heights equal, with each final bar having a lower part, as well as an upper part indicating where it came from. A single uniform random variable U can then be used to choose one of the final bars and to indicate whether to use the lower or upper part. There are many ways to do this cutting and reassembling; the simplest seems to be the Robin Hood Algorithm: Take from richest to bring the poorest up to average.

STEP 1: The original (horizontal) histogram, average "height" 20:

a:******************************** 32

b:**************************       27

c:************************         26

d:************                     12

e:***                               3

Take 17 from strip 'a' to bring strip 'e' up to average. Record donor and use old 'poor' level to mark lower part of donee:

a:***************            15

b:************************** 27

c:************************   26

d:************               12

e:**|****************|       20 (a)

Then bring 'd' up to average with donor 'b'. Record donor and use old 'poor' level to mark lower part of donee:

a:***************           15

b:******************        19

c:************************  26

d:***********|*******|      20 (b)

e:*****|*************|      20 (a)

Then bring 'a' up to average with donor 'c'. Record donor and use old 'poor' level to mark lower part of donee:

a:***************|***|      20(c)

b:*******************       19

c:*********************     21

d:***********|*******|      20(b)

e:*****|*************|      20(a)

Finally, bring 'b' up to average with donor 'c'. Record donor and use old 'poor' level to mark lower part of donee:

a:**************|****|      20(c)

b:******************||      20(c)

c:*******************|      20

d:***********|*******|      20(b)

e:*****|*************|      20(a)

We now have a "squared histogram", i.e., a rectangle with 4 strips of equal area, each strip with two regions. A single uniform variate U can be used to generate a,b,c,d,e with the required probabilities, .32, .27, .26, .12 .06.

Setup: Make tables,

 

 V[1]=a    K[1]=c    T[1]=0+16/20

 V[2]=b    K[2]=c    T[2]=1+19/20

 V[3]=c    K[3]=c    T[3]=2+20/20

 V[4]=d    K[4]=b    T[4]=3+12/20

 V[5]=e    K[5]=a    T[5]=4+ 6/20

Generation Process:

Let j be the integer part of 1+5*U, with U uniform in (0,1). If U < T[j] return V[j], else return V[K[j]]. In many applications no V table is necessary: V[i]=i and the generating procedure becomes If U < T[j] return j, else return K[j].

For more, visit the Web site: Modeling & Simulation Resources.

References & Further Readings:
Aiello W., S. Rajagopalan, and R. Venkatesan, Design of practical and provably good random number generators, Journal of Algorithms, 29, 358-389, 1998.
Dagpunar J., Principles of Random Variate Generation, Clarendon, 1988.
Fishman G., Monte Carlo, Springer, 1996.
James, Fortran version of L'Ecuyer generator, Comput. Phys. Comm., 60, 329-344, 1990.
Knuth D., The Art of Computer Programming, Vol. 2, Addison-Wesley, 1998.
L'Ecuyer P., Efficient and portable combined random number generators, Comm. ACM, 31, 742-749, 774, 1988.
L'Ecuyer P., Uniform random number generation, Ann. Op. Res., 53, 77-120, 1994.
L'Ecuyer P., Random number generation. In Handbook on Simulation, J. Banks (ed.), Wiley, 1998.
Maurer U., A universal statistical test for random bit generators, J. Cryptology, 5, 89-105, 1992.
Sobol' I., and Y. Levitan, A pseudo-random number generator for personal computers, Computers & Mathematics with Applications, 37(4), 33-40, 1999.
Tsang W-W., A decision tree algorithm for squaring the histogram in random number generation, Ars Combinatoria, 23A, 291-301, 1987


Test for Randomness

We need to test for both randomness as well as uniformity. The tests can be classified in 2 categories: Empirical or statistical tests, and theoretical tests.
Theoretical tests deal with the properties of the generator used to create the realization with desired distribution, and do not look at the number generated at all. For example, we would not use a generator with poor qualities to generate random numbers.
Statistical tests are based solely on the random observations produced.

Test for Randomness:

A. Test for independence:
Plot the xi realization vs xi+1. If there is independence, the graph will not show any distinctive patterns at all, but will be perfectly scattered.

B. Runs tests.(run-ups, run-downs):
This is a direct test of the independence assumption. There are two test statistics to consider: one based on a normal approximation and another using numerical approximations.

Test based on Normal approximation:
Suppose you have N random realizations. Let a be the total number of runs in a sequence. If the number of positive and negative runs are greater than say 20, the distribution of a is reasonably approximated by a Normal distribution with mean (2N - 1) /3 and (16N - 29) / 90. Reject the hypothesis of independence or existence of runs if | Zo|
> Z(1-alpha/2) where Zo is the Z score.

C. Correlation tests:
Do the random numbers exhibit discernible correlation? Compute the sample Autcorrelation Function.

Frequency or Uniform Distribution Test:
Use Kolmogorov-Smirimov test to determine if the realizations follow a U(0,1)

References & Further Readings:
Headrick T., Fast fifth-order polynomial transforms for generating univariate and multivariate nonnormal distributions, Computational Statistics and Data Analysis, 40 (4), 685-711, 2002.
Karian Z., and E. Dudewicz, Modern Statistical Systems and GPSS Simulation, CRC Press, 1998.
Kleijnen J., and W. van Groenendaal, Simulation: A Statistical Perspective, Wiley, Chichester, 1992
Lewis P., and E. Orav, Simulation Methodology for Statisticians, Operations Analysts, and Engineers, Wadsworth Inc., 1989
Madu Ch., and Ch-H. Kuei, Experimental Statistical Designs and Analysis in Simulation Modeling, Greenwood Publishing Group, 1993.
Pang K., Z. Yang, S. Hou, and P. Leung, Non-uniform random variate generation by the vertical strip method, European Journal of Operational Research, 142(3), 595-609, 2002.
Robert C., and G. Casella, Monte Carlo Statistical Methods, Springer, 1999.


Modeling & Simulation

Simulation in general is to pretend that one deals with a real thing while really working with an imitation. In operations research the imitation is a computer model of the simulated reality. A flight simulator on a PC is also a computer model of some aspects of the flight: it shows on the screen the controls and what the "pilot" (the youngster who operates it) is supposed to see from the "cockpit" (his armchair).

Why to use models? To fly a simulator is safer and cheaper than the real airplane. For precisely this reason, models are used in industry commerce and military: it is very costly, dangerous and often impossible to make experiments with real systems. Provided that models are adequate descriptions of reality (they are valid), experimenting with them can save money, suffering and even time.

When to use simulations? Systems that change with time, such as a gas station where cars come and go (called dynamic systems) and involve randomness. Nobody can guess at exactly which time the next car should arrive at the station, are good candidates for simulation. Modeling complex dynamic systems theoretically need too many simplifications and the emerging models may not be therefore valid. Simulation does not require that many simplifying assumptions, making it the only tool even in absence of randomness.

How to simulate? Suppose we are interested in a gas station. We may describe the behavior of this system graphically by plotting the number of cars in the station; the state of the system. Every time a car arrives the graph increases by one unit while a departing car causes the graph to drop one unit. This graph (called sample path), could be obtained from observation of a real station, but could also be artificially constructed. Such artificial construction and the analysis of the resulting sample path (or more sample paths in more complex cases) consists of the simulation.

Types of simulations:

Discrete event. The above sample path consisted of only horizontal and vertical lines, as car arrivals and departures occurred at distinct points of time, what we refer to as events. Between two consecutive events, nothing happens - the graph is horizontal. When the number of events are finite, we call the simulation "discrete event."

In some systems the state changes all the time, not just at the time of some discrete events. For example, the water level in a reservoir with given in and outflows may change all the time. In such cases "continuous simulation" is more appropriate, although discrete event simulation can serve as an approximation.

Further consideration of discrete event simulations.

How is simulation performed? Simulations may be performed manually. Most often, however, the system model is written either as a computer program (for an example click here) or as some kind of input into simulator software.

System terminology:

State: A variable characterizing an attribute in the system such as level of stock in inventory or number of jobs waiting for processing.

Event: An occurrence at a point in time which may change the state of the system, such as arrival of a customer or start of work on a job.

Entity: An object that passes through the system, such as cars in an intersection or orders in a factory. Often an event (e.g., arrival) is associated with an entity (e.g., customer).

Queue: A queue is not only a physical queue of people, it can also be a task list, a buffer of finished goods waiting for transportation or any place where entities are waiting for something to happen for any reason.

Creating: Creating is causing an arrival of a new entity to the system at some point in time.

Scheduling: Scheduling is the act of assigning a new future event to an existing entity.

Random variable: A random variable is a quantity that is uncertain, such as interarrival time between two incoming flights or number of defective parts in a shipment.

Random variate: A random variate is an artificially generated random variable.

Distribution: A distribution is the mathematical law which governs the probabilistic features of a random variable.

A Simple Example:

Building a simulation gas station with a single pump served by a single service man. Assume that arrival of cars as well their service times are random. At first identify the:

states: number of cars waiting for service and number of cars served at any moment

events: arrival of cars, start of service, end of service

entities: these are the cars

queue: the queue of cars in front of the pump, waiting for service

random realizations: interarrival times, service times

distributions: we shall assume exponential distributions for both the interarrival time and service time.

Next, specify what to do at each event. The above example would look like this: At event of entity arrival: Create next arrival. If the server is free, send entity for start of service. Otherwise it joins the queue. At event of service start: Server becomes occupied. Schedule end of service for this entity. At event of service end: Server becomes free. If any entities waiting in queue: remove first entity from the queue; send it for start of service.

Some initiation is still required, for example, the creation of the first arrival. Lastly, the above is translated into code. This is easy with an appropriate library which has subroutines for creation, scheduling, proper timing of events, queue manipulations, random variate generation and statistics collection.

How to simulate? Besides the above, the program records the number of cars in the system before and after every change, together with the length of each event.


Development of Systems Simulation


Discrete event systems (DES) are dynamic systems which evolve in time by the occurrence of events at possibly irregular time intervals. DES abound in real-world applications. Examples include traffic systems, flexible manufacturing systems, computer-communications systems, production lines, coherent lifetime systems, and flow networks. Most of these systems can be modeled in terms of discrete events whose occurrence causes the system to change from one state to another. In designing, analyzing and operating such complex systems, one is interested not only in performance evaluation but also in sensitivity analysis and optimization.

A typical stochastic system has a large number of control parameters that can have a significant impact on the performance of the system. To establish a basic knowledge of the behavior of a system under variation of input parameter values and to estimate the relative importance of the input parameters, sensitivity analysis applies small changes to the nominal values of input parameters. For systems simulation, variations of the input parameter values cannot be made infinitely small. The sensitivity of the performance measure with respect to an input parameter is therefore defined as (partial) derivative.

Sensitivity analysis is concerned with evaluating sensitivities (gradients, Hessian, etc.) of performance measures with respect to parameters of interest. It provides guidance for design and operational decisions and plays a pivotal role in identifying the most significant system parameters, as well as bottleneck subsystems. I have carried out research in the fields of sensitivity analysis and stochastic optimization of discrete event systems with an emphasis on computer simulation models. This part of lecture is dedicated to the estimation of an entire response surface of complex discrete event systems (DES) from a single sample path (simulation), such as the expected waiting time of a customer in a queuing network, with respect to the controllable parameters of the system, such as service rates, buffer sizes and routing probabilities. With the response surfaces at hand, we are able to perform sensitivity analysis and optimization of a DES from a single simulation, that is, to find the optimal parameters of the system and their sensitivities (derivatives), with respect to uncontrollable system parameters, such as arrival rates in a queuing network. We identified three distinct processes. Descriptive Analysis includes: Problem Identification & Formulation, Data Collection and Analysis, Computer Simulation Model Development, Validation, Verification and Calibration, and finally Performance Evaluation. Prescriptive Analysis: Optimization or Goal Seeking. These are necessary components for Post-prescriptive Analysis: Sensitivity, and What-If Analysis. The prescriptive simulation attempts to use simulation to prescribe decisions required to obtain specified results. It is subdivided into two topics- Goal Seeking and Optimization. Recent developments on "single-run" algorithms for the needed sensitivities (i.e. gradient, Hessian, etc.) make the prescriptive simulation feasible.

Problem Formulation: Identify controllable and uncontrollable inputs. Identify constraints on the decision variables. Define measure of system performance and an objective function. Develop a preliminary model structure to interrelate the inputs and the measure of performance.