The Square Histogram Method
We are given a histogram, with vertical bars having heights proportional to the
probability with which we want to produce a value indicated by the label at the
base.
A simple such histogram, laid flat, might be:
a:******************************** 32
b:*************************** 27
c:************************** 26
d:************ 12
e:*** 3
The idea is to cut the bars into
pieces, then reassemble them into a square histogram with all heights equal,
each final bar having a lower part, as well as an upper part indicating where
it came from. A single uniform random variable U can then be used to choose one
of the final bars and to indicate whether to use the lower or upper part. There
are many ways to do this cutting and reassembling; the simplest seems to be the
Robin Hood Algorithm: take from the richest to bring the poorest up to average.
STEP 1: The original (horizontal) histogram, average "height" 20:
a:******************************** 32
b:*************************** 27
c:************************** 26
d:************ 12
e:*** 3
Take 17 from strip 'a' to bring
strip 'e' up to average. Record donor and use old 'poor' level to mark lower
part of donee:
a:*************** 15
b:*************************** 27
c:************************** 26
d:************ 12
e:**|****************| 20 (a)
Then bring 'd'
up to average with donor 'b'. Record donor and use old 'poor' level to mark
lower part of donee:
a:*************** 15
b:******************* 19
c:************************** 26
d:***********|*******| 20 (b)
e:**|****************| 20 (a)
Then bring 'a' up to average with
donor 'c'. Record donor and use old 'poor' level to mark lower part of donee:
a:**************|****| 20 (c)
b:******************* 19
c:********************* 21
d:***********|*******| 20 (b)
e:**|****************| 20 (a)
Finally, bring 'b' up to average
with donor 'c'. Record donor and use old 'poor' level to mark lower part of donee:
a:**************|****| 20 (c)
b:******************|| 20 (c)
c:*******************| 20
d:***********|*******| 20 (b)
e:**|****************| 20 (a)
We now have a "squared
histogram", i.e., a rectangle with 5 strips of equal area, each strip with
two regions. A single uniform variate U can be used
to generate a, b, c, d, e with the required
probabilities: .32, .27, .26, .12, .03.
Setup: Make tables,
V[1]=a  K[1]=c  T[1]=0+15/20
V[2]=b  K[2]=c  T[2]=1+19/20
V[3]=c  K[3]=c  T[3]=2+20/20
V[4]=d  K[4]=b  T[4]=3+12/20
V[5]=e  K[5]=a  T[5]=4+ 3/20
Generation Process:
Let j be the integer part of 1+5*U, with U uniform in (0,1). If 5*U <
T[j] return V[j], else return V[K[j]]. In many
applications no V table is necessary: V[i]=i and the generating procedure
becomes: if 5*U < T[j] return j, else return K[j].
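As an illustration, here is a minimal Python sketch of both the Robin Hood setup and the generation step; the function names and the 0-based indexing are choices made here for illustration, not part of the algorithm statement above:

    import random

    def robin_hood_setup(p):
        # Scale heights so the average is 1.0 (the "20" in the example above).
        n = len(p)
        h = [n * q for q in p]
        K = list(range(n))       # donor (alias) table
        T = [1.0] * n            # cut level of each strip, as a fraction
        active = set(range(n))   # strips not yet squared
        while len(active) > 1:
            poor = min(active, key=lambda i: h[i])
            rich = max(active, key=lambda i: h[i])
            K[poor] = rich             # record the donor
            T[poor] = h[poor]          # old 'poor' level marks the lower part
            h[rich] -= 1.0 - h[poor]   # take just enough to reach the average
            active.remove(poor)        # strip 'poor' is now squared
        return T, K

    def draw(T, K):
        # A single uniform U picks the strip and the lower/upper region in it.
        n = len(T)
        u = n * random.random()
        j = int(u)                     # strip index (0-based)
        return j if u - j < T[j] else K[j]

    # The example above: a, b, c, d, e with probabilities .32, .27, .26, .12, .03
    T, K = robin_hood_setup([0.32, 0.27, 0.26, 0.12, 0.03])
    # T is (up to floating point) [0.75, 0.95, 1.0, 0.6, 0.15],
    # i.e., 15/20, 19/20, 20/20, 12/20, 3/20, and K is [2, 2, 2, 1, 0],
    # i.e., donors c, c, c, b, a: the same tables as in the Setup above.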
For more, visit the Web site: Modeling
& Simulation Resources.
Test for Randomness and Uniformity
We need to test for both randomness and uniformity. The tests can be classified
into two categories: empirical (statistical) tests, and theoretical tests.
Theoretical tests deal with the properties of the generator used to create the
realizations, and do not look at the generated numbers at all; for example, a
generator whose structure is known to be poor would be rejected without
examining its output.
Statistical tests are based solely on the random observations produced.
Test for Randomness:
A. Test for independence:
Plot the realization x(i) versus x(i+1).
If the values are independent, the graph will show no distinctive pattern at
all; the points will appear uniformly scattered.
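Such a plot is a one-liner in Python (matplotlib assumed available; here the x values are themselves pseudo-random, purely for illustration):

    import random
    import matplotlib.pyplot as plt

    x = [random.random() for _ in range(1000)]
    plt.scatter(x[:-1], x[1:], s=4)   # x(i) on one axis, x(i+1) on the other
    plt.xlabel("x(i)")
    plt.ylabel("x(i+1)")
    plt.show()                        # under independence: no visible pattern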
B. Runs tests (run-ups, run-downs):
This is a direct test of the independence assumption. There are two test
statistics to consider: one based on a normal approximation and another using
numerical approximations.
Test based on the normal approximation:
Suppose you have N random realizations. Let a be the
total number of runs in the sequence. If the numbers of run-ups and run-downs
are each greater than, say, 20, the distribution of a is
reasonably approximated by a Normal distribution with mean (2N - 1)/3 and
variance (16N - 29)/90. Compute the score
Z0 = [a - (2N - 1)/3] / sqrt[(16N - 29)/90]
and reject the hypothesis of independence if |Z0| > Z(1-alpha/2).
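A sketch of this runs-up-and-down test in Python (ties between consecutive values are counted as descents here, which is harmless for continuous data):

    import math

    def runs_test(x, z_crit=1.96):       # 1.96 = Z(1-alpha/2) for alpha = .05
        n = len(x)
        # Signs of successive differences; a new run starts at each sign change.
        signs = [1 if b > a else -1 for a, b in zip(x, x[1:])]
        a_runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
        mean = (2 * n - 1) / 3
        var = (16 * n - 29) / 90
        z0 = (a_runs - mean) / math.sqrt(var)
        return z0, abs(z0) > z_crit      # True means reject independence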
C. Correlation tests:
Do the random numbers exhibit discernible correlation? Compute the sample autocorrelation function.
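For example, the lag-k sample autocorrelation can be computed directly; for independent uniforms it should be near zero (roughly within 2/sqrt(N) of it):

    def autocorrelation(x, k):
        # Sample autocorrelation of the sequence x at lag k (1 <= k < len(x)).
        n = len(x)
        m = sum(x) / n
        var = sum((v - m) ** 2 for v in x)
        return sum((x[i] - m) * (x[i + k] - m) for i in range(n - k)) / var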
Frequency or Uniform Distribution Test:
Use the Kolmogorov-Smirnov test to determine whether the realizations
follow a U(0,1) distribution.
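With SciPy this is a one-liner; a small p-value (say, below 0.05) leads to rejecting uniformity:

    import random
    from scipy import stats

    x = [random.random() for _ in range(1000)]
    d, p_value = stats.kstest(x, "uniform")   # H0: x ~ U(0,1)
    print(d, p_value)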
Simulation in general is to
pretend that one deals with a real thing while really working with an
imitation. In operations research the imitation is a computer model of the
simulated reality. A flight simulator on a PC is also a computer model of some
aspects of the flight: it shows on the screen the controls and what the
"pilot" (the youngster who operates it) is supposed to see from the
"cockpit" (his armchair).
Why use models? Flying a simulator is safer and cheaper than flying the real
airplane. For precisely this reason, models are used in industry, commerce, and
the military: it is very costly, dangerous, and often impossible to carry out
experiments with real systems. Provided that models are adequate descriptions
of reality (that they are valid), experimenting with them can save money,
suffering, and even time.
When to use simulations? Systems that change with time, such as a
gas station where cars come and go (called dynamic systems), and that involve
randomness (nobody can predict exactly when the next car will
arrive at the station) are good candidates for simulation. Modeling such
complex dynamic systems theoretically requires too many simplifications, and the
emerging models may therefore not be valid. Simulation does not require that many
simplifying assumptions, making it the only tool available even in the absence
of randomness.
How to simulate? Suppose we are interested in a gas station. We may
describe the behavior of this system graphically by plotting the number of cars
in the station: the state of the system. Every time a car arrives the graph
increases by one unit, while a departing car causes the graph to drop by one unit.
This graph, called a sample path, could be obtained from observation of a real
station, but it could also be constructed artificially. Such artificial
construction, together with the analysis of the resulting sample path (or of more
sample paths in more complex cases), constitutes the simulation.
Types of simulations:
Discrete event. The above sample path consisted of only horizontal and
vertical lines, since car arrivals and departures occur at distinct points in
time, which we refer to as events. Between two consecutive events, nothing
happens: the graph is horizontal. When the number of events
is finite, we call the simulation "discrete event."
In some systems the state changes
all the time, not just at the times of some discrete events. For example, the
water level in a reservoir with given inflows and outflows may change all the
time. In such cases "continuous simulation" is more appropriate, although
discrete event simulation can serve as an approximation.
Further consideration of discrete event simulations.
How is simulation performed? Simulations may be performed manually. Most often,
however, the system model is written either as a computer program or as some
kind of input into simulator software.
System terminology:
State: A variable characterizing
an attribute in the system such as level of stock in inventory or number of
jobs waiting for processing.
Event: An occurrence at a point in
time which may change the state of the system, such as arrival of a customer or
start of work on a job.
Entity: An object that passes
through the system, such as cars in an intersection or orders in a factory.
Often an event (e.g., arrival) is associated with an entity (e.g., customer).
Queue: A queue is not only a
physical queue of people; it can also be a task list, a buffer of finished
goods waiting for transportation, or any place where entities wait for
something to happen, for any reason.
Creating:
Creating is causing an arrival of a new entity to the system at some point in
time.
Scheduling:
Scheduling is the act of assigning a new future event to an existing entity.
Random variable: A random variable
is a quantity that is uncertain, such as interarrival
time between two incoming flights or number of defective parts in a shipment.
Random variate:
A random variate is an artificially generated random
variable.
Distribution: A distribution is
the mathematical law which governs the probabilistic features of a random
variable.
A Simple Example:
Build a simulation model of a gas station with a single pump served by a
single service man. Assume that the arrivals of cars as well as their service
times are random. First, identify the:
states: number of cars waiting for service and number of cars
served at any moment
events: arrival of cars, start of service, end of service
entities: these are the cars
queue: the queue of cars in front of the pump, waiting for
service
random realizations: interarrival
times, service times
distributions: we shall assume exponential distributions for both the interarrival time and service time.
Next, specify what to do at each event. For the above example:
At the event of entity arrival: create the next arrival; if the server is
free, send the entity for start of service; otherwise it joins the queue.
At the event of service start: the server becomes occupied; schedule the end
of service for this entity.
At the event of service end: the server becomes free; if any entities are
waiting in the queue, remove the first entity from the queue and send it for
start of service.
Some initialization is still required,
for example, the creation of the first arrival. Lastly, the above is translated
into code, as sketched below. This is easy with an appropriate library which has
subroutines for creation, scheduling, proper timing of events, queue
manipulations, random variate generation, and statistics collection.
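A minimal sketch of such a program in Python, using only the standard library. The rates, the run length, and the merging of "start of service" into the arrival and departure routines are choices made here for brevity, not part of the model statement above:

    import heapq
    import random

    def gas_station(sim_time=1000.0, arrival_rate=1.0, service_rate=1.2, seed=1):
        rng = random.Random(seed)
        events = [(rng.expovariate(arrival_rate), "arrival")]  # event list
        queue = []                 # arrival times of waiting cars
        busy = False
        served, total_wait = 0, 0.0
        while events:
            now, kind = heapq.heappop(events)
            if now > sim_time:
                break
            if kind == "arrival":
                # Create the next arrival.
                heapq.heappush(events,
                               (now + rng.expovariate(arrival_rate), "arrival"))
                if not busy:       # start of service: schedule its end
                    busy = True
                    heapq.heappush(events,
                                   (now + rng.expovariate(service_rate), "departure"))
                else:              # server occupied: join the queue
                    queue.append(now)
            else:                  # end of service: server becomes free
                busy = False
                served += 1
                if queue:          # remove first car; send it for start of service
                    total_wait += now - queue.pop(0)
                    busy = True
                    heapq.heappush(events,
                                   (now + rng.expovariate(service_rate), "departure"))
        return served, (total_wait / served if served else 0.0)

    served, avg_wait = gas_station()
    print(served, avg_wait)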
How to simulate? Besides the
above, the program records the number of cars in the system before and after
every change, together with the time at which each event occurs.
Development of Systems Simulation
Discrete event systems (DES) are dynamic systems which evolve in time by the
occurrence of events at possibly irregular time intervals. DES abound in real-world applications. Examples include traffic
systems, flexible manufacturing systems, computer-communications systems,
production lines, coherent lifetime systems, and flow networks. Most of these
systems can be modeled in terms of discrete events whose occurrence causes the
system to change from one state to another. In designing, analyzing and
operating such complex systems, one is interested not only in performance
evaluation but also in sensitivity analysis and optimization.
A typical stochastic system has a
large number of control parameters that can have a significant impact on the
performance of the system. To establish a basic knowledge of the behavior of a
system under variation of input parameter values, and to estimate the relative
importance of the input parameters, sensitivity analysis applies small changes
to the nominal values of the input parameters. For systems simulation,
variations of the input parameter values cannot be made infinitely small; the
sensitivity of the performance measure with respect to an input parameter,
defined as a (partial) derivative, must therefore be estimated, for example by
a finite difference.
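For instance, reusing the hypothetical gas_station function sketched earlier, a central finite difference with common random numbers (the same seed on both runs, so the difference is not swamped by sampling noise) gives a crude single-parameter sensitivity estimate:

    def wait_sensitivity(rate, h=0.05, seed=7):
        # d(average wait)/d(service rate), estimated by central differences
        _, w_plus = gas_station(service_rate=rate + h, seed=seed)
        _, w_minus = gas_station(service_rate=rate - h, seed=seed)
        return (w_plus - w_minus) / (2 * h)

    print(wait_sensitivity(1.2))   # typically negative: faster service, less waiting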
Sensitivity
analysis is concerned with evaluating sensitivities (gradients, Hessian, etc.)
of performance measures with respect to parameters of interest. It provides
guidance for design and operational decisions and plays a pivotal role in
identifying the most significant system parameters, as well as bottleneck
subsystems. I have carried out research in the fields of sensitivity analysis
and stochastic optimization of discrete event systems with an emphasis on
computer simulation models. This part of the lecture is dedicated to the estimation
of an entire response surface of complex discrete event systems (DES) from a
single sample path (simulation), such as the expected waiting time of a
customer in a queuing network, with respect to the controllable parameters of
the system, such as service rates, buffer sizes and routing probabilities. With
the response surfaces at hand, we are able to perform sensitivity analysis and
optimization of a DES from a single simulation, that is, to find the optimal
parameters of the system and their sensitivities (derivatives), with respect to
uncontrollable system parameters, such as arrival rates in a queuing network.
We identified three distinct processes. Descriptive Analysis includes: Problem
Identification & Formulation, Data Collection and Analysis, Computer
Simulation Model Development, Validation, Verification and Calibration, and
finally Performance Evaluation. Prescriptive Analysis covers Optimization or
Goal Seeking. Both are necessary components for Post-prescriptive Analysis:
Sensitivity and What-If Analysis. Prescriptive simulation attempts to use
simulation to prescribe the decisions required to obtain specified results. It is
subdivided into two topics: Goal Seeking and Optimization. Recent developments
in "single-run" algorithms for the needed sensitivities (i.e.,
gradient, Hessian, etc.) make prescriptive simulation feasible.
Problem Formulation: Identify the controllable and uncontrollable inputs. Identify
constraints on the decision variables. Define the measure of system performance and
an objective function. Develop a preliminary model structure to interrelate the
inputs and the measure of performance.