02.1 - Input Data Analysis - DATS

In stochastic simulations, we must decide on the input probability distributions from which to generate random variates:

random variates

Random variates are observations (draws, or realizations) of a random variable from the specified input distributions or processes.

Once those input probability distributions are specified, we must have a way to generate random variates from them.
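As an illustration of what generating random variates can look like, here is a minimal Python sketch using inverse-transform sampling for an exponential distribution. The notes do not name a specific generation method, so the technique, the function name `exponential_variate`, the rate value, and the seed are all assumptions chosen for the example.

```python
import math
import random

def exponential_variate(rate, rng):
    """Draw one exponential variate by inverse-transform sampling.

    If U ~ Uniform(0, 1), then X = -ln(1 - U) / rate has
    distribution function F(x) = 1 - exp(-rate * x).
    """
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate

rng = random.Random(42)  # arbitrary seed, fixed only for reproducibility
sample = [exponential_variate(2.0, rng) for _ in range(10_000)]
sample_mean = sum(sample) / len(sample)  # should be close to 1 / rate = 0.5
```

In a simulation, each call would supply one input value, for example one interarrival time.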

Specifying Univariate Input Distributions

Usually, we have real-world observed data. We want to fit a probability distribution to the observed data.

Once this is done, we can generate random variates to drive the simulation.

Choosing Probability Distributions

There are many probability distributions, continuous and discrete.

For a comprehensive list of the main distributions, visit: https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

Some common continuous distributions:

Some common discrete distributions:

Once a candidate distribution is chosen, we still need to estimate its parameters (fit) and then test the goodness of fit.
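The fit-then-test sequence can be sketched in Python as follows. This is only an illustration under stated assumptions: the exponential family, maximum-likelihood estimation, and the Kolmogorov-Smirnov distance are choices made for the example and are not prescribed by these notes. The data are synthetic stand-ins for real observations.

```python
import math
import random

def fit_exponential(data):
    """Maximum-likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

rng = random.Random(0)  # arbitrary seed
observed = [rng.expovariate(0.25) for _ in range(500)]  # synthetic data

rate_hat = fit_exponential(observed)
d_stat = ks_statistic(observed, lambda x: 1.0 - math.exp(-rate_hat * x))
```

A small `d_stat` suggests the fitted distribution is close to the data. Turning it into a formal test requires critical values, which this sketch omits.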

Fitting a distribution to data

The first step is always to observe the data. Some things to look at are:

Then, steps to take are:

With working sample:

With test sample

There are several distribution-fitting packages in R; similar functionality is also provided in MINITAB.

Other resources in R Studio:

Coefficient of variation

The coefficient of variation of a given distribution is defined as:
$$C_X = \frac{\sigma_X}{\mu_X}$$
where $\sigma_X$ is the standard deviation and $\mu_X$ is the mean of $X$.

It is a measure of how far the distribution selected for the sample data lies from the exponential distribution, for which $C_X = 1$.
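A minimal Python sketch of estimating the coefficient of variation from a sample, using the sample standard deviation and sample mean in place of $\sigma_X$ and $\mu_X$. The helper name `coefficient_of_variation`, the seed, and the two synthetic samples are assumptions made for illustration.

```python
import random
import statistics

def coefficient_of_variation(data):
    """Sample estimate of C_X: sample standard deviation over sample mean."""
    return statistics.stdev(data) / statistics.mean(data)

rng = random.Random(1)  # arbitrary seed
expo = [rng.expovariate(1.0) for _ in range(20_000)]   # exponential sample
unif = [rng.uniform(0.0, 1.0) for _ in range(20_000)]  # Uniform(0, 1) sample

cv_expo = coefficient_of_variation(expo)  # theoretical value: 1
cv_unif = coefficient_of_variation(unif)  # theoretical value: 1/sqrt(3), about 0.577
```

An estimate well below 1, as for the uniform sample, indicates less variability relative to the mean than an exponential distribution would have.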

In practice:

Descriptive statistics methods in R

Auxiliary Graphic Tools

Probability Plot - P Plot

probability plot

A Probability Plot is a graphical comparison of an estimate of the true distribution function of the available data $X_1, X_2, \ldots, X_n$ with the distribution function of the fitted distribution.

Probability-Probability Plot

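A Probability-Probability Plot (PP Plot) is a graph of the model probability $\hat{F}(X_{(i)})$ versus the sample probability
$$\tilde{F}_n(X_{(i)}) = q_i = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$$
where $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ are the ordered sample data.

If $\hat{F}(x)$ and $\tilde{F}_n(x)$ are close together, then the PP plot will be approximately linear with intercept 0 and slope 1.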

The linear correlation coefficient of the points in the PP plot is a measure of the goodness of fit of the proposed distribution: the closer it is to 1, the better the fit.

The PP plot graphs two cumulative distribution functions against each other: the theoretical on the x-axis, and the empirical on the y-axis.
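The PP plot construction can be sketched in Python as below: it computes the plotted points and their linear correlation coefficient, without drawing anything. The exponential model, the helper names `pp_points` and `pearson_r`, the seed, and the synthetic data are assumptions made for the example.

```python
import math
import random

def pp_points(data, model_cdf):
    """Return (model probabilities, sample probabilities) for a PP plot."""
    xs = sorted(data)
    n = len(xs)
    model_probs = [model_cdf(x) for x in xs]                 # fitted CDF at ordered data
    sample_probs = [(i - 0.5) / n for i in range(1, n + 1)]  # q_i = (i - 0.5) / n
    return model_probs, sample_probs

def pearson_r(a, b):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(3)  # arbitrary seed
data = [rng.expovariate(0.5) for _ in range(500)]  # synthetic data
rate_hat = len(data) / sum(data)                   # fitted exponential rate

model_probs, sample_probs = pp_points(
    data, lambda x: 1.0 - math.exp(-rate_hat * x)
)
r = pearson_r(model_probs, sample_probs)  # near 1 when the fit is good
```

Plotting `model_probs` on the x-axis against `sample_probs` on the y-axis gives the PP plot itself.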

Quantile-Quantile Plot - QQ Plot

The QQ plot is used to see how well a particular data sample follows a particular theoretical distribution.

quantile-quantile plot (qq plot)

A Quantile-Quantile Plot (QQ Plot) is a graph of the standard model quantiles $\hat{F}^{-1}(q_i)$, where
$$q_i = \frac{i - 0.5}{n}, \quad i = 1, 2, \ldots, n$$
versus $x_{(i)}$, $i = 1, 2, \ldots, n$, where $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$ is the ordered sample data.

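If $\hat{F}^{-1}(q_i)$ and $x_{(i)}$ are close together, then the QQ plot will also be approximately linear. When standard model quantiles are used, the intercept estimates the location and the slope estimates the scale of the fitted distribution. The linearity of the QQ plot is likewise a measure of the goodness of fit of the proposed distribution.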
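A minimal Python sketch of the QQ plot construction, assuming a normal model so that the standard quantiles come from `statistics.NormalDist().inv_cdf`. The helper names, the seed, and the synthetic data (location 10, scale 2) are assumptions for illustration. A least-squares line through the points is used here to read off the intercept and slope.

```python
import random
from statistics import NormalDist

def qq_points(data):
    """Return (standard normal quantiles, ordered sample) for a QQ plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    model_q = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return model_q, xs

def least_squares_line(x, y):
    """Intercept and slope of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(7)  # arbitrary seed
data = [rng.gauss(10.0, 2.0) for _ in range(1000)]  # synthetic normal data

model_q, ordered = qq_points(data)
intercept, slope = least_squares_line(model_q, ordered)
# intercept should be near the location (10) and slope near the scale (2)
```

For a different model family, only the standard quantile function inside `qq_points` would change.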