Documentation for Maple Statistics Package

Holy Cross Mathematics and Computer Science

MATH 375-6, Probability and Statistics Maple Package

Downloading the package

The Maple package is contained in a single file called MSP.map. These routines can be used on any computer with Maple 7 or later installed. This includes the Sun lab machines in Swords 219, and the PC lab machines in Beaven 335D. To get a copy for yourself, click on the link for the package from the course homepage in your web browser. This will display the Maple source code as text. From the FILE menu, select SAVE AS, and supply the path and filename you want for your copy.

If you are at one of the Beaven 335D PCs and plan to use the package there, save the file to your campus network P: drive.
If you are on a SunRay in Swords 219 and plan to use the package there, save the file to your Sun network account.

Reading the package into Maple

Suppose you have saved the package in a file called StatPac.txt in your top-level directory (folder) on either the campus network or the department Sun network. After you have logged on and launched Maple, to use these procedures, you will need to execute a command of the form:

read "p:StatPac.txt"; on the campus network (HA 408)
read "StatPac.txt"; on the Sun network (SW 219)

If you are on the Sun network and you haven't saved the package, you can also load it directly from Prof. Little's web directory by entering:

read "http://mathcs.holycross.edu/~little/ProbStat0506/MSP.map";

Unfortunately, this does not appear to work from the PCs on the campus Novell network.

Some relevant Maple information

Many Maple commands compute values and assign the results to a name. The format for the Maple command that does this is Name := expression. The := is called the assignment operator. The expression on the right is evaluated, and the result is assigned to the Name on the left.

Many of the procedures in the package either take as input, and/or produce as output, lists of numbers. A Maple list is an ordered list, enclosed in square brackets ( [ , ] ), with the items separated by commas. For instance, [2.3,2.4,5.6,1.2,0.9] is a Maple list with 5 entries. A list can be treated as a single object, assigned to a variable (to give the list a "name"), etc. For instance the Maple command XList:=[2.3,2.4,5.6,1.2,0.9]; assigns the list from before to the "name" XList. To refer to that list then, you can just use the name XList.

The items within a list can be accessed using subscript notation. For example, XList[3] is the third item in the list, the number 5.6.

The number of items in a list can be determined with the built-in Maple function nops. For instance, if we executed the command nops(XList);, Maple would print the output: 5.
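
The list operations described above can be tried together in a short session. This is only a sketch using the XList from the text; the comments show the values stated above.

```maple
XList := [2.3, 2.4, 5.6, 1.2, 0.9];   # assign the list to the name XList
XList[3];                              # third entry: 5.6
nops(XList);                           # number of entries: 5
XList[nops(XList)];                    # last entry: 0.9
```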

For operations involving some or all of the items in a list, the Maple for loop structure is extremely useful. This loop is similar to the count-controlled loops provided in almost all programming languages (like BASIC, Pascal, C++, etc.). The syntax is

for "counter" from "start" to "finish" by "increment" do "body of loop" end do;

The words for, from, to, by, do, end do are Maple "reserved words"; they have particular fixed meanings and must appear exactly as shown.
The counter is a Maple variable (it can be named anything you want).
start, finish, and increment can be fixed numbers or Maple expressions that evaluate to fixed numbers. For simplicity in the following explanation, we will assume start <= finish and increment > 0. That is not necessary, though.
The "body of loop" can be any sequence of Maple commands.
The way the loop works is this: the counter is set to the value given by start; if counter <= finish, then the commands in the body of the loop are executed (once), and the value of counter is increased by increment. Then the test counter <= finish is repeated, and if it is true the commands in the body of the loop are executed (once more). The loop continues until counter exceeds finish.
from "start" is optional. If you don't specify start, the value start = 1 is assumed.
by "increment" is also optional. If you don't specify that, the value increment = 1 is assumed.
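
As a small illustration of the rules above, here is a sketch of a loop that uses all of the optional parts. The counter name k is arbitrary.

```maple
# k takes the values 1, 3, 5, 7, 9; the loop stops once k exceeds 9
for k from 1 to 9 by 2 do
    print(k^2);    # prints 1, 9, 25, 49, 81 in turn
end do;
```

Leaving out "from 1" or replacing "by 2" with nothing would fall back on the defaults described above (start = 1, increment = 1).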

In other cases, we may want to select out some of the entries in a list based on some criterion and only do an action on those. The if structure does this:

if "condition" then "action1" else "action2" end if;
This represents a simple "two-way branch". If the "condition" is true when this statement is reached in the execution of the program, the "action1" is performed. If it is false then the "action2" is performed. In either case, the execution continues from the statement after the end if. The else "action2" clause is optional here. If it is omitted, the result is to do "action1" if the "condition" is true, and to do nothing if it is false.
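
A minimal sketch of the two-way branch, with an arbitrary value chosen for x just for illustration:

```maple
x := -1.5;
if x > 0 then
    print("positive");
else
    print("not positive");    # this branch runs, since -1.5 > 0 is false
end if;
```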

For example, suppose we wanted to find the sum of the strictly positive entries in XList. We could say:

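Total:=0; for i to 5 do if XList[i] > 0 then Total:=Total+XList[i]; end if; end do;
When the loop is complete, the variable Total will contain the sum of the positive entries. (We use the name Total rather than Sum because Sum is a protected name in Maple, reserved for inert summation, and cannot be assigned to.) If we didn't know how many entries XList had, but we knew we wanted to sum them all, we could say
Total:=0; for i to nops(XList) do Total:=Total+XList[i]; end do;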
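
The two ideas above (looping over a list of unknown length and testing each entry) can be combined into a procedure. This is a sketch only; the name PositiveSum is made up for illustration and is not part of the package.

```maple
# PositiveSum is a hypothetical helper, not a procedure from the package
PositiveSum := proc(L)
    local total, i;
    total := 0;
    for i from 1 to nops(L) do      # nops(L) handles a list of any length
        if L[i] > 0 then            # skip zero and negative entries
            total := total + L[i];
        end if;
    end do;
    total;                          # the last value computed is returned
end proc;

PositiveSum([2.3, -2.4, 5.6, -1.2, 0.9]);    # 2.3 + 5.6 + 0.9 = 8.8
```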

Many other features of Maple programming are illustrated by the procedures in the package. If you're interested, take a look. I will be happy to explain anything you are curious about.

The procedures currently in the package

Some first descriptive statistics
1. Range -- computes the range of a list of numbers (maximum minus minimum)
2. Mean -- computes the mean of a list
3. Variance -- computes the variance of a list
4. StandardDeviation -- computes the standard deviation of a list
5. Skewness -- computes the normalized 3rd moment about the mean
6. Kurtosis -- computes the normalized 4th moment about the mean
  
  The first six procedures all work the same way. To use one, you put the name of the procedure in a Maple command, followed by the list of numbers you want to apply it to (or the name of the list), in parentheses. Usage example: Variance(XList);. The output will be the corresponding statistic for the input list.
7. Percentile -- computes the 100*pth percentile of the data in a list of numbers. The input is the name of the list, followed by the value of p (a number between 0 and 1). Usage example: Percentile(XList,.75); computes the 75th percentile value for the data.
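
A short session using the descriptive statistics might look like the following sketch. It assumes the package was saved as StatPac.txt, as in the section on reading the package into Maple. The values in the comments come from the definitions of range and mean, not from running the package.

```maple
read "StatPac.txt";    # assumes the package was saved under this name
XList := [2.3, 2.4, 5.6, 1.2, 0.9];
Range(XList);          # maximum minus minimum: 5.6 - 0.9 = 4.7
Mean(XList);           # (2.3 + 2.4 + 5.6 + 1.2 + 0.9)/5 = 2.48
Variance(XList);
StandardDeviation(XList);
Percentile(XList, .5); # p = .5 asks for the 50th percentile
```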
Random numbers and samples from given distributions
1. RandomNumbers -- input a positive integer n; output a list of n uniformly distributed (pseudo-)random numbers in the interval [0,1]. Usage example: RandomNumbers(10);
2. DieRoll -- input two positive integers n=number of rolls,m=number of faces on die; output a list of numbers in set {1,..,m} representing n rolls of a fair m-sided die. Usage example: DieRoll(10,6); gives 10 rolls of a standard 6-sided die.
3. FSample -- A general procedure for generating random samples from a distribution with given probability density function (PDF), f(y), supported on a finite interval a <= y <= b. Input is the cumulative distribution function (CDF), F(y), corresponding to f(y), the endpoints a,b and the size n of the sample. Usage example: FSample(F,1,4,100); generates a sample of size 100 from the distribution with CDF F(y) on the interval 1 <= y <= 4. (Caveats: No checking is done to see whether F is in fact a valid CDF. In addition, the CDF must be invertible on the interval for this procedure to work correctly -- that is, for each z with 0 <= z <= 1, there should be only one solution of the equation F(y) = z in the interval a <= y <= b.)
4. UniformSample -- input the endpoints a,b of an interval on the real line, and a number n of points. Output will be a sample of size n from the uniform distribution on [a,b]. Usage example: UniformSample(1,4,100); gives a sample of 100 points from the uniform distribution on [1,4].
5. NormalSample -- input the mean and standard deviation of a normal (Gaussian) distribution, and a number of points n. Output will be a sample of size n from the normal distribution. Usage example: NormalSample(0,1,100); gives a sample of 100 points from the normal distribution with mean 0 and standard deviation 1.
6. ExponentialSample -- similar to NormalSample, input is the parameter lambda of the exponential distribution and n. Usage example: ExponentialSample(2,100); gives a sample of 100 points from the exponential distribution with parameter lambda=2.
7. ChiSquareSample -- similar to NormalSample, input is the number of degrees of freedom, nu, of the chi-square distribution, and the number n. Usage example: ChiSquareSample(2,100); gives a sample of 100 points from the chi-square distribution with 2 degrees of freedom.
8. GammaSample -- similar to NormalSample, input is the parameters alpha and beta of the gamma distribution, and the number n. Usage example: GammaSample(2,4,100); gives a sample of 100 points from the gamma distribution with alpha = 2 and beta = 4.
9. HypergeometricSample -- input the parameters N, n, r of the hypergeometric distribution (in that order), followed by a number of samples. Usage example: HypergeometricSample(20,6,8,1000); gives a sample of 1000 points from the hypergeometric distribution with parameters N=20,n=6,r=8. The output is a list of integers in the range 0..n, though in case r < n, no values bigger than r will be generated(!)
10. Frequencies -- input a list X, endpoints of an interval a,b, and a number n of intervals. The interval [a,b] is divided into n equal pieces and the number of entries from the list X in each is counted. Output is the list of frequencies. Usage example: Frequencies(XList,0,2,10); subdivides [0,2] into 10 equal subintervals and counts frequencies.
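
To illustrate FSample together with Frequencies, here is a sketch for the density f(y) = 2y/15 on 1 <= y <= 4, whose CDF is F(y) = (y^2 - 1)/15. The density is an arbitrary choice for illustration, and passing F as a Maple arrow procedure is an assumption about what FSample expects.

```maple
# CDF of f(y) = 2*y/15 on [1,4]; note F(1) = 0, F(4) = 1, and F is
# increasing there, so it is invertible on the interval as FSample requires
F := y -> (y^2 - 1)/15;

S := FSample(F, 1, 4, 100):    # colon suppresses printing of the 100 values
Mean(S);                       # theoretical mean of this distribution is 2.8
Frequencies(S, 1, 4, 6);       # counts in 6 equal subintervals of [1,4]
```

Since f(y) increases with y, the counts should tend to grow from the first subinterval to the last, though any single sample will vary.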
PDF's (probability density functions) and CDF's (cumulative distribution functions)

The following PDF's and CDF's are currently implemented:

BetaPDF, BinomialPDF, ChiSquarePDF, ExponentialPDF, FPDF, GammaPDF, HypergeometricPDF, PoissonPDF, TPDF, UniformPDF, WeibullPDF, BetaCDF, BinomialCDF, ChiSquareCDF, ExponentialCDF, FCDF, GammaCDF, HypergeometricCDF, PoissonCDF, TCDF, UniformCDF.

Each takes one or more inputs: the parameters of the corresponding distribution always come first, and the independent variable comes last. For example, the inputs for ChiSquarePDF are the number of degrees of freedom, nu, and the independent variable x. A call to that procedure like ChiSquarePDF(4,3.4) gives the value of the density function with nu = 4 at x = 3.4.

If you want to plot one of the PDF's or CDF's, use the following method. Usage example: plot(x -> NormalPDF(0,2,x),-4..4); will generate a plot of the normal density function with mean = 0, standard deviation = 2, on the interval -4..4.
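
For instance, the chi-square procedures could be used as in the sketch below. The value 9.488 is the standard table value for the 95th percentile of the chi-square distribution with 4 degrees of freedom, so the CDF call should return a number close to 0.95.

```maple
ChiSquarePDF(4, 3.4);                    # density with nu = 4 at x = 3.4
ChiSquareCDF(4, 9.488);                  # P(X <= 9.488); should be close to 0.95
plot(x -> ChiSquareCDF(4, x), 0..15);    # graph of the CDF on [0,15]
```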
Graphical routines
1. Hist -- plots a relative frequency histogram of the data in a list, on a given interval, with a given number of "bins". Usage example: Hist(XList,0,4,7); generates the histogram for XList on [0,4] with 7 equal "bins" (subdivisions). Note: if some of the data points are outside the interval, a warning is generated and only the points in the interval are used.
2. NormHist -- same idea as Hist, except heights of boxes in the histogram are scaled so that total area is 1 (for comparison with a theoretical or empirical pdf). Usage example: NormHist(XList,0,4,7); generates the normalized histogram for XList on [0,4] with 7 equal "bins" (subdivisions). Note: if some of the data points are outside the interval, a warning is generated and only the points in the interval are used.
3. ScatterPlot -- generates a scatterplot (point plot) for the data points represented by two input lists -- the first is the list of x-coordinates (abscissas), the second is the list of y-coordinates (ordinates). Usage example: ScatterPlot(XList,YList). Note: The plotting window is determined automatically and will always be large enough to show all the given points.
4. PlotEmpiricalPDF -- generates an approximation to the density function for a given input list X of numbers. (This is essentially the same as the relative frequency histogram, but scaled vertically so the total area under the graph is 1.) Usage example: PlotEmpiricalPDF(XList); The endpoints of the interval plotted and the number of subintervals can also be specified: PlotEmpiricalPDF(XList,-3,3,20); uses the interval [-3,3] and 20 "bins" for the histogram.
5. PlotEmpiricalCDF -- generates an approximation to the cumulative distribution function for a given input list X of numbers, on the interval [a,b]. Usage example: PlotEmpiricalCDF(XList,-3,3);
6. BoxWhisker -- generates a "box-and-whisker" plot for one or more lists. For each list, a graphical display is generated showing a thicker central "box" with vertical line segments drawn at the 25th, 50th, and 75th percentile values, and thin "whiskers" extending past the box to the minimum and the maximum of the data values. If more than one list is given as input, the box-and-whisker plots will be stacked vertically in the plot, with the first list at the bottom, etc. Usage example: BoxWhisker(XList,YList,ZList); would generate three stacked box-and-whisker plots.
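
The graphical routines combine naturally with the sampling procedures. Here is a sketch using two simulated normal samples; the sample sizes, intervals, and bin counts are arbitrary choices for illustration.

```maple
X := NormalSample(0, 1, 200):    # 200 points, mean 0, standard deviation 1
Y := NormalSample(1, 2, 200):    # 200 points, mean 1, standard deviation 2

NormHist(X, -4, 4, 16);          # normalized histogram of X with 16 bins
PlotEmpiricalCDF(X, -4, 4);      # empirical CDF of X on [-4,4]
BoxWhisker(X, Y);                # X at the bottom, Y stacked above it
ScatterPlot(X, Y);               # the two lists must have the same length
```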
Confidence Intervals
1. MeanLSCI -- computes the endpoints of an approximate large-sample (1-alpha) x 100% confidence interval for the mean of a population, based on the sample mean of the data supplied. Usage example: MeanLSCI(XList,.05); finds a 95% confidence interval for the mean.
2. CIPlot -- generates 100 random samples of size N from a normal population with mean μ and standard deviation σ. From each sample, the large-sample (1-α) x 100% confidence interval for the mean is computed, and the intervals are displayed relative to the population mean (stacked vertically, with intervals containing μ colored black, and intervals not containing μ colored red). The number of intervals containing the population mean μ is also computed and printed out. Usage example: CIPlot(10,3,.05,40); generates 100 95% confidence intervals for the mean, each using a sample of size N = 40 from a normal population with μ=10 and σ = 3.
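
For comparison, the usual textbook large-sample interval, sample mean plus or minus z times s/sqrt(n), can be computed by hand with the other procedures in the package. This is a sketch: whether MeanLSCI uses exactly this formula (for example, the same definition of the standard deviation) is an assumption, so the two answers may differ slightly.

```maple
X := NormalSample(10, 3, 40):    # simulated data: mu = 10, sigma = 3, n = 40
n := nops(X):
xbar := Mean(X):
s := StandardDeviation(X):
z := 1.96:                       # approximate upper .025 point of the standard normal

# textbook 95% large-sample interval, computed by hand
[xbar - z*s/evalf(sqrt(n)), xbar + z*s/evalf(sqrt(n))];

MeanLSCI(X, .05);                # the package's interval for the same data
```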
Games of Chance
1. Craps -- Simulates any number of games of the dice game Craps. There are two inputs: n the number of games, and verbose which controls how much output is generated. The output is a list of n 0s and 1s (0 = loss in one game, 1 = win in one game). If verbose is true, the list of rolls in each game is shown; otherwise only the list of outcomes is printed. Usage example: Craps(10,true); simulates 10 games and prints out the rolls in each of the 10 games.
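
Since each game is recorded as a 0 or a 1, the fraction of wins estimates the probability of winning. The sketch below assumes that Craps returns its list of outcomes as a value that can be assigned to a name, and not only as printed output.

```maple
results := Craps(1000, false):    # 1000 games, without the roll-by-roll output
wins := add(results[i], i = 1..nops(results)):
evalf(wins/nops(results));        # relative frequency of wins
```

The exact probability of winning at Craps is 244/495, or about 0.493, so the estimate should usually land near that value.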

As more procedures are added through the year, this documentation will be updated.


Last modified: September 18, 2009