Holy Cross Mathematics and Computer Science



Mathematics 375 -- Probability Theory

Syllabus Fall 2011

Professor: John Little
Office: Swords 331
Office Phone: 793-2274
Office Hours: MF 10 - 12noon, W 3 - 5pm, TR 1 - 2pm, and by appointment
Email: little@mathcs.holycross.edu (preferred), or jlittle@holycross.edu
Course Homepage: http://mathcs.holycross.edu/~little/ProbStat1112/PS1.html

Course Description

This is the first half of the year-long Probability Theory - Mathematical Statistics sequence (MATH 375-376). These courses together satisfy the depth requirement for mathematics majors; they count as Applied Mathematics courses for the breadth requirement.

Statistics is the branch of the mathematical sciences that deals with the collection, analysis, and interpretation of data. It is widely used in the physical, life, social, and management sciences.

Some typical examples are:

In fact, statistical reasoning is probably the most common use of mathematics in real-world applications.

The most basic level of statistical data analysis deals with "descriptive statistics." Quantities such as the mean (average), median and other percentiles, variance, and standard deviation of a data set (e.g. experimental measurements, blood pressure readings from patients, etc.), plus other types of summary information, can be used to understand the "shape" or distribution of the data. For example, the mean and median give two complementary measures of central tendency -- where the "middle point" is. The locations of the various percentiles and the standard deviation give different measures of how "spread out" the numbers are.
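As a rough illustration of these summary quantities, here is a minimal sketch in Python using only the standard library. The course itself uses R; Python is used here purely for illustration. The blood pressure readings are made-up values, not data from the course.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); illustrative values only.
readings = [118, 122, 125, 130, 131, 135, 140, 162]

mean = statistics.mean(readings)                  # arithmetic average
median = statistics.median(readings)              # middle of the sorted data
variance = statistics.variance(readings)          # sample variance (divides by n - 1)
std_dev = statistics.stdev(readings)              # sample standard deviation
quartiles = statistics.quantiles(readings, n=4)   # 25th, 50th, 75th percentiles

print(f"mean = {mean}, median = {median}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
print(f"quartiles = {quartiles}")
```

In this made-up data set the single large reading (162) pulls the mean above the median, which is one reason the two measures are considered complementary.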

Descriptive statistics are also used as the basis for making inferences or predictions on the overall distribution of a quantity based on a random sample from a population. To illustrate this, let us briefly discuss some of the issues involved in election polling. The statistic of interest here is most often simply the fraction of the total population of voters favoring a particular candidate. If we could ask every voter which candidate they planned to vote for, then presumably we could determine who would win the election before it occurred.

But of course that kind of "completeness" of information is virtually never available, first and foremost because it would be too difficult and costly to obtain it. There are other complicating factors too:

So political pollsters select a sample (that is, a subset) of the whole population of voters and determine the stated preferences of those in the sample. The goal is to make inferences about the preferences of the whole population from the preferences of those in the sample. If the sample were perfectly representative of the whole population, the results would definitely be correct. But of course, that is also virtually never true. There should always be a large degree of randomness involved in the selection of the sample. But that means that the fraction of voters favoring candidate X in the sample will most likely differ from the true fraction to some extent. We would like to be able to estimate the error -- for instance, to say that in our sample 55% of the voters favored X, and the same is true for all voters with an error of plus or minus 3%. Then we must address questions such as: How large a sample do we need so that we can be reasonably certain that the error is that small? What does "reasonably certain" mean -- can we quantify that? And so forth.
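To give a feel for how such questions can be answered, here is a minimal sketch in Python. It assumes a simple random sample and the usual normal approximation for a sample proportion, with "reasonably certain" taken to mean 95% confidence (z roughly 1.96). These are assumptions made for this example only; the function names are hypothetical, and the theory behind the formula is part of what the course sequence develops.

```python
import math

Z_95 = 1.96  # approximate standard normal quantile for 95% confidence (assumed level)

def margin_of_error(p_hat, n, z=Z_95):
    """Approximate margin of error for a sample proportion p_hat from n voters."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_size_needed(margin, p=0.5, z=Z_95):
    """Smallest n whose approximate margin of error is at most `margin`.

    p = 0.5 is the worst case, because p * (1 - p) is largest there.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n = sample_size_needed(0.03)           # sample size for a 3% margin, worst case
print(n)                               # 1068
print(margin_of_error(0.55, n))        # margin when 55% of the sample favors X
```

Under these assumptions, a sample of roughly a thousand voters is enough for a 3% margin, regardless of how large the whole population of voters is.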

The basis for answering this type of question is the theory of probability, so we will start there this semester. We will study:

The major tools here will be:

We will then use this theoretical basis to develop the basics of mathematical statistics next semester. Some of you may have studied aspects of this in previous high school or college course work in economics, sociology, psychology, or other areas. Since this is a 300-level mathematics course, while we will start with the beginnings of the subject, we will also eventually consider the theoretical mathematical bases for comparing patterns observed in samples from a population to the patterns of the whole population. In other words, in this class we will learn not only how to apply statistical tests, but also how and why statistical tests work (with proofs), and how statistical tests for new situations might be designed.

There is a week-by-week schedule at the end of this syllabus with more information on the topics we will cover. In addition, a more detailed day-by-day schedule will be maintained on the course homepage.


Course Objectives

The major objectives of the course will be:

  1. To introduce you to basic methods of data analysis and descriptive statistics (mean, median, standard deviation, percentiles, correlation coefficient) and develop your proficiency at computing these statistics by hand and with appropriate software.
  2. To introduce you to the frequentist school of probability, discrete and continuous random variables, expected values, variances.
  3. To introduce you to standard families of discrete and continuous distributions: binomial, geometric, hypergeometric, uniform, normal, gamma, beta variables, and their typical applications.
  4. To develop the basis of multivariate distributions and several methods for identifying the distribution of functions of random variables.
  5. To formulate and prove a basic form of the Central Limit Theorem.
  6. To further develop your problem-solving and proof-writing skills.

Texts

The texts for both semesters of the course are

  1. Mathematical Statistics with Applications, 7th ed. by D. Wackerly, W. Mendenhall, and R. Scheaffer. We plan to cover Chapters 1-6 and the sections in Chapter 7 on the Central Limit Theorem this semester and the rest of Chapter 7 and Chapters 8-12 and some additional topics next semester.
  2. Introductory Statistics with R, 2nd ed. by P. Dalgaard. We will use this as a reference and a source of examples on the use of the R statistical software package.

Course Assignments and Grading

The assignments for the course will consist of:

  1. Two in-class midterm exams, each worth 20% of the course grade. Tentative dates: Friday, October 7 and Friday, November 18. (If the class agrees, we might also do these as evening exams on Thursday, October 6 and Thursday, November 17. If we do the exams that way, the Friday classes would be cancelled.)
  2. Final examination, worth 25% of the course grade. (This will be given at the scheduled time for MWF 2:00pm classes, to be determined.)
  3. Weekly problem sets, worth 20% of the course grade. Notes:
  4. Group reports from discussion class meetings and computer lab reports, together worth 15% of the course grade. Note: Several assignments will involve use of the R open-source statistical computing software package. More information will be provided in class.
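The weights above combine into a numerical course average as a weighted sum. A minimal sketch in Python follows; the component names and the example scores are hypothetical, and only the weights come from the list above.

```python
# Weights from the assignment list above, as fractions of the course grade.
WEIGHTS = {
    "midterm_1": 0.20,
    "midterm_2": 0.20,
    "final_exam": 0.25,
    "problem_sets": 0.20,
    "group_and_lab_reports": 0.15,
}

def course_average(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical component scores, for illustration only.
example_scores = {
    "midterm_1": 82,
    "midterm_2": 88,
    "final_exam": 85,
    "problem_sets": 92,
    "group_and_lab_reports": 95,
}

print(round(course_average(example_scores), 2))  # 87.9
```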

I will be keeping your course average in numerical form throughout the semester, and only converting to a letter for the final course grade. The course grade will be assigned according to the following conversion table (also see Note below):

Note: Depending on how the class as a whole is doing, some downward adjustments of the above letter grade boundaries may be made. No upward adjustments will be made, however. (This means, for instance, that an 85 course average would never convert to a letter grade of B- or below. But a 79 course average might convert to a letter grade of B- depending on the distribution of averages across the whole class.)

If you ever have a question about the grading policy, or about your standing in the course, please feel free to consult with me.


Schedule

The following is an approximate schedule. Some rearrangement, expansion, or contraction of topics may become necessary. I will announce any changes in class, and on the course homepage.

Week  Dates           Class Topics                                                    Reading (WMS)
1     8/31,9/2        Course introduction                                             Background reading: Chapter 1
2     9/5,7,9         Begin Sample spaces, events, probabilities                      2.1-2.5
3     9/12,14,16      Conditional probabilities, independence                         2.6-2.9
4     9/19,21,23      Discrete random variables, expected values                      2.10-3.3
5     9/26,28,30      Binomial, Geometric and related random variables                3.4-3.7
6     10/3,5          Poisson processes, moment generating functions                  3.8-3.9
      10/7            Exam 1 (Chapters 1, 2, and 3.1-3.5)
      10/10,12,14     No Class -- Fall Break
8     10/17,19,21     Continuous Random Variables, Normal, Gamma                      4.1-4.6
9     10/24,26,28     More on continuous random variables                             4.7-4.10
10    10/31,11/2,4    Multivariate distributions, independence                        5.1-5.4
11    11/7,9,11       Expected value, covariance                                      5.5-5.7
12    11/14,16        Linear combinations of random variables                         5.8-5.9
      11/18           Exam 2 (Rest of Chapter 3, Chapters 4,5)
13    11/21           Functions of random variables                                   6.1-6.2
      11/23,25        No Class -- Thanksgiving Break
14    11/28,30,12/2   Methods of distribution functions, moment generating functions  6.5,
15    12/5,7,9        Central Limit Theorem, semester wrap-up                         7.3-7.4

The final examination for this class will be held at the time for MWF 2:00 classes.


Departmental Statement on Academic Integrity


Why is academic integrity important?


All education is a cooperative enterprise between teachers and students. This cooperation works well only when there is trust and mutual respect between everyone involved. One of our main aims as a department is to help students become knowledgeable and sophisticated learners, able to think and work both independently and in concert with their peers. Representing another person's work as your own in any form (plagiarism or "cheating"), and providing or receiving unauthorized assistance on assignments (collusion) are lapses of academic integrity because they subvert the learning process and show a fundamental lack of respect for the educational enterprise.

How does this apply to our courses?


You will encounter a variety of types of assignments and examination formats in mathematics and computer science courses. For instance, many problem sets in mathematics classes and laboratory assignments in computer science courses are individual assignments. While some faculty members may allow or even encourage discussion among students during work on problem sets, it is the expectation that the solutions submitted by each student will be that student's own work, written up in that student's own words. When consultation with other students or sources other than the textbook occurs, students should identify their co-workers, and/or cite their sources as they would for other writing assignments. Some courses also make use of collaborative assignments; part of the evaluation in that case may be a rating of each individual's contribution to the group effort. Some advanced classes may use take-home examinations, in which case the ground rules will usually allow no collaboration or consultation. In many computer science classes, programming projects are strictly individual assignments; the ground rules do not allow any collaboration or consultation here either.

What are the responsibilities of faculty?


It is the responsibility of faculty in the department to lay out the guidelines to be followed for specific assignments in their classes as clearly and fully as possible, and to offer clarification and advice concerning those guidelines as needed as students work on those assignments. The Department of Mathematics and Computer Science upholds the College's policy on academic honesty. We advise all students taking mathematics or computer science courses to read the statement in the current College catalog carefully and to familiarize themselves with the procedures which may be applied when infractions are determined to have occurred.

What are the responsibilities of students?


A student's main responsibility is to follow the guidelines laid down by the instructor of the course. If there is some point about the expectations for an assignment that is not clear, the student is responsible for seeking clarification. If such clarification is not immediately available, students should err on the side of caution and follow the strictest possible interpretation of the guidelines they have been given. It is also a student's responsibility to protect his/her own work to prevent unauthorized use of exam papers, problem solutions, computer accounts and files, scratch paper, and any other materials used in carrying out an assignment. We expect students to have the integrity to say "no" to requests for assistance from other students when offering that assistance would violate the guidelines for an assignment.

Specific Guidelines for this Course


Because of the size of this class, examinations will be given in class (with strict guidelines about what information can be consulted). No sharing of information of any form with other students will be permitted during exams. Consultation with your team-mates on the group projects and computer labs is a big part of the point of those assignments, so you should aim for complete collaboration there. On the individual problem sets, discussion of the questions with other students in the class and with me during office hours is allowed, even encouraged. If you do take advantage of any of these options, you will be required to state that fact in a footnote accompanying the problem solution. Failure to follow this rule will be treated as a violation of the College's Academic Integrity policy.