Holy Cross Mathematics and Computer Science



Mathematics 375 -- Probability and Statistics I

Syllabus Fall 2009

Professor: John Little
Office: Swords 331 (temporary office for Fall 2009)
Office Phone: 793-2274
Office Hours: MTWF 10am - 12noon, R 11am - 12noon, W 2 - 3, and by appointment
Email: little@mathcs.holycross.edu (preferred), or jlittle@holycross.edu
Course Homepage: http://mathcs.holycross.edu/~little/ProbStat0910/PS109.html

Course Description

Statistics is the branch of the mathematical sciences that deals with the collection, analysis, and interpretation of data. It is used very widely today in the physical, life, social, and management sciences to identify underlying patterns and relationships, and as a tool for making decisions and predictions in the presence of uncertainty. Some typical examples are:

Indeed, statistical reasoning is probably the most common use of mathematics in real-world applications at present.

On the most basic level, "descriptive statistics" -- quantities such as the mean (average), variance, and standard deviation of a collection of numbers (e.g. experimental measurements, blood pressure readings from patients, etc.) plus other types of summary information about a data set -- describe the "shape" or distribution of the data (for example, how "spread out" the numbers are around their "middle point"). These descriptive statistics are also used as the basis for making inferences or predictions on the overall distribution of a quantity based on a random sample.
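
For a concrete sense of these quantities, here is a minimal Python sketch. The readings are made-up numbers used only for illustration, and the use of Python's standard statistics module is an assumption; the software used in the course may differ.

```python
import statistics

# Hypothetical blood pressure readings (mmHg); made-up data for illustration only.
readings = [118, 124, 131, 109, 127, 122, 135, 116]

def describe(data):
    """Return the mean, sample variance, and sample standard deviation."""
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance: divides by n - 1
    std_dev = statistics.stdev(data)      # square root of the sample variance
    return mean, variance, std_dev

mean, variance, std_dev = describe(readings)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```

The standard deviation is in the same units as the data, which is why it is the usual measure of how "spread out" the numbers are around the mean.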

Some of you may have studied this aspect of statistics in previous high school or college course work in mathematics, economics, sociology, psychology, or other areas. We will use these ideas too, but we will also go considerably farther than those courses did to study the theoretical mathematical bases for comparing patterns observed in samples from a population to the patterns of the whole population. In other words, in this class we will learn not only how to apply statistical tests, but also how and why statistical tests work (with proof), and how statistical tests for new situations might be designed.

Let us briefly discuss some of the issues involved in election polling. If our data included every possible measurement, then appropriate descriptive statistics applied to the data would presumably generate the information we seek. But of course that kind of "completeness" is virtually never available. In election polling the statistic of interest is most often simply the fraction of the total population of voters favoring a particular candidate. Prior to the actual election, it would be too difficult and costly to ask every possible voter which candidate they prefer to determine that fraction. Plus, there is no guarantee that the respondents will tell the truth about their actual preference (e.g. the infamous Bradley effect for African-American candidates, which received a large amount of attention in the run-up to the 2008 presidential election). Moreover, if people can choose whether or not to vote, it might not be possible ahead of time to determine exactly who will cast ballots in a given election.

So instead, pollsters select a sample (that is, a subset) of the whole population of voters and determine the stated preferences of those in the sample. The goal is to make inferences about the preferences of the whole population from the preferences of those in the sample. If the sample were perfectly representative of the whole population, the results would definitely be correct. But of course, that is also virtually never true. There will always be some amount of randomness involved in the selection of the sample. That means that the fraction of voters favoring candidate X in the sample will most likely differ from the true fraction to some extent. We would like to be able to estimate the error -- for instance, to say that in our sample 55% of the voters favored X, and the same is true for all voters with an error of + or - 3%. Then we must address questions such as: How large a sample do we need so that we can be reasonably certain that the error is that small? What does reasonably certain mean -- can we quantify that? and so forth.
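
As a preview of how such questions can be answered, here is a rough Python sketch. It rests on assumptions that go beyond what is stated above: a simple random sample, the normal approximation to the sample proportion, "reasonably certain" read as a 95% confidence level, and the worst case p = 0.5 when planning the sample size. The function names are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(p_hat, n, confidence=0.95):
    """Approximate margin of error for a sample proportion p_hat from n voters."""
    return z_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose approximate margin of error is at most `margin`.

    p = 0.5 is the worst case, since p * (1 - p) is largest there.
    """
    z = z_value(confidence)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n = required_sample_size(0.03)  # 1068 under the assumptions above
print(n, round(margin_of_error(0.55, n), 4))
```

Under these assumptions, roughly a thousand respondents are enough for an error of + or - 3%, no matter how large the whole population of voters is. Why the normal approximation is justified is one of the things the Central Limit Theorem will tell us.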

The basis for answering this type of question is the theory of probability, so we will start there this semester. We will study:

The major tools here will be counting techniques for subsets, permutations, and combinations; single- and multi-variable calculus; and ideas about infinite series from Principles of Analysis. This course is the first half of a full-year sequence; we will turn to statistics per se next semester.

There is a week-by-week schedule at the end of this syllabus with more information on the topics we will cover. In addition, a more detailed day-by-day schedule will be maintained on the course homepage.


Course Objectives

The major objectives of the course will be:

  1. To introduce you to basic methods of data analysis and descriptive statistics (mean, standard deviation, correlation coefficient) and develop your proficiency at computing these statistics by hand and with appropriate software.
  2. To introduce you to the frequentist school of probability, discrete and continuous random variables, expected values, and variances.
  3. To introduce you to standard families of discrete and continuous distributions (binomial, geometric, hypergeometric, uniform, normal, gamma, and beta variables) and their typical applications.
  4. To develop the basis of multivariate distributions and several methods for identifying the distribution of functions of random variables.
  5. To formulate and prove a basic form of the Central Limit Theorem.
  6. To further develop your problem-solving and proof-writing skills.

Text

The text for both semesters of the course is Mathematical Statistics with Applications, 7th ed., by D. Wackerly, W. Mendenhall, and R. Scheaffer. We plan to cover Chapters 1-6 and the section in Chapter 7 on the Central Limit Theorem this semester, and the rest of Chapter 7, Chapters 8-11, and some additional topics next semester.


Course Assignments and Grading

The assignments for the course will consist of:

  1. Two in-class midterm exams, each worth 20% of the course grade. Tentative dates: Thursday, October 8 and Thursday, November 19. (If desired, these exams can be scheduled in the evening to reduce time pressure.)
  2. Final Examination, worth 25% of the course grade (given at the scheduled time for MTR 2:00pm classes: Monday, December 14, at 2:30pm).
  3. Weekly problem sets, worth 25% of the course grade. Notes:
  4. Group reports from discussion class meetings, together worth 10% of the course grade.

I will be keeping your course average in numerical form throughout the semester, and only converting to a letter for the final course grade. The course grade will be assigned according to the following conversion table (also see Note below):

Note: Depending on how the class as a whole is doing, some downward adjustments of the above letter grade boundaries may be made. No upward adjustments will be made, however. (This means, for instance, that an 85 course average would never convert to a letter grade of B- or below. But a 79 course average might convert to a letter grade of B- depending on the distribution of averages across the whole class.)

If you ever have a question about the grading policy, or about your standing in the course, please feel free to consult with me.


Schedule

The following is an approximate schedule. Some rearrangement, expansion, or contraction of topics may become necessary. I will announce any changes in class and on the course homepage.

Week  Dates           Class Topics                                      Reading (WMS)
1     9/3             Course introduction                               Background reading: Chapter 1
2     9/7,8,10        Sample spaces, events, probabilities              2.1-2.6
3     9/14,15,17      Conditional probabilities, independence           2.7-2.9
4     9/21,22,24      Discrete random variables, expected values        2.10-3.3
5     9/28,29, 10/1   Binomial, Geometric and related random variables  3.4-3.7
6     10/5,6          Poisson processes, moment generating functions    3.8-3.9
      10/8            Exam 1 (Chapters 1, 2, and 3.1-3.5)
      10/12,13        No Class -- Columbus Day Break
7     10/15           Continuous Random Variables                       4.1-4.4
8     10/19,20,22     Normal, Gamma distributions                       4.5-4.6
9     10/26,27,29     More on continuous random variables               4.7-4.10
10    11/2,3,5        Multivariate distributions, independence          5.1-5.4
11    11/9,10,12      Expected value, covariance                        5.5-5.9
12    11/16,17        Functions of Random Variables                     6.1-6.2
      11/19           Exam 2 (Rest of Chapter 3, Chapters 4, 5)
13    11/23,24        Method of distribution functions                  6.3
      11/26           No Class -- Thanksgiving Break
14    11/30, 12/1,3   Moment generating functions, CLT                  6.5, 7.3-7.4
15    12/7,8          Finish CLT, Semester wrap-up

The final examination for this class will be held on Monday, December 14, at 2:30 pm.


Departmental Statement on Academic Integrity


Why is academic integrity important?


All education is a cooperative enterprise between teachers and students. This cooperation works well only when there is trust and mutual respect between everyone involved. One of our main aims as a department is to help students become knowledgeable and sophisticated learners, able to think and work both independently and in concert with their peers. Representing another person's work as your own in any form (plagiarism or "cheating"), and providing or receiving unauthorized assistance on assignments (collusion) are lapses of academic integrity because they subvert the learning process and show a fundamental lack of respect for the educational enterprise.

How does this apply to our courses?


You will encounter a variety of types of assignments and examination formats in mathematics and computer science courses. For instance, many problem sets in mathematics classes and laboratory assignments in computer science courses are individual assignments. While some faculty members may allow or even encourage discussion among students during work on problem sets, it is the expectation that the solutions submitted by each student will be that student's own work, written up in that student's own words. When consultation with other students or sources other than the textbook occurs, students should identify their co-workers, and/or cite their sources as they would for other writing assignments. Some courses also make use of collaborative assignments; part of the evaluation in that case may be a rating of each individual's contribution to the group effort. Some advanced classes may use take-home examinations, in which case the ground rules will usually allow no collaboration or consultation. In many computer science classes, programming projects are strictly individual assignments; the ground rules do not allow any collaboration or consultation here either.

What are the responsibilities of faculty?


It is the responsibility of faculty in the department to lay out the guidelines to be followed for specific assignments in their classes as clearly and fully as possible, and to offer clarification and advice concerning those guidelines as needed as students work on those assignments. The Department of Mathematics and Computer Science upholds the College's policy on academic honesty. We advise all students taking mathematics or computer science courses to read the statement in the current College catalog carefully and to familiarize themselves with the procedures which may be applied when infractions are determined to have occurred.

What are the responsibilities of students?


A student's main responsibility is to follow the guidelines laid down by the instructor of the course. If there is some point about the expectations for an assignment that is not clear, the student is responsible for seeking clarification. If such clarification is not immediately available, students should err on the side of caution and follow the strictest possible interpretation of the guidelines they have been given. It is also a student's responsibility to protect his/her own work to prevent unauthorized use of exam papers, problem solutions, computer accounts and files, scratch paper, and any other materials used in carrying out an assignment. We expect students to have the integrity to say "no" to requests for assistance from other students when offering that assistance would violate the guidelines for an assignment.

Specific Guidelines for this Course


Because of the size of this class, examinations will be given in class, and the other assignments will be weekly individual problem sets. Some examinations may be given as open book and/or open notes tests. No sharing of information of any form with other students will be permitted during exams. On the problem sets, discussion of the questions with other students in the class, and with me during office hours, is allowed, even encouraged. Consultation of other probability and statistics texts in the library for ideas leading to a problem solution will also be allowed. If you do take advantage of any of these options, you will be required to state that fact in a footnote accompanying the problem solution. Failure to follow this rule will be treated as a violation of the College's Academic Integrity policy.