
Holy Cross Mathematics and Computer Science
Mathematics 375 -- Probability and Statistics I
Syllabus Fall 2009
Professor: John Little
Office: Swords 331 (temporary office for Fall 2009)
Office Phone: 793-2274
Office Hours: MTWF 10am - 12noon, R 11am - 12noon, W 2 - 3, and by appointment
Email: little@mathcs.holycross.edu (preferred), or jlittle@holycross.edu
Course Homepage: http://mathcs.holycross.edu/~little/ProbStat0910/PS109.html
Course Description
Statistics is the branch of the mathematical sciences
that deals with the collection, analysis, and
interpretation of data. It is used very widely today in the physical,
life, social, and management sciences to identify underlying patterns and
relationships, and as a tool for making decisions and predictions
in the presence of uncertainty. Some typical examples are:
- Demonstrating evidence for an empirical relationship
between different quantities in a physics experiment, where measurement
errors or fundamental elements of randomness (e.g. quantum physical
effects) may be present,
- Identifying trends in customer preferences from
online purchase data,
- Analyzing the results of treating patients with a new drug
in a clinical trial, where the reaction of an individual patient
depends on too many different types of factors to be readily predictable,
- Predicting the outcome of an election by sampling voter preferences
(see below for a fuller discussion of this case),
- Estimating how to price an insurance policy based on likely
risks to the persons covered. (This is a large part of what
actuaries do in insurance companies; Probability and
Statistics is the subject of one of the first exams in the
series that actuarial trainees must pass to become qualified.)
Indeed, statistical reasoning is probably the most common use
of mathematics in real-world applications today.
On the most basic level, "descriptive statistics" -- quantities
such as the mean (average), variance, and standard deviation
of a collection of numbers (e.g. experimental measurements,
blood pressure readings from patients, etc.) plus other types of summary
information about a data set -- describe the "shape" or
distribution of the data (for example, how "spread out" the
numbers are around their "middle point"). These descriptive
statistics are also used as the basis for making inferences
or predictions about the overall distribution of a quantity
based on a random sample.
Some of you may have studied this
aspect of statistics in previous high school or college coursework
in mathematics, economics, sociology, psychology, or other
areas.
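For a concrete picture of these summaries, here is a short Python sketch using the standard `statistics` module. It is not part of the official course materials, and the blood pressure readings are made-up numbers chosen only for illustration.

```python
import statistics

# Hypothetical systolic blood pressure readings (made-up values)
readings = [118, 122, 130, 125, 135, 120, 125]

mean = statistics.mean(readings)      # the "middle point" of the data
var = statistics.variance(readings)   # sample variance (divides by n - 1)
sd = statistics.stdev(readings)       # sample standard deviation = sqrt(variance)

print(f"mean = {mean}, variance = {var:.2f}, standard deviation = {sd:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) convention; `statistics.pvariance` and `statistics.pstdev` give the population versions.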
We will use these ideas too, but we will also go considerably
farther than those courses did to study
the theoretical mathematical bases for comparing patterns observed
in samples from a population to the patterns of the whole population.
In other words, in this class we will learn not only how to apply
statistical tests, but also how and why statistical tests work
(with proof), and how statistical tests for new situations
might be designed.
Let us briefly discuss some of the issues involved in election
polling. If our data included every possible
measurement, then appropriate descriptive statistics applied
to the data would presumably generate the information we seek. But
of course that kind of "completeness" is virtually never
available. In election polling, the statistic
of interest is most often simply the fraction of the total
population of voters favoring a particular candidate. Prior to
the actual election, it would be too difficult and costly to ask
every possible voter which candidate they prefer to determine
that fraction. In addition, there is no guarantee that the respondents
will tell the truth about their actual preference (e.g. the so-called
Bradley effect for African-American candidates, which received
a large amount of attention
in the run-up to the 2008 presidential election).
Moreover, if people can choose whether or
not to vote, it might not be possible ahead of time to determine exactly
who will cast ballots in a given election.
So instead, pollsters select a sample (that is, a subset)
of the whole population of voters and determine the stated preferences
of those in the sample. The goal is to make inferences about the
preferences of the whole population from the preferences of those
in the sample. If the sample were perfectly representative of the
whole population, the results would definitely be correct. But of
course, that is also virtually never true. There will
always be some amount of randomness involved in the selection
of the sample. That means that the fraction of voters favoring
candidate X in the sample will most likely differ from the true
fraction to some extent. We would like to be able to estimate
the error: for instance, to say that 55% of the voters in our sample
favored X, and that the same is true for all voters
to within plus or minus 3%. Then we must address questions such
as: How large a sample do we need to be reasonably
certain that the error is that small? What does "reasonably
certain" mean, and can we quantify it?
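As a preview of where these questions lead (the tools come next semester), here is a minimal Python sketch of the standard normal-approximation answer. It assumes a simple random sample, uses z = 1.96 for roughly 95% confidence, and the function names are illustrative, not from the text.

```python
import math

Z_95 = 1.96  # standard normal quantile for about 95% confidence (assumed choice)

def margin_of_error(p_hat: float, n: int, z: float = Z_95) -> float:
    """Approximate margin of error for a sample proportion p_hat from n
    respondents, via the normal approximation z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for(margin: float, z: float = Z_95, p: float = 0.5) -> int:
    """Smallest n whose margin of error is at most `margin`.
    p = 0.5 is the worst case, since p * (1 - p) is largest there."""
    return math.ceil((z / margin) ** 2 * p * (1 - p))

print(round(margin_of_error(0.55, 1000), 4))  # 55% support in a sample of 1000
print(sample_size_for(0.03))                  # respondents needed for plus or minus 3%
```

Under these assumptions, a sample of 1000 showing 55% support has a margin of error of about 3.1 percentage points, and about 1068 respondents are needed to guarantee plus or minus 3%.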
The basis for answering this type of question is the theory of
probability, so we will start there this semester. We
will study:
- Sample spaces, events, the concepts of density and distribution
functions in the discrete and continuous cases
- Discrete and continuous random variables with a given
density function, expected value, mean, variance, etc.
- Binomial, geometric, and Poisson discrete random variables
- Normal, Gamma, Beta, and other continuous random variables
- Functions of random variables and multivariable density and
distribution functions
- The Central Limit Theorem (CLT) (The form we will prove
states that if we sample repeatedly and independently
from any suitable distribution, then the distribution of the
standardized sample mean (the sample mean minus its expected value,
divided by its standard deviation) approaches the standard normal
distribution as the sample size goes to infinity. This is one
justification for the central role of normal distributions in statistics.)
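A simulation is no substitute for the proof, but it can make the statement of the CLT concrete. The Python sketch below is illustrative only: it assumes an exponential distribution with rate 1 (mean 1, standard deviation 1) as the "suitable distribution", a sample size of 50, and an arbitrary random seed.

```python
import math
import random
import statistics

def standardized_means(n: int, trials: int, seed: int = 0) -> list[float]:
    """Draw `trials` independent samples of size n from an exponential
    distribution with rate 1 (mean 1, standard deviation 1) and return the
    standardized sample means (Ybar - mu) / (sigma / sqrt(n))."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    zs = []
    for _ in range(trials):
        ybar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        zs.append((ybar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = standardized_means(n=50, trials=10000)

# For a standard normal Z, P(|Z| <= 1.96) is about 0.95.
inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
print(f"mean of standardized means: {statistics.fmean(zs):.3f}")
print(f"fraction within 1.96 of 0: {inside:.3f}")
```

Even though the exponential distribution is quite skewed, the standardized sample means already behave much like a standard normal variable at this sample size.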
The major tools here will be counting techniques for subsets,
permutations, and combinations, single- and multi-variable calculus,
and ideas about infinite series from Principles of Analysis.
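To show how the counting techniques feed directly into probability, here is a small Python sketch using `math.perm` and `math.comb` (available in Python 3.8 and later), together with the binomial probability function built from them. The function name `binomial_pmf` is illustrative.

```python
import math

# Ordered arrangements vs. unordered subsets of 2 objects chosen from 5
print(math.perm(5, 2))  # 5 * 4 = 20 ordered pairs
print(math.comb(5, 2))  # 20 / 2! = 10 two-element subsets

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p): choose which k of the n independent
    trials are successes, then multiply the probabilities."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 tosses of a fair coin: 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```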
This course is the first half of a full-year sequence; we will turn
to statistics per se next semester.
There is a week-by-week schedule at the end of this syllabus with
more information on the topics we will cover. In addition,
a more detailed day-by-day schedule will be maintained on
the course homepage.
Course Objectives
The major objectives of the course will be:
- To introduce you to basic methods of data analysis and descriptive statistics
(mean, standard deviation, correlation coefficient) and develop
your proficiency at computing these statistics by hand and with appropriate software.
- To introduce you to the frequentist school of probability, discrete and
continuous random variables, expected values, variances.
- To introduce you to standard families of discrete and continuous distributions:
binomial, geometric, hypergeometric, uniform, normal, gamma, beta variables,
and their typical applications.
- To develop the basic theory of multivariate distributions and
several methods for identifying the distribution of functions
of random variables.
- To formulate and prove a basic form of the Central Limit Theorem.
- To further develop your problem-solving and proof-writing skills.
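As one example of computing a descriptive statistic "by hand" in software, here is a Python sketch of the sample correlation coefficient r = S_xy / sqrt(S_xx S_yy). The data pairs are made up for illustration; in Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical (hours studied, exam score) pairs, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [62, 70, 71, 80, 87]
print(round(pearson_r(hours, scores), 3))
```

A value of r near 1 indicates a strong increasing linear relationship; r is always between -1 and 1.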
Text
The text for both semesters of the course is
Mathematical Statistics with Applications, 7th ed.
by D. Wackerly, W. Mendenhall, and R. Scheaffer. We plan
to cover Chapters 1-6 and the section in Chapter 7 on
the Central Limit Theorem this semester and the rest of
Chapter 7 and Chapters 8-11 and some additional topics next semester.
Course Assignments and Grading
The assignments for the course will consist of:
- Two in-class midterm exams, each worth 20% of
the course grade. Tentative dates: Thursday, October 8 and Thursday,
November 19. (If desired, these exams can be scheduled
in the evening to reduce time pressure.)
- Final Examination, worth
25% of the course grade (given at the scheduled time
for MTR 2:00pm classes: Monday, December 14 at 2:30pm).
- Weekly problem sets, worth 25% of the course
grade. Notes:
- Because of the size of this class, in order for me to
return your work with constructive comments in a timely manner, it
may become necessary to grade only selected problems on each assignment.
If that happens, I will always select a representative sample of
computational and theoretical problems to be evaluated from that assignment.
But the selection will not be announced beforehand, and you
will be expected to do all of the problems in any case.
- I will put complete solutions of all assigned
problems on reserve in the Science Library after class on the date
the assignment is due. You may consult these and photocopy them
for your own use at any time if you wish.
- Because of the availability of these complete solutions,
because every effort will be made to return your graded problem
sets in a timely fashion, and for reasons of fairness,
no problem sets will be accepted for credit after the announced due date,
except in the case of a serious medical situation, family emergency, etc.
with authorization from your class dean. If you
are authorized to hand in a problem set late, I will ask you to
sign a statement that you have not consulted the reserve
solutions in preparing your work.
- Group reports from discussion class meetings,
together worth 10% of the course grade.
I will be keeping your course average in numerical form throughout
the semester, and only converting to a letter for the final course
grade. The course grade will be assigned according to
the following conversion table (also see Note below):
- A -- 94 and above
- A- -- 90 - 93
- B+ -- 87 - 89
- B -- 84 - 86
- B- -- 80 - 83
- C+ -- 77 - 79
- C -- 74 - 76
- C- -- 70 - 73
- D+ -- 67 - 69
- D -- 60 - 66
- F -- 59 and below.
Note: Depending on how the class as a whole is doing, some
downward adjustments of the above letter grade boundaries may be made.
No upward adjustments will be made, however. (This means, for
instance, that an 85 course average would never convert to a letter
grade of B- or below. But a 79 course average might convert to a
letter grade of B- depending on the distribution of averages
across the whole class.)
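The arithmetic behind the course average and the letter-grade conversion can be sketched in a few lines of Python. The weights and cutoffs come from the lists above; the component names and the sample scores are hypothetical, a non-integer average is assumed to be compared against the lower bound of each range, and any class-wide downward adjustment of the boundaries is not modeled.

```python
# Component weights from the assignment list (20 + 20 + 25 + 25 + 10 = 100%)
WEIGHTS = {
    "midterm1": 0.20,
    "midterm2": 0.20,
    "final": 0.25,
    "problem_sets": 0.25,
    "group_reports": 0.10,
}

# Lower bounds of the unadjusted conversion table, highest first
CUTOFFS = [(94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
           (77, "C+"), (74, "C"), (70, "C-"), (67, "D+"), (60, "D")]

def course_average(scores: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(weight * scores[part] for part, weight in WEIGHTS.items())

def letter_grade(average: float) -> str:
    """Letter grade from the unadjusted table (no downward adjustment)."""
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return "F"

# Hypothetical student, for illustration only
example = {"midterm1": 82, "midterm2": 88, "final": 85,
           "problem_sets": 95, "group_reports": 90}
avg = course_average(example)
print(round(avg, 2), letter_grade(avg))
```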
If you ever have a question about the grading policy, or about your
standing in the course, please feel free to consult with me.
Schedule
The following is an approximate schedule. Some rearrangement,
expansion, or contraction of topics may become necessary. I will announce
any changes in class, and on the course homepage.
| Week | Dates | Class Topics | Reading (WMS) |
|---|---|---|---|
| 1 | 9/3 | Course introduction | Background reading: Chapter 1 |
| 2 | 9/7,8,10 | Sample spaces, events, probabilities | 2.1-2.6 |
| 3 | 9/14,15,17 | Conditional probabilities, independence | 2.7-2.9 |
| 4 | 9/21,22,24 | Discrete random variables, expected values | 2.10-3.3 |
| 5 | 9/28,29,10/1 | Binomial, Geometric and related random variables | 3.4-3.7 |
| 6 | 10/5,6 | Poisson processes, moment generating functions | 3.8-3.9 |
|  | 10/8 | Exam 1 (Chapters 1, 2, and 3.1-3.5) |  |
|  | 10/12,13 | No Class -- Columbus Day Break |  |
| 7 | 10/15 | Continuous Random Variables | 4.1-4.4 |
| 8 | 10/19,20,22 | Normal, Gamma distributions | 4.5-4.6 |
| 9 | 10/26,27,29 | More on continuous random variables | 4.7-4.10 |
| 10 | 11/2,3,5 | Multivariate distributions, independence | 5.1-5.4 |
| 11 | 11/9,10,12 | Expected value, covariance | 5.5-5.9 |
| 12 | 11/16,17 | Functions of Random Variables | 6.1-6.2 |
|  | 11/19 | Exam 2 (Rest of Chapter 3, Chapters 4, 5) |  |
| 13 | 11/23,24 | Method of distribution functions | 6.3 |
|  | 11/26 | No Class -- Thanksgiving Break |  |
| 14 | 11/30, 12/1,3 | Moment generating functions, CLT | 6.5, 7.3-7.4 |
| 15 | 12/7,8 | Finish CLT, Semester wrap-up |  |
The final examination for this class will be held on Monday, December 14, at
2:30 pm.
Departmental Statement on Academic Integrity
Why is academic integrity important?
All education is a cooperative enterprise between teachers and
students. This cooperation works well only when there is trust and
mutual respect between everyone involved.
One of our main aims as a department is to help students become
knowledgeable and sophisticated learners, able to think and work
both independently and in concert with their peers. Representing another
person's work as your own in any form (plagiarism or "cheating"),
and providing or receiving unauthorized assistance on assignments (collusion)
are lapses of academic integrity because they subvert the learning process
and show a fundamental lack of respect for the educational enterprise.
How does this apply to our courses?
You will encounter a variety of types of assignments and examination
formats in mathematics and computer science courses. For instance,
many problem sets in mathematics classes and laboratory assignments
in computer science courses are individual assignments.
While some faculty members
may allow or even encourage discussion among
students during work on problem sets, it is the expectation that the
solutions submitted by each student will be that student's own work,
written up in that student's own words. When consultation with other
students or sources other than the textbook occurs, students should
identify their co-workers, and/or cite their sources as they would for
other writing assignments. Some courses also make use of collaborative
assignments; part of the evaluation in that case may be a rating of each
individual's contribution to the group effort.
Some advanced classes may use take-home
examinations, in which case the ground rules will usually allow no
collaboration or consultation.
In many computer science classes, programming projects are
strictly individual assignments; the ground rules
do not allow any collaboration or consultation here either.
What are the responsibilities of faculty?
It is the responsibility of faculty in the department to
lay out the guidelines to be followed for specific assignments in
their classes as clearly and fully as possible, and to
offer clarification and advice concerning those guidelines
as needed while students work on those assignments.
The Department of Mathematics and Computer Science upholds the
College's policy on academic honesty.
We advise all students taking mathematics or computer science courses
to read the statement in the current College catalog carefully and
to familiarize themselves with the procedures which may be
applied when infractions are determined to have occurred.
What are the responsibilities of students?
A student's main responsibility is to follow the guidelines laid down
by the instructor of the course. If there is some point about the
expectations for an assignment that is not clear, the student is responsible
for seeking clarification. If such clarification is not immediately available,
students should err on the side of caution and follow the strictest possible
interpretation of the guidelines they have been given.
It is also a student's responsibility to protect his/her
own work to prevent unauthorized use of exam papers, problem solutions,
computer accounts and files, scratch paper, and any other materials used in
carrying out an assignment. We expect students to have the integrity to say
"no" to requests for assistance from other students when offering that
assistance would violate the guidelines for an assignment.
Specific Guidelines for this Course
Because of the size of this class, examinations will be given
in class, and the other assignments will be weekly individual problem
sets. Some examinations may be given as open book and/or open notes
tests. No sharing of information in any form with other students will
be permitted during exams. On the problem sets, discussion of the
questions with other students in the class and with me during office
hours is allowed, even encouraged. Consultation of other probability and
statistics texts in the library for ideas leading to a problem solution
will also be allowed. If you do take advantage of any of these
options, you will be required to state that fact in a footnote
accompanying the problem solution. Failure to follow this rule
will be treated as a violation of the College's Academic
Integrity policy.