CSCI 307
    Data Mining

    College of the Holy Cross, Fall 2025




    Instructor
    Farhad Mohsin


    Lecture times
    TuTh 11:00AM - 12:15PM

    Location
    Swords 227


    Open hours
    Office location: Swords 339

    • Mondays 3:30-5:00pm
    • Tuesdays 1:00-2:00pm
    • Thursdays 10:00-11:00am
    • and by appointment (in person or on Zoom)

    Canvas
    We'll use Canvas for assignment submission, sharing lecture notes, etc. All written reports for assignments must be submitted in PDF format. I would prefer a digital file (written in Word or LaTeX); however, it is fine to submit pictures of handwritten assignments, as long as they are legible. Assignments with coding components must be done in Jupyter Notebook and then exported as PDFs. We will go through the procedure for this in class.


    Course description
    This course provides an introduction to Data Mining and will examine data techniques for the discovery, interpretation and visualization of patterns in large collections of data. Topics covered in this course include data mining methods such as classification, rule-based learning, decision trees, association rules, and data visualization. The work discussed originates in the fields of artificial intelligence, machine learning, statistical data analysis, data visualization, databases, and information retrieval.


    Prerequisites
    The prerequisite for this class is CSCI 132, Data Structures.
    Also note that you'll have assignments that require programming in Python. The first couple of lectures will help you brush up on Python syntax and introduce Python libraries (possibly new to you) that are common in data mining.


    Textbook
    There is no required textbook for this course. The field of data mining is ever-changing and I plan to teach from many different sources over the semester. However, I do recommend the following textbooks if you want to do self-study on fundamental data mining concepts from a somewhat theoretical/algorithmic perspective.

    1. Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Machine Learning: Fundamental Concepts and Algorithms, 2nd Edition, Cambridge University Press. ISBN: 978-1108473989.
    2. Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, 3rd Edition or 2nd Edition. The 3rd edition is available only online at http://www.mmds.org/. The 2nd edition is available from Cambridge University Press. ISBN: 9781316147313.

    Both books are freely available online at https://dataminingbook.info/book_html/ and http://www.mmds.org/.


    Exams
    Midterms:
    There will be two or three mid-term quizzes/exams.

    Final exam:
    A cumulative final exam will be held during finals week as scheduled.

      Final Exam: TBD

    Homework Assignments
    There will be up to ten homework assignments during the semester. These problem sets will include questions that require written answers about concepts, and also problems that involve use of the Python libraries introduced in class.

    Follow-up Discussion
    To ensure academic integrity, I will randomly select students after each assignment to discuss their submitted work. You will participate in two or three such conversations during the semester. These informal chats give you an opportunity to walk me through your solutions and thinking process, further demonstrating your understanding.


    Grading

    • Homework: 45%
    • Midterm exam: 30%
    • Final exam: 25%
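
    As a rough illustration of how these weights combine, the sketch below computes a weighted course score in Python. The function name `course_score`, the 0-100 scale for each component, and the sample scores are assumptions made for illustration only; they are not part of the official grading procedure.

```python
# Weights taken from the grading breakdown above.
WEIGHTS = {"homework": 0.45, "midterm": 0.30, "final": 0.25}


def course_score(scores):
    """Return the weighted score, assuming each component is on a 0-100 scale.

    `scores` is a hypothetical dict with the same keys as WEIGHTS.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component averages, for illustration only.
example = {"homework": 90.0, "midterm": 80.0, "final": 84.0}
print(round(course_score(example), 2))
```

    Because the weights sum to 1, the result stays on the same 0-100 scale as the inputs.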


    Late Policy
    Assignments will usually be due at midnight on the submission date. You have a total of 7 late days for the semester, which you can use without penalty. However, you can use at most 3 late days on any single assignment.
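
    The two limits in this policy can be checked together. The following is a minimal Python sketch, assuming late days are counted as whole days per assignment; the function name `late_days_allowed` and the sample inputs are hypothetical.

```python
MAX_TOTAL_LATE_DAYS = 7           # semester-wide budget stated in the policy
MAX_LATE_DAYS_PER_ASSIGNMENT = 3  # cap for any single assignment


def late_days_allowed(days_late_per_assignment):
    """Check a list of late-day counts (one per assignment) against both limits."""
    within_cap = all(
        0 <= days <= MAX_LATE_DAYS_PER_ASSIGNMENT
        for days in days_late_per_assignment
    )
    within_budget = sum(days_late_per_assignment) <= MAX_TOTAL_LATE_DAYS
    return within_cap and within_budget


print(late_days_allowed([2, 0, 3, 1]))  # 6 days in total, none above 3
print(late_days_allowed([3, 3, 2]))     # 8 days in total, over the budget of 7
print(late_days_allowed([4]))           # 4 days on one assignment, over the cap of 3
```

    Only the first plan satisfies both the per-assignment cap and the semester budget.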


    Collaboration Policy
    You are allowed to discuss strategies for solving homework problems with other students; however, any work you turn in must be your own (i.e., you may not simply copy another student's answers and turn them in as your own).
    You must clearly indicate the names of any students you work with on each assignment.

    Clarification about Artificial Intelligence or "generative AI"
    Generative AI models like ChatGPT, Claude, Gemini, GitHub/Microsoft Copilot, or similar code generation tools are clearly useful. For the purposes of CSCI 307 and the collaboration policy, you should treat a generative AI model as if it were a person: say, a very well-read, sometimes clever, and very confident but sometimes unreliable roommate. That means it is okay to ask a model for general help understanding class material, but it is not okay to put homework questions into a model or to ask it to solve specific tasks that an assignment asks you to perform. That crosses the line into cheating, just as asking a roommate to do your homework would violate the College academic integrity policy. If you consult or use generative AI in any of your assigned work, you must cite the specific tools you used and provide a list of all prompts you used in your discussion log. You are still responsible for ensuring the correctness and accuracy of all submitted work. In addition, you are responsible for ensuring that all source materials used in your work are properly cited. Be aware that generative AI can often produce output copied closely (or sometimes directly) from source material without properly citing it. Failure to correctly and fully cite sources constitutes plagiarism and is a violation of the academic integrity policy.

    You may consult publicly available literature (books, articles, blog posts, coding tutorials, etc.) for information, but you must cite each source of ideas you adopt.

    Please familiarize yourself with the Math and CS Department's policy on Academic Integrity as well as the College's Academic Integrity Policy.


    Excused Absence Policy
    Class attendance is expected and will be counted toward the participation part of the grade. If you have a confirmed reason why you cannot attend an exam on the day or at the time it is given, you must contact your instructor well ahead of time to arrange to take it at another time. Please see the College Policy on excused absences.


    Reasonable Accommodations and Accessibility Services

    The instructor is committed to providing students with disabilities equal access to the educational opportunities associated with this course. For details or to request accommodation, please refer to College procedures on Requests for Reasonable Accommodations and the Office of Accessibility Services.


    Class Recordings

    Consistent with applicable federal and state law, this course may be video/audio recorded as an accommodation only with permission from the Office of Accessibility Services.



    Last modified: Aug 22, 2025