Machine Learning (CSCI-567)

Fall 2008

General Information

Where: GFS 118
When: T-Th, 5:00-6:20pm
Instructor:
Sofus A. Macskassy
Office: SAL 216
Office Hours: By appointment
Send me an email and I will be in the office before class.
Phone: 310-414-9849 x247
NOTE: E-mail is the best way to reach me.
Teaching Assistant:
Cheol Han
Office: SAL 229
Office Hours: M 2-3, W 11-12
E-mail: cheolhan at usc dot edu

Announcements

  • Nov 20, 2008: statlog.arff and statlog_test.arff are now available.
  • Nov 19, 2008: lecture slides for unsupervised learning are now available.
  • Nov 19, 2008: project page has been updated (hopefully for the last time).
  • Nov 18, 2008: project page has been updated.
  • Nov 17, 2008: project page has been updated.
  • Nov 17, 2008: lecture slides for evaluation methods are now available.
  • Nov 15, 2008: I have been asked about the 599 course I am offering in the spring of 2009. Here is a preliminary syllabus.
  • Nov 12, 2008: lecture slides for penalty methods and homework 5 are now both available.
  • Nov 12, 2008: project page has been updated.
  • Nov 11, 2008: project page is now up.
  • Nov 5, 2008: lecture slides for overfitting are now available.
  • Oct 28, 2008: lecture slides for bias-variance and homework 4 are now both available.
  • Oct 22, 2008: lecture slides for learning theory now available.
  • Oct 13, 2008: lecture slides for SVM now available.
  • Oct 7, 2008: The midterm study guide is now available.
  • Oct 6, 2008: lecture slides for Bayesian learning now available.
  • Sep 29, 2008: homework 3 available.
  • Sep 24, 2008: lecture slides for neural networks now available.
  • Sep 22, 2008: lecture slides of lecture 8 now available.
  • Sep 18, 2008: homework 2 available.
  • Sep 17, 2008: lecture slides of lecture 7 now available.
  • Sep 14, 2008: lecture slides of lecture 6 now available.
  • Sep 10, 2008: lecture slides of lecture 5 now available.
  • Sep 9, 2008: homework 1 Q5(c) wording is updated. I added the directions: use y=0 for negative instances.
  • Sep 8, 2008: homework 1 now available on schedule page.
  • Sep 8, 2008: lecture slides of lecture 4 now available.
  • Sep 2, 2008: lecture slides of lecture 3 now available.
  • Sep 1, 2008: lecture 2 slides: slide 3 updated regarding projects.
  • Aug 29, 2008: schedule updated - Oct 2 cancelled (and Nov 25 is now meeting after all).
  • Aug 29, 2008: lecture slides for lecture 1 and 2 now available.
  • Lecture slides will be posted on the schedule page following each lecture.

Course Description

This course will present an introduction to algorithms for machine learning and data mining. These algorithms lie at the heart of many leading-edge computer applications, including optical character recognition, speech recognition, text mining, document classification, pattern recognition, computer intrusion detection, and information extraction from web pages. Every machine learning algorithm has both a computational aspect (how to compute the answer) and a statistical aspect (how to ensure that future predictions are accurate). Algorithms covered include linear classifiers (Gaussian maximum likelihood, Naive Bayes, and logistic regression) and non-linear classifiers (neural networks, decision trees, support-vector machines, and nearest neighbor methods). The class will also introduce techniques for learning from sequential data and advanced ensemble methods such as bagging and boosting.

Prerequisites: basic knowledge of probability, statistics, calculus, data structures, search algorithms (gradient descent, depth-first search, greedy algorithms), and linear algebra. Some AI background is recommended, but not required.

Textbooks and Readings

  1. The main textbook that we will be using is Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
  2. Additional recommended readings are:
  3. Lecture notes and other relevant materials are available on this web page.

Course Handouts


In this class, we will be using the WEKA toolkit from the University of Waikato (Hamilton, New Zealand). This is a package of machine learning algorithms and data sets that is very easy to use and easy to extend.
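
To give a flavor of how WEKA is typically used from Java, here is a minimal sketch, assuming WEKA's standard Java API and the statlog.arff data set mentioned in the announcements above. The class name WekaSketch and the choice of the J48 decision tree learner are illustrative assumptions, not part of the course materials.

    // Minimal sketch: load an ARFF file, train a J48 decision tree, and
    // report 10-fold cross-validation results. The file name and the choice
    // of classifier are illustrative, not course requirements.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;

    public class WekaSketch {
        public static void main(String[] args) throws Exception {
            // Load the data set; assume the last attribute is the class label.
            Instances data = new Instances(new BufferedReader(new FileReader("statlog.arff")));
            data.setClassIndex(data.numAttributes() - 1);

            // Evaluate a decision tree learner with 10-fold cross-validation.
            J48 tree = new J48();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }

Any other WEKA classifier can be dropped in place of J48 in the same way, and WEKA's Explorer GUI provides the same train-and-evaluate workflow without writing any code.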

Homework Assignments

  • Homework assignments will be listed here and on the schedule page.

Please turn in all homework in two forms: (i) as hardcopy at the start of class and, if applicable, (ii) electronically to the TA and the instructor.

Written homework and programs are due at the beginning of class.
Some guidelines:

  • If you turn it in after class but before 8am the next day, then you get 25% off.
  • If you turn it in after 8am the next day, then you get 50% off.
  • If you do not turn it in by the end of the next day, then you get no points.
  • If you have a valid excuse for not turning in your homework, then you need to let me know ASAP with proper documentation. No exceptions.
  • If you have problems with grading, see the TA. If you request a regrade, then the whole assignment will be regraded (so you run a small risk of losing points as well).
  • Regrading will only be done up to one week after graded work is returned! After that, no regrading will be done and the score will stand.

Each student is responsible for his/her own work. The standard departmental rules for academic dishonesty apply to all assignments in this course. Collaboration on homework and programming assignments should be limited to answering questions that can be asked and answered without using any written medium (e.g., no pencils, instant messages, or email).