Machine Learning (CSCI-567)
Sofus A. Macskassy
Send me an email and I will be in the office before class.
NOTE: E-mail is the best way to reach me.
Office hours: M 2-3, W 11-12
E-mail: cheolhan at usc dot edu
- Nov 20, 2008: statlog.arff and statlog_test.arff are now available.
- Nov 19, 2008: lecture slides for unsupervised learning are now available.
- Nov 19, 2008: project page has been updated (hopefully for the last time).
- Nov 18, 2008: project page has been updated.
- Nov 17, 2008: project page has been updated.
- Nov 17, 2008: lecture slides for evaluation methods are now available.
- Nov 15, 2008: I have been asked about the 599 course I am offering in the spring of 2009. Here is a preliminary syllabus.
- Nov 12, 2008: lecture slides for penalty methods and homework 5 are now both available.
- Nov 12, 2008: project page has been updated.
- Nov 11, 2008: project page is now up.
- Nov 5, 2008: lecture slides for overfitting are now available.
- Oct 28, 2008: lecture slides for bias-variance and homework 4 are now both available.
- Oct 22, 2008: lecture slides for learning theory now available.
- Oct 13, 2008: lecture slides for SVM now available.
- Oct 7, 2008: The midterm study guide is now available.
- Oct 6, 2008: lecture slides for Bayesian learning now available.
- Sep 29, 2008: homework 3 available.
- Sep 24, 2008: lecture slides for neural networks now available.
- Sep 22, 2008: lecture slides of lecture 8 now available.
- Sep 18, 2008: homework 2 available.
- Sep 17, 2008: lecture slides of lecture 7 now available.
- Sep 14, 2008: lecture slides of lecture 6 now available.
- Sep 10, 2008: lecture slides of lecture 5 now available.
- Sep 9, 2008: homework 1 Q5(c) wording is updated. I added the directions: use y=0 for negative instances.
- Sep 8, 2008: homework 1 now available on schedule page.
- Sep 8, 2008: lecture slides of lecture 4 now available.
- Sep 2, 2008: lecture slides of lecture 3 now available.
- Sep 1, 2008: slides for lectures 2 and 3 updated regarding projects.
- Aug 29, 2008: schedule updated - Oct 2 cancelled (and Nov 25 is now meeting after all).
- Aug 29, 2008: lecture slides for lecture 1 and 2 now available.
- Lecture slides will be put on the schedule page following the lecture.
This course will present an introduction to algorithms for machine
learning and data mining. These algorithms lie at the heart of many
leading-edge computer applications, including optical character
recognition, speech recognition, text mining, document classification,
pattern recognition, computer intrusion detection, and information
extraction from web pages. Every machine learning algorithm has both
a computational aspect (how to compute the answer) and a statistical
aspect (how to ensure that future predictions are accurate).
Algorithms covered include linear classifiers (Gaussian maximum
likelihood, Naive Bayes, and logistic regression) and non-linear
classifiers (neural networks, decision trees, support-vector machines,
nearest neighbor methods). The class will also introduce techniques
for learning from sequential data and advanced ensemble methods such
as bagging and boosting.
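To give a flavor of the simplest method on this list, here is a minimal nearest-neighbor classifier. This is an illustrative sketch only, not course code: it implements plain 1-NN with Euclidean distance on a made-up toy data set.

```python
# Minimal 1-nearest-neighbor classifier -- an illustrative sketch of one
# of the non-linear methods listed above, not code from the course.
import math

def nn_classify(train, query):
    """Return the label of the training point closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    def dist(a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    _, label = min(train, key=lambda pair: dist(pair[0], query))
    return label

# Toy example: two classes in 2-D.
train = [((0.0, 0.0), "neg"), ((0.1, 0.2), "neg"),
         ((1.0, 1.0), "pos"), ((0.9, 1.1), "pos")]
print(nn_classify(train, (0.95, 0.9)))  # nearest training point is "pos"
```

The computational aspect here is the distance search; the statistical aspect (why the nearest neighbor's label should generalize to future points) is exactly what the course addresses.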
Prerequisites: basic knowledge of probability, statistics, calculus,
data structures, search algorithms (gradient descent, depth-first
search, greedy algorithms), and linear algebra.
Some AI background is recommended, but not required.
- The main textbook that we will be using is Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
Additional recommended readings are:
- Tom Mitchell, "Machine Learning", McGraw-Hill, 1997.
- Richard O. Duda, Peter E. Hart & David G. Stork, "Pattern Classification, Second Edition", Wiley & Sons, 2001. (Make sure your copy is not the first printing, or go to David Stork's web page and download the bug fixes.)
Errata files: http://rii.ricoh.com/~stork/DHS.html
- Trevor Hastie, Robert Tibshirani and Jerome Friedman, "The Elements of Statistical Learning", Springer, 2001.
- Lecture notes and other relevant materials are available on this web page.
In this class, we will be using the WEKA machine learning toolkit,
developed at the University of Waikato (Hamilton, New Zealand). This is
a package of machine learning algorithms and data sets that is very
easy to use and easy to extend.
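WEKA stores data sets in its ARFF text format (the statlog.arff files posted above are examples). As a rough illustration of the format, here is a minimal Python reader; it is a sketch that assumes plain numeric or nominal attributes only (no sparse data, quoting, or date types), not a replacement for WEKA's own loaders.

```python
# Minimal reader for WEKA's ARFF format -- a rough sketch that handles
# only plain numeric/nominal attributes (no sparse data, quoting, or dates).

def read_arff(lines):
    """Return (attribute_names, rows) parsed from ARFF text lines."""
    names, rows, in_data = [], [], False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            names.append(line.split()[1])      # second token is the name
        elif lower.startswith("@data"):
            in_data = True                     # everything after @data is data
        elif in_data:
            rows.append(line.split(","))       # comma-separated values
    return names, rows

# Tiny made-up ARFF file for demonstration.
sample = """\
@relation toy
@attribute height numeric
@attribute class {yes,no}
@data
1.7,yes
1.5,no
""".splitlines()

names, rows = read_arff(sample)
print(names)  # ['height', 'class']
print(rows)   # [['1.7', 'yes'], ['1.5', 'no']]
```

In practice you would let WEKA load ARFF files itself; the sketch is only meant to show what the file format looks like.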
- Homework assignments will be listed here and on the schedule.
Please turn in all homework in two forms: (i) as hardcopy at the start
of class and, if applicable, (ii) electronically to the TA and instructor.
Written homework and programs are due at the beginning of class.
- If you deliver after class but before 8am the next day, then you get 25% off
- If you deliver after 8am the next day, then you get 50% off
- If you do not deliver by the next day, then you get no points
- If you have a valid excuse for not turning in your homework, then you need to let me know ASAP with proper documentation. No exceptions.
- If you have problems with grading, see the TA. If you request a re-grading, then the whole assignment will be regraded (so you run the small risk of losing points as well).
- Regrading will only be done up to one week after graded work is returned! After that, no regrading will be done and the score will stand.
Each student is responsible for his/her own work. The standard
departmental rules for academic dishonesty apply to all assignments in
this course. Collaboration on homeworks and programs should be
limited to answering questions that can be asked and answered
without using any written medium (e.g., no pencils, instant messages,
or e-mail).