Machine Learning (CSCI-567)

Fall 2007


General Information

Location:
Where: GFS 118
When: T-Th, 5:00-6:20pm
Instructor:
Sofus A. Macskassy
Office: SAL 216
Office Hours: By appointment
Send me an email and I will be in the office before class.
Phone: 310-414-9849 x247
E-mail: csci567@usc.edu
NOTE: E-mail is the best way to reach me.
Teaching Assistant:
Cheol Han
Office: TBA
Office Hours: TBA
E-mail: cheolhan at usc dot edu

News

  • 11/15/07: Added link to adult-train-small.arff on schedule page.
  • 11/15/07: Unsupervised learning slides (lectures 23+24) available on schedule page.
  • 11/13/07: Updated grading policies on projects and syllabus pages.
  • 11/9/07: Penalty methods and evaluation slides (lectures 21+22) available on schedule page.
  • 11/8/07: Homework 4 is ready on schedule page.
  • 11/6/07: Overfitting slides (lectures 19-20) available on schedule page.
  • 10/29/07: Bias-Variance slides (lectures 17+18) and hw4 available on schedule page. (updated 10pm)
  • 10/29/07: New projects page added. (updated at 10pm).
  • 10/17/07: Learning theory slides (lectures 15+16) available on schedule page.
  • 10/9/07: Lecture 14 slides available on schedule page.
  • 10/8/07: Lecture 13 slides available on schedule page.
  • 10/8/07: midterm study guide available.
  • 10/3/07: Lecture 12 slides available on schedule page.
  • 10/1/07: Lecture 11 slides available on schedule page.
  • 9/27/07: Lecture 10 slides available on schedule page.
  • 9/25/07: schedule page is updated.
  • 9/25/07: Lecture 9 slides updated (typos fixed) on schedule page.
  • 9/25/07: Lecture 9 slides and HW3 now on schedule page.
  • 9/20/07: schedule page is updated.
  • 9/20/07: Lecture 8 slides now on schedule page.
  • 9/18/07: Homework 2 is ready on schedule page.
  • 9/18/07: Lecture 7 slides now on schedule page.
  • 9/14/07: schedule page is updated.
  • 9/13/07: Lecture 6 slides now on schedule page.
  • 9/12/07: Homework 1 Clarifications updated regarding gradients.
  • 9/11/07: Lecture 5 slides now on schedule page.
  • 9/11/07: Homework 1 Clarifications.
  • Homework 1 is ready on schedule page. I have extended due date to September 18.
  • Lecture 4 slides now on schedule page.
  • Lecture 3 slides now on schedule page.
  • Schedule has been updated (and will be updated based on how things progress).
  • Lecture slides should now be accessible.
  • Lecture 2 slides now on schedule page.
  • Lecture 1 slides now on schedule page.
  • Lecture slides will be put on the schedule page following the lecture.

Course Description

This course will present an introduction to algorithms for machine learning and data mining. These algorithms lie at the heart of many leading-edge computer applications, including optical character recognition, speech recognition, text mining, document classification, pattern recognition, computer intrusion detection, and information extraction from web pages. Every machine learning algorithm has both a computational aspect (how to compute the answer) and a statistical aspect (how to ensure that future predictions are accurate). Algorithms covered include linear classifiers (Gaussian maximum likelihood, Naive Bayes, and logistic regression) and non-linear classifiers (neural networks, decision trees, support-vector machines, nearest neighbor methods). The class will also introduce techniques for learning from sequential data and advanced ensemble methods such as bagging and boosting.

Prerequisites: basic knowledge of probability, statistics, calculus, linear algebra, data structures, and search algorithms (gradient descent, depth-first search, greedy algorithms). Some AI background is recommended, but not required.

Textbook:

  1. The main textbook that we will be using is Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
    Errata: http://www.cmpe.boun.edu.tr/~ethem/i2ml/
  2. Additional recommended readings are:
  3. Lecture notes and other relevant materials are available on this web page.


Course Handouts


Software

In this class, we will be using WEKA, developed at the University of Waikato (Hamilton, New Zealand). This is a package of machine learning algorithms and data sets that is very easy to use and easy to extend.


Homework Assignments

  • Homework assignments will be listed here and on the schedule.

Please turn in all homework in two forms: (i) as hard copy at the start of class and, if applicable, (ii) electronically to the TA and instructor.

Written Homework and Programs are due at the beginning of class.
Some guidelines:

  • If you deliver after class but before 8am the next day, you get 25% off.
  • If you deliver the next day after 8am, you get 50% off.
  • If you do not deliver the next day, you get no points.
  • If you have a valid excuse for not turning in your homework, then you need to let me know ASAP with proper documentation. No exceptions.
  • If you have problems with grading, see the TA. If you request a re-grading, then the whole assignment will be regraded (so you run a small risk of losing points as well).

Each student is responsible for his/her own work. The standard departmental rules for academic dishonesty apply to all assignments in this course. Collaboration on homeworks and programs should be limited only to answering questions that can be asked and answered without using any written medium (e.g., no pencils, instant messages, or email).

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees.