# Midterm Study Guide -- CSCI567 -- Fall 2008

Topics to know for the midterm:

• Situations in which machine learning is useful.
• Definitions of terminology: training examples, features, classes, hypotheses, hypothesis classes, loss functions, adjustable parameters, VC dimension.
• Decision theory: How to use a loss function to decide what decision to make in order to minimize expected loss. How to handle reject options.
• Three main kinds of hypotheses: decision boundaries, conditional models P(y|X), and joint models P(X,y).
• Two main types of models: discriminative and generative.
• How to make classification decisions using each of these.
• Types of hypothesis spaces: fixed vs. variable, stochastic vs. deterministic. The debate about which kind of method is best, and the factors disputed in that debate.
• Criteria for off-the-shelf learning algorithms. What does each of them mean?
• Details of specific learning algorithms and hypothesis spaces (type of decision boundary, learning algorithms, advantages and disadvantages according to the criteria for "off-the-shelf" learning):
• Linear threshold units (what can they express? What can't they express?) Ways of fitting LTUs via: LMS, logistic regression, multivariate Gaussians, Naive Bayes (discrete case), linear programming.
• Decision trees (including splitting rule and methods of handling missing values)
• Nearest Neighbor (curse of dimensionality, edited NN)
• Neural networks (including both squared error and softmax error, initialization of the weights, learning network structure)
• Support Vector Machines (kernels, both formulations of quadratic programming)
• Naive Bayes (How to compute it for discrete attributes; Laplace corrections; Kernel density estimation)
• Gradient descent search. How to design a gradient descent search algorithm. Difference between batch and incremental (stochastic) gradient descent.
• Quadratic programming. What is the standard form of a quadratic programming problem?
• Bayesian learning theory. What is Bayesian model averaging? What is MAP? What is the optimal Bayes classifier? How are they related?
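As a review aid for the Naive Bayes item above, here is a minimal sketch of Naive Bayes for discrete attributes with Laplace corrections. The toy dataset, the `alpha` parameter name, and the helper `train_nb` are assumptions for illustration, not course code.

```python
import math
from collections import defaultdict

def train_nb(X, y, n_values, alpha=1.0):
    """Naive Bayes for discrete features with Laplace (add-alpha) correction.

    X: list of feature tuples; y: class labels;
    n_values[j]: number of possible values of feature j.
    """
    classes = sorted(set(y))
    class_count = {c: sum(1 for yi in y if yi == c) for c in classes}
    # counts[c][j][v] = number of class-c examples with feature j equal to v
    counts = {c: [defaultdict(int) for _ in n_values] for c in classes}
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            counts[yi][j][v] += 1

    def predict(x):
        best, best_lp = None, -math.inf
        for c in classes:
            # Laplace-corrected prior and conditionals, in log space to
            # avoid underflow when many attributes are multiplied.
            lp = math.log((class_count[c] + alpha) / (len(y) + alpha * len(classes)))
            for j, v in enumerate(x):
                lp += math.log((counts[c][j][v] + alpha)
                               / (class_count[c] + alpha * n_values[j]))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    return predict
```

The Laplace correction adds a pseudo-count of `alpha` to every cell, so an attribute value never seen with some class gets a small nonzero probability instead of zeroing out the whole product.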

# Sample questions

1. You are given a data set where it looks like the underlying data from each class come from a Gaussian distribution. Which of the methods we have talked about (perceptron, logistic regression, LDA, decision trees, ...) would be the first method you would try? Why?
2. You are an expert machine learning consultant. A customer comes to you with a problem that is a good fit for machine learning, and you ask what kind of costs are associated with false positives and false negatives. The customer does not know and believes that this may change often. Knowing that learning a model will take a long time and cannot be done often, what kind of classification models are likely to be better for this and why?
3. You want to apply gradient descent to fit a hypothesis of the form `y = w1 + w2*x` by minimizing squared error. What are the update rules for `w1` and `w2`?
4. What is the difference between discriminative and generative models?
5. You notice that a customer is using a neural network on a problem domain where you are fairly sure that a hyperplane can separate the positives and the negatives. Is a full neural network needed? What other methods might you suggest to the customer?
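A hedged sketch of the kind of derivation question 3 is after, assuming squared error `E = 0.5 * (y - y_hat)^2` and incremental (stochastic) updates; the toy data and learning rate are made up for illustration:

```python
# Incremental gradient descent for the hypothesis y_hat = w1 + w2 * x,
# minimizing E = 0.5 * (y - y_hat)^2 on each example.
# Gradients: dE/dw1 = -(y - y_hat), dE/dw2 = -(y - y_hat) * x,
# so stepping against the gradient gives the update rules below.
def sgd_step(w1, w2, x, y, lr=0.05):
    err = y - (w1 + w2 * x)        # prediction error on this example
    return w1 + lr * err, w2 + lr * err * x

# Toy data drawn from y = 1 + 2x (an assumption for illustration).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w1, w2 = 0.0, 0.0
for _ in range(2000):                # repeated passes over the data
    for x, y in data:
        w1, w2 = sgd_step(w1, w2, x, y)
```

Batch gradient descent uses the same gradients but sums them over the whole training set before making a single update per pass; the incremental version above updates after every example.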