Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. The machine learning algorithms used for data mining sift through databases automatically, seeking regularities or patterns. Strong patterns, if found, will likely generalize to make accurate predictions on future data.
This introductory course will describe the most common styles of machine learning algorithms used for data mining. It will cover methods of inferring rules and decision trees, statistical modeling, association rules, linear models, instance-based learning, and clustering.
Of particular interest is the way in which machine learning algorithms are evaluated, and we will describe the methodology of training and testing, predicting performance, cross-validation, and other methods of estimating error rates.
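As a rough illustration of the cross-validation idea mentioned above, here is a minimal Python sketch. It is not taken from the course materials or from Weka: the function names, the toy labels, and the choice of a majority-class baseline as the learner are all assumptions made for the example. The data are split into k folds, each fold is held out once for testing while the rest is used for training, and the errors are pooled into a single estimate.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_class(labels):
    """Hypothetical baseline learner: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_error(labels, k):
    """Estimate the baseline's error rate with k-fold cross-validation."""
    n = len(labels)
    errors = 0
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        # Train only on instances outside the held-out fold.
        train = [labels[i] for i in range(n) if i not in held_out]
        prediction = majority_class(train)
        # Count mistakes on the held-out fold only.
        errors += sum(1 for i in test_idx if labels[i] != prediction)
    return errors / n


# Toy class labels, invented for illustration (7 "yes", 3 "no").
labels = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]
error_rate = cross_validated_error(labels, k=5)
print(error_rate)
```

In practice the instances would usually be shuffled, and often stratified by class, before being divided into folds. The contiguous split here only keeps the sketch short and deterministic.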
The course will have a strong practical component, based on the open-source Weka machine learning workbench. Weka is an extensive collection of state-of-the-art machine learning algorithms and data preprocessing tools presented within a uniform interactive interface. Students will learn how to apply the algorithms in Weka to a wide variety of datasets and how to interpret the results.