Monday, 24 May 2010

Data mining algorithms in SQL Server 2008

Microsoft SQL Server 2008 Analysis Services ships with 9 built-in algorithms that can be used in data mining solutions. Apparently these algorithms are unchanged in SQL Server 2008 R2.

Here's my own summary of these algorithms, based on their descriptions and detail from the Data Mining Algorithms page in SQL 2008 Books Online.

Association

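Finds items that tend to occur together, which makes it useful for generating recommendations. The classic example is market-basket analysis: people who buy beer also tend to buy nappies (diapers).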
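To make the idea concrete, here is a minimal sketch in plain Python (not the SQL Server implementation) that counts how often pairs of items appear in the same basket and reports support and confidence for each rule. The baskets and the function name pair_rules are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration.
baskets = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"beer", "bread"},
    {"crisps", "milk"},
    {"beer", "crisps", "milk"},
]

def pair_rules(baskets):
    """Return {(a, b): (support, confidence)} for every rule 'a implies b'."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        pair_counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    rules = {}
    for (a, b), count in pair_counts.items():
        # support: share of all baskets containing both items
        # confidence: share of baskets with the first item that also hold the second
        rules[(a, b)] = (count / total, count / item_counts[a])
        rules[(b, a)] = (count / total, count / item_counts[b])
    return rules

rules = pair_rules(baskets)
support, confidence = rules[("beer", "crisps")]
```

A recommendation engine would typically keep only the rules whose support and confidence clear some threshold.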

Clustering

Groups similar items together, based on the attributes they have in common.
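As a rough illustration of grouping by similarity, the sketch below runs a very small k-means on one-dimensional data. K-means is only one common clustering technique, and this is not the SQL Server code; the ages and starting centroids are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny k-means on 1-D data with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer ages, invented for illustration.
ages = [21, 23, 25, 61, 64, 67]
centroids, clusters = kmeans_1d(ages, centroids=[20.0, 30.0])
```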

Decision Trees

Predicts both discrete and continuous attributes based on relationships between the input columns. For example, figuring out which characteristics of existing customers determine whether they are likely to purchase again.
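A decision tree is built by repeatedly choosing the attribute that best separates the outcomes. The sketch below shows just that one step, using Gini impurity as the scoring measure (an assumption for illustration; SQL Server offers its own scoring methods). The customer rows are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_score(rows, attribute, target):
    """Weighted impurity after splitting rows on one attribute (lower is better)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

# Hypothetical customer data, invented for illustration.
customers = [
    {"member": "yes", "region": "north", "repurchased": "yes"},
    {"member": "yes", "region": "south", "repurchased": "yes"},
    {"member": "no", "region": "north", "repurchased": "no"},
    {"member": "no", "region": "south", "repurchased": "no"},
]

# The attribute with the lowest score becomes the first split in the tree.
best = min(["member", "region"], key=lambda a: split_score(customers, a, "repurchased"))
```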

Linear Regression

Calculates a "line of best fit" for a series of data, and then allows prediction based on that line.
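The "line of best fit" can be computed with ordinary least squares. This is a minimal sketch of that calculation for a single input, using made-up numbers; it is not how SQL Server implements the algorithm internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # predict y for a new x of 5
```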

Logistic Regression

A variation of the Neural Network algorithm, good for predicting yes/no outcomes.
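The sketch below shows how a logistic model turns a weighted sum of inputs into a probability between 0 and 1, which is then read as a yes/no answer. The weights, bias and input meanings are hypothetical, chosen by hand rather than fitted to data.

```python
import math

def purchase_probability(features, weights, bias):
    """Logistic model: squash a weighted sum of the inputs into the range 0..1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, invented for illustration (not fitted to real data).
weights = [0.8, -0.5]  # e.g. previous purchases, months since last visit
bias = -1.0

p = purchase_probability([3, 1], weights, bias)
will_buy = p >= 0.5  # treat a probability of at least 0.5 as "yes"
```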

Naive Bayes

Classification algorithm using Bayes' theorem. Good for quick results that may then be refined by other algorithms.
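Here is a small sketch of a Naive Bayes classifier for categorical data, with add-one smoothing so unseen values do not zero out a score. The column names and rows are invented, and the smoothing choice is an assumption for illustration, not a description of SQL Server's internals.

```python
from collections import Counter, defaultdict

def train(rows, target):
    """Count classes, and attribute values within each class."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> counts of each value
    distinct = defaultdict(set)          # attribute -> values seen in training
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            distinct[attr].add(value)
    return class_counts, value_counts, distinct

def predict(model, row):
    """Pick the class with the highest prior times product of likelihoods."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for attr, value in row.items():
            seen = value_counts[(attr, cls)][value]
            # Add-one smoothing so an unseen value never gives probability zero.
            score *= (seen + 1) / (n_cls + len(distinct[attr]))
        scores[cls] = score
    return max(scores, key=scores.get)

# Hypothetical training data, invented for illustration.
rows = [
    {"owns_home": "yes", "commute": "short", "buys_bike": "yes"},
    {"owns_home": "yes", "commute": "short", "buys_bike": "yes"},
    {"owns_home": "no", "commute": "short", "buys_bike": "yes"},
    {"owns_home": "no", "commute": "long", "buys_bike": "no"},
    {"owns_home": "yes", "commute": "long", "buys_bike": "no"},
]

model = train(rows, "buys_bike")
answer = predict(model, {"owns_home": "yes", "commute": "short"})
```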

Neural Network

Analyses complex relationships with lots of inputs but few outputs.
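As a sketch of "many inputs, few outputs", the function below runs one forward pass through a network with a single hidden layer and one output. The layer sizes, activation functions and weights are all assumptions picked for illustration; a real network would learn its weights from training data.

```python
import math

def forward(inputs, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden layer (tanh) -> single output (sigmoid)."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(neuron_weights, inputs)))
        for neuron_weights in hidden_weights
    ]
    z = sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-picked weights: four inputs, two hidden neurons, one output.
hidden_weights = [
    [0.5, -0.2, 0.1, 0.7],
    [-0.3, 0.8, -0.5, 0.2],
]
output_weights = [1.2, -0.7]

score = forward([1.0, 0.0, 2.0, 1.0], hidden_weights, output_weights)
```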

Sequence Clustering

Finds the most common sequences. Good for identifying popular navigation paths through a website.
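A simple starting point for sequence analysis is counting how often one step follows another. The sketch below does that for some invented web sessions; it covers only the transition-counting part and does not group the sequences into clusters as the full algorithm does.

```python
from collections import Counter

# Hypothetical page-visit sessions, invented for illustration.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "cart"],
    ["home", "search", "products", "cart"],
    ["home", "about"],
]

def transition_counts(sessions):
    """Count how often each page is followed directly by another page."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts

counts = transition_counts(sessions)
most_common_step, times_seen = counts.most_common(1)[0]
```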

Time Series

Predicts future values of a continuous attribute over time. For example, forecasting next year's sales.
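To show the general shape of a forecast, here is a sketch of the simple "drift" method, which extends a series by its average past change. It is far simpler than what SQL Server does and is only meant to illustrate the idea; the sales figures are made up.

```python
def drift_forecast(history, steps=1):
    """Extend the series by its average historical change per period."""
    changes = [later - earlier for earlier, later in zip(history, history[1:])]
    average_change = sum(changes) / len(changes)
    return [history[-1] + average_change * (i + 1) for i in range(steps)]

# Hypothetical yearly sales figures, invented for illustration.
yearly_sales = [100, 110, 125, 135]
forecast = drift_forecast(yearly_sales, steps=2)
```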
