Data mining algorithms in SQL Server 2008
Microsoft SQL Server 2008 Analysis Services ships with nine built-in algorithms that can be used in data mining solutions, listed below. As far as I can tell, these algorithms are unchanged in SQL Server 2008 R2.
- Microsoft Association Algorithm
- Microsoft Clustering Algorithm
- Microsoft Decision Trees Algorithm
- Microsoft Linear Regression Algorithm
- Microsoft Logistic Regression Algorithm
- Microsoft Naive Bayes Algorithm
- Microsoft Neural Network Algorithm
- Microsoft Sequence Clustering Algorithm
- Microsoft Time Series Algorithm
Here’s my own summary of these algorithms, in the same order as the list above. It is based on the descriptions and detail on the Data Mining Algorithms page in SQL Server 2008 Books Online.
Association: Useful for generating recommendations from items that tend to occur together. The classic example is market basket analysis: people who buy beer also tend to buy shampoo.
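To make the association idea a bit more concrete, here is a minimal Python sketch of how "bought together" rules are usually scored, using support (how often the pair appears) and confidence (how often the second item appears given the first). The baskets are made-up data and `pair_rules` is my own hypothetical helper. This is not how Analysis Services implements the algorithm, just the general idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"beer", "shampoo", "crisps"},
    {"beer", "shampoo"},
    {"beer", "bread"},
    {"shampoo", "bread"},
    {"beer", "shampoo", "bread"},
]


def pair_rules(baskets):
    """Return {(a, b): (support, confidence)} for every rule 'a -> b'."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    n = len(baskets)
    rules = {}
    for (a, b), count in pair_counts.items():
        # Support is symmetric; confidence depends on the rule direction.
        rules[(a, b)] = (count / n, count / item_counts[a])
        rules[(b, a)] = (count / n, count / item_counts[b])
    return rules


rules = pair_rules(baskets)
support, confidence = rules[("beer", "shampoo")]
print(f"beer -> shampoo: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high support and high confidence is a candidate recommendation: here, someone with beer in their basket would be shown shampoo.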
Clustering: Groups similar items together, for example segmenting customers who share similar characteristics.
Decision Trees: Predicts both discrete and continuous attributes based on the relationships between input columns. An example is working out which characteristics of past customers determine whether they are likely to purchase again.
Linear Regression: Calculates a “line of best fit” for a series of data, and then allows prediction based on that line.
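As a rough illustration of the "line of best fit" idea, the sketch below fits a straight line with ordinary least squares in plain Python and then uses it to predict a new value. The function names and the sample points are my own assumptions for the example. It shows the textbook technique, not the Analysis Services implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Read a prediction off the fitted line."""
    return slope * x + intercept


# Hypothetical sample points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
prediction = predict(slope, intercept, 5)
print(slope, intercept, prediction)
```

With real data the points will not sit exactly on the line, and the fit simply minimises the squared vertical distances to it.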
Logistic Regression: A variation of the Neural Network algorithm, good for yes/no outcomes.
Naive Bayes: A classification algorithm based on Bayes' theorem. Good for quick results that may then be refined by other algorithms.
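For the curious, here is a small sketch of how a Naive Bayes classifier applies Bayes' theorem: it multiplies the prior probability of each class by the probability of each attribute value given that class, and picks the class with the highest score. The customer attributes, labels, and function names are all made up for illustration, and the add-one smoothing assumes every attribute has exactly two possible values. It is not the Analysis Services implementation.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts


def classify(row, class_counts, value_counts):
    """Return the label maximising P(label) * product of P(value | label)."""
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(row):
            seen = value_counts[(i, label)]
            # Add-one smoothing; "+ 2" assumes two possible values per attribute.
            score *= (seen[value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (owns_home, has_children) -> outcome.
rows = [
    ("yes", "yes"), ("yes", "no"), ("no", "yes"),
    ("no", "no"), ("yes", "yes"), ("no", "no"),
]
labels = ["buy", "buy", "no_buy", "no_buy", "buy", "no_buy"]

class_counts, value_counts = train(rows, labels)
result = classify(("yes", "no"), class_counts, value_counts)
print(result)
```

Training is just counting, which is why this kind of model is quick to build and makes a reasonable first pass before trying heavier algorithms.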
Neural Network: Analyses complex relationships where there are lots of inputs but relatively few outputs.
Sequence Clustering: Finds the most common sequences of events. Good for identifying popular navigation paths through a website.
Time Series: Predicts future values of a continuous attribute over time, for example forecasting next year’s sales.