Data mining algorithms in SQL Server 2008
Microsoft SQL Server 2008 Analysis Services ships with nine built-in algorithms that can be used in data mining solutions. As far as I can tell, these algorithms are unchanged in SQL Server 2008 R2.
- Microsoft Association Algorithm
- Microsoft Clustering Algorithm
- Microsoft Decision Trees Algorithm
- Microsoft Linear Regression Algorithm
- Microsoft Logistic Regression Algorithm
- Microsoft Naive Bayes Algorithm
- Microsoft Neural Network Algorithm
- Microsoft Sequence Clustering Algorithm
- Microsoft Time Series Algorithm
Here's my own summary of these algorithms, based on the descriptions and details on the Data Mining Algorithms page in SQL Server 2008 Books Online.
Association
Useful for generating recommendations by finding items that tend to occur together. The classic example is shoppers who buy beer also buying diapers.
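To make the idea of "items that occur together" a bit more concrete, here is a minimal Python sketch of the two measures usually quoted for association rules, support and confidence. This is only an illustration of the general technique, not how Analysis Services implements it; the baskets and the function names are made up.

```python
# Made-up shopping baskets, purely for illustration.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

# Rule under test: "people who buy bread also buy butter".
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
```

A rule with high support and high confidence is a reasonable candidate for a recommendation; real implementations also prune the search so they don't have to test every possible itemset.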
Clustering
Groups similar items together.
Decision Trees
Predicts both discrete and continuous attributes based on relationships between the input columns. An example is working out which characteristics of existing customers determine whether they are likely to purchase again.
Linear Regression
Calculates a "line of best fit" for a series of data, and then allows prediction based on that line.
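As a rough illustration of the "line of best fit" idea, here is a minimal ordinary least-squares sketch in plain Python. It assumes a single input column and uses made-up sales figures; the function names are my own and it says nothing about how the Microsoft algorithm is implemented internally.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalised,
    # since the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read a predicted value off the fitted line."""
    return slope * x + intercept

# Made-up data that happens to lie exactly on y = 2x + 1.
months = [1, 2, 3, 4]
sales = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(months, sales)
forecast = predict(5, slope, intercept)  # prediction for month 5
```

Real data will not sit exactly on a line, so the fitted line is the one that minimises the squared vertical distance to the points.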
Logistic Regression
A variation of the Neural Network algorithm, good for yes/no outcomes.
Naive Bayes
Classification algorithm using Bayes' theorem. Good for quick results that may then be refined by other algorithms.
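For the curious, here is a small Python sketch of how a naive Bayes classifier applies Bayes' theorem: it multiplies the prior probability of each outcome by the probability of each input value given that outcome, treating the inputs as independent (the "naive" part). The customer data and function names are invented for illustration, and there is no smoothing, so an input combination never seen in training would give a zero score.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count how often each label occurs, and each feature value per label."""
    label_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, column index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
    return label_counts, feature_counts

def posterior(row, label_counts, feature_counts):
    """Return P(label | row) for every label, assuming independent features."""
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior P(label)
        for i, value in enumerate(row):
            # Likelihood P(value | label), estimated from raw counts (no smoothing).
            score *= feature_counts[(label, i)][value] / count
        scores[label] = score
    norm = sum(scores.values())  # zero if no label has seen these values
    return {label: score / norm for label, score in scores.items()}

# Made-up training data: (age group, housing) -> did the customer buy?
rows = [("young", "owner"), ("young", "renter"), ("old", "owner"), ("old", "owner")]
labels = ["buy", "no", "buy", "no"]

label_counts, feature_counts = train(rows, labels)
result = posterior(("old", "owner"), label_counts, feature_counts)
```

Because training is just counting, this is cheap to run, which fits with using it for a quick first look at the data.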
Neural Network
Analyses complex relationships with lots of inputs but few outputs.
Sequence Clustering
Finds the most common sequences. Good for identifying popular navigation paths through a website.
Time Series
Predicts future values of a continuous attribute over time, for example forecasting next year's sales.