Tuesday, November 3, 2009

Data Mining

Data mining is generally described as exploring data to find patterns that are useful to a business. It helps you predict behaviour, identify relationships, and group items such as customers and products. Our brain, for example, works like a data mining tool: it stores data accumulated through the experiences we face in life, and then uses that data to identify patterns that help predict the future.

Microsoft offers the following data mining algorithms in SQL Server 2005. The models are:
i. Decision Trees
ii. Naive Bayes
iii. Clustering
iv. Sequence Clustering
v. Time Series
vi. Association
vii. Neural Network

Decision Tree: Let's take the example of identifying couples who are likely to form a successful marriage. The input attributes include age, religion, gender, political views, height and so on. The predictable attribute is the marriage outcome. Say the first split is based on political views, 70-30, and the second split is based on height. In the 70% branch, the height split is, say, 60% men who are 6' tall and 40% who are under 5' 8"; in the 30% branch, it is, say, 58% who are 6' tall and 18% who are under 5' 8". This leads the model to predict that a successful marriage is most likely for couples in the 70% branch, who share common political views, where the man is 6' tall.
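To make the idea of a "split" concrete, here is a minimal sketch in Python of how a tree might choose which attribute to split on first. It uses a generic entropy-based information gain score, which is an assumption for illustration and not necessarily how SQL Server scores splits. The rows, the attribute names (shared_political_views, man_is_six_feet) and the outcomes are all hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical rows: (shared_political_views, man_is_six_feet, successful_marriage)
rows = [
    ("yes", "yes", True), ("yes", "yes", True), ("yes", "yes", True),
    ("yes", "no", True), ("yes", "no", True), ("yes", "no", False),
    ("no", "yes", True), ("no", "yes", False),
    ("no", "no", False), ("no", "no", False),
]
# Column index of each input attribute (names are made up for this sketch).
ATTRIBUTES = {"shared_political_views": 0, "man_is_six_feet": 1}
TARGET = 2  # column index of the predictable attribute


def entropy(subset):
    """Shannon entropy of the outcome column for the given rows."""
    counts = Counter(r[TARGET] for r in subset)
    total = len(subset)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(subset, col):
    """How much splitting on column `col` reduces outcome entropy."""
    total = len(subset)
    remainder = 0.0
    for value in {r[col] for r in subset}:
        part = [r for r in subset if r[col] == value]
        remainder += len(part) / total * entropy(part)
    return entropy(subset) - remainder


def best_split(subset):
    """Name of the attribute with the highest information gain."""
    return max(ATTRIBUTES, key=lambda name: information_gain(subset, ATTRIBUTES[name]))


first = best_split(rows)
same_views = [r for r in rows if r[0] == "yes"]
second = best_split(same_views)
print(first, "->", second)
```

With this made-up data the first split falls on political views, and within the "yes" branch the next split falls on height, which mirrors the two-level tree described above.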

Naive Bayes: This model can be used to predict an income range for a given occupation.
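As a rough illustration, the income-range prediction can be sketched in a few lines of Python. This is a generic Naive Bayes with add-one (Laplace) smoothing, not SQL Server's implementation, and the occupations, income ranges and training rows are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (occupation, income_range)
training = [
    ("engineer", "high"), ("engineer", "high"), ("engineer", "medium"),
    ("teacher", "medium"), ("teacher", "medium"), ("teacher", "low"),
    ("clerk", "low"), ("clerk", "low"), ("clerk", "medium"),
]


def train(rows):
    """Count how often each income range and each (income, occupation) pair occurs."""
    class_counts = Counter(income for _, income in rows)
    feature_counts = defaultdict(Counter)
    for occupation, income in rows:
        feature_counts[income][occupation] += 1
    occupations = {occupation for occupation, _ in rows}
    return class_counts, feature_counts, occupations


def predict(model, occupation):
    """Return P(income_range | occupation) for every income range."""
    class_counts, feature_counts, occupations = model
    total = sum(class_counts.values())
    scores = {}
    for income, count in class_counts.items():
        prior = count / total
        # Add-one smoothing so an unseen pair never gets probability zero.
        likelihood = (feature_counts[income][occupation] + 1) / (count + len(occupations))
        scores[income] = prior * likelihood
    norm = sum(scores.values())
    return {income: score / norm for income, score in scores.items()}


model = train(training)
probabilities = predict(model, "teacher")
print(max(probabilities, key=probabilities.get))
```

With more input attributes, the likelihoods for each attribute would simply be multiplied together; that independence assumption is what makes the model "naive".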

Clustering: This model can be used to group the points on a graph showing per capita income vs. per capita debt for each country. When there is a huge number of countries the graph is hard to read; clustering similar countries together makes it easy to read.
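One way to picture this grouping is a small k-means sketch in Python. K-means is used here as a simple stand-in for illustration, not as a description of what SQL Server does internally. The country figures are hypothetical, and the starting centroids are picked by hand to keep the result repeatable.

```python
# Hypothetical (per capita income, per capita debt) pairs, in thousands.
points = [
    (2.0, 1.0), (2.5, 1.5), (3.0, 1.0),       # low income, low debt
    (40.0, 30.0), (42.0, 35.0), (45.0, 32.0),  # high income, high debt
]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(group):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(group)
    return tuple(sum(coord) / n for coord in zip(*group))


def k_means(data, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups


# Assumption: start from the first and last points as initial centroids.
centroids, groups = k_means(points, [points[0], points[-1]])
for centre, members in zip(centroids, groups):
    print(centre, members)
```

Each cluster can then be drawn as a single labelled region on the graph instead of dozens of separate country points.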

Time Series: This model can be used to predict values over time. For the time series algorithm you need a key time column, plus input and predictable attributes.
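To show how a key time column and a predictable attribute fit together, here is a minimal Python sketch that fits a straight trend line by least squares and extends it one step ahead. A plain trend line is swapped in purely for illustration; it is not the algorithm SQL Server uses. The monthly sales figures are hypothetical.

```python
# Hypothetical history: (month number as the key time, sales as the predictable attribute)
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0), (5, 140.0)]


def fit_trend(series):
    """Least-squares slope and intercept of value against time."""
    n = len(series)
    mean_t = sum(t for t, _ in series) / n
    mean_y = sum(y for _, y in series) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in series)
    variance = sum((t - mean_t) ** 2 for t, _ in series)
    slope = covariance / variance
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(series, future_t):
    """Extend the fitted trend line to a future time key."""
    slope, intercept = fit_trend(series)
    return intercept + slope * future_t


print(forecast(history, 6))
```

Real series usually have seasonality and noise that a straight line cannot capture, which is why a dedicated time series model is worth using.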

In my next article, let us go through the step-by-step process of creating a data mining project.
