What is Data Mining?

Welcome, everyone! This is Jwala Vegesna, and this is my first blog about Data Mining. In this blog I want to give you a gentle introduction to Data Mining and the steps involved in the mining process.

Data Mining can be defined as the process of discovering interesting patterns and knowledge from large amounts of data. The data sources can include databases, data warehouses, the Web, other information repositories, or data that are streamed into the system dynamically.

[Figure: Mining of Data. Source: https://www.ovhcloud.com]

Data mining is also known by several other names, including "Knowledge Mining from Data", "Knowledge Extraction", "Data Analysis", "Data Archaeology" and "Data Dredging".

Because data mining is often treated as a synonym for Knowledge Discovery from Data, popularly known as KDD, here we will look at how the KDD process works. The process is an iterative sequence of the following steps:

1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation

[Figure: KDD Process. Source: https://www.includehelp.com]

In Data Cleaning, noisy and inconsistent data are identified and removed, leaving clean and consistent data. Once the data is cleaned, the next step is to integrate it.
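To make the cleaning step concrete, here is a minimal Python sketch. It assumes a small in-memory list of hypothetical customer records (customer_id, age, city); the valid age range of 0 to 120 and the rules used here (drop incomplete rows, drop impossible values as noise, normalize city names, remove duplicates) are illustrative assumptions, not a standard procedure. In practice, missing values are often filled in rather than dropped.

```python
# Hypothetical raw records: (customer_id, age, city). Values are invented for illustration.
raw_records = [
    (1, 34, " Hyderabad "),
    (2, None, "Chennai"),   # missing value
    (3, 29, "Mumbai"),
    (3, 29, "Mumbai"),      # exact duplicate
    (4, -5, "Delhi"),       # noisy: impossible age
    (5, 41, "pune"),        # inconsistent capitalization
]

def clean(records, min_age=0, max_age=120):
    """Drop incomplete, out-of-range and duplicate records; normalize city names."""
    seen = set()
    cleaned = []
    for cid, age, city in records:
        if age is None or city is None:
            continue  # incomplete record
        if not (min_age <= age <= max_age):
            continue  # treat out-of-range ages as noise (assumed rule)
        city = city.strip().title()  # fix inconsistent spacing and capitalization
        row = (cid, age, city)
        if row in seen:
            continue  # duplicate of a row already kept
        seen.add(row)
        cleaned.append(row)
    return cleaned

print(clean(raw_records))  # [(1, 34, 'Hyderabad'), (3, 29, 'Mumbai'), (5, 41, 'Pune')]
```

Normalizing the city name before checking for duplicates means that rows differing only in spacing or capitalization are also treated as duplicates.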
In Data Integration, the required data is collected from different sources and combined to form a single large database.
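A minimal Python sketch of integration, assuming two hypothetical sources: a customer table and an order list that names the same key differently ("customer_id" versus "cust"). The sketch maps both onto one key and keeps only orders that match a known customer (an inner join); this matching rule is an assumption for illustration, and real systems may instead log or quarantine unmatched rows.

```python
# Two hypothetical sources that describe the same customers in different ways.
customers_db = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
]
orders_csv = [
    {"cust": 1, "item": "bread", "amount": 40.0},
    {"cust": 1, "item": "milk", "amount": 25.0},
    {"cust": 2, "item": "eggs", "amount": 60.0},
    {"cust": 9, "item": "tea", "amount": 30.0},  # no matching customer
]

def integrate(customers, orders):
    """Inner-join orders to customers, mapping the 'cust' field onto 'customer_id'."""
    by_id = {c["customer_id"]: c for c in customers}
    combined = []
    for order in orders:
        customer = by_id.get(order["cust"])
        if customer is None:
            continue  # unmatched order is skipped in this sketch
        combined.append({
            "customer_id": customer["customer_id"],
            "name": customer["name"],
            "item": order["item"],
            "amount": order["amount"],
        })
    return combined

for row in integrate(customers_db, orders_csv):
    print(row)
```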
The next step is Data Selection, where the data relevant to the analysis task is selected from the database. The selected data is then transformed into a form appropriate for mining by applying summary and aggregation operations. This step is called Data Transformation.
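Here is a minimal Python sketch of selection followed by transformation, assuming hypothetical sales records of the form (region, customer_id, amount). Selection keeps only the records for one region of interest, and transformation aggregates the individual sales into one total per customer. The region filter and the per-customer sum are illustrative choices; other tasks might select by date or aggregate by product instead.

```python
from collections import defaultdict

# Hypothetical integrated sales records: (region, customer_id, amount).
sales = [
    ("south", 1, 40.0),
    ("south", 1, 25.0),
    ("north", 2, 60.0),
    ("south", 3, 15.0),
    ("south", 3, 35.0),
]

def select(records, region):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r[0] == region]

def total_per_customer(records):
    """Data transformation: aggregate individual sales into one total per customer."""
    totals = defaultdict(float)
    for _region, customer_id, amount in records:
        totals[customer_id] += amount
    return dict(totals)

south = select(sales, "south")
print(total_per_customer(south))  # {1: 65.0, 3: 50.0}
```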
Once the data is in an appropriate form, Data Mining is applied to extract patterns from it. Pattern Evaluation is then performed on these patterns to identify the truly interesting ones that represent knowledge, based on interestingness measures.
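As one example of these two steps, here is a minimal Python sketch of association rule mining over item pairs in hypothetical shopping baskets. It is only one of many mining techniques (classification and clustering are others). The mining step enumerates candidate rules "A -> B" from items that appear together; the evaluation step keeps only rules whose support (fraction of baskets containing both items) and confidence (fraction of baskets with A that also contain B) meet thresholds. Support and confidence are common interestingness measures, but the thresholds of 0.4 and 0.6 are assumptions chosen for this toy data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def mine_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Find rules A -> B between single items, then keep only the interesting ones."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    # Data mining step: enumerate candidate rules from co-occurring item pairs.
    for (a, b), together in pair_counts.items():
        for lhs, rhs in ((a, b), (b, a)):
            support = together / n                    # fraction of baskets with both items
            confidence = together / item_counts[lhs]  # share of lhs baskets that contain rhs
            # Pattern evaluation step: keep rules that pass both thresholds.
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

for lhs, rhs, sup, conf in mine_rules(transactions):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

On this toy data the sketch prints three rules: "butter -> bread" (support 0.40, confidence 1.00), "bread -> milk" and "milk -> bread" (support 0.60, confidence 0.75 each). A rule such as "bread -> butter" is mined as a candidate but discarded during evaluation because its confidence is only 0.50.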
The last step is Knowledge Presentation, where the mined knowledge is presented to users with the help of visualization and knowledge representation techniques.
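Visualization is usually done with charting tools, but the idea can be sketched with a plain-text bar chart. This minimal Python sketch assumes a hypothetical list of mined rules, each paired with a confidence value between 0 and 1, and draws one bar of "#" characters per rule so that stronger rules stand out at a glance. The function name and bar width are illustrative assumptions.

```python
# Hypothetical mined rules paired with their confidence values (between 0 and 1).
rules = [
    ("butter -> bread", 1.00),
    ("bread -> milk", 0.75),
    ("milk -> bread", 0.75),
]

def text_bar_chart(items, width=20):
    """Render each (label, value in [0, 1]) as a labelled row of '#' characters."""
    label_width = max(len(label) for label, _ in items)
    lines = []
    for label, value in items:
        bar = "#" * round(value * width)  # bar length proportional to the value
        lines.append(f"{label.ljust(label_width)} | {bar} {value:.2f}")
    return "\n".join(lines)

print(text_bar_chart(rules))
```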

At the end of the KDD process, we obtain knowledge that can be applied in various areas for analysis purposes.

In the next blog, we will discuss the various techniques involved in each step of KDD. I hope you enjoyed reading this blog. If you have any doubts, feel free to leave a comment in the comment box below.
