Data analysis and data mining

What is data mining, and how is it carried out? In this article, Dr Ming Yan discusses his recent research.

Data analysis

Data analysis is the process of purposefully collecting data, studying and summarizing it in detail, extracting useful information from it, and forming conclusions. Its purpose is to focus, extract, and refine information from a mass of disorganized data and to explore the intrinsic laws of the data objects. Conceptually, the task of data analysis can be broken down into activities such as locating, identifying, distinguishing, classifying, clustering, distributing, arranging, comparing (including comparing internal and external connections), correlating, and relating.

Analysis tasks based on data visualization include identifying, deciding, visualizing, comparing, reasoning, configuring, and locating. Data-based decision making, in turn, can be broken down into identifying goals, evaluating available options, selecting target options, and executing options. In terms of statistical applications, data analysis can be categorized into descriptive statistical analysis, exploratory data analysis, and validation (also called confirmatory) data analysis.

Data analysis evolved from statistics and has demonstrated great value in various industries. Its representative directions include statistical analysis, exploratory data analysis, and validation data analysis. Exploratory data analysis emphasizes the search for previously undiscovered features and information in data, while validation data analysis emphasizes analyzing data to verify or falsify hypotheses that have already been proposed.
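The three kinds of analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the viewing-time figures are invented, the function names are hypothetical, and the permutation test is just one of several ways a proposed hypothesis could be checked.

```python
import random
import statistics

# Invented sample data: minutes watched per day by eight users of each layout.
minutes_new_layout = [32, 35, 31, 38, 36, 34, 33, 37]
minutes_old_layout = [28, 30, 27, 31, 29, 30, 26, 29]
# Invented second variable, paired with minutes_new_layout: sessions per day.
sessions_new_layout = [3, 4, 3, 6, 5, 4, 4, 5]


def describe(values):
    """Descriptive analysis: summarize a sample with a few statistics."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(xs, ys):
    """Exploratory analysis: look for a linear relationship between two variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(a, b, rounds=2000, seed=0):
    """Validation analysis: test the hypothesis that a and b have the same mean.

    Returns the share of random regroupings whose difference in means is at
    least as large as the observed one (a small value counts against the
    hypothesis).
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            hits += 1
    return hits / rounds


print("descriptive:", describe(minutes_new_layout))
print("exploratory:", pearson(minutes_new_layout, sessions_new_layout))
print("validation: ", permutation_p_value(minutes_new_layout, minutes_old_layout))
```

The descriptive step only summarizes what is there, the exploratory step looks for a relationship nobody has stated in advance, and the validation step starts from a stated hypothesis and checks it against the data.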

Combining data analysis with natural language processing, numerical computation, cognitive science, computer vision, and other fields yields different kinds of analysis methods and corresponding analysis software, e.g., MATLAB for scientific computation, Weka for machine learning, SPSS/Text and SAS Text Miner for natural language processing, and OpenCV for computer vision.

Data mining

Data mining refers to the theory and methods of designing specific algorithms to explore and discover knowledge or patterns in large data sets. It is a key step of knowledge discovery in the discipline of knowledge engineering. Specific data mining methods can be designed for different data types, such as numerical data, text data, relational data, streaming data, web data, and multimedia data.

There are various definitions of data mining. An intuitive one is the process of exploring and analyzing data through automated or semi-automated methods in order to extract implicit, previously unknown, and potentially useful information and knowledge from large, incomplete, noisy, fuzzy, and random data.

Data mining is not data querying or web searching. It incorporates ideas from statistics, databases, artificial intelligence, pattern recognition, and machine learning, with a special focus on challenging problems such as processing anomalous, high-dimensional, and heterogeneous data. Basic data mining tasks fall into two categories: predictive methods, which predict unknown or future values of some variables from the values of other variables (e.g., classification, regression); and descriptive methods, which describe data in terms of human-interpretable patterns (e.g., clustering, pattern mining, association rule discovery).
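To make the two categories concrete, here is a minimal Python sketch with one predictive and one descriptive task. Everything in it is assumed for illustration: the data are invented, and `fit_line`, `predict`, and `association_rules` are hypothetical helper names, not functions from any particular data mining library. Real tools use far more elaborate algorithms.

```python
from collections import Counter
from itertools import combinations


def fit_line(xs, ys):
    """Predictive task (regression): least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def predict(model, x):
    """Apply the fitted line to a value of x that was not in the training data."""
    slope, intercept = model
    return slope * x + intercept


def association_rules(transactions, min_support=0.5):
    """Descriptive task: find item pairs that often occur together.

    Returns tuples (antecedent, consequent, support, confidence), where support
    is the share of transactions containing both items and confidence is the
    share of transactions with the antecedent that also contain the consequent.
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Invented data: weeks since launch vs. thousands of subscribers.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print("predicted subscribers in week 6:", predict(model, 6))

# Invented data: content categories viewed in each user session.
sessions = [
    {"news", "sports"},
    {"news", "sports", "film"},
    {"news", "film"},
    {"sports", "film"},
    {"news", "sports"},
]
for rule in association_rules(sessions):
    print("rule:", rule)
```

The regression produces a value for an input it has never seen, while the association rules only describe co-occurrence patterns that a person can read and interpret.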

Data mining is considered a specialized approach to data analysis. The essential differences from traditional data analysis are that data mining discovers knowledge without explicit prior assumptions, that the resulting information has three characteristics (previously unknown, valid, and useful), and that data mining tasks tend to be predictive, whereas traditional tasks tend to be descriptive. The inputs to data mining can be databases, data warehouses, or other types of data source.

In predictive approaches, the conclusions drawn from analyzing the data are built into a global model, which is then applied to predict the values of target attributes. Descriptive tasks, by contrast, aim to summarize the data using local patterns that reflect implicit relationships and features.

Your task

Please give a brief definition of data analysis and data mining.

Share your thoughts and ideas in the comments below.

© Communication University of China
This article is from the free online course Introduction to Digital Media
