CRISP-DM Details

An article presenting more details on CRISP-DM.
© Luleå University of Technology

CRISP-DM is the methodology we recommend using in data science projects.

Business understanding

A data science project starts with business understanding. At this stage, the problem definition, project objectives, and model requirements should be defined in business terminology. A project manager and a project sponsor take on roles on the customer side. The project sponsor should help make domain experts available for the duration of the project.

Data understanding

The problem definition and project objectives have a direct impact on the data that is collected. During data collection, data scientists identify and collect the available data resources (structured, semi-structured, and unstructured) relevant to the problem domain, in coordination with both domain experts and IT resources on the beneficiary side. Techniques such as data sampling can be used in this process. However, in-memory analytics and cloud-based resources may allow the data scientists on the project to work with full datasets. Including all available, relevant datasets of appropriate quality can increase model accuracy. Exploratory data analysis (EDA) is then used to explore and visualize the dataset using visualization tools.
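As a minimal sketch of the sampling and first-look summary steps described above, the following uses only the Python standard library. The function names `sample_rows` and `summarize`, and the temperature readings, are hypothetical and chosen for illustration; a real project would typically rely on the tools agreed with the IT side.

```python
import random
import statistics


def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample of `rows` without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    return random.Random(seed).sample(rows, k)


def summarize(values):
    """Basic descriptive statistics for one numeric column, skipping missing values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }


# Hypothetical daily temperature readings (degrees Celsius); None marks a missing value.
temperatures = [12.1, 13.4, None, 11.8, 14.0, 13.1, None, 12.7, 15.2, 12.9]

subset = sample_rows(temperatures, fraction=0.5, seed=42)  # work on half of the data
summary = summarize(temperatures)                          # or summarize the full dataset
```

A summary like this gives the count of missing values and the range of each column before any visualization is attempted.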

Data pre-processing

This phase comprises the tasks needed to construct the dataset that will be used in the subsequent modeling phase. Data pre-processing tasks include data cleansing (for example, handling missing values), duplicate elimination, data normalization, outlier identification and treatment, format transformation, etc. New features may also be added during this phase. This phase consumes a substantial share of project time. The data is then logically modeled and physically populated in the most appropriate data storage and management tool from the data science ecosystem.
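The pre-processing tasks listed above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not a prescribed pipeline: the helper names, the `co2_ppm` column, and the sample rows are hypothetical, median imputation and the 1.5 × IQR rule are just one common choice each, and dropping flagged values is only one possible outlier treatment.

```python
import statistics


def drop_duplicates(rows):
    """Remove exact duplicate rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_missing(values):
    """Replace None with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical station readings; the third row duplicates the first.
raw_rows = [
    {"station": "A", "co2_ppm": 410.0},
    {"station": "B", "co2_ppm": None},
    {"station": "A", "co2_ppm": 410.0},
    {"station": "C", "co2_ppm": 415.0},
    {"station": "D", "co2_ppm": 412.0},
    {"station": "E", "co2_ppm": 900.0},
]

unique_rows = drop_duplicates(raw_rows)
imputed = impute_missing([row["co2_ppm"] for row in unique_rows])
flags = flag_outliers(imputed)
kept = [v for v, is_outlier in zip(imputed, flags) if not is_outlier]
normalized = min_max_normalize(kept)
```

The order of the steps matters: normalizing before treating outliers would compress the ordinary readings into a narrow band near zero.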

Modeling

Once the dataset has been pre-processed, checked for quality, and logically and physically modeled, it is time for the analytical phase of data science, known as modeling. The modeling phase focuses on developing descriptive, predictive, or prescriptive analytics. Training, development, and testing datasets are used to build and refine the models. Models can serve segmentation, classification, regression, and outlier and deviation detection purposes. The choice of machine learning techniques and mathematical or statistical models depends on the problem, the objectives, and the available dataset. The modeling phase is ideally iterative, involving multiple rounds of running different techniques and algorithms on the datasets. Project team members, in line with the project objectives and problem definition, use data visualization tools (such as Orange) from the data science ecosystem to uncover hidden knowledge and patterns.
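The use of training, development, and testing datasets, and the idea of iterating over candidate techniques, can be illustrated with a small sketch. Everything here is an assumption made for the example: the 60/20/20 split, the two candidate models (a mean baseline and a one-variable least-squares line), the helper names, and the synthetic data, which follows y = 2x + 1 exactly.

```python
import random


def split_dataset(rows, train=0.6, dev=0.2, seed=0):
    """Shuffle and split rows into training, development, and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_dev = round(len(shuffled) * dev)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_dev],
        shuffled[n_train + n_dev:],
    )


def fit_mean(train_rows):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(y for _, y in train_rows) / len(train_rows)
    return lambda x: mean_y


def fit_line(train_rows):
    """One-variable least-squares regression."""
    n = len(train_rows)
    mean_x = sum(x for x, _ in train_rows) / n
    mean_y = sum(y for _, y in train_rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train_rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train_rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mae_of_model(model, rows):
    """Mean absolute error of a fitted model on (x, y) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Synthetic data for illustration only.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train_rows, dev_rows, test_rows = split_dataset(data, seed=1)

# Iterate over candidate techniques and compare them on the development set.
candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
dev_errors = {
    name: mae_of_model(fit(train_rows), dev_rows)
    for name, fit in candidates.items()
}
best_name = min(dev_errors, key=dev_errors.get)

# The testing set is touched only once, for the selected model.
test_error = mae_of_model(candidates[best_name](train_rows), test_rows)
```

Keeping the testing set out of the selection loop is what makes the final error estimate a fair one.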

Evaluation

Before the deployment phase, models are evaluated. The project team, including the data scientists, evaluates the models to ensure correctness. Different metrics and measures are used for model evaluation, such as confidence, support, the confusion matrix, accuracy, F-measure, and predictive-power metrics such as mean absolute error and mean absolute deviation. The process also includes comparative presentations such as tables and graphs.
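Several of the metrics named above follow directly from their standard definitions. The sketch below computes the confusion matrix counts, accuracy, F-measure, and mean absolute error for small hypothetical label and value lists; the function names and the data are illustrative assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def f_measure(counts, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical classification results (1 = positive class).
actual_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(actual_labels, predicted_labels)
acc = accuracy(counts)
f1 = f_measure(counts)

# Hypothetical regression results.
mae = mean_absolute_error([2.0, 4.0, 6.0], [2.5, 3.5, 7.0])
```

For these lists the counts are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so accuracy and F-measure both come to 0.75.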

Deployment

Once the tasks needed to find one or more satisfactory models are complete, the models need to be approved by the project sponsor. They are then deployed into the production environment. Deployment can be full or partial, depending on business needs. A deployment may consist of models delivered to business stakeholders and/or decision makers, or of models embedded in a complex workflow and scoring process managed by an application.
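One way to picture a model embedded in a scoring process is a small function that an application loads and calls. The sketch below is only an illustration: the JSON artifact layout, the `approved_by` field, the linear model, and the names `export_model` and `load_scorer` are all assumptions made for the example, not part of CRISP-DM.

```python
import json


def export_model(slope, intercept, approved_by):
    """Serialize a (hypothetical) linear model together with its approval record."""
    return json.dumps({
        "type": "linear",
        "slope": slope,
        "intercept": intercept,
        "approved_by": approved_by,
    })


def load_scorer(serialized):
    """Load a model artifact and return a scoring function for the application."""
    spec = json.loads(serialized)
    if not spec.get("approved_by"):
        # Refuse to deploy a model that has no recorded sponsor approval.
        raise ValueError("model has no recorded sponsor approval")
    slope, intercept = spec["slope"], spec["intercept"]

    def score(record):
        return intercept + slope * record["x"]

    return score


artifact = export_model(2.0, 1.0, approved_by="project sponsor")
score = load_scorer(artifact)
scores = [score(record) for record in [{"x": 0.0}, {"x": 3.0}]]
```

Checking the approval record at load time mirrors the rule that a model is deployed only after the project sponsor has signed off.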

This article is from the free online course Data Science for Climate Change.