An article presenting who does what in each phase of CRISP-DM.

In the previous article, you learned about the CRISP-DM process. Now we will explain the roles of the people involved in carrying it out.

If you recall, the CRISP-DM process is made up of six phases. In the business understanding phase, we will need to develop a data science role chart identifying the different personas involved in the project.

Here it is important to distinguish between two cases: a decision-maker developing a small or trial data science project to inform their own decision or address their own problem, and an organization’s full-scale project. The roles differ between the two.

If the data science activity is meant to support your own decisions or address your own problems, you will essentially carry out the CRISP-DM phases on your own. That is possible, and is nowadays recommended to support data-driven decisions and organizations’ journeys towards digital transformation. Low-code/no-code platforms have made this a viable option. You will see such use-case examples later in this course.

On the other hand, if the data science project is organizational, or perhaps societal, then we need a full team made up of the personas below:

Project manager: This persona is responsible for the project’s time, budget, scope, and quality. The project manager also manages the project’s risks and documentation.

Solution architect: This persona decides on the infrastructural and architectural components of the project, including software, hardware, and cloud utilization. The solution architect is also active in the data understanding, data preprocessing, and modeling phases (phases 2, 3 & 4 in CRISP-DM).

Data engineer: This persona is responsible for acquiring, preparing, and managing the data. The data engineer is also active in the data understanding and data preprocessing phases (phases 2 & 3 in CRISP-DM).

Data scientist: This persona is responsible for analyzing the data using appropriate data science methods, such as AI, statistical, or mathematical models. The role assumes domain knowledge in the problem area, for example climate change. If the data scientist does not have the required domain knowledge, we need to add another role in the form of a domain expert. The data scientist is active across all phases, but mostly in the modeling, evaluation, and deployment phases (phases 4, 5 & 6 in CRISP-DM).

Application developer: This persona writes code in a programming language, for example Python, to embed or integrate the project outcomes, models, or data into other applications. The application developer is also active in the data understanding, data preprocessing, and modeling phases (phases 2, 3 & 4 in CRISP-DM).
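The role chart described above can be sketched as a small data structure. This is a minimal illustration in Python, not a prescribed format. The names `PHASES`, `ROLE_CHART`, and `personas_by_phase` are made up for this example. The phase numbers follow the standard CRISP-DM order, with phase 3 labeled "data preprocessing" as in this article. The data scientist entry lists only the three phases the text names as that persona's main focus, and the project manager entry is left empty because the article does not tie that role to specific phases.

```python
# The six CRISP-DM phases in their usual order (phase 3 uses this article's label).
PHASES = {
    1: "business understanding",
    2: "data understanding",
    3: "data preprocessing",
    4: "modeling",
    5: "evaluation",
    6: "deployment",
}

# Hypothetical role chart: persona -> phases where the article says they are active.
ROLE_CHART = {
    "project manager": (),            # no specific phases named in the article
    "solution architect": (2, 3, 4),
    "data engineer": (2, 3),
    "data scientist": (4, 5, 6),      # main focus only; active across all phases
    "application developer": (2, 3, 4),
}


def personas_by_phase(role_chart):
    """Invert the chart: phase number -> list of personas active in that phase."""
    by_phase = {number: [] for number in PHASES}
    for persona, phase_numbers in role_chart.items():
        for number in phase_numbers:
            by_phase[number].append(persona)
    return by_phase


if __name__ == "__main__":
    for number, personas in personas_by_phase(ROLE_CHART).items():
        listed = ", ".join(personas) or "no persona listed"
        print(f"Phase {number} ({PHASES[number]}): {listed}")
```

Inverting the chart in this way makes it easy to spot phases with no named persona, which can be a useful check when drawing up the role chart during business understanding.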

© Luleå University of Technology
This article is from the free online course Data Science for Climate Change.
