To better understand the knowledge and skills required to be a data scientist and how data science projects work, we can explore the data science life cycle. Imagine the data science life cycle as a digital onion, with each layer being a distinct process that works together to form a whole data product or solution. We can take these layers apart to explore this life cycle of data. At the core of this process is the problem. And by asking relevant questions about the nature of the problem, we can define a set of objectives that might lead us to a solution. Data mining involves gathering and scraping all the necessary or available data to tackle the problem.
Pre-existing datasets may provide solutions to similar problems, but a unique question may call for unique data. Data preparation. Data doesn’t always come neatly packaged, and there are often inconsistencies in data types, misspelled attributes, or even missing and duplicate values that can cause problems later on. Garbage in, garbage out. Data cleaning is often the most time-consuming stage, typically taking 50 to 80% of the overall process. Once data has been cleaned, it can then be transformed or modified based on requirements. Data exploration. By plotting and visualising the data, we can identify general trends or relationships within the data. What story is the data telling us?
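The data preparation step described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a tiny made-up dataset of people with a name and an age; the function name `clean_records` and the field names are hypothetical and do not come from the video. It shows the three problems the transcript names: inconsistent data types, missing values and duplicates.

```python
def clean_records(records):
    """Normalise types, drop rows with missing values, and remove duplicates."""
    cleaned = []
    seen = set()
    for row in records:
        # Tidy up inconsistent spacing and capitalisation in the name.
        name = (row.get("name") or "").strip().title()
        # Ages may arrive as text, numbers, or placeholders such as "n/a".
        try:
            age = int(str(row.get("age")).strip())
        except (TypeError, ValueError):
            age = None
        # Skip rows where a required value is missing or unusable.
        if not name or age is None:
            continue
        # Skip rows that duplicate one we have already kept.
        key = (name, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned


# Hypothetical raw data with the kinds of problems mentioned in the video.
raw = [
    {"name": " alice ", "age": "34"},   # stray spaces, age stored as text
    {"name": "Alice", "age": 34},       # duplicate of the row above
    {"name": "Bob", "age": None},       # missing value
    {"name": "carol", "age": "n/a"},    # placeholder instead of a number
    {"name": "Dave", "age": " 41 "},    # age stored as padded text
]
cleaned = clean_records(raw)
print(cleaned)
```

Dropping incomplete rows is just one possible choice here; in a real project you might instead fill in missing values or go back to the data source.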
Understanding patterns and bias in our data allows us to form hypotheses about the problem that we can test in the next stages. Feature engineering is the process of using domain knowledge to transform your raw data into informative features that represent the problem you’re trying to solve. Data modelling involves training machine learning models like KNN, decision trees and naive Bayes, evaluating their performance, and identifying which model best suits the type of data we have available and the problem we’re trying to solve. Data visualisation means communicating the findings in simple yet effective visual ways to key stakeholders using graphs, plots, infographics, interactive visualisations and even storytelling methods that best convey meaningful insight to decision makers.
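The modelling step (train several models, evaluate them, pick the one that suits the data) can be illustrated with a small sketch. It assumes a made-up two-dimensional dataset and implements only a simple KNN classifier by hand; the majority-class baseline it is compared against is an addition for illustration and is not one of the models named in the video. All function and variable names are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def majority_predict(train, point):
    """Baseline: always predict the most common label in the training data."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def accuracy(predict, train, test):
    """Fraction of held-out test points that the model labels correctly."""
    correct = sum(predict(train, x) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: (features, label) pairs in two clusters.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((4.0, 4.0), "b"), ((4.2, 3.9), "b")]
test = [((1.1, 1.0), "a"), ((3.9, 4.1), "b"),
        ((4.1, 4.0), "b"), ((0.8, 0.9), "a")]

# Evaluate each candidate on data it has not seen, then keep the best one.
scores = {
    "knn (k=3)": accuracy(lambda tr, x: knn_predict(tr, x, 3), train, test),
    "majority baseline": accuracy(majority_predict, train, test),
}
best = max(scores, key=scores.get)
print(scores, best)
```

In practice, libraries provide ready-made versions of these models; the point of the sketch is only the pattern of comparing candidates on held-out data.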
Deployment. After successfully processing and modelling our data, we return to the beginning of the life cycle to evaluate whether we have achieved our objectives and gained insight into solving the problem. If we have, we can then deploy the model to tackle the problem. Monitoring. Real-time analytics are used to report on and maintain model performance. Completing the data science life cycle may also yield unexpected insights that aren’t necessarily useful for solving the problem we set out with. But that’s the beauty of a cycle. It has continuity and allows us to expand and iterate upon our findings, leading to more powerful insights and knowledge.
A day in the life
What does it mean to be a data scientist?
Watch the video to find out what being a data scientist involves - skills, tasks, knowledge and interests - and how all the pieces fit together.
Data scientists can be found in many different kinds of organisations, large and small. They could work across government, in business, in labs, and in research.
The skills of a data scientist may be the same no matter where they work, but how they apply those skills, the reasons behind the work they do, and how their work affects others may vary considerably.
Feel free to browse the YouTube videos below to see how the data scientist role varies across industries.
The big picture - gathering and presenting data for actionable change. Watch the TEDxBerkeley talk Data science for the environment, where Dan Hammer (2018) talks about his work for Global Forest Watch. He makes an interesting statement in his talk: ‘you can’t change what you can’t see’ (referring to the visualisation of data).
Improving processes - being the ‘data person’ and promoting data-driven decision-making. Watch Day in the Life: Data Scientist, where Alena Crivello talks about her work in data science at Chevron.
Teaming up - working productively with a range of people to address important problems. Watch this video describing the collaboration between pharmaceutical and data science specialists to find new treatments for serious diseases: Data Science and Artificial Intelligence - Turning Data into Knowledge.
The daily routine - the day-to-day activities most data scientists do, no matter their subject area. A number of data scientists share what they do in a series of blog posts from KDnuggets: A Day in the Life of a Data Scientist: Part 4.
Have the above examples raised any questions or concerns for you?
Share your thoughts in the comments area below.
AstraZeneca. (2020, February 18). Data science and artificial intelligence - Turning data into knowledge [Video]. YouTube. https://www.youtube.com/watch?v=LSGk9pVfujM
Crivello, A. (2018, February 12). Day in the life: Data scientist [Video]. YouTube. https://www.youtube.com/watch?v=_Wk9T_G-u4o
Hammer, D. (2018, March 8). Data science for the environment [Video]. YouTube. TEDxBerkeley. https://www.youtube.com/watch?v=ph439t-kTIE
Mayo, M. (2018, April 2). A day in the life of a data scientist: Part 4. KDnuggets. https://www.kdnuggets.com/2018/04/day-life-data-scientist-part-4.html
© Coventry University. CC BY-NC 4.0