Have you ever wondered what a typical day in the life of a data scientist looks like? A data scientist needs to explore the data they are given and provide actionable insights, but how do we do that, and is that all we do? Do we just sit in front of a computer and code all day? Do we spend most of our day reading papers? Or is it something completely different? Let me walk you through my day as a Data Scientist.
Firstly, Reading and/or Replying to Emails
Bet you weren’t expecting this. Like people in any other role, Data Scientists actively communicate with others through email, and I like to do this first thing in the morning. Some of the possible email exchanges:
- Updates on users’ requests
- Updates on your own progress
- Data-related exchanges to better understand your data
- Collaboration with others in your team
Data Scientists do not work in a silo. We have to work with our users, project managers, and other team members. Keeping everyone updated and on the same page ensures that projects progress smoothly, and email is an efficient way to achieve that.
Next, Working on Data
Then, of course, as a Data Scientist I need to work on my data daily. At any point in time I could be:
- Cleaning the data
- Doing exploratory data analysis and visualization
- Working on data pre-processing and feature engineering
- Training and optimizing models
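To make these four activities a little more concrete, here is a minimal sketch of them chained together. It uses only Python's standard library, and everything in it is made up for illustration: the toy records, the field names (`hours`, `score`) and the helper functions are assumptions, not code from a real project. In practice you would more likely reach for libraries such as pandas and scikit-learn.

```python
from statistics import mean, stdev

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"hours": 1.0, "score": 52.0},
    {"hours": 2.0, "score": None},
    {"hours": 3.0, "score": 61.0},
    {"hours": None, "score": 70.0},
    {"hours": 5.0, "score": 68.0},
    {"hours": 7.0, "score": 80.0},
]


def clean(records):
    """Cleaning: drop any record that has a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def summarize(records, field):
    """Exploratory analysis: basic summary statistics for one field."""
    values = [r[field] for r in records]
    return {"n": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


def standardize(values):
    """Pre-processing: rescale to zero mean and unit sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def fit_line(x, y):
    """Training: ordinary least squares for a single feature, y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


rows = clean(RAW)
print(summarize(rows, "score"))

x = standardize([r["hours"] for r in rows])
y = [r["score"] for r in rows]
slope, intercept = fit_line(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Each function maps to one bullet above, which keeps the steps easy to test and swap out one at a time.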
So yes, during this time I will be behind my computer working my ass off. Contrary to popular belief, we Data Scientists do not memorize every function or code snippet we have ever used (at least I don't). The ability to find information and solutions online is also part of a data scientist's skill set. Unlike what you see in movies, coding is much more than writing line after line of code. I find myself referring to the documentation or Stack Overflow as often as I write code. Even for the simplest tasks, I regularly do a quick search on Stack Overflow to see if I can optimize my code further or simply write it more cleanly.
Furthermore, if you work in a team of data scientists, version control is as important as writing code. Git, together with hosting platforms such as GitHub, is built for this, and you should use some form of version control if you do not already, even if you work alone. Overwriting your previous code with no way of retrieving it is a mistake you don’t want to make, especially if your new code breaks something. I speak from experience.
Here is a light-hearted portrayal of the process. (https://www.youtube.com/watch?v=rR4n-0KYeKQ)
Then, Meetings and Presentations
You have probably heard that communication is one of the soft skills essential for a data scientist. I can’t emphasize enough how true that is. Presenting is one task a data scientist can’t avoid, no matter your seniority or the type of organization you work in. On a typical day, I could be attending one or more meetings that involve some form of presentation from me.
Why is presenting important? Because, most likely, you are the only one who knows what you have done. Your users and customers do not know what you are doing or what you did, and your project managers might not have expertise in your area. Presenting updates the relevant stakeholders on your progress, helps manage expectations based on your technical assessment, and converts your analysis into actionable business insights.
Working on a data science project is not about using the best algorithms or getting the best accuracy. The success of the project depends largely on the outcome derived from your work. Hence, no amount of analysis is helpful until you can convey your findings to the final users. This means that presenting and translating technical terms into layman’s terms are essential for the work of a data scientist.
As exhausting as it sounds, this is what I do regularly as a data scientist.
After that, Writing Reports/Documentation
Now, reporting and documentation don’t technically fall under ‘A Typical Day’, but I thought I would mention them here to provide a more complete picture. Every time I reach a significant milestone, or when a project ends, I’m required to provide some form of report on the work done.
This might not be practiced in all organizations, but I consider it a good practice. In a report you can be as technical as you like. The purpose of the report is to document what you did and to serve as a reference for others who wish to replicate your work. Data Science, as the name suggests, is part of the scientific community, and reproducibility and replicability are important for ensuring the reliability of your work.
Do not underestimate the value of documentation. As mundane as it sounds, it will greatly benefit others and even yourself, especially when you look back at your code months later.
Finally, Reading Research Articles
‘A Typical Day’ would have ended by now, but that only covers the 9-to-5 part of working as a Data Scientist. For me, Data Science is life and a way of living. Hence, after working hours I often browse social media to keep myself updated on the newest tech in data science. If you follow the right people, your news feed will be constantly updated with the latest news in the field, and you can gain a lot just from using social media.
The field of data science is changing rapidly and will continue to do so for the foreseeable future. Therefore, make it a habit to read about the changes and advancements in data science. 15 to 30 minutes a day is all it takes, and this will definitely help your career as a data scientist.
Do you see yourself becoming a Data Scientist? It’s never too late to start. See ‘How to become a Data Scientist in 2020’ for our step-by-step guide to becoming one.