Author: Ali Arsalan Yaqoob
On your journey as a data scientist, developing an effective learning strategy is very important, as it will shape your trajectory and career outcome. Even if you are starting a structured graduate program where you are bound to learn the skills you need, it is good to think of the coursework as an additional resource for achieving your own goals.
When structuring your learning within data science, the naive approach would be:
- Identifying what core skill-set makes up a data scientist.
- Identifying where you are, proficiency-wise, in those areas.
- Building up your skills in areas you are weak in.
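The three steps above amount to a simple gap ranking, which can be sketched in a few lines of Python. This is only an illustrative sketch: the skill names, the 0 to 5 rating scale, and the `rank_gaps` helper are all hypothetical and are not taken from any particular assessment tool.

```python
# Hypothetical skill names and 0-5 self-ratings, for illustration only.
target = {"programming": 4, "statistics": 4, "visualization": 3, "communication": 3}
current = {"programming": 3, "statistics": 1, "visualization": 3, "communication": 2}


def rank_gaps(target, current):
    """Return (skill, gap) pairs sorted from the largest gap to the smallest."""
    gaps = {
        # A skill missing from `current` counts as level 0; gaps never go negative.
        skill: max(level - current.get(skill, 0), 0)
        for skill, level in target.items()
    }
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


study_order = rank_gaps(target, current)
for skill, gap in study_order:
    print(f"{skill}: gap {gap}")
```

With these made-up ratings, statistics comes out first because it has the largest gap. The ranking tells you where you are weakest, but not how much of each skill you actually need.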
The type of role you will play on a team as a data scientist determines the skill sets you need. Mango Solutions, a data science and analytics company, uses six competencies to assess a data science team in its "Data Science Radar" tool. While knowing about these competencies can help a team identify its needs and where it stands as a whole, they can also be useful for a self-assessment of skills.
Based on the above approach, you could keep the competencies in mind and see where you stand in terms of proficiency in each of those areas. Then you could systematically, one by one, try to improve your skills in each area until you master them. One of the biggest problems with this approach is knowing when to stop, or knowing when you know enough. It is nice to know a specific topic from start to finish. However, as a rule of thumb, knowing roughly 20% of a topic is often enough to do 80% of what you need (see the Pareto principle).
Another problem with this approach is that, while you are developing your skills in one area, you are not connecting the dots between it and all of the other areas. Seeing how these skill sets connect and complement each other is important for understanding the data science workflow. This type of intuition can only be acquired by working on projects and taking a very directed approach to learning.
In your career as a data scientist, you will see that working on any data science project is iterative: you constantly have to go back and reevaluate something or improve some part of the process. The same iterative mentality can be applied to developing an effective learning strategy. However, to apply this mentality to your learning, you will need to be comfortable with the fact that you will not know everything about the tools you are using to solve a specific problem. You should be comfortable with learning just enough to solve a problem without making serious mistakes or unexamined assumptions. This approach is more effective because it teaches you how to identify what you need to learn to carry out a task and how to connect the dots between different parts of the project.
Imagine you are blindfolded and dropped off in an unknown forest. What is the first thing you do once the blindfold comes off? You look around and try to identify where you are. You explore. You try to get the lay of the land. You identify what you need to do in order to survive; your goal of survival is linked to getting food, water, and shelter. Getting food, water, and shelter is linked to acquiring the skills you need to obtain them. Acquiring those skills is linked to actually working on getting those three things, starting from what you already know. Everything is linked to having a clear and defined goal: survival. Having an end goal in your learning process will bring a lot of clarity, in terms of both knowing when to stop and knowing what you need to know. It will also allow you to see the learning as a whole rather than as bits and pieces that are supposed to fit together.
However, having a clear and defined goal is difficult when you are just starting your journey as a data scientist. There is a lot of noise about what you could be doing and what you should be doing, and everyone has an opinion (just like this article). But it is important to realize that the best way to cut through the noise and get right to the signal is to know where the signal is coming from. There are two ways to narrow your search for an end objective:
- Identifying an industry of interest. You should ask yourself the following questions: What type of industry are you most inclined towards? What type of problems are usually worked on in this industry? Do those problems and the solutions to those problems sound interesting to you? Can you reach out to data scientists in that specific industry and ask them about what drives them to work on the problems they work on?
- Identifying the type of data or problems you wish to work on. Do you find the idea of working with image data and computer vision interesting, or do you wish to work with something different, like audio or music? Do you find the types of problems that lie within natural language processing interesting? Do you wish to work on prediction tasks or inference tasks? Or do you wish to tell a good story through beautiful data visualizations?
Asking yourself these questions will direct your research and help you gain a clearer picture of what you are interested in, which in turn helps structure your learning. The more you work with a specific goal in mind, the better you will be able to identify what to learn and when to stop because you have learned enough. It would also be very beneficial to read papers on topics you are interested in and to stay aware of new developments in your field of interest. Connectedpapers.com is a great resource for finding papers that are interesting and close to your scope of interest. It will also help you see what technologies are being used and what skills you will need in order to gain a better conceptual understanding.
Lastly, connecting the dots between the different parts of what you are learning can only be done by applying what you learn. You can do this by working on projects that you find interesting and going through the process from start to finish. A good way to work on a project is to re-implement a research project on your own and then write about it in an explanatory blog post that helps someone else understand the project better. A good resource for going through the code behind research projects is paperswithcode.com, which lists papers together with the datasets they used and the code published online.