Phil Marius

Data Scientist, Data Engineer, Linux, and OSS Fan

09 May 2020

Six Months Into Data Science

Six months ago, I completed my master’s degree in data science, and I’m surprised at how much I’ve learned since. With data science being such a young field, I thought I’d write about my experiences for those looking to get into the field or those already in it. This is an account of personal experiences, so others’ experiences may differ.

Data Science is a Team Sport

In the past six months I’ve had two jobs. Both companies have taught me that the beast that is real-world data is best tackled with a team of people. Higher education left me ill-prepared for how unpredictable and disorganised data can be, and for how different skillsets can be utilised to make sense of its mess. Just a few of these include:

  • The client whisperer: to interpret results and understand the client’s wishes best
  • The docker guru: to manage infrastructure and deploy models
  • The one who can read linear algebra and understand it: to find the latest research papers and translate them to the team
  • The “I’ve begun to dream in dataframe operations”: to build optimised data processing pipelines
  • The SQL genius: to quickly build queries to extract data
  • The one steering the ship: to ensure the team doesn’t get sidetracked with time-consuming and unnecessary projects

With my background of a BSc in Computer Science and an MSc in Data Science, I currently spend a lot of my time doing data engineering tasks, as that’s often where my strengths lie. In the past six months I have optimised data processing pipelines, packaged code previously run in notebooks, and managed streaming infrastructure on cloud services, amongst other engineering-focussed jobs. Data science is often sold as “cool model building” and “implementing the cutting edge of machine learning”, and I’ve found that this often isn’t the case. Not that I haven’t enjoyed my job, but it isn’t as “sexy” as Harvard Business Review might suggest. I have come to learn that what matters in data science is not what an individual can do, but what a team can do together.

No One Knows What Data Science Really Is

I’m looking at you, mum.

Jokes aside, this argument is probably most notable amongst data scientists themselves, with many disagreeing on what the role actually entails. Data scientist is supposedly “the sexiest job of the 21st century”, but what do data scientists actually do? I’ve yet to find out.

As I’ve mentioned above, I feel a team boasting a diverse range of skillsets is best suited for data science. I feel that there isn’t really “one skillset” that defines what data science actually is. My most common go-to definition is a combination of data analytics and software engineering, which still doesn’t really cover everything. The best explanation I’ve seen is probably Wikipedia’s own:

“Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data”

i.e. the literal science of data

I had an interesting conversation the other day with a colleague about data science. We have a “high fiving” scheme at work, where each week someone can “high five” someone else to show appreciation for something they have done. I had “high fived” a teammate to commend their data science skills, and this sparked a debate in which they claimed it was actually data engineering they had done, not data science. I argued that yes, it was data engineering, but that is part of what data science is. They argued that data science is more about model building, and that data engineering is building the software to enable data science. I still don’t know who was right.

Remote Working is a Must for Future Jobs

(You’ve probably read this 1000 times already but hear me out)

As of May 2020, most of the world is in some form of lockdown, with most people unable to leave the house except for essential journeys and exercise. Most people are working from home full-time until the lockdown lifts, and many are experiencing remote working for the first time. This isn’t the first time I’ve worked remotely, though: I was fortunate enough to experience it in my first job, as it was a totally remote company! Our so-called “hub” consisted of 3 of us in one coworking space, which happened to be in the city I was living in. It wasn’t mandatory to go in, however, and often I worked from home, or was the only one in the office. As long as you started work in the morning, made standup at lunch time (we had colleagues in India and Argentina so had to find a time that worked for everyone) and worked until the end of the day, you could do that from anywhere. I even visited a friend doing his PhD in a different city one weekend: I travelled down on Thursday and worked from his living room on Friday.

What quickly became clear to me is that there are very few times when data scientists actually need to be in the office. My data science workflow to date has been very similar to a software engineer’s, and software engineering is a discipline that can be done entirely remotely. My current company’s work has reinforced that idea. Even before lockdown, it was possible to do all our work from home and even from different cities. The company has an office in my home city, and I’ve often gone home to see family and friends and worked from the office there too.

I find this flexibility improves my work ethic because of the trust a company has to give you to work from home autonomously. I find myself producing better work, and not being constantly distracted by people wandering by is an added bonus. I also relish the ability to travel to see friends and family, which lets my work and social life fit together without harming either. And so, much to my surprise, the ability to work remotely has made it very close to the top of my priority list for my work life.

Conclusion

I know I’m quite young for a data scientist and I’m right at the beginning of my career, but data science has turned out to be a totally different career path from the one I initially expected. It’s a fledgling industry and I’m excited to see where it goes. It’s such a mix of disciplines and skills that there is always something more to learn, and staying on top of academic reading can be quite the challenge.