Supercharging Your Data Analysis Skills: My Journey with R Programming and Literature Reviews

Sneha Mehrin
6 min read · Jul 16, 2023


For many data analysts, the journey is often limited to creating dashboards and fulfilling ad-hoc requests. That was my case too for many years. My consulting days revolved around listening to user needs and building analytical solutions that facilitated decision-making. However, transitioning to a startup presented a unique set of challenges: disorganized pipelines, data quality issues, a constant influx of data requests, and the task of enabling self-serve analysis.

Despite these challenges, I stuck around, drawn by immense opportunities, an invigorating culture, and wonderful people. In late 2022 and the first half of 2023, I embarked on a self-learning journey that fundamentally changed my approach to data analysis.

After trying numerous courses on Udemy, Coursera, and Skillshare, I finally found two transformative approaches that elevated my data analysis skills to a whole new level: learning R programming and conducting literature reviews.

Why R?

What lured me into the world of R was the “Tidy Tuesday” screencasts by David Robinson. David Robinson is a legend and has earned a significant reputation for his expertise in R and data analysis. The way he could take an unknown dataset and derive insights in a structured manner in under an hour really inspired me to take the plunge into R. Watching his screencasts made me realize how easy it is to manipulate data with the tidyverse and create unique visualizations with ggplot2.

So, I decided to give R a try.

Literature Review

The second thing that really scaled my analysis was the literature review. I am the sort of person who finds it very difficult to create things out of nothing. This stressed me out initially during my startup days because the questions were so vague and there was no clear direction. My VP suggested getting inspiration from research papers, and this proved to be a game changer.

A tool I found extremely useful for this was Elicit.

However, instead of limiting myself to complex deep-learning papers, I focused on those related to the problems I was solving at work.

For instance, if I was investigating factors influencing app retention, I’d search for relevant research papers.

Research papers gave me so many ideas on the different techniques that I could use for my own data analysis.

What did I learn in R?

My learning journey in R was guided primarily by three resources: David Robinson’s screencasts, Matt Dancho’s Business Science with R course, and Julia Silge’s Tidy Tuesday.

David Robinson’s Screencasts

David’s mastery of handling unfamiliar data and delivering a comprehensive analysis within an hour is truly inspiring. To harness his techniques, I used a four-step process:

  1. Watch the screencasts.
  2. Try to do the analysis on my own.
  3. Research R syntax or statistical techniques whenever I got stuck.
  4. Log all stumbling blocks.

Initially, it took me longer because I was unfamiliar with R and the techniques used, but I soon found a pattern in his approach that made subsequent sessions much smoother.

Business Science with R by Matt Dancho

This course, albeit pricey, greatly boosted my productivity and understanding of R. I started it once I had a basic familiarity with R from watching David Robinson’s screencasts. In my opinion, this course is highly valuable if you already have a job and want to immediately apply the learnings to supercharge your productivity.

Matt’s teachings focus on automating R workflows and delivering faster insights by adopting tools like AutoML, script generation, and tidy evaluation. The course doesn’t go into the specifics of ML or statistical techniques, but it is uniquely useful if you have some idea of stats and want to strengthen your understanding of R and speed up your analysis process.

Tidy Tuesday by Julia Silge

Julia’s screencasts, which focus more on tidymodels and ML, provided an excellent next step once I had mastered the basics. With her blog and YouTube channel, I was able to apply different ML techniques to my work to derive faster insights.

My plan was to get familiar with R and statistical techniques through David’s screencasts, learn techniques to scale my analysis process through Matt’s course, and dive into ML through Julia Silge’s screencasts.

My Upskilling Plan

With the blessings of my manager, I set aside three hours each day for focused study and started applying what I learned at work. I broke down my learning into several phases over six months:

  1. First Two Months: Tidy Tuesday by David Robinson.

Initially, I started out with random screencasts in no particular order. My goal here was to get familiar with R, learn to use statistical techniques to derive insights, and develop a structured framework around this process.

I started by working through five screencasts. My methodology was as follows: I would spend an hour watching a screencast, then attempt to reproduce the analysis on my own. The remaining time went to understanding the nuances of R syntax and learning about the statistical techniques used in the screencast.

For instance, if a screencast used a log odds ratio to examine health risk patterns across different age groups, I would delve deeper into the concept of log odds. Thankfully, with tools like ChatGPT, the learning process has become significantly easier: it’s like having complex ideas explained to you in the simplest way possible.
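To make the log odds example concrete, here is a minimal sketch of the arithmetic. It is written in Python purely for illustration (the screencasts themselves use R), the function name `log_odds_ratio` is my own, and the counts are made up rather than taken from any real dataset.

```python
import math


def log_odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Natural log of the odds ratio between group A and group B."""
    counts = (events_a, non_events_a, events_b, non_events_b)
    if any(c <= 0 for c in counts):
        # Zero counts would need a continuity correction, which this sketch skips.
        raise ValueError("all four counts must be positive")
    odds_a = events_a / non_events_a  # odds of the event in group A
    odds_b = events_b / non_events_b  # odds of the event in group B
    return math.log(odds_a / odds_b)


# Hypothetical counts: people with / without a health condition by age group.
older = {"with_condition": 30, "without_condition": 70}
younger = {"with_condition": 10, "without_condition": 90}

lor = log_odds_ratio(
    older["with_condition"], older["without_condition"],
    younger["with_condition"], younger["without_condition"],
)
print(f"log odds ratio (older vs younger): {lor:.2f}")
```

A value above zero means the odds are higher in the first group, zero means the odds are equal, and a value below zero means they are lower. Swapping the two groups flips the sign, which is part of what makes the log scale convenient for comparing groups side by side.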

After completing the five screencasts, I started to dive deeper into specific topics. My aim was to get better at hypothesis testing, linear regression, and logistic regression.

I found a really helpful website that has all of David Robinson’s videos, which I could search using keywords like “hypothesis testing”.

Sticking to the pattern of five, I would choose a subject like hypothesis testing, study five related videos, and look up any syntax or methods that confused me. Typically, each subject took me around 15 hours to fully grasp.

The game-changer for me was immediately applying these techniques to my day-to-day work. I sought input from my managers and data science colleagues, who pointed out areas for improvement in my work, and that feedback significantly enhanced my learning. This approach meant I was not only learning but also putting the new concepts into practice in real time.

  2. Next Two Months: Business Science with R by Matt Dancho.

With a solid understanding of the basics and some early successes at work, I was eager to organize and expand my analysis. I turned to Matt Dancho’s two-part Business Science with R course to dig deeper and learn handy techniques like AutoML and tidy evaluation for better code reusability. I applied the AutoML techniques directly to a research paper that I was working on and saw great results. I highly recommend his course if you have the money to spare, have some background in data analysis, and really want to go to the next level.

I dedicated about three hours daily to his course; the second part was a bit more challenging due to its machine learning focus, but the learning experience was invaluable.

  3. Final Two Months: Tidy Tuesday by Julia Silge.

By this stage, I was quite proficient with R and had managed to automate my scripts and apply them to my daily tasks. The joy I derived from this achievement was immense. Boosting my analysis skills and cutting the time spent on analysis in such a short span was more than I had hoped for. I was primed for the next phase.

My next move was to incorporate more machine learning techniques into my repertoire. For this, I turned to Julia Silge. She is an accomplished data scientist with a particular focus on machine learning, especially tidymodels. Her teachings helped expand my toolkit further.

Key Learnings

It took me around half a year to become proficient with R and its statistical techniques, but the benefits were enormous. Here are some key takeaways from my journey:

  1. Begin Right Away: Usually, I delay tasks, seeking the perfect conditions or resources. However, the most critical step is to take the plunge. You can begin with anything, be it a blog post, an article, or any available material. If you’re curious about the power of data analysis, I highly recommend David Robinson’s screencasts.
  2. Maintain Regularity: Allocate at least two hours each day, ideally during your peak productivity hours. You could negotiate with your manager to include this in your sprint goals. Regardless of how you manage, consistency is essential.
  3. Apply What You Learn: It’s crucial to have a project where you can apply what you’re learning. I was lucky to have a supportive manager, but whatever your situation, this practice is paramount if you wish to improve your skills. If you don’t apply what you’re learning regularly, you’ll soon lose touch.
  4. Share Your Work And Invite Feedback: Even if you think your work isn’t impressive, don’t hesitate to share it. People may question your data or methods, but this process will help you improve and think critically. Overcoming the fear of showcasing your work is a crucial step.
