Photo by Scott Graham on Unsplash

If you are an aspiring data analyst or data scientist, you already know SQL is one of the basic skill sets you should have.

But if you are like me, you have probably scoured the internet for ways to become an expert in SQL and still cannot use this powerful tool efficiently at work.

Most resources on the internet give you a boilerplate template for studying SQL:

a) Attend some courses.

b) Do some problems.

But do you know why this approach fails in an actual work setting? …


4 Villains of decision making

Photo by Oliver Roos on Unsplash

Every day you make a multitude of decisions in your mind. Should I wear this dress or not? Do I like this person? Should I exercise today?

Most of these decisions you make in an instant, but how confident are you when making decisions that radically change your life?

Being an organized, checklist kind of person, I almost always resort to a pros and cons list. Most of the time, I don’t even know if I made the right choice or if other options exist. I simply sit back, assume that I made the right choice, and treat myself to a bar of chocolate, hoping for the universe to perform its miracle. …


Four-step process of dimensional modelling

Photo courtesy: Aditya Chinchure (Unsplash)

Throughout my career as a data analyst, I have seen companies ask for skills such as SQL, statistics and Python in their job descriptions. Nobody mentions anything about dimensional modelling, and I am not quite sure why.

Of the many projects that I have built, one of the fundamental things I had to do before building any analytics dashboard was to figure out the dimensional model in the downstream system. Of course, not every project requires this, but it is one of the key skills that can set you apart as a data analyst.

The Data Warehouse Toolkit by Ralph Kimball is one of the best books I have read in this area. In this article, I give a quick overview of the basics of dimensional modelling in any analytics project. …
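Kimball’s four-step design process is usually summarised as: select the business process, declare the grain, identify the dimensions, and identify the facts. As a minimal sketch of where those steps end up, the code below builds a tiny star schema in an in-memory SQLite database using Python’s built-in sqlite3 module. The business process (retail sales), the grain (one row per product per day) and every table and column name are hypothetical, chosen only for illustration.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension: one row per calendar day (hypothetical)
        CREATE TABLE dim_date (
            date_key    INTEGER PRIMARY KEY,  -- e.g. 20240101
            full_date   TEXT NOT NULL,
            month_name  TEXT NOT NULL
        );
        -- Dimension: one row per product (hypothetical)
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            category    TEXT NOT NULL
        );
        -- Fact table at the declared grain: one row per product per day
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER NOT NULL,  -- additive fact
            revenue     REAL NOT NULL      -- additive fact
        );
        INSERT INTO dim_date VALUES
            (20240101, '2024-01-01', 'January'),
            (20240201, '2024-02-01', 'February');
        INSERT INTO dim_product VALUES
            (1, 'Notebook', 'Stationery'),
            (2, 'Pen', 'Stationery'),
            (3, 'Mug', 'Kitchen');
        INSERT INTO fact_sales VALUES
            (20240101, 1, 10, 50.0),
            (20240101, 3, 2, 16.0),
            (20240201, 2, 30, 30.0),
            (20240201, 3, 1, 8.0);
        """
    )
    return conn


def revenue_by_category_and_month(conn: sqlite3.Connection):
    """Typical star-schema query: join the facts to their dimensions, then aggregate."""
    sql = """
        SELECT p.category, d.month_name, SUM(f.revenue) AS total_revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date    AS d ON d.date_key    = f.date_key
        GROUP BY p.category, d.month_name
        ORDER BY p.category, d.month_name
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in revenue_by_category_and_month(build_star_schema()):
        print(row)
```

The facts stay narrow and additive while descriptive attributes live in the dimensions, which is what keeps the slice-by-category-and-month query a simple join plus `GROUP BY`.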


An article presenting SQL joins in a different, fundamental and basic manner

Photo by Franki Chamaki on Unsplash

Most of you are already familiar with the concept of joins. It is a simple concept: combining two input tables based on a key. In this article, however, I hope to explain joins in a slightly different way. I am hoping this will solidify your understanding of joins in a much more concrete way, allowing you to use them more efficiently in ETL or SQL queries.

So let’s dive in:

What is a Join?

  • A join is essentially a table operator which operates on two tables.
  • There are three fundamental types of joins: emphasis on the word fundamental, because most of you are familiar with left join, inner join, right join and self join. …
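To make the “table operator” idea concrete, here is a minimal sketch in plain Python, with tables modelled as lists of dicts (an assumption for illustration; a database engine does not execute joins this way internally). It shows one common way of reasoning about joins: a cross join produces every pairing of rows, an inner join keeps only the pairings whose keys match, and a left outer join additionally preserves unmatched rows from the left table, padding the right side with None. The sample tables and the `dept_id` key are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def cross_join(left: List[Row], right: List[Row]) -> List[Tuple[Row, Row]]:
    """Cartesian product: every left row paired with every right row."""
    return [(left_row, right_row) for left_row in left for right_row in right]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """A cross join followed by a filter that keeps only pairs with equal keys."""
    return [(a, b) for a, b in cross_join(left, right) if a[key] == b[key]]


def left_outer_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Optional[Row]]]:
    """Inner-join matches for each left row, plus unmatched left rows padded with None."""
    result: List[Tuple[Row, Optional[Row]]] = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if matches:
            result.extend((left_row, r) for r in matches)
        else:
            result.append((left_row, None))  # keep the left row even without a match
    return result


# Hypothetical sample tables
employees = [
    {"name": "Asha", "dept_id": 1},
    {"name": "Ben", "dept_id": 2},
    {"name": "Chen", "dept_id": 3},
]
departments = [
    {"dept_id": 1, "dept": "Analytics"},
    {"dept_id": 2, "dept": "Engineering"},
]

if __name__ == "__main__":
    for emp, dept in left_outer_join(employees, departments, "dept_id"):
        print(emp["name"], dept["dept"] if dept else None)
```

Defining the inner join as a filtered cross join is deliberately inefficient; its point is that the result is exactly the subset of the Cartesian product that satisfies the join predicate.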

A comprehensive overview of beginner and advanced SQL concepts, including basic SQL queries, window functions, subqueries, CTEs, etc.

Image courtesy: Unsplash (Franki Chamaki)

SQL is an integral part of data science and analytics. However, many people only know how to write basic queries and do not really focus on the fundamentals. This ultimately results in poorly written queries and performance issues in analytics.

Recently, I have been reading a lot of books on SQL and database management and wanted to compile the knowledge into a series of articles.

I plan to organise the content of this series as follows:

  1. Introduction to SQL
  2. Basic SQL Queries
  3. Joins and Subqueries
  4. Common Table Expressions
  5. Window Functions
  6. Programmatic SQL

What is SQL?

  • SQL stands for Structured Query Language. …
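As a small preview of two topics in this series, CTEs and window functions, here is a minimal sketch using Python’s built-in sqlite3 module. It assumes SQLite 3.25 or later, the version that added window-function support; the `orders` table, its columns and the sample rows are hypothetical. The CTE names a filtered subset of orders, and the windowed `SUM` then computes a running total per customer without collapsing rows the way `GROUP BY` would.

```python
import sqlite3

# Hypothetical sample data: (order_id, customer, order_date, amount)
ORDERS = [
    (1, "alice", "2024-01-05", 20.0),
    (2, "bob", "2024-01-06", 15.0),
    (3, "alice", "2024-01-09", 30.0),
    (4, "bob", "2024-01-12", 5.0),
    (5, "alice", "2024-01-15", 10.0),
]

QUERY = """
WITH recent_orders AS (          -- CTE: a named subquery we can select from
    SELECT customer, order_date, amount
    FROM orders
    WHERE order_date >= '2024-01-06'
)
SELECT customer,
       order_date,
       amount,
       SUM(amount) OVER (        -- window function: running total per customer
           PARTITION BY customer
           ORDER BY order_date
       ) AS running_total
FROM recent_orders
ORDER BY customer, order_date
"""


def running_totals():
    """Load the sample orders into an in-memory DB and run the CTE + window query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id INTEGER, customer TEXT, order_date TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ORDERS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals():
        print(row)
```

Each output row keeps its own amount alongside the cumulative total, which is the key difference between a window aggregate and a grouped aggregate.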

An overview of generating fake data to play with in data engineering projects

Kinesis Data Generator is a tool that can be used to generate mock data and send it to Kinesis Firehose or Kinesis Streams. This can be an easy way to test your pipeline with streaming data if you do not have enough data to play with.

In this article, I am using Kinesis Data Generator to send mock Stack Overflow data mimicking the original JSON structure, which I streamed using StackAPI.

Step 1: Go to https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html, create an Amazon Cognito user, and download the CloudFormation template.


Step 2: Configure the stack: choose the “Template is ready” option.
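If you would rather generate mock records from code than through the Kinesis Data Generator web UI, the sketch below produces Stack Overflow-style question records as newline-delimited JSON using only Python’s standard library. The field names are loosely modelled on the Stack Exchange API’s question object and are assumptions, not the exact structure used in this article; the tag pool and all values are fake. A seeded `random.Random` and a fixed base timestamp make the output reproducible, which is handy when testing a pipeline.

```python
import json
import random

# Hypothetical tag pool for the fake records
TAGS = ["python", "sql", "aws", "amazon-kinesis", "pandas", "apache-spark"]


def mock_question(rng: random.Random, question_id: int, base_ts: int) -> dict:
    """Return one fake, Stack Overflow-style question record (field names assumed)."""
    return {
        "question_id": question_id,
        "title": f"Mock question {question_id}",
        "tags": rng.sample(TAGS, k=rng.randint(1, 3)),
        "score": rng.randint(-5, 50),
        "view_count": rng.randint(0, 10_000),
        "is_answered": rng.random() < 0.6,
        "creation_date": base_ts - rng.randint(0, 86_400),  # Unix epoch seconds
    }


def mock_questions_ndjson(n: int, seed: int = 42, base_ts: int = 1_700_000_000) -> str:
    """Serialise n fake records as newline-delimited JSON (one object per line)."""
    rng = random.Random(seed)
    return "\n".join(
        json.dumps(mock_question(rng, 1000 + i, base_ts)) for i in range(n)
    )


if __name__ == "__main__":
    print(mock_questions_ndjson(3))
```

Newline-delimited JSON is a convenient shape for streaming tests because each line is an independent record that a consumer can parse on its own.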


Can small lifestyle changes really bring happiness?

I used to be the person who wanted results fast. If there were a commercial offering me a pill to magically eradicate my problems, I would gladly pay every penny. But unfortunately, life isn’t that easy, and sometimes you just have to figure out your own happiness.

I wouldn’t say that my life is all bad. I have a decent job, a good environment, really amazing friends and family.

However, I never really felt contentment or true happiness. I am not sure I really understood what happiness meant to me. I used to immerse myself in my work and just pretend to be busy all the time. For so long, I thought that if I could just get that perfect job, I would be truly happy. However, I am starting to realise that work is just one part of your life, and in order to be truly happy, you really need to figure out the different aspects of your life and make consistent, tiny changes to them. …


An overview of designing & building a technical architecture for an analytics problem.

https://memegenerator.net

This article is a continuation of the previous post and will outline how to transform our user requirements into a technical design and architecture.

Let’s summarise our two major requirements:


An overview of how to ingest Stack Overflow data using Kinesis Firehose and Boto3 and store it in S3

https://www.tvbeurope.com/resources

This article is part of a series and is a continuation of the previous post.

Why use streaming data ingestion?

Traditional enterprises follow a methodology of batch processing where you gather the data, load it periodically into a database, and analyse it hours, days, or weeks later.

However, because numerous data sources continuously generate streams of data, it has become imperative for most businesses to process and analyse data at massive scale with latencies of milliseconds.

Apache Kafka and Amazon Kinesis are two of the more widely adopted data streaming and messaging systems.
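As a minimal sketch of streaming ingestion with Boto3, the function below sends JSON records to a Kinesis Data Firehose delivery stream one at a time with `put_record`, appending a newline so records stay separable once Firehose buffers them into S3 objects. The delivery stream name is hypothetical, and the stream (with its S3 destination) is assumed to already exist, with AWS credentials configured. The client is passed in as a parameter so the sending logic can be exercised without AWS.

```python
import json
from typing import Iterable, List


def send_to_firehose(client, stream_name: str, records: Iterable[dict]) -> List[str]:
    """Send each record as one newline-terminated JSON blob; return the RecordIds."""
    record_ids = []
    for record in records:
        payload = (json.dumps(record) + "\n").encode("utf-8")
        response = client.put_record(
            DeliveryStreamName=stream_name,
            Record={"Data": payload},
        )
        record_ids.append(response["RecordId"])
    return record_ids


if __name__ == "__main__":
    import boto3  # requires boto3 and configured AWS credentials

    firehose = boto3.client("firehose")
    ids = send_to_firehose(
        firehose,
        "stackoverflow-questions-stream",  # hypothetical delivery stream name
        [{"question_id": 1, "title": "Example question"}],
    )
    print(ids)
```

For higher throughput, `put_record_batch` accepts up to 500 records per call, but it can partially fail, so check `FailedPutCount` in its response and retry the failed entries.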

Two Main Services Offered by Amazon


https://bq-magazine.com/the-7-habits-of-good-data-scientists/

If you are a computer programmer or work in any tech-related industry, chances are that you google for answers on Stack Overflow at least once a day.

Stack Overflow is a question-and-answer site for professional and enthusiast programmers. The website offers a platform for users to ask and answer questions and, through active participation, to vote questions and answers up or down.

This series aims to provide a comprehensive view of designing, building and developing an analytics/AI data pipeline for Stack Overflow using the AWS stack, and finally building a dashboard in Einstein Analytics.

Pipelines are the heart of analytics and ML, and quite often they are the hardest part of an analytics or ML problem. If you have a well-designed pipeline, half the battle is already won. …

About

Sneha Mehrin

Data + Design + User Psychology
