If you are an aspiring data analyst or data scientist, you already know SQL is one of the basic skill sets you should have.
But if you are like me, you have probably scoured the internet for ways to become an expert in SQL and still cannot use this powerful tool efficiently at work.
Most resources on the internet give you a boilerplate template for studying SQL:
a) Attend some courses.
b) Do some problems.
But do you know why this approach fails in an actual work setting? …
Every day you make a multitude of decisions in your mind. Should I wear this dress or not? Do I like this person? Should I exercise today?
Most of these decisions you make in an instant, but how confident are you in making decisions which radically change your life?
Being an organized, checklist kind of person, I almost always resort to a pros and cons list. Most of the time, I don’t even know if I made the right choice, or if other options existed. I simply sit back, assume that I made the right choice, and treat myself to a bar of chocolate, hoping for the universe to perform its miracle. …
Throughout my career as a data analyst, I have seen companies ask for skills such as SQL, Stats, and Python in the job description. Nobody mentions anything about dimensional modelling, and I am not quite sure why.
In the many projects that I have built, one of the fundamental things I had to do before building any analytics dashboard was to figure out the dimensional model in the downstream system. Of course, not every project requires this, but it is one of the key skills that can set you apart as a data analyst.
The Data Warehouse Toolkit by Ralph Kimball is one of the best books that I have read in this area. In this article, I give a quick overview of the basics of dimensional modelling for any analytics project. …
Most of you are already familiar with the concept of joins: combining two input tables based on a key. In this article, however, I hope to explain joins in a slightly different way. I am hoping this will make the idea more concrete, so that it can be used more efficiently in ETL or SQL queries.
So let’s dive in:
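As a minimal sketch of joining two input tables on a key, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables, their columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def run_join_demo():
    """Build two small tables and join them on customer_id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: customer 3 has no orders,
    # and order 13 points at a customer that does not exist.
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0), (13, 4, 99.0);
    """)
    # INNER JOIN keeps only rows whose key appears in both tables.
    inner = conn.execute("""
        SELECT c.name, o.order_id
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY o.order_id
    """).fetchall()
    # LEFT JOIN keeps every customer, filling missing orders with NULL.
    left = conn.execute("""
        SELECT c.name, o.order_id
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY c.customer_id, o.order_id
    """).fetchall()
    conn.close()
    return inner, left


if __name__ == "__main__":
    inner_rows, left_rows = run_join_demo()
    print("inner:", inner_rows)
    print("left: ", left_rows)
```

In this sketch the inner join drops both the customer without orders and the order without a matching customer, while the left join keeps the customer and reports the order as `None`.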
SQL is an integral part of data science and analytics. However, many people only know how to write basic queries and never really focus on the fundamentals. This ultimately results in poorly written queries and performance issues in analytics.
Recently, I have been reading a lot of books on SQL and database management and wanted to compile the knowledge into a series of articles.
I plan to organise the content of this series in the following format:
What is SQL?
Kinesis Data Generator is a tool that can be used to generate mock data and send it to Kinesis Firehose or Streams. This can be an easy way to test your pipeline with streaming data if you do not have enough data to play with.
In this article, I am using Kinesis Data Generator to send mock Stack Overflow data mimicking the original JSON structure which I streamed using Stackapi.
Step 1: Go to this link https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html, create an Amazon Cognito user, and download the CloudFormation template.
Step 2: Configure the stack: choose the "Template is ready" option.
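To give a rough idea of what mock records mimicking a JSON structure can look like, here is a small Python sketch. It is a stand-in for illustration only, not the Kinesis Data Generator template itself, and the field names (`question_id`, `title`, `tags`, `score`, `view_count`, `is_answered`) are assumptions rather than the structure used in the article.

```python
import json
import random

# Hypothetical tag pool used only for this illustration.
TAGS = ["python", "sql", "aws", "pandas", "java"]


def mock_question(rng: random.Random) -> dict:
    """Build one mock question record with assumed field names."""
    return {
        "question_id": rng.randint(1, 10_000_000),
        "title": f"Mock question {rng.randint(1, 999)}",
        "tags": rng.sample(TAGS, k=2),
        "score": rng.randint(-5, 100),
        "view_count": rng.randint(0, 50_000),
        "is_answered": rng.random() < 0.5,
    }


def mock_records(n: int, seed: int = 0) -> list:
    """Return n mock records serialized as JSON strings.

    A fixed seed makes the output reproducible between runs.
    """
    rng = random.Random(seed)
    return [json.dumps(mock_question(rng)) for _ in range(n)]


if __name__ == "__main__":
    for record in mock_records(3):
        print(record)
```

Each JSON string here plays the role of one record that a generator would push into the stream.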
Can small lifestyle changes really bring happiness?
I used to be the person who wanted results fast. If there was a commercial offering me a pill to magically eradicate my problems, I would gladly pay every penny. But unfortunately, life isn’t that easy and sometimes you just have to figure out your own happiness.
I wouldn’t say that my life is all bad. I have a decent job, a good environment, really amazing friends and family.
However, I never really felt contentment or true happiness. I am not sure I really understood what happiness meant to me. I used to immerse myself in my work and just pretend to be busy all the time. For so long, I thought that if I could just get that perfect job, then I would be truly happy. However, I am starting to realise that work is just one part of your life, and in order to be truly happy, you need to figure out the different aspects of your life and make consistent, tiny changes to them. …
This article is a continuation of the previous post and will outline how to transform our user requirements into a technical design and architecture.
Let’s summarise our two major requirements:
This article is part of the series and a continuation of the previous post.
Why use streaming data ingestion?
Traditional enterprises follow a methodology of batch processing where you gather the data, load it periodically into a database, and analyse it hours, days, or weeks later.
However, because numerous data sources now continuously generate streams of data, it has become imperative for most businesses to process and analyse data at massive scale within milliseconds.
Apache Kafka and Amazon Kinesis are two of the more widely adopted messaging queue systems.
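The difference between batch and streaming processing can be sketched in a few lines of Python. This is a toy illustration, not Kafka or Kinesis code: the function names and the choice of a running average as the metric are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def batch_average(values: List[float]) -> float:
    """Batch style: wait until all data is collected, then compute once."""
    return sum(values) / len(values)


def streaming_average(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated result as each record arrives."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
        # The result is available immediately, without waiting for the rest.
        yield total / count


if __name__ == "__main__":
    readings = [2.0, 4.0, 6.0]
    print("batch:", batch_average(readings))
    for running in streaming_average(readings):
        print("stream:", running)
```

The batch function produces one answer after everything has been loaded, whereas the streaming generator produces an answer per record, which is what keeps latency low.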
If you are a computer programmer or work in any tech-related industry, then chances are that you search Stack Overflow for answers at least once a day.
Stack Overflow is a question and answer site for professional and enthusiast programmers. The website offers a platform for users to ask and answer questions and, through active participation, to vote questions and answers up or down.
This series aims to provide a comprehensive view of designing, building, and developing an analytics/AI data pipeline for Stack Overflow using the AWS stack, and finally building a dashboard in Einstein Analytics.
Pipelines are the heart of analytics and ML, and quite often they are the hardest part of an analytics or ML problem. If you have a well-designed pipeline, then half the battle is won. …