How to Generate Mock Streaming Data Using Kinesis Data Generator

An overview of generating fake data to play with for data engineering projects

Sneha Mehrin
3 min read · Aug 18, 2020

Kinesis Data Generator is a browser-based tool that generates mock records and sends them to Kinesis Data Firehose or Kinesis Data Streams. It is an easy way to test a streaming pipeline when you do not have enough real data to play with.

In this article, I use Kinesis Data Generator to send mock Stack Overflow data that mimics the JSON structure of the original data, which I streamed using StackAPI.

Step 1: Go to the help page at https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html and download the CloudFormation template, which is used to create an Amazon Cognito user.

Step 2: Configure the stack in CloudFormation: choose the "Template is ready" option.

Step 3: Upload the JSON template file that you downloaded in Step 1.

Step 4: Specify a user name and password for the Cognito user.

Step 6: Leave the default options and create the stack.

Step 7: Open Kinesis Data Generator using the URL below and sign in with the credentials you specified in Step 4.

https://awslabs.github.io/amazon-kinesis-data-generator/web

Step 8: Generate streams using the record template feature provided by Kinesis Data Generator.

Select the region in which the Firehose delivery stream was created.

Choose the number of records per second to send.

Create a record template that resembles the actual Stack Overflow data.

Sample of the Stack Overflow JSON data

Below is the template I used:

{
  "questionid": {{random.number(100000)}},
  "view_count": {{random.number({"min":0,"max":1000})}},
  "is_answered": "{{random.arrayElement(["True","False"])}}",
  "answer_count": {{random.number({"min":0,"max":20})}},
  "score": {{random.number({"min":0,"max":50})}},
  "creation_date": {{random.arrayElement([1546300800])}}
}

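Here, I want creation_date to be constant, so random.arrayElement picks from an array with a single Unix timestamp. Note that 1546300800 corresponds to 2019-01-01 00:00:00 UTC, not the current date; replace it with the epoch value of the date you need.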
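To preview the kind of records this template produces without opening the tool, the same fields can be mimicked locally in Python. This is only a rough sketch, not how Kinesis Data Generator works internally: the function name mock_question is made up for illustration, and I am assuming the numeric ranges in the template are inclusive at both ends.

```python
import json
import random
from datetime import datetime, timezone

# Fixed Unix timestamp (seconds since the epoch) copied from the template.
CREATION_DATE = 1546300800


def mock_question(rng: random.Random) -> dict:
    """Build one record with the same fields and ranges as the template.

    Hypothetical helper; ranges are assumed to be inclusive.
    """
    return {
        "questionid": rng.randint(0, 100000),
        "view_count": rng.randint(0, 1000),
        # The template emits the strings "True"/"False", not JSON booleans.
        "is_answered": rng.choice(["True", "False"]),
        "answer_count": rng.randint(0, 20),
        "score": rng.randint(0, 50),
        "creation_date": CREATION_DATE,
    }


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs give the same records
    for _ in range(3):
        print(json.dumps(mock_question(rng)))
    # Show which calendar date the fixed timestamp corresponds to.
    print(datetime.fromtimestamp(CREATION_DATE, tz=timezone.utc).isoformat())
```

The last line prints 2019-01-01T00:00:00+00:00, which is a quick way to check that the epoch value in the template is the date you intended.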

Step 10: Send data to Kinesis Firehose

Firehose will deliver the records as files to your S3 bucket, in the format you specified when you created the delivery stream.
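One thing to watch for when reading the delivered files: Firehose does not add a separator between records on its own, so unless the template ends with a newline, the JSON objects in an S3 object sit back to back on one line. The sketch below is one way to parse such a file in Python once you have its contents as a string; split_json_objects is a hypothetical helper name, and downloading the object from S3 is left out.

```python
import json


def split_json_objects(blob: str) -> list:
    """Parse JSON objects written back to back, with or without newlines."""
    decoder = json.JSONDecoder()
    records = []
    pos = 0
    while pos < len(blob):
        # Skip whitespace (including newlines) between objects.
        if blob[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the parsed object and the index where it ended.
        obj, pos = decoder.raw_decode(blob, pos)
        records.append(obj)
    return records


if __name__ == "__main__":
    sample = '{"questionid": 1, "score": 5}{"questionid": 2, "score": 7}'
    for record in split_json_objects(sample):
        print(record["questionid"], record["score"])
```

Adding a trailing newline to the record template is the simpler option if you control the producer, since most tools can then read the file line by line.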
