How to Generate Mock Streaming Data Using Kinesis Data Generator
An overview of generating fake data to experiment with in data engineering projects
Kinesis Data Generator is a tool that generates mock data and sends it to Kinesis Firehose or Kinesis Streams. It is an easy way to test your pipeline with streaming data when you do not have enough real data to work with.
In this article, I use Kinesis Data Generator to send mock Stack Overflow data that mimics the JSON structure of the original data, which I streamed using StackAPI.
Step 1: Go to https://awslabs.github.io/amazon-kinesis-data-generator/web/help.html to create an Amazon Cognito user, and download the CloudFormation template.
Step 2: Configure the stack. Choose the "Template is ready" option.
Step 3: Upload the JSON file that you downloaded in Step 1.
Step 4: Specify a username and password. These are the credentials you will use to sign in to Kinesis Data Generator.
Step 6: Leave the default options and create the stack.
Step 7: Open Kinesis Data Generator using the URL below and sign in with your credentials.
Step 8: Generate streams using the template provided by Kinesis Data Generator.
Select the region in which the Firehose delivery stream was created.
Choose the number of records per second to send.
Create a template similar to the actual Stack Overflow data.
Below is a sample I used
Here, I want creation_date to stay constant across records and always take the current date, rather than a random value.
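To make the idea concrete, here is a minimal Python sketch of the kind of record such a template produces: most fields are randomized, while creation_date is computed once from the current date and reused for every record. This is not the Kinesis Data Generator template itself. The field names (question_id, title, tags, view_count, score, is_answered) and the tag list are assumptions loosely modelled on Stack Overflow question data, and storing creation_date as epoch seconds is also an assumption.

```python
import json
import random
from datetime import date, datetime, time, timezone

# Hypothetical tag pool; real data would carry actual Stack Overflow tags.
TAGS = ["python", "aws", "sql", "spark", "kafka"]


def current_date_epoch(today=None):
    """Return midnight UTC of the given (or current) date as epoch seconds."""
    today = today or datetime.now(timezone.utc).date()
    midnight = datetime.combine(today, time.min, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def make_record(rng, creation_date):
    """Build one mock question; only creation_date is held constant."""
    return {
        "question_id": rng.randint(1, 10_000_000),
        "title": f"Mock question {rng.randint(1, 1000)}",
        "tags": rng.sample(TAGS, k=2),
        "view_count": rng.randint(0, 5000),
        "score": rng.randint(-5, 100),
        "is_answered": rng.choice([True, False]),
        "creation_date": creation_date,
    }


def make_batch(count, seed=None, today=None):
    """Generate `count` records that all share the same creation_date."""
    rng = random.Random(seed)
    creation_date = current_date_epoch(today)
    return [make_record(rng, creation_date) for _ in range(count)]


if __name__ == "__main__":
    for record in make_batch(3):
        print(json.dumps(record))
```

Running it prints three JSON lines whose random fields differ but whose creation_date is identical, which is the behaviour I want from the template.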
Step 10: Send data to Kinesis Firehose.
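Sending from the Kinesis Data Generator page is all this walkthrough needs. For comparison only, a scripted way to push mock records to the same delivery stream is sketched below in Python. It is not part of Kinesis Data Generator. It assumes boto3 is installed and AWS credentials are configured; the stream name and region in the usage comment are placeholders. The batch size of 500 matches the per-call record limit of the Firehose PutRecordBatch API.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_records(client, stream_name, records, batch_size=500):
    """Send dict records to Firehose in batches; return the failed count.

    `client` is expected to behave like boto3.client("firehose").
    """
    failed = 0
    for batch in chunked(records, batch_size):
        payload = [
            # Newline-terminate each record so records stay separable in S3.
            {"Data": (json.dumps(record) + "\n").encode("utf-8")}
            for record in batch
        ]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=payload
        )
        failed += response.get("FailedPutCount", 0)
    return failed


# Usage (placeholders: region and delivery stream name are assumptions):
#   import boto3
#   firehose = boto3.client("firehose", region_name="us-east-1")
#   mock = [{"question_id": i, "score": 0} for i in range(3)]
#   print(send_records(firehose, "my-delivery-stream", mock))
```

Passing the client in as an argument keeps the batching logic easy to try out with a stand-in object before pointing it at a real delivery stream.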
The Firehose output will be written to your S3 bucket in the format you specified when you created the delivery stream.
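One thing to watch for when reading the output: by default Firehose writes records back to back without adding a delimiter of its own, so if the template does not end with a newline, a delivered object can contain several JSON documents run together. Below is a minimal Python sketch for splitting such content. It assumes the object was delivered as plain JSON text and has already been downloaded and decoded to a string; the sample string is made up for illustration.

```python
import json


def split_json_documents(text):
    """Return every JSON document found in `text`, in order.

    Handles documents separated by whitespace, newlines, or nothing at all.
    """
    decoder = json.JSONDecoder()
    documents = []
    position = 0
    length = len(text)
    while position < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while position < length and text[position].isspace():
            position += 1
        if position >= length:
            break
        document, position = decoder.raw_decode(text, position)
        documents.append(document)
    return documents


if __name__ == "__main__":
    sample = '{"question_id": 1, "score": 3}{"question_id": 2, "score": 0}\n'
    for doc in split_json_documents(sample):
        print(doc["question_id"], doc["score"])
```

If the template ends each record with a newline instead, reading the object line by line and calling json.loads on each line is simpler.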