Skip to main content

Creating Fake Data In SQL

This is the first time I've created fake data using SQL, and some steps were quite challenging. That's why I've decided to write a post about it. The mission is to create a table with fake data, using random values. Below is the empty Customers table that I need to fill. Of course, other than the ID column, which is the primary key.


First of all, I need names with gender info and surnames to fill the table. But don't get me wrong, I'm not going to insert those names directly to this table, because I need names and surnames to match randomly. For that reason, I've downloaded a name list with genders and a surname list from www.randomlists.com. I also needed a list of USA states. I created one more table for that. This is how those tables look.

In Names table I have 300 female names and 300 male names. In Surnames Table I have 600 surnames. You might be wondering why I put them to different tables. They have to be at different tables because if they are at the same table, meaning there is one row for one name and surname, I can't match them randomly with each other. For the purpose of giving them different ID's, they are stored at different tables. States table has one row for every state. These tables are my row materials, and below is the loop to fill that table.


This might look very confusing but don't get discouraged. Like my math teacher said: 'There are no difficult things. There are many small easy things that came together.' What a cool guy right? All we need to do is deconstructing it step by step.

Declare part is where I declare the variables that I need. We can declare them inside the loop too, but if it's outside, SQL doesn't have to calculate it everytime and operates faster. The first variable @I is to tell SQL how many times it should repeat adding the data. In line 12, While @I < 1000 means repeat this process 1000 times because I need my table to have 1000 records. In line 15 and 16, I've used RAND() function to get a random number between 0 and 1 and multiplied it by 600 because I have 600 rows in the Names and Surnames table. To get rid of the part after comma, I've used the ROUND function. That way, I can get a different number between 1 and 600 every time, to match the ID of the table. In line 17, I multiplied RAND() with 50 because I have 50 rows in my State table.

In line 17, I create random birthdates. I choose '1950-01-01' as B.C. From that date to 2004.01.01 there are 19.710 days. If I add a number between 0-19.710 to my B.C., it means my youngest customer will be 18 years old like I needed. Line 18 is to subtract birthdate from today to find the age.

In line 19, to create a fake social security number, I get a random number between 0 and 1 and multiply it with 999999999 so that I can get 9 digit SSN. The problem is, that calculation can create 7 or 8 digit numbers too. If we multiply 0.01 with 999999999 we get 7 digit number. That's why by using if condition, I add 8 and 9 digit numbers to the output, so that they can be 9 digits in total.

In line 24, I need 11 digits numbers starting with 1. To get that first I multiply 9... with 9... which gives me a number starting with 1, if its 10 digits then I add it up with 10000000000 so that I can have an 11 digit number starting with 1. Line 28 is the part where I insert the random results to the table I created beforehand and line 31 is setting the incremental value for @I variable.

That's how I did it. During the process I got error because I declared @PHONE as int, instead of bigint which obviously was not able to show 11 digit numbers. I made that datatype change in my empty table as well. This loop is to create 1000 rows. By changing the 'WHILE' value you can choose the number of rows that you'd like to create.

Comments

Popular posts from this blog

Manipulating Grand Totals with ISINSCOPE in Power BI

ISINSCOPE function was introduced in with November 2018 Power BI update. We can say it's a very young player in the game, but it's definitely magic. It can be used to manage lots of issues related to hierarchies, but in this post, I'm going to focus on manipulating the grand total line with it.  This is my budget forecast table. The budget forecast that you see for every month is not the forecast for every month, but it's the forecast for the whole year. The budget forecast for 2020 has changed every month. That's why we have different numbers for every row. Below you see the table visual with the year-month column and a measure that shows the values for every month. The thing that I want you to pay attention to is the total line. I wanted to see the last month's value for the total line and this is the measure for that.  This measure calculates this: If the year-month column has one value then take the average of that value. If it doesn't have one value, li...

How to Add Flags to Power BI Reports

Adding flags to Power BI Reports might be easier said than done. If you have a dataset which contains data of various countries, adding country flags looks cool. The process to do it, on the other hand, might be compelling. Now I am going to share with you the obstacles I  have faced during the process and of course, how I overcame them. First I made an excel list of countries that my dataset has. Then I have downloaded to my computer all the flags I need from Wikipedia and converted them to Base64 link via an image encoder website  base64-image.de  and added those links to my excel file, which looks like this: After I added my excel file to Power BI, I converted the data category of URL column from 'uncategorized' to 'image URL', which might be a small step for you but it's a big step for the success of the flag operation. Then I created one to many relationship with DimLocation table and chose 'both' as filter direction, so that I can filter from both t...

Customizing Themes With Advanced Elements in Power BI

Advanced Elements is not an intuitive description, that's why I'm going to show you what I'm talking about directly. For my recent project, I've decided to create my own theme for the report. When I was exploring Customize Theme tab, I have come up with the Advanced section. While text, visuals, page, and filter pane sections are self-explanatory, this advanced section seemed quite enigmatic to me. What are the first-level elements? Or the others? I think Microsoft agrees with me upon being enigmatic because the section comes with "Learn more" option: Click to  Learn More . But when I tried to learn more, it didn't make any sense to me, because I'm a visual person and those descriptions are quite hard to remember. That's why I have decided to be a good samaritan and visualize it, and then share it with you.  As you see, while the border is a third-level element, the shadow is a first-level element. Legend, x-axis, and y-axis are second-level elemen...