This is the first time I've created fake data using SQL, and some steps were quite challenging. That's why I've decided to write a post about it. The mission is to create a table with fake data, using random values. Below is the empty Customers table that I need to fill. Of course, other than the ID column, which is the primary key.
First of all, I need names with gender info and surnames to fill the table. But don't get me wrong, I'm not going to insert those names directly to this table, because I need names and surnames to match randomly. For that reason, I've downloaded a name list with genders and a surname list from www.randomlists.com. I also needed a list of USA states. I created one more table for that. This is how those tables look.
In Names table I have 300 female names and 300 male names. In Surnames Table I have 600 surnames. You might be wondering why I put them to different tables. They have to be at different tables because if they are at the same table, meaning there is one row for one name and surname, I can't match them randomly with each other. For the purpose of giving them different ID's, they are stored at different tables. States table has one row for every state. These tables are my row materials, and below is the loop to fill that table.
This might look very confusing but don't get discouraged. Like my math teacher said: 'There are no difficult things. There are many small easy things that came together.' What a cool guy right? All we need to do is deconstructing it step by step.
Declare part is where I declare the variables that I need. We can declare them inside the loop too, but if it's outside, SQL doesn't have to calculate it everytime and operates faster. The first variable @I is to tell SQL how many times it should repeat adding the data. In line 12, While @I < 1000 means repeat this process 1000 times because I need my table to have 1000 records. In line 15 and 16, I've used RAND() function to get a random number between 0 and 1 and multiplied it by 600 because I have 600 rows in the Names and Surnames table. To get rid of the part after comma, I've used the ROUND function. That way, I can get a different number between 1 and 600 every time, to match the ID of the table. In line 17, I multiplied RAND() with 50 because I have 50 rows in my State table.
In line 17, I create random birthdates. I choose '1950-01-01' as B.C. From that date to 2004.01.01 there are 19.710 days. If I add a number between 0-19.710 to my B.C., it means my youngest customer will be 18 years old like I needed. Line 18 is to subtract birthdate from today to find the age.
In line 19, to create a fake social security number, I get a random number between 0 and 1 and multiply it with 999999999 so that I can get 9 digit SSN. The problem is, that calculation can create 7 or 8 digit numbers too. If we multiply 0.01 with 999999999 we get 7 digit number. That's why by using if condition, I add 8 and 9 digit numbers to the output, so that they can be 9 digits in total.
In line 24, I need 11 digits numbers starting with 1. To get that first I multiply 9... with 9... which gives me a number starting with 1, if its 10 digits then I add it up with 10000000000 so that I can have an 11 digit number starting with 1. Line 28 is the part where I insert the random results to the table I created beforehand and line 31 is setting the incremental value for @I variable.
That's how I did it. During the process I got error because I declared @PHONE as int, instead of bigint which obviously was not able to show 11 digit numbers. I made that datatype change in my empty table as well. This loop is to create 1000 rows. By changing the 'WHILE' value you can choose the number of rows that you'd like to create.
Comments