Skip to main content

Statistics In SQL

If you know nothing about Statistics, you've come to the right place. I'm not going to bore you with long technical definitions that make no sense to people who don't know anything about it. I'm going to show everything with examples and visuals. You just sit back and relax.

I have a Customers table. When I search for 'Abigail Parks' I get 5 records.
And I have 3 indexes for this table. First is the clustered index, which is created automatically for the primary key column. The second and third ones are non-clustered indexes, one for the NameSurname column and the other one is for the Birthdate column. Now I'm wondering which index SQL used for my query. Was it able to find NameSurname index? When I click on Display Estimated Execution Plan button I get this:
SQL was able to find the right index. In my next query, I try to find how many customers were born on the same day with the first 'Abigail Parks' on my list, which is 1973-06-05.
I have 512 people who were born that day. Now I check again if SQL was able to find the right index.

It could find the right Index again. What if I say: Select * From Customers Where NameSurname = 'Abigail Parks' and Birthdate = '1973-06-05'? Which Index is going to choose? Smart choice would be choosing the NameSurname Index since it has fewer rows. Let's see which one it's going to choose.

How does SQL know which index has fewer rows? It knows it thanks to the statistics. Every index in SQL has its own statistics recorded under the Indexes section. When I click on the properties of IX_NameSurname Statistics, I can see the histogram of the NameSurname column. 


To optimize our query performance, we need to keep our statistics up to date. Evey time we rebuild or reorganize our indexes, these statistics get updated automatically. Now I want to confuse SQL and see what happens. I'm going to add 1000 more 'Abigail Parks' to my table with stored procedure and see which index SQL is going to choose. 

After adding 1000 more 'Abigail Parks' I asked the same question again.

It was a wrong choice this time. The reason for that, those statistics are not updated. When I checked the statistics, I saw that 'Abigail Parks' was still recorded as 5 rows. So what's the price that we pay for not updated statistics? The good thing is we can choose which index to use at our query and by using SET STATISTICS IO ON formula we can see how many pages it has to read for every index.
This is the price that we pay. If my statistics were updated, SQL was going to read 107 pages for that query. But it wasn't updated and it used NameSurname column and had to read 2088 pages. If it was updated, my query would work 20 times faster. 

What to do to force SQL to use the right index? Using the WITH condition every time wouldn't be effective. The only way to do it effectively is by updating statistics. We have 4 different ways to do that. We can rebuild or reorganize our indexes. But it might not be possible to do it every day, because when our system is working, rebuilding or reorganizing indexes slows our system. Luckily, we can update our statistics without touching our indexes by using SP_UPDATESTATS. This command updates every statistic in our server. Or we can choose which table's statistics to update by using the UPDATE command: UPDATE STATISTICS 'TableName'

To optimize our queries we need to update our statistics regularly. But doing it every time by writing a query might not be the best option. In my opinion, the best option is to add it to our database maintenance plan, which is a subject for another post.

Comments

Popular posts from this blog

Manipulating Grand Totals with ISINSCOPE in Power BI

ISINSCOPE function was introduced in with November 2018 Power BI update. We can say it's a very young player in the game, but it's definitely magic. It can be used to manage lots of issues related to hierarchies, but in this post, I'm going to focus on manipulating the grand total line with it.  This is my budget forecast table. The budget forecast that you see for every month is not the forecast for every month, but it's the forecast for the whole year. The budget forecast for 2020 has changed every month. That's why we have different numbers for every row. Below you see the table visual with the year-month column and a measure that shows the values for every month. The thing that I want you to pay attention to is the total line. I wanted to see the last month's value for the total line and this is the measure for that.  This measure calculates this: If the year-month column has one value then take the average of that value. If it doesn't have one value, li...

How to Add Flags to Power BI Reports

Adding flags to Power BI Reports might be easier said than done. If you have a dataset which contains data of various countries, adding country flags looks cool. The process to do it, on the other hand, might be compelling. Now I am going to share with you the obstacles I  have faced during the process and of course, how I overcame them. First I made an excel list of countries that my dataset has. Then I have downloaded to my computer all the flags I need from Wikipedia and converted them to Base64 link via an image encoder website  base64-image.de  and added those links to my excel file, which looks like this: After I added my excel file to Power BI, I converted the data category of URL column from 'uncategorized' to 'image URL', which might be a small step for you but it's a big step for the success of the flag operation. Then I created one to many relationship with DimLocation table and chose 'both' as filter direction, so that I can filter from both t...

Customizing Themes With Advanced Elements in Power BI

Advanced Elements is not an intuitive description, that's why I'm going to show you what I'm talking about directly. For my recent project, I've decided to create my own theme for the report. When I was exploring Customize Theme tab, I have come up with the Advanced section. While text, visuals, page, and filter pane sections are self-explanatory, this advanced section seemed quite enigmatic to me. What are the first-level elements? Or the others? I think Microsoft agrees with me upon being enigmatic because the section comes with "Learn more" option: Click to  Learn More . But when I tried to learn more, it didn't make any sense to me, because I'm a visual person and those descriptions are quite hard to remember. That's why I have decided to be a good samaritan and visualize it, and then share it with you.  As you see, while the border is a third-level element, the shadow is a first-level element. Legend, x-axis, and y-axis are second-level elemen...