This is my 2nd post in a series called “8 Weeks of Indexes”.
Merriam-Webster defines index as:
a list … arranged usually in alphabetical order of some specified datum.
One of the most common real-world examples of an index is your telephone book. The book stores information (name, address and phone number) sorted alphabetically by last name. The pages are ordered so that A comes before B, B comes before C, and so on. If you know your alphabet, any name can be looked up easily. Typically, the first “key” to finding a name is printed at the top of the page and tells you what section of the book you are in. To locate my entry in the phone book, you would quickly seek through the keys until you found the letter B at the top of the page. Then you would continue until you found the group of entries for BISHOP, and finally locate the entry that matches my name: BISHOP, ROBERT. If there were no key at the top of the page, you would have to scan through the pages one at a time until you got to the B section. Another excellent real-world example of an index system is the Dewey Decimal System. Libraries have been indexing their books with a numbering system for years.
So, how does this all relate to SQL Server? Several of the terms used above, such as key, page, scan and seek, translate to SQL Server terms, and SQL Server works much the same way as a phone book. To fully understand how SQL Server indexes work, you really need to know how SQL Server stores data. We know SQL Server has .mdf files that actually store all your data. The data file, however, is made up of pages that are 8 KB in size. At the top of each page is a “page header” used to store system information about that page. There are many different types of pages that store different things, but the two types I want to talk about are “data” pages and “index” pages.
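To put some rough numbers on the 8 KB page, here is a minimal Python sketch that estimates how many rows fit on one data page and how many pages a table would need. The 96-byte page header and the 2-byte per-row slot entry are my assumptions about the page layout, and the 100-byte row size and one-million-row table are made-up figures for illustration only. Treat the result as a back-of-the-envelope estimate, not as what SQL Server will report for a real table.

```python
# Rough estimate of how many rows fit on one 8 KB data page.
# Assumptions (not from the post): a 96-byte page header and a
# 2-byte slot entry per row in the row offset array.
PAGE_SIZE_BYTES = 8 * 1024   # 8 KB page = 8192 bytes
PAGE_HEADER_BYTES = 96       # system information at the top of the page
SLOT_ENTRY_BYTES = 2         # per-row pointer stored at the end of the page


def rows_per_page(row_size_bytes: int) -> int:
    """Estimate how many rows of the given size fit on a single page."""
    if row_size_bytes <= 0:
        raise ValueError("row_size_bytes must be positive")
    usable_bytes = PAGE_SIZE_BYTES - PAGE_HEADER_BYTES
    return usable_bytes // (row_size_bytes + SLOT_ENTRY_BYTES)


def pages_needed(row_count: int, row_size_bytes: int) -> int:
    """Estimate how many full or partial pages a table would occupy."""
    per_page = rows_per_page(row_size_bytes)
    if per_page == 0:
        raise ValueError("row is too large to fit on one page")
    return -(-row_count // per_page)  # ceiling division


# Hypothetical table: 1,000,000 rows of about 100 bytes each.
print(rows_per_page(100))
print(pages_needed(1_000_000, 100))
```

The point of the arithmetic is simply that a large table is spread over thousands of pages, which is why it matters so much whether the engine can jump to the right page or has to read them all.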
A “data page” is where your actual data (based on data types) is stored and, as you guessed, the “index page” stores index information. The “key” to proper storage of data is a clustered index. A clustered index stores the rows of data in pages ordered by the selected column(s) and the selected sort order. So, a clustered index on a user table could be on the column [Last Name], just like a phone book. This ensures that the data rows are kept in alphabetical order on each page and that the pages themselves are in alphabetical order as well, which is very efficient. The SQL Server engine can “seek” through the index to determine exactly which page holds the “B” last names. If a table did not have a clustered index (such a table is called a heap), the data would be stored in a “first come, first served” fashion. In this scenario, the SQL Server engine would have to “scan” the entire table, page by page, to find your entry, which is very inefficient. Imagine how inefficient a phone book would be if the publisher just kept adding rows to the end of the book every year without sorting them by name. How long would it take you to find my name then?
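The difference between the two storage styles can be sketched in a few lines of Python. This is a toy model, not how SQL Server is implemented: a real clustered index is a B-tree, whereas here I simply keep a list of the first key on each page and binary-search it. The page size of four rows, the function names and all the sample names other than BISHOP, ROBERT are made up for the illustration.

```python
from bisect import bisect_left

ROWS_PER_PAGE = 4  # tiny pages so the example stays readable

# Hypothetical rows as (last_name, first_name), in the order they arrived.
ARRIVAL_ORDER = [
    ("SMITH", "ANNA"), ("BISHOP", "ROBERT"), ("ZHANG", "WEI"), ("ADAMS", "JOHN"),
    ("BISHOP", "CAROL"), ("MILLER", "TOM"), ("CLARK", "SUE"), ("YOUNG", "LEE"),
    ("BAKER", "JIM"), ("DAVIS", "KIM"), ("EVANS", "RAY"), ("NOLAN", "PAT"),
]


def build_pages(rows):
    """Split rows into fixed-size pages, keeping the order given."""
    return [rows[i:i + ROWS_PER_PAGE] for i in range(0, len(rows), ROWS_PER_PAGE)]


def heap_scan(pages, last_name):
    """No ordering to rely on, so every page is read. Returns (matches, pages_read)."""
    matches = []
    for page in pages:
        matches.extend(row for row in page if row[0] == last_name)
    return matches, len(pages)


def clustered_seek(pages, last_name):
    """Pages are sorted by last name, so use the first key on each page to jump in."""
    first_keys = [page[0][0] for page in pages]
    # Matching rows can begin on the page just before the first page
    # whose first key is greater than or equal to last_name.
    start = max(bisect_left(first_keys, last_name) - 1, 0)
    matches, pages_read = [], 0
    for page in pages[start:]:
        if page[0][0] > last_name:
            break  # every later page holds only larger names
        pages_read += 1
        matches.extend(row for row in page if row[0] == last_name)
    return matches, pages_read


heap_pages = build_pages(ARRIVAL_ORDER)               # "first come, first served"
clustered_pages = build_pages(sorted(ARRIVAL_ORDER))  # ordered by last name

print(heap_scan(heap_pages, "BISHOP"))
print(clustered_seek(clustered_pages, "BISHOP"))
```

With this sample data the heap scan has to read all 3 pages to find the two BISHOP rows, while the seek on the sorted pages reads only 1. On a table with thousands of pages the gap grows accordingly.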
So, the key to storing data in SQL Server is to decide in advance how you want the data rows ordered on the page. Ideally, this matches the most common way of finding a row, e.g. by “Last Name”.
Next week, I will start a discussion on the types of indexes in SQL Server. There are too many to cover all of them; however, I will introduce and discuss in detail some of the more commonly used ones.