I/O,I/O it’s off to … Bang my head against a wall!

I/O, Input/Output, Reads/Writes what ever you want to call it.  It is one of the most aggravating aspects of SQL Server. The physical spinning hard drives of your SAN can bring your entire database server to its knees, no matter how powerful a beast of a server you have.

When you as the DBA are at the mercy of the SAN Administrator (or better yet 3rd party SAN administrator), your hands are somewhat tied on configuration changes. So what do you do?  Bang your head against a wall and shout, “It’s not my fault, it’s not my fault!” Then the big burly guys in white shirts will come visit your cubicle, ask you some questions and then take you on a little “vacation”.

one-flew-over-the-cuckoos-nest

 

That’s not the path I want to go down.

Not having any type of 3rd party monitoring solution, I needed a way to prove which drive was having trouble and just how much work it had to do!

Now I have done my due diligence and have read about I/O stalls, I/O Latency, other I/O measurements and such; but most everything I found, because the way SQL Server works, were cumulative information. Glenn Berry’s DMV Diagnostic Information Queries is great place to learn where to get this information.  But again, it is based on SQL Server cumulative information. I hope my Read Latency is not 150+ ms!

I needed a way to say, “for this sample period” this was the read latency of a drive?

Then I found Jon Gurgul b|t, his post titled Delta and Cumulative IO Stats was a great read; however it wasn’t exactly what I needed.  But it did get me thinking about his method of capturing two result sets with a time delay and finding the difference.  So why couldn’t I do this for Glenn Berry’s Read/Write Latency script.

Below is what I came up with.  It probably is not perfect, but it works for me. I have setup a SQL Agent job to run this code every 15 minutes with a 5 min delay between result sets.  Then the results are stored in a Diagnostic database so I can chart and report on trends.

I am eager to see the trending over time and truly see what my Latency time for my SAN drives are!

   1:  

   2: IF OBJECT_ID(N'tempdb..#Sample1') IS NOT NULL BEGIN DROP TABLE #Sample1 END;

   3: IF OBJECT_ID(N'tempdb..#Sample2') IS NOT NULL BEGIN DROP TABLE #Sample2 END;

   4: SELECT 

   5:     LEFT(UPPER(mf.physical_name), 2) AS Drive

   6:     ,SUM(num_of_reads) AS num_of_reads

   7:     ,SUM(io_stall_read_ms) AS io_stall_read_ms

   8:     ,SUM(num_of_writes) AS num_of_writes

   9:     ,SUM(io_stall_write_ms) AS io_stall_write_ms

  10:     ,SUM(num_of_bytes_read) AS num_of_bytes_read

  11:     ,SUM(num_of_bytes_written) AS num_of_bytes_written

  12:     ,SUM(io_stall) AS io_stall

  13: INTO #Sample1

  14: FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs

  15: INNER JOIN sys.master_files AS mf WITH (NOLOCK)

  16:     ON vfs.database_id = mf.database_id 

  17:     AND vfs.file_id = mf.file_id

  18: GROUP BY LEFT(UPPER(mf.physical_name), 2);

  19:  

  20: WAITFOR DELAY '00:05:00'

  21:  

  22: SELECT 

  23:     LEFT(UPPER(mf.physical_name), 2) AS Drive

  24:     ,SUM(num_of_reads) AS num_of_reads

  25:     ,SUM(io_stall_read_ms) AS io_stall_read_ms

  26:     ,SUM(num_of_writes) AS num_of_writes

  27:     ,SUM(io_stall_write_ms) AS io_stall_write_ms

  28:     ,SUM(num_of_bytes_read) AS num_of_bytes_read

  29:     ,SUM(num_of_bytes_written) AS num_of_bytes_written

  30:     ,SUM(io_stall) AS io_stall

  31: INTO #Sample2

  32: FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs

  33: INNER JOIN sys.master_files AS mf WITH (NOLOCK)

  34:     ON vfs.database_id = mf.database_id 

  35:     AND vfs.file_id = mf.file_id

  36: GROUP BY LEFT(UPPER(mf.physical_name), 2);

  37:  

  38: INSERT INTO Diagnostic_WH.dbo.[IO-Stats]

  39:     (Drive

  40:     ,DriveType

  41:     ,num_of_reads

  42:     ,io_stall_reads_ms

  43:     ,num_bytes_read

  44:     ,read_latency

  45:     ,num_of_writes

  46:     ,io_stall_write_ms

  47:     ,num_bytes_written

  48:     ,write_latency

  49:     ,io_stall

  50:     ,RowDateTime)

  51: SELECT

  52:     t.Drive

  53:     ,CASE 

  54:         WHEN t.Drive='C:' THEN 'SSISDB/System'

  55:         WHEN t.Drive='H:' THEN 'mdf Data'

  56:         WHEN t.Drive='L:' THEN 'ldf LOG'

  57:         WHEN t.Drive='E:' THEN 'tempdb'

  58:         WHEN t.Drive='T:' THEN 'Indexes'

  59:     END AS [DriveType]

  60:     ,t.num_of_reads-s.num_of_reads [num_of_reads]

  61:     ,t.io_stall_read_ms-s.io_stall_read_ms [io_stall_read_ms]

  62:     ,(t.num_of_bytes_read-s.num_of_bytes_read) [num_bytes_read]

  63:     ,CASE 

  64:         WHEN (t.num_of_reads-s.num_of_reads)=0 THEN 0

  65:         ELSE (t.io_stall_read_ms-s.io_stall_read_ms)

  66:                 /(t.num_of_reads-s.num_of_reads)

  67:         END AS [read_latency]

  68:     ,t.num_of_writes-s.num_of_writes [num_of_writes]

  69:     ,t.io_stall_write_ms-s.io_stall_write_ms [io_stall_write_ms]

  70:     ,t.num_of_bytes_written-s.num_of_bytes_written [num_bytes_written]

  71:     ,CASE

  72:         WHEN (t.num_of_writes-s.num_of_writes)=0 THEN 0

  73:         ELSE (t.io_stall_write_ms-s.io_stall_write_ms)

  74:                 /(t.num_of_writes-s.num_of_writes)

  75:         END AS [write_latency]

  76:     ,t.io_stall-s.io_stall [io_stall]

  77:     ,GETDATE()

  78: FROM #Sample1 s

  79: INNER JOIN #Sample2 t

  80:     ON s.Drive=t.Drive

  81: ORDER BY t.Drive