Full-stack Philosophies

James Morle's Blog


Testing Transport Latency

Posted on 8:06 am February 24, 2012 by James Morle

(Updated 04/04/12 to fix bug with large raw disk partitions)

I need your help!

One of the things that I have struggled to get data for in my storage research is the latency of the interface between server and storage device. It's easy to get numbers for the actual storage device, but the latency numbers for the interface are typically not published. That's a shame, because the latency at this piece of the architecture is going to become increasingly important as we move wholesale to SSD. I don't mean flash specifically here; I mean semiconductor storage devices, of which flash is a component. But this article isn't about that, except by implication. It's just about the piece of wet string that connects storage devices to servers (the transport).

I decided that the best way to get the transport latency figures was to measure them through some kind of test, and that's the subject of this article. I have written a small piece of C code (Linux only, currently) that I believe produces a reasonable estimate of the transport latency, and I would like to start using it to gather data from anyone who would like to share it.

The test is very simple. It opens a file with the O_DIRECT flag and reads the same single 4KB block 10,000 times, timing each read. A 'file' can be a file that you happen to have in the filesystem, or a disk device of any kind (multipath, raw disk slice, etc). The important thing is that the file exists on a storage device (disk, SSD, DRAM, whatever) that is connected via an identifiable interface and topology. For example, if I run the test on my laptop I run it against a file in a filesystem which is stored on a SATA2 SSD. The SSD part of that sentence doesn't matter because I am just testing, in this case, the SATA2 interface. On a Fibre Channel system, my test file might be located on a LUN in a storage array, connected by some kind of Fibre Channel SAN topology. In this case, I would just be testing the SAN transport.

The test is not perfect: there is additional latency added by layers that must be present to complete the test, namely the entire Linux SCSI and device driver stack, and the very outer edges of the storage device's microcode. These parts all add latency, but not enough to materially change the results.

To build the test, simply copy and paste the following code into an editor window, and save it as 'latest.c'. Now compile the code as follows:

gcc -O -o latest latest.c

You should now have an executable binary in the current directory. Now just execute the test as follows:

./latest <testfile>
...where <testfile> is the name (and path) of the file that exists on the storage device you wish to test against.
This should take somewhere between half a second and ten seconds, depending upon the storage transport being tested, and produce a file named "latest.out".
I would then be extremely grateful if you could email me the output file (about 80-100KB) along with details of the storage transport that exists between the server and the test file. I really need you to be detailed here:
  • What is the physical interface type between server and storage? This should be one of SATA, SAS, Fibre Channel, Infiniband, Ethernet, or any other that you may have.
  • What generation of interface is it? This might be "16Gbps Fibre Channel", "1Gbps Ethernet", "SATA2", or some other combination.
  • What is the topology? This might be "Direct attached", "via single switch", "via two switches", "multipath across two switches", or any number of combinations. Please be really descriptive here!
  • What are the switches (if any)?
  • What is the storage device? This might be "direct attached HDD", "direct attached SSD", the name of a storage array, or some other value.
  • Are there any other pertinent points? For example, if you have 16Gbps Fibre Channel HBAs in the servers but only 4Gbps on the storage array, it would be nice to know that.
With that, I'll leave you with the code. In return for your help, I will do the following:
  • Review all the files that come in
  • Let you know if there's something weird happening on your system (I'll need to ask you some more questions, probably)
  • Collate the results
  • Publish a summary
Thanks!
Important Disclaimer: I've done everything I can to ensure that this code is entirely safe to run, and that it does not damage anything on your system. You should also read the code to satisfy yourself that this is the case before running it, as neither I nor Scale Abilities Ltd will accept any liability for damages that may arise. Use this code at your own risk.
/* latest.c
 *
 * Simplistic test for measuring approximate latency of storage transport
 *
 * Copyright 2012 Scale Abilities Ltd
 *
 */

#define _GNU_SOURCE

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/time.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[]) {

#define BLOCKSIZE 4096
#define SAMPLES 10000

    char *mem, *buf;
    struct timeval tv;
    int64_t start;          /* microseconds; 64-bit so tv_sec*1000000 cannot overflow */
    FILE *logfd;
    int infd, i;
    int res[SAMPLES];       /* per-read latency in microseconds */

    if (argc != 2) {
        fprintf(stderr, "Must supply filename\n");
        exit(-1);
    }

    /* O_DIRECT bypasses the page cache, so every read goes out to the device */
    if ((infd = open(argv[1], O_RDONLY|O_DIRECT)) < 0) {
        perror("Cannot open test file");
        exit(-1);
    }

    if ((size_t)lseek(infd, 0, SEEK_END) < (size_t)BLOCKSIZE) {
        fprintf(stderr, "Test file must be %d bytes or larger\n", BLOCKSIZE);
        exit(-1);
    }

    if ((logfd = fopen("latest.out", "w")) == NULL) {
        perror("Cannot create output log file");
        exit(-1);
    }

    /* Align buffer for O_DIRECT (512-byte boundary inside a double-sized allocation) */
    mem = malloc(2*BLOCKSIZE);
    if (mem == NULL) {
        perror("Cannot allocate read buffer");
        exit(-1);
    }
    buf = (char *)(((uintptr_t)mem + BLOCKSIZE) & ~(uintptr_t)0x1FF);
    memset(buf, 0, BLOCKSIZE);

    for (i = 0; i < SAMPLES; i++) {

        gettimeofday(&tv, 0);
        start = (int64_t)tv.tv_sec*1000000 + tv.tv_usec;

        /* Always read the same block at offset 0 */
        if (pread(infd, buf, BLOCKSIZE, 0) != BLOCKSIZE) {
            perror("Read error");
            exit(-1);
        }

        gettimeofday(&tv, 0);
        res[i] = (int)(((int64_t)tv.tv_sec*1000000 + tv.tv_usec) - start);
    }
    close(infd);
    free(mem);

    /* One "sample, microseconds" pair per line */
    for (i = 0; i < SAMPLES; i++)
        fprintf(logfd, "%d, %d\n", i, res[i]);

    fclose(logfd);
    exit(0);
}

4 comments on “Testing Transport Latency”

  1. >What is the physical interface type between server and storage? This should be one of SATA, SAS, Fibre Channel, Infiniband, Ethernet, or any other that you may have.

    Hi James,

    Unless you stipulate external, attached storage you are going to get a latency reading of the controller cache on the PCI controller card. I've quoted the above to draw attention to the fact that I know of no way to attach SATA or SAS as DAS without a cache-enabled controller. Can one even buy a controller that has no cache?

    Also, I'd stipulate that this test be run on an otherwise *completely* idle system as there is plenty of opportunity for a time slice between the two gettimeofday() calls.

    • Kevin,

      Both are good points.
      I might be interested in even getting some figures for controller cache hits, though. This is probably a best case scenario for hopping out onto the PCI bus to an external memory device, so it would be good to have as a baseline. Desktop and laptop machines almost certainly don't have controller cache (unless I'm mistaken, I guess they could have a touch of write-through cache), and it would be good to capture the interface latency to SAS and SATA devices from those platforms.
      Excellent point on the idle system, though the 10,000 samples should let through some samples that don't exhibit time smear, even on a system with some kind of load.
      Cheers!

      James

  2. I'd be inclined to recommend not hammering the I/O in such a tight loop... perhaps a poll(,,N) where N is random between 10 and 100 inserted into the loop. I think there would be more opportunity to get some samples of the routine stalls some controllers suffer.

    I'd also initialize the array elems so there are no ZFOD hits in the loop....or...keep it simple :-)

    • Kevin,

      Thanks for the suggestions. However, I think the current simple approach is working well. There are indeed outliers, but these can be statistically discarded - I'm looking for the base latency overhead, without all the exceptions, at this stage.
      Cheers!

      James
