Saturday, January 30, 2010

Buffered IO and minor page faults

My application was generating a large number of minor page faults, in the tens of thousands. I initially suspected malloc() calls, as described in ezolt's paper, but further analysis showed that this problem has already been fixed in glibc. I was sure that mmap() would be a source of minor page faults, since each newly mapped page has to be added to the process page table when it is first touched. So I ran strace on the application, which showed one mmap() call after every open() call. Looking at the source code, I figured out that buffered IO calls like fopen() and fread() were causing this.

open("/root/dmesg.txt", O_RDONLY)       = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=23818, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2afbfba25000
read(3, "Linux version 2.6.18-120.el5 (bre"..., 4096) = 4096
read(3, "dle threads.\nCPU: Physical Proces"..., 4096) = 4096
read(3, "device 0000:00:1c.0 to 64\nPCI: Se"..., 4096) = 4096
read(3, "SB hub found\nhub 1-0:1.0: 8 ports"..., 4096) = 4096
read(3, "t 4\nusb-storage: waiting for devi"..., 4096) = 4096
read(3, "20\nACPI: PCI Interrupt 0000:01:04"..., 4096) = 3338
read(3, ""..., 4096)                    = 0
close(3)         

Now, what is buffered IO? It is a buffering layer implemented on top of the unbuffered system calls like open() and read() (these are what I call "direct IO" in the measurements below, not to be confused with O_DIRECT). glibc keeps a buffer for each stream, typically 4096 bytes (i.e. the size of one 4K page), and every read and write by the application is served from this buffer. If you want to know more about buffered IO, read the books The C Standard Library and Unix File Systems.

The mmap() call is used by glibc to create this buffer; the strace output clearly shows it mapping a 4K page. I really don't know why it is done this way. I asked on the glibc mailing list, but nobody responded (see the post).
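
To see why such a mapping costs a page fault, here is a minimal sketch, separate from the measurements in this post. It maps anonymous pages with the same flags as in the strace output, writes one byte to each, and reads the process's minor fault counter with getrusage() before and after. The helper names minor_faults() and faults_from_touching() are made up for illustration, and the exact count will depend on the kernel.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no disk IO) page faults charged to this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map `pages` anonymous pages with the flags seen
 * in the strace output, write one byte to each page, and return how many
 * minor faults the writes caused. Returns -1 on error.
 */
static long faults_from_touching(size_t pages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0 || pages == 0)
        return -1;
    size_t len = pages * (size_t)page_size;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    long before = minor_faults();       /* taken after mmap() has returned */
    for (size_t i = 0; i < pages; i++)
        p[i * (size_t)page_size] = 1;   /* first write makes the kernel supply a page */
    long after = minor_faults();

    munmap(p, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_from_touching(1) from a small main() mimics one stream buffer being allocated and then filled for the first time. The faults are counted around the writes, not around the mmap() call itself.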

So how do we avoid the minor page faults caused by buffered IO calls? Luckily, glibc provides another function called setvbuf(), through which the application can supply its own buffer. It must be called after the stream is opened and before any other operation on it. If we provide our own buffer, glibc will not allocate one using mmap(). So using setvbuf() avoids the minor page faults and also improves the program's performance.

Timings for 10000 loops:

[root@mysys testnew]# time ./read_nobuffer       --> direct IO
real    0m0.922s
user    0m0.210s
sys     0m0.711s

[root@mysys testnew]# time ./read_bufmmap     --> buffered IO with library mmap
real    0m0.321s
user    0m0.106s
sys     0m0.215s

[root@mysys testnew]# time ./read_bufsetvbuf    --> buffered IO with user provided buffer, setvbuf()
real    0m0.178s
user    0m0.071s
sys     0m0.106s

[root@mysys testnew]#

Minor page faults (see the fault/s column; majflt/s stays at 0, so all of these faults are minor):

[root@mysys ~]# sar -B 1 10000
Linux 2.6.18-120.el5 (mysys)     12/16/2009

06:17:43 PM  pgpgin/s pgpgout/s   fault/s  majflt/s
06:17:44 PM      0.00      0.00     51.00      0.00
06:17:45 PM      0.00      0.00     24.00      0.00
06:17:46 PM      0.00      0.00     12.00      0.00
06:17:47 PM      0.00      0.00     12.00      0.00
06:17:48 PM      0.00     31.68     11.88      0.00
06:17:49 PM      0.00      4.04     12.12      0.00
06:17:50 PM      0.00      0.00     12.00      0.00
06:17:51 PM      0.00      0.00     14.00      0.00
06:17:52 PM      0.00      0.00     12.00      0.00
06:17:53 PM      0.00      0.00    163.00      0.00   ---> direct IO
06:17:54 PM      0.00      0.00     36.00      0.00
06:17:55 PM      0.00      0.00     14.00      0.00
06:17:56 PM      0.00      0.00  10213.00      0.00  ---> buffered IO with library mmap
06:17:57 PM      0.00      0.00     13.27      0.00
06:17:58 PM      0.00     28.57    217.35      0.00  ---> buffered IO with user provided buffer,setvbuf()
06:17:59 PM      0.00      0.00     12.24      0.00
06:18:00 PM      0.00      0.00     12.12      0.00
06:18:01 PM      0.00      0.00     12.24      0.00
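
Since sar reports system-wide numbers, a per-process count can be easier to attribute. The following is a rough sketch, not one of the programs measured above: it wraps the same open/read/close loop in two getrusage() calls and returns the difference in minor faults. The helper name faults_for_read_loop() and the use_setvbuf switch are hypothetical, and the numbers you get will depend on the glibc version and on how it allocates stream buffers.

```c
#include <stdio.h>
#include <sys/resource.h>

static char user_buf[4096];   /* must stay valid until fclose() */

/*
 * Hypothetical helper: open, read, and close `path` `loops` times and
 * return the minor faults this process took while doing so (-1 on error).
 * If use_setvbuf is nonzero, each stream is given user_buf as its buffer.
 */
static long faults_for_read_loop(const char *path, int loops, int use_setvbuf)
{
    char chunk[256];
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            return -1;
        /* setvbuf() has to come before any other operation on the stream. */
        if (use_setvbuf &&
            setvbuf(fp, user_buf, _IOFBF, sizeof user_buf) != 0) {
            fclose(fp);
            return -1;
        }
        while (fread(chunk, 1, sizeof chunk, fp) > 0)
            ;   /* discard the data; only the fault count matters here */
        fclose(fp);
    }

    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

Comparing faults_for_read_loop("/root/dmesg.txt", 10000, 0) with the same call using 1 as the last argument should show whether the library-allocated buffer is what produces the faults on a given system.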
 
The source code for the programs I used is pasted below.

[root@mysys testnew]# cat read_nobuffer.c
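#include <fcntl.h>
#include <unistd.h>

int main(void)
{
       char buffer[256];
       int i = 0;

       int fd;
       while(i++ < 10000)
       {
       /* open() returns -1 on failure; 0 is a valid descriptor */
       if((fd = open("/root/dmesg.txt", O_RDONLY)) == -1)
               return 1;
       /* stop at end of file (0) and on error (-1) */
       while(read(fd, buffer, sizeof(buffer)) > 0);
       close(fd);
       }
       return 0;
}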

[root@mysys testnew]# cat read_bufmmap.c
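#include <stdio.h>

int main(void)
{
       char buffer[256];
       int i = 0;

       FILE *fp;
       while(i++ < 10000)
       {
       if((fp = fopen("/root/dmesg.txt", "r")) == NULL)
               return 1;
       /* loop on fread()'s return value; a feof() test alone never ends on a read error */
       while(fread(buffer, 1, sizeof(buffer), fp) > 0);
       fclose(fp);
       }
       return 0;
}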

[root@mysys testnew]# cat read_bufsetvbuf.c
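#include <stdio.h>

/* user-provided stream buffer; it must outlive the stream */
char buffer123[8192];
int main(void)
{
       char buffer[256];
       int i = 0;

       FILE *fp;
       while(i++ < 10000)
       {
       if((fp = fopen("/root/dmesg.txt", "r")) == NULL)
               return 1;
       /* must come before any other operation on the stream;
          only the first 4096 bytes of buffer123 are used */
       setvbuf(fp, buffer123, _IOFBF,  4096);
       /* loop on fread()'s return value; a feof() test alone never ends on a read error */
       while(fread(buffer, 1, sizeof(buffer), fp) > 0);
       fclose(fp);
       }
       return 0;
}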

2 comments:

  1. The message you linked to had an interesting suggestion: to teach glibc to maintain

    - a (small) cache of buffers (to avoid munmap() and then a further mmap())

    I haven’t investigated or thought this through much, but it might be worth trying that.

    I suspect you’d get more response from the glibc maintainers if you sent a patch along with your suggestion. Then people can try it out and, if it’s worth the trouble, point you towards getting copyright assignment in order and so on.

  2. I feel there must be a reason behind using mmap to allocate the buffer for buffered IO; that is probably why they provide the setvbuf call, for people who want to override it.
