Linux libaio Asynchronous I/O

Posted by gaopengtttt on 2017-11-23

My knowledge is limited; please point out any mistakes.


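I have recently been preparing to take a close look at how InnoDB implements asynchronous I/O. On Linux, InnoDB normally uses libaio, known as Linux native AIO, which is distinct from the POSIX AIO implementation. I was not familiar with asynchronous I/O before, because many Linux systems programming books do not cover it and there is not much material online either. Its benefit is clear, though: when files are opened with O_DIRECT, it preserves performance instead of burning CPU time waiting for I/O to reach the disk. In MySQL, asynchronous I/O comes into play when the data files are opened with O_DIRECT. MySQL also has its own simulated asynchronous I/O, but on current Linux systems that support libaio, native AIO is generally what is used. So I took a careful look at Linux native AIO.
My main reference is the following article:
http://www.360doc.com/content/14/0109/14/9008018_343854180.shtml
It appears to have been written by a Google engineer. I studied the example at the end of it in detail, then modified it, compiled it, and added my own comments.
A small gripe: after a long time away from Linux systems programming, much of it has gone fuzzy. I really must review it properly next year.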

Jianshu link
http://www.jianshu.com/p/f6371735e520


I. Basic data structures

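  • io_context_t: the structure that serves as the context. Internally it holds a completion queue, and it can be shared between threads.
  • iocb: a single read or write request. Its main fields are:
void* data;
short aio_lio_opcode;
int aio_fildes;
union
{
 struct
 {
   void* buf;
   unsigned long nbytes;
   long long offset;
 } c;
} u;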

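data: user-defined data passed in by the caller
aio_lio_opcode: the operation code, which can be

IO_CMD_PREAD: read
IO_CMD_PWRITE: write
or any other supported opcode

aio_fildes: the file descriptor (fd) that the iocb reads from or writes to
u.c.buf: pointer to the memory buffer to read into or write from
u.c.nbytes: length of that buffer in bytes
u.c.offset: offset within the file at which to read or write

The union u actually covers other I/O types as well, listed below; take a look if you are interested.

union {
        struct io_iocb_common   c;
        struct io_iocb_vector   v;
        struct io_iocb_poll     poll;
        struct io_iocb_sockaddr saddr;
} u;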

An iocb should be initialized with io_prep_pread or io_prep_pwrite:

void io_prep_pwrite(struct iocb *iocb, int fd, void *buf, size_t count, long long offset);
void io_prep_pread(struct iocb *iocb, int fd, void *buf, size_t count, long long offset); 

Notice that int fd, void *buf, size_t count and long long offset correspond exactly to

aio_fildes:
u.c.buf:
u.c.nbytes:
u.c.offset:

while aio_lio_opcode is implied by which function is called (io_prep_pread or io_prep_pwrite).
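To make that mapping concrete, here is a small sketch of what such a prep call does. It is not libaio's code: sketch_prep_pwrite is a hypothetical name, and it fills the kernel's struct iocb from <linux/aio_abi.h> (where the same fields are called aio_buf, aio_nbytes and aio_offset, and the opcode is IOCB_CMD_PWRITE) so that it builds without the libaio headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <linux/aio_abi.h>   // kernel ABI: struct iocb, IOCB_CMD_PWRITE

// Hypothetical stand-in for a prep call, written against the kernel's
// struct iocb. Each parameter lands in exactly one field.
void sketch_prep_pwrite(struct iocb* cb, int fd, void* buf,
                        std::size_t count, long long offset) {
    std::memset(cb, 0, sizeof(*cb));                          // unused fields stay zero
    cb->aio_lio_opcode = IOCB_CMD_PWRITE;                     // implied by the function name
    cb->aio_fildes     = static_cast<__u32>(fd);              // int fd
    cb->aio_buf        = reinterpret_cast<std::uintptr_t>(buf); // void *buf   (u.c.buf)
    cb->aio_nbytes     = count;                               // size_t count (u.c.nbytes)
    cb->aio_offset     = offset;                              // offset       (u.c.offset)
}
```

A read version would differ only in the opcode (IOCB_CMD_PREAD), which is why the opcode never needs to be passed as an argument.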

  • io_event

void* data: the same pointer as the user data in the iocb
struct iocb *obj: the iocb that produced this event
long long res: the number of bytes read or written
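The sketch below shows the data pointer making the round trip from the iocb to the io_event. It is only an illustration under some assumptions: it calls the kernel's AIO system calls directly through syscall(2) and uses the struct definitions from <linux/aio_abi.h> (so it builds without linking libaio), it writes to a temporary file under /tmp without O_DIRECT, and Request, Completion and write_once_and_inspect are names I made up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical per-request record; its address is the "user data".
struct Request {
    int  tag;
    char payload[512];
};

struct Completion {
    bool same_pointer;  // event.data is the address stored in the iocb
    bool same_iocb;     // event.obj is the address of the submitted iocb
    long long res;      // bytes transferred, or a negative error number
    int  tag;           // read back through event.data
};

// Writes one payload to a temp file and reports what the io_event contained.
Completion write_once_and_inspect() {
    Completion out = {false, false, -1, -1};
    char path[] = "/tmp/aio_event_sketch_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return out;

    aio_context_t ctx = 0;                       // must be 0 before io_setup
    if (syscall(SYS_io_setup, 4u, &ctx) == 0) {
        Request req;
        req.tag = 42;
        std::memset(req.payload, 'x', sizeof(req.payload));

        struct iocb cb;
        std::memset(&cb, 0, sizeof(cb));
        cb.aio_data       = reinterpret_cast<std::uintptr_t>(&req);  // user data
        cb.aio_lio_opcode = IOCB_CMD_PWRITE;
        cb.aio_fildes     = static_cast<__u32>(fd);
        cb.aio_buf        = reinterpret_cast<std::uintptr_t>(req.payload);
        cb.aio_nbytes     = sizeof(req.payload);
        cb.aio_offset     = 0;

        struct iocb* list[1] = {&cb};
        struct io_event ev;
        std::memset(&ev, 0, sizeof(ev));
        if (syscall(SYS_io_submit, ctx, 1L, list) == 1 &&
            syscall(SYS_io_getevents, ctx, 1L, 1L, &ev,
                    static_cast<timespec*>(nullptr)) == 1) {
            Request* back = reinterpret_cast<Request*>(
                static_cast<std::uintptr_t>(ev.data));
            out.same_pointer = (back == &req);
            out.same_iocb    = (ev.obj == reinterpret_cast<std::uintptr_t>(&cb));
            out.res          = ev.res;
            out.tag          = back->tag;
        }
        syscall(SYS_io_destroy, ctx);
    }
    close(fd);
    unlink(path);
    return out;
}
```

Calling write_once_and_inspect() and checking that same_pointer is true and res equals the 512 bytes requested is the whole point: whatever pointer you park in the iocb comes back untouched, so it can carry a per-request object.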

II. Related functions

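  • 1. Create an io_context_t context
    int io_setup(unsigned nr_events, io_context_t *ctxp);

  • nr_events: the maximum number of events this io_context_t can queue. Keep it compatible with the nr passed to io_getevents later.

  • ctxp: a pointer to the io_context_t to initialize. The io_context_t itself must already exist and must be set to 0 before the call.
    Linux man page: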

DESCRIPTION
      io_setup() creates an asynchronous I/O context capable of receiving at least nr_events.  ctxp must not point to an AIO context that already
      exists, and must be initialized to 0 prior to the call.  On successful creation of the AIO context, *ctxp is filled in with the resulting handle.

RETURN VALUE
      On success, io_setup() returns 0.  For the failure return, see NOTES.
ERRORS
      EAGAIN The specified nr_events exceeds the user's limit of available events.
      EFAULT An invalid pointer is passed for ctxp.
      EINVAL ctxp is not initialized, or the specified nr_events exceeds internal limits.  nr_events should be greater than 0.
      ENOMEM Insufficient kernel resources are available.
      ENOSYS io_setup() is not implemented on this architecture. 
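As a rough sketch of the create and destroy life cycle, the function below zeroes the context, creates it and tears it down again. Assumptions to keep in mind: it uses the raw system calls via syscall(2) with aio_context_t from <linux/aio_abi.h> instead of linking libaio, and setup_then_destroy is a name I invented. With raw system calls a failure shows up as -1 plus errno, whereas the libaio wrappers return the negated error number directly (this is what the "see NOTES" in the man page refers to).

```cpp
#include <cassert>
#include <cerrno>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

// Creates a context able to hold nr_events completions, then destroys it.
// Returns 0 on success, or the errno value of whichever call failed.
int setup_then_destroy(unsigned nr_events) {
    aio_context_t ctx = 0;                    // must be 0 before io_setup
    if (syscall(SYS_io_setup, nr_events, &ctx) != 0)
        return errno;                         // raw syscall convention: -1 and errno
    // ctx now holds an opaque handle filled in by the kernel.
    if (syscall(SYS_io_destroy, ctx) != 0)
        return errno;
    return 0;
}
```

For example, setup_then_destroy(16) should return 0 on a kernel with AIO support. A context that was never destroyed keeps counting against the per-user event limit mentioned under EAGAIN above.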
  • 2. Destroy an io_context_t context
    Not much to say here; the prototype and the Linux man page are enough.
    int io_destroy(io_context_t ctx);
    Linux man page:
DESCRIPTION
       io_destroy() removes the asynchronous I/O context from the list of I/O contexts and then destroys it.  io_destroy() can also cancel any
       outstanding asynchronous I/O actions on ctx and block on completion.

RETURN VALUE
       On success, io_destroy() returns 0.  For the failure return, see NOTES.

ERRORS
       EFAULT The context pointed to is invalid.
       EINVAL The AIO context specified by ctx is invalid.
       ENOSYS io_destroy() is not implemented on this architecture. 
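  • 3. Submit asynchronous I/O operations
    int io_submit(io_context_t ctx_id, long nr, struct iocb **iocbpp);

  • ctx_id: the asynchronous I/O context

  • nr: the length of the iocb array that follows

  • iocbpp: the array of iocb pointers
    As you can see, a single io_submit call can submit several iocb requests, with no ordering among them. Submitting several iocbs at once can improve performance significantly. Normally the call does not block; if it does, a likely cause is that the file was not opened with O_DIRECT.

Linux man page: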
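Since O_DIRECT comes up here: direct I/O generally expects the buffer address, the length and the file offset to be aligned, typically to the device's logical block size. The sketch below assumes 4096 bytes, matching the page-sized buffers used in the example later in this article; kAlign, alloc_direct_io_buffer and is_direct_io_aligned are hypothetical helper names, not part of libaio.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed alignment; the real requirement depends on the device and file system.
const std::size_t kAlign = 4096;

// Returns a zero-filled buffer of pages * kAlign bytes whose address is a
// multiple of kAlign, or nullptr on failure. Release it with free().
void* alloc_direct_io_buffer(std::size_t pages) {
    void* p = nullptr;
    // posix_memalign returns 0 on success and an error number otherwise;
    // it does not set errno.
    if (posix_memalign(&p, kAlign, pages * kAlign) != 0) return nullptr;
    std::memset(p, 0, pages * kAlign);
    return p;
}

// True when buffer address, length and file offset are all kAlign-aligned.
bool is_direct_io_aligned(const void* buf, std::size_t len, long long off) {
    return reinterpret_cast<std::uintptr_t>(buf) % kAlign == 0 &&
           len % kAlign == 0 &&
           off % static_cast<long long>(kAlign) == 0;
}
```

A request built from alloc_direct_io_buffer(1), a length of 4096 and an offset of n * 4096 passes the check; a buffer from plain malloc or an odd length usually does not, and with O_DIRECT such a request is typically rejected with EINVAL.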

DESCRIPTION
       io_submit()  queues  nr  I/O request blocks for processing in the AIO context ctx_id.  iocbpp should be an array of nr AIO control blocks, which
       will be submitted to context ctx_id.
RETURN VALUE
       On success, io_submit() returns the number of iocbs submitted (which may be 0 if nr is zero).  For the failure return, see NOTES.
ERRORS
       EAGAIN Insufficient resources are available to queue any iocbs.
       EBADF  The file descriptor specified in the first iocb is invalid.
       EFAULT One of the data structures points to invalid data.
       EINVAL The aio_context specified by ctx_id is invalid.  nr is less than 0.  The iocb at *iocbpp[0] is not properly initialized, or the operation
              specified is invalid for the file descriptor in the iocb.
       ENOSYS io_submit() is not implemented on this architecture. 
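Here is a sketch of submitting several requests with one call and then collecting all of them. As before it is an illustration under assumptions: raw system calls via syscall(2) with the <linux/aio_abi.h> structures rather than libaio, a temporary file under /tmp opened without O_DIRECT, and batch_write and kBlock as made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <vector>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

const std::size_t kBlock = 4096;

// Queues n block-sized writes with a single io_submit call, waits for every
// accepted request, and returns how many completed with a full-length result.
// Returns -1 if nothing could be set up or submitted.
long batch_write(int n) {
    char path[] = "/tmp/aio_batch_sketch_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;

    long ok = -1;
    aio_context_t ctx = 0;
    if (syscall(SYS_io_setup, static_cast<unsigned>(n), &ctx) == 0) {
        std::vector<char>  data(kBlock * n, 'a');   // must outlive the I/O
        std::vector<iocb>  cbs(n);                  // value-initialized, so zeroed
        std::vector<iocb*> ptrs(n);
        for (int i = 0; i < n; ++i) {
            cbs[i].aio_lio_opcode = IOCB_CMD_PWRITE;
            cbs[i].aio_fildes     = static_cast<__u32>(fd);
            cbs[i].aio_buf        = reinterpret_cast<std::uintptr_t>(&data[kBlock * i]);
            cbs[i].aio_nbytes     = kBlock;
            cbs[i].aio_offset     = static_cast<long long>(kBlock) * i;
            cbs[i].aio_data       = static_cast<__u64>(i);  // request number as user data
            ptrs[i] = &cbs[i];
        }
        // One call, n requests. The return value is how many were accepted.
        long submitted = syscall(SYS_io_submit, ctx, static_cast<long>(n), ptrs.data());
        if (submitted > 0) {
            std::vector<io_event> evs(submitted);
            long got = syscall(SYS_io_getevents, ctx, submitted, submitted,
                               evs.data(), static_cast<timespec*>(nullptr));
            ok = 0;
            for (long i = 0; i < got; ++i)
                if (evs[i].res == static_cast<long long>(kBlock)) ++ok;
        }
        syscall(SYS_io_destroy, ctx);
    }
    close(fd);
    unlink(path);
    return ok;
}
```

The events may come back in any order, which is why each request carries its own number in aio_data. batch_write(8) should report 8 full-length completions.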
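  • 4. Get I/O completion status
    int io_getevents(io_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);

  • ctx_id: the context

  • min_nr: io_getevents returns its io_events once at least min_nr of them are available

  • nr: the maximum number of io_events to return

  • events: the array in which the io_events are returned

  • timeout: the maximum time io_getevents may block. When it expires, io_getevents returns early with however many events it has collected, which may be fewer than min_nr.
    Linux man page: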

DESCRIPTION
       io_getevents()  attempts  to  read  at least min_nr events and up to nr events from the completion queue of the AIO context specified by ctx_id.
       timeout specifies the amount of time to wait for events, where a NULL timeout waits until at least min_nr events  have  been  seen.   Note  that
       timeout is relative and will be updated if not NULL and the operation blocks.

RETURN VALUE
       On success, io_getevents() returns the number of events read: 0 if no events are available, or less than min_nr if the timeout has elapsed.  For
       the failure return, see NOTES.

ERRORS
       EFAULT Either events or timeout is an invalid pointer.
       EINVAL ctx_id is invalid.  min_nr is out of range or nr is out of range.
       EINTR  Interrupted by a signal handler; see signal(7).
       ENOSYS io_getevents() is not implemented on this architecture. 
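The interplay of min_nr and timeout is easiest to see on a context with nothing in flight. In the sketch below (raw system calls via syscall(2) again, so no libaio is needed; reap_from_idle_context is a hypothetical name) the call asks for at least one event, none can ever arrive, and the timeout is what ends the wait.

```cpp
#include <cassert>
#include <ctime>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

// Asks for at least one event from a context with no requests in flight,
// waiting at most timeout_ms milliseconds. Returns the io_getevents result,
// or -1 if the context could not be created.
long reap_from_idle_context(long timeout_ms) {
    aio_context_t ctx = 0;
    if (syscall(SYS_io_setup, 4u, &ctx) != 0) return -1;

    struct io_event events[4];
    struct timespec timeout;
    timeout.tv_sec  = timeout_ms / 1000;
    timeout.tv_nsec = (timeout_ms % 1000) * 1000000L;

    // min_nr = 1, nr = 4: block until one event arrives or the timeout expires.
    long n = syscall(SYS_io_getevents, ctx, 1L, 4L, events, &timeout);

    syscall(SYS_io_destroy, ctx);
    return n;
}
```

With a 50 ms timeout the call should come back with 0 events, which is the "less than min_nr if the timeout has elapsed" case from the man page. Passing a null timeout instead would block until an event arrives, so the caller must be sure something was actually submitted.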

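III. Example

To understand how to use a system function, the most important thing is to see its usage pattern, so the example below is worth studying closely.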

/*************************************************************************
  > File Name: test.cpp
  > Author: gaopeng QQ:22389860 all rights reserved
  > Mail: gaopp_200217@163.com
  > Created Time: Fri 17 Nov 2017 12:52:56 AM CST
  > Build: g++ test.cpp -o test -laio
 ************************************************************************/

#include <iostream>
#include <libaio.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* close() */

#define PAGE_SIZE (1 << 12)   /* parenthesized so that PAGE_SIZE*n expands correctly */
#define CHECK_ERROR(r,w,fs,f) (check_error(r,w,fs,f,__FILE__, __LINE__))
#define MAX_NR 10000
#define MIN_NR 1



/*
 * ret:     return value of the function being checked
 * what:    the value ret is compared against
 * sf_flag: whether `what` denotes success or failure
 *          1  `what` is the success value (error if ret != what)
 *          0  `what` is the failure value (error if ret == what)
 */
int check_error(int ret,int what,int sf_flag,const char* fun, const char* szFile, const int iLine)
{
    if(sf_flag == 1)
    {
        if(ret != what)
        {
            printf("error file:%s line:%d ",szFile,iLine);
            perror(fun);
            exit(-1);
        }
    }
    else if(sf_flag == 0)
    {
        if(ret == what)
        {
            printf("error file:%s line:%d ",szFile,iLine);
            perror(fun);
            exit(-1);
        }
    }
    return 0;
}

class AIORe
{
    public:
        int* buffer;
        virtual void Complete(int ret) = 0;
        AIORe()
        {
            /* allocate a 4096-byte buffer aligned to 4096 bytes */
            int ret = posix_memalign((void**)(&buffer),PAGE_SIZE,PAGE_SIZE);
            CHECK_ERROR(ret,0,1,"posix_memalign"); /* posix_memalign returns 0 on success */
        }
        virtual ~AIORe()
        {
            printf("Virtual AIORe destroy function to free this buffer!\n");
            free(buffer);
        }
};

class Adder
{
    public:
        virtual void Add(int amount) = 0;
        virtual ~Adder(){};
};

class AIORead:public AIORe
{
    private:
        Adder* adder; // pointer to the base class
    public:
        AIORead(Adder* adder):AIORe() // the base-class pointer refers to a derived object
        {
            this->adder = adder;
        }
        virtual void Complete(int res)
        {
            //return check
            int value = buffer[0];
            printf("Read of %d Completed %d res \n",value,res);
            // polymorphic call
            adder->Add(value);
        }
};

class AIOWrite:public AIORe
{
    private:
        int value;
    public:
        AIOWrite(int value):AIORe()
        {
            buffer[0] = value;
            this->value = value;
        }
        virtual void Complete(int res)
        {
            //error check
            printf("Write of %d Completed %d \n",value,res);
        }
};

class AIOAdder:public Adder
{
    public:
        int fd;
        io_context_t ioctx;
        int counter;      /* offset, counted in pages */
        int reap_counter; /* number of events reaped so far */
        int sum;          /* running total of the values read back */
        int length;       /* file size / PAGE_SIZE */

        AIOAdder(int length)
        {
            ioctx = 0; // must be initialized to 0
            counter = 0;
            reap_counter = 0;
            sum = 0;
            this->length = length;
        }

        void init() /* open the file and preallocate its size */
        {
            printf("Open file\n");
            fd = open("test",O_RDWR|O_DIRECT|O_CREAT,0644); // O_DIRECT is required
            CHECK_ERROR(fd,-1,0,"open"); /* open returns -1 on failure */
            printf("Allocating enough space for the sum\n");
            {
                int ret = fallocate(fd,0,0,(off_t)length*PAGE_SIZE); /* preallocate a file of length*4096 bytes */
                CHECK_ERROR(ret,-1,0,"fallocate"); /* fallocate returns -1 on failure */
            }
            printf("Setting the io Context\n");
            {
                int ret = io_setup(100,&ioctx); /* initialize ioctx */
                CHECK_ERROR(ret,0,1,"io_setup"); /* io_setup returns 0 on success */
            }
        }

        virtual void Add(int amount)
        {
            sum += amount;
            printf("Adding %d for total of %d \n",amount,sum);
        }

        void submitwr()
        {
            printf("submitting a write to %d \n",counter);
            struct iocb iocb; // one asynchronous I/O request
            struct iocb* iocbs = &iocb;
            AIORe *req = new AIOWrite(counter); /* counter initializes the buffer: the buffer is 4 KB, but counter fills only its first 4 bytes */
            /* void io_prep_pwrite(struct iocb *iocb, int fd, void *buf, size_t count, long long offset); */
            /* initialize the request; counter selects the file offset */
            io_prep_pwrite(&iocb,fd,req->buffer,PAGE_SIZE,(long long)counter*PAGE_SIZE);
            iocb.data=req; /* the user pointer is this write's request object, used later to free the buffer */
            int res = io_submit(ioctx,1,&iocbs); /* submit the I/O; this does not block */
            CHECK_ERROR(res,1,1,"io_submit"); /* io_submit returns the number of iocbs submitted */
        }

        void writefile()
        {
            reap_counter = 0;
            for(counter = 0;counter < length;counter++) /* keep advancing the offset and writing */
            {
                submitwr(); /* asynchronous submit: in a multithreaded program this thread could do other work now instead of blocking and wasting CPU */
                reap(); /* collect I/O completion status */
            }
            reapremain();
        }

        void submitrd()
        {
            printf("submitting a read from %d \n",counter);
            struct iocb iocb; // one asynchronous I/O request
            struct iocb* iocbs = &iocb;
            AIORe *req = new AIORead(this);
            /* void io_prep_pread(struct iocb *iocb, int fd, void *buf, size_t count, long long offset); */
            io_prep_pread(&iocb,fd,req->buffer,PAGE_SIZE,(long long)counter*PAGE_SIZE);
            iocb.data = req;
            int res = io_submit(ioctx,1,&iocbs);
            CHECK_ERROR(res,1,1,"io_submit"); /* io_submit returns the number of iocbs submitted */
            printf("test:%p %p\n",(void*)&iocb,(void*)iocbs);
        }

        void readfile()
        {
            reap_counter = 0;
            for(counter=0;counter<length;counter++)
            {
                submitrd();
                reap(); /* collect I/O completion status */
            }
            reapremain();
        }

        int doreap(int min_nr)
        {
            printf("Reap between %d and %d io_events\n",min_nr,MAX_NR);
            struct io_event* events = new io_event[MAX_NR]; /* maximum number of pending I/Os supported */
            struct timespec timeout;
            timeout.tv_sec = 0;
            timeout.tv_nsec = 100000000; /* 100 ms */
            int num_events;
            printf("Calling io_getevents\n");
            num_events = io_getevents(ioctx,min_nr,MAX_NR,events,&timeout); /* number of completed events */
            if(num_events < 0) /* libaio reports failure as a negative error number */
            {
                printf("io_getevents failed: %d\n",num_events);
                exit(-1);
            }
            printf("Calling completion function on results\n");
            for(int i = 0;i<num_events;i++) /* handle each event in turn */
            {
                struct io_event event = events[i];
                AIORe* req = (AIORe*)(event.data); /* polymorphic: the request may be a read or a write, i.e. AIORead or AIOWrite */
                req->Complete(event.res);
                delete req; /* this I/O is finished; delete the request object together with its buffer */
            }
            delete[] events;
            printf("Reaped %d io_getevents\n",num_events);
            reap_counter = num_events+reap_counter; /* accumulate the event count */
            return num_events; /* number of events reaped by this call */
        }

        void reap()
        {
            if(counter >= MIN_NR) /* start reaping only once counter has reached MIN_NR */
            {
                doreap(MIN_NR);
            }
        }

        void reapremain() /* final reap of whatever is still outstanding */
        {
            while(reap_counter<length)
            {
                doreap(1);
            }
        }

        ~AIOAdder()
        {
            printf("Closing AIO context and file\n");
            io_destroy(ioctx);
            close(fd);
        }

        int Sum()
        {
            printf("Writing consecutive integers to file\n");
            writefile();
            printf("Reading consecutive integers from file\n");
            readfile();
            return sum;
        }

};

int main()
{
    AIOAdder adder(10000);
    adder.init(); /* open the file and preallocate its size */
    adder.writefile();
    adder.readfile();
    return 0;
}
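One small pitfall worth pointing out in a listing like this: it sizes the file by multiplying a PAGE_SIZE macro with a page count. When such a macro is a shift expression, it needs parentheses, because the preprocessor pastes the text in as is and * binds tighter than <<. The sketch below uses two hypothetical macros, PAGE_SIZE_BARE and PAGE_SIZE_PAREN, to show the difference.

```cpp
#include <cassert>

#define PAGE_SIZE_BARE   1 << 12     // pasted textually, no grouping
#define PAGE_SIZE_PAREN  (1 << 12)   // always behaves as one value

// Size in bytes of `pages` pages, with the macro written first in each case.
long bytes_with_bare(int pages)  { return PAGE_SIZE_BARE * pages; }   // parses as 1 << (12 * pages)
long bytes_with_paren(int pages) { return PAGE_SIZE_PAREN * pages; }  // parses as (1 << 12) * pages
```

For two pages the parenthesized form gives the intended 8192, while the bare form evaluates 1 << 24. With a large page count the bare form shifts by far more than the width of an int, which is undefined behaviour, so the parentheses are not just cosmetic.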







From the ITPUB blog. Link: http://blog.itpub.net/7728585/viewspace-2147684/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
