Take the fast lane: the epoll transport in netty explained

Introduction

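In the previous chapter we covered how kqueue is used and how it works. Now let's look at epoll. Both are more advanced I/O mechanisms than plain NIO, and both rely on native methods. The difference is that kqueue is used on macOS, while epoll is used on Linux.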

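Using epoll in detail

Using epoll is straightforward. As before, we use the familiar chat room example to show how it works.

The server needs a bossGroup and a workerGroup. With NIO both groups are NioEventLoopGroup instances; with epoll you use EpollEventLoopGroup instead: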

        EventLoopGroup bossGroup = new EpollEventLoopGroup(1);
        EventLoopGroup workerGroup = new EpollEventLoopGroup();

Next, pass bossGroup and workerGroup to the ServerBootstrap:

        ServerBootstrap b = new ServerBootstrap();
        b.group(bossGroup, workerGroup)
         .channel(EpollServerSocketChannel.class)
         .handler(new LoggingHandler(LogLevel.INFO))
         .childHandler(new NativeChatServerInitializer());

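Note that the channel passed in here is EpollServerSocketChannel, which is dedicated to handling epoll-based connections. Everything else is the same as in an ordinary NIO server.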

Now for the epoll client. The client needs a single EventLoopGroup, which here is an EpollEventLoopGroup:

EventLoopGroup group = new EpollEventLoopGroup();

Then pass this group to the Bootstrap:

        Bootstrap b = new Bootstrap();
        b.group(group)
         .channel(EpollSocketChannel.class)
         .handler(new NativeChatClientInitializer());

The channel used here is EpollSocketChannel, the client-side counterpart of EpollServerSocketChannel.
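Because epoll only works on Linux and kqueue only on macOS and the BSDs, an application that runs on several platforms usually has to pick its transport at startup. The following is a minimal sketch of that decision using only the JDK. TransportChooser and its Transport enum are hypothetical names invented for this illustration and are not part of Netty; the strings are simply the class names used for each transport. In real Netty code it is probably more reliable to ask Epoll.isAvailable() or KQueue.isAvailable() than to inspect os.name, since those also confirm that the native library actually loaded.

```java
import java.util.Locale;

// Hypothetical helper (not part of Netty): maps an OS name to a transport family.
public final class TransportChooser {

    public enum Transport {
        EPOLL("EpollEventLoopGroup", "EpollServerSocketChannel", "EpollSocketChannel"),
        KQUEUE("KQueueEventLoopGroup", "KQueueServerSocketChannel", "KQueueSocketChannel"),
        NIO("NioEventLoopGroup", "NioServerSocketChannel", "NioSocketChannel");

        public final String eventLoopGroup;
        public final String serverChannel;
        public final String clientChannel;

        Transport(String eventLoopGroup, String serverChannel, String clientChannel) {
            this.eventLoopGroup = eventLoopGroup;
            this.serverChannel = serverChannel;
            this.clientChannel = clientChannel;
        }
    }

    // Assumption: the OS name alone decides; NIO is the portable fallback.
    public static Transport choose(String osName) {
        String os = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        if (os.contains("linux")) {
            return Transport.EPOLL;
        }
        if (os.contains("mac") || os.contains("bsd")) {
            return Transport.KQUEUE;
        }
        return Transport.NIO;
    }

    public static void main(String[] args) {
        Transport t = choose(System.getProperty("os.name"));
        System.out.println(t + ": " + t.eventLoopGroup + ", "
                + t.serverChannel + ", " + t.clientChannel);
    }
}
```

The NIO fallback matters in practice: the chat server above would fail to start on a developer's Windows or macOS machine if EpollEventLoopGroup were hard-coded.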

EpollEventLoopGroup

First, the definition of EpollEventLoopGroup:

public final class EpollEventLoopGroup extends MultithreadEventLoopGroup

Like KQueueEventLoopGroup, EpollEventLoopGroup extends MultithreadEventLoopGroup, which means it can run multiple threads.

Before EpollEventLoopGroup can be used, the epoll-related JNI interfaces must be confirmed to be available:

Epoll.ensureAvailability();

The newChild method creates the child EventLoop instances of an EpollEventLoopGroup:

    protected EventLoop newChild(Executor executor, Object... args) throws Exception {
        Integer maxEvents = (Integer) args[0];
        SelectStrategyFactory selectStrategyFactory = (SelectStrategyFactory) args[1];
        RejectedExecutionHandler rejectedExecutionHandler = (RejectedExecutionHandler) args[2];
        EventLoopTaskQueueFactory taskQueueFactory = null;
        EventLoopTaskQueueFactory tailTaskQueueFactory = null;

        int argsLength = args.length;
        if (argsLength > 3) {
            taskQueueFactory = (EventLoopTaskQueueFactory) args[3];
        }
        if (argsLength > 4) {
            tailTaskQueueFactory = (EventLoopTaskQueueFactory) args[4];
        }
        return new EpollEventLoop(this, executor, maxEvents,
                selectStrategyFactory.newSelectStrategy(),
                rejectedExecutionHandler, taskQueueFactory, tailTaskQueueFactory);
    }

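As the method shows, newChild takes an executor plus a variable number of extra arguments: maxEvents, a SelectStrategyFactory, a RejectedExecutionHandler, and optionally a taskQueueFactory and a tailTaskQueueFactory. All of them are passed to the EpollEventLoop constructor, and the new EpollEventLoop object is returned.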

EpollEventLoop

An EpollEventLoop is created by EpollEventLoopGroup through the newChild method.

EpollEventLoop itself is a SingleThreadEventLoop:

class EpollEventLoop extends SingleThreadEventLoop

Thanks to native epoll I/O, an EpollEventLoop can process the events of all its channels efficiently on a single thread.

Like EpollEventLoopGroup, EpollEventLoop checks whether the system supports epoll when the class is initialized:

    static {
        Epoll.ensureAvailability();
    }

The EpollEventLoop constructor called by EpollEventLoopGroup initializes three FileDescriptors: epollFd, eventFd, and timerFd. All three are created by calling Native methods:

this.epollFd = epollFd = Native.newEpollCreate();
this.eventFd = eventFd = Native.newEventFd();
this.timerFd = timerFd = Native.newTimerFd();

Native.epollCtlAdd is then called to register eventFd and timerFd with epollFd:

Native.epollCtlAdd(epollFd.intValue(), eventFd.intValue(), Native.EPOLLIN | Native.EPOLLET);
Native.epollCtlAdd(epollFd.intValue(), timerFd.intValue(), Native.EPOLLIN | Native.EPOLLET);
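The last argument in both calls is a bit mask that combines "readable" (EPOLLIN) with edge-triggered notification (EPOLLET). As a small illustration of how such a mask is built and inspected, the sketch below hard-codes the values that Linux defines in sys/epoll.h. That is an assumption made for the example: Netty reads the real values from native code, and EpollFlagsSketch is a hypothetical class, not a Netty one.

```java
// Hypothetical illustration of epoll event masks; not Netty code.
public final class EpollFlagsSketch {
    // Assumed values, taken from Linux <sys/epoll.h>.
    public static final int EPOLLIN = 0x001;   // the fd has data to read
    public static final int EPOLLET = 1 << 31; // edge-triggered notification

    // Returns true if the given flag is set in the mask.
    public static boolean has(int mask, int flag) {
        return (mask & flag) != 0;
    }

    public static void main(String[] args) {
        int mask = EPOLLIN | EPOLLET;
        System.out.println(Integer.toHexString(mask)); // prints "80000001"
        System.out.println(has(mask, EPOLLET));        // prints "true"
    }
}
```

With edge triggering, epoll reports a descriptor only when its state changes, so the event loop is expected to keep reading until nothing is left rather than wait to be notified again.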

In EpollEventLoop's run method, selectStrategy.calculateStrategy is called first to obtain the current select strategy. Three special values are defined:

    int SELECT = -1;

    int CONTINUE = -2;

    int BUSY_WAIT = -3;

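We already introduced these three values in the kqueue chapter. The difference is that epoll supports BUSY_WAIT: in that state, Native.epollBusyWait(epollFd, events) is called and returns the number of events found by busy waiting.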

In the SELECT state, Native.epollWait(epollFd, events, 1000) is called instead and returns the number of events found while waiting.

After that, processReady(events, strategy) and runAllTasks are called in turn: the first runs the callbacks for the ready events, the second executes the queued tasks.
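The dispatch described above can be modelled with a few lines of plain Java. This is a simplified sketch, not the actual EpollEventLoop code: SelectStrategySketch is a hypothetical class, the native calls are replaced by descriptive strings, and calculateStrategy assumes the behaviour of Netty's default strategy as I understand it (do a non-blocking poll when tasks are pending, otherwise ask for a blocking SELECT).

```java
import java.util.function.IntSupplier;

// Hypothetical, simplified model of the strategy dispatch in the run loop.
public final class SelectStrategySketch {
    public static final int SELECT = -1;
    public static final int CONTINUE = -2;
    public static final int BUSY_WAIT = -3;

    // Assumed default behaviour: with pending tasks, poll without blocking
    // and return the number of ready events; otherwise request a blocking wait.
    public static int calculateStrategy(IntSupplier selectNow, boolean hasTasks) {
        return hasTasks ? selectNow.getAsInt() : SELECT;
    }

    // Describes what one loop iteration would do for a given strategy value.
    public static String describe(int strategy) {
        switch (strategy) {
            case CONTINUE:
                return "skip this iteration";
            case BUSY_WAIT:
                return "call epollBusyWait";
            case SELECT:
                return "call epollWait";
            default:
                // A non-negative value is a count of events that are already ready.
                return strategy > 0
                        ? "processReady for " + strategy + " events"
                        : "no ready events, run tasks only";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(calculateStrategy(() -> 2, true)));  // processReady for 2 events
        System.out.println(describe(calculateStrategy(() -> 2, false))); // call epollWait
    }
}
```

The point of the hasTasks check is fairness: if tasks are already queued, the loop should not block in epollWait and delay them.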

EpollServerSocketChannel

First, the definition of EpollServerSocketChannel:

public final class EpollServerSocketChannel extends AbstractEpollServerChannel implements ServerSocketChannel

EpollServerSocketChannel extends AbstractEpollServerChannel and implements the ServerSocketChannel interface.

The EpollServerSocketChannel constructor takes a LinuxSocket:

    EpollServerSocketChannel(LinuxSocket fd) {
        super(fd);
        config = new EpollServerSocketChannelConfig(this);
    }

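LinuxSocket is a specialized socket class that wraps a native Linux socket.

EpollServerSocketChannelConfig holds the configuration for an EpollServerSocketChannel. Five epoll-specific options come into play here: SO_REUSEPORT, IP_FREEBIND, IP_TRANSPARENT, TCP_DEFER_ACCEPT, and TCP_MD5SIG. Each one maps to a specific socket or protocol feature.

Now let's look at the newChildChannel method of EpollServerSocketChannel: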

    protected Channel newChildChannel(int fd, byte[] address, int offset, int len) throws Exception {
        return new EpollSocketChannel(this, new LinuxSocket(fd), address(address, offset, len));
    }

Like the corresponding method in KQueueServerSocketChannel, newChildChannel returns a new client channel, here an EpollSocketChannel, and wraps the incoming fd in a LinuxSocket.

EpollSocketChannel

An EpollSocketChannel is created and returned by EpollServerSocketChannel. Here is its definition:

public final class EpollSocketChannel extends AbstractEpollStreamChannel implements SocketChannel {

As you can see, EpollSocketChannel extends AbstractEpollStreamChannel and implements the SocketChannel interface.

Going back to the newChildChannel method that EpollServerSocketChannel uses to create an EpollSocketChannel, it calls the following EpollSocketChannel constructor:

    EpollSocketChannel(Channel parent, LinuxSocket fd, InetSocketAddress remoteAddress) {
        super(parent, fd, remoteAddress);
        config = new EpollSocketChannelConfig(this);

        if (parent instanceof EpollServerSocketChannel) {
            tcpMd5SigAddresses = ((EpollServerSocketChannel) parent).tcpMd5SigAddresses();
        }
    }

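The code shows that when an EpollSocketChannel is created from an EpollServerSocketChannel, it inherits the parent's tcpMd5Sig addresses, so the tcpMd5Sig feature configured on the server channel carries over to the child channel.

What is tcpMd5Sig?

Put simply, tcpMd5Sig adds an MD5 signature to each TCP segment. The signature is used to verify the data and thereby improves the security of the transmission.

TCP MD5 was proposed in RFC 2385. In netty it can only be enabled through the Linux native transport, so if you want to use tcpMd5Sig you must use EpollServerSocketChannel and EpollSocketChannel.

For anyone chasing performance or dealing with special use cases, there are plenty of occasions to work with a native transport like this one, and its configuration options are worth studying closely.

Next, let's look at the very important doConnect0 method in EpollSocketChannel: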

    boolean doConnect0(SocketAddress remote) throws Exception {
        if (IS_SUPPORTING_TCP_FASTOPEN_CLIENT && config.isTcpFastOpenConnect()) {
            ChannelOutboundBuffer outbound = unsafe().outboundBuffer();
            outbound.addFlush();
            Object curr;
            if ((curr = outbound.current()) instanceof ByteBuf) {
                ByteBuf initialData = (ByteBuf) curr;
                long localFlushedAmount = doWriteOrSendBytes(
                        initialData, (InetSocketAddress) remote, true);
                if (localFlushedAmount > 0) {
                    outbound.removeBytes(localFlushedAmount);
                    return true;
                }
            }
        }
        return super.doConnect0(remote);
    }

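This method first checks whether the TcpFastOpen option is enabled. If it is, the pending data is sent through doWriteOrSendBytes, which eventually calls LinuxSocket's write or sendTo method. These methods can carry initial data, so data is sent while the connection is being established, which is what TCP Fast Open is about.

If TCP Fast Open is not in use, or no initial data could be written, the method falls back to the socket's connect method to establish a conventional connection.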

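Summary

The epoll implementation in netty is very similar to the kqueue one. They differ in the platform they run on and in the specific options they offer. If you are after high performance, both are worth studying in depth.

The code for this article is available at: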

learn-netty4

For more, see http://www.flydean.com/53-2-netty-epoll-transport/