ES Series (2): How Multicast-Based Cluster Discovery Works

Posted by 等你歸去來 on 2021-04-18

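  As a very powerful search engine, ES needs not only a complete feature set and high performance but also the ability to scale out almost without limit; to a degree, it is a big-data product. Scaling out requires a cluster. An ordinary cluster, however, is not enough, because it can do too little (a typical cluster is a load-balanced group of identical peers), so ES has to build a cluster that suits its own needs: nodes must discover each other and coordinate the cluster's instances automatically. That is, of course, only the first step toward scalability.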

  So how do ES nodes discover each other automatically? Let's explore it together.

 

0. Background

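  What we want to study is how ES discovers nodes automatically without any configuration. If you look at a recent ES release, however, you may be surprised to find that the feature is gone: newer versions no longer support implicit cluster discovery. It was removed as of 5.0.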

  Why it was removed probably has a lot to do with reliability. We set that question aside here and discuss the mechanism on its own terms.

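  ES 2.1 still has automatic cluster discovery, so we use it as our reference. In fact, even in the versions that implement it, it exists only as a plugin; "no longer supported" in later versions simply means that the plugin is no longer released.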

  The core mechanism, naturally, is multicast (a close relative of broadcast).

 

1. Overview of automatic discovery

  We run into automatic service discovery in many everyday scenarios: RPC calls, MQ message dispatch, Docker cluster management, and so on.

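  Automatic discovery is therefore a common problem. How is it usually solved? Typically there is a registry: each component registers itself with the registry after it starts, and the registry propagates the change to the consumers so that they become aware of it. That completes the discovery. This is an almost universal solution, and an easy one to understand.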
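
  To make the registry pattern concrete, here is a minimal in-process sketch: providers register an address under a service name, and every subscriber of that service is pushed the current provider list on each change. The class and method names (`InMemoryRegistry`, `register`, `unregister`, `subscribe`) are hypothetical and not taken from any product; a real registry is a separate, replicated service reached over the network.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-process stand-in for a registry service.
class InMemoryRegistry {
    // service name -> provider addresses (sorted for a stable snapshot order)
    private final Map<String, Set<String>> providers = new ConcurrentHashMap<>();
    // service name -> subscribers interested in that service
    private final Map<String, List<Consumer<List<String>>>> subscribers = new ConcurrentHashMap<>();

    // A provider announces itself after start-up.
    synchronized void register(String service, String address) {
        providers.computeIfAbsent(service, k -> new TreeSet<>()).add(address);
        notifySubscribers(service);
    }

    // A provider leaves (shutdown or detected failure).
    synchronized void unregister(String service, String address) {
        Set<String> set = providers.get(service);
        if (set != null && set.remove(address)) {
            notifySubscribers(service);
        }
    }

    // A consumer subscribes and immediately receives the current snapshot.
    synchronized void subscribe(String service, Consumer<List<String>> listener) {
        subscribers.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(snapshot(service));
    }

    private List<String> snapshot(String service) {
        return new ArrayList<>(providers.getOrDefault(service, new TreeSet<>()));
    }

    // Push the new provider list to every subscriber of the service.
    private void notifySubscribers(String service) {
        List<String> current = snapshot(service);
        for (Consumer<List<String>> l : subscribers.getOrDefault(service, Collections.emptyList())) {
            l.accept(current);
        }
    }
}
```

  Pushing a full snapshot instead of a delta keeps subscribers simple: each notification fully replaces what they knew before.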

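  A registry, however, means running one more service. If you don't want that extra service, the nodes have to coordinate among themselves; put differently, every node has to be a potential registry.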

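  The registry does play the discovery role, but what happens after discovery depends on each concrete application. So besides the registry acting as a postman, the producers and consumers upstream and downstream have to cooperate.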

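  Automatic discovery exists mainly to allow scaling out at any time, and to some extent to provide high availability. The registry itself therefore must not become a single point of failure; components of this kind are usually deployed as highly available clusters themselves. They were built for exactly this scenario, after all.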

  Then there is the approach named in the title: using multicast for automatic discovery. Let's see how it works.

 

2. Sample ES cluster configuration

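  ES configuration is fairly minimal: almost everything has a default, and only a few simple settings are needed. For a single-node deployment you can even run what you downloaded right away without changing anything. Here are two simple cluster configuration samples (elasticsearch.yml):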

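# Sample 1, multicast: a node sends a multicast request to the group, and the other nodes respond to it. Settings:
# multicast group address
discovery.zen.ping.multicast.group: 224.5.6.7
# multicast port
discovery.zen.ping.multicast.port: 1234
# TTL of the multicast packets
discovery.zen.ping.multicast.ttl: 3
# address to bind to; null means all available network interfaces
discovery.zen.ping.multicast.address: null
# on/off switch for multicast discovery
discovery.zen.ping.multicast.enabled: true
# minimum number of master-eligible nodes a node must see before a master can be elected
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 3s

# Sample 2, unicast: a node sends unicast requests to the listed hosts. When using unicast, multicast can be disabled:
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.16.0.2:9300","172.16.0.3:9300","172.16.0.5:9300"]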

  A somewhat more complete configuration file, for reference:

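cluster.name: elasticsearch
# Cluster name, default "elasticsearch". ES automatically discovers other ES nodes on the same network segment; if several clusters share a segment, this setting tells them apart.
node.name: "node1"
# Node name. By default a random name is picked from the list in the name.txt file in the config folder of the ES jar, which contains many fun names added by the authors.
node.master: true
# Whether this node is eligible to be elected master, default true. By default the first node of the cluster becomes master; if that node goes down, a new master is elected.
node.data: true
# Whether this node stores index data, default true.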
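# index.number_of_shards: 5
# Default number of shards per index, default 5.
# index.number_of_replicas: 1
# Default number of replicas per index, default 1.
# path.conf: /path/to/conf
# Path of the configuration files, default the config folder under the ES home directory.
# path.data: /path/to/data
# Path of the index data, default the data folder under the ES home directory. Several paths may be given, separated by commas, e.g.:
# path.data: /path/to/data1,/path/to/data2
# path.work: /path/to/work
# Path for temporary files, default the work folder under the ES home directory.
# path.logs: /path/to/logs
# Path of the log files, default the logs folder under the ES home directory.
# path.plugins: /path/to/plugins
# Path of the plugins, default the plugins folder under the ES home directory.
# bootstrap.mlockall: true
# bootstrap.memory_lock: true
# bootstrap.system_call_filter: false
# Set to true to lock the process memory (bootstrap.mlockall is the pre-5.0 name; 5.0 renamed it bootstrap.memory_lock). ES slows down once the JVM starts swapping, so make sure it does not swap: set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value, make sure the machine has enough memory for ES, and allow the elasticsearch process to lock memory (on Linux, e.g. with `ulimit -l unlimited`).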
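# network.bind_host: 0.0.0.0
# IP address to bind to, IPv4 or IPv6, default 0.0.0.0.
# network.publish_host: 192.168.0.1
# IP address other nodes use to talk to this node. Detected automatically if not set; it must be a real IP address.
# network.host: 192.168.0.1
# Sets both bind_host and publish_host above at once.
# transport.tcp.port: 9300
# TCP port for node-to-node communication, default 9300.
# transport.tcp.compress: true
# Whether to compress data sent over the TCP transport, default false (no compression).
# http.port: 9200
# HTTP port for external clients, default 9200.
# http.max_content_length: 100mb
# Maximum size of an HTTP request body, default 100mb.
# http.enabled: false
# Whether to serve clients over HTTP, default true (enabled).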
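# gateway.type: local
# Gateway type, default local (the local file system). It can be the local file system, a distributed file system, Hadoop HDFS, or Amazon S3.
# gateway.recover_after_nodes: 1
# Start data recovery once N nodes of the cluster have started, default 1.
# gateway.recover_after_time: 5m
# Timeout for the initial data recovery process, default 5 minutes.
# gateway.expected_nodes: 2
# Expected number of nodes in the cluster, default 2; recovery starts immediately once these N nodes are up.
# cluster.routing.allocation.node_initial_primaries_recoveries: 4
# Number of concurrent recoveries during initial data recovery, default 4.
# cluster.routing.allocation.node_concurrent_recoveries: 2
# Number of concurrent recoveries when adding or removing nodes or rebalancing, default 2.
# indices.recovery.max_size_per_sec: 0
# Bandwidth limit for data recovery, e.g. 100mb; default 0, i.e. unlimited.
# indices.recovery.concurrent_streams: 5
# Maximum number of concurrent streams opened when recovering data from other shards, default 5.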
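# discovery.zen.minimum_master_nodes: 1
# Ensures that a node can see at least N master-eligible nodes in the cluster. Default 1; larger clusters can use a larger value (2-4). The usual recommendation is a quorum of the master-eligible nodes, i.e. (number of master-eligible nodes / 2) + 1.
# discovery.zen.ping.timeout: 3s
# Ping timeout used when discovering other nodes, default 3 seconds. On a poor network, raise it to avoid discovery errors.
# discovery.zen.ping.multicast.enabled: false
# Whether to discover nodes via multicast; default true (in 1.x, where multicast discovery was built in).
# discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
# Initial list of master nodes in the cluster, through which newly joining nodes are discovered.
# The following are slow-log settings for searches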
# index.search.slowlog.level: TRACE
# index.search.slowlog.threshold.query.warn: 10s
# index.search.slowlog.threshold.query.info: 5s
# index.search.slowlog.threshold.query.debug: 2s
# index.search.slowlog.threshold.query.trace: 500ms
# index.search.slowlog.threshold.fetch.warn: 1s
# index.search.slowlog.threshold.fetch.info: 800ms
# index.search.slowlog.threshold.fetch.debug: 500ms
# index.search.slowlog.threshold.fetch

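  In short, a simple configuration is easy, and the above is enough to deploy an ES cluster. Discovery then works automatically. With multicast discovery in particular you barely need any configuration: nodes with the same cluster name are joined into one cluster. Quite magical. (In 1.x this worked out of the box; with the 2.x plugin discussed below, `discovery.zen.ping.multicast.enabled` defaults to false and has to be set to true.)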

 

3. How ES implements discovery

  This discussion covers only the multicast implementation, nothing else.

  It is hooked in as a plugin, with discovery.zen.ping.multicast.enabled as the on/off switch.

// org.elasticsearch.plugin.discovery.multicast.MulticastDiscoveryPlugin
public class MulticastDiscoveryPlugin extends Plugin {

    private final Settings settings;

    public MulticastDiscoveryPlugin(Settings settings) {
        this.settings = settings;
    }

    @Override
    public String name() {
        return "discovery-multicast";
    }

    @Override
    public String description() {
        return "Multicast Discovery Plugin";
    }
    
    public void onModule(DiscoveryModule module) {
        // the multicast ping is registered only when the switch is turned on
        if (settings.getAsBoolean("discovery.zen.ping.multicast.enabled", false)) {
            module.addZenPing(MulticastZenPing.class);
        }
    }
}
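
  Stripped of the ES types, the switch in onModule amounts to the pattern below. `PingRegistry` and `MulticastSwitch` are hypothetical stand-ins for `DiscoveryModule` and the plugin, and the boolean parsing only approximates `Settings.getAsBoolean`; the point is that an absent key means "off".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DiscoveryModule: it only collects ping implementations.
class PingRegistry {
    final List<String> pingImpls = new ArrayList<>();

    void addZenPing(String impl) {
        pingImpls.add(impl);
    }
}

// Mirrors the shape of MulticastDiscoveryPlugin.onModule.
class MulticastSwitch {
    static final String KEY = "discovery.zen.ping.multicast.enabled";

    static boolean enabled(Map<String, String> settings) {
        // Absent key => false, approximating settings.getAsBoolean(KEY, false)
        return Boolean.parseBoolean(settings.getOrDefault(KEY, "false"));
    }

    // Register the multicast ping only when the switch is on.
    static void onModule(Map<String, String> settings, PingRegistry module) {
        if (enabled(settings)) {
            module.addZenPing("MulticastZenPing");
        }
    }
}
```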

  So everything related to multicast falls on MulticastZenPing. Its constructor shows which settings are supported and what their defaults are:

    // org.elasticsearch.plugin.discovery.multicast.MulticastZenPing
    public MulticastZenPing(ThreadPool threadPool, TransportService transportService, ClusterName clusterName, Version version) {
        this(EMPTY_SETTINGS, threadPool, transportService, clusterName, new NetworkService(EMPTY_SETTINGS), version);
    }

    @Inject
    public MulticastZenPing(Settings settings, ThreadPool threadPool, TransportService transportService, ClusterName clusterName, NetworkService networkService, Version version) {
        super(settings);
        this.threadPool = threadPool;
        this.transportService = transportService;
        this.clusterName = clusterName;
        this.networkService = networkService;
        this.version = version;
        // read the multicast settings, with their defaults
        this.address = this.settings.get("discovery.zen.ping.multicast.address");
        this.port = this.settings.getAsInt("discovery.zen.ping.multicast.port", 54328);
        this.group = this.settings.get("discovery.zen.ping.multicast.group", "224.2.2.4");
        this.bufferSize = this.settings.getAsInt("discovery.zen.ping.multicast.buffer_size", 2048);
        this.ttl = this.settings.getAsInt("discovery.zen.ping.multicast.ttl", 3);

        this.pingEnabled = this.settings.getAsBoolean("discovery.zen.ping.multicast.ping.enabled", true);

        logger.debug("using group [{}], with port [{}], ttl [{}], and address [{}]", group, port, ttl, address);
        // register MulticastPingResponseRequestHandler to handle "internal:discovery/zen/multicast" requests
        this.transportService.registerRequestHandler(ACTION_NAME, MulticastPingResponse.class, ThreadPool.Names.SAME, new MulticastPingResponseRequestHandler());
    }
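
  The constructor's defaults are worth collecting in one place. The sketch below reads the same five keys from a plain `Map` with the same fallbacks (port `54328`, group `224.2.2.4`, buffer `2048`, TTL `3`, and a `null` address meaning "all interfaces"). The class name `MulticastSettings` is hypothetical, and the check that the group really is a multicast address is an addition for illustration; the ES constructor above does not perform it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

// Illustrative holder for the settings read by MulticastZenPing's constructor.
final class MulticastSettings {
    final String address;   // null => bind to all available interfaces
    final int port;
    final String group;
    final int bufferSize;
    final int ttl;

    MulticastSettings(Map<String, String> s) {
        this.address = s.get("discovery.zen.ping.multicast.address");
        this.port = Integer.parseInt(s.getOrDefault("discovery.zen.ping.multicast.port", "54328"));
        this.group = s.getOrDefault("discovery.zen.ping.multicast.group", "224.2.2.4");
        this.bufferSize = Integer.parseInt(s.getOrDefault("discovery.zen.ping.multicast.buffer_size", "2048"));
        this.ttl = Integer.parseInt(s.getOrDefault("discovery.zen.ping.multicast.ttl", "3"));
        // Extra sanity check, not in the ES constructor: the group must be a multicast address.
        if (!isMulticast(group)) {
            throw new IllegalArgumentException("not a multicast group: " + group);
        }
    }

    static boolean isMulticast(String host) {
        try {
            // An IP literal is parsed locally; a host name would trigger a DNS lookup.
            return InetAddress.getByName(host).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```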

  After construction, the instance waits for ES to call start(). Only then is the multicast channel created, i.e. multicast listening and sending begin.

    // org.elasticsearch.plugin.discovery.multicast.MulticastZenPing.doStart
    @Override
    protected void doStart() {
        try {
            // we know OSX has bugs in the JVM when creating multiple instances of multicast sockets
            // causing for "socket close" exceptions when receive and/or crashes
            boolean shared = settings.getAsBoolean("discovery.zen.ping.multicast.shared", Constants.MAC_OS_X);
            // OSX does not correctly send multicasts FROM the right interface
            boolean deferToInterface = settings.getAsBoolean("discovery.zen.ping.multicast.defer_group_to_set_interface", Constants.MAC_OS_X);
            // delegate to this module's channel utility class, which implements all channel operations
            multicastChannel = MulticastChannel.getChannel(nodeName(), shared,
                    new MulticastChannel.Config(port, group, bufferSize, ttl,
                            // don't use publish address, the use case for that is e.g. a firewall or proxy and
                            // may not even be bound to an interface on this machine! use the first bound address.
                            networkService.resolveBindHostAddress(address)[0],
                            deferToInterface),
                    new Receiver());
        } catch (Throwable t) {
            String msg = "multicast failed to start [{}], disabling. Consider using IPv4 only (by defining env. variable `ES_USE_IPV4`)";
            if (logger.isDebugEnabled()) {
                logger.debug(msg, t, ExceptionsHelper.detailedMessage(t));
            } else {
                logger.info(msg, ExceptionsHelper.detailedMessage(t));
            }
        }
    }
    
    // multicast.MulticastChannel.getChannel
    /**
     * Builds a channel based on the provided config, allowing to control if sharing a channel that uses
     * the same config is allowed or not.
     */
    public static MulticastChannel getChannel(String name, boolean shared, Config config, Listener listener) throws Exception {
        if (!shared) {
            return new Plain(listener, name, config);
        }
        return Shared.getSharedChannel(listener, config);
    }
    
    // follow the simple (non-shared) implementation to see the process
    /**
     * Simple implementation of a channel.
     */
    @SuppressForbidden(reason = "I bind to wildcard addresses. I am a total nightmare")
    private static class Plain extends MulticastChannel {
        private final ESLogger logger;
        private final Config config;

        private volatile MulticastSocket multicastSocket;
        private final DatagramPacket datagramPacketSend;
        private final DatagramPacket datagramPacketReceive;

        private final Object sendMutex = new Object();
        private final Object receiveMutex = new Object();

        private final Receiver receiver;
        private final Thread receiverThread;

        Plain(Listener listener, String name, Config config) throws Exception {
            super(listener);
            this.logger = ESLoggerFactory.getLogger(name);
            this.config = config;
            this.datagramPacketReceive = new DatagramPacket(new byte[config.bufferSize], config.bufferSize);
            this.datagramPacketSend = new DatagramPacket(new byte[config.bufferSize], config.bufferSize, InetAddress.getByName(config.group), config.port);
            // multicast is done through a MulticastSocket
            this.multicastSocket = buildMulticastSocket(config);
            this.receiver = new Receiver();
            this.receiverThread = daemonThreadFactory(Settings.builder().put("name", name).build(), "discovery#multicast#receiver").newThread(receiver);
            this.receiverThread.start();
        }

        private MulticastSocket buildMulticastSocket(Config config) throws Exception {
            SocketAddress addr = new InetSocketAddress(InetAddress.getByName(config.group), config.port);
            MulticastSocket multicastSocket = new MulticastSocket(config.port);
            try {
                multicastSocket.setTimeToLive(config.ttl);
                // OSX is not smart enough to tell that a socket bound to the
                // 'lo0' interface needs to make sure to send the UDP packet
                // out of the lo0 interface, so we need to do some special
                // workarounds to fix it.
                if (config.deferToInterface) {
                    // 'null' here tells the socket to defer to the interface set
                    // with .setInterface
                    multicastSocket.joinGroup(addr, null);
                    multicastSocket.setInterface(config.multicastInterface);
                } else {
                    multicastSocket.setInterface(config.multicastInterface);
                    multicastSocket.joinGroup(InetAddress.getByName(config.group));
                }
                multicastSocket.setReceiveBufferSize(config.bufferSize);
                multicastSocket.setSendBufferSize(config.bufferSize);
                multicastSocket.setSoTimeout(60000);
            } catch (Throwable e) {
                IOUtils.closeWhileHandlingException(multicastSocket);
                throw e;
            }
            return multicastSocket;
        }

        public Config getConfig() {
            return this.config;
        }
        // send a multicast message
        @Override
        public void send(BytesReference data) throws Exception {
            synchronized (sendMutex) {
                datagramPacketSend.setData(data.toBytes());
                multicastSocket.send(datagramPacketSend);
            }
        }

        @Override
        protected void close(Listener listener) {
            receiver.stop();
            receiverThread.interrupt();
            if (multicastSocket != null) {
                IOUtils.closeWhileHandlingException(multicastSocket);
                multicastSocket = null;
            }
            try {
                receiverThread.join(10000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // receive multicast messages
        private class Receiver implements Runnable {

            private volatile boolean running = true;

            public void stop() {
                running = false;
            }

            @Override
            public void run() {
                while (running) {
                    try {
                        synchronized (receiveMutex) {
                            try {
                                multicastSocket.receive(datagramPacketReceive);
                            } catch (SocketTimeoutException ignore) {
                                continue;
                            } catch (Exception e) {
                                if (running) {
                                    if (multicastSocket.isClosed()) {
                                        logger.warn("multicast socket closed while running, restarting...");
                                        multicastSocket = buildMulticastSocket(config);
                                    } else {
                                        logger.warn("failed to receive packet, throttling...", e);
                                        Thread.sleep(500);
                                    }
                                }
                                continue;
                            }
                        }
                        // once a message is received, the listener handles the business logic
                        if (datagramPacketReceive.getData().length > 0) {
                            listener.onMessage(new BytesArray(datagramPacketReceive.getData()), datagramPacketReceive.getSocketAddress());
                        }
                    } catch (Throwable e) {
                        if (running) {
                            logger.warn("unexpected exception in multicast receiver", e);
                        }
                    }
                }
            }
        }
    }
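
  One detail of the receive loop above deserves a note: `DatagramPacket.getData()` returns the whole receive buffer (here `bufferSize` bytes), not just the datagram that arrived, so `getData().length > 0` is always true; the actual payload is the `getOffset()`/`getLength()` window. Below is a minimal sketch of a loop in the same style (timeouts only re-check the `running` flag) that hands the listener just that window. `UdpReceiveLoop` and its methods are made-up names for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

final class UdpReceiveLoop {
    // Copy only the bytes of the datagram that was actually received.
    static byte[] payloadOf(DatagramPacket p) {
        return Arrays.copyOfRange(p.getData(), p.getOffset(), p.getOffset() + p.getLength());
    }

    // Blocking receive loop; works with a MulticastSocket, which is a DatagramSocket.
    static void run(DatagramSocket socket, int bufferSize, AtomicBoolean running, Consumer<byte[]> listener) {
        DatagramPacket packet = new DatagramPacket(new byte[bufferSize], bufferSize);
        while (running.get()) {
            try {
                packet.setLength(bufferSize); // restore full capacity; the last receive set length to its size
                socket.receive(packet);
            } catch (SocketTimeoutException timeout) {
                continue; // SO_TIMEOUT expired: loop and re-check the flag
            } catch (IOException e) {
                if (socket.isClosed()) {
                    return; // closed during shutdown, leave the loop
                }
                continue; // transient error; a real loop would log and back off
            }
            byte[] payload = payloadOf(packet);
            if (payload.length > 0) {
                listener.accept(payload);
            }
        }
    }
}
```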

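  As we can see, multicast is implemented with Java's MulticastSocket, which runs over UDP, so delivery is not guaranteed. A dedicated receiver thread sits in a blocking receive loop (the socket has a 60-second SO_TIMEOUT, after which the loop simply tries again) and hands every message it receives to a listener. So the multicast channel is just the framework; the core business logic lives in the listener.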
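
  The listener abstraction is also what makes the shared mode of getChannel workable. The listing above follows only Plain, and Shared's internals are not shown, so the following is a sketch under assumptions (`ChannelKey`, `SharedChannel`, `acquire`, `release` are hypothetical names, not the ES classes): a static map from config to channel, so that several users in one JVM with the same config reuse one socket, every message fanned out to all registered listeners, and the channel dropped when its last user releases it. The doStart comment explains why this matters: on OS X, where creating several multicast sockets misbehaves, `shared` defaults to true.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical key: channels with equal port and group are shared.
final class ChannelKey {
    final int port;
    final String group;

    ChannelKey(int port, String group) { this.port = port; this.group = group; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof ChannelKey)) return false;
        ChannelKey k = (ChannelKey) o;
        return port == k.port && group.equals(k.group);
    }

    @Override public int hashCode() { return Objects.hash(port, group); }
}

// Hypothetical shared channel with listener fan-out and release-on-last-user.
final class SharedChannel {
    private static final Map<ChannelKey, SharedChannel> CHANNELS = new HashMap<>();

    final ChannelKey key;
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    private SharedChannel(ChannelKey key) { this.key = key; }

    static synchronized SharedChannel acquire(ChannelKey key, Consumer<String> listener) {
        SharedChannel ch = CHANNELS.computeIfAbsent(key, SharedChannel::new);
        ch.listeners.add(listener);
        return ch;
    }

    // Would be called by the single receiver thread for each datagram.
    void deliver(String message) {
        for (Consumer<String> l : listeners) l.accept(message);
    }

    // Returns true when the last user left (a real implementation would close the socket here).
    static synchronized boolean release(SharedChannel ch, Consumer<String> listener) {
        ch.listeners.remove(listener);
        if (ch.listeners.isEmpty()) {
            CHANNELS.remove(ch.key);
            return true;
        }
        return false;
    }

    static synchronized int openChannels() { return CHANNELS.size(); }
}
```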

  The listener here is the Receiver implemented inside MulticastZenPing:

    // multicast.MulticastZenPing.Receiver
    private class Receiver implements MulticastChannel.Listener {
        // entry point for handling multicast messages
        @Override
        public void onMessage(BytesReference data, SocketAddress address) {
            int id = -1;
            DiscoveryNode requestingNodeX = null;
            ClusterName clusterName = null;

            Map<String, Object> externalPingData = null;
            XContentType xContentType = null;

            try {
                boolean internal = false;
                if (data.length() > 4) {
                    int counter = 0;
                    for (; counter < INTERNAL_HEADER.length; counter++) {
                        if (data.get(counter) != INTERNAL_HEADER[counter]) {
                            break;
                        }
                    }
                    if (counter == INTERNAL_HEADER.length) {
                        internal = true;
                    }
                }
                if (internal) {
                    StreamInput input = StreamInput.wrap(new BytesArray(data.toBytes(), INTERNAL_HEADER.length, data.length() - INTERNAL_HEADER.length));
                    Version version = Version.readVersion(input);
                    input.setVersion(version);
                    id = input.readInt();
                    clusterName = ClusterName.readClusterName(input);
                    requestingNodeX = readNode(input);
                } else {
                    xContentType = XContentFactory.xContentType(data);
                    if (xContentType != null) {
                        // an external ping
                        try (XContentParser parser = XContentFactory.xContent(xContentType).createParser(data)) {
                            externalPingData = parser.map();
                        }
                    } else {
                        throw new IllegalStateException("failed multicast message, probably message from previous version");
                    }
                }
                if (externalPingData != null) {
                    handleExternalPingRequest(externalPingData, xContentType, address);
                } else {
                    handleNodePingRequest(id, requestingNodeX, clusterName);
                }
            } catch (Exception e) {
                if (!lifecycle.started() || (e instanceof EsRejectedExecutionException)) {
                    logger.debug("failed to read requesting data from {}", e, address);
                } else {
                    logger.warn("failed to read requesting data from {}", e, address);
                }
            }
        }

        @SuppressWarnings("unchecked")
        private void handleExternalPingRequest(Map<String, Object> externalPingData, XContentType contentType, SocketAddress remoteAddress) {
            if (externalPingData.containsKey("response")) {
                // ignoring responses sent over the multicast channel
                logger.trace("got an external ping response (ignoring) from {}, content {}", remoteAddress, externalPingData);
                return;
            }

            if (multicastChannel == null) {
                logger.debug("can't send ping response, no socket, from {}, content {}", remoteAddress, externalPingData);
                return;
            }

            Map<String, Object> request = (Map<String, Object>) externalPingData.get("request");
            if (request == null) {
                logger.warn("malformed external ping request, no 'request' element from {}, content {}", remoteAddress, externalPingData);
                return;
            }
            // read the sender's cluster_name; a matching name means the same cluster
            final String requestClusterName = request.containsKey("cluster_name") ? request.get("cluster_name").toString() : request.containsKey("clusterName") ? request.get("clusterName").toString() : null;
            if (requestClusterName == null) {
                logger.warn("malformed external ping request, missing 'cluster_name' element within request, from {}, content {}", remoteAddress, externalPingData);
                return;
            }
        
            if (!requestClusterName.equals(clusterName.value())) {
                logger.trace("got request for cluster_name {}, but our cluster_name is {}, from {}, content {}",
                        requestClusterName, clusterName.value(), remoteAddress, externalPingData);
                return;
            }
            if (logger.isTraceEnabled()) {
                logger.trace("got external ping request from {}, content {}", remoteAddress, externalPingData);
            }

            try {
                DiscoveryNode localNode = contextProvider.nodes().localNode();

                XContentBuilder builder = XContentFactory.contentBuilder(contentType);
                builder.startObject().startObject("response");
                builder.field("cluster_name", clusterName.value());
                builder.startObject("version").field("number", version.number()).field("snapshot_build", version.snapshot).endObject();
                builder.field("transport_address", localNode.address().toString());

                if (contextProvider.nodeService() != null) {
                    for (Map.Entry<String, String> attr : contextProvider.nodeService().attributes().entrySet()) {
                        builder.field(attr.getKey(), attr.getValue());
                    }
                }

                builder.startObject("attributes");
                for (Map.Entry<String, String> attr : localNode.attributes().entrySet()) {
                    builder.field(attr.getKey(), attr.getValue());
                }
                builder.endObject();

                builder.endObject().endObject();
                multicastChannel.send(builder.bytes());
                if (logger.isTraceEnabled()) {
                    logger.trace("sending external ping response {}", builder.string());
                }
            } catch (Exception e) {
                logger.warn("failed to send external multicast response", e);
            }
        }

        private void handleNodePingRequest(int id, DiscoveryNode requestingNodeX, ClusterName requestClusterName) {
            if (!pingEnabled || multicastChannel == null) {
                return;
            }
            final DiscoveryNodes discoveryNodes = contextProvider.nodes();
            final DiscoveryNode requestingNode = requestingNodeX;
            if (requestingNode.id().equals(discoveryNodes.localNodeId())) {
                // that's me, ignore
                return;
            }
            if (!requestClusterName.equals(clusterName)) {
                if (logger.isTraceEnabled()) {
                    logger.trace("[{}] received ping_request from [{}], but wrong cluster_name [{}], expected [{}], ignoring",
                            id, requestingNode, requestClusterName.value(), clusterName.value());
                }
                return;
            }
            // don't connect between two client nodes, no need for that...
            if (!discoveryNodes.localNode().shouldConnectTo(requestingNode)) {
                if (logger.isTraceEnabled()) {
                    logger.trace("[{}] received ping_request from [{}], both are client nodes, ignoring", id, requestingNode, requestClusterName);
                }
                return;
            }
            final MulticastPingResponse multicastPingResponse = new MulticastPingResponse();
            multicastPingResponse.id = id;
            multicastPingResponse.pingResponse = new PingResponse(discoveryNodes.localNode(), discoveryNodes.masterNode(), clusterName, contextProvider.nodeHasJoinedClusterOnce());

            if (logger.isTraceEnabled()) {
                logger.trace("[{}] received ping_request from [{}], sending {}", id, requestingNode, multicastPingResponse.pingResponse);
            }
            // connect to the requesting node if needed, then send it the ping response over the transport layer
            if (!transportService.nodeConnected(requestingNode)) {
                // do the connect and send on a thread pool
                threadPool.generic().execute(new Runnable() {
                    @Override
                    public void run() {
                        // connect to the node if possible
                        try {
                            transportService.connectToNode(requestingNode);
                            transportService.sendRequest(requestingNode, ACTION_NAME, multicastPingResponse, new EmptyTransportResponseHandler(ThreadPool.Names.SAME) {
                                @Override
                                public void handleException(TransportException exp) {
                                    logger.warn("failed to receive confirmation on sent ping response to [{}]", exp, requestingNode);
                                }
                            });
                        } catch (Exception e) {
                            if (lifecycle.started()) {
                                logger.warn("failed to connect to requesting node {}", e, requestingNode);
                            }
                        }
                    }
                });
            } else {
                transportService.sendRequest(requestingNode, ACTION_NAME, multicastPingResponse, new EmptyTransportResponseHandler(ThreadPool.Names.SAME) {
                    @Override
                    public void handleException(TransportException exp) {
                        if (lifecycle.started()) {
                            logger.warn("failed to receive confirmation on sent ping response to [{}]", exp, requestingNode);
                        }
                    }
                });
            }
        }
    }
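
  Before any of that runs, onMessage must tell ES-internal binary pings apart from external JSON pings by checking whether the datagram starts with INTERNAL_HEADER. The self-contained sketch below mimics that framing with `DataOutputStream`/`DataInputStream`. The header bytes and the field layout (version, ping id, cluster name, node id) are hypothetical simplifications and do not reproduce ES's real `StreamOutput` wire format, which serializes a full `DiscoveryNode`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Simplified, hypothetical framing for an internal multicast ping.
final class InternalPing {
    static final byte[] HEADER = {1, 9, 8, 4}; // hypothetical magic bytes

    final int version;
    final int id;
    final String clusterName;
    final String nodeId;

    InternalPing(int version, int id, String clusterName, String nodeId) {
        this.version = version;
        this.id = id;
        this.clusterName = clusterName;
        this.nodeId = nodeId;
    }

    byte[] encode() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.write(HEADER);
            out.writeInt(version);     // version first, so a reader can adapt to it
            out.writeInt(id);          // ping round id, echoed back in responses
            out.writeUTF(clusterName);
            out.writeUTF(nodeId);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same test as onMessage: longer than the header and starting with it.
    static boolean isInternal(byte[] data) {
        if (data.length <= HEADER.length) return false;
        for (int i = 0; i < HEADER.length; i++) {
            if (data[i] != HEADER[i]) return false;
        }
        return true;
    }

    // Returns null for non-internal data (ES would then try to parse it as JSON).
    static InternalPing decode(byte[] data) {
        if (!isInternal(data)) return null;
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(data, HEADER.length, data.length - HEADER.length))) {
            return new InternalPing(in.readInt(), in.readInt(), in.readUTF(), in.readUTF());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```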

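  In short: when a node receives a multicast ping from another node, it reads the cluster name, and a matching name means the two belong to the same cluster. For an internal ping, the receiver then replies over the TCP transport layer, connecting to the requesting node first if necessary, which establishes a communication link between the two nodes. For an external (JSON) ping, it replies on the multicast channel instead.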
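
  The checks in handleNodePingRequest form a short decision chain. Condensed into a pure function, it looks roughly like this. `NodeInfo`, `PingDecision` and `NodePingHandler` are hypothetical names, the client/client rule approximates `DiscoveryNode.shouldConnectTo`, and the actual connect and send work is left out.

```java
// Outcomes of the simplified node-ping checks.
enum PingDecision { IGNORE_DISABLED, IGNORE_SELF, IGNORE_OTHER_CLUSTER, IGNORE_BOTH_CLIENTS, RESPOND, CONNECT_THEN_RESPOND }

// Minimal node description: id, cluster name, and whether it is a client (non-master, non-data) node.
final class NodeInfo {
    final String id;
    final String cluster;
    final boolean client;

    NodeInfo(String id, String cluster, boolean client) {
        this.id = id;
        this.cluster = cluster;
        this.client = client;
    }
}

final class NodePingHandler {
    // Hypothetical condensation of handleNodePingRequest's early returns, in the same order.
    static PingDecision decide(boolean pingEnabled, NodeInfo local, NodeInfo requester, boolean alreadyConnected) {
        if (!pingEnabled) {
            return PingDecision.IGNORE_DISABLED;       // ping disabled or no multicast channel
        }
        if (requester.id.equals(local.id)) {
            return PingDecision.IGNORE_SELF;           // our own multicast came back to us
        }
        if (!requester.cluster.equals(local.cluster)) {
            return PingDecision.IGNORE_OTHER_CLUSTER;  // same multicast group, different cluster
        }
        if (local.client && requester.client) {
            return PingDecision.IGNORE_BOTH_CLIENTS;   // two client nodes need no link
        }
        // The reply travels over the TCP transport, not the multicast channel
        return alreadyConnected ? PingDecision.RESPOND : PingDecision.CONNECT_THEN_RESPOND;
    }
}
```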

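  There are many other details we skip. But we have answered, at a high level, how ES discovers cluster nodes automatically: one node sends a multicast message; instances in the same multicast group receive it, read cluster_name to decide whether they belong to the same cluster, and form the cluster network from there.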
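
  The external (JSON) variant follows the same cluster_name rule: a non-ES client multicasts a `request` object, and a node of that cluster answers on the multicast channel with a `response` object. The key handling in handleExternalPingRequest can be condensed as follows. Plain nested `Map`s stand in for the parsed XContent, the reply carries only a trimmed set of fields, and the class name `ExternalPing` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExternalPing {
    // Returns the reply to multicast back, or null when the message must be ignored.
    @SuppressWarnings("unchecked")
    static Map<String, Object> handle(Map<String, Object> message, String ourCluster, String transportAddress) {
        if (message.containsKey("response")) {
            return null; // other nodes' responses travel on the same channel; skip them
        }
        Object rawRequest = message.get("request");
        if (!(rawRequest instanceof Map)) {
            return null; // malformed: no "request" element
        }
        Map<String, Object> request = (Map<String, Object>) rawRequest;
        // Both spellings are accepted, as in the ES code
        Object name = request.containsKey("cluster_name") ? request.get("cluster_name") : request.get("clusterName");
        if (name == null || !name.toString().equals(ourCluster)) {
            return null; // missing or foreign cluster name
        }
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("cluster_name", ourCluster);
        response.put("transport_address", transportAddress);
        Map<String, Object> reply = new LinkedHashMap<>();
        reply.put("response", response);
        return reply;
    }
}
```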

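  We must also keep in mind that multicast messages are unreliable. In scenarios with high reliability requirements, unexpected things tend to happen. This is probably the biggest issue to watch out for when using multicast; weighing the benefits against the risks tells you whether the technique is worth using.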
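
  Because any single UDP datagram may be dropped, a discovery round can be made more tolerant by sending the ping more than once within the ping timeout and merging whatever responses arrive, keyed by node id so that duplicates collapse. The sketch below shows only that merge step; `PingRound`, the reduced `PingResponse` shape and the "latest answer wins" rule are assumptions for illustration, not a description of how ES schedules its pings. Repetition lowers the odds of missing a node but cannot rule it out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PingRound {
    static final class PingResponse {
        final String nodeId;
        final String masterId; // may be null: the responder knows no master yet

        PingResponse(String nodeId, String masterId) {
            this.nodeId = nodeId;
            this.masterId = masterId;
        }
    }

    // Keyed by node id; insertion order records when each node was first heard from.
    private final Map<String, PingResponse> byNode = new LinkedHashMap<>();

    // Called for every response, including duplicates caused by re-sent pings.
    synchronized void onResponse(PingResponse r) {
        byNode.put(r.nodeId, r); // a later answer from the same node replaces the earlier one
    }

    synchronized List<PingResponse> results() {
        return new ArrayList<>(byNode.values());
    }
}
```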

 

4. Multicast in Dubbo

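  Dubbo also uses multicast, though usually only for testing. It implements a registry on top of multicast, in much the same spirit as ES. The biggest advantage of a multicast registry is that no third-party component is needed to build the system, which keeps complex dependencies out of testing. Since both follow the standard MulticastSocket programming pattern, in theory the two should not differ much. Let's verify this with Dubbo's multicast registry implementation: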
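
  The constructor below rejects any registry host outside 224.0.0.0 - 239.255.255.255, the IPv4 multicast (class D) range, i.e. addresses whose first four bits are 1110. Here is a small sketch of such a check, set next to the JDK's own `InetAddress.isMulticastAddress()`. It is not Dubbo's `isMulticastAddress` helper, whose body is not shown here; it handles dotted IPv4 literals only, and `MulticastRange` is a hypothetical name.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class MulticastRange {
    // Class D check on a dotted IPv4 literal: four octets 0-255, first octet in [224, 239].
    static boolean inClassD(String ipv4) {
        String[] parts = ipv4.split("\\.", -1);
        if (parts.length != 4) return false;
        int first = -1;
        for (int i = 0; i < 4; i++) {
            String part = parts[i];
            if (part.isEmpty() || part.length() > 3) return false;
            for (char c : part.toCharArray()) {
                if (c < '0' || c > '9') return false;
            }
            int octet = Integer.parseInt(part);
            if (octet > 255) return false;
            if (i == 0) first = octet;
        }
        return first >= 224 && first <= 239; // 1110xxxx
    }

    // The JDK's answer for the same literal (no DNS lookup for IP literals).
    static boolean jdkSaysMulticast(String ipv4) {
        try {
            return InetAddress.getByName(ipv4).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```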

    // com.alibaba.dubbo.registry.multicast.MulticastRegistry#MulticastRegistry
    public MulticastRegistry(URL url) {
        super(url);
        if (url.isAnyHost()) {
            throw new IllegalStateException("registry address == null");
        }
        if (! isMulticastAddress(url.getHost())) {
            throw new IllegalArgumentException("Invalid multicast address " + url.getHost() + ", scope: 224.0.0.0 - 239.255.255.255");
        }
        try {
            mutilcastAddress = InetAddress.getByName(url.getHost());
            mutilcastPort = url.getPort() <= 0 ? DEFAULT_MULTICAST_PORT : url.getPort();
            mutilcastSocket = new MulticastSocket(mutilcastPort);
            mutilcastSocket.setLoopbackMode(false);
            mutilcastSocket.joinGroup(mutilcastAddress);
            Thread thread = new Thread(new Runnable() {
                public void run() {
                    byte[] buf = new byte[2048];
                    DatagramPacket recv = new DatagramPacket(buf, buf.length);
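                    // wait for multicast messages forever: whenever someone goes online or offline, a message is sent to this multicast address and the current service receives it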
                    while (! mutilcastSocket.isClosed()) {
                        try {
                            mutilcastSocket.receive(recv);
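                            // assumes a received message is at most 2048 bytes; longer datagrams are truncated to the buffer size by receive()
                            // convert the raw bytes to a string message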
                            String msg = new String(recv.getData()).trim();
                            int i = msg.indexOf('\n');
                            // only the first line is used; anything after it is ignored
                            if (i > 0) {
                                msg = msg.substring(0, i).trim();
                            }
                            // multicast message received: notify the application so it can react
                            MulticastRegistry.this.receive(msg, (InetSocketAddress) recv.getSocketAddress());
                            Arrays.fill(buf, (byte)0);
                        } catch (Throwable e) {
                            if (! mutilcastSocket.isClosed()) {
                                logger.error(e.getMessage(), e);
                            }
                        }
                    }
                }
            }, "DubboMulticastRegistryReceiver");
            thread.setDaemon(true);
            thread.start();
        } catch (IOException e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
        this.cleanPeriod = url.getParameter(Constants.SESSION_TIMEOUT_KEY, Constants.DEFAULT_SESSION_TIMEOUT);
        if (url.getParameter("clean", true)) {
            this.cleanFuture = cleanExecutor.scheduleWithFixedDelay(new Runnable() {
                public void run() {
                    try {
                        clean(); // remove expired entries
                    } catch (Throwable t) { // defensive fault tolerance
                        logger.error("Unexpected exception occur at clean expired provider, cause: " + t.getMessage(), t);
                    }
                }
            }, cleanPeriod, cleanPeriod, TimeUnit.MILLISECONDS);
        } else {
            this.cleanFuture = null;
        }
    }
    // handle a multicast message and act on it
    private void receive(String msg, InetSocketAddress remoteAddress) {
        if (logger.isInfoEnabled()) {
            logger.info("Receive multicast message: " + msg + " from " + remoteAddress);
        }
        // register message
        if (msg.startsWith(Constants.REGISTER)) {
            URL url = URL.valueOf(msg.substring(Constants.REGISTER.length()).trim());
            registered(url);
        } 
        // unregister message
        else if (msg.startsWith(Constants.UNREGISTER)) {
            URL url = URL.valueOf(msg.substring(Constants.UNREGISTER.length()).trim());
            unregistered(url);
        } 
        // subscribe message
        else if (msg.startsWith(Constants.SUBSCRIBE)) {
            URL url = URL.valueOf(msg.substring(Constants.SUBSCRIBE.length()).trim());
            Set<URL> urls = getRegistered();
            if (urls != null && urls.size() > 0) {
                for (URL u : urls) {
                    if (UrlUtils.isMatch(url, u)) {
                        String host = remoteAddress != null && remoteAddress.getAddress() != null 
                                ? remoteAddress.getAddress().getHostAddress() : url.getIp();
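                        if (url.getParameter("unicast", true) // whether the consumer's machine runs only one process
                                && ! NetUtils.getLocalHost().equals(host)) { // with several processes on one machine, unicast must not be used, or only one of them would receive the message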
                            // send the register message by unicast
                            unicast(Constants.REGISTER + " " + u.toFullString(), host);
                        } else {
                            // send the register message by multicast
                            broadcast(Constants.REGISTER + " " + u.toFullString());
                        }
                    }
                }
            }
        }/* else if (msg.startsWith(UNSUBSCRIBE)) {
        }*/
    }

    // com.alibaba.dubbo.registry.support.FailbackRegistry#FailbackRegistry
    public FailbackRegistry(URL url) {
        super(url);
        int retryPeriod = url.getParameter(Constants.REGISTRY_RETRY_PERIOD_KEY, Constants.DEFAULT_REGISTRY_RETRY_PERIOD);
        this.retryFuture = retryExecutor.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                // periodically retry operations that failed against the registry
                try {
                    retry();
                } catch (Throwable t) { // defensive fault tolerance
                    logger.error("Unexpected error occur at failed retry, cause: " + t.getMessage(), t);
                }
            }
        }, retryPeriod, retryPeriod, TimeUnit.MILLISECONDS);
    }
    // send a message to the multicast group
    private void broadcast(String msg) {
        if (logger.isInfoEnabled()) {
            logger.info("Send broadcast message: " + msg + " to " + mutilcastAddress + ":" + mutilcastPort);
        }
        try {
            byte[] data = (msg + "\n").getBytes();
            DatagramPacket hi = new DatagramPacket(data, data.length, mutilcastAddress, mutilcastPort);
            mutilcastSocket.send(hi);
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }
    // send a message by unicast to a single host
    private void unicast(String msg, String host) {
        if (logger.isInfoEnabled()) {
            logger.info("Send unicast message: " + msg + " to " + host + ":" + mutilcastPort);
        }
        try {
            byte[] data = (msg + "\n").getBytes();
            DatagramPacket hi = new DatagramPacket(data, data.length, InetAddress.getByName(host), mutilcastPort);
            mutilcastSocket.send(hi);
        } catch (Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
        }
    }

    // More of the implementation
    
    protected void doRegister(URL url) {
        broadcast(Constants.REGISTER + " " + url.toFullString());
    }

    protected void doUnregister(URL url) {
        broadcast(Constants.UNREGISTER + " " + url.toFullString());
    }

    protected void doSubscribe(URL url, NotifyListener listener) {
        if (Constants.ANY_VALUE.equals(url.getServiceInterface())) {
            admin = true;
        }
        broadcast(Constants.SUBSCRIBE + " " + url.toFullString());
        synchronized (listener) {
            try {
                listener.wait(url.getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
            }
        }
    }

    protected void doUnsubscribe(URL url, NotifyListener listener) {
        if (! Constants.ANY_VALUE.equals(url.getServiceInterface())
                && url.getParameter(Constants.REGISTER_KEY, true)) {
            unregister(url);
        }
        broadcast(Constants.UNSUBSCRIBE + " " + url.toFullString());
    }

    public boolean isAvailable() {
        try {
            return mutilcastSocket != null;
        } catch (Throwable t) {
            return false;
        }
    }

    public void destroy() {
        super.destroy();
        try {
            if (cleanFuture != null) {
                cleanFuture.cancel(true);
            }
        } catch (Throwable t) {
            logger.warn(t.getMessage(), t);
        }
        try {
            mutilcastSocket.leaveGroup(mutilcastAddress);
            mutilcastSocket.close();
        } catch (Throwable t) {
            logger.warn(t.getMessage(), t);
        }
    }

    protected void registered(URL url) {
        for (Map.Entry<URL, Set<NotifyListener>> entry : getSubscribed().entrySet()) {
            URL key = entry.getKey();
            if (UrlUtils.isMatch(key, url)) {
                Set<URL> urls = received.get(key);
                if (urls == null) {
                    received.putIfAbsent(key, new ConcurrentHashSet<URL>());
                    urls = received.get(key);
                }
                urls.add(url);
                List<URL> list = toList(urls);
                for (NotifyListener listener : entry.getValue()) {
                    notify(key, listener, list);
                    synchronized (listener) {
                        listener.notify();
                    }
                }
            }
        }
    }

    protected void unregistered(URL url) {
        for (Map.Entry<URL, Set<NotifyListener>> entry : getSubscribed().entrySet()) {
            URL key = entry.getKey();
            if (UrlUtils.isMatch(key, url)) {
                Set<URL> urls = received.get(key);
                if (urls != null) {
                    urls.remove(url);
                }
                List<URL> list = toList(urls);
                for (NotifyListener listener : entry.getValue()) {
                    notify(key, listener, list);
                }
            }
        }
    }

    protected void subscribed(URL url, NotifyListener listener) {
        List<URL> urls = lookup(url);
        notify(url, listener, urls);
    }

    private List<URL> toList(Set<URL> urls) {
        List<URL> list = new ArrayList<URL>();
        if (urls != null && urls.size() > 0) {
            for (URL url : urls) {
                list.add(url);
            }
        }
        return list;
    }

    public void register(URL url) {
        super.register(url);
        registered(url);
    }

    public void unregister(URL url) {
        super.unregister(url);
        unregistered(url);
    }

    public void subscribe(URL url, NotifyListener listener) {
        super.subscribe(url, listener);
        subscribed(url, listener);
    }

    public void unsubscribe(URL url, NotifyListener listener) {
        super.unsubscribe(url, listener);
        received.remove(url);
    }

    public List<URL> lookup(URL url) {
        List<URL> urls= new ArrayList<URL>();
        Map<String, List<URL>> notifiedUrls = getNotified().get(url);
        if (notifiedUrls != null && notifiedUrls.size() > 0) {
            for (List<URL> values : notifiedUrls.values()) {
                urls.addAll(values);
            }
        }
        if (urls == null || urls.size() == 0) {
            List<URL> cacheUrls = getCacheUrls(url);
            if (cacheUrls != null && cacheUrls.size() > 0) {
                urls.addAll(cacheUrls);
            }
        }
        if (urls == null || urls.size() == 0) {
            for (URL u: getRegistered()) {
                if (UrlUtils.isMatch(url, u)) {
                    urls.add(u);
                }
            }
        }
        if (Constants.ANY_VALUE.equals(url.getServiceInterface())) {
            for (URL u: getSubscribed().keySet()) {
                if (UrlUtils.isMatch(url, u)) {
                    urls.add(u);
                }
            }
        }
        return urls;
    }

    public MulticastSocket getMutilcastSocket() {
        return mutilcastSocket;
    }

    public Map<URL, Set<URL>> getReceived() {
        return received;
    }
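Both `broadcast` and `unicast` above put the same payload on the wire: a command word, a space, the full URL and a trailing newline, packed into one UDP datagram; the only difference is whether the destination is the multicast group or a single host. The sketch below isolates that framing so it can be tried without Dubbo. It is a minimal illustration under assumptions: the class `DiscoveryMessage`, its methods, and the explicit UTF-8 encoding (the original calls `getBytes()` with the platform default charset) are hypothetical choices, not Dubbo's API.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the "<command> <url>\n" framing used by broadcast()/unicast() above.
final class DiscoveryMessage {
    final String command; // e.g. "register" or "subscribe" (literal values assumed here)
    final String url;     // full service URL, as produced by URL.toFullString() in the listing

    DiscoveryMessage(String command, String url) {
        this.command = command;
        this.url = url;
    }

    // Encode as a single line terminated by '\n', using UTF-8 explicitly.
    byte[] encode() {
        return (command + " " + url + "\n").getBytes(StandardCharsets.UTF_8);
    }

    // Decode the first line of a received datagram; returns null for blank or malformed payloads.
    // trim() also strips the zero padding left in an oversized receive buffer.
    static DiscoveryMessage decode(byte[] data, int length) {
        String text = new String(data, 0, length, StandardCharsets.UTF_8).trim();
        int newline = text.indexOf('\n');
        if (newline >= 0) {
            text = text.substring(0, newline).trim();
        }
        int space = text.indexOf(' ');
        if (space <= 0) {
            return null;
        }
        return new DiscoveryMessage(text.substring(0, space), text.substring(space + 1).trim());
    }

    // Send one datagram. Sending to a multicast group address delivers it to the sockets that
    // joined that group; an ordinary host address behaves like unicast() above.
    void sendTo(DatagramSocket socket, InetAddress target, int port) throws IOException {
        byte[] data = encode();
        socket.send(new DatagramPacket(data, data.length, target, port));
    }
}
```

A node could announce itself with `new DiscoveryMessage("register", url).sendTo(socket, InetAddress.getByName("224.5.6.7"), 1234)`, where the group and port merely reuse the sample values from the ES multicast configuration earlier in the article; receiving requires a `MulticastSocket` bound to that port that has joined the group.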
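`doSubscribe` broadcasts a `subscribe` message and then blocks on `listener.wait(timeout)`, while `registered` calls `listener.notify()` once a matching `register` reply arrives. The sketch below reproduces that hand-off without Dubbo, assuming plain string service keys instead of `UrlUtils.isMatch` URL matching; the class `ProviderBook` and its methods are hypothetical. It deliberately swaps the bare `wait`/`notify` for a `java.util.concurrent.CountDownLatch`: a `notify()` that fires before the subscriber reaches `wait()` is lost, so the subscriber sleeps for the full timeout, whereas a latch that has already been counted down lets a late `await` return at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, Dubbo-free sketch of the subscribe/registered hand-off in the listing above.
final class ProviderBook {
    // service key -> provider URLs seen so far (plays the role of the `received` map)
    private final Map<String, Set<String>> received = new ConcurrentHashMap<>();
    // service key -> latch opened by the first provider registration
    private final Map<String, CountDownLatch> firstArrival = new ConcurrentHashMap<>();

    private CountDownLatch latchFor(String service) {
        return firstArrival.computeIfAbsent(service, k -> new CountDownLatch(1));
    }

    // Called when a "register <url>" message for `service` arrives (cf. registered(URL)).
    void registered(String service, String providerUrl) {
        received.computeIfAbsent(service, k -> ConcurrentHashMap.newKeySet()).add(providerUrl);
        latchFor(service).countDown(); // wakes a waiting subscriber, or pre-opens the latch
    }

    // Called when an "unregister <url>" message arrives (cf. unregistered(URL)).
    void unregistered(String service, String providerUrl) {
        Set<String> urls = received.get(service);
        if (urls != null) {
            urls.remove(providerUrl);
        }
    }

    // After broadcasting "subscribe", wait until the first provider appears or the timeout
    // passes (cf. doSubscribe's listener.wait(timeout)); returns the providers known then.
    List<String> awaitProviders(String service, long timeoutMillis) {
        try {
            latchFor(service).await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        }
        Set<String> urls = received.get(service);
        return urls == null ? new ArrayList<>() : new ArrayList<>(urls);
    }
}
```

The latch is one-shot per service, which matches the intent of waiting only for the first provider; `unregistered` shrinks the set but never re-arms the latch.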
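`lookup` consults its sources in a fixed order and stops at the first one that yields anything: URLs already pushed through notifications, then the local cache (`getCacheUrls`), then the URLs this node registered itself. A minimal sketch of that fallback order follows, using hypothetical names (`ProviderLookup`, `firstNonEmpty`) and plain string URLs; the extra step that appends subscribed URLs for `*` (admin) queries is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the fallback order in lookup() above:
// notified URLs -> cached URLs -> locally registered URLs; the first non-empty source wins.
final class ProviderLookup {
    private ProviderLookup() {
    }

    @SafeVarargs
    static List<String> firstNonEmpty(Supplier<List<String>>... sources) {
        for (Supplier<List<String>> source : sources) {
            List<String> urls = source.get(); // later sources are only evaluated if needed
            if (urls != null && !urls.isEmpty()) {
                return new ArrayList<>(urls); // fresh list, as lookup() builds its own
            }
        }
        return new ArrayList<>();
    }
}
```

Passing each source as a `Supplier` keeps the later sources lazy, mirroring `lookup`, which only reads the cache when nothing has been notified.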