A Look at Elasticsearch's NodesFaultDetection

Posted by go4it on 2019-05-10

This article looks at Elasticsearch's NodesFaultDetection.

NodesFaultDetection

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/fd/NodesFaultDetection.java

public class NodesFaultDetection extends AbstractComponent {

    public static interface Listener {

        void onNodeFailure(DiscoveryNode node, String reason);
    }

    private final ThreadPool threadPool;

    private final TransportService transportService;


    private final boolean connectOnNetworkDisconnect;

    private final TimeValue pingInterval;

    private final TimeValue pingRetryTimeout;

    private final int pingRetryCount;

    // used mainly for testing, should always be true
    private final boolean registerConnectionListener;


    private final CopyOnWriteArrayList<Listener> listeners = new CopyOnWriteArrayList<Listener>();

    private final ConcurrentMap<DiscoveryNode, NodeFD> nodesFD = newConcurrentMap();

    private final FDConnectionListener connectionListener;

    private volatile DiscoveryNodes latestNodes = EMPTY_NODES;

    private volatile boolean running = false;

    public NodesFaultDetection(Settings settings, ThreadPool threadPool, TransportService transportService) {
        super(settings);
        this.threadPool = threadPool;
        this.transportService = transportService;

        this.connectOnNetworkDisconnect = componentSettings.getAsBoolean("connect_on_network_disconnect", true);
        this.pingInterval = componentSettings.getAsTime("ping_interval", timeValueSeconds(1));
        this.pingRetryTimeout = componentSettings.getAsTime("ping_timeout", timeValueSeconds(30));
        this.pingRetryCount = componentSettings.getAsInt("ping_retries", 3);
        this.registerConnectionListener = componentSettings.getAsBoolean("register_connection_listener", true);

        logger.debug("[node  ] uses ping_interval [{}], ping_timeout [{}], ping_retries [{}]", pingInterval, pingRetryTimeout, pingRetryCount);

        transportService.registerHandler(PingRequestHandler.ACTION, new PingRequestHandler());

        this.connectionListener = new FDConnectionListener();
        if (registerConnectionListener) {
            transportService.addConnectionListener(connectionListener);
        }
    }

    public NodesFaultDetection start() {
        if (running) {
            return this;
        }
        running = true;
        return this;
    }

    public NodesFaultDetection stop() {
        if (!running) {
            return this;
        }
        running = false;
        return this;
    }

    public void close() {
        stop();
        transportService.removeHandler(PingRequestHandler.ACTION);
        transportService.removeConnectionListener(connectionListener);
    }

    //......
}
  • NodesFaultDetection extends AbstractComponent. It declares a CopyOnWriteArrayList of listeners, a ConcurrentMap named nodesFD, and the fields connectionListener, latestNodes, and running.
  • The constructor reads the settings connect_on_network_disconnect (default true), ping_interval (default 1s), ping_timeout (default 30s), ping_retries (default 3), and register_connection_listener (default true). It then registers a PingRequestHandler with transportService under PingRequestHandler.ACTION and, when register_connection_listener is true, adds an FDConnectionListener.
  • start sets running to true and stop sets it to false. close calls stop first, then removes the PingRequestHandler.ACTION handler and the connectionListener from transportService.
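To get a feel for what the default settings mean in practice, here is a rough worked calculation. It assumes the retry behaviour of the ping loop in the source (a failed ping is resent immediately and counted against ping_retries) and a node that simply stops answering, so every attempt runs into ping_timeout. The class and method names are hypothetical and not part of Elasticsearch.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; not part of Elasticsearch.
final class FaultDetectionTimingSketch {

    // Upper bound for a node that goes silent right after a successful ping:
    // wait ping_interval until the next ping is sent, then ping_retries
    // attempts that each wait for the full ping_timeout.
    static Duration worstCaseDetection(Duration pingInterval, Duration pingTimeout, int pingRetries) {
        return pingInterval.plus(pingTimeout.multipliedBy(pingRetries));
    }

    public static void main(String[] args) {
        Duration d = worstCaseDetection(Duration.ofSeconds(1), Duration.ofSeconds(30), 3);
        System.out.println(d.getSeconds() + "s"); // prints "91s"
    }
}
```

With the defaults this comes to roughly a minute and a half. Failures that surface as an exception before the timeout, or as a transport disconnect, would be noticed sooner.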

PingRequestHandler

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/fd/NodesFaultDetection.java

    class PingRequestHandler extends BaseTransportRequestHandler<PingRequest> {

        public static final String ACTION = "discovery/zen/fd/ping";

        @Override
        public PingRequest newInstance() {
            return new PingRequest();
        }

        @Override
        public void messageReceived(PingRequest request, TransportChannel channel) throws Exception {
            // if we are not the node we are supposed to be pinged, send an exception
            // this can happen when a kill -9 is sent, and another node is started using the same port
            if (!latestNodes.localNodeId().equals(request.nodeId)) {
                throw new ElasticSearchIllegalStateException("Got pinged as node [" + request.nodeId + "], but I am node [" + latestNodes.localNodeId() + "]");
            }
            channel.sendResponse(new PingResponse());
        }

        @Override
        public String executor() {
            return ThreadPool.Names.SAME;
        }
    }

    static class PingRequest extends TransportRequest {

        // the (assumed) node id we are pinging
        private String nodeId;

        PingRequest() {
        }

        PingRequest(String nodeId) {
            this.nodeId = nodeId;
        }

        @Override
        public void readFrom(StreamInput in) throws IOException {
            super.readFrom(in);
            nodeId = in.readString();
        }

        @Override
        public void writeTo(StreamOutput out) throws IOException {
            super.writeTo(out);
            out.writeString(nodeId);
        }
    }

    private static class PingResponse extends TransportResponse {

        private PingResponse() {
        }

        @Override
        public void readFrom(StreamInput in) throws IOException {
            super.readFrom(in);
        }

        @Override
        public void writeTo(StreamOutput out) throws IOException {
            super.writeTo(out);
        }
    }
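  • PingRequestHandler's newInstance method creates a PingRequest, whose nodeId field identifies the node the ping is meant for. messageReceived handles an incoming PingRequest: it checks whether the requested nodeId matches localNodeId, replies with a PingResponse if it does, and throws ElasticSearchIllegalStateException otherwise (for example when a node was killed with kill -9 and another node was started on the same port).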
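The node id check can be illustrated without any transport code. The following is a minimal sketch, not the Elasticsearch implementation: PingCheckSketch and handlePing are hypothetical names, plain strings stand in for node ids, and IllegalStateException stands in for ElasticSearchIllegalStateException.

```java
// Hypothetical stand-in for the receiving side of a ping; illustration only.
final class PingCheckSketch {

    private final String localNodeId;

    PingCheckSketch(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    // Answers only if the sender pinged the node id we currently have.
    String handlePing(String requestedNodeId) {
        if (!localNodeId.equals(requestedNodeId)) {
            throw new IllegalStateException(
                    "Got pinged as node [" + requestedNodeId + "], but I am node [" + localNodeId + "]");
        }
        return "pong";
    }

    public static void main(String[] args) {
        PingCheckSketch restarted = new PingCheckSketch("new-id");
        try {
            restarted.handlePing("old-id"); // sender still believes the old process is alive
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the rejection is an exception rather than a silent success, the sender counts it as a failed ping instead of mistaking the new process for the old node.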

FDConnectionListener

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/fd/NodesFaultDetection.java

    private class FDConnectionListener implements TransportConnectionListener {
        @Override
        public void onNodeConnected(DiscoveryNode node) {
        }

        @Override
        public void onNodeDisconnected(DiscoveryNode node) {
            handleTransportDisconnect(node);
        }
    }

    private void handleTransportDisconnect(DiscoveryNode node) {
        if (!latestNodes.nodeExists(node.id())) {
            return;
        }
        NodeFD nodeFD = nodesFD.remove(node);
        if (nodeFD == null) {
            return;
        }
        if (!running) {
            return;
        }
        nodeFD.running = false;
        if (connectOnNetworkDisconnect) {
            try {
                transportService.connectToNode(node);
                nodesFD.put(node, new NodeFD());
                threadPool.schedule(pingInterval, ThreadPool.Names.SAME, new SendPingRequest(node));
            } catch (Exception e) {
                logger.trace("[node  ] [{}] transport disconnected (with verified connect)", node);
                notifyNodeFailure(node, "transport disconnected (with verified connect)");
            }
        } else {
            logger.trace("[node  ] [{}] transport disconnected", node);
            notifyNodeFailure(node, "transport disconnected");
        }
    }

    private void notifyNodeFailure(final DiscoveryNode node, final String reason) {
        threadPool.generic().execute(new Runnable() {
            @Override
            public void run() {
                for (Listener listener : listeners) {
                    listener.onNodeFailure(node, reason);
                }
            }
        });
    }
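  • FDConnectionListener calls handleTransportDisconnect from onNodeDisconnected. That method returns early if the node is not in latestNodes; otherwise it removes the node from nodesFD and, if fault detection is running, sets the removed NodeFD's running flag to false.
  • If connectOnNetworkDisconnect is true, it tries to connect to the node again. On success it puts a fresh NodeFD into nodesFD and schedules a SendPingRequest for that node to run after pingInterval. If the connect throws, or if connectOnNetworkDisconnect is false, it calls notifyNodeFailure.
  • notifyNodeFailure invokes NodesFaultDetection.Listener.onNodeFailure on every registered listener, using the generic thread pool; here that is the onNodeFailure method of ZenDiscovery's NodeFailureListener.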
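The decision made on a disconnect can be summarised as a small state transition. The sketch below is a simplification under stated assumptions: node ids are plain strings, the Connector interface is a hypothetical replacement for transportService.connectToNode, and the returned Outcome replaces the scheduling and listener calls of the real code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the disconnect handling; illustration only.
final class DisconnectSketch {

    enum Outcome { IGNORED, RESUMED_PINGING, NODE_FAILED }

    // Stand-in for a reconnect attempt that may throw.
    interface Connector {
        void connect(String nodeId) throws Exception;
    }

    private final Set<String> monitored = new HashSet<>();
    private final boolean reconnectOnDisconnect;
    private final Connector connector;

    DisconnectSketch(boolean reconnectOnDisconnect, Connector connector) {
        this.reconnectOnDisconnect = reconnectOnDisconnect;
        this.connector = connector;
    }

    void monitor(String nodeId) {
        monitored.add(nodeId);
    }

    Outcome onDisconnect(String nodeId) {
        if (!monitored.remove(nodeId)) {
            return Outcome.IGNORED;            // we were not watching this node
        }
        if (!reconnectOnDisconnect) {
            return Outcome.NODE_FAILED;        // report the failure right away
        }
        try {
            connector.connect(nodeId);         // exactly one verification attempt
            monitored.add(nodeId);             // back under fault detection
            return Outcome.RESUMED_PINGING;
        } catch (Exception e) {
            return Outcome.NODE_FAILED;        // reconnect failed, report the failure
        }
    }
}
```

The single reconnect attempt filters out brief connection drops, while a node that is really gone is still reported without waiting for ping timeouts.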

ZenDiscovery

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/ZenDiscovery.java

public class ZenDiscovery extends AbstractLifecycleComponent<Discovery> implements Discovery, DiscoveryNodesProvider {
	//......

    @Inject
    public ZenDiscovery(Settings settings, ClusterName clusterName, ThreadPool threadPool,
                        TransportService transportService, ClusterService clusterService, NodeSettingsService nodeSettingsService,
                        DiscoveryNodeService discoveryNodeService, ZenPingService pingService) {
        super(settings);
        this.clusterName = clusterName;
        this.threadPool = threadPool;
        this.clusterService = clusterService;
        this.transportService = transportService;
        this.discoveryNodeService = discoveryNodeService;
        this.pingService = pingService;

        // also support direct discovery.zen settings, for cases when it gets extended
        this.pingTimeout = settings.getAsTime("discovery.zen.ping.timeout", settings.getAsTime("discovery.zen.ping_timeout", componentSettings.getAsTime("ping_timeout", componentSettings.getAsTime("initial_ping_timeout", timeValueSeconds(3)))));
        this.sendLeaveRequest = componentSettings.getAsBoolean("send_leave_request", true);

        this.masterElectionFilterClientNodes = settings.getAsBoolean("discovery.zen.master_election.filter_client", true);
        this.masterElectionFilterDataNodes = settings.getAsBoolean("discovery.zen.master_election.filter_data", false);

        logger.debug("using ping.timeout [{}], master_election.filter_client [{}], master_election.filter_data [{}]", pingTimeout, masterElectionFilterClientNodes, masterElectionFilterDataNodes);

        this.electMaster = new ElectMasterService(settings);
        nodeSettingsService.addListener(new ApplySettings());

        this.masterFD = new MasterFaultDetection(settings, threadPool, transportService, this);
        this.masterFD.addListener(new MasterNodeFailureListener());

        this.nodesFD = new NodesFaultDetection(settings, threadPool, transportService);
        this.nodesFD.addListener(new NodeFailureListener());

        this.publishClusterState = new PublishClusterStateAction(settings, transportService, this, new NewClusterStateListener());
        this.pingService.setNodesProvider(this);
        this.membership = new MembershipAction(settings, transportService, this, new MembershipListener());

        transportService.registerHandler(RejoinClusterRequestHandler.ACTION, new RejoinClusterRequestHandler());
    }

    protected void doStart() throws ElasticSearchException {
        Map<String, String> nodeAttributes = discoveryNodeService.buildAttributes();
        // note, we rely on the fact that its a new id each time we start, see FD and "kill -9" handling
        String nodeId = UUID.randomBase64UUID();
        localNode = new DiscoveryNode(settings.get("name"), nodeId, transportService.boundAddress().publishAddress(), nodeAttributes);
        latestDiscoNodes = new DiscoveryNodes.Builder().put(localNode).localNodeId(localNode.id()).build();
        nodesFD.updateNodes(latestDiscoNodes);
        pingService.start();

        // do the join on a different thread, the DiscoveryService waits for 30s anyhow till it is discovered
        asyncJoinCluster();
    }

    public void publish(ClusterState clusterState) {
        if (!master) {
            throw new ElasticSearchIllegalStateException("Shouldn't publish state when not master");
        }
        latestDiscoNodes = clusterState.nodes();
        nodesFD.updateNodes(clusterState.nodes());
        publishClusterState.publish(clusterState);
    }

    private class NodeFailureListener implements NodesFaultDetection.Listener {

        @Override
        public void onNodeFailure(DiscoveryNode node, String reason) {
            handleNodeFailure(node, reason);
        }
    }

    private void handleNodeFailure(final DiscoveryNode node, String reason) {
        if (lifecycleState() != Lifecycle.State.STARTED) {
            // not started, ignore a node failure
            return;
        }
        if (!master) {
            // nothing to do here...
            return;
        }
        clusterService.submitStateUpdateTask("zen-disco-node_failed(" + node + "), reason " + reason, new ProcessedClusterStateUpdateTask() {
            @Override
            public ClusterState execute(ClusterState currentState) {
                DiscoveryNodes.Builder builder = new DiscoveryNodes.Builder()
                        .putAll(currentState.nodes())
                        .remove(node.id());
                latestDiscoNodes = builder.build();
                currentState = newClusterStateBuilder().state(currentState).nodes(latestDiscoNodes).build();
                // check if we have enough master nodes, if not, we need to move into joining the cluster again
                if (!electMaster.hasEnoughMasterNodes(currentState.nodes())) {
                    return rejoin(currentState, "not enough master nodes");
                }
                // eagerly run reroute to remove dead nodes from routing table
                RoutingAllocation.Result routingResult = allocationService.reroute(newClusterStateBuilder().state(currentState).build());
                return newClusterStateBuilder().state(currentState).routingResult(routingResult).build();
            }

            @Override
            public void clusterStateProcessed(ClusterState clusterState) {
                sendInitialStateEventIfNeeded();
            }
        });
    }

	//......
}
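  • The ZenDiscovery constructor creates the NodesFaultDetection and adds a NodeFailureListener to it. The listener implements NodesFaultDetection.Listener, and its onNodeFailure callback delegates to handleNodeFailure. That method returns immediately unless the discovery is started and the local node is the master. Otherwise it submits a ProcessedClusterStateUpdateTask that removes the node from currentState.nodes() and then checks whether the number of master nodes still satisfies minimumMasterNodes: if not, it calls rejoin; if so, it calls allocationService.reroute.
  • doStart builds localNode from the node's settings with a freshly generated node id, puts it into latestDiscoNodes, calls nodesFD.updateNodes(latestDiscoNodes), and then calls pingService.start() and asyncJoinCluster().
  • publish updates the local latestDiscoNodes from clusterState.nodes(), calls nodesFD.updateNodes with the same nodes, and finally calls publishClusterState.publish(clusterState).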
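The core of handleNodeFailure, removing the node and then choosing between rejoin and reroute, can be sketched on plain collections. This is an illustration under assumptions: the map from node id to a master-eligible flag is a hypothetical stand-in for DiscoveryNodes, and the check is modelled as a simple count compared with minimumMasterNodes, which may differ in detail from what ElectMasterService does.

```java
import java.util.Map;

// Hypothetical model of the master-side reaction to a node failure; illustration only.
final class NodeFailureSketch {

    enum Action { REJOIN, REROUTE }

    // Removes the failed node, then decides what the master does next.
    static Action onNodeFailure(Map<String, Boolean> masterEligibleByNodeId,
                                String failedNodeId,
                                int minimumMasterNodes) {
        masterEligibleByNodeId.remove(failedNodeId);
        long masters = masterEligibleByNodeId.values().stream()
                .filter(Boolean::booleanValue)
                .count();
        // Assumed rule: enough master-eligible nodes left means we stay master
        // and reroute; otherwise we go back to joining the cluster.
        return masters >= minimumMasterNodes ? Action.REROUTE : Action.REJOIN;
    }
}
```

For example, with two master-eligible nodes, one data-only node, and a minimum of 2, losing the data-only node leads to REROUTE, while losing a master-eligible node leads to REJOIN.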

NodesFaultDetection.updateNodes

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/fd/NodesFaultDetection.java

public class NodesFaultDetection extends AbstractComponent {
	//......

    public void updateNodes(DiscoveryNodes nodes) {
        DiscoveryNodes prevNodes = latestNodes;
        this.latestNodes = nodes;
        if (!running) {
            return;
        }
        DiscoveryNodes.Delta delta = nodes.delta(prevNodes);
        for (DiscoveryNode newNode : delta.addedNodes()) {
            if (newNode.id().equals(nodes.localNodeId())) {
                // no need to monitor the local node
                continue;
            }
            if (!nodesFD.containsKey(newNode)) {
                nodesFD.put(newNode, new NodeFD());
                threadPool.schedule(pingInterval, ThreadPool.Names.SAME, new SendPingRequest(newNode));
            }
        }
        for (DiscoveryNode removedNode : delta.removedNodes()) {
            nodesFD.remove(removedNode);
        }
    }

	//......
}
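  • NodesFaultDetection provides updateNodes to update its latestNodes. The method calls nodes.delta(prevNodes) to compute a DiscoveryNodes.Delta, whose addedNodes() method returns the newly added nodes and whose removedNodes() method returns the removed ones.
  • For each added node other than the local node, it checks whether the node is already in nodesFD; if not, it adds the node to nodesFD and schedules a SendPingRequest to run after pingInterval.
  • Each removed node is removed from nodesFD. handleTransportDisconnect also removes a disconnected node from nodesFD, and puts it back if its single reconnect attempt succeeds.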
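The delta handling can be mimicked with ordinary sets. In this minimal sketch, node ids are strings, UpdateNodesSketch is a hypothetical class, and the returned list of node ids stands in for the SendPingRequest tasks that the real code hands to the thread pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical set-based model of updateNodes; illustration only.
final class UpdateNodesSketch {

    private final String localNodeId;
    private final Set<String> monitored = new TreeSet<>();
    private Set<String> latest = new TreeSet<>();

    UpdateNodesSketch(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    // Returns the node ids for which a first ping would be scheduled.
    List<String> update(Set<String> nodes) {
        Set<String> prev = latest;
        latest = new TreeSet<>(nodes);
        List<String> scheduled = new ArrayList<>();
        for (String id : latest) {
            if (prev.contains(id) || id.equals(localNodeId)) {
                continue;                      // not newly added, or the local node
            }
            if (monitored.add(id)) {
                scheduled.add(id);             // start monitoring and pinging
            }
        }
        for (String id : prev) {
            if (!latest.contains(id)) {
                monitored.remove(id);          // node left the cluster state
            }
        }
        return scheduled;
    }

    Set<String> monitored() {
        return new TreeSet<>(monitored);
    }
}
```

Only the difference between two consecutive node sets triggers work, so repeated cluster state updates with an unchanged node list schedule nothing new.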

SendPingRequest

elasticsearch-0.90.0/src/main/java/org/elasticsearch/discovery/zen/fd/NodesFaultDetection.java

    private class SendPingRequest implements Runnable {

        private final DiscoveryNode node;

        private SendPingRequest(DiscoveryNode node) {
            this.node = node;
        }

        @Override
        public void run() {
            if (!running) {
                return;
            }
            transportService.sendRequest(node, PingRequestHandler.ACTION, new PingRequest(node.id()), options().withHighType().withTimeout(pingRetryTimeout),
                    new BaseTransportResponseHandler<PingResponse>() {
                        @Override
                        public PingResponse newInstance() {
                            return new PingResponse();
                        }

                        @Override
                        public void handleResponse(PingResponse response) {
                            if (!running) {
                                return;
                            }
                            NodeFD nodeFD = nodesFD.get(node);
                            if (nodeFD != null) {
                                if (!nodeFD.running) {
                                    return;
                                }
                                nodeFD.retryCount = 0;
                                threadPool.schedule(pingInterval, ThreadPool.Names.SAME, SendPingRequest.this);
                            }
                        }

                        @Override
                        public void handleException(TransportException exp) {
                            // check if the master node did not get switched on us...
                            if (!running) {
                                return;
                            }
                            if (exp instanceof ConnectTransportException) {
                                // ignore this one, we already handle it by registering a connection listener
                                return;
                            }
                            NodeFD nodeFD = nodesFD.get(node);
                            if (nodeFD != null) {
                                if (!nodeFD.running) {
                                    return;
                                }
                                int retryCount = ++nodeFD.retryCount;
                                logger.trace("[node  ] failed to ping [{}], retry [{}] out of [{}]", exp, node, retryCount, pingRetryCount);
                                if (retryCount >= pingRetryCount) {
                                    logger.debug("[node  ] failed to ping [{}], tried [{}] times, each with  maximum [{}] timeout", node, pingRetryCount, pingRetryTimeout);
                                    // not good, failure
                                    if (nodesFD.remove(node) != null) {
                                        notifyNodeFailure(node, "failed to ping, tried [" + pingRetryCount + "] times, each with maximum [" + pingRetryTimeout + "] timeout");
                                    }
                                } else {
                                    // resend the request, not reschedule, rely on send timeout
                                    transportService.sendRequest(node, PingRequestHandler.ACTION, new PingRequest(node.id()),
                                            options().withHighType().withTimeout(pingRetryTimeout), this);
                                }
                            }
                        }

                        @Override
                        public String executor() {
                            return ThreadPool.Names.SAME;
                        }
                    });
        }
    }
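  • SendPingRequest's run method sends a PingRequest to the target node with a timeout of pingRetryTimeout. handleResponse checks whether the node is still in nodesFD: if it has been removed, or if its NodeFD's running flag is false, the response is ignored; otherwise it resets retryCount and schedules the SendPingRequest again after pingInterval.
  • When the request fails with a TransportException, handleException first checks whether it is a ConnectTransportException. If so, it is ignored, because that case is already handled by onNodeDisconnected of the FDConnectionListener registered with transportService.
  • For any other exception it increments nodeFD.retryCount. Once retryCount is greater than or equal to pingRetryCount, it removes the node from nodesFD and calls notifyNodeFailure, which ends up in ZenDiscovery's handleNodeFailure. Below that limit it retries by resending the PingRequest immediately.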
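The retry bookkeeping for a single node boils down to a counter with three possible reactions. The sketch below models only that counter: PingRetrySketch and its Next enum are hypothetical names, and a boolean flag stands in for the instanceof ConnectTransportException check.

```java
// Hypothetical per-node retry counter; illustration only.
final class PingRetrySketch {

    enum Next { SCHEDULE_NEXT_PING, RESEND_NOW, NODE_FAILED, IGNORE }

    private final int pingRetries;
    private int retryCount;

    PingRetrySketch(int pingRetries) {
        this.pingRetries = pingRetries;
    }

    // A successful ping clears the counter; the next ping follows after ping_interval.
    Next onResponse() {
        retryCount = 0;
        return Next.SCHEDULE_NEXT_PING;
    }

    // A failed ping counts against the limit unless it is a connect failure,
    // which is left to the connection listener.
    Next onException(boolean connectException) {
        if (connectException) {
            return Next.IGNORE;
        }
        retryCount++;
        return retryCount >= pingRetries ? Next.NODE_FAILED : Next.RESEND_NOW;
    }
}
```

With the default of 3 retries, two consecutive failures lead to immediate resends and the third one reports the node as failed; any successful response in between starts the count over.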

Summary

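  • NodesFaultDetection registers a PingRequestHandler with transportService under PingRequestHandler.ACTION and adds an FDConnectionListener. PingRequestHandler answers PingRequest messages with a PingResponse; FDConnectionListener handles transport disconnects, which is why the ping loop itself ignores ConnectTransportException.
  • FDConnectionListener's onNodeDisconnected removes the node from nodesFD and sets its NodeFD's running flag to false. If connectOnNetworkDisconnect is true, it retries once: it connects to the node and, on success, puts it back into nodesFD and schedules a SendPingRequest to run after pingInterval. If the connect throws, or if connectOnNetworkDisconnect is false, it calls notifyNodeFailure, which triggers the NodesFaultDetection.Listener.onNodeFailure callback, here the onNodeFailure method of ZenDiscovery's NodeFailureListener.
  • ZenDiscovery's doStart and publish methods both call NodesFaultDetection's updateNodes to update latestNodes; for each new node a delayed SendPingRequest task is scheduled.
  • When a SendPingRequest succeeds, retryCount is reset and the next SendPingRequest is scheduled. When it fails with anything other than a ConnectTransportException, the ping is retried; once the retry limit is reached, notifyNodeFailure is triggered and calls NodesFaultDetection.Listener.onNodeFailure, here the onNodeFailure method of ZenDiscovery's NodeFailureListener.
  • ZenDiscovery's NodeFailureListener implements NodesFaultDetection.Listener, and its onNodeFailure callback delegates to handleNodeFailure. On the master, that method submits a ProcessedClusterStateUpdateTask that removes the node from currentState.nodes() and then checks whether the number of master nodes still satisfies minimumMasterNodes: if not, it calls rejoin; if so, it calls allocationService.reroute.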
