nacos註冊中心單節點ap架構原始碼解析

bei_er發表於2022-12-31

一、註冊流程

單nacos節點流程圖如下:

流程圖可以知,Nacos註冊流程包括客戶端的服務註冊、服務例項列表拉取、定時心跳任務;以及服務端的定時檢查服務例項任務、服務例項更新推送5個功能。

服務註冊:當客戶端啟動的時候會根據當前微服務的配置資訊把微服務註冊到nacos服務端。

服務例項列表拉取:當客戶端啟動的時候從nacos服務端獲取當前服務的名稱已經註冊的例項資料,並把這些例項資料快取在客戶端的serviceInfoMap 物件中。

定時心跳任務:當客戶端向nacos服務註冊臨時例項物件的時候,會建立一個延期的任務去往服務端傳送心跳資訊。如果傳送心跳資訊成功,則又會建立一個延期任務往服務端註冊心跳資訊,一直重複該邏輯。nacos服務端接收到客戶端的心跳資訊就是更新客戶端例項的最後心跳時間。該時間用來判斷例項是否健康和是否需要刪除。

定時檢查服務例項任務:nacos服務端在建立空服務物件的時候會透過執行緒池來定時執行檢查服務,其主要邏輯為判斷當前時間和最後心跳時間之差是否大於健康超時時間和刪除例項超時時間,如果大於,則更新例項的健康狀態和刪除當前例項。定時執行的規則為5秒之後執行檢查,並且每次執行完檢查之後,5秒之後再次執行檢查。

服務例項更新推送:當有客戶端更新例項物件時,服務端會先獲取該客戶端的服務名稱下所有已經註冊的客戶端例項,並會針每一個客戶端傳送一個更新serviceinfo的udp訊息,客戶端監聽收到nacos服務端傳送的udp資料後進行本地快取的更新。

二、客戶端

一、服務註冊

根據spring-cloud-starter-alibaba-nacos-discovery的spring.factories檔案,找到服務註冊啟動配置類。

spring.factories檔案內容為如下,

org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
  com.alibaba.cloud.nacos.discovery.NacosDiscoveryAutoConfiguration,\
  com.alibaba.cloud.nacos.ribbon.RibbonNacosAutoConfiguration,\
  com.alibaba.cloud.nacos.endpoint.NacosDiscoveryEndpointAutoConfiguration,\
  com.alibaba.cloud.nacos.registry.NacosServiceRegistryAutoConfiguration,\
  com.alibaba.cloud.nacos.discovery.NacosDiscoveryClientConfiguration,\
  com.alibaba.cloud.nacos.discovery.reactive.NacosReactiveDiscoveryClientConfiguration,\
  com.alibaba.cloud.nacos.discovery.configclient.NacosConfigServerAutoConfiguration,\
  com.alibaba.cloud.nacos.NacosServiceAutoConfiguration
org.springframework.cloud.bootstrap.BootstrapConfiguration=\
  com.alibaba.cloud.nacos.discovery.configclient.NacosDiscoveryClientConfigServiceBootstrapConfiguration

根據名稱判斷可以得出 NacosServiceRegistryAutoConfiguration 為服務註冊啟動配置類,原始碼如下

@Configuration(proxyBeanMethods = false)
@EnableConfigurationProperties
@ConditionalOnNacosDiscoveryEnabled
@ConditionalOnProperty(value = "spring.cloud.service-registry.auto-registration.enabled",
		matchIfMissing = true)
@AutoConfigureAfter({ AutoServiceRegistrationConfiguration.class,
		AutoServiceRegistrationAutoConfiguration.class,
		NacosDiscoveryAutoConfiguration.class })
public class NacosServiceRegistryAutoConfiguration {

	@Bean
	public NacosServiceRegistry nacosServiceRegistry(
			NacosDiscoveryProperties nacosDiscoveryProperties) {
		return new NacosServiceRegistry(nacosDiscoveryProperties);
	}

	@Bean
	@ConditionalOnBean(AutoServiceRegistrationProperties.class)
	public NacosRegistration nacosRegistration(
			ObjectProvider<List<NacosRegistrationCustomizer>> registrationCustomizers,
			NacosDiscoveryProperties nacosDiscoveryProperties,
			ApplicationContext context) {
		return new NacosRegistration(registrationCustomizers.getIfAvailable(),
				nacosDiscoveryProperties, context);
	}

	@Bean
	@ConditionalOnBean(AutoServiceRegistrationProperties.class)
	public NacosAutoServiceRegistration nacosAutoServiceRegistration(
			NacosServiceRegistry registry,
			AutoServiceRegistrationProperties autoServiceRegistrationProperties,
			NacosRegistration registration) {
		return new NacosAutoServiceRegistration(registry,
				autoServiceRegistrationProperties, registration);
	}

關鍵類 NacosAutoServiceRegistration 的類圖結構如下

上圖可知,NacosAutoServiceRegistration 實現了 ApplicationListener介面,該監聽器會在SpringBoot啟動的時候會自動呼叫 onApplicationEvent方法,onApplicationEvent具體實現方法如下

public void onApplicationEvent(WebServerInitializedEvent event) {
    this.bind(event);
}

@Deprecated
public void bind(WebServerInitializedEvent event) {
    ApplicationContext context = event.getApplicationContext();
    if (!(context instanceof ConfigurableWebServerApplicationContext) || !"management".equals(((ConfigurableWebServerApplicationContext)context).getServerNamespace())) {
        this.port.compareAndSet(0, event.getWebServer().getPort());
        // 具體的啟動方法
        this.start();
    }
}

具體的啟動方法this.start();方法的程式碼如下,

public void start() {
    if (!this.isEnabled()) {
        if (logger.isDebugEnabled()) {
            logger.debug("Discovery Lifecycle disabled. Not starting");
        }

    } else {
        if (!this.running.get()) {
            this.context.publishEvent(new InstancePreRegisteredEvent(this, this.getRegistration()));
            // 關鍵邏輯
            this.register();
            if (this.shouldRegisterManagement()) {
                this.registerManagement();
            }

            this.context.publishEvent(new InstanceRegisteredEvent(this, this.getConfiguration()));
            this.running.compareAndSet(false, true);
        }

    }

關鍵邏輯為this.register();方法程式碼如下

protected void register() {
    if (!this.registration.getNacosDiscoveryProperties().isRegisterEnabled()) {
        log.debug("Registration disabled.");
        return;
    }
    if (this.registration.getPort() < 0) {
        this.registration.setPort(getPort().get());
    }
    super.register();
}

關鍵邏輯為super.register();方法程式碼如下,

protected void register() {
    this.serviceRegistry.register(this.getRegistration());
}

關鍵邏輯為this.serviceRegistry.register方法程式碼如下,

@Override
public void register(Registration registration) {

    if (StringUtils.isEmpty(registration.getServiceId())) {
        log.warn("No service to register for nacos client...");
        return;
    }
	// 根據配置屬性構建NamingService物件
    NamingService namingService = namingService();
    // 獲取服務名,預設為 ${spring.application.name}
    String serviceId = registration.getServiceId();
    // 獲取組名 ,預設為 DEFAULT_GROUP
    String group = nacosDiscoveryProperties.getGroup();

    // 建立註冊例項
    Instance instance = getNacosInstanceFromRegistration(registration);

    try {
        // 發起註冊
        namingService.registerInstance(serviceId, group, instance);
        log.info("nacos registry, {} {} {}:{} register finished", group, serviceId,
                 instance.getIp(), instance.getPort());
    }
    catch (Exception e) {
        log.error("nacos registry, {} register failed...{},", serviceId,
                  registration.toString(), e);
        // rethrow a RuntimeException if the registration is failed.
        // issue : https://github.com/alibaba/spring-cloud-alibaba/issues/1132
        rethrowRuntimeException(e);
    }
}

先透過getNacosInstanceFromRegistration方法建立例項物件,getNacosInstanceFromRegistration程式碼如下,

private Instance getNacosInstanceFromRegistration(Registration registration) {
    Instance instance = new Instance();
    // 獲取服務ip
    instance.setIp(registration.getHost());
    // 獲取服務
    instance.setPort(registration.getPort());
    // 獲取權重
    instance.setWeight(nacosDiscoveryProperties.getWeight());
    // 獲取叢集名稱
    instance.setClusterName(nacosDiscoveryProperties.getClusterName());
  
    instance.setEnabled(nacosDiscoveryProperties.isInstanceEnabled());
    // 獲取後設資料
    instance.setMetadata(registration.getMetadata());
    // 獲取是否為臨時例項
    instance.setEphemeral(nacosDiscoveryProperties.isEphemeral());
    return instance;
}

然後透過namingService.registerInstance方法發起註冊,registerInstance方法的程式碼如下,

public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    // 檢查 例項是否合法 
    // heart beat timeout must(預設15秒) < heart beat interval (預設5秒)拋異常
    // ip delete timeout must(預設30 秒) < heart beat interval(預設5秒)拋異常
    NamingUtils.checkInstanceIsLegal(instance);
    // 構建 groupName@@serviceName
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    // 如果是臨時例項,則建立心跳資訊,定時給nacos服務傳送
    if (instance.isEphemeral()) {
        BeatInfo beatInfo = this.beatReactor.buildBeatInfo(groupedServiceName, instance);
        this.beatReactor.addBeatInfo(groupedServiceName, beatInfo);
    }
	// 向 nacos-service 註冊例項
    this.serverProxy.registerService(groupedServiceName, groupName, instance);
}

先檢查例項是否合法,然後構建服務名稱,規則為groupName@@serviceName。透過this.serverProxy.registerService方法向 nacos-service 註冊例項,程式碼如下,

public void registerService(String serviceName, String groupName, Instance instance) throws NacosException {

    NAMING_LOGGER.info("[REGISTER-SERVICE] {} registering service {} with instance: {}", namespaceId, serviceName,instance);

    final Map<String, String> params = new HashMap<String, String>(16);
    //設定 namespaceId
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    //設定 serviceName
    params.put(CommonParams.SERVICE_NAME, serviceName);
    //設定 groupName
    params.put(CommonParams.GROUP_NAME, groupName);
    //設定 clusterName
    params.put(CommonParams.CLUSTER_NAME, instance.getClusterName());
    params.put("ip", instance.getIp());
    params.put("port", String.valueOf(instance.getPort()));
    params.put("weight", String.valueOf(instance.getWeight()));
    params.put("enable", String.valueOf(instance.isEnabled()));
    params.put("healthy", String.valueOf(instance.isHealthy()));
    params.put("ephemeral", String.valueOf(instance.isEphemeral()));
    params.put("metadata", JacksonUtils.toJson(instance.getMetadata()));
	// 呼叫 nacos-service 的nacosUrlInstance介面註冊例項
    reqApi(UtilAndComs.nacosUrlInstance, params, HttpMethod.POST);

}

透過向reqApi方法向nacos服務端註冊當前例項資料,其實就是向 ${spring.cloud.nacos.discovery.server-addr}/nacos/v1/ns/instance 傳送POST請求。該請求地址對應的nacos服務端的原始碼的naming工程中InstanceController的register方法,程式碼如下,

public String register(HttpServletRequest request) throws Exception {
    final String namespaceId = WebUtils
        .optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    final String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);
	//根據請求構建 Instance 物件
    final Instance instance = parseInstance(request);
	//註冊 Instance 物件,serviceManager物件中儲存了所有的服務物件。
    serviceManager.registerInstance(namespaceId, serviceName, instance);
    return "ok";
}

先根據請求物件構建Instance物件,然後透過serviceManager.registerInstance方法用來註冊Instance物件,registerInstance程式碼如下

public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
	// 如果 namespaceId 為 key 的資料為空,則建立 service ,並初始化service
    createEmptyService(namespaceId, serviceName, instance.isEphemeral());
	// 獲取 service 物件
    Service service = getService(namespaceId, serviceName);
	// 如果 service為空 則報錯
    if (service == null) {
        throw new NacosException(NacosException.INVALID_PARAM,
                                 "service not found, namespace: " + namespaceId + ", service: " + serviceName);
    }
	// 新增例項
    addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
}

如果 namespaceId為key的資料為空,則建立 service,並初始化service。然後呼叫addInstance新增例項物件,addInstance方法程式碼如下,

public void addInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
    throws NacosException {
	  // 根據 名稱空間 和 服務名稱 構建 key
        String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
        // 獲取 service
        Service service = getService(namespaceId, serviceName);
        // 同步鎖
        synchronized (service) {
            // 獲取服務下的例項集合(服務已有 + 新增的例項)
            List<Instance> instanceList = addIpAddresses(service, ephemeral, ips);
            Instances instances = new Instances();
            instances.setInstanceList(instanceList);
            // 根據KEY新增服務的例項
            consistencyService.put(key, instances);
        }
}

addIpAddresses方法中會呼叫updateIpAddresses方法,且action為 add。該方法根據action的值來獲取該服務下的最新例項集合(新增例項或刪除例項加上目前服務已有的例項資料合集)。如果action為add表示新增,則方法最後返回的集合物件中會把該服務中已有的例項集合加上新增的例項集合資料一起返回 ;如果action為 remove表示刪除,則方法最後返回的集合物件中會把該服務中已有的例項集合刪除掉需要刪除的例項集合資料。後面透過呼叫consistencyService.put(key, instances)方法來把updateIpAddresses方法返回的值直接新增consistencyService的例項中。updateIpAddresses方法的程式碼如下,

public List<Instance> updateIpAddresses(Service service, String action, boolean ephemeral, Instance... ips)
    throws NacosException {
    // 從本地快取中獲取服務的例項資料
    Datum datum = consistencyService
        .get(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), ephemeral));
    // 獲取 當前服務下所有的 例項
    List<Instance> currentIPs = service.allIPs(ephemeral);
    // 建立當前例項資料map
    Map<String, Instance> currentInstances = new HashMap<>(currentIPs.size());
    // 建立 當前例項Id set
    Set<String> currentInstanceIds = Sets.newHashSet();

    // 遍歷當前服務的所有例項,新增到 建立當前例項資料 map 和 當前例項Id集合
    for (Instance instance : currentIPs) {
        currentInstances.put(instance.toIpAddr(), instance);
        currentInstanceIds.add(instance.getInstanceId());
    }
    // 構造 例項集合物件的 map
    Map<String, Instance> instanceMap;
    // 如果有快取資料
    if (datum != null && null != datum.value) {
        // 從本地快取中以及當前服務的記憶體資料獲取最新服務的例項資料
        instanceMap = setValid(((Instances) datum.value).getInstanceList(), currentInstances);
    }
    // 如果沒有快取資料
    else {
        // 建立 instanceMap
        instanceMap = new HashMap<>(ips.length);
    }
    // 遍歷引數傳過來的例項物件
    for (Instance instance : ips) {
        // 如果 service 不包括 例項的 ClusterName 則建立 例項 Cluster,並初始化
        if (!service.getClusterMap().containsKey(instance.getClusterName())) {
            Cluster cluster = new Cluster(instance.getClusterName(), service);
            cluster.init();
            service.getClusterMap().put(instance.getClusterName(), cluster);
            Loggers.SRV_LOG
                .warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                      instance.getClusterName(), instance.toJson());
        }
        // 如果是刪除,則從 instanceMap 中 刪除 該例項
        if (UtilsAndCommons.UPDATE_INSTANCE_ACTION_REMOVE.equals(action)) {
            instanceMap.remove(instance.getDatumKey());
        }
        // 如果是新增
        else {
            //獲取已存在的 例項
            Instance oldInstance = instanceMap.get(instance.getDatumKey());
            if (oldInstance != null) {
                instance.setInstanceId(oldInstance.getInstanceId());
            } else {
                // 生成 例項 id
                instance.setInstanceId(instance.generateInstanceId(currentInstanceIds));
            }
            // instanceMap 新增instance例項
            instanceMap.put(instance.getDatumKey(), instance);
        }

    }
    // 如果集合小於0 ,並且是新增操作則拋異常
    if (instanceMap.size() <= 0 && UtilsAndCommons.UPDATE_INSTANCE_ACTION_ADD.equals(action)) {
        throw new IllegalArgumentException(
            "ip list can not be empty, service: " + service.getName() + ", ip list: " + JacksonUtils
            .toJson(instanceMap.values()));
    }
    // 返回 服務中最新的例項資料
    return new CopyOnWriteArrayList<>(instanceMap.values());
}

透過updateIpAddresses方法拿到需要更新的例項集合物件後,再透過consistencyService.put(key, instances)把拿到的例項集合物件新增到實現了PersistentConsistencyServiceDelegateImpl或者EphemeralConsistencyService介面的例項物件中,consistencyService.put(key, instances)的原始碼如下,

@Override
public void put(String key, Record value) throws NacosException {
    // 根據key獲取具體的 consistencyService ,並且向其中新增具體的 key 和 value
    mapConsistencyService(key).put(key, value);
}

根據key獲取具體的 consistencyService ,並且向其中新增具體的 key 和 value。consistencyService中根據key獲取叢集的例項物件(臨時服務物件EphemeralConsistencyService和持久服務物件PersistentConsistencyServiceDelegateImpl)

private ConsistencyService mapConsistencyService(String key) {
    // 根據key返回具體的服務物件
    return KeyBuilder.matchEphemeralKey(key) ? ephemeralConsistencyService : persistentConsistencyService;
}

如果是註冊的臨時例項節點,這裡取到的是實現了ephemeralConsistencyService介面的DistroConsistencyServiceImpl 物件,它的put原始碼如下:

@Override
public void put(String key, Record value) throws NacosException {
    // 新增key 和 value
    onPut(key, value);
    distroProtocol.sync(new DistroKey(key, KeyBuilder.INSTANCE_LIST_KEY_PREFIX), DataOperation.CHANGE,
                        globalConfig.getTaskDispatchPeriod() / 2);
}

透過onPut方法新增key 和 value,opPut方法的程式碼如下,

public void onPut(String key, Record value) {
    // 如果是臨時節點例項,則建立 Datum 並儲存在 dataStore 中
    if (KeyBuilder.matchEphemeralInstanceListKey(key)) {
        Datum<Instances> datum = new Datum<>();
        datum.value = (Instances) value;
        datum.key = key;
        datum.timestamp.incrementAndGet();
        dataStore.put(key, datum);
    }
    // 如果 監聽物件不包括 key 則返回
    if (!listeners.containsKey(key)) {
        return;
    }
    // 向notifier物件新增通知任務
    notifier.addTask(key, DataOperation.CHANGE);
}

如果是臨時例項節點,則建立 Datum 並儲存在 dataStore 中,然後透過notifier.addTask用來向notifier物件新增通知任務,且操作型別為DataOperation.CHANGE,addTask方法的程式碼如下:

public void addTask(String datumKey, DataOperation action) {
    // 如果services包括了當前的 datumKey ,並且是修改操作 則直接返回
    if (services.containsKey(datumKey) && action == DataOperation.CHANGE) {
        return;
    }
    // 如果是修改操作,則向 services 新增 datumKey
    if (action == DataOperation.CHANGE) {
        services.put(datumKey, StringUtils.EMPTY);
    }
    // 向 tasks中新增 Pair 物件
    tasks.offer(Pair.with(datumKey, action));
}

以上程式碼的tasks是用來存放具體例項key和動作型別的物件,它是一個ArrayBlockingQueue物件,DistroConsistencyServiceImpl 物件的init方法程式碼如下,

@PostConstruct
public void init() {
    GlobalExecutor.submitDistroNotifyTask(notifier);
}

根據以上程式碼可知,在DistroConsistencyServiceImpl 例項物件初始化之後會往GlobalExecutor執行緒池物件中新增了一個notifier物件。notifier物件為一個實現了Runnable 的例項。上面的程式碼會執行notifier物件的run方法,notifier的run方法程式碼如下:

public void run() {
    Loggers.DISTRO.info("distro notifier started");
    // 死迴圈遍歷
    for (; ; ) {
        try {
            // 獲取 tasks的資料,如果沒有資料會阻塞當前執行緒,直到tasks有資料為止。
            Pair<String, DataOperation> pair = tasks.take();
            // 處理資料
            handle(pair);
        } catch (Throwable e) {
            Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
        }
    }
}

上面是一個死迴圈,tasks.take()是一個阻塞式獲取資料的方法,如果tasks沒有資料則會阻塞當前執行緒直到tasks.take()拿到資料,拿到資料之後會呼叫handle方法處理,handle程式碼如下,

private void handle(Pair<String, DataOperation> pair) {
    try {
        String datumKey = pair.getValue0();
        DataOperation action = pair.getValue1();
        // 先從 services 中刪除 key
        services.remove(datumKey);

        int count = 0;
        // 根據 key 獲取 服務物件資料
        ConcurrentLinkedQueue<RecordListener> recordListeners = listeners.get(datumKey);
        if (recordListeners == null) {
            Loggers.DISTRO.info("[DISTRO-WARN] RecordListener not found, key: {}", datumKey);
            return;
        }
        for (RecordListener listener : recordListeners) {
            count++;
            try {
                // 如果是新增
                if (action == DataOperation.CHANGE) {
                    Datum datum = dataStore.get(datumKey);
                    if (datum != null) {
                        // 更新 serivce 的例項資料
                        listener.onChange(datumKey, datum.value);
                    } else {
                        Loggers.DISTRO.info("[DISTRO-WARN] data not found, key: {}", datumKey);
                    }
                    continue;
                }
                // 如果是刪除
                if (action == DataOperation.DELETE) {
                    listener.onDelete(datumKey);
                    continue;
                }
            } catch (Throwable e) {
                Loggers.DISTRO.error("[NACOS-DISTRO] error while notifying listener of key: {}", datumKey, e);
            }
        }

        if (Loggers.DISTRO.isDebugEnabled()) {
            Loggers.DISTRO
                .debug("[NACOS-DISTRO] datum change notified, key: {}, listener count: {}, action: {}",
                       datumKey, count, action.name());
        }
    } catch (Throwable e) {
        Loggers.DISTRO.error("[NACOS-DISTRO] Error while handling notifying task", e);
    }
}

根據action 為 DataOperation.CHANGE,程式碼中執行的程式碼分支為listener.onChange(datumKey, datum.value),該方法的邏輯為修改服務的例項資料,原始碼如下

public void onChange(String key, Instances value) throws Exception {

    Loggers.SRV_LOG.info("[NACOS-RAFT] datum is changed, key: {}, value: {}", key, value);

    for (Instance instance : value.getInstanceList()) {

        if (instance == null) {
            // Reject this abnormal instance list:
            throw new RuntimeException("got null instance " + key);
        }

        if (instance.getWeight() > 10000.0D) {
            instance.setWeight(10000.0D);
        }

        if (instance.getWeight() < 0.01D && instance.getWeight() > 0.0D) {
            instance.setWeight(0.01D);
        }
    }
    // 更新 service 的 例項集合
    updateIPs(value.getInstanceList(), KeyBuilder.matchEphemeralInstanceListKey(key));

    recalculateChecksum();
}

以上程式碼先遍歷所有的例項資料設定權值,再透過updateIPs方法更新服務例項,updateIPs方法的程式碼如下:

public void updateIPs(Collection<Instance> instances, boolean ephemeral) {
    // 根據 clusterMap 建立 ipMap物件
    Map<String, List<Instance>> ipMap = new HashMap<>(clusterMap.size());
    // 根據 clusterMap 初始化 ipMap物件
    for (String clusterName : clusterMap.keySet()) {
        ipMap.put(clusterName, new ArrayList<>());
    }
	// 遍歷最新的例項集合資料
    for (Instance instance : instances) {
        try {
            if (instance == null) {
                Loggers.SRV_LOG.error("[NACOS-DOM] received malformed ip: null");
                continue;
            }
            // 如果叢集名稱為null ,則設定預設的叢集名稱 DEFAULT
            if (StringUtils.isEmpty(instance.getClusterName())) {
                instance.setClusterName(UtilsAndCommons.DEFAULT_CLUSTER_NAME);
            }
            // 如果當前 service 的clusterMap不包括 例項的 叢集名稱,則需要建立新的叢集物件
            if (!clusterMap.containsKey(instance.getClusterName())) {
                Loggers.SRV_LOG
                    .warn("cluster: {} not found, ip: {}, will create new cluster with default configuration.",
                          instance.getClusterName(), instance.toJson());
                Cluster cluster = new Cluster(instance.getClusterName(), this);
                cluster.init();
                getClusterMap().put(instance.getClusterName(), cluster);
            }

            // 如果當前 ipMap 不包括 當前例項的 叢集名稱,則需要建立新的叢集物件
            List<Instance> clusterIPs = ipMap.get(instance.getClusterName());
            if (clusterIPs == null) {
                clusterIPs = new LinkedList<>();
                ipMap.put(instance.getClusterName(), clusterIPs);
            }
			// 給當前的 叢集物件賦值 例項資料。
            clusterIPs.add(instance);
        } catch (Exception e) {
            Loggers.SRV_LOG.error("[NACOS-DOM] failed to process ip: " + instance, e);
        }
    }
	// 遍歷 ipMap物件,給 clusterMap 替換最新的 entryIPs
    for (Map.Entry<String, List<Instance>> entry : ipMap.entrySet()) {
        //make every ip mine
        List<Instance> entryIPs = entry.getValue();
        // 給 clusterMap 替換最新的 entryIPs
        clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral);
    }

    setLastModifiedMillis(System.currentTimeMillis());
    // 釋出
    getPushService().serviceChanged(this);
    StringBuilder stringBuilder = new StringBuilder();

    for (Instance instance : allIPs()) {
        stringBuilder.append(instance.toIpAddr()).append("_").append(instance.isHealthy()).append(",");
    }

    Loggers.EVT_LOG.info("[IP-UPDATED] namespace: {}, service: {}, ips: {}", getNamespaceId(), getName(),
                         stringBuilder.toString());

}

以上程式碼先根據當前服務下的叢集資訊構造構造ipMap物件,然後遍歷最新的例項集合資料更新ipMap物件,最後迴圈呼叫clusterMap.get(entry.getKey()).updateIps(entryIPs, ephemeral)方法來更新當前叢集中的例項列表資料。updateIps方法程式碼如下:

public void updateIps(List<Instance> ips, boolean ephemeral) {
    // 獲取 本叢集中的 例項集合
    Set<Instance> toUpdateInstances = ephemeral ? ephemeralInstances : persistentInstances;
    // 根據old的例項資料 構建 hashmap
    HashMap<String, Instance> oldIpMap = new HashMap<>(toUpdateInstances.size());
    // 根據例項的 key 新增到 oldIpMap中
    for (Instance ip : toUpdateInstances) {
        oldIpMap.put(ip.getDatumKey(), ip);
    }
    // 獲取更新的 例項資料 List
    List<Instance> updatedIPs = updatedIps(ips, oldIpMap.values());
    if (updatedIPs.size() > 0) {
        for (Instance ip : updatedIPs) {
            Instance oldIP = oldIpMap.get(ip.getDatumKey());

            // do not update the ip validation status of updated ips
            // because the checker has the most precise result
            // Only when ip is not marked, don't we update the health status of IP:
            if (!ip.isMarked()) {
                ip.setHealthy(oldIP.isHealthy());
            }
            if (ip.isHealthy() != oldIP.isHealthy()) {
                // ip validation status updated
                Loggers.EVT_LOG.info("{} {SYNC} IP-{} {}:{}@{}", getService().getName(),
                                     (ip.isHealthy() ? "ENABLED" : "DISABLED"), ip.getIp(), ip.getPort(), getName());
            }

            if (ip.getWeight() != oldIP.getWeight()) {
                // ip validation status updated
                Loggers.EVT_LOG.info("{} {SYNC} {IP-UPDATED} {}->{}", getService().getName(), oldIP.toString(),
                                     ip.toString());
            }
        }
    }
    // 獲取新增的 例項資料
    List<Instance> newIPs = subtract(ips, oldIpMap.values());
    if (newIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-NEW} cluster: {}, new ips size: {}, content: {}", getService().getName(),
                  getName(), newIPs.size(), newIPs.toString());

        for (Instance ip : newIPs) {
            HealthCheckStatus.reset(ip);
        }
    }
    // 獲取刪除的 例項資料
    List<Instance> deadIPs = subtract(oldIpMap.values(), ips);

    if (deadIPs.size() > 0) {
        Loggers.EVT_LOG
            .info("{} {SYNC} {IP-DEAD} cluster: {}, dead ips size: {}, content: {}", getService().getName(),
                  getName(), deadIPs.size(), deadIPs.toString());

        for (Instance ip : deadIPs) {
            HealthCheckStatus.remv(ip);
        }
    }
    // 根據傳進來的 例項集合 建立需要更新的例項set 
    toUpdateInstances = new HashSet<>(ips);

    // 直接替換
    if (ephemeral) {
        ephemeralInstances = toUpdateInstances;
    } else {
        persistentInstances = toUpdateInstances;
    }
}

以上程式碼就是更新cluster物件下的例項資料邏輯,根據程式碼可知在cluster物件中更新例項資料就是拿傳進來的例項列表建立set集合直接替換的。

二、服務例項列表拉取

客戶端程式啟動之後,會執行com.alibaba.cloud.nacos.discovery.NacosWatch類的start()方法,此方法中會執行以下語句,

namingService.subscribe(properties.getService(), properties.getGroup(),
                        Arrays.asList(properties.getClusterName()), eventListener);

此方法用來獲取當前服務的例項資料,subscribe方法程式碼如下,

public void subscribe(String serviceName, String groupName, List<String> clusters, EventListener listener)
    throws NacosException {
    // 獲取服務列表資料
    hostReactor.subscribe(NamingUtils.getGroupedName(serviceName, groupName), StringUtils.join(clusters, ","),
                          listener);
}

透過hostReactor.subscribe方法獲取服務列表資料,subscribe方法的程式碼如下,

public void subscribe(String serviceName, String clusters, EventListener eventListener) {
    notifier.registerListener(serviceName, clusters, eventListener);
    // 獲取服務列表資料
    getServiceInfo(serviceName, clusters);
}

透過getServiceInfo方法獲取服務列表資料,getServiceInfo的程式碼如下:

NAMING_LOGGER.debug("failover-mode: " + failoverReactor.isFailoverSwitch());
String key = ServiceInfo.getKey(serviceName, clusters);
if (failoverReactor.isFailoverSwitch()) {
    return failoverReactor.getService(key);
}
// 根據服務名稱和叢集名稱獲取本地的服務列表資料
ServiceInfo serviceObj = getServiceInfo0(serviceName, clusters);
if (null == serviceObj) {
    serviceObj = new ServiceInfo(serviceName, clusters);
    serviceInfoMap.put(serviceObj.getKey(), serviceObj);
    updatingMap.put(serviceName, new Object());
    // 如果本地服務例項資料為null,則去獲取最新的服務例項列表
    updateServiceNow(serviceName, clusters);
    updatingMap.remove(serviceName);

} else if (updatingMap.containsKey(serviceName)) {
    if (UPDATE_HOLD_INTERVAL > 0) {
        // hold a moment waiting for update finish
        synchronized (serviceObj) {
            try {
                serviceObj.wait(UPDATE_HOLD_INTERVAL);
            } catch (InterruptedException e) {
                NAMING_LOGGER
                    .error("[getServiceInfo] serviceName:" + serviceName + ", clusters:" + clusters, e);
            }
        }
    }
}
scheduleUpdateIfAbsent(serviceName, clusters);
return serviceInfoMap.get(serviceObj.getKey());

以上程式碼可知,會根據服務名稱和clusters名稱獲取本地快取serviceInfoMap物件中的服務列表資料。如果本地服務例項資料為null,則透過updateServiceNow方法去nacos服務端獲取最新的服務例項列表。updateServiceNow方法程式碼如下:

try {
    // 更新本地服務方法
    updateService(serviceName, clusters);
} catch (NacosException e) {
    NAMING_LOGGER.error("[NA] failed to update serviceName: " + serviceName, e);
}

updateService的程式碼如下:

public void updateService(String serviceName, String clusters) throws NacosException {
    ServiceInfo oldService = getServiceInfo0(serviceName, clusters);
    try {
		// 呼叫服務代理類獲取服務例項列表,pushReceiver.getUdpPort()會隨機生成一個udp埠
        String result = serverProxy.queryList(serviceName, clusters, pushReceiver.getUdpPort(), false);
        if (StringUtils.isNotEmpty(result)) {
            // 如果 result不為空,則向本地快取 serviceInfoMap 新增服務例項列表
            processServiceJson(result);
        }
    } finally {
        if (oldService != null) {
            synchronized (oldService) {
                oldService.notifyAll();
            }
        }
    }
}

透過呼叫服務代理類serverProxy的queryList方法獲取服務例項列表,pushReceiver.getUdpPort()會獲pushReceiver的udp埠,pushReceiver物件是一個udp資料接收類,用來接收nacos伺服器傳送的udp資料,比如服務例項更新的訊息。serverProxy.query方法的程式碼如下:

public String queryList(String serviceName, String clusters, int udpPort, boolean healthyOnly)
    throws NacosException {
	// 構造請求引數
    final Map<String, String> params = new HashMap<String, String>(8);
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, serviceName);
    params.put("clusters", clusters);
    // 客戶端的upd埠,服務端回撥客戶端udp埠會用到
    params.put("udpPort", String.valueOf(udpPort));
    params.put("clientIP", NetUtils.localIP());
    params.put("healthyOnly", String.valueOf(healthyOnly));
	// 向nacos伺服器獲取服務列表資料,並返回
    return reqApi(UtilAndComs.nacosUrlBase + "/instance/list", params, HttpMethod.GET);
}

在構造的請求引數中包括了客戶端的udpPort,該引數在服務端回撥介面會用到。reqApi方法其實就向nacos伺服器的/nacos/v1/ns/instance/list介面傳送了請求訊息,該介面對應的nacos服務端的原始碼的naming工程中InstanceController的list方法,程式碼如下,

@GetMapping("/list")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.READ)
public ObjectNode list(HttpServletRequest request) throws Exception {

    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);

    String agent = WebUtils.getUserAgent(request);
    String clusters = WebUtils.optional(request, "clusters", StringUtils.EMPTY);
    String clientIP = WebUtils.optional(request, "clientIP", StringUtils.EMPTY);
    int udpPort = Integer.parseInt(WebUtils.optional(request, "udpPort", "0"));
    String env = WebUtils.optional(request, "env", StringUtils.EMPTY);
    boolean isCheck = Boolean.parseBoolean(WebUtils.optional(request, "isCheck", "false"));

    String app = WebUtils.optional(request, "app", StringUtils.EMPTY);

    String tenant = WebUtils.optional(request, "tid", StringUtils.EMPTY);

    boolean healthyOnly = Boolean.parseBoolean(WebUtils.optional(request, "healthyOnly", "false"));
    // 獲取例項列表資料
    return doSrvIpxt(namespaceId, serviceName, agent, clusters, clientIP, udpPort, env, isCheck, app, tenant,
                     healthyOnly);
}

以上程式碼先構造相關引數資訊,然後透過doSrvIpxt方法來獲取例項列表資料,doSrvIpxt程式碼如下:

public ObjectNode doSrvIpxt(String namespaceId, String serviceName, String agent, String clusters, String clientIP,
                            int udpPort, String env, boolean isCheck, String app, String tid, boolean healthyOnly) throws Exception {

    ClientInfo clientInfo = new ClientInfo(agent);
    ObjectNode result = JacksonUtils.createEmptyJsonNode();
    // 根據名稱空間id和服務名稱獲取服務
    Service service = serviceManager.getService(namespaceId, serviceName);
    long cacheMillis = switchDomain.getDefaultCacheMillis();

    // now try to enable the push
    try {
        // 如果埠大於0 ,並且是支援的客戶端
        if (udpPort > 0 && pushService.canEnablePush(agent)) {
            // 新增 PushClient 物件
            pushService
                .addClient(namespaceId, serviceName, clusters, agent, new InetSocketAddress(clientIP, udpPort),
                           pushDataSource, tid, app);
            cacheMillis = switchDomain.getPushCacheMillis(serviceName);
        }
    } catch (Exception e) {
        Loggers.SRV_LOG
            .error("[NACOS-API] failed to added push client {}, {}:{}", clientInfo, clientIP, udpPort, e);
        cacheMillis = switchDomain.getDefaultCacheMillis();
    }
    // 如果服務物件為 null ,則構造資料返回
    if (service == null) {
        if (Loggers.SRV_LOG.isDebugEnabled()) {
            Loggers.SRV_LOG.debug("no instance to serve for service: {}", serviceName);
        }
        result.put("name", serviceName);
        result.put("clusters", clusters);
        result.put("cacheMillis", cacheMillis);
        result.replace("hosts", JacksonUtils.createEmptyArrayNode());
        return result;
    }
    // 檢查服務是否可用
    checkIfDisabled(service);

    List<Instance> srvedIPs;
    // 根據叢集列表獲取具體服務下面的例項列表
    srvedIPs = service.srvIPs(Arrays.asList(StringUtils.split(clusters, ",")));

    // filter ips using selector:
    if (service.getSelector() != null && StringUtils.isNotBlank(clientIP)) {
        srvedIPs = service.getSelector().select(clientIP, srvedIPs);
    }
    // 如果例項資料為空,則構造資料返回
    if (CollectionUtils.isEmpty(srvedIPs)) {

        if (Loggers.SRV_LOG.isDebugEnabled()) {
            Loggers.SRV_LOG.debug("no instance to serve for service: {}", serviceName);
        }

        if (clientInfo.type == ClientInfo.ClientType.JAVA
            && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) {
            result.put("dom", serviceName);
        } else {
            result.put("dom", NamingUtils.getServiceName(serviceName));
        }

        result.put("name", serviceName);
        result.put("cacheMillis", cacheMillis);
        result.put("lastRefTime", System.currentTimeMillis());
        result.put("checksum", service.getChecksum());
        result.put("useSpecifiedURL", false);
        result.put("clusters", clusters);
        result.put("env", env);
        result.set("hosts", JacksonUtils.createEmptyArrayNode());
        result.set("metadata", JacksonUtils.transferToJsonNode(service.getMetadata()));
        return result;
    }

    Map<Boolean, List<Instance>> ipMap = new HashMap<>(2);
    ipMap.put(Boolean.TRUE, new ArrayList<>());
    ipMap.put(Boolean.FALSE, new ArrayList<>());
    // 構造健康和不健康的例項資料
    for (Instance ip : srvedIPs) {
        ipMap.get(ip.isHealthy()).add(ip);
    }

    if (isCheck) {
        result.put("reachProtectThreshold", false);
    }

    double threshold = service.getProtectThreshold();

    if ((float) ipMap.get(Boolean.TRUE).size() / srvedIPs.size() <= threshold) {

        Loggers.SRV_LOG.warn("protect threshold reached, return all ips, service: {}", serviceName);
        if (isCheck) {
            result.put("reachProtectThreshold", true);
        }

        ipMap.get(Boolean.TRUE).addAll(ipMap.get(Boolean.FALSE));
        ipMap.get(Boolean.FALSE).clear();
    }

    if (isCheck) {
        result.put("protectThreshold", service.getProtectThreshold());
        result.put("reachLocalSiteCallThreshold", false);

        return JacksonUtils.createEmptyJsonNode();
    }

    ArrayNode hosts = JacksonUtils.createEmptyArrayNode();
    // 構造返回的例項列表物件
    for (Map.Entry<Boolean, List<Instance>> entry : ipMap.entrySet()) {
        List<Instance> ips = entry.getValue();

        if (healthyOnly && !entry.getKey()) {
            continue;
        }

        for (Instance instance : ips) {

            // remove disabled instance:
            if (!instance.isEnabled()) {
                continue;
            }

            ObjectNode ipObj = JacksonUtils.createEmptyJsonNode();

            ipObj.put("ip", instance.getIp());
            ipObj.put("port", instance.getPort());
            // deprecated since nacos 1.0.0:
            ipObj.put("valid", entry.getKey());
            ipObj.put("healthy", entry.getKey());
            ipObj.put("marked", instance.isMarked());
            ipObj.put("instanceId", instance.getInstanceId());
            ipObj.set("metadata", JacksonUtils.transferToJsonNode(instance.getMetadata()));
            ipObj.put("enabled", instance.isEnabled());
            ipObj.put("weight", instance.getWeight());
            ipObj.put("clusterName", instance.getClusterName());
            if (clientInfo.type == ClientInfo.ClientType.JAVA
                && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) {
                ipObj.put("serviceName", instance.getServiceName());
            } else {
                ipObj.put("serviceName", NamingUtils.getServiceName(instance.getServiceName()));
            }

            ipObj.put("ephemeral", instance.isEphemeral());
            hosts.add(ipObj);

        }
    }

    result.replace("hosts", hosts);
    if (clientInfo.type == ClientInfo.ClientType.JAVA
        && clientInfo.version.compareTo(VersionUtil.parseVersion("1.0.0")) >= 0) {
        result.put("dom", serviceName);
    } else {
        result.put("dom", NamingUtils.getServiceName(serviceName));
    }
    result.put("name", serviceName);
    result.put("cacheMillis", cacheMillis);
    result.put("lastRefTime", System.currentTimeMillis());
    result.put("checksum", service.getChecksum());
    result.put("useSpecifiedURL", false);
    result.put("clusters", clusters);
    result.put("env", env);
    result.replace("metadata", JacksonUtils.transferToJsonNode(service.getMetadata()));
    return result;
}

以上程式碼其實就是根據名稱空間id和服務名稱獲取服務物件,然後根據不同情況構造返回物件,正常情況會構造一個ServiceInfo型別的ObjectNode物件,整個具體過程請看上面的程式碼註釋。最後返回構造的物件。

客戶端中拿到請求/nacos/v1/ns/instance/list介面的返回值之後會轉成一個ServiceInfo物件,並且把該物件賦值給本地的快取物件serviceInfoMap,processServiceJson關鍵程式碼如下:

// 將返回值轉換成 ServiceInfo 型別的物件
ServiceInfo serviceInfo = JacksonUtils.toObj(json, ServiceInfo.class);
// 把該物件新增到本地快取中
serviceInfoMap.put(serviceInfo.getKey(), serviceInfo);

三、定時心跳任務

在客戶端向nacos服務端註冊服務的過程中,會呼叫com.alibaba.nacos.client.naming.NacosNamingService#registerInstance(java.lang.String, java.lang.String, com.alibaba.nacos.api.naming.pojo.Instance)方法,在該程式碼中有個判斷邏輯,如果是臨時例項則會建立一個BeatInfo物件新增到beatReactor中。程式碼如下:

public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
    // 檢查 例項是否合法
    // heart beat timeout must(預設15秒) > heart beat interval (預設5秒)
    // ip delete timeout must(預設30 秒)  > heart beat interval
    NamingUtils.checkInstanceIsLegal(instance);
    // 構建 groupName@@serviceName
    String groupedServiceName = NamingUtils.getGroupedName(serviceName, groupName);
    // 如果是臨時例項,則建立心跳資訊,定時給nacos服務傳送
    if (instance.isEphemeral()) {
        // 構造心跳資訊
        BeatInfo beatInfo = this.beatReactor.buildBeatInfo(groupedServiceName, instance);
        // 執行心跳定時任務
        this.beatReactor.addBeatInfo(groupedServiceName, beatInfo);
    }
	// 向 nacos-service 註冊例項
    this.serverProxy.registerService(groupedServiceName, groupName, instance);
}

beatInfo物件用來儲存心跳資訊,buildBeatInfo方法程式碼如下,

public BeatInfo buildBeatInfo(String groupedServiceName, Instance instance) {
    BeatInfo beatInfo = new BeatInfo();
    beatInfo.setServiceName(groupedServiceName);
    beatInfo.setIp(instance.getIp());
    beatInfo.setPort(instance.getPort());
    beatInfo.setCluster(instance.getClusterName());
    beatInfo.setWeight(instance.getWeight());
    beatInfo.setMetadata(instance.getMetadata());
    beatInfo.setScheduled(false);
    beatInfo.setPeriod(instance.getInstanceHeartBeatInterval());
    return beatInfo;
}

beatReactor中有一個ScheduledExecutorService型別的executorService例項用來執行定時的執行緒,addBeatInfo的程式碼如下,

public void addBeatInfo(String serviceName, BeatInfo beatInfo) {
    NAMING_LOGGER.info("[BEAT] adding beat: {} to beat map.", beatInfo);
    String key = buildKey(serviceName, beatInfo.getIp(), beatInfo.getPort());
    BeatInfo existBeat = null;
    //fix #1733
    if ((existBeat = dom2Beat.remove(key)) != null) {
        existBeat.setStopped(true);
    }
    dom2Beat.put(key, beatInfo);
    // 執行緒池新增定時任務,預設 5 秒鐘之後 執行 BeatTask
    executorService.schedule(new BeatTask(beatInfo), beatInfo.getPeriod(), TimeUnit.MILLISECONDS);
    MetricsMonitor.getDom2BeatSizeMonitor().set(dom2Beat.size());
}

根據上面的executorService.schedule()程式碼可知,BeatTask執行緒在固定的秒數之後執行,而BeatTask實現了Runnable介面,即執行BeatTask的run方法 。BeatTask的run方法程式碼如下,

public void run() {
    // 如果 beatInfo 設定了 stop ,則停止
    if (beatInfo.isStopped()) {
        return;
    }
    // 獲取下一次延期執行的時間
    long nextTime = beatInfo.getPeriod();
    try {
        // 向服務端傳送心跳資訊
        JsonNode result = serverProxy.sendBeat(beatInfo, BeatReactor.this.lightBeatEnabled);
        long interval = result.get("clientBeatInterval").asLong();
        boolean lightBeatEnabled = false;
        if (result.has(CommonParams.LIGHT_BEAT_ENABLED)) {
            lightBeatEnabled = result.get(CommonParams.LIGHT_BEAT_ENABLED).asBoolean();
        }
        BeatReactor.this.lightBeatEnabled = lightBeatEnabled;
        if (interval > 0) {
            nextTime = interval;
        }
        int code = NamingResponseCode.OK;
        if (result.has(CommonParams.CODE)) {
            code = result.get(CommonParams.CODE).asInt();
        }
        if (code == NamingResponseCode.RESOURCE_NOT_FOUND) {
            Instance instance = new Instance();
            instance.setPort(beatInfo.getPort());
            instance.setIp(beatInfo.getIp());
            instance.setWeight(beatInfo.getWeight());
            instance.setMetadata(beatInfo.getMetadata());
            instance.setClusterName(beatInfo.getCluster());
            instance.setServiceName(beatInfo.getServiceName());
            instance.setInstanceId(instance.getInstanceId());
            instance.setEphemeral(true);
            try {
                serverProxy.registerService(beatInfo.getServiceName(),
                                            NamingUtils.getGroupName(beatInfo.getServiceName()), instance);
            } catch (Exception ignore) {
            }
        }
    } catch (NacosException ex) {
        NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: {}, code: {}, msg: {}",
                            JacksonUtils.toJson(beatInfo), ex.getErrCode(), ex.getErrMsg());

    }
    // 重新提交定時任務,延期傳送心跳資訊
    executorService.schedule(new BeatTask(beatInfo), nextTime, TimeUnit.MILLISECONDS);
}

以上程式碼先獲取下一次延期執行的時間,再透過serverProxy.sendBeat()向服務端傳送心跳資訊,最後重新提交定時任務,延期傳送心跳資訊,serverProxy.sendBeat()程式碼如下,

public JsonNode sendBeat(BeatInfo beatInfo, boolean lightBeatEnabled) throws NacosException {
    if (NAMING_LOGGER.isDebugEnabled()) {
        NAMING_LOGGER.debug("[BEAT] {} sending beat to server: {}", namespaceId, beatInfo.toString());
    }
    Map<String, String> params = new HashMap<String, String>(8);
    Map<String, String> bodyMap = new HashMap<String, String>(2);
    if (!lightBeatEnabled) {
        bodyMap.put("beat", JacksonUtils.toJson(beatInfo));
    }
    params.put(CommonParams.NAMESPACE_ID, namespaceId);
    params.put(CommonParams.SERVICE_NAME, beatInfo.getServiceName());
    params.put(CommonParams.CLUSTER_NAME, beatInfo.getCluster());
    params.put("ip", beatInfo.getIp());
    params.put("port", String.valueOf(beatInfo.getPort()));
    // 向nacos伺服器傳送心跳資料,並返回
    String result = reqApi(UtilAndComs.nacosUrlBase + "/instance/beat", params, bodyMap, HttpMethod.PUT);
    return JacksonUtils.toObj(result);
}

reqApi方法其實就向nacos伺服器端的/nacos/v1/ns/instance/beat介面傳送了put型別的請求訊息,該介面對應的nacos服務端的原始碼的naming工程中InstanceController的beat方法,beat方法的程式碼如下,

@CanDistro
@PutMapping("/beat")
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public ObjectNode beat(HttpServletRequest request) throws Exception {
    ObjectNode result = JacksonUtils.createEmptyJsonNode();
    result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, switchDomain.getClientBeatInterval());
    // 獲取心跳資訊
    String beat = WebUtils.optional(request, "beat", StringUtils.EMPTY);
    RsInfo clientBeat = null;
    // 如果 beat 資料不為空,則構造 RsInfo 型別的  clientBeat 例項
    if (StringUtils.isNotBlank(beat)) {
        clientBeat = JacksonUtils.toObj(beat, RsInfo.class);
    }
    // 獲取叢集名稱
    String clusterName = WebUtils
        .optional(request, CommonParams.CLUSTER_NAME, UtilsAndCommons.DEFAULT_CLUSTER_NAME);
    // 獲取 例項的 ip
    String ip = WebUtils.optional(request, "ip", StringUtils.EMPTY);
    // 獲取 例項的 埠
    int port = Integer.parseInt(WebUtils.optional(request, "port", "0"));
    // 如果 clientBeat 不為空,則設定 相關的資訊
    if (clientBeat != null) {
        if (StringUtils.isNotBlank(clientBeat.getCluster())) {
            clusterName = clientBeat.getCluster();
        } else {
            // fix #2533
            clientBeat.setCluster(clusterName);
        }
        ip = clientBeat.getIp();
        port = clientBeat.getPort();
    }
    // 獲取 namespaceId
    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    // 獲取 serviceName
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    // 檢查 ServiceName 的格式
    NamingUtils.checkServiceNameFormat(serviceName);
    Loggers.SRV_LOG.debug("[CLIENT-BEAT] full arguments: beat: {}, serviceName: {}", clientBeat, serviceName);
    // 根據 引數 獲取 具體的例項
    Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);
    // 如果 例項為 空
    if (instance == null) {
        // 如果 clientBeat 為空 則構造引數 code 為 20404的結果返回
        if (clientBeat == null) {
            result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
            return result;
        }

        Loggers.SRV_LOG.warn("[CLIENT-BEAT] The instance has been removed for health mechanism, "
                             + "perform data compensation operations, beat: {}, serviceName: {}", clientBeat, serviceName);
        // 如果 clientBeat 不為空 則構造 instance 資料,向 serviceManager 註冊例項。
        instance = new Instance();
        instance.setPort(clientBeat.getPort());
        instance.setIp(clientBeat.getIp());
        instance.setWeight(clientBeat.getWeight());
        instance.setMetadata(clientBeat.getMetadata());
        instance.setClusterName(clusterName);
        instance.setServiceName(serviceName);
        instance.setInstanceId(instance.getInstanceId());
        instance.setEphemeral(clientBeat.isEphemeral());

        serviceManager.registerInstance(namespaceId, serviceName, instance);
    }
    // 根據服務名稱獲取 服務
    Service service = serviceManager.getService(namespaceId, serviceName);
    // 如果服務為空 ,則拋異常
    if (service == null) {
        throw new NacosException(NacosException.SERVER_ERROR,
                                 "service not found: " + serviceName + "@" + namespaceId);
    }
    // 如果 clientBeat 為空,則建立該物件
    if (clientBeat == null) {
        clientBeat = new RsInfo();
        clientBeat.setIp(ip);
        clientBeat.setPort(port);
        clientBeat.setCluster(clusterName);
    }
    // 處理客戶端的 心跳物件
    service.processClientBeat(clientBeat);
    //
    result.put(CommonParams.CODE, NamingResponseCode.OK);
    if (instance.containsMetadata(PreservedMetadataKeys.HEART_BEAT_INTERVAL)) {
        result.put(SwitchEntry.CLIENT_BEAT_INTERVAL, instance.getInstanceHeartBeatInterval());
    }
    result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
    return result;
}

先獲取心跳資訊,然後構造RsInfo型別的clientBeat例項。然後透過service.processClientBeat(clientBeat)方法處理客戶端的心跳物件,processClientBeat方法的程式碼如下,

public void processClientBeat(final RsInfo rsInfo) {
    // 構造 ClientBeatProcessor 物件
    ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
    clientBeatProcessor.setService(this);
    clientBeatProcessor.setRsInfo(rsInfo);
    // 定時執行 ClientBeatProcessor 物件,這裡是立即執行,延期時間為 0
    HealthCheckReactor.scheduleNow(clientBeatProcessor);
}

ClientBeatProcessor是一個實現了Runnable的類,HealthCheckReactor是一個定時任務執行緒池,scheduleNow方法表示立即執行clientBeatProcessor物件的run方法,clientBeatProcessor.run方法程式碼如下,

public void run() {
    Service service = this.service;
    if (Loggers.EVT_LOG.isDebugEnabled()) {
        Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
    }
    // 獲取ip
    String ip = rsInfo.getIp();
    // 獲取 叢集名稱
    String clusterName = rsInfo.getCluster();
    // 獲取埠
    int port = rsInfo.getPort();
    // 從服務物件中獲取叢集物件
    Cluster cluster = service.getClusterMap().get(clusterName);
    // 從叢集物件中獲取所有的臨時例項列表
    List<Instance> instances = cluster.allIPs(true);

    for (Instance instance : instances) {
        // 找到 ip 和埠相同的 例項資料
        if (instance.getIp().equals(ip) && instance.getPort() == port) {
            if (Loggers.EVT_LOG.isDebugEnabled()) {
                Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
            }
            // 更新 最後心跳時間
            instance.setLastBeat(System.currentTimeMillis());
            if (!instance.isMarked() && !instance.isHealthy()) {
                instance.setHealthy(true);
                Loggers.EVT_LOG
                    .info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
                          cluster.getService().getName(), ip, port, cluster.getName(),
                          UtilsAndCommons.LOCALHOST_SITE);
                getPushService().serviceChanged(service);
            }
        }
    }
}

以上程式碼可知,該方法主要用來更新客戶端例項的最後心跳時間。

三、服務端介面

一、定時檢查服務例項任務

在客戶端註冊服務的時候,會呼叫nacos服務端的com.alibaba.nacos.naming.controllers.InstanceController#register方法,其中會呼叫createEmptyService方法用來建立空的服務物件,最後會呼叫service.init()方法用來初始化服務物件,init方法程式碼如下

public void init() {
    // 定時執行 service 的 run 方法 處理超時的 instance
    HealthCheckReactor.scheduleCheck(clientBeatCheckTask);
    for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
        entry.getValue().setService(this);
        entry.getValue().init();
    }
}

透過呼叫HealthCheckReactor.scheduleCheck()方法來定時執行clientBeatCheckTask,scheduleCheck的程式碼如下,

public static void scheduleCheck(ClientBeatCheckTask task) {
    // 5秒之後執行 task,並且每次執行task完之後,5秒之後再次執行 task
    futureMap.computeIfAbsent(task.taskKey(),
                              k -> GlobalExecutor.scheduleNamingHealth(task, 5000, 5000, TimeUnit.MILLISECONDS));
}

以上程式碼給定時任務執行緒池GlobalExecutor提交了一個task任務,其中task是一個實現了Runable介面的類,執行緒池每次執行的就是ClientBeatCheckTask 的run方法,run方法程式碼如下,

public void run() {
    try {
        if (!getDistroMapper().responsible(service.getName())) {
            return;
        }

        if (!getSwitchDomain().isHealthCheckEnabled()) {
            return;
        }
        // 獲取該服務下面的所有 註冊例項集合
        List<Instance> instances = service.allIPs(true);
        // first set health status of instances:
        for (Instance instance : instances) {
            // 如果 當前時間 減去 例項的最新心跳時間 如果大於 例項配置的心跳超時時間(預設15秒)
            // 並且 例項的健康狀態 true
            // 則設定服務的健康狀態為 false
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                if (!instance.isMarked()) {
                    if (instance.isHealthy()) {
                        instance.setHealthy(false);
                        Loggers.EVT_LOG
                            .info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
                                  instance.getIp(), instance.getPort(), instance.getClusterName(),
                                  service.getName(), UtilsAndCommons.LOCALHOST_SITE,
                                  instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
                        getPushService().serviceChanged(service);
                        ApplicationUtils.publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                    }
                }
            }
        }
        if (!getGlobalConfig().isExpireInstance()) {
            return;
        }
        // then remove obsolete instances:
        for (Instance instance : instances) {

            if (instance.isMarked()) {
                continue;
            }
            // 如果 當前時間 減去 例項的最新心跳時間 如果大於 例項配置的刪除超時時間(預設30秒)
            // 則會呼叫 deleteIp 刪除方法刪除例項
            if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                // delete instance
                Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(),
                                     JacksonUtils.toJson(instance));
                // 刪除例項
                deleteIp(instance);
            }
        }

    } catch (Exception e) {
        Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
    }

}

以上程式碼就兩個邏輯,一個邏輯是判斷當前時間減去例項的最新心跳時間是否大於例項配置的心跳超時時間(預設15秒),如果大於則設定例項的健康狀態為false;第二個邏輯是 判斷當前時間減去例項的最新心跳時間 是否大於例項配置的刪除超時時間(預設30秒),如果大於則呼叫deleteIp(instance);刪除該例項,deleteIp的程式碼如下,

NamingProxy.Request request = NamingProxy.Request.newRequest();
request.appendParam("ip", instance.getIp()).appendParam("port", String.valueOf(instance.getPort()))
    .appendParam("ephemeral", "true").appendParam("clusterName", instance.getClusterName())
    .appendParam("serviceName", service.getName()).appendParam("namespaceId", service.getNamespaceId());
// 構造url地址
String url = "http://" + IPUtil.localHostIP() + IPUtil.IP_PORT_SPLITER + EnvUtil.getPort() + EnvUtil.getContextPath()
    + UtilsAndCommons.NACOS_NAMING_CONTEXT + "/instance?" + request.toUrl();

// delete instance asynchronously:
// 向本地伺服器地址傳送刪除請求
HttpClient.asyncHttpDelete(url, null, null, new Callback<String>() {
    @Override
    public void onReceive(RestResult<String> result) {
        if (!result.ok()) {
            Loggers.SRV_LOG
                .error("[IP-DEAD] failed to delete ip automatically, ip: {}, caused {}, resp code: {}",
                       instance.toJson(), result.getMessage(), result.getCode());
        }
    }
    @Override
    public void onError(Throwable throwable) {
        Loggers.SRV_LOG
            .error("[IP-DEAD] failed to delete ip automatically, ip: {}, error: ", instance.toJson(),
                   throwable);
    }
    @Override
    public void onCancel() {

    }
});

HttpClient.asyncHttpDelete方法其實就是向 ${spring.cloud.nacos.discovery.server-addr}/nacos/v1/ns/instance 傳送Delete請求。該請求地址對應的nacos服務端的原始碼的naming工程中InstanceController的deregister方法,deregister程式碼如下,

@CanDistro
@DeleteMapping
@Secured(parser = NamingResourceParser.class, action = ActionTypes.WRITE)
public String deregister(HttpServletRequest request) throws Exception {
    // 從請求引數中構造例項物件
    Instance instance = getIpAddress(request);
    String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);
    String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
    NamingUtils.checkServiceNameFormat(serviceName);

    Service service = serviceManager.getService(namespaceId, serviceName);
    if (service == null) {
        Loggers.SRV_LOG.warn("remove instance from non-exist service: {}", serviceName);
        return "ok";
    }
	// 刪除例項資料
    serviceManager.removeInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
    return "ok";
}

removeInstance方法是關鍵,用來刪除例項資料,removeInstance程式碼如下,

public void removeInstance(String namespaceId, String serviceName, boolean ephemeral, Instance... ips)
    throws NacosException {
    // 先獲取服務物件
    Service service = getService(namespaceId, serviceName);
    // 服務物件加鎖
    synchronized (service) {
        // 呼叫刪除例項物件的方法
        removeInstance(namespaceId, serviceName, ephemeral, service, ips);
    }
}

removeInstance方法的程式碼如下,

private void removeInstance(String namespaceId, String serviceName, boolean ephemeral, Service service,
                            Instance... ips) throws NacosException {
    // 構造 key
    String key = KeyBuilder.buildInstanceListKey(namespaceId, serviceName, ephemeral);
    // 獲取服務下的例項集合(服務已有 減去 需要刪除例項)
    List<Instance> instanceList = substractIpAddresses(service, ephemeral, ips);

    Instances instances = new Instances();
    instances.setInstanceList(instanceList);
    // 根據KEY更新服務的例項
    consistencyService.put(key, instances);
}

substractIpAddresses方法用來獲取該服務下已經減去需要刪除例項的例項資料,其中呼叫的updateIpAddresses方法,action值為 remove。removeInstance方法的整體邏輯為透過updateIpAddresses方法拿到該服務中去掉刪除例項之後的例項集合物件,並把該例項集合物件新增到consistencyService物件中,consistencyService.put(key, instances)裡面的邏輯和客戶端註冊服務一樣的邏輯。updateIpAddresses方法和consistencyService.put方法已經在客戶端服務註冊章節已經講了,這裡不再講解。

二、服務例項更新推送

在客戶端更新服務例項的過程中nacos服務端會呼叫com.alibaba.nacos.naming.core.Service#updateIPs()方法(客戶端註冊服務的過程請看客戶端服務註冊章節),在該方法中會呼叫getPushService().serviceChanged(this)來發布當前服務的修改事件,即會釋出一個事件用來通知已經和nacos服務端通訊過的客戶端更新客戶端本地的服務資訊。serviceChanged 的程式碼如下,

public void serviceChanged(Service service) {
    // merge some change events to reduce the push frequency:
    if (futureMap
        .containsKey(UtilsAndCommons.assembleFullServiceName(service.getNamespaceId(), service.getName()))) {
        return;
    }
    // 釋出服務修改事件
    this.applicationContext.publishEvent(new ServiceChangeEvent(this, service));
}

applicationContext.publishEvent會觸發一個ServiceChangeEvent事件,其實就是觸發com.alibaba.nacos.naming.push.PushService#onApplicationEvent方法,其中邏輯為先根據名稱空間id和服務名稱獲取所有的客戶端map物件,然後遍歷所有客戶端物件PushClient 構造 ackEntry 物件,最後向具體的客戶端傳送 upd 訊息。獲取關鍵程式碼如下,

if (compressData != null) {
    ackEntry = prepareAckEntry(client, compressData, data, lastRefTime);
} else {
    // 構造 ackEntry 物件
    ackEntry = prepareAckEntry(client, prepareHostsQData(client), lastRefTime);
    // 新增快取
    if (ackEntry != null) {
        cache.put(key, new org.javatuples.Pair<>(ackEntry.origin.getData(), ackEntry.data));
    }
}

Loggers.PUSH.info("serviceName: {} changed, schedule push for: {}, agent: {}, key: {}",
                  client.getServiceName(), client.getAddrStr(), client.getAgent(),
                  (ackEntry == null ? null : ackEntry.key));
// 向具體的客戶端傳送 upd 訊息
udpPush(ackEntry);

以上程式碼prepareHostsQData的邏輯就是獲取該服務下客戶端所屬服務的所有例項資料,並且構造具體的Map<String,Object>物件,prepareHostsQData程式碼如下,

private static Map<String, Object> prepareHostsData(PushClient client) throws Exception {
    Map<String, Object> cmd = new HashMap<String, Object>(2);
    cmd.put("type", "dom");
    // 獲取客戶端所屬服務的所有例項資料
    cmd.put("data", client.getDataSource().getData(client));
    return cmd;
}

udpPush(ackEntry)裡面封裝了傳送udp訊息的關鍵程式碼,ackEntry封裝了udp的資料資訊。

在客戶端獲取服務例項列表的時候,會生成一個PushReceiver物件,該物件用來監聽和接收nacos服務端傳送的udp資料。該物件實現了Runnable介面,並且在構造方法中把自己提交給了一個內部屬性的執行緒池物件。構造方法如下,

public PushReceiver(HostReactor hostReactor) {
    try {
        this.hostReactor = hostReactor;
        this.udpSocket = new DatagramSocket();
        this.executorService = new ScheduledThreadPoolExecutor(1, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r);
                thread.setDaemon(true);
                thread.setName("com.alibaba.nacos.naming.push.receiver");
                return thread;
            }
        });

        this.executorService.execute(this);
    } catch (Exception e) {
        NAMING_LOGGER.error("[NA] init udp socket failed", e);
    }
}

根據以上程式碼可知,所以在建立PushReceiver物件之後會執行run方法,run方法的程式碼如下,

public void run() {
    while (!closed) {
        try {
            // byte[] is initialized with 0 full filled by default
            byte[] buffer = new byte[UDP_MSS];
            // 構造 DatagramPacket 物件
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
			// 監聽upd資料
            udpSocket.receive(packet);
			// 構造服務物件的string 資料
            String json = new String(IoUtils.tryDecompress(packet.getData()), UTF_8).trim();
            NAMING_LOGGER.info("received push data: " + json + " from " + packet.getAddress().toString());

            PushPacket pushPacket = JacksonUtils.toObj(json, PushPacket.class);
            String ack;
            if ("dom".equals(pushPacket.type) || "service".equals(pushPacket.type)) {
                // 更新本地快取 serviceInfoMap 的服務物件
                hostReactor.processServiceJson(pushPacket.data);
                // send ack to server
                ack = "{\"type\": \"push-ack\"" + ", \"lastRefTime\":\"" + pushPacket.lastRefTime + "\", \"data\":"
                    + "\"\"}";
            } else if ("dump".equals(pushPacket.type)) {
                // dump data to server
                ack = "{\"type\": \"dump-ack\"" + ", \"lastRefTime\": \"" + pushPacket.lastRefTime + "\", \"data\":"
                    + "\"" + StringUtils.escapeJavaScript(JacksonUtils.toJson(hostReactor.getServiceInfoMap()))
                    + "\"}";
            } else {
                // do nothing send ack only
                ack = "{\"type\": \"unknown-ack\"" + ", \"lastRefTime\":\"" + pushPacket.lastRefTime
                    + "\", \"data\":" + "\"\"}";
            }

            udpSocket.send(new DatagramPacket(ack.getBytes(UTF_8), ack.getBytes(UTF_8).length,
                                              packet.getSocketAddress()));
        } catch (Exception e) {
            if (closed) {
                return;
            }
            NAMING_LOGGER.error("[NA] error while receiving push data", e);
        }
    }
}

以上程式碼用while一直監聽udpSocket的客戶端upd的埠。當接收到從nacos服務端傳送過來的udp資料之後會接著呼叫 hostReactor.processServiceJson()方法來更新客戶端本地的serviceInfoMap 的服務物件。

相關文章