Spring Cloud Upgrade Path, 2020.0.x Edition - 33. Source Code for Implementing Retry, Circuit Breaker, and Thread Isolation

Posted by 乾貨滿滿張雜湊 on 2021-11-12

Code for this series: https://github.com/JoJoTec/spring-cloud-parent

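In the previous two sections, we walked through the approach for adding circuit breaking and thread isolation to Feign, and explained how to optimize the current load-balancing algorithm. We have not yet covered how the load balancer's cached data gets updated, nor the source code that implements retry, circuit breaking, and thread isolation. This section analyzes both in detail.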

First, we register our custom OpenFeign configuration through spring.factories so that it gets loaded:

spring.factories

# AutoConfiguration
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
com.github.jojotech.spring.cloud.webmvc.auto.OpenFeignAutoConfiguration

The auto-configuration class is OpenFeignAutoConfiguration, which looks like this:

OpenFeignAutoConfiguration.java

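// Set `@Configuration(proxyBeanMethods = false)`: no @Bean methods here call each other and need to return the same bean every time, so proxying is unnecessary, and turning it off speeds up startup
@Configuration(proxyBeanMethods = false)
// Load the shared configuration, CommonOpenFeignConfiguration
@Import(CommonOpenFeignConfiguration.class)
// Enable OpenFeign annotation scanning and configuration. The default configuration is DefaultOpenFeignConfiguration; in other words, the default configuration class of Feign's NamedContextFactory (that is, FeignContext) is DefaultOpenFeignConfiguration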
@EnableFeignClients(value = "com.github.jojotech", defaultConfiguration = DefaultOpenFeignConfiguration.class)
public class OpenFeignAutoConfiguration {
}

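Why add this layer instead of using the imported CommonOpenFeignConfiguration directly? Because we may want to use @AutoConfigureBefore and @AutoConfigureAfter to control the loading order relative to other auto-configurations. Both are spring-boot annotations, and they only take effect on auto-configuration classes loaded through spring.factories. The extra layer is therefore a design precaution in case we need these annotations in the future.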

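CommonOpenFeignConfiguration contains the beans shared across OpenFeign. These beans are singletons used by every FeignClient, and include:

  1. The underlying HTTP client for the Client that FeignClient uses; here we use Apache HttpClient
  2. ApacheHttpClient, which wraps Apache HttpClient as the Client that FeignClient uses
  3. In spring-cloud-openfeign, the core load-balancing implementation of the Client used by FeignClient is FeignBlockingLoadBalancerClient. We need to wrap and proxy it in order to add circuit breaking, thread isolation, and load-balancing data collection. The wrapper is our own FeignBlockingLoadBalancerClientDelegate, and the class that implements the core circuit-breaker and thread-isolation logic is Resilience4jFeignClient.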

CommonOpenFeignConfiguration.java

@Configuration(proxyBeanMethods = false)
public class CommonOpenFeignConfiguration {
    // Create the Apache HttpClient with some custom settings
    @Bean
    public HttpClient getHttpClient() {
        // Keep persistent connections alive for 5 minutes
        PoolingHttpClientConnectionManager poolingConnectionManager = new PoolingHttpClientConnectionManager(5, TimeUnit.MINUTES);
        // Maximum total number of connections
        poolingConnectionManager.setMaxTotal(1000);
        // Maximum number of concurrent connections per route
        poolingConnectionManager.setDefaultMaxPerRoute(1000);
        HttpClientBuilder httpClientBuilder = HttpClients.custom();
        httpClientBuilder.setConnectionManager(poolingConnectionManager);
        // Keep-alive configuration for persistent connections; requires the Keep-Alive header
        httpClientBuilder.setKeepAliveStrategy(new DefaultConnectionKeepAliveStrategy());
        return httpClientBuilder.build();
    }

    // Bean that implements OpenFeign's Client interface on top of HttpClient
    @Bean
    public ApacheHttpClient apacheHttpClient(HttpClient httpClient) {
        return new ApacheHttpClient(httpClient);
    }

    // Proxy for FeignBlockingLoadBalancerClient; also a bean that implements OpenFeign's Client interface
    @Bean
    // @Primary makes FeignBlockingLoadBalancerClientDelegate the bean that every FeignClient actually uses
    @Primary
    public FeignBlockingLoadBalancerClientDelegate feignBlockingLoadBalancerCircuitBreakableClient(
            ServiceInstanceMetrics serviceInstanceMetrics,
            // The ApacheHttpClient bean created above
            ApacheHttpClient apacheHttpClient,
            // For why ObjectProvider is used, see the comments in the FeignBlockingLoadBalancerClientDelegate source
            ObjectProvider<LoadBalancerClient> loadBalancerClientProvider,
            // resilience4j thread isolation
            ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry,
            // resilience4j circuit breaker
            CircuitBreakerRegistry circuitBreakerRegistry,
            // Sleuth Tracer, used to obtain the request context
            Tracer tracer,
            // Load-balancer properties
            LoadBalancerProperties properties,
            // For why this is used instead of FeignBlockingLoadBalancerClient directly, see the comments on FeignBlockingLoadBalancerClientDelegate
            LoadBalancerClientFactory loadBalancerClientFactory
    ) {
        return new FeignBlockingLoadBalancerClientDelegate(
                // Our own core Client implementation, which adds circuit breaking, thread isolation, and load-balancing data collection
                new Resilience4jFeignClient(
                        serviceInstanceMetrics, apacheHttpClient,
                        threadPoolBulkheadRegistry,
                        circuitBreakerRegistry,
                        tracer
                ),
                loadBalancerClientProvider,
                properties,
                loadBalancerClientFactory
        );
    }
}

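Of these, Resilience4jFeignClient is the core code that glues together the circuit breaker and thread isolation. It also records the actual call data used for load balancing.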

Resilience4jFeignClient.java

public class Resilience4jFeignClient implements Client {
    private final ServiceInstanceMetrics serviceInstanceMetrics;
    private final ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry;
    private final CircuitBreakerRegistry circuitBreakerRegistry;
    private final Tracer tracer;
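    private final ApacheHttpClient apacheHttpClient;
    // Note: `log` used below is assumed to be a logger field declared elsewhere (for example via Lombok's @Slf4j); its declaration is not shown in this listing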


    public Resilience4jFeignClient(
            ServiceInstanceMetrics serviceInstanceMetrics, ApacheHttpClient apacheHttpClient,
            ThreadPoolBulkheadRegistry threadPoolBulkheadRegistry,
            CircuitBreakerRegistry circuitBreakerRegistry,
            Tracer tracer
    ) {
        this.serviceInstanceMetrics = serviceInstanceMetrics;
        this.apacheHttpClient = apacheHttpClient;
        this.threadPoolBulkheadRegistry = threadPoolBulkheadRegistry;
        this.circuitBreakerRegistry = circuitBreakerRegistry;
        this.tracer = tracer;
    }

    @Override
    public Response execute(Request request, Request.Options options) throws IOException {
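        // Get the @FeignClient annotation from the interface that declares this FeignClient
        FeignClient annotation = request.requestTemplate().methodMetadata().method().getDeclaringClass().getAnnotation(FeignClient.class);
        // Consistent with Retry, use contextId rather than the microservice name
        // contextId is the key used later to look up the circuit-breaker and thread-isolation configuration
        String contextId = annotation.contextId();
        // Get the unique id of the instance
        String serviceInstanceId = getServiceInstanceId(contextId, request);
        // Get the unique id of the instance + method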
        String serviceInstanceMethodId = getServiceInstanceMethodId(contextId, request);

        ThreadPoolBulkhead threadPoolBulkhead;
        CircuitBreaker circuitBreaker;
        try {
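            // One thread pool per instance
            threadPoolBulkhead = threadPoolBulkheadRegistry.bulkhead(serviceInstanceId, contextId);
        } catch (ConfigurationNotFoundException e) {
            // No configuration named after contextId: fall back to the default configuration
            threadPoolBulkhead = threadPoolBulkheadRegistry.bulkhead(serviceInstanceId);
        }
        try {
            // One resilience4j circuit breaker per method of each service instance: circuit breaking works at the instance-method level, and all instance methods of a service share that service's resilience4j circuit-breaker configuration
            circuitBreaker = circuitBreakerRegistry.circuitBreaker(serviceInstanceMethodId, contextId);
        } catch (ConfigurationNotFoundException e) {
            // No configuration named after contextId: fall back to the default configuration
            circuitBreaker = circuitBreakerRegistry.circuitBreaker(serviceInstanceMethodId);
        }
        // Preserve the traceId across threads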
        Span span = tracer.currentSpan();
        ThreadPoolBulkhead finalThreadPoolBulkhead = threadPoolBulkhead;
        CircuitBreaker finalCircuitBreaker = circuitBreaker;
        Supplier<CompletionStage<Response>> completionStageSupplier = ThreadPoolBulkhead.decorateSupplier(threadPoolBulkhead,
                OpenfeignUtil.decorateSupplier(circuitBreaker, () -> {
                    try (Tracer.SpanInScope cleared = tracer.withSpanInScope(span)) {
                        log.info("call url: {} -> {}, ThreadPoolStats({}): {}, CircuitBreakStats({}): {}",
                                request.httpMethod(),
                                request.url(),
                                serviceInstanceId,
                                JSON.toJSONString(finalThreadPoolBulkhead.getMetrics()),
                                serviceInstanceMethodId,
                                JSON.toJSONString(finalCircuitBreaker.getMetrics())
                        );
                        Response execute = apacheHttpClient.execute(request, options);
                        log.info("response: {} - {}", execute.status(), execute.reason());
                        return execute;
                    } catch (IOException e) {
                        throw new CompletionException(e);
                    }
                })
        );
        ServiceInstance serviceInstance = getServiceInstance(request);
        try {
            serviceInstanceMetrics.recordServiceInstanceCall(serviceInstance);
            Response response = Try.ofSupplier(completionStageSupplier).get().toCompletableFuture().join();
            serviceInstanceMetrics.recordServiceInstanceCalled(serviceInstance, true);
            return response;
        } catch (BulkheadFullException e) {
            // The thread-pool bulkhead is full, so the call was rejected
            serviceInstanceMetrics.recordServiceInstanceCalled(serviceInstance, false);
            return Response.builder()
                    .request(request)
                    .status(SpecialHttpStatus.BULKHEAD_FULL.getValue())
                    .reason(e.getLocalizedMessage())
                    .requestTemplate(request.requestTemplate()).build();
        } catch (CompletionException e) {
            serviceInstanceMetrics.recordServiceInstanceCalled(serviceInstance, false);
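            // Every exception thrown inside is wrapped in a CompletionException, so unwrap the underlying exception here
            Throwable cause = e.getCause();
            // If the circuit breaker is open, return the corresponding special status code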
            if (cause instanceof CallNotPermittedException) {
                return Response.builder()
                        .request(request)
                        .status(SpecialHttpStatus.CIRCUIT_BREAKER_ON.getValue())
                        .reason(cause.getLocalizedMessage())
                        .requestTemplate(request.requestTemplate()).build();
            }
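            // For an IOException, we need to determine whether the request has already been sent
            // A connect timeout can be retried because the request was never sent; a read timeout, for example, cannot, because the request has already gone out
            if (cause instanceof IOException) {
                // getMessage() may return null, so guard against it before inspecting the text
                String message = cause.getMessage();
                boolean containsRead = message != null && message.toLowerCase().contains("read");
                if (containsRead) {
                    log.info("{}-{} exception contains read, which indicates the request has been sent", e.getMessage(), cause.getMessage());
                    // A read exception means the request has already been sent, so it must not be retried (unless it is a GET request or carries the RetryableMethod annotation, which is checked in DefaultErrorDecoder)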
                    return Response.builder()
                            .request(request)
                            .status(SpecialHttpStatus.NOT_RETRYABLE_IO_EXCEPTION.getValue())
                            .reason(cause.getLocalizedMessage())
                            .requestTemplate(request.requestTemplate()).build();
                } else {
                    return Response.builder()
                            .request(request)
                            .status(SpecialHttpStatus.RETRYABLE_IO_EXCEPTION.getValue())
                            .reason(cause.getLocalizedMessage())
                            .requestTemplate(request.requestTemplate()).build();
                }
            }
            throw e;
        }
    }

    private ServiceInstance getServiceInstance(Request request) throws MalformedURLException {
        URL url = new URL(request.url());
        DefaultServiceInstance defaultServiceInstance = new DefaultServiceInstance();
        defaultServiceInstance.setHost(url.getHost());
        defaultServiceInstance.setPort(url.getPort());
        return defaultServiceInstance;
    }

    // Build the microservice instance id in the format contextId:host:port (contextId of the FeignClient), for example: test1Client:10.238.45.78:8251
    private String getServiceInstanceId(String contextId, Request request) throws MalformedURLException {
        // Parse the URL
        URL url = new URL(request.url());
        // Concatenate the microservice instance id
        return contextId + ":" + url.getHost() + ":" + url.getPort();
    }

    // Build the microservice instance method id in the format contextId:host:port:methodName (contextId of the FeignClient), for example: test1Client:10.238.45.78:8251:
    private String getServiceInstanceMethodId(String contextId, Request request) throws MalformedURLException {
        URL url = new URL(request.url());
        // Build a unique id from contextId + instance + method
        String methodName = request.requestTemplate().methodMetadata().method().toGenericString();
        return contextId + ":" + url.getHost() + ":" + url.getPort() + ":" + methodName;
    }
}
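
The id formats built by getServiceInstanceId and getServiceInstanceMethodId can be tried in isolation. Below is a minimal JDK-only sketch; the class name InstanceIdSketch and the "anything()" method signature are placeholders for illustration, and it parses with java.net.URI rather than the java.net.URL used in the listing, simply to avoid the checked exception.

```java
import java.net.URI;

final class InstanceIdSketch {
    // Instance id: contextId:host:port
    static String serviceInstanceId(String contextId, String requestUrl) {
        URI uri = URI.create(requestUrl);
        return contextId + ":" + uri.getHost() + ":" + uri.getPort();
    }

    // Instance + method id: contextId:host:port:methodSignature
    static String serviceInstanceMethodId(String contextId, String requestUrl, String methodSignature) {
        return serviceInstanceId(contextId, requestUrl) + ":" + methodSignature;
    }

    public static void main(String[] args) {
        String url = "http://10.238.45.78:8251/anything";
        System.out.println(serviceInstanceId("test1Client", url));
        // "anything()" is a placeholder; the listing uses Method.toGenericString()
        System.out.println(serviceInstanceMethodId("test1Client", url, "anything()"));
    }
}
```

One detail worth knowing: getPort() returns -1 when the URL carries no explicit port, so such an id would end in ":-1". In the flow described here the load balancer has already rewritten the URL to a concrete host and port before this client runs.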

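Above, we defined several special HTTP status codes. The goal is to wrap certain exceptions as responses, which our Feign error decoder then decodes into a uniform RetryableException. That way the resilience4j retry configuration does not need complicated exception rules; it only has to retry on RetryableException.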
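
To make the IOException branch concrete, here is a minimal JDK-only sketch of the same message-based check. The class and enum names are hypothetical; the two outcomes mirror the RETRYABLE_IO_EXCEPTION and NOT_RETRYABLE_IO_EXCEPTION values of SpecialHttpStatus, whose numeric codes are not shown in this article and are therefore left out.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Locale;

final class IoExceptionClassifierSketch {
    // Hypothetical stand-in for the two IO-related SpecialHttpStatus values
    enum Outcome { RETRYABLE_IO_EXCEPTION, NOT_RETRYABLE_IO_EXCEPTION }

    static Outcome classify(IOException cause) {
        String message = cause.getMessage();
        // "read" in the message is taken to mean the request was already sent
        boolean containsRead = message != null
                && message.toLowerCase(Locale.ROOT).contains("read");
        return containsRead ? Outcome.NOT_RETRYABLE_IO_EXCEPTION : Outcome.RETRYABLE_IO_EXCEPTION;
    }

    public static void main(String[] args) {
        System.out.println(classify(new SocketTimeoutException("Read timed out")));
        System.out.println(classify(new SocketTimeoutException("connect timed out")));
    }
}
```

The check is a text heuristic: any message containing the letters "read" (for instance the word "thread") is treated as non-retryable, which errs on the side of not resending a request.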

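We want the core load-balancing Client of spring-cloud-openfeign, once it has called the LoadBalancer to choose an instance and rewritten the URL, to invoke our class above rather than ApacheHttpClient directly. That is why we add the FeignBlockingLoadBalancerClientDelegate wrapper: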

/**
 * Initializing FeignBlockingLoadBalancerClient requires a LoadBalancerClient.
 * However, since Spring Cloud 2020, the loading of the Spring Cloud LoadBalancer BlockingClient has an enforced order:
 * @see org.springframework.cloud.loadbalancer.config.BlockingLoadBalancerClientAutoConfiguration
 * That auto-configuration is annotated with @AutoConfigureAfter(LoadBalancerAutoConfiguration.class),
 * so the BlockingClient is not yet available when our FeignClient is initialized.
 * We therefore wrap LoadBalancerClient in an ObjectProvider and, when the FeignClient is actually invoked, obtain the LoadBalancerClient from the ObjectProvider to create the FeignBlockingLoadBalancerClient.
 */
public class FeignBlockingLoadBalancerClientDelegate implements Client {
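    // volatile is required for the double-checked locking in execute() to publish the instance safely
    private volatile FeignBlockingLoadBalancerClient feignBlockingLoadBalancerClient;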

    private final Client delegate;
    private final ObjectProvider<LoadBalancerClient> loadBalancerClientObjectProvider;
    private final LoadBalancerProperties properties;
    private final LoadBalancerClientFactory loadBalancerClientFactory;

    public FeignBlockingLoadBalancerClientDelegate(
            Client delegate,
            ObjectProvider<LoadBalancerClient> loadBalancerClientObjectProvider,
            LoadBalancerProperties properties,
            LoadBalancerClientFactory loadBalancerClientFactory
    ) {
        this.delegate = delegate;
        this.loadBalancerClientObjectProvider = loadBalancerClientObjectProvider;
        this.properties = properties;
        this.loadBalancerClientFactory = loadBalancerClientFactory;
    }

    @Override
    public Response execute(Request request, Request.Options options) throws IOException {
        if (feignBlockingLoadBalancerClient == null) {
            synchronized (this) {
                if (feignBlockingLoadBalancerClient == null) {
                    feignBlockingLoadBalancerClient = new FeignBlockingLoadBalancerClient(
                            this.delegate,
                            this.loadBalancerClientObjectProvider.getIfAvailable(),
                            this.properties,
                            this.loadBalancerClientFactory
                    );
                }
            }
        }
        return feignBlockingLoadBalancerClient.execute(request, options);
    }
}
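
The delegate creates its FeignBlockingLoadBalancerClient lazily, on first use, with double-checked locking. A minimal JDK-only sketch of that pattern follows; LazyHolderSketch and its Supplier-based factory are illustrative names, not part of the project. The field is declared volatile, which the Java memory model requires for this idiom to publish the instance safely to other threads.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyHolderSketch<T> {
    private final Supplier<T> factory;
    // volatile: without it, another thread could observe a partially constructed instance
    private volatile T instance;

    LazyHolderSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T local = instance;              // first check, no lock
        if (local == null) {
            synchronized (this) {
                local = instance;        // second check, under the lock
                if (local == null) {
                    local = factory.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        LazyHolderSketch<String> holder =
                new LazyHolderSketch<>(() -> "client-" + calls.incrementAndGet());
        System.out.println(holder.get());
        System.out.println(holder.get()); // same instance, factory not called again
    }
}
```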

DefaultOpenFeignConfiguration, the default configuration we specified for FeignClient's NamedContextFactory (that is, FeignContext), mainly glues in the retry logic and the error decoder:

@Configuration(proxyBeanMethods = false)
public class DefaultOpenFeignConfiguration {

    @Bean
    public ErrorDecoder errorDecoder() {
        return new DefaultErrorDecoder();
    }

    @Bean
    public Feign.Builder resilience4jFeignBuilder(
            List<FeignDecoratorBuilderInterceptor> feignDecoratorBuilderInterceptors,
            FeignDecorators.Builder builder
    ) {
        feignDecoratorBuilderInterceptors.forEach(feignDecoratorBuilderInterceptor -> feignDecoratorBuilderInterceptor.intercept(builder));
        return Resilience4jFeign.builder(builder.build());
    }

    @Bean
    public FeignDecorators.Builder defaultBuilder(Environment environment, RetryRegistry retryRegistry) {
        String name = environment.getProperty("feign.client.name");
        Retry retry = null;
        try {
            retry = retryRegistry.retry(name, name);
        } catch (ConfigurationNotFoundException e) {
            retry = retryRegistry.retry(name);
        }
        // Override the exception predicate so that only feign.RetryableException is retried; every exception that should be retried has been wrapped as a RetryableException by DefaultErrorDecoder together with Resilience4jFeignClient
        retry = Retry.of(name, RetryConfig.from(retry.getRetryConfig()).retryOnException(throwable -> {
            return throwable instanceof feign.RetryableException;
        }).build());
        return FeignDecorators.builder().withRetry(
                retry
        );
    }
}
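
The DefaultErrorDecoder registered above is what turns the special status codes into a RetryableException. Its source is not listed in this article, so the following is only a rough sketch of the retry decision, based on the comments in Resilience4jFeignClient. The names are hypothetical, and treating BULKHEAD_FULL and CIRCUIT_BREAKER_ON as retryable is an assumption of this sketch (in both cases the request was never sent).

```java
final class RetryDecisionSketch {
    // Hypothetical mirror of the SpecialHttpStatus names used in Resilience4jFeignClient
    enum SpecialStatus { BULKHEAD_FULL, CIRCUIT_BREAKER_ON, RETRYABLE_IO_EXCEPTION, NOT_RETRYABLE_IO_EXCEPTION }

    static boolean shouldRetry(SpecialStatus status, String httpMethod, boolean markedRetryable) {
        // GET requests, or methods carrying the RetryableMethod annotation, may be repeated
        boolean safeToRepeat = "GET".equalsIgnoreCase(httpMethod) || markedRetryable;
        switch (status) {
            case RETRYABLE_IO_EXCEPTION:
                // The request was never sent (for example, a connect timeout)
                return true;
            case NOT_RETRYABLE_IO_EXCEPTION:
                // The request was already sent, so retry only if repeating it is safe
                return safeToRepeat;
            case BULKHEAD_FULL:
            case CIRCUIT_BREAKER_ON:
                // ASSUMPTION: the call was rejected before being sent, so a retry is harmless
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldRetry(SpecialStatus.NOT_RETRYABLE_IO_EXCEPTION, "POST", false));
        System.out.println(shouldRetry(SpecialStatus.NOT_RETRYABLE_IO_EXCEPTION, "GET", false));
    }
}
```

In a real decoder, a true result would correspond to throwing feign.RetryableException so that the resilience4j Retry configured above picks it up.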

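The error decoder wraps the retryable special status codes above, along with the requests we want to retry, into a RetryableException; we will not go through its code here. With that, we have a custom FeignClient that implements retry, circuit breaking, and thread isolation. It can be configured and used as follows: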

application.yml configuration:

################ feign configuration ################
feign:
  hystrix:
    enabled: false
  client:
    config:
      default:
        # Connect timeout (ms)
        connectTimeout: 500
        # Read timeout (ms)
        readTimeout: 8000
      test1-client:
        # Connect timeout (ms)
        connectTimeout: 500
        # Read timeout (ms)
        readTimeout: 60000
################ resilience4j configuration ################
resilience4j.circuitbreaker:
  configs:
    default:
      registerHealthIndicator: true
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      slidingWindowType: TIME_BASED
      permittedNumberOfCallsInHalfOpenState: 3
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 2s
      failureRateThreshold: 30
      eventConsumerBufferSize: 10
      recordExceptions:
        - java.lang.Exception
resilience4j.retry:
  configs:
    default:
      maxRetryAttempts: 2
    test1-client:
      maxRetryAttempts: 3
resilience4j.thread-pool-bulkhead:
  configs:
    default:
      maxThreadPoolSize: 64
      coreThreadPoolSize: 32
      queueCapacity: 32
    

Define the FeignClient:

// This client uses every configuration keyed test1-client; where a configuration section has no test1-client entry, default is used
@FeignClient(name = "service1", contextId = "test1-client")
public interface TestService1Client {
    @GetMapping("/anything")
    HttpBinAnythingResponse anything();
}
// This client uses every configuration keyed test2-client; since there is no separate test2-client configuration here, it uses default throughout
@FeignClient(name = "service1", contextId = "test2-client")
public interface TestService1Client2 {
    @GetMapping("/anything")
    HttpBinAnythingResponse anything();
}
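
The "named configuration, else default" rule can be pictured as a plain map lookup. The sketch below is JDK-only and uses hypothetical names; it reuses the maxRetryAttempts values from the resilience4j.retry section above (default: 2, test1-client: 3) and only illustrates the fallback, not how resilience4j resolves configurations internally.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigFallbackSketch {
    // Returns the value configured for contextId, or the "default" entry if there is none.
    // Assumes a "default" entry is always present.
    static int maxRetryAttempts(Map<String, Integer> configs, String contextId) {
        Integer specific = configs.get(contextId);
        return specific != null ? specific : configs.get("default");
    }

    public static void main(String[] args) {
        Map<String, Integer> retryConfigs = new HashMap<>();
        retryConfigs.put("default", 2);
        retryConfigs.put("test1-client", 3);
        System.out.println(maxRetryAttempts(retryConfigs, "test1-client"));
        System.out.println(maxRetryAttempts(retryConfigs, "test2-client"));
    }
}
```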

Starting with the next section, we will unit test the FeignClient wrapper implemented here to verify that it behaves correctly.
