1. 問題
spring-cloud-gateway 閘道器新增了一個限流功能,使用的是模組自帶的限流過濾器 RequestRateLimiterGatewayFilterFactory,基於令牌桶演算法,通過 redis 實現。
其原理是 redis 中針對每個限流要素(比如針對介面限流),儲存 2 個 key:tokenKey(令牌數量),timeKey(呼叫時間)。每次介面呼叫時,更新 tokenKey 的值為:原先的值 + (當前時間 – 原先時間)* 加入令牌的速度,如果新的 tokenKey 的值大於 1,那麼允許呼叫,否則不允許;同時更新 redis 中 tokenKey,timeKey 的值。整個過程通過 lua 指令碼實現。
在加入限流功能之前,500 客戶端併發訪問,tps 為 6800 req/s,50% 時延為 70ms;加入限流功能之後,tps 為 2300 req/s,50% 時延為 205ms,同時,原先 cpu 佔用率幾乎 600%(6 核) 變成不到 400%(cpu 跑不滿了)。
2. 排查和解決過程
2.1 單個 CPU 跑滿
- 檢視單個執行緒的 cpu 佔用:
[root@auth-service imf2]# top -Hp 29360
top - 15:16:27 up 102 days, 18:04, 1 user, load average: 1.61, 0.72, 0.34
Threads: 122 total, 9 running, 113 sleeping, 0 stopped, 0 zombie
%Cpu(s): 42.0 us, 7.0 sy, 0.0 ni, 49.0 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
KiB Mem : 7678384 total, 126844 free, 3426148 used, 4125392 buff/cache
KiB Swap: 6291452 total, 2212552 free, 4078900 used. 3347956 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29415 root 20 0 6964708 1.1g 14216 R 97.9 15.1 3:01.65 java
29392 root 20 0 6964708 1.1g 14216 R 27.0 15.1 0:45.42 java
29391 root 20 0 6964708 1.1g 14216 R 24.8 15.1 0:43.95 java
29387 root 20 0 6964708 1.1g 14216 R 23.8 15.1 0:46.38 java
29388 root 20 0 6964708 1.1g 14216 R 23.4 15.1 0:48.21 java
29390 root 20 0 6964708 1.1g 14216 R 23.0 15.1 0:45.93 java
29389 root 20 0 6964708 1.1g 14216 R 22.3 15.1 0:44.36 java
執行緒 29415 幾乎跑滿了 cpu,檢視是什麼執行緒:
[root@auth-service imf2]# printf `%x
` 29415
72e7
[root@auth-service imf2]# jstack 29360 | grep 72e7
"lettuce-nioEventLoop-4-1" #40 daemon prio=5 os_prio=0 tid=0x00007f604cc92000 nid=0x72e7 runnable [0x00007f606ce90000]
果然是操作 redis 的執行緒,和預期一致。
- 檢視 redis:cpu 佔用率不超過 15%,沒有 10ms 以上的慢查詢。應該不會是 redis 的問題。
- 檢視執行緒棧資訊:
通過以下指令碼每秒記錄一次 jstack:
[root@eureka2 jstack]# cat jstack.sh
#!/bin/sh
i=0
while [ $i -lt 30 ]; do
/bin/sleep 1
i=`expr $i + 1`
jstack 29360 > "$i".txt
done
檢視 lettuce 執行緒主要執行哪些函式:
"lettuce-nioEventLoop-4-1" #36 daemon prio=5 os_prio=0 tid=0x00007f1eb07ab800 nid=0x4476 runnable [0x00007f1eec8fb000]
java.lang.Thread.State: RUNNABLE
at sun.misc.URLClassPath$Loader.findResource(URLClassPath.java:715)
at sun.misc.URLClassPath.findResource(URLClassPath.java:215)
at java.net.URLClassLoader$2.run(URLClassLoader.java:569)
at java.net.URLClassLoader$2.run(URLClassLoader.java:567)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findResource(URLClassLoader.java:566)
at org.springframework.boot.loader.LaunchedURLClassLoader.findResource(LaunchedURLClassLoader.java:57)
at java.lang.ClassLoader.getResource(ClassLoader.java:1096)
at org.springframework.core.io.ClassPathResource.resolveURL(ClassPathResource.java:155)
at org.springframework.core.io.ClassPathResource.getURL(ClassPathResource.java:193)
at org.springframework.core.io.AbstractFileResolvingResource.lastModified(AbstractFileResolvingResource.java:220)
at org.springframework.scripting.support.ResourceScriptSource.retrieveLastModifiedTime(ResourceScriptSource.java:119)
at org.springframework.scripting.support.ResourceScriptSource.isModified(ResourceScriptSource.java:109)
- locked <0x000000008c074d00> (a java.lang.Object)
at org.springframework.data.redis.core.script.DefaultRedisScript.getSha1(DefaultRedisScript.java:89)
- locked <0x000000008c074c10> (a java.lang.Object)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.eval(DefaultReactiveScriptExecutor.java:113)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.lambda$execute$0(DefaultReactiveScriptExecutor.java:105)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor$$Lambda$1268/1889039573.doInRedis(Unknown Source)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.lambda$execute$6(DefaultReactiveScriptExecutor.java:167)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor$$Lambda$1269/1954779522.get(Unknown Source)
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46)
可知該執行緒主要在執行 ReactiveRedisTemplate 類的 execute(RedisScript<T> script, List<K> keys, List<?> args) 方法,即執行 lua 指令碼。
猜想:既然是因為 lettuce-nioEventLoop 執行緒跑滿了 cpu,那麼通過建立多個 lettuce-nioEventLoop 執行緒,以充分利用多核的特點,是否可以解決呢?
以下為原始碼分析階段:
// 1. RedisConnectionFactory bean 的建立依賴 ClientResources
@Bean
@ConditionalOnMissingBean(RedisConnectionFactory.class)
public LettuceConnectionFactory redisConnectionFactory(
ClientResources clientResources) throws UnknownHostException {
LettuceClientConfiguration clientConfig = getLettuceClientConfiguration(
clientResources, this.properties.getLettuce().getPool());
return createLettuceConnectionFactory(clientConfig);
}
// 2. ClientResources bean 的建立如下
@Bean(destroyMethod = "shutdown")
@ConditionalOnMissingBean(ClientResources.class)
public DefaultClientResources lettuceClientResources() {
return DefaultClientResources.create();
}
public static DefaultClientResources create() {
return builder().build();
}
// 3. 建立 EventLoopGroupProvider 物件
protected DefaultClientResources(Builder builder) {
this.builder = builder;
// 預設為 null,執行這塊程式碼
if (builder.eventLoopGroupProvider == null) {
// 設定處理 redis 連線的執行緒數:預設為
// Math.max(1,
// SystemPropertyUtil.getInt("io.netty.eventLoopThreads",
// Math.max(MIN_IO_THREADS, Runtime.getRuntime().availableProcessors())));
// 針對多核處理器,該值一般等於 cpu 的核的數量
int ioThreadPoolSize = builder.ioThreadPoolSize;
if (ioThreadPoolSize < MIN_IO_THREADS) {
logger.info("ioThreadPoolSize is less than {} ({}), setting to: {}", MIN_IO_THREADS, ioThreadPoolSize,
MIN_IO_THREADS);
ioThreadPoolSize = MIN_IO_THREADS;
}
this.sharedEventLoopGroupProvider = false;
// 建立 EventLoopGroupProvider 物件
this.eventLoopGroupProvider = new DefaultEventLoopGroupProvider(ioThreadPoolSize);
} else {
this.sharedEventLoopGroupProvider = true;
this.eventLoopGroupProvider = builder.eventLoopGroupProvider;
}
// 以下程式碼省略 ...
}
// 4. 通過 EventLoopGroupProvider 建立 EventExecutorGroup 物件
public static <T extends EventExecutorGroup> EventExecutorGroup createEventLoopGroup(Class<T> type, int numberOfThreads) {
if (DefaultEventExecutorGroup.class.equals(type)) {
return new DefaultEventExecutorGroup(numberOfThreads, new DefaultThreadFactory("lettuce-eventExecutorLoop", true));
}
// 我們採用的是 Nio 模式,會執行這個分支
if (NioEventLoopGroup.class.equals(type)) {
return new NioEventLoopGroup(numberOfThreads, new DefaultThreadFactory("lettuce-nioEventLoop", true));
}
if (EpollProvider.isAvailable() && EpollProvider.isEventLoopGroup(type)) {
return EpollProvider.newEventLoopGroup(numberOfThreads, new DefaultThreadFactory("lettuce-epollEventLoop", true));
}
if (KqueueProvider.isAvailable() && KqueueProvider.isEventLoopGroup(type)) {
return KqueueProvider.newEventLoopGroup(numberOfThreads, new DefaultThreadFactory("lettuce-kqueueEventLoop", true));
}
throw new IllegalArgumentException(String.format("Type %s not supported", type.getName()));
}
// 5. NioEventLoopGroup 繼承了 MultithreadEventLoopGroup;
// 建立了多個 NioEventLoop;
// 每個 NioEventLoop 都是單執行緒;
// 每個 NioEventLoop 都可以處理多個連線。
public class NioEventLoopGroup extends MultithreadEventLoopGroup { ... }
public abstract class MultithreadEventLoopGroup extends MultithreadEventExecutorGroup implements EventLoopGroup { ... }
public final class NioEventLoop extends SingleThreadEventLoop { ... }
以上分析可知,預設建立的 RedisConnectionFactory bean 其實是支援多執行緒的,但通過 jstack 等方式檢視 lettuce-nioEventLoop 執行緒卻只有一個。
[root@ ~]# ss | grep 6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:36184 ::ffff:10.201.0.30:6379
檢視 redis 連線,發現只有一個。在 Netty 中,一個 EventLoop 執行緒可以處理多個 Channel,但是一個 Channel 只能繫結到一個 EventLoop,這是基於執行緒安全和同步考慮而設計的。這解釋了為什麼只有一個 lettuce-nioEventLoop。
下面繼續分析為什麼會只有一個連線呢?繼續原始碼分析:
// 1. 建立 RedisConnectionFactory bean
@Bean
@ConditionalOnMissingBean(RedisConnectionFactory.class)
public LettuceConnectionFactory redisConnectionFactory(
ClientResources clientResources) throws UnknownHostException {
LettuceClientConfiguration clientConfig = getLettuceClientConfiguration(
clientResources, this.properties.getLettuce().getPool());
return createLettuceConnectionFactory(clientConfig);
}
// 2. 檢視 createLettuceConnectionFactory(clientConfig) 方法
private LettuceConnectionFactory createLettuceConnectionFactory(
LettuceClientConfiguration clientConfiguration) {
if (getSentinelConfig() != null) {
return new LettuceConnectionFactory(getSentinelConfig(), clientConfiguration);
}
if (getClusterConfiguration() != null) {
return new LettuceConnectionFactory(getClusterConfiguration(),
clientConfiguration);
}
// 沒有哨兵模式,沒有叢集,執行這塊程式碼
return new LettuceConnectionFactory(getStandaloneConfig(), clientConfiguration);
}
// 3. 獲取 redis 連線
private boolean shareNativeConnection = true;
public LettuceReactiveRedisConnection getReactiveConnection() {
// 預設為 true
return getShareNativeConnection()
? new LettuceReactiveRedisConnection(getSharedReactiveConnection(), reactiveConnectionProvider)
: new LettuceReactiveRedisConnection(reactiveConnectionProvider);
}
LettuceReactiveRedisConnection(StatefulConnection<ByteBuffer, ByteBuffer> sharedConnection, LettuceConnectionProvider connectionProvider) {
Assert.notNull(sharedConnection, "Shared StatefulConnection must not be null!");
Assert.notNull(connectionProvider, "LettuceConnectionProvider must not be null!");
this.dedicatedConnection = new AsyncConnect(connectionProvider, StatefulConnection.class);
this.pubSubConnection = new AsyncConnect(connectionProvider, StatefulRedisPubSubConnection.class);
// 包裝 sharedConnection
this.sharedConnection = Mono.just(sharedConnection);
}
protected Mono<? extends StatefulConnection<ByteBuffer, ByteBuffer>> getConnection() {
// 直接返回 sharedConnection
if (sharedConnection != null) {
return sharedConnection;
}
return getDedicatedConnection();
}
// 4. shareNativeConnection 是怎麼來的
protected StatefulConnection<ByteBuffer, ByteBuffer> getSharedReactiveConnection() {
return shareNativeConnection ? getOrCreateSharedReactiveConnection().getConnection() : null;
}
private SharedConnection<ByteBuffer> getOrCreateSharedReactiveConnection() {
synchronized (this.connectionMonitor) {
if (this.reactiveConnection == null) {
this.reactiveConnection = new SharedConnection<>(reactiveConnectionProvider, true);
}
return this.reactiveConnection;
}
}
StatefulConnection<E, E> getConnection() {
synchronized (this.connectionMonitor) {
// 第一次通過 getNativeConnection() 獲取連線;之後直接返回該連線
if (this.connection == null) {
this.connection = getNativeConnection();
}
if (getValidateConnection()) {
validateConnection();
}
return this.connection;
}
}
分析以上原始碼,關鍵就在於 shareNativeConnection 預設為 true,導致只有一個連線。
更改 shareNativeConnection 的值為 true,並開啟 lettuce 連線池,最大連線數設定為 6;再次測試,
[root@eureka2 jstack]# ss | grep 6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:48937 ::ffff:10.201.0.30:6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:35842 ::ffff:10.201.0.30:6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:48932 ::ffff:10.201.0.30:6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:48930 ::ffff:10.201.0.30:6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:48936 ::ffff:10.201.0.30:6379
tcp ESTAB 0 0 ::ffff:10.201.0.27:48934 ::ffff:10.201.0.30:6379
[root@eureka2 jstack]# jstack 23080 | grep lettuce-epollEventLoop
"lettuce-epollEventLoop-4-6" #69 daemon prio=5 os_prio=0 tid=0x00007fcfa4012000 nid=0x5af2 runnable [0x00007fcfa81ef000]
"lettuce-epollEventLoop-4-5" #67 daemon prio=5 os_prio=0 tid=0x00007fcf94003800 nid=0x5af0 runnable [0x00007fcfa83f1000]
"lettuce-epollEventLoop-4-4" #60 daemon prio=5 os_prio=0 tid=0x00007fcfa0003000 nid=0x5ae9 runnable [0x00007fcfa8af8000]
"lettuce-epollEventLoop-4-3" #59 daemon prio=5 os_prio=0 tid=0x00007fcfb00b8000 nid=0x5ae8 runnable [0x00007fcfa8bf9000]
"lettuce-epollEventLoop-4-2" #58 daemon prio=5 os_prio=0 tid=0x00007fcf6c00f000 nid=0x5ae7 runnable [0x00007fcfa8cfa000]
"lettuce-epollEventLoop-4-1" #43 daemon prio=5 os_prio=0 tid=0x00007fcfac248800 nid=0x5a64 runnable [0x00007fd00c2b9000]
可以看到已經建立了 6 個 redis 連線,並且建立了 6 個 eventLoop 執行緒。
2.2 執行緒阻塞
再次進行壓力測試,結果如下:
[root@hystrix-dashboard wrk]# wrk -t 10 -c 500 -d 30s --latency -T 3s -s post-test.lua `http://10.201.0.27:8888/api/v1/json`
Running 30s test @ http://10.201.0.27:8888/api/v1/json
10 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 215.83ms 104.38ms 1.00s 75.76%
Req/Sec 234.56 49.87 434.00 71.45%
Latency Distribution
50% 210.63ms
75% 281.30ms
90% 336.78ms
99% 519.51ms
69527 requests in 30.04s, 22.43MB read
Requests/sec: 2314.14
Transfer/sec: 764.53KB
[root@eureka2 jstack]# top -Hp 23080
top - 10:08:10 up 162 days, 12:31, 2 users, load average: 2.92, 1.19, 0.53
Threads: 563 total, 9 running, 554 sleeping, 0 stopped, 0 zombie
%Cpu(s): 50.5 us, 10.2 sy, 0.0 ni, 36.2 id, 0.1 wa, 0.0 hi, 2.9 si, 0.0 st
KiB Mem : 7677696 total, 215924 free, 3308248 used, 4153524 buff/cache
KiB Swap: 6291452 total, 6291452 free, 0 used. 3468352 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23280 root 20 0 7418804 1.3g 7404 R 42.7 17.8 0:54.75 java
23272 root 20 0 7418804 1.3g 7404 S 31.1 17.8 0:44.63 java
23273 root 20 0 7418804 1.3g 7404 S 31.1 17.8 0:44.45 java
23271 root 20 0 7418804 1.3g 7404 R 30.8 17.8 0:44.63 java
23282 root 20 0 7418804 1.3g 7404 S 30.5 17.8 0:44.96 java
23119 root 20 0 7418804 1.3g 7404 R 24.8 17.8 1:27.30 java
23133 root 20 0 7418804 1.3g 7404 R 23.8 17.8 1:29.55 java
23123 root 20 0 7418804 1.3g 7404 S 23.5 17.8 1:28.98 java
23138 root 20 0 7418804 1.3g 7404 S 23.5 17.8 1:44.19 java
23124 root 20 0 7418804 1.3g 7404 R 22.8 17.8 1:32.21 java
23139 root 20 0 7418804 1.3g 7404 R 22.5 17.8 1:29.49 java
最終結果沒有任何提升,cpu 利用率依然不超過 400%,tps 也還是在 2300 request/s;單個 cpu 利用率最高不超過 50%,說明這次的瓶頸不是 cpu。
通過 jstack 檢視執行緒狀態,
"lettuce-epollEventLoop-4-3" #59 daemon prio=5 os_prio=0 tid=0x00007fcfb00b8000 nid=0x5ae8 waiting for monitor entry [0x00007fcfa8bf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.springframework.data.redis.core.script.DefaultRedisScript.getSha1(DefaultRedisScript.java:88)
- waiting to lock <0x000000008c1da690> (a java.lang.Object)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.eval(DefaultReactiveScriptExecutor.java:113)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.lambda$execute$0(DefaultReactiveScriptExecutor.java:105)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor$$Lambda$1317/1912229933.doInRedis(Unknown Source)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor.lambda$execute$6(DefaultReactiveScriptExecutor.java:167)
at org.springframework.data.redis.core.script.DefaultReactiveScriptExecutor$$Lambda$1318/1719274268.get(Unknown Source)
at reactor.core.publisher.FluxDefer.subscribe(FluxDefer.java:46)
at reactor.core.publisher.FluxDoFinally.subscribe(FluxDoFinally.java:73)
at reactor.core.publisher.FluxOnErrorResume.subscribe(FluxOnErrorResume.java:47)
at reactor.core.publisher.MonoReduceSeed.subscribe(MonoReduceSeed.java:65)
at reactor.core.publisher.MonoMapFuseable.subscribe(MonoMapFuseable.java:59)
at reactor.core.publisher.MonoFlatMap.subscribe(MonoFlatMap.java:60)
at reactor.core.publisher.Mono.subscribe(Mono.java:3608)
at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:169)
at reactor.core.publisher.MonoFlatMap.subscribe(MonoFlatMap.java:53)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:67)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1476)
at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:241)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1476)
at reactor.core.publisher.MonoProcessor.subscribe(MonoProcessor.java:457)
at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:150)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1476)
at reactor.core.publisher.MonoHasElement$HasElementSubscriber.onNext(MonoHasElement.java:74)
at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1476)
at reactor.core.publisher.MonoProcessor.onNext(MonoProcessor.java:389)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:123)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.FluxFilter$FilterSubscriber.onNext(FluxFilter.java:107)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onNext(MonoFlatMapMany.java:238)
at reactor.core.publisher.FluxDefaultIfEmpty$DefaultIfEmptySubscriber.onNext(FluxDefaultIfEmpty.java:92)
at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:114)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:76)
at io.lettuce.core.RedisPublisher$RedisSubscription.onNext(RedisPublisher.java:270)
at io.lettuce.core.RedisPublisher$SubscriptionCommand.complete(RedisPublisher.java:754)
at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:59)
at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:646)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:604)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:556)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:433)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:330)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
有 4 個 lettuce-epollEventLoop 執行緒都處於 BLOCKED 狀態,繼續檢視原始碼:
public class DefaultRedisScript<T> implements RedisScript<T>, InitializingBean {
private @Nullable ScriptSource scriptSource;
private @Nullable String sha1;
private @Nullable Class<T> resultType;
public String getSha1() {
// 1. 執行緒需要先獲取 shaModifiedMonitor 鎖
synchronized (shaModifiedMonitor) {
// 第一次呼叫時或者 lua 指令碼檔案被修改時,需要重新計算 sha1 的值
// 否則直接返回sha1
if (sha1 == null || scriptSource.isModified()) {
this.sha1 = DigestUtils.sha1DigestAsHex(getScriptAsString());
}
return sha1;
}
}
public String getScriptAsString() {
try {
return scriptSource.getScriptAsString();
} catch (IOException e) {
throw new ScriptingException("Error reading script text", e);
}
}
}
public class ResourceScriptSource implements ScriptSource {
// 只有第一次呼叫或者 lua 指令碼檔案被修改時,才會執行這個方法
@Override
public String getScriptAsString() throws IOException {
synchronized (this.lastModifiedMonitor) {
this.lastModified = retrieveLastModifiedTime();
}
Reader reader = this.resource.getReader();
return FileCopyUtils.copyToString(reader);
}
@Override
public boolean isModified() {
// 2. 每次都需要判斷 lua 指令碼是否被修改
// 執行緒需要再獲取 lastModifiedMonitor 鎖
synchronized (this.lastModifiedMonitor) {
return (this.lastModified < 0 || retrieveLastModifiedTime() > this.lastModified);
}
}
}
對於限流操作,重要性並沒有那麼高,而且計算介面呼叫次數的 lua 指令碼,一般也不會經常改動,所以沒必要獲取 sha1 的值的時候都檢視下指令碼是否有改動;如果偶爾改動的話,可以通過新增一個重新整理介面,在改動指令碼檔案後通過手動重新整理介面來改變 sha1 的值。
所以這裡,可以把同步操作去掉;我改成了這樣:
public class CustomRedisScript<T> extends DefaultRedisScript<T> {
private @Nullable String sha1;
CustomRedisScript(ScriptSource scriptSource, Class<T> resultType) {
setScriptSource(scriptSource);
setResultType(resultType);
this.sha1 = DigestUtils.sha1DigestAsHex(getScriptAsString());
}
@Override
public String getSha1() {
return sha1;
}
}
2.3 cpu 出現大量軟中斷
繼續測試,結果如下:
[root@hystrix-dashboard wrk]# wrk -t 10 -c 500 -d 30s -T 3s -s post-test.lua --latency `http://10.201.0.27:8888/api/v1/json`
Running 30s test @ http://10.201.0.27:8888/api/v1/json
10 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 155.60ms 110.40ms 1.07s 67.68%
Req/Sec 342.90 64.88 570.00 70.35%
Latency Distribution
50% 139.14ms
75% 211.03ms
90% 299.74ms
99% 507.03ms
102462 requests in 30.02s, 33.15MB read
Requests/sec: 3413.13
Transfer/sec: 1.10MB
cpu 利用率 500% 左右,tps 達到了 3400 req/s,效能大幅度提升。檢視 cpu 狀態:
[root@eureka2 imf2]# top -Hp 19021
top - 16:24:09 up 163 days, 18:47, 2 users, load average: 3.03, 1.08, 0.47
Threads: 857 total, 7 running, 850 sleeping, 0 stopped, 0 zombie
%Cpu0 : 60.2 us, 10.0 sy, 0.0 ni, 4.3 id, 0.0 wa, 0.0 hi, 25.4 si, 0.0 st
%Cpu1 : 64.6 us, 16.3 sy, 0.0 ni, 19.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 65.7 us, 15.8 sy, 0.0 ni, 18.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 54.5 us, 15.8 sy, 0.0 ni, 29.5 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 55.0 us, 17.8 sy, 0.0 ni, 27.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 53.2 us, 16.4 sy, 0.0 ni, 30.0 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 7677696 total, 174164 free, 3061892 used, 4441640 buff/cache
KiB Swap: 6291452 total, 6291452 free, 0 used. 3687692 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19075 root 20 0 7722156 1.2g 14488 S 41.4 15.9 0:55.71 java
19363 root 20 0 7722156 1.2g 14488 R 40.1 15.9 0:41.33 java
19071 root 20 0 7722156 1.2g 14488 R 37.1 15.9 0:56.38 java
19060 root 20 0 7722156 1.2g 14488 S 35.4 15.9 0:52.74 java
19073 root 20 0 7722156 1.2g 14488 R 35.1 15.9 0:55.83 java
cpu0 利用率達到了 95.7%,幾乎跑滿。但是其中出現了 si(軟中斷): 25.4%。
檢視軟中斷型別:
[root@eureka2 imf2]# watch -d -n 1 `cat /proc/softirqs`
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5
HI: 0 0 0 0 0 0
TIMER: 1629142082 990710808 852299786 606344269 586896512 566624764
NET_TX: 291570 833710 9616 5295 5358 2012064
NET_RX: 2563401537 32502894 31370533 6886869 6530120 6490002
BLOCK: 18130 1681 41404591 8751054 8695636 8763338
BLOCK_IOPOLL: 0 0 0 0 0 0
TASKLET: 39225643 0 0 817 17304 2516988
SCHED: 782335782 442142733 378856479 248794679 238417109 259695794
HRTIMER: 0 0 0 0 0 0
RCU: 690827224 504025610 464412234 246695846 254062933 248859132
其中 NET_RX,CPU0 的中斷次數遠遠大於其他 CPU,初步判斷是網路卡問題。
我這邊網路卡是 ens32,檢視網路卡的中斷號:
[root@eureka2 imf2]# cat /proc/interrupts | grep ens
18: 2524017495 0 0 0 0 7 IO-APIC-fasteoi ens32
[root@eureka2 imf2]# cat /proc/irq/18/smp_affinity
01
[root@eureka2 imf2]# cat /proc/irq/18/smp_affinity_list
0
網路卡的中斷配置到了 CPU0。(01:表示 cpu0,02:cpu1,04:cpu2,08:cpu3,10:cpu4,20:cpu5)
smp_affinity:16 進位制;smp_affinity_list:配置到了哪些 cpu。
檢視網路卡佇列模式:
[root@eureka2 ~]# lspci -vvv
02:00.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
Subsystem: VMware PRO/1000 MT Single Port Adapter
Physical Slot: 32
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0 (63750ns min), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at fd5c0000 (64-bit, non-prefetchable) [size=128K]
Region 2: Memory at fdff0000 (64-bit, non-prefetchable) [size=64K]
Region 4: I/O ports at 2000 [size=64]
[virtual] Expansion ROM at fd500000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device
Command: DPERE- ERO+ RBC=512 OST=1
Status: Dev=ff:1f.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
Kernel driver in use: e1000
Kernel modules: e1000
由於是單佇列模式,所以通過修改 /proc/irq/18/smp_affinity 的值不能生效。
可以通過 RPS/RFS 在軟體層面模擬多佇列網路卡功能。
[root@eureka2 ~]# echo 3e > /sys/class/net/ens32/queues/rx-0/rps_cpus
[root@eureka2 rx-0]# sysctl net.core.rps_sock_flow_entries=32768
[root@eureka2 rx-0]# echo 32768 > /sys/class/net/ens32/queues/rx-0/rps_flow_cnt
/sys/class/net/ens32/queues/rx-0/rps_cpus: 1e,設定模擬網路卡中斷分配到 cpu1-5 上。
繼續測試,
[root@hystrix-dashboard wrk]# wrk -t 10 -c 500 -d 30s -T 3s -s post-test.lua --latency `http://10.201.0.27:8888/api/v1/json`
Running 30s test @ http://10.201.0.27:8888/api/v1/json
10 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 146.75ms 108.45ms 1.01s 65.53%
Req/Sec 367.80 64.55 575.00 67.93%
Latency Distribution
50% 130.93ms
75% 200.72ms
90% 290.32ms
99% 493.84ms
109922 requests in 30.02s, 35.56MB read
Requests/sec: 3661.21
Transfer/sec: 1.18MB
[root@eureka2 rx-0]# top -Hp 19021
top - 09:39:49 up 164 days, 12:03, 1 user, load average: 2.76, 2.02, 1.22
Threads: 559 total, 9 running, 550 sleeping, 0 stopped, 0 zombie
%Cpu0 : 55.1 us, 13.0 sy, 0.0 ni, 17.5 id, 0.0 wa, 0.0 hi, 14.4 si, 0.0 st
%Cpu1 : 60.1 us, 14.0 sy, 0.0 ni, 22.5 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
%Cpu2 : 59.5 us, 14.3 sy, 0.0 ni, 22.4 id, 0.0 wa, 0.0 hi, 3.7 si, 0.0 st
%Cpu3 : 58.6 us, 15.2 sy, 0.0 ni, 22.2 id, 0.0 wa, 0.0 hi, 4.0 si, 0.0 st
%Cpu4 : 59.1 us, 14.8 sy, 0.0 ni, 22.7 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
%Cpu5 : 57.7 us, 16.2 sy, 0.0 ni, 23.0 id, 0.0 wa, 0.0 hi, 3.1 si, 0.0 st
KiB Mem : 7677696 total, 373940 free, 3217180 used, 4086576 buff/cache
KiB Swap: 6291452 total, 6291452 free, 0 used. 3533812 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19060 root 20 0 7415812 1.2g 13384 S 40.7 16.7 3:23.05 java
19073 root 20 0 7415812 1.2g 13384 R 40.1 16.7 3:20.56 java
19365 root 20 0 7415812 1.2g 13384 R 40.1 16.7 2:36.65 java
可以看到軟中斷也分配到了 cpu1-5 上;至於為什麼還是 cpu0 上軟中斷比例最高,猜測是因為還有一些其他中斷並且預設配置在 cpu0 上?
同時,tps 也從 3400 -> 3600,提升不大。
2.4 增加 redis 連線
經過以上修改,cup 利用率還是不超過 500%,說明在某些地方還是存在瓶頸。
嘗試修改了下 lettuce 連線池,
spring:
redis:
database: x
host: x.x.x.x
port: 6379
lettuce:
pool:
max-active: 18
min-idle: 1
max-idle: 18
主要是把 max-active 引數 6 增大到了 18,繼續測試:
[root@hystrix-dashboard wrk]# wrk -t 10 -c 500 -d 120s -T 3s -s post-test.lua --latency `http://10.201.0.27:8888/api/v1/json`
Running 2m test @ http://10.201.0.27:8888/api/v1/json
10 threads and 500 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 117.66ms 96.72ms 1.34s 86.48%
Req/Sec 485.42 90.41 790.00 70.80%
Latency Distribution
50% 90.04ms
75% 156.01ms
90% 243.63ms
99% 464.04ms
578298 requests in 2.00m, 187.01MB read
Requests/sec: 4815.57
Transfer/sec: 1.56MB
6 核 cpu 幾乎跑滿,同時 tps 也從 3600 -> 4800,提升明顯!
這說明之前的瓶頸出在 redis 連線上,那麼如何判斷 tcp 連線是瓶頸呢?(嘗試通過 ss、netstat 等命令檢視 tcp 傳送緩衝區、接收緩衝區、半連線佇列、全連線佇列等,未發現問題。先放著,以後在研究)