[Source Code Analysis] Dynomite Distributed Storage Engine: DynoJedisClient (2)

Posted by 羅西的思考 on 2021-02-06

0x00 Summary

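In the previous article we covered connection management and topology awareness in DynoJedisClient, the client for Netflix Dynomite. This article continues with automatic discovery and failover.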

0x02 Requirements & Approach

Let us first recall the basic approach and the diagram.

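Because it has to hide these details from the layers above, DynoJedisClient must deal with a lot of complexity and needs a deep understanding of the system, for example:

  • How to maintain connections and provide connection pools for persistent connections;
  • How to maintain the topology;
  • How to balance load;
  • How to fail over;
  • How to retry and discover automatically, e.g. retrying hosts that went down and discovering other hosts in the cluster;
  • How to monitor the state of the underlying racks.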

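Accordingly, DynoJedisClient's approach is for the Java driver to expose several policy interfaces that can be used to tune driver behavior, including load balancing, request retries, and management of node connections.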

The diagram so far is as follows:

0x03 Automatic Discovery

Automatic discovery lives in the start method of ConnectionPoolImpl, which launches a scheduled task that periodically refreshes host status and applies the update.

3.1 The Refresh Thread

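The refresh task runs periodically (first after 15 seconds, then 30 seconds after each run finishes) and:

  • calls hostsUpdater to get the latest host status;
  • calls updateHosts to update ConnectionPoolImpl's internal member variables from that status.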
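    @Override
    public Future<Boolean> start() throws DynoException {

        HostSupplier hostSupplier = cpConfiguration.getHostSupplier();
        if (hostSupplier == null) {
            throw new DynoException("Host supplier not configured!");
        }

        HostStatusTracker hostStatus = hostsUpdater.refreshHosts();

        Collection<Host> hostsUp = hostStatus.getActiveHosts();

        final ExecutorService threadPool = Executors.newFixedThreadPool(Math.max(10, hostsUp.size()));
        final List<Future<Void>> futures = new ArrayList<Future<Void>>();

        // Initialization: add a connection pool for every active host, in parallel
        for (final Host host : hostsUp) {

            // Add host connection pool, but don't init the load balancer yet
            futures.add(threadPool.submit(new Callable<Void>() {
                @Override
                public Void call() throws Exception {
                    addHost(host, false);
                    return null;
                }
            }));
        }

        // Wait until every host has been added, then release the temporary thread pool
        try {
            for (Future<Void> future : futures) {
                try {
                    future.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                } catch (ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
        } finally {
            threadPool.shutdownNow();
        }

        boolean success = started.compareAndSet(false, true);
        if (success) {
            idling.set(false);
            idleThreadPool.shutdownNow();
            selectionStrategy = initSelectionStrategy();
            cpHealthTracker.start();

            // Start the scheduled refresh task
            connPoolThreadPool.scheduleWithFixedDelay(new Runnable() {

                @Override
                public void run() {
                    try {
                        // Ask hostsUpdater for the latest host status
                        HostStatusTracker hostStatus = hostsUpdater.refreshHosts();

                        cpMonitor.setHostCount(hostStatus.getHostCount());

                        // Apply that status to ConnectionPoolImpl's internal state via updateHosts
                        updateHosts(hostStatus.getActiveHosts(), hostStatus.getInactiveHosts());
                    } catch (Throwable throwable) {
                        // Never let an exception escape: it would cancel all later refreshes
                        Logger.error("Failed to update hosts cache", throwable);
                    }
                }

            }, 15 * 1000, 30 * 1000, TimeUnit.MILLISECONDS); // first run after 15 s, then 30 s after each run

        }
        return getEmptyFutureTask(true);
    }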
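The catch block in the scheduled task matters: according to the ScheduledExecutorService contract, if any execution of a scheduleWithFixedDelay task throws, subsequent executions are suppressed, so a single failed discovery round would silently stop all future refreshes. A minimal sketch of the pattern, using a hypothetical PeriodicHostRefresher whose Supplier and Consumer hooks stand in for hostsUpdater.refreshHosts() and updateHosts(...):

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical class: not part of Dyno. It only illustrates the "catch everything
// inside a fixed-delay task" pattern used by ConnectionPoolImpl.start().
final class PeriodicHostRefresher {
    private final Supplier<List<String>> hostSource;   // stands in for hostsUpdater.refreshHosts()
    private final Consumer<List<String>> applyUpdate;  // stands in for updateHosts(...)
    final AtomicInteger failures = new AtomicInteger();

    PeriodicHostRefresher(Supplier<List<String>> hostSource, Consumer<List<String>> applyUpdate) {
        this.hostSource = hostSource;
        this.applyUpdate = applyUpdate;
    }

    // One refresh round. Any Throwable is swallowed and counted; if it escaped,
    // scheduleWithFixedDelay would suppress every later execution.
    void refreshOnce() {
        try {
            applyUpdate.accept(hostSource.get());
        } catch (Throwable t) {
            failures.incrementAndGet();
        }
    }

    // Same shape as the Dyno call: first run after initialDelayMs,
    // then delayMs after each run completes.
    ScheduledFuture<?> start(ScheduledExecutorService timer, long initialDelayMs, long delayMs) {
        return timer.scheduleWithFixedDelay(this::refreshOnce, initialDelayMs, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

If the supplier fails only on its first call, later rounds keep running and the failure counter ends at 1; without the try/catch, the schedule would stop after that first exception.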

3.2 The Update Operation

In the code above, updateHosts performs the actual host update: it adds the hosts that are up and removes the hosts that are down.

@Override
public Future<Boolean> updateHosts(Collection<Host> hostsUp, Collection<Host> hostsDown) {
    boolean condition = false;
    if (hostsUp != null && !hostsUp.isEmpty()) {
        for (Host hostUp : hostsUp) {
            condition |= addHost(hostUp);
        }
    }
    if (hostsDown != null && !hostsDown.isEmpty()) {
        for (Host hostDown : hostsDown) {
            condition |= removeHost(hostDown);
        }
    }
    return getEmptyFutureTask(condition);
}

updateHosts calls addHost, which updates selectionStrategy, cpMonitor, cpHealthTracker, and cpMap accordingly, because all of them must react to host changes.

public boolean addHost(Host host, boolean refreshLoadBalancer) {

    HostConnectionPool<CL> connPool = cpMap.get(host);
    if (connPool != null) {
        // A pool for this host already exists: nothing to do
        return false;
    }

    final HostConnectionPool<CL> hostPool = hostConnPoolFactory.createHostConnectionPool(host, this);

    HostConnectionPool<CL> prevPool = cpMap.putIfAbsent(host, hostPool);

    if (prevPool == null) {
        // This is the first time we are adding this pool.
        try {
            int primed = hostPool.primeConnections();

            if (hostPool.isActive()) {
                if (refreshLoadBalancer) {
                    selectionStrategy.addHost(host, hostPool);
                }
                cpHealthTracker.initializePingHealthchecksForPool(hostPool);
                cpMonitor.hostAdded(host, hostPool);
            } else {
                // Not enough connections could be primed: drop the pool so a later refresh can retry
                cpMap.remove(host);
            }
            return primed > 0;
        } catch (DynoException e) {
            cpMap.remove(host);
            return false;
        }
    }
    // Another thread registered a pool for this host first
    return false;
}
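The core of addHost is a publish-then-prime pattern: putIfAbsent guarantees that only one pool is registered per host even when several threads race, and a pool that cannot prime any connections is removed so that a later refresh can try again. A simplified, hypothetical sketch (PoolRegistry, its nested Pool, and the ToIntFunction primer are illustrative, not Dyno classes):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.ToIntFunction;

// Hypothetical, simplified registry mirroring the addHost flow.
final class PoolRegistry {
    // Hypothetical stand-in for HostConnectionPool: it only remembers how many connections were primed.
    static final class Pool {
        final String host;
        int primed;
        Pool(String host) { this.host = host; }
        boolean isActive() { return primed > 0; }
    }

    final ConcurrentMap<String, Pool> pools = new ConcurrentHashMap<>();
    private final ToIntFunction<String> primer; // returns how many connections could be opened

    PoolRegistry(ToIntFunction<String> primer) { this.primer = primer; }

    boolean addHost(String host) {
        if (pools.containsKey(host)) {
            return false;                       // fast path: a pool already exists
        }
        Pool candidate = new Pool(host);
        Pool previous = pools.putIfAbsent(host, candidate);
        if (previous != null) {
            return false;                       // another thread won the race
        }
        candidate.primed = primer.applyAsInt(host);
        if (!candidate.isActive()) {
            pools.remove(host, candidate);      // roll back so the next refresh can retry
            return false;
        }
        return true;
    }
}
```

Adding the same host twice is a no-op the second time, and an unreachable host never stays registered.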

The logic at this point:

+--------------------------------------------------------------------------+
|                                                                          |
|   ConnectionPoolImpl                                                     |
|                                                                          |
|                                                      Timer               |
|                                           +----------------------+       |
|                           refreshHosts    |                      |       |
|           hostsUpdater +----------------> |  connPoolThreadPool  |       |
|                                           |                      |       |
|                                           +------------+---------+       |
|                                                        |                 |
|                                                        |                 |
|                                         updateHosts    |                 |
|                                                        |                 |
|                      +----------------+-----------------------------+    |
|                      |                |                |            |    |
|                      |                |                |            |    |
|                      v                v                v            v    |
|             cpHealthTracker      selectionStrategy   cpMonitor    cpMap  |
|                                                                          |
|                                                                          |
|                                                                          |
+--------------------------------------------------------------------------+

3.3 Discovery

Discovery is performed by HostsUpdater.

3.3.1 HostsUpdater

The main job of this class is to refresh host information through HostSupplier:

  • Iterate over the hosts returned by HostSupplier and handle up and down hosts separately, building each Host from the attributes of hostFromHostSupplier (ip, hostname, status, port, DatastorePort, rack, datacenter, hashtag, password...);
  • Record the result in HostStatusTracker.
public class HostsUpdater {

    private final HostSupplier hostSupplier;
    private final TokenMapSupplier tokenMapSupplier;
    private final AtomicReference<HostStatusTracker> hostTracker = new AtomicReference<HostStatusTracker>(null);

    public HostsUpdater(HostSupplier hSupplier, TokenMapSupplier tokenMapSupplier) {
        this.hostSupplier = hSupplier;
        this.tokenMapSupplier = tokenMapSupplier;
        // Start from an empty tracker so that hostTracker.get() in refreshHosts() is never null
        this.hostTracker.set(new HostStatusTracker());
    }

    public HostStatusTracker refreshHosts() {

        List<Host> allHostsFromHostSupplier = hostSupplier.getHosts();
        List<Host> hostsUpFromHostSupplier = new ArrayList<>();
        List<Host> hostsDownFromHostSupplier = new ArrayList<>();

        /**
         * HostTracker should return the hosts that we get from TokenMapSupplier.
         * Hence get the hosts from HostSupplier and map them to TokenMapSupplier
         * and return them.
         */
        Set<Host> hostSet = new HashSet<>(allHostsFromHostSupplier);
        // Create a list of host/Tokens
        List<HostToken> hostTokens = tokenMapSupplier.getTokens(hostSet);

        // The key here really needs to be a object that is overlapping between
        // the host from HostSupplier and TokenMapSupplier. Since that is a
        // subset of the Host object itself, Host is the key as well as value here.
        Map<Host, Host> allHostSetFromTokenMapSupplier = new HashMap<>();
        for (HostToken ht : hostTokens) {
            allHostSetFromTokenMapSupplier.put(ht.getHost(), ht.getHost());
        }

        for (Host hostFromHostSupplier : allHostsFromHostSupplier) {
            if (hostFromHostSupplier.isUp()) {
                // The host is up
                Host hostFromTokenMapSupplier = allHostSetFromTokenMapSupplier.get(hostFromHostSupplier);

                // Populate upHostBuilder from hostFromHostSupplier's attributes: ip, hostname, status, port, DatastorePort, rack, datacenter, hashtag, password...
                HostBuilder upHostBuilder = new HostBuilder()
                        .setHostname()......;

                hostsUpFromHostSupplier.add(upHostBuilder.createHost());
                allHostSetFromTokenMapSupplier.remove(hostFromTokenMapSupplier);
            } else {
                // The host is down
                Host hostFromTokenMapSupplier = allHostSetFromTokenMapSupplier.get(hostFromHostSupplier);

                // Populate downHostBuilder the same way: ip, hostname, status, port, DatastorePort, rack, datacenter, hashtag, password...
                HostBuilder downHostBuilder = new HostBuilder()
                        .setHostname()......;

                hostsDownFromHostSupplier.add(downHostBuilder.createHost());
                allHostSetFromTokenMapSupplier.remove(hostFromTokenMapSupplier);
            }
        }

        HostStatusTracker newTracker = hostTracker.get().computeNewHostStatus(hostsUpFromHostSupplier, hostsDownFromHostSupplier);
        hostTracker.set(newTracker);

        return hostTracker.get();
    }
}
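The up/down branching in refreshHosts boils down to partitioning the supplier's list by status before both halves are handed to computeNewHostStatus. A hypothetical sketch with a stripped-down Host type that carries only a name and an up flag:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class HostPartition {
    // Hypothetical minimal host: a name plus an up/down flag (Dyno's Host has many more fields).
    static final class Host {
        final String name;
        final boolean up;
        Host(String name, boolean up) { this.name = name; this.up = up; }
    }

    // Splits hosts into up (key true) and down (key false), mirroring the isUp() branch
    // in refreshHosts(). partitioningBy always yields both keys, even for an empty input.
    static Map<Boolean, List<Host>> split(List<Host> fromSupplier) {
        return fromSupplier.stream().collect(Collectors.partitioningBy(h -> h.up));
    }
}
```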

3.3.2 HostStatusTracker

This class records host status; other modules read the active and inactive hosts from it.

public class HostStatusTracker {
    // the set of active and inactive hosts
    private final Set<Host> activeHosts = new HashSet<Host>();
    private final Set<Host> inactiveHosts = new HashSet<Host>();
}
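HostsUpdater publishes a fresh HostStatusTracker through an AtomicReference instead of mutating the current one, so the refresh thread and its readers never share a half-updated object. The sketch below shows that copy-on-update idea with a hypothetical HostStatusSnapshot; its merge rule (a previously active host that is no longer reported up becomes inactive) is an assumption for illustration, not Dyno's exact computeNewHostStatus logic:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable snapshot in the spirit of HostStatusTracker.
final class HostStatusSnapshot {
    final Set<String> active;
    final Set<String> inactive;

    HostStatusSnapshot(Collection<String> active, Collection<String> inactive) {
        this.active = Collections.unmodifiableSet(new HashSet<>(active));
        this.inactive = Collections.unmodifiableSet(new HashSet<>(inactive));
    }

    // Returns a new snapshot instead of mutating this one, so readers holding
    // the old reference never observe a partially applied update.
    HostStatusSnapshot next(Collection<String> up, Collection<String> down) {
        Set<String> newActive = new HashSet<>(up);
        Set<String> newInactive = new HashSet<>(down);
        for (String host : active) {
            if (!newActive.contains(host)) {
                newInactive.add(host);       // was active, no longer reported up (assumed rule)
            }
        }
        newInactive.removeAll(newActive);    // a host cannot be both active and inactive
        return new HostStatusSnapshot(newActive, newInactive);
    }
}
```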

3.3.3 HostSupplier

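HostSupplier does the actual fetching of hosts and has many implementations; Dynomite delegates this concrete work to dedicated classes.

Take EurekaHostsSupplier as an example: it calls the EurekaClient discoveryClient to obtain the instance list.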

public class EurekaHostsSupplier implements HostSupplier {

    private final String applicationName;
    private final EurekaClient discoveryClient;

    @Override
    public List<Host> getHosts() {
        return getUpdateFromEureka();
    }

    private List<Host> getUpdateFromEureka() {
        Application app = discoveryClient.getApplication(applicationName);
        List<Host> hosts = new ArrayList<Host>();
        if (app == null) {
            return hosts; // the application is not registered in Eureka
        }
        List<InstanceInfo> ins = app.getInstances();
        if (ins == null || ins.isEmpty()) {
            return hosts;
        }

        hosts = Lists.newArrayList(Collections2.transform(ins, info -> {
            // Eureka UP maps to Host.Status.Up; every other Eureka status counts as Down
            Host.Status status = info.getStatus() == InstanceStatus.UP ? Host.Status.Up : Host.Status.Down;

            String rack = null;
            try {
                // On AWS, the availability zone is used as the rack
                if (info.getDataCenterInfo() instanceof AmazonInfo) {
                    AmazonInfo amazonInfo = (AmazonInfo) info.getDataCenterInfo();
                    rack = amazonInfo.get(MetaDataKey.availabilityZone);
                }
            } catch (Throwable t) {
                Logger.error("Error getting rack for host " + info.getHostName(), t);
            }

            Host host = new HostBuilder().setHostname(info.getHostName()).setIpAddress(info.getIPAddr()).setRack(rack).setStatus(status).createHost();
            return host;
        }));

        return hosts;
    }
}

So the picture now expands to:

+--------------------------------------------------------------------------------------+
|   ConnectionPoolImpl                                                                 |
|                                                                                      |
|                                                                                      |
|  +-----------------------------------+                                               |
|  | hostsUpdater                      |                                               |
|  |                                   |                                               |
|  |                                   |                                               |
|  |                                   |                          Timer                |
|  |                                   |                  +----------------------+     |
|  |                                   |  refreshHosts    |                      |     |
|  |         HostStatusTracker -------------------------> |  connPoolThreadPool  |     |
|  |                       ^           |                  |                      |     |
|  |                       |           |                  +------------+---------+     |
|  |  getUpdateFromEureka  |           |                               |               |
|  |                       |           |                               |               |
|  |                       |           |                updateHosts    |               |
|  |       +----------------------+    |                               |               |
|  |       | HostSupplier  |      |    |              +-----------------------------+  |
|  |       |               |      |    |              |                |            |  |
|  |       |               +      |    |              |                |            |  |
|  |       |      EurekaClient    |    |              v                v            v  |
|  |       |                      |    |         selectionStrategy   cpMonitor   cpMap |
|  |       +----------------------+    |                                               |
|  +-----------------------------------+                                               |
+--------------------------------------------------------------------------------------+

0x04 Error Handling & Load Balancing

Since we are running a cluster rather than a single instance, we need some safeguards for failover.

In Dynomite there are three main kinds of errors:

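  • Invalid requests: the error is returned directly to the application, because the driver cannot know how to handle such a request;
  • Server errors: the driver can try the next node according to the load-balancing policy;
  • Network timeouts: if the request is marked idempotent, the driver can retry it. By default requests are not considered idempotent, so it is good practice to mark requests as idempotent whenever possible.
    For idempotent requests, if the first node does not respond within a certain delay, the driver can send the request to a second node. This is called "speculative retry" and is configured with SpeculativeExecutionPolicy.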
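The three rules above amount to a small decision table. A hypothetical sketch that encodes exactly those rules; ErrorPolicy, ErrorKind, and Action are illustrative names, not Dyno types:

```java
// Hypothetical decision table for the three error kinds described above.
final class ErrorPolicy {
    enum ErrorKind { INVALID_REQUEST, SERVER_ERROR, NETWORK_TIMEOUT }
    enum Action { RETHROW, TRY_NEXT_NODE, RETRY }

    static Action decide(ErrorKind kind, boolean idempotent) {
        switch (kind) {
            case INVALID_REQUEST:
                return Action.RETHROW;            // the driver cannot fix a bad request
            case SERVER_ERROR:
                return Action.TRY_NEXT_NODE;      // let the load-balancing policy pick another node
            case NETWORK_TIMEOUT:
                // The request may already have been applied; repeat it only if that is harmless.
                return idempotent ? Action.RETRY : Action.RETHROW;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```

The idempotency flag only matters for timeouts, because a timed-out request may or may not have reached the server.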

Depending on how serious the error is, it is handled either by retrying or by fallback selection; we cover them in that order.

4.1 Retry Policy

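When a node fails or becomes unreachable, the driver automatically and transparently tries other nodes and schedules reconnection to the dead node in the background.

However, because temporary changes in network conditions can also make a node appear offline, the driver also provides a retry policy that retries queries which failed because of network-related errors. This removes the need to write retry logic in client code.

The retry policy determines the default behavior when a request times out or a node is unavailable.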

4.1.1 Policy Types

The Java driver provides several RetryPolicy implementations:

  • RetryNTimes: an operation can be retried up to N times, so RetryNTimes(2) allows at most 2 + 1 = 3 attempts before giving up;
  • RunOnce: never retries; it always rethrows the exception.

4.1.2 Using the Policy

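When a command is executed, the driver transparently tries other nodes and schedules reconnection to dead nodes in the background. The steps are:

  • Obtain the retry policy;
  • Loop:
    • Execute the operation;
    • On success, call the policy's success method and leave the loop;
    • On failure, call the policy's failure method;
    • Keep looping while the policy allows a retry.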

A simplified version of the code:

@Override
public <R> OperationResult<R> executeWithFailover(Operation<CL, R> op) throws DynoException {
    RetryPolicy retry = cpConfiguration.getRetryPolicyFactory().getRetryPolicy();
    retry.begin();

    DynoException lastException = null;
    do {
        Connection<CL> connection = null;
        try {
            connection = selectionStrategy.getConnectionUsingRetryPolicy(op,
                    cpConfiguration.getMaxTimeoutWhenExhausted(), TimeUnit.MILLISECONDS, retry);
            OperationResult<R> result = connection.execute(op); // execute the operation
            retry.success();  // on success, call the policy's success method and leave the loop
            return result;
        } catch (DynoException e) {
            retry.failure(e); // on failure, call the policy's failure method
            lastException = e;
            if (connection != null) {
                cpMonitor.incOperationFailure(connection.getHost(), e);
                if (retry.allowRetry()) { // ask the concrete retry policy
                    cpMonitor.incFailover(connection.getHost(), e);
                }
            }
        }
    } while (retry.allowRetry()); // keep looping while a retry is allowed

    // Every permitted attempt failed: surface the last error
    throw lastException;
}
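Stripped of monitoring and the policy object, executeWithFailover is a bounded loop that sends each attempt to a freshly selected host and rethrows the last error once the budget is spent. A hypothetical sketch in which round-robin over a host list stands in for selectionStrategy and maxRetries stands in for RetryNTimes(n):

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified failover loop; not Dyno's API.
final class Failover {
    // Tries at most maxRetries + 1 times (maxRetries >= 0), each time on the next host in order.
    static <R> R executeWithFailover(List<String> hosts, int maxRetries, Function<String, R> op) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("no hosts");
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            String host = hosts.get(attempt % hosts.size()); // a different host per attempt
            try {
                return op.apply(host);                       // success: stop immediately
            } catch (RuntimeException e) {
                last = e;                                    // remember the error and fail over
            }
        }
        throw last;                                          // every attempt failed
    }
}
```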

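Let's look at RetryNTimes as a concrete example.

4.1.3 RetryNTimes

As the code shows, success and failure update an internal state, and allowRetry uses that state to decide whether another attempt is permitted.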

public class RetryNTimes implements RetryPolicy {

    private final int n;
    private final boolean allowCrossZoneFallback;
    private final AtomicReference<RetryState> state = new AtomicReference<>(new RetryState(0, false));

    public RetryNTimes(int n, boolean allowFallback) {
        this.n = n;
        this.allowCrossZoneFallback = allowFallback;
    }

    @Override
    public void success() {
        boolean success = false;
        RetryState rs;
        while (!success) {
            rs = state.get(); // CAS loop: record one more attempt, marked successful
            success = state.compareAndSet(rs, new RetryState(rs.count + 1, true));
        }
    }

    @Override
    public void failure(Exception e) {
        boolean success = false;
        RetryState rs;
        while (!success) {
            rs = state.get(); // CAS loop: record one more attempt, marked failed
            success = state.compareAndSet(rs, new RetryState(rs.count + 1, false));
        }
    }

    @Override
    public boolean allowRetry() {
        final RetryState rs = state.get();
        // Retry while nothing has succeeded and no more than n attempts have failed so far
        return !rs.success && rs.count <= n;
    }

    private class RetryState {
        private final int count;
        private final boolean success;

        public RetryState(final int count, final boolean success) {
            this.count = count;
            this.success = success;
        }
    }
}
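To check the 2 + 1 = 3 figure, the single-threaded sketch below replays the RetryNTimes rule (failure() increments count; allowRetry() is !success && count <= n) against an operation that always fails. It is a hypothetical restatement, not the Dyno class:

```java
// Hypothetical restatement of RetryNTimes' counting rule, without the CAS machinery.
final class RetryCount {
    static int attemptsUntilGiveUp(int n) {
        int count = 0;            // mirrors RetryState.count
        boolean success = false;  // mirrors RetryState.success; every attempt fails here
        int attempts = 0;
        do {
            attempts++;           // one connection.execute(op) call
            count++;              // failure(e) bumps the counter
        } while (!success && count <= n);  // allowRetry()
        return attempts;
    }
}
```

With n = 2 the loop runs at counts 1 and 2 and stops at 3, so RetryNTimes(2) allows three attempts in total; RetryNTimes(0) allows just one.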

4.2 Selection Strategy

Retries cannot always solve the problem, so next we look at the fallback selection strategy, which deals with more serious failures.

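The driver can send a query to any node in the cluster, which then acts as the coordinator for that query. Depending on the query, the coordinator may communicate with other nodes to satisfy it. If a client directed all its queries to the same node, the cluster would be unevenly loaded, especially if other clients did the same.

To keep a single node from coordinating too many requests, the DynoJedisClient driver provides a pluggable mechanism for balancing query load across nodes. Load balancing is achieved by choosing an implementation of the HostSelectionStrategy interface.

Each HostSelectionStrategy classifies every node in the cluster as local, remote, or ignored. The driver prefers interacting with local nodes and keeps more connections to local nodes than to remote ones.

The HostSelectionStrategy is configured when the client is built. The driver provides two basic load-balancing implementations: RoundRobinSelection and TokenAwareSelection.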

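  • RoundRobinSelection: distributes requests across the nodes in a repeating pattern, spreading the processing load evenly over all nodes.
  • TokenAwareSelection: token-aware; uses the key's token to pick the node that holds the requested data, minimizing the number of nodes that must be queried.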

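4.2.1 The Coordinator

HostSelectionWithFallback is the coordinating selector:

  • It coordinates among multiple HostSelectionStrategy instances, mapping each request to a specific rack;
  • It does not implement a concrete algorithm (e.g. Round Robin or Token Aware) itself; it obtains connections from the concrete strategies;
  • It relies on two kinds of strategy:
    • The local HostSelectionStrategy is used first;
    • If the local connection pool or the local hosts fail, HostSelectionWithFallback falls back to a remote HostSelectionStrategy;
    • When the local rack fails, HostSelectionWithFallback uses pure round robin to choose among the remote HostSelectionStrategy instances, so that the load is spread evenly;
  • It does not favor any particular remote strategy.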
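The local-first, remote-round-robin rule can be illustrated with a hypothetical FallbackSelector in which racks are plain names and health is a caller-supplied predicate. It sketches the behavior described above and is not Dyno's HostSelectionWithFallback:

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch; names and signatures are illustrative only.
final class FallbackSelector {
    private final String localRack;
    private final List<String> remoteRacks;
    private final AtomicInteger remoteCursor = new AtomicInteger();

    FallbackSelector(String localRack, List<String> remoteRacks) {
        this.localRack = localRack;
        this.remoteRacks = List.copyOf(remoteRacks);
    }

    Optional<String> chooseRack(Predicate<String> healthy) {
        if (healthy.test(localRack)) {
            return Optional.of(localRack);          // the local rack always wins when healthy
        }
        // Local rack is down: walk the remote racks round-robin, starting where the previous
        // fallback left off, so fallback traffic is not piled onto a single remote rack.
        int start = remoteCursor.getAndIncrement();
        for (int i = 0; i < remoteRacks.size(); i++) {
            String rack = remoteRacks.get(Math.floorMod(start + i, remoteRacks.size()));
            if (healthy.test(rack)) {
                return Optional.of(rack);
            }
        }
        return Optional.empty();                    // nothing usable anywhere
    }
}
```

While the local rack is healthy it is always chosen; once it is down, successive calls rotate through the remote racks.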

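Let's look at the definition.

The member variables of HostSelectionWithFallback are:

  • Local host information: the local data center localDataCenter, the local rack localRack, and the local selection strategy localSelector;
  • Remote host information: the remote zone names remoteRackNames and the remote selection strategies remoteRackSelectors;
  • Token-related information: hostTokens, tokenSupplier, and the topology;
  • Configuration: cpConfig;
  • Monitoring: cpMonitor.

The class definition: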

public class HostSelectionWithFallback<CL> {
    // Only used in calculating replication factor
    private final String localDataCenter;
    // tracks the local zone
    private final String localRack;
    // The selector for the local zone
    private final HostSelectionStrategy<CL> localSelector;
    // Track selectors for each remote zone
    private final ConcurrentHashMap<String, HostSelectionStrategy<CL>> remoteRackSelectors = new ConcurrentHashMap<>();

    private final ConcurrentHashMap<Host, HostToken> hostTokens = new ConcurrentHashMap<>();

    private final TokenMapSupplier tokenSupplier;
    private final ConnectionPoolConfiguration cpConfig;
    private final ConnectionPoolMonitor cpMonitor;

    private final AtomicInteger replicationFactor = new AtomicInteger(-1);

    // Represents the *initial* topology from the token supplier. This does not affect selection of a host connection
    // pool for traffic. It only affects metrics such as failover/fallback
    private final AtomicReference<TokenPoolTopology> topology = new AtomicReference<>(null);

    // list of names of remote zones. Used for RoundRobin over remote zones when local zone host is down
    private final CircularList<String> remoteRackNames = new CircularList<>(new ArrayList<>());

    private final HostSelectionStrategyFactory<CL> selectorFactory;
}

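4.2.2 Strategies

HostSelectionStrategy is the host selection strategy interface; its concrete implementations are RoundRobinSelection and TokenAwareSelection.

Load balancing is responsible for establishing connections to the whole cluster (not just one node) and for maintaining a connection pool for every host in the cluster. It also determines whether a host is local or remote.

It contains the logic that sends particular requests to particular nodes: which hosts to connect to and which hosts receive requests are decided by the load-balancing policy.

In effect, a query plan is computed for every request. The query plan determines which hosts the request is sent to and in what order (depending on the speculative execution policy and the retry policy).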

4.2.2.1 RoundRobinSelection

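As the name suggests, this strategy uses ROUND ROBIN, providing thread-safe RR load balancing over a circular data structure.

  • It supports adding and removing hosts dynamically: when a topology change is detected outside, these methods are called to update it.
  • Returning a connection simply means taking the next entry from the list.
  • It provides several functions that return pools in different forms, for example:
    • getPoolForToken: returns the pool for a token;
    • getOrderedHostPools: returns an ordered list of pools;
    • getNextConnectionPool: returns the next pool from a circularList;
    • getPoolsForTokens: returns the pools for a given token range.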
public class RoundRobinSelection<CL> implements HostSelectionStrategy<CL> {

    // The total set of host pools. Once the host is selected, we ask its corresponding pool to vend a connection
    private final ConcurrentHashMap<Long, HostConnectionPool<CL>> tokenPools = new ConcurrentHashMap<Long, HostConnectionPool<CL>>();

    // the circular list of Host over which we load balance in a round robin fashion
    private final CircularList<HostToken> circularList = new CircularList<HostToken>(null);

    @Override
    public HostConnectionPool<CL> getPoolForOperation(BaseOperation<CL, ?> op, String hashtag) throws NoAvailableHostsException {

        int numTries = circularList.getSize();
        HostConnectionPool<CL> lastPool = null;

        while (numTries > 0) {
            lastPool = getNextConnectionPool();
            numTries--;
            if (lastPool.isActive() && lastPool.getHost().isUp()) {
                return lastPool;
            }
        }

        // If we reach here then we haven't found an active pool. Return the last inactive pool anyways,
        // and HostSelectionWithFallback can choose a fallback pool from another dc
        return lastPool;
    }

    @Override
    public void initWithHosts(Map<HostToken, HostConnectionPool<CL>> hPools) {

        for (HostToken token : hPools.keySet()) {
            tokenPools.put(token.getToken(), hPools.get(token));
        }
        circularList.swapWithList(hPools.keySet());
    }

    @Override
    public boolean addHostPool(HostToken host, HostConnectionPool<CL> hostPool) {

        HostConnectionPool<CL> prevPool = tokenPools.put(host.getToken(), hostPool);
        if (prevPool == null) {
            List<HostToken> newHostList = new ArrayList<HostToken>(circularList.getEntireList());
            newHostList.add(host);
            circularList.swapWithList(newHostList);
        }
        return prevPool == null;
    }
}

The RR strategy looks roughly like this: it picks the next pool in order from a circularList:

                        +--------------------------+
                        |HostSelectionWithFallback |
                        +-------------+------------+
                                      |
                       +--------------+--------------+
                       |                             |
                       v                             v
             +---------+------------+    +-----------+--------+
             | RoundRobinSelection  |    | TokenAwareSelection|
             |                      |    +--------------------+
             |                      |
             |    circularList      |
             |         +            |
             |         |            |
             +----------------------+
                       |
                       |
                       v

+--> Pool1, Pool2, Pool3,..., Pooln +----+
|                                        |
|                                        |
+----------------------------------------+
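RoundRobinSelection.getPoolForOperation tries at most size() pools and, if none is active, returns the last one it tried so that HostSelectionWithFallback can fall back to another rack. A hypothetical generic sketch of that loop, using an AtomicInteger cursor in place of Dyno's CircularList:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch; not Dyno's RoundRobinSelection.
final class RoundRobin<T> {
    private final List<T> pools;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobin(List<T> pools) {
        if (pools.isEmpty()) throw new IllegalArgumentException("no pools");
        this.pools = List.copyOf(pools);
    }

    // Thread-safe "next": getAndIncrement hands each caller a distinct index,
    // and floorMod keeps the index valid even after the int counter overflows.
    T next() {
        return pools.get(Math.floorMod(cursor.getAndIncrement(), pools.size()));
    }

    // Try each pool at most once; if none is active, return the last one tried.
    T pick(Predicate<T> isActive) {
        T last = null;
        for (int tries = pools.size(); tries > 0; tries--) {
            last = next();
            if (isActive.test(last)) {
                return last;
            }
        }
        return last; // inactive: the caller (the fallback layer) decides what to do
    }
}
```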

4.2.2.2 TokenAwareSelection

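TokenAwareSelection implements the TOKEN AWARE algorithm.

TOKEN_AWARE routes requests by the token of the key: all requests for the same record are sent to the same node.

This module therefore needs to know the dynomite ring topology so that it can map an operation's key to the node that owns the corresponding token.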

TokenAwareSelection

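This strategy hashes the key to a token value and then uses binary search over the sorted tokens to locate the owning node on the dynomite topology ring.

It provides several functions that return pools in different forms, for example:

  • getPoolForToken: returns the pool for a token;
  • getOrderedHostPools: returns an ordered list of pools;
  • getNextConnectionPool: returns the next pool from a circularList;
  • getPoolsForTokens: returns the pools for a given token range.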
public class TokenAwareSelection<CL> implements HostSelectionStrategy<CL> {

    private final BinarySearchTokenMapper tokenMapper;

    private final ConcurrentHashMap<Long, HostConnectionPool<CL>> tokenPools = new ConcurrentHashMap<Long, HostConnectionPool<CL>>();

    public TokenAwareSelection(HashPartitioner hashPartitioner) {
        this.tokenMapper = new BinarySearchTokenMapper(hashPartitioner);
    }

    /**
     * Identifying the proper pool for the operation. A couple of things that may affect the decision
     * (a) hashtags: In this case we will construct the key by decomposing from the hashtag
     * (b) type of key: string keys vs binary keys.
     * In binary keys hashtags do not really matter.
     */
    @Override
    public HostConnectionPool<CL> getPoolForOperation(BaseOperation<CL, ?> op, String hashtag) throws NoAvailableHostsException {

        String key = op.getStringKey();
        HostConnectionPool<CL> hostPool;
        HostToken hToken;

        if (key != null) {
            // If a hashtag is provided by Dynomite then we use that to create the key to hash.
            if (hashtag == null || hashtag.isEmpty()) {
                hToken = this.getTokenForKey(key);
            } else {
                String hashValue = StringUtils.substringBetween(key, Character.toString(hashtag.charAt(0)), Character.toString(hashtag.charAt(1)));
                hToken = this.getTokenForKey(hashValue);
            }
            hostPool = tokenPools.get(hToken.getToken());
        } else {
            // the key is binary
            byte[] binaryKey = op.getBinaryKey();
            hToken = this.getTokenForKey(binaryKey);
            hostPool = tokenPools.get(hToken.getToken());
        }
        return hostPool;
    }

    @Override
    public boolean addHostPool(HostToken hostToken, HostConnectionPool<CL> hostPool) {

        HostConnectionPool<CL> prevPool = tokenPools.put(hostToken.getToken(), hostPool);
        if (prevPool == null) {
            tokenMapper.addHostToken(hostToken);
            return true;
        } else {
            return false;
        }
    }
}
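The hashtag branch hashes only the part of the key between the two hashtag characters, so keys such as {user42}:profile and {user42}:orders map to the same token and therefore to the same node. A hypothetical helper showing that extraction; note that StringUtils.substringBetween in the excerpt returns null when the delimiters are missing, whereas this sketch falls back to hashing the whole key (an assumption):

```java
// Hypothetical helper; not part of Dyno.
final class HashtagKeys {
    // Returns the part of the key that should be hashed when hashtag holds two
    // delimiter characters such as "{}".
    static String hashSource(String key, String hashtag) {
        if (hashtag == null || hashtag.length() < 2) {
            return key;                                  // no hashtag configured: hash the full key
        }
        int open = key.indexOf(hashtag.charAt(0));
        if (open < 0) return key;                        // assumed fallback: no opening delimiter
        int close = key.indexOf(hashtag.charAt(1), open + 1);
        if (close < 0) return key;                       // assumed fallback: no closing delimiter
        return key.substring(open + 1, close);          // text between the first delimiter pair
    }
}
```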

BinarySearchTokenMapper

The code above uses BinarySearchTokenMapper, so let's look at it as well.

This class maintains the mapping from key hashes to tokens and uses binary search for lookups.

public class BinarySearchTokenMapper implements HashPartitioner {

    private final HashPartitioner partitioner;

    public BinarySearchTokenMapper(HashPartitioner p) {
        this.partitioner = p;
    }

    private final AtomicReference<DynoBinarySearch<Long>> binarySearch = new AtomicReference<DynoBinarySearch<Long>>(null);
    
    private final ConcurrentHashMap<Long, HostToken> tokenMap = new ConcurrentHashMap<Long, HostToken>();

    @Override
    public HostToken getToken(Long keyHash) {
        Long token = binarySearch.get().getTokenOwner(keyHash);
        return tokenMap.get(token);
    }

    public void initSearchMechanism(Collection<HostToken> hostTokens) {
        for (HostToken hostToken : hostTokens) {
            tokenMap.put(hostToken.getToken(), hostToken);
        }
        initBinarySearch();
    }

    public void addHostToken(HostToken hostToken) {
        HostToken prevToken = tokenMap.putIfAbsent(hostToken.getToken(), hostToken);
        if (prevToken == null) {
            initBinarySearch();
        }
    }

    private void initBinarySearch() {
        List<Long> tokens = new ArrayList<Long>(tokenMap.keySet());
        Collections.sort(tokens);
        binarySearch.set(new DynoBinarySearch<Long>(tokens));
    }
}
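getTokenOwner performs the actual ring lookup over the sorted token list. A hypothetical sketch with Collections.binarySearch, assuming the usual ring convention that a node owns hashes up to and including its token and that hashes beyond the largest token wrap around to the smallest; Dyno's DynoBinarySearch may treat the boundaries differently:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for DynoBinarySearch.getTokenOwner.
final class TokenRing {
    private final List<Long> sorted;

    TokenRing(Collection<Long> tokens) {
        if (tokens.isEmpty()) throw new IllegalArgumentException("empty ring");
        List<Long> copy = new ArrayList<>(tokens);
        Collections.sort(copy);                 // binary search requires sorted tokens
        this.sorted = Collections.unmodifiableList(copy);
    }

    long ownerOf(long keyHash) {
        int idx = Collections.binarySearch(sorted, keyHash);
        if (idx >= 0) {
            return sorted.get(idx);             // hash equals a token: that node owns it
        }
        int insertionPoint = -idx - 1;          // index of the first token greater than the hash
        return insertionPoint == sorted.size()
                ? sorted.get(0)                 // past the largest token: wrap around the ring
                : sorted.get(insertionPoint);
    }
}
```

Because initBinarySearch rebuilds the sorted list and swaps it in through an AtomicReference, lookups already in flight keep using the old list while a new host token is being added.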

The Token Aware strategy looks like this: essentially a map keyed by token that is used to select the pool:

                        +--------------------------+
                        |HostSelectionWithFallback |
                        +-------------+------------+
                                      |
                       +--------------+--------------+
                       |                             |
                       v                             v
             +---------+------------+    +-----------+------------+
             | RoundRobinSelection  |    |   TokenAwareSelection  |
             |                      |    |                        |
             |                      |    |                        |
             |    circularList      |    |    ConcurrentHashMap   |
             |         +            |    |       +                |
             |         |            |    |       |                |
             +----------------------+    +------------------------+
                       |                         |
                       |                         |
                       v                         |     +-------------------+
                                                 |     | [token 1 : Pool 1]|
+--> Pool1, Pool2, Pool3,..., Pooln +----+       |     |                   |
|                                        |       +---> | [token 2 : Pool 2]|
|                                        |             |                   |
+----------------------------------------+             |      ......       |
                                                       |                   |
                                                       | [token 3 : Pool 3]|
                                                       +-------------------+

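0x05 Compression

Finally, let's look at how compression is implemented.

Enabling compression reduces the network bandwidth consumed by the driver, at the cost of extra CPU usage on the client, where values are compressed and decompressed.

5.1 Compression Strategies

The driver offers two compression strategies: NONE (no compression) and THRESHOLD (compress only values larger than a configured threshold).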

enum CompressionStrategy {
        /**
         * Disables compression
         */
        NONE,

        /**
         * Compresses values that exceed {@link #getValueCompressionThreshold()}
         */
        THRESHOLD
}

5.2 Implementation

The code relies on the JDK's GZIP streams:

import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

The actual operations are:

private abstract class CompressionValueOperation<T> extends BaseKeyOperation<T>
            implements CompressionOperation<Jedis, T> {

        private CompressionValueOperation(String k, OpName o) {
            super(k, o);
        }

        @Override
        public String compressValue(String value, ConnectionContext ctx) {
            String result = value;
            int thresholdBytes = connPool.getConfiguration().getValueCompressionThreshold();

            try {
                // prefer speed over accuracy here so rather than using
                // getBytes() to get the actual size
                // just estimate using 2 bytes per character
                if ((2 * value.length()) > thresholdBytes) {
                    result = ZipUtils.compressStringToBase64String(value);
                    ctx.setMetadata("compression", true);
                }
            } catch (IOException e) {
                // Compression failed: send the value uncompressed
                Logger.warn("UNABLE to compress [" + value + "]; sending value uncompressed");
            }

            return result;
        }

        @Override
        public String decompressValue(String value, ConnectionContext ctx) {
            try {
                if (ZipUtils.isCompressed(value)) {
                    ctx.setMetadata("decompression", true);
                    return ZipUtils.decompressFromBase64String(value);
                }
            } catch (IOException e) {
                // Decompression failed: return the stored value unchanged
                Logger.warn("Unable to decompress value [" + value + "]");
            }

            return value;
        }

}
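For a self-contained picture of threshold compression, the sketch below uses only JDK classes (GZIPOutputStream, GZIPInputStream, and java.util.Base64). ThresholdCompression is hypothetical and is not Dyno's ZipUtils; in particular, its is-it-compressed check (valid Base64 that starts with the GZIP magic bytes 0x1f 0x8b) is an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of threshold-based GZIP + Base64 compression.
final class ThresholdCompression {
    static String compressIfLarge(String value, int thresholdBytes) {
        if (2 * value.length() <= thresholdBytes) {   // same 2-bytes-per-char estimate as the driver
            return value;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(value.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed (and its trailer written) before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static String decompressIfNeeded(String value) {
        byte[] raw;
        try {
            raw = Base64.getDecoder().decode(value);
        } catch (IllegalArgumentException notBase64) {
            return value;                             // plain text stored as-is
        }
        if (raw.length < 2 || (raw[0] & 0xff) != 0x1f || (raw[1] & 0xff) != 0x8b) {
            return value;                             // no GZIP header: leave it alone
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(raw))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Base64-encoding the GZIP bytes keeps the stored value a valid string, at the cost of about a third more size than the raw compressed bytes.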

5.3 Usage

Taking d_hgetAll as an example: when compression is enabled, a CompressionValueOperation is created instead of a plain BaseKeyOperation.

public OperationResult<Map<String, String>> d_hgetAll(final String key) {
        if (CompressionStrategy.NONE == connPool.getConfiguration().getCompressionStrategy()) {
            return connPool.executeWithFailover(new BaseKeyOperation<Map<String, String>>(key, OpName.HGETALL) {
                @Override
                public Map<String, String> execute(Jedis client, ConnectionContext state) throws DynoException {
                    return client.hgetAll(key);
                }
            });
        } else {
            return connPool
                    .executeWithFailover(new CompressionValueOperation<Map<String, String>>(key, OpName.HGETALL) {
                        @Override
                        public Map<String, String> execute(final Jedis client, final ConnectionContext state) {
                            return CollectionUtils.transform(client.hgetAll(key),
                                    new CollectionUtils.MapEntryTransform<String, String, String>() {
                                        @Override
                                        public String get(String key, String val) {
                                            return decompressValue(val, state);
                                        }
                                    });
                        }
                    });
        }
}

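0x06 Summary

This concludes our initial analysis of DynoJedisClient. We have seen how DynoJedisClient deals with various complex concerns, such as:

  • How to maintain connections and provide connection pools for persistent connections;
  • How to maintain the topology;
  • How to balance load;
  • How to fail over;
  • How to retry and discover automatically, e.g. retrying hosts that went down and discovering other hosts in the cluster;
  • How to monitor the state of the underlying racks.

Next, we will move on to Dyno-queues, a distributed delay queue built on DynoJedisClient, and see how it is implemented.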
