In rocketmq, the nameserver plays the part of a configuration manager, which at first glance looks unimportant. Yet it is an indispensable role: without it, the brokers would be loose sand, each fighting its own battle.
So in practice, the nameserver is also a leader of sorts within rocketmq. It has a say in where messages may be stored and in which brokers are serving or going on/off line, and it must react promptly when something goes wrong, so that the team as a whole keeps functioning. The nameserver is effectively the coordinator of a distributed system. Does that word look familiar? Read on!
1. Why is there a nameserver?
As said at the top, the nameserver acts, more or less, as the system's coordinator. Now, for distributed coordination there are plenty of ready-made components, zookeeper for one. So why roll a custom nameserver? Just to make its presence felt?
Why zk or a similar component wasn't chosen for the coordinator role, we can't know for certain. But at least a few plausible reasons can be offered in support (taking zk as the example):
1. zk involves a lot of communication inside its own cluster;
2. zk is a fairly heavy component, and an mq that is itself messaging middleware had better not drag in yet another dependency; (my personal feeling)
3. zk is weak at persisting arbitrary data, and configuration tends to be constrained by zk's data model;
In short, what rocketmq wanted was probably awkward to build on zk, or doable but laborious, or simply too heavyweight, so the team skipped it and built a fully customized one instead. And indeed, the nameserver turned out remarkably simple and lightweight, which was presumably the design intent. (One practical consequence of that is sketched below.)
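A consequence worth noting: nameserver instances do not coordinate with each other at all, so "clustering" them just means listing every instance's address on each broker and client. A minimal hedged sketch of the client side, with the addresses being assumptions for illustration:

```java
import org.apache.rocketmq.client.producer.DefaultMQProducer;

public class NamesrvAddrDemo {
    public static void main(String[] args) throws Exception {
        // Point the client at every nameserver instance; the list is
        // semicolon-separated. Addresses here are hypothetical.
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("192.168.0.1:9876;192.168.0.2:9876");
        producer.start();
        // ... send messages ...
        producer.shutdown();
    }
}
```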
2. Walking through the nameserver startup flow
Normally, starting a framework-level service is complicated enough that we would rather not trace the details. But as noted, the nameserver is implemented very lightly, so its startup is correspondingly simple, and we can skim through it quickly.
The nameserver's entry class is org.apache.rocketmq.namesrv.NamesrvStartup, and the process runs roughly as follows:
```java
// Entry point: main
public static void main(String[] args) {
    main0(args);
}

public static NamesrvController main0(String[] args) {
    try {
        // Create this service's core controller: parse command-line
        // options, config files, defaults and so on
        NamesrvController controller = createNamesrvController(args);
        // Start the service
        start(controller);
        String tip = "The Name Server boot success. serializeType=" + RemotingCommand.getSerializeTypeConfigInThisServer();
        log.info(tip);
        System.out.printf("%s%n", tip);
        return controller;
    } catch (Throwable e) {
        e.printStackTrace();
        System.exit(-1);
    }
    return null;
}
```
So the entire startup boils down to one Controller. Simple, right? Well, maybe not entirely! Creating the Controller is essentially an exercise in parsing configuration; open the following code if you are interested:
```java
public static NamesrvController createNamesrvController(String[] args) throws IOException, JoranException {
    System.setProperty(RemotingCommand.REMOTING_VERSION_KEY, Integer.toString(MQVersion.CURRENT_VERSION));
    //PackageConflictDetect.detectFastjson();

    Options options = ServerUtil.buildCommandlineOptions(new Options());
    commandLine = ServerUtil.parseCmdLine("mqnamesrv", args, buildCommandlineOptions(options), new PosixParser());
    if (null == commandLine) {
        System.exit(-1);
        return null;
    }

    final NamesrvConfig namesrvConfig = new NamesrvConfig();
    final NettyServerConfig nettyServerConfig = new NettyServerConfig();
    nettyServerConfig.setListenPort(9876);
    // -c xx.properties points at a config file; it has lower priority
    if (commandLine.hasOption('c')) {
        String file = commandLine.getOptionValue('c');
        if (file != null) {
            InputStream in = new BufferedInputStream(new FileInputStream(file));
            properties = new Properties();
            properties.load(in);
            MixAll.properties2Object(properties, namesrvConfig);
            MixAll.properties2Object(properties, nettyServerConfig);

            namesrvConfig.setConfigStorePath(file);

            System.out.printf("load config properties file OK, %s%n", file);
            in.close();
        }
    }

    // -p only prints the effective startup configuration, then exits
    if (commandLine.hasOption('p')) {
        InternalLogger console = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_CONSOLE_NAME);
        MixAll.printObjectProperties(console, namesrvConfig);
        MixAll.printObjectProperties(console, nettyServerConfig);
        System.exit(0);
    }

    MixAll.properties2Object(ServerUtil.commandLine2Properties(commandLine), namesrvConfig);

    if (null == namesrvConfig.getRocketmqHome()) {
        System.out.printf("Please set the %s variable in your environment to match the location of the RocketMQ installation%n", MixAll.ROCKETMQ_HOME_ENV);
        System.exit(-2);
    }

    LoggerContext lc = (LoggerContext) LoggerFactory.getILoggerFactory();
    JoranConfigurator configurator = new JoranConfigurator();
    configurator.setContext(lc);
    lc.reset();
    configurator.doConfigure(namesrvConfig.getRocketmqHome() + "/conf/logback_namesrv.xml");

    log = InternalLoggerFactory.getLogger(LoggerName.NAMESRV_LOGGER_NAME);

    MixAll.printObjectProperties(log, namesrvConfig);
    MixAll.printObjectProperties(log, nettyServerConfig);

    // Hand the parsed configs to the controller constructor
    final NamesrvController controller = new NamesrvController(namesrvConfig, nettyServerConfig);

    // remember all configs to prevent discard
    controller.getConfiguration().registerConfig(properties);

    return controller;
}

// Controller constructor
// org.apache.rocketmq.namesrv.NamesrvController#NamesrvController
public NamesrvController(NamesrvConfig namesrvConfig, NettyServerConfig nettyServerConfig) {
    this.namesrvConfig = namesrvConfig;
    this.nettyServerConfig = nettyServerConfig;
    this.kvConfigManager = new KVConfigManager(this);
    this.routeInfoManager = new RouteInfoManager();
    this.brokerHousekeepingService = new BrokerHousekeepingService(this);
    this.configuration = new Configuration(
        log,
        this.namesrvConfig, this.nettyServerConfig
    );
    this.configuration.setStorePathFromConfig(this.namesrvConfig, "configStorePath");
}

// org.apache.rocketmq.common.Configuration#registerConfig
/**
 * register config properties
 *
 * @return the current Configuration object
 */
public Configuration registerConfig(Properties extProperties) {
    if (extProperties == null) {
        return this;
    }

    try {
        readWriteLock.writeLock().lockInterruptibly();

        try {
            merge(extProperties, this.allConfigs);
        } finally {
            readWriteLock.writeLock().unlock();
        }
    } catch (InterruptedException e) {
        log.error("register lock error. {}" + extProperties);
    }
    return this;
}
```
Next, let's look at what the start() step actually does; whatever complexity exists must live there.
```java
// org.apache.rocketmq.namesrv.NamesrvStartup#start
public static NamesrvController start(final NamesrvController controller) throws Exception {
    if (null == controller) {
        throw new IllegalArgumentException("NamesrvController is null");
    }

    // Initialize the controller's moving parts; on failure, abort startup
    boolean initResult = controller.initialize();
    if (!initResult) {
        controller.shutdown();
        System.exit(-3);
    }

    // Register a shutdown hook
    Runtime.getRuntime().addShutdownHook(new ShutdownHookThread(log, new Callable<Void>() {
        @Override
        public Void call() throws Exception {
            controller.shutdown();
            return null;
        }
    }));

    // The core start() method
    controller.start();

    return controller;
}

// org.apache.rocketmq.namesrv.NamesrvController#initialize
public boolean initialize() {
    this.kvConfigManager.load();

    this.remotingServer = new NettyRemotingServer(this.nettyServerConfig, this.brokerHousekeepingService);

    this.remotingExecutor =
        Executors.newFixedThreadPool(nettyServerConfig.getServerWorkerThreads(), new ThreadFactoryImpl("RemotingExecutorThread_"));

    // Register the request processor
    this.registerProcessor();

    // Background task: scan for brokers that have dropped offline
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            NamesrvController.this.routeInfoManager.scanNotActiveBroker();
        }
    }, 5, 10, TimeUnit.SECONDS);

    // Background task: periodically print the kv config
    this.scheduledExecutorService.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            NamesrvController.this.kvConfigManager.printAllPeriodically();
        }
    }, 1, 10, TimeUnit.MINUTES);

    if (TlsSystemConfig.tlsMode != TlsMode.DISABLED) {
        // Register a listener to reload SslContext
        try {
            fileWatchService = new FileWatchService(
                new String[] {
                    TlsSystemConfig.tlsServerCertPath,
                    TlsSystemConfig.tlsServerKeyPath,
                    TlsSystemConfig.tlsServerTrustCertPath
                },
                new FileWatchService.Listener() {
                    boolean certChanged, keyChanged = false;

                    @Override
                    public void onChanged(String path) {
                        if (path.equals(TlsSystemConfig.tlsServerTrustCertPath)) {
                            log.info("The trust certificate changed, reload the ssl context");
                            reloadServerSslContext();
                        }
                        if (path.equals(TlsSystemConfig.tlsServerCertPath)) {
                            certChanged = true;
                        }
                        if (path.equals(TlsSystemConfig.tlsServerKeyPath)) {
                            keyChanged = true;
                        }
                        if (certChanged && keyChanged) {
                            log.info("The certificate and private key changed, reload the ssl context");
                            certChanged = keyChanged = false;
                            reloadServerSslContext();
                        }
                    }

                    private void reloadServerSslContext() {
                        ((NettyRemotingServer) remotingServer).loadSslContext();
                    }
                });
        } catch (Exception e) {
            log.warn("FileWatchService created error, can't load the certificate dynamically");
        }
    }

    // no false
    return true;
}

private void registerProcessor() {
    if (namesrvConfig.isClusterTest()) {
        this.remotingServer.registerDefaultProcessor(new ClusterTestRequestProcessor(this, namesrvConfig.getProductEnvName()),
            this.remotingExecutor);
    } else {
        // A single default processor handles all the business requests
        this.remotingServer.registerDefaultProcessor(new DefaultRequestProcessor(this), this.remotingExecutor);
    }
}

// After initialization comes the start() method
// org.apache.rocketmq.namesrv.NamesrvController#start
public void start() throws Exception {
    // Open the listening port; the nameserver becomes reachable
    this.remotingServer.start();

    // Start the certificate file watcher thread, if any
    if (this.fileWatchService != null) {
        this.fileWatchService.start();
    }
}
```
As you can see, the controller's startup is equally simple: set up the initial instances, register the processor, and open the TCP port. The port service uses netty as its transport and follows the standard netty programming paradigm throughout; consult the netty docs if you need the details.
```java
@Override
public void start() {
    this.defaultEventExecutorGroup = new DefaultEventExecutorGroup(
        nettyServerConfig.getServerWorkerThreads(),
        new ThreadFactory() {

            private AtomicInteger threadIndex = new AtomicInteger(0);

            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "NettyServerCodecThread_" + this.threadIndex.incrementAndGet());
            }
        });

    prepareSharableHandlers();

    ServerBootstrap childHandler =
        this.serverBootstrap.group(this.eventLoopGroupBoss, this.eventLoopGroupSelector)
            .channel(useEpoll() ? EpollServerSocketChannel.class : NioServerSocketChannel.class)
            .option(ChannelOption.SO_BACKLOG, 1024)
            .option(ChannelOption.SO_REUSEADDR, true)
            .option(ChannelOption.SO_KEEPALIVE, false)
            .childOption(ChannelOption.TCP_NODELAY, true)
            .childOption(ChannelOption.SO_SNDBUF, nettyServerConfig.getServerSocketSndBufSize())
            .childOption(ChannelOption.SO_RCVBUF, nettyServerConfig.getServerSocketRcvBufSize())
            .localAddress(new InetSocketAddress(this.nettyServerConfig.getListenPort()))
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) throws Exception {
                    ch.pipeline()
                        .addLast(defaultEventExecutorGroup, HANDSHAKE_HANDLER_NAME, handshakeHandler)
                        .addLast(defaultEventExecutorGroup,
                            encoder,
                            new NettyDecoder(),
                            new IdleStateHandler(0, 0, nettyServerConfig.getServerChannelMaxIdleTimeSeconds()),
                            connectionManageHandler,
                            serverHandler
                        );
                }
            });

    if (nettyServerConfig.isServerPooledByteBufAllocatorEnable()) {
        childHandler.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
    }

    try {
        ChannelFuture sync = this.serverBootstrap.bind().sync();
        InetSocketAddress addr = (InetSocketAddress) sync.channel().localAddress();
        this.port = addr.getPort();
    } catch (InterruptedException e1) {
        throw new RuntimeException("this.serverBootstrap.bind().sync() InterruptedException", e1);
    }

    if (this.channelEventListener != null) {
        this.nettyEventExecutor.start();
    }

    this.timer.scheduleAtFixedRate(new TimerTask() {

        @Override
        public void run() {
            try {
                NettyRemotingServer.this.scanResponseTable();
            } catch (Throwable e) {
                log.error("scanResponseTable exception", e);
            }
        }
    }, 1000 * 3, 1000);
}
```
That completes the nameserver startup flow; lightweight indeed. What services it actually provides is the subject of the next section.
3. The nameserver request-handling framework
Because the nameserver and the broker share the same remoting module, both ride on netty's handler mechanism and enter through the same dispatch path: a request is ultimately routed to its processor, which runs the business logic. Here the nameserver registers exactly one processor, DefaultRequestProcessor, so reading its processRequest() tells us the nameserver's entire capability set.
```java
// org.apache.rocketmq.namesrv.processor.DefaultRequestProcessor#processRequest
@Override
public RemotingCommand processRequest(ChannelHandlerContext ctx,
    RemotingCommand request) throws RemotingCommandException {

    if (ctx != null) {
        log.debug("receive request, {} {} {}",
            request.getCode(),
            RemotingHelper.parseChannelRemoteAddr(ctx.channel()),
            request);
    }

    switch (request.getCode()) {
        case RequestCode.PUT_KV_CONFIG:
            return this.putKVConfig(ctx, request);
        case RequestCode.GET_KV_CONFIG:
            return this.getKVConfig(ctx, request);
        case RequestCode.DELETE_KV_CONFIG:
            return this.deleteKVConfig(ctx, request);
        case RequestCode.QUERY_DATA_VERSION:
            return queryBrokerTopicConfig(ctx, request);
        // Register broker metadata; normally requested when a broker boots
        case RequestCode.REGISTER_BROKER:
            Version brokerVersion = MQVersion.value2Version(request.getVersion());
            if (brokerVersion.ordinal() >= MQVersion.Version.V3_0_11.ordinal()) {
                return this.registerBrokerWithFilterServer(ctx, request);
            } else {
                return this.registerBroker(ctx, request);
            }
        // Take a broker offline
        case RequestCode.UNREGISTER_BROKER:
            return this.unregisterBroker(ctx, request);
        // Fetch route info: which brokers carry a topic, where its messageQueues live, etc.
        case RequestCode.GET_ROUTEINFO_BY_TOPIC:
            return this.getRouteInfoByTopic(ctx, request);
        case RequestCode.GET_BROKER_CLUSTER_INFO:
            return this.getBrokerClusterInfo(ctx, request);
        case RequestCode.WIPE_WRITE_PERM_OF_BROKER:
            return this.wipeWritePermOfBroker(ctx, request);
        case RequestCode.GET_ALL_TOPIC_LIST_FROM_NAMESERVER:
            return getAllTopicListFromNameserver(ctx, request);
        case RequestCode.DELETE_TOPIC_IN_NAMESRV:
            return deleteTopicInNamesrv(ctx, request);
        case RequestCode.GET_KVLIST_BY_NAMESPACE:
            return this.getKVListByNamespace(ctx, request);
        case RequestCode.GET_TOPICS_BY_CLUSTER:
            return this.getTopicsByCluster(ctx, request);
        case RequestCode.GET_SYSTEM_TOPIC_LIST_FROM_NS:
            return this.getSystemTopicListFromNs(ctx, request);
        case RequestCode.GET_UNIT_TOPIC_LIST:
            return this.getUnitTopicList(ctx, request);
        case RequestCode.GET_HAS_UNIT_SUB_TOPIC_LIST:
            return this.getHasUnitSubTopicList(ctx, request);
        case RequestCode.GET_HAS_UNIT_SUB_UNUNIT_TOPIC_LIST:
            return this.getHasUnitSubUnUnitTopicList(ctx, request);
        case RequestCode.UPDATE_NAMESRV_CONFIG:
            return this.updateConfig(ctx, request);
        case RequestCode.GET_NAMESRV_CONFIG:
            return this.getConfig(ctx, request);
        default:
            break;
    }
    return null;
}
```
That is the entire service list the nameserver exposes. The source carries few comments, but the method names speak for themselves, and we won't dwell on every branch. Overall, the request types are few and fall into three buckets (a client-side sketch follows this list):
1. KV config operations;
2. broker registration and de-registration (on/offline management);
3. topic route info services;
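All of these endpoints speak the same RemotingCommand protocol over the shared remoting module. As a quick illustration, not taken from the article's source walkthrough, here is a minimal hedged sketch of how a client could query the route table directly; the nameserver address and topic name are assumptions:

```java
import org.apache.rocketmq.common.protocol.RequestCode;
import org.apache.rocketmq.common.protocol.header.namesrv.GetRouteInfoRequestHeader;
import org.apache.rocketmq.common.protocol.route.TopicRouteData;
import org.apache.rocketmq.remoting.netty.NettyClientConfig;
import org.apache.rocketmq.remoting.netty.NettyRemotingClient;
import org.apache.rocketmq.remoting.protocol.RemotingCommand;
import org.apache.rocketmq.remoting.protocol.RemotingSerializable;

public class RouteQueryDemo {
    public static void main(String[] args) throws Exception {
        NettyRemotingClient client = new NettyRemotingClient(new NettyClientConfig());
        client.start();
        try {
            GetRouteInfoRequestHeader header = new GetRouteInfoRequestHeader();
            header.setTopic("TopicTest"); // hypothetical topic
            RemotingCommand request =
                RemotingCommand.createRequestCommand(RequestCode.GET_ROUTEINFO_BY_TOPIC, header);
            // Synchronous round trip into DefaultRequestProcessor#processRequest
            RemotingCommand response = client.invokeSync("127.0.0.1:9876", request, 3000);
            if (response.getBody() != null) {
                TopicRouteData route =
                    RemotingSerializable.decode(response.getBody(), TopicRouteData.class);
                System.out.printf("brokers=%s queues=%s%n",
                    route.getBrokerDatas(), route.getQueueDatas());
            }
        } finally {
            client.shutdown();
        }
    }
}
```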
Each handler is ordinary business logic and needs little commentary, but to get the flavor let's pick the key one: handling a broker's registration when it comes online:
```java
// To stay current, we follow the newer code path (broker version >= 3.0.11)
public RemotingCommand registerBrokerWithFilterServer(ChannelHandlerContext ctx, RemotingCommand request)
    throws RemotingCommandException {
    final RemotingCommand response = RemotingCommand.createResponseCommand(RegisterBrokerResponseHeader.class);
    final RegisterBrokerResponseHeader responseHeader = (RegisterBrokerResponseHeader) response.readCustomHeader();
    final RegisterBrokerRequestHeader requestHeader =
        (RegisterBrokerRequestHeader) request.decodeCommandCustomHeader(RegisterBrokerRequestHeader.class);

    if (!checksum(ctx, request, requestHeader)) {
        response.setCode(ResponseCode.SYSTEM_ERROR);
        response.setRemark("crc32 not match");
        return response;
    }

    RegisterBrokerBody registerBrokerBody = new RegisterBrokerBody();

    if (request.getBody() != null) {
        try {
            registerBrokerBody = RegisterBrokerBody.decode(request.getBody(), requestHeader.isCompressed());
        } catch (Exception e) {
            throw new RemotingCommandException("Failed to decode RegisterBrokerBody", e);
        }
    } else {
        registerBrokerBody.getTopicConfigSerializeWrapper().getDataVersion().setCounter(new AtomicLong(0));
        registerBrokerBody.getTopicConfigSerializeWrapper().getDataVersion().setTimestamp(0);
    }

    // The heart of it: registerBroker
    RegisterBrokerResult result = this.namesrvController.getRouteInfoManager().registerBroker(
        requestHeader.getClusterName(),
        requestHeader.getBrokerAddr(),
        requestHeader.getBrokerName(),
        requestHeader.getBrokerId(),
        requestHeader.getHaServerAddr(),
        registerBrokerBody.getTopicConfigSerializeWrapper(),
        registerBrokerBody.getFilterServerList(),
        ctx.channel());

    responseHeader.setHaServerAddr(result.getHaServerAddr());
    responseHeader.setMasterAddr(result.getMasterAddr());

    byte[] jsonValue = this.namesrvController.getKvConfigManager().getKVListByNamespace(NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG);
    response.setBody(jsonValue);

    response.setCode(ResponseCode.SUCCESS);
    response.setRemark(null);
    return response;
}

// org.apache.rocketmq.namesrv.routeinfo.RouteInfoManager#registerBroker
public RegisterBrokerResult registerBroker(
    final String clusterName,
    final String brokerAddr,
    final String brokerName,
    final long brokerId,
    final String haServerAddr,
    final TopicConfigSerializeWrapper topicConfigWrapper,
    final List<String> filterServerList,
    final Channel channel) {
    RegisterBrokerResult result = new RegisterBrokerResult();
    try {
        try {
            // Take the write lock before updating the route tables
            this.lock.writeLock().lockInterruptibly();

            // clusterAddrTable: cluster name -> broker names
            Set<String> brokerNames = this.clusterAddrTable.get(clusterName);
            if (null == brokerNames) {
                brokerNames = new HashSet<String>();
                this.clusterAddrTable.put(clusterName, brokerNames);
            }
            brokerNames.add(brokerName);

            boolean registerFirst = false;

            // brokerAddrTable: broker name -> broker detail
            BrokerData brokerData = this.brokerAddrTable.get(brokerName);
            if (null == brokerData) {
                registerFirst = true;
                brokerData = new BrokerData(clusterName, brokerName, new HashMap<Long, String>());
                this.brokerAddrTable.put(brokerName, brokerData);
            }
            Map<Long, String> brokerAddrsMap = brokerData.getBrokerAddrs();
            //Switch slave to master: first remove <1, IP:PORT> in namesrv, then add <0, IP:PORT>
            //The same IP:PORT must only have one record in brokerAddrTable
            Iterator<Entry<Long, String>> it = brokerAddrsMap.entrySet().iterator();
            while (it.hasNext()) {
                Entry<Long, String> item = it.next();
                if (null != brokerAddr && brokerAddr.equals(item.getValue()) && brokerId != item.getKey()) {
                    it.remove();
                }
            }

            String oldAddr = brokerData.getBrokerAddrs().put(brokerId, brokerAddr);
            registerFirst = registerFirst || (null == oldAddr);

            if (null != topicConfigWrapper
                && MixAll.MASTER_ID == brokerId) {
                if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
                    || registerFirst) {
                    // First registration or topic change: refresh the topic queue info
                    ConcurrentMap<String, TopicConfig> tcTable = topicConfigWrapper.getTopicConfigTable();
                    if (tcTable != null) {
                        for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
                            this.createAndUpdateQueueData(brokerName, entry.getValue());
                        }
                    }
                }
            }

            // brokerLiveTable: live broker bookkeeping
            BrokerLiveInfo prevBrokerLiveInfo = this.brokerLiveTable.put(brokerAddr,
                new BrokerLiveInfo(
                    System.currentTimeMillis(),
                    topicConfigWrapper.getDataVersion(),
                    channel,
                    haServerAddr));
            if (null == prevBrokerLiveInfo) {
                log.info("new broker registered, {} HAServer: {}", brokerAddr, haServerAddr);
            }

            if (filterServerList != null) {
                if (filterServerList.isEmpty()) {
                    this.filterServerTable.remove(brokerAddr);
                } else {
                    this.filterServerTable.put(brokerAddr, filterServerList);
                }
            }

            // A slave registration gets its master's address back in the result
            if (MixAll.MASTER_ID != brokerId) {
                String masterAddr = brokerData.getBrokerAddrs().get(MixAll.MASTER_ID);
                if (masterAddr != null) {
                    BrokerLiveInfo brokerLiveInfo = this.brokerLiveTable.get(masterAddr);
                    if (brokerLiveInfo != null) {
                        result.setHaServerAddr(brokerLiveInfo.getHaServerAddr());
                        result.setMasterAddr(masterAddr);
                    }
                }
            }
        } finally {
            this.lock.writeLock().unlock();
        }
    } catch (Exception e) {
        log.error("registerBroker Exception", e);
    }

    return result;
}
```
Admittedly, that is rather dry. The gist is enough: when a broker comes online, the nameserver needs to know about it and records the fact in its various in-memory tables for later use. For a thorough understanding you have to start from the business requirements, but in the end it is ordinary bookkeeping, no different from the business code we write every day. (The sketch below lists the tables involved.)
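For orientation, here is a sketch of the tables RouteInfoManager keeps in memory; the field names below match what the registerBroker() code above reads and writes, while the comments are mine:

```java
// The in-memory route tables of RouteInfoManager (all access guarded by
// the ReentrantReadWriteLock seen in registerBroker above).
private final HashMap<String/* topic */, List<QueueData>> topicQueueTable;     // topic -> per-broker queue metadata
private final HashMap<String/* brokerName */, BrokerData> brokerAddrTable;     // broker name -> addresses keyed by brokerId
private final HashMap<String/* clusterName */, Set<String/* brokerName */>> clusterAddrTable; // cluster -> broker names
private final HashMap<String/* brokerAddr */, BrokerLiveInfo> brokerLiveTable; // address -> heartbeat bookkeeping
private final HashMap<String/* brokerAddr */, List<String>/* filter servers */> filterServerTable;
```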
4. Topic placement strategy
Besides registering brokers, the nameserver has another core duty: telling producers and consumers where each topic's data lives. That placement governs both how data is stored and how it is accessed; if the decision goes wrong, the reliability of the whole cluster is forfeit. So this point deserves a deeper look.
In kafka, placement is strongly tied to shards (partitions): the number of shards a topic is assigned determines how many machine nodes it can occupy; that is, kafka allocates storage at shard granularity.
rocketmq differs a little. The analogous concepts are: topic as the outermost storage unit, with messageQueue one level inside it. So does it place data per topic or per messageQueue? The official docs already spell it out: a Broker corresponds to one server in a real deployment; each Broker can store the messages of multiple Topics, and each Topic's messages can be sharded across different Brokers. A Message Queue stores the physical addresses of messages, and each Topic's message addresses are spread over multiple Message Queues.
In other words, rocketmq places data at message queue granularity, the finest unit. This is hardly a surprise, because distributed storage demands it. (Just imagine the problems that placing whole topics as one unit would cause.)
Then how does it decide which message queue lives where?
```java
// RequestCode.GET_ROUTEINFO_BY_TOPIC
public RemotingCommand getRouteInfoByTopic(ChannelHandlerContext ctx,
    RemotingCommand request) throws RemotingCommandException {
    final RemotingCommand response = RemotingCommand.createResponseCommand(null);
    final GetRouteInfoRequestHeader requestHeader =
        (GetRouteInfoRequestHeader) request.decodeCommandCustomHeader(GetRouteInfoRequestHeader.class);

    // Look up the topic's route info
    TopicRouteData topicRouteData = this.namesrvController.getRouteInfoManager().pickupTopicRouteData(requestHeader.getTopic());

    if (topicRouteData != null) {
        // Ordered-message configuration
        if (this.namesrvController.getNamesrvConfig().isOrderMessageEnable()) {
            String orderTopicConf =
                this.namesrvController.getKvConfigManager().getKVConfig(NamesrvUtil.NAMESPACE_ORDER_TOPIC_CONFIG,
                    requestHeader.getTopic());
            topicRouteData.setOrderTopicConf(orderTopicConf);
        }

        byte[] content = topicRouteData.encode();
        response.setBody(content);
        response.setCode(ResponseCode.SUCCESS);
        response.setRemark(null);
        return response;
    }

    response.setCode(ResponseCode.TOPIC_NOT_EXIST);
    response.setRemark("No topic route info in name server for the topic: " + requestHeader.getTopic()
        + FAQUrl.suggestTodo(FAQUrl.APPLY_TOPIC_URL));
    return response;
}

// org.apache.rocketmq.namesrv.routeinfo.RouteInfoManager#pickupTopicRouteData
public TopicRouteData pickupTopicRouteData(final String topic) {
    TopicRouteData topicRouteData = new TopicRouteData();
    boolean foundQueueData = false;
    boolean foundBrokerData = false;
    Set<String> brokerNameSet = new HashSet<String>();
    List<BrokerData> brokerDataList = new LinkedList<BrokerData>();
    topicRouteData.setBrokerDatas(brokerDataList);

    HashMap<String, List<String>> filterServerMap = new HashMap<String, List<String>>();
    topicRouteData.setFilterServerTable(filterServerMap);

    try {
        try {
            this.lock.readLock().lockInterruptibly();
            // All messageQueue info registered for this topic
            List<QueueData> queueDataList = this.topicQueueTable.get(topic);
            if (queueDataList != null) {
                topicRouteData.setQueueDatas(queueDataList);
                foundQueueData = true;

                Iterator<QueueData> it = queueDataList.iterator();
                while (it.hasNext()) {
                    QueueData qd = it.next();
                    brokerNameSet.add(qd.getBrokerName());
                }

                // Resolve each brokerName; a miss means the broker has likely gone
                // offline and must not be counted in the route info
                for (String brokerName : brokerNameSet) {
                    BrokerData brokerData = this.brokerAddrTable.get(brokerName);
                    if (null != brokerData) {
                        BrokerData brokerDataClone = new BrokerData(brokerData.getCluster(), brokerData.getBrokerName(),
                            (HashMap<Long, String>) brokerData.getBrokerAddrs().clone());
                        brokerDataList.add(brokerDataClone);
                        // One resolvable broker is enough to build a route
                        foundBrokerData = true;
                        for (final String brokerAddr : brokerDataClone.getBrokerAddrs().values()) {
                            List<String> filterServerList = this.filterServerTable.get(brokerAddr);
                            filterServerMap.put(brokerAddr, filterServerList);
                        }
                    }
                }
            }
        } finally {
            this.lock.readLock().unlock();
        }
    } catch (Exception e) {
        log.error("pickupTopicRouteData Exception", e);
    }

    log.debug("pickupTopicRouteData {} {}", topic, topicRouteData);

    // The route is returned only when both queue data and broker data were found
    if (foundBrokerData && foundQueueData) {
        return topicRouteData;
    }

    return null;
}

// QueueData, a key component of the route info, is structured as follows
public class QueueData implements Comparable<QueueData> {
    private String brokerName;
    private int readQueueNums;
    private int writeQueueNums;
    private int perm;
    private int topicSynFlag;
    ...
}

// BrokerData is structured as follows
public class BrokerData implements Comparable<BrokerData> {
    private String cluster;
    private String brokerName;
    private HashMap<Long/* brokerId */, String/* broker address */> brokerAddrs;
    ...
}
```
ok, from the implementation above we can see that route lookup is keyed by topic, and topic info lives in topicQueueTable. One important observation: the entire lookup is completely queueId-agnostic. So how is the location of a specific queueId ever determined? And when is the data in topicQueueTable maintained?
First, topicQueueTable is maintained when brokers register and unregister, which is easy to understand.
```java
// This is the createAndUpdateQueueData() we saw earlier, invoked when the
// registering broker is a master node
private void createAndUpdateQueueData(final String brokerName, final TopicConfig topicConfig) {
    QueueData queueData = new QueueData();
    queueData.setBrokerName(brokerName);
    queueData.setWriteQueueNums(topicConfig.getWriteQueueNums());
    queueData.setReadQueueNums(topicConfig.getReadQueueNums());
    queueData.setPerm(topicConfig.getPerm());
    queueData.setTopicSynFlag(topicConfig.getTopicSysFlag());

    List<QueueData> queueDataList = this.topicQueueTable.get(topicConfig.getTopicName());
    // First broker for this topic
    if (null == queueDataList) {
        queueDataList = new LinkedList<QueueData>();
        queueDataList.add(queueData);
        this.topicQueueTable.put(topicConfig.getTopicName(), queueDataList);
        log.info("new topic registered, {} {}", topicConfig.getTopicName(), queueData);
    } else {
        boolean addNewOne = true;

        Iterator<QueueData> it = queueDataList.iterator();
        // Possibly a new broker joining the topic
        while (it.hasNext()) {
            QueueData qd = it.next();
            if (qd.getBrokerName().equals(brokerName)) {
                if (qd.equals(queueData)) {
                    addNewOne = false;
                } else {
                    log.info("topic changed, {} OLD: {} NEW: {}", topicConfig.getTopicName(), qd, queueData);
                    it.remove();
                }
            }
        }

        if (addNewOne) {
            queueDataList.add(queueData);
        }
    }
}
```
But when is anything ever decided per queueId? The nameserver, it seems, has no idea.
In fact, which broker data is sent to, and which broker it is consumed from, is decided by each client according to a selection strategy. In the producer, for instance, it works like this:
```java
// org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendDefaultImpl
private SendResult sendDefaultImpl(
    Message msg,
    final CommunicationMode communicationMode,
    final SendCallback sendCallback,
    final long timeout
) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {
    this.makeSureStateOK();
    Validators.checkMessage(msg, this.defaultMQProducer);
    final long invokeID = random.nextLong();
    long beginTimestampFirst = System.currentTimeMillis();
    long beginTimestampPrev = beginTimestampFirst;
    long endTimestamp = beginTimestampFirst;
    // The route info returned by the nameserver, i.e. the available broker list
    TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());
    if (topicPublishInfo != null && topicPublishInfo.ok()) {
        boolean callTimeout = false;
        MessageQueue mq = null;
        Exception exception = null;
        SendResult sendResult = null;
        int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;
        int times = 0;
        String[] brokersSent = new String[timesTotal];
        for (; times < timesTotal; times++) {
            // On first entry, simply pick one queue to send to
            String lastBrokerName = null == mq ? null : mq.getBrokerName();
            MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);
            if (mqSelected != null) {
                mq = mqSelected;
                brokersSent[times] = mq.getBrokerName();
                try {
                    beginTimestampPrev = System.currentTimeMillis();
                    if (times > 0) {
                        //Reset topic with namespace during resend.
                        msg.setTopic(this.defaultMQProducer.withNamespace(msg.getTopic()));
                    }
                    long costTime = beginTimestampPrev - beginTimestampFirst;
                    if (timeout < costTime) {
                        callTimeout = true;
                        break;
                    }

                    // Send the message data to the selected messageQueue
                    sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);
                    endTimestamp = System.currentTimeMillis();
                    this.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);
                    switch (communicationMode) {
                        case ASYNC:
                            return null;
                        case ONEWAY:
                            return null;
                        case SYNC:
                            if (sendResult.getSendStatus() != SendStatus.SEND_OK) {
                                if (this.defaultMQProducer.isRetryAnotherBrokerWhenNotStoreOK()) {
                                    continue;
                                }
                            }
                            return sendResult;
                        default:
                            break;
                    }
                } catch (RemotingException e) ...
                // (remaining retry and exception handling elided)
}

// org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#selectOneMessageQueue
public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
    return this.mqFaultStrategy.selectOneMessageQueue(tpInfo, lastBrokerName);
}

// org.apache.rocketmq.client.latency.MQFaultStrategy#selectOneMessageQueue
public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {
    // Fault-tolerance branch; not essential to understanding the strategy
    if (this.sendLatencyFaultEnable) {
        try {
            int index = tpInfo.getSendWhichQueue().getAndIncrement();
            for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {
                int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();
                if (pos < 0)
                    pos = 0;
                MessageQueue mq = tpInfo.getMessageQueueList().get(pos);
                if (latencyFaultTolerance.isAvailable(mq.getBrokerName())) {
                    if (null == lastBrokerName || mq.getBrokerName().equals(lastBrokerName))
                        return mq;
                }
            }

            final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();
            int writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);
            if (writeQueueNums > 0) {
                final MessageQueue mq = tpInfo.selectOneMessageQueue();
                if (notBestBroker != null) {
                    mq.setBrokerName(notBestBroker);
                    mq.setQueueId(tpInfo.getSendWhichQueue().getAndIncrement() % writeQueueNums);
                }
                return mq;
            } else {
                latencyFaultTolerance.remove(notBestBroker);
            }
        } catch (Exception e) {
            log.error("Error occurred when selecting message queue", e);
        }

        return tpInfo.selectOneMessageQueue();
    }

    return tpInfo.selectOneMessageQueue(lastBrokerName);
}

// org.apache.rocketmq.client.impl.producer.TopicPublishInfo#selectOneMessageQueue
// A plain round-robin pick of one queue
public MessageQueue selectOneMessageQueue(final String lastBrokerName) {
    if (lastBrokerName == null) {
        // Pick any messageQueue as the send target
        return selectOneMessageQueue();
    } else {
        int index = this.sendWhichQueue.getAndIncrement();
        // Try up to n times to get a queue on a different broker;
        // failing that, just pick any queue
        for (int i = 0; i < this.messageQueueList.size(); i++) {
            int pos = Math.abs(index++) % this.messageQueueList.size();
            if (pos < 0)
                pos = 0;
            MessageQueue mq = this.messageQueueList.get(pos);
            if (!mq.getBrokerName().equals(lastBrokerName)) {
                return mq;
            }
        }
        return selectOneMessageQueue();
    }
}
```
From the above we now roughly know: to send a message, the producer first obtains the list of available brokers for the topic (supplied by the nameserver), then applies a selection strategy to pick one MessageQueue, and finally sends the data to that MessageQueue. The MessageQueue is therefore central; let's look at its data structure:
```java
// org.apache.rocketmq.common.message.MessageQueue
public class MessageQueue implements Comparable<MessageQueue>, Serializable {
    private static final long serialVersionUID = 6191200464116433425L;
    private String topic;
    private String brokerName;
    private int queueId;
    ...
}
```
Remarkably lean: just the three essentials, topic (the subject), brokerName (the broker's identity), and queueId (the queue id).
The client-side strategy described earlier picks a MessageQueue, i.e. a broker identity plus a queueId. So which broker a message lands on is decided by the client, and its placement cannot be predicted in advance. In other words, in rocketmq the data of a single topic is scattered across a collection of brokers, which departs somewhat from the usual intuition.
What does this design buy us? There are real benefits: for instance, if some brokers die, the cluster keeps working fine without any rebalancing procedure, because clients can simply send to the brokers that remain healthy. As for other benefits...
Personally, though, I'm not fully convinced by this design: you cannot deterministically locate which broker holds a given message, as there is no rule to follow. Also, maintaining any invariant within a single queueId is impossible (though roundabout workarounds may exist). And queueId is purely an internal concept; users cannot specify its value. (One such workaround is sketched below.)
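For completeness, one such workaround, not part of this article's source walkthrough: the producer API lets you override queue selection with a MessageQueueSelector, which is how rocketmq's ordered messages pin all messages sharing a key onto one queue. A minimal sketch, where the producer group, nameserver address, and order id are assumptions:

```java
import java.util.List;
import org.apache.rocketmq.client.producer.DefaultMQProducer;
import org.apache.rocketmq.client.producer.MessageQueueSelector;
import org.apache.rocketmq.client.producer.SendResult;
import org.apache.rocketmq.common.message.Message;
import org.apache.rocketmq.common.message.MessageQueue;

public class OrderedSendDemo {
    public static void main(String[] args) throws Exception {
        DefaultMQProducer producer = new DefaultMQProducer("demo_producer_group");
        producer.setNamesrvAddr("127.0.0.1:9876"); // assumed address
        producer.start();
        try {
            long orderId = 42L; // hypothetical sharding key
            Message msg = new Message("TopicTest", ("order-" + orderId).getBytes());
            // Route every message with the same key to the same queue,
            // so ordering holds within that one queue
            SendResult result = producer.send(msg, new MessageQueueSelector() {
                @Override
                public MessageQueue select(List<MessageQueue> mqs, Message m, Object arg) {
                    long id = (Long) arg;
                    return mqs.get((int) (id % mqs.size()));
                }
            }, orderId);
            System.out.printf("%s%n", result);
        } finally {
            producer.shutdown();
        }
    }
}
```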
5. Where exactly is a MessageQueue stored?
As described above, one topic's data may be spread across n brokers, stored per messageQueue queueId. So where exactly does the data live? Which brokers are those "n brokers"? How many queueIds does each broker hold? Until these questions are answered, we can't claim to understand the thing.
Take the first question: how many brokers hold a topic's data? Recall the broker registration flow from earlier:
```java
// org.apache.rocketmq.namesrv.routeinfo.RouteInfoManager#registerBroker
if (null != topicConfigWrapper
    && MixAll.MASTER_ID == brokerId) {
    if (this.isBrokerTopicConfigChanged(brokerAddr, topicConfigWrapper.getDataVersion())
        || registerFirst) {
        // First registration or topic change: refresh the topic info
        ConcurrentMap<String, TopicConfig> tcTable = topicConfigWrapper.getTopicConfigTable();
        if (tcTable != null) {
            // Walk every topic in this broker's config and add the newly
            // arrived broker to the set of brokers serving it
            for (Map.Entry<String, TopicConfig> entry : tcTable.entrySet()) {
                this.createAndUpdateQueueData(brokerName, entry.getValue());
            }
        }
    }
}
```
Reading this, it clicks: the "n brokers" that can serve a topic are, in effect, all brokers! Well, we don't quite know why it was done this way, but done it was: a topic may be spread across every broker machine. As for which one holds a given piece of data, have a guess!
Now the second question: how many queueIds does a single broker store? Think back to the earlier implementation: "broker" meant every broker, so if all brokers run the same configuration, shouldn't each broker store every queueId? (No hard proof, but the reasoning is worth following.)
Each time a rocketmq client (producer or consumer) issues a produce or consume request, it may first pull route info from the nameserver. But as our earlier investigation showed, that route info contains no queueId entries. So how do queueIds come into play? Quite simply: after pulling the route info from the nameserver, the client performs the assignment locally:
```java
// Refresh topic route info
// org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#tryToFindTopicPublishInfo
private TopicPublishInfo tryToFindTopicPublishInfo(final String topic) {
    TopicPublishInfo topicPublishInfo = this.topicPublishInfoTable.get(topic);
    if (null == topicPublishInfo || !topicPublishInfo.ok()) {
        this.topicPublishInfoTable.putIfAbsent(topic, new TopicPublishInfo());
        this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic);
        topicPublishInfo = this.topicPublishInfoTable.get(topic);
    }

    if (topicPublishInfo.isHaveTopicRouterInfo() || topicPublishInfo.ok()) {
        return topicPublishInfo;
    } else {
        // Pull the route data from the nameserver
        this.mQClientFactory.updateTopicRouteInfoFromNameServer(topic, true, this.defaultMQProducer);
        topicPublishInfo = this.topicPublishInfoTable.get(topic);
        return topicPublishInfo;
    }
}

// org.apache.rocketmq.client.impl.factory.MQClientInstance#updateTopicRouteInfoFromNameServer
public boolean updateTopicRouteInfoFromNameServer(final String topic, boolean isDefault,
    DefaultMQProducer defaultMQProducer) {
    try {
        if (this.lockNamesrv.tryLock(LOCK_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
            try {
                TopicRouteData topicRouteData;
                if (isDefault && defaultMQProducer != null) {
                    topicRouteData = this.mQClientAPIImpl.getDefaultTopicRouteInfoFromNameServer(defaultMQProducer.getCreateTopicKey(),
                        1000 * 3);
                    if (topicRouteData != null) {
                        for (QueueData data : topicRouteData.getQueueDatas()) {
                            int queueNums = Math.min(defaultMQProducer.getDefaultTopicQueueNums(), data.getReadQueueNums());
                            data.setReadQueueNums(queueNums);
                            data.setWriteQueueNums(queueNums);
                        }
                    }
                } else {
                    topicRouteData = this.mQClientAPIImpl.getTopicRouteInfoFromNameServer(topic, 1000 * 3);
                }
                if (topicRouteData != null) {
                    TopicRouteData old = this.topicRouteTable.get(topic);
                    boolean changed = topicRouteDataIsChange(old, topicRouteData);
                    if (!changed) {
                        changed = this.isNeedUpdateTopicRouteInfo(topic);
                    } else {
                        log.info("the topic[{}] route info changed, old[{}] ,new[{}]", topic, old, topicRouteData);
                    }

                    if (changed) {
                        TopicRouteData cloneTopicRouteData = topicRouteData.cloneTopicRouteData();

                        for (BrokerData bd : topicRouteData.getBrokerDatas()) {
                            this.brokerAddrTable.put(bd.getBrokerName(), bd.getBrokerAddrs());
                        }

                        // Update Pub info
                        {
                            // Expand queueIds for each broker
                            TopicPublishInfo publishInfo = topicRouteData2TopicPublishInfo(topic, topicRouteData);
                            publishInfo.setHaveTopicRouterInfo(true);
                            Iterator<Entry<String, MQProducerInner>> it = this.producerTable.entrySet().iterator();
                            while (it.hasNext()) {
                                Entry<String, MQProducerInner> entry = it.next();
                                MQProducerInner impl = entry.getValue();
                                if (impl != null) {
                                    impl.updateTopicPublishInfo(topic, publishInfo);
                                }
                            }
                        }

                        // Update sub info
                        {
                            Set<MessageQueue> subscribeInfo = topicRouteData2TopicSubscribeInfo(topic, topicRouteData);
                            Iterator<Entry<String, MQConsumerInner>> it = this.consumerTable.entrySet().iterator();
                            while (it.hasNext()) {
                                Entry<String, MQConsumerInner> entry = it.next();
                                MQConsumerInner impl = entry.getValue();
                                if (impl != null) {
                                    impl.updateTopicSubscribeInfo(topic, subscribeInfo);
                                }
                            }
                        }
                        log.info("topicRouteTable.put. Topic = {}, TopicRouteData[{}]", topic, cloneTopicRouteData);
                        this.topicRouteTable.put(topic, cloneTopicRouteData);
                        return true;
                    }
                } else {
                    log.warn("updateTopicRouteInfoFromNameServer, getTopicRouteInfoFromNameServer return null, Topic: {}", topic);
                }
            } catch (MQClientException e) {
                if (!topic.startsWith(MixAll.RETRY_GROUP_TOPIC_PREFIX)) {
                    log.warn("updateTopicRouteInfoFromNameServer Exception", e);
                }
            } catch (RemotingException e) {
                log.error("updateTopicRouteInfoFromNameServer Exception", e);
                throw new IllegalStateException(e);
            } finally {
                this.lockNamesrv.unlock();
            }
        } else {
            log.warn("updateTopicRouteInfoFromNameServer tryLock timeout {}ms", LOCK_TIMEOUT_MILLIS);
        }
    } catch (InterruptedException e) {
        log.warn("updateTopicRouteInfoFromNameServer Exception", e);
    }

    return false;
}
```
The producer-side queueId assignment is implemented as follows:
```java
// org.apache.rocketmq.client.impl.factory.MQClientInstance#topicRouteData2TopicPublishInfo
public static TopicPublishInfo topicRouteData2TopicPublishInfo(final String topic, final TopicRouteData route) {
    TopicPublishInfo info = new TopicPublishInfo();
    info.setTopicRouteData(route);
    // orderTopicConf spells out each broker's queue allocation (its max queueId)
    // (one wonders whether maintaining such a config gets tiring)
    if (route.getOrderTopicConf() != null && route.getOrderTopicConf().length() > 0) {
        String[] brokers = route.getOrderTopicConf().split(";");
        for (String broker : brokers) {
            String[] item = broker.split(":");
            int nums = Integer.parseInt(item[1]);
            for (int i = 0; i < nums; i++) {
                MessageQueue mq = new MessageQueue(topic, item[0], i);
                info.getMessageQueueList().add(mq);
            }
        }

        info.setOrderTopic(true);
    } else {
        List<QueueData> qds = route.getQueueDatas();
        Collections.sort(qds);
        for (QueueData qd : qds) {
            if (PermName.isWriteable(qd.getPerm())) {
                BrokerData brokerData = null;
                for (BrokerData bd : route.getBrokerDatas()) {
                    if (bd.getBrokerName().equals(qd.getBrokerName())) {
                        brokerData = bd;
                        break;
                    }
                }

                // Some brokers still can't serve this queue data
                if (null == brokerData) {
                    continue;
                }

                // Non-master nodes cannot accept write requests
                if (!brokerData.getBrokerAddrs().containsKey(MixAll.MASTER_ID)) {
                    continue;
                }

                // Per writeQueueNums, this broker accepts every queueId below that value
                for (int i = 0; i < qd.getWriteQueueNums(); i++) {
                    MessageQueue mq = new MessageQueue(topic, qd.getBrokerName(), i);
                    info.getMessageQueueList().add(mq);
                }
            }
        }

        info.setOrderTopic(false);
    }

    return info;
}
```
As we can see, on the producer side only broker master nodes accept writes, and each one covers the storage of every queueId below its writeQueueNums. (If all brokers share the same configuration, effectively every broker stores every queueId.) So the storage mapping truly resists being pinned down. (A worked illustration follows.)
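To make the expansion concrete, a small illustration under assumed conditions (a hypothetical cluster, not from the source): two masters, broker-a and broker-b, each registered with writeQueueNums = 4 for TopicTest. The loop above then yields eight send targets:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.rocketmq.common.message.MessageQueue;

// What info.getMessageQueueList() ends up containing for the assumed layout:
// (TopicTest, broker-a, 0..3) and (TopicTest, broker-b, 0..3)
List<MessageQueue> expanded = new ArrayList<>();
for (String brokerName : new String[] {"broker-a", "broker-b"}) {
    for (int queueId = 0; queueId < 4; queueId++) {
        expanded.add(new MessageQueue("TopicTest", brokerName, queueId));
    }
}
// expanded.size() == 8; the producer's round-robin strategy cycles over all of
// them, which is what actually spreads one topic across several brokers.
```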
Now let's see how consumers map to queueIds:
```java
// org.apache.rocketmq.client.impl.factory.MQClientInstance#topicRouteData2TopicSubscribeInfo
public static Set<MessageQueue> topicRouteData2TopicSubscribeInfo(final String topic, final TopicRouteData route) {
    Set<MessageQueue> mqList = new HashSet<MessageQueue>();
    List<QueueData> qds = route.getQueueDatas();
    for (QueueData qd : qds) {
        if (PermName.isReadable(qd.getPerm())) {
            // Every queue below readQueueNums on a readable broker can be consumed
            for (int i = 0; i < qd.getReadQueueNums(); i++) {
                MessageQueue mq = new MessageQueue(topic, qd.getBrokerName(), i);
                mqList.add(mq);
            }
        }
    }

    return mqList;
}
```
Same idea as the producer: readQueueNums bounds the readable queues, and in practice that amounts to all of them. Which makes sense: the data was written across all those queueIds, and if consumers didn't read them all, who would? (A sketch of how those queues are then divided among consumer instances follows.)
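As a hedged aside, beyond this article's walkthrough: the full queue set returned above is what the consumer-side rebalance later divides among the instances of a consumer group. A minimal sketch using the client's built-in averaging strategy; the group name and consumer ids are assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.rocketmq.client.consumer.rebalance.AllocateMessageQueueAveragely;
import org.apache.rocketmq.common.message.MessageQueue;

public class AllocateDemo {
    public static void main(String[] args) {
        List<MessageQueue> mqAll = new ArrayList<>();
        for (int queueId = 0; queueId < 8; queueId++) {
            mqAll.add(new MessageQueue("TopicTest", "broker-a", queueId));
        }
        // Two consumer instances, c1 and c2, in the same (hypothetical) group
        List<MessageQueue> mine = new AllocateMessageQueueAveragely()
            .allocate("demo_consumer_group", "c1", mqAll, Arrays.asList("c1", "c2"));
        // c1 receives queues 0..3; c2 would receive 4..7
        System.out.println(mine);
    }
}
```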
With that, we have finally untangled rocketmq's message placement scheme. In one sentence: any node may hold data of any queueId of any topic. A conclusion that, once again, leaves me with a thousand loose threads!
All of the above covers only the normal path of rocketmq data placement, barely scratching the surface. In reality, a crucial capability of any distributed system is fault tolerance, and that we'll have to discuss another time.