Recommended Blog | Fast Log Retrieval for Apache Pulsar Based on Log4j2 + Kafka + ELK

Published by ApachePulsar on 2022-01-27

This article is republished from the WeChat public account StreamCloudNative. The author, Xue Song, is a senior software engineer at Newland Software (新大陸軟體).

Editor: 雞排, StreamNative.

About Apache Pulsar

Apache Pulsar is a top-level project of the Apache Software Foundation and a next-generation cloud-native distributed messaging and streaming platform. It integrates messaging, storage, and lightweight functional computing, adopts an architecture that separates compute from storage, and supports multi-tenancy, persistent storage, and cross-region replication across multiple data centers, offering streaming-data storage characteristics such as strong consistency, high throughput, low latency, and high scalability.

Many large Internet and traditional-industry companies in China and abroad have adopted Apache Pulsar, with use cases spanning artificial intelligence, finance, telecom operators, live streaming and short video, IoT, retail and e-commerce, online education, and more. Users include the US cable giant Comcast, Yahoo!, Tencent, China Telecom, China Mobile, BIGO, and VIPKID.

Background

As a cloud-native distributed messaging system, Apache Pulsar consists of several components, including ZooKeeper, bookie, broker, functions-worker, and proxy, all deployed in a distributed manner across multiple hosts, so each component's log files are scattered across those hosts as well. When a component has a problem, checking every service for error messages one by one is cumbersome because the logs are so dispersed. The usual approach is to run grep, awk, and similar commands directly against the log files to pull out the information we need. However, as applications and services grow and the number of supporting nodes increases, this traditional approach exposes many problems: it is inefficient, large log volumes are hard to archive, full-text search over files is slow, and multi-dimensional queries are impractical. We therefore want to aggregate and monitor the logs so that errors from every Pulsar service can be located and investigated quickly, making operations more purposeful, targeted, and direct.

To solve the log-retrieval problem, our team decided to adopt a centralized logging system that collects, manages, and provides access to the logs from all Pulsar nodes in one place.

A complete centralized logging system needs to cover the following key capabilities:

  • Collection: gather log data from multiple sources;
  • Transport: ship log data to the central system reliably;
  • Storage: store the log data;
  • Analysis: support analysis through a UI;
  • Alerting: provide error reporting and monitoring.

ELK provides a complete solution built from open-source components that work seamlessly together and efficiently cover many scenarios, making it one of today's mainstream logging stacks. Our company has a self-developed big-data management platform through which we deploy and manage ELK, and ELK already supports several business systems in production. ELK is an acronym for three open-source projects: Elasticsearch, Logstash, and Kibana. The latest releases have been renamed Elastic Stack and add the Beats project, which includes Filebeat, a lightweight log collection agent; Filebeat uses few resources and is well suited to collecting logs on each server and forwarding them to Logstash.

[Figure: Filebeat-based log collection with ELK]

As the figure above shows, this log collection mode has two problems for Pulsar:

  • every host that runs a Pulsar service must also deploy a Filebeat service;
  • Pulsar's logs must first be written to files on local disk, consuming the host's disk I/O.

For these reasons, we chose to implement fast log retrieval for Apache Pulsar with Log4j2 + Kafka + ELK. Log4j2 supports sending logs to Kafka out of the box: configure its built-in KafkaAppender in the Log4j2 configuration file, and the logs produced through Log4j2 are delivered to Kafka in real time.

As shown in the figure below:

[Figure: Pulsar components logging through Log4j2 directly into Kafka and then into ELK]

Implementation

Using Pulsar 2.6.2 as an example, the following walks through the detailed implementation of fast log retrieval for Apache Pulsar based on Log4j2 + Kafka + ELK.

I. Preparation

First, decide which fields will be used to search the logs in Kibana; these fields can then be aggregated and queried across multiple dimensions. Elasticsearch tokenizes the log records into these retrieval fields and builds the index on them.

[Figure: the eight retrieval fields defined for Pulsar logs]

As shown above, we define eight retrieval fields for Pulsar logs: cluster name, hostname, host IP, component name, log content, system time, log level, and cluster instance.

II. Implementation Steps

Note: to avoid breaking the structure of Pulsar's original configuration files and scripts, the solution is implemented by adding new configuration files and script files.

1. Add configuration files

Add the following two configuration files to the {PULSAR_HOME}/conf directory:

1) logenv.sh — this file passes the JVM options needed when starting Pulsar components into the Pulsar service's Java process as configuration. Example content:

KAFKA_CLUSTER=192.168.0.1:9092,192.168.0.2:9092,192.168.0.2:9092
PULSAR_CLUSTER=pulsar_cluster
PULSAR_TOPIC=pulsar_topic
HOST_IP=192.168.0.1
PULSAR_MODULE_INSTANCE_ID=1

These fields mean the following:

  • KAFKA_CLUSTER: the Kafka broker list;
  • PULSAR_CLUSTER: the Pulsar cluster name;
  • PULSAR_TOPIC: the Kafka topic that receives the Pulsar service logs (a sample topic-creation command follows this list);
  • HOST_IP: the IP address of the Pulsar host;
  • PULSAR_MODULE_INSTANCE_ID: the instance identifier of the Pulsar service; several Pulsar clusters may be deployed on one host, and this identifier distinguishes them.
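If the Kafka cluster does not auto-create topics, the topic referenced by PULSAR_TOPIC has to exist before the services start logging to it. A minimal sketch using the Kafka 2.0.x CLI; the ZooKeeper address, partition count, and replication factor are assumptions to adapt to your environment (Kafka 2.2+ would use --bootstrap-server instead of --zookeeper):

# Create the log topic on the Kafka cluster that will receive the Pulsar logs
bin/kafka-topics.sh --create \
  --zookeeper 192.168.0.1:2181 \
  --topic pulsar_topic \
  --partitions 3 \
  --replication-factor 2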

2) log4j2-kafka.yaml

This configuration file is a copy of log4j2.yaml with the following changes applied. (Note: in the figures below, log4j2.yaml is on the left and log4j2-kafka.yaml on the right.)

  • Add the Kafka cluster broker list and define the format of the records Log4j2 writes to Kafka: the eight retrieval fields in a message are separated by spaces, and Elasticsearch splits on the space to tokenize them.

[Figure: log4j2.yaml vs. log4j2-kafka.yaml — Kafka broker list and message pattern added]

  • Add the Kafka appender;

[Figure: log4j2.yaml vs. log4j2-kafka.yaml — Kafka appender added]

  • Add the Failover appender;

[Figure: log4j2.yaml vs. log4j2-kafka.yaml — Failover appender added]

  • Change Root and Logger under Loggers to asynchronous mode;

[Figure: log4j2.yaml vs. log4j2-kafka.yaml — Root and Logger switched to AsyncRoot and AsyncLogger]

  • The complete content of log4j2-kafka.yaml is as follows:

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#




Configuration:
  status: INFO
  monitorInterval: 30
  name: pulsar
  packages: io.prometheus.client.log4j2


  Properties:
    Property:
      - name: "pulsar.log.dir"
        value: "logs"
      - name: "pulsar.log.file"
        value: "pulsar.log"
      - name: "pulsar.log.appender"
        value: "RoutingAppender"
      - name: "pulsar.log.root.level"
        value: "info"
      - name: "pulsar.log.level"
        value: "info"
      - name: "pulsar.routing.appender.default"
        value: "Console"
      - name: "kafkaBrokers"
        value: "${sys:kafka.cluster}"
      - name: "pattern"
        value: "${sys:pulsar.cluster} ${sys:pulsar.hostname} ${sys:pulsar.hostip} ${sys:pulsar.module.type} ${sys:pulsar.module.instanceid} %date{yyyy-MM-dd HH:mm:ss.SSS} [%thread] [%c{10}] %level , %msg%n"


  # Example: logger-filter script
  Scripts:
    ScriptFile:
      name: filter.js
      language: JavaScript
      path: ./conf/log4j2-scripts/filter.js
      charset: UTF-8


  Appenders:


    #Kafka
    Kafka:
      name: "pulsar_kafka"
      topic: "${sys:pulsar.topic}"
      ignoreExceptions: "false"
      PatternLayout:
        pattern: "${pattern}"
      Property:
        - name: "bootstrap.servers"
          value: "${kafkaBrokers}"
        - name: "max.block.ms"
          value: "2000"


    # Console
    Console:
      name: Console
      target: SYSTEM_OUT
      PatternLayout:
        Pattern: "%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"


    Failover:
      name: "Failover"
      primary: "pulsar_kafka"
      retryIntervalSeconds: "600"
      Failovers:
        AppenderRef:
          ref: "RollingFile"


    # Rolling file appender configuration
    RollingFile:
      name: RollingFile
      fileName: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}"
      filePattern: "${sys:pulsar.log.dir}/${sys:pulsar.log.file}-%d{MM-dd-yyyy}-%i.log.gz"
      immediateFlush: false
      PatternLayout:
        Pattern: "%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"
      Policies:
        TimeBasedTriggeringPolicy:
          interval: 1
          modulate: true
        SizeBasedTriggeringPolicy:
          size: 1 GB
      # Delete file older than 30days
      DefaultRolloverStrategy:
          Delete:
            basePath: ${sys:pulsar.log.dir}
            maxDepth: 2
            IfFileName:
              glob: "*/${sys:pulsar.log.file}*log.gz"
            IfLastModified:
              age: 30d


    Prometheus:
      name: Prometheus


    # Routing
    Routing:
      name: RoutingAppender
      Routes:
        pattern: "$${ctx:function}"
        Route:
          -
            Routing:
              name: InstanceRoutingAppender
              Routes:
                pattern: "$${ctx:instance}"
                Route:
                  -
                    RollingFile:
                      name: "Rolling-${ctx:function}"
                      fileName : "${sys:pulsar.log.dir}/functions/${ctx:function}/${ctx:functionname}-${ctx:instance}.log"
                      filePattern : "${sys:pulsar.log.dir}/functions/${sys:pulsar.log.file}-${ctx:instance}-%d{MM-dd-yyyy}-%i.log.gz"
                      PatternLayout:
                        Pattern: "%d{ABSOLUTE} %level{length=5} [%thread] [instance: %X{instance}] %logger{1} - %msg%n"
                      Policies:
                        TimeBasedTriggeringPolicy:
                          interval: 1
                          modulate: true
                        SizeBasedTriggeringPolicy:
                          size: "20MB"
                        # Trigger every day at midnight that also scan
                        # roll-over strategy that deletes older file
                        CronTriggeringPolicy:
                          schedule: "0 0 0 * * ?"
                      # Delete file older than 30days
                      DefaultRolloverStrategy:
                          Delete:
                            basePath: ${sys:pulsar.log.dir}
                            maxDepth: 2
                            IfFileName:
                              glob: "*/${sys:pulsar.log.file}*log.gz"
                            IfLastModified:
                              age: 30d
                  - ref: "${sys:pulsar.routing.appender.default}"
                    key: "${ctx:function}"
          - ref: "${sys:pulsar.routing.appender.default}"
            key: "${ctx:function}"


  Loggers:


    # Default root logger configuration
    AsyncRoot:
      level: "${sys:pulsar.log.root.level}"
      additivity: true
      AppenderRef:
        - ref: "Failover"
          level: "${sys:pulsar.log.level}"
        - ref: Prometheus
          level: info


    AsyncLogger:
      - name: org.apache.bookkeeper.bookie.BookieShell
        level: info
        additivity: false
        AppenderRef:
          - ref: Console


      - name: verbose
        level: info
        additivity: false
        AppenderRef:
          - ref: Console


    # Logger to inject filter script
#     - name: org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl
#       level: debug
#       additivity: false
#       AppenderRef:
#         ref: "${sys:pulsar.log.appender}"
#         ScriptFilter:
#           onMatch: ACCEPT
#           onMisMatch: DENY
#           ScriptRef:
#             ref: filter.js

Notes:

  • Log ingestion must be asynchronous and must never affect service performance;
  • When a latency-sensitive system integrates with a third-party system, the dependency must be decoupled. The Failover appender here decouples the dependency on Kafka: when Kafka crashes, logging fails over and is written to local files instead (a quick way to exercise this path is sketched after this list);
  • The default retryIntervalSeconds of the Log4j2 Failover appender is 1 minute, and the switchover is triggered by exceptions, so the interval can reasonably be increased, for example to the 10 minutes used above;
  • ignoreExceptions on the Kafka appender must be set to false, otherwise failover is never triggered;
  • A major pitfall is the max.block.ms property. The default in the Kafka client library is 60000 ms, so when Kafka is down, each attempted write takes a full minute to throw an exception before failover kicks in. Under heavy load, the Log4j2 queue fills up quickly and subsequent logging blocks, which severely hurts the main service's response time. Set max.block.ms short enough and make the queue long enough.
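One way to exercise the failover path on a test host is to make Kafka unreachable and confirm that new log lines keep landing in the local rolling file. A rough sketch only; the systemd service name and the log file path are assumptions that depend on how Kafka is installed and on the PULSAR_LOG_FILE naming used by the scripts below:

# Test environment only: make the Kafka cluster unreachable from this host
sudo systemctl stop kafka        # or block port 9092 with a firewall rule
# Generate some activity against the Pulsar broker, then confirm the local log file keeps growing
tail -f logs/pulsar-broker-$HOSTNAME.log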

2. Add script files

Add the following two script files to the {PULSAR_HOME}/bin directory. 1) pulsar-kafka — this script is a copy of the pulsar script with the following changes applied. (Note: in the figures below, pulsar is on the left and pulsar-kafka on the right.)

  • Point to log4j2-kafka.yaml;

[Figure: pulsar vs. pulsar-kafka — DEFAULT_LOG_CONF switched to log4j2-kafka.yaml]

  • Add the code that sources logenv.sh;

[Figure: pulsar vs. pulsar-kafka — logenv.sh sourced]

  • Add OPTS entries so that the JVM options are passed to the Java process when Pulsar components are started through the pulsar-kafka and pulsar-daemon-kafka scripts;

[Figure: pulsar vs. pulsar-kafka — -D options appended to OPTS]

  • The complete content of the pulsar-kafka script is as follows:

#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#


BINDIR=$(dirname "$0")
export PULSAR_HOME=`cd -P $BINDIR/..;pwd`


DEFAULT_BROKER_CONF=$PULSAR_HOME/conf/broker.conf
DEFAULT_BOOKKEEPER_CONF=$PULSAR_HOME/conf/bookkeeper.conf
DEFAULT_ZK_CONF=$PULSAR_HOME/conf/zookeeper.conf
DEFAULT_CONFIGURATION_STORE_CONF=$PULSAR_HOME/conf/global_zookeeper.conf
DEFAULT_DISCOVERY_CONF=$PULSAR_HOME/conf/discovery.conf
DEFAULT_PROXY_CONF=$PULSAR_HOME/conf/proxy.conf
DEFAULT_STANDALONE_CONF=$PULSAR_HOME/conf/standalone.conf
DEFAULT_WEBSOCKET_CONF=$PULSAR_HOME/conf/websocket.conf
DEFAULT_LOG_CONF=$PULSAR_HOME/conf/log4j2-kafka.yaml
DEFAULT_PULSAR_PRESTO_CONF=${PULSAR_HOME}/conf/presto


# functions related variables
FUNCTIONS_HOME=$PULSAR_HOME/pulsar-functions
DEFAULT_WORKER_CONF=$PULSAR_HOME/conf/functions_worker.yml
DEFAULT_JAVA_INSTANCE_JAR=$PULSAR_HOME/instances/java-instance.jar
JAVA_INSTANCE_JAR=${PULSAR_JAVA_INSTANCE_JAR:-"${DEFAULT_JAVA_INSTANCE_JAR}"}
DEFAULT_PY_INSTANCE_FILE=$PULSAR_HOME/instances/python-instance/python_instance_main.py
PY_INSTANCE_FILE=${PULSAR_PY_INSTANCE_FILE:-"${DEFAULT_PY_INSTANCE_FILE}"}
DEFAULT_FUNCTIONS_EXTRA_DEPS_DIR=$PULSAR_HOME/instances/deps
FUNCTIONS_EXTRA_DEPS_DIR=${PULSAR_FUNCTIONS_EXTRA_DEPS_DIR:-"${DEFAULT_FUNCTIONS_EXTRA_DEPS_DIR}"}
SQL_HOME=$PULSAR_HOME/pulsar-sql
PRESTO_HOME=${PULSAR_HOME}/lib/presto


# Check bookkeeper env and load bkenv.sh
if [ -f "$PULSAR_HOME/conf/bkenv.sh" ]
then
    . "$PULSAR_HOME/conf/bkenv.sh"
fi


# Check pulsar env and load pulser_env.sh
if [ -f "$PULSAR_HOME/conf/pulsar_env.sh" ]
then
    . "$PULSAR_HOME/conf/pulsar_env.sh"
fi


if [ -f "$PULSAR_HOME/conf/logenv.sh" ]
then
    . "$PULSAR_HOME/conf/logenv.sh"
fi


# Check for the java to use
if [[ -z $JAVA_HOME ]]; then
    JAVA=$(which java)
    if [ $? != 0 ]; then
        echo "Error: JAVA_HOME not set, and no java executable found in $PATH." 1>&2
        exit 1
    fi
else
    JAVA=$JAVA_HOME/bin/java
fi


# exclude tests jar
RELEASE_JAR=`ls $PULSAR_HOME/pulsar-*.jar 2> /dev/null | grep -v tests | tail -1`
if [ $? == 0 ]; then
    PULSAR_JAR=$RELEASE_JAR
fi


# exclude tests jar
BUILT_JAR=`ls $PULSAR_HOME/pulsar-broker/target/pulsar-*.jar 2> /dev/null | grep -v tests | tail -1`
if [ $? != 0 ] && [ ! -e "$PULSAR_JAR" ]; then
    echo "\nCouldn't find pulsar jar.";
    echo "Make sure you've run 'mvn package'\n";
    exit 1;
elif [ -e "$BUILT_JAR" ]; then
    PULSAR_JAR=$BUILT_JAR
fi


#
# find the instance locations for pulsar-functions
#


# find the java instance location
if [ ! -f "${JAVA_INSTANCE_JAR}" ]; then
    # didn't find a released jar, then search the built jar
    BUILT_JAVA_INSTANCE_JAR="${FUNCTIONS_HOME}/runtime-all/target/java-instance.jar"
    if [ -z "${BUILT_JAVA_INSTANCE_JAR}" ]; then
        echo "\nCouldn't find pulsar-functions java instance jar.";
        echo "Make sure you've run 'mvn package'\n";
        exit 1;
    fi
    JAVA_INSTANCE_JAR=${BUILT_JAVA_INSTANCE_JAR}
fi


# find the python instance location
if [ ! -f "${PY_INSTANCE_FILE}" ]; then
    # didn't find a released python instance, then search the built python instance
    BUILT_PY_INSTANCE_FILE="${FUNCTIONS_HOME}/instance/target/python-instance/python_instance_main.py"
    if [ -z "${BUILT_PY_INSTANCE_FILE}" ]; then
        echo "\nCouldn't find pulsar-functions python instance.";
        echo "Make sure you've run 'mvn package'\n";
        exit 1;
    fi
    PY_INSTANCE_FILE=${BUILT_PY_INSTANCE_FILE}
fi


# find pulsar sql presto distribution location
check_presto_libraries() {
    if [ ! -d "${PRESTO_HOME}" ]; then


        BUILT_PRESTO_HOME="${SQL_HOME}/presto-distribution/target/pulsar-presto-distribution"
        if [ ! -d "${BUILT_PRESTO_HOME}" ]; then
            echo "\nCouldn't find presto distribution.";
            echo "Make sure you've run 'mvn package'\n";
            exit 1;
        fi
        PRESTO_HOME=${BUILT_PRESTO_HOME}
    fi
}


pulsar_help() {
    cat <<EOF
Usage: pulsar <command>
where command is one of:


    broker              Run a broker server
    bookie              Run a bookie server
    zookeeper           Run a zookeeper server
    configuration-store Run a configuration-store server
    discovery           Run a discovery server
    proxy               Run a pulsar proxy
    websocket           Run a web socket proxy server
    functions-worker    Run a functions worker server
    sql-worker          Run a sql worker server
    sql                 Run sql CLI
    standalone          Run a broker server with local bookies and local zookeeper


    initialize-cluster-metadata     One-time metadata initialization
    delete-cluster-metadata         Delete a cluster's metadata
    initialize-transaction-coordinator-metadata     One-time transaction coordinator metadata initialization
    initialize-namespace     namespace initialization
    compact-topic       Run compaction against a topic
    zookeeper-shell     Open a ZK shell client
    broker-tool         CLI to operate a specific broker
    tokens              Utility to create authentication tokens


    help                This help message


or command is the full name of a class with a defined main() method.


Environment variables:
   PULSAR_LOG_CONF               Log4j configuration file (default $DEFAULT_LOG_CONF)
   PULSAR_BROKER_CONF            Configuration file for broker (default: $DEFAULT_BROKER_CONF)
   PULSAR_BOOKKEEPER_CONF        Configuration file for bookie (default: $DEFAULT_BOOKKEEPER_CONF)
   PULSAR_ZK_CONF                Configuration file for zookeeper (default: $DEFAULT_ZK_CONF)
   PULSAR_CONFIGURATION_STORE_CONF         Configuration file for global configuration store (default: $DEFAULT_CONFIGURATION_STORE_CONF)
   PULSAR_DISCOVERY_CONF         Configuration file for discovery service (default: $DEFAULT_DISCOVERY_CONF)
   PULSAR_WEBSOCKET_CONF         Configuration file for websocket proxy (default: $DEFAULT_WEBSOCKET_CONF)
   PULSAR_PROXY_CONF             Configuration file for Pulsar proxy (default: $DEFAULT_PROXY_CONF)
   PULSAR_WORKER_CONF            Configuration file for functions worker (default: $DEFAULT_WORKER_CONF)
   PULSAR_STANDALONE_CONF        Configuration file for standalone (default: $DEFAULT_STANDALONE_CONF)
   PULSAR_PRESTO_CONF            Configuration directory for Pulsar Presto (default: $DEFAULT_PULSAR_PRESTO_CONF)
   PULSAR_EXTRA_OPTS             Extra options to be passed to the jvm
   PULSAR_EXTRA_CLASSPATH        Add extra paths to the pulsar classpath
   PULSAR_PID_DIR                Folder where the pulsar server PID file should be stored
   PULSAR_STOP_TIMEOUT           Wait time before forcefully kill the pulsar server instance, if the stop is not successful


These variable can also be set in conf/pulsar_env.sh
EOF
}


add_maven_deps_to_classpath() {
    MVN="mvn"
    if [ "$MAVEN_HOME" != "" ]; then
    MVN=${MAVEN_HOME}/bin/mvn
    fi


    # Need to generate classpath from maven pom. This is costly so generate it
    # and cache it. Save the file into our target dir so a mvn clean will get
    # clean it up and force us create a new one.
    f="${PULSAR_HOME}/distribution/server/target/classpath.txt"
    if [ ! -f "${f}" ]
    then
    ${MVN} -f "${PULSAR_HOME}/pom.xml" dependency:build-classpath -DincludeScope=compile -Dmdep.outputFile="${f}" &> /dev/null
    fi
    PULSAR_CLASSPATH=${CLASSPATH}:`cat "${f}"`
}


if [ -d "$PULSAR_HOME/lib" ]; then
PULSAR_CLASSPATH=$PULSAR_CLASSPATH:$PULSAR_HOME/lib/*
    ASPECTJ_AGENT_PATH=`ls -1 $PULSAR_HOME/lib/org.aspectj-aspectjweaver-*.jar`
else
    add_maven_deps_to_classpath


    ASPECTJ_VERSION=`grep '<aspectj.version>' $PULSAR_HOME/pom.xml | awk -F'>' '{print $2}' | awk -F'<' '{print $1}'`
    ASPECTJ_AGENT_PATH="$HOME/.m2/repository/org/aspectj/aspectjweaver/$ASPECTJ_VERSION/aspectjweaver-$ASPECTJ_VERSION.jar"
fi


ASPECTJ_AGENT="-javaagent:$ASPECTJ_AGENT_PATH"


# if no args specified, show usage
if [ $# = 0 ]; then
    pulsar_help;
    exit 1;
fi


# get arguments
COMMAND=$1
shift


if [ -z "$PULSAR_WORKER_CONF" ]; then
    PULSAR_WORKER_CONF=$DEFAULT_WORKER_CONF
fi


if [ -z "$PULSAR_BROKER_CONF" ]; then
    PULSAR_BROKER_CONF=$DEFAULT_BROKER_CONF
fi


if [ -z "$PULSAR_BOOKKEEPER_CONF" ]; then
    PULSAR_BOOKKEEPER_CONF=$DEFAULT_BOOKKEEPER_CONF
fi


if [ -z "$PULSAR_ZK_CONF" ]; then
    PULSAR_ZK_CONF=$DEFAULT_ZK_CONF
fi


if [ -z "$PULSAR_GLOBAL_ZK_CONF" ]; then
    PULSAR_GLOBAL_ZK_CONF=$DEFAULT_GLOBAL_ZK_CONF
fi


if [ -z "$PULSAR_CONFIGURATION_STORE_CONF" ]; then
    PULSAR_CONFIGURATION_STORE_CONF=$DEFAULT_CONFIGURATION_STORE_CONF
fi


if [ -z "$PULSAR_DISCOVERY_CONF" ]; then
    PULSAR_DISCOVERY_CONF=$DEFAULT_DISCOVERY_CONF
fi


if [ -z "$PULSAR_PROXY_CONF" ]; then
    PULSAR_PROXY_CONF=$DEFAULT_PROXY_CONF
fi


if [ -z "$PULSAR_WEBSOCKET_CONF" ]; then
    PULSAR_WEBSOCKET_CONF=$DEFAULT_WEBSOCKET_CONF
fi


if [ -z "$PULSAR_STANDALONE_CONF" ]; then
    PULSAR_STANDALONE_CONF=$DEFAULT_STANDALONE_CONF
fi


if [ -z "$PULSAR_LOG_CONF" ]; then
    PULSAR_LOG_CONF=$DEFAULT_LOG_CONF
fi


if [ -z "$PULSAR_PRESTO_CONF" ]; then
    PULSAR_PRESTO_CONF=$DEFAULT_PULSAR_PRESTO_CONF
fi


PULSAR_CLASSPATH="$PULSAR_JAR:$PULSAR_CLASSPATH:$PULSAR_EXTRA_CLASSPATH"
PULSAR_CLASSPATH="`dirname $PULSAR_LOG_CONF`:$PULSAR_CLASSPATH"
OPTS="$OPTS -Dlog4j.configurationFile=`basename $PULSAR_LOG_CONF`"


# Ensure we can read bigger content from ZK. (It might be
# rarely needed when trying to list many z-nodes under a
# directory)
OPTS="$OPTS -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true"


OPTS="-cp $PULSAR_CLASSPATH $OPTS"


OPTS="$OPTS $PULSAR_EXTRA_OPTS $PULSAR_MEM $PULSAR_GC"


# log directory & file
PULSAR_LOG_DIR=${PULSAR_LOG_DIR:-"$PULSAR_HOME/logs"}
PULSAR_LOG_APPENDER=${PULSAR_LOG_APPENDER:-"RoutingAppender"}
PULSAR_LOG_ROOT_LEVEL=${PULSAR_LOG_ROOT_LEVEL:-"info"}
PULSAR_LOG_LEVEL=${PULSAR_LOG_LEVEL:-"info"}
PULSAR_ROUTING_APPENDER_DEFAULT=${PULSAR_ROUTING_APPENDER_DEFAULT:-"Console"}


#Configure log configuration system properties
OPTS="$OPTS -Dpulsar.log.appender=$PULSAR_LOG_APPENDER"
OPTS="$OPTS -Dpulsar.log.dir=$PULSAR_LOG_DIR"
OPTS="$OPTS -Dpulsar.log.level=$PULSAR_LOG_LEVEL"
OPTS="$OPTS -Dpulsar.routing.appender.default=$PULSAR_ROUTING_APPENDER_DEFAULT"


# Functions related logging
OPTS="$OPTS -Dpulsar.functions.process.container.log.dir=$PULSAR_LOG_DIR"
# instance
OPTS="$OPTS -Dpulsar.functions.java.instance.jar=${JAVA_INSTANCE_JAR}"
OPTS="$OPTS -Dpulsar.functions.python.instance.file=${PY_INSTANCE_FILE}"
OPTS="$OPTS -Dpulsar.functions.extra.dependencies.dir=${FUNCTIONS_EXTRA_DEPS_DIR}"
OPTS="$OPTS -Dpulsar.functions.instance.classpath=${PULSAR_CLASSPATH}"
OPTS="$OPTS -Dpulsar.module.instanceid=${PULSAR_MODULE_INSTANCE_ID} -Dpulsar.module.type=$COMMAND -Dkafka.cluster=${KAFKA_CLUSTER} -Dpulsar.hostname=${HOSTNAME} -Dpulsar.hostip=${HOST_IP} -Dpulsar.cluster=${PULSAR_CLUSTER} -Dpulsar.topic=${PULSAR_TOPIC}"


ZK_OPTS=" -Dzookeeper.4lw.commands.whitelist=* -Dzookeeper.snapshot.trust.empty=true"


#Change to PULSAR_HOME to support relative paths
cd "$PULSAR_HOME"
if [ $COMMAND == "broker" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"pulsar-broker.log"}
    exec $JAVA $OPTS $ASPECTJ_AGENT -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.PulsarBrokerStarter --broker-conf $PULSAR_BROKER_CONF $@
elif [ $COMMAND == "bookie" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"bookkeeper.log"}
    # Pass BOOKIE_EXTRA_OPTS option defined in pulsar_env.sh
    OPTS="$OPTS $BOOKIE_EXTRA_OPTS"
    exec $JAVA $OPTS -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.bookkeeper.proto.BookieServer --conf $PULSAR_BOOKKEEPER_CONF $@
elif [ $COMMAND == "zookeeper" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"zookeeper.log"}
    exec $JAVA ${ZK_OPTS} $OPTS $ASPECTJ_AGENT -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.zookeeper.ZooKeeperStarter $PULSAR_ZK_CONF $@
elif [ $COMMAND == "global-zookeeper" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"global-zookeeper.log"}
    # Allow global ZK to turn into read-only mode when it cannot reach the quorum
    OPTS="${OPTS} ${ZK_OPTS} -Dreadonlymode.enabled=true"
    exec $JAVA $OPTS $ASPECTJ_AGENT -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.zookeeper.ConfigurationStoreStarter $PULSAR_GLOBAL_ZK_CONF $@
elif [ $COMMAND == "configuration-store" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"configuration-store.log"}
    # Allow global ZK to turn into read-only mode when it cannot reach the quorum
    OPTS="${OPTS} ${ZK_OPTS} -Dreadonlymode.enabled=true"
    exec $JAVA $OPTS $ASPECTJ_AGENT -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.zookeeper.ConfigurationStoreStarter $PULSAR_CONFIGURATION_STORE_CONF $@
elif [ $COMMAND == "discovery" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"discovery.log"}
    exec $JAVA $OPTS -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.discovery.service.server.DiscoveryServiceStarter $PULSAR_DISCOVERY_CONF $@
elif [ $COMMAND == "proxy" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"pulsar-proxy.log"}
    exec $JAVA $OPTS -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.proxy.server.ProxyServiceStarter --config $PULSAR_PROXY_CONF $@
elif [ $COMMAND == "websocket" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"pulsar-websocket.log"}
    exec $JAVA $OPTS -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.websocket.service.WebSocketServiceStarter $PULSAR_WEBSOCKET_CONF $@
elif [ $COMMAND == "functions-worker" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"pulsar-functions-worker.log"}
    exec $JAVA $OPTS -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.functions.worker.FunctionWorkerStarter -c $PULSAR_WORKER_CONF $@
elif [ $COMMAND == "standalone" ]; then
    PULSAR_LOG_FILE=${PULSAR_LOG_FILE:-"pulsar-standalone.log"}
    exec $JAVA $OPTS $ASPECTJ_AGENT ${ZK_OPTS} -Dpulsar.log.file=$PULSAR_LOG_FILE org.apache.pulsar.PulsarStandaloneStarter --config $PULSAR_STANDALONE_CONF $@
elif [ $COMMAND == "initialize-cluster-metadata" ]; then
    exec $JAVA $OPTS org.apache.pulsar.PulsarClusterMetadataSetup $@
elif [ $COMMAND == "delete-cluster-metadata" ]; then
    exec $JAVA $OPTS org.apache.pulsar.PulsarClusterMetadataTeardown $@
elif [ $COMMAND == "initialize-transaction-coordinator-metadata" ]; then
    exec $JAVA $OPTS org.apache.pulsar.PulsarTransactionCoordinatorMetadataSetup $@
elif [ $COMMAND == "initialize-namespace" ]; then
    exec $JAVA $OPTS org.apache.pulsar.PulsarInitialNamespaceSetup $@
elif [ $COMMAND == "zookeeper-shell" ]; then
    exec $JAVA $OPTS org.apache.zookeeper.ZooKeeperMain $@
elif [ $COMMAND == "broker-tool" ]; then
    exec $JAVA $OPTS org.apache.pulsar.broker.tools.BrokerTool $@
elif [ $COMMAND == "compact-topic" ]; then
    exec $JAVA $OPTS org.apache.pulsar.compaction.CompactorTool --broker-conf $PULSAR_BROKER_CONF $@
elif [ $COMMAND == "sql" ]; then
    check_presto_libraries
    exec $JAVA -cp "${PRESTO_HOME}/lib/*" io.prestosql.cli.Presto --server localhost:8081 "${@}"
elif [ $COMMAND == "sql-worker" ]; then
    check_presto_libraries
    exec ${PRESTO_HOME}/bin/launcher --etc-dir ${PULSAR_PRESTO_CONF} "${@}"
elif [ $COMMAND == "tokens" ]; then
      exec $JAVA $OPTS org.apache.pulsar.utils.auth.tokens.TokensCliUtils $@
elif [ $COMMAND == "help" -o $COMMAND == "--help" -o $COMMAND == "-h" ]; then
    pulsar_help;
else
    echo ""
    echo "-- Invalid command '$COMMAND' -- Use '$0 help' to get a list of valid commands"
    echo ""
    exit 1
fi

2) pulsar-daemon-kafka

This script is a copy of the pulsar-daemon script with the following changes applied. (Note: in the figures below, pulsar-daemon is on the left and pulsar-daemon-kafka on the right.)

  • Add the code that sources logenv.sh;

[Figure: pulsar-daemon vs. pulsar-daemon-kafka — logenv.sh sourced]

  • Launch the pulsar-kafka script instead of pulsar;

[Figure: pulsar-daemon vs. pulsar-daemon-kafka — the daemon starts bin/pulsar-kafka]

  • The complete content of the pulsar-daemon-kafka script is as follows:


#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#


usage() {
    cat <<EOF
Usage: pulsar-daemon (start|stop) <command> <args...>
where command is one of:
    broker              Run a broker server
    bookie              Run a bookie server
    zookeeper           Run a zookeeper server
    configuration-store Run a configuration-store server
    discovery           Run a discovery server
    websocket           Run a websocket proxy server
    functions-worker    Run a functions worker server
    standalone          Run a standalone Pulsar service
    proxy               Run a Proxy Pulsar service


where argument is one of:
    -force (accepted only with stop command): Decides whether to stop the server forcefully if not stopped by normal shutdown
EOF
}


BINDIR=$(dirname "$0")
PULSAR_HOME=$(cd -P $BINDIR/..;pwd)


# Check bookkeeper env and load bkenv.sh
if [ -f "$PULSAR_HOME/conf/bkenv.sh" ]
then
    . "$PULSAR_HOME/conf/bkenv.sh"
fi


if [ -f "$PULSAR_HOME/conf/pulsar_env.sh" ]
then
    . "$PULSAR_HOME/conf/pulsar_env.sh"
fi


if [ -f "$PULSAR_HOME/conf/logenv.sh" ]
then
    . "$PULSAR_HOME/conf/logenv.sh"
fi

PULSAR_LOG_APPENDER=${PULSAR_LOG_APPENDER:-"RollingFile"}
PULSAR_STOP_TIMEOUT=${PULSAR_STOP_TIMEOUT:-30}
PULSAR_PID_DIR=${PULSAR_PID_DIR:-$PULSAR_HOME/bin}


if [ $# = 0 ]; then
    usage
    exit 1
elif [ $# = 1 ]; then
    if [ $1 == "--help" -o $1 == "-h" ]; then
        usage
        exit 1
    else
        echo "Error: no enough arguments provided."
        usage
        exit 1
    fi
fi


startStop=$1
shift
command=$1
shift


case $command in
    (broker)
        echo "doing $startStop $command ..."
        ;;
    (bookie)
        echo "doing $startStop $command ..."
        ;;
    (zookeeper)
        echo "doing $startStop $command ..."
        ;;
    (global-zookeeper)
        echo "doing $startStop $command ..."
        ;;
    (configuration-store)
        echo "doing $startStop $command ..."
        ;;
    (discovery)
        echo "doing $startStop $command ..."
        ;;
    (websocket)
        echo "doing $startStop $command ..."
        ;;
    (functions-worker)
        echo "doing $startStop $command ..."
        ;;
    (standalone)
        echo "doing $startStop $command ..."
        ;;
    (proxy)
        echo "doing $startStop $command ..."
        ;;
    (*)
        echo "Error: unknown service name $command"
        usage
        exit 1
        ;;
esac


export PULSAR_LOG_DIR=$PULSAR_LOG_DIR
export PULSAR_LOG_APPENDER=$PULSAR_LOG_APPENDER
export PULSAR_LOG_FILE=pulsar-$command-$HOSTNAME.log


pid=$PULSAR_PID_DIR/pulsar-$command.pid
out=$PULSAR_LOG_DIR/pulsar-$command-$HOSTNAME.out
logfile=$PULSAR_LOG_DIR/$PULSAR_LOG_FILE


rotate_out_log ()
{
    log=$1;
    num=5;
    if [ -n "$2" ]; then
       num=$2
    fi
    if [ -f "$log" ]; then # rotate logs
        while [ $num -gt 1 ]; do
            prev=`expr $num - 1`
            [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
            num=$prev
        done
        mv "$log" "$log.$num";
    fi
}


mkdir -p "$PULSAR_LOG_DIR"


case $startStop in
  (start)
    if [ -f $pid ]; then
      if kill -0 `cat $pid` > /dev/null 2>&1; then
        echo $command running as process `cat $pid`.  Stop it first.
        exit 1
      fi
    fi


    rotate_out_log $out
    echo starting $command, logging to $logfile
    echo Note: Set immediateFlush to true in conf/log4j2-kafka.yaml will guarantee the logging event is flushing to disk immediately. The default behavior is switched off due to performance considerations.
    pulsar=$PULSAR_HOME/bin/pulsar-kafka
    nohup $pulsar $command "$@" > "$out" 2>&1 < /dev/null &
    echo $! > $pid
    sleep 1; head $out
    sleep 2;
    if ! ps -p $! > /dev/null ; then
      exit 1
    fi
    ;;


  (stop)
    if [ -f $pid ]; then
      TARGET_PID=$(cat $pid)
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo "stopping $command"
        kill $TARGET_PID


        count=0
        location=$PULSAR_LOG_DIR
        while ps -p $TARGET_PID > /dev/null;
         do
          echo "Shutdown is in progress... Please wait..."
          sleep 1
          count=`expr $count + 1`


          if [ "$count" = "$PULSAR_STOP_TIMEOUT" ]; then
                break
          fi
         done


        if [ "$count" != "$PULSAR_STOP_TIMEOUT" ]; then
            echo "Shutdown completed."
        fi


        if kill -0 $TARGET_PID > /dev/null 2>&1; then
              fileName=$location/$command.out
              $JAVA_HOME/bin/jstack $TARGET_PID > $fileName
              echo "Thread dumps are taken for analysis at $fileName"
              if [ "$1" == "-force" ]
              then
                 echo "forcefully stopping $command"
                 kill -9 $TARGET_PID >/dev/null 2>&1
                 echo Successfully stopped the process
              else
                 echo "WARNNING :  $command is not stopped completely."
                 exit 1
              fi
        fi
      else
        echo "no $command to stop"
      fi
      rm $pid
    else
      echo no "$command to stop"
    fi
    ;;


  (*)
    usage
    exit 1
    ;;
esac

3. Add the jars required by the Kafka producer

Add the following three jars to the {PULSAR_HOME}/lib directory on every node of the Pulsar cluster:

connect-api-2.0.1.jar
disruptor-3.4.2.jar
kafka-clients-2.0.1.jar
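A quick way to confirm the jars are in place on each node (a sketch; it assumes PULSAR_HOME is exported in the shell):

# Verify the Kafka producer dependencies are present under the Pulsar lib directory
ls "$PULSAR_HOME/lib" | grep -E 'kafka-clients|connect-api|disruptor'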

4. Start the Pulsar services

  1. To make sure the Pulsar service logs are written to Kafka correctly, start the component in the foreground with bin/pulsar-kafka first; once it runs without exceptions, start it in the background with bin/pulsar-daemon-kafka.
  2. Taking the broker as an example, run the following command:

bin/pulsar-daemon-kafka start broker

  3. Checking the broker process with ps shows the following:

[Figure: ps output for the broker process showing the -D options passed from logenv.sh]
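The same check can be done from a shell; a small sketch, where the grep pattern is an assumption based on the -Dpulsar.module.type option set by pulsar-kafka:

# Confirm the broker JVM is running and carries the -D options from logenv.sh
ps -ef | grep 'pulsar.module.type=broker' | grep -v grep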

As the figure shows, all the OPTS configured in logenv.sh have been passed into the broker process. The ${sys:...} lookups in log4j2-kafka.yaml resolve these property values, allowing the Kafka appender to instantiate a Kafka producer through which the broker's logs are sent to the Kafka brokers.

5. Verify that Pulsar logs are written to the Kafka brokers

Start a Kafka consumer and subscribe to the topic that Log4j2 sends messages to (a sample console-consumer command follows the message below). A consumed message looks like the following, with the retrieval fields separated by spaces:

pulsar-cluster dapp21 192.168.0.1 broker 1 2020-12-26 17:40:14.363 [prometheus-stats-43-1] [org.eclipse.jetty.server.RequestLog] INFO - 192.168.0.1 - - [26/Dec/2020:17:40:14 +0800] "GET /metrics/ HTTP/1.1" 200 23445 "http://192.168.0.1:8080/metrics" "Prometheus/2.22.1" 4
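A minimal sketch using Kafka's console consumer; it assumes the Kafka CLI tools are available on the host and reuses the broker list and topic from logenv.sh:

# Consume the log topic from the beginning to confirm Pulsar log messages are arriving
bin/kafka-console-consumer.sh \
  --bootstrap-server 192.168.0.1:9092,192.168.0.2:9092 \
  --topic pulsar_topic \
  --from-beginning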

6. Search the logs

Open the Kibana page and search on the tokenized fields, for example with the following query: cluster:"pulsar-cluster" AND hostname:"XXX" AND module:"broker" AND level:"INFO"

[Figure: Kibana search results for the query above]

The figure shows the search results within a chosen time range; Available fields can be added to the result view as needed. Developers and operators can thus use Kibana to analyze the causes of Pulsar service anomalies quickly and effectively from multiple dimensions. This completes the end-to-end solution for fast log retrieval in Apache Pulsar based on Log4j2 + Kafka + ELK.
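The same filter can also be issued directly against Elasticsearch with a query-string search, which is convenient for scripted checks. A sketch; the Elasticsearch host and index pattern are assumptions that will differ per deployment:

# Query Elasticsearch directly (host and index pattern are placeholders)
curl -s 'http://192.168.0.10:9200/pulsar-log-*/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{"query":{"query_string":{"query":"cluster:\"pulsar-cluster\" AND module:\"broker\" AND level:\"INFO\""}}}'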

Summary

Distributed systems and microservices are a popular technical direction today. In production, as the business keeps growing and the volume of applications and services expands rapidly, moving from a monolithic/vertical architecture to a distributed/microservice architecture is a natural choice, with benefits in reduced complexity, fault tolerance, independent deployment, and horizontal scaling. It also brings new challenges, such as the efficiency of troubleshooting and the convenience of operational monitoring. Using Apache Pulsar as the example, this article has shown how a Java application can use Log4j2 + Kafka + ELK to achieve fast log retrieval in a distributed, microservice setting and thereby support service governance.
