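MongoDB University Course M103: Basic Cluster Administration, Study Notes

Posted by dingdingfish on 2020-12-21

The course has 3 chapters and must be completed within 2 months.

Started on 2020/12/19 at 12:40; finished on 12/21 at 0:35.

Command reference: https://docs.mongodb.com/manual/reference/method/#replication

As for the lab environment, you can only use the Atlas environment that comes with the course, because the grader inspects that environment. For example, below is the check result for the replica set lab. Once you get used to it, it is quite pleasant to work with: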

12 total, 12 passed, 0 skipped:
[PASS] "localhost:27001 is running"
[PASS] "localhost:27002 is running"
[PASS] "localhost:27003 is running"
[PASS] "Replication is enabled on localhost:27001"
[PASS] "Replication is enabled on localhost:27002"
[PASS] "Replication is enabled on localhost:27003"
[PASS] "Replica set 'm103-repl' has the correct name"
[PASS] "The replica set 'm103-repl' contains localhost:27001"
[PASS] "The replica set 'm103-repl' contains localhost:27002"
[PASS] "The replica set 'm103-repl' contains localhost:27003"
[PASS] "The replica set enforces client authentication"
[PASS] "The replica set m103-repl uses keyfile authentication"

Chapter 0 Introduction & Setup

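The core of MongoDB is the mongod process. High availability and fault tolerance come from replica sets; scalability comes from sharding. In MongoDB terminology, replica sets and sharded clusters are collectively called clusters.

It is recommended to complete M001: MongoDB Basics before taking this course.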

Chapter 1: The Mongod

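The d in mongod stands for daemon. mongod is the main daemon process of the MongoDB database. By default it listens on localhost:27017 and stores data in /data/db (package installations, such as the one used in these notes, configure /var/lib/mongo instead). User authentication is disabled by default.

With a package installation, mongod reads its configuration from /etc/mongod.conf.

The database client is the mongo shell, i.e. the mongo command.

The following demonstrates shutting down the database from the mongo shell: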

$ mongo
MongoDB shell version v4.4.2
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("95ba6ae8-52c7-460a-be88-63b2a814bf87") }
MongoDB server version: 4.4.2
---
The server generated these startup warnings when booting:
        2020-12-19T04:49:56.504+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
        2020-12-19T04:49:56.505+00:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'
---
---
        Enable MongoDB's free cloud-based monitoring service, which will then receive and display
        metrics about your deployment (disk utilization, CPU, operation statistics, etc).

        The monitoring data will be available on a MongoDB website with a unique URL accessible to you
        and anyone you share the URL with. MongoDB may use this information to make product
        improvements and to suggest MongoDB products and deployment options to you.

        To enable free monitoring, run the following command: db.enableFreeMonitoring()
        To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>
> db.createCollection("employees")
{ "ok" : 1 }
> use admin
switched to db admin
> db.shutdownServer()
server should be down...
> exit
bye
{"t":{"$date":"2020-12-19T05:00:13.434Z"},"s":"I",  "c":"QUERY",    "id":22791,   "ctx":"js","msg":"Failed to end logical session","attr":{"lsid":{"id":{"$uuid":"95ba6ae8-52c7-460a-be88-63b2a814bf87"}},"error":{"code":9001,"codeName":"SocketException","errmsg":"socket exception [CONNECT_ERROR] server [couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused]"}}}

$ mongo
MongoDB shell version v4.4.2
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:374:17
@(connect):2:6
exception: connect failed
exiting with code 1

Start mongod with the following command:

$ sudo systemctl start mongod

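Other database clients include the graphical MongoDB Compass and the drivers for various programming languages.

mongod's command-line options can be listed with mongod --help. Among them:

  • dbpath: data file directory
  • port: listening port
  • auth: enable user authentication
  • bind_ip: IP addresses to listen on
  • logpath: log file path
  • fork: run in the background

For command-line options, see the documentation: https://docs.mongodb.com/manual/reference/program/mongod

For the configuration file, see: https://docs.mongodb.com/manual/reference/configuration-options/

Using a configuration file keeps the command line short and easier to read. The configuration file is in YAML format, but its option names differ from the command-line ones; for example, storage.dbPath corresponds to dbpath: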

storage:
  dbPath: "/data/db"
systemLog:
  path: "/data/log/mongod.log"
  destination: "file"
replication:
  replSetName: M103
net:
  bindIp: "127.0.0.1,192.168.103.100"
  tls:
    mode: "requireTLS"
    certificateKeyFile: "/etc/tls/tls.pem"
    CAFile: "/etc/tls/TLSCA.pem"
security:
  keyFile: "/data/keyfile"
processManagement:
  fork: true

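mongod takes the configuration file through the -f or --config option.

Here is an example that enables user authentication. Assume the configuration file mongod.conf is as follows: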
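The correspondence between command-line options and configuration-file keys can be sketched as a small lookup. This is only an illustration: the helper toConfigDocument and the table name are hypothetical, and the table covers just a handful of options whose values carry over unchanged (auth is left out because the file uses security.authorization: enabled rather than a boolean).

```javascript
// Hypothetical lookup table: command-line option -> dotted config-file key.
const OPTION_TO_CONFIG_KEY = {
  dbpath: "storage.dbPath",
  port: "net.port",
  bind_ip: "net.bindIp",
  logpath: "systemLog.path",
  fork: "processManagement.fork",
};

// Turns { dbpath: "/data/db" } into the nested shape { storage: { dbPath: "/data/db" } }.
function toConfigDocument(cliOptions) {
  const config = {};
  for (const [option, value] of Object.entries(cliOptions)) {
    const dottedKey = OPTION_TO_CONFIG_KEY[option];
    if (!dottedKey) throw new Error(`no mapping for option: ${option}`);
    const path = dottedKey.split(".");
    let node = config;
    // Walk (and create) every intermediate section, e.g. "storage".
    for (const part of path.slice(0, -1)) {
      if (node[part] === undefined) node[part] = {};
      node = node[part];
    }
    node[path[path.length - 1]] = value;
  }
  return config;
}

const configDoc = toConfigDocument({ dbpath: "/data/db", port: 27000, fork: true });
console.log(JSON.stringify(configDoc, null, 2));
```

The nested object mirrors the section layout of the YAML file, which is why one command-line flag can end up two levels deep in the configuration.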

net:
   port: 27000
security:
   authorization: enabled

Start mongod:

mongod -f mongod.conf &

Create a user in the admin database:

mongo admin --host localhost:27000 --eval '
  db.createUser({
    user: "m103-admin",
    pwd: "m103-pass",
    roles: [
      {role: "root", db: "admin"}
    ]
  })
'

You can then connect as the new user:

mongo --host localhost:27000 -u m103-admin -p m103-pass
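The same connection details can also be written as a MongoDB connection string. The sketch below assembles one; buildMongoUri is a hypothetical helper, and authSource=admin is an assumption that follows from the user having been created in the admin database.

```javascript
// Hypothetical helper: builds a mongodb:// connection string.
// User name and password are percent-encoded so that characters such as
// "@" or ":" inside them cannot be mistaken for URI delimiters.
function buildMongoUri({ user, password, host, port, authSource }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb://${credentials}@${host}:${port}/?authSource=${encodeURIComponent(authSource)}`;
}

const adminUri = buildMongoUri({
  user: "m103-admin",
  password: "m103-pass",
  host: "localhost",
  port: 27000,
  authSource: "admin", // assumption: the user lives in the admin database
});
console.log(adminUri);
```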

The data directory contains many files, which users should not modify directly. The .wt suffix indicates that the storage engine is WiredTiger:

$ ls -l /var/lib/mongo
total 156084
-rw-------. 1 mongod mongod  6619136 Dec 19 05:01 collection-0--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 collection-0-7801614791038926920.wt
-rw-------. 1 mongod mongod     4096 Dec 19 05:00 collection-0-8420749391571065922.wt
-rw-------. 1 mongod mongod  3694592 Dec 19 05:01 collection-10--3601385759165412599.wt
-rw-------. 1 mongod mongod   684032 Dec 19 05:01 collection-12--3601385759165412599.wt
-rw-------. 1 mongod mongod 21204992 Dec 19 05:01 collection-16--3601385759165412599.wt
-rw-------. 1 mongod mongod  1155072 Dec 19 05:01 collection-18--3601385759165412599.wt
-rw-------. 1 mongod mongod 16351232 Dec 19 05:01 collection-20--3601385759165412599.wt
-rw-------. 1 mongod mongod  2396160 Dec 19 05:01 collection-22--3601385759165412599.wt
-rw-------. 1 mongod mongod  1810432 Dec 19 05:01 collection-2--3601385759165412599.wt
-rw-------. 1 mongod mongod   933888 Dec 19 05:01 collection-26--3601385759165412599.wt
-rw-------. 1 mongod mongod    36864 Dec 19 05:02 collection-2-7801614791038926920.wt
-rw-------. 1 mongod mongod    77824 Dec 19 05:01 collection-28--3601385759165412599.wt
-rw-------. 1 mongod mongod   159744 Dec 19 05:01 collection-30--3601385759165412599.wt
-rw-------. 1 mongod mongod   126976 Dec 19 05:01 collection-34--3601385759165412599.wt
-rw-------. 1 mongod mongod 54034432 Dec 19 05:01 collection-36--3601385759165412599.wt
-rw-------. 1 mongod mongod  8957952 Dec 19 05:01 collection-38--3601385759165412599.wt
-rw-------. 1 mongod mongod    49152 Dec 19 05:01 collection-40--3601385759165412599.wt
-rw-------. 1 mongod mongod  7888896 Dec 19 05:01 collection-4--3601385759165412599.wt
-rw-------. 1 mongod mongod  4251648 Dec 19 05:01 collection-45--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 collection-47--3601385759165412599.wt
-rw-------. 1 mongod mongod    16384 Dec 19 05:31 collection-4-7801614791038926920.wt
-rw-------. 1 mongod mongod  1929216 Dec 19 05:01 collection-52--3601385759165412599.wt
-rw-------. 1 mongod mongod  1462272 Dec 19 05:01 collection-6--3601385759165412599.wt
-rw-------. 1 mongod mongod  6701056 Dec 19 05:01 collection-8--3601385759165412599.wt
drwx------. 2 mongod mongod     4096 Dec 19 06:11 diagnostic.data
-rw-------. 1 mongod mongod   253952 Dec 19 05:01 index-11--3601385759165412599.wt
-rw-------. 1 mongod mongod   126976 Dec 19 05:01 index-13--3601385759165412599.wt
-rw-------. 1 mongod mongod   475136 Dec 19 05:01 index-1--3601385759165412599.wt
-rw-------. 1 mongod mongod   208896 Dec 19 05:01 index-14--3601385759165412599.wt
-rw-------. 1 mongod mongod   241664 Dec 19 05:01 index-17--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-1-7801614791038926920.wt
-rw-------. 1 mongod mongod     4096 Dec 19 05:00 index-1-8420749391571065922.wt
-rw-------. 1 mongod mongod   114688 Dec 19 05:01 index-19--3601385759165412599.wt
-rw-------. 1 mongod mongod   114688 Dec 19 05:01 index-21--3601385759165412599.wt
-rw-------. 1 mongod mongod   114688 Dec 19 05:01 index-23--3601385759165412599.wt
-rw-------. 1 mongod mongod 13565952 Dec 19 05:01 index-24--3601385759165412599.wt
-rw-------. 1 mongod mongod    73728 Dec 19 05:01 index-27--3601385759165412599.wt
-rw-------. 1 mongod mongod    45056 Dec 19 05:01 index-29--3601385759165412599.wt
-rw-------. 1 mongod mongod    40960 Dec 19 05:01 index-31--3601385759165412599.wt
-rw-------. 1 mongod mongod    53248 Dec 19 05:01 index-32--3601385759165412599.wt
-rw-------. 1 mongod mongod   626688 Dec 19 05:01 index-3--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-35--3601385759165412599.wt
-rw-------. 1 mongod mongod   102400 Dec 19 05:01 index-37--3601385759165412599.wt
-rw-------. 1 mongod mongod    36864 Dec 19 05:02 index-3-7801614791038926920.wt
-rw-------. 1 mongod mongod    45056 Dec 19 05:01 index-39--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-41--3601385759165412599.wt
-rw-------. 1 mongod mongod    36864 Dec 19 05:01 index-42--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-46--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-48--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-49--3601385759165412599.wt
-rw-------. 1 mongod mongod    32768 Dec 19 05:01 index-53--3601385759165412599.wt
-rw-------. 1 mongod mongod   741376 Dec 19 05:01 index-5--3601385759165412599.wt
-rw-------. 1 mongod mongod    53248 Dec 19 05:01 index-54--3601385759165412599.wt
-rw-------. 1 mongod mongod   262144 Dec 19 05:01 index-56--3601385759165412599.wt
-rw-------. 1 mongod mongod    16384 Dec 19 05:31 index-5-7801614791038926920.wt
-rw-------. 1 mongod mongod   106496 Dec 19 05:01 index-58--3601385759165412599.wt
-rw-------. 1 mongod mongod    12288 Dec 19 06:10 index-6-7801614791038926920.wt
-rw-------. 1 mongod mongod   290816 Dec 19 05:01 index-7--3601385759165412599.wt
-rw-------. 1 mongod mongod   925696 Dec 19 05:01 index-9--3601385759165412599.wt
drwx------. 2 mongod mongod     4096 Dec 19 05:01 journal
-rw-------. 1 mongod mongod    36864 Dec 19 05:01 _mdb_catalog.wt
-rw-------. 1 mongod mongod        5 Dec 19 05:01 mongod.lock
-rw-------. 1 mongod mongod    36864 Dec 19 05:31 sizeStorer.wt
-rw-------. 1 mongod mongod      114 Dec 16 10:59 storage.bson
-rw-------. 1 mongod mongod       47 Dec 16 10:59 WiredTiger
-rw-------. 1 mongod mongod     4096 Dec 19 05:01 WiredTigerHS.wt
-rw-------. 1 mongod mongod       21 Dec 16 10:59 WiredTiger.lock
-rw-------. 1 mongod mongod     1259 Dec 19 06:10 WiredTiger.turtle
-rw-------. 1 mongod mongod   192512 Dec 19 06:10 WiredTiger.wt

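In addition, the .lock files prevent the daemon from being started more than once, collection*.wt files hold collection data, and index*.wt files hold indexes.

There are two subdirectories: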
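The naming conventions above can be turned into a small classifier. This is only a sketch based on the file names in the listing: classifyDataFile is a hypothetical helper, and the "other WiredTiger file" bucket is my own grouping for the remaining .wt files, not an official category.

```javascript
// Hypothetical helper: guesses a file's purpose from its name alone.
function classifyDataFile(fileName) {
  if (/^collection-.*\.wt$/.test(fileName)) return "collection data";
  if (/^index-.*\.wt$/.test(fileName)) return "index";
  if (fileName.endsWith(".lock")) return "lock file";
  if (fileName.endsWith(".wt")) return "other WiredTiger file";
  return "other";
}

// File names taken from the directory listing above.
const sampleFiles = [
  "collection-0-7801614791038926920.wt",
  "index-1-7801614791038926920.wt",
  "mongod.lock",
  "sizeStorer.wt",
  "storage.bson",
];
for (const name of sampleFiles) {
  console.log(`${name}: ${classifyDataFile(name)}`);
}
```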

# ls -l|grep ^d
drwx------. 2 mongod mongod     4096 Dec 19 06:26 diagnostic.data
drwx------. 2 mongod mongod     4096 Dec 19 05:01 journal

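The files in the diagnostic.data directory are used only for troubleshooting; MongoDB support engineers have special tools to read them. The log file is located in /var/log/mongodb and is named mongod.log.

The journal directory holds the journal log, similar to Oracle's redo log. It is used to recover data after events such as an unexpected power loss.

The temporary directory contains a Unix domain socket file named after the port. mongod creates it at startup and listens on it for local connections: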

# ls -l /tmp/*.sock
srwx------. 1 mongod mongod 0 Dec 19 05:01 /tmp/mongodb-27017.sock

Some basic commands follow. The shell helpers are grouped into db, rs (replica set), and sh (sharding).

> help
        db.help()                    help on db methods
        db.mycoll.help()             help on collection methods
        sh.help()                    sharding helpers
        rs.help()                    replica set helpers

User management commands:

db.createUser()
db.dropUser()

Collection management commands:

db.<collection>.renameCollection()
db.<collection>.createIndex()
db.<collection>.drop()

Database management commands:

db.dropDatabase()
db.createCollection()
db.serverStatus()

To inspect a helper, enter its name without parentheses:

> db.<collection>.createIndex

The placeholder must be replaced with an actual collection name:

> use test
switched to db test
> db.test.createIndex
function(keys, options, commitQuorum) {
    if (arguments.length > 3) {
        throw new Error("createIndex accepts up to 3 arguments");
    }

    return this.createIndexes([keys], options, commitQuorum);
}

Commands can be run either as a database command or through a shell helper. The latter is more concise and is recommended:

// Database Command
db.runCommand({
  "createIndexes": "<collection>",
  "indexes": [
    {
      "key": { "product": 1 },
      "name": "name_index"
    }
  ]
})
// Shell Helper
db.<collection>.createIndex(
  { "product": 1 },
  { "name": "name_index" }
)
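To make the relationship between the two forms concrete, here is a minimal sketch of how a helper-style call could be folded into a createIndexes command document. It is not the shell's actual implementation: buildCreateIndexesCommand is a hypothetical function, the collection name "products" is made up, and the sketch does not generate a default index name when none is given.

```javascript
// Hypothetical helper: folds (keys, options) into one createIndexes command document.
function buildCreateIndexesCommand(collectionName, keys, options = {}) {
  return {
    createIndexes: collectionName,
    // The key pattern and its options belong to the same index specification.
    indexes: [{ key: keys, ...options }],
  };
}

const indexCommand = buildCreateIndexesCommand(
  "products", // hypothetical collection name
  { product: 1 },
  { name: "name_index" }
);
console.log(JSON.stringify(indexCommand, null, 2));
```

In the mongo shell, a document of this shape is what would be passed to db.runCommand().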

Logging is organized by component, such as COMMAND, CONTROL, NETWORK, and STORAGE. The component name appears in every log entry, for example:

$ sudo tail -f /var/log/mongodb/mongod.log
{"t":{"$date":"2020-12-19T09:57:27.185+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn4","msg":"client metadata","attr":{"remote":"127.0.0.1:36088","client":"conn4","doc":{"application":{"name":"MongoDB Shell"},"driver":{"name":"MongoDB Internal Client","version":"4.4.2"},"os":{"type":"Linux","name":"Oracle Linux Server release 7.9","architecture":"x86_64","version":"Kernel 4.14.35-2025.403.3.el7uek.x86_64"}}}}
...

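Here t is the timestamp, s is the severity level, and c is the component; other fields include the application name and so on.
Raising the log verbosity yields more detailed output.

The s field deserves a separate explanation:

F - Fatal
E - Error
W - Warning
I - Informational (verbosity level 0)
D - Debug (verbosity levels 1-5)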
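Because each log entry is a JSON document, the t, s, and c fields can be pulled out with an ordinary JSON parser. The sketch below does this for one of the log lines shown in these notes; summarizeLogLine and SEVERITY_NAMES are hypothetical names, and any severity starting with D is treated as Debug.

```javascript
// Severity codes as listed above.
const SEVERITY_NAMES = { F: "Fatal", E: "Error", W: "Warning", I: "Informational" };

// Hypothetical helper: extracts the main fields from one structured log line.
function summarizeLogLine(line) {
  const entry = JSON.parse(line);
  const severity = entry.s.startsWith("D")
    ? "Debug"
    : SEVERITY_NAMES[entry.s] || "Unknown";
  return {
    time: entry.t.$date,
    severity,
    component: entry.c,
    message: entry.msg,
  };
}

// A log line copied from the mongod.log of these notes.
const sampleLine =
  '{"t":{"$date":"2020-12-19T05:01:55.072+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}';
const logSummary = summarizeLogLine(sampleLine);
console.log(logSummary);
```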

dissect: to take apart and analyze carefully.

To check the log verbosity settings, use getLogComponents to retrieve the setting of each component:

$ mongo admin --eval 'db.getLogComponents()'
MongoDB shell version v4.4.2
connecting to: mongodb://127.0.0.1:27017/admin?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("0915f48b-4e92-44ce-9a2b-92c0735604cc") }
MongoDB server version: 4.4.2
{
        "verbosity" : 0,
        "accessControl" : {
                "verbosity" : -1
        },
        "command" : {
                "verbosity" : -1
        },
        "control" : {
                "verbosity" : -1
        },
        "executor" : {
                "verbosity" : -1
        },
        "geo" : {
                "verbosity" : -1
        },
        "index" : {
                "verbosity" : -1
        },
        "network" : {
                "verbosity" : -1,
                "asio" : {
                        "verbosity" : -1
                },
                "bridge" : {
                        "verbosity" : -1
                },
                "connectionPool" : {
                        "verbosity" : -1
                }
        },
        "query" : {
                "verbosity" : -1
        },
        "replication" : {
                "verbosity" : -1,
                "election" : {
                        "verbosity" : -1
                },
                "heartbeats" : {
                        "verbosity" : -1
                },
                "initialSync" : {
                        "verbosity" : -1
                },
                "rollback" : {
                        "verbosity" : -1
                }
        },
        "sharding" : {
                "verbosity" : -1,
                "shardingCatalogRefresh" : {
                        "verbosity" : -1
                },
                "migration" : {
                        "verbosity" : -1
                }
        },
        "storage" : {
                "verbosity" : -1,
                "recovery" : {
                        "verbosity" : -1
                },
                "journal" : {
                        "verbosity" : -1
                }
        },
        "write" : {
                "verbosity" : -1
        },
        "ftdc" : {
                "verbosity" : -1
        },
        "tracking" : {
                "verbosity" : -1
        },
        "transaction" : {
                "verbosity" : -1
        },
        "test" : {
                "verbosity" : -1
        }
}

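The numbers are log verbosity levels:

  • -1: inherit the parent's setting
  • 0: the default; informational messages only
  • 1-5: debug messages; 5 is the most verbose

Example of changing the log level: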
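The inheritance rule for -1 can be illustrated by walking down the component tree and keeping the last explicit value seen. This is a sketch of the rule as described here, not server code: effectiveVerbosity is a hypothetical helper, and the replication values 2 and 1 in the sample object are invented for the demonstration.

```javascript
// Hypothetical helper: resolves the verbosity a component actually uses.
// `path` is the list of nested component names, e.g. ["replication", "election"].
function effectiveVerbosity(components, path) {
  let effective = components.verbosity; // top-level (global) setting
  let node = components;
  for (const name of path) {
    node = node[name];
    if (node === undefined) throw new Error(`unknown component: ${path.join(".")}`);
    // -1 means "inherit", so only an explicit value overrides the parent.
    if (node.verbosity !== -1) effective = node.verbosity;
  }
  return effective;
}

// Same shape as db.getLogComponents() output; non-default values are made up.
const logComponents = {
  verbosity: 0,
  index: { verbosity: -1 },
  replication: {
    verbosity: 2,
    election: { verbosity: -1 },
    heartbeats: { verbosity: 1 },
  },
};

console.log(effectiveVerbosity(logComponents, ["index"]));
console.log(effectiveVerbosity(logComponents, ["replication", "election"]));
console.log(effectiveVerbosity(logComponents, ["replication", "heartbeats"]));
```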

mongo admin --eval 'db.setLogLevel(0, "index")'

Logs can be retrieved with the getLog command:

> db.adminCommand({ "getLog": "global" })
{
        "totalLinesWritten" : 42,
        "log" : [
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.072+00:00\"},\"s\":\"I\",  \"c\":\"CONTROL\",  \"id\":20698,   \"ctx\":\"main\",\"msg\":\"***** SERVER RESTARTED *****\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.075+00:00\"},\"s\":\"I\",  \"c\":\"CONTROL\",  \"id\":23285,   \"ctx\":\"main\",\"msg\":\"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.085+00:00\"},\"s\":\"W\",  \"c\":\"ASIO\",     \"id\":22601,   \"ctx\":\"main\",\"msg\":\"No TransportLayer configured during NetworkInterface startup\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.088+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":4648601, \"ctx\":\"main\",\"msg\":\"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize.\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.088+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":4615611, \"ctx\":\"initandlisten\",\"msg\":\"MongoDB starting\",\"attr\":{\"pid\":3706,\"port\":27017,\"dbPath\":\"/var/lib/mongo\",\"architecture\":\"64-bit\",\"host\":\"ol7-vagrant\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.088+00:00\"},\"s\":\"I\",  \"c\":\"CONTROL\",  \"id\":23403,   \"ctx\":\"initandlisten\",\"msg\":\"Build Info\",\"attr\":{\"buildInfo\":{\"version\":\"4.4.2\",\"gitVersion\":\"15e73dc5738d2278b688f8929aee605fe4279b0e\",\"openSSLVersion\":\"OpenSSL 1.0.1e-fips 11 Feb 2013\",\"modules\":[],\"allocator\":\"tcmalloc\",\"environment\":{\"distmod\":\"rhel70\",\"distarch\":\"x86_64\",\"target_arch\":\"x86_64\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.088+00:00\"},\"s\":\"I\",  \"c\":\"CONTROL\",  \"id\":51765,   \"ctx\":\"initandlisten\",\"msg\":\"Operating System\",\"attr\":{\"os\":{\"name\":\"Oracle Linux Server release 7.9\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.088+00:00\"},\"s\":\"I\",  \"c\":\"CONTROL\",  \"id\":21951,   \"ctx\":\"initandlisten\",\"msg\":\"Options set by command line\",\"attr\":{\"options\":{\"config\":\"/etc/mongod.conf\",\"net\":{\"bindIp\":\"127.0.0.1\",\"port\":27017},\"processManagement\":{\"fork\":true,\"pidFilePath\":\"/var/run/mongodb/mongod.pid\",\"timeZoneInfo\":\"/usr/share/zoneinfo\"},\"storage\":{\"dbPath\":\"/var/lib/mongo\",\"journal\":{\"enabled\":true}},\"systemLog\":{\"destination\":\"file\",\"logAppend\":true,\"path\":\"/var/log/mongodb/mongod.log\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.089+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22270,   \"ctx\":\"initandlisten\",\"msg\":\"Storage engine to use detected by data files\",\"attr\":{\"dbpath\":\"/var/lib/mongo\",\"storageEngine\":\"wiredTiger\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.089+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22315,   \"ctx\":\"initandlisten\",\"msg\":\"Opening WiredTiger\",\"attr\":{\"config\":\"create,cache_size=350M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.647+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354115:647922][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 8 through 9\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.700+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354115:700558][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 9 through 9\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.847+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354115:847330][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 8/26752 to 9/256\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.938+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354115:938243][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 8 through 9\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:55.994+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354115:994462][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 9 through 9\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.040+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354116:40796][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.040+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22430,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger message\",\"attr\":{\"message\":\"[1608354116:40853][3706:0x7f1f954d4bc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (0, 0)\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.209+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":4795906, \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger opened\",\"attr\":{\"durationMillis\":1120}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.209+00:00\"},\"s\":\"I\",  \"c\":\"RECOVERY\", \"id\":23987,   \"ctx\":\"initandlisten\",\"msg\":\"WiredTiger recoveryTimestamp\",\"attr\":{\"recoveryTimestamp\":{\"$timestamp\":{\"t\":0,\"i\":0}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.211+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":4366408, \"ctx\":\"initandlisten\",\"msg\":\"No table logging settings modifications are required for existing WiredTiger tables\",\"attr\":{\"loggingEnabled\":true}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.214+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":22262,   \"ctx\":\"initandlisten\",\"msg\":\"Timestamp monitor starting\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.217+00:00\"},\"s\":\"W\",  \"c\":\"CONTROL\",  \"id\":22120,   \"ctx\":\"initandlisten\",\"msg\":\"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted\",\"tags\":[\"startupWarnings\"]}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.218+00:00\"},\"s\":\"W\",  \"c\":\"CONTROL\",  \"id\":22178,   \"ctx\":\"initandlisten\",\"msg\":\"/sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'\",\"tags\":[\"startupWarnings\"]}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.223+00:00\"},\"s\":\"I\",  \"c\":\"STORAGE\",  \"id\":20536,   \"ctx\":\"initandlisten\",\"msg\":\"Flow Control is enabled on this deployment\"}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.225+00:00\"},\"s\":\"I\",  \"c\":\"FTDC\",     \"id\":20625,   \"ctx\":\"initandlisten\",\"msg\":\"Initializing full-time diagnostic data capture\",\"attr\":{\"dataDirectory\":\"/var/lib/mongo/diagnostic.data\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.226+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":23015,   \"ctx\":\"listener\",\"msg\":\"Listening on\",\"attr\":{\"address\":\"/tmp/mongodb-27017.sock\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.226+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":23015,   \"ctx\":\"listener\",\"msg\":\"Listening on\",\"attr\":{\"address\":\"127.0.0.1\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:01:56.226+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":23016,   \"ctx\":\"listener\",\"msg\":\"Waiting for connections\",\"attr\":{\"port\":27017,\"ssl\":\"off\"}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:02:01.346+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22943,   \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:36082\",\"connectionId\":1,\"connectionCount\":1}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:02:01.347+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":51800,   \"ctx\":\"conn1\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:36082\",\"client\":\"conn1\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.2\"},\"os\":{\"type\":\"Linux\",\"name\":\"Oracle Linux Server release 7.9\",\"architecture\":\"x86_64\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T05:02:02.594+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22944,   \"ctx\":\"conn1\",\"msg\":\"Connection ended\",\"attr\":{\"remote\":\"127.0.0.1:36082\",\"connectionId\":1,\"connectionCount\":0}}",
                "{\"t\":{\"$date\":\"2020-12-19T06:36:07.744+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22943,   \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:36084\",\"connectionId\":2,\"connectionCount\":1}}",
                "{\"t\":{\"$date\":\"2020-12-19T06:36:07.744+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":51800,   \"ctx\":\"conn2\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:36084\",\"client\":\"conn2\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.2\"},\"os\":{\"type\":\"Linux\",\"name\":\"Oracle Linux Server release 7.9\",\"architecture\":\"x86_64\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:52:53.721+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22944,   \"ctx\":\"conn2\",\"msg\":\"Connection ended\",\"attr\":{\"remote\":\"127.0.0.1:36084\",\"connectionId\":2,\"connectionCount\":0}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:53:13.784+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22943,   \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:36086\",\"connectionId\":3,\"connectionCount\":1}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:53:13.784+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":51800,   \"ctx\":\"conn3\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:36086\",\"client\":\"conn3\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.2\"},\"os\":{\"type\":\"Linux\",\"name\":\"Oracle Linux Server release 7.9\",\"architecture\":\"x86_64\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:53:13.793+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22944,   \"ctx\":\"conn3\",\"msg\":\"Connection ended\",\"attr\":{\"remote\":\"127.0.0.1:36086\",\"connectionId\":3,\"connectionCount\":0}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:57:27.185+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22943,   \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:36088\",\"connectionId\":4,\"connectionCount\":1}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:57:27.185+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":51800,   \"ctx\":\"conn4\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:36088\",\"client\":\"conn4\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.2\"},\"os\":{\"type\":\"Linux\",\"name\":\"Oracle Linux Server release 7.9\",\"architecture\":\"x86_64\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:58:06.337+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22944,   \"ctx\":\"conn4\",\"msg\":\"Connection ended\",\"attr\":{\"remote\":\"127.0.0.1:36088\",\"connectionId\":4,\"connectionCount\":0}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:59:58.972+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":22943,   \"ctx\":\"listener\",\"msg\":\"Connection accepted\",\"attr\":{\"remote\":\"127.0.0.1:36090\",\"connectionId\":5,\"connectionCount\":1}}",
                "{\"t\":{\"$date\":\"2020-12-19T09:59:58.972+00:00\"},\"s\":\"I\",  \"c\":\"NETWORK\",  \"id\":51800,   \"ctx\":\"conn5\",\"msg\":\"client metadata\",\"attr\":{\"remote\":\"127.0.0.1:36090\",\"client\":\"conn5\",\"doc\":{\"application\":{\"name\":\"MongoDB Shell\"},\"driver\":{\"name\":\"MongoDB Internal Client\",\"version\":\"4.4.2\"},\"os\":{\"type\":\"Linux\",\"name\":\"Oracle Linux Server release 7.9\",\"architecture\":\"x86_64\",\"version\":\"Kernel 4.14.35-2025.403.3.el7uek.x86_64\"}}}}"
        ],
        "ok" : 1
}

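This appears to return only part of the log. getLog reads from mongod's in-memory cache of the most recent log events (up to 1024 entries), not from the log file: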

# wc -l /var/log/mongodb/mongod.log
656 /var/log/mongodb/mongod.log
$ mongo admin  --eval '  db.products.update( { "sku" : 6902667 }, { $set : { "salePrice" : 39.99} } )'
MongoDB shell version v4.4.2
connecting to: mongodb://127.0.0.1:27017/admin?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("a4985fc8-d258-46f1-a273-5ffe74726163") }
MongoDB server version: 4.4.2
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })

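The Database Profiler collects information about database operations and complements the logs. It has 3 levels:

  • 0: the default; collects nothing
  • 1: records slow operations (those that take longer than slowms)
  • 2: records all operations

Profiling records three kinds of operations: CRUD, configuration, and administrative.

Example: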
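The three levels boil down to a simple decision per operation. The sketch below captures that decision; isProfiled is a hypothetical function, sampleRate is ignored, and treating the slowms threshold as inclusive is an assumption made so that slowms: 0 captures every operation.

```javascript
// Hypothetical helper: would an operation of the given duration be profiled?
function isProfiled(level, slowms, durationMillis) {
  if (level === 2) return true;                       // level 2: everything
  if (level === 1) return durationMillis >= slowms;   // level 1: slow operations only (inclusive bound is an assumption)
  return false;                                       // level 0: profiler off
}

console.log(isProfiled(0, 100, 500)); // profiler off
console.log(isProfiled(1, 100, 13));  // faster than the default slowms
console.log(isProfiled(1, 0, 13));    // slowms lowered to 0
console.log(isProfiled(2, 100, 1));   // level 2 ignores slowms
```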

$ mongo --quiet
# Switch to a new database
> use newDB
switched to db newDB

# The default profiling level is 0 and the default slowms is 100
> db.getProfilingLevel()
0
> db.setProfilingLevel(1)
{ "was" : 0, "slowms" : 100, "sampleRate" : 1, "ok" : 1 }
> show collections
system.profile

# Set slowms to 0 so that every operation is recorded.
> db.setProfilingLevel( 1, { slowms: 0 } )
{ "was" : 1, "slowms" : 100, "sampleRate" : 1, "ok" : 1 }

# Insert a document
> db.new_collection.insert( { "a": 1 } )
WriteResult({ "nInserted" : 1 })

# The profiling record is now visible
> db.system.profile.find().pretty()
{
        "op" : "insert",
        "ns" : "newDB.new_collection",
        "command" : {
                "insert" : "new_collection",
                "ordered" : true,
                "lsid" : {
                        "id" : UUID("3d71a0d8-3f9e-4217-9c63-45a0bceb4ccc")
                },
                "$db" : "newDB"
        },
        "ninserted" : 1,
        "keysInserted" : 1,
        "numYield" : 0,
        "locks" : {
                "ParallelBatchWriterMode" : {
                        "acquireCount" : {
                                "r" : NumberLong(5)
                        }
                },
                "ReplicationStateTransition" : {
                        "acquireCount" : {
                                "w" : NumberLong(6)
                        }
                },
                "Global" : {
                        "acquireCount" : {
                                "r" : NumberLong(3),
                                "w" : NumberLong(3)
                        }
                },
                "Database" : {
                        "acquireCount" : {
                                "r" : NumberLong(2),
                                "w" : NumberLong(3)
                        }
                },
                "Collection" : {
                        "acquireCount" : {
                                "r" : NumberLong(1),
                                "w" : NumberLong(3)
                        }
                },
                "Mutex" : {
                        "acquireCount" : {
                                "r" : NumberLong(5)
                        }
                }
        },
        "flowControl" : {
                "acquireCount" : NumberLong(3),
                "timeAcquiringMicros" : NumberLong(4)
        },
        "storage" : {

        },
        "responseLength" : 45,
        "protocol" : "op_msg",
        "millis" : 13,
        "ts" : ISODate("2020-12-19T10:39:38.371Z"),
        "client" : "127.0.0.1",
        "appName" : "MongoDB Shell",
        "allUsers" : [ ],
        "user" : ""
}

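Next comes security.
Authentication establishes who a user is; authorization determines what that user is allowed to do.

MongoDB supports four authentication mechanisms; the last two are available only in the Enterprise edition: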

  1. SCRAM
  2. X.509
  3. LDAP
  4. KERBEROS

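Authorization is granted through roles: each user is associated with one or more roles.

Here is an example of enabling authentication. Initially no authentication is configured: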

> use admin
switched to db admin
> db.stats()
{
        "db" : "admin",
        "collections" : 1,
        "views" : 0,
        "objects" : 1,
        "avgObjSize" : 59,
        "dataSize" : 59,
        "storageSize" : 32768,
        "indexes" : 1,
        "indexSize" : 32768,
        "totalSize" : 65536,
        "scaleFactor" : 1,
        "fsUsedSize" : 2981859328,
        "fsTotalSize" : 34887954432,
        "ok" : 1
}

Edit the configuration file /etc/mongod.conf, add the following, then restart mongod:

security:
   authorization: enabled

Now connect again without credentials; the same command is rejected:

> use admin
switched to db admin
> db.stats()
{
        "ok" : 0,
        "errmsg" : "not authorized on admin to execute command { dbstats: 1.0, scale: undefined, lsid: { id: UUID(\"bd9941d7-b2e2-478a-9a07-958f6f245e8f\") }, $db: \"admin\" }",
        "code" : 13,
        "codeName" : "Unauthorized"
}

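Create a superuser in the admin database (while no user exists yet, the localhost exception still permits this from a local connection):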

db.createUser({
  user: "root",
  pwd: "root",
  roles : [ "root" ]
})

Successfully added user: { "user" : "root", "roles" : [ "root" ] }

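Now log in as root, which has permission to run db.stats(). Note that each database has its own users and privileges, which is why the authentication database is given explicitly: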

mongo --username root --password root --authenticationDatabase admin
> db.stats()
{
        "db" : "test",
        "collections" : 1,
        "views" : 0,
        "objects" : 0,
        "avgObjSize" : 0,
        "dataSize" : 0,
        "storageSize" : 4096,
        "indexes" : 1,
        "indexSize" : 4096,
        "totalSize" : 8192,
        "scaleFactor" : 1,
        "fsUsedSize" : 2981830656,
        "fsTotalSize" : 34887954432,
        "ok" : 1
}

This superuser can then be used to create ordinary users:

db.createUser({
  user: "jdoe",
  pwd: "jdoe",
  roles : [ "readWrite" ]
})

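RBAC stands for role-based access control. A role consists of a set of privileges plus optional network access restrictions. A privilege is an action together with a resource; a resource can be a database, a collection, the cluster, and so on. Network access restrictions limit the client and server addresses that may be used.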
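To make the privilege = action + resource idea concrete, here is a minimal sketch in plain JavaScript. The `roles` table, `resourceMatches` and `isAllowed` are hypothetical names invented for illustration, and the action lists are heavily simplified; this is not how MongoDB stores or evaluates roles internally.

```javascript
// Hypothetical in-memory model of two roles; the action lists are simplified.
const roles = {
  read: [
    { resource: { db: "playground", collection: "" }, actions: ["find"] },
  ],
  readWrite: [
    {
      resource: { db: "playground", collection: "" },
      actions: ["find", "insert", "update", "remove"],
    },
  ],
};

// An empty collection name stands for "every collection in the database",
// the same convention the rolesInfo output uses.
function resourceMatches(granted, requested) {
  return (
    granted.db === requested.db &&
    (granted.collection === "" || granted.collection === requested.collection)
  );
}

// A user may perform an action if any privilege of any of its roles covers it.
function isAllowed(userRoles, action, resource) {
  return userRoles.some((roleName) =>
    (roles[roleName] || []).some(
      (p) => resourceMatches(p.resource, resource) && p.actions.includes(action)
    )
  );
}

const target = { db: "playground", collection: "messages" };
console.log(isAllowed(["read"], "find", target));        // true
console.log(isAllowed(["read"], "insert", target));      // false
console.log(isAllowed(["readWrite"], "insert", target)); // true
```

The check is deliberately additive: a user holding several roles gets the union of their privileges, which matches the "one user, one or more roles" model described above.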

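The following roles are needed for common types of administrators:

  • Database User: the read and readWrite roles
  • Database Admin: in addition to the above, the dbAdmin, userAdmin and dbOwner roles
  • Cluster Admin: in addition to the above, the clusterAdmin, clusterManager, clusterMonitor and hostManager roles
  • Backup/Restore: the backup and restore roles
  • Super User: the root role

The database user and database administration roles above are scoped to a single database, whereas root applies to all databases. Several of them also have an AnyDatabase variant; for example, readWrite becomes readWriteAnyDatabase.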

Here are examples using the built-in roles userAdmin, dbAdmin and dbOwner:

db.createUser(
  { user: "security_officer",
    pwd: "h3ll0th3r3",
    roles: [ { db: "admin", role: "userAdmin" } ]
  }
)

db.createUser(
  { user: "dba",
    pwd: "c1lynd3rs",
    roles: [ { db: "admin", role: "dbAdmin" } ]
  }
)

db.grantRolesToUser( "dba",  [ { db: "playground", role: "dbOwner"  } ] )

db.runCommand( { rolesInfo: { role: "dbOwner", db: "playground" }, showPrivileges: true} )

# userAdmin cannot perform any operation that modifies data
> db.runCommand( { rolesInfo: { role: "userAdmin", db: "admin" }, showPrivileges: true} )
{
        "roles" : [
                {
                        "role" : "userAdmin",
                        "db" : "admin",
                        "isBuiltin" : true,
                        "roles" : [ ],
                        "inheritedRoles" : [ ],
                        "privileges" : [
                                {
                                        "resource" : {
                                                "db" : "admin",
                                                "collection" : ""
                                        },
                                        "actions" : [
                                                "changeCustomData",
                                                "changePassword",
                                                "createRole",
                                                "createUser",
                                                "dropRole",
                                                "dropUser",
                                                "grantRole",
                                                "revokeRole",
                                                "setAuthenticationRestriction",
                                                "viewRole",
                                                "viewUser"
                                        ]
                                }
                        ],
                        "inheritedPrivileges" : [
                                {
                                        "resource" : {
                                                "db" : "admin",
                                                "collection" : ""
                                        },
                                        "actions" : [
                                                "changeCustomData",
                                                "changePassword",
                                                "createRole",
                                                "createUser",
                                                "dropRole",
                                                "dropUser",
                                                "grantRole",
                                                "revokeRole",
                                                "setAuthenticationRestriction",
                                                "viewRole",
                                                "viewUser"
                                        ]
                                }
                        ]
                }
        ],
        "ok" : 1
}

The following utilities ship with MongoDB:

$ find /usr/bin -name "mongo*"
/usr/bin/mongod
/usr/bin/mongodump
/usr/bin/mongoexport
/usr/bin/mongofiles
/usr/bin/mongoimport
/usr/bin/mongotop
/usr/bin/mongorestore
/usr/bin/mongostat
/usr/bin/mongos
/usr/bin/mongo

mongostat prints server statistics once per second:

$ mongostat --port 27017
insert query update delete getmore command dirty used flushes vsize   res qrw arw net_in net_out conn                time
    *0    *0     *0     *0       0     0|0  0.0% 0.1%       0 1.48G 88.0M 0|0 1|0   111b   42.1k    3 Dec 19 12:55:49.961
    *0    *0     *0     *0       0     1|0  0.0% 0.1%       0 1.48G 88.0M 0|0 1|0   112b   42.3k    3 Dec 19 12:55:50.960
...

mongodump and mongorestore work with binary BSON files, which makes them very efficient:

$ mongodump  --db sample_restaurants --collection restaurants
2020-12-19T13:04:19.116+0000    writing sample_restaurants.restaurants to dump/sample_restaurants/restaurants.bson
2020-12-19T13:04:19.217+0000    done dumping sample_restaurants.restaurants (25359 documents)

$ ls dump/sample_restaurants
restaurants.bson  restaurants.metadata.json

$ cat dump/sample_restaurants/restaurants.metadata.json
{"indexes":[{"v":{"$numberInt":"2"},"key":{"_id":{"$numberInt":"1"}},"name":"_id_"}],"uuid":"0fb26249f40a478a9df70a4a2677caea","collectionName":"restaurants"}

$ mongorestore --drop dump/
2020-12-19T13:05:57.200+0000    preparing collections to restore from
2020-12-19T13:05:57.205+0000    reading metadata for sample_restaurants.restaurants from dump/sample_restaurants/restaurants.metadata.json
2020-12-19T13:05:57.218+0000    restoring sample_restaurants.restaurants from dump/sample_restaurants/restaurants.bson
2020-12-19T13:05:57.600+0000    no indexes to restore
2020-12-19T13:05:57.600+0000    finished restoring sample_restaurants.restaurants (25359 documents, 0 failures)
2020-12-19T13:05:57.600+0000    25359 document(s) restored successfully. 0 document(s) failed to restore.
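The metadata file printed above wraps integers as {"$numberInt": "..."}, which is the Extended JSON way of preserving BSON types in text. As a rough sketch of how such a file could be read back into plain numbers, the snippet below uses a JSON.parse reviver. The function name `relaxNumberInt` is made up for this example, and only the `$numberInt` wrapper is handled; other Extended JSON types are left untouched.

```javascript
// The metadata line copied from the mongodump output above.
const metadata =
  '{"indexes":[{"v":{"$numberInt":"2"},"key":{"_id":{"$numberInt":"1"}},"name":"_id_"}],' +
  '"uuid":"0fb26249f40a478a9df70a4a2677caea","collectionName":"restaurants"}';

// JSON.parse calls the reviver bottom-up, so each {"$numberInt": "..."}
// wrapper can be replaced by a plain number before its parent is built.
function relaxNumberInt(key, value) {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    const keys = Object.keys(value);
    if (keys.length === 1 && keys[0] === "$numberInt") {
      return Number(value.$numberInt);
    }
  }
  return value;
}

const parsed = JSON.parse(metadata, relaxNumberInt);
console.log(parsed.collectionName);   // restaurants
console.log(parsed.indexes[0].v);     // 2
console.log(parsed.indexes[0].key);   // { _id: 1 }
```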

mongoexport and mongoimport work with text JSON files, which are human-readable:

$ mongoexport  --db sample_restaurants --collection restaurants -o restaurants.json
2020-12-19T13:06:59.892+0000    connected to: mongodb://localhost/
2020-12-19T13:07:00.892+0000    [#######.................]  sample_restaurants.restaurants  8000/25359  (31.5%)
2020-12-19T13:07:01.553+0000    [########################]  sample_restaurants.restaurants  25359/25359  (100.0%)
2020-12-19T13:07:01.553+0000    exported 25359 records

# With no database specified, the data is imported into the test database
$ mongoimport  restaurants.json
2020-12-19T13:07:50.801+0000    no collection specified
2020-12-19T13:07:50.801+0000    using filename 'restaurants' as collection
2020-12-19T13:07:50.806+0000    connected to: mongodb://localhost/
2020-12-19T13:07:52.997+0000    25359 document(s) imported successfully. 0 document(s) failed to import.

A more complex example:

mongoimport  --host localhost:27000 -u m103-application-user -p m103-application-pass --file=/dataset/products.json -d applicationData -c products --authenticationDatabase=admin

Chapter 2: Replication

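A replica set is a group of mongod instances consisting of one primary and multiple secondaries; the secondaries replicate from the primary asynchronously. If the primary fails, the remaining members hold an election and promote one secondary to primary. This process is called failover.

Only the primary accepts both reads and writes; secondaries accept reads only. When connecting to a replica set, the mongo shell directs the connection to the primary by default.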

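There are two kinds of replication, binary replication and statement-based replication, which are essentially physical and logical replication. Physical replication may impose restrictions on the operating system type and database version, but it is more efficient in space and processing. Physical replication relies on a binary log, while logical replication relies on a statement-based log; in MongoDB this is the oplog.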

Replicated operations must be idempotent, that is, they can be applied repeatedly without changing the result.
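A small sketch may help show why idempotency matters. An increment is not idempotent, but recording the resulting value is. The entry shape and the helper names (`applyIncrement`, `toIdempotentEntry`, `applyEntry`) are hypothetical and much simpler than real oplog entries; the point is only the principle.

```javascript
// Applying a delta: replaying this twice adds 2, so it is NOT idempotent.
function applyIncrement(doc, field, amount) {
  return { ...doc, [field]: doc[field] + amount };
}

// Idempotent form: log the resulting value instead of the delta.
function toIdempotentEntry(doc, field, amount) {
  return { op: "u", id: doc._id, set: { [field]: doc[field] + amount } };
}

// Applying a logged entry simply overwrites the fields it names.
function applyEntry(doc, entry) {
  return { ...doc, ...entry.set };
}

const original = { _id: 1, views: 10 };
const entry = toIdempotentEntry(original, "views", 1);

const once = applyEntry(original, entry);
const twice = applyEntry(once, entry);
console.log(once.views, twice.views); // 11 11

const replayedDelta = applyIncrement(applyIncrement(original, "views", 1), "views", 1);
console.log(replayedDelta.views); // 12
```

Because replaying the logged entry leaves the document unchanged, a secondary that is unsure whether it already applied an entry can safely apply it again.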

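A replica set can also contain an arbiter, a member that holds no data, only votes in elections, and can never become primary. Arbiters should be avoided where possible. Other member types include hidden and delayed members.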

A replica set can have at most 50 members; to keep elections from taking too long, at most 7 of them can be voting members.
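The limits above, together with the usual majority rule (more than half of the voting members), can be sketched as follows. `summarizeVoting` and the `{ host, votes }` member shape are assumptions made for this example, not a MongoDB API.

```javascript
// Limits quoted in the notes above.
const MAX_MEMBERS = 50;
const MAX_VOTING_MEMBERS = 7;

// `members` is a hypothetical array of { host, votes } objects.
function summarizeVoting(members) {
  const voting = members.filter((m) => m.votes > 0).length;
  if (members.length > MAX_MEMBERS) {
    throw new Error("a replica set can have at most 50 members");
  }
  if (voting > MAX_VOTING_MEMBERS) {
    throw new Error("a replica set can have at most 7 voting members");
  }
  // A majority is strictly more than half of the voting members.
  const majority = Math.floor(voting / 2) + 1;
  return { voting, majority, tolerableFailures: voting - majority };
}

const threeNodes = summarizeVoting([
  { host: "localhost:27001", votes: 1 },
  { host: "localhost:27002", votes: 1 },
  { host: "localhost:27003", votes: 1 },
]);
console.log(threeNodes); // { voting: 3, majority: 2, tolerableFailures: 1 }
```

A 3-node set therefore survives the loss of one voting member, and going from 3 to 4 voting members raises the majority to 3 without improving fault tolerance.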

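Lab: creating a 3-node replica set

The following is the complete procedure for creating a 3-node replica set. First write three mongod configuration files. The first is shown below; the others differ only in the port and the dbPath (and presumably the log file path, so that the instances do not share one log):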

storage:
  dbPath: /var/mongodb/db/1
net:
  bindIp: localhost
  port: 27001
security:
  authorization: enabled
  keyFile: /var/mongodb/pki/m103-keyfile
systemLog:
  destination: file
  path: /var/mongodb/logs/mongod1.log
  logAppend: true
processManagement:
  fork: true
replication:
  replSetName: m103-example

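Note the differences from an ordinary mongod configuration: besides authorization for client authentication, keyFile is added so that the replica set members authenticate to each other: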

security:
  authorization: enabled
  keyFile: /var/mongodb/pki/m103-keyfile
...
replication:
  replSetName: m103-example

Then generate the key file:

# openssl rand -base64 741 > /var/mongodb/pki/m103-keyfile
# chmod 600 /var/mongodb/pki/m103-keyfile
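The odd-looking 741 is probably not arbitrary. As far as I know, a MongoDB keyfile may hold between 6 and 1024 base64 characters, and base64 turns every 3 bytes into 4 characters. The arithmetic below is a sketch of that reasoning; `base64Length` and `MAX_KEYFILE_CHARS` are names made up here, and line breaks added by openssl are ignored.

```javascript
// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Upper bound on keyfile length, stated here as an assumption from the docs.
const MAX_KEYFILE_CHARS = 1024;

console.log(base64Length(741));                      // 988
console.log(base64Length(741) <= MAX_KEYFILE_CHARS); // true
console.log(base64Length(768));                      // 1024
console.log(base64Length(769) <= MAX_KEYFILE_CHARS); // false
```

So 741 random bytes yield 988 characters, comfortably under the limit, while anything above 768 bytes would no longer fit.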

Start all three mongod instances and confirm they are running:

# ps -ef|grep mongod
  342 root      0:01 mongod -f mongod_1.conf
  376 root      0:01 mongod -f mongod_2.conf
  408 root      0:01 mongod -f mongod_3.conf

Log in to the first mongod and initiate the replica set. At this point it has only one member, which becomes the primary:

# mongo --port 27001
MongoDB shell version v4.0.5
connecting to: mongodb://127.0.0.1:27001/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("b6b1e914-8d73-46bd-8cfa-48c784837e70") }
MongoDB server version: 4.0.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
        http://docs.mongodb.org/
Questions? Try the support group
        http://groups.google.com/group/mongodb-user
> rs.initiate()
{
        "info2" : "no configuration specified. Using a default configuration for the set",
        "me" : "localhost:27001",
        "ok" : 1
}

m103-example:OTHER> use admin
switched to db admin

m103-example:PRIMARY> db.createUser({
  user: "m103-admin",
  pwd: "m103-pass",
  roles: [
    {role: "root", db: "admin"}
  ]
})

Successfully added user: {
        "user" : "m103-admin",
        "roles" : [
                {
                        "role" : "root",
                        "db" : "admin"
                }
        ]
}

m103-example:PRIMARY> exit
bye

Log in to the replica set (note that the connection string includes the replica set name) and check its status. At this point there is only one member, the PRIMARY:

mongo --host "m103-example/127.0.0.1:27001" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

m103-example:PRIMARY> rs.status()
{
        "set" : "m103-example",
        "date" : ISODate("2020-12-19T23:47:00.406Z"),
        "myState" : 1,
        "term" : NumberLong(1),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1608421612, 1),
                        "t" : NumberLong(1)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1608421612, 1),
                        "t" : NumberLong(1)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1608421612, 1),
                        "t" : NumberLong(1)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1608421612, 1),
                        "t" : NumberLong(1)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1608421592, 1),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27001",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 812,
                        "optime" : {
                                "ts" : Timestamp(1608421612, 1),
                                "t" : NumberLong(1)
                        },
                        "optimeDate" : ISODate("2020-12-19T23:46:52Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1608421111, 2),
                        "electionDate" : ISODate("2020-12-19T23:38:31Z"),
                        "configVersion" : 1,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1608421612, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608421612, 1),
                "signature" : {
                        "hash" : BinData(0,"Yuj9Rm4uiFi22KSt/AJu+w8dKbg="),
                        "keyId" : NumberLong("6908116074235953153")
                }
        }
}

Add the other two members, specifying the host name and port of each, then confirm that all members have been added:

m103-example:PRIMARY> rs.add("127.0.0.1:27002")
{
        "ok" : 1,
        "operationTime" : Timestamp(1608421851, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608421851, 1),
                "signature" : {
                        "hash" : BinData(0,"b8jDJ38ZDheOLabuia5FMcHFIrU="),
                        "keyId" : NumberLong("6908116074235953153")
                }
        }
}
m103-example:PRIMARY> rs.add("127.0.0.1:27003")
{
        "ok" : 1,
        "operationTime" : Timestamp(1608421859, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608421859, 1),
                "signature" : {
                        "hash" : BinData(0,"NTlSIlq06+TYs3mLvHzwOIC8OH8="),
                        "keyId" : NumberLong("6908116074235953153")
                }
        }
}

m103-example:PRIMARY> rs.isMaster()
{
        "hosts" : [
                "localhost:27001",
                "127.0.0.1:27002",
                "127.0.0.1:27003"
        ],
        "setName" : "m103-example",
        "setVersion" : 3,
        "ismaster" : true,
        "secondary" : false,
        "primary" : "localhost:27001",
        "me" : "localhost:27001",
        "electionId" : ObjectId("7fffffff0000000000000001"),
        "lastWrite" : {
                "opTime" : {
                        "ts" : Timestamp(1608421892, 1),
                        "t" : NumberLong(1)
                },
                "lastWriteDate" : ISODate("2020-12-19T23:51:32Z"),
                "majorityOpTime" : {
                        "ts" : Timestamp(1608421892, 1),
                        "t" : NumberLong(1)
                },
                "majorityWriteDate" : ISODate("2020-12-19T23:51:32Z")
        },
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "maxWriteBatchSize" : 100000,
        "localTime" : ISODate("2020-12-19T23:51:39.645Z"),
        "logicalSessionTimeoutMinutes" : 30,
        "minWireVersion" : 0,
        "maxWireVersion" : 7,
        "readOnly" : false,
        "ok" : 1,
        "operationTime" : Timestamp(1608421892, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608421892, 1),
                "signature" : {
                        "hash" : BinData(0,"t7tCKSj+M17kRF+HQAKeoLEdgiY="),
                        "keyId" : NumberLong("6908116074235953153")
                }
        }
}
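With all three members in place, an application would normally connect with a seed list rather than a single host. The sketch below assembles a standard mongodb:// connection string from the values used in this lab; `buildReplicaSetUri` is a hypothetical helper, not part of any driver.

```javascript
// Builds a standard mongodb:// connection string with a seed list of hosts.
function buildReplicaSetUri({ hosts, replicaSet, user, password, authSource }) {
  // Credentials are percent-encoded in case they contain reserved characters.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`;
  const options = new URLSearchParams({ replicaSet, authSource });
  return `mongodb://${credentials}${hosts.join(",")}/?${options.toString()}`;
}

const uri = buildReplicaSetUri({
  hosts: ["localhost:27001", "127.0.0.1:27002", "127.0.0.1:27003"],
  replicaSet: "m103-example",
  user: "m103-admin",
  password: "m103-pass",
  authSource: "admin",
});
console.log(uri);
```

Listing several hosts means the client can still discover the current primary if the first host in the list happens to be down.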

Run the following on the PRIMARY to simulate a failover:

rs.stepDown()
rs.isMaster()

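Replication configuration

The replication configuration defines the replica set: it determines the topology and the role of each member.

Commands related to the replication configuration include: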

  • rs.add
  • rs.conf
  • rs.initiate
  • rs.remove
  • rs.reconfig

Here is the layout of the configuration document, with my notes added as trailing comments:

{
  _id: <string>,
  version: <int>, // configuration version; incremented by 1 whenever the configuration changes, e.g. when a node is added
  term: <int>,
  protocolVersion: <number>,
  writeConcernMajorityJournalDefault: <boolean>,
  configsvr: <boolean>,
  members: [
    {
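      _id: <int>, // ID of the replica set member
      host: <string>, // host name and port
      arbiterOnly: <boolean>, // whether this is an arbiter node
      buildIndexes: <boolean>,
      hidden: <boolean>, // whether this is a hidden node; hidden nodes are invisible to applications
      priority: <number>, // priority in primary elections, 0-1000; must be 0 for hidden and arbiter nodes
      tags: <document>,
      slaveDelay: <int>, // replication delay in seconds; a non-zero value makes this a delayed node that stays behind the primary to guard against logical errors. Default 0.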
      votes: <number>
    },
    ...
  ],
  settings: {
    chainingAllowed : <boolean>,
    heartbeatIntervalMillis : <int>,
    heartbeatTimeoutSecs: <int>,
    electionTimeoutMillis : <int>,
    catchUpTimeoutMillis : <int>,
    getLastErrorModes : <document>,
    getLastErrorDefaults : <document>,
    replicaSetId: <ObjectId>
  }
}
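The priority rules in the comments above can be turned into a small pre-flight check. This is only a sketch: `checkMembers` is a hypothetical helper, the default priority for an arbiter is assumed to be 0 and for other members 1, and it does not replace the validation mongod performs itself.

```javascript
// Returns a list of human-readable problems; an empty list means the
// checks in this sketch passed.
function checkMembers(config) {
  const problems = [];
  for (const m of config.members) {
    // Assumption: an unspecified priority means 0 for arbiters, 1 otherwise.
    const fallback = m.arbiterOnly ? 0 : 1;
    const priority = m.priority === undefined ? fallback : m.priority;
    if (priority < 0 || priority > 1000) {
      problems.push(`${m.host}: priority must be between 0 and 1000`);
    }
    if ((m.hidden || m.arbiterOnly) && priority !== 0) {
      problems.push(`${m.host}: hidden and arbiter members must have priority 0`);
    }
  }
  return problems;
}

const goodConfig = {
  _id: "m103-example",
  members: [
    { _id: 0, host: "localhost:27001" },
    { _id: 1, host: "localhost:27002", priority: 2 },
    { _id: 2, host: "localhost:27003", hidden: true, priority: 0 },
  ],
};
const badConfig = {
  _id: "m103-example",
  members: [
    { _id: 0, host: "localhost:27001", priority: 1001 },
    { _id: 1, host: "localhost:27002", hidden: true },
  ],
};

console.log(checkMembers(goodConfig).length); // 0
console.log(checkMembers(badConfig));
```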

Replication commands

The first is rs.status(), which reports the health of the replica set based on heartbeat data. For example:

{
   "set" : "replset",
   "date" : ISODate("2020-03-05T05:24:45.567Z"),
   "myState" : 1,
   "term" : NumberLong(3),
   "syncSourceHost" : "",
   "syncSourceId" : -1,
   "heartbeatIntervalMillis" : NumberLong(2000),
   "majorityVoteCount" : 2,
   "writeMajorityCount" : 2,
   "votingMembersCount" : 3,            // Available starting in v4.4
   "writableVotingMembersCount" : 3,    // Available starting in v4.4
   "optimes" : {
      "lastCommittedOpTime" : {
         "ts" : Timestamp(1583385878, 1),
         "t" : NumberLong(3)
      },
      "lastCommittedWallTime" : ISODate("2020-03-05T05:24:38.122Z"),
      "readConcernMajorityOpTime" : {
         "ts" : Timestamp(1583385878, 1),
         "t" : NumberLong(3)
      },
      "readConcernMajorityWallTime" : ISODate("2020-03-05T05:24:38.122Z"),
      "appliedOpTime" : {
         "ts" : Timestamp(1583385878, 1),
         "t" : NumberLong(3)
      },
      "durableOpTime" : {
         "ts" : Timestamp(1583385878, 1),
         "t" : NumberLong(3)
      },
      "lastAppliedWallTime" : ISODate("2020-03-05T05:24:38.122Z"),
      "lastDurableWallTime" : ISODate("2020-03-05T05:24:38.122Z")
   },
   "lastStableRecoveryTimestamp" : Timestamp(1583385868, 2),
   "electionCandidateMetrics" : {
      "lastElectionReason" : "stepUpRequestSkipDryRun",
      "lastElectionDate" : ISODate("2020-03-05T05:24:28.061Z"),
      "electionTerm" : NumberLong(3),
      "lastCommittedOpTimeAtElection" : {
         "ts" : Timestamp(1583385864, 1),
         "t" : NumberLong(2)
      },
      "lastSeenOpTimeAtElection" : {
         "ts" : Timestamp(1583385864, 1),
         "t" : NumberLong(2)
      },
      "numVotesNeeded" : 2,
      "priorityAtElection" : 1,
      "electionTimeoutMillis" : NumberLong(10000),
      "priorPrimaryMemberId" : 1,
      "numCatchUpOps" : NumberLong(0),
      "newTermStartDate" : ISODate("2020-03-05T05:24:28.118Z"),
      "wMajorityWriteAvailabilityDate" : ISODate("2020-03-05T05:24:28.228Z")
   },
   "electionParticipantMetrics" : {
      "votedForCandidate" : true,
      "electionTerm" : NumberLong(2),
      "lastVoteDate" : ISODate("2020-03-05T05:22:33.306Z"),
      "electionCandidateMemberId" : 1,
      "voteReason" : "",
      "lastAppliedOpTimeAtElection" : {
         "ts" : Timestamp(1583385748, 1),
         "t" : NumberLong(1)
      },
      "maxAppliedOpTimeInSet" : {
         "ts" : Timestamp(1583385748, 1),
         "t" : NumberLong(1)
      },
      "priorityAtElection" : 1
   },
   "members" : [
      {
         "_id" : 0,
         "name" : "m1.example.net:27017",
         "health" : 1,
         "state" : 1,
         "stateStr" : "PRIMARY",
         "uptime" : 269,
         "optime" : {
            "ts" : Timestamp(1583385878, 1),
            "t" : NumberLong(3)
         },
         "optimeDate" : ISODate("2020-03-05T05:24:38Z"),
         "syncSourceHost" : "",
         "syncSourceId" : -1,
         "infoMessage" : "",
         "electionTime" : Timestamp(1583385868, 1),
         "electionDate" : ISODate("2020-03-05T05:24:28Z"),
         "configVersion" : 1,
         "configTerm" : 0,
         "self" : true,
         "lastHeartbeatMessage" : ""
      },
      {
         "_id" : 1,
         "name" : "m2.example.net:27017",
         "health" : 1,
         "state" : 2,
         "stateStr" : "SECONDARY",
         "uptime" : 266,
         "optime" : {
            "ts" : Timestamp(1583385878, 1),
            "t" : NumberLong(3)
         },
         "optimeDurable" : {
            "ts" : Timestamp(1583385878, 1),
            "t" : NumberLong(3)
         },
         "optimeDate" : ISODate("2020-03-05T05:24:38Z"),
         "optimeDurableDate" : ISODate("2020-03-05T05:24:38Z"),
         "lastHeartbeat" : ISODate("2020-03-05T05:24:44.114Z"),
         "lastHeartbeatRecv" : ISODate("2020-03-05T05:24:43.999Z"),
         "pingMs" : NumberLong(0),
         "lastHeartbeatMessage" : "",
         "syncSourceHost" : "m3.example.net:27017",
         "syncSourceId" : 2,
         "infoMessage" : "",
         "configVersion" : 1
      },
      {
         "_id" : 2,
         "name" : "m3.example.net:27017",
         "health" : 1,
         "state" : 2,
         "stateStr" : "SECONDARY",
         "uptime" : 266,
         "optime" : {
            "ts" : Timestamp(1583385878, 1),
            "t" : NumberLong(3)
         },
         "optimeDurable" : {
            "ts" : Timestamp(1583385878, 1),
            "t" : NumberLong(3)
         },
         "optimeDate" : ISODate("2020-03-05T05:24:38Z"),
         "optimeDurableDate" : ISODate("2020-03-05T05:24:38Z"),
         "lastHeartbeat" : ISODate("2020-03-05T05:24:44.114Z"),
         "lastHeartbeatRecv" : ISODate("2020-03-05T05:24:43.998Z"),
         "pingMs" : NumberLong(0),
         "lastHeartbeatMessage" : "",
         "syncSourceHost" : "m1.example.net:27017",
         "syncSourceId" : 0,
         "infoMessage" : "",
         "configVersion" : 1
      }
   ],
   "ok" : 1,
   "$clusterTime" : {
      "clusterTime" : Timestamp(1583385878, 1),
      "signature" : {
         "hash" : BinData(0,"9C2qcGVkipEGJW3iF90qxb/gIwc="),
         "keyId" : NumberLong("6800589497806356482")
      }
   },
   "operationTime" : Timestamp(1583385878, 1)
}

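Here stateStr is the member's role; uptime is how many seconds the member has been up, and optime is the last oplog entry it applied. self marks the member on which the command was run. For the other members, lastHeartbeat is the last time this member received a heartbeat response from them, and lastHeartbeatRecv is the last time it received a heartbeat request from them. heartbeatIntervalMillis is the heartbeat interval, 2 seconds by default.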
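As an illustration of how these fields might be read together, the sketch below condenses a trimmed-down copy of the members array above into one line per member. The `summarize` function is hypothetical, and dates are kept as ISO strings so the snippet runs outside the mongo shell.

```javascript
// Trimmed-down copy of the rs.status() output shown above.
const status = {
  date: "2020-03-05T05:24:45.567Z",
  heartbeatIntervalMillis: 2000,
  members: [
    { name: "m1.example.net:27017", health: 1, stateStr: "PRIMARY", self: true },
    { name: "m2.example.net:27017", health: 1, stateStr: "SECONDARY",
      lastHeartbeat: "2020-03-05T05:24:44.114Z" },
    { name: "m3.example.net:27017", health: 1, stateStr: "SECONDARY",
      lastHeartbeat: "2020-03-05T05:24:44.114Z" },
  ],
};

function summarize(st) {
  const now = Date.parse(st.date);
  return st.members.map((m) => ({
    name: m.name,
    role: m.stateStr,
    healthy: m.health === 1,
    // The member the command ran on has no heartbeat fields for itself.
    heartbeatAgeMs: m.self ? null : now - Date.parse(m.lastHeartbeat),
  }));
}

const summary = summarize(status);
console.log(summary);
console.log(summary[1].heartbeatAgeMs); // 1453
```

Both secondaries were last heard from about 1.45 seconds before the report, which is within the 2-second heartbeat interval.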

rs.isMaster() describes the member's role in the replica set and is more concise than rs.status(). As shown below, it is just a helper that wraps db.isMaster():

> rs.isMaster
function() {
    return db.isMaster();
}

m103-repl:PRIMARY> rs.isMaster()
{
        "hosts" : [
                "localhost:27001",
                "localhost:27002",
                "localhost:27003",
                "localhost:27004"
        ],
        "setName" : "m103-repl",
        "setVersion" : 4,
        "ismaster" : true,
        "secondary" : false,
        "primary" : "localhost:27001",
        "me" : "localhost:27001",
        "electionId" : ObjectId("7fffffff0000000000000001"),
        "lastWrite" : {
                "opTime" : {
                        "ts" : Timestamp(1608430249, 1),
                        "t" : NumberLong(1)
                },
                "lastWriteDate" : ISODate("2020-12-20T02:10:49Z"),
                "majorityOpTime" : {
                        "ts" : Timestamp(1608430249, 1),
                        "t" : NumberLong(1)
                },
                "majorityWriteDate" : ISODate("2020-12-20T02:10:49Z")
        },
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "maxWriteBatchSize" : 100000,
        "localTime" : ISODate("2020-12-20T02:10:50.474Z"),
        "logicalSessionTimeoutMinutes" : 30,
        "minWireVersion" : 0,
        "maxWireVersion" : 7,
        "readOnly" : false,
        "ok" : 1,
        "operationTime" : Timestamp(1608430249, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608430249, 1),
                "signature" : {
                        "hash" : BinData(0,"Moj1nVJ0xnBvB2c5BFKNKQ4Bn1Y="),
                        "keyId" : NumberLong("6908154801956061185")
                }
        }
}

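rs.printReplicationInfo() returns oplog information for the current node. It reports only the configured oplog size and the times of the earliest and latest oplog entries, not the entries themselves.

db.serverStatus()['repl'] returns information similar to rs.isMaster():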

m103-repl:PRIMARY> db.serverStatus()['repl']
{
        "hosts" : [
                "localhost:27001",
                "127.0.0.1:27002",
                "127.0.0.1:27003"
        ],
        "setName" : "m103-repl",
        "setVersion" : 3,
        "ismaster" : true,
        "secondary" : false,
        "primary" : "localhost:27001",
        "me" : "localhost:27001",
        "electionId" : ObjectId("7fffffff0000000000000001"),
        "lastWrite" : {
                "opTime" : {
                        "ts" : Timestamp(1608427597, 1),
                        "t" : NumberLong(1)
                },
                "lastWriteDate" : ISODate("2020-12-20T01:26:37Z"),
                "majorityOpTime" : {
                        "ts" : Timestamp(1608427597, 1),
                        "t" : NumberLong(1)
                },
                "majorityWriteDate" : ISODate("2020-12-20T01:26:37Z")
        },
        "rbid" : 1
}

Local DB

On a standalone server, the local database contains only the startup log:

> use local
switched to db local
> show collections
startup_log

On a replica set member, there are more collections:

m103-repl:PRIMARY> use local
switched to db local
m103-repl:PRIMARY> show collections
oplog.rs
replset.election
replset.minvalid
replset.oplogTruncateAfterPoint
startup_log

The most important of these is oplog.rs.

m103-repl:PRIMARY> db.oplog.rs.findOne()
{
        "ts" : Timestamp(1608427456, 1),
        "h" : NumberLong("2194467939577895840"),
        "v" : 2,
        "op" : "n",
        "ns" : "",
        "wall" : ISODate("2020-12-20T01:24:16.031Z"),
        "o" : {
                "msg" : "initiating set"
        }
}

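oplog.rs is a capped collection: its size is fixed and the space is reused in a circular fashion. By default it is sized at 5% of free disk space, with a lower bound of 990 MB; the maxSize of 1038090240 bytes shown below is exactly 990 MB: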

m103-repl:PRIMARY> var stats = db.oplog.rs.stats()
m103-repl:PRIMARY> stats.capped
true
m103-repl:PRIMARY> stats.size
8666
m103-repl:PRIMARY> stats.maxSize
1038090240

m103-repl:PRIMARY> rs.printReplicationInfo()
configured oplog size:   990MB
log length start to end: 771secs (0.21hrs)
oplog first event time:  Sun Dec 20 2020 01:24:16 GMT+0000 (UTC)
oplog last event time:   Sun Dec 20 2020 01:37:07 GMT+0000 (UTC)
now:                     Sun Dec 20 2020 01:37:13 GMT+0000 (UTC)

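oplog.rs is the core mechanism behind replication. The replication window, i.e. the time span covered by the oplog, depends on the current write load. A single operation may generate multiple oplog entries.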
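The figures printed by rs.printReplicationInfo() above can be reproduced by hand, which helps in reading them. The sketch below redoes the arithmetic with the values copied from the output; the variable names are my own.

```javascript
// First and last oplog event times from the rs.printReplicationInfo() output.
const firstEvent = Date.parse("2020-12-20T01:24:16Z");
const lastEvent = Date.parse("2020-12-20T01:37:07Z");

// "log length start to end" is simply the difference between the two.
const windowSecs = (lastEvent - firstEvent) / 1000;
const windowHours = windowSecs / 3600;

// stats.maxSize is in bytes; the "configured oplog size" is the same in MB.
const maxSizeMB = 1038090240 / (1024 * 1024);

console.log(windowSecs);             // 771
console.log(windowHours.toFixed(2)); // 0.21
console.log(maxSizeMB);              // 990
```

Note that this oplog is far from full (stats.size is only 8666 bytes), so the 771 seconds here reflect the age of the replica set rather than how long a full oplog would last under load.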

The following creates a collection to generate an oplog entry; note that op is c (command):

m103-repl:PRIMARY> use m103
switched to db m103
m103-repl:PRIMARY> db.createCollection('messages')
{
        "ok" : 1,
        "operationTime" : Timestamp(1608428625, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608428625, 1),
                "signature" : {
                        "hash" : BinData(0,"4oRFX3IY2Ks71n5UtRZmpPXFpDY="),
                        "keyId" : NumberLong("6908143325803446273")
                }
        }
}
m103-repl:PRIMARY> use local
switched to db local
m103-repl:PRIMARY> db.oplog.rs.find( { "o.msg": { $ne: "periodic noop" } } ).sort( { $natural: -1 } ).limit(1).pretty()
{
        "ts" : Timestamp(1608428625, 1),
        "t" : NumberLong(1),
        "h" : NumberLong("-7113852664014283695"),
        "v" : 2,
        "op" : "c",
        "ns" : "m103.$cmd",
        "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"),
        "wall" : ISODate("2020-12-20T01:43:45.966Z"),
        "o" : {
                "create" : "messages",
                "idIndex" : {
                        "v" : 2,
                        "key" : {
                                "_id" : 1
                        },
                        "name" : "_id_",
                        "ns" : "m103.messages"
                }
        }
}

Then insert 100 documents to generate more oplog entries; here op is i, i.e. insert:

m103-repl:PRIMARY> use m103
switched to db m103
m103-repl:PRIMARY> for ( i=0; i< 100; i++) { db.messages.insert( { 'msg': 'not yet', _id: i } ) }
WriteResult({ "nInserted" : 1 })
m103-repl:PRIMARY> db.messages.count()
100

m103-repl:PRIMARY> use local
switched to db local
m103-repl:PRIMARY> db.oplog.rs.find({"ns": "m103.messages"}).sort({$natural: -1})
{ "ts" : Timestamp(1608428745, 36), "t" : NumberLong(1), "h" : NumberLong("324392587144061119"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.196Z"), "o" : { "_id" : 99, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 35), "t" : NumberLong(1), "h" : NumberLong("1542781151193176961"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.196Z"), "o" : { "_id" : 98, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 34), "t" : NumberLong(1), "h" : NumberLong("3344822831505708229"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.186Z"), "o" : { "_id" : 97, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 33), "t" : NumberLong(1), "h" : NumberLong("2719770265193099508"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.186Z"), "o" : { "_id" : 96, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 32), "t" : NumberLong(1), "h" : NumberLong("2128940404736444685"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.176Z"), "o" : { "_id" : 95, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 31), "t" : NumberLong(1), "h" : NumberLong("3527781969636911415"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.166Z"), "o" : { "_id" : 94, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 30), "t" : NumberLong(1), "h" : NumberLong("5772633884996181654"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.166Z"), "o" : { "_id" : 93, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 29), "t" : NumberLong(1), "h" : NumberLong("-1920220233546517465"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.166Z"), "o" : { "_id" : 92, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 28), "t" : NumberLong(1), "h" : NumberLong("2011816892918418096"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.156Z"), "o" : { "_id" : 91, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 27), "t" : NumberLong(1), "h" : NumberLong("-8309191925200103644"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.156Z"), "o" : { "_id" : 90, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 26), "t" : NumberLong(1), "h" : NumberLong("-6255571449904552371"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.146Z"), "o" : { "_id" : 89, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 25), "t" : NumberLong(1), "h" : NumberLong("-6172050930609074982"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.136Z"), "o" : { "_id" : 88, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 24), "t" : NumberLong(1), "h" : NumberLong("-2599873126843602125"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.136Z"), "o" : { "_id" : 87, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 23), "t" : NumberLong(1), "h" : NumberLong("-8445756439992094873"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.126Z"), "o" : { "_id" : 86, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 22), "t" : NumberLong(1), "h" : NumberLong("-4669876226182418669"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.126Z"), "o" : { "_id" : 85, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 21), "t" : NumberLong(1), "h" : NumberLong("9012211230301179909"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.116Z"), "o" : { "_id" : 84, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 20), "t" : NumberLong(1), "h" : NumberLong("8725443360378106292"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.116Z"), "o" : { "_id" : 83, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 19), "t" : NumberLong(1), "h" : NumberLong("-2842685721826447206"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.106Z"), "o" : { "_id" : 82, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 18), "t" : NumberLong(1), "h" : NumberLong("5652924376477340452"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.106Z"), "o" : { "_id" : 81, "msg" : "not yet" } }
{ "ts" : Timestamp(1608428745, 17), "t" : NumberLong(1), "h" : NumberLong("-5454208204845108214"), "v" : 2, "op" : "i", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "wall" : ISODate("2020-12-20T01:45:45.096Z"), "o" : { "_id" : 80, "msg" : "not yet" } }
Type "it" for more

Update the data and check the oplog again. The new entries have op set to u, meaning update:

m103-repl:PRIMARY> use m103
switched to db m103
m103-repl:PRIMARY> db.messages.updateMany( {}, { $set: { author: 'norberto' } } )
{ "acknowledged" : true, "matchedCount" : 100, "modifiedCount" : 100 }
m103-repl:PRIMARY> use local
switched to db local
m103-repl:PRIMARY> db.oplog.rs.find( { "ns": "m103.messages" } ).sort( { $natural: -1 } )
{ "ts" : Timestamp(1608428945, 100), "t" : NumberLong(1), "h" : NumberLong("815401207626051100"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 99 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 99), "t" : NumberLong(1), "h" : NumberLong("-2172008942223275005"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 98 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 98), "t" : NumberLong(1), "h" : NumberLong("-1197358821251512435"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 97 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 97), "t" : NumberLong(1), "h" : NumberLong("7371868888650897742"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 96 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 96), "t" : NumberLong(1), "h" : NumberLong("-8879488120146995647"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 95 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 95), "t" : NumberLong(1), "h" : NumberLong("3133613828918733543"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 94 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 94), "t" : NumberLong(1), "h" : NumberLong("7154822089389396482"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 93 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 93), "t" : NumberLong(1), "h" : NumberLong("-2362384075373213307"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 92 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 92), "t" : NumberLong(1), "h" : NumberLong("3598331643077918652"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 91 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 91), "t" : NumberLong(1), "h" : NumberLong("1623560456396089841"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 90 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 90), "t" : NumberLong(1), "h" : NumberLong("-7929048016230732324"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 89 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 89), "t" : NumberLong(1), "h" : NumberLong("8968767688656019698"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 88 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 88), "t" : NumberLong(1), "h" : NumberLong("7496330015315343468"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 87 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 87), "t" : NumberLong(1), "h" : NumberLong("6887231771345348275"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 86 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 86), "t" : NumberLong(1), "h" : NumberLong("-8334308475501116289"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 85 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 85), "t" : NumberLong(1), "h" : NumberLong("-4389818842553368457"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 84 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 84), "t" : NumberLong(1), "h" : NumberLong("-6012401881812808014"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 83 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 83), "t" : NumberLong(1), "h" : NumberLong("-6992455086603682893"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 82 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 82), "t" : NumberLong(1), "h" : NumberLong("-6832157638883052545"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 81 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
{ "ts" : Timestamp(1608428945, 81), "t" : NumberLong(1), "h" : NumberLong("-4636270968271759698"), "v" : 2, "op" : "u", "ns" : "m103.messages", "ui" : UUID("4044f5dd-5986-4f34-97e5-8b0194902c3a"), "o2" : { "_id" : 80 }, "wall" : ISODate("2020-12-20T01:49:05.405Z"), "o" : { "$v" : 1, "$set" : { "author" : "norberto" } } }
Type "it" for more

Data in the local database is not replicated, which is why it is called local.

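Reconfiguring a Replica Set

The course walks through an example, using rs.isMaster() to check the configuration at every step. First, add two nodes, one secondary and one arbiter: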

rs.add("m103:27014")
rs.addArb("m103:28000")

The arbiter is then removed for budget reasons:

rs.remove("m103:28000")

Change one of the secondaries into a hidden member:

cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[3].hidden = true
cfg.members[3].priority = 0
rs.reconfig(cfg)

No replica set member was restarted at any point in this process.
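
The hidden-member change touches three fields of one member at once. Below is a minimal sketch of that edit as a plain JavaScript function, assuming a config object shaped roughly like the output of rs.conf(). The helper name makeHidden, the set name and the host list are made up for illustration and are not part of the course or the mongo shell.

```javascript
// Hypothetical helper (not part of the mongo shell): returns a copy of a
// replica set config in which the member at `index` is hidden. As in the
// notes, the member also gets votes 0 and priority 0.
function makeHidden(cfg, index) {
  const members = cfg.members.map((member, i) =>
    i === index
      ? { ...member, votes: 0, hidden: true, priority: 0 }
      : { ...member }
  );
  return { ...cfg, members };
}

// Made-up sample config, loosely shaped like the output of rs.conf().
const sampleCfg = {
  _id: "m103-example",
  members: [
    { _id: 0, host: "m103:27011", votes: 1, hidden: false, priority: 1 },
    { _id: 1, host: "m103:27012", votes: 1, hidden: false, priority: 1 },
    { _id: 2, host: "m103:27013", votes: 1, hidden: false, priority: 1 },
    { _id: 3, host: "m103:27014", votes: 1, hidden: false, priority: 1 },
  ],
};

const hiddenCfg = makeHidden(sampleCfg, 3);
console.log(hiddenCfg.members[3]);
```

Because the function returns a copy, the original config object stays untouched, which makes it easy to compare the two before calling rs.reconfig().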

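Reads and Writes on a Replica Set

Only the primary accepts both reads and writes. A secondary accepts reads only, and only after you connect to it and enable them with rs.slaveOk():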

mongo --host "m103:27012" -u "m103-admin" -p "m103-pass" \
  --authenticationDatabase "admin"
rs.slaveOk()

Failover and Elections

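An election can be triggered by maintenance, such as a rolling upgrade, which also shows how a replica set preserves availability. In a rolling upgrade, all secondaries are usually upgraded first and the primary last.

By default, a client that connects to a replica set always ends up on the primary.

If the primary cannot reach a majority of members, it automatically steps down and becomes a secondary. For example, in a 3-member replica set, if both secondaries go down, the remaining primary becomes a secondary.

An odd number of replica set members is recommended, to avoid split brain.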
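
The majority rule is easy to check with a little arithmetic. The sketch below is only an illustration under the assumption that every member has one vote; the function names majority and primaryCanStay are made up.

```javascript
// Smallest number of votes that is more than half of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A primary keeps its role only while it can reach a majority,
// counting itself as one of the reachable members.
function primaryCanStay(votingMembers, reachableIncludingSelf) {
  return reachableIncludingSelf >= majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n + " members: majority is " + majority(n) +
    ", tolerates " + (n - majority(n)) + " failure(s)");
}
console.log(primaryCanStay(3, 1)); // both secondaries down
```

A 4-member set needs 3 votes and therefore tolerates only one failure, the same as a 3-member set, which is one practical reason for preferring an odd member count.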

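An election is triggered when the current primary becomes unavailable or steps down to secondary.

When priorities are equal, the member with the most recent data becomes primary.

A member with priority 1 or higher can stand for election; a member with priority 0 can vote but cannot be elected. The higher the priority, the more likely the member is to become primary, although this is not guaranteed. Members that cannot be elected are called passive members.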

Write Concern

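Write concern controls durability: the more members acknowledge a write, the higher its durability.

The write concern levels are:
0 - do not wait for acknowledgement
1 - the default, only the primary needs to acknowledge
2+ - wait for the primary and one or more secondaries to acknowledge
majority - wait for a majority of members to acknowledge; with 3 members the majority is 2. This is the most flexible definition.

Write concern is also supported on sharded clusters and "standard clusters" (what is that???); the setting is pushed to all shards.

Write concern has further options:
wtimeout - how long to wait before the operation is reported as failed
j - true or false; when true, the write must be committed to the journal before it is acknowledged

The commands affected by write concern include insert, update, delete and findAndModify. The higher the write concern, the longer the wait, so majority is the usual setting.

If the write concern cannot be satisfied, the operation waits until the timeout (forever if no timeout is set), and the application then gets an error. A timeout does not mean the write itself failed, though.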
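
To make the levels concrete, here is a small sketch that counts how many acknowledgements each write concern needs. It is a simulation, not driver code: requiredAcks and isSatisfied are made-up names, and the member count is an assumption. In the mongo shell the same option object would be passed along with the write, for example as the writeConcern field of the second argument to insertOne.

```javascript
// Number of acknowledgements a write concern asks for.
// `w` is either a number or the string "majority".
function requiredAcks(w, votingMembers) {
  if (w === "majority") {
    return Math.floor(votingMembers / 2) + 1;
  }
  return w;
}

// True once enough members have acknowledged the write.
function isSatisfied(writeConcern, acks, votingMembers) {
  return acks >= requiredAcks(writeConcern.w, votingMembers);
}

// Example option object; wtimeout is in milliseconds.
const wc = { w: "majority", wtimeout: 5000, j: true };

console.log(requiredAcks(wc.w, 3));
console.log(isSatisfied(wc, 1, 3)); // only the primary has acknowledged
console.log(isSatisfied({ w: 1 }, 1, 3));
```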

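With write concern 1, the primary may acknowledge a write and then fail before the write is replicated to any other member. From the application's point of view the write succeeded, but when the old primary comes back the write is rolled back. (Yes, rolled back: the remaining secondaries formed a majority without ever receiving the write, and the old primary rejoins as a secondary.)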

Read Concerns

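Read concern is the counterpart of write concern for read operations. Used together, read concern and write concern give the best durability guarantees.

The read concern levels are:

  • local returns the most recent data, with no durability guarantee
  • available is the same as local except on sharded clusters. It is the default in a replica set.
  • majority returns only data that has been written to a majority of members. Durability is better, but the data is not necessarily the latest, because a newer change may exist that has not been replicated yet.
  • linearizable goes beyond majority and also guarantees that you read your own writes

Choosing a level means trading off three factors: fast, latest and safe.

  • for latest and fast, choose local or available, with no durability guarantee
  • for fast and safe, choose majority, but the data is not guaranteed to be the latest?
  • linearizable is the slowest, and only works on a single document

Suppose data is written to the primary and, before it is replicated to any other member, the application reads it and the primary then fails. The application has read the data, but when that primary recovers the write is rolled back, even though it is still recorded in the log. What happens next depends on your architecture, so be careful.

Read concern avoids this problem.

Failing to satisfy a read concern does not mean the data is lost. It only means that, at read time, the data had not yet propagated to enough members.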
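
The difference between local and majority can be sketched with simple counters. The example assumes each member tracks the number of the last operation it has applied (a simplification of the real optime), with the primary first in the array. The names majorityCommitPoint and latestVisible are made up for this illustration.

```javascript
// Highest operation number that a majority of members has applied.
function majorityCommitPoint(lastApplied) {
  const sorted = [...lastApplied].sort((a, b) => b - a);
  const majority = Math.floor(sorted.length / 2) + 1;
  return sorted[majority - 1];
}

// Latest operation a read on the primary can see at a given level.
// lastApplied[0] is the primary in this sketch.
function latestVisible(level, lastApplied) {
  if (level === "local") return lastApplied[0];
  if (level === "majority") return majorityCommitPoint(lastApplied);
  throw new Error("unsupported level in this sketch: " + level);
}

// Primary has applied op 10, the secondaries are at 8 and 5.
const applied = [10, 8, 5];
console.log(latestVisible("local", applied));    // 10
console.log(latestVisible("majority", applied)); // 8
```

Operations 9 and 10 are exactly the ones that could be rolled back if the primary failed now, and they are the ones a majority read does not return.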

Read Preferences

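Read preference selects which member to read from. The modes are:

  • primary is the default; secondaries serve availability only
  • primaryPreferred prefers the primary
  • secondary
  • secondaryPreferred prefers a secondary
  • nearest picks the member with the lowest network latency, typically used for geographically distributed replica sets

Every mode except primary may return stale data. On the other hand, if 2 of 3 nodes fail, every mode except primary can still read data, because the one remaining node becomes a secondary.

Reads can be directed to a secondary, and a secondary also accepts read concern local and available, but neither guarantees the latest data.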
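
The modes can be sketched as a selection function over the member list. This is a deliberately simplified model, not how a real driver chooses (drivers also consider latency windows, tags and staleness). The function pickMember, the member objects and the ping values are all assumptions for illustration.

```javascript
// Simplified member selection for each read preference mode.
// Returns the chosen member, or null when no member qualifies.
function pickMember(mode, members) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.state === "PRIMARY") || null;
  const secondaries = up.filter((m) => m.state === "SECONDARY");
  const firstSecondary = secondaries[0] || null;
  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || firstSecondary;
    case "secondary":
      return firstSecondary;
    case "secondaryPreferred":
      return firstSecondary || primary;
    case "nearest":
      return [...up].sort((a, b) => a.pingMs - b.pingMs)[0] || null;
    default:
      throw new Error("unknown mode: " + mode);
  }
}

// Healthy set (made-up hosts and latencies).
const healthy = [
  { host: "m103:27011", state: "PRIMARY", up: true, pingMs: 20 },
  { host: "m103:27012", state: "SECONDARY", up: true, pingMs: 5 },
  { host: "m103:27013", state: "SECONDARY", up: true, pingMs: 40 },
];

// Two of three nodes are down, so the survivor is a secondary.
const degraded = [
  { host: "m103:27011", state: "SECONDARY", up: true, pingMs: 20 },
  { host: "m103:27012", state: "SECONDARY", up: false, pingMs: 5 },
  { host: "m103:27013", state: "SECONDARY", up: false, pingMs: 40 },
];

console.log(pickMember("nearest", healthy).host);
console.log(pickMember("primary", degraded));
console.log(pickMember("primaryPreferred", degraded).host);
```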

Chapter 3: Sharding

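Sharding is MongoDB's horizontal scaling solution: each shard stores a subset of the data, whereas every member of a replica set stores the full data set.

Sharding scales storage, memory, processing power and throughput.

In a sharded cluster, clients connect to the mongos process instead of connecting to mongod directly.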

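When to Shard

When should you shard? First check whether vertical scaling is economically viable. If it is possible and affordable, scale vertically first. If doubling performance costs 10 times as much, consider scaling horizontally.

Also consider the operational overhead. For example, larger disks mean longer backup and restore times, while sharding lets backup and restore run in parallel.

Finally, consider whether the workload can be parallelized. Geographically distributed data and single-threaded operations (such as aggregation) are good candidates.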

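Sharding Architecture

mongos acts as a router between the application and the shards, directing queries to the shards behind it. Several mongos processes can be run for high availability.
Shard metadata is stored on the config servers. If a query does not include the shard key, the request is sent to all shards.

Sharding is done per collection. Unsharded collections all live on the primary shard. The primary shard may also perform the merge step of aggregations (mongos can do this as well).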

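Setting Up a Sharded Cluster

The overall procedure is:

  1. Create the config server replica set
  2. Configure and start mongos; mongos must point at the config servers
  3. Enable sharding on the replica set (done as a rolling upgrade)
  4. Add the shard to the cluster

mongos stores no data, so it needs no dbpath. mongos inherits its user authentication settings from the config servers.

A sample mongos.conf follows. The most important part is the configDB setting: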

sharding:
  configDB: csrs/localhost:27004,localhost:27005,localhost:27006
security:
  keyFile: /var/mongodb/pki/m103-keyfile
net:
  bindIp: localhost
  port: 26000
systemLog:
  destination: file
  path: /var/mongodb/logs/mongos.log
  logAppend: true
processManagement:
  fork: true

Start mongos:

mongos -f mongos.conf

Assume the config server replica set is already running, and the shard replica set is fully configured and running:

bash-4.4# ps -ef|grep mongod
  241 root      0:04 mongod --port 27004 --dbpath /var/mongodb/db/4 --auth --logpath /var/mongodb/logs/mongod4.log --logappend --fork --replSet csrs --keyFile /var/mongodb/pki/m103-keyfile --configsvr
  242 root      0:04 mongod --port 27006 --dbpath /var/mongodb/db/6 --auth --logpath /var/mongodb/logs/mongod6.log --logappend --fork --replSet csrs --keyFile /var/mongodb/pki/m103-keyfile --configsvr
  243 root      0:03 mongod --port 27003 --dbpath /var/mongodb/db/3 --auth --logpath /var/mongodb/logs/mongod3.log --logappend --fork --replSet shard1 --keyFile /var/mongodb/pki/m103-keyfile --shardsvr
  244 root      0:04 mongod --port 27005 --dbpath /var/mongodb/db/5 --auth --logpath /var/mongodb/logs/mongod5.log --logappend --fork --replSet csrs --keyFile /var/mongodb/pki/m103-keyfile --configsvr
  245 root      0:03 mongod --port 27001 --dbpath /var/mongodb/db/1 --auth --logpath /var/mongodb/logs/mongod1.log --logappend --fork --replSet shard1 --keyFile /var/mongodb/pki/m103-keyfile --shardsvr
  246 root      0:03 mongod --port 27002 --dbpath /var/mongodb/db/2 --auth --logpath /var/mongodb/logs/mongod2.log --logappend --fork --replSet shard1 --keyFile /var/mongodb/pki/m103-keyfile --shardsvr
  868 root      0:00 grep mongod

Connect to mongos and check the shards:

$ mongo --port 26000 --username m103-admin --password m103-pass --authenticationDatabase admin
# Note that shards is empty at this point
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fdf39025c4fd0d6267ca5cf")
  }
  shards:
  active mongoses:
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }

Add a single shard:

mongos> sh.addShard("shard1/localhost:27001")
{
        "shardAdded" : "shard1",
        "ok" : 1,
        "operationTime" : Timestamp(1608465462, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608465462, 3),
                "signature" : {
                        "hash" : BinData(0,"tp9oK8G8mPImYYHxz6dnMQjbrLk="),
                        "keyId" : NumberLong("6908303034162348061")
                }
        }
}
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fdf39025c4fd0d6267ca5cf")
  }
  shards:
        {  "_id" : "shard1",  "host" : "shard1/localhost:27001,localhost:27002,localhost:27003",  "state" : 1 }
  active mongoses:
        "4.0.5" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }

Config DB

Some commands for querying the config DB:

mongos> use config
switched to db config
mongos> show collections
changelog
chunks
lockpings
locks
migrations
mongos
shards
tags
transactions
version

mongos> db.databases.find().pretty()
{
        "_id" : "m103",
        "primary" : "shard1",
        "partitioned" : true,
        "version" : {
                "uuid" : UUID("d76fa5f8-dd28-42b5-b13b-1c2feaca0b91"),
                "lastMod" : 1
        }
}

mongos> db.collections.find().pretty()
{
        "_id" : "config.system.sessions",
        "lastmodEpoch" : ObjectId("5fdf5bbba0633a394b503e73"),
        "lastmod" : ISODate("1970-02-19T17:02:47.296Z"),
        "dropped" : false,
        "key" : {
                "_id" : 1
        },
        "unique" : false,
        "uuid" : UUID("84e1711b-59a4-40d8-a5c0-29693c12e03d")
}
{
        "_id" : "m103.products",
        "lastmodEpoch" : ObjectId("5fdf5d8fa0633a394b5044a1"),
        "lastmod" : ISODate("1970-02-19T17:02:47.296Z"),
        "dropped" : false,
        "key" : {
                "sku" : 1
        },
        "unique" : false,
        "uuid" : UUID("ffe35e95-d39e-4cca-b392-cbf2a967470e")
}
mongos> db.shards.find().pretty()
{
        "_id" : "shard1",
        "host" : "shard1/localhost:27001,localhost:27002,localhost:27003",
        "state" : 1
}

mongos> db.chunks.find().pretty()
{
        "_id" : "config.system.sessions-_id_MinKey",
        "ns" : "config.system.sessions",
        "min" : {
                "_id" : { "$minKey" : 1 }
        },
        "max" : {
                "_id" : { "$maxKey" : 1 }
        },
        "shard" : "shard1",
        "lastmod" : Timestamp(1, 0),
        "lastmodEpoch" : ObjectId("5fdf5bbba0633a394b503e73"),
        "history" : [
                {
                        "validAfter" : Timestamp(1608473531, 5),
                        "shard" : "shard1"
                }
        ]
}
{
        "_id" : "m103.products-sku_MinKey",
        "ns" : "m103.products",
        "min" : {
                "sku" : { "$minKey" : 1 }
        },
        "max" : {
                "sku" : { "$maxKey" : 1 }
        },
        "shard" : "shard1",
        "lastmod" : Timestamp(1, 0),
        "lastmodEpoch" : ObjectId("5fdf5d8fa0633a394b5044a1"),
        "history" : [
                {
                        "validAfter" : Timestamp(1608473999, 3),
                        "shard" : "shard1"
                }
        ]
}

mongos> db.mongos.find().pretty()
{
        "_id" : "ANayP6BmIwnwD2hTWw:26000",
        "advisoryHostFQDNs" : [ ],
        "mongoVersion" : "4.0.5",
        "ping" : ISODate("2020-12-20T12:11:27.184Z"),
        "up" : NumberLong(100),
        "waiting" : true
}

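A sharded collection is split into chunks by shard key, and each chunk records the minimum and maximum shard key value it covers.

The config DB is for internal use. Nobody except support engineers should modify it.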

Shard Keys

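The shard key determines which shard and chunk a document belongs to.

Every document in a sharded collection must include the shard key field.

The shard key field must be indexed first, and it cannot be changed once the collection is sharded: neither the shard key itself nor its values may be modified. Sharding cannot be undone either. In this respect a shard key resembles a primary key.

The steps to shard a collection are:

  1. Enable sharding for the database - sh.enableSharding(<database>)
  2. Create an index on the shard key field of the collection - db.<collection>.createIndex( )
  3. Shard the collection - sh.shardCollection("<database>.<collection>", {shard key } )

A database can contain sharded and unsharded collections at the same time.

A good shard key must distribute writes well. It has the following properties:

  • Cardinality: high cardinality, that is, many distinct values, such as user names or national ID numbers
  • Frequency: low frequency, that is, few documents share any given shard key value; otherwise shard sizes become unbalanced and hotspots form
  • Monotonic change: the values should not change monotonically (grow steadily); a timestamp, for example, also leaves the shards unbalanced

Avoiding these problems is necessary, but it does not guarantee a good shard key. A good shard key gives an even write distribution and read isolation (similar to partitioning). Test thoroughly in a test environment, because once a collection is sharded, going back to unsharded is very difficult and should be avoided.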
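
Cardinality and frequency can be estimated on a sample of documents before choosing a key. The sketch below is one possible way to do that in plain JavaScript. The helper keyStats and the four sample documents are made up; only the field names sku and type come from the products collection used in the lab.

```javascript
// Counts distinct values (cardinality) and the share of documents held by
// the most common value (maxFrequency) for one candidate shard key field.
function keyStats(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  const maxCount = Math.max(...counts.values());
  return { cardinality: counts.size, maxFrequency: maxCount / docs.length };
}

// Made-up sample documents.
const sampleDocs = [
  { sku: 20000008, type: "Music" },
  { sku: 20000009, type: "Music" },
  { sku: 20000010, type: "Music" },
  { sku: 20000011, type: "Software" },
];

console.log(keyStats(sampleDocs, "sku"));
console.log(keyStats(sampleDocs, "type"));
```

In this sample sku has one document per value, while type has only two values and one of them covers three quarters of the documents, which would concentrate data in a single chunk.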

Hashed Shard Keys

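A hashed shard key does not mean that a hash value is stored in the document. Instead, the shard key value is hashed, and the hash decides where the data goes. A timestamp, for example, can be used as a hashed shard key to avoid hot data.

A range query on a hashed shard key will most likely have to visit every shard, that is, scatter-gather.

A hashed shard key can only be a single, non-array field, and it does not support fast sorting.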
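
The effect on a monotonically increasing key can be shown with a toy model. The sketch below uses a simple multiplicative hash as a stand-in, which is an assumption made for illustration only and is not the hash function MongoDB uses. The split points, the chunk count and all function names are made up as well.

```javascript
// Stand-in hash (multiplicative hashing on 32 bits). NOT MongoDB's hash.
function toyHash(n) {
  return Math.imul(n, 2654435761) >>> 0;
}

// Ranged sharding: the chunk index is decided by the raw key value.
function rangedChunk(value, splitPoints) {
  let index = 0;
  while (index < splitPoints.length && value >= splitPoints[index]) {
    index++;
  }
  return index;
}

// Hashed sharding: the chunk index is decided by the hash of the key,
// with the 32-bit hash space cut into equal ranges.
function hashedChunk(value, chunkCount) {
  const width = 2 ** 32 / chunkCount;
  return Math.floor(toyHash(value) / width);
}

// Made-up split points for four ranged chunks on a timestamp key.
const splitPoints = [1608428000, 1608429000, 1608430000];

const rangedHits = new Set();
const hashedHits = new Set();
for (let i = 0; i < 1000; i++) {
  const ts = 1608430000 + i; // new, ever-increasing timestamps
  rangedHits.add(rangedChunk(ts, splitPoints));
  hashedHits.add(hashedChunk(ts, 4));
}

console.log("ranged chunks written:", [...rangedHits]);
console.log("hashed chunks written:", [...hashedHits].sort());
```

With the ranged key every new timestamp lands in the last chunk, which is the hotspot described above, while the hashed key spreads the same inserts over several chunks.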

The steps to shard a collection on a hashed key are:

  1. Enable sharding for the database - sh.enableSharding(<database>)
  2. Create a hashed index on the shard key field - db.<collection>.createIndex( { <field> : "hashed" } )
  3. Shard the collection - sh.shardCollection("<database>.<collection>", {<shard key field> : "hashed"} )

Lab: Sharding a Collection

mongoimport  --host localhost:26000 -u m103-admin -p m103-pass --file=/dataset/products.json -d m103 -c products --authenticationDatabase=admin
2020-12-20T14:08:17.560+0000    connected to: mongodb://localhost:26000/
2020-12-20T14:08:19.698+0000    9966 document(s) imported successfully. 0 document(s) failed to import.

$ mongo --port 26000 --username m103-admin --password m103-pass --authenticationDatabase admin
mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("5fdf5a91d5ef8f4fa9641167")
  }
  shards:
        {  "_id" : "shard1",  "host" : "shard1/localhost:27001,localhost:27002,localhost:27003",  "state" : 1 }
        {  "_id" : "shard2",  "host" : "shard2/localhost:27007,localhost:27008,localhost:27009",  "state" : 1 }
  active mongoses:
        "4.0.5" : 1
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                No recent migrations
  databases:
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
        {  "_id" : "m103",  "primary" : "shard1",  "partitioned" : false,  "version" : {  "uuid" : UUID("d76fa5f8-dd28-42b5-b13b-1c2feaca0b91"),  "lastMod" : 1 } }

mongos> sh.enableSharding("m103")
{
        "ok" : 1,
        "operationTime" : Timestamp(1608473541, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608473541, 3),
                "signature" : {
                        "hash" : BinData(0,"AwBqBTGWhi5JEtzjJ25ddE+GVck="),
                        "keyId" : NumberLong("6908339932226387997")
                }
        }
}

mongos> use m103
switched to db m103
mongos> db.products.findOne()
{
        "_id" : ObjectId("573f7197f29313caab89b21b"),
        "sku" : 20000008,
        "name" : "Come Into The World - CD",
        "type" : "Music",
        "regularPrice" : 14.99,
        "salePrice" : 14.99,
        "shippingWeight" : "0.25"
}

mongos> db.products.createIndex({"sku":1})
{
        "raw" : {
                "shard1/localhost:27001,localhost:27002,localhost:27003" : {
                        "createdCollectionAutomatically" : false,
                        "numIndexesBefore" : 1,
                        "numIndexesAfter" : 2,
                        "ok" : 1
                }
        },
        "ok" : 1,
        "operationTime" : Timestamp(1608473773, 3),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608473773, 3),
                "signature" : {
                        "hash" : BinData(0,"eAXEft6nZ716Isdfek2g9YKN/t4="),
                        "keyId" : NumberLong("6908339932226387997")
                }
        }
}

mongos> sh.shardCollection("m103.products", {"sku" : 1 } )
{
        "collectionsharded" : "m103.products",
        "collectionUUID" : UUID("ffe35e95-d39e-4cca-b392-cbf2a967470e"),
        "ok" : 1,
        "operationTime" : Timestamp(1608473999, 14),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1608473999, 14),
                "signature" : {
                        "hash" : BinData(0,"k7avfESitevf56Lf04/scXjGusw="),
                        "keyId" : NumberLong("6908339932226387997")
                }
        }
}

Chunk

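A chunk is a logical group of documents. At any moment a chunk lives on exactly one shard.

The config DB holds the mapping between chunks and shards.

Partitioning the shard key space by value yields the chunks. The chunk size (chunkSize) defaults to 64 MB and can be set between 1 MB and 1024 MB.

The lower bound of a chunk is inclusive and the upper bound is exclusive.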
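
The inclusive lower bound and exclusive upper bound can be sketched as a lookup over chunk ranges. The chunk list below is made up (the lab collection has a single chunk), and -Infinity and Infinity stand in for MinKey and MaxKey. The name findChunk is hypothetical.

```javascript
// Made-up chunk ranges on the sku shard key.
const skuChunks = [
  { min: -Infinity, max: 20000000, shard: "shard1" },
  { min: 20000000, max: 25000000, shard: "shard2" },
  { min: 25000000, max: Infinity, shard: "shard1" },
];

// A key belongs to the chunk where min <= key < max.
function findChunk(chunks, key) {
  return chunks.find((c) => key >= c.min && key < c.max) || null;
}

console.log(findChunk(skuChunks, 20000000).shard); // equals a lower bound
console.log(findChunk(skuChunks, 25000000).shard); // equals an upper bound
```

A key equal to a boundary always belongs to the chunk that starts there, so every key maps to exactly one chunk.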

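Chunks can move between shards, which keeps the data evenly distributed. Documents in one chunk may also end up in another chunk, because chunks can be split and merged.

Chunk size, shard key cardinality and shard key frequency determine the number of chunks.

If there can only be one chunk, only one shard can hold data.

After the chunk size is changed, redistribution must be triggered by external activity, such as importing new data. Rebalancing takes time.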

Balancer

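The balancer is responsible for distributing chunks evenly across the shards.

The balancer runs on the primary of the config server replica set. It runs automatically and needs no manual intervention.

In each round the balancer can migrate up to floor(#shards/2) chunks, which limits the performance impact.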
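
The formula is small enough to tabulate. The function name maxChunksPerRound is made up for this sketch.

```javascript
// Upper limit on chunk migrations in one balancer round: floor(#shards / 2).
function maxChunksPerRound(shardCount) {
  return Math.floor(shardCount / 2);
}

for (const shards of [1, 2, 3, 4, 5]) {
  console.log(shards + " shard(s): up to " + maxChunksPerRound(shards) + " migration(s)");
}
```

One way to read the formula: each migration needs a source shard and a destination shard, so with a shard taking part in only one migration at a time, at most half of the shards can be sending.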

The balancer can be started and stopped, and can be scheduled to run in a specific time window:

sh.startBalancer(timeout, interval)
sh.stopBalancer(timeout, interval)
sh.setBalancerState(boolean)

Queries in a Sharded Cluster

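mongos is the query interface and handles every query in the cluster. It decides which shard or shards a query goes to, and merges the results, similar to map/reduce.

For sort(), limit() and skip(), the operation runs on each shard first and the results are then merged on mongos.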
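
The two-step pattern for sort() with limit() can be sketched in plain JavaScript. This is a simplified model of the idea, not the actual mongos implementation. The function shardedSortLimit and the per-shard document arrays are made up.

```javascript
// Step 1: every shard sorts its own documents and keeps its top `limit`.
// Step 2: the router merges the partial results, sorts again and applies
// the limit once more.
function shardedSortLimit(shards, field, limit) {
  const byField = (a, b) => a[field] - b[field];
  const partials = shards.map((docs) =>
    [...docs].sort(byField).slice(0, limit)
  );
  return partials.flat().sort(byField).slice(0, limit);
}

// Made-up documents held by two shards.
const shardDocs = [
  [{ sku: 5 }, { sku: 1 }, { sku: 9 }],
  [{ sku: 3 }, { sku: 2 }, { sku: 8 }],
];

console.log(shardedSortLimit(shardDocs, "sku", 3).map((d) => d.sku)); // [ 1, 2, 3 ]
```

Each shard only needs to return its own top `limit` documents, because no document outside a shard's local top can make it into the global top.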

Targeted Queries vs Scatter Gather

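If mongos can determine which shard to go to, the query is a targeted query. A targeted query requires the shard key in the query, although a range query on the shard key may still have to visit several or all shards. Otherwise the query is scatter-gather, for example a query that does not include the shard key.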
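
The routing decision can be sketched as follows. This toy router only handles equality matches on the shard key and treats everything else as scatter-gather, which is a simplification. The chunk list, the queries and the name targetShards are made up.

```javascript
// Made-up chunk ranges on the sku shard key.
const routingChunks = [
  { min: -Infinity, max: 20000000, shard: "shard1" },
  { min: 20000000, max: Infinity, shard: "shard2" },
];

// Returns the list of shards a query has to visit.
function targetShards(query, shardKey, chunks) {
  const allShards = [...new Set(chunks.map((c) => c.shard))];
  if (!(shardKey in query)) {
    return allShards; // scatter-gather: no shard key in the query
  }
  const value = query[shardKey];
  const owner = chunks.find((c) => value >= c.min && value < c.max);
  return [owner.shard]; // targeted: exactly one shard owns this key
}

console.log(targetShards({ sku: 20000008 }, "sku", routingChunks));
console.log(targetShards({ name: "Come Into The World - CD" }, "sku", routingChunks));
```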

Exam

The exam counts for 50% of the score and is entirely multiple choice.