Performance comparison of JavaScript and Python with MongoDB

Posted by banq on 2013-12-25


On commodity hardware, MongoDB can insert about 80,000 records per second.

A sample time-event document looks like this:

{
"_id" : ObjectId("5298a5a03b3f4220588fe57c"),
"created_on" : ISODate("2012-04-22T01:09:53Z"),
"value" : 0.1647851116706831
}


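To generate the random values, we considered using either JavaScript or Python (we could have tried Java, but we wanted to write the script as quickly as possible). We did not know which would be faster, so we decided to test both. Our first attempt ran JavaScript in the MongoDB shell: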


var minDate = new Date(2012, 0, 1, 0, 0, 0, 0);
var maxDate = new Date(2013, 0, 1, 0, 0, 0, 0);
var delta = maxDate.getTime() - minDate.getTime();
 
var job_id = arg2;
 
var documentNumber = arg1;
var batchNumber = 5 * 1000;
 
var job_name = 'Job#' + job_id;
var start = new Date();
 
var batchDocuments = new Array();
var index = 0;
 
while(index < documentNumber) {
    var date = new Date(minDate.getTime() + Math.random() * delta);
    var value = Math.random();
    var document = {        
        created_on : date,
        value : value
    };
    batchDocuments[index % batchNumber] = document;
    if((index + 1) % batchNumber == 0) {
        db.randomData.insert(batchDocuments);
    }
    index++;
    if(index % 100000 == 0) {   
        print(job_name + ' inserted ' + index + ' documents.');
    }
}
print(job_name + ' inserted ' + documentNumber + ' in ' + (new Date() - start)/1000.0 + 's');

The output of the run was:
mongo random --eval "var arg1=50000000;arg2=1" create_random.js
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 566.294s

The run took 566.294 s, an average of 88,293 inserts/second.
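
The Python script, create_random.py, is not listed in this post. As a rough sketch only, the same batching logic could look like this in Python 3. The names `random_document`, `insert_in_batches`, and the `insert_batch` callback are hypothetical; with PyMongo the callback could be `collection.insert_many`. Unlike the shell script above, this sketch also flushes a final partial batch.

```python
import random
from datetime import datetime, timedelta

# Same date range as the shell script: all of 2012.
MIN_DATE = datetime(2012, 1, 1)
MAX_DATE = datetime(2013, 1, 1)
DELTA_SECONDS = (MAX_DATE - MIN_DATE).total_seconds()


def random_document():
    """Build one document with a random timestamp in 2012 and a random value."""
    offset = timedelta(seconds=random.random() * DELTA_SECONDS)
    return {"created_on": MIN_DATE + offset, "value": random.random()}


def insert_in_batches(document_count, batch_size, insert_batch):
    """Generate documents and pass them to insert_batch in groups of batch_size.

    insert_batch is a hypothetical callback that receives a list of documents.
    Returns the number of documents handed to the callback.
    """
    batch = []
    inserted = 0
    for _ in range(document_count):
        batch.append(random_document())
        if len(batch) == batch_size:
            insert_batch(batch)
            inserted += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        inserted += len(batch)
    return inserted
```

Passing `batches.append` as the callback, as in `insert_in_batches(12, 5, batches.append)`, collects the batches in an in-memory list, which makes it possible to check the batching without a running server.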

The output of the Python script was:

python create_random.py 50000000
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 1713.501 s

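The run took 1713.501 s, an average of 29,180 inserts/second, which is far slower than JavaScript. But there is no need to be discouraged: we can let Python use the full potential of a four-core machine by running one process per CPU core.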

import sys
import pymongo
import time
import subprocess
import multiprocessing
 
from datetime import datetime
 
cpu_count = multiprocessing.cpu_count()
 
# obtain a mongo connection
connection = pymongo.Connection('mongodb://localhost', safe=True)
 
# obtain a handle to the random database
db = connection.random
collection = db.randomData
 
total_documents_count = 50 * 1000 * 1000
inserted_documents_count = 0
sleep_seconds = 1
sleep_count = 0
 
for i in range(cpu_count):
    documents_number = str(total_documents_count/cpu_count)
    print documents_number
    subprocess.Popen(['python', '../create_random.py', documents_number, str(i)])
 
start = datetime.now()
 
while inserted_documents_count < total_documents_count:
    inserted_documents_count = collection.count()
    if (sleep_count > 0 and sleep_count % 60 == 0):  
        print 'Inserted ', inserted_documents_count, ' documents.'     
    if (inserted_documents_count < total_documents_count):
        sleep_count += 1
        time.sleep(sleep_seconds)   
 
print 'Inserting ', total_documents_count, ' took ', (datetime.now() - start).total_seconds(), 's'  
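
The launcher divides the work with `total_documents_count/cpu_count`, which is exact only when the total is a multiple of the core count. Otherwise the polling loop would wait for documents that are never inserted. A small Python 3 sketch of an even split that hands the remainder to the first workers follows. `split_evenly` and `build_commands` are hypothetical helpers, and the script path is the one used above.

```python
import sys


def split_evenly(total, workers):
    """Split total into per-worker counts that add up to exactly total."""
    base, extra = divmod(total, workers)
    # The first `extra` workers each take one additional document.
    return [base + 1 if i < extra else base for i in range(workers)]


def build_commands(total, workers, script="../create_random.py"):
    """Build one argument list per worker: interpreter, script, count, job id."""
    return [
        [sys.executable, script, str(count), str(job_id)]
        for job_id, count in enumerate(split_evenly(total, workers))
    ]
```

Each returned list can be passed to `subprocess.Popen` as is, in the same way the loop above starts its workers.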


This time the output was:
python create_random_parallel.py
Job#3 inserted 100000 documents.
Job#2 inserted 100000 documents.
Job#0 inserted 100000 documents.
Job#1 inserted 100000 documents.
Job#3 inserted 200000 documents.
...
Job#2 inserted 12500000 in 571.819 s
Job#0 inserted 12400000 documents.
Job#3 inserted 10800000 documents.
Job#1 inserted 12400000 documents.
Job#0 inserted 12500000 documents.
Job#0 inserted 12500000 in 577.061 s
Job#3 inserted 10900000 documents.
Job#1 inserted 12500000 documents.
Job#1 inserted 12500000 in 578.427 s
Job#3 inserted 11000000 documents.
...
Job#3 inserted 12500000 in 623.999 s
Inserting 50000000 took 624.655 s

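That is an average of 80,044 inserts/second, in line with our expectations, but still a little slower than the single JavaScript run. As a further optimization, we launched the MongoDB shell script from Python as parallel subprocesses: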


for i in range(cpu_count):
    documents_number = str(total_documents_count/cpu_count)
    script_name = 'create_random_' + str(i + 1) + '.bat'
    script_file = open(script_name, 'w')
    script_file.write('mongo random --eval "var arg1=' + documents_number +';arg2=' + str(i + 1) +'" ../create_random.js')
    script_file.close()
    subprocess.Popen(script_name) 
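
The .bat file is only a wrapper around the mongo command line, so one possible variation is to pass the arguments to `subprocess.Popen` directly as a list. The Python 3 sketch below only builds that argument list. `build_mongo_command` is a hypothetical helper; the database name, the `--eval` string, and the script path are copied from the commands shown earlier. This variant was not part of the measurements reported in this post.

```python
def build_mongo_command(documents_number, job_id, script="../create_random.js"):
    """Return the mongo shell invocation for one job as an argument list."""
    # Same --eval payload the .bat file contains: arg1 is the document
    # count, arg2 is the job id read by create_random.js.
    eval_code = "var arg1={0};arg2={1}".format(documents_number, job_id)
    return ["mongo", "random", "--eval", eval_code, script]
```

Assuming the mongo shell is on the PATH, `subprocess.Popen(build_mongo_command(12500000, 1))` would start one job without writing an intermediate file.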


Launching the mongo shell jobs in parallel from Python reached 83,437 inserts/second, which still did not beat the 88,293 inserts/second of the single JavaScript run.
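
The throughput figures quoted in this post are the document count divided by the elapsed time. A short Python 3 check using the timings printed in the logs above (`inserts_per_second` is a hypothetical helper name):

```python
def inserts_per_second(documents, seconds):
    """Average throughput for a run that inserted `documents` in `seconds`."""
    return documents / seconds


TOTAL = 50 * 1000 * 1000

# Elapsed times in seconds, taken from the log output of each run.
runs = {
    "single JavaScript job": 566.294,
    "single Python job": 1713.501,
    "four parallel Python jobs": 624.655,
}

rates = {name: round(inserts_per_second(TOTAL, s)) for name, s in runs.items()}
for name, rate in rates.items():
    print(name, rate)
```

The rounded rates come out as 88293, 29180, and 80044 inserts/second. The elapsed time of the final parallel mongo shell run is not given, so it is left out.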

Test code: GitHub
