Basic Aggregation in MongoDB 2.1 with Python
Why a new framework?
If you've been following along with this article series, you've been introduced to MongoDB's mapreduce command, which up until MongoDB 2.1 has been the go-to aggregation tool for MongoDB. (There's also the group() command, but it's really no more than a less-capable and un-shardable version of mapreduce(), so we'll ignore it here.)
So if you already have mapreduce() in your toolbox, why would you ever want something else?
Mapreduce is hard; let's go shopping
The first motivation behind the new framework is that, while mapreduce() is a flexible and powerful abstraction for aggregation, it's really overkill in many situations, as it requires you to re-frame your problem into a form that's amenable to calculation using mapreduce().
For instance, when I want to calculate the mean value of a property in a series of documents, trying to break that down into appropriate map, reduce, and finalize steps imposes some extra cognitive overhead that we'd like to avoid. So the new aggregation framework is (IMO) simpler.
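To make that overhead concrete, here is a minimal pure-Python sketch of a mean broken into map, reduce, and finalize steps, next to what an equivalent aggregation pipeline might look like. Nothing here talks to a server: the sample documents, the field names "kind" and "value", and the helper names are all made up for illustration, and the pipeline is only built as a plain list, not executed.

```python
from collections import defaultdict

# Hypothetical sample documents; "kind" and "value" are illustrative field names.
docs = [
    {"kind": "a", "value": 2.0},
    {"kind": "a", "value": 4.0},
    {"kind": "b", "value": 10.0},
]

def map_doc(doc):
    # Emit a partial result that carries both the running total and the count,
    # because a mean cannot be re-reduced from means alone.
    yield doc["kind"], {"total": doc["value"], "count": 1}

def reduce_values(key, values):
    # Returns the same shape it consumes, so it can safely be applied repeatedly.
    return {
        "total": sum(v["total"] for v in values),
        "count": sum(v["count"] for v in values),
    }

def finalize(key, reduced):
    # Only at the very end do we turn (total, count) into a mean.
    return reduced["total"] / reduced["count"]

def mean_by_key(documents):
    # Simulates the three phases in order: map, then reduce, then finalize.
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            emitted[key].append(value)
    return {
        key: finalize(key, reduce_values(key, values))
        for key, values in emitted.items()
    }

means = mean_by_key(docs)

# The same question phrased as an aggregation pipeline: one $group stage
# using the $avg accumulator. With a driver you would pass this list to the
# collection's aggregate call; here it is just data.
pipeline = [{"$group": {"_id": "$kind", "mean": {"$avg": "$value"}}}]
```

The mapreduce version needs three functions and a carefully chosen intermediate shape, while the pipeline states the intent directly in a single stage.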
The JavaScript global interpreter lock is evil
The MapReduce algorithm, the basis of MongoDB's mapreduce() command, is a great approach to solving Embarrassingly Parallel problems.
Each invocation of map, reduce, and finalize is completely independent of the others (though the map/reduce/finalize phases are order-dependent), so we should be able to dispatch these jobs to run in parallel without any problems.
Unfortunately, due to MongoDB's use of the SpiderMonkey JavaScript engine, each mongod process is restricted to running a single JavaScript thread at a time.
So in order to get any parallelism with a MongoDB mapreduce(), you must run it on a sharded cluster, and on a cluster with N shards, you're limited to N-way parallelism.
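The shape of that N-way parallelism can be sketched in plain Python. This is a simulation under stated assumptions, not MongoDB code: each inner list stands in for one shard's documents, one worker is started per shard, and the function names and the "value" field are hypothetical. The threads only model how the work is dispatched; they are not meant as a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(shard_docs):
    # Stand-in for the map + reduce work a single shard would do on its own data.
    total = sum(doc["value"] for doc in shard_docs)
    return {"total": total, "count": len(shard_docs)}

def sharded_mean(shards):
    # One worker per shard: with N shards, at most N partial jobs run at once.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_sum_count, shards))
    # Final combine step over the per-shard partial results.
    total = sum(p["total"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count

# Three hypothetical shards holding five documents between them.
shards = [
    [{"value": 1.0}, {"value": 2.0}],
    [{"value": 3.0}],
    [{"value": 4.0}, {"value": 5.0}],
]

result = sharded_mean(shards)
```

Adding more documents to a shard does not add parallelism in this model; only adding another shard (another inner list) does.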
Source: "ITPUB blog", link: http://blog.itpub.net/301743/viewspace-732370/. If you reproduce this article, please credit the source; otherwise legal action may be taken.