Amazing Algorithms with NoSQL: A MongoDB Example Part 2
In part 1 of this article, I described the use of MongoDB to solve a specific Chemoinformatics problem, namely the computation of molecular similarities. Depending on the target Tanimoto coefficient, the MongoDB solution is able to screen a database of a million compounds in subsecond time.
To make this possible, queries only return chemical compounds which, in theory, are able to satisfy the particular target Tanimoto. Even though this optimization is in place, the number of compounds returned by this query increases significantly when the target Tanimoto is lowered.
The example code in the GitHub repository, for instance, imports and indexes ~25000 chemical compounds. With a target Tanimoto of 0.8, the query returns ~700 compounds. When the target Tanimoto is lowered to 0.6, the number of returned compounds increases to ~7000.
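The filter rests on a simple bound. For a query with a fingerprints and a compound with b fingerprints that share c of them, the Tanimoto coefficient is c / (a + b - c), which can never exceed min(a, b) / max(a, b). A compound can therefore only reach a target t if its fingerprint count lies between a * t and a / t. The following is a minimal sketch of that reasoning in plain Java; the class and method names are hypothetical and do not come from the article's repository, and the integer truncation mirrors the query code used in this article.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class TanimotoSketch {

    // Tanimoto coefficient of two fingerprint sets: |A intersect B| / |A union B|.
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return union == 0 ? 0.0 : (double) common.size() / union;
    }

    // Lower bound on a compound's fingerprint count (truncated, so slightly loose).
    static int minFingerprints(int queryCount, double target) {
        return (int) (queryCount * target);
    }

    // Upper bound on a compound's fingerprint count.
    static int maxFingerprints(int queryCount, double target) {
        return (int) (queryCount / target);
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("f1", "f2", "f3", "f4"));
        Set<String> compound = new HashSet<>(Arrays.asList("f1", "f2", "f3", "f9"));

        // 3 shared fingerprints, 5 fingerprints in the union
        System.out.println("tanimoto = " + tanimoto(query, compound));
        System.out.println("min count = " + minFingerprints(query.size(), 0.5));
        System.out.println("max count = " + maxFingerprints(query.size(), 0.5));
    }
}
```

Lowering the target widens the interval between the two bounds, which is why the number of candidate compounds grows so quickly.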
Using MongoDB's explain functionality, one can observe that the internal query execution time increases only slightly; it is small compared to the overhead of transferring the full list of 7000 compounds to the remote Java application. Hence, it makes more sense to perform the calculations where the data is stored. Welcome to MongoDB's built-in map-reduce functionality!
1. MongoDB molecular similarity map-reduce query
Map-reduce is a conceptual framework, introduced by Google, to enable the processing of huge datasets using a large number of processing nodes. The general idea is that a larger problem is divided into a set of smaller subproblems, each of which can be solved by an individual processing node (the map-step).
Afterwards, the individual solutions are combined again to produce the final answer to the larger problem (the reduce-step). By making sure that the individual map and reduce steps can be computed independently of each other, this divide-and-conquer technique can be easily parallelized on a cluster of processing nodes. Let’s start by refactoring our solution to use MongoDB’s map-reduce functionality.
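As a short aside before the refactored query, the two steps can be sketched in plain Java without any database. The example below counts how often each fingerprint occurs across a handful of compounds; the task, the class name and the method names are assumptions chosen for illustration and are not part of the article's code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, database-free illustration of the map and reduce steps.
final class MapReduceSketch {

    // Map step: looks at ONE compound and emits a (fingerprint, 1) pair per fingerprint.
    static List<Map.Entry<String, Integer>> map(List<String> fingerprints) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String fingerprint : fingerprints) {
            emitted.add(new AbstractMap.SimpleEntry<>(fingerprint, 1));
        }
        return emitted;
    }

    // Reduce step: combines all values that were emitted for ONE key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map per compound, group the emitted pairs by key, reduce per key.
    static Map<String, Integer> run(List<List<String>> compounds) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<String> compound : compounds) {
            for (Map.Entry<String, Integer> pair : map(compound)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> compounds = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b", "c"),
                Arrays.asList("b"));
        System.out.println(run(compounds)); // prints {a=1, b=3, c=1}
    }
}
```

Because each call to map sees a single compound and each call to reduce sees a single key, both loops in the driver could be spread over several nodes without changing the result.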
// Imports needed by this fragment (legacy MongoDB Java driver)
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.QueryBuilder;

// Calculate the essential numbers
int maxnumberofcompoundfingerprints = (int) (fingerprintsToFind.size() / 0.6);
int minnumberofcompoundfingerprints = (int) (fingerprintsToFind.size() * 0.6);
int numberoffingerprintstoconsider = fingerprintsToFind.size() - minnumberofcompoundfingerprints;

List<String> fingerprintsToConsider = fingerprintsToFind.subList(0, numberoffingerprintstoconsider + 1);

// Find all compounds that satisfy the specified conditions
DBObject compoundquery =
    QueryBuilder.start(FINGERPRINTS_PROPERTY).in(fingerprintsToConsider)
        .and(FINGERPRINTCOUNT_PROPERTY).lessThanEquals(maxnumberofcompoundfingerprints)
        .and(FINGERPRINTCOUNT_PROPERTY).greaterThanEquals(minnumberofcompoundfingerprints)
        .get();

// The map function: count the fingerprints a compound shares with the query
// and emit the compound only if it shares enough of them
String map = "function() { " +
    "var found = 0; " +
    "var fingerprintslength = this.fingerprints.length; " +
    "for (var i = 0; i < fingerprintslength; i++) { " +
    "if (fingerprintstofind[this.fingerprints[i]] === true) { found++; } " +
    "} " +
    "if (found >= minnumberofcompoundfingerprints) { emit (this.compound_cid, {found : found, total : this.fingerprint_count} ); } " +
    "}";

// Create the map-reduce command (inline output, restricted to the compounds matched by the query)
MapReduceCommand mr = new MapReduceCommand(compoundsCollection, map, "", null, MapReduceCommand.OutputType.INLINE, compoundquery);

// Create a hashmap for the fingerprints to find (to speed up the JavaScript execution)
Map<String, Boolean> tofind = new HashMap<String, Boolean>();
for (String fingerprinttofind : fingerprintsToFind) {
    tofind.put(fingerprinttofind, true);
}

// Set the map-reduce scope (variables that are visible inside the map function)
Map<String, Object> scope = new HashMap<String, Object>();
scope.put("fingerprintstofind", tofind);
scope.put("minnumberofcompoundfingerprints", minnumberofcompoundfingerprints);
mr.setScope(scope);

// Execute the map-reduce
MapReduceOutput out = compoundsCollection.mapReduce(mr);

// Iterate the results
for (DBObject result : out.results()) {
    String compound_cid = (String) result.get("_id");
    DBObject value = (DBObject) result.get("value");

    // Calculate the tanimoto coefficient (read the values as Number, independent of the numeric type returned)
    double totalcount = ((Number) value.get("total")).doubleValue();
    double found = ((Number) value.get("found")).doubleValue();
    double tanimoto = found / (totalcount + fingerprintsToFind.size() - found);

    // We still need to check whether the tanimoto is really >= the required similarity
    if (tanimoto >= 0.6) {
        System.out.println(compound_cid + " " + (int) (tanimoto * 100) + "%");
    }
}
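The query only asks for compounds that contain at least one of the first n - min + 1 query fingerprints, where n is the number of query fingerprints and min is the minimum number that must be shared. This is safe by a pigeonhole argument: a compound that contains none of those fingerprints can share at most the remaining min - 1, which is too few. The sketch below illustrates this in plain Java; the class and method names are hypothetical, and the extra bounds check is an addition that the query code in this article does not have.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the sublist trick used in the query.
final class PrefilterSketch {

    // The first (n - minShared + 1) query fingerprints, clamped to the list size.
    static List<String> fingerprintsToConsider(List<String> query, int minShared) {
        int count = Math.min(query.size(), query.size() - minShared + 1);
        return query.subList(0, Math.max(0, count));
    }

    // Mirrors the $in condition: does the compound contain any considered fingerprint?
    static boolean mightMatch(List<String> toConsider, Set<String> compound) {
        for (String fingerprint : toConsider) {
            if (compound.contains(fingerprint)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the counting loop of the map function.
    static int shared(List<String> query, Set<String> compound) {
        int found = 0;
        for (String fingerprint : query) {
            if (compound.contains(fingerprint)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("a", "b", "c", "d", "e");
        int minShared = 3;
        List<String> toConsider = fingerprintsToConsider(query, minShared); // [a, b, c]

        Set<String> good = new HashSet<>(Arrays.asList("c", "d", "e"));
        Set<String> bad = new HashSet<>(Arrays.asList("d", "e", "x"));

        System.out.println(mightMatch(toConsider, good) + " " + shared(query, good));
        System.out.println(mightMatch(toConsider, bad) + " " + shared(query, bad));
    }
}
```

Any compound rejected by mightMatch shares fewer than minShared fingerprints with the query, so the prefilter never discards a compound that the map function would have emitted.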
Source: ITPUB blog, http://blog.itpub.net/301743/viewspace-733548/. If you repost this article, please credit the source; otherwise legal liability will be pursued.