Amazing Algorithms with NoSQL: A MongoDB Example Part 2

Posted by jieforest on 2012-06-21

In part 1 of this article, I described the use of MongoDB to solve a specific Chemoinformatics problem, namely the computation of molecular similarities. Depending on the target Tanimoto coefficient, the MongoDB solution is able to screen a database of a million compounds in subsecond time.

To make this possible, queries only return chemical compounds which, in theory, are able to satisfy the particular target Tanimoto. Even though this optimization is in place, the number of compounds returned by this query increases significantly when the target Tanimoto is lowered.
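The pre-filtering described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`TanimotoBounds`, `minCount`, `maxCount`, `fingerprintsToConsider`) are hypothetical, and the arithmetic simply mirrors the truncating integer casts used in this article's code. The reasoning is that, for a query with a fingerprints and a compound with b fingerprints sharing c of them, the Tanimoto coefficient c / (a + b - c) can never exceed min(a, b) / max(a, b), so only compounds with a fingerprint count between roughly t * a and a / t can reach a target t.

```java
final class TanimotoBounds {
    private TanimotoBounds() {}

    // Smallest fingerprint count a candidate compound may have
    // (truncated towards zero, as in the article's code).
    static int minCount(int queryCount, double target) {
        return (int) (queryCount * target);
    }

    // Largest fingerprint count a candidate compound may have.
    static int maxCount(int queryCount, double target) {
        return (int) (queryCount / target);
    }

    // A matching compound must share at least minCount fingerprints with the
    // query, so it must contain at least one of any (queryCount - minCount + 1)
    // query fingerprints. The result is clamped so it never exceeds queryCount.
    static int fingerprintsToConsider(int queryCount, double target) {
        int needed = queryCount - minCount(queryCount, target) + 1;
        return Math.min(queryCount, needed);
    }

    public static void main(String[] args) {
        int queryCount = 20;   // hypothetical query size
        double target = 0.5;   // hypothetical target Tanimoto
        System.out.println("min = " + minCount(queryCount, target));
        System.out.println("max = " + maxCount(queryCount, target));
        System.out.println("consider = " + fingerprintsToConsider(queryCount, target));
    }
}
```

Lowering the target widens the window between the two bounds and increases the number of fingerprints to consider, which is why the number of returned compounds grows so quickly.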

For instance, the example code in the GitHub repository imports and indexes ~25000 chemical compounds. When a target Tanimoto of 0.8 is employed, the query returns ~700 compounds. When the target Tanimoto is lowered to 0.6, the number of returned compounds increases to ~7000.

Using MongoDB's explain functionality, one can observe that the internal MongoDB query execution time increases only slightly. It is small compared to the overhead of transferring the full list of ~7000 compounds to the remote Java application. Hence, it makes more sense to perform the calculations close to where the data is stored. Welcome to MongoDB's built-in map-reduce functionality!

1. MongoDB molecular similarity map-reduce query

Map-reduce is a conceptual framework, introduced by Google, to enable the processing of huge datasets using a large number of processing nodes. The general idea is that a larger problem is divided into a set of smaller subproblems, each of which can be solved by an individual processing node (the map-step).

Afterwards, the individual solutions are combined again to produce the final answer to the larger problem (the reduce-step). By making sure that the individual map and reduce steps can be computed independently of each other, this divide-and-conquer technique can be easily parallelized on a cluster of processing nodes. Let’s start by refactoring our solution to use MongoDB’s map-reduce functionality.
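To make the map and reduce steps concrete, here is a minimal in-memory sketch using plain Java streams. It does not use MongoDB at all; the class name `MapReduceSketch` and the sample data are hypothetical. The map step emits every fingerprint of every compound as a key, and the reduce step combines all emissions for the same key into a count. Because each compound is processed independently, the stream can run in parallel without changing the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class MapReduceSketch {
    private MapReduceSketch() {}

    // Counts in how many compounds each fingerprint occurs
    // (assuming a fingerprint appears at most once per compound).
    static Map<String, Long> countFingerprints(List<List<String>> compounds) {
        return compounds.parallelStream()
                // Map step: each compound independently emits its fingerprints as keys.
                .flatMap(List::stream)
                // Reduce step: all emissions for the same key are combined into one count.
                .collect(Collectors.groupingBy(f -> f, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<List<String>> compounds = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("b", "c"),
                Arrays.asList("b"));
        System.out.println(countFingerprints(compounds));
    }
}
```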

CODE:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.mongodb.DBObject;
import com.mongodb.MapReduceCommand;
import com.mongodb.MapReduceOutput;
import com.mongodb.QueryBuilder;

// Calculate the essential numbers
int maxnumberofcompoundfingerprints = (int) (fingerprintsToFind.size() / 0.6);
int minnumberofcompoundfingerprints = (int) (fingerprintsToFind.size() * 0.6);
int numberoffingerprintstoconsider = fingerprintsToFind.size() - minnumberofcompoundfingerprints;

List<String> fingerprintsToConsider = fingerprintsToFind.subList(0, numberoffingerprintstoconsider + 1);

// Find all compounds that satisfy the specified conditions
DBObject compoundquery =
    QueryBuilder.start(FINGERPRINTS_PROPERTY).in(fingerprintsToConsider)
        .and(FINGERPRINTCOUNT_PROPERTY).lessThanEquals(maxnumberofcompoundfingerprints)
        .and(FINGERPRINTCOUNT_PROPERTY).greaterThanEquals(minnumberofcompoundfingerprints)
        .get();

// The map function: count the shared fingerprints and emit only promising compounds
String map = "function() { " +
    "var found = 0; " +
    "var fingerprintslength = this.fingerprints.length; " +
    "for (var i = 0; i < fingerprintslength; i++) { " +
    "if (fingerprintstofind[this.fingerprints[i]] === true) { found++; } " +
    "} " +
    "if (found >= minnumberofcompoundfingerprints) { emit (this.compound_cid, {found : found, total : this.fingerprint_count} ); } " +
    "}";

// Create the map reduce command (the reduce function is left empty)
MapReduceCommand mr = new MapReduceCommand(compoundsCollection, map, "", null, MapReduceCommand.OutputType.INLINE, compoundquery);

// Create a hashmap for the fingerprints to find (to speed up the JavaScript execution)
Map<String, Boolean> tofind = new HashMap<String, Boolean>();
for (String fingerprinttofind : fingerprintsToFind) {
    tofind.put(fingerprinttofind, true);
}

// Set the map reduce scope
Map<String, Object> scope = new HashMap<String, Object>();
scope.put("fingerprintstofind", tofind);
scope.put("minnumberofcompoundfingerprints", minnumberofcompoundfingerprints);
mr.setScope(scope);

// Execute the map reduce
MapReduceOutput out = compoundsCollection.mapReduce(mr);

// Iterate the results
for (DBObject result : out.results()) {
    String compound_cid = (String) result.get("_id");
    DBObject value = (DBObject) result.get("value");

    // Calculate the tanimoto coefficient
    double totalcount = ((Number) value.get("total")).doubleValue();
    double found = ((Number) value.get("found")).doubleValue();
    double tanimoto = found / (totalcount + fingerprintsToFind.size() - found);

    // We still need to check whether the tanimoto is really >= the required similarity
    if (tanimoto >= 0.6) {
        System.out.println(compound_cid + " " + (int) (tanimoto * 100) + "%");
    }
}
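
The final step of the code above, turning the emitted `found` and `total` values into a Tanimoto coefficient, can be checked in isolation. The following is a minimal sketch with a hypothetical helper class `Tanimoto`; the sample numbers are made up for illustration and do not come from the example dataset.

```java
final class Tanimoto {
    private Tanimoto() {}

    // found: fingerprints shared by the query and the compound
    // compoundTotal: number of fingerprints of the compound
    // queryTotal: number of fingerprints of the query
    static double coefficient(int found, int compoundTotal, int queryTotal) {
        return (double) found / (compoundTotal + queryTotal - found);
    }

    public static void main(String[] args) {
        // A query with 10 fingerprints and a compound with 8, sharing 6:
        // 6 / (8 + 10 - 6) = 6 / 12 = 0.5, which is below a 0.6 target.
        double tanimoto = coefficient(6, 8, 10);
        System.out.println((int) (tanimoto * 100) + "%");
    }
}
```

This also shows why the final check in the result loop is still needed: a compound can pass the fingerprint-count filter and the minimum shared-fingerprint test in the map function and nevertheless end up below the target similarity.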

Source: ITPUB Blog, link: http://blog.itpub.net/301743/viewspace-733548/. If you repost this article, please credit the source; otherwise legal action may be pursued.