Amazing Algorithms with NoSQL: A MongoDB Example
In one of my previous blog posts, I challenged the superficial idea that you need to own billions of data records before you are eligible to use NoSQL/Big Data technologies.
In this article, I try to illustrate my point by employing NoSQL, and more specifically MongoDB, to solve a specific Chemoinformatics problem in a truly elegant and efficient way. The complete source code can be found on the Datablend public GitHub repository.
1. Molecular similarity theory
Molecular similarity refers to the similarity of chemical compounds with respect to their structural and/or functional qualities. By calculating molecular similarities, Chemoinformatics can help in the design of new drugs by screening large databases for potentially interesting chemical compounds. (This relies on the hypothesis that similar compounds generally exhibit similar biological activities.)
Unfortunately, finding substructures in chemical compounds is an NP-complete problem. Hence, calculating similarities for a particular target compound can take a very long time when considering millions of input compounds. Scientists solved this problem by introducing the notion of structural keys and fingerprints.
In the case of structural keys, we precompute the answers to a number of specific questions that try to capture the essential characteristics of a compound. Each answer is assigned a fixed location within a bitstring.
At query time, a lot of time is saved by only executing substructure searches for compounds that have compatible structural keys. (Compatibility is computed using efficient bit operators.)
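To make the screening step concrete, here is a minimal sketch in Python. The four questions, the compound names and the helper functions are all made up for illustration; real structural keys use far more bits. The assumption is that a candidate is compatible when every bit set in the query key is also set in the candidate key.

```python
# Hypothetical structural keys: bit i holds the answer to question i.
# These four questions are illustrative only, not a real key definition.
HAS_RING = 1 << 0
HAS_NITROGEN = 1 << 1
HAS_CARBONYL = 1 << 2
HAS_HALOGEN = 1 << 3


def is_compatible(query_key: int, candidate_key: int) -> bool:
    """Return True if every bit set in the query is also set in the candidate."""
    return (query_key & candidate_key) == query_key


def screen(query_key: int, keys_by_compound: dict) -> list:
    """Keep only the compounds that are worth a full substructure search."""
    return [
        name
        for name, key in keys_by_compound.items()
        if is_compatible(query_key, key)
    ]


# Made-up database of precomputed keys.
database = {
    "compound_a": HAS_RING | HAS_NITROGEN,
    "compound_b": HAS_RING | HAS_CARBONYL | HAS_HALOGEN,
    "compound_c": HAS_NITROGEN,
}

query = HAS_RING | HAS_NITROGEN
candidates = screen(query, database)
print(candidates)  # ['compound_a']
```

Only the compounds that survive this cheap bitwise test need to go through the expensive substructure search.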
When employing fingerprints, all linear substructure patterns of a certain length are calculated. As the number of potential patterns is huge, it is not possible to assign an individual bit position to each possible pattern (as is done with structural keys).
Instead, the fingerprint patterns are hashed to bit positions. The downside of this approach is that, depending on the size of the hash, multiple fingerprint patterns can share the same bit position, giving rise to potential false positives.
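The collision problem can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the hashing scheme of any particular toolkit: the pattern strings are made up, CRC32 stands in for the hash function, and the bitstring is deliberately tiny so that collisions are guaranteed.

```python
import zlib
from itertools import combinations


def bit_position(pattern: str, n_bits: int) -> int:
    """Map a pattern to a bit position (CRC32 is an arbitrary stand-in hash)."""
    return zlib.crc32(pattern.encode("utf-8")) % n_bits


def hashed_fingerprint(patterns, n_bits: int) -> int:
    """Fold all patterns of a compound into a single n_bits-wide bitstring."""
    fingerprint = 0
    for pattern in patterns:
        fingerprint |= 1 << bit_position(pattern, n_bits)
    return fingerprint


def might_contain(fingerprint: int, pattern: str, n_bits: int) -> bool:
    """True if the pattern's bit is set: the pattern MAY be present."""
    return bool((fingerprint >> bit_position(pattern, n_bits)) & 1)


def colliding_pairs(patterns, n_bits: int) -> list:
    """List all pairs of distinct patterns that hash to the same bit."""
    return [
        (a, b)
        for a, b in combinations(patterns, 2)
        if bit_position(a, n_bits) == bit_position(b, n_bits)
    ]


# Made-up linear patterns; five patterns cannot fit in four bits
# without at least two of them sharing a position.
patterns = ["C-C", "C-N", "C=O", "C-C-N", "C-C=O"]
N_BITS = 4

pairs = colliding_pairs(patterns, N_BITS)
present, absent = pairs[0]

# A compound that only contains `present` still looks like it contains `absent`.
fingerprint = hashed_fingerprint([present], N_BITS)
print(might_contain(fingerprint, absent, N_BITS))  # True, a false positive
```

A set bit therefore only says that a pattern may be present. A cleared bit is reliable: hashed fingerprints produce false positives, never false negatives.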
In this article, we will demonstrate the use of non-hashed fingerprints to calculate compound similarities (i.e. using the raw fingerprints).
This approach has two advantages:
1. We eliminate the chance of false positives.
2. The raw fingerprints can be used in other types of structural compound mining.
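As a rough illustration of working with raw fingerprints, the sketch below stores each compound's patterns as a plain list and compares two compounds on the sets directly. The similarity measure is my assumption for this example: I use the Tanimoto (Jaccard) coefficient, a common choice for fingerprint comparison that has not been introduced at this point in the article. The document layout and the pattern strings are hypothetical as well.

```python
def tanimoto(patterns_a, patterns_b) -> float:
    """Tanimoto (Jaccard) coefficient: shared patterns / all distinct patterns."""
    set_a, set_b = set(patterns_a), set(patterns_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(set_a & set_b) / len(union)


# Hypothetical compound documents holding raw (non-hashed) fingerprints.
target = {"name": "target", "fingerprints": ["C-C", "C-N", "C=O"]}
other = {"name": "other", "fingerprints": ["C-C", "C-N", "C-O", "N-O"]}

similarity = tanimoto(target["fingerprints"], other["fingerprints"])
print(similarity)  # 2 shared patterns out of 5 distinct ones: 0.4
```

Because every pattern keeps its own identity, two compounds only count a pattern as shared when they really both contain it.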
Source: "ITPUB Blog", link: http://blog.itpub.net/301743/viewspace-733070/. If you reproduce this article, please credit the source; otherwise legal liability will be pursued.