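Recently, while building a structure search in PHP, I found that the SMILES string I received could not be used directly in a query, so I decided to convert it into a molecular formula first. This is mainly intended for organic compounds.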


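Problem 1: I tried matching C and O in the SMILES string with a regular expression. The catch is that other element symbols (Cl, Ca, Cu, and so on) also contain the letter C, so the carbon atoms cannot be picked out correctly this way.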


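Solution: I only use the original SMILES string to work out the composition of the compound. Following the usual layout of an organic formula, I count C and O separately, and simply tally the remaining elements and append them at the end. The result has three parts: the C count, the O count, and the other elements.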
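To make the counting idea concrete, here is a minimal sketch of the subtraction for carbon. The sample string `ClCCl` (dichloromethane) and the variable names are my own choices for illustration, not taken from the original code.

```php
<?php
// 'ClCCl' is dichloromethane: one carbon, two chlorines.
$smiles = 'ClCCl';

// Every capital "C", including the "C" that belongs to "Cl".
$allC = substr_count($smiles, 'C');

// "C" followed by a lowercase letter, i.e. the start of a two-letter symbol.
$notC = preg_match_all('/C[a-z]/', $smiles);

// What is left should be the real carbon atoms.
$carbons = $allC - $notC;

echo "$allC $notC $carbons\n"; // prints "3 2 1"
```

The same subtraction works for oxygen with `/O[a-z]/`. Note that this treats any lowercase letter after `C` as part of an element symbol, so aromatic SMILES written with lowercase atoms would probably need separate handling.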


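Implementation: 1. The front end supplies the SMILES string, which has to follow the usual SMILES rules.

    2. PHP does the processing.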

    

    $Cnum = '';
    $Onum = '';
    $smiles_new = '';  // collects the two-letter elements; must start as an empty string
    //print($smiles . " original");
    // Strip bond symbols, dots, brackets and the ring-closure digit "1".
    $find = array("=", "#", ".", "1", "[", "]", "(", ")");
    $replace = array("");
    $smiles = str_replace($find, $replace, $smiles, $j);
    //print($j);
    // Two-letter element symbols to tally (common ones only).
    $ChemElement = array("Li","Be","Na","Mg","Al","Si","Cl","Br","Ca","Cr","Mn","Fe","Co","Ni","Cu","Zn","Ga","Ge","Ag","Au");
    foreach ($ChemElement as $value) {
        $k_x = substr_count($smiles, $value);
        if ($k_x > 0) {
            // A count of 1 is left implicit, as in a normal formula.
            $k_x = $k_x == 1 ? '' : $k_x;
            $smiles_new .= $value . $k_x;
        }
    }
    // Carbon: every capital "C", minus those that start a two-letter symbol (Cl, Ca, ...).
    $k_c = substr_count($smiles, 'C');
    $i_c = preg_match_all('/C[a-z]/m', $smiles);
    $j_c = $k_c - $i_c;
    //print('carbon count ' . $j_c);
    // Oxygen: same idea.
    $k_o = substr_count($smiles, 'O');
    $i_o = preg_match_all('/O[a-z]/m', $smiles);
    $j_o = $k_o - $i_o;
    //print('oxygen count ' . $j_o);
    if ($j_c > 0) { $j_c = $j_c == 1 ? '' : $j_c; $Cnum = 'C' . $j_c; }
    if ($j_o > 0) { $j_o = $j_o == 1 ? '' : $j_o; $Onum = 'O' . $j_o; }
    // Final search key: C part, O part, then the other elements.
    $smilesPara = $Cnum . $Onum . $smiles_new;
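
For comparison, here is a rough sketch of a different way to get the same kind of result: tokenizing the element symbols in one pass rather than counting and subtracting. This is not the method used above. The function name `smilesToFormula` is hypothetical, the element list is copied from the snippet above with "Gc" read as "Ge" (my assumption), and it assumes PHP 7 or later for the type hints. Unlike the snippet above, it also tallies single-letter symbols such as N or S. Implicit hydrogens and lowercase aromatic atoms are still not counted.

```php
<?php
// Hypothetical helper, not from the original post.
function smilesToFormula(string $smiles): string
{
    // Two-letter symbols come before the single-letter class so that
    // "Cl" is read as chlorine rather than as carbon followed by "l".
    $twoLetter = 'Li|Be|Na|Mg|Al|Si|Cl|Br|Ca|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|Ag|Au';
    preg_match_all('/' . $twoLetter . '|[A-Z]/', $smiles, $matches);

    // Tally how often each symbol occurs.
    $counts = array_count_values($matches[0]);

    // Carbon first, oxygen second, as in the snippet above.
    $formula = '';
    foreach (array('C', 'O') as $symbol) {
        if (isset($counts[$symbol])) {
            $n = $counts[$symbol];
            $formula .= $symbol . ($n > 1 ? $n : '');
            unset($counts[$symbol]);
        }
    }
    // Everything else, in order of first appearance.
    foreach ($counts as $symbol => $n) {
        $formula .= $symbol . ($n > 1 ? $n : '');
    }
    return $formula;
}

echo smilesToFormula('CC(=O)O'), "\n"; // C2O2
echo smilesToFormula('ClCCl'), "\n";   // CCl2
```

Because the regular expression skips anything that is not an element symbol, there is no need to strip bonds, brackets or ring digits beforehand.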
                     
 
                     
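Result: this handles molecular formulas in the ordinary sense well enough. The element list is admittedly incomplete, but I think covering the common ones is sufficient: the whole point is search, and uncommon substances are not in the chemical database anyway.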
 
As an aside, I'd recommend the Sphinx search engine for use with PHP; it works very well.