RDKit:计算不同分子或构象之间的RMSD

一、RDKit简介

RDKit是一款开源化学信息学与机器学习工具包,提供C++和Python的API接口。
RDKit官网http://www.rdkit.org/
RDKit的编译安装及Python(2.7)绑定见博文:http://www.aspirincode.com/2017/11/01/Linux-RDkit-%20install-update/
RMSD计算参考网站:http://www.rdkit.org/docs/Cookbook.html#rmsd-calculation-between-n-molecules

二、RMSD-RDKit

RMSD,Root Mean Square Deviation;给定一个分子结构的两种或者多种状态,
对结构中所有的粒子的偏移量求平方和再平均,开方,就是RMSD。

对于docking而言,如果有reference ligand(结合配体),计算对接配体的构象与结合配体之间的RMSD值,
对于判断docking准确性很重要。下面提供一个RDKit计算不同构象之间RMSD值的Python脚本。

RDKit-cd

三、运行环境

CentOS 7_x64位
RDKit+Python2.7

四、RDKit计算RMSD的Python脚本使用方法

RDKit-RMSD
MOLS-sdf
注:输入文件必须为sdf格式

五、RDKit计算RMSD的Python脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
#!/usr/bin/python
'''''
calculates RMSD differences between all structures in a file
@author: JP <jp@javaclass.co.uk>
'''
import os
import getopt
import sys
# rdkit imports
from rdkit import Chem
from rdkit.Chem import AllChem
'''''
Write contents of a string to file
'''
def write_contents(filename, contents):
# do some basic checking, could use assert strictly speaking
assert filename is not None, "filename cannot be None"
assert contents is not None, "contents cannot be None"
f = open(filename, "w")
f.write(contents)
f.close() # close the file
'''''
Write a list to a file
'''
def write_list_to_file(filename, list, line_sep = os.linesep):
# do some basic checking, could use assert strictly speaking
assert list is not None and len(list) > 0, "list cannot be None or empty"
write_contents(filename, line_sep.join(list))
'''''
Calculate RMSD spread
'''
def calculate_spread(molecules_file):
assert os.path.isfile(molecules_file), "File %s does not exist!" % molecules
# get an iterator
mols = Chem.SDMolSupplier(molecules_file)
spread_values = []
# how many molecules do we have in our file
mol_count = len(mols)
# we are going to compare each molecule with every other molecule
# typical n choose k scenario (n choose 2)
# where number of combinations is given by (n!) / k!(n-k)! ; if my maths isn't too rusty
for i in range(mol_count - 1):
for j in range(i+1, mol_count):
# show something is being done ... because for large mol_count this will take some time
print("Aligning molecule #%d with molecule #%d (%d molecules in all)" % (i, j, mol_count))
# calculate RMSD and store in an array
# unlike AlignMol this takes care of symmetry
spread_values.append(str(AllChem.GetBestRMS(mols[i], mols[j])))
# return that array
return spread_values
def main():
try:
# the options are as follows:
# f - the actual structure file
opts, args = getopt.getopt(sys.argv[1:], "vf:o:")
except getopt.GetoptError, err:
# print help information and exit:
print(str(err)) # will print something like "option -a not recognized"
sys.exit(401)
# DEFAULTS
molecules_file = None
output_file = None
for opt, arg in opts:
if opt == "-v":
print("RMSD Spread 1.1")
sys.exit()
elif opt == "-f":
molecules_file = arg
elif opt == "-o":
output_file = arg
else:
assert False, "Unhandled option: " + opt
# assert the following - not the cleanest way to do this but this will work
assert molecules_file is not None, "file containing molecules must be specified, add -f to command line arguments"
assert output_file is not None, "output file must be specified, add -o to command line arguments"
# get the RMSD spread values
spread_values = calculate_spread(molecules_file)
# write them to file
write_list_to_file(output_file, spread_values)
if __name__ == "__main__":
main()

推荐文章