Visualization
This tutorial highlights the main visualization features of Datamol.
import datamol as dm
First let's get a dataset.
data = dm.read_csv(
    "https://raw.githubusercontent.com/rdkit/rdkit/master/Data/NCI/first_200.tpsa.csv",
    comment="#",
    header=None,
)
data.columns = ["smiles", "tpsa"]
# Create a mol column
with dm.without_rdkit_log():
    data["mol"] = data["smiles"].apply(dm.to_mol)
# Patch the dataframe to render the molecules in it
dm.render_mol_df(data)
data.iloc[0]["mol"]
Now let's cluster the molecules and keep a single cluster, here the one at index 1 of the returned list.
cluster_indices, cluster_mols = dm.cluster_mols(data["mol"].dropna().tolist(), cutoff=0.7)
mols = cluster_mols[1]
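The cell above selects a cluster by its position in the returned list. If you would rather select by size, a minimal sketch is shown below. It assumes only that the clusters form a sequence of sequences; `largest_cluster` is a hypothetical helper, not part of Datamol, and the toy data uses strings in place of molecule objects.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def largest_cluster(clusters: Sequence[Sequence[T]]) -> List[T]:
    """Return the members of the most populated cluster.

    Hypothetical helper: on ties, the cluster that appears first is returned.
    """
    if len(clusters) == 0:
        raise ValueError("no clusters to choose from")
    return list(max(clusters, key=len))


# Toy stand-in for `cluster_mols`: strings instead of molecule objects.
toy_clusters = [["a", "b"], ["c", "d", "e"], ["f"]]
print(largest_cluster(toy_clusters))
```

With the tutorial's data, the same call would be `mols = largest_cluster(cluster_mols)`.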
Display the molecules of the cluster while aligning them using their maximum common substructure (MCS). This can be done with a simple boolean flag in dm.to_image().
dm.to_image(mols, mol_size=(300, 200), align=True)
Lasso Highlighting
The code below shows how to use the lasso highlight function. Its signature is:
def lasso_highlight_image(
    target_molecule: Union[str, dm.Mol],
    search_molecules: Union[str, List[str], dm.Mol, List[dm.Mol]],
    mol_size: Optional[Tuple[int, int]] = (300, 300)
) -> Image:
The mol_size argument sets the size of the returned image. The target molecule is accepted as a SMILES string or a mol object, and each substructure to search for is accepted as a SMARTS string or a mol object. The calls below also pass keyword arguments (use_svg, draw_mols_same_scale, legends) that the signature above does not list.
Image output is difficult to test automatically, so the known edge cases are listed here with a brief description of each.
One edge case is that at most 6 substructures can be searched for at once, unless more colors are added to the code.
import datamol as dm
smi = "C1=CN=C2C(=N1)C(=O)NC(=N2)NC(=O)CCC(=O)O"
smarts_list = ["[#6](=O)[#7]", "C(=O)", "NC(=O)CCC(=O)O"]
dm.lasso_highlight_image(smi, smarts_list, mol_size=(500, 400), use_svg=False)
Alternatively, you may have only one substructure in mind, or you may pass a list of molecules as input.
smis = ["CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]", "C1=CN=C2C(=N1)C(=O)NC(=N2)NC(=O)CCC(=O)O", "CC(N)Cc1c[nH]c2ccc3c(c12)CCCO3", "c1ccccc1"]
smarts_list = ["[#6](-,=[O;H0])", "[a]1[a][c][c][c][a]1"]
dm.lasso_highlight_image(smis, smarts_list, draw_mols_same_scale=True, legends=["Mol1", "Mol2", "Mol3", "Mol4"], use_svg=True)
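Because of the six-substructure limit mentioned earlier, a longer list of SMARTS patterns has to be split before it is drawn. The sketch below shows one way to do that in plain Python. `batch_patterns` and `MAX_PATTERNS` are hypothetical names introduced for illustration, and the patterns are arbitrary examples.

```python
from typing import Iterator, List, Sequence

# Limit stated in this tutorial; adjust it if more colors are added to the code.
MAX_PATTERNS = 6


def batch_patterns(
    patterns: Sequence[str], size: int = MAX_PATTERNS
) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most `size` patterns each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(patterns), size):
        yield list(patterns[start:start + size])


patterns = ["C(=O)", "[#7]", "c1ccccc1", "[OH]", "C#N", "[F,Cl,Br]", "S", "P"]
for batch in batch_patterns(patterns):
    # Each batch is small enough to be passed to one highlight call.
    print(len(batch), batch)
```

Each batch could then be passed to its own `dm.lasso_highlight_image(smi, batch, ...)` call, producing one image per batch.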
CircleGrid
CircleGrid visualizes a set of molecules as several concentric rings around a center structure. This representation was designed to:
- highlight the neighborhood of an input molecule at increasing distances
- show molecules derived from a starting molecule or core
import datamol as dm
import pandas as pd
import json
buff = '{"SMILES":{"4":"CC(Cc1c[nH]c2c1c3c(cc2)OCCC3)N","7":"COC1Cc2ccccc2C3(O1)CCN(CC3)Cc4ccccc4","9":"Cc1c(c(=O)n2ccccc2n1)CCN3CCc4c(c5cccc(c5o4)Cl)C3","15":"Cc1c(c(=O)n2cc(c(cc2n1)OC)OC)CCN3CCc4c(c5ccccc5o4)C3","17":"Cc1c(c(=O)n2ccccc2n1)CCCN3CCc4c(c5ccccc5o4)C3","20":"Cc1c(c(=O)n2cc(cc(c2n1)Cl)Cl)CCN3CCc4c(c5ccccc5o4)C3","27":"Cc1ccc2nc(c(c(=O)n2c1)CCN3CCc4c(c5ccccc5o4)C3)C","28":"Cc1cccc2n1c(=O)c(c(n2)C)CCN3CCc4c(c5ccccc5o4)C3","36":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)Br","38":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)C(F)(F)F","40":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)C#N"},"Name":{"4":"CHEMBL133455","7":"CHEMBL141209","9":"CHEMBL164612","15":"CHEMBL162058","17":"CHEMBL165776","20":"CHEMBL163247","27":"CHEMBL162370","28":"CHEMBL163190","36":"CHEMBL359510","38":"CHEMBL181632","40":"CHEMBL181099"},"pIC50":{"4":4.72,"7":5.52,"9":9.05,"15":8.45,"17":7.99,"20":9.23,"27":9.46,"28":9.54,"36":7.2,"38":7.22,"40":6.82}}'
df = pd.DataFrame(json.loads(buff))
df['mol'] = df.SMILES.apply(dm.to_mol)
df.head()
 | SMILES | Name | pIC50 | mol |
---|---|---|---|---|
4 | CC(Cc1c[nH]c2c1c3c(cc2)OCCC3)N | CHEMBL133455 | 4.72 | <rdkit.Chem.rdchem.Mol object at 0x1674e5970> |
7 | COC1Cc2ccccc2C3(O1)CCN(CC3)Cc4ccccc4 | CHEMBL141209 | 5.52 | <rdkit.Chem.rdchem.Mol object at 0x1674e59e0> |
9 | Cc1c(c(=O)n2ccccc2n1)CCN3CCc4c(c5cccc(c5o4)Cl)C3 | CHEMBL164612 | 9.05 | <rdkit.Chem.rdchem.Mol object at 0x1674e5a50> |
15 | Cc1c(c(=O)n2cc(c(cc2n1)OC)OC)CCN3CCc4c(c5ccccc... | CHEMBL162058 | 8.45 | <rdkit.Chem.rdchem.Mol object at 0x1674e5ac0> |
17 | Cc1c(c(=O)n2ccccc2n1)CCCN3CCc4c(c5ccccc5o4)C3 | CHEMBL165776 | 7.99 | <rdkit.Chem.rdchem.Mol object at 0x1674e5b30> |
The circle grid function takes as input a center molecule and a list of lists of molecules. Each inner list contains all the molecules placed at the corresponding ring level.
See figure below:
Note that because we are using the FlexiMode of the new RDKit drawing framework, we cannot directly control the size of the molecule rendering. Instead, we can scale the ring molecules up or down relative to the center molecule.
# let's define the activity dict
# We map to each molecule (or molecule id) a dictionary of properties
activity_dict = {}
for mol, pIC50 in df[["mol", "pIC50"]].values:
    activity_dict[mol] = {"pIC50": pIC50, "cLogP": dm.descriptors.clogp(mol)}
# let's put mol 0 in the center
center_mol = df.mol.values[0]
# let's put mol 1-3 in the first ring
# let's put mol 4-8 in the second ring
ring_mols = [df.mol.values[1:4], df.mol.values[4:9]]
out = dm.viz.circle_grid(
    center_mol,
    ring_mols,
    act_mapper=activity_dict,
    legend="My Beautiful Circle Grid",
    ring_color=(0.8, 0.8, 0.8, 0.5),  # set to None to remove the ring
    margin=50,  # set a reasonable margin
    ring_scaler=0.7,  # scale down the size of molecule in the ring compared to the center molecule
    align=None,  # controls alignment of the molecules to the center molecule (None here)
    legendFontSize=16,  # rdkit drawing options
)
out
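The center-plus-rings layout built by hand above (one center item, then slices `1:4` and `4:9`) can also be derived from a list of ring sizes. The following is a minimal sketch in plain Python. `split_into_rings` is a hypothetical helper, not a Datamol function, and integers stand in for molecules.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_into_rings(
    items: Sequence[T], ring_sizes: Sequence[int]
) -> Tuple[T, List[List[T]]]:
    """Use the first item as the center and fill the rings in order.

    Items left over after the last ring is filled are ignored.
    """
    if len(items) == 0:
        raise ValueError("need at least a center item")
    if any(size < 1 for size in ring_sizes):
        raise ValueError("every ring must hold at least one item")
    rest = list(items[1:])
    if sum(ring_sizes) > len(rest):
        raise ValueError("not enough items to fill the requested rings")
    rings: List[List[T]] = []
    start = 0
    for size in ring_sizes:
        rings.append(rest[start:start + size])
        start += size
    return items[0], rings


# Integers stand in for the 11 molecules of the dataframe.
center, rings = split_into_rings(list(range(11)), [3, 5])
print(center, rings)
```

With the tutorial's data, `split_into_rings(list(df.mol.values), [3, 5])` would reproduce the same center molecule and the same two rings as the manual slicing.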
We can also highlight some atoms and bonds, either in the ring molecules or in the center molecule.
my_query = dm.from_smarts("CCN")
atom_matches, bond_matches = dm.substructure_matching_bonds(center_mol, my_query)
# we must flatten the list of lists
bond_matches = [item for sublist in bond_matches for item in sublist]
atom_matches = [item for sublist in atom_matches for item in sublist]
dm.viz.circle_grid(
    center_mol,
    ring_mols,
    act_mapper=activity_dict,
    legend="My Beautiful Circle Grid",
    ring_color=(0.8, 0.8, 0.8, 0.5),  # set to None to remove the ring
    margin=50,  # set a reasonable margin
    center_mol_highlight_atoms=atom_matches,  # highlight the atoms in the center molecule
    center_mol_highlight_bonds=bond_matches,  # highlight the bonds in the center molecule
    ring_scaler=0.7,  # scale down the size of molecule in the ring compared to the center molecule
    align=None,  # controls alignment of the molecules to the center molecule (None here)
    legendFontSize=16,  # rdkit drawing options
)
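When a query matches a molecule several times, the matches may overlap, so the flattened lists can contain the same index more than once. The sketch below flattens and de-duplicates in one step. `flatten_unique` is a hypothetical helper, and whether the drawing code is affected by repeated indices is not something this tutorial states, so treat the de-duplication as a precaution rather than a requirement.

```python
from itertools import chain
from typing import Iterable, List


def flatten_unique(matches: Iterable[Iterable[int]]) -> List[int]:
    """Flatten per-match index lists into one sorted list without duplicates."""
    return sorted(set(chain.from_iterable(matches)))


# Two overlapping toy matches sharing atom indices 1 and 2.
toy_atom_matches = [(0, 1, 2), (1, 2, 3)]
print(flatten_unique(toy_atom_matches))  # [0, 1, 2, 3]
```

In the example above, this helper would be applied to the nested `atom_matches` and `bond_matches` returned by `dm.substructure_matching_bonds`, in place of the two list comprehensions.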