Visualization

This tutorial highlights the main visualization features of Datamol.

In [1]:

import datamol as dm

First let's get a dataset.

In [2]:

data = dm.read_csv(
    "https://raw.githubusercontent.com/rdkit/rdkit/master/Data/NCI/first_200.tpsa.csv",
    comment="#",
    header=None,
)
data.columns = ["smiles", "tpsa"]

# Create a mol column
with dm.without_rdkit_log():
    data["mol"] = data["smiles"].apply(dm.to_mol)

# Patch the dataframe to render the molecules in it
dm.render_mol_df(data)

data.iloc[0]["mol"]

Out[2]:

Now let's cluster the molecules and keep a single cluster (here, the one at index 1).

In [3]:

cluster_indices, cluster_mols = dm.cluster_mols(data["mol"].dropna().tolist(), cutoff=0.7)
mols = cluster_mols[1]
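
The cluster to keep is selected by its position in the returned sequence. As a minimal sketch, using toy SMILES strings in place of real molecules and assuming the clusters come back as a sequence of lists like cluster_mols above, a cluster can also be picked by size instead of by a fixed index. The helper name largest_cluster is hypothetical and not part of Datamol.

```python
from typing import List, Sequence


def largest_cluster(clusters: Sequence[List[str]]) -> List[str]:
    """Return the cluster with the most members (the first one wins on ties)."""
    if not clusters:
        raise ValueError("no clusters to choose from")
    return max(clusters, key=len)


# Toy clusters of SMILES strings standing in for lists of molecules.
toy_clusters = [["CCO"], ["c1ccccc1", "Cc1ccccc1", "Oc1ccccc1"], ["CCN", "CCCN"]]
picked = largest_cluster(toy_clusters)
```

Picking by size makes the selection independent of the order in which the clusters are returned.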

Display the molecules of the cluster while aligning them on their maximum common substructure (MCS). This only requires setting the boolean flag align=True in dm.to_image().

In [4]:

dm.to_image(mols, mol_size=(300, 200), align=True)

Out[4]:

Lasso Highlighting

The code below shows how to use the lasso highlight function. Reduced to its main arguments, its signature is:

def lasso_highlight_image(
    target_molecule: Union[str, dm.Mol],
    search_molecules: Union[str, List[str], dm.Mol, List[dm.Mol]],
    mol_size: Optional[Tuple[int, int]] = (300, 300),
) -> Image:

mol_size is the size of the returned image. The target molecule can be given as a SMILES string or a mol object, and the substructures to search for as SMARTS strings or mol objects.

Because image output is difficult to test automatically, the edge cases are listed here with a brief description of each.

One edge case: at most 6 substructures can be searched at once unless more colors are added to the code.
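
If more than six patterns are needed, one possible workaround is to split the pattern list into batches and draw one image per batch. The sketch below covers only the batching step. MAX_PATTERNS and batch_patterns are hypothetical names, and the limit of 6 is taken from the note above rather than read from Datamol.

```python
from typing import Iterator, List

# Assumption: limit taken from the note above, not read from datamol itself.
MAX_PATTERNS = 6


def batch_patterns(patterns: List[str], size: int = MAX_PATTERNS) -> Iterator[List[str]]:
    """Yield consecutive slices of `patterns` holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(patterns), size):
        yield patterns[start:start + size]


# Eight toy SMARTS patterns, split into one batch of six and one of two.
smarts = ["C", "N", "O", "S", "F", "Cl", "Br", "C=O"]
batches = list(batch_patterns(smarts))
```

Each batch could then be passed as the list of search patterns in a separate call to dm.lasso_highlight_image().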

In [5]:

import datamol as dm

smi = "C1=CN=C2C(=N1)C(=O)NC(=N2)NC(=O)CCC(=O)O"
smarts_list = ["[#6](=O)[#7]", "C(=O)", "NC(=O)CCC(=O)O"]

dm.lasso_highlight_image(smi, smarts_list, mol_size=(500, 400), use_svg=False)

Out[5]:

Alternatively, you may have only one substructure in mind, or you may want to pass a list of molecules as input.

In [6]:

smis = [
    "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]",
    "C1=CN=C2C(=N1)C(=O)NC(=N2)NC(=O)CCC(=O)O",
    "CC(N)Cc1c[nH]c2ccc3c(c12)CCCO3",
    "c1ccccc1",
]
smarts_list = ["[#6](-,=[O;H0])", "[a]1[a][c][c][c][a]1"]

dm.lasso_highlight_image(
    smis,
    smarts_list,
    draw_mols_same_scale=True,
    legends=["Mol1", "Mol2", "Mol3", "Mol4"],
    use_svg=True,
)

Out[6]:

CircleGrid

CircleGrid allows the visualization of a set of molecules using several concentric rings around a center structure. This representation was designed to:

- highlight the neighborhood of an input molecule at increasing distances
- show molecules derived from a starting molecule or core

In [7]:

import datamol as dm
import pandas as pd
import json

buff = '{"SMILES":{"4":"CC(Cc1c[nH]c2c1c3c(cc2)OCCC3)N","7":"COC1Cc2ccccc2C3(O1)CCN(CC3)Cc4ccccc4","9":"Cc1c(c(=O)n2ccccc2n1)CCN3CCc4c(c5cccc(c5o4)Cl)C3","15":"Cc1c(c(=O)n2cc(c(cc2n1)OC)OC)CCN3CCc4c(c5ccccc5o4)C3","17":"Cc1c(c(=O)n2ccccc2n1)CCCN3CCc4c(c5ccccc5o4)C3","20":"Cc1c(c(=O)n2cc(cc(c2n1)Cl)Cl)CCN3CCc4c(c5ccccc5o4)C3","27":"Cc1ccc2nc(c(c(=O)n2c1)CCN3CCc4c(c5ccccc5o4)C3)C","28":"Cc1cccc2n1c(=O)c(c(n2)C)CCN3CCc4c(c5ccccc5o4)C3","36":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)Br","38":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)C(F)(F)F","40":"CN(C)c1c2ccccc2nc(n1)NC3CCC(CC3)NC(=O)c4cccc(c4)C#N"},"Name":{"4":"CHEMBL133455","7":"CHEMBL141209","9":"CHEMBL164612","15":"CHEMBL162058","17":"CHEMBL165776","20":"CHEMBL163247","27":"CHEMBL162370","28":"CHEMBL163190","36":"CHEMBL359510","38":"CHEMBL181632","40":"CHEMBL181099"},"pIC50":{"4":4.72,"7":5.52,"9":9.05,"15":8.45,"17":7.99,"20":9.23,"27":9.46,"28":9.54,"36":7.2,"38":7.22,"40":6.82}}'
df = pd.DataFrame(json.loads(buff))
df['mol'] = df.SMILES.apply(dm.to_mol)

In [8]:

df.head()

Out[8]:

	SMILES	Name	pIC50	mol
4	CC(Cc1c[nH]c2c1c3c(cc2)OCCC3)N	CHEMBL133455	4.72	<rdkit.Chem.rdchem.Mol object at 0x1674e5970>
7	COC1Cc2ccccc2C3(O1)CCN(CC3)Cc4ccccc4	CHEMBL141209	5.52	<rdkit.Chem.rdchem.Mol object at 0x1674e59e0>
9	Cc1c(c(=O)n2ccccc2n1)CCN3CCc4c(c5cccc(c5o4)Cl)C3	CHEMBL164612	9.05	<rdkit.Chem.rdchem.Mol object at 0x1674e5a50>
15	Cc1c(c(=O)n2cc(c(cc2n1)OC)OC)CCN3CCc4c(c5ccccc...	CHEMBL162058	8.45	<rdkit.Chem.rdchem.Mol object at 0x1674e5ac0>
17	Cc1c(c(=O)n2ccccc2n1)CCCN3CCc4c(c5ccccc5o4)C3	CHEMBL165776	7.99	<rdkit.Chem.rdchem.Mol object at 0x1674e5b30>

The circle grid function takes as input a center molecule and a list of lists of molecules. Each inner list contains all the molecules placed at the corresponding ring level.

See figure below:

Note that because we are using the FlexiMode of the new RDKit drawing framework, we cannot control the size of the molecule rendering directly. Instead, we can scale the ring molecules up or down relative to the center molecule.
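
To build the list of lists that the circle grid function expects, the neighbors can be grouped by their distance to the center molecule. The sketch below illustrates one way to do this on toy numbers. The helper assign_rings, the distance values and the thresholds are all hypothetical; how the distances are computed (for example from fingerprints) is left open.

```python
from typing import List, Sequence


def assign_rings(distances: Sequence[float], thresholds: Sequence[float]) -> List[List[int]]:
    """Group item indices into rings by increasing distance.

    `thresholds` must be sorted in ascending order. Ring k receives the
    indices whose distance is <= thresholds[k] and greater than the previous
    threshold. Items beyond the last threshold are left out.
    """
    rings: List[List[int]] = [[] for _ in thresholds]
    for idx, dist in enumerate(distances):
        for level, limit in enumerate(thresholds):
            if dist <= limit:
                rings[level].append(idx)
                break
    return rings


# Toy distances from a center molecule to five neighbors.
distances = [0.15, 0.45, 0.25, 0.90, 0.55]
rings = assign_rings(distances, thresholds=[0.3, 0.6])
```

The resulting index lists can then be used to select the molecules of each ring, for instance with [[mols[i] for i in ring] for ring in rings].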

In [11]:

# let's define the activity dict
# We map to each molecule (or molecule id) a dictionary of properties
activity_dict = {}
for mol, pIC50 in df[["mol","pIC50"]].values:
    activity_dict[mol] = {"pIC50" : pIC50, "cLogP" : dm.descriptors.clogp(mol)}

In [13]:

# let's put mol 0 in the center
center_mol = df.mol.values[0]
# let's put mol 1-3 in the first ring
# let's put mol 4-8 in the second ring
ring_mols = [df.mol.values[1:4], df.mol.values[4:9]]

In [28]:

out = dm.viz.circle_grid(
    center_mol,
    ring_mols,
    act_mapper=activity_dict,
    legend="My Beautiful Circle Grid",
    ring_color=(0.8, 0.8, 0.8, 0.5),  # set to None to remove the ring
    margin=50,  # set a reasonable margin
    ring_scaler=0.7,  # scale down the size of molecule in the ring compared to the center molecule
    align=None,  # align all the molecules to the center molecule
    legendFontSize=16,  # rdkit drawing options
)

In [29]:

out

Out[29]:

We can also highlight some atoms and bonds in either the ring molecules or the center molecule.

In [35]:

my_query = dm.from_smarts("CCN")
atom_matches, bond_matches = dm.substructure_matching_bonds(center_mol, my_query)
# we must flatten the list of lists
bond_matches = [item for sublist in bond_matches for item in sublist]
atom_matches = [item for sublist in atom_matches for item in sublist]
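
When a query matches several times, the matches may share atoms or bonds, so the flattened list can contain repeated indices. The sketch below shows a variant that flattens and removes repeats in one step, using toy index tuples. The helper flatten_unique is hypothetical, and the tutorial does not state whether repeated indices affect the highlighting.

```python
from itertools import chain
from typing import Iterable, List, Sequence


def flatten_unique(matches: Iterable[Sequence[int]]) -> List[int]:
    """Flatten a list of index tuples and drop repeated indices."""
    return sorted(set(chain.from_iterable(matches)))


# Toy matches: two overlapping hits sharing atom index 3.
toy_atom_matches = [(1, 2, 3), (3, 4, 5)]
flat = flatten_unique(toy_atom_matches)
```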

In [36]:

dm.viz.circle_grid(
    center_mol,
    ring_mols,
    act_mapper=activity_dict,
    legend="My Beautiful Circle Grid",
    ring_color=(0.8, 0.8, 0.8, 0.5),  # set to None to remove the ring
    margin=50,  # set a reasonable margin
    center_mol_highlight_atoms=atom_matches,  # highlight the atoms in the center molecule
    center_mol_highlight_bonds=bond_matches,  # highlight the bonds in the center molecule
    ring_scaler=0.7,  # scale down the size of molecule in the ring compared to the center molecule
    align=None,  # align all the molecules to the center molecule
    legendFontSize=16,  # rdkit drawing options
)

Out[36]: