Visualization

This tutorial highlights the main visualization features of Datamol.

In [1]:

import datamol as dm

First let's get a dataset.

In [20]:

data = dm.read_csv(
    "https://raw.githubusercontent.com/rdkit/rdkit/master/Data/NCI/first_200.tpsa.csv",
    comment="#",
    header=None,
)
data.columns = ["smiles", "tpsa"]

# Create a mol column
with dm.without_rdkit_log():
    data["mol"] = data["smiles"].apply(dm.to_mol)

# Patch the dataframe to render the molecules in it
dm.render_mol_df(data)

data.iloc[0]["mol"]

Out[20]:

Now let's cluster the molecules and keep a single cluster. Note that the code below selects the cluster at index 1, which is the second cluster returned.

In [21]:

cluster_indices, cluster_mols = dm.cluster_mols(data["mol"].dropna().tolist(), cutoff=0.7)
mols = cluster_mols[1]
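
The cell above picks a cluster by position. As a minimal sketch, assuming `cluster_mols` is a sequence of clusters (each one a list of molecules), as the indexing above suggests, the largest cluster could be selected instead. The helper name `largest_cluster` is hypothetical and not part of Datamol, and plain SMILES strings stand in for molecules so the snippet runs without RDKit.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def largest_cluster(clusters: Sequence[Sequence[T]]) -> List[T]:
    """Return the cluster with the most members (the first one wins on ties)."""
    if not clusters:
        raise ValueError("no clusters to choose from")
    return list(max(clusters, key=len))


# Toy stand-in for the `cluster_mols` value produced above (hypothetical data).
toy_clusters = [["CCO"], ["c1ccccc1", "Cc1ccccc1", "Oc1ccccc1"], ["CCN", "CCCN"]]
picked = largest_cluster(toy_clusters)
```

With real data, `mols = largest_cluster(cluster_mols)` would replace the fixed index.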

Display the molecules of the cluster, aligned on their maximum common substructure (MCS). This only requires setting a boolean flag, align=True, in dm.to_image().

In [22]:

dm.to_image(mols, mol_size=(300, 200), align=True, use_svg=False)

Out[22]:

Lasso Highlighting

The cells below show how to use the lasso highlight function. Its signature is:

def lasso_highlight_image(
    target_molecule: Union[str, dm.Mol],
    search_molecules: Union[str, List[str], dm.Mol, List[dm.Mol]],
    mol_size: Optional[Tuple[int, int]] = (300, 300),
) -> Image:

mol_size is the size of the returned image. The target molecule can be given as a SMILES string or a mol object, and each substructure to search for as a SMARTS string or a mol object. The first example below also passes a use_svg flag, which the signature shown here does not list.

Because image output is hard to cover with automated tests, the edge cases are listed here, each with a brief description.

One edge case: a single call can highlight at most 6 substructures. Going beyond that requires adding more colors to the code.
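
One way to stay within that limit is to split a longer list of queries into batches. The following is a minimal sketch, not part of Datamol: the helper `batched_queries` and the constant `MAX_SUBSTRUCTURES` are hypothetical names, and the value 6 is taken from the text above, so check it against the version you have installed.

```python
from typing import Iterator, List, Sequence

# Limit stated in the text above; treat it as an assumption for your version.
MAX_SUBSTRUCTURES = 6


def batched_queries(
    queries: Sequence[str], size: int = MAX_SUBSTRUCTURES
) -> Iterator[List[str]]:
    """Yield consecutive batches holding at most `size` queries each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(queries), size):
        yield list(queries[start:start + size])


# Eight SMARTS queries (illustrative), two more than the stated limit.
queries = ["CONN", "N#CC~CO", "C=CON", "CONNCN", "C#N", "CF", "CCl", "CO"]
batches = list(batched_queries(queries))
```

Each batch could then be passed to dm.lasso_highlight_image() in its own call, giving one image per batch.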

In [6]:

import datamol as dm

smi = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
smarts_list = ["CONN", "N#CC~CO", "C=CON", "CONNCN"]

dm.lasso_highlight_image(smi, smarts_list, (400, 400), use_svg=True)

Out[6]:

Alternatively, you may have only one substructure in mind, in which case you can pass a single SMARTS string:

In [7]:

smi = "CO[C@@H](O)C1=C(O[C@H](F)Cl)C(C#N)=C1ONNC[NH3+]"
smarts_list = "CONN"

dm.lasso_highlight_image(smi, smarts_list, (300, 300))
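
The two calls above pass either a list of SMARTS strings or a single string. If your own code sits in front of such a call and needs to treat both cases uniformly (for example, to count the queries), a small normalizing helper may be useful. This is a sketch only: `as_query_list` is a hypothetical name, not a Datamol function, and it handles strings but not mol objects.

```python
from typing import List, Sequence, Union


def as_query_list(queries: Union[str, Sequence[str]]) -> List[str]:
    """Wrap a single SMARTS string in a list; copy a sequence of strings as-is."""
    if isinstance(queries, str):
        # A bare string must be wrapped, otherwise list() would split it into characters.
        return [queries]
    return list(queries)


single = as_query_list("CONN")
several = as_query_list(["CONN", "C=CON"])
```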

Out[7]:
