Fragment and Scaffold
In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
import datamol as dm
dm.disable_rdkit_log()
Fragmentation
The fragmentation methods implemented in datamol return a set of fragments that covers the molecule, and these fragments may overlap. This is different from breaking the molecule down into non-overlapping blocks.
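The difference between a covering fragment set and a non-overlapping breakdown can be sketched with plain Python sets. The atom indices below are hypothetical toy data for a 6-atom molecule, and `covers` and `is_partition` are illustrative helpers, not part of datamol.

```python
from itertools import combinations


def covers(fragments, n_atoms):
    """True if every atom index 0..n_atoms-1 appears in at least one fragment."""
    return set().union(*fragments) == set(range(n_atoms))


def is_partition(fragments, n_atoms):
    """True if the fragments cover all atoms and no two fragments share an atom."""
    disjoint = all(a.isdisjoint(b) for a, b in combinations(fragments, 2))
    return disjoint and covers(fragments, n_atoms)


# Hypothetical atom-index sets (not produced by datamol).
coverage_set = [{0, 1, 2}, {2, 3, 4}, {3, 4, 5}]  # overlapping fragments
partition_set = [{0, 1}, {2, 3}, {4, 5}]          # non-overlapping blocks

print(covers(coverage_set, 6), is_partition(coverage_set, 6))
print(covers(partition_set, 6), is_partition(partition_set, 6))
```

Both sets cover the molecule, but only the second one is a partition.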
In the following, let's fragment a molecule using multiple methods.
In [3]:
smiles = "CCCOCc1cc(c2ncccc2)ccc1"
mol = dm.to_mol(smiles)
mol
Out[3]:
BRICS
In [4]:
frags = dm.fragment.brics(mol)
dm.viz.to_image(frags, n_cols=6)
Out[4]:
FraggleSim
In [5]:
frags = dm.fragment.frag(mol)
dm.viz.to_image(frags, n_cols=6)
Out[5]:
Recap
In [6]:
frags = dm.fragment.recap(mol)
dm.viz.to_image(frags, n_cols=6)
Out[6]:
Any break
This method tries BRICS first and falls back to generating all possible fragmentations if BRICS does not work.
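The "try one strategy, then fall back" pattern can be sketched as below. This is only an illustration of the control flow: `fragment_with_fallback`, `primary`, and `exhaustive` are hypothetical names, the stand-in functions work on plain strings, and the assumption that "does not work" means "raises or returns no fragments" is not taken from the datamol source.

```python
def fragment_with_fallback(mol, primary, fallback):
    """Return the primary strategy's fragments, or the fallback's if primary fails or is empty."""
    try:
        frags = list(primary(mol))
    except Exception:
        frags = []
    if frags:
        return frags
    return list(fallback(mol))


# Hypothetical stand-ins that operate on strings instead of molecules.
def primary(mol):
    # Pretend the rule-based method only knows how to cut at "O".
    return mol.split("O") if "O" in mol else []


def exhaustive(mol):
    # Pretend fallback: cut between every pair of adjacent characters.
    return [mol[:i] for i in range(1, len(mol))] + [mol[i:] for i in range(1, len(mol))]


print(fragment_with_fallback("CCOCC", primary, exhaustive))
print(fragment_with_fallback("CCC", primary, exhaustive))
```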
In [7]:
frags = dm.fragment.anybreak(mol)
dm.viz.to_image(frags, n_cols=6)
Out[7]:
Scaffold
Get the scaffolds and their attachment points from a list of molecules, which makes it possible to build molecular series.
In [8]:
# Get some mols
data = dm.data.freesolv()
smiles = data["smiles"].iloc[:].tolist()
mols = [dm.to_mol(s) for s in smiles]
scaffolds, scf2infos, scf2groups = dm.scaffold.fuzzy_scaffolding(mols)
list(scaffolds)[:4]
Out[8]:
['c1cc2c([*:4])c([*:3])c([*:2])cc2cc1[*:5]', 'C(=C1c2ccccc2CCc2ccccc21)[*:1]', 'c1c([*:7])c2c(c([*:8])c1[*:9])Cc1c(c([*:1])c([*:2])c([*:3])c1[*:4])C2', 'CCc1cc([*:5])cc2cc([*:2])c([*:3])c([*:4])c12']
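The attachment points appear in the scaffold SMILES above as labeled dummy atoms of the form `[*:n]`. As a minimal sketch, the labels can be extracted with a regular expression from the standard library; `attachment_labels` is an illustrative helper and not a datamol function.

```python
import re

# The four scaffold SMILES shown in the output above.
scaffold_smiles_list = [
    "c1cc2c([*:4])c([*:3])c([*:2])cc2cc1[*:5]",
    "C(=C1c2ccccc2CCc2ccccc21)[*:1]",
    "c1c([*:7])c2c(c([*:8])c1[*:9])Cc1c(c([*:1])c([*:2])c([*:3])c1[*:4])C2",
    "CCc1cc([*:5])cc2cc([*:2])c([*:3])c([*:4])c12",
]

# Matches a labeled dummy atom such as "[*:4]" and captures the label.
ATTACHMENT = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(scaffold_smiles):
    """Return the sorted attachment-point labels found in a scaffold SMILES."""
    return sorted(int(n) for n in ATTACHMENT.findall(scaffold_smiles))


for s in scaffold_smiles_list:
    labels = attachment_labels(s)
    print(len(labels), labels, s)
```

Counting the labels gives a quick view of how many substitution sites each scaffold exposes.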
In [9]:
sfs = [dm.to_mol(s) for s in list(scaffolds)]
dm.viz.to_image(sfs, n_cols=6)
Out[9]:
Assembling
Assemble fragments to create new molecules.
In [10]:
# Get the fragment set of a molecule
smiles = "CCCOCc1cc(c2ncccc2)ccc1"
mol = dm.to_mol(smiles)
frags = dm.fragment.brics(mol)
# Limit the number of fragments to work with because
# assembling is computationally intensive.
frags = frags[:3]
# Assemble 8 molecules from the list of fragments
mols = list(dm.fragment.assemble_fragment_order(frags, max_n_mols=8))
dm.viz.to_image(mols)
Out[10]:
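To give a rough intuition for why the fragment list is truncated before assembling, the sketch below counts the possible orderings of a fragment list. It only illustrates combinatorial growth with placeholder labels; it is an assumption-laden simplification and does not reproduce how `assemble_fragment_order` enumerates candidates.

```python
import math
from itertools import permutations

# Placeholder labels standing in for fragments (not real SMILES).
fragments = ["A", "B", "C"]

# Every ordering of three fragments.
orderings = list(permutations(fragments))
print(len(orderings))

# The number of orderings grows factorially with the number of fragments.
growth = {n: math.factorial(n) for n in (3, 5, 8)}
print(growth)
```

Even before considering multiple attachment points per fragment, eight fragments already admit tens of thousands of orderings, which is why capping both the input fragments and the number of generated molecules keeps the cell fast.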
Decomposition
It is also possible to break a molecule based on a set of chemical transformations and get the non-overlapping fragments together with how they are linked.
In [11]:
dm.fragment.break_mol(mol, randomize=False, mode="brics", returnTree=True)
# returns fragments, fragments + intermediate decomposition, decomposition tree
Out[11]:
(['CCC', 'O', 'C', 'c1ccncc1', 'c1ccccc1'], {'C', 'CCC', 'CCCOCc1cccc(-c2ccccn2)c1', 'Cc1cccc(-c2ccccn2)c1', 'O', 'OCc1cccc(-c2ccccn2)c1', 'c1ccc(-c2ccccn2)cc1', 'c1ccccc1', 'c1ccncc1'}, <networkx.classes.digraph.DiGraph at 0x7fce280d0c90>)
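The first two elements of the tuple above can be compared with plain set operations to separate the final fragments from the intermediate molecules produced along the way. The values are copied from the output shown; the variable names and the sort by SMILES length are illustrative choices, and the length ordering is only a convenient proxy for decomposition depth in this particular example.

```python
# First element of the output: the final, non-overlapping fragments.
leaves = ["CCC", "O", "C", "c1ccncc1", "c1ccccc1"]

# Second element of the output: final fragments plus intermediate molecules.
all_nodes = {
    "C", "CCC", "CCCOCc1cccc(-c2ccccn2)c1", "Cc1cccc(-c2ccccn2)c1", "O",
    "OCc1cccc(-c2ccccn2)c1", "c1ccc(-c2ccccn2)cc1", "c1ccccc1", "c1ccncc1",
}

# Intermediates are the nodes that are not final fragments.
intermediates = sorted(all_nodes - set(leaves), key=len, reverse=True)

for smi in intermediates:
    print(smi)
```

The longest entry is the starting molecule, and each following entry is what remains after one more fragment has been split off.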