Fragment and Scaffold

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

import datamol as dm
dm.disable_rdkit_log()

Fragmentation

The fragmentation methods implemented in datamol return a set of fragments that covers the molecule, rather than a breakdown of the molecule into non-overlapping blocks.

In the following, let's fragment a molecule using multiple methods.

In [3]:

smiles = "CCCOCc1cc(c2ncccc2)ccc1"
mol = dm.to_mol(smiles)
mol

Out[3]:

BRICS

In [4]:

frags = dm.fragment.brics(mol)
dm.viz.to_image(frags, n_cols=6)

Out[4]:

FraggleSim

In [5]:

frags = dm.fragment.frag(mol)
dm.viz.to_image(frags, n_cols=6)

Out[5]:

Recap

In [6]:

frags = dm.fragment.recap(mol)
dm.viz.to_image(frags, n_cols=6)

Out[6]:

Any break

This method tries BRICS first and falls back to generating all possible fragmentations if BRICS does not work.

In [7]:

frags = dm.fragment.anybreak(mol)
dm.viz.to_image(frags, n_cols=6)

Out[7]:

Scaffold

Get the scaffolds and attachment points from a list of molecules so that molecular series can be created.

In [8]:

# Get some mols
data = dm.data.freesolv()
smiles = data["smiles"].tolist()
mols = [dm.to_mol(s) for s in smiles]

scaffolds, scf2infos, scf2groups = dm.scaffold.fuzzy_scaffolding(mols)
list(scaffolds)[:4]

Out[8]:

['c1cc2c([*:4])c([*:3])c([*:2])cc2cc1[*:5]',
 'C(=C1c2ccccc2CCc2ccccc21)[*:1]',
 'c1c([*:7])c2c(c([*:8])c1[*:9])Cc1c(c([*:1])c([*:2])c([*:3])c1[*:4])C2',
 'CCc1cc([*:5])cc2cc([*:2])c([*:3])c([*:4])c12']
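
The `[*:n]` tokens in the scaffold SMILES above are the numbered attachment points. As a minimal sketch, using only the four strings shown in the output, the labels can be extracted with a regular expression. `attachment_labels` is a hypothetical helper written for this illustration, not a datamol function.

```python
import re

# Scaffold SMILES copied from the output above.
scaffold_smiles = [
    "c1cc2c([*:4])c([*:3])c([*:2])cc2cc1[*:5]",
    "C(=C1c2ccccc2CCc2ccccc21)[*:1]",
    "c1c([*:7])c2c(c([*:8])c1[*:9])Cc1c(c([*:1])c([*:2])c([*:3])c1[*:4])C2",
    "CCc1cc([*:5])cc2cc([*:2])c([*:3])c([*:4])c12",
]

# An attachment point is written as a dummy atom with an atom-map number: [*:n]
ATTACHMENT = re.compile(r"\[\*:(\d+)\]")


def attachment_labels(smi):
    """Return the attachment-point labels in the order they appear (hypothetical helper)."""
    return [int(n) for n in ATTACHMENT.findall(smi)]


for smi in scaffold_smiles:
    labels = attachment_labels(smi)
    print(f"{len(labels)} attachment point(s) {labels}: {smi}")
```

Counting the labels gives a quick view of how many positions each scaffold exposes for substitution when building a series.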

In [9]:

sfs = [dm.to_mol(s) for s in list(scaffolds)]
dm.viz.to_image(sfs, n_cols=6)

Out[9]:

Assembling

Assemble fragments to create new molecules.

In [10]:

# Get the fragment set of a molecule
smiles = "CCCOCc1cc(c2ncccc2)ccc1"
mol = dm.to_mol(smiles)
frags = dm.fragment.brics(mol)

# Limit the number of fragments to work with because
# assembling is computationally intensive.
frags = frags[:3]

# Assemble 8 molecules from the list of fragments
mols = list(dm.fragment.assemble_fragment_order(frags, max_n_mols=8))

dm.viz.to_image(mols)

Out[10]:
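
To give a feel for why the fragment list is truncated before assembling, the sketch below counts ordered selections of distinct fragments. This is a simplifying assumption made for illustration only: it is not how datamol enumerates candidates, which also depends on which attachment points are compatible. `ordered_selections` is a hypothetical helper.

```python
import math


def ordered_selections(n_frags):
    """Count ordered selections of 1..n_frags distinct fragments.

    A rough proxy for the size of the search space, not datamol's
    actual enumeration.
    """
    return sum(math.perm(n_frags, k) for k in range(1, n_frags + 1))


for n in (3, 5, 10):
    print(n, ordered_selections(n))
```

Even under this simplified count the number of candidates grows faster than exponentially with the number of fragments, which is why capping both the fragment list and `max_n_mols` keeps the cell fast.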

Decomposition

It is also possible to break a molecule based on a set of chemical transformations and get the non-overlapping fragments along with how they are linked.

In [11]:

dm.fragment.break_mol(mol, randomize=False, mode="brics", returnTree=True) 
# returns fragments, fragments + intermediate decomposition, decomposition tree

Out[11]:

(['CCC', 'O', 'C', 'c1ccncc1', 'c1ccccc1'],
 {'C',
  'CCC',
  'CCCOCc1cccc(-c2ccccn2)c1',
  'Cc1cccc(-c2ccccn2)c1',
  'O',
  'OCc1cccc(-c2ccccn2)c1',
  'c1ccc(-c2ccccn2)cc1',
  'c1ccccc1',
  'c1ccncc1'},
 <networkx.classes.digraph.DiGraph at 0x7fce280d0c90>)
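
The first two elements of the returned tuple can be compared directly. As a minimal sketch, with the values copied from the output above (the variable names are chosen for this illustration), the intermediate decomposition products are the entries of the second element that are not final fragments:

```python
# Values copied from the output above.
fragments = ["CCC", "O", "C", "c1ccncc1", "c1ccccc1"]
all_products = {
    "C",
    "CCC",
    "CCCOCc1cccc(-c2ccccn2)c1",
    "Cc1cccc(-c2ccccn2)c1",
    "O",
    "OCc1cccc(-c2ccccn2)c1",
    "c1ccc(-c2ccccn2)cc1",
    "c1ccccc1",
    "c1ccncc1",
}

# Intermediates: products that appear during decomposition but are not final fragments.
intermediates = all_products - set(fragments)

# Sorting by SMILES length (longest first) puts the starting molecule first.
ordered = sorted(intermediates, key=len, reverse=True)
for smi in ordered:
    print(smi)
```

The third element, the `networkx` directed graph, records how these products are linked; it is left out here because its node and edge attributes are not shown in the output.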