One of the most important things enabling us to have success in drug discovery is understanding structure-activity relationship (SAR). This is a foundation that explains how the structure of a molecule relates to its biological activity. Here is an excerpt from CDDVault that explains the importance of SAR in drug discovery:
”SAR depends on the recognition of which structural characteristics correlate with chemical and biological reactivity. Thus the ability to draw conclusions about an unknown compound depends upon both the structural features that can be characterized as well as the database of molecules against which they are compared. When combined with appropriate professional judgment, SAR can be a powerful tool for understanding functional implications when similarities are found. For example, in the case of risk assessment of uncharacterized compounds, data from the most sensitive toxicological endpoints should be included in the analysis, such as carcinogenicity or cardiotoxicity.”
This is where aligning molecules can be useful. Given the importance of structure, aligning molecules is generally useful for a few things:
- Aligning molecules makes it easier to visualize and highlight substructures
- It can help you identify and understand any shared structures between drug candidates
- It can help you identify any potential activity patterns
- For example, when analyzing activity cliffs - “defined as pairs or groups of structurally similar compounds that are active against the same target but have large differences in potency. Activity cliffs capture chemical modifications that strongly influence biological activity”. An example of some compounds with activity cliffs and their structural differences highlighted are shown below:
Note: The concept of an activity cliff seems simple in this example, however, in practice it is extremely complicated to represent them computationally and derive a systematic approach for identification. Read more about the evolving concept of activity cliffs here.
As an example to show you how alignment makes visualization easier, please see the image below. The alignment of this sample dataset makes it extremely easy to identify the core structure of 3 rings which are common in all compounds.
This tutorial will show you how to:
- Partition a list of molecules into clusters sharing a common scaffold of a common core, then align the molecules to that common core.
- Note: this function will compute the list of smiles/smarts representative of each cluster first.
- Aligning molecules according to a template molecule
import datamol as dm data = dm.data.cdk2()[13:15] smiles = data["smiles"].iloc[:].tolist() mols = [dm.to_mol(s) for s in smiles] dm.to_image(mols)
aligned_list = dm.align.auto_align_many(mols) dm.to_image(aligned_list)
# Align the compounds within each cluster in a set of compounds data = dm.data.cdk2()[0:24] smiles = data["smiles"].iloc[:].tolist() mols = [dm.to_mol(s) for s in smiles] aligned_list = dm.align.auto_align_many(mols, partition_method="cluster") dm.to_image(aligned_list, mol_size=(200, 150))
# You can also align a compound to a template compound # In this example, we want to align the compound mols with the compound mols mols
mol_aligned = dm.align.template_align(mols, template=mols) mol_aligned