Generating Conformers¶
A drug-like molecule can exist in a variety of diverse 3D shapes depending on the number of rotatable bonds, bond order, torsion, and in general, its degree of freedom. Each individual 3D spatial arrangement of a molecule is defined as a conformer and each conformer may have different properties (e.g. relative energy). This is why the sampling of the conformational space, often referred to as conformational search, is a key step to understand the 3D properties of a given compound. You must factor in all the possible conformers and their respective properties in order to achieve the best representation of a molecule. It is a necessary step in any virtual screening campaign.
Note: You can see a good visualization of how the relative energy of conformers changes based on manual manipulation of bond angles here.
A common term that you will see throughout this example is RMSD which stands for root-mean-square deviation. RMSD is widely used as a similarity measure when analyzing conformations: the smaller the RMSD between two conformers, the more similar in 3D spatial arrangement they are. Once conformers are generated, they are usually pruned on RMSD, meaning, structures that are redundant and essentially correspond to the same conformation are removed from the list.
Note: RMSD is not the only measure of conformer similarity, and it does have its limitations. If you’re interested in learning more about all the various ways in which chemical structural similarity can be measured, read more here.
How are conformers generated?¶
The current default RDKit method used to generate conformers leverages various versions of experimental-torsion distance geometry with additional basic knowledge (ETKDG) created by Riniker and Landrum. From the RDKit book the default algorithm followed is:
- The molecule’s distance bounds matrix is calculated based on the connection table and a set of rules.
- The bounds matrix is smoothed using a triangle-bounds smoothing algorithm.
- A random distance matrix that satisfies the bound's matrix is generated.
- This distance matrix is embedded in 3D dimensions (producing coordinates for each atom).
- The resulting coordinates are cleaned up somewhat using the “distance geometry force field”, based on distance constraints from the bounds matrix.
The first 5 steps describe the “ETDG” approach. The additional “K” in ETKDG just defines further constraints from chemical knowledge such as “aromatic rings are to be flat or bonds connected to triple bonds are to be collinear”. These additional constraints introduce a certain level of “chemical awareness” that helps generate correct conformers which are chemically and physically valid. Read more here, here and here.
Tutorial¶
Now let’s start with a tutorial on how you would go about generating conformers via RDKit.
RDKit Example¶
Below is an example of how you would go about generating conformers in RDKit.
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdMolAlign
from rdkit.Chem import rdMolDescriptors
from rdkit.Chem import rdMolTransforms
from rdkit.Chem import rdForceFieldHelpers
from rdkit.Chem import PyMol
import copy
import numpy as np
m = Chem.MolFromSmiles("O=C(C)Oc1ccccc1C(=O)O")
m
m2 = Chem.AddHs(m)
m2
AllChem.EmbedMolecule(m2)
0
m2
rotatable_bonds = Chem.rdMolDescriptors.CalcNumRotatableBonds(m2)
rotatable_bonds
3
# Setup the parameters for the embedding
params = getattr(rdDistGeom, "ETDG")()
params.randomSeed = 0
params.enforceChirality = True
params.useRandomCoords = True
params.numThreads = 1
# Embed conformers
confs = rdDistGeom.EmbedMultipleConfs(m2, numConfs=50, params=params)
len(confs)
50
# Minimize energy
energy_iterations = 200
results = rdForceFieldHelpers.UFFOptimizeMoleculeConfs(m2, maxIters=energy_iterations)
energies = [energy for _, energy in results]
energies = []
for conf in m2.GetConformers():
ff = rdForceFieldHelpers.UFFGetMoleculeForceField(m2, confId=conf.GetId())
energies.append(ff.CalcEnergy())
energies = np.array(energies)
# Add the energy as a property to each conformer
[conf.SetDoubleProp("rdkit_uff_energy", energy) for energy, conf in zip(energies, m2.GetConformers())]
# Now we reorder conformers according to their energies,
# so the lowest energies conformers are first.
mol_clone = copy.deepcopy(m2)
ordered_conformers = [
conf for _, conf in sorted(zip(energies, mol_clone.GetConformers()), key=lambda x: x[0])
]
m2.RemoveAllConformers()
[m2.AddConformer(conf, assignId=True) for conf in ordered_conformers]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
# Align conformers to each other
rdMolAlign.AlignMolConformers(m2)
m2
As a beginner, this can be a bit overwhelming and complicated to get the hang of. Let’s see how this would look in Datamol
Datamol Example¶
import datamol as dm
smiles = "O=C(C)Oc1ccccc1C(=O)O"
mol = dm.to_mol(smiles)
mol
# generate conformers
mol = dm.conformers.generate(mol, align_conformers=True)
mol
# Get all conformers as a list
conformers = mol.GetConformers()
len(conformers)
50
# Get the 3D atom positions of the first conformer
positions = mol.GetConformer(0).GetPositions()
positions
array([[ 2.03936006, -1.45912965, -0.53172218], [ 1.9925788 , -1.56446348, 0.71921383], [ 3.17465024, -2.13438585, 1.41003403], [ 0.83770987, -1.13908313, 1.35789793], [-0.24737227, -0.60904676, 0.65495089], [-0.31723469, 0.74706314, 0.41869327], [-1.37632791, 1.30134851, -0.27275517], [-2.40159499, 0.50563021, -0.74962797], [-2.35444012, -0.85472732, -0.52586563], [-1.27290921, -1.39633958, 0.17670332], [-1.21424131, -2.82512313, 0.41699016], [-0.25080752, -3.31946086, 1.04455633], [-2.22014558, -3.68260765, -0.03978946]])
# If minimization has been enabled (default to True)
# you can access the computed energy.
conf = mol.GetConformer(0)
props = conf.GetPropsAsDict()
print(props)
{'rdkit_UFF_energy': 35.39391759550579, 'rdkit_UFF_delta_energy': 0.0}
# Compute RMSD
rmsd = dm.conformers.rmsd(mol)
rmsd
array([[6.61254163e-08, 1.01515980e+00, 1.01196417e+00, ..., 5.56023152e-02, 1.00244193e+00, 6.36410194e-02], [1.01515980e+00, 4.67577303e-08, 3.61762165e-02, ..., 1.02178695e+00, 5.29240969e-02, 1.01919051e+00], [1.01196417e+00, 3.61762165e-02, 4.67577303e-08, ..., 1.01924556e+00, 7.12401285e-02, 1.01739384e+00], ..., [5.56023152e-02, 1.02178695e+00, 1.01924556e+00, ..., 0.00000000e+00, 1.00778322e+00, 3.87585886e-02], [1.00244193e+00, 5.29240969e-02, 7.12401285e-02, ..., 1.00778322e+00, 0.00000000e+00, 1.00438156e+00], [6.36410194e-02, 1.01919051e+00, 1.01739384e+00, ..., 3.87585886e-02, 1.00438156e+00, 0.00000000e+00]])
rmsd.shape
(50, 50)
In essentially one line of code, you can generate a list of conformers. What’s important to understand are some of the key parameters that are factored into this process. In general, sticking with the defaults in Datamol will suffice in most cases, but if you want to make specific modifications, you can. If you’re interested in learning more about all the algorithms underlying conformer generation, read this.
A few parameters to highlight:
- n_confs - Specifying the number of conformers to generate. This is based on the number of rotatable bonds and, by default, this is set to 200 if there are more than 8 rotatable bonds and 50 if there are less than 8. Theoretically, there are an unlimited number of conformers that can be derived from a single rotatable bond, however, in practice, not all the conformer structures make sense since only “stable” conformers are relevant in this context. This is why the defaults are set in place. Hypothetically, if you only have 2 rotatable bonds and you set n_confs to 2,000,000, not only will this be computationally expensive but a lot of the conformers generated will start to have non-relevant structures that are not useful.
- add_hs - By default, hydrogen atoms are added before embedding because it is critical to generating high-quality 3D conformations.
- minimize_energy - Minimizing energy releases the strain of the generated conformation to the closest local minima enabling you to find a more relevant conformation. In other words, finding the conformer that is most likely to exist. There are multiple force fields that you can apply.
- method - Within the ETKDG method, there are various versions that can be selected to generate conformers.
- energy_iterations - This option allows you to specify how many iterations of conformer generation you want to go through if you have enabled energy minimization. In general, the more iterations you specify, the more accurate the conformers are. However, there is a trade-off between the number of iterations and computation speed. Running through 1000 iterations will be significantly more expensive computationally as opposed to 100 iterations.
- rms_cutoff is the max RMSD value for which two conformers are considered to be the same.