Nuclear Magnetic Resonance (NMR) Spectra
Background
Experimental NMR readings average over times ranging from microseconds to milliseconds. At finite temperature, the microscopic view of this temperature is energy stored in motions such as vibrations and rotations. Over this time period, molecules act as a statistical ensemble representing all possible modes of motion. In contast, single point DFT calculations with fixed nuclei do not cover any temperature effects, a choice which underestimates average bond lengths and does not capture any molecular motions. Even accounting for molecular motions and temperature effects, experimental NMR tends to produce a smaller number of well-defined peaks than theoretical calculations. In the experimental ensemble, within a symmetry group such as a benzene ring, protons may occasionally be exchanged, meaning the response from one proton in such a symmetry-related group of nuclei is expressed as the average of all possible positions within the symmetry group. In any given system these various factors alongside bonding between neighboring molecules, solvent effects, relativistic effects, and a variety of other approximations become more or less important in computing representative NMR spectra. This tutorial will provide a brief guide to producing an NMR spectrum and iterative improvements to refine this spectrum.
Tutorial
In this tutorial, we will recreate the NMR spectrum of ethanol (CH3CH2OH). This spectrum is famous for its J-couplings. To compute these J-couplings, see the J-Couplings tutorial.
Figure 1: Experimental 1H NMR Spectrum for CH3CH2OH (Data from Chemical BooK)
Step 1: Setup
First, the experimental geometry of ethanol was found on PubChem and translated to the geometry.in
file format for FHIaims as follows:
atom -1.1712 0.2997 0.0000 O
atom -0.0463 -0.5665 0.0000 C
atom 1.2175 0.2668 0.0000 C
atom -0.0958 -1.2120 0.8819 H
atom -0.0952 -1.1938 -0.8946 H
atom 2.1050 -0.3720 -0.0177 H
atom 1.2426 0.9307 -0.8704 H
atom 1.2616 0.9052 0.8886 H
atom -1.1291 0.8364 0.8099 H
Figure 2: Input Ethanol Geometry (as visualized with JMOL)
In the present simulation, this structure was optimized (i.e., its total energy was minimized) using the following control.in
file:
xc pbe
vdw_correction_hirshfeld
relativistic atomic_zora scalar
relax_geometry bfgs 5e-3
spin none
+ ['tight' FHIaims species defaults for H, C, and O]
This calculation should converge quickly (23 seconds locally on my MacBook with an M1 processor). The resultant geometry.in.next_step
file was computed as:
atom -1.22782808 0.27220854 -0.00620805 O
atom -0.04782759 -0.53900672 0.01791495 C
atom 1.24070776 0.27281531 0.00036829 C
atom -0.06164303 -1.21911023 0.88941819 H
atom -0.11806789 -1.16608127 -0.88155034 H
atom 2.11685863 -0.39088742 -0.03356413 H
atom 1.27268434 0.93427021 -0.87530125 H
atom 1.33264898 0.89452226 0.90371533 H
atom -1.21843310 0.83576931 0.78290700 H
Step 2: Single-point NMR Calculations
With an optimized geometry, 1H shielding values can now be computed. See the Shieldings tutorial for a further explanation. The input geometry.in
and control.in
files read as follows:
atom -1.22782808 0.27220854 -0.00620805 O
atom -0.04782759 -0.53900672 0.01791495 C
atom 1.24070776 0.27281531 0.00036829 C
atom -0.06164303 -1.21911023 0.88941819 H
magnetic_response
atom -0.11806789 -1.16608127 -0.88155034 H
magnetic_response
atom 2.11685863 -0.39088742 -0.03356413 H
magnetic_response
atom 1.27268434 0.93427021 -0.87530125 H
magnetic_response
atom 1.33264898 0.89452226 0.90371533 H
magnetic_response
atom -1.21843310 0.83576931 0.78290700 H
magnetic_response
xc pbe
vdw_correction_hirshfeld
spin none
relativistic atomic_zora scalar
basis_threshold 0.0
magnetic_response s
+ ['NAO-VCC-5Z' FHIaims species defaults for H and 'tight' species defaults for C and O]
In experimental NMR, results in ppm are recorded as shifts with relation to a standard sample. In this case, the ethanol spectra was calibrated from a baseline of tetramethylsilane (TMS). In TMS, the computed 1H shielding with FHI-aims is ~31.03126 ppm so all results are computed as shifts from this value. From this computation, shielding values can be found in the human-readable aims.out
standard output. In Table 1, computed total isotropic shieldings and TMS-shifted readings can be found. Throughout the rest of this tutorial all results will be shifted by this TMS baseline.
Atom # | Computed Shielding | Shifted Shielding |
---|---|---|
4 | 27.10640 ppm | -3.92486 ppm |
5 | 27.53035 ppm | -3.50091 ppm |
6 | 29.94804 ppm | -1.08322 ppm |
7 | 29.84444 ppm | -1.18682 ppm |
8 | 30.26037 ppm | -0.77089 ppm |
9 | 31.22923 ppm | 0.19797 ppm |
The sign (positive or negative) of a shift is arbitrary based on how the sample is calibrated. These computed values are plotted but with signs inverted as compared to the experimental spectra.
Figure 3: Theoretical 1H NMR Spectra for CH3CH2OH using a single-point DFT calculation
One benefit of computed spectra is the ability to unambiguously assign peaks to atoms. In this figure red peaks represent atoms 4 and 5 in the methylene (CH2) group, green peaks represent atoms 6, 7, and 8 in the methyl group (CH3), and the blue peak represents atom 9 in the hydroxyl group (OH). Comparing Figure 3 to Figure 1, the location of the peaks originating from the methylene and methyl groups match the experimental result relatively closely while the location of the hydroxyl peak is highly unphysical and not placed at ~2.5ppm (-2.5ppm) where the experimental peak is found. This mismatch comes from a combination of molecular motions and bonding effects as described in the background section. For some highly symmetric molecules, this single point calculation spectra may be sufficient to explain experimental results, however for most systems, continuing to step 3 is necessary to recreate the experimental peaks.
Step 3a: Symmetry-averaged NMR Calculations
Steps (3a) and (3b) represent the next two refinements necessary to improve computed spectra past the single calculation baseline. See Step (3b) to see when it may be appropriate. Step (3a), symmetry averaging, refers to averaging the responses from different atoms which contribute to singular peaks in the experimental result. Over the time frame of an NMR reading, protons within a symmetry group (such as a methyl group, benezene ring, etc.) can be exchanged with one another. As a result, the response from any atom within this group often shows up in the experimental reading as the average of all positions despite the slightly differing bonding environment of each atom in a group. Within the methylene and methyl groups of ethanol, this does not seem to be an issue with our single point calculation. However, the incorrect location of the hydroxyl peak implies there likely is another possible bonding geometry for this hydroxyl group which must be averaged for the location to be correct. In this case, ethanol molecules hydrogen bond to one another, and from the single point calculation, we can hypothesize this bonding environment creates a different shielding response for the hydroxyl group which is active in the bond.
In order to test this hypothesis, we start by creating a geometry.in
file with two ethanol molecules H-bound together. Using similar settings to Step 1, this geometry was then optimized leading to the following geometry which will be used as an input to the NMR calculations:
# ethanol 1
atom 2.77209484 1.27496144 0.76590318 C
atom 2.05271538 2.14693788 1.77484692 C
atom 3.01275872 2.57700175 2.76531480 O
atom 2.06108696 0.89836842 0.01875799 H
atom 3.23985953 0.41360050 1.26045289 H
atom 3.55108458 1.84453733 0.24281152 H
atom 1.60663840 3.02439164 1.27681394 H
atom 1.24238263 1.58417252 2.26837102 H
atom 2.56316695 3.15838112 3.39729254 H
# ethanol 2
atom 7.47674108 3.45043398 0.68520271 C
atom 6.34954958 2.90110763 1.54092291 C
atom 5.25062008 3.80799938 1.48483984 O
atom 8.34599264 2.77881761 0.70831770 H
atom 7.79210166 4.43718973 1.05000286 H
atom 7.15217799 3.56189852 -0.35821422 H
atom 6.05042616 1.90017754 1.17396081 H
atom 6.69688771 2.77061256 2.58378972 H
atom 4.50851512 3.41301047 1.98821288 H
Figure 4: Input Double Ethanol Geometry (as visualized with JMOL)
From this optimized geometry we can see the hydroxyl group of one ethanol is involved in H-bonding while the hydroxyl group of the other ethanol, similar to the single ethanol case, is not. Now, using the analogous input files as Step 2, we can compute the 1H shieldings of each hydrogen in this expanded geometry. We start by plotting the raw results, without any symmetry averaging:
Figure 5: Theoretical 1H NMR Spectra for 2xCH3CH2OH using a single-point DFT calculation
This result is still unphysical and the blue, hydroxyl peaks are not in the correct location compared to experiment. However, notably the peak corresponding to the hydroxyl involved in bonding is ~-4.5ppm, very different from the unbound hydroxyl, as expected. We now can average across relevant symmetry groups to simulate the exchange of protons which occurs experimentally. This results in a new spectra:
Figure 6: Theoretical 1H NMR Spectra for 2xCH3CH2OH using a single-point DFT calculation
As compared to Figure 1, all peaks are now in the correct locations. In this case we performed symmetry averaging by including more than one molecule in the simulation. In many cases, this is unnecessary, and simply symmetry averaging within a molecule will be sufficient to achieve similar leaps in accuracy. As mentioned previously, the multiplicity of peaks in the experimental result comes from J-couplings which is beyond the scope of this tutorial.
Step 3b: Time-averaged NMR Calculations
Due to finite temperatures, all molecules experience some level of motion, whether simple (almost) harmonic oscillation of bonds, rotations, or more complicated motions. The motions in ethanol are relatively limited but as molecules increase in complexity, the impact of such motions on the shielding point calculations increase dramatically. For demonstration purposes, this tutorial will now demonstrate how to account for these molecular motions within ethanol and this approach can be applied to any molecule being studied.
For this section, use the same starting geometry for Step (3a), the geometry with two ethanol molecules, which represents the minimal necessary variations in bonding environments. The most straightforward way to account for molecular motions at finite temperature is a molecular dynamics simulation. Depending on the system studied, many different choices can be made, but for this system, the following control.in
file is sufficient:
xc pbe
vdw_correction_hirshfeld
relativistic atomic_zora scalar
MD_run 10.0 NVT_parrinello 300 0.5
MD_time_step 0.001
MD_MB_init 300
MD_restart .true.
output_level MD_light
+ ['tight' FHIaims species defaults for H, C, and O]
Particularly, this runs a NVT molecular dynamics ensemble using the stochastic velocity rescaling thermostat of Bussi, Donadio and Parrinello (BDP), set to 300K over 10ps with 1.0 fs steps. The characteristic collision time of the thermostat is set to 0.5 ps - not long, but hopefully long enough to not impact the overall dynamics unduly compared to the physical system. ("Thermostats" do not exist in experiment. In computational simulations, they are added as artificial terms to Newton's equations but they must not strongly impact the dynamics to yield physically reasonable results. For the BDP thermostat, a longer collision time will lead to results that are closer to physically correct dynamics, but at the price of a longer initial equilibration period that does not yet reflect the thermalized atomic motions.)
Due to the small size of ethanol, the use of tight species defaults is possible here. In many cases for similar time scales, intermediate or light species defaults will provide sufficient accuracy while allowing a long enough overall trajectory to properly average over the molecular motion. Despite the small size of ethanol, this "tight" molecular dynamics calculations still take a relatively long time (~1 day run on a single, quite old (2014) compute node with 20 cpu cores). The FHI-aims code distribution comes prepackaged with a utility to retrieve all geometry.in
files from a molecular dynamics run. Assuming the FHI-aims directory is labelled as $AIMS
and the MD directory is the current directory, run the following command:
$AIMS/utilities/create_geometry_zip.pl ./aims.out
This will extract all geometries from the output file aims.out
and store them in the compressed geometries.zip
directory. Programs like JMOL are able to visualize the structures visited during this molecular dynamics run. As a first approximation, this tutorial will assume we simply want an average of all molecular motions without any specific motion needing to be perfectly described. For cases where this approximation is insufficient, see the Further refinements section for guidance on picking geometries to average the results from.
Given the collision time of 0.5 ps set for the BDP thermostat, the first ~1ps of a molecular dynamics simulation was not used as the geometry is "equilibriating" to temperature from the energy-minimized single-point geometry and might not reflect the physical dynamics of an equlibrated molecule. In order to approximately capture all motions, 10 structures were uniformly sampled from the remaining 9ps of the molecular dynamics simulation. For each structure, the 1H shieldings were computed as described above. By averaging the results of these 10 structures, the following spectrum is found:
Figure 7: Theoretical 1H NMR Spectra for 2xCH3CH2OH using time-averaged NMR
From this result alone, it is hard to determine whether the time-averaging has improved the result. By combining this result with the symmetry averaging from step (3a), the following spectra is derived:
Figure 8: Theoretical 1H NMR Spectra for 2xCH3CH2OH using time-averaged and symmetry-averaged NMR
In comparison to Figure 6, the red methylene peaks are all slightly shifted to the right, which is closer to location of the experimental peaks. There is very little difference in the green methyl groups other than slightly more spread between the peaks, which is closer to the experimental result. And most noticeably, the blue hydroxyl group is shifted right by about 0.2 ppm, bringing it in line with the experimental result. Thus, time-averaging has systematically improved the spectrum compared to pure symmetry averaging. In ethanol, this result is somewhat marginal, but in other molecules this effect can be drastic and necessary for describing the spectrum accurately.
Further refinements
The above steps should be sufficient to produce well-behaved, accurate structures for a range of molecules. However, for larger molecules that are sufficiently complex, additional refinement may be necessary to produce good spectra. For many structures, groups of symmetry equivalent atoms can be averaged in a straightforward manner as described in Step (3a). However, in some structures it is unclear what does or does not constitute such a symmetry-related group in a structure. For instance, consider a benzene ring where one proton has been removed to incorporate the ring into the larger molecule. Should all remaining protons be averaged equally into the same symmetry group? Does a symmetry group remain at all? There is no single answer for how a symmetry-related group will behave so such questions have to be explored on a molecule by molecule basis to generate the most accurate spectrum. Additionally, as was clear in the case of ethanol, including multiple molecules or an entire solvent environment to simulate bonding effects can be important to produce correct spectra.
In contrast to the ambiguity of symmetry averaging, with time-averaging, systematic improvement is possible. First, on the molecular dynamics simulation, there are three avenues of systematic improvement. First, larger and better converged basis sets will lead to more realistic molecular dynamics and better final spectra. Second, shorter time periods for each step will ensure each molecular motion is explored in fine detail. Finally, a longer overall molecular dynamics simulation will ensure than more molecular motions are fully captured, especially in molecules that tend to jump between different conformers during molecular dynamics.
Second, on the geometry selection for time-averaging, there are two avenues of systematic improvement. Rather than random structure selection as is done in (3b), known modes of molecular motion can be determined and structures representing the full range of each motion can be selected as part of the eventual spectra. Second, increasing the number of structures included in the time-averaging improves the final spectra. In experimental NMR, there is effectively a "continuous" time-averaging where all structures present through the data collection time are averaged into the final result. As the number of geometries selected for time-averaging increases, the approximation of this continuum improves systematically.