
Installing RDKit PostgreSQL Cartridge on Windows

Aug 21, 2024

The RDKit PostgreSQL cartridge lets you run substructure matching (and other molecular computations) directly in the PostgreSQL database, without first loading the molecules in Python. That can be a huge time saver, especially if you have a lot of molecules to parse. The version shipped with Conda also has PostgreSQL support and has been updated recently. The previous version worked only with PostgreSQL 11, which is why I initially tried to build it from source. You might want to check whether the Conda build works for you before embarking on this trip (but don’t hesitate to check out the example below anyway).

If you need the latest version (or if the Conda build is out of date again) running on your existing install of PostgreSQL, I document here how to install the cartridge from source. If you have an existing database, note that you’ll probably need to create a new cluster and migrate the data there in order to create the extension (see section 11).

Following these steps I managed to build RDKit with the PostgreSQL cartridge on computers running Windows 11 and Windows 10 with an Intel CPU. If you have a 64-bit ARM computer (e.g. Qualcomm Snapdragon X Elite), you need to specify that you want to install Boost for ARM (see the specific command in step 7).

1. Install a C++ compiler

First we need to install a C++ compiler. Go to the Microsoft website to download Visual Studio C/C++ and run the installer. Then install the workload Desktop development with C++.


2. Install .NET Core and pwsh.exe (optional)

a. dotNET

Using Visual Studio, install .NET (see Microsoft instructions here) and install the .NET SDK available on the Microsoft website (check beforehand here which version of .NET is required for PowerShell in order to install the correct version).

b. PowerShell

Now, to install PowerShell, open a command prompt and type:

dotnet tool install --global PowerShell

3. Install PostgreSQL

You can follow the instructions here if you don’t have PostgreSQL installed already.

4. Installing miniforge

Here are the steps to install miniforge, but alternatively, you can also install miniconda3, see the instructions here.

Download and install miniforge3.

After the install you’ll be asked whether you want to update your shell profile to automatically initialise conda. If you accepted but don’t want the base environment to be activated in every new terminal, you can turn that off with the following command:

# do not activate the base environment automatically
conda config --set auto_activate_base false

Close the terminal and reopen it to have this take effect.

5. Create a dedicated environment for rdkit.build

We’ll then create a dedicated environment from which we’ll install the required components. Here I specify Python 3.12, but if you change this you will need to adapt the option -DRDK_BOOST_PYTHON3_NAME=python312 in section 9.b to match your choice.

mamba create -n rdkit.build python=3.12
mamba activate rdkit.build

This will create and activate a new environment called rdkit.build.
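The Python version of this environment and the value given later to RDK_BOOST_PYTHON3_NAME have to agree. As a small sketch (the helper name boost_python_name is hypothetical, not part of RDKit), the value can be derived from the interpreter of the activated environment:

```python
import sys


def boost_python_name(version_info=sys.version_info):
    """Return the Boost.Python component name, e.g. 'python312' for Python 3.12."""
    major, minor = version_info[0], version_info[1]
    return f"python{major}{minor}"


if __name__ == "__main__":
    # Prints the value to pass as -DRDK_BOOST_PYTHON3_NAME=...
    print(boost_python_name())
```

Run inside rdkit.build, this prints the string to use in the cmake options of section 9.b.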

Then we install required modules:

mamba install -y numpy matplotlib cmake cairo pillow eigen pkg-config boost-cpp boost swig

Then set a new environment variable locating your Boost folder (adapt the path according to your folder tree):

SETX BOOST_ROOT "../miniforge3/envs/rdkit.build"

6. Install vcpkg

Now, in your command prompt, navigate to C:/, clone Microsoft vcpkg into a new folder and install it (you can do this step elsewhere than C:/, but modify the remaining commands accordingly):

cd C:/
git clone https://github.com/microsoft/vcpkg.git vcpkg
cd vcpkg
bootstrap-vcpkg.bat
.\vcpkg integrate install

Then set a new environment variable for vcpkg:

SETX VCPKG_DEFAULT_TRIPLET "x64-windows"

On ARM machines use this instead:

SETX VCPKG_DEFAULT_TRIPLET "arm64-windows"

7. Install Catch2, Cairo & Boost

Still in a command prompt, install Catch2 using vcpkg:

vcpkg install catch2 --clean-after-build

If this fails, check you have added vcpkg to your PATH or move to the vcpkg folder and run the command from there:

cd C:/vcpkg
.\vcpkg install catch2
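To see quickly whether vcpkg is reachable from your PATH, something like the following sketch can help. The function name find_vcpkg and the default folder C:/vcpkg are assumptions that mirror the clone location used in section 6; adapt them if you cloned vcpkg elsewhere.

```python
import shutil
from pathlib import Path


def find_vcpkg(fallback_dir="C:/vcpkg"):
    """Return the path to the vcpkg executable, or None if it cannot be found."""
    # First look on PATH, as the command prompt would.
    on_path = shutil.which("vcpkg")
    if on_path:
        return Path(on_path)
    # Otherwise fall back to the folder where vcpkg was cloned.
    candidate = Path(fallback_dir) / "vcpkg.exe"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    print(find_vcpkg() or "vcpkg not found: add it to PATH or cd into its folder")
```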

You might also need to install Catch2 via mamba in your environment (but you can wait until section 9.b and see if it fails):

mamba install catch2

Then, similarly, install Cairo:

vcpkg install cairo --clean-after-build

If this fails, install some requirements for Cairo first:

vcpkg install zlib
vcpkg install libpng
vcpkg install pixman

Install Boost also with vcpkg (on ARM machines skip to the next command):

vcpkg install boost --clean-after-build

On ARM you need to specify the triplet:

vcpkg install boost:arm64-windows

8. Install cmake

Visit the CMake website and download the version suitable for your computer. Run the installer and, when asked, tick the box to add CMake to PATH.

9. Install RDKit

Now we are ready to build RDKit from source.

If you have trouble at some point with the commands, try using the Developer Command Prompt for VS Insiders instead of the regular Command Prompt (and possibly run it as administrator by right-clicking on the icon). You might then need to activate mamba by calling mamba.bat, located in the miniforge3/condabin folder, like this:

miniforge3/condabin/mamba.bat activate rdkit.build

a. Download RDKit

We clone the RDKit repository into a new folder (here at C:/, but you can also change this). In a command prompt, run:

cd C:/
git clone https://github.com/rdkit/rdkit.git rdkit

b. Configure RDKit

Now, navigate to the new RDKit folder and create a build folder:

cd C:/rdkit
mkdir build

In the next part we use cmake to configure the RDKit build, with a lot of options that require your attention. Some of them are discussed in this thread on GitHub, which can be a helpful starting point if you get errors during the build. The command is split over several lines here for readability; ^ is the line-continuation character of the Windows Command Prompt.

cd C:/rdkit/build

cmake -DPy_ENABLE_SHARED=1 ^
-DBOOST_ROOT=%BOOST_ROOT% ^
-DBoost_NO_SYSTEM_PATHS=ON ^
-DBoost_NO_BOOST_CMAKE=TRUE ^
-DRDK_INSTALL_INTREE=ON ^
-DRDK_INSTALL_STATIC_LIBS=OFF ^
-DRDK_BUILD_CPP_TESTS=ON ^
-DRDK_BUILD_CAIRO_SUPPORT=ON ^
-DRDK_BUILD_FREETYPE_SUPPORT=ON ^
-DRDK_BOOST_PYTHON3_NAME=python312 ^
-DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include ^
-DRDK_BUILD_PYTHON_WRAPPERS=ON ^
-DRDK_BUILD_INCHI_SUPPORT=ON ^
-USE_INCHI=1 ^
-DRDK_BUILD_AVALON_SUPPORT=ON ^
-DRDK_BUILD_PGSQL=ON ^
-DRDK_PGSQL_STATIC=ON ^
-DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" ^
-DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" ^
-DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..

Check the command options: if you installed another version of Python in the rdkit.build environment, you need to adapt the option -DRDK_BOOST_PYTHON3_NAME.

Check that the last three options -DPostgreSQL_INCLUDE_DIR, -DPostgreSQL_TYPE_INCLUDE_DIR and -DPostgreSQL_LIBRARY point to where your version of PostgreSQL is installed.

Now, before running the cmake command, check the environment variable %BOOST_ROOT% is correct:

echo %BOOST_ROOT%

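If the SETX command used in section 5 did not work, you can navigate to the “Environment Variables…” tab in System Properties and add a new variable with the right path (see section 5 for an example of path). Note that SETX only affects command prompts opened afterwards, so open a new one before checking.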
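Beyond echoing the variable, you can check that it points at a folder that actually contains the Boost headers. The sketch below assumes the headers live under Library/include/boost inside the Conda environment, which is what the Boost_INCLUDE_DIR option of the cmake command relies on; the function names are hypothetical.

```python
import os
from pathlib import Path


def boost_headers_dir(boost_root):
    """Return the folder expected to hold the Boost headers under BOOST_ROOT."""
    return Path(boost_root) / "Library" / "include" / "boost"


def check_boost_root(boost_root=None):
    """Return True if BOOST_ROOT is set and contains Library/include/boost."""
    # Fall back to the environment variable when no path is given explicitly.
    boost_root = boost_root or os.environ.get("BOOST_ROOT")
    if not boost_root:
        return False
    return boost_headers_dir(boost_root).is_dir()


if __name__ == "__main__":
    print("BOOST_ROOT looks fine" if check_boost_root() else "BOOST_ROOT is missing or wrong")
```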

Now run the command (here as a one-liner) with your command prompt (you’ll need to have it “run as administrator”) with the python environment activated.

cmake -DPy_ENABLE_SHARED=1 -USE_INCHI=1 -DBOOST_ROOT=%BOOST_ROOT% -DRDK_BUILD_PYTHON_WRAPPERS=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BOOST_PYTHON3_NAME=python312 -DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include -DRDK_BUILD_PGSQL=ON -DRDK_PGSQL_STATIC=ON -DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" -DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" -DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..
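Because the one-liner has to be edited by hand for every PostgreSQL or Python version, it can be convenient to generate it. The following is only a sketch: build_cmake_args is a hypothetical helper, it covers the -D options of the command above and nothing else, and the spelling Boost_NO_SYSTEM_PATHS / Boost_NO_BOOST_CMAKE follows the case-sensitive variable names of CMake's FindBoost module.

```python
def build_cmake_args(pg_root, python_tag, boost_root):
    """Assemble the cmake configure arguments used in this section as a list."""
    options = {
        "Py_ENABLE_SHARED": "1",
        "BOOST_ROOT": boost_root,
        "RDK_BUILD_PYTHON_WRAPPERS": "ON",
        "RDK_BUILD_INCHI_SUPPORT": "ON",
        "RDK_BUILD_AVALON_SUPPORT": "ON",
        "Boost_NO_SYSTEM_PATHS": "ON",
        "Boost_NO_BOOST_CMAKE": "TRUE",
        "RDK_INSTALL_INTREE": "ON",
        "RDK_INSTALL_STATIC_LIBS": "OFF",
        "RDK_BUILD_CPP_TESTS": "ON",
        "RDK_BUILD_CAIRO_SUPPORT": "ON",
        "RDK_BUILD_FREETYPE_SUPPORT": "ON",
        "RDK_BOOST_PYTHON3_NAME": python_tag,
        "Boost_INCLUDE_DIR": f"{boost_root}/Library/include",
        "RDK_BUILD_PGSQL": "ON",
        "RDK_PGSQL_STATIC": "ON",
        # The three PostgreSQL paths all derive from the install folder.
        "PostgreSQL_INCLUDE_DIR": f"{pg_root}/include",
        "PostgreSQL_TYPE_INCLUDE_DIR": f"{pg_root}/include/server",
        "PostgreSQL_LIBRARY": f"{pg_root}/lib/libpq.lib",
    }
    args = ["cmake"] + [f"-D{key}={value}" for key, value in options.items()]
    # The trailing ".." points cmake at the source tree, one level above build.
    return args + [".."]


if __name__ == "__main__":
    print(" ".join(build_cmake_args("C:/Program Files/PostgreSQL/16", "python312", "C:/path/to/env")))
```

The list form can be handed to subprocess.run with cwd set to the build folder, which avoids quoting problems with the space in "Program Files".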

Failing, e.g. on missing nmake (issue on Windows 11)

If you encounter an issue, e.g. a message telling you that nmake is not recognised, you can run the cmake command again in the Developer Command Prompt. To do so, open the Start menu and, under All apps, then Visual Studio, click Developer Command Prompt for VS Insiders.

Now activate the Python environment. To do so, you first need to call miniforge’s mamba_hook.bat file (adjust the file path to your configuration):

@CALL "C:\Users\<your username>\miniforge3\condabin\mamba_hook.bat"

Then activate the environment and call the cmake command again.

Missing pg_regress.exe

If you get an error about a missing pg_regress.exe, copy pg_regress.exe from C:\Program Files\PostgreSQL\16\lib\pgxs\src\test\regress into C:\Program Files\PostgreSQL\16\bin (don’t forget to adapt the path, especially the PostgreSQL version):

xcopy /s "C:\Program Files\PostgreSQL\16\lib\pgxs\src\test\regress\pg_regress.exe" "C:\Program Files\PostgreSQL\16\bin\pg_regress.exe"

Catch2 Issue

If you get into an issue saying Build step for catch2 failed, check that you’re running the command in an environment where you installed catch2 with

mamba install catch2

Otherwise, you can try adding the following option to the cmake command, checking first that the path to catch2 corresponds to your system:

-DCATCH_DIR:FILEPATH=C:\vcpkg\installed\x64-windows\include\catch2

For convenience, here’s the full command again:

cmake -DPy_ENABLE_SHARED=1 -DCATCH_DIR:FILEPATH=C:\vcpkg\installed\x64-windows\include\catch2 -USE_INCHI=1 -DBOOST_ROOT=%BOOST_ROOT% -DRDK_BUILD_PYTHON_WRAPPERS=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BOOST_PYTHON3_NAME=python312 -DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include -DRDK_BUILD_PGSQL=ON -DRDK_PGSQL_STATIC=ON -DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" -DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" -DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..

c. Build RDKit

If the preceding configuration step worked, you should now stop the PostgreSQL service: press CTRL+SHIFT+ESC, go to the Services tab, find postgresql-x64-16, right-click it and choose Stop.

When this is done, open a Command Prompt as administrator and run

cd C:/rdkit/build
cmake --build . --config Release --target install -j 10
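The -j 10 option asks for up to ten parallel build jobs; a sensible value depends on your CPU. As a sketch (build_command is a hypothetical helper), the number of jobs can default to the number of logical CPUs reported by Python:

```python
import os


def build_command(jobs=None, config="Release"):
    """Return the cmake build-and-install command as an argument list."""
    # Default to the number of logical CPUs, falling back to 1 if unknown.
    jobs = jobs or os.cpu_count() or 1
    return ["cmake", "--build", ".", "--config", config,
            "--target", "install", "-j", str(jobs)]


if __name__ == "__main__":
    print(" ".join(build_command()))
```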

d. Add RDKit to environment variables

Now, we need to add RDKit to the path (specifically C:/rdkit/rdkit, where the file rdBase.pyd is located), and create RDBASE and PYTHONPATH:

SETX PATH C:/rdkit/rdkit;%PATH%
SETX RDBASE C:/rdkit
SETX PYTHONPATH %RDBASE%

If PYTHONPATH already exists you can also append RDBASE to it using this instead:

SETX PYTHONPATH %RDBASE%;%PYTHONPATH%

If you’re using RDKit installed with Conda in other environments, you’ll run into issues. You need to set a different PYTHONPATH without RDBASE for these Conda environments. To do so, activate the environment in question (e.g. called here myEnv) and run the command below (adapt PYTHONPATH to reflect your folder tree, you must point to Conda’s condabin folder), then activate the environment again so the changes take effect:

conda activate myEnv
conda env config vars set PYTHONPATH=C:/Users/yourname/Miniconda3/condabin
conda activate myEnv

e. Check RDKit version

Now, open a new command prompt and check RDKit version:

mamba activate rdkit.build
python

Then

from rdkit import rdBase
print(rdBase.rdkitVersion)

Missing pwsh.exe

If you see the following error appearing while cmake is building RDKit:

'pwsh.exe' is not recognized as an internal or external command,

Then you might need to ensure that PowerShell is installed (see section 2) and on the PATH, and/or change line 229 in the following file:

vcpkg/scripts/buildsystems/msbuild/vcpkg.targets

by changing

<Exec
Condition="'$(VcpkgXUseBuiltInApplocalDeps)' != 'true'"
Command="pwsh.exe $(_ZVcpkgAppLocalPowerShellCommonArguments)"

to (change the pwsh.exe to powershell.exe):

<Exec
Condition="'$(VcpkgXUseBuiltInApplocalDeps)' != 'true'"
Command="powershell.exe $(_ZVcpkgAppLocalPowerShellCommonArguments)"

before calling cmake again, as above.
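If you prefer not to edit the file by hand (the line number may differ between vcpkg versions), the same replacement can be scripted. This is a minimal sketch; use_windows_powershell is a hypothetical helper that works on the text of vcpkg.targets and only touches the exact Command attribute shown above.

```python
def use_windows_powershell(targets_text):
    """Replace the pwsh.exe call in the vcpkg.targets text with powershell.exe."""
    old = 'Command="pwsh.exe $(_ZVcpkgAppLocalPowerShellCommonArguments)"'
    new = 'Command="powershell.exe $(_ZVcpkgAppLocalPowerShellCommonArguments)"'
    # str.replace leaves the text unchanged if the old command is not present.
    return targets_text.replace(old, new)
```

To apply it, read the file with pathlib.Path.read_text, pass the text through the function and write the result back, ideally after saving a backup copy of the original file.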

If it succeeded you can turn PostgreSQL back on.

Common error

If you see the following error:

  (rdkit.build) C:\rdkit\build>copy /Y "C:\rdkit\build\Code\PgSQL\rdkit\Release\rdkit.dll" "C:\PROGRA~1\POSTGR~1\16\lib\rdkit.dll"
Access is denied.
0 file(s) copied.

=====================================================================
This might be due to insufficient privileges.
Check C:/rdkit/build/Code/PgSQL/rdkit/pgsql_install.bat
for correctness of installation paths. If everything is OK, gain
administrator privileges, stop the PostgreSQL service, run
C:/rdkit/build/Code/PgSQL/rdkit/pgsql_install.bat
to install the PostgreSQL RDKit cartridge, then start again
the PostgreSQL service.
=====================================================================

Then, copy and paste the three files in

C:\rdkit\build\Code\PgSQL\rdkit\Release

to

C:\Program Files\PostgreSQL\16\lib

Then restart PostgreSQL service. If this fails, try rebooting your computer and check again.
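The manual copy can also be done from Python, run from an elevated prompt while the PostgreSQL service is stopped. This is only a sketch: copy_cartridge_files is a hypothetical helper, and it simply copies every file found in the Release folder rather than naming the files individually.

```python
import shutil
from pathlib import Path


def copy_cartridge_files(release_dir, pg_lib_dir):
    """Copy every file from the RDKit Release folder into PostgreSQL's lib folder."""
    copied = []
    for source in sorted(Path(release_dir).iterdir()):
        if source.is_file():
            # copy2 keeps timestamps and overwrites an existing file of the same name.
            shutil.copy2(source, Path(pg_lib_dir) / source.name)
            copied.append(source.name)
    return copied
```

For the paths used in this tutorial, the call would be copy_cartridge_files(r"C:\rdkit\build\Code\PgSQL\rdkit\Release", r"C:\Program Files\PostgreSQL\16\lib").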

10. Create extension

If the previous steps were successful, you should be able to open a command prompt, activate the rdkit.build environment, connect to your database and then run the following command:

CREATE EXTENSION rdkit;

If you get an error message like

ERROR:  could not load library "C:/Program Files/PostgreSQL/16/lib/rdkit.dll": The specified module could not be found.

Then check whether the .dll file is in the folder mentioned in the error and/or copy the three files there from

C:\rdkit\build\Code\PgSQL\rdkit\Release

If the DLL file is in the right folder, you can investigate further whether there are missing dependencies, for example using the Dependencies tool by lucasg (but check first the Missing pwsh.exe and Common error parts of section 9). To do so, download the software from GitHub at https://github.com/lucasg/Dependencies, decompress it (for example in your Downloads folder), then run the file named DependenciesGUI.exe.

Select File/Open and navigate to the file rdkit.dll in PostgreSQL/16/lib.

You can also check whether RDKit appears among the available extensions:

psql -d postgres -c "select * from pg_available_extensions where name ilike '%rdkit%';"

11. Creating a new cluster in case of error

I had issues with creating the extension on databases in my main cluster (the one I created when installing PostgreSQL with the EDB installer and located alongside the files in Program Files). However, I had no issues on clusters I initiated after having installed RDKit. So this might be a work-around. Simply create a new cluster in the location of your choice (this can also be an external hard-drive, though that runs the risk of losing data if you disconnect):

initdb -D path/to/your/new/cluster

Then go with the file explorer to that location and open the file called postgresql.conf. Look for the entry

#port=5432

uncomment this line (remove the #) and change it to an unused port, e.g. 5431

port=5431
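The same edit can be scripted, which is handy if you create clusters regularly. Below is a minimal sketch: set_port is a hypothetical helper that works on the text of postgresql.conf, uncomments the first port entry and sets it to the given value, or appends an entry if none is found.

```python
import re


def set_port(conf_text, port):
    """Return postgresql.conf text with the port entry uncommented and set to `port`."""
    # Match "port = 5432" or "#port = 5432" at the start of a line.
    pattern = re.compile(r"^#?[ \t]*port[ \t]*=[ \t]*\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(f"port = {port}", conf_text, count=1)
    if count == 0:
        # No port entry found: append one at the end of the file.
        new_text = conf_text.rstrip("\n") + f"\nport = {port}\n"
    return new_text
```

Anything after the number on that line, such as a trailing comment, is left untouched.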

Then you’ll need to start the cluster with the following command. I added a log argument, otherwise all log output is printed to your command prompt; the folder needs to exist, but you can change the file name. You should keep the command prompt open afterwards.

pg_ctl -D path/to/your/new/cluster --log path/to/an/existing/folder/psql16.log start

Then you connect to the cluster by adding -p 5431 to any usual command, e.g.

psql -p 5431 -U postgres mydatabase

To close the cluster (e.g. if you installed it on an external hard-drive and want to remove that hard-drive) just do:

pg_ctl -D path/to/your/new/cluster stop

And if you need to restart the cluster:

pg_ctl -D path/to/your/new/cluster --log path/to/an/existing/folder/psql16.log restart
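The three pg_ctl calls differ only in the action and the optional log file, so they are easy to wrap. The following is a sketch; pg_ctl_command is a hypothetical helper that only builds the argument list, using the same -D and --log options as the commands above.

```python
def pg_ctl_command(data_dir, action, log_file=None):
    """Build a pg_ctl command line for the actions used in this section."""
    if action not in ("start", "stop", "restart"):
        raise ValueError(f"unsupported action: {action}")
    command = ["pg_ctl", "-D", str(data_dir)]
    if log_file is not None:
        # Send the server log to a file instead of the command prompt.
        command += ["--log", str(log_file)]
    return command + [action]
```

The resulting list can be passed to subprocess.run(..., check=True) to actually start or stop the cluster.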

12. Updating PostgreSQL/RDKit

If you upgraded PostgreSQL, you might need to rebuild RDKit cartridge. Alternatively, if you want to update RDKit just use git pull:

cd C:/rdkit
git pull

If necessary, you can also erase all changes in your local repository and download again from remote:

git reset --hard HEAD
git clean -df
git pull

Then run the following command, but remember to adapt it (see section 9.b) to where PostgreSQL is installed, which version you’re using and which Python version you are using under -DRDK_BOOST_PYTHON3_NAME.

Don’t forget to activate your mamba environment and to use an elevated command prompt (“run as administrator”).

cd C:/rdkit/build
cmake -DPy_ENABLE_SHARED=1 ^
-DBOOST_ROOT=%BOOST_ROOT% ^
-DBoost_NO_SYSTEM_PATHS=ON ^
-DBoost_NO_BOOST_CMAKE=TRUE ^
-DRDK_INSTALL_INTREE=ON ^
-DRDK_INSTALL_STATIC_LIBS=OFF ^
-DRDK_BUILD_CPP_TESTS=ON ^
-DRDK_BUILD_CAIRO_SUPPORT=ON ^
-DRDK_BUILD_FREETYPE_SUPPORT=ON ^
-DRDK_BOOST_PYTHON3_NAME=python312 ^
-DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include ^
-DRDK_BUILD_PYTHON_WRAPPERS=ON ^
-DRDK_BUILD_INCHI_SUPPORT=ON ^
-DRDK_BUILD_AVALON_SUPPORT=ON ^
-DRDK_BUILD_PGSQL=ON ^
-DRDK_PGSQL_STATIC=ON ^
-DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" ^
-DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" ^
-DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..

Again, modify the Python version, PostgreSQL version and paths specific to your configuration, and run the command as a one-liner:

cmake -DPy_ENABLE_SHARED=1 -USE_INCHI=1 -DBOOST_ROOT=%BOOST_ROOT% -DRDK_BUILD_PYTHON_WRAPPERS=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BOOST_PYTHON3_NAME=python312 -DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include -DRDK_BUILD_PGSQL=ON -DRDK_PGSQL_STATIC=ON -DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" -DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" -DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..

Then rebuild the cartridge:

cd C:/rdkit/build
cmake --build . --config Release --target install -j 10

Test again that it works, e.g. create a dummy database and try to create the extension:

createdb dummymoldb
psql dummymoldb
CREATE EXTENSION rdkit;

Hopefully you don’t get an error, otherwise, check section 10 for (some) guidance.

13. Using RDKit Cartridge, a short example

Adding mol to a table

Let’s create a table called molecules with a SMILES column and a formula column in our testmol database.

BEGIN;
CREATE TABLE molecules (
id SERIAL PRIMARY KEY,
smiles TEXT,
formula TEXT
);
END;

Now we can add a few molecules:

BEGIN;
INSERT INTO molecules (smiles)
VALUES
('[H]C1=C(C2=C([H])C(OC(F)(F)F)=C([H])C([H])=C2[H])C(C([H])([H])[H])=C([H])C(N([H])[H])=N1'),
('[H]OC(=O)C(=O)N([H])C1=C([H])C([H])=C(C2=C([H])SC(N(C3=C([H])C(Cl)=C(Cl)C([H])=C3[H])C([H])([H])C3=C([H])C(Br)=C(C([H])([H])P(=O)(OF)OF)C([H])=C3[H])=N2)C([H])=C1[H]'),
('[H]C1=C([H])C(C([H])([H])N([H])C([H])([H])C([H])([H])C([H])([H])[H])=C([H])C(C([H])([H])N(C([H])([H])C([H])([H])[H])C([H])([H])C2([H])C([H])([H])C2([H])[H])=C1F'),
('[H]C1=NC2=C(C([H])=C1F)N(C1=NC(N([H])[H])=C(N(C(=O)OC([H])([H])[H])C([H])([H])C3=C(F)C([H])=C(F)C([H])=C3[H])C(N([H])[H])=N1)N=C2C([H])([H])C1=C([H])C([H])=C([H])C([H])=C1F'),
('[H]C1=C([H])C(C(=O)OC([H])([H])[H])=C([H])C(OC2=C([H])C([H])=C([H])C(C3=NN(C([H])([H])C4=C([H])C(Br)=C(C([H])([H])P(=O)(OF)OF)C([H])=C4[H])C(=NC4=C([H])C(Cl)=C(Cl)C([H])=C4[H])S3)=C2[H])=C1[H]'),
('[H]C1=C([H])C([H])=C(C([H])=C([H])C(=O)N([H])C2=NC3=C(S2)C([H])([H])C([H])([H])C([H])([H])C3([H])C(=O)N2C([H])([H])C([H])([H])N(C3=C([H])C([H])=C([H])C(C(F)(F)F)=C3[H])C([H])([H])C2([H])[H])O1'),
('[H]OC(=O)C([H])([H])OC1=C([H])C([H])=C([H])C(C2=C([H])SC(N(C3=C([H])C(Cl)=C(Cl)C([H])=C3[H])C([H])([H])C3=C([H])C(Br)=C(C(F)(F)P(=O)(O[H])O[H])C([H])=C3[H])=N2)=C1[H]'),
('[H]C1=C([H])C(S(=O)(=O)N2C([H])([H])C([H])([H])N([H])C(=O)C2([H])C([H])([H])C(=O)N([H])C2([H])C3=C(C([H])=C(C([H])([H])N4C([H])([H])C([H])([H])C([H])([H])C([H])([H])C4([H])[H])C([H])=C3[H])C([H])([H])C([H])([H])C2([H])[H])=C([H])C(Cl)=C1F'),
('[H]OP(=O)(O[H])C(F)(F)C1=C(Br)C([H])=C(C([H])([H])N(C2=NC(C3=C([H])C([H])=C(S(=O)(=O)C([H])([H])[H])C([H])=C3[H])=C([H])S2)C([H])([H])C2=C([H])C([H])=C(S(=O)(=O)N([H])[H])C([H])=C2[H])C([H])=C1[H]'),
('[H]OC(=O)C1=C([H])C(C2=C(F)C([H])=C(Br)C([H])=C2[H])=NN1[H]');
COMMIT;

Now, we can save a mol representation in the database:

-- add a new column called mol
BEGIN;
ALTER TABLE molecules
ADD COLUMN "mol" mol;
COMMIT;

-- update molecules with the mol inferred from smiles
BEGIN;
UPDATE molecules
SET mol = mol_from_smiles(smiles::cstring)
WHERE is_valid_smiles(smiles::cstring);
COMMIT;

-- update formula from the mol
BEGIN;
UPDATE molecules
SET formula = mol_formula(mol);
COMMIT;

If the table stored molblocks (in a column called molblock) instead of SMILES, we would use this code instead:

-- update molecules with the mol inferred from molblock
BEGIN;
UPDATE molecules
SET mol = mol_from_ctab(molblock::cstring)
WHERE is_valid_ctab(molblock::cstring);
COMMIT;

Search for substructure pattern

Imagine you want to check for the presence of a carboxylic acid group in the molecules in the table. You would write a SMARTS for the carboxylic acid group, e.g. C(=O)([OH])[#6X4] and call:

SELECT id, smiles FROM molecules WHERE mol@>'C(=O)([OH])[#6X4]'::qmol;
Figure: the molecule with a carboxylic acid group found in the database, with the atoms matched by the SMARTS highlighted (image produced with RDKit for Python).
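The same search can be run from Python without loading the molecules client-side. The sketch below is not tied to a particular driver: it assumes a DB-API cursor that uses %s placeholders (as psycopg2 does), and find_substructure is a hypothetical helper name.

```python
def find_substructure(cursor, smarts):
    """Return (id, smiles) rows of molecules that contain the SMARTS pattern."""
    # The pattern is passed as a query parameter instead of being pasted
    # into the SQL string, then cast to qmol on the database side.
    cursor.execute(
        "SELECT id, smiles FROM molecules WHERE mol @> %s::qmol",
        (smarts,),
    )
    return cursor.fetchall()
```

With an open connection conn, the call would look like find_substructure(conn.cursor(), "C(=O)([OH])[#6X4]").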

If you followed the steps above, you built RDKit with InChI support, including in PostgreSQL. This means you can get both InChI and InChIKey from your molecules directly in PostgreSQL:

SELECT id, mol_inchi(mol) FROM molecules;
SELECT id, mol_inchikey(mol) FROM molecules;

14. Install RDKit module in another environment

If you want to use the RDKit module you’ve built from source in another environment, you’ll need to install the modules described in section 5:

mamba activate myNewEnv
mamba install -y numpy matplotlib cmake cairo pillow eigen pkg-config boost-cpp boost swig

Then you don’t need to rebuild or do anything else, it should work already. Test it by entering Python and typing:

from rdkit import rdBase
rdBase.rdkitVersion

If this fails because the module is not found (ModuleNotFoundError), first check that your environment uses the same Python version as the one you specified in the cmake options that you ran under section 12 (namely, option -DRDK_BOOST_PYTHON3_NAME). To resolve this you’d need either to redo section 12, or to create a new environment with the same Python version, e.g. like this:

mamba create -n chem python=3.12

For an environment called chem using Python 3.12.

15. Update vcpkg

It seems that you can’t simply update all installed vcpkg packages in place. There are more elegant solutions to this, but since I don’t have many packages installed with vcpkg, I just delete them all and then run the commands below.

cd C:/vcpkg
git pull
bootstrap-vcpkg.bat

Then install the necessary packages:

vcpkg install catch2 --clean-after-build
vcpkg install zlib --clean-after-build
vcpkg install libpng --clean-after-build
vcpkg install pixman --clean-after-build
vcpkg install cairo --clean-after-build
vcpkg install boost --clean-after-build
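Since the same list of packages has to be reinstalled after every vcpkg update, the commands can be generated from a list. This is a sketch under the assumption that the six packages above are all you need; vcpkg_install_commands is a hypothetical helper, and the optional triplet is appended in the name:triplet form used for ARM in section 7.

```python
PACKAGES = ["catch2", "zlib", "libpng", "pixman", "cairo", "boost"]


def vcpkg_install_commands(packages=PACKAGES, triplet=None):
    """Return one 'vcpkg install' argument list per package."""
    commands = []
    for name in packages:
        # An explicit triplet (e.g. 'arm64-windows') is appended as name:triplet.
        spec = f"{name}:{triplet}" if triplet else name
        commands.append(["vcpkg", "install", spec, "--clean-after-build"])
    return commands


if __name__ == "__main__":
    for command in vcpkg_install_commands():
        print(" ".join(command))
```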

Then git pull RDKit and install it again:

cd C:/rdkit
git pull
cd C:/rdkit/build
cmake -DPy_ENABLE_SHARED=1 -USE_INCHI=1 -DBOOST_ROOT=%BOOST_ROOT% -DRDK_BUILD_PYTHON_WRAPPERS=ON -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_AVALON_SUPPORT=ON -DBoost_NO_SYSTEM_PATHS=ON -DBoost_NO_BOOST_CMAKE=TRUE -DRDK_INSTALL_INTREE=ON -DRDK_INSTALL_STATIC_LIBS=OFF -DRDK_BUILD_CPP_TESTS=ON -DRDK_BUILD_CAIRO_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=ON -DRDK_BOOST_PYTHON3_NAME=python312 -DBoost_INCLUDE_DIR=%BOOST_ROOT%/Library/include -DRDK_BUILD_PGSQL=ON -DRDK_PGSQL_STATIC=ON -DPostgreSQL_INCLUDE_DIR="C:/Program Files/Postgresql/16/include" -DPostgreSQL_TYPE_INCLUDE_DIR="C:/Program Files/Postgresql/16/include/server" -DPostgreSQL_LIBRARY="C:/Program Files/Postgresql/16/lib/libpq.lib" ..
cd C:/rdkit/build
cmake --build . --config Release --target install -j 10

If you run into issues, try resetting the RDKit folder (delete the build folder) and then run:

git reset --hard HEAD

Acknowledgements

I wrote this tutorial while at the Department of Environmental Science at Stockholm University and as member of the ZeroPM project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101036756.


Written by Luc Miaz

I have a background in Mathematics, Statistics and Environmental Science with keen interest in Social Science and Philosophy.