Software Engineers

Applied Cheminformatics with RDKit

Dr. Lochana C. Menikarachchi
December 5, 2025
15 hours
🥇Beginner

Objectives

Visualize and convert molecules
Detect functional groups with SMARTS
Filter molecules by properties
Measure molecular similarity
Analyze chemical diversity
Generate combinatorial libraries

Prerequisites

Approximately two years of computing or engineering coursework or equivalent experience
Basic Python programming skills
Familiarity with Jupyter Notebooks and VS Code
High school-level chemistry knowledge

Modern chemistry is, in many ways, a data challenge. This training program translates chemical concepts into computer science terms, allowing you to leverage your existing expertise in string and graph manipulation to analyze, filter, and generate molecular structures.

Note: Session 1 is a free, 3-hour onboarding event. It is mandatory and open to anyone. The remaining six sessions (2 to 7) are paid and run for 2 hours each. There is no obligation to enroll in the paid program; you can decide whether to register for the full training program after the onboarding session.

What You’ll Learn

Session 1: Onboarding (Free)

Review the program structure and logistics
Install software and set up the environment
Troubleshoot technical problems

Session 2: Molecular Representations and Depictions

Learn how to represent molecules in 1D, 2D, and 3D
Convert molecules between various formats
Visualize molecules and generate image files
Project: Develop a Chemical Converter that reads molecular data from text files and saves each molecule as an image

Session 3: Detecting Functional Groups with SMARTS

Learn SMARTS patterns (regular expressions for chemistry)
Perform substructure searches
Highlight matching patterns visually
Project: Build a Functional Group Detector that scans a dataset and tags molecules containing specific features

Session 4: Converting Structures to Numerical Data

Understand molecular descriptors (representing chemistry as numbers)
Compute key molecular properties
Filter data using numerical thresholds and Lipinski’s Rule of Five
Project: Build a Drug Candidate Filter that processes a CSV and extracts molecules suitable for further analysis
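
As a rough preview of the kind of filter built in this session, here is a minimal pure-Python sketch of Lipinski's Rule of Five. It assumes the four descriptors have already been computed elsewhere (in the course this would be done with RDKit). The `Descriptors` class, the example names, and all numeric values are hypothetical and are not real compounds. Allowing at most one violation is a common convention, used here as an assumption.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    name: str
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: a molecule passes if it has at most one violation."""
    return lipinski_violations(d) <= max_violations


# Made-up descriptor values, for illustration only.
CANDIDATES = [
    Descriptors("small_polar", 180.2, 1.2, 1, 4),
    Descriptors("large_greasy", 720.9, 6.8, 2, 9),
    Descriptors("borderline", 510.0, 4.1, 3, 8),
]

KEPT = [d.name for d in CANDIDATES if passes_rule_of_five(d)]
print(KEPT)
```

The thresholds live in one function so that a stricter filter (for example `max_violations=0`) only needs a different argument.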

Session 5: Measuring Molecular Similarity

Distinguish between substructure matching and similarity search
Generate molecular fingerprints (hashing molecules into bit-vectors)
Calculate similarity scores (Tanimoto, Dice)
Project: Develop a Molecule Matcher tool that takes an input molecule and finds the five most similar molecules in a database
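
To make the similarity scores concrete, the sketch below computes Tanimoto and Dice coefficients in plain Python. It is not the RDKit API: each fingerprint is modeled as a set of "on" bit positions, and the toy database, its keys, and the helper `top_matches` are hypothetical. Returning 0.0 when both fingerprints are empty is a choice made for this illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """Twice the shared on-bits divided by the total on-bit count."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def top_matches(query: set, database: dict, k: int = 5) -> list:
    """Return the k (name, score) pairs with the highest Tanimoto score."""
    scored = [(name, tanimoto(query, bits)) for name, bits in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy fingerprints: sets of on-bit indices (made up for illustration).
QUERY = {1, 4, 7, 9}
DATABASE = {
    "a": {1, 4, 7, 9},
    "b": {1, 4, 7},
    "c": {1, 4, 20, 21},
    "d": {30, 31},
}

for name, score in top_matches(QUERY, DATABASE, k=2):
    print(name, round(score, 3))
```

For the same pair of fingerprints, Dice is never lower than Tanimoto, so the two scores generally need different cutoffs.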

Session 6: Clustering and Diversity Analysis

Group similar molecules using Butina clustering
Select diverse compounds using MaxMinPicker
Visualize chemical space
Project: Build a Diversity Picker that identifies the most structurally distinct molecules in a large dataset
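
The idea behind MaxMin selection can be sketched in a few lines of plain Python. This is a simplified illustration, not RDKit's MaxMinPicker implementation: fingerprints are modeled as sets of on-bit positions, distance is taken as 1 minus Tanimoto similarity, and the selection always starts from a fixed seed index. The function names and data are hypothetical.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """1 - Tanimoto similarity for fingerprints stored as on-bit sets."""
    union = len(a | b)
    similarity = len(a & b) / union if union else 0.0
    return 1.0 - similarity


def maxmin_pick(fingerprints: list, n_picks: int, seed_index: int = 0) -> list:
    """Greedy MaxMin: repeatedly add the item farthest from everything picked."""
    if not 0 < n_picks <= len(fingerprints):
        raise ValueError("n_picks must be between 1 and the number of items")
    picked = [seed_index]
    # min_dist[i] is the distance from item i to its closest picked item.
    min_dist = [
        tanimoto_distance(fp, fingerprints[seed_index]) for fp in fingerprints
    ]
    while len(picked) < n_picks:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(
                min_dist[i], tanimoto_distance(fp, fingerprints[best])
            )
    return picked


# Toy fingerprints (made up): items 0, 1, 3 are similar; 2 and 4 differ.
FPS = [{1, 2, 3}, {1, 2, 4}, {10, 11, 12}, {1, 2, 3, 4}, {3, 10, 11, 13}]
PICKED = maxmin_pick(FPS, n_picks=3)
print(PICKED)
```

Tracking only the minimum distance to the picked set keeps each round linear in the number of molecules.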

Session 7: Virtual Synthesis with Reaction SMARTS

Define chemical transformations using Reaction SMARTS
Run reactions and sanitize products
Enumerate combinatorial libraries
Project: Create a Library Generator that produces all possible products from a list of reactants
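
Combinatorial enumeration is, at its core, a Cartesian product over reactant lists. The sketch below is a toy stand-in for Reaction SMARTS and does not use RDKit: it joins SMILES strings by simple text manipulation, which only works for the very restricted inputs checked in the code (an acid written so that it ends in `C(=O)O` and a primary amine written so that it starts with `N`). The helper name `toy_amide` and the reactant lists are hypothetical.

```python
from itertools import product

# Small illustrative reactant lists written as SMILES strings.
ACIDS = ["CC(=O)O", "CCC(=O)O"]       # acetic acid, propanoic acid
AMINES = ["NCC", "NC(C)C", "NCCO"]    # ethylamine, isopropylamine, ethanolamine


def toy_amide(acid: str, amine: str) -> str:
    """Join an acid and an amine into an amide by string manipulation.

    Toy assumption: drop the acid's trailing hydroxyl "O" and attach the
    amine through its leading "N". Real workflows should use a reaction
    engine rather than string edits.
    """
    if not acid.endswith("C(=O)O"):
        raise ValueError(f"acid must end with C(=O)O: {acid}")
    if not amine.startswith("N"):
        raise ValueError(f"amine must start with N: {amine}")
    return acid[:-1] + amine


# Every acid is paired with every amine; a set removes duplicate products.
LIBRARY = sorted({toy_amide(acid, amine) for acid, amine in product(ACIDS, AMINES)})

for smiles in LIBRARY:
    print(smiles)
```

With 2 acids and 3 amines the library holds 6 products; library size grows as the product of the reactant list lengths, which is why enumeration is usually paired with filtering.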

