- Software Engineers
Applied Cheminformatics with RDKit
Objectives
- Visualize and convert molecules
- Detect functional groups with SMARTS
- Filter molecules by properties
- Measure molecular similarity
- Analyze chemical diversity
- Generate combinatorial libraries
Prerequisites
- Approximately two years of computing or engineering coursework or equivalent experience
- Basic Python programming skills
- Familiarity with Jupyter Notebooks and VS Code
- High school-level chemistry knowledge
Modern chemistry is, in many ways, a data challenge. This training program translates chemical concepts into computer science terms, allowing you to leverage your existing expertise in string and graph manipulation to analyze, filter, and generate molecular structures.
Note: Session 1 is a free, 3-hour onboarding event. It is mandatory and open to anyone. The remaining six sessions (2 through 7) are paid and run for 2 hours each. Attending the onboarding session does not oblige you to enroll; you can decide afterward whether to register for the full training program.
What You’ll Learn
Session 1: Onboarding (Free)
- Review the program structure and logistics
- Install software and set up the environment
- Troubleshoot technical problems
Session 2: Molecular Representations and Depictions
- Learn how to represent molecules in 1D, 2D, and 3D
- Convert molecules between various formats
- Visualize molecules and generate image files
- Project: Develop a Chemical Converter to read molecular data from text files and save them as images
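As a rough illustration of the conversion and depiction steps listed for this session, the sketch below parses a SMILES string, writes it back out in canonical form, generates a 2D MOL block, and renders an SVG image. It assumes RDKit is installed (for example via `pip install rdkit`); the `convert` helper and its `size` parameter are hypothetical names chosen for this example, not part of the course material.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Draw import rdMolDraw2D


def convert(smiles, size=300):
    """Return (canonical SMILES, MOL block, SVG text) for one SMILES string.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)  # returns None if the string cannot be parsed
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    canonical = Chem.MolToSmiles(mol)    # 1D: canonical line notation
    rdDepictor.Compute2DCoords(mol)      # 2D: assign drawing coordinates
    mol_block = Chem.MolToMolBlock(mol)  # MOL-format text using those coordinates

    # Render to SVG text, which can be written to a file as-is
    drawer = rdMolDraw2D.MolDraw2DSVG(size, size)
    drawer.DrawMolecule(mol)
    drawer.FinishDrawing()
    return canonical, mol_block, drawer.GetDrawingText()


# Ethanol written starting from the oxygen; the canonical form reorders it
canonical, mol_block, svg = convert("OCC")
print(canonical)
```

Writing the returned SVG text to a `.svg` file would give the image output the project describes; 3D coordinate generation is left out of this sketch.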
Session 3: Detecting Functional Groups with SMARTS
- Learn SMARTS patterns (Regular Expressions for Chemistry)
- Perform substructure searches
- Highlight matching patterns visually
- Project: Build a Functional Group Detector that scans a dataset and tags molecules containing specific features
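The substructure search described above might look something like the following minimal sketch. The SMARTS patterns and the names `PATTERNS` and `tag_molecule` are illustrative assumptions; real functional-group definitions are often more elaborate. RDKit is assumed to be installed.

```python
from rdkit import Chem

# Illustrative SMARTS definitions (assumptions for this example)
PATTERNS = {
    "carboxylic_acid": Chem.MolFromSmarts("[CX3](=O)[OX2H1]"),
    "alcohol": Chem.MolFromSmarts("[CX4][OX2H]"),
    "primary_amine": Chem.MolFromSmarts("[NX3;H2][CX4]"),
    "benzene_ring": Chem.MolFromSmarts("c1ccccc1"),
}


def tag_molecule(smiles):
    """Return the names of all patterns found in the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return [name for name, patt in PATTERNS.items() if mol.HasSubstructMatch(patt)]


aspirin_tags = tag_molecule("CC(=O)Oc1ccccc1C(=O)O")
print(aspirin_tags)

# Atom indices of each match, usable as highlight atoms when drawing
acetic_acid = Chem.MolFromSmiles("CC(=O)O")
acid_matches = acetic_acid.GetSubstructMatches(PATTERNS["carboxylic_acid"])
print(acid_matches)
```

The tuples returned by `GetSubstructMatches` hold atom indices in the order of the query atoms, which is what a drawing routine needs to highlight the matched group.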
Session 4: Converting Structures to Numerical Data
- Understand Molecular Descriptors (Representing chemistry as numbers)
- Compute key molecular properties
- Filter data using numerical thresholds and Lipinski’s Rule of Five
- Project: Build a Drug Candidate Filter that processes a CSV and extracts molecules suitable for further analysis
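To make the filtering idea concrete, here is a small sketch that computes four descriptors and counts Rule of Five violations. It assumes RDKit is installed; the helper names and the choice to tolerate at most one violation (`max_violations=1`) are assumptions made for this example, and the input is a plain list of SMILES rather than a CSV file.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, Lipinski


def rule_of_five_violations(mol):
    """Count how many of Lipinski's four thresholds a molecule exceeds."""
    checks = [
        Descriptors.MolWt(mol) > 500,       # molecular weight
        Crippen.MolLogP(mol) > 5,           # estimated octanol-water logP
        Lipinski.NumHDonors(mol) > 5,       # hydrogen-bond donors
        Lipinski.NumHAcceptors(mol) > 10,   # hydrogen-bond acceptors
    ]
    return sum(checks)


def filter_candidates(smiles_list, max_violations=1):
    """Keep parseable molecules with at most `max_violations` violations."""
    kept = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip rows that RDKit cannot parse
            continue
        if rule_of_five_violations(mol) <= max_violations:
            kept.append(smiles)
    return kept


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
long_alkane = "C" * 40   # heavy and very lipophilic
broken = "C1CC"          # unclosed ring, not valid SMILES
passed = filter_candidates([aspirin, long_alkane, broken])
print(passed)
```

Skipping unparseable rows silently keeps the sketch short; a real pipeline would probably log them instead.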
Session 5: Measuring Molecular Similarity
- Distinguish between substructure matching and similarity search
- Generate molecular fingerprints (hashing molecules into bit-vectors)
- Calculate similarity scores (Tanimoto, Dice)
- Project: Develop a Molecule Matcher tool that takes an input molecule and finds its top 5 matches in a database
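A similarity search along these lines could be sketched as follows. The example assumes a reasonably recent RDKit release that provides `rdFingerprintGenerator`; the Morgan fingerprint settings (radius 2, 2048 bits) and the helper names `fingerprint` and `top_matches` are choices made for this illustration only.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan (circular) fingerprints hashed into a fixed-length bit vector
_generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprint(smiles):
    """Return a bit-vector fingerprint for one SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return _generator.GetFingerprint(mol)


def top_matches(query, database, n=5):
    """Rank database SMILES by Tanimoto similarity to the query."""
    query_fp = fingerprint(query)
    db_fps = [fingerprint(smiles) for smiles in database]
    scores = DataStructs.BulkTanimotoSimilarity(query_fp, db_fps)
    ranked = sorted(zip(database, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


database = ["CCO", "c1ccccc1", "CCCO", "CC(=O)O"]
best = top_matches("CCO", database, n=2)
print(best)

# Tanimoto and Dice score the same pair of fingerprints differently
ethanol, propanol = fingerprint("CCO"), fingerprint("CCCO")
tanimoto = DataStructs.TanimotoSimilarity(ethanol, propanol)
dice = DataStructs.DiceSimilarity(ethanol, propanol)
```

Tanimoto divides the shared bits by the union of set bits, while Dice divides twice the shared bits by the total number of set bits, so Dice is never lower than Tanimoto for the same pair.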
Session 6: Clustering and Diversity Analysis
- Group similar molecules using Butina clustering
- Select diverse compounds using MaxMinPicker
- Visualize chemical space
- Project: Build a Diversity Picker that identifies the most structurally distinct compounds in a large dataset
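The clustering and picking steps above can be sketched roughly as below, using RDKit's `Butina.ClusterData` and `MaxMinPicker`. RDKit is assumed to be installed; the distance cutoff of 0.6, the fixed seed, and the helper names are assumptions for this example, and sensible values depend on the dataset and fingerprint.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Cluster import Butina
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

_generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def fingerprints(smiles_list):
    """Fingerprint every SMILES; assumes all inputs are valid."""
    return [_generator.GetFingerprint(Chem.MolFromSmiles(s)) for s in smiles_list]


def butina_clusters(fps, cutoff=0.6):
    """Cluster fingerprints; `cutoff` is a Tanimoto distance threshold."""
    # Butina expects the lower triangle of the distance matrix as a flat list
    distances = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        distances.extend(1.0 - s for s in sims)
    return Butina.ClusterData(distances, len(fps), cutoff, isDistData=True)


def pick_diverse(fps, n_pick, seed=42):
    """Pick n_pick mutually dissimilar fingerprints by index (MaxMin)."""
    picker = MaxMinPicker()
    return list(picker.LazyBitVectorPick(fps, len(fps), n_pick, seed=seed))


smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "Cc1ccccc1", "CC(=O)O"]
fps = fingerprints(smiles)
clusters = butina_clusters(fps)
picked = pick_diverse(fps, n_pick=3)
print(clusters)
print([smiles[i] for i in picked])
```

Each cluster is a tuple of indices into the input list, with the cluster centroid first. The picker returns indices as well, so both results map straight back to the original SMILES.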
Session 7: Virtual Synthesis with Reaction SMARTS
- Define chemical transformations using Reaction SMARTS
- Run reactions and sanitize products
- Enumerate combinatorial libraries
- Project: Create a Library Generator that produces all possible products from a list of reactants
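As one possible illustration of the workflow above, the sketch below defines an amide coupling as Reaction SMARTS, runs it over every acid and amine pair, sanitizes each product, and collects the unique canonical SMILES. RDKit is assumed to be installed; the reaction choice, the reactant lists, and the name `enumerate_library` are assumptions for this example.

```python
from itertools import product

from rdkit import Chem
from rdkit.Chem import AllChem

# Carboxylic acid + amine bearing at least one H -> amide
AMIDE_COUPLING = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])-[OD1].[N!H0:3]>>[C:1](=[O:2])[N:3]"
)


def enumerate_library(acids, amines):
    """Return sorted unique product SMILES for all acid/amine pairs."""
    library = set()
    for acid_smiles, amine_smiles in product(acids, amines):
        reactants = (Chem.MolFromSmiles(acid_smiles), Chem.MolFromSmiles(amine_smiles))
        if any(mol is None for mol in reactants):
            continue  # skip unparseable reactants
        # RunReactants returns one tuple of product molecules per template match
        for product_set in AMIDE_COUPLING.RunReactants(reactants):
            for prod in product_set:
                try:
                    Chem.SanitizeMol(prod)  # products come back unsanitized
                except Exception:
                    continue  # drop chemically invalid products
                library.add(Chem.MolToSmiles(prod))
    return sorted(library)


acids = ["CC(=O)O", "OC(=O)c1ccccc1"]   # acetic acid, benzoic acid
amines = ["CN", "CCN"]                  # methylamine, ethylamine
library = enumerate_library(acids, amines)
print(library)
```

Storing canonical SMILES in a set removes duplicates that arise when a template matches the same reactant in more than one equivalent way.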
Related Training Programs
Data Science for Life Scientists with KNIME
Master core data science skills by building a Drug Discovery Pipeline that helps evaluate and prioritize drug-like compounds. You’ll create a workflow that combines raw lab data, calculates basic molecular features, and visualizes structure-activity patterns.