Data Science for Life Scientists with KNIME
Objectives
- Master the KNIME interface and core concepts
- Clean and blend biological and chemical data
- Calculate molecular properties
- Visualize chemical space
- Train machine learning models
- Automate reporting
Prerequisites
- A background in biology, chemistry, or a related life sciences discipline
- No prior coding experience required
- Basic computer literacy with file management
- A laptop capable of running KNIME Analytics Platform
Master core data science skills by building a Drug Discovery Pipeline that helps evaluate and prioritize drug-like compounds. You’ll create a workflow that combines raw lab data, calculates basic molecular features, and visualizes structure-activity patterns. Finally, you’ll train a model to estimate compound bioactivity and generate a ranked list of promising candidates.
Note: Session 1 is a free, 3-hour onboarding event that is open to anyone; attending it is required before joining the paid sessions. The remaining six sessions (2-7) are paid and run for 2 hours each. There is no obligation to continue: you can decide whether to register for the full training program after the onboarding session.
What You’ll Learn
Session 1: Onboarding: Introduction to KNIME (Free)
- Install KNIME Analytics Platform
- Learn basic KNIME concepts such as workspaces, hubs, workflows, nodes, ports, and traffic light logic
- Explore Metanodes, Components, and third-party extensions
- Project: Workflow Initialization – Set up the environment and execute a simple KNIME workflow
Session 2: Data Integration: Merging Chemical and Biological Datasets
- Read data from disparate sources (Excel lab notes vs. CSV chemical files)
- Join tables based on common identifiers (Relational Algebra)
- Handle duplicate columns and organize data structure
- Project: Master Data Joiner – Combine biological activity data with chemical structure lists into a single unified dataset (see the join sketch below)
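For orientation, here is roughly what the Master Data Joiner project does, expressed as a minimal Python/pandas sketch. The file names and the compound_id key column are hypothetical placeholders; in the course itself the same steps are built visually with KNIME reader and Joiner nodes, no code required.

```python
import pandas as pd

# Hypothetical input files standing in for the lab notes and structure list.
bio = pd.read_excel("assay_results.xlsx")      # biological activity data (Excel)
chem = pd.read_csv("compound_structures.csv")  # chemical structure list (CSV)

# Inner join on a shared compound identifier (assumed column name).
master = bio.merge(chem, on="compound_id", how="inner", suffixes=("", "_chem"))

# Drop duplicate columns introduced by the join to keep a tidy structure.
master = master.loc[:, ~master.columns.str.endswith("_chem")]
print(master.head())
```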
Session 3: Data Preprocessing: Standardizing and Classifying Assay Data
- Apply strategies to handle missing values and dirty data
- Perform mathematical transformations (Convert IC50 to pIC50)
- Categorize continuous data into classes (Active vs. Inactive)
- Project: Assay Normalizer – Clean the master dataset and create standardized classification labels for analysis (see the pIC50 sketch below)
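At its core, the Assay Normalizer is a unit-aware log transformation plus a thresholding step. The Python sketch below shows that arithmetic; the column names, the nM input unit, and the pIC50 ≥ 6 activity cutoff are illustrative assumptions, not values fixed by the course.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned table; column names are illustrative.
df = pd.DataFrame({"compound_id": ["C1", "C2", "C3"],
                   "ic50_nM": [25.0, None, 12000.0]})

# Handle missing values: here we simply drop rows without a measurement.
df = df.dropna(subset=["ic50_nM"])

# pIC50 = -log10(IC50 in molar); with IC50 in nM this is 9 - log10(IC50_nM).
df["pIC50"] = 9 - np.log10(df["ic50_nM"])

# Categorize the continuous values into classes; the 1 µM (pIC50 = 6) cutoff
# is a common but assumed threshold.
df["activity_class"] = np.where(df["pIC50"] >= 6.0, "Active", "Inactive")
print(df)
```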
Session 4: Feature Engineering: Calculating Molecular Properties
- Apply Cheminformatics concepts using RDKit nodes
- Convert text strings (SMILES) into chemical objects
- Calculate molecular properties (Molecular Weight, LogP, H-Bond Donors)
- Project: Lipinski Filter – Generate chemical descriptors and screen compounds based on the Rule of 5 (see the RDKit sketch below)
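The Lipinski Filter rests on standard RDKit descriptor calculations, which the KNIME RDKit nodes expose through a visual interface. Below is a minimal Python sketch of the same calculation; the two example molecules (aspirin and caffeine) are chosen purely for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

# Example SMILES strings: text representations of molecules.
smiles = ["CC(=O)Oc1ccccc1C(=O)O",           # aspirin
          "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]    # caffeine

for smi in smiles:
    mol = Chem.MolFromSmiles(smi)       # text string -> chemical object
    if mol is None:
        continue                        # skip unparsable structures
    mw = Descriptors.MolWt(mol)         # molecular weight
    logp = Descriptors.MolLogP(mol)     # octanol-water partition coefficient
    hbd = Lipinski.NumHDonors(mol)      # hydrogen-bond donors
    hba = Lipinski.NumHAcceptors(mol)   # hydrogen-bond acceptors
    # Lipinski's Rule of 5: MW <= 500, LogP <= 5, HBD <= 5, HBA <= 10
    passes = mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10
    print(f"{smi}: MW={mw:.1f}, LogP={logp:.2f}, HBD={hbd}, HBA={hba}, Rule of 5: {passes}")
```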
Session 5: Exploratory Analysis: Visualizing Chemical Space
- Create interactive visualizations and dashboards
- Map chemical properties to colors and shapes
- Link scatter plots to tabular data for deep exploration
- Project: Interactive SAR Dashboard – Visualize the Chemical Space to inspect Structure-Activity Relationships (see the plotting sketch below)
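As a point of reference, the kind of plot built interactively in the dashboard can be sketched in a few lines of Python with matplotlib. The small descriptor table here is invented purely for illustration; in KNIME the same view is assembled from visualization nodes without any code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical descriptor table carried over from the previous sessions.
df = pd.DataFrame({"mol_weight": [180.2, 350.4, 420.1, 290.7],
                   "logp":       [1.2,   3.8,   4.5,   2.1],
                   "pIC50":      [7.6,   5.2,   6.9,   4.8]})

# Map a chemical property (pIC50) to color so active regions of chemical space stand out.
points = plt.scatter(df["mol_weight"], df["logp"], c=df["pIC50"], cmap="viridis")
plt.colorbar(points, label="pIC50")
plt.xlabel("Molecular weight")
plt.ylabel("LogP")
plt.title("Chemical space colored by bioactivity")
plt.show()
```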
Session 6: Predictive Modeling: Training a Bioactivity Classifier
- Understand Predictive Modeling concepts
- Partition data into Training and Testing sets
- Train a Decision Tree to classify compounds
- Project: Bioactivity Predictor – Train a machine learning model to predict the efficacy of untested compounds (see the modeling sketch below)
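Conceptually, the Bioactivity Predictor follows the standard train/test workflow shown in this Python/scikit-learn sketch. The input file and descriptor column names are assumptions; in KNIME the equivalent steps use the Partitioning, Decision Tree Learner, and Decision Tree Predictor nodes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical feature table produced in the earlier sessions.
df = pd.read_csv("descriptors_with_labels.csv")
X = df[["mol_weight", "logp", "h_bond_donors"]]  # illustrative descriptor columns
y = df["activity_class"]                         # Active / Inactive labels

# Partition the data into training and testing sets (here 70/30).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a decision tree and check how well it generalizes to the held-out set.
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```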
Session 7: Automated Reporting: Ranking and Exporting Candidates
- Sort and rank model predictions
- Filter for the “Top K” best candidates
- Format data for export to stakeholders
- Project: Candidate Report Generator – Automatically generate a formatted purchase order for the most promising drug candidates (see the ranking sketch below)
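The final step reduces to a sort, a Top-K cut, and a formatted export. Below is a minimal Python sketch under assumed column names (compound_id, p_active) and an assumed output file; the KNIME workflow achieves the same result with sorter, row filter, and writer nodes.

```python
import pandas as pd

# Hypothetical prediction table produced by the bioactivity model.
preds = pd.read_csv("predictions.csv")  # assumed columns: compound_id, p_active

# Sort by predicted probability of activity and keep the Top K candidates.
TOP_K = 10
top_candidates = preds.sort_values("p_active", ascending=False).head(TOP_K)

# Rename columns for stakeholders and export to Excel (requires openpyxl).
top_candidates = top_candidates.rename(columns={"compound_id": "Compound ID",
                                                "p_active": "Predicted activity"})
top_candidates.to_excel("candidate_purchase_order.xlsx", index=False)
```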
Related Training Programs
Applied Cheminformatics with RDKit
Modern chemistry is, in many ways, a data challenge. This training program translates chemical concepts into computer science terms, allowing you to leverage your existing expertise in string and graph manipulation to analyze, filter, and generate molecular structures.
Learn More