Data Science for Life Scientists with KNIME
Objectives
- Master the KNIME interface and core concepts
- Clean and blend biological and chemical data
- Calculate molecular properties
- Visualize chemical space
- Train machine learning models
- Automate reporting
Prerequisites
- A background in biology, chemistry, or a related life sciences discipline
- No prior coding experience required
- Basic computer literacy with file management
- A laptop capable of running KNIME Analytics Platform
Master core data science skills by building a Drug Discovery Pipeline that helps evaluate and prioritize drug-like compounds. You’ll create a workflow that combines raw lab data, calculates basic molecular features, and visualizes structure-activity patterns. Finally, you’ll train a model to estimate compound bioactivity and generate a ranked list of promising candidates.
Note: Session 1 is a free, 3-hour onboarding event that is open to anyone; attending it is required before joining the paid sessions. The remaining six sessions (2-7) are paid and run for 2 hours each. There is no obligation to continue: you can decide whether to register for the full training program after the onboarding session.
What You’ll Learn
Session 1: Onboarding: Introduction to KNIME (Free)
- Install KNIME Analytics Platform
- Learn basic KNIME concepts such as workspaces, hubs, workflows, nodes, ports, and traffic light logic
- Explore Metanodes, Components, and third-party extensions
- Project: Workflow Initialization – Set up the environment and execute a simple KNIME workflow
Session 2: Data Integration: Merging Chemical and Biological Datasets
- Read data from disparate sources (Excel lab notes vs. CSV chemical files)
- Join tables based on common identifiers (Relational Algebra)
- Handle duplicate columns and organize data structure
- Project: Master Data Joiner – Combine biological activity data with chemical structure lists into a single unified dataset (see the join sketch below)
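For orientation, here is roughly what the Master Data Joiner project does, expressed as a minimal Python/pandas sketch. The file names and the compound_id key column are hypothetical placeholders; in the course itself the same steps are built visually with KNIME reader and Joiner nodes, no code required.

```python
import pandas as pd

# Hypothetical input files standing in for the lab notes and structure list.
bio = pd.read_excel("assay_results.xlsx")      # biological activity data (Excel)
chem = pd.read_csv("compound_structures.csv")  # chemical structure list (CSV)

# Inner join on a shared compound identifier (assumed column name).
master = bio.merge(chem, on="compound_id", how="inner", suffixes=("", "_chem"))

# Drop duplicate columns introduced by the join to keep a tidy structure.
master = master.loc[:, ~master.columns.str.endswith("_chem")]
print(master.head())
```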
Session 3: Data Preprocessing: Standardizing and Classifying Assay Data
- Apply strategies to handle missing values and dirty data
- Perform mathematical transformations (Convert IC50 to pIC50)
- Categorize continuous data into classes (Active vs. Inactive)
- Project: Assay Normalizer – Clean the master dataset and create standardized classification labels for analysis (see the pIC50 sketch below)
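At its core, the Assay Normalizer is a unit-aware log transformation plus a thresholding step. The Python sketch below shows that arithmetic; the column names, the nM input unit, and the pIC50 ≥ 6 activity cutoff are illustrative assumptions, not values fixed by the course.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned table; column names are illustrative.
df = pd.DataFrame({"compound_id": ["C1", "C2", "C3"],
                   "ic50_nM": [25.0, None, 12000.0]})

# Handle missing values: here we simply drop rows without a measurement.
df = df.dropna(subset=["ic50_nM"])

# pIC50 = -log10(IC50 in molar); with IC50 in nM this is 9 - log10(IC50_nM).
df["pIC50"] = 9 - np.log10(df["ic50_nM"])

# Categorize the continuous values into classes; the 1 µM (pIC50 = 6) cutoff
# is a common but assumed threshold.
df["activity_class"] = np.where(df["pIC50"] >= 6.0, "Active", "Inactive")
print(df)
```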
Session 4: Feature Engineering: Calculating Molecular Properties
- Apply Cheminformatics concepts using RDKit nodes
- Convert text strings (SMILES) into chemical objects
- Calculate molecular properties (Molecular Weight, LogP, H-Bond Donors)
- Project: Lipinski Filter – Generate chemical descriptors and screen compounds based on the Rule of 5 (see the RDKit sketch below)
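The Lipinski Filter rests on standard RDKit descriptor calculations, which the KNIME RDKit nodes expose through a visual interface. Below is a minimal Python sketch of the same calculation; the two example molecules (aspirin and caffeine) are chosen purely for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

# Example SMILES strings: text representations of molecules.
smiles = ["CC(=O)Oc1ccccc1C(=O)O",           # aspirin
          "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]    # caffeine

for smi in smiles:
    mol = Chem.MolFromSmiles(smi)       # text string -> chemical object
    if mol is None:
        continue                        # skip unparsable structures
    mw = Descriptors.MolWt(mol)         # molecular weight
    logp = Descriptors.MolLogP(mol)     # octanol-water partition coefficient
    hbd = Lipinski.NumHDonors(mol)      # hydrogen-bond donors
    hba = Lipinski.NumHAcceptors(mol)   # hydrogen-bond acceptors
    # Lipinski's Rule of 5: MW <= 500, LogP <= 5, HBD <= 5, HBA <= 10
    passes = mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10
    print(f"{smi}: MW={mw:.1f}, LogP={logp:.2f}, HBD={hbd}, HBA={hba}, Rule of 5: {passes}")
```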
Session 5: Exploratory Analysis: Visualizing Chemical Space
- Create interactive visualizations and dashboards
- Map chemical properties to colors and shapes
- Link scatter plots to tabular data for deep exploration
- Project: Interactive SAR Dashboard – Visualize the Chemical Space to inspect Structure-Activity Relationships (see the plotting sketch below)
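As a point of reference, the kind of plot built interactively in the dashboard can be sketched in a few lines of Python with matplotlib. The small descriptor table here is invented purely for illustration; in KNIME the same view is assembled from visualization nodes without any code.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical descriptor table carried over from the previous sessions.
df = pd.DataFrame({"mol_weight": [180.2, 350.4, 420.1, 290.7],
                   "logp":       [1.2,   3.8,   4.5,   2.1],
                   "pIC50":      [7.6,   5.2,   6.9,   4.8]})

# Map a chemical property (pIC50) to color so active regions of chemical space stand out.
points = plt.scatter(df["mol_weight"], df["logp"], c=df["pIC50"], cmap="viridis")
plt.colorbar(points, label="pIC50")
plt.xlabel("Molecular weight")
plt.ylabel("LogP")
plt.title("Chemical space colored by bioactivity")
plt.show()
```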
Session 6: Predictive Modeling: Training a Bioactivity Classifier
- Understand Predictive Modeling concepts
- Partition data into Training and Testing sets
- Train a Decision Tree to classify compounds
- Project: Bioactivity Predictor – Train a machine learning model to predict the efficacy of untested compounds (see the modeling sketch below)
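Conceptually, the Bioactivity Predictor follows the standard train/test workflow shown in this Python/scikit-learn sketch. The input file and descriptor column names are assumptions; in KNIME the equivalent steps use the Partitioning, Decision Tree Learner, and Decision Tree Predictor nodes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical feature table produced in the earlier sessions.
df = pd.read_csv("descriptors_with_labels.csv")
X = df[["mol_weight", "logp", "h_bond_donors"]]  # illustrative descriptor columns
y = df["activity_class"]                         # Active / Inactive labels

# Partition the data into training and testing sets (here 70/30).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a decision tree and check how well it generalizes to the held-out set.
model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```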
Session 7: Automated Reporting: Ranking and Exporting Candidates
- Sort and rank model predictions
- Filter for the “Top K” best candidates
- Format data for export to stakeholders
- Project: Candidate Report Generator – Automatically generate a formatted purchase order for the most promising drug candidates (see the ranking sketch below)
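The final step reduces to a sort, a Top-K cut, and a formatted export. Below is a minimal Python sketch under assumed column names (compound_id, p_active) and an assumed output file; the KNIME workflow achieves the same result with sorter, row filter, and writer nodes.

```python
import pandas as pd

# Hypothetical prediction table produced by the bioactivity model.
preds = pd.read_csv("predictions.csv")  # assumed columns: compound_id, p_active

# Sort by predicted probability of activity and keep the Top K candidates.
TOP_K = 10
top_candidates = preds.sort_values("p_active", ascending=False).head(TOP_K)

# Rename columns for stakeholders and export to Excel (requires openpyxl).
top_candidates = top_candidates.rename(columns={"compound_id": "Compound ID",
                                                "p_active": "Predicted activity"})
top_candidates.to_excel("candidate_purchase_order.xlsx", index=False)
```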
Related Training Programs
Applied Cheminformatics with RDKit
Modern chemistry is, in many ways, a data challenge. This training program translates chemical concepts into computer science terms, allowing you to leverage your existing expertise in string and graph manipulation to analyze, filter, and generate molecular structures.
Learn More