Data Science for Life Scientists with KNIME

Objectives

  • Master the KNIME interface and core concepts
  • Clean and blend biological and chemical data
  • Calculate molecular properties
  • Visualize chemical space
  • Train machine learning models
  • Automate reporting

Prerequisites

  • A background in biology, chemistry, or a related life sciences discipline
  • No prior coding experience required
  • Basic computer literacy with file management
  • A laptop capable of running KNIME Analytics Platform

Master core data science skills by building a Drug Discovery Pipeline that helps evaluate and prioritize drug-like compounds. You’ll create a workflow that combines raw lab data, calculates basic molecular features, and visualizes structure-activity patterns. Finally, you’ll train a model to estimate compound bioactivity and generate a ranked list of promising candidates.

Note: Session 1 is a free, 3-hour onboarding session. It is open to anyone and required for all participants. The remaining six sessions (2-7) are paid and run for 2 hours each. Attending the onboarding session creates no obligation to enroll: you can decide afterward whether to register for the full training program.

What You’ll Learn

Session 1: Onboarding: Introduction to KNIME (Free)

  • Install KNIME Analytics Platform
  • Learn basic KNIME concepts such as workspaces, hubs, workflows, nodes, ports, and traffic light logic
  • Explore Metanodes, Components, and third-party extensions
  • Project: Workflow Initialization – Set up the environment and execute a simple KNIME workflow

Session 2: Data Integration: Merging Chemical and Biological Datasets

  • Read data from disparate sources (Excel lab notes vs. CSV chemical files)
  • Join tables based on common identifiers (Relational Algebra)
  • Handle duplicate columns and organize data structure
  • Project: Master Data Joiner – Combine biological activity data with chemical structure lists into a single unified dataset
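
The core of the Master Data Joiner project is an inner join on a shared identifier. The sketch below shows the idea in plain Python rather than KNIME nodes, assuming hypothetical column names (`compound_id`, `ic50_nM`, `smiles`) and made-up example rows. Duplicate column names coming from the right-hand table get a `_right` suffix, which is one simple way to keep both values instead of silently overwriting one.

```python
# Minimal inner join on a shared key (a simple hash join).
# Column names such as compound_id, ic50_nM and smiles are hypothetical.

def inner_join(left, right, key):
    """Return merged rows whose `key` value appears in both tables."""
    # Index the right table by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            for col, val in rrow.items():
                if col == key:
                    continue  # keep a single copy of the join key
                # Rename clashing columns instead of overwriting left-hand values.
                merged[col + "_right" if col in merged else col] = val
            joined.append(merged)
    return joined


activity = [  # e.g. rows read from an Excel sheet of lab results
    {"compound_id": "C1", "ic50_nM": 50.0},
    {"compound_id": "C2", "ic50_nM": 1200.0},
    {"compound_id": "C3", "ic50_nM": 8.0},
]
structures = [  # e.g. rows read from a CSV file of chemical structures
    {"compound_id": "C1", "smiles": "CCO"},
    {"compound_id": "C3", "smiles": "c1ccccc1O"},
    {"compound_id": "C4", "smiles": "CC(=O)O"},
]

master = inner_join(activity, structures, "compound_id")
for row in master:
    print(row)
```

Compounds that appear in only one source (C2 and C4 here) are dropped, which is the defining behavior of an inner join; an outer join would keep them with empty cells instead.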

Session 3: Data Preprocessing: Standardizing and Classifying Assay Data

  • Apply strategies to handle missing values and dirty data
  • Perform mathematical transformations (e.g., converting IC50 to pIC50)
  • Categorize continuous data into classes (Active vs. Inactive)
  • Project: Assay Normalizer – Clean the master dataset and create standardized classification labels for analysis
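
pIC50 is the negative base-10 logarithm of the IC50 expressed in mol/L, so for values reported in nanomolar it reduces to pIC50 = 9 - log10(IC50 in nM). A minimal Python sketch of the Assay Normalizer steps, assuming a hypothetical `ic50_nM` column, row removal as the missing-value strategy, and an illustrative Active cutoff of pIC50 >= 6 (IC50 <= 1 µM). The right cutoff depends on the assay and target.

```python
import math

# Illustrative cutoff (an assumption, not a universal standard):
# pIC50 >= 6.0, i.e. IC50 <= 1 µM, is labelled "Active".
ACTIVE_THRESHOLD = 6.0


def ic50_nm_to_pic50(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), which equals 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nm)


def normalize(rows):
    """Drop rows without a usable IC50, then add a pIC50 value and a class label."""
    cleaned = []
    for row in rows:
        try:
            ic50 = float(row.get("ic50_nM"))
        except (TypeError, ValueError):
            continue  # missing (None) or non-numeric entry such as "n/a"
        if not math.isfinite(ic50) or ic50 <= 0:
            continue  # NaN, infinite, zero or negative values cannot be log-transformed
        pic50 = ic50_nm_to_pic50(ic50)
        label = "Active" if pic50 >= ACTIVE_THRESHOLD else "Inactive"
        cleaned.append({**row, "ic50_nM": ic50, "pIC50": round(pic50, 2), "class": label})
    return cleaned


raw = [  # hypothetical merged rows with typical lab-sheet problems
    {"compound_id": "C1", "ic50_nM": 50.0},
    {"compound_id": "C2", "ic50_nM": "1200"},  # number stored as text
    {"compound_id": "C3", "ic50_nM": None},    # missing measurement
    {"compound_id": "C4", "ic50_nM": "n/a"},   # dirty entry
]
for row in normalize(raw):
    print(row)
```

Dropping incomplete rows is only one strategy; imputing a value or flagging the row for re-measurement are common alternatives when data are scarce.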

Session 4: Feature Engineering: Calculating Molecular Properties

  • Apply Cheminformatics concepts using RDKit nodes
  • Convert text strings (SMILES) into chemical objects
  • Calculate molecular properties (Molecular Weight, LogP, H-Bond Donors)
  • Project: Lipinski Filter – Generate chemical descriptors and screen compounds based on the Rule of 5
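
Lipinski's Rule of 5 flags compounds with molecular weight above 500 Da, LogP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors, and a common convention tolerates at most one violation. The sketch below applies only the filtering logic in plain Python; it assumes the descriptors were already computed upstream (in RDKit's Python API, for example, by `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`). The dictionary keys and descriptor values are hypothetical.

```python
# Lipinski's Rule of 5 limits (each value is a maximum).
RULE_OF_FIVE = {
    "mol_weight": 500.0,  # molecular weight, Da
    "logp": 5.0,          # calculated octanol-water partition coefficient
    "h_donors": 5,        # hydrogen-bond donors
    "h_acceptors": 10,    # hydrogen-bond acceptors
}


def lipinski_violations(desc):
    """Count how many Rule-of-5 limits a compound's descriptors exceed."""
    return sum(1 for name, limit in RULE_OF_FIVE.items() if desc[name] > limit)


def passes_lipinski(desc, max_violations=1):
    """Common reading of the rule: tolerate at most `max_violations` violations."""
    return lipinski_violations(desc) <= max_violations


# Hypothetical descriptor values standing in for RDKit output.
compounds = {
    "C1": {"mol_weight": 320.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    "C2": {"mol_weight": 612.7, "logp": 5.8, "h_donors": 3, "h_acceptors": 9},
    "C3": {"mol_weight": 540.2, "logp": 4.2, "h_donors": 1, "h_acceptors": 7},
}
passed = [cid for cid, desc in compounds.items() if passes_lipinski(desc)]
print(passed)  # ['C1', 'C3']
```

C2 breaks two limits (weight and LogP) and is removed, while C3 breaks only the weight limit and survives. Setting `max_violations=0` gives a stricter screen.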

Session 5: Exploratory Analysis: Visualizing Chemical Space

  • Create interactive visualizations and dashboards
  • Map chemical properties to colors and shapes
  • Link scatter plots to tabular data for deep exploration
  • Project: Interactive SAR Dashboard – Visualize the Chemical Space to inspect Structure-Activity Relationships

Session 6: Predictive Modeling: Training a Bioactivity Classifier

  • Understand Predictive Modeling concepts
  • Partition data into Training and Testing sets
  • Train a Decision Tree to classify compounds
  • Project: Bioactivity Predictor – Train a machine learning model to predict whether untested compounds are active or inactive
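
A decision tree repeatedly splits the data on the feature threshold that best separates the classes. The sketch below keeps only the first split (a depth-1 tree, or "decision stump") scored by Gini impurity, together with a seeded 80/20 random split of the kind KNIME's Partitioning node performs. It runs on toy synthetic data with hypothetical `logp` and `mol_weight` columns; KNIME's Decision Tree Learner grows deeper trees and has more options, so this only illustrates the idea and does not reimplement that node.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split it into train/test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels):
    """Most frequent label; ties are broken alphabetically for reproducibility."""
    return max(sorted(set(labels)), key=labels.count)


def fit_stump(rows, features, target="class"):
    """Fit a depth-1 tree: the single split with the lowest weighted Gini impurity."""
    best = None
    for feat in features:
        values = sorted({r[feat] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[feat] <= thr]
            right = [r[target] for r in rows if r[feat] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feat, thr, majority(left), majority(right))
    if best is None:  # no feature varies: always predict the majority class
        label = majority([r[target] for r in rows])
        return features[0], float("inf"), label, label
    _, feat, thr, left_label, right_label = best
    return feat, thr, left_label, right_label


def predict(stump, row):
    """Send the row left or right of the threshold and return that leaf's label."""
    feat, thr, left_label, right_label = stump
    return left_label if row[feat] <= thr else right_label


# Toy synthetic data (not real assay results): "Active" whenever logp > 3.
rng = random.Random(0)
data = []
for _ in range(100):
    logp = rng.uniform(0.0, 6.0)
    data.append({
        "logp": logp,
        "mol_weight": rng.uniform(150.0, 600.0),
        "class": "Active" if logp > 3.0 else "Inactive",
    })

train, test = partition(data)
stump = fit_stump(train, ["logp", "mol_weight"])
accuracy = sum(predict(stump, r) == r["class"] for r in test) / len(test)
print(stump, f"test accuracy = {accuracy:.2f}")
```

Accuracy is measured only on the held-out test rows, which is the point of partitioning: the score estimates how the model would do on compounds it has never seen.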

Session 7: Automated Reporting: Ranking and Exporting Candidates

  • Sort and rank model predictions
  • Filter for the “Top K” best candidates
  • Format data for export to stakeholders
  • Project: Candidate Report Generator – Automatically generate a formatted purchase order for the most promising drug candidates
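
Ranking and "Top K" filtering amount to sorting by a score and keeping the first K rows. A minimal Python sketch, assuming the classifier outputs a probability of the Active class in a hypothetical `p_active` column and that a CSV file is an acceptable export format; the layout of an actual purchase order would depend on the stakeholders' requirements.

```python
import csv
import heapq
import io


def top_k(predictions, k, score_key="p_active"):
    """Return the k highest-scoring rows, best first, with a 1-based rank column."""
    best = heapq.nlargest(k, predictions, key=lambda row: row[score_key])
    return [{"rank": i, **row} for i, row in enumerate(best, start=1)]


def to_csv(rows):
    """Serialize rows to CSV text (e.g. for a spreadsheet sent to stakeholders)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


predictions = [  # hypothetical classifier output: probability of "Active"
    {"compound_id": "C1", "p_active": 0.91},
    {"compound_id": "C2", "p_active": 0.35},
    {"compound_id": "C3", "p_active": 0.78},
    {"compound_id": "C4", "p_active": 0.66},
]
report = top_k(predictions, k=2)
print(to_csv(report))
```

`heapq.nlargest` avoids sorting the whole table when only a handful of candidates are needed, although for a few thousand compounds a full sort is just as practical.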

Related Training Programs

Applied Cheminformatics with RDKit

  • December 5, 2025
  • 15 hours
  • 🥇Beginner

Modern chemistry is, in many ways, a data challenge. This training program translates chemical concepts into computer science terms, allowing you to leverage your existing expertise in string and graph manipulation to analyze, filter, and generate molecular structures.