Pralsetinib Polypharmacology & Toxicity Knowledge Graph

A Neuro-Symbolic Approach to Drug Safety using Knowledge Graphs.

View the Project on GitHub Ishaanbal/DSC180B-B23-Website

Predicting Unknown Side Effects of the Cancer Drug Pralsetinib

Team Members: Ryan Cao, Suchit Bhayani, Ishaan Bal, Taranvir Chima
Mentors: Raju Pusapati (Solix), Murali Krishnam (Solix), Justin Eldridge (UCSD)


Abstract

Pralsetinib has limited long-term real-world safety data, and post-marketing pharmacovigilance already shows unexpected adverse effects beyond its on-target effects. Because the drug is new, is used in a small subset of patients, and has complex kinase biology, a large “unknown off-target space” remains. Our approach builds a knowledge graph linking mechanisms to toxicity, enriches it with biological pathways, and trains an AI model to predict specific off-target adverse outcomes.

Introduction: The Problem

Objectives

Methods

1. Knowledge Graph Construction

We constructed the baseline KG from PubChem-derived sources:

Visualizing the Graph Structure

Initially, our graph was highly centralized around Pralsetinib, relying heavily on chemical and gene co-occurrences. After enriching it with biological ontologies, the composition shifted dramatically.

[Figure: KG Composition]

Above: Node type distribution showing how proteins and diseases become the dominant entities after our ontology enrichment, significantly increasing the graph’s mechanistic breadth.

Visualizing this network highlights the complexity of drug interactions, but also reveals a structural challenge:

[Figure: KG Subset]

Above: A subset visualization of the Knowledge Graph. Notice how heavily centralized the edges remain around Pralsetinib (forming a “star topology”). This centralization motivates our use of Graph Neural Networks to look beyond direct edges and find hidden lateral connections.
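One simple way to put a number on this kind of centralization is the share of edges that touch the highest-degree node. The sketch below is illustrative only: the edge list is a made-up toy example with placeholder protein names, not data from our KG, and `hub_edge_share` is a hypothetical helper name.

```python
from collections import Counter

def hub_edge_share(edges):
    """Return (hub, share): the highest-degree node and the fraction of
    edges that touch it. Assumes an undirected edge list with no self-loops."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    hub, hub_degree = degree.most_common(1)[0]
    return hub, hub_degree / len(edges)

# Toy edge list (hypothetical): four drug-protein edges, one lateral edge.
toy_edges = [
    ("Pralsetinib", "ProteinA"),
    ("Pralsetinib", "ProteinB"),
    ("Pralsetinib", "ProteinC"),
    ("Pralsetinib", "ProteinD"),
    ("ProteinA", "ProteinB"),
]

hub, share = hub_edge_share(toy_edges)
print(hub, share)
```

A share close to 1.0 means almost every edge runs through the hub, which is the star pattern described above. Lateral edges between non-drug nodes lower the share.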

To move beyond a simple “star topology” centered on the drug, we enriched the KG using biomedical ontologies and databases:

  • Gene Ontology (GO): We queried GO via the UniProt API to link proteins to Biological Processes, adding involved_in edges and Pathway nodes.
  • Comparative Toxicogenomics Database (CTD): We used CTD to connect proteins to clinical outcomes (Disease/AE) via associated_with edges, focusing on oncology to prevent node explosion.

After enrichment, the graph grew to 580 nodes and 1,456 edges, with proteins and diseases dominating the structure.
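The enrichment step can be pictured as adding typed edges to a small graph structure. The following is a minimal sketch under stated assumptions, not our actual pipeline: it makes no API calls, the hand-written records stand in for UniProt/GO and CTD responses, the `TypedGraph` class and the `category` field are hypothetical, and all protein, process, and outcome names are placeholders. Only the edge labels (`inhibits`, `involved_in`, `associated_with`) come from the description above.

```python
from collections import Counter

class TypedGraph:
    """Tiny typed-edge graph: nodes carry a type, edges carry a relation."""

    def __init__(self):
        self.node_type = {}   # node name -> node type
        self.edges = set()    # (source, relation, target) triples

    def add_edge(self, src, src_type, relation, dst, dst_type):
        # Register both endpoints (first type seen wins), then the edge.
        self.node_type.setdefault(src, src_type)
        self.node_type.setdefault(dst, dst_type)
        self.edges.add((src, relation, dst))

    def type_counts(self):
        """Number of nodes per node type."""
        return dict(Counter(self.node_type.values()))

kg = TypedGraph()

# Baseline star: drug -> protein edges (placeholder proteins).
for protein in ["ProteinA", "ProteinB"]:
    kg.add_edge("Pralsetinib", "Drug", "inhibits", protein, "Protein")

# GO-style records (hypothetical): protein -> biological process.
go_records = [("ProteinA", "ProcessX"), ("ProteinB", "ProcessX")]
for protein, process in go_records:
    kg.add_edge(protein, "Protein", "involved_in", process, "Pathway")

# CTD-style records (hypothetical): protein -> outcome, with a category
# field used here to keep only oncology outcomes and limit node growth.
ctd_records = [
    ("ProteinA", "OutcomeY", "oncology"),
    ("ProteinB", "OutcomeZ", "other"),
]
for protein, outcome, category in ctd_records:
    if category == "oncology":
        kg.add_edge(protein, "Protein", "associated_with", outcome, "Disease/AE")

print(len(kg.node_type), len(kg.edges), kg.type_counts())
```

In this toy example the two proteins become linked through the shared `ProcessX` node, which is the kind of lateral connection that a pure star graph lacks.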

2. Modeling

We compared a simple baseline model against a Graph Neural Network (GNN).

Our GNN uses the full KG, learns node embeddings with a 2-layer Graph Convolutional Network (GCN), and scores links with a Multi-Layer Perceptron (MLP) head. It is trained to predict (Pralsetinib, inhibits, Protein) and optionally (Protein, associated_with, Outcome).
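To make the architecture concrete, here is a minimal forward-pass sketch of a 2-layer GCN followed by an MLP link scorer. It is an illustration under several assumptions, not our trained model: it uses plain Python lists to stay dependency-free (a real implementation would typically use a deep learning library), the weights are random and untrained, and the one-hot features, layer sizes, toy graph, and all function names are hypothetical choices. It assumes the common GCN propagation rule with a symmetrically normalized adjacency matrix and self-loops.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(n, edges):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    return [[max(0.0, x) for x in row] for row in m]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def gcn_embed(a_hat, x, w1, w2):
    """Two GCN layers: H1 = ReLU(A_hat X W1), H2 = A_hat H1 W2."""
    h1 = relu(matmul(matmul(a_hat, x), w1))
    return matmul(matmul(a_hat, h1), w2)

def mlp_score(h_u, h_v, w_hidden, w_out):
    """Score a candidate link from the concatenated endpoint embeddings."""
    pair = [h_u + h_v]                      # 1 x 2d row vector
    hidden = relu(matmul(pair, w_hidden))
    logit = matmul(hidden, w_out)[0][0]
    return 1.0 / (1.0 + math.exp(-logit))   # sigmoid, value in (0, 1)

rng = random.Random(0)
n, in_dim, hid_dim, out_dim = 4, 4, 8, 4
edges = [(0, 1), (1, 2), (2, 3)]            # toy path graph; node 0 plays the drug
a_hat = normalized_adjacency(n, edges)
x = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot features
w1 = random_matrix(in_dim, hid_dim, rng)
w2 = random_matrix(hid_dim, out_dim, rng)
w_hidden = random_matrix(2 * out_dim, hid_dim, rng)
w_out = random_matrix(hid_dim, 1, rng)

embeddings = gcn_embed(a_hat, x, w1, w2)
score = mlp_score(embeddings[0], embeddings[3], w_hidden, w_out)
print(len(embeddings), len(embeddings[0]), score)
```

Each GCN layer mixes information from direct neighbors, so one layer reaches only one hop out. Stacking two layers lets a node's embedding reflect nodes two hops away, which is what allows the scorer to draw on structure beyond direct drug edges. Training (loss, negative sampling, optimizer) is omitted here.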

Results

To evaluate our approach, we compared how well our AI model (GNN) and the baseline model could “rediscover” known targets that we intentionally hid during training.

[Figure: Target Predictions]

Above: Known targets recovered in the top-k predictions. The GNN exploits multi-hop structures to find hidden connections much more effectively than the baseline.
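The “recovered in the top-k” measure can be sketched as follows. This is a simplified illustration with made-up scores and placeholder protein names, not our evaluation code or results; `recovered_in_top_k` is a hypothetical helper name.

```python
def recovered_in_top_k(scores, hidden_targets, k):
    """Count hidden known targets among the k highest-scoring candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(hidden_targets))

# Made-up model scores for candidate proteins (hypothetical values).
toy_scores = {
    "ProteinA": 0.91,
    "ProteinB": 0.15,
    "ProteinC": 0.78,
    "ProteinD": 0.40,
    "ProteinE": 0.66,
}
# Known targets that were held out during training in this toy setup.
hidden = {"ProteinA", "ProteinD", "ProteinE"}

for k in (1, 3, 5):
    print(k, recovered_in_top_k(toy_scores, hidden, k))
```

Running the same function on the scores of two models for the same hidden set and the same values of k gives a like-for-like comparison of how many held-out targets each model ranks near the top.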

Conclusion & Next Steps

Conclusion: Our pipeline demonstrates that combining structured assay evidence with biological pathways enables mechanistic reasoning. The final deliverable provides pharmacologists with transparent, data-backed hypotheses for Pralsetinib’s safety-relevant outcomes, addressing the critical interpretability challenges in AI-driven drug discovery.

Next Steps / Future Directions:


Get in Touch

GitHub: Ishaanbal/DSC180B-B23-Knowledge-Graph-and-Biomedical-Ontology

We welcome any questions or suggestions for further exploration.