A Neuro-Symbolic Approach to Drug Safety using Knowledge Graphs.
Team Members: Ryan Cao, Suchit Bhayani, Ishaan Bal, Taranvir Chima
Mentors: Raju Pusapati (Solix), Murali Krishnam (Solix), Justin Eldridge (UCSD)
Pralsetinib has limited long-term real-world safety data, and post-marketing pharmacovigilance already shows unexpected adverse effects beyond its on-target effects. Because the drug is new, is used in a small subset of patients, and has complex kinase biology, its “unknown off-target space” is large. Our approach builds a knowledge graph linking mechanisms to toxicity, enriches it with biological pathways, and trains an AI model to predict specific off-target adverse outcomes.
We constructed the baseline KG from PubChem-derived sources:
Visualizing the Graph Structure
Initially, our graph was highly centralized around Pralsetinib, relying heavily on chemical and gene co-occurrences. After enriching it with biological ontologies, the composition shifted dramatically.
Above: Node type distribution showing how proteins and diseases become the dominant entities after our ontology enrichment, significantly increasing the graph’s mechanistic breadth.
Visualizing this network highlights the complexity of drug interactions, but also reveals a structural challenge:
Above: A subset visualization of the Knowledge Graph. Notice how heavily centralized the edges remain around Pralsetinib (forming a “star topology”). This centralization motivates our use of Graph Neural Networks to look beyond direct edges and find hidden lateral connections.
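One simple way to quantify the centralization described above is the share of edges that touch the highest-degree node. The following is a minimal sketch on a toy edge list, not the project's code: the function name hub_edge_share and every node name except Pralsetinib are hypothetical placeholders.

```python
from collections import Counter


def hub_edge_share(edges):
    """Return (hub, share): the highest-degree node and the fraction of edges touching it."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    hub, _ = degree.most_common(1)[0]
    touching = sum(1 for u, v in edges if hub in (u, v))
    return hub, touching / len(edges)


# Toy graph with placeholder names: four edges from the drug, one lateral edge.
toy_edges = [
    ("Pralsetinib", "ProteinA"),
    ("Pralsetinib", "ProteinB"),
    ("Pralsetinib", "ProteinC"),
    ("Pralsetinib", "OutcomeX"),
    ("ProteinC", "OutcomeX"),
]

hub, share = hub_edge_share(toy_edges)
print(hub, round(share, 2))
```

A share close to 1.0 indicates a near-perfect star; lateral edges such as the hypothetical (ProteinC, OutcomeX) pull it down.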
To move beyond a simple "star topology" centered on the drug, we enriched the KG using Biomedical Ontologies:
involved_in edges and Pathway nodes.
associated_with edges, focusing on oncology to prevent node explosion.
After enrichment, the graph grew to 580 nodes and 1,456 edges, with proteins and diseases dominating the structure.
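The enrichment step can be pictured as appending typed triples to the baseline graph and then recounting nodes by type. The sketch below is illustrative only: the KnowledgeGraph class, its methods, and all node names other than Pralsetinib are assumptions, and the choice of Protein as the head of associated_with follows the triple pattern used for the GNN rather than a confirmed schema.

```python
from collections import Counter


class KnowledgeGraph:
    """Minimal typed triple store: nodes carry a type, edges carry a relation."""

    def __init__(self):
        self.node_types = {}  # node name -> node type
        self.triples = set()  # (head, relation, tail)

    def add_triple(self, head, head_type, relation, tail, tail_type):
        # setdefault keeps the first type seen for a node.
        self.node_types.setdefault(head, head_type)
        self.node_types.setdefault(tail, tail_type)
        self.triples.add((head, relation, tail))

    def type_counts(self):
        return Counter(self.node_types.values())


kg = KnowledgeGraph()
# Baseline edges centered on the drug (placeholder protein names).
kg.add_triple("Pralsetinib", "Drug", "inhibits", "ProteinA", "Protein")
kg.add_triple("Pralsetinib", "Drug", "inhibits", "ProteinB", "Protein")
# Ontology enrichment: pathway and disease links that bypass the drug node.
kg.add_triple("ProteinA", "Protein", "involved_in", "PathwayX", "Pathway")
kg.add_triple("ProteinB", "Protein", "involved_in", "PathwayX", "Pathway")
kg.add_triple("ProteinB", "Protein", "associated_with", "DiseaseY", "Disease")

print(len(kg.node_types), len(kg.triples))
print(kg.type_counts())
```

In this toy graph the two proteins become connected through a shared pathway, which is the kind of lateral link that a drug-centered star lacks.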
We compared a simple baseline model against an advanced AI model (Graph Neural Network).
Our GNN uses the full KG, learns node embeddings with a 2-layer Graph Convolutional Network (GCN), and scores links with a Multi-Layer Perceptron (MLP) head. It is trained to predict (Pralsetinib, inhibits, Protein) and optionally (Protein, associated_with, Outcome).
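To make the architecture concrete, here is a dependency-free sketch of the forward pass only: two GCN layers followed by an MLP that scores a candidate link. It assumes the standard symmetric GCN normalization, one-hot node features, and concatenated endpoint embeddings as the MLP input; none of these choices are confirmed by the project description. The weights are random and untrained, and all dimensions and function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def normalized_adjacency(n, edges):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gcn_embeddings(a_hat, x, w1, w2):
    """Two GCN layers: H = A_hat * ReLU(A_hat * X * W1) * W2."""
    h = relu(matmul(a_hat, matmul(x, w1)))
    return matmul(a_hat, matmul(h, w2))


def mlp_link_score(h_u, h_v, w_hidden, w_out):
    """Score a candidate link from the concatenated endpoint embeddings."""
    pair = [h_u + h_v]  # 1 x 2d row vector
    hidden = relu(matmul(pair, w_hidden))
    logit = matmul(hidden, w_out)[0][0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
# Toy graph (hypothetical): node 0 is the drug, 1-2 are proteins, 3 is a pathway.
edges = [(0, 1), (1, 3), (2, 3)]
n, d_in, d_hid, d_out = 4, 4, 8, 4
x = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(n)]  # one-hot features
a_hat = normalized_adjacency(n, edges)
w1, w2 = random_matrix(d_in, d_hid, rng), random_matrix(d_hid, d_out, rng)
w_hidden, w_out = random_matrix(2 * d_out, d_hid, rng), random_matrix(d_hid, 1, rng)

h = gcn_embeddings(a_hat, x, w1, w2)
score = mlp_link_score(h[0], h[2], w_hidden, w_out)  # candidate (drug, inhibits, protein 2)
print(len(h), len(h[0]), 0.0 < score < 1.0)
```

Because each layer mixes a node's features with those of its neighbors, two layers let the drug's embedding reflect nodes two hops away, such as the pathway shared by both proteins in this toy graph. In practice a graph learning library would replace these hand-written helpers and the weights would be trained on known links.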
To evaluate our approach, we compared how well our AI model (GNN) and the baseline model could “rediscover” known targets that we intentionally hid during training.
Above: Known targets recovered in the top-k predictions. The GNN exploits multi-hop structures to find hidden connections much more effectively than the baseline.
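The "recovered in the top-k" measure can be sketched as follows: rank all candidate proteins by model score and count how many of the hidden targets fall within the first k. The function name, protein names, and score values below are made-up placeholders for illustration, not project results.

```python
def recovered_at_k(scores, hidden_targets, k):
    """Count held-out targets that appear among the k highest-scored candidates."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & set(hidden_targets))


# Hypothetical model scores for candidate proteins (placeholder names and values).
scores = {
    "ProteinA": 0.91,
    "ProteinB": 0.15,
    "ProteinC": 0.78,
    "ProteinD": 0.40,
    "ProteinE": 0.66,
}
hidden_targets = {"ProteinA", "ProteinD", "ProteinE"}

for k in (1, 3, 5):
    print(k, recovered_at_k(scores, hidden_targets, k))
```

Running the same function on the scores from the baseline and from the GNN, with the same hidden targets, gives directly comparable counts at each k.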
Conclusion: Our pipeline demonstrates that combining structured assay evidence with biological pathways enables mechanistic reasoning. The final deliverable provides pharmacologists with transparent, data-backed hypotheses for Pralsetinib’s safety-relevant outcomes, addressing the critical interpretability challenges in AI-driven drug discovery.
Next Steps / Future Directions:
GitHub: Ishaanbal/DSC180B-B23-Knowledge-Graph-and-Biomedical-Ontology
We welcome any questions or suggestions for further exploration.