Tumor Severity Prediction from RNA-Seq

Summary

Designed and implemented a machine learning and deep learning pipeline that classifies tumor severity in lung adenocarcinoma from TCGA-LUAD RNA-seq data. Patients are labeled “severe” (stage > 1) or “non-severe” (stage 1) according to the tumor stage recorded in the clinical metadata.
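
The severe / non-severe split described above could be derived from the clinical stage field roughly as follows. This is a minimal sketch, not the project's actual code: the function name `severity_label` is hypothetical, and it assumes stage values are written in the AJCC style ("Stage IA", "Stage IIB", "Stage IV"), which the summary does not confirm.

```python
import re

# Assumption: stage labels look like "Stage IA", "Stage IIB", "Stage IV".
# The summary does not show the raw clinical format.
_STAGE_RE = re.compile(r"^stage\s+(IV|III|II|I)[ABC]?$", re.IGNORECASE)
_ROMAN_TO_INT = {"I": 1, "II": 2, "III": 3, "IV": 4}


def severity_label(stage):
    """Map a stage string to 'non-severe' (stage 1) or 'severe' (stage > 1).

    Returns None when the stage is missing or cannot be parsed, so the
    caller can decide whether to drop such samples.
    """
    if not stage:
        return None
    match = _STAGE_RE.match(stage.strip())
    if match is None:
        return None  # e.g. "Not Reported"
    stage_number = _ROMAN_TO_INT[match.group(1).upper()]
    return "severe" if stage_number > 1 else "non-severe"


if __name__ == "__main__":
    for raw in ["Stage IA", "Stage IIB", "Stage IV", "Not Reported"]:
        print(raw, "->", severity_label(raw))
```

Returning `None` for unparseable entries keeps the decision about excluding unstaged patients explicit rather than silently assigning them to a class.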

Project Highlights

  • Developed an end-to-end ML/DL workflow that automates tumor severity prediction directly from RNA-seq profiles.
  • Improved model interpretability and performance by systematically evaluating ANOVA-selected gene subsets across classifiers.
  • Increased predictive performance by 21% through deep learning optimization, with a CNN model achieving the best results.
  • Accelerated hyperparameter tuning by integrating Optuna with Keras, streamlining CNN optimization workflows.
  • Enhanced reproducibility and future scalability with Docker containerization and a modular Nextflow pipeline for HPC use.
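
The ANOVA-based gene selection mentioned in the highlights can be illustrated with a small pure-Python sketch. It computes a one-way ANOVA F statistic per gene and keeps the highest-scoring genes. The function names (`anova_f`, `top_genes`), the gene names, and the toy expression values are all hypothetical. The summary does not say which tool was used; a library routine such as scikit-learn's `f_classif` would be a likely choice in practice.

```python
from statistics import fmean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of per-class value lists."""
    n_total = sum(len(g) for g in groups)
    n_classes = len(groups)
    grand_mean = fmean([v for g in groups for v in g])
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of samples around their own class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (n_classes - 1)) / (ss_within / (n_total - n_classes))


def top_genes(expression, labels, top_k):
    """Rank genes by F score and return the names of the top_k genes.

    expression: dict mapping gene name -> list of values, one per sample.
    labels: list of class labels, aligned with the sample order.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy data: three "severe" and three "non-severe" samples.
    labels = ["severe"] * 3 + ["non-severe"] * 3
    expression = {
        "GENE_A": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],  # well separated
        "GENE_B": [5.0, 6.0, 7.0, 4.0, 5.0, 6.0],    # weakly separated
        "GENE_C": [1.0, 9.0, 5.0, 2.0, 8.0, 5.0],    # identical class means
    }
    print(top_genes(expression, labels, top_k=2))
```

Varying `top_k` and retraining each classifier on the resulting subset is one way to carry out the kind of systematic comparison across gene subsets that the highlights describe. To avoid leakage, the selection step would need to be fitted on training folds only.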

Keras, Machine Learning, CNN, RNA-Seq, Optuna, Docker, Nextflow