Tumor Severity Prediction from RNA-Seq
Summary
Designed and implemented a machine learning and deep learning pipeline to classify tumor severity in lung adenocarcinoma from TCGA-LUAD RNA-seq data, labeling patients as “severe” (stage > 1) or “non-severe” (stage 1) according to the tumor stage recorded in the clinical metadata.
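The binary labeling rule above can be sketched as a small parser over stage strings. This is a minimal illustration, not the project's code: it assumes stages are formatted like the AJCC pathologic stage values in TCGA clinical data (e.g. "Stage IA", "Stage IIIB", "Stage IV"), and the function name `severity_label` is hypothetical.

```python
import re
from typing import Optional

# Assumption: stage strings look like "Stage IA", "Stage IIB", "Stage IV".
# Alternatives are ordered longest-first so "IIIA" is not read as "I".
_STAGE_RE = re.compile(r"^\s*stage\s+(IV|III|II|I)[ABC]?\d?\s*$", re.IGNORECASE)
_ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}


def severity_label(stage: Optional[str]) -> Optional[int]:
    """Hypothetical helper: 0 = non-severe (stage 1), 1 = severe (stage > 1).

    Returns None for missing or unparseable values so those samples can be
    dropped before training instead of being silently mislabeled.
    """
    if not stage:
        return None
    match = _STAGE_RE.match(stage)
    if match is None:
        return None
    major_stage = _ROMAN[match.group(1).upper()]
    return 0 if major_stage == 1 else 1
```

Sub-stages (A/B) are collapsed into their major stage, so "Stage IA" and "Stage IB" both count as non-severe.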
Project Highlights
- Developed an end-to-end ML/DL workflow that automates tumor severity prediction directly from RNA-seq profiles.
- Improved model interpretability and performance by systematically evaluating ANOVA-selected gene subsets across classifiers.
- Increased predictive performance by 21% through deep learning optimization, with a CNN model achieving the best results.
- Accelerated hyperparameter tuning by integrating Optuna with Keras, streamlining CNN optimization workflows.
- Enhanced reproducibility and future scalability with Docker containerization and a modular Nextflow pipeline for HPC use.
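The ANOVA-based gene selection mentioned in the highlights ranks each gene by its one-way ANOVA F statistic between the severity classes and keeps the top k. The sketch below shows that computation in plain Python under simple assumptions (expression stored as a dict of gene name to per-sample values; `anova_f` and `top_k_genes` are hypothetical names). In practice, scikit-learn's `SelectKBest` with `f_classif` computes the same statistic over a full expression matrix.

```python
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic for one gene's expression across class labels."""
    groups: Dict[int, List[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    # Between-class sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((x - means[label]) ** 2 for label, g in groups.items() for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def top_k_genes(expr: Dict[str, Sequence[float]], labels: Sequence[int], k: int) -> List[str]:
    """Return the k gene names with the largest F statistic (most class-separating)."""
    scores = {gene: anova_f(vals, labels) for gene, vals in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Evaluating several values of k with each classifier, as the highlights describe, amounts to calling `top_k_genes` for each candidate k and training on the resulting subsets. To avoid leakage, the ranking should be computed on training folds only.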