Tumor vs Normal TCR Classification

Summary

This project applies deep learning to classify tumor versus normal immune repertoires using TRB CDR3 sequences from lung cancer patients in the CPTAC dataset. The pipeline covers end-to-end preprocessing, variable-length sequence modeling, and interpretability, with modular, reproducible code.

Project Highlights

  • Designed a deep learning workflow to classify tumor vs normal tissue using patient-level TRB CDR3 repertoires.
  • Preprocessed and tokenized CDR3 sequences, handling their variable lengths with padding and mean pooling.
  • Built and compared mean pooling and LSTM-based sequence models in PyTorch.
  • Evaluated models using ROC AUC, F1-score, and confusion matrices.
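
The tokenization, padding, and two model families listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the 20-letter amino-acid vocabulary, the names `encode_batch`, `MeanPoolClassifier`, and `LSTMClassifier`, and the layer sizes are all hypothetical. The sketch scores individual CDR3 sequences; how sequence-level scores are aggregated to a patient-level prediction is not described in the summary and is left out.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# Assumed vocabulary: the 20 standard amino acids; index 0 is reserved for padding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_IDX = 0
TOKEN_TO_IDX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB_SIZE = len(AMINO_ACIDS) + 1


def encode_batch(sequences):
    """Tokenize CDR3 strings and right-pad them to the longest one in the batch."""
    lengths = torch.tensor([len(s) for s in sequences])
    tokens = torch.full((len(sequences), int(lengths.max())), PAD_IDX, dtype=torch.long)
    for row, seq in enumerate(sequences):
        tokens[row, : len(seq)] = torch.tensor([TOKEN_TO_IDX[aa] for aa in seq])
    return tokens, lengths


class MeanPoolClassifier(nn.Module):
    """Embeds each residue, averages over real (non-pad) positions, outputs one logit."""

    def __init__(self, embed_dim=16):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, embed_dim, padding_idx=PAD_IDX)
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, tokens, lengths):
        mask = (tokens != PAD_IDX).unsqueeze(-1).float()   # (batch, time, 1)
        summed = (self.embed(tokens) * mask).sum(dim=1)    # padded positions contribute 0
        pooled = summed / lengths.unsqueeze(-1).float()    # divide by true length
        return self.head(pooled).squeeze(-1)               # (batch,) logits


class LSTMClassifier(nn.Module):
    """Runs an LSTM over packed sequences and classifies from the final hidden state."""

    def __init__(self, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, embed_dim, padding_idx=PAD_IDX)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, tokens, lengths):
        # Packing makes the LSTM stop at each sequence's true length.
        packed = pack_padded_sequence(
            self.embed(tokens), lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        return self.head(h_n[-1]).squeeze(-1)              # (batch,) logits


# Illustrative sequences only; they are not taken from the CPTAC data.
tokens, lengths = encode_batch(["CASSLGQETQYF", "CAWSVGNEQFF"])
mean_logits = MeanPoolClassifier()(tokens, lengths)
lstm_logits = LSTMClassifier()(tokens, lengths)
```

Both models return raw logits, so a sketch like this would typically be trained with `nn.BCEWithLogitsLoss` and passed through `torch.sigmoid` to obtain tumor-vs-normal probabilities for ROC AUC, F1, and confusion-matrix evaluation. Masking in the mean-pooling model and packing in the LSTM model are there so that the amount of padding in a batch does not change a sequence's score.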

Tags: Deep Learning, PyTorch, LSTM, Sequence Modeling, Immune Repertoires, Tokenization