
Normalization vs Standardization - What to pick?
Have you ever trained a machine learning model and wondered why people keep talking about “scaling the data”? Why does it matter if one feature is in the thousands while another is in decimals? Th...
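To make the scaling question concrete, here is a minimal sketch in plain Python of the two techniques named in the title: min-max normalization (rescale to the range 0 to 1) and standardization (shift to zero mean, unit standard deviation). The helper names and the sample values are hypothetical, chosen only for illustration; in practice a library scaler would usually be used instead.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
incomes = [20000.0, 40000.0, 60000.0]   # in the thousands
ratios = [0.1, 0.2, 0.3]                # in decimals

# After either transform, both features live on a comparable scale.
print(min_max_normalize(incomes), min_max_normalize(ratios))
print(standardize(incomes), standardize(ratios))
```

Both helpers assume the input has at least two distinct values; a constant feature would cause a division by zero and needs separate handling.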
TL;DR (Cheat-Sheet) Scatter plot (num vs num): relationships, clusters, outliers. Box plot (num vs cat): compare distributions across categories. Count plot (cat): check balance of classes. ...
TL;DR (Cheat‑Sheet) Epoch = one full pass over the dataset. Batch = a small chunk of data used for one update step. Batch size = how many samples per batch. Iteration = one parameter updat...
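The relationship between these terms is simple arithmetic, sketched below in Python. The function name, the `drop_last` flag, and the sample numbers are assumptions made for illustration, not taken from the post.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of parameter updates needed for one full pass over the dataset.

    If drop_last is True, a final batch smaller than batch_size is skipped.
    """
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Hypothetical training setup.
num_samples = 1000
batch_size = 32
epochs = 5

steps = iterations_per_epoch(num_samples, batch_size)
total_updates = steps * epochs
print(f"{steps} iterations per epoch, {total_updates} updates in total")
```

With these numbers the last batch of each epoch holds only 8 samples (1000 = 31 × 32 + 8), which is why the count is rounded up unless partial batches are dropped.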
Every deep learning model asks itself the same thing after making a prediction: “How wrong am I right now?” The answer comes from a cost function (also called a loss function). It’s the model’s s...
Today, I took a step back to explore one of the most classic algorithms in machine learning: the Perceptron. It’s simple, lightning-fast to train, and a great first stop for anyone trying to unders...
In this project, we built a pipeline to classify tumor severity in lung adenocarcinoma patients using RNA-seq gene expression data from TCGA-LUAD (via cBioPortal). Patients with pathological stage ...
Understanding how the immune system interacts with cancer is a hot topic in bioinformatics. In this project, I explore how T-cell receptor (TCR) CDR3 sequences—which capture a patient’s immune land...
In this blog, I’d like to talk about the aim and the step-by-step logic behind building a simple model to predict house prices using multiple features. We will use the Boston Housing dataset to tr...
The Boston Housing dataset is one of the classic beginner projects in machine learning. I worked on this as part of my learning journey and wanted to share a few key steps that I personally found i...
This project presents a reproducible pipeline for analyzing single-nucleus RNA-seq (snRNA-seq) data from hepatoblastoma tumor, PDX, and normal samples. The workflow includes Seurat-based QC, batch ...