Boston Housing Price Prediction

Summary

I built an end-to-end machine learning pipeline in Python with scikit-learn to predict Boston housing prices. The project demonstrates data cleaning, feature engineering, multivariable regression, diagnostic evaluation, and result interpretation, skills I also apply to biological datasets. Visualizations were created with Matplotlib and Seaborn.

Project Highlights

  • Developed a multivariable linear regression model achieving R² > 0.80 on test data.
  • Performed correlation analysis and used the variance inflation factor (VIF) to detect and address multicollinearity.
  • Validated model performance with RMSE, residual plots, and diagnostic graphs.
  • Structured the pipeline end-to-end: data preprocessing → feature selection → model training → evaluation → visualization.
  • Plotted residuals, confidence intervals, and prediction trends with Matplotlib and Seaborn.
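
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It runs on synthetic stand-in data rather than the Boston dataset, and the names `vif` and `drop_high_vif`, the VIF threshold of 10, and the 80/20 split are all assumptions made for the example. VIF is computed directly from its definition, 1 / (1 - R²), using scikit-learn; the `variance_inflation_factor` function in Statsmodels is a common alternative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): x3 is nearly collinear with x1.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)


def vif(X):
    """Return VIF_j = 1 / (1 - R²_j) for each column j,
    where R²_j comes from regressing column j on the other columns."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2_j = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / (1.0 - r2_j))
    return np.array(scores)


def drop_high_vif(X, names, threshold=10.0):
    """Hypothetical helper: repeatedly drop the feature with the largest VIF
    until every remaining VIF is at or below the threshold."""
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names


# Feature selection, then train/test split, fit, and evaluation.
X_sel, names = drop_high_vif(X, ["x1", "x2", "x3"])
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
residuals = y_test - y_pred  # input for a residual plot

print("kept features:", names)
print(f"test R2 = {r2:.3f}, test RMSE = {rmse:.3f}")
```

Plotting `residuals` against `y_pred` with Matplotlib or Seaborn gives the residual plot mentioned above; a patternless band around zero is what a well-specified linear model would be expected to show.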

Tags: Machine Learning, Scikit-learn, Linear Regression, Matplotlib, Seaborn, Pandas, Statsmodels, EDA, Docker