Data Analysis and PCA

Source folder: Data-Analysis-and-PCA/

Summary

A homework project on data preprocessing and principal component analysis (PCA) using the UCI Wisconsin Breast Cancer dataset.

Analyses Performed

  • Cleaning placeholder values and removing incomplete/duplicate rows.
  • Normalizing the target column and comparing label-encoding approaches.
  • Feature scaling/normalization (StandardScaler, MinMaxScaler).
  • Baseline supervised classification with KNeighborsClassifier.
  • Correlation-matrix analysis and exploration of the variance explained by each principal component.
  • Dimensionality reduction and class-separation visualization.
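The steps listed above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the notebook's actual code. It builds a hypothetical toy DataFrame (features f0 to f4 plus a class column coded 2/4, the benign/malignant coding of the original UCI Wisconsin file, which this project is assumed to use) so that it runs offline. The choices of k = 5, a 25% test split, StandardScaler over MinMaxScaler, and a 95% variance threshold are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# --- Hypothetical toy data standing in for the real dataset file ---
# Feature names f0..f4 are illustrative; labels use 2/4 (assumed benign/malignant).
rng = np.random.default_rng(0)
n = 200
labels = rng.choice([2, 4], size=n)
raw = pd.DataFrame(
    {f"f{i}": rng.integers(1, 11, size=n) + 3 * (labels == 4) for i in range(5)}
)
raw["class"] = labels
raw = raw.astype(object)
raw.loc[[0, 1, 2], "f0"] = "?"  # simulate '?' placeholder entries
raw = pd.concat([raw, raw.iloc[10:15]], ignore_index=True)  # simulate duplicate rows

# --- Cleaning: placeholders -> NaN, then drop incomplete and duplicate rows ---
clean_df = (
    raw.apply(pd.to_numeric, errors="coerce")  # '?' (any non-numeric token) becomes NaN
    .dropna()
    .drop_duplicates()
)

# --- Target encoding: map the two class codes to 0/1 ---
y = clean_df["class"].map({2: 0, 4: 1})
X = clean_df.drop(columns="class")

# --- Correlation matrix of the features (Pearson by default) ---
corr = X.corr()

# --- Split first, then fit the scaler on the training split only ---
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
scaler = StandardScaler()  # MinMaxScaler() is a drop-in alternative
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# --- Baseline KNN classifier ---
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_s, y_train)
acc = accuracy_score(y_test, knn.predict(X_test_s))

# --- PCA: explained variance and a 2-D projection for plotting ---
pca = PCA().fit(X_train_s)
cumvar = np.cumsum(pca.explained_variance_ratio_)
n_95 = int(np.searchsorted(cumvar, 0.95) + 1)  # components needed for >= 95% variance
X_2d = PCA(n_components=2).fit_transform(X_train_s)

print(f"rows after cleaning: {len(clean_df)}, KNN accuracy: {acc:.3f}")
print(f"components for 95% variance: {n_95}")
```

Fitting the scaler on the training split only keeps test-set statistics out of preprocessing. `X_2d` together with `y_train` can then be passed to seaborn or plotly for a class-separation scatter plot.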

Technologies and Methods

  • Python, Jupyter Notebook
  • pandas, numpy
  • scikit-learn (PCA, preprocessing scalers, KNeighborsClassifier, metrics)
  • seaborn, matplotlib, plotly