Data Analysis and PCA
Source folder: Data-Analysis-and-PCA/
Summary
Homework project on preprocessing and principal component analysis using the UCI Wisconsin Breast Cancer dataset.
Analysis Used
- Cleaning placeholder values and removing incomplete/duplicate rows.
- Target normalization and label encoding approaches.
- Feature scaling/normalization (StandardScaler, MinMaxScaler).
- Baseline supervised classification with KNeighborsClassifier.
- Correlation-matrix analysis and PCA variance exploration.
- Dimensionality reduction and class-separation visualization.
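The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. It assumes scikit-learn's bundled copy of the dataset (the Diagnostic variant, loaded with load_breast_cancer), which may differ from the file used in the project. The "?" placeholder and the duplicated row are injected here only to give the cleaning step something to do, and the split ratio, n_neighbors=5, and the 95% variance threshold are arbitrary choices for the example. Label encoding is skipped because the bundled target is already coded 0/1.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Diagnostic dataset stands in for the project's data file.
df = load_breast_cancer(as_frame=True).frame.copy()  # 30 features + "target"

# Hypothetical dirt, injected only so the cleaning step has work to do.
df["mean radius"] = df["mean radius"].astype(object)
df.loc[0, "mean radius"] = "?"                          # placeholder value
df = pd.concat([df, df.iloc[[1]]], ignore_index=True)   # duplicate of row 1

# Cleaning: coerce placeholders to NaN, then drop incomplete and duplicate rows.
clean = (
    df.apply(pd.to_numeric, errors="coerce")
      .dropna()
      .drop_duplicates()
      .reset_index(drop=True)
)

X = clean.drop(columns="target")
y = clean["target"].astype(int)

# Baseline classifier: scaling lives inside the pipeline so the scaler is
# fitted on the training split only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

# PCA variance exploration on standardized features.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_95 = int(np.searchsorted(cumulative, 0.95) + 1)  # components for >= 95% variance

# First two principal components, e.g. for a class-separation scatter plot.
coords = pca.transform(X_scaled)[:, :2]

print(f"KNN test accuracy: {accuracy:.3f}")
print(f"Components needed for 95% of the variance: {n_95}")
```

Swapping StandardScaler for MinMaxScaler in both places gives the other scaling variant mentioned in the list. The coords array can be passed to any of the plotting libraries, colored by y, to visualize class separation.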
Technologies and Methods
- Python, Jupyter Notebook
- pandas, numpy
- scikit-learn (PCA, preprocessing scalers, KNeighborsClassifier, metrics)
- seaborn, matplotlib, plotly