Data Analysis and k-NN

Source folder: Data-Analysis-and-KNN/

Summary

Homework project using the Palmer penguins dataset to build and evaluate k-nearest-neighbors classifiers.

Analysis Used

  • Dataset loading and cleanup of missing rows.
  • Integer encoding of categorical fields (species/sex).
  • Feature selection from penguin body measurements.
  • Train/test split and supervised modeling with KNeighborsClassifier.
  • Evaluation with accuracy, confusion matrix, and classification report.
  • Comparison across different k values.
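The steps listed above can be sketched as one short script. This is a minimal sketch, not the notebook's actual code, and it rests on several assumptions. The column names follow seaborn's copy of the penguins data (`species`, `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, `sex`). The real dataset is not bundled here, so the hypothetical helper `make_stand_in_data` generates random penguin-like rows that let the script run offline. The four features, the 25% test split, and the k values compared are also illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed feature set: the four numeric body measurements.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def make_stand_in_data(n=150, seed=0):
    """Hypothetical helper: random rows with penguins-like columns (NOT the real data)."""
    rng = np.random.default_rng(seed)
    species = rng.choice(["Adelie", "Chinstrap", "Gentoo"], size=n)
    # Shift each measurement by species so the classes are separable.
    offset = pd.Series(species).map({"Adelie": 0.0, "Chinstrap": 1.0, "Gentoo": 2.0}).to_numpy()
    df = pd.DataFrame({
        "species": species,
        "bill_length_mm": 38 + 5 * offset + rng.normal(0, 1.5, n),
        "bill_depth_mm": 18 - 1.5 * offset + rng.normal(0, 0.8, n),
        "flipper_length_mm": 190 + 12 * offset + rng.normal(0, 5, n),
        "body_mass_g": 3700 + 600 * offset + rng.normal(0, 250, n),
        "sex": rng.choice(["Male", "Female"], size=n),
    })
    df.loc[::25, "sex"] = np.nan  # inject missing values so the cleanup step has work to do
    return df


def prepare(df):
    """Drop incomplete rows, then integer-encode the categorical columns."""
    df = df.dropna().copy()  # drop first: cat.codes would turn NaN into -1
    for col in ["species", "sex"]:
        df[col] = df[col].astype("category").cat.codes  # alphabetical order -> 0, 1, 2, ...
    return df


def compare_k(df, k_values=(1, 3, 5, 7, 9), report_k=5, seed=42):
    """Hypothetical helper: fit one k-NN model per k and return test accuracy for each."""
    X = df[FEATURES].to_numpy()
    y = df["species"].to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    scores = {}
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        y_pred = model.predict(X_test)
        scores[k] = accuracy_score(y_test, y_pred)
        if k == report_k:
            # Detailed evaluation for one chosen k.
            print(confusion_matrix(y_test, y_pred))
            print(classification_report(y_test, y_pred))
    return scores


if __name__ == "__main__":
    data = prepare(make_stand_in_data())  # swap in the real penguins DataFrame here
    for k, acc in compare_k(data).items():
        print(f"k={k}: accuracy={acc:.3f}")
```

To run it on the real data, replace `make_stand_in_data()` with a DataFrame loaded from the Palmer penguins CSV. `KNeighborsClassifier` uses Euclidean distance by default, so unscaled `body_mass_g` (thousands of grams) dominates the millimetre features; putting a `StandardScaler` in front of the classifier is a common refinement that is not among the steps listed above. Comparing k values on the same test set used for reporting also gives an optimistic estimate; cross-validation on the training split avoids that.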

Technologies and Methods

  • Python, Jupyter Notebook
  • pandas, numpy, scipy
  • scikit-learn (train_test_split, KNeighborsClassifier, metrics)
  • matplotlib, plotly