K-NN: A Neighborhood Council of Data
Imagine making decisions based on the wisdom of your closest neighbors. In machine learning, K-Nearest Neighbors (K-NN) uses this concept to predict outcomes by considering the input of the nearest data points.
Conceptualizing K-NN
- Selecting Neighbors: Choose a number of neighbors (k) to consult. This determines how many data points will influence the prediction.
- Majority Rules: The majority opinion among the selected neighbors guides the decision.
K-NN for Different Tasks
- KNN for Classification:
- KNN is a simple algorithm that classifies a new data point based on the majority class of its k nearest neighbors in the feature space.
- The distance metric (e.g., Euclidean distance) is used to determine the proximity of data points.
- KNN for Regression:
- In regression, KNN predicts the target variable of a new data point by averaging the target values of its k nearest neighbors.
K-NN Example
| Number of Rooms | Area (sq. feet) | Has Garage | Price (Category) |
|---|---|---|---|
| 3 | 1500 | No | Affordable |
| 4 | 2000 | Yes | Expensive |
| 2 | 1200 | No | Affordable |
| 5 | 2500 | Yes | Expensive |
| 4 | 1800 | No | Affordable |
| 3 | 1600 | Yes | Expensive |
KNN Classification:
- Choose a Distance Metric (e.g., Euclidean): Select a distance metric to measure the similarity between data points. For instance, Euclidean distance is commonly used.
- Select a Value for k: Decide on the number of neighbors (k) to consider. Let’s say k=3.
- Predict a New Data Point:
- Suppose we want to predict the category for a house with 4 rooms, an area of 1700 sq. feet, and no garage.
- Find the three nearest neighbors in the dataset based on the chosen distance metric.
- If two neighbors are “Affordable” and one is “Expensive,” the majority class (Affordable) is assigned to the new data point.
| Number of Rooms | Area (sq. feet) | Has Garage | Price |
|---|---|---|---|
| 3 | 1500 | No | 120,000 |
| 4 | 2000 | Yes | 250,000 |
| 2 | 1200 | No | 100,000 |
| 5 | 2500 | Yes | 300,000 |
| 4 | 1800 | No | 150,000 |
| 3 | 1600 | Yes | 200,000 |
KNN Regression:
- Choose a Distance Metric (e.g., Euclidean): As in classification, select a distance metric, such as Euclidean distance.
- Select a Value for k: Decide on the number of neighbors (k) to consider. Let’s say k=3.
- Predict a New Data Point:
- Suppose we want to predict the price for a house with 4 rooms, an area of 1700 sq. feet, and no garage.
- Find the three nearest neighbors in the dataset based on the chosen distance metric.
- Take the average of the prices of these three neighbors as the predicted price for the new data point.
The Role of Euclidean Distance
Suppose we want to predict the price for a new house with 4 rooms, an area of 1700 sq. feet, and no garage.
- Calculate Euclidean Distances:
- Calculate the Euclidean distance between the new house and each existing data point in the dataset based on the features (number of rooms, area, and garage).
- For example, the Euclidean distance (d) between the new house and the first data point (3 rooms, 1500 sq. feet, no garage) can be calculated as:
- d=math.sqrt((4−3)2+(1700−1500)2+(0−0)2)
- Rank Distances:
- Rank the calculated distances in ascending order to identify the nearest neighbors.
- Select Nearest Neighbors:
- Choose the top three data points with the smallest distances as the nearest neighbors.
- Regression Prediction:
- For a regression task, predict the target variable (house price) for the new data point by taking the average of the target values of the three nearest neighbors.






Leave a Reply