ML Made Simple

K-NN in Machine Learning: Harnessing the Wisdom of Neighbors

December 20, 2023

K-NN: A Neighborhood Council of Data

Imagine making decisions based on the wisdom of your closest neighbors. In machine learning, K-Nearest Neighbors (K-NN) applies this idea: it predicts the outcome for a new data point from the known outcomes of the data points closest to it.

Conceptualizing K-NN

Selecting Neighbors: Choose a number of neighbors (k) to consult. This determines how many data points will influence the prediction.
Majority Rules: For classification, the majority opinion among the selected neighbors guides the decision; for regression, the neighbors’ values are averaged.

K-NN for Different Tasks

K-NN for Classification:
- K-NN classifies a new data point by assigning it the majority class of its k nearest neighbors in the feature space.
- A distance metric (e.g., Euclidean distance) determines how close two data points are.
K-NN for Regression:
- In regression, K-NN predicts the target variable of a new data point by averaging the target values of its k nearest neighbors.

K-NN Example

Number of Rooms	Area (sq. feet)	Has Garage	Price (Category)
3	1500	No	Affordable
4	2000	Yes	Expensive
2	1200	No	Affordable
5	2500	Yes	Expensive
4	1800	No	Affordable
3	1600	Yes	Expensive

Classification Table

K-NN Classification:

Choose a Distance Metric (e.g., Euclidean): Select a metric that measures how far apart two data points are. Euclidean distance is a common choice.
Select a Value for k: Decide on the number of neighbors (k) to consider. Let’s say k=3.
Predict a New Data Point:
- Suppose we want to predict the category for a house with 4 rooms, an area of 1700 sq. feet, and no garage.
- Find the three nearest neighbors in the dataset based on the chosen distance metric.
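- With the garage encoded as 0 (No) or 1 (Yes) and the features left unscaled, the three nearest neighbors are the houses with areas of 1800, 1600, and 1500 sq. feet. Two of them are “Affordable” and one is “Expensive,” so the majority class (Affordable) is assigned to the new data point.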
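
The classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `knn_classify` is made up for this example, and encoding the garage column as 0 (No) or 1 (Yes) is an assumption, since the table only gives Yes/No.

```python
import math
from collections import Counter

# Rows from the classification table: (rooms, area in sq. feet, has_garage, category).
# Encoding "No" as 0 and "Yes" as 1 is an assumption made for this sketch.
houses = [
    (3, 1500, 0, "Affordable"),
    (4, 2000, 1, "Expensive"),
    (2, 1200, 0, "Affordable"),
    (5, 2500, 1, "Expensive"),
    (4, 1800, 0, "Affordable"),
    (3, 1600, 1, "Expensive"),
]

def knn_classify(query, rows, k=3):
    """Return the majority label among the k rows closest to `query`."""
    # Sort rows by Euclidean distance between their three features and the query.
    ranked = sorted(rows, key=lambda row: math.dist(row[:3], query))
    # Count the labels of the k nearest rows and pick the most common one.
    votes = Counter(row[3] for row in ranked[:k])
    return votes.most_common(1)[0][0]

new_house = (4, 1700, 0)  # 4 rooms, 1700 sq. feet, no garage
predicted_category = knn_classify(new_house, houses, k=3)
print(predicted_category)
```

Choosing an odd k, as here, avoids tied votes when there are only two classes.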

Number of Rooms	Area (sq. feet)	Has Garage	Price
3	1500	No	120,000
4	2000	Yes	250,000
2	1200	No	100,000
5	2500	Yes	300,000
4	1800	No	150,000
3	1600	Yes	200,000

Regression Table

K-NN Regression:

Choose a Distance Metric (e.g., Euclidean): As in classification, select a distance metric, such as Euclidean distance.
Select a Value for k: Decide on the number of neighbors (k) to consider. Let’s say k=3.
Predict a New Data Point:
- Suppose we want to predict the price for a house with 4 rooms, an area of 1700 sq. feet, and no garage.
- Find the three nearest neighbors in the dataset based on the chosen distance metric.
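- Take the average of the prices of these three neighbors as the predicted price for the new data point. With the garage encoded as 0 (No) or 1 (Yes) and the features left unscaled, the three nearest neighbors are priced at 150,000, 200,000, and 120,000, so the prediction is (150,000 + 200,000 + 120,000) / 3 ≈ 156,667.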
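
For regression, only the last step changes: the neighbors’ prices are averaged instead of counted. The sketch below assumes the same 0/1 garage encoding as an illustration choice, and `knn_regress` is a hypothetical helper name, not part of any library.

```python
import math
from statistics import mean

# Rows from the regression table: (rooms, area in sq. feet, has_garage, price).
# Encoding "No" as 0 and "Yes" as 1 is an assumption made for this sketch.
houses = [
    (3, 1500, 0, 120_000),
    (4, 2000, 1, 250_000),
    (2, 1200, 0, 100_000),
    (5, 2500, 1, 300_000),
    (4, 1800, 0, 150_000),
    (3, 1600, 1, 200_000),
]

def knn_regress(query, rows, k=3):
    """Return the mean target value of the k rows closest to `query`."""
    # Sort rows by Euclidean distance between their three features and the query.
    ranked = sorted(rows, key=lambda row: math.dist(row[:3], query))
    # Average the prices of the k nearest rows.
    return mean([row[3] for row in ranked[:k]])

new_house = (4, 1700, 0)  # 4 rooms, 1700 sq. feet, no garage
predicted_price = knn_regress(new_house, houses, k=3)
print(round(predicted_price, 2))
```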

The Role of Euclidean Distance

Suppose we want to predict the price for a new house with 4 rooms, an area of 1700 sq. feet, and no garage.

Calculate Euclidean Distances:
- Calculate the Euclidean distance between the new house and each existing data point in the dataset based on the features (number of rooms, area, and garage, with the garage encoded as 0 for No and 1 for Yes).
- For example, the Euclidean distance (d) between the new house and the first data point (3 rooms, 1500 sq. feet, no garage) can be calculated as:
- d = √((4 − 3)² + (1700 − 1500)² + (0 − 0)²) = √40001 ≈ 200.0
Rank Distances:
- Rank the calculated distances in ascending order to identify the nearest neighbors.
Select Nearest Neighbors:
- Choose the top three data points with the smallest distances as the nearest neighbors.
Regression Prediction:
- For a regression task, predict the target variable (house price) for the new data point by taking the average of the target values of the three nearest neighbors.
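
The distance and ranking steps above can be traced by hand or with a short script. The one below is a sketch under the same assumption as the worked formula (garage encoded as 0 or 1, features used unscaled); the variable names are illustrative only.

```python
import math

# (rooms, area in sq. feet, has_garage, price) from the regression table.
# Encoding the garage as 0 (No) / 1 (Yes) is an assumption of this sketch.
houses = [
    (3, 1500, 0, 120_000),
    (4, 2000, 1, 250_000),
    (2, 1200, 0, 100_000),
    (5, 2500, 1, 300_000),
    (4, 1800, 0, 150_000),
    (3, 1600, 1, 200_000),
]
new_house = (4, 1700, 0)  # 4 rooms, 1700 sq. feet, no garage

# Step 1: Euclidean distance from the new house to every row (features only).
distances = [(math.dist(row[:3], new_house), row) for row in houses]

# Step 2: rank the rows by distance, smallest first.
ranked = sorted(distances, key=lambda pair: pair[0])

# Step 3: keep the k = 3 nearest neighbors.
nearest = ranked[:3]

for d, row in ranked:
    print(f"{d:8.2f}  {row}")
```

Printing the ranked list shows that each distance is almost exactly the difference in area: square footage is measured in the hundreds and thousands, so it swamps the room count and the garage flag. Rescaling the features to comparable ranges before computing distances is a common way to let all three contribute.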
