
k-nearest neighbors

What it is

You found a fruit and you’re not sure what it is. The simplest plan: look at the fruits most like it — the nearest neighbors — and go with whatever most of them are. If the 5 closest fruits are 3 apples and 2 lemons, you guess apple. That’s the whole idea, and it’s a real machine-learning method.

Go deeper: each fruit is a point with features (here, sweetness and size). “Closest” means the smallest distance between points, typically the straight-line distance on the plot. The k is how many neighbors get a vote. There’s no training step at all: the model just remembers every example and measures distances when you ask. That’s why it’s called a “lazy” learner.
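The measure-distances-then-vote procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the Studio's own code: the fruit values and the names `FRUITS`, `nearest_neighbors`, and `knn_predict` are made up for the example, and straight-line (Euclidean) distance is an assumption.

```python
import math
from collections import Counter

# Hypothetical examples: (sweetness, size, kind). Not the Studio's real fruit data.
FRUITS = [
    (8.0, 7.0, "apple"),
    (7.5, 6.5, "apple"),
    (7.0, 7.5, "apple"),
    (3.0, 4.0, "lemon"),
    (2.5, 4.5, "lemon"),
    (3.5, 3.5, "lemon"),
]

def nearest_neighbors(examples, sweetness, size, k):
    """Return the k examples closest to the query point (Euclidean distance assumed)."""
    def distance(example):
        return math.dist((example[0], example[1]), (sweetness, size))
    return sorted(examples, key=distance)[:k]

def knn_predict(examples, sweetness, size, k=3):
    """Let the k nearest examples vote; the most common kind wins."""
    neighbors = nearest_neighbors(examples, sweetness, size, k)
    votes = Counter(kind for _, _, kind in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(FRUITS, sweetness=7.2, size=6.8, k=3))  # query sits among the apples
print(knn_predict(FRUITS, sweetness=3.1, size=4.2, k=3))  # query sits among the lemons
```

Notice there is no training function anywhere: the list of examples is the whole model, and all the work happens at the moment you ask for a guess. That is the “lazy” part.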

Why care

“Things near each other are probably alike” is one of the most useful ideas in AI. It underlies “you might also like…” suggestions, photo tagging, and spotting which past example a new case most resembles, and the core method needs nothing more than measuring distances and counting votes.

The idea, intuitively

Plot every known fruit by two clues. Drop your mystery fruit anywhere. Draw a line to the k closest fruits and let them vote — majority wins. Move the mystery fruit toward the apples and the vote turns apple; slide it toward the lemons and it flips. Right on the border, the answer is genuinely close, and the choice of k can tip it.
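To see how k can tip a close call, here is a small Python sketch with a mystery fruit placed near the border. The points, the `MYSTERY` position, and the helper name `vote` are all hypothetical, chosen so that the single closest fruit disagrees with the next two; straight-line distance is assumed.

```python
import math
from collections import Counter

# Hypothetical fruits near the apple/lemon border: (sweetness, size, kind).
FRUITS = [
    (5.2, 5.0, "lemon"),  # the single closest fruit to the mystery point
    (5.8, 5.6, "apple"),
    (5.9, 4.6, "apple"),
    (4.0, 4.2, "lemon"),
    (3.8, 5.5, "lemon"),
]
MYSTERY = (5.4, 5.1)

def vote(k):
    """Count the kinds among the k fruits nearest to MYSTERY."""
    ranked = sorted(FRUITS, key=lambda f: math.dist(f[:2], MYSTERY))
    return Counter(kind for _, _, kind in ranked[:k])

for k in (1, 3, 5):
    tally = vote(k)
    print(k, dict(tally), "->", tally.most_common(1)[0][0])
```

With these made-up points the guess is lemon at k = 1, apple at k = 3, and lemon again at k = 5, even though the mystery fruit never moves. Only the set of voters changes.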

Peek at the data first

Before guessing anything, look at the fruits we already know. Each row is a fruit with two clues — sweetness and size — and its kind. Here are a few of them, with a summary of each column. This is what Spectra’s describe_data shows you.

Try it

Click or drag anywhere in the plot to move the mystery fruit (the “?”). The voting neighbors light up with rings and lines. Slide k to change how many of them get a vote.

Where it shows up

Recommendations. “People near you in taste also liked…” finds the nearest neighbors among users or items.
Recognizing things. A new handwritten digit is matched to the most similar examples the model has already seen.
Medicine & science. Compare a new case to the most similar past cases to make a careful, evidence-based guess.

Where it came from

The nearest-neighbor rule was written up by Evelyn Fix and Joseph Hodges in a 1951 U.S. Air Force report. In 1967 Thomas Cover and Peter Hart published “Nearest Neighbor Pattern Classification,” proving how surprisingly good this simple idea can be: given enough examples, the error rate of the single-nearest-neighbor rule is at most twice that of the best possible classifier. It has stayed a textbook starting point for classification ever since.

Try it in code

The Studio’s classifier is a k-nearest-neighbors model under the hood:

data  = load "fruits"
train, test = split data, hold_out: 20%

model = make_model "classifier"
train_model model, on: train, predict: "type", using: ["sweetness", "size"]

check model, with: test
say predict(model, sweetness: 5, size: 5)
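For readers curious what those steps amount to, here is a rough Python analogue of the same flow: hold out 20% of the fruits, keep the rest as the “model”, measure accuracy on the held-out part, then make one prediction. It is a sketch under assumptions, not how the Studio is implemented: the data is randomly generated, the function names (`make_fruits`, `split`, `predict`, `accuracy`) are invented for the example, and k = 3 with straight-line distance is a guess at reasonable defaults.

```python
import math
import random
from collections import Counter

def make_fruits(n_per_kind=20, seed=0):
    """Generate hypothetical fruits: apples sweeter and larger, lemons sourer and smaller."""
    rng = random.Random(seed)
    fruits = []
    for _ in range(n_per_kind):
        fruits.append((rng.gauss(7, 0.5), rng.gauss(7, 0.5), "apple"))
        fruits.append((rng.gauss(3, 0.5), rng.gauss(4, 0.5), "lemon"))
    return fruits

def split(data, hold_out=0.2, seed=0):
    """Shuffle a copy of the data and set aside a fraction for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * hold_out)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def predict(train, sweetness, size, k=3):
    """Majority vote among the k training fruits nearest to the query."""
    ranked = sorted(train, key=lambda f: math.dist(f[:2], (sweetness, size)))
    return Counter(kind for _, _, kind in ranked[:k]).most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of held-out fruits whose kind is guessed correctly."""
    hits = sum(predict(train, s, z, k) == kind for s, z, kind in test)
    return hits / len(test)

data = make_fruits()
train, test = split(data, hold_out=0.2)
print("accuracy on held-out fruits:", accuracy(train, test))
print("guess for sweetness 5, size 5:", predict(train, sweetness=5, size=5))
```

The held-out fruits matter because the model has memorized the training fruits exactly, so testing on those would tell you little. Checking on fruits it has never seen gives a fairer picture.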


Check your understanding

Why does the guess sometimes change when you slide k, even though the fruit hasn’t moved?
Where on the plot is the guess the least certain? Why?
Why do we usually pick an odd number for k when there are two classes?