ML can be extremely intuitive and one intuitive ML method works extremely well

Jrinne · June 23, 2024, 9:23am

All,

K-Nearest Neighbors (KNN) is the simplest and most intuitive ML model there is.

Despite all of the math jargon, it is like using "Comps" in real estate. If you want to know how much a house is worth before you buy, you find as many recently sold houses as you can that are just like the one you are interested in, average their prices, and that is about what the house you are looking at is worth.
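To make the Comps idea concrete, here is a minimal sketch in plain Python. The sales data, the three features, and the function names (scale, comps_estimate) are all made up for illustration. Rescaling each feature to a 0 to 1 range and using Euclidean distance are my assumptions; they are common choices, not the only ones.

```python
import math

# Hypothetical recent sales: (square_feet, bedrooms, year_built) -> sale price.
# Every number here is invented for illustration.
SOLD = [
    ((1500, 3, 1995), 300_000),
    ((1600, 3, 2000), 320_000),
    ((1550, 3, 1998), 310_000),
    ((3000, 5, 2015), 650_000),
    ((900, 2, 1970), 180_000),
]

def scale(features, lo, hi):
    """Rescale each feature to 0..1 so square feet does not swamp bedrooms."""
    return [(f - l) / (h - l) if h > l else 0.0
            for f, l, h in zip(features, lo, hi)]

def comps_estimate(house, sold, k=3):
    """Average the sale prices of the k sold houses most similar to `house`."""
    columns = list(zip(*(features for features, _ in sold)))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    target = scale(house, lo, hi)
    # Rank sold houses by Euclidean distance to the house being priced.
    ranked = sorted(sold, key=lambda s: math.dist(scale(s[0], lo, hi), target))
    nearest = ranked[:k]
    return sum(price for _, price in nearest) / k

estimate = comps_estimate((1520, 3, 1996), SOLD, k=3)
print(estimate)
```

With k=3 the three mid-sized 1990s houses are the Comps, so the big new house and the small old one do not affect the estimate at all.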

For real estate features, think similar square footage, number of bedrooms, master suite, first-floor bedroom, year built, brick, etc. Whatever you think is important. Just as you have a choice of features at P123 to use for your Comps. I wonder if some of the online sites use this, e.g., Zillow...

Maybe a hallucination, but according to ChatGPT 4o: "Several online real estate pricing tools and platforms use machine learning algorithms, including K-Nearest Neighbors (KNN), to estimate property values."

KNN looks into the past and finds the 'K' stocks that are the closest fit (with regard to the features you selected) to the stock you are looking to buy. It averages the target you have selected for those stocks, and that is the prediction. Pretty simple. I think the average real-estate agent gets it. Most do on the Home and Garden channel anyway (my wife loves that channel). Might as well throw away the remote control. But you can learn about ML there.

Cramer does the same thing. His most common macroeconomic reasoning (on TV at least) is "Last time we had this situation...". He is looking at similar situations in the past, as KNN does (with a ton of objective data, in KNN's case at least).

Heck, they even use the word "Neighborhood." Because, just like in real estate, you assume something similar (in the same neighborhood) will sell for about the same price.

One cool thing about KNN is this: NO TRAINING!!!!!!

It just takes in all of the data and puts it into memory, where it sits for a while (it is called a "lazy" algorithm for that reason). When you have a stock you want to predict, the program searches its memory for the K most similar stocks and averages the target (perhaps future returns) for those K stocks.
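For anyone who wants to see the "lazy" part in code, here is a rough sketch using scikit-learn's KNeighborsRegressor. The data is random noise standing in for features and future returns (so the prediction itself is meaningless), and the StandardScaler step is my own assumption so that no one feature dominates the distance. The point is only that fit() stores the data (scikit-learn may also build a search index) and that the prediction is just the average target of the K nearest rows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_past = rng.normal(size=(200, 3))  # hypothetical features for past stocks
y_past = rng.normal(size=200)       # hypothetical target, e.g. future returns

k = 5
scaler = StandardScaler().fit(X_past)
# Default weights="uniform" means a plain average of the neighbors' targets.
knn = KNeighborsRegressor(n_neighbors=k)
# No model is learned here: the data is stored (and possibly indexed) for later lookup.
knn.fit(scaler.transform(X_past), y_past)

# A new stock to predict: the neighbor search happens now, at prediction time.
X_new = scaler.transform(rng.normal(size=(1, 3)))
prediction = knn.predict(X_new)[0]

# Reproduce the prediction by hand: find the k nearest rows, average their targets.
_, neighbor_idx = knn.kneighbors(X_new)
manual_average = y_past[neighbor_idx[0]].mean()

print(prediction, manual_average)
```

The two printed numbers should match, which is the whole algorithm: look up the neighbors, average the target.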

I mention this for 2 reasons:

1. No training, but huge memory requirements. Maybe not something P123 would want to add because of the memory requirements (but no training, remember). However, maybe a slower storage system (e.g., hard drives) for runs overnight would be economical.
2. So intuitive that you just have to believe it!!!!

Okay, a little non-real-estate jargon: it is non-parametric, which means you don't have to worry about normality, etc. Which is just to say, all you have to worry about in the end is the Comps.

Anyway, I believe it. And it corresponds to what I get with Extra Trees Regressor, providing some justification for P123's AI/ML.

References:

A Kaggle link: Predicting House Prices with Machine Learning(KNN)

Medium Link: Designing an optimal KNN regression model for predicting house price with Boston Housing Dataset

Medium link (another reference): House Price Prediction using KNN Regression-Intel OneAPI Optimized

Zillow's Accuracy: How accurate is the Zestimate?

Video of Zillow Executive using the words "Nearest Neighbors" when describing the Zillow Zestimate: "Nearest Neighbors"

Zillow site: "The core tech for the previous Zestimate algorithm used random forests,"

Jim