Real Estate APIs and Data Science


Hello everyone!

Quick bit about me: I'm a data scientist working in NYC and literally just started looking into REI a week ago. Mostly because of the dream I have of retiring early with a nice cash flow, but also partly because of the undoubtedly cool stuff I can do with the data.

I'm aware there are probably a good amount of tools/sites already available for determining hot markets in the area, where to invest in hopes of appreciation, predicting house value, etc. But it would be a dope project to build some machine learning model tuned to exactly what I want, like interactions between price and crime and population movement and income shifts and their impact on cash flow, not just property value. And so on and so forth.
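To make the "interactions" idea concrete, here is a minimal sketch of how pairwise interaction terms could be built before feeding them to a model. The feature names and numbers are made up for illustration, and `add_interactions` is a hypothetical helper, not part of any library.

```python
from itertools import combinations

def add_interactions(row, features):
    """Return a copy of row with a product term for every pair of features."""
    out = dict(row)
    for a, b in combinations(features, 2):
        out[f"{a}_x_{b}"] = row[a] * row[b]
    return out

# Made-up, already-scaled values for one hypothetical neighborhood
row = {"price": 2.0, "crime": 0.5, "income_shift": 4.0}
expanded = add_interactions(row, ["price", "crime", "income_shift"])
print(expanded)
```

A model trained on the expanded rows with cash flow as the target can then pick up effects like "price matters more where crime is low", which the raw columns alone would not express in a linear model.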

I'm going with the Zillow API for now but it seems there's a pretty low rate limit. But it's a good start.

If anyone can point me to less-known real estate open databases/APIs that supply additional information, as well as any smaller data sources you find valuable, please let me know! I'd love to share any findings here.

And if anyone's looked into or done this kind of stuff (data science for real estate), I'm very curious about your experiences and I'd love to hear about them!


@Sebastian Garcia

Played around with this a bit myself. I am a software engineer. I also live in NYC.

My take is that disorganized and unfiltered data sources are some of the most compelling. Some ideas:

* County auction sites

* Insurance auction sites

* Educational data

* For sale by owner lists (if there is such a thing)

* Listings from smaller real-estate firms

Just a few ideas to get you going. One of the most interesting parts about real estate is how decentralized it is. The idea of a single source for anything is almost anathema. I think the smart people recognize that and bring together disparate parties to create win-win situations.

@Trevor Ewen

Thanks for the input! I was wondering why these types of discussions were rare here, and it makes sense: there's no centralized source to make it easier, and there are probably hundreds of thousands of local sources that need to be pieced together.

That's definitely a good list to get started, at least for the tiny piece of Jersey I'm looking into. I'm sure as I get more into it I'll have a much better understanding of what types of data are best. Nothing beats domain knowledge.

Thanks again!

@Sebastian Garcia Awesome ideas! I would recommend looking into some of the following sites available to real estate professionals in the Northeast:

  1. PropertyShark
  2. Lavamap
  3. CoreLogic

Most of these sites aggregate data from state/county websites, so you may find some valuable data sources there.

I worked with the Zillow API a little bit and had the results populate in Excel. After that I got interested in Craigslist results as a way to find "For Sale by Owner" listings. That means web scraping, which can cause problems if done wrong, so I was hoping for an open API there.

It'd be cool to find the For Sale by Owner listings and load the info into Google Sheets, then use Apps Script for more organization and prep.


Then there are paid services that I haven't checked into yet:

So, data is very, very, very readily available and cheap... and that's why you do NOT want that data, because everyone has it. The data you DO want is the stuff that ends up on those lists, just before it lands on them. For example:
properties that have tax payments due and not yet paid, but not yet late, say 2 months from tax sale. Some of those people will pay their taxes before it's too late, but most of the people who are 60 days from tax sale and owe $1-3k in taxes aren't going to get there. What you need is a developer... conveniently, like me. About 60% of my week is spent scraping data (real estate, healthcare, industrial), and EVERY client that I have came to me as their second or third choice, because that's what happens when the first guy or the second guy could not get it done.

In any case, Sebastian, I've been a full-time dev for over 10 years now, and there is one thing I can promise you about data: if it was "INTENDED" to be available to you, it's worthless, for numerous reasons. 1. It's likely tampered with. Zillow is a great example; if you don't believe me, look at a house in public mode, then switch to owner mode. 2. The companies that have useful, relevant, and profitable data do one of two things with it: use it within their own business model, or sell it at a premium.

So where do you get good data? If it's properties and real estate you are into, learn to scrape the county auditor for EVERY property. Then add some depth by scraping the county clerk's site for deeds. Now bounce that data against GIS and geolocation to get a good grasp of not only population density, but also amenities, nightlife, dining, etc. And lastly (well, not lastly, just lastly for now), phone-append and email-append the names and addresses to get methods of contact. If you really want to get crazy from that point, compare length of phone line ownership and age of deed on the property to EPA minor environmental disaster maps and you can sort out who is old and probably has cancer pretty quickly. Want the best data that anyone on this site has ever seen? Scrub all of the previous against obituary listings to look for surviving spouses...

@Sebastian Garcia @Erik Chan  

Hi Guys, 

I am interested in working on a project like this as well. I have professional experience in both Real Estate and as a Software Engineer. Let's get together and discuss!

Also, I have a plug for an up-to-date standardized and unadulterated set of data.

PM Me if you are interested in working on something!

I'm also a sw engineer just starting to look at real estate.

I expect obvious things are already out there, like taking the median asking rent for a given property class (say a 3 bed 2 bath SFR in Anytown) from Craigslist and dividing it by the median asking price for such SFRs on Zillow. Is that sort of thing already available someplace? Maybe even for free?
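For what it's worth, the calculation itself is tiny once you have the listings. Here is a minimal sketch with made-up numbers; `rent_to_price_ratio` is a hypothetical helper, and getting the rents and prices out of Craigslist or Zillow is the hard part that this does not cover.

```python
from statistics import median

def rent_to_price_ratio(asking_rents, asking_prices):
    """Median monthly asking rent divided by median asking price."""
    if not asking_rents or not asking_prices:
        raise ValueError("need at least one rent and one price")
    return median(asking_rents) / median(asking_prices)

# Made-up listings for 3 bed / 2 bath SFRs in one hypothetical town
rents = [1500, 1650, 1800]             # monthly asking rents, in dollars
prices = [150_000, 165_000, 210_000]   # asking prices, in dollars

ratio = rent_to_price_ratio(rents, prices)
print(f"{ratio:.2%}")  # monthly rent as a share of price
```

Using medians rather than means keeps one oddball listing from skewing the number, which matters when the sample for a single property class in a single town is small.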

Is selenium an option for getting data from some of these sites? I know there's a risk of IP getting blacklisted.... but if we stagger the search using a random time generator and spread the work amongst several of us - we could build a pretty nice data set, no?
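A rough sketch of the "random time generator" part, assuming a base wait plus random jitter between page loads. The function names, the 5 and 3 second defaults, and the stand-in `fetch` are all assumptions for illustration; in practice `fetch` would wrap a Selenium or HTTP call.

```python
import random
import time

def jittered_delays(n, base=5.0, jitter=3.0, rng=random):
    """Return n wait times: base seconds plus a random 0..jitter extra."""
    return [base + rng.uniform(0, jitter) for _ in range(n)]

def fetch_all(urls, fetch, sleep=time.sleep, **delay_kwargs):
    """Call fetch(url) for each URL, sleeping a random interval after each call."""
    results = []
    delays = jittered_delays(len(urls), **delay_kwargs)
    for url, delay in zip(urls, delays):
        results.append(fetch(url))
        sleep(delay)
    return results

# Stand-in fetch and a no-op sleep so the example runs instantly offline
pages = fetch_all(["page-1", "page-2"], fetch=str.upper, sleep=lambda s: None)
print(pages)
```

Splitting the work among several people would just mean giving each person a different slice of the URL list and running the same loop.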

Originally posted by @John Kelsey :

Is selenium an option for getting data from some of these sites? I know there's a risk of IP getting blacklisted.... but if we stagger the search using a random time generator and spread the work amongst several of us - we could build a pretty nice data set, no?

Another option is to use an open source VPN to change your IP address every time the IP gets blacklisted. 

Has anyone been working on this? I would very much like to work on this project. I currently work as a cyber security consultant and would like to help in any way I can. Please reply or PM me so we can discuss further.

After digging into it for some time, I think it's safe to say that the choices are slim to none - there are some relatively cheap (sometimes free) data sources that are very poor quality, and the good quality data is prohibitively expensive (House Canary charges $1 per API call - WOW). 

So far, there seem to be a few types of data aggregators that make their datasets available:

*) Folks who are trying to promote "whitelabel clones" (e.g. Zillow, which comes with a host of restrictions expressly prohibiting "enriching other datasets" etc)

*) Those who will process and sanitize MLS for you (which is a huge task in itself) but you have to have MLS creds, meaning you have to deal with the zoo of MLS providers all by yourself - or pay something like $1200/mo for nationwide (US + Canada) listing feed

*) Folks who charge per call (CoreLogic, House Canary etc.) and who have high quality sets but they don't want you to mine them so they set all sorts of call rates and excessive price per call to ward off "gold diggers"

It is a mistake to say that the data "is abundant" and "if it is available, it must be worthless". It is definitely NOT abundant (and I am not talking about "websites" like Zillow, I am talking about raw datasets) and it is definitely valuable - if you can afford it of course :(

Great ideas here! I have been working as an Environmental Scientist for the past 6 months and have been learning how to use R and Python to analyze a lot of data. The first thing that came to my mind was how I can use this for real estate. What language do you guys prefer for web scraping?

Hey hey!!! Now this is my kind of talk!

@Kevin Zolea I think you're already on the right track with R and Python. Essentially, any language you're comfortable with that has a good crawler/scraping framework is the way to go, in my opinion. However, I would say play to your strengths, so you definitely want to stick with Python. GIS is finally starting to ramp up, and specialties (including your own) appear to be incorporating the technology more into their domain. ESRI is one of the top players in the GIS scene and guess what? Python just so happens to be their accepted scripting language.

I myself am a data geek and am building property management software with my partner. Does anyone know of APIs that I can use to get lease templates by state, and also tenant background checks?

Thank You

I'm also a developer and would love to join such a project. Do you guys have any private repo so that I can contribute perhaps?

Also, this website offers free access to MLS but requires a company name in the application form, so I'm a bit hesitant. Please let me know if anyone has made any progress.


I am thinking of pivoting from a SaaS application to APIs, so that RE professionals and investors can access better and more current market data than what CoStar, the MLS, and others provide. More info on the SaaS application and the data we deliver is at

But before I build the APIs, I need some information.

Who has the market share for market data APIs? And what is their pricing matrix? What features and data are most in demand?

Any idea where I might find some better information and facts? Then, once the APIs are built, I'll be looking for a smart way to get this new market data to the masses.

Appreciate your feedback. Thanks Eddie…
