Real Estate APIs and Data Science
42 Replies
Sebastian Garcia
from New York, NY
posted almost 3 years ago
Hello everyone!
Quick bit about me- I'm a data scientist working in NYC and literally just started looking into REI a week ago. Mostly because of the dream I have of retiring early with a nice cash flow, but also partly because of the undoubtedly cool stuff I can do with the data.
I'm aware there are probably a good amount of tools/sites already available for determining hot markets in the area, where to invest in hopes of appreciation, predicting house value, etc. But it would be a dope project to build some machine learning model tuned to exactly what I want, like interactions between price and crime and population movement and income shifts and their impact on cash flow, not just property value. And so on and so forth.
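For anyone wondering what "interactions" means in practice, here is a minimal sketch, assuming each neighborhood is a plain dict of pre-scaled numbers. The feature names, the values, and the `add_interactions` helper are all made up for illustration; a real model would feed these product terms into whatever regressor you like.

```python
from itertools import combinations

def add_interactions(row, features):
    """Return a copy of row with a product term for every pair of features."""
    out = dict(row)
    for a, b in combinations(features, 2):
        # e.g. "price*crime" captures how the effect of price shifts with crime
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# Made-up, pre-scaled values for one neighborhood (illustration only).
row = {"price": 0.8, "crime": 0.3, "pop_change": 0.1, "income_change": 0.05}
expanded = add_interactions(row, ["price", "crime", "pop_change", "income_change"])
print(sorted(expanded))
```

Four base features give six pairwise terms, so the expanded row has ten columns.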
I'm going with the Zillow API for now but it seems there's a pretty low rate limit. But it's a good start.
If anyone can point me to less-known real estate open databases/APIs that supply additional information, as well as any smaller data sources you find valuable, please let me know! I'd love to share any findings here.
And if anyone's looked into or done this kind of stuff- data science for real estate- I'm very curious about your experiences and I'd love to hear about them!
Thanks!
Trevor Ewen
Rental Property Investor from Weehawken, NJ
replied almost 3 years ago
Played around with this a bit myself. I am a software engineer. I also live in NYC.
My take is that disorganized and unfiltered data sources are some of the most compelling. Some ideas:
* County auction sites
* Insurance auction sites
* Educational data
* For sale by owner lists (if there is such a thing)
* Listings from smaller real-estate firms
Just a few ideas to get you going. One of the most interesting parts about real estate is how decentralized it is. The idea of a single source for anything is almost anathema. I think the smart people recognize that and bring together disparate parties to create win-win situations.
Sebastian Garcia
from New York, NY
replied almost 3 years ago
Thanks for the input! I was wondering why these types of discussions were rare here and it makes sense- there's no centralized source to make it easier and there are probably hundreds of thousands of local sources that need to be pieced together.
That's definitely a good list to get started, at least for the tiny piece of Jersey I'm looking into. I'm sure as I get more into it I'll have a much better understanding of what types of data are best. Nothing beats domain knowledge.
Thanks again!
Nick Hakim
Analyst from New York, NY
replied almost 3 years ago
@Sebastian Garcia Awesome ideas! I would recommend looking into some of the following sites available to real estate professionals in the Northeast:
- PropertyShark
- Lavamap
- CoreLogic
Most of these sites aggregate data from state/county websites, so you may find some valuable data sources there.
Cesar Ramirez
from Weehawken, NJ
replied over 2 years ago
I'm a developer as well. I would love to collaborate on something like this.
Erik Chan
from Burlingame, CA
replied over 2 years ago
Same here, I am also a developer based in San Francisco. Would be interested in discussing some ideas.
Diana Smith
from San Francisco, California
replied over 2 years ago
Craigslist
Tony Zuanich
from Orange, California
replied over 2 years ago
Here is an API that might be useful:
http://www.ziplabs.us/documentation
Tony Zuanich
from Orange, California
replied over 2 years ago
Here is a site that provides data:
https://www.zaplabs.com/
Tony Zuanich
from Orange, California
replied over 2 years ago
Here is another one that gives a free trial:
https://www.housecanary.com/real-estate-products-technology
Kevin Smith
replied over 2 years ago
I worked with the Zillow API a little bit, getting results to populate in Excel. Then I got interested in Craigslist results to find the "For Sale by Owner" listings (via web scraping; there can be some issues with that if done wrong, though, so I was hoping for an open API there).
It'd be cool to find the For Sale by Owner listings, load the info into Google Sheets, and then use Apps Script for more organization and prep.
Then there are paid services that I haven't checked into yet: https://retsrabbit.com/pricing
Gavin D.
from South Carolina
replied over 2 years ago
So, data is very, very, very readily available and cheap... and that's why you do NOT want that data, because everyone has it. The data you DO want is the stuff that makes it onto those lists, just before it lands on those lists. For example:
Properties that have tax payments due and not yet paid, but not yet late, like 2 months from tax sale. Some of those people will pay their taxes before it's too late, but most of the people who are 60 days from tax sale and owe 1-3k in taxes aren't going to get there. What you need is a developer... conveniently, like me. About 60% of my week is spent scraping data (real estate, healthcare, industrial), and EVERY client that I have came to me as their second or third choice, because that's what happens when the first guy or the second guy could not get it done.
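The filter described above (taxes unpaid, about 60 days or less from tax sale, roughly 1-3k owed) might be sketched like this in Python. The record layout, the field names, and the `near_tax_sale` helper are assumptions for illustration; real county data will look different and the sample rows are invented.

```python
from datetime import date

def near_tax_sale(records, today, max_days=60, min_owed=1_000, max_owed=3_000):
    """Keep unpaid parcels whose tax sale is within max_days and whose balance is in range."""
    hits = []
    for r in records:
        days_left = (r["tax_sale_date"] - today).days
        if (not r["paid"]
                and 0 <= days_left <= max_days
                and min_owed <= r["amount_owed"] <= max_owed):
            hits.append(r)
    return hits

# Invented sample rows; parcel IDs and dates are placeholders.
records = [
    {"parcel_id": "A-1", "paid": False, "amount_owed": 2_200, "tax_sale_date": date(2024, 6, 1)},
    {"parcel_id": "B-2", "paid": True,  "amount_owed": 1_500, "tax_sale_date": date(2024, 6, 1)},
    {"parcel_id": "C-3", "paid": False, "amount_owed": 2_800, "tax_sale_date": date(2024, 12, 1)},
]
candidates = near_tax_sale(records, today=date(2024, 4, 15))
print([r["parcel_id"] for r in candidates])  # prints ['A-1']
```

Only A-1 survives: B-2 is already paid and C-3 is still more than 60 days from its sale date.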
Gavin D.
from South Carolina
replied over 2 years ago
In any case, Sebastian, I've been a full-time dev for over 10 years now, and there is one thing I can promise you about data: if it was "INTENDED" to be available to you, it's worthless, for numerous reasons. 1. It's likely tampered with. Zillow is a great example; don't believe me? Look at a house in public mode, then switch to owner mode. 2. The companies that have useful, relevant, and profitable data do one of two things with it: use it within their own business model, or sell it at a premium.
So where do you get good data? If it's properties and real estate you are into, learn to scrape the county auditor for EVERY property. Then add some depth by scraping the county clerk's site for deeds. Now bounce that data against GIS and geolocation to get a good grasp of not only population density but also amenities, nightlife, dining, etc. And lastly (well, not lastly, just lastly for now), phone-append and email-append the names and addresses to get methods of contact. If you really want to get crazy from that point, compare length of phone line ownership and age of deed on the property to EPA minor environmental disaster maps, and you can sort out who is old and probably has cancer pretty quickly. Want the best data that anyone on this site has ever seen? Scrub all of the previous against obituary listings to look for surviving spouses.
Bert La
replied over 2 years ago
@Sebastian Garcia @Erik Chan
Hi Guys,
I am interested in working on a project like this as well. I have professional experience in both Real Estate and as a Software Engineer. Let's get together and discuss!
Also, I have a plug for an up-to-date standardized and unadulterated set of data.
PM Me if you are interested in working on something!
Dave Nixon
replied over 2 years ago
I'm also a sw engineer just starting to look at real estate.
I expect obvious things are already out there, like taking the median asking rent for a given property class (say, a 3 bed 2 bath SFR in Anytown) from Craigslist and dividing it by the median asking price for such SFRs on Zillow. Is that sort of thing already available someplace? Maybe even for free?
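The ratio itself is only a few lines once you have the two lists of numbers. A rough sketch in Python, assuming the asking rents and asking prices have already been collected; the figures below are made-up placeholders, not real Craigslist or Zillow data, and `rent_to_price_ratio` is a hypothetical helper name.

```python
from statistics import median

def rent_to_price_ratio(asking_rents, asking_prices):
    """Median monthly asking rent divided by median asking price."""
    if not asking_rents or not asking_prices:
        raise ValueError("need at least one rent and one price")
    return median(asking_rents) / median(asking_prices)

# Placeholder numbers for 3 bed / 2 bath SFRs in one town (illustration only).
rents = [1400, 1500, 1650, 1550, 1800]
prices = [180_000, 200_000, 210_000, 195_000, 250_000]

ratio = rent_to_price_ratio(rents, prices)
print(f"monthly rent / price = {ratio:.5f}")
```

Using medians rather than means keeps one overpriced listing from skewing the ratio, which matters with small, noisy samples like these.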
John Kelsey
from Mckinney, TX
replied over 2 years ago
Is Selenium an option for getting data from some of these sites? I know there's a risk of IP getting blacklisted.... but if we stagger the search using a random time generator and spread the work amongst several of us - we could build a pretty nice data set, no?
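A minimal sketch of the "stagger and split" idea in Python, assuming a plain list of URLs to divide up. Nothing is fetched here; the URLs, the 5 to 30 second range, and both helper names are placeholders for illustration.

```python
import random

def split_work(urls, n_workers):
    """Deal URLs round-robin so each collaborator gets a roughly equal share."""
    return [urls[i::n_workers] for i in range(n_workers)]

def jittered_delays(n_requests, min_s=5.0, max_s=30.0, seed=None):
    """Return one random wait (in seconds) to sleep before each request."""
    rng = random.Random(seed)
    return [rng.uniform(min_s, max_s) for _ in range(n_requests)]

# Hypothetical listing URLs, for illustration only.
urls = [f"https://example.com/listing/{i}" for i in range(10)]
batches = split_work(urls, n_workers=3)
delays = jittered_delays(len(batches[0]), seed=42)

for url, wait in zip(batches[0], delays):
    print(f"wait {wait:.1f}s, then fetch {url}")
```

Each person would take one batch and call `time.sleep(wait)` before every request, whether the fetch is done with Selenium or anything else.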
Sean O'Connor
Rental Property Investor from Atlanta, GA
replied over 2 years ago
Originally posted by @John Kelsey :
Is Selenium an option for getting data from some of these sites? I know there's a risk of IP getting blacklisted.... but if we stagger the search using a random time generator and spread the work amongst several of us - we could build a pretty nice data set, no?
Another option is to use an open source VPN to change your IP address every time the IP gets blacklisted.
Has anyone been working on this? I would very much like to work on this project. I currently work as a cyber security consultant and would like to help in any way I can. Please reply or PM me so we can discuss further.
Alex Stepanov
Investor from San Jose, California
replied over 2 years ago
After digging into it for some time, I think it's safe to say that the choices are slim to none - there are some relatively cheap (sometimes free) data sources that are very poor quality, and the good quality data is prohibitively expensive (House Canary charges $1 per API call - WOW).
So far, there seem to be a few types of data aggregators that make their datasets available:
*) Folks who are trying to promote "whitelabel clones" (e.g. Zillow, which comes with a host of restrictions expressly prohibiting "enriching other datasets" etc)
*) Those who will process and sanitize MLS for you (which is a huge task in itself) but you have to have MLS creds, meaning you have to deal with the zoo of MLS providers all by yourself - or pay something like $1200/mo for a nationwide (US + Canada) listing feed
*) Folks who charge per call (CoreLogic, House Canary etc.) and who have high quality sets but they don't want you to mine them so they set all sorts of call rates and excessive price per call to ward off "gold diggers"
It is a mistake to say that the data "is abundant" and "if it is available, it must be worthless". It is definitely NOT abundant (and I am not talking about "websites" like Zillow, I am talking about raw datasets) and it is definitely valuable - if you can afford it of course :(
Christian Hubbs
from Pittsburgh, PA
replied over 2 years ago
Has anybody played with the Quandl API? It promises a lot, but it looks like some of the documentation is out of date.
Kevin Zolea
Rental Property Investor from Parlin, NJ
replied over 2 years ago
Great ideas here! I have been working as an Environmental Scientist for the past 6 months and have been learning how to use R and Python to analyze a lot of data. The first thing that came to my mind was how I can use this for real estate. What language do you guys prefer for web scraping?
Khenkis K.
from Lenox, GA
replied over 2 years ago
Hey hey!!! Now this is my kind of talk!
@Kevin Zolea I think you're already on the right track with R and Python. Essentially, any language you're comfortable with that has a good crawler/scraping framework is the way to go, in my opinion. However, I would say play to your strengths, so you'll definitely want to stick with Python. GIS is finally starting to ramp up, and specialties (including your own) appear to be incorporating the technology more into their domains. ESRI is one of the top players in the GIS scene and guess what? Python just so happens to be their accepted scripting language.
Dev Kumar
from Phoenix, Arizona
replied almost 2 years ago
I myself am a data geek and am building property management software with my partner. Does anyone have any idea about APIs that I can use to get lease templates by state and also tenant background checks?
Thank You
Minwei Xu
replied almost 2 years ago
I'm also a developer and would love to join such a project. Do you guys have any private repo so that I can contribute perhaps?
Also, this website offers free access to MLS https://www.reso.org/mls-data-access/ but you need to put a company name in the application form, so I'm a bit hesitant. Please let me know if anyone has made any progress.
Eddie Godshalk
from Santa Clara, CA
replied almost 2 years ago
I am thinking of pivoting from a SaaS application to APIs, so that RE professionals and investors can access better and more current market data than what CoStar, the MLS, and others provide. More info on the SaaS application and the data we deliver is at bit.ly/2PWRMXB
But before I build the APIs, I need some information.
Who has the market share for market data APIs? And what is their pricing matrix? What features and data are most in demand?
Any idea where I might find some better information and facts? Then, once built, I'm looking for a smart way to get this new market data to the masses.
Appreciate your feedback. Thanks Eddie…