Real Estate APIs and Data Science


@Eddie Godshalk Your Project sounds very promising. I would like to hear more about your final output and what it will look like.

If I understand your question, I would say that ATTOM Data is a data warehouse that you may want to take a look at. They power RealtyTrac, and I believe they aggregate foreclosure, tax, and neighborhood data, plus property data from CoreLogic. At one time they were offering an API, but I'm not sure about their pricing model.

If I had a Christmas List for an automated feed of data to use for my Real Estate Business, ATTOM Data would be at the top of it.

Interesting ideas. I'm a wannabe data nerd and developer. I had to hire developers.

ATTOM Data seems like the best API to get data for my needs. If you buy more than 1 million records at a time via FTP, they will reduce the price to under a penny per record. Their MLS access is $3k per state or county (can't remember...I quit listening at $3k).

ATTOM Data's API calls are around $0.08 per call, delivering hundreds of data points I didn't need.

I went the scraping route at the county level. It has many challenges because no two counties are the same. This also gives me the ability to collect court records, as other posters have mentioned.
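Since no two county sites are alike, one common way to organize a county-level scraper is a registry of per-county parsers behind a single dispatch function. Here is a minimal sketch of that pattern; the county name, record fields, and the trivial "parcel|owner" page format are all assumptions for illustration, not any county's real format:

```python
from typing import Callable, Dict, List

# Registry mapping county key -> parser for that county's page format
PARSERS: Dict[str, Callable[[str], List[dict]]] = {}

def county_parser(name: str):
    """Decorator that registers a parser for one county's format."""
    def register(fn):
        PARSERS[name] = fn
        return fn
    return register

@county_parser("clark_nv")
def parse_clark(raw: str) -> List[dict]:
    # Each county needs bespoke parsing; here we pretend pages are
    # simple "parcel|owner" lines purely for illustration.
    records = []
    for line in raw.strip().splitlines():
        parcel, owner = line.split("|")
        records.append({"county": "clark_nv", "parcel": parcel, "owner": owner})
    return records

def scrape(county: str, raw_page: str) -> List[dict]:
    """Dispatch a raw page to the right county-specific parser."""
    return PARSERS[county](raw_page)

print(scrape("clark_nv", "123-45-678|SMITH JOHN"))
```

Adding a new county then means writing one new parser function, without touching the dispatch logic.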

I guess it really depends on what data you are after. Mine was building dynamic evergreen data sets that are used to create lists.

Or the latest real estate buzzword, "list stacking"?


IMHO, national data will do little good if you are looking for profitable investment properties. What you need is hyper-local data. Even general local data is of limited value because you will not be buying an "average" property in an "average" location. You will buy a specific property in a specific location that will target a specific tenant pool. (The property type, configuration, location and rent range defines the target tenant pool. Ask if you are interested in more details.)

The reason you need data for a specific tenant pool is that each tenant pool's needs/wants may be different. For example, it may be critical to one tenant pool that the property is within walking distance to public transit. For another tenant pool, easy freeway access may be critical. The point is that there are few generalizations other than that the property must be in a location that is perceived as safe, looks and smells clean and is priced correctly compared to what your target tenant pool perceives as your competition.

How I developed our software was neither easy nor quick. My efforts started about 12 years ago when I decided to change professions and build a business selling investment real estate. My first decision was the location. I was living in the NYC area and quickly determined that this was not a place that would work well. After a lot of research, I chose Las Vegas. (If anyone is interested as to why Las Vegas, let me know.)

Once I settled on the location I started researching the market. I quickly realized that the days of driving around looking at properties and cruising real estate sites was over. In Las Vegas, good properties typically remain on the market 3 to 5 days during peak times and with over 10,000 properties on the market at any given time, data mining was the only viable option.

In order to build the software I first had to have a clear understanding of the tenant pool I wanted to target. This required me to define what I considered to be a good tenant. I define a good tenant as someone who:

  • Has stable employment in a market segment that is very likely to be stable or improve over time.
  • Pays all the rent on schedule
  • Takes care of the property
  • Does not cause problems with neighbors
  • Does not engage in illegal activities while on the property
  • Stays for many years

I next determined the tenant pool with the highest concentration of good tenants. With the tenant pool identified, I then developed a property profile (location, type, rent range and configuration) that my target tenant pool would be willing and able to rent. Over time we've developed a number of rules to efficiently filter out properties that are unlikely to be good rentals. For example, if the relationship between bedrooms and bathrooms does not satisfy the following rule, we generally remove the property from consideration: Bedrooms <= Bathrooms + 1. In total, we have about 40 such filters.
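The bedroom/bathroom rule above is cheap to express in code. Here is a minimal sketch of that one filter applied to a small candidate list; the property records are made up for illustration:

```python
def passes_bed_bath_filter(bedrooms: int, bathrooms: float) -> bool:
    """Keep a property only if Bedrooms <= Bathrooms + 1."""
    return bedrooms <= bathrooms + 1

candidates = [
    {"id": "A", "bedrooms": 3, "bathrooms": 2},   # 3 <= 3 -> keep
    {"id": "B", "bedrooms": 4, "bathrooms": 2},   # 4 <= 3 -> remove
    {"id": "C", "bedrooms": 2, "bathrooms": 1},   # 2 <= 2 -> keep
]

kept = [p for p in candidates
        if passes_bed_bath_filter(p["bedrooms"], p["bathrooms"])]
print([p["id"] for p in kept])  # -> ['A', 'C']
```

A pipeline of ~40 such predicates can simply be chained: a property survives only if every cheap filter returns True, leaving a much smaller set for the expensive checks.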

Once we have reduced the number of candidate properties using filters, we next evaluate properties based on more computationally expensive factors. For example, one of the key factors is subdivision median time to rent. Basically, if it takes a long time for properties to rent in a subdivision, you do not want a property within that subdivision. Properties that take a long time to rent in the good times will be very hard to rent in the bad times, when you are most likely to need the income. How well did our clients do during the 2008 crash? Zero change in rent and zero increase in time to rent. The market value of their properties crashed like every other property in Las Vegas, but their income stream was unchanged.
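Subdivision median time to rent can be computed directly from past rental transactions. Here is a minimal sketch; the subdivision names, sample days-to-rent values, and the 21-day cutoff are all assumptions for illustration:

```python
from collections import defaultdict
from statistics import median

# Hypothetical past rental transactions (subdivision, days on market)
rentals = [
    {"subdivision": "Oak Park",   "days_to_rent": 12},
    {"subdivision": "Oak Park",   "days_to_rent": 9},
    {"subdivision": "Oak Park",   "days_to_rent": 30},
    {"subdivision": "Sun Valley", "days_to_rent": 45},
    {"subdivision": "Sun Valley", "days_to_rent": 60},
]

# Group days-to-rent by subdivision
days_by_sub = defaultdict(list)
for r in rentals:
    days_by_sub[r["subdivision"]].append(r["days_to_rent"])

# Median days-to-rent per subdivision
medians = {sub: median(days) for sub, days in days_by_sub.items()}

MAX_MEDIAN_DAYS = 21  # assumed cutoff, not the author's actual number
acceptable = {sub for sub, m in medians.items() if m <= MAX_MEDIAN_DAYS}
print(medians)     # {'Oak Park': 12, 'Sun Valley': 52.5}
print(acceptable)  # {'Oak Park'}
```

The median (rather than the mean) keeps one slow-to-rent outlier from dragging down an otherwise healthy subdivision.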

While data mining is critical to get the number of candidate properties down to a manageable size, you still need to go on-site and manually evaluate the property. For example, if there is a constantly barking dog next door, the property will not rent, no matter how good the numbers are.

In summary, there is no alternative to targeting a specific tenant pool and acquiring a deep understanding of that pool. Acquiring the information you need to build filters and processes will take time. Also, once you identify candidate properties, you must have a process in place to validate them; never blindly believe what the numbers say.

Feel free to ask questions and I will do my best to respond.

