MLS Crawler Tool To Summarize

15 Replies

Do you guys know of any tool that crawls the MLS or sites like Zillow/Trulia and extracts the information to Excel or some other document with everything summarized?

It would be nice to set up some triggers (show me only properties below $x with y rooms and z ROI).

If not, would you need something like that? If so, how much would you pay for it?
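To make the trigger idea concrete, here is a minimal sketch of that kind of filter in Python. Everything in it is an assumption for illustration: the `Listing` fields, the sample addresses and numbers, and the use of gross rental yield (annual rent divided by price) as a rough stand-in for ROI.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical listing record; the field names are illustrative only."""
    address: str
    price: float         # asking price in dollars
    bedrooms: int
    monthly_rent: float  # estimated rent in dollars per month


def gross_yield(listing: Listing) -> float:
    """Annual rent divided by price, a crude stand-in for ROI."""
    return 12 * listing.monthly_rent / listing.price


def matches(listing: Listing, max_price: float, min_bedrooms: int, min_yield: float) -> bool:
    """True when the listing is below the price cap and meets the other triggers."""
    return (
        listing.price < max_price
        and listing.bedrooms >= min_bedrooms
        and gross_yield(listing) >= min_yield
    )


# Made-up sample data, not real listings.
listings = [
    Listing("1 Example St", 150_000, 3, 1_500),
    Listing("2 Example St", 250_000, 4, 1_600),
    Listing("3 Example St", 120_000, 2, 1_300),
]

hits = [l for l in listings if matches(l, max_price=200_000, min_bedrooms=3, min_yield=0.10)]
for hit in hits:
    print(hit.address, f"{gross_yield(hit):.1%}")
```

A real tool would have to get the listing data from somewhere first, and that turns out to be the hard part.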

Sounds great to me, but I probably wouldn't pay much.  I'd just do all the leg work myself like I do now.  Sorry, if you were thinking of developing one.  I'm sure a lot of other folks would use it!

You can probably have a freelancer in India set up a Python script that can easily do that. However, if it uses the Zillow API or similar, they are going to limit the number of requests you can make daily.

@Ian Ippolito mentioned this as one of his secret weapons in his blog. Hopefully he'll elaborate more in the future. I'm very interested in this type of tool, as it can save so much time that can be used elsewhere.

Thanks @Hau N. , and yes, I do plan to write an article on it in the future. For you and @Sagiv O. , here's more information:
1) The tool allows me to select number of rooms, number of baths, square footage, ZIP Codes to scan, etc.

2) It then goes to Zillow to pull from the MLS. @Steve B. is correct that Zillow limits the number of requests you can make daily through their API, which essentially makes it useless for such a tool. So this means you have to use a technique called screen scraping to get around the limit. Unfortunately, this means every time Zillow changes their site the program breaks. So it is quite a bit of maintenance to keep it running. Also, it means you have to make the requests fairly slowly so that Zillow doesn't kick you off the site. For example, when I scan all the ZIP Codes in my area of Tampa, it takes hours, so typically I run the program overnight and have the report ready for me in the morning.

3) Some of the MLS properties don't meet the filters. But if they do, then the program looks further because there are still going to be way too many properties at this point. It ties into Trulia to get crime statistics, and then also filters on that. Again, the same restrictions apply as with Zillow: screen scraping and keeping it slow.

4) If it passes that test, then it ties into RentOMeter. This one is very important, because at this point there are still too many properties, and most are a complete waste of time because they won't meet my yield requirements. So this part of the program does a rough estimate of how much I can expect to earn on the property, and checks to make sure it passes my filter. Ideally, it would be able to do a check on the property itself, but doing that at this point would overload RentOMeter and cause them to kick the tool off the site. So instead, I have to do a two-pass test. At this stage, I just pull the data for the ZIP Code, and then store it and reuse it for every property in that ZIP Code. That greatly reduces the amount of data that has to be pulled from RentOMeter and weeds out 70% of the bad ones, which is a huge improvement. Later in the process I do it on the specific property to weed out the rest.

5) The tool applies additional filters.

6) If it survives at this point, then it does a 2nd RentOMeter check on it, but this time on the property itself rather than the ZIP Code. This is a much more accurate check, and can be done at this stage because there are so few properties that it won't overload RentOMeter and cause them to kick the tool off the site.
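The two-pass rent check in steps 4 and 6 can be sketched roughly as below. This is not the actual tool, only an illustration of the idea: the lookup functions, the rent figures, the property records and the 10% yield threshold are all made up, and the dictionaries stand in for the slow scraped lookups. Step 5's additional filters are left out.

```python
from functools import lru_cache

# Hypothetical stand-ins for slow, rate-limited rent lookups (made-up numbers).
ZIP_RENTS = {"33602": 1800.0, "33610": 1100.0}
PROPERTY_RENTS = {"A": 1900.0, "B": 1500.0, "C": 1200.0}

zip_lookups = []  # records every real ZIP-level lookup, to show the cache working


@lru_cache(maxsize=None)
def zip_rent(zip_code: str) -> float:
    """Typical rent for a ZIP Code; cached so each ZIP is fetched only once."""
    zip_lookups.append(zip_code)
    return ZIP_RENTS[zip_code]


def property_rent(prop_id: str) -> float:
    """Rent estimate for one specific property (the expensive, accurate check)."""
    return PROPERTY_RENTS[prop_id]


def gross_yield(monthly_rent: float, price: float) -> float:
    return 12 * monthly_rent / price


def two_pass_filter(properties, min_yield):
    # Pass 1: cheap ZIP-level estimate, reused for every property in the ZIP.
    survivors = [
        p for p in properties
        if gross_yield(zip_rent(p["zip"]), p["price"]) >= min_yield
    ]
    # Pass 2: per-property estimate, run only on the few survivors.
    return [
        p for p in survivors
        if gross_yield(property_rent(p["id"]), p["price"]) >= min_yield
    ]


properties = [
    {"id": "A", "zip": "33602", "price": 200_000},
    {"id": "B", "zip": "33602", "price": 200_000},
    {"id": "C", "zip": "33610", "price": 180_000},
]

final = two_pass_filter(properties, min_yield=0.10)
print([p["id"] for p in final], zip_lookups)
```

In this sample, C is dropped by the ZIP-level pass without ever needing a per-property lookup, B is dropped by the per-property pass, and the ZIP shared by A and B is looked up only once.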

At this point, out of my entire Tampa area with maybe 20 ZIP Codes, I'll end up with a list of 5 or 6 properties that the tool spits out in Excel. Like I said earlier, I start the program before I go to bed and then when I wake up in the morning my report is there. It's like magic and each report saves me many man-months of time.

Developing the program wasn't easy. It took about 3 to 4 months to develop the initial tool and get it to the point where it was usable (not spitting out bad data or missing good properties) and not getting kicked off the sites. And at that point I thought I was done, but that was actually the easy part. The biggest challenge and cost of the tool is keeping it working, because of all the screen scraping. Every time one of the sites changes its format even slightly, the tool breaks and has to be reprogrammed. And they are constantly changing their format, which can drive you a little crazy (although it makes your programmers very happy since it keeps them gainfully employed).
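For anyone wondering what "doing the requests slowly" looks like in code, here is a minimal sketch of a throttle that enforces a gap between requests. The `Throttle` class, the `fetch_all` helper and the 5-second gap are assumptions for illustration, not what the tool actually uses, and `fetch` is whatever function downloads a page.

```python
import random
import time


class Throttle:
    """Enforces a minimum, optionally randomized, gap between calls to wait()."""

    def __init__(self, min_gap, jitter=0.0, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = min_gap  # seconds that must pass between requests
        self.jitter = jitter    # extra random delay of up to this many seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None       # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the gap since the previous call."""
        now = self._clock()
        if self._last is not None:
            gap = self.min_gap + random.uniform(0, self.jitter)
            remaining = self._last + gap - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, pausing between requests."""
    pages = []
    for url in urls:
        throttle.wait()
        pages.append(fetch(url))
    return pages


# Example setup: at least 5 seconds between requests, plus up to 2 seconds of jitter.
throttle = Throttle(min_gap=5.0, jitter=2.0)
```

With a gap like that, a scan of a few thousand pages takes hours, which is why the run happens overnight.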

@Ian Ippolito I assume you're doing this in Python?  If you're reading the data via JSON, couldn't you parse it so it would be more robust against page changes?  I plan to do something similar in the future to what you have done via web scraping, so I'm curious.

Been thinking of doing something very similar to this.... would love to collaborate vs trying to reinvent the wheel myself. Anyone open to it?

@Ian Ippolito   Have you considered opening this up to other investors and charge a fee for it? It'll be another stream of passive income for you.  

@Ian Ippolito that's exactly what I had in mind, plus a few more features. 

I'm actually working on it right now. I'm a software engineer and already built similar tools for Amazon FBA sellers (check https://fba.zone). 

@Matt K. , @Steve B. - would love to collaborate if you're down. 

@Hau N. , @Jody Schnurrenberger - you can be our first beta testers, and I would love to learn from you what would make you happy with this tool!

@Sagiv O. I would love to test it out for you.  That would be awesome!

When I first read your thread, I thought, "Cool, but I'm not really interested in paying much.  But hey, I can bump this guy's thread up by commenting to get more attention and help him out."  Thanks for mentioning me, @Sagiv O. !  (I forgot to Follow it.)  Now that I've read all these comments, especially from @Ian Ippolito , I'm psyched about the tool!  Hahaha!  Maybe I just needed better marketing to get me pumped.  ;-)  Anyway, I'd love to be a beta tester!  Now I'm REALLY glad I bumped your post up to be seen again.  It got a lot more attention than it did the 1st time and the whole real estate community may benefit from it.  :-D

One suggestion, keeping in mind I only understood like 2/3 of the stuff you guys were talking about...  lol  Could you try teaming up with Zillow, Trulia, and Rentometer to get a variance to allow your app special access more often than your regular Joe Schmo?  Otherwise, that whole overnight thing seems like a good workaround.  Great thinking!  Though it certainly sucks that any little change "breaks" the code.  I wonder if they do that on purpose...

@Steve B. , no, unfortunately that doesn't work. You have to read the HTML of the page, and yes, you do parse it. However, when the page changes it breaks your parsing, so you have to recode it. It's unfortunately just the nature of screen scraping.
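Here is a small illustration of why that happens, using Python's built-in `html.parser`. The HTML snippets and the class names `list-price` and `price-v2` are invented for the example and are not any real site's markup.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Grabs the text of the first tag whose class attribute equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._inside = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        if self.price is None and dict(attrs).get("class") == self.target_class:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.price = data.strip()
            self._inside = False


def extract_price(html, target_class="list-price"):
    parser = PriceParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.price


# Invented markup: the same price before and after a hypothetical site redesign.
old_page = '<div><span class="list-price">$185,000</span></div>'
new_page = '<div><span class="price-v2">$185,000</span></div>'

print(extract_price(old_page))
print(extract_price(new_page))
```

The price is still on the page after the redesign, but the parser was keyed to the old class name, so it finds the price on the old page and comes back with `None` on the new one until someone updates the code.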

@Hau N. , you're right, and yes I'm fine with opening this up to other investors for a fee if there's demand (maybe from someone who wants an immediately working product, rather than going through beta testing). In thinking about it, it doesn't make sense to use the usual model and charge a flat fee (say $100) for the software, because it's going to break pretty soon after a person buys it. So then the user has to pay a constant monthly maintenance fee (maybe $5/month) which is more expensive and a hassle. And also, even after the person bought it they would still have to pay $15 more per month for the subscription to RentOMeter, which makes it even more expensive.

I could make it cheaper for the user by selling the reports (which I would generate from my own machine and my own RentOMeter account) instead of the software. I'm thinking $20 per month for a reasonable number of reports (5 per month, since it takes time to go through the final results and do due diligence before you're ready for a new one). It would still be cheaper than owning the software. What do you think?
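As a rough comparison using the ballpark figures above ($100 flat fee, $5/month maintenance, $15/month RentOMeter, versus $20/month for reports), here is the arithmetic as a quick sketch. The function names are just for illustration and the prices are only the guesses mentioned in this post.

```python
def software_cost(months, flat_fee=100, maintenance=5, rent_subscription=15):
    """Total cost of owning the software: one-time fee plus monthly costs."""
    return flat_fee + months * (maintenance + rent_subscription)


def report_cost(months, monthly_fee=20):
    """Total cost of buying reports at a flat monthly price."""
    return months * monthly_fee


for months in (1, 6, 12):
    print(months, software_cost(months), report_cost(months))
```

Since the monthly costs come out the same either way, the difference is simply the $100 flat fee, no matter how many months you use it.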

@Matt K. , @Jody Schnurrenberger if you might be interested, let me know. You too @Sagiv O., maybe you'd rather just have something that works immediately than take the time to start a new project.

Updated 9 months ago

It turns out that it’s probably a no-no to resell the information from Zillow, Trulia, etc. So I’m going to have to purchase the data from a provider who sells it and allows redistribution. This unfortunately means charging more to customers...at least until there are enough customers to be able to bring the prices down.

@Ian Ippolito , I'm SO glad you came up with the idea of just selling the reports! That sounds MUCH better to me! lol I'm down for that...when I'm ready to buy again. It will be 2018. I'm hoping earlier rather than later, but my financial manager really wants me to sell my current SFR BEFORE buying another. I'm open to whatever order it happens in. lol

But, you know, if you start this side business, you might quickly have to invest in more computers...just saying.  I can see this blowing up pretty quickly.  ;-)

@Ian Ippolito I agree with @Jody Schnurrenberger .  I like the reports selling idea.  You get to keep your work private and we don't have to bug you for a new program every time it needs an update. LOL.  

@Sagiv O. - on my MLS (Trend / Bright) I can set up auto searches and download all of the property records meeting my criteria to Excel on my computer in about 30 seconds. E.g., all 3br or greater for sale under $200k in xyz ZIP Code yields thousands of records I can download. What I can't get quickly with my tool is the Rentometer estimate or ARV. I think you could sell the tool you described to private equity types (big $$) and/or REIA members (lots of little $$).
