
Updated over 8 years ago


Creating a County Web Scraper

Aleks Petrov
  • Belmont, CA
Posted

Hi BiggerPockets community!

My name is Aleksei, and I’m planning :) to invest in RE.

I live in an expensive area, so my first steps in RE need to be very careful. I’m an active listener of the BP video podcasts, and they are an excellent source of information.

While I’m learning the theory, I wondered what I could do with my skills as a software developer.

I know that all property data is publicly accessible, but you cannot search it with custom filters. So I decided to write my own web scraper and build my own database.

I picked one county for a pilot project, and in this topic I’ll post updates about the challenges of web scraping.

Please let me know if this topic is interesting to you, and I’ll keep posting updates.

Thanks, BiggerPockets, for being such an awesome resource!

Best regards, Aleksei.

Most Popular Reply

Aleks Petrov
  • Belmont, CA
Replied

@Trevor Ewen thanks for the reply!
"What then" is a good question...
I'm planning to build a full copy of the DB, so I can run any query on demand.

Goal #1: get all valid APNs for the given county.

Goal #2: get all data for those APNs and save it to a local DB.

Part 1:

The web site has a search form with 1 billion possible inputs, so I need to iterate over them to find the valid numbers.

I wrote a script that drives a web browser to enter each number and submit the search. I decided to store the results in a CSV file.

One browser completes a search in 2 seconds, and I was able to run 16 browsers in total (2 machines x 8 browsers).

2 seconds * 1 billion / 16 browsers = 125,000,000 seconds = about 1,447 days, which is not acceptable.
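
For anyone who wants to play with the numbers, the estimate above can be reproduced with a small helper. This is only a sketch: the function name `scan_days` is made up for illustration, and it assumes perfect parallelism with no failed or retried searches.

```python
def scan_days(candidates: int, seconds_per_lookup: float, workers: int) -> float:
    """Wall-clock days needed to try every candidate number.

    Assumes all workers run at the same speed and never sit idle.
    """
    total_seconds = candidates * seconds_per_lookup / workers
    return total_seconds / 86_400  # 86,400 seconds in a day


# Browser run from the post: 1 billion inputs, 2 s per search, 16 browsers.
browser_days = scan_days(1_000_000_000, 2.0, 16)
print(f"{browser_days:.1f} days")
```

Changing the per-search time or the number of workers shows quickly which of the two has more effect on the total.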

The next solution is to skip the browsers and send API requests directly.

In this case one request/response takes 0.2 seconds, and I can run ~10 requests in parallel:

0.2 seconds * 1 billion / 10 = 20,000,000 seconds = about 231 days. Much better than the previous result, but still slow.
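
A rough sketch of how the parallel-request approach could look, using only the Python standard library. The post does not give the real URL, parameters, or response format, so the request itself is left as a hypothetical `lookup` callable that you would supply; `scan` and the CSV column names are also just illustrative.

```python
import csv
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def scan(start: int, stop: int,
         lookup: Callable[[int], Optional[dict]],
         out_path: str,
         workers: int = 10) -> int:
    """Try every number in [start, stop) in parallel and save hits to CSV.

    `lookup` is a placeholder for the real request (an assumption, not
    something from the post): it should return a dict for a valid APN and
    None when the number does not exist.
    Returns the number of valid APNs found.
    """
    apns = range(start, stop)
    found = 0
    with ThreadPoolExecutor(max_workers=workers) as pool, \
            open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["apn", "data"])
        # pool.map submits all tasks up front and yields results in input
        # order, so keep each call to a modest range rather than 1 billion.
        for apn, record in zip(apns, pool.map(lookup, apns)):
            if record is not None:
                writer.writerow([apn, json.dumps(record)])
                found += 1
    return found
```

A call would look something like `scan(0, 100_000, my_lookup, "apns_000.csv")`, repeated for each chunk of the number space. Writing one CSV per chunk also makes it easy to resume after a crash.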

Right now I don’t have a better solution.

I’ve noticed that I can search APNs on (at least) 2 different web sites. My next step is to check whether these sites use different DBs (or copies); if so, I can double my speed by hitting both endpoints.
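
If the two sites do turn out to be independent, the number space could be split so that each site gets its own non-overlapping share. A minimal sketch of that split; `split_range` and the labels `site_a` / `site_b` are hypothetical placeholders, not the real sites.

```python
def split_range(start: int, stop: int, parts: int) -> list[range]:
    """Split [start, stop) into `parts` contiguous, non-overlapping ranges."""
    size, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` parts take one additional number each.
        hi = lo + size + (1 if i < extra else 0)
        ranges.append(range(lo, hi))
        lo = hi
    return ranges


# Hypothetical labels for the two sites mentioned in the post.
sites = ["site_a", "site_b"]
assignments = dict(zip(sites, split_range(0, 1_000_000_000, len(sites))))
```

The same helper could hand out disjoint ranges to each machine or browser, so no two workers ever test the same number. With two truly independent sites, the 231-day estimate would drop to roughly 116 days.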

Will keep you posted.
