On Tuesday, 24 February 2015, at 5 PM, FRI will host a lecture by Eddie Bell of Lyst.com.
Web scraping is an integral part of data acquisition at Lyst. Almost all fashion products sold on our site come from scraping. We run hundreds of spiders in parallel via a distributed scheduling platform and scrape millions of pages each day. One of our main problems is that scraped data is not reliable. In this talk, Eddie Bell will explain how Lyst built a robust and scalable scraping architecture with the help of machine learning and crowdsourcing.