Do vratu v blatu: A Scalable Scraping Architecture

Announcements

On Tuesday, 24 February 2015, at 5 PM, FRI will host a lecture by Eddie Bell from Lyst.com.

Web scraping is an integral part of data acquisition at Lyst. Almost all fashion products sold on our site come from scraping. We run hundreds of spiders in parallel via a distributed scheduling platform and scrape millions of pages each day. One of our main problems is that scraped data is not reliable. In this talk, Eddie Bell will explain how the team built a robust and scalable scraping architecture with the help of machine learning and crowdsourcing.

About Eddie Bell

Dr Edward John Lancaster Bell III (Eddie) is an ex-finance PhD who saw the light and joined a start-up. He is the lead data scientist (aka "The Fashematician") at Lyst and he solves fashion data problems using NLP, ML and image processing. He likes describing himself in the third person and long walks on the beach.

To attend, please sign up on meetup.com.