The web is an almost unlimited source of information. With search engines such as Google and Bing, we can easily find web pages that may contain relevant information. However, the number of returned pages is usually far too large to process manually. The solution is computer programs that can find and extract relevant information from a potentially very large number of unstructured or semi-structured documents and return the results in structured form.
COURSE GOAL
The main objective of this course is to teach students how to develop programs for web search (covering both the surface web and the deep web) and for extracting structured data from both static and dynamic web pages. Besides the basic concepts of web search and retrieval, students will learn the relevant techniques and approaches. After successfully completing the course, students will be able to develop programs for automated web search and structured data extraction from web pages (including search and extraction from online social media).
COURSE CONTENT
The main topics that will be addressed within the course are:
REQUIRED KNOWLEDGE
Students are expected to know at least the basics of programming languages and technologies such as Java, JavaScript, Python, HTML, and CSS, as well as web page structure.
COURSE GRADING
To receive a passing grade in this course, students must successfully complete three projects (seminars) and a written examination (at least 50% of all points).