New experimental approaches based on high-throughput sequencing are revolutionizing transcriptome research. They allow genome-wide insight into cells at single-nucleotide resolution and are altering our understanding of the structure and dynamics of the transcriptome. Even a single high-throughput sequencing experiment can yield a large quantity (a few GB) of short-sequence data. This wealth of information can only be explored with new computational analysis tools, whose design and availability play a central role in modern biomedical discovery. The general lack of such tools is a major bottleneck in scientific workflows, and their development may have a great impact on the genomics and biomedical research community.
The project will develop a set of computational and visualization methods for the identification and quantification of transcriptome elements based on RNA high-throughput sequencing and transcription factor binding preference data. Methods from data mining and artificial intelligence will be used for modeling relations among the transcriptome elements and for inferring new hypotheses on transcription regulation. We will embed these methods in an intelligent assistant for explorative analysis of transcriptome data.
The expected main results of the project are: 1) a computational pipeline for mapping high-throughput sequencing data (RNA-Seq and iCLIP), including methods for the identification and quantification of transcriptome elements; 2) computational methods, a descriptive language, and efficient heuristics for modeling relations among transcriptome elements and for inferring new hypotheses on transcription regulation; 3) an intelligent, web-based interface for explorative analysis that directs the researcher to the most interesting results; and 4) application of the developed tools to model the transcriptome of the social soil amoeba Dictyostelium discoideum during multi-cellular development, intracellular recognition, and interaction with bacteria, as well as the transcriptome in human and mouse models of neurodegenerative diseases.