• Techniques for inductive learning from a wide range of partially labeled datasets about various diseases
Client: Javna agencija za raziskovalno dejavnost RS
Project type: Bilateral projects
Project duration: 2020 - 2021
  • Description

I) INTRODUCTION AND DESCRIPTION OF THE RESEARCH FIELD

The proposal belongs to the field of intelligent computer systems that automatically learn from data and produce predictive models, which can be used for describing data, predicting outcomes and obtaining new knowledge. The application of such predictive modeling in medicine is bringing a paradigm shift to healthcare, especially in major disease areas such as cancer, neurology and cardiology (Jiang et al., 2017). To successfully learn the relation between medical symptoms and diseases, medical experts need to correctly diagnose and annotate each patient record. This can be very time-consuming and expensive (in terms of diagnostic costs and manual labor). Consequently, many of the available datasets are only partially annotated (labeled). The field of semi-supervised learning addresses exactly this problem: it uses the abundant unlabeled data in addition to the labeled data, with the goal of improving the accuracy of predictive models (Zhu, 2005). In the proposed project we would like to develop new methods for semi-supervised learning from medical datasets. The newly proposed methods will apply state-of-the-art approaches, such as deep learning, fuzzy learning and active learning with reliability estimation.

II) GOALS OF THE COOPERATION

GOALS. The main goals of the cooperation will be to propose, evaluate and apply semi-supervised learning methods for the detection of cancer, cognitive and heart disorders and for inference about their causes and consequences. The specific goals of the project will be:
1.) to review and analyze the advantages and drawbacks of existing semi-supervised learning approaches, focusing on the latest state-of-the-art publications in the field,
2.) to propose new semi-supervised learning approaches that combine the benefits of existing approaches.
The ideas for the new approaches include (1) active learning that selects the most reliable examples using prediction reliability estimates (Bosnić and Kononenko, 2008); (2) adaptation of supervised clustering to artificially annotate unlabeled examples; (3) fuzzy learning that probabilistically annotates unlabeled examples and performs probabilistic classification; (4) deep learning that uses hidden factors (neuron weights) to infer missing labels,
3.) to empirically test and evaluate existing and newly proposed methods on real medical datasets, including data about cognitive disorders (Alzheimer's and Parkinson's disease), cardiomyopathy and breast cancer,
4.) to establish long-term cooperation between the partner institutions in Ljubljana and Novi Sad that will enable further cooperation on a European scale (e.g. Horizon 2020).

EXPECTED CONTRIBUTIONS.
1.) A set of new semi-supervised methods that make use of unlabeled examples through different strategies. The proposed methods will be general enough to be used in an arbitrary semi-supervised learning setting (e.g. medical, industrial, financial, insurance or banking datasets).
2.) A comparative report on the developed methods, evaluated on the available cognitive disorder data acquired from the Neurological institute (Novi Sad), breast cancer data acquired from the University clinical center (Ljubljana), and publicly available heart disease datasets.
3.) Dissemination of findings about the newly proposed methods and the discovered medical knowledge in interdisciplinary journals spanning computer science and health informatics, such as Statistical Methods in Medical Research, IEEE Journal of Biomedical and Health Informatics, and Artificial Intelligence in Medicine.
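Idea (1) among the goals above, adding only those unlabeled examples whose predictions are judged reliable, can be illustrated with a minimal self-training sketch. Everything in it is an assumption made for illustration: the nearest-centroid classifier, the distance-based reliability score, the `threshold` value and the toy data are not taken from the proposal, and the reliability score is a simple stand-in, not the estimates of Bosnić and Kononenko (2008).

```python
from math import dist, exp


def centroids(X, y):
    """Mean feature vector of each class among the labeled examples."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in groups.items()}


def predict_with_reliability(model, x):
    """Return (label, reliability) for one example.

    Reliability is a hypothetical stand-in: the share of exp(-distance)
    mass that falls on the nearest class centroid (between 0 and 1).
    """
    scores = {c: exp(-dist(x, m)) for c, m in model.items()}
    label = max(scores, key=scores.get)
    return label, scores[label] / sum(scores.values())


def self_train(X_lab, y_lab, X_unl, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label the unlabeled examples predicted reliably."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unl)
    for _ in range(max_rounds):
        model = centroids(X_lab, y_lab)
        still_unlabeled, added = [], 0
        for x in pool:
            label, reliability = predict_with_reliability(model, x)
            if reliability >= threshold:
                # Reliable prediction: treat it as a label from now on.
                X_lab.append(x)
                y_lab.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0:  # nothing new was learned in this round
            break
    return centroids(X_lab, y_lab), pool


# Toy data: two labeled records and three unlabeled ones.
model, unresolved = self_train(
    X_lab=[(0.0, 0.0), (4.0, 4.0)],
    y_lab=["healthy", "disease"],
    X_unl=[(0.2, 0.1), (3.9, 4.2), (2.0, 2.0)],
)
print(unresolved)  # [(2.0, 2.0)]
```

In this toy run the two unlabeled records close to a labeled one are pseudo-labeled, while the ambiguous record halfway between the classes stays unlabeled. In an active learning setting, such low-reliability records would be natural candidates for annotation by a medical expert.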
III) COMPLEMENTARITY AND ADDED VALUE OF COLLABORATION

The complementarity of the two research groups stems from (1) their expertise in different data analysis methodologies, (2) their access to different datasets about heart disease, cancer and cognitive disorders, and (3) the data analysis tools developed in their previous work.

The researchers from the Serbian group (Novi Sad) have worked intensively with case-based reasoning methodologies applied in medical domains. This group has also developed the general framework FAP (framework for time-series analysis and prediction), which can be used to develop an arbitrary decision support system (so far, the framework has been used for predicting multiple sclerosis). The group will contribute the FAP framework and related methodologies, all of which will be included in the joint decision system prototype.

The faculty in Ljubljana is actively conducting research in the oncological and cardiovascular fields, using models for supervised and unsupervised learning. They have developed a methodology for explaining individual predictions and models that has been shown to be promising in medicine, as it helps doctors understand the causes of illnesses and plan courses of therapy. They have also developed an innovative methodology for estimating the reliability of individual predictions, which is important for medical predictions as it helps prevent negative consequences of wrong diagnoses or medical actions.

ADDED VALUE OF THE COLLABORATION comes from the progress expected from joining the expertise of both research groups and from the resulting synergy. The results of the proposed project will represent a methodological (basic) advancement in big data analysis as well as an advancement in medical knowledge about the studied diseases (their causes, indicators, consequences and therapies).
Both research groups have carried out significant research in the field of medical decision support systems and have published scientific contributions in renowned journals and at conferences. Both groups are highly motivated to establish a formal collaboration that would ensure continued cooperation in the long term.