IPS & CPS

Dec 8. 15:40-17:20

14SCO Theatre 4. IPS16: Statistical Modeling for Medical and Biological Data

IN THIS SESSION

Organiser: Ivan Chang

  • Osamu Komori (Seikei University), Yusuke Saigusa (Yokohama City University), Shinto Eguchi (The Institute of Statistical Mathematics)

Species distribution modeling plays a crucial role in estimating the abundance of species based on environmental variables such as temperature, precipitation, and evapotranspiration. Maxent, a representative method, is widely used across various scientific fields, especially in ecology and biodiversity research. However, calculating the normalizing constant in the likelihood function becomes time-consuming when the number of grid cells increases sharply. In this presentation, we propose geometric-mean divergence and derive an exponential loss function, which can be calculated efficiently even when the number of grid cells increases significantly. A sequential estimating algorithm is also proposed to minimize the exponential loss in a way similar to that of Maxent. Results from simulation studies and an analysis of Japanese vascular plant data are presented to validate the efficacy of this approach.
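To illustrate why the normalizing constant becomes a bottleneck, the following is a minimal sketch of a Maxent-style log-likelihood with a linear score per grid cell. It is not the authors' implementation and does not show the proposed exponential loss, whose form the abstract does not give. The names `maxent_log_likelihood`, `cell_probabilities`, `theta`, `cell_features`, and `presence_idx` are hypothetical and chosen only for this example.

```python
import math


def cell_scores(theta, cell_features):
    """Linear score theta . x for every grid cell (hypothetical feature layout)."""
    return [sum(t * x for t, x in zip(theta, feats)) for feats in cell_features]


def log_normalizer(scores):
    """Log of the normalizing constant: a log-sum-exp over ALL grid cells.

    This is the step whose cost grows with the number of grid cells.
    """
    m = max(scores)  # subtract the maximum for numerical stability
    return m + math.log(sum(math.exp(s - m) for s in scores))


def cell_probabilities(theta, cell_features):
    """Maxent-style probability assigned to each grid cell."""
    scores = cell_scores(theta, cell_features)
    log_z = log_normalizer(scores)
    return [math.exp(s - log_z) for s in scores]


def maxent_log_likelihood(theta, cell_features, presence_idx):
    """Sum of log-probabilities over the cells where the species was observed."""
    scores = cell_scores(theta, cell_features)
    log_z = log_normalizer(scores)
    return sum(scores[i] - log_z for i in presence_idx)


# Toy data: four grid cells with one environmental variable each (made-up values).
features = [[0.1], [0.4], [0.7], [1.0]]
presences = [2, 3]  # indices of cells with recorded presences
ll = maxent_log_likelihood([1.5], features, presences)
```

Every evaluation of the likelihood, and hence every step of an iterative fit, touches all grid cells through `log_normalizer`, which is the cost the abstract refers to.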

  • Sheng-Mao Chang (National Taipei University)

This study was motivated by two image discrimination examples: handwritten digit recognition and COVID-19 lung CT scan image recognition. The two problems differ in an important way. Images of the handwritten digit 1, for example, all have a slash through the middle, whereas the location of lung damage varies from one person to another. Linear classifiers handle the former well because the pattern is consistent, but they struggle with the latter because the location of the damage varies. To tackle the latter discrimination problem, we propose a novel approach called convolutional multiple-instance logistic regression (CMILR), which combines a convolutional neural network (CNN) with multiple-instance learning. On COVID-19 lung CT scans, CMILR achieved an accuracy of 0.81 with only 169 parameters. In contrast, a fine-tuned CNN model achieved an accuracy of 0.88 with 377,858 parameters. Additionally, CMILR provides a probability map indicating the likelihood of lung damage, offering valuable insights for medical diagnosis and making the learning algorithm explainable.
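The idea of combining a convolution with multiple-instance learning can be sketched as follows. This is an illustrative assumption, not the authors' CMILR: a single filter scores every image patch, a logistic function turns each score into a patch-level probability (the probability map), and a noisy-or rule combines the patch probabilities into one image-level probability. The noisy-or aggregation and all names (`conv2d_valid`, `cmilr_forward`, `kernel`, `bias`) are hypothetical choices for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image without padding.

    As in most CNN libraries, this is a cross-correlation (no kernel flip).
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += kernel[i][j] * image[r + i][c + j]
            row.append(s)
        out.append(row)
    return out


def cmilr_forward(image, kernel, bias=0.0):
    """Return (image-level probability, patch-level probability map)."""
    logits = conv2d_valid(image, kernel, bias)
    prob_map = [[sigmoid(z) for z in row] for row in logits]
    # Noisy-or (assumed aggregation): the image is positive
    # if at least one patch is positive.
    p_no_positive_patch = 1.0
    for row in prob_map:
        for p in row:
            p_no_positive_patch *= 1.0 - p
    return 1.0 - p_no_positive_patch, prob_map


def blank(n):
    return [[0.0] * n for _ in range(n)]


# The same bright 2x2 patch placed in two different corners of a 4x4 image.
top_left, bottom_right = blank(4), blank(4)
for i in range(2):
    for j in range(2):
        top_left[i][j] = 1.0
        bottom_right[2 + i][2 + j] = 1.0

kernel = [[1.0, 1.0], [1.0, 1.0]]  # made-up filter weights
p_a, map_a = cmilr_forward(top_left, kernel, bias=-3.0)
p_b, map_b = cmilr_forward(bottom_right, kernel, bias=-3.0)
```

In this toy case the image-level probability is the same wherever the patch sits, while the probability map shows where it was found. A linear classifier on raw pixels lacks that location invariance.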

  • Yuan-chin Chang (Academia Sinica), Zhanfeng Wang (University of Science and Technology of China) & Xinyu Zhang (University of Science and Technology of China)

Managing massive datasets is a major challenge in contemporary data analysis, particularly in epidemiology and medicine. This study introduces an approach that uses sequential ensemble learning to analyze such datasets. Our focus is on efficiency, considering both its statistical and computational aspects. We also address data communication and the protection of private information, issues that are likewise discussed in the federated learning literature in machine learning.