Abstract
Automated Radiology Report Generation (ARRG) is the task of automatically producing free-text descriptions of radiology images, such as chest X-ray (CXR) images. It involves creating a detailed, contextually appropriate summary of the content and clinical relevance of an image. Given its considerable potential to reduce the substantial workload of radiologists, ARRG has attracted growing interest from researchers internationally. Despite significant advances, ARRG still faces several challenges. The first is the generation of lengthy texts. Many existing methods adopt an encoder-decoder architecture similar to that used for natural-image captioning; however, unlike the concise descriptions typical of image captioning, ARRG must produce extensive paragraphs filled with intricate detail. The second challenge concerns the subtle distinctions among radiology images, which frequently exhibit a high degree of similarity, making it difficult for a model to identify their distinguishing features. The third challenge involves biases in the visual and textual data: training datasets frequently overrepresent normal samples, which skews the model's learning and limits its effectiveness in identifying abnormalities.

To address these challenges, this thesis introduces several methodologies aimed at improving the quality and accuracy of radiology report generation. First, we propose a multi-layer convolutional visual encoder designed to extract multi-level visual features from the CXR image(s). This encoder provides the text generator with more comprehensive information, thereby improving the quality of the generated reports.
Additionally, we explore a retrieval-based multimodal framework that incorporates the findings text of the radiology report associated with the training image most similar to the input CXR image(s), thereby enriching the semantic content of the generated reports. Furthermore, we develop a multimodal framework that integrates indication texts with CXR images to improve the accuracy of the generated reports. Finally, we extend our multimodal framework to leverage the multi-layer image encoder, conditional findings texts, and indication texts simultaneously, achieving superior performance in ARRG. Experimental results on two public datasets demonstrate the effectiveness of these approaches, showcasing their superior performance compared to multiple state-of-the-art medical report generation methods.
Date of Award | 2025
---|---
Original language | English
Awarding Institution |
Sponsors | China Scholarship Council
Supervisors | Stephen McKenna (Supervisor) & Vladimir Janjic (Supervisor)