In contrast to classical Machine Learning approaches, Deep Learning (DL) models scale exceptionally well with the amount of data available for training. This has led to significant advances across computer science and to the widespread adoption of Artificial Intelligence (AI) methods in many areas of life. While more efficient algorithms and greater computing power certainly contribute to these advances, large amounts of high-quality data are arguably the enabling factor for the current AI revolution.
However, large amounts of high-quality data are not always readily available. This can be the case for newly discovered diseases, where sample sizes are low. Data scarcity is a common issue in medical imaging for multiple reasons. Medical images are expensive to acquire and annotate, usually requiring medical professionals and time-consuming labeling efforts. Despite these issues, the medical imaging domain can benefit tremendously from the application of DL models, as AI can serve as a tool to assist clinical decision making and shorten time-to-diagnosis.
Therefore, this work proposes and evaluates three distinct methods to improve DL performance under scarce data conditions in medical imaging. The first method examines a quantity-quality trade-off by utilizing additional image annotations in a special knowledge distillation process to improve a classification model. The second method establishes class-specific image augmentations for better data rebalancing in skewed class distributions. The third method evaluates the generation of synthetic images as a form of synthetic data augmentation. It should be noted that the methods presented are not limited to medical images but can be applied to other vision-based use cases.