A Systematic Framework for Data Augmentation for Tropical Cyclone Intensity Estimation Using Deep Learning

Guido Ascenso, Giulio Palcic, Enrico Scoccimarro, Matteo Giuliani, Andrea Castelletti

Crossref (2024)

Abstract

Tropical cyclones (TCs) are among the costliest and deadliest natural disasters worldwide. The destructive potential of a TC is usually modelled as a power of its maximum sustained wind speed, making the estimation of the intensity of TCs (TCIE) an active area of research. Indeed, TCIE has improved steadily in recent years, especially as researchers moved from subjective methods based on hand-crafted features to methods based on deep learning, which are now solidly established as the state of the art. However, the datasets used for TCIE, which are typically collections of satellite images of TCs, often have two major issues: they are relatively small (usually ≤ 40,000 samples), and they are highly imbalanced, with orders of magnitude more samples for weak TCs than for intense ones. Together, these issues make it hard for deep learning models to estimate the intensity of the strongest TCs. To mitigate these issues, researchers often use a family of Computer Vision techniques known as “data augmentation”—transformations (e.g., rotations) applied to the images in the dataset that create similar, synthetic samples. The way these techniques have been used in TCIE studies has been largely unexamined and potentially problematic. For instance, some authors flip images horizontally to generate new samples, while others avoid doing so because it would cause images from the Northern Hemisphere to look like images from the Southern Hemisphere, which they argue would confuse the model. The effectiveness or potentially detrimental effects of this and other data augmentation techniques for TCIE have never been examined, as authors typically borrow their data augmentation strategies from established fields of Computer Vision. However, data augmentation techniques are highly sensitive to the task for which they are used and should be optimized accordingly. Furthermore, it remains unclear how to properly use data augmentation for TCIE to alleviate the imbalance of the datasets. 
In our work, we explore how best to perform data augmentation for TCIE using an off-the-shelf deep learning model, focusing on two objectives:
1. Determining how much augmentation is needed and how to distribute it across the various classes of TC intensity. To do so, we use a modified Gini coefficient to guide the amount of augmentation to be done. Specifically, we aim to augment the dataset more for more intense (and therefore less represented) TCs. Our goal is to obtain a dataset that, when binned according to the Saffir-Simpson scale, is as close to a uniform distribution as possible (i.e., all classes of intensity are equally represented).
2. Evaluating which augmentation techniques are best for deep learning-based TCIE. To achieve this, we use a simple feature selection algorithm called backward elimination, which leads us to an optimal set of data augmentations. Furthermore, we explore the optimal parameter space for each augmentation technique (e.g., by what angles images should be rotated).
Overall, our work provides the first in-depth analysis of the effects of data augmentation for deep learning-based TCIE, establishing a framework to use these techniques in a way that directly addresses highly imbalanced datasets.
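The two objectives can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the "modified" Gini coefficient, so the standard Gini coefficient of the class counts stands in for it. The tropical depression and tropical storm bins below Category 1, the function names, the augmentation names (`aug_a` to `aug_d`), and the toy error values are all assumptions made up for illustration, not results from the paper.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds (knots, 1-minute sustained wind) of each intensity class.
# The TD/TS bins below Category 1 are an assumption of this sketch.
BOUNDS_KT = [34, 64, 83, 96, 113, 137]
LABELS = ["TD", "TS", "Cat1", "Cat2", "Cat3", "Cat4", "Cat5"]


def intensity_class(wind_kt):
    """Map a wind speed in knots to its intensity class label."""
    return LABELS[bisect_right(BOUNDS_KT, wind_kt)]


def gini(counts):
    """Standard (unmodified) Gini coefficient; 0 means perfectly balanced."""
    values = sorted(counts)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * v for i, v in enumerate(values))
    return 2 * weighted / (n * total) - (n + 1) / n


def augmentation_plan(counts):
    """Synthetic samples to add per class so each matches the largest class."""
    target = max(counts.values())
    return {label: target - c for label, c in counts.items()}


def backward_elimination(augmentations, score):
    """Greedy backward elimination over a set of augmentation names.

    `score` is a caller-supplied function mapping a set of names to a
    validation error (lower is better). In practice it would train and
    evaluate a model, which is far more expensive than this toy version.
    """
    selected = set(augmentations)
    best = score(selected)
    while len(selected) > 1:
        # Try dropping each remaining augmentation in turn.
        trials = {a: score(selected - {a}) for a in sorted(selected)}
        candidate = min(trials, key=trials.get)
        if trials[candidate] >= best:
            break  # no single removal improves the error
        selected.remove(candidate)
        best = trials[candidate]
    return selected, best


# Objective 1: measure imbalance, then plan augmentation to flatten it.
winds = [25, 30, 30, 40, 45, 50, 70, 90, 140]  # made-up wind speeds
counts = Counter(intensity_class(w) for w in winds)
plan = augmentation_plan(counts)
balanced = {label: counts[label] + plan[label] for label in counts}
print("Gini before:", gini(counts.values()))
print("Gini after: ", gini(balanced.values()))

# Objective 2: toy error model with invented per-augmentation effects.
TOY_EFFECTS = {"aug_a": -2.0, "aug_b": 1.5, "aug_c": 0.5, "aug_d": -1.0}


def toy_error(subset):
    return 10.0 + sum(TOY_EFFECTS[a] for a in subset)


print(backward_elimination(list(TOY_EFFECTS), toy_error))
```

Equalizing every class to the size of the largest one is only one possible allocation rule. A real pipeline would likely cap the number of synthetic samples per class, since a heavily augmented rare class still contains little independent information.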
