Synthetic Data and AI - Teaching a Neural Network to Identify Clouds Despite the Lack of Annotated Observation Data

Ronald Scheirer, Aleksis Pirinen, Nosheen Abid, Nuria Agues Paszkowsky, Thomas Ohlson Timoudas, Chiara Ceccobello, György Kovács, Anders Persson

Crossref (2024)

Abstract
Clouds are characterized, among other things, by their strong variability in time, space, and optical thickness. These properties modulate solar radiation (through reflection, transmission, and absorption) and may distort the signal from the surface beneath. It is therefore important to detect even optically thin clouds with remote sensing methods, even when the primary focus is on Earth observation. This study was initiated by the Swedish Forest Agency (SFA). To reduce the proliferation of bark beetles, the SFA needs to identify stressed trees at an early stage. To this end, high-resolution scenes from the Multi-Spectral Imager (MSI) on board the Sentinel-2 platforms were analyzed. Unfortunately, the quality of ESA's scene classification layer (SCL) does not meet the requirements for reliably sorting out scenes contaminated with thin clouds. To overcome this problem, machine learning (ML) was adopted, since its integration into the remote sensing domain has significantly improved performance on remote sensing tasks. A common difficulty, however, is that ML methods typically depend on large amounts of annotated training data. Annotation or classification is usually done manually or with a more capable instrument (e.g., an active LIDAR). Since no such data basis was available, a synthetic database (based on simulations instead of observations) was generated to train a multilayer perceptron (MLP). The dataset consists of 200,000 data points, simulated by taking into consideration different cloud types, cloud optical thicknesses (COT), cloud geometrical thicknesses, cloud heights, as well as ground surfaces and atmospheric profiles. The MLP is trained to predict COT as a proxy for the cloud/clear decision. The performance of the proposed algorithm on both synthetic data (as used during training) and real satellite observations (never presented to the algorithm before) will be discussed in detail.
It was found that the MLP trained on 1D synthetic data transitions seamlessly to real datasets without requiring additional training. Furthermore, it outperforms the ESA SCL.
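The general idea of training a regressor purely on simulated samples and then thresholding its predicted COT to obtain a cloud/clear mask can be sketched as follows. This is a minimal illustration under loudly stated assumptions, not the authors' implementation. The forward model `toy_reflectance` is a hypothetical stand-in for the 1D radiative-transfer simulations (it has no atmosphere, cloud height, or geometric thickness). The two "bands", the surface-reflectance ranges, the 40 % share of clear-sky samples, the network size, and `COT_THRESHOLD = 1.0` are all illustrative choices. The MLP is a hand-written one-hidden-layer NumPy network trained with Adam.

```python
import numpy as np


def toy_reflectance(cot, surface, rng, noise=0.005):
    """Hypothetical toy forward model, NOT a radiative-transfer code.

    Cloud albedo is assumed to saturate as cot / (cot + 7.7) and is laid
    over the surface reflectance; Gaussian noise mimics sensor noise.
    """
    cloud_albedo = cot / (cot + 7.7)
    return surface + (1.0 - surface) * cloud_albedo + rng.normal(0.0, noise, cot.shape)


def make_synthetic_set(n, rng):
    """Simulate n samples: two band reflectances (features) -> COT (label)."""
    clear = rng.random(n) < 0.4                       # assumed share of clear-sky cases
    cot = np.where(clear, 0.0, rng.uniform(0.0, 10.0, n))
    s_vis = rng.uniform(0.02, 0.10, n)                # assumed surface reflectance ranges
    s_nir = rng.uniform(0.15, 0.40, n)
    x = np.column_stack([toy_reflectance(cot, s_vis, rng),
                         toy_reflectance(cot, s_nir, rng)])
    return x, cot


def train_mlp(x, y, hidden=32, steps=2000, lr=0.01, seed=0):
    """One-hidden-layer tanh MLP regressor, full-batch Adam on mean squared error."""
    rng = np.random.default_rng(seed)
    x_mu, x_sd = x.mean(axis=0), x.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    xs, ys = (x - x_mu) / x_sd, (y - y_mu) / y_sd     # standardize inputs and target
    params = [rng.normal(0.0, 1.0 / np.sqrt(x.shape[1]), (x.shape[1], hidden)),
              np.zeros(hidden),
              rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden),
              np.zeros(1)]
    m = [np.zeros_like(p) for p in params]
    v = [np.zeros_like(p) for p in params]
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    n = len(ys)
    for t in range(1, steps + 1):
        w1, b1, w2, b2 = params
        h = np.tanh(xs @ w1 + b1)                     # hidden activations, (n, hidden)
        err = (h @ w2 + b2) - ys                      # residuals, (n,)
        dz = np.outer(err, w2) * (1.0 - h ** 2)       # backprop through tanh
        grads = [xs.T @ dz / n, dz.mean(axis=0), h.T @ err / n, np.array([err.mean()])]
        for p, g, m_i, v_i in zip(params, grads, m, v):
            m_i[...] = beta1 * m_i + (1 - beta1) * g  # Adam first moment
            v_i[...] = beta2 * v_i + (1 - beta2) * g ** 2  # Adam second moment
            m_hat = m_i / (1 - beta1 ** t)
            v_hat = v_i / (1 - beta2 ** t)
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)  # in-place parameter update

    def predict(x_new):
        w1, b1, w2, b2 = params
        h = np.tanh(((x_new - x_mu) / x_sd) @ w1 + b1)
        return (h @ w2 + b2) * y_sd + y_mu            # back to COT units
    return predict


rng = np.random.default_rng(42)
x_train, cot_train = make_synthetic_set(4000, rng)
x_test, cot_test = make_synthetic_set(1000, rng)

predict_cot = train_mlp(x_train, cot_train)
cot_pred = predict_cot(x_test)

COT_THRESHOLD = 1.0                                   # hypothetical cloud/clear cut-off
cloudy_pred = cot_pred > COT_THRESHOLD
cloudy_true = cot_test > COT_THRESHOLD
accuracy = np.mean(cloudy_pred == cloudy_true)
rmse = np.sqrt(np.mean((cot_pred - cot_test) ** 2))
baseline_rmse = cot_test.std()                        # RMSE of always predicting the mean
print(f"test RMSE {rmse:.2f} (mean-only baseline {baseline_rmse:.2f}), "
      f"cloud/clear accuracy {accuracy:.3f}")
```

One appeal of regressing a continuous COT instead of a binary label, as the abstract describes, is that the cloud/clear cut-off can be moved after training (for instance lowered to flag thinner clouds) without retraining the network.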