Synthetic samples generator (SYSGEN), an approach to increase the size of incidence samples in coffee leaf rust modelling

Edwar Javier Girón, David Camilo Corrales, María Paz Sesmero, Jose Antonio Iglesias, Juan Carlos Corrales

Evolving Systems (2021)

Abstract

Rust is considered a major problem for coffee farmers. Several rust outbreaks have occurred in Latin American countries such as Colombia, Mexico, Peru, Ecuador and El Salvador. Because of the damage caused by coffee rust, several regression models have been proposed to estimate rust incidence from weather variables. However, these models lack real rust samples because collecting samples requires considerable money and time. To address this issue, we propose in this paper a mechanism called SYnthetic Samples GENerator (SYSGEN). The proposal uses cubic spline interpolation to increase the size of rust incidence samples (RIS) and expert knowledge to adjust the rust progress curve in Colombian coffee crops. To demonstrate the reliability of SYSGEN, we built 132 regression models from synthetic incidence samples (dependent variable) and weather observations (independent variables). To do this, we considered three Colombian coffee regions, five experiments and four regression models. In addition, we used Recursive Feature Elimination (RFE) to select the relevant weather variables. The analysis of these models and of RFE is promising, since it reveals several aspects and effects related to rust development. One of these aspects is that the regression models frequently used temperature (maximum, minimum and average) and relative humidity variables. These meteorological variables are considered by experts to be key drivers of the germination, penetration, colonization and sporulation phases. In terms of performance, our experiments allow us to conclude that random forest (RF) and bagging trees (BT) reached the lowest Root Mean Square Error (RMSE). Finally, it is important to note that different datasets produce different performance. For example, in the experiments that involve flowering-period datasets, the lowest RMSE was reached by RF. However, in datasets of coffee harvest periods, BT reached the lowest RMSE.
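
The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' SYSGEN implementation: the natural boundary condition, the function names (natural_cubic_spline, evaluate, densify), the clipping of incidence to the 0-100 % range and the monthly readings are all assumptions made for illustration, and the expert-knowledge adjustment of the rust progress curve is not modelled.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return coefficient lists (b, c, d) of a natural cubic spline.

    On each interval j the spline is
    S_j(x) = ys[j] + b[j]*dx + c[j]*dx**2 + d[j]*dx**3, with dx = x - xs[j].
    "Natural" means the second derivative is zero at both ends (an assumption;
    the abstract does not state which boundary condition SYSGEN uses).
    """
    n = len(xs) - 1
    if n < 1 or len(ys) != len(xs):
        raise ValueError("need at least two points and matching lengths")
    if any(xs[i + 1] <= xs[i] for i in range(n)):
        raise ValueError("xs must be strictly increasing")

    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 / h[i] * (ys[i + 1] - ys[i])
                    - 3.0 / h[i - 1] * (ys[i] - ys[i - 1]))

    # Forward sweep of the tridiagonal system for the c coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]

    # Back substitution.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return b, c, d


def evaluate(xs, ys, coeffs, x):
    """Evaluate the spline at x, which should lie within [xs[0], xs[-1]]."""
    b, c, d = coeffs
    j = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    dx = x - xs[j]
    return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3


def densify(days, incidence, step=1):
    """Turn sparse (day, incidence %) readings into samples every `step` days."""
    coeffs = natural_cubic_spline(days, incidence)
    samples = []
    day = days[0]
    while day <= days[-1]:
        value = evaluate(days, incidence, coeffs, day)
        # Assumption: incidence is a percentage, so clip spline overshoot.
        samples.append((day, min(max(value, 0.0), 100.0)))
        day += step
    return samples


# Hypothetical monthly incidence readings (day of season, % of infected leaves).
days = [0, 30, 60, 90, 120]
incidence = [2.0, 5.5, 14.0, 27.0, 31.5]
daily = densify(days, incidence)
print(len(daily), "synthetic samples from", len(days), "real readings")
```

In a pipeline like the one the abstract describes, the densified incidence values would serve as the dependent variable and be paired with weather observations for the same days before any regression model is fitted.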

Keywords

Expert knowledge, Interpolation, Regression models, Feature selection
