Demšar, D.²; Džeroski, S.²; Debeljak, M.²; Krogh, P. H.⁴
Tochtermann, K.²; Scharl, A.²
1 Department of Terrestrial Ecology, National Environmental Research Institute, Aarhus University
2 unknown
3 Department of Bioscience - Soil Fauna Ecology and Ecotoxicology, Department of Bioscience, Science and Technology, Aarhus University
4 Department of Bioscience - Soil Fauna Ecology and Ecotoxicology, Department of Bioscience, Science and Technology, Aarhus University
Increasing amounts of environmental data are being collected. With environmental data, we often have to predict several target variables of a similar type, such as the biomasses of different species. This situation is usually handled by computing an aggregate target variable (such as total biomass or a biodiversity measure) and then predicting the aggregate variable. Another possible (but rarely taken) approach is to model all target variables and then calculate the aggregate variable from the model outputs. In this paper, we try to answer the question of whether the simpler approach of producing one model for the aggregate target variable is worse than the more complex approach of producing multiple models and then calculating the aggregate variable from the model outputs. We do this using a dataset with agricultural events and soil biological parameters as independent variables and a set of microarthropod species biomasses as dependent variables. We calculated several aggregate target variables, such as total biomass, Shannon biodiversity and species richness, from the original data. We built models to predict these directly, and also built separate predictive models for the biomass of each microarthropod species and calculated the aggregate target variables from the outputs of these models. We compared the aggregate variables calculated from the measured data, the aggregate variables predicted directly and the aggregate variables calculated from the outputs of the models for individual species, using the Pearson correlation coefficient and two additional error measures. Our results show that in most cases first calculating the aggregate variables and then learning models to predict them directly yields better results than modeling individual species and then calculating the aggregate variables from the predictions of these models.
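The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy numbers and the choice to compute the Shannon index from biomass proportions are all assumptions made for illustration, and the two additional error measures are not shown because the abstract does not name them.

```python
import math

def total_biomass(biomasses):
    """Sum of the per-species biomasses in one sample."""
    return sum(biomasses)

def species_richness(biomasses):
    """Number of species with non-zero biomass in one sample."""
    return sum(1 for b in biomasses if b > 0)

def shannon_index(biomasses):
    """Shannon index H = -sum(p_i * ln p_i).

    Assumption: p_i is the share of species i in the total biomass;
    the abstract does not say how the proportions were defined.
    """
    total = sum(biomasses)
    if total <= 0:
        return 0.0
    return -sum((b / total) * math.log(b / total) for b in biomasses if b > 0)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one sequence is constant
    return cov / (sx * sy)

# Toy data (hypothetical): rows are samples, columns are species biomasses.
measured = [[1.0, 0.0, 3.0], [2.0, 2.0, 0.5], [0.5, 1.5, 1.0], [0.0, 0.0, 5.0]]
# Hypothetical outputs of one model per species.
per_species_pred = [[0.8, 0.3, 2.5], [1.7, 1.9, 0.4], [0.9, 1.2, 1.8], [0.2, 0.1, 3.5]]
# Hypothetical outputs of a single model trained on total biomass.
direct_total_pred = [4.2, 4.3, 3.4, 4.6]

true_total = [total_biomass(row) for row in measured]
indirect_total = [total_biomass(row) for row in per_species_pred]

# Direct approach: predict the aggregate. Indirect: aggregate the predictions.
r_direct = pearson(true_total, direct_total_pred)
r_indirect = pearson(true_total, indirect_total)
print(f"direct r = {r_direct:.3f}, indirect r = {r_indirect:.3f}")
```

The same pattern applies to the other aggregates by replacing total_biomass with shannon_index or species_richness when building the measured and indirect series.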
Managing Environmental Knowledge: EnviroInfo 2006: Proceedings of the 20th International Conference on Informatics for Environmental Protection, Graz, Austria, 2006. Aachen: Shaker Verlag, p. 295-302