Abstract:
Machine learning (ML) applications in weather and climate are gaining momentum as big data and the
immense increase in high-performance computing (HPC) power pave the way. Ensuring FAIR data and
reproducible ML practices is a significant challenge for Earth system researchers. Even though the FAIR
principles are well known to many scientists, research communities are slow to adopt them. The Canonical
Workflow Framework for Research (CWFR) provides a platform to ensure the FAIRness and reproducibility
of these practices without overwhelming researchers. This conceptual paper envisions a holistic CWFR
approach towards ML applications in weather and climate, focusing on HPC and big data. Specifically, we
discuss the FAIR Digital Object (FDO) and the Research Object (RO) in the DeepRain project to achieve granular
reproducibility. DeepRain is a project that aims to improve precipitation forecasts in Germany by using ML.
Our concept envisages a raster datacube that provides data harmonization and fast, scalable data access.
We suggest the Jupyter notebook as the unit of a single reproducible experiment. In addition, we envision JupyterHub
as a scalable and distributed central platform that connects all these elements and the HPC resources to the
researchers via an easy-to-use graphical interface.
-
Journal:
DATA INTELLIGENCE
-
Subject:
Computer Science >> Integration Theory of Computer Science
-
Cite as:
ChinaXiv:202211.00440
(or this version: ChinaXiv:202211.00440V1)
DOI: 10.1162/dint_a_00131
CSTR:32003.36.ChinaXiv.202211.00440.V1
-
Recommended references:
Amirpasha Mozaffari, Michael Langguth, Bing Gong, Jessica Ahring, Adrian Rojas Campos, Pascal Nieters, Otoniel José Campos Escobar, Martin Wittenbrink, Peter Baumann, Martin G. Schultz. (2022). HPC-oriented Canonical Workflows for Machine Learning Applications in Climate and Weather Prediction. DATA INTELLIGENCE. doi: 10.1162/dint_a_00131