Abstract
Data-rich fields such as the Earth sciences benefit from transparent, structured documentation. Drawing on best practices in software engineering, we adapt for the Earth sciences a questionnaire format known as a datasheet, which guides the author in documenting both the biases and the technical details of a dataset. Datasheets complement existing documentation standards by eliciting information about technical aspects and biases together within the scope of a project. This combination is not easily obtained elsewhere; it provides transparency, aids reproducibility, and informs subsequent uses of the data. It is useful across research applications and vital for data-driven methods such as machine learning, which are strongly influenced by biases in training data. Datasheets synthesize information uniquely known to the creator of a dataset and provide easy and equitable access to information otherwise restricted to community networks. We adapted the datasheet format for the Earth sciences using our own knowledge and further tailored it through multiple years of community feedback. We address common concerns that arose during this feedback process, such as the time commitment needed for completion and the distinction between dataset creation and dataset usage. We also contrast our format with other well-known dataset documentation efforts.
© 2025 American Meteorological Society. This is an Author Accepted Manuscript distributed under the terms of the default AMS reuse license. For information regarding reuse and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).
These authors contributed equally to this work.
Suggested citation: Connolly, C. J., Hueholt, D. M., & Burt, M. A. (2025). Datasheets for Earth Science Datasets.