CAFE Dataverse Collection README Template
All datasets shared through the Climate CAFE Dataverse Collection should include a README file. When you submit a dataset for review before publication in the CAFE Collection, the reviewer will verify that the questions below are answered in either the attached README file or the metadata. The questions are a starting point for what to include. A question does not need to be answered in the README if the metadata already covers it.
Bolded questions have no designated metadata field, so their answers can only be provided in the README file.
1. Briefly describe the dataset for which this README is attached.
Keep this brief. The longer, more in-depth description belongs in the ‘Description Text’ metadata field.
2. Describe how all relevant files in this dataset are connected and whether they connect to any files in another dataset.
Include which code files are used to generate the computation-ready datasets.
3. Have you uploaded a data dictionary?
Include a data dictionary in the README or upload it separately. The data dictionary should describe every variable in your data upload: what each field name means and, where applicable, its units.
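As an illustration of what a data dictionary can look like, the sketch below builds a three-column table (field name, description, units) from a list of column names and renders it as CSV. This is only one possible layout, not a format required by the CAFE Collection. The function names and the example fields (`county_fips`, `date`, `tmax`) are hypothetical and chosen purely for illustration.

```python
import csv
import io

COLUMNS = ["field_name", "description", "units"]


def data_dictionary_skeleton(header, descriptions=None, units=None):
    """Return one dictionary row per field name.

    Fields without a supplied description get a TODO placeholder,
    and fields without supplied units are marked "n/a".
    """
    descriptions = descriptions or {}
    units = units or {}
    return [
        {
            "field_name": name,
            "description": descriptions.get(name, "TODO: describe this field"),
            "units": units.get(name, "n/a"),
        }
        for name in header
    ]


def to_csv(rows):
    """Render the data dictionary rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical field names, for illustration only.
example_header = ["county_fips", "date", "tmax"]
rows = data_dictionary_skeleton(
    example_header,
    descriptions={"tmax": "Daily maximum air temperature"},
    units={"tmax": "degrees Celsius"},
)
print(to_csv(rows))
```

The TODO placeholders make it easy to see which fields still need a description before the dataset is submitted for review.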
4. When relevant, answer each of the following questions about the motivation behind the dataset:
- What is the motivation behind the creation of this dataset? This information can be provided through the ‘Description’ metadata field.
- Who funded the creation of this dataset? This information can be provided through the ‘Funder Information’ metadata field.
- What groups/people were involved in the collection/generation/processing of this data? This information can be provided through the ‘Author’ and ‘Contributor’ metadata fields.
6. When relevant, answer each of the following questions about the composition of the dataset or the source data:
- Provide a general description of the data uploaded. If the upload contains multiple datasets, describe each one. This information can be provided through the ‘Description’ metadata field.
- What are the relationships between the files in this dataset? This information can be provided through the ‘Description’ metadata field.
- Is this dataset a sample or a complete representation of the possible observations for the noted spatial extent? If source data is used, specify whether it is a sample that is extrapolated to give a complete representation.
- Are there any errors, sources of noise, or redundancies in this dataset? Provide an overview of known limitations or measures of uncertainty for your uploaded data or the source from which it was derived.
- Is the dataset self-contained, or does it rely on external sources to process or access it?
- If external data sources were used, what are they? Are there citations or disclaimers that should be included? Source data information should be provided through the metadata fields in the ‘Metadata about Data Sources’ metadata block.
- Has the dataset been de-sensitized (i.e., has potentially identifiable or sensitive information been removed prior to sharing)? If so, describe what was done to de-sensitize the dataset.
8. When relevant, answer each of the following questions about the collection/analysis process for this dataset:
- What software/mechanisms/instruments were used to collect the data? If the creation of this dataset involved any software, the name and version numbers should be entered in the ‘Software’ metadata fields.
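If the dataset was produced with Python, the names and version numbers of the software involved can be collected programmatically before entering them in the metadata. The sketch below is a minimal example that assumes a Python environment; the helper name `software_versions` and the package names passed to it are placeholders, not a list of required software.

```python
import platform
from importlib import metadata


def software_versions(package_names):
    """Return a mapping of software name to installed version string.

    Packages that are not installed in the current environment are
    reported as "not installed" instead of raising an error.
    """
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Placeholder package names; replace with the packages your workflow uses.
report = software_versions(["numpy", "pandas"])
for name, version in report.items():
    print(f"{name}: {version}")
```

The printed lines can be copied into the ‘Software’ metadata fields or into the README.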
9. When relevant, answer each of the following questions about the preprocessing, labeling, or cleaning of the dataset:
- What preprocessing, labeling, or cleaning was done to the dataset to develop the final product?
- Is the source data included in this dataset or otherwise preserved and accessible? Source data information should be provided through the metadata fields in the ‘Metadata about Data Sources’ metadata block.