Metadata Requirements for Extracted Data Contributions
This page provides instructions for filling out metadata fields when uploading a dataset to the CAFE Dataverse Extracted Data Contributions subcollection.
Uploading to Dataverse
Please use this walkthrough page for information about how to upload to the CAFE Collection on Dataverse. A video tutorial is also available: How to Upload to Dataverse - Tutorial
Metadata Entry
For datasets that are being reposted without modifications from their original form, please add metadata in line with the guidance below:
All datasets will be located in a subcollection within the CAFE collection named “Extracted Data Contributions”. Visit the CAFE Collection, open this subcollection, then select “Add Data” and “New Dataset” when you are ready to record metadata and upload files.
In the Citation Metadata - Title field, you will see the prepopulated text “Extracted Data from: ”. Please keep this language and add the source dataset title after it.
If you made subsets of a source dataset (e.g., selecting a subset of variables or a specific geographic area), note these in the title and description.
Under Author, enter the name of the source dataset creator (e.g., agency or department).
The Point of Contact field will be pre-populated with CAFE and climatecafe@bu.edu. Please leave this field as is.
In the Description, add a quoted description (if available from the source data) where noted in the metadata template. If no description is available, or if further guidance is required given the complexity or type of the data, add your own description text. Record the contact information for the source dataset where indicated.
Include the Subject and Keywords that best represent the dataset.
Note, you can ignore the language about the CCH terms in this section. These will be added separately. Add Keywords you think are relevant, using a controlled vocabulary if applicable.
Include your name in the Depositor field.
In the Notes section include the dataset name, your name, and the download date where listed.
For the Time Period, use the best available estimate of the start and end dates to which the data apply. If there are multiple component datasets with separate time spans, use the Plus button to the right to add separate entries. The time period will not always be applicable; try to include as much information as possible. If no details are available, use the date of the last update to the dataset or 9999-99-99.
If you used any coding or programming to query the data, or if there are relevant scripts to note for using the data, list these in the Computational Workflow Metadata.
In the Metadata About Data Sources section, leave the derivation question marked as yes.
Include all information in the Metadata About Data Sources that is available, including version numbers, institutions, DOI/URL, Date Obtained, Attribution, and Disclaimer.
Thoroughly review any relevant licensing to ensure that extracting and posting the data is permitted.
The Metadata About Geospatial Files section applies to vector (polygon, point, and line) and raster datasets. If you are unfamiliar with geographical data systems, contact a CAFE team member or colleague. Information about the spatial reference may be included in the metadata, or may need to be accessed by reading the data into a geospatial processing tool.
At the very bottom, select CCH Terms from the dropdown menu relevant to the source dataset.
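For contributors who prepare metadata in a script before typing it into the form, the Title and Time Period guidance above can be sketched in a few lines of Python. This is only an illustration, not part of the CAFE workflow: the function names `build_title` and `time_period`, the parenthesized subset note, and the handling of a partially known time period are all assumptions made for the example.

```python
from datetime import date
from typing import Optional, Tuple

TITLE_PREFIX = "Extracted Data from: "  # prepopulated text in the Title field
PLACEHOLDER_DATE = "9999-99-99"         # fallback value named in the guidance


def build_title(source_title: str, subset_note: Optional[str] = None) -> str:
    """Keep the prepopulated prefix and append the source dataset title."""
    title = TITLE_PREFIX + source_title.strip()
    if subset_note:
        # Assumed convention: subset details go in parentheses after the title.
        title += f" ({subset_note.strip()})"
    return title


def time_period(
    start: Optional[date] = None,
    end: Optional[date] = None,
    last_updated: Optional[date] = None,
) -> Tuple[str, str]:
    """Return (start, end) strings, falling back as the guidance describes."""
    # Prefer the date of last update; use the placeholder only if that is unknown too.
    fallback = last_updated.isoformat() if last_updated else PLACEHOLDER_DATE
    # Assumption: a missing start or end date is filled with the same fallback.
    return (
        start.isoformat() if start else fallback,
        end.isoformat() if end else fallback,
    )


print(build_title("Example Source Dataset", "selected variables only"))
print(time_period(last_updated=date(2023, 5, 4)))
```

The strings it produces still need to be entered into the Dataverse form by hand; the sketch only keeps the prefix and fallback rules consistent across several datasets.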
Uploading Data and Metadata
When you are uploading files, maintain the structure and include relevant metadata.
If data is downloaded with descriptive file naming, maintain the naming for upload to Dataverse.
When you upload the dataset to Dataverse, include with it any metadata files (PDF documents, TXT files, etc.). To keep a record of the data source and the URL from which the data was downloaded, both print the source data web page to a PDF and save it as an HTML file, and include these as added metadata.
For uploaded metadata documents and web page PDFs, add a tag to the file indicating “Documentation”.
You can make this adjustment by selecting the checkmark to the left of the files you will tag, then selecting “Edit Data” and “Tags”.
We would rather capture more information than we need than too little. Please also print to PDF or save as HTML any other metadata documents, terms and conditions, frequently asked questions, or additional sites that contain details about the dataset being uploaded. Please be thorough in reviewing the available metadata and include as much as possible.
The upload process for Dataverse can crash with very large files or with many small files. If you are uploading many small files, consider using a .zip file to make the upload easier and the data more easily accessible. If your Dataverse session appears to be loading for longer than expected, try to preserve metadata in a separate file and reload your browser. You may need to refill metadata and upload your data again. Reach out to a CAFE team member if issues persist.
Please Note: You will not see the full list of data description fields when first posting your data. After you have uploaded and saved your data initially, please click the Edit Dataset button to add more metadata. After selecting Edit Dataset and Metadata, scroll through to see if there are any additional fields you can complete with the available metadata. Not all fields are necessary, but if any appear relevant, add your content.
If you find Terms and Conditions relevant to your dataset, please download the PDF and upload as Data, and also copy the text as is to the Documentation and Access to Sources field in the Citation Metadata. Preface with “Terms and conditions were captured directly from SOURCE and are pasted below:”
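The advice above about bundling many small files into a .zip file can be followed with any archive tool. As one possible approach, the minimal Python sketch below uses the standard `zipfile` module and keeps the original file names and folder structure inside the archive. The helper name `zip_folder` and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_folder(folder: Path, archive: Path) -> int:
    """Bundle every file under `folder` into `archive`; return the file count."""
    folder = Path(folder)
    archive = Path(archive)
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in `folder`.
            if path.is_file() and path.resolve() != archive.resolve():
                # Store paths relative to `folder` so names and subfolders are kept.
                zf.write(path, arcname=path.relative_to(folder).as_posix())
                count += 1
    return count


# Example usage (hypothetical paths):
# n = zip_folder(Path("downloaded_data"), Path("downloaded_data.zip"))
# print(f"Archived {n} files")
```

Checking the returned count against the number of files you downloaded is a quick way to confirm nothing was left out before uploading.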
Submission for Review
Don’t forget to submit your dataset for review once you have uploaded the data and completed all the metadata fields you are able to complete! The CAFE team will ensure all fields are complete. Expect a 1 to 2 week turnaround; if there are questions or issues, a team member will reach out to you.