The FAANG Data Sharing Statement

Version 1.0
(Replaced by updated version 2.0)
This document describes the principles of data sharing held by the FAANG consortium. This document is subject to approval by the FAANG Steering Committee. Any queries about this document should be sent to faang@iastate.edu.

Definitions

Archive means one of the archives hosted at the EBI, NCBI or DDBJ. These include the ENA, GenBank, ArrayExpress and GEO. A full list of the FAANG recommended archives is available as part of the FAANG metadata recommendations.

Submission means data and metadata submission to one of the FAANG recommended Archives.

FAANG member means an individual who has signed up to the FAANG consortium through the FAANG website and agreed to the FAANG core principles.

Data means any assay or metadata generated for or associated with FAANG experiments.

Analysis means any computational process where raw assay data is aligned, transformed or combined to produce a new product.

Internal means data that is only accessible via the FAANG private shared storage.

Private shared storage means a storage space hosted at EMBL-EBI that offers password-protected access via FTP, Aspera and Globus GridFTP.

Public means all data available through the FAANG public FTP site, which requires no password and is accessible to everyone.

FAANG recognizes that quickly sharing the data generated by the consortium with the wider community is a priority. Rapid data sharing before publication ensures that everyone can benefit from the data created by FAANG and can take advantage of improved understanding of the functional elements in these animal genomes to aid their own research.

All raw data produced for a FAANG-associated project will be submitted to the archives without a hold-until-publication date. The data will therefore be publicly available immediately after successful archive submission and useful to the community as soon as possible.

The FAANG analysis group will turn the raw data into primary and integrated analysis results. Primary analysis results consist of sample-level analyses, such as alignment to a reference genome or quantification of signal in the assay. Integrated analysis results are analyses that draw together data from multiple samples and/or experiments, such as genome segmentation or differential analysis results.

The majority of these analysis results will not be archived before publication, but FAANG recognizes the need to share them both within the consortium and with the community. Initially, all files that are not archived will be shared between FAANG members in private shared storage hosted at EMBL-EBI. Any individual who signs up to FAANG and agrees to the Toronto principles [1] will be allowed access to this storage. The private shared storage will contain metadata files that make the credit for each dataset as clear as possible.

FAANG expects to make multiple releases each year. A data release will involve declaring a data freeze and copying all files associated with that data freeze from the private shared storage to the public FTP site. Initially, these data freezes will contain the primary analysis results. As FAANG's analyses progress, the data freezes will be expanded to include integrated analysis results too. The data freeze process will be coordinated by the FAANG Data Coordination Centre in consultation with FAANG members. FAANG will also aim to release all data associated with a paper before publication, even if this falls outside the standard freeze cycle. The public data will be available to the whole community.

All FAANG public data is released under the Fort Lauderdale principles [2]. The FAANG website, data portal and FTP site will all carry clear data reuse statements.

For internal FAANG data, if one FAANG member wishes to publish using data generated by another FAANG member, they should first contact the data generator and clarify the data generator's publication strategy. Collaboration is for everyone's benefit and is strongly encouraged. The FAANG Steering Committee commits to reporting to journal editors and to the laboratories involved any event that disregards the rights of data creators (covering both biological measurements and analyses of those measurements).

FAANG members can and will continue to carry out experimental and analysis work outside FAANG; data generated by that work is not required to meet the same data sharing expectations.

Only FAANG data can be distributed through the private storage and public FTP site.

REFERENCES:

  1. Toronto International Data Release Workshop: Rapid release of prepublication data has served the field of genomics well. Attendees at a workshop in Toronto recommend extending the practice to other biological data sets.
  2. Fort Lauderdale principles: Reaffirmation and Extension of NHGRI Rapid Data Release Policies: Large-scale Sequencing and Other Community Resource Projects.
Approved by the FAANG Steering Committee on May 26, 2015.
* Refer to FAANG Questions and Answers for questions related to FAANG data sharing practices.