Root Data Commons

Repository for general public data sets of scientific interest, hosted on the OSDC.
The Root Data Commons features a variety of social science, biology, genomics, and general-purpose data of interest to the research community. The OSDC hosts ~1PB of public data across a wide range of disciplines. These data sets are freely available and can be downloaded over the internet or over high-performance networks for local analysis. Download instructions are available at the link for each individual dataset below. All recipients of OSDC resource allocations can also compute directly over the data in the Public Data Commons without downloading it locally. Datasets hosted in the OSDC Public Data Commons are reviewed periodically as part of our resource allocation process. If you have suggestions about data that should be included, please let us know at info@occ-data.org.
Human sequence data from populations around the world with the goal of cataloging human genetic variation.
Total Size: 396.7TB
Identifiers:
  • ark:/31807/osdc-4a3ec448
Keywords: biology, genomics
Last Modified: 2013-06-04 15:30:00 UTC
Data set from the City of Chicago Data Portal in JSON format for tabular data and the raw files for "blob" data.
Total Size: 9.5GB
Identifiers:
  • ark:/31807/osdc-eb865c84
Keywords: social science
Last Modified: 2012-10-25 14:03:18 UTC
Whole human genome sequence data sets provided by Complete Genomics, containing 69 standard, non-diseased samples as well as two matched tumor and normal sample pairs.
Total Size: 50.4TB
Identifiers:
  • ark:/31807/osdc-919d4bed
Keywords: biology, genomics
Last Modified: 2013-06-04 15:30:00 UTC
Unified Data Resource for 3-Dimensional Electron Microscopy.
Total Size: 122.1GB
Identifiers:
  • ark:/31807/osdc-9d410a22
Keywords: biology
Last Modified: 2013-06-18 11:17:00 UTC
Data sets based on the original Enron emails released to the public by the Federal Energy Regulatory Commission as part of its investigation.
Total Size: 154.1GB
Identifiers:
  • ark:/31807/osdc-5597413b
Keywords: text data, social science
Last Modified: 2012-08-20 12:51:00 UTC
FlyBase is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species.
Total Size: 661.0GB
Identifiers:
  • ark:/31807/osdc-f222e3c5
Keywords: biology, genomics
Last Modified: 2013-04-23 17:54:00 UTC
The GSS contains a standard "core" of demographic, behavioral, and attitudinal questions, plus topics of special interest.
Total Size: 202.1MB
Identifiers:
  • ark:/31807/osdc-64c4b1f3
Keywords: social science
Last Modified: 2013-04-24 16:39:00 UTC
N-gram data obtained from over 5 million books digitized by Google. Contains all n-grams that appeared in over 40 books.
Total Size: 863.4GB
Identifiers:
  • ark:/31807/osdc-6a9633ac
Keywords: text data, social science
Last Modified: 2012-08-07 19:01:22 UTC
Model reduction dataset: Heat transfer in random media.
Total Size: 4.0TB
Identifiers:
  • ark:/31807/osdc-cf45683a
Keywords: model reduction
Last Modified: 2013-08-29 09:59:56 UTC
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.
Total Size: 199.5GB
Identifiers:
  • ark:/31807/osdc-c1c763e4
Keywords: social science, music
Last Modified: 2013-12-11 13:47:45 UTC
Encyclopedia of genomic functional elements in the model organisms C. elegans and D. melanogaster.
Total Size: 10.9TB
Identifiers:
  • ark:/31807/osdc-381ed653
Keywords: biology, genomics
Last Modified: 2013-07-28 01:35:52 UTC
All datasets from the NCBI FTP site except 1000genomes, pub, and sra.
Total Size: 10.8TB
Identifiers:
  • ark:/31807/osdc-f16c2fa3
Keywords: biology, genomics
Last Modified: 2013-06-18 11:25:15 UTC
The text of over 42,000 free ebooks.
Total Size: 742.1GB
Identifiers:
  • ark:/31807/osdc-5d5dd1a7
Keywords: text data, social science
Last Modified: 2013-12-18 13:33:41 UTC
The PDB contains 3D structural information on biological macromolecules.
Total Size: 243.4GB
Identifiers:
  • ark:/31807/osdc-bf242fd3
Keywords: biology
Last Modified: 2013-06-04 15:30:00 UTC
Data from the decennial United States Census as well as the Economic Census and the American Community Survey.
Total Size: 1.8TB
Identifiers:
  • ark:/31807/osdc-b7b76e53
Keywords: social science
Last Modified: 2013-12-09 15:54:18 UTC
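
The "Total Size" fields in the listing above can be tallied with a short script. The following is a minimal sketch, not an OSDC tool: the function names `parse_total_size` and `sum_sizes` are hypothetical, and it assumes decimal units (1 TB = 1000 GB), which the listing itself does not specify.

```python
import re

# Assumption: decimal (SI) units; the listing does not state which convention it uses.
UNIT_BYTES = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

# Matches lines such as "Total Size: 396.7TB" as they appear in the listing.
SIZE_RE = re.compile(r"Total Size:\s*([0-9]+(?:\.[0-9]+)?)\s*(MB|GB|TB|PB)")


def parse_total_size(line):
    """Return the size in bytes for a 'Total Size' line, or None if the line is not one."""
    match = SIZE_RE.search(line)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * UNIT_BYTES[unit]


def sum_sizes(lines):
    """Sum the sizes (in bytes) of every 'Total Size' line, ignoring all other lines."""
    sizes = (parse_total_size(line) for line in lines)
    return sum(size for size in sizes if size is not None)


if __name__ == "__main__":
    # A few lines copied from the listing; a full run would pass in every line of the page.
    sample = [
        "Total Size: 396.7TB",
        "Keywords: biology, genomics",
        "Total Size: 9.5GB",
        "Total Size: 202.1MB",
    ]
    print(f"{sum_sizes(sample) / 10**12:.2f} TB")
```

Feeding the whole page to `sum_sizes` gives the combined size of the datasets listed here, which can then be compared with the overall figure quoted in the introduction.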