Public Data Commons

Repository for public data sets of scientific interest, hosted on the OSDC.
The OSDC has 1PB of public data in a wide variety of disciplines. These data sets are freely available and can be downloaded over the internet or high-performance networks for local analysis. Information on how to download each data set can be found at the links below. All recipients of OSDC resource allocations can also compute directly over the data in the Public Data Commons, without having to download it locally. Datasets hosted in the OSDC Public Data Commons are reviewed periodically as part of our resource allocation process. If you have suggestions about data that should be included, please let us know at info@opencloudconsortium.org.
Human sequence data from populations around the world with the goal of cataloging human genetic variation.
Total Size: 396.7TB
Identifiers:
  • ark:/31807/osdc-4a3ec448
Keywords: biology, genomics
Last Modified: 2013-06-04 15:30:00 UTC
ASTER Level-1B Registered Radiance at the Sensor.
Total Size: 23.7TB
Identifiers:
  • ark:/31807/osdc-97469090
Keywords: earth science
Last Modified: 2013-08-02 14:53:22 UTC
Data set from the City of Chicago Data Portal in JSON format for tabular data and the raw files for "blob" data.
Total Size: 9.5GB
Identifiers:
  • ark:/31807/osdc-eb865c84
Keywords: social science
Last Modified: 2012-10-25 14:03:18 UTC
Whole human genome sequence data sets provided by Complete Genomics, containing 69 standard, non-diseased samples as well as two matched tumor and normal sample pairs.
Total Size: 50.4TB
Identifiers:
  • ark:/31807/osdc-919d4bed
Keywords: biology, genomics
Last Modified: 2013-06-04 15:30:00 UTC
Data gathered by the Advanced Land Imager (ALI) and Hyperspectral Imager (Hyperion) instruments on NASA’s Earth Observing-1 (EO-1) Mission satellite.
Total Size: 80.5TB
Identifiers:
  • ark:/31807/osdc-c6458e33
Keywords: earth science
Last Modified: 2013-04-24 18:56:07 UTC
Unified Data Resource for 3-Dimensional Electron Microscopy.
Total Size: 122.1GB
Identifiers:
  • ark:/31807/osdc-9d410a22
Keywords: biology
Last Modified: 2013-06-18 11:17:00 UTC
Data sets based on the original Enron emails released to the public by the Federal Energy Regulatory Commission as part of their investigation.
Total Size: 154.1GB
Identifiers:
  • ark:/31807/osdc-5597413b
Keywords: text data, social science
Last Modified: 2012-08-20 12:51:00 UTC
FlyBase is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species.
Total Size: 661.0GB
Identifiers:
  • ark:/31807/osdc-f222e3c5
Keywords: biology, genomics
Last Modified: 2013-04-23 17:54:00 UTC
The GSS contains a standard "core" of demographic, behavioral, and attitudinal questions, plus topics of special interest.
Total Size: 202.1MB
Identifiers:
  • ark:/31807/osdc-64c4b1f3
Keywords: social science
Last Modified: 2013-04-24 16:39:00 UTC
Imagery from the Landsat-7 ETM+ detector.
Total Size: 2.1TB
Identifiers:
  • ark:/31807/osdc-99731751
Keywords: earth science
Last Modified: 2013-06-04 17:00:00 UTC
Imagery from the Landsat-7 ETM+ detector.
Total Size: 1.5TB
Identifiers:
  • ark:/31807/osdc-c99aba25
Keywords: earth science
Last Modified: 2013-07-03 15:52:33 UTC
N-gram data obtained from over 5 million books digitized by Google. Contains all n-grams that appeared in over 40 books.
Total Size: 863.4GB
Identifiers:
  • ark:/31807/osdc-6a9633ac
Keywords: text data, social science
Last Modified: 2012-08-07 19:01:22 UTC
Model reduction dataset: Heat transfer in random media.
Total Size: 4.0TB
Identifiers:
  • ark:/31807/osdc-cf45683a
Keywords: model reduction
Last Modified: 2013-08-29 09:59:56 UTC
Large global climate dynamics simulation run on the Titan supercomputer at Oak Ridge National Laboratory.
Total Size: 2.6TB
Identifiers:
  • ark:/31807/osdc-45e52bca
Keywords: earth science
Last Modified: 2013-10-11 13:38:29 UTC
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.
Total Size: 199.5GB
Identifiers:
  • ark:/31807/osdc-c1c763e4
Keywords: social science, music
Last Modified: 2013-12-11 13:47:45 UTC
Encyclopedia of genomic functional elements in the model organisms C. elegans and D. melanogaster.
Total Size: 10.9TB
Identifiers:
  • ark:/31807/osdc-381ed653
Keywords: biology, genomics
Last Modified: 2013-07-28 01:35:52 UTC
Data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra (EOS AM) satellite.
Total Size: 85.9TB
Identifiers:
  • ark:/31807/osdc-8a052845
Keywords: earth science
Last Modified: 2013-06-18 11:25:52 UTC
Density of organic carbon in vegetation across the conterminous United States at a 30-meter resolution.
Total Size: 79.2GB
Identifiers:
  • ark:/31807/osdc-4074c6cc
Keywords: earth science
Last Modified: 2012-08-02 19:02:23 UTC
All datasets from the NCBI FTP site except 1000genomes, pub, and sra.
Total Size: 10.8TB
Identifiers:
  • ark:/31807/osdc-f16c2fa3
Keywords: biology, genomics
Last Modified: 2013-06-18 11:25:15 UTC
A periodically updated mirror of the publicly available FTP site for the National Climatic Data Center.
Total Size: 3.3TB
Identifiers:
  • ark:/31807/osdc-35f2f09f
Keywords: earth science
Last Modified: 2014-06-12 10:08:40 UTC
The text of over 42,000 free ebooks.
Total Size: 742.1GB
Identifiers:
  • ark:/31807/osdc-5d5dd1a7
Keywords: text data, social science
Last Modified: 2013-12-18 13:33:41 UTC
The PDB contains 3D structural information on biological macromolecules.
Total Size: 243.4GB
Identifiers:
  • ark:/31807/osdc-bf242fd3
Keywords: biology
Last Modified: 2013-06-04 15:30:00 UTC
The Sloan Digital Sky Survey (SDSS) consists of a series of three interlocking imaging and spectroscopic surveys, carried out over an eight-year period with a dedicated 2.5m telescope located at Apache Point Observatory in southern New Mexico.
Total Size: 23.2TB
Identifiers:
  • ark:/31807/osdc-2ac1a513
Keywords: astronomy
Last Modified: 2014-06-11 23:13:18 UTC
Real-time monitoring and forecasting of solar and geophysical events.
Total Size: 3.1GB
Identifiers:
  • ark:/31807/osdc-4f2e501c
Keywords: astronomy
Last Modified: 2013-12-10 11:12:37 UTC
Data from the decennial United States Census as well as the Economic Census and the American Community Survey.
Total Size: 1.8TB
Identifiers:
  • ark:/31807/osdc-b7b76e53
Keywords: social science
Last Modified: 2013-12-09 15:54:18 UTC
Weather observations from around the country.
Total Size: 1.3GB
Identifiers:
  • ark:/31807/osdc-d073aca6
Keywords: earth science
Last Modified: 2013-06-18 11:26:01 UTC
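
The identifiers listed for each data set above use the ARK (Archival Resource Key) form `ark:/NAAN/Name`, where the first part (31807 throughout this listing) is the name assigning authority number and the second part names the object. As a minimal sketch, assuming you want to split such identifiers into their two parts before looking a data set up, something like the following Python could be used. The function name `parse_ark` is hypothetical and not part of any OSDC tool.

```python
def parse_ark(identifier):
    """Split an ARK such as 'ark:/31807/osdc-4a3ec448' into (naan, name).

    Hypothetical helper for illustration only; it checks the basic
    'ark:/NAAN/Name' layout and does not contact any resolver.
    """
    text = identifier.strip()
    prefix = "ark:/"
    if not text.startswith(prefix):
        raise ValueError("not an ARK identifier: %r" % identifier)

    # Everything after 'ark:/' is 'NAAN/Name'; split on the first slash only.
    naan, sep, name = text[len(prefix):].partition("/")
    if not sep or not naan or not name:
        raise ValueError("malformed ARK identifier: %r" % identifier)
    return naan, name


if __name__ == "__main__":
    # Identifiers copied from the listing above.
    for ark in ["ark:/31807/osdc-4a3ec448", "ark:/31807/osdc-d073aca6"]:
        naan, name = parse_ark(ark)
        print(naan, name)
```

Splitting on the first slash only keeps any further slashes inside the name part intact, which is a deliberate simplification rather than a full implementation of the ARK syntax.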