Public Data Commons

Repository for public data sets of scientific interest, hosted on the OSDC.
The OSDC has 1PB of public data in a wide variety of disciplines. These data sets are freely available and can be downloaded over the internet or high-performance networks for local analysis. Information on how to download each dataset can be found at its link below. All recipients of OSDC resource allocations can also compute directly over the data in the Public Data Commons, without having to download it locally. Datasets hosted in the OSDC Public Data Commons are reviewed periodically as part of our resource allocation process. If you have suggestions about data that should be included, please let us know at info@opencloudconsortium.org.
Human sequence data from populations around the world with the goal of cataloging human genetic variation.
Total Size: 396.7TB
Categories: biology, genomics
Last Modified: June 4, 2013, 3:30 p.m. UTC
ASTER Level-1B Registered Radiance at the Sensor.
Total Size: 23.7TB
Categories: earth science
Last Modified: Aug. 2, 2013, 2:53 p.m. UTC
Data set from the City of Chicago Data Portal in JSON format for tabular data and the raw files for "blob" data.
Total Size: 9.5GB
Categories: social science
Last Modified: Oct. 25, 2012, 2:03 p.m. UTC
Whole human genome sequence data sets provided by Complete Genomics, containing 69 standard, non-diseased samples as well as two matched tumor and normal sample pairs.
Total Size: 50.4TB
Categories: biology, genomics
Last Modified: June 4, 2013, 3:30 p.m. UTC
Data gathered by the Advanced Land Imager (ALI) and Hyperspectral Imager (Hyperion) instruments on NASA’s Earth Observing-1 Mission (EO-1) satellite.
Total Size: 80.5TB
Categories: earth science
Last Modified: April 24, 2013, 6:56 p.m. UTC
Unified Data Resource for 3-Dimensional Electron Microscopy.
Total Size: 122.1GB
Categories: biology
Last Modified: June 18, 2013, 11:17 a.m. UTC
Data sets based on the original Enron emails released to the public by the Federal Energy Regulatory Commission as part of their investigation.
Total Size: 154.1GB
Categories: text data, social science
Last Modified: Aug. 20, 2012, 12:51 p.m. UTC
FlyBase is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species.
Total Size: 661.0GB
Categories: biology, genomics
Last Modified: April 23, 2013, 5:54 p.m. UTC
The GSS contains a standard "core" of demographic, behavioral, and attitudinal questions, plus topics of special interest.
Total Size: 202.1MB
Categories: social science
Last Modified: April 24, 2013, 4:39 p.m. UTC
Imagery from the Landsat-7 ETM+ detector.
Total Size: 2.1TB
Categories: earth science
Last Modified: June 4, 2013, 5 p.m. UTC
Imagery from the Landsat-7 ETM+ detector.
Total Size: 1.5TB
Categories: earth science
Last Modified: July 3, 2013, 3:52 p.m. UTC
N-gram data obtained from over 5 million books digitized by Google. Contains all n-grams that appeared in over 40 books.
Total Size: 863.4GB
Categories: text data, social science
Last Modified: Aug. 7, 2012, 7:01 p.m. UTC
Model reduction dataset: Heat transfer in random media.
Total Size: 4.0TB
Categories: model reduction
Last Modified: Aug. 29, 2013, 9:59 a.m. UTC
Large global climate dynamics simulation run on the Titan supercomputer at Oak Ridge National Laboratory.
Total Size: 2.6TB
Categories: earth science
Last Modified: Oct. 11, 2013, 1:38 p.m. UTC
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
Total Size: 199.5GB
Categories: social science, music
Last Modified: Dec. 11, 2013, 1:47 p.m. UTC
Encyclopedia of genomic functional elements in the model organisms C. elegans and D. melanogaster.
Total Size: 10.9TB
Categories: biology, genomics
Last Modified: July 28, 2013, 1:35 a.m. UTC
Data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra (EOS AM) satellite.
Total Size: 85.9TB
Categories: earth science
Last Modified: June 18, 2013, 11:25 a.m. UTC
Density of organic carbon in vegetation in the conterminous United States at 30-meter resolution.
Total Size: 79.2GB
Categories: earth science
Last Modified: Aug. 2, 2012, 7:02 p.m. UTC
All datasets from the NCBI FTP site except 1000genomes, pub, and sra.
Total Size: 10.8TB
Categories: biology, genomics
Last Modified: June 18, 2013, 11:25 a.m. UTC
A periodically updated mirror of the publicly available FTP site for the National Climatic Data Center.
Total Size: 3.3TB
Categories: earth science
Last Modified: June 12, 2014, 10:08 a.m. UTC
The text of over 42,000 free ebooks.
Total Size: 742.1GB
Categories: text data, social science
Last Modified: Dec. 18, 2013, 1:33 p.m. UTC
The PDB contains 3D structural information on biological macromolecules.
Total Size: 243.4GB
Categories: biology
Last Modified: June 4, 2013, 3:30 p.m. UTC
The Sloan Digital Sky Survey (SDSS) consists of a series of three interlocking imaging and spectroscopic surveys, carried out over an eight-year period with a dedicated 2.5m telescope located at Apache Point Observatory in Southern New Mexico.
Total Size: 23.2TB
Categories: astronomy
Last Modified: June 11, 2014, 11:13 p.m. UTC
Real-time monitoring and forecasting of solar and geophysical events.
Total Size: 3.1GB
Categories: astronomy
Last Modified: Dec. 10, 2013, 11:12 a.m. UTC
Data from the decennial United States Census as well as the Economic Census and the American Community Survey.
Total Size: 1.8TB
Categories: social science
Last Modified: Dec. 9, 2013, 3:54 p.m. UTC
Weather observations from around the country.
Total Size: 1.3GB
Categories: earth science
Last Modified: June 18, 2013, 11:26 a.m. UTC
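The per-dataset sizes listed above can be totaled programmatically, for example to compare the catalog against the overall figure quoted in the introduction. The following is a minimal sketch, not an OSDC tool: the helper names `parse_size` and `total_bytes` are hypothetical, and it assumes that the "Total Size" values use decimal units (1 TB = 10^12 bytes), which this page does not state.

```python
import re

# Assumption: decimal (SI) units; the listing does not say whether sizes are decimal or binary.
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}

# Matches catalog lines such as "Total Size: 396.7TB".
SIZE_RE = re.compile(r"^Total Size:\s*([\d.]+)\s*(MB|GB|TB|PB)\s*$")


def parse_size(line):
    """Return the size in bytes for a 'Total Size' line, or None for any other line."""
    match = SIZE_RE.match(line.strip())
    if match is None:
        return None
    return float(match.group(1)) * UNITS[match.group(2)]


def total_bytes(lines):
    """Sum the sizes of all 'Total Size' lines, ignoring every other line."""
    sizes = (parse_size(line) for line in lines)
    return sum(size for size in sizes if size is not None)


if __name__ == "__main__":
    # A few lines copied from the listing above; paste the full page text to total everything.
    sample = [
        "Total Size: 9.5GB",
        "Categories: social science",
        "Total Size: 2.1TB",
        "Total Size: 202.1MB",
    ]
    print(f"{total_bytes(sample) / UNITS['TB']:.3f} TB")
```

Lines that are not size entries (descriptions, categories, modification dates) are skipped, so the full text of the listing can be passed in unchanged.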