Public Data Sets

Repository for public data sets of scientific interest, hosted on the OSDC.
The data sets below can be downloaded over the internet or over high-performance networks such as Internet2, and they can also be used for computation directly on the OSDC. Currently, the OSDC hosts about 700 TB of data, and the plan is to increase this steadily to the petabyte level. If you have suggestions about data that should be included, please let us know at info@opencloudconsortium.org.
Human sequence data from populations around the world with the goal of cataloging human genetic variation.
Total Size: 396.7TB
Categories: genomics, biology
Last Modified: June 4, 2013, 3:30 p.m. UTC
ASTER Level-1B Registered Radiance at the Sensor
Total Size: 23.7TB
Categories: earth science
Last Modified: March 6, 2014, 10:58 a.m. UTC
Data set from the City of Chicago Data Portal in JSON format for tabular data and the raw files for "blob" data.
Total Size: 9.5GB
Categories: social science
Last Modified: Oct. 25, 2012, 2:03 p.m. UTC
Whole human genome sequence data sets provided by Complete Genomics, containing 69 standard, non-diseased samples as well as two matched tumor and normal sample pairs.
Total Size: 49.7TB
Categories: genomics, biology
Last Modified: June 4, 2013, 3:30 p.m. UTC
Data gathered by the Advanced Land Imager (ALI) and Hyperspectral Imager (Hyperion) instruments on NASA's Earth Observing-1 Mission (EO-1) satellite.
Total Size: 79.8TB
Categories: earth science, satellite imagery
Last Modified: April 24, 2013, 6:56 p.m. UTC
Unified Data Resource for 3-Dimensional Electron Microscopy
Total Size: 121.0GB
Categories: biology
Last Modified: March 6, 2014, 3:41 p.m. UTC
Data sets based on the original Enron emails released to the public by the Federal Energy Regulatory Commission as part of their investigation.
Total Size: 154.1GB
Categories: social science
Last Modified: Aug. 20, 2012, 12:51 p.m. UTC
FlyBase is the leading database and web portal for genetic and genomic information on the fruit fly Drosophila melanogaster and related fly species.
Total Size: 661.0GB
Categories: biology, genomics
Last Modified: April 23, 2013, 5:54 p.m. UTC
The GSS contains a standard 'core' of demographic, behavioral, and attitudinal questions, plus topics of special interest.
Total Size: 202.1MB
Categories: social science
Last Modified: April 24, 2013, 4:39 p.m. UTC
Imagery from the Landsat-7 ETM+ detector
Total Size: 2.1TB
Categories: earth science
Last Modified: June 4, 2013, 5 p.m. UTC
Global Land Survey - 2010
Total Size: 1.5TB
Categories: earth science
Last Modified: July 3, 2013, 3:52 p.m. UTC
N-gram data obtained from over 5 million books digitized by Google. Contains all n-grams that appeared in over 40 books.
Total Size: 863.4GB
Categories: social science, linguistics
Last Modified: Aug. 7, 2012, 7:01 p.m. UTC
Model reduction dataset: Heat transfer in random media
Total Size: 4.0TB
Categories: model reduction
Last Modified: Aug. 29, 2013, 9:59 a.m. UTC
Large global climate dynamics simulation run on the Titan supercomputer at Oak Ridge National Laboratory
Total Size: 2.6TB
Categories: earth science
Last Modified: Oct. 11, 2013, 1:38 p.m. UTC
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
Total Size: 199.5GB
Categories: music
Last Modified: Dec. 11, 2013, 1:47 p.m. UTC
Encyclopedia of genomic functional elements in the model organisms C. elegans and D. melanogaster
Total Size: 10.9TB
Categories: genomics, biology
Last Modified: July 28, 2013, 1:35 a.m. UTC
Data from the moderate-resolution imaging spectroradiometer aboard the Terra (EOS AM) Satellite
Total Size: 85.9TB
Categories: earth science
Last Modified: June 18, 2013, 11:25 a.m. UTC
Density of organic carbon in vegetation in the conterminous United States at 30-meter resolution.
Total Size: 79.2GB
Categories: earth science
Last Modified: Aug. 2, 2012, 7:02 p.m. UTC
All datasets from the NCBI FTP site except 1000genomes, pub, and sra.
Total Size: 10.8TB
Categories: genomics, biology
Last Modified: March 6, 2014, 10:55 a.m. UTC
A mirror of the publicly available FTP site for the National Climatic Data Center.
Total Size: 2.6TB
Categories: weather
Last Modified: June 4, 2013, 3:30 p.m. UTC
The text of over 42,000 free ebooks
Total Size: 742.1GB
Categories: linguistics
Last Modified: March 6, 2014, 10:54 a.m. UTC
The PDB contains 3D structural information on biological macromolecules.
Total Size: 240.9GB
Categories: biology
Last Modified: March 6, 2014, 3:41 p.m. UTC
The Sloan Digital Sky Survey (SDSS) consists of a series of three interlocking imaging and spectroscopic surveys, carried out over an eight-year period with a dedicated 2.5m telescope located at Apache Point Observatory in Southern New Mexico.
Total Size: 23.2TB
Categories: astronomy
Last Modified: March 6, 2014, 3:41 p.m. UTC
Real-time monitoring and forecasting of solar and geophysical events
Total Size: 3.1GB
Categories: earth science
Last Modified: Dec. 10, 2013, 11:12 a.m. UTC
Data from the decennial United States Census as well as the Economic Census and the American Community Survey
Total Size: 1.8TB
Categories: social science
Last Modified: March 6, 2014, 3:41 p.m. UTC
Weather observations from around the country
Total Size: 1.3GB
Categories: earth science
Last Modified: June 18, 2013, 11:26 a.m. UTC
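
The "Total Size" fields in the listing above can be added up to check them against the "about 700 TB" figure quoted in the introduction. The following is a minimal sketch, not an OSDC tool: the function names `size_in_tb` and `total_tb` are hypothetical, and it assumes decimal units (1 TB = 1000 GB), which the listing does not state.

```python
import re

# Assumption: decimal units (1 TB = 1000 GB = 1,000,000 MB).
# The listing does not say whether sizes are decimal or binary.
UNIT_TO_TB = {"MB": 1e-6, "GB": 1e-3, "TB": 1.0}

# Matches lines such as "Total Size: 396.7TB" or "Total Size: 9.5GB".
SIZE_LINE = re.compile(r"^Total Size:\s*([\d.]+)\s*(MB|GB|TB)\s*$")


def size_in_tb(line):
    """Return the size on a 'Total Size' line in TB, or None for other lines."""
    match = SIZE_LINE.match(line.strip())
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * UNIT_TO_TB[unit]


def total_tb(lines):
    """Sum every 'Total Size' entry found in an iterable of text lines."""
    sizes = (size_in_tb(line) for line in lines)
    return sum(size for size in sizes if size is not None)


if __name__ == "__main__":
    # A few lines copied from the listing; pass the full page text
    # (split into lines) to total every data set.
    sample = [
        "Total Size: 396.7TB",
        "Categories: genomics, biology",
        "Total Size: 9.5GB",
        "Total Size: 202.1MB",
    ]
    print(f"{total_tb(sample):.4f} TB")
```

Lines that are not size fields are ignored, so the whole listing can be passed in unchanged.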