A Petabyte-scale Scientific Community Cloud

The OSDC enables scientific researchers to easily manage, share, and analyze large datasets.

OSDC in brief

The Open Science Data Cloud provides the scientific community with resources for storing, sharing, and analyzing terabyte- and petabyte-scale scientific datasets. The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools are necessary to analyze their data, and perform the analysis to answer their research questions. It is a one-stop shop for making scientific research faster and easier.

Why is there a need?

As datasets grow ever larger, researchers are finding that the bottleneck to discovery is no longer a lack of data but an inability to manage, analyze, and share large datasets. Individual researchers can no longer download and analyze the important datasets in their fields on their own computers. The goal of the Open Science Data Cloud is to remove this bottleneck by giving researchers access to a variety of key datasets across scientific disciplines, along with the computing infrastructure to easily manage and share their data and analyses.

Featured on the OSDC

Bionimbus
Project Matsu
Tukey
What is the OSDC?



OCC NOAA Data Alliance Seeks Community Feedback

The OCC NOAA Data Alliance Working Group is seeking feedback to help prioritize which datasets will provide maximum impact. Please take a few moments to fill out the survey and tell us more about how you use environmental datasets and what your data needs are. Survey results will help us determine which datasets and services have the potential for the greatest impact in the OCC Environmental Commons ecosystem.

NEXRAD L2 in the OCC Environmental Data Commons

NOAA NEXRAD L2 data is now available in the OCC Environmental Data Commons as part of the NOAA Big Data Project. The NEXRAD dataset is identified by the Signpost digital ID system. Signpost balances the needs of data archiving for persistent storage (assigning identifiers to unique pieces of information) and of active computation (finding the location of data on a living system where data may be physically moved or updated). OCC members constructed a simple implementation of this design as a two-layer identification service with a REST-like API. Because analyses refer to data through the Signpost digital identifier service, we can relocate data files from our data commons to another commons without any researcher needing to change their code. We have also made public a sample analysis of NEXRAD L2 data that uses Signpost, Jupyter Notebook, and the Py-ART Python package. The analysis creates an animated visualization of a mayfly event and is available as a public snapshot image for OSDC Griffin allocation grantees, or for non-grantees via GitHub or github.io. If you have any questions or comments, please contact noaa dot crada at occ-data.org.
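The two-layer idea described above can be illustrated with a small in-memory sketch. This is not Signpost's actual code or API: the class name `TwoLayerResolver`, its methods, the example identifier, and the `example.org` URLs are all hypothetical, and the split into "identifier to content hash" and "content hash to locations" is an assumption about how the two layers divide the work. The point is only that moving a file changes the second layer, while the identifier a researcher uses stays the same.

```python
import hashlib


class TwoLayerResolver:
    """Toy two-layer identifier service (hypothetical, not the Signpost API).

    Layer 1 maps a persistent identifier to a content hash.
    Layer 2 maps a content hash to the URLs where the data currently lives.
    """

    def __init__(self):
        self._id_to_hash = {}    # layer 1: identifier -> content hash
        self._hash_to_urls = {}  # layer 2: content hash -> list of URLs

    def register(self, identifier, content, url):
        """Record an identifier for some content stored at the given URL."""
        digest = hashlib.sha256(content).hexdigest()
        self._id_to_hash[identifier] = digest
        urls = self._hash_to_urls.setdefault(digest, [])
        if url not in urls:
            urls.append(url)
        return digest

    def relocate(self, digest, old_url, new_url):
        """Move data: only layer 2 changes, identifiers are untouched."""
        urls = self._hash_to_urls[digest]
        self._hash_to_urls[digest] = [
            new_url if u == old_url else u for u in urls
        ]

    def resolve(self, identifier):
        """Return the current locations for an identifier."""
        digest = self._id_to_hash[identifier]
        return list(self._hash_to_urls[digest])
```

In this sketch, analysis code that calls `resolve("ark:/example/nexrad-0001")` before and after a `relocate` call receives the new URL without any change to the identifier it uses, which mirrors the relocation property described above.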

Big Data vs the Scientist - ACM Chicago Meetup

Dr. Maria Patterson, Scientific Lead for the OSDC, is a guest speaker at the ACM Chicago meetup on Wednesday, June 8th. Her lecture on 'Big Data vs the Scientist' will touch on her work with the OCC to build and maintain data commons, and her many contributions to OCC working groups like Project Matsu and the NOAA Big Data Project. RSVP here: http://www.meetup.com/acm-chicago/ If you're in town for the Data Commons Workshop Series sponsored by the Center For Data Intensive Science (CDIS), be sure to RSVP for this meetup.

Environmental Data Commons Workshop, June 9th 2016

On June 9th, OCC partner the Center For Data Intensive Science (CDIS) will host a full-day workshop on Environmental Data Commons in Chicago, IL as part of its Data Commons Data Sharing workshop series. There will be sessions on environmental commons, services for environmental commons, environmental data commons applications, the OCC-NOAA Big Data Alliance, and interoperability of environmental commons, clouds, and repositories. To register and for more information, including the workshop location, agenda, and lodging options, please visit: https://sites.google.com/site/environmentalcommons/

OCC at IEEE Big Data Conference

Maria Patterson, scientific lead for the Open Science Data Cloud and a researcher at UChicago working on the OCC's Project Matsu, will be attending the IEEE Big Data Service and Applications conference this week at Exeter College, Oxford, UK. This conference will bring together a wide variety of researchers focused on innovations in big data computing, service sharing, and big data applications in energy and environment, medicine and healthcare, libraries, social media and networking, and education. The conference is held in conjunction with several other IEEE conferences, including the 10th International IEEE Symposium on Service-Oriented System Engineering (SOSE), the 4th International Conference on Mobile Cloud Computing, Services, and Engineering, the IEEE International Symposium on Creative Computing (ISCC), the IEEE International Symposium on Software Crowdsourcing (ISSC), and the Second International Workshop On Education in the Cloud. Dr. Patterson will present on Thursday, in the track on big data applications in energy and environment, about Project Matsu's work analyzing satellite imagery from NASA's Earth Observing-1 satellite using an "analytic wheel," an efficient reanalysis framework for large datasets. The Matsu Wheel allows many shared data services to run together, making efficient use of resources when processing hyperspectral satellite image data and other large datasets, such as environmental data, that may be analyzed for many purposes. For more information about Project Matsu, see the Matsu website or the arXiv paper on the Matsu Wheel.
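The general idea of running many analytics together over shared data can be sketched in a few lines. This is only an illustration under stated assumptions, not the Matsu Wheel implementation: the function `run_wheel` and its parameters `scene_ids`, `load_scene`, and `analytics` are hypothetical names, and the sketch assumes that the saving comes from reading each scene once and handing it to every registered analytic.

```python
def run_wheel(scene_ids, load_scene, analytics):
    """Run every analytic over every scene, loading each scene only once.

    scene_ids  -- iterable of scene identifiers (hypothetical)
    load_scene -- callable that reads one scene from storage (hypothetical)
    analytics  -- dict mapping an analytic's name to a callable on a scene
    """
    results = {}
    for scene_id in scene_ids:
        # The expensive read happens once per scene, not once per analytic.
        scene = load_scene(scene_id)
        results[scene_id] = {
            name: analytic(scene) for name, analytic in analytics.items()
        }
    return results
```

Adding another analytic in this sketch means adding one entry to the `analytics` dict; the number of reads from storage stays equal to the number of scenes.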

How can I get involved?


Access the Public Data Commons

The OSDC has 1 PB of publicly accessible data in a wide variety of disciplines. Interested researchers can freely access and download these data to their own machines or apply for resources to compute over the data within the cloud.

Contribute to OSDC


All of the software developed as part of the OSDC is open source and hosted on GitHub. You can directly help the scientific cloud computing community by contributing to the open source OSDC software stack.

Apply for Compute and Storage

Fill out a short proposal for an OSDC resource allocation. Allocations start at 16 dedicated cores and 1 TB of storage, and scale with project needs and level of organizational partnership.


Partner with us and add your own racks to the OSDC (we will manage them for you). Organizations can also join the Open Commons Consortium (OCC), which is made up of working groups, including the OSDC.

Contact Us

Questions? Comments? Suggestions? Contact us at info@occ-data.org.