OSDC Projects

Below is a list of some of the larger projects supported by the Open Science Data Cloud ecosystem. To get involved with the OSDC, propose your own project by contacting one of the OSDC members or by writing to info@occ-data.org.


Bionimbus is a collaboration between the Institute for Genomics and Systems Biology (IGSB) at the University of Chicago and the OSDC to develop open source technology for managing, analyzing, transporting, and sharing large genomics datasets in a secure and compliant fashion.

The Bionimbus Community Cloud, which the OSDC ran for several years, has now been integrated into the OSDC itself, and its open-access biological and biomedical data is now part of the OSDC Data Commons (OSDC-Root). The OSDC Data Commons contains a variety of public biological datasets, including the 1000 Genomes dataset.

The Bionimbus Protected Data Cloud (PDC) is a collaboration among the OSDC, the Open Commons Consortium, and three groups at the University of Chicago: the IGSB, the Center for Research Informatics (CRI), and the Institute for Translational Medicine (ITM). The PDC allows authorized users to compute over human genomic data and other protected health information (PHI) in a secure and compliant fashion.

More information about Bionimbus can be found at bionimbus.opensciencedatacloud.org


Matsu is a collaboration between NASA and the Open Commons Consortium to develop open source technology for cloud-based processing of satellite imagery to support the earth sciences.

The OSDC is used to process Earth Observing-1 (EO-1) satellite imagery from the Advanced Land Imager and Hyperion instruments and to make this data available to interested users. We are currently developing services for detecting fires and floods and for getting relevant information to first responders.

More information about Matsu can be found at matsu.opensciencedatacloud.org


The OSDC supports Bookworm, a project from Harvard's Cultural Observatory that offers a way to interact with digitized book content through full-text search. Bookworm uses n-grams extracted from books in the public domain and integrates library metadata, including genre, author information, and publication place and date. See bookworm.culturomics.org.

Climate Impact Lab

The Climate Impact Lab works to connect data-driven, empirically derived climate damages to social and economic outcomes relevant to policymakers, investors, business leaders and households. The group is currently working on the OSDC Griffin resource.

The Climate Impact Lab is a collaboration of more than 20 climate scientists, economists, computational experts, researchers, analysts, and students from several institutions, including the University of California at Berkeley, the Energy Policy Institute at the University of Chicago (EPIC), Rhodium Group and Rutgers University. See www.impactlab.org.

Conte Cloud

The Atwood Protected Data Cloud (also known as the Conte Cloud) is a Bionimbus-based cloud computing infrastructure designed to store genomics data, electronic medical records, and other sensitive data in a secure and compliant environment. It is used by the Silvio O. Conte Center for Computational Neuropsychiatric Genomics. The Conte Center is based at the University of Chicago, and its mission is to apply integrated informatics and mathematical modeling to predict genetic and environmental factors underlying mental health and illness, including autism, schizophrenia, bipolar disorder, depression, anxiety disorders and conduct disorder.

The data include legacy genetic association and linkage datasets, including over 60 TB of genomic (e.g. GWAS, whole exome) and phenotypic (e.g. clinical, brain imaging) data from the National Database for Autism Research (NDAR), as well as datasets from the Center for Collaborative Studies on Mental Disorders supported by the National Institute of Mental Health, which gives authorized researchers access to the relevant collections.

In 2016, Atwood PDC allocation grantees were merged into the larger Bionimbus PDC infrastructure.


The National Human Genome Research Institute (NHGRI) launched a public research consortium named ENCODE, the Encyclopedia Of DNA Elements, in September 2003 to identify all functional elements in the human genome sequence. The OSDC provides a hot backup of all the data from the ENCODE project and enables interested researchers to compute over all of this data using the OSDC.

The modENCODE project is described in Unlocking the Secrets of the Genome, Nature 2009 Jun 18;459(7249):927-30.


Knowledge does not arise from the simple accumulation of facts. Rather, it is a complex, dynamic system, and its emergent outcomes, including scientific consensus, are unpredictable. The complexity of knowledge creation has exploded with the growing number of participating scientists and citizens. If human knowledge is to grow efficiently, we need a deeper understanding of the processes by which knowledge is conceived, validated, shared and reinforced. We need to understand the limits of knowledge in relation to these processes. In short, we need knowledge about knowledge.

The current explosion of digitally available text, including journal publications, books, patents and news articles, makes it possible for the first time in history to study the dynamics that shape scientific research at scale, as the latest computational tools can capture some of the richness of these insights. The Metaknowledge Project develops leading-edge machine learning and data mining tools and methods to catalyze a new field devoted to understanding the current shape and limits of human understanding. The project is led by James Evans from the Department of Sociology and the Computation Institute at the University of Chicago.

NSF OSDC Partnership for International Research & Education (PIRE) Projects

The Open Science Data Cloud (OSDC) PIRE Program provides U.S. graduate students, postdoctoral scientists, and early-career scientists with fellowships so that they can work on big data science with OSDC PIRE partners in a variety of countries around the world.

The OSDC has PIRE Partners in a number of countries, including the United Kingdom, Brazil, the Netherlands, Japan, Korea and China. The OSDC PIRE program is hosted by the University of Chicago, Florida International University, and the University of Illinois at Chicago. The OSDC PIRE program is supported in part by a five-year grant from the National Science Foundation.

If you are a U.S. citizen or permanent resident and are interested in this unique opportunity, please contact info@occ-data.org.