Leading the Charge on Science DMZs to Support Big-Data Research
Developing a campus-wide Science DMZ takes time, commitment, and lots of advocacy, according to Joshua Sonstroem, systems administrator at the University of California, Santa Cruz. If successful, however, the resulting cyberinfrastructure helps ensure that faculty and students can join in leading-edge research and discovery.
A Science DMZ is a portion of a campus network, located at or near the local network perimeter, whose equipment, configuration, and security policies are optimized for data-intensive science. Deploying a Science DMZ lets scientists dramatically increase the size and scale of the data sets they can productively use in their research, so much so that they can be more ambitious in their work and make discoveries that would otherwise be out of reach. Computing performance, security, risk segmentation, scalability, and network measurement are all integrated components of the Science DMZ model.
At the 2018 CENIC annual conference, Sonstroem described the process and effort required to introduce a Science DMZ to a highly decentralized campus.
“We began with a creation phase where we seeded interest in end-user domains or groups of faculty,” said Sonstroem. “For us, it was the genomics cluster and the astrophysics cluster of researchers who got engaged first, as we rolled out the experimental version of our DMZ.”
Research in both fields requires an immense amount of data transfer. “In the field of genomics, scientists are moving around 300-gigabyte files for nearly any kind of research,” said Brad Smith, interim vice chancellor of IT for UCSC. “That kind of scope breaks the Internet or the traditional client-server model. It’s particularly challenging.”
Next was the adoption phase, which involved organizing a cyberinfrastructure (CI) council — an interdepartmental group of people who understand the advantages of Science DMZs and can evangelize that knowledge to the greater campus community. UCSC’s CI council included engineers, scientists, and administrators who sought out pilot projects and institutional financial support. The council serves to promote the growth of the community of users who are actively engaged with research on the Science DMZ.
Once the Science DMZ was established, work began on connecting it to the DMZs at other research institutions. Such integrations are supported by the Pacific Research Platform (PRP), which was funded by the National Science Foundation to create a high-capacity, regional information freeway.
The PRP makes it possible to move large amounts of scientific data between researchers’ labs and their collaborators’ sites, supercomputer centers, or data repositories without performance degradation. It supports a broad range of data-intensive research projects that will have wide-reaching impacts on science and technology worldwide.
“We needed to connect all of these scientists together beyond the boundaries of individual campuses,” said Smith. “And a lot of disciplines out there, such as genomics, astrophysics, and neuroscience, use high-resolution images and video that need to be shared with colleagues. That requires high-performance data storage and transfer — a challenge that PRP has been designed to solve.”
For UCSC, the final phase of launching the Science DMZ was the integration phase, in which the network shifted from experimental to “mission critical” status and was moved into the data center, where it is now a core platform for the campus.
“Now we have a client services team and a faculty partnership group so that we can train researchers on the use of this network to improve the performance of their science,” said Sonstroem.
“If we’re ever going to tap into the power of genomics to develop truly personalized treatments for cancer, we will need the power of PRP on a national or global scale to unlock the secrets within exabytes of data,” said Smith.
As UCSC has learned, engaging faculty is critical to the successful adoption of advanced cyberinfrastructure such as a Science DMZ. When science teams incorporate Science DMZ resources into their workflows, their capability and productivity increase significantly. Making this vision possible involves engaging faculty teams who work with big data, building a team of advocates, and integrating the Science DMZ into the core mission of the campus.
The end goal is worth the effort. Cancer research and astronomy are just two of the many research domains that will benefit from the ability to analyze very large data sets effectively, and UCSC is building the capabilities that support that work into the core of its production campus network.
For Further Information
- Two-page fact sheet (PDF) about the Pacific Research Platform
- CENIC Recognizes Project Connecting Hyades Supercomputer Cluster with NERSC
- 2nd National Research Platform Workshop (August 6-7, 2018)