The Australian Synchrotron’s new Store.Synchrotron facility is a world-first storage, sharing, and re-use environment for synchrotron experiment data. Currently hosted at Monash University, Store.Synchrotron has been allocated storage on VicNode, the RDSI Node operated by the University of Melbourne and Monash University.
Long term data storage need
“For us, it’s all about data,” says Dr Tom Caradoc-Davies, Principal Scientist for the Synchrotron’s Crystallography beamlines.
The results of an experiment run on a synchrotron beamline are image frames. A collection of frames makes up a ‘data set’. In a day, a user might collect 250-300 gigabytes of this raw data, which is often kept for correcting errors in image processing or for later use.
“A protein crystallographer may need 3-5 years and several hundred thousand dollars to prepare crystals for one day of beamline time,” explained Dr Caradoc-Davies.
“After such a large investment to produce the data, it is very important that nothing happens to it. We often get sad emails from people, saying that their hard drive’s died or their DVDs are corrupted. So a couple times a year we’ll be digging up archive data,” Dr Caradoc-Davies says.
“Most synchrotrons worldwide delete raw data after 60 days. We’ve managed to keep it all, but we’re running out of storage. So we started working with Steve Androulakis and the guys at Monash to set up an archive.”
Sharing, re-using, and validating data
In addition to providing an archive, Store.Synchrotron allows researchers to share data with team members during a project.
“There’s a real question in the field as to how to make raw frames available when the structures are deposited to the Protein Data Bank,” Dr Caradoc-Davies says. Store.Synchrotron solves this problem for experiments at the Australian Synchrotron, providing access to the raw data when the paper has been published. Alternatively, a researcher could indicate that they either have not been able to solve the structure or they are not planning to publish the results.
“Ours is going to be the only system in the world where all of the primary data from the beamlines, every frame, will go into the store,” Dr Caradoc-Davies explained.
The Macro- and Micro-Crystallography beamlines are already using Store.Synchrotron in production. Other beamlines will begin using it soon.
Store.Synchrotron and RDSI
Store.Synchrotron is currently hosted at Monash University through the Monash Large Research Data Store (LaRDS); however, the collections have been allocated storage on VicNode, the RDSI Node operated by the University of Melbourne and Monash University.
Dr Caradoc-Davies sees RDSI as an important component in making research data available. “Granting agencies such as the ARC and NHMRC require raw data from the research they fund to be made publicly available. It is possible that this concept may be applied to primary x-ray data and Store.Synchrotron would make this possible. RDSI also provides this fundamental concept of storing data as a national initiative. People don’t want to put their data into an archive where somebody is going to charge them large amounts of money to download their own data in the future.”
A national eResearch infrastructure stack
Steve Androulakis from the Monash eResearch Centre is the Coordinator for Store.Synchrotron. He explains that although other synchrotron facilities have been interested in creating this kind of facility, Australia’s is the first to succeed. “It’s been an almost perfect storm of circumstances that has enabled this achievement. The Australian Synchrotron is smaller than a lot of the other facilities and has centralised IT management. We have major national initiatives such as the Australian National Data Service and the NeCTAR Research Cloud and RDSI to lean on. And not only that, it’s the funding for the people who care about this, to put it into place.”
In addition to RDSI storage from VicNode, Store.Synchrotron uses the NeCTAR Research Cloud for faster data processing and access, and persistent Digital Object Identifiers (DOIs) from the Australian National Data Service (ANDS). “The real functionality here is the entire stack of infrastructure that’s being created,” Steve says. “To have RDSI storage is one piece of the picture, and then the whole thing gains meaning and power through having access to the NeCTAR Cloud and to ANDS.”
Store.Synchrotron is available at https://store.synchrotron.org.au/
Image: Dr Tom Caradoc-Davies preparing a sample for data collection on one of the MX beamlines.