Edinburgh Research Explorer

The BioDare Data Repository

Research output: Other contribution

Original language: English
Type: BioDare Data Repository
Publication status: Published - 2011

Abstract

BioDare, an integrated data analysis and sharing resource for dynamic biological systems, by Zielinski, Moore, Troup, Beaton, Adams, Halliday, Millar.

2016 overview: BioDare returns immediate value to any user who uploads data, which directly justifies the time they spend describing and organising their data. This makes BioDare unusual among biological data management systems. As is typical, this immediate value is highly targeted, here at users who require specialised analysis of rhythmic data. BioDare also facilitates data sharing and public dissemination, which provide value over the much longer term.

2011 summary: BioDare was developed to store, share and analyse rhythmic time series data. It currently stores more than 70,000 time series with over 9 million time points. The repository supports the description and processing of data from various experimental techniques, as well as data from the literature. It allows searching and aggregation of data from independent experiments, and subsequent visualisation of both original and processed data (averaged, normalised, detrended). BioDare also performs data analysis by executing period analysis routines via web services, including FFT-NLLS, mFourfit and the ROBuST spectrum resampling method. BioDare was designed initially to support the ROBuST project (opening to ROBuST users in 2009), and was extended for the SynthSys and TiMet projects. It is highly relevant to similar research worldwide. The data infrastructure team is following a staged process to open the data repository and its associated web services for rhythmic data analysis to external users. Six potential beta-testing locations were recruited and visited in January-February 2011. Requirements specified by these beta-testers have been progressively included in the system, in some cases over multiple rounds of interaction. Further beta-test users were recruited in the summer of 2011. We expect to open the system to additional users in spring 2012, and to make the system public within the year.
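The idea behind period analysis can be illustrated with a much simpler technique than the ones BioDare runs. The sketch below is a least-squares cosine scan (a cosinor-style periodogram), not FFT-NLLS, mFourfit or the ROBuST spectrum resampling method. For each candidate period it fits a mean level plus one cosine wave and keeps the period that leaves the smallest residual. The function names, the 18-34 h search window and the 0.1 h step are illustrative assumptions, not BioDare parameters.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Move the row with the largest entry in this column into the pivot position.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def cosine_fit_rss(times, values, period):
    """Residual sum of squares of the least-squares fit y = a + b*cos(wt) + c*sin(wt).

    Hypothetical helper: a single-component cosine model, not BioDare's method.
    """
    w = 2 * math.pi / period
    basis = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations (A^T A) x = A^T y for the three coefficients.
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, values)) for i in range(3)]
    a, b, c = _solve3(ata, aty)
    return sum((y - (a + b * row[1] + c * row[2])) ** 2 for row, y in zip(basis, values))


def estimate_period(times, values, min_period=18.0, max_period=34.0, step=0.1):
    """Scan candidate periods (assumed circadian window, in hours) and return the best fit."""
    n_steps = int(round((max_period - min_period) / step))
    candidates = [min_period + k * step for k in range(n_steps + 1)]
    return min(candidates, key=lambda p: cosine_fit_rss(times, values, p))
```

For example, an hourly series over five days generated from a 24.5 h cosine should return a period close to 24.5 h. Real recordings would normally be detrended first, and methods such as FFT-NLLS also estimate amplitude and phase, which this sketch omits.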

2016 update.
BioDare was made public as proposed, and additional external users were recruited at scientific meetings in 2012-2014, including the UK circadian clock clubs, the Gordon Research Conferences on Chronobiology and the GARNET data management workshops.

BioDare's data analysis was transformed to support public use. First SynthSys and then, in 2015, the UK Centre for Mammalian Synthetic Biology provided upgraded computer servers. Both the original analysis methods and four further rhythm analysis methods were reimplemented in native Java. This greatly improved computational speed and stability, and was achieved in part through a collaborative project with EPCC, Edinburgh's supercomputing centre (see Zielinski et al. 2014 for a detailed evaluation of the methods and user guidance).

The detailed experimental metadata required from users now supports a powerful search function that aggregates data across multiple labs and experiments.
Data visualisation is more flexible, with many secondary data series (normalised, detrended, averages, error bars, etc.) pre-computed for rapid graphical display.
Any displayed data can be downloaded as a numerical spreadsheet that exactly reproduces the online graphs.
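To illustrate the kind of secondary series described above, the sketch below computes three of them: a linearly detrended series, a series normalised to its own mean, and a point-wise average with standard-error bars across replicates. It then renders the result as CSV text, standing in for a spreadsheet download. This is a minimal stdlib sketch. The helper names are hypothetical, and BioDare's own detrending and normalisation options may differ, for example in the trend model or the normalisation target.

```python
import csv
import io
import math
from statistics import mean, stdev


def linear_detrend(times, values):
    """Subtract the least-squares straight line from a series (hypothetical helper)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_bar - slope * t_bar
    return [v - (intercept + slope * t) for t, v in zip(times, values)]


def normalise_to_mean(values):
    """Scale a series so that its mean is 1 (one assumed normalisation choice)."""
    m = mean(values)
    if m == 0:
        raise ValueError("cannot normalise a series with zero mean")
    return [v / m for v in values]


def average_with_sem(series):
    """Point-wise mean and standard error of the mean across replicate series."""
    n = len(series)
    means, sems = [], []
    for column in zip(*series):  # one tuple of replicate values per time point
        means.append(mean(column))
        sems.append(stdev(column) / math.sqrt(n) if n > 1 else 0.0)
    return means, sems


def to_csv(times, columns):
    """Render a time column plus named data columns as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time"] + list(columns))
    for i, t in enumerate(times):
        writer.writerow([t] + [columns[name][i] for name in columns])
    return buf.getvalue()
```

Normalising each replicate before averaging stops a single bright replicate from dominating the mean. That is one reason to keep normalised series alongside the raw data rather than computing only the raw average.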

As of February 2015, BioDare held over 41 million data points in 232,844 time series from 2,344 experiments. The 10 largest user labs were in the UK, USA, Chile and Sweden. The largest user lab by number of experiments works on circadian clocks in mouse cell and tissue cultures at the MRC Laboratory of Molecular Biology (LMB), Cambridge, UK. The largest user lab by number of time series is from the original ROBuST project and works on plant circadian clocks (see Flis et al. 2015).

Partial cost recovery started in 2014: heavy users of the data analysis functions pay an annual subscription. To encourage data sharing, users who release their BioDare data for public dissemination gain "analysis credits", which can fully cover their usage costs.

ID: 2067654