High Throughput Sequencing and the IT architecture (Part 1: Volume dimensioning and filesystems)

George Magklaras

Abstract


Improvements in DNA sequencing technology have reduced the cost and time of sequencing a new genome. The new generation of High Throughput Sequencing (HTS) devices has given a strong impetus to the life sciences, and genome sequencing is now a necessary first step in many complex research projects with direct implications for medical sequencing, cancer and pathogen vector genomics, epigenomics and metagenomics.


However, despite falling sequencing costs and turnaround times, maintaining a functional data repository for large-scale research projects carries other costs and difficulties. The new generation of HTS technologies [1] requires data storage capacity well beyond that of average local storage facilities [2]. In fact, the computing world has coined a term for this paradigm: data intensive computing [2a]. Data storage costs are falling; however, a study of the functional specifications of popular HTS equipment, such as Roche's 454 pyrosequencers [3], Illumina's hardware [4] and ABI SOLiD technology [5], suggests that a single high throughput experiment run creates several terabytes of information. If one takes into account that genome sequencing is often performed repeatedly in order to study genetic variation [6], the capacity of a suitable data archiving facility needs to scale to several petabytes, which is well beyond the scale of most group, departmental or university computing facilities.
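To make the scaling argument concrete, the Python sketch below multiplies per-run output by the number of runs, the retention period and the number of stored copies. The function name `archive_capacity_tb` and every figure in the usage line (3 TB per run, 100 runs per year, five years of retention, two copies) are hypothetical assumptions chosen for illustration, not measurements taken from the instruments cited above.

```python
"""Back-of-envelope sizing for an HTS data archive (hypothetical figures)."""

TB_PER_PB = 1000  # decimal (SI) units: 1 PB = 1000 TB


def archive_capacity_tb(tb_per_run: float, runs_per_year: int, years: int,
                        copies: int = 1) -> float:
    """Return the raw capacity in TB needed to keep every run.

    All parameters are illustrative assumptions, not vendor specifications.
    copies: number of stored replicas (e.g. 2 for a primary copy plus a backup).
    """
    if min(tb_per_run, runs_per_year, years, copies) < 0:
        raise ValueError("parameters must be non-negative")
    return tb_per_run * runs_per_year * years * copies


if __name__ == "__main__":
    # Hypothetical scenario: 3 TB per run, 100 runs a year, kept 5 years, 2 copies.
    total_tb = archive_capacity_tb(3, 100, 5, copies=2)
    print(f"{total_tb:.0f} TB = {total_tb / TB_PER_PB:.1f} PB")  # 3000 TB = 3.0 PB
```

Even with these modest assumed figures the total reaches the petabyte range, because the terms multiply: doubling the run rate or the retention period doubles the archive.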


Keywords


High Throughput Sequencing; IT architecture
