Next-generation sequencing has revolutionized genome research and marked the start of a new era. These technologies present us with unprecedented amounts of data, but these data come with errors that are not only platform-specific but also depend on the library preparation method and the type of sequencing (i.e., amplicon or metagenome). Illumina’s sequencing platforms are currently among the most widely used because they generate millions of reads at relatively low cost, yet Illumina error profiles are still poorly understood. A better knowledge of these error profiles is essential for sequence analysis and for drawing valid conclusions. It has been reported that the major source of errors on Illumina platforms is substitution-type miscalls. We developed a program that infers error profiles from sequencing data of mock communities. This allows us to study and compare the errors and biases introduced by different sequencing machines, library preparation methods, and types of sequencing. Here, we present the metagenome error profiles for a mock community that was sequenced on the Genome Analyzer II (GA II) using the standard Illumina library preparation method (TruSeq). Being able to infer error profiles for individual sequencing runs has the potential to greatly improve our ability to correct errors and thus enhance downstream sequence analysis.
sequencing error profiles; next-generation sequencing; Genome Analyzer II; TruSeq library preparation; mock community
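
To make concrete the idea of inferring a substitution error profile from mock-community reads, the following minimal Python sketch counts miscalls per read position. It is an illustration only, not the program described here: the function names `substitution_profile` and `error_rate_per_position` are hypothetical, and it assumes each read has already been aligned without gaps to its known reference sequence, so insertions and deletions are ignored.

```python
from collections import Counter

BASES = "ACGT"


def substitution_profile(aligned_pairs):
    """Count substitution miscalls per read position.

    aligned_pairs: iterable of (read, reference) strings of equal length,
    assumed to be pre-aligned without gaps (an assumption of this sketch).
    Returns (counts, coverage), where counts maps
    (position, reference_base, called_base) to the number of miscalls and
    coverage maps position to the number of bases compared there.
    """
    counts = Counter()
    coverage = Counter()
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        for pos, (called, true) in enumerate(zip(read.upper(), ref.upper())):
            if called not in BASES or true not in BASES:
                continue  # skip N and other ambiguous symbols
            coverage[pos] += 1
            if called != true:
                counts[(pos, true, called)] += 1
    return counts, coverage


def error_rate_per_position(counts, coverage):
    """Collapse the substitution counts into a per-position error rate."""
    errors = Counter()
    for (pos, _true, _called), n in counts.items():
        errors[pos] += n
    return {pos: errors[pos] / cov for pos, cov in sorted(coverage.items())}


if __name__ == "__main__":
    # Toy data: four reads compared against the same known reference.
    pairs = [
        ("ACGT", "ACGT"),
        ("ACTT", "ACGT"),  # G->T miscall at position 2
        ("ACGA", "ACGT"),  # T->A miscall at position 3
        ("NCGT", "ACGT"),  # ambiguous base at position 0 is skipped
    ]
    counts, coverage = substitution_profile(pairs)
    print(error_rate_per_position(counts, coverage))
```

Keeping the counts keyed by position, reference base, and called base keeps both the positional trend and the substitution type, so profiles from different runs or library preparations can be compared on either axis.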