A better sequence-read generator program for metagenomics

Stephen Eric Johnson, Brett Trost, Jeffrey R Long, Anthony Kusalik


Many programs are available for generating simulated metagenomic sequence reads. The data generated by these programs follow rigid models, which limits each program to the uses its authors originally intended. For example, many popular simulator programs generate only reads that follow uniform or normal distributions. To our knowledge, no existing program allows a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirical next-generation sequencing (NGS) data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine learning approach to generate reads with lengths and quality values mimicking empirically derived distributions. BEAR is able to emulate reads from various NGS platforms, including Illumina, 454 and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate internal parameter settings.
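
To illustrate what a non-parametric read-length distribution means in practice, the following is a minimal Python sketch that resamples read lengths directly from a set of observed lengths rather than from a fitted uniform or normal model. It is not BEAR's implementation: the abstract describes a machine learning approach whose details are not given here, and quality values are not modelled. The function names `build_length_sampler` and `simulate_reads`, and the toy genome and lengths in the usage lines, are hypothetical.

```python
import random
from collections import Counter


def build_length_sampler(observed_lengths, seed=None):
    """Return a function that draws read lengths from the empirical
    distribution of observed_lengths (no parametric model is fitted)."""
    counts = Counter(observed_lengths)
    lengths = sorted(counts)
    weights = [counts[n] for n in lengths]  # observed frequency of each length
    rng = random.Random(seed)

    def sample(k):
        # Weighted resampling with replacement from the observed lengths.
        return rng.choices(lengths, weights=weights, k=k)

    return sample


def simulate_reads(genome, sample_lengths, n_reads, seed=None):
    """Cut n_reads substrings from genome at uniformly random start
    positions, with lengths drawn from sample_lengths (an assumption made
    for illustration only; sequencing errors are not simulated)."""
    rng = random.Random(seed)
    reads = []
    for length in sample_lengths(n_reads):
        length = min(length, len(genome))  # a read cannot exceed the genome
        start = rng.randint(0, len(genome) - length)
        reads.append(genome[start:start + length])
    return reads


# Hypothetical usage with toy data.
observed = [50, 50, 75, 120]
sampler = build_length_sampler(observed, seed=1)
toy_genome = "ACGT" * 100
reads = simulate_reads(toy_genome, sampler, n_reads=20, seed=2)
```

Because lengths are resampled from the observations themselves, any shape present in the empirical data (skew, multiple modes, hard cut-offs) carries over to the simulated reads without choosing a distribution family.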

DOI: https://doi.org/10.14806/ej.19.A.634

