RSAT peak-motifs: fast extraction of transcription factor binding motifs from full-size ChIP-seq datasets

Morgane Thomas-Chollier, Matthieu Defrance, Olivier Sand, Carl Herrmann, Denis Thieffry, Jacques van Helden

Abstract




ChIP-seq has become a method of choice for studying the binding preferences of transcription factors and the genome-wide localization of epigenetic regulatory marks. Making sense of these data requires specialized software tools. While various programs have been developed for read mapping and peak calling, the subsequent steps are less mature: identifying the relevant transcription factor binding motifs and the precise locations of their binding sites remains a bottleneck. Most existing tools limit the size of the input sequences and typically restrict motif discovery to a few hundred peaks.


We present peak-motifs, a pipeline integrated in the Regulatory Sequence Analysis Tools (RSAT) [1]. It takes a set of peak sequences as input, discovers exceptional motifs, compares them with motif databases, predicts the positions of binding sites, and offers several visualization interfaces. The pipeline relies on tried-and-tested algorithms whose computing time increases linearly with sequence size, ensuring scalability to massive datasets of several tens of Mb. In addition to the website, peak-motifs can be used as a stand-alone application or through SOAP/WSDL web services.
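The linear scaling mentioned above can be illustrated with word counting, the kind of over-representation statistics on which RSAT's motif discovery relies. The following is a minimal, hypothetical Python sketch, not RSAT's code. It counts overlapping k-mers in a single pass over the sequences, then scores each word with a binomial P-value against a background model. As a simplifying assumption, the background uses independent nucleotide frequencies estimated from the input itself. The function names (`count_kmers`, `binom_upper_tail`, `word_overrepresentation`) and the E-value correction over all 4^k words are illustrative choices.

```python
import math
from collections import Counter

# Hypothetical sketch, not RSAT's implementation: single-pass k-mer counting
# plus a binomial over-representation score under a Bernoulli background.

def count_kmers(seqs, k):
    """Count overlapping k-mers (ACGT only) in one pass over each sequence.

    For fixed k, running time is linear in the total sequence length.
    Returns the word counts and the number of valid windows.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(c in "ACGT" for c in word):  # skip windows containing N, etc.
                counts[word] += 1
    return counts, sum(counts.values())


def log_binom_pmf(x, n, p):
    """Natural log of the Binomial(n, p) probability mass at x."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log1p(-p))


def binom_upper_tail(x, n, p, rel_tol=1e-12):
    """P(X >= x) for X ~ Binomial(n, p), summing pmf terms upward from x.

    Past the mean, terms shrink monotonically, so the sum stops once
    they become negligible relative to the accumulated total.
    """
    if x <= 0:
        return 1.0
    mean = n * p
    total = 0.0
    for i in range(x, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        if i > mean and term <= rel_tol * total:
            break
    return min(total, 1.0)


def word_overrepresentation(seqs, k=6):
    """Rank k-mers by significance = -log10(E-value) of their observed count.

    Assumption: background word probabilities are products of single-nucleotide
    frequencies estimated from the input; E-value = P-value * 4**k.
    """
    counts, n_windows = count_kmers(seqs, k)
    base_counts = Counter(c for s in seqs for c in s.upper() if c in "ACGT")
    total_bases = sum(base_counts.values())
    freqs = {b: base_counts[b] / total_bases for b in "ACGT"}
    n_tested = 4 ** k  # multiple-testing correction over all possible words
    results = []
    for word, obs in counts.items():
        p = math.prod(freqs[c] for c in word)  # expected per-window probability
        pval = binom_upper_tail(obs, n_windows, p)
        sig = -math.log10(max(pval * n_tested, 1e-300))
        results.append((word, obs, n_windows * p, sig))
    results.sort(key=lambda r: r[3], reverse=True)
    return results
```

On a list of peak sequences, `word_overrepresentation(seqs, k=6)[:10]` would return the ten highest-scoring hexanucleotides as (word, observed, expected, significance) tuples. Counting is the only step that visits every base, so its cost grows linearly with total sequence size. Scoring is done once per distinct word (at most 4^k). RSAT's own tools offer richer options, such as strand grouping and Markov background models, which this sketch omits.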


We assessed the performance of peak-motifs on several published datasets, and relevant motifs were found in all cases. For example, we discovered individual Oct and Sox motifs in the Sox2 and Oct4 peak collections, whereas the original study found only the composite Sox/Oct motif. For the generic transcriptional co-activator p300, profiled in heart and midbrain, peak-motifs identified motifs bound by tissue-specific transcription factors consistent with each of these two tissues.


In summary, peak-motifs supports time-efficient and statistically reliable analysis of complete ChIP-seq datasets through a user-friendly, well-documented web interface.


References
1. Thomas-Chollier, M., Defrance, M., Medina-Rivera, A., Sand, O., Herrmann, C., Thieffry, D. and van Helden, J. (2011). RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res 39, W86-91.
2. Thomas-Chollier, M., Herrmann, C., Defrance, M., Sand, O., Thieffry, D. and van Helden, J. (2011). RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res (accepted).


Relevant Web sites
3. http://rsat.ulb.ac.be/rsat/


Keywords


next-generation sequencing; COST; ChIP-seq; peak-motifs



DOI: http://dx.doi.org/10.14806/ej.17.B.266



This work is licensed under a Creative Commons Attribution 3.0 License.