HTSeq: Analysing high-throughput sequencing data with Python

HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays.

  • Please see the chapter A tour through HTSeq first for an overview of the kinds of analysis you can do with HTSeq and of the design of the package, and then look at the reference documentation.
  • While the main purpose of HTSeq is to allow you to write your own analysis scripts, customized to your needs, there are also a couple of stand-alone scripts for common tasks that can be used without any Python knowledge. See the Scripts section in the overview below for what is available.

Paper

HTSeq is described in the following publication (which is currently under review but already available as a preprint):

Simon Anders, Paul Theodor Pyl, Wolfgang Huber
HTSeq — A Python framework to work with high-throughput sequencing data
bioRxiv preprint (2014), doi: 10.1101/002824

If you use HTSeq in research, please cite this paper in your publication.

Documentation overview

  • Prerequisites and installation

    Download links and installation instructions can be found here.

  • A tour through HTSeq

    The Tour shows you how to get started. It explains how to install HTSeq, and then demonstrates typical analysis steps with explicit examples. Read this first, and then see the Reference for details.

  • A detailed use case: TSS plots

    This chapter illustrates typical usage patterns for HTSeq by walking through three different solutions to the same programming task in detail.

  • Counting reads

    This chapter explores in detail the use case of counting the overlap of reads with annotation features and explains how to implement custom logic by writing one's own counting scripts.

  • Reference documentation

    The various classes of HTSeq are described here.

  • Scripts

    The following scripts can be used without any Python knowledge.

    • Quality Assessment with htseq-qa

      Given a FASTQ or SAM file, this script produces a PDF file with plots depicting the base calls and base-call qualities by position in the read. This is useful to assess the technical quality of a sequencing run.

    • Counting reads in features with htseq-count

      Given a SAM file with alignments and a GFF file with genomic features, this script counts how many reads map to each feature.

  • Appendices
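
To give a feel for the read-counting task mentioned above, here is a minimal sketch in plain Python. It does not use HTSeq and is not how htseq-count is implemented; the toy data, the function name count_reads, the half-open interval convention, and the "no_feature" and "ambiguous" labels are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical toy annotation: (feature name, chromosome, start, end),
# with half-open intervals [start, end).
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 300, 400),
]

# Hypothetical aligned reads: (chromosome, start, end), same convention.
reads = [
    ("chr1", 120, 150),  # inside geneA
    ("chr1", 190, 310),  # touches both geneA and geneB
    ("chr1", 500, 530),  # touches no feature
    ("chr1", 350, 380),  # inside geneB
]

def count_reads(features, reads):
    """Count reads per feature; reads hitting zero or several features
    are tallied under the illustrative labels below."""
    counts = Counter({name: 0 for name, _, _, _ in features})
    for chrom, start, end in reads:
        # Collect every feature whose interval overlaps the read.
        hits = {
            name
            for name, f_chrom, f_start, f_end in features
            if f_chrom == chrom and start < f_end and f_start < end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts

counts = count_reads(features, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan over all features for every read is only workable for toy input; a real analysis would read the SAM and GFF files and use an indexed genomic data structure, as described in the Counting reads chapter.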

Author

HTSeq is developed by Simon Anders at EMBL Heidelberg (Genome Biology Unit). Please do not hesitate to contact me (anders at embl dot de) if you have any comments or questions.

License

HTSeq is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

The full text of the GNU General Public License, version 3, can be found here: http://www.gnu.org/licenses/gpl-3.0-standalone.html