Short read aligner comparison
From CSBLwiki
Latest revision as of 08:58, 7 July 2011
http://iga-rna.sourceforge.net/
http://samtools.sourceforge.net/swlist.shtml
http://bamview.sourceforge.net/
The pileup output format is explained in the manual page. Briefly, when you invoke pileup with the -c option, the columns are:
- reference sequence name
- reference coordinate
- reference base, or `*' for an indel line
- genotype, where heterozygotes are encoded in the IUB code: M=A/C, R=A/G, W=A/T, S=C/G, Y=C/T and K=G/T; indels are indicated by, for example, */+A, -A/* or +CC/-C. There is no difference between */+A and +A/*.
- Phred-scaled likelihood that the genotype is wrong, which is also called `consensus quality'.
- Phred-scaled likelihood that the genotype is identical to the reference, which is also called `SNP quality'. Suppose the reference base is A and in alignment we see 17 G and 3 A. We will get a low consensus quality because it is difficult to distinguish an A/G heterozygote from a G/G homozygote. We will get a high SNP quality, though, because the evidence of a SNP is very strong.
- root mean square (RMS) mapping quality
- # reads covering the position
- read bases at a SNP line (check the manual page for more information); the 1st indel allele otherwise
- base quality at a SNP line; the 2nd indel allele otherwise
- indel line only: # reads directly supporting the 1st indel allele
- indel line only: # reads directly supporting the 2nd indel allele
- indel line only: # reads supporting a third indel allele
If pileup is invoked without `-c', indel lines and columns between 3 and 7 inclusive will not be output.
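The column list above can be turned into a small parser. The following is a minimal Python sketch, not part of samtools: it assumes the `pileup -c` output is tab-separated with the columns in the order listed, and the names `PileupSite`, `parse_pileup_line`, `describe_genotype` and `phred_to_probability` are made up for illustration. The sample line is invented, not real output.

```python
from typing import NamedTuple, Tuple

# Heterozygote codes exactly as listed in the column description above.
IUB_HET = {"M": "A/C", "R": "A/G", "W": "A/T", "S": "C/G", "Y": "C/T", "K": "G/T"}


class PileupSite(NamedTuple):
    ref_name: str
    position: int
    ref_base: str             # '*' on an indel line
    genotype: str
    consensus_quality: int    # Phred-scaled
    snp_quality: int          # Phred-scaled
    rms_mapping_quality: int
    depth: int
    bases_or_allele1: str     # read bases (SNP line) or 1st indel allele
    quals_or_allele2: str     # base qualities (SNP line) or 2nd indel allele
    indel_support: Tuple[int, ...]  # empty on SNP lines


def parse_pileup_line(line: str) -> PileupSite:
    """Split one line into named fields (assumes tab-separated columns)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 10:
        raise ValueError("expected at least 10 columns, got %d" % len(f))
    return PileupSite(
        f[0], int(f[1]), f[2], f[3],
        int(f[4]), int(f[5]), int(f[6]), int(f[7]),
        f[8], f[9],
        tuple(int(x) for x in f[10:13]),
    )


def describe_genotype(code: str) -> str:
    """Expand an IUB heterozygote code; other genotypes are returned unchanged."""
    return IUB_HET.get(code.upper(), code)


def phred_to_probability(q: float) -> float:
    """Convert a Phred-scaled value Q to a probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    # Invented example line for illustration only.
    example = "\t".join(["chr1", "100", "A", "R", "30", "45", "60", "4", "G.G,", "IIII"])
    site = parse_pileup_line(example)
    print(site.ref_name, site.position, describe_genotype(site.genotype))
    print("P(genotype wrong) ~", phred_to_probability(site.consensus_quality))
```

Keeping the last three counts in a tuple lets the same record type serve both SNP lines (empty tuple) and indel lines.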
$ fastq_quality_filter -h
usage: fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]

version 0.0.6
   [-h]         = This helpful help screen.
   [-q N]       = Minimum quality score to keep.
   [-p N]       = Minimum percent of bases that must have [-q] quality.
   [-z]         = Compress output with GZIP.
   [-i INFILE]  = FASTA/Q input file. default is STDIN.
   [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
   [-v]         = Verbose - report number of sequences.
                  If [-o] is specified, report will be printed to STDOUT.
                  If [-o] is not specified (and output goes to STDOUT),
                  report will be printed to STDERR.
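To make the interaction of -q and -p concrete, here is a rough Python sketch of the rule as the help text describes it: a read is kept when at least -p percent of its bases have quality -q or higher. This is one reading of the help screen, not the tool's actual source code. The function name `passes_quality_filter` is hypothetical, and the ASCII offset used to decode quality characters (`phred_offset`, default 33 here) is an assumption that depends on how the FASTQ file was encoded.

```python
def passes_quality_filter(quality_string, min_quality, min_percent, phred_offset=33):
    """Return True if at least min_percent of bases have quality >= min_quality.

    quality_string: the quality line of one FASTQ record.
    phred_offset:   ASCII offset of the encoding (an assumption; 33 and 64 are both in use).
    """
    if not quality_string:
        return False  # choice made for this sketch: an empty read never passes
    scores = [ord(ch) - phred_offset for ch in quality_string]
    good = sum(1 for s in scores if s >= min_quality)
    # Integer comparison avoids floating-point rounding at the threshold.
    return good * 100 >= min_percent * len(scores)


if __name__ == "__main__":
    # 'I' decodes to 40 and '!' to 0 with offset 33.
    print(passes_quality_filter("IIII", 20, 100))
    print(passes_quality_filter("II!!", 20, 80))
```

With the tool itself, a comparable run using only the options shown in the help screen would look like `fastq_quality_filter -q 20 -p 80 -i in.fastq -o out.fastq -v`, where the file names are placeholders.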