Short read aligner comparison
From CSBLwiki
Revision as of 08:47, 6 July 2011
SAM protocol: http://sourceforge.net/apps/mediawiki/samtools/index.php?title=SAM_protocol#Support_Protocol_1:_Base_Quality_Recalibration
http://iga-rna.sourceforge.net/
http://samtools.sourceforge.net/swlist.shtml
http://bamview.sourceforge.net/
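The pileup output format is explained in the samtools manual page (http://samtools.sourceforge.net/samtools.shtml). Briefly, when pileup is invoked with the -c option, each output line has the following columns: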
1. reference sequence name
2. reference coordinate
3. reference base, or '*' for an indel line
4. genotype, where heterozygotes are encoded in the IUB code: M=A/C, R=A/G, W=A/T, S=C/G, Y=C/T and K=G/T. Indels are written as, for example, */+A, -A/* or +CC/-C; */+A and +A/* are equivalent.
5. Phred-scaled likelihood that the genotype is wrong, also called 'consensus quality'
6. Phred-scaled likelihood that the genotype is identical to the reference, also called 'SNP quality'. Suppose the reference base is A and the alignment shows 17 G and 3 A. The consensus quality will be low, because it is difficult to distinguish an A/G heterozygote from a G/G homozygote. The SNP quality will be high, though, because the evidence for a SNP is very strong.
7. root mean square (RMS) mapping quality of the reads covering the position
8. number of reads covering the position
9. read bases on a SNP line (see the manual page for details); the first indel allele on an indel line
10. base qualities on a SNP line; the second indel allele on an indel line
11. indel line only: number of reads directly supporting the first indel allele
12. indel line only: number of reads directly supporting the second indel allele
13. indel line only: number of reads supporting a third indel allele
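Columns 4 to 7 above can be interpreted with a few small helpers. The Python sketch below decodes the IUB heterozygote codes listed for the genotype column, converts a Phred-scaled value Q back to a probability with p = 10^(-Q/10), and shows what an RMS mapping quality is. The function names (decode_genotype, phred_to_prob, rms_mapping_quality) are hypothetical illustrations, not part of samtools; how samtools itself computes its qualities is not reproduced here.

```python
from __future__ import annotations

import math

# Heterozygote codes listed for column 4 (IUB code); illustrative table only.
IUB_HET = {
    "M": ("A", "C"), "R": ("A", "G"), "W": ("A", "T"),
    "S": ("C", "G"), "Y": ("C", "T"), "K": ("G", "T"),
}


def decode_genotype(code: str) -> tuple[str, str]:
    """Hypothetical helper: return the two alleles of a genotype call."""
    if "/" in code:
        # Indel genotypes such as '*/+A' or '+CC/-C' list both alleles.
        first, second = code.split("/", 1)
        return (first, second)
    code = code.upper()
    if code in IUB_HET:
        return IUB_HET[code]
    if code in {"A", "C", "G", "T"}:
        # A plain base denotes a homozygote.
        return (code, code)
    raise ValueError(f"unrecognised genotype code: {code!r}")


def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled value Q to the probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def rms_mapping_quality(mapqs: list[int]) -> float:
    """Root mean square of per-read mapping qualities (as in column 7)."""
    if not mapqs:
        raise ValueError("no reads")
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))
```

For example, a consensus quality of 20 corresponds to a 1% chance that the genotype call is wrong. Because the RMS squares each value before averaging, it weights high mapping qualities more heavily than a plain mean would.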
If pileup is invoked without -c, indel lines are not output, and columns 4 to 7 (genotype, consensus quality, SNP quality and RMS mapping quality) are omitted.
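A minimal Python sketch for splitting one tab-separated pileup line into named fields is shown below. It assumes, following the samtools manual, that the output without -c has six columns (name, coordinate, reference base, depth, read bases, base qualities), and that with -c the columns are exactly the 13 listed above (only the first 10 on SNP lines). The PileupSite class and parse_pileup_line function are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PileupSite:
    """Hypothetical container for one pileup output line."""
    ref_name: str
    position: int            # reference coordinate (column 2)
    ref_base: str            # '*' marks an indel line
    depth: int               # number of reads covering the position
    read_bases: str          # first indel allele on an indel line
    base_quals: str          # second indel allele on an indel line
    genotype: Optional[str] = None        # columns 4-7 exist only with -c
    consensus_qual: Optional[int] = None
    snp_qual: Optional[int] = None
    rms_mapq: Optional[int] = None
    indel_support: Optional[tuple] = None  # columns 11-13, indel lines only


def parse_pileup_line(line: str, consensus: bool) -> PileupSite:
    """Split one tab-separated line; `consensus` says whether -c was used."""
    cols = line.rstrip("\n").split("\t")
    if not consensus:
        # Assumed plain layout: name, coordinate, ref base, depth, bases, quals.
        name, pos, ref, depth, bases, quals = cols[:6]
        return PileupSite(name, int(pos), ref, int(depth), bases, quals)
    name, pos, ref, gt, cq, sq, mq, depth, col9, col10 = cols[:10]
    site = PileupSite(name, int(pos), ref, int(depth), col9, col10,
                      gt, int(cq), int(sq), int(mq))
    if ref == "*":
        # Read counts supporting the first, second and a third indel allele.
        site.indel_support = tuple(int(x) for x in cols[10:13])
    return site
```

Passing the -c flag explicitly, rather than guessing the layout from the column count, keeps the sketch simple; real output from other samtools versions may carry extra columns, which this sketch ignores.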