Genome1

From CSBLwiki


Revision as of 11:11, 21 June 2011


Methods & Procedures

GC skew

theory
*Wrote Python code (gc_skew.py)
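GC skew is commonly computed per window as (G - C)/(G + C). In many bacterial genomes the cumulative skew reaches its minimum near the replication origin and its maximum near the terminus (the Mackiewicz et al. reference discusses the caveats). The sketch below is not the lab's gc_skew.py. The function names and the default 1,000 bp non-overlapping window are assumptions.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Return (window_start, skew) pairs with skew = (G - C) / (G + C).

    Windows are non-overlapping; a window with no G or C gets skew 0.0.
    """
    seq = seq.upper()
    windows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        windows.append((start, (g - c) / (g + c) if g + c else 0.0))
    return windows


def cumulative_skew_extremes(seq, window=1000):
    """Return the window starts of the cumulative-skew minimum (putative origin)
    and maximum (putative terminus)."""
    windows = gc_skew(seq, window)
    cumulative = list(accumulate(skew for _, skew in windows))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return windows[i_min][0], windows[i_max][0]


if __name__ == "__main__":
    toy = "CCCC" + "GGGG"  # toy sequence, not real data
    print(gc_skew(toy, window=4))            # [(0, -1.0), (4, 1.0)]
    print(cumulative_skew_extremes(toy, 4))  # (0, 4)
```

Skew is computed along the strand given in the FASTA file. Smaller windows give a noisier per-window curve, but the cumulative curve usually still shows the origin and terminus clearly.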

Primer

EP Hamilton, Use of HAPPY mapping for the higher order assembly of the Tetrahymena genome, Elsevier, 2006 :
*To confirm HAPPY links directly by PCR amplification, primers were designed in unique regions of scaffold sequence nearest to the linked ends, using the Primer3 program.
Samuel Assefa, ABACAS: algorithm-based automatic contiguation of assembled sequences, Bioinformatics 2009 25(15):1968-1969; doi:10.1093/bioinformatics/btp347 :
*ABACAS automatically extracts gaps on the pseudomolecule and, based on flanking sequences above a base quality threshold, designs primers for gap closure using Primer3
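ABACAS takes its primer templates from the sequence flanking each gap. The following is a rough sketch of that first step only, not ABACAS's code. It finds runs of N in a pseudomolecule and returns the left and right flanks. The 200 bp flank length and the function name are assumptions, and base-quality filtering is omitted.

```python
import re


def gap_flanks(seq, flank=200):
    """Yield (gap_start, gap_end, left_flank, right_flank) for each run of N/n.

    Coordinates are 0-based, end-exclusive. Flanks are truncated at the
    sequence ends and may themselves contain Ns if gaps are close together.
    """
    for match in re.finditer(r"[Nn]+", seq):
        start, end = match.start(), match.end()
        left = seq[max(0, start - flank):start]
        right = seq[end:end + flank]
        yield start, end, left, right


if __name__ == "__main__":
    for gap in gap_flanks("ACGTNNNNTTGA", flank=2):
        print(gap)  # (4, 8, 'GT', 'TT')
```

For gap closure, the forward primer would come from the left flank and the reverse primer from the right flank. One way to get this from Primer3 is to pass left flank + gap + right flank as SEQUENCE_TEMPLATE and mark the gap region with SEQUENCE_TARGET, so that primers are placed on either side of it.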

Finishing

Sequence Finishing and Gene Mapping for Candida albicans Chromosome 7 and Syntenic Analysis Against the Saccharomyces cerevisiae Genome
DNA amplification for gap closing: PCR with each primer pair (shown in supplementary data at http://www.genetics.org/supplemental/)
was carried out with Ready-To-Go PCR beads (Amersham Biosciences) using genomic DNA of C. albicans SC5314 as
a template DNA. PCR was carried out using a hotstart of 3 min at 94° followed by 35 cycles of 94° for 10 sec,
50° for 10 sec, and 68° for 1 min, concluding with 68° for 10 min. Long PCR was carried out with LA PCR kit ver.2.1 (Takara, Tokyo).
Conditions used were a hotstart of 3 min at 94° followed by 35 cycles of 98° for 10 sec and 68° for 20 min,
concluding with a final extension of 72° for 10 min. Genomic DNA from C. albicans strain SC5314
(Fonzi and Irwin 1993) was used for all sequence analysis in this work.
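Writing the two cycling programs above as data makes it easy to compare their nominal run times. This is a hypothetical helper and is not part of the cited paper. It counts hold times only and ignores ramping between temperatures, so a real thermocycler run will take longer.

```python
def program_seconds(initial, cycle_steps, cycles, final):
    """Nominal run time in seconds: initial hold + cycles * (sum of cycle holds) + final hold.

    Every step is a (temperature_celsius, seconds) tuple; ramp times are ignored.
    """
    return initial[1] + cycles * sum(seconds for _, seconds in cycle_steps) + final[1]


# Standard PCR: 94 C 3 min; 35 x (94 C 10 s, 50 C 10 s, 68 C 1 min); 68 C 10 min
standard = program_seconds((94, 180), [(94, 10), (50, 10), (68, 60)], 35, (68, 600))
# Long PCR: 94 C 3 min; 35 x (98 C 10 s, 68 C 20 min); 72 C 10 min
long_pcr = program_seconds((94, 180), [(98, 10), (68, 1200)], 35, (72, 600))

if __name__ == "__main__":
    print(standard / 60)    # about 59.7 minutes
    print(long_pcr / 3600)  # about 12.0 hours
```

The 20 min extension step dominates the long PCR program. The 35 cycles of that step alone account for about 11.7 hours.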
Complete Genome Sequence of Staphylococcus lugdunensis Strain HKU09-01 
Briefly, gap closures were performed by genomic PCR followed by DNA sequencing of amplification products 
on an ABI 3130xl sequencer (Applied Biosystems, CA). The finished sequence was validated by genome macrorestriction 
analysis using multiple rare-cutting enzymes and visualization by pulsed-field gel electrophoresis.
CBCB Finishing Toolbox
Finishing procedures with Dupfinisher
Here is the LANL finishing procedure involving Dupfinisher: 
1) run Dupfinisher on the assembly ace file;
2) put the artificial reads generated by Dupfinisher into the main project;
3) assemble with parallel Phrap;
4) repeat steps 1-3 with new ace file;
5) run Consed autoFinish on the main project and do only primer walks from the main project and those from subprojects of unfinished repeats;
6) repeat step 4;
7) run autoFinish using primer walks for the main project and those from subprojects of unfinished repeats and use PCR to close gaps between scaffolds in main project;
8) repeat step 4;
9) perform manual finishing including closing gaps, resolving low quality and single clone coverage regions and checking repeat resolutions from Dupfinisher.

Cliff S. Han, Patrick Chain, Finishing Repetitive Regions Automatically with Dupfinisher
[1]
Illumina reads -> EULER-SR: 4233 contigs
+
454 reads
-> newbler : 270 hybrid contigs
+
paired 454 reads
-> newbler's scaffolder : 3 contigs (A:3.18 Mb, B:5.7 kb and C:524 kb) |  (unscaffolded contigs -> utilized later in the final Finishing phase)

[2]
+
(Hybrid EULER-SR/VELVET contigs, unscaffolded contigs -> nucmer) & (Illumina reads -> mosaik aligner)
->finishing by filling in the Ns (degenerate nucleotides) left by the scaffolder (We developed a Scaffold Bridging and Finishing phase for the purpose of linking the de novo scaffolds and for resolving the intra-scaffold degenerate nucleotide positions that were introduced by the scaffolder)
-> identified potential repeats/duplications by examining the read coverage and also the multiplicity of the vertices in the repeat graph that is part of EULER-SR's output
->scaffold B was indeed duplicated and a BLAST [11] search identified it as an rRNA gene
=> A, B, B, C
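Step [2] flags scaffold B as duplicated partly from read coverage. One simple way to express that reasoning is to divide each contig's mean coverage by a genome-wide baseline, here the median over contigs, assuming most of the genome is single copy. This is not necessarily what the authors did, since they also used repeat-graph multiplicity. The function name and the coverage values below are made up for illustration.

```python
from statistics import median


def copy_number_estimates(contig_coverage):
    """Map contig name -> estimated copy number (rounded coverage ratio).

    The baseline is the unweighted median of contig mean coverages; a
    length-weighted baseline would be more robust when many contigs are short.
    """
    baseline = median(contig_coverage.values())
    return {name: round(cov / baseline) for name, cov in contig_coverage.items()}


if __name__ == "__main__":
    # Hypothetical mean coverages, not values from the paper
    coverage = {"A": 50.0, "B": 101.0, "C": 49.0}
    print(copy_number_estimates(coverage))  # {'A': 1, 'B': 2, 'C': 1}
```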

[3]
-> ordering with PCR

[4] 
-> correct indel errors: mosaik aligner with Illumina reads

figure
Harish Nagarajan et al, De Novo Assembly of the Complete Genome of an Enhanced Electricity-Producing Variant of Geobacter sulfurreducens Using Only Short Reads
Zhou Yu, Tao Li, Jindong Zhao and Jingchu Luo, PGAAS: a prokaryotic genome assembly assistant system
=> Same principle as ABBA

rRNA

The positions of rRNA operons in the genome assembly were confirmed by long-range PCR amplification using primers that annealed to genes flanking the rRNA genes. These PCR fragments were sequenced to high redundancy and the consensus sequences were manually inserted into the assembly. Among the seven rRNA operons, the nucleotide sequences of 16S and 23S genes are at least 99% identical, differing by only one to three nucleotides in pairwise comparisons. (From: Complete genome sequence of the industrial bacterium Bacillus licheniformis and comparisons with closely related Bacillus species)
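To put the quoted figures in perspective: for already aligned, equal-length copies, percent identity is matches / length x 100. Three differences in a 16S gene of roughly 1,500 bp therefore still give about 99.8% identity. A minimal sketch, limited to gapless comparison (real operon copies should be aligned first); the function name is an assumption:

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences (case-insensitive)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    ref = "A" * 1500           # toy 1,500 bp sequence
    copy = "C" * 3 + ref[3:]   # same sequence with 3 substitutions
    print(percent_identity(ref, copy))  # 99.8
```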

SEQanswers



Reads Library

454 SE

454 PE

Solexa illumina

http://en.wikipedia.org/wiki/FASTQ_format

Relationship between Q and p using the Sanger (red) and Solexa (black) equations (described in the Wikipedia FASTQ format article). The vertical dotted line indicates p = 0.05, or equivalently, Q ≈ 13. (http://en.wikipedia.org/wiki/FASTQ_format)


p = 0.01, Q = 20
p = 0.001, Q = 30
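The two curves in the figure come from the Sanger (Phred) definition Q = -10 log10(p) and the older Solexa definition Q = -10 log10(p / (1 - p)). They agree closely at high quality and diverge at low quality. Below is a minimal Python sketch of the conversions, plus decoding a FASTQ quality string. The ASCII offset (33 for Sanger and Illumina 1.8+, 64 for Solexa and Illumina 1.3-1.7) must match the file and is an assumption here.

```python
import math


def phred_from_p(p):
    """Sanger/Phred quality: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def p_from_phred(q):
    """Inverse of phred_from_p: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def solexa_from_p(p):
    """Solexa quality: Q = -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p / (1 - p))


def phred_from_solexa(q_solexa):
    """Convert a Solexa score to a Phred score: 10 * log10(10 ** (Qs / 10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_qualities(qual_string, offset=33):
    """Turn a FASTQ quality string into integer scores (offset 33 or 64)."""
    return [ord(ch) - offset for ch in qual_string]


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(p, round(phred_from_p(p), 2), round(solexa_from_p(p), 2))
    print(decode_qualities("I5+", offset=33))  # [40, 20, 10]
```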
[Figure: Quality stats.png]

According to Andrew D Smith et al., Using quality scores and longer reads improves accuracy of Solexa read mapping, BMC Bioinformatics, mapping works better with more allowed mismatches (4), a higher quality cutoff (8), and longer reads. The authors also provide mapping software.

set1: original
set2: "." -> "N"
set3: divided into 4 files
set4: divided into 4 files, "." -> "N"
set5: original -> low-quality bases replaced with 'N' using fastq_masker from the fastx toolkit, with a quality threshold of 10
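Sets 2 and 5 can be approximated in plain Python for a single read. Set2 only replaces '.' calls with 'N', while set5 masks every base whose quality is below 10. fastq_masker itself is the FASTX-Toolkit program, and this sketch only mimics the idea. The function names are hypothetical. The default offset of 64 assumes the older Illumina quality encoding; use 33 for Sanger-encoded files.

```python
def dots_to_n(seq):
    """set2-style conversion: replace '.' (no-call) with 'N'."""
    return seq.replace(".", "N")


def mask_low_quality(seq, qual, threshold=10, offset=64, mask_char="N"):
    """set5-style masking: replace bases whose quality is below `threshold`.

    `qual` is the FASTQ quality string for `seq`; both must be the same length.
    A base with quality exactly equal to `threshold` is kept.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(
        mask_char if ord(q) - offset < threshold else base
        for base, q in zip(seq, qual)
    )


if __name__ == "__main__":
    print(dots_to_n("AC.GT"))                           # ACNGT
    print(mask_low_quality("ACGT", "hhBh"))             # ACNT ('B' is Q2 at offset 64)
    print(mask_low_quality("ACGT", "II#I", offset=33))  # ACNT ('#' is Q2 at offset 33)
```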

Mapping tools: gsMapper (Newbler), mosaik aligner

Software

Software Version Input Output Location(machine/folder)
Newbler 2.3(091027_1459) panflam,panpyro
Phrap 0.990329(Phrap0.990329_patch) panflam
Phrap 1.090518 panflam
Consed 090206 panflam
CABOG(celera) 6.1 sanger, 454(.sff), illumina(fastq), fastq CABOG_output panflam,panpyro
maq 0.7.1 ref:fasta, read:illumina, long read(not good) panflam,panpyro
abyss 1.2.0 454, illumina panflam
SOAPdenovo 1.04 illumina panflam
Corrector(soap package) 1.00 fasta,fastq panflam
GapCloser(soap package) 1.10 fasta,fastq panflam
MIRA sanger,454,illumina
gapResolution newbler results fasta,qual
Dupfinisher ace file
AutoEditor 1.20 .contig(TIGR)
rnammer 1.2 fasta gff2 panflam
hmmer 2.3.2(for rnammer), 3 panflam,panpyro
tRNAscan-SE 1.23 panflam,panpyro
BlastViewer panflam
M-GCAT panflam,panpyro


Galaxy web page (NGS tools)

fastx-toolkit

manuals

Introduction to Newbler (ppt): on the bulletin board

consed manual

about fake reads

phrap_input

phrap_input_v1.090518

phrap diff

phrap_v1.090518_shortread

create mate file from illumina for bambus

A very useful blog about Newbler

How to use phrap

Handling 454 SFF files

Useful CABOG options

Taxonomy

NCBI

   cellular organisms; Bacteria; Firmicutes; Clostridia; Clostridiales; Eubacteriaceae; Eubacterium


References

Pawel Mackiewicz, Where does bacterial replication start? Rules for predicting the oriC region, Nucleic Acids Research 2004 32(13):3781-3791
