Genome assembly2
From CSBLwiki
Latest revision as of 09:03, 29 July 2011
Lactobacillus genus
Circular view
[Figures: NC 015214.png, NC 015213.png, NC 015218.png]
copy # and position of rRNA operon
- copy # : 4
- position
- 2h, 3h, 5t, 7h, 9h, 11h, 13h
Terminus
- scf13 : oriC, 155nt, 56722..56876 nt
- scf10_4
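The oriC entry above gives both a length (155 nt) and a coordinate range (56722..56876). A minimal sketch of how the two relate, assuming the range is 1-based and inclusive (the usual GenBank-style convention; the page does not say so explicitly). The function names are hypothetical.

```python
def interval_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval such as 56722..56876."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def extract(seq: str, start: int, end: int) -> str:
    """Return seq[start..end] for 1-based, inclusive coordinates."""
    if start < 1 or end < start or end > len(seq):
        raise ValueError("interval lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


# Coordinates of the oriC region on scf13, as listed above.
ORIC_START, ORIC_END = 56722, 56876
print(interval_length(ORIC_START, ORIC_END))
```

Under that assumption the range and the stated 155 nt agree; `extract` would pull the region out of the scf13 sequence once it is loaded as a string.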
rRNA operon of Other species
Name | rRNA operon copies | Total length (bp) | Transposases |
Lactobacillus_acidophilus_NCFM | 4 | 2M | 40 |
Lactobacillus_brevis_ATCC_367 | 5 | 2.2M | |
Lactobacillus_casei | 5 | 3M | |
Lactobacillus_casei_ATCC_334 | 5 | 2.9M | |
Lactobacillus_casei_Zhang_uid50673 | 5 | 2.8M | |
Lactobacillus_crispatus_ST1_uid48359 | 4 | 2M | |
Lactobacillus_delbrueckii_bulgaricus | 9 | 1.8M | |
Lactobacillus_delbrueckii_bulgaricus_ATCC_BAA-365 | 9 | 1.8M | |
Lactobacillus_fermentum_IFO_3956 | 5 | 2.1M | |
Lactobacillus_gasseri_ATCC_33323 | 6 | 1.9M | |
Lactobacillus_helveticus_DPC_4571 | 4 | 2.1M | 260 |
Lactobacillus_johnsonii_FI9785 | 4 | 1.8M | |
Lactobacillus_johnsonii_NCC_533 | 6 | 2M | |
Lactobacillus_plantarum | 5 | 3.3M | |
Lactobacillus_plantarum_JDM1 | 5 | 3.2M | |
Lactobacillus_reuteri_DSM_20016 | 6 | 2M | |
Lactobacillus_reuteri_F275_Kitasato | 6 | 2M | |
Lactobacillus_rhamnosus_GG | 5 | 3M | |
Lactobacillus_rhamnosus_Lc_705 | 5 | 3M | |
Lactobacillus_sakei_23K | 7 | 1.9M | |
Lactobacillus_salivarius_UCC118 | 7 | 1.8M |
Read
Platform | Read Type | Total Reads | Number of Bases Used | Number of Reads Used | Percent Reads Assembled | Percent Bases Assembled |
Solexa/Illumina | SE | 5359073 | - | - | - | - |
Fake Reads (FR) (Solexa/Illumina) | SE | 8390 | 11964248 | 8243 | 98.25 | 95.15 |
FR (CABOG) | SE | 7020 | 10529913 | 7017 | 99.96 | 99.93 |
Roche 454 | PE | 158188 (270784) | | | | |
Roche 454 | PE | 235924 (364291) | | | | |
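The two fake-read rows can be cross-checked with simple arithmetic. The sketch below assumes that 8243 and 7017 are counts of assembled reads and that 11964248 and 10529913 are base counts; that reading is an assumption, chosen because it reproduces the reported "Percent Reads Assembled" values. The helper names are hypothetical.

```python
# Values copied from the two fake-read (FR) rows of the Read table.
# Order per row: (total reads, bases, reads assembled).
# Interpreting the columns this way is an assumption made for this check.
FR_ROWS = {
    "FR (Solexa/Illumina)": (8390, 11964248, 8243),
    "FR (CABOG)": (7020, 10529913, 7017),
}


def percent(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Average read length in bases."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return total_bases / total_reads


for label, (total_reads, bases, assembled) in FR_ROWS.items():
    print(label,
          percent(assembled, total_reads),
          round(mean_read_length(bases, total_reads)))
```

With this reading the percentages come out as 98.25 and 99.96, matching the table, and the fake reads average roughly 1.4 to 1.5 kb each.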
- The Solexa and 454 reads were derived from different strains.
- Therefore only the 454 reads are used.
Assembly flow
- Illumina -> ABySS: 4336 contigs, 4 Mb in total; very bad -> converted to fake reads
- fake reads (ABySS, Illumina) + 454 reads -> Newbler: 443 scaffolds; very bad => rejected
- 454 only -> Newbler, gapResolution -> 15 scaffolds, 21 contigs -> PCR running now
- 454 only -> CABOG -> compared (mapped) to the Newbler assembly with nucmer and mummerplot -> some scaffolds disagree between the two assemblers
- CABOG can assemble the rRNA operon
- a plasmid was found
- gap 10_2-10_3: filled by a CABOG contig
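The page does not say how the fake reads were produced from the ABySS (or CABOG) contigs. One common approach is to shred each contig into overlapping fixed-length pieces and hand those to the second assembler as pseudo-reads. The sketch below illustrates that idea only; the function name, the 1500-base read length, and the 500-base step are hypothetical choices, not values taken from this project.

```python
from typing import Iterator, Tuple


def shred_contig(name: str, seq: str,
                 read_len: int = 1500,
                 step: int = 500) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, read_seq) pieces that tile one contig with overlap.

    read_len and step are illustrative defaults (assumptions).
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(seq) <= read_len:
        # Short contigs are emitted whole as a single fake read.
        yield f"{name}_fr1", seq
        return
    last_start = len(seq) - read_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add one final read so the contig end is always covered.
        starts.append(last_start)
    for i, start in enumerate(starts, 1):
        yield f"{name}_fr{i}", seq[start:start + read_len]


# Toy example with a short sequence and small window sizes.
for read_name, read_seq in shred_contig("ctg1", "ABCDEFGHIJK", read_len=4, step=3):
    print(f">{read_name}\n{read_seq}")
```

A step smaller than the read length gives every contig position more than one fake read, which is what lets the downstream assembler treat the pieces as overlapping evidence.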
Assembly Result
Assembler | Contig Type | Number of Contigs | Total Bases (bp) | Note |
velvet | contigs | 14702 | 2674691 | |
ABySS | Large Contig (>500bp) | 4336 | 4121677 | very bad |
Newbler(FR(Sol/Ill) + 454) | Scaffolds | 443 | 4585959 | very bad, reject |
Newbler(454 only), gapResolution | Scaffolds | 15 | 2058137 | Running PCR |
Newbler(454 only), gapResolution | Contigs | 21 | 2053877 | Running PCR |
CABOG(454 only) | Scaffolds (>500bp) | 13 | 2120461 | |
CABOG(454 only) | Contigs (>500bp) | 26 | 2119042 | |
Newbler(FR(CABOG) +454) | Scaffolds | 7 | 2051269 | |
Newbler(FR(CABOG) +454) | Contigs | 59 | 2012620 |
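The "Number of Contigs" and "Total bases" columns can be recomputed from any assembler's FASTA output. A minimal sketch, assuming plain FASTA input and a strict length cutoff like the ">500bp" used in some rows; the function names are hypothetical, and N50 is an extra statistic that the table above does not report.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into a {name: sequence} dictionary."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = fields[0] if fields else f"seq{len(chunks) + 1}"
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def assembly_stats(lengths: Iterable[int], min_len: int = 0) -> Dict[str, int]:
    """Count, total bases and N50 of sequences strictly longer than min_len."""
    kept = sorted((n for n in lengths if n > min_len), reverse=True)
    total = sum(kept)
    running = 0
    n50 = 0
    for n in kept:
        running += n
        if running * 2 >= total:
            # First length at which half of the total bases is reached.
            n50 = n
            break
    return {"count": len(kept), "total_bases": total, "n50": n50}


# Toy input; a real run would pass an open FASTA file instead.
example = ">ctg1\nACGTACGTAC\n>ctg2\nACGT\nACGT\n>ctg3\nAC\n".splitlines()
records = parse_fasta(example)
print(assembly_stats(len(s) for s in records.values()))
```

Calling `assembly_stats(..., min_len=500)` would mimic the ">500bp" filter shown for the ABySS and CABOG rows.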
- One chromosome and two plasmids were assembled
- Sizes: 1986681 bp, 7197 bp, and 12568 bp
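Links
Scripts
- Various Perl Scripts (http://www.genome.ou.edu/informatics.html)
Etc
- NCBI Genomes (http://www.ncbi.nlm.nih.gov/Genomes/)
- Lactobacillus delbrueckii bulgaricus by Genoscope (http://www.genoscope.cns.fr/spip/Lactobacillus-bulgaricus-whole.html)
- Tech Summary: Illumina's Solexa Sequencing Technology (http://seqanswers.com/forums/showthread.php?t=21)
- Case studies:
- Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000186)
- SR assembly Tutorial (http://www.cbcb.umd.edu/research/SR-assembly-tutorial.shtml)
- High-Precision, Whole-Genome Sequencing of Laboratory Strains Facilitates Genetic Studies (http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000139)
- De novo bacterial genome sequencing: Millions of very short reads ... (http://www.genome.org/cgi/content/full/18/5/802)
- Short read fragment assembly of bacterial genomes... (http://genome.cshlp.org/content/18/2/324.full)
- De novo fragment assembly with short mate-paired reads: Does the read length matter? (http://dx.doi.org/10.1101/gr.079053.108) (PDF)
- Solexa format & Fastq format:
- sequence quality in solexa format (http://seqanswers.com/forums/showthread.php?t=330)
- Benchmark papers:
- A Draft Genome Sequence of Pseudomonas syringae pv. tomato T1 Reveals a Type III Effector Repertoire Significantly Divergent from That of Pseudomonas syringae pv. tomato DC3000, MPMI (2008) (PDF)
- High-throughput sequencing provides insights into genome variation and evolution in Salmonella Typhi, Nature Genetics (2008) (http://www.nature.com/ng/journal/v40/n8/full/ng.195.html)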