Create mate file from Illumina for Bambus

From CSBLwiki

Latest revision as of 05:59, 13 July 2010

http://seqanswers.com/forums/showthread.php?t=4124&highlight=phrap



I got this script from Sergey Koren of AMOS and adapted it a bit:

cat my.fasta |grep ">" |sed s/\>//g |sed 's/\/1*$/./g;s/\/2*$/./g'|awk -F "." '{print $1}' |sort |uniq -c |awk '{if ($1 == 2) print $2"/1\t"$2"/2\tsmall"}' > mates.txt

Replace 'my.fasta' with the FASTA file that contains your read names.

The read names in 'my.fasta' must end with /1 and /2. If your read names use other suffixes, such as .x and .y, replace

sed 's/\/1*$/./g;s/\/2*$/./g'

with, for example,

sed 's/\.x$/./g;s/\.y$/./g'

in the code above. (The dots are escaped so that they match a literal "." rather than any character.)
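The same pipeline can be written as a small shell function, which makes each stage easier to follow. This is only a sketch, assuming read names end in exactly /1 or /2; the function name make_mates and its optional library-label argument are hypothetical, not part of AMOS or Bambus.

```shell
# make_mates FASTA [LIBRARY]
# Hypothetical helper: prints "name/1<TAB>name/2<TAB>LIBRARY" for every
# base name that occurs exactly twice (assumed to be once as /1, once as /2).
make_mates() {
    grep '^>' "$1" |                # keep FASTA header lines only
        sed 's/^>//' |              # drop the leading ">"
        awk '{print $1}' |          # keep the read name, drop any description
        sed -n 's/\/[12]$//p' |     # strip /1 or /2; skip names without either
        sort | uniq -c |            # count how often each base name occurs
        awk -v lib="${2:-small}" '$1 == 2 {print $2 "/1\t" $2 "/2\t" lib}'
}
```

Usage would then be: make_mates my.fasta > mates.txt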

If your reads are split over two FASTA files, pass in just one of them and change if ($1 == 2) to if ($1 == 1) in the code; that way you only have to run it on one file.

This will print the read names to 'mates.txt'. The only thing left to do is to set your library name and insert sizes at the top of this file.
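As a sketch of that last step: as far as I know, a Bambus .mates file declares each library on a line of the form "library NAME MIN MAX" before the read pairs. The helper name add_library_header and the example sizes below are hypothetical; check the format against your Bambus documentation before relying on it.

```shell
# add_library_header MATES LIBRARY MIN MAX
# Hypothetical helper: writes a "library" line followed by the existing
# mate pairs to standard output. The header layout is an assumption.
add_library_header() {
    printf 'library\t%s\t%s\t%s\n' "$2" "$3" "$4"   # library name, min and max insert size
    cat "$1"                                        # mate pairs, unchanged
}
```

For example, with made-up insert sizes: add_library_header mates.txt small 2000 4000 > my.mates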

Bambus will probably report a lot of errors because some read names are not found in the .contig file, but this shouldn't be a problem.

Hope this works; otherwise, ask me.




Hmmm, I see: it's 454 data, which doesn't have a suffix like .x or /1. (Sorry, I have never worked with 454 data before.)

Can you tell me what your .contig file looks like?

The read names in the mate file should match the first string on the "#" lines in the .contig file. Each such line names a read that has been mapped to the contig (the contig lines themselves start with "##").

So if the line with "#" starts with e.g. FZ92HC102IDBLW, followed by the offset in parentheses, like

#FZ92HC102IDBLW(0)

you should extract the names out of both files and put them in the same file.
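To check whether the names line up, the read names can be pulled out of the .contig file and out of the FASTA files and then compared. A minimal sketch, assuming read lines look like "#NAME(offset) ..." and contig lines start with "##" as described above; both function names are hypothetical, and this is not the attached script.

```shell
# contig_read_names CONTIG_FILE
# Prints the read name from every line of the form "#NAME(offset) ...";
# contig lines ("##...") and sequence lines are skipped.
contig_read_names() {
    sed -n 's/^#\([^#(][^(]*\)(.*/\1/p' "$1"
}

# fasta_read_names FASTA...
# Prints the first word of every header line, without the leading ">".
fasta_read_names() {
    awk '/^>/ {print substr($1, 2)}' "$@"
}
```

Sorting both outputs and running them through comm or diff would show which read names occur in the FASTA files but not in the .contig file.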

If this is indeed the case, you can use the script I attached. Run it with:

perl testmates.pl file1 file2

It will generate a txt file with the mates. The only thing left to do is to put the library sizes at the top of the file.

More info about the .contig file at http://www.cbcb.umd.edu/research/con...entation.shtml

Hope this helps.

Attached file: testmates.pl (820 bytes). Last edited by boetsie; 04-15-2010 at 05:25 AM.
