Create a mate file from Illumina reads for Bambus

From CSBLwiki


I got this script from Sergey Koren of the AMOS project and adapted it a bit:

grep '^>' my.fasta | sed 's/^>//' | sed 's/\/1$//;s/\/2$//' | sort | uniq -c | awk '{if ($1 == 2) print $2"/1\t"$2"/2\tsmall"}' > mates.txt

Replace 'my.fasta' with the FASTA file that contains your reads.

The read names in 'my.fasta' must end with /1 and /2. If your reads use other suffixes, such as .x and .y, you should replace

sed 's/\/1$//;s/\/2$//'

with, for example,

sed 's/\.x$//;s/\.y$//'

in the command above, and change the "/1" and "/2" in the awk print statement to ".x" and ".y" so that the printed names match your reads.
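To see what the command does, here is a small sketch of the same pipeline run on a made-up FASTA file, with the suffix patterns anchored to the end of the read name. The file names 'demo.fasta' and 'demo_mates.txt' and the read names readA, readB and readC are hypothetical; readC has no mate, so it should be dropped.

```shell
# Hypothetical demo input: two complete pairs and one orphan read (readC).
printf '>readA/1\nACGT\n>readA/2\nTTGA\n>readB/1\nGGCC\n>readB/2\nCCGG\n>readC/1\nAAAA\n' > demo.fasta

# Strip ">" and the /1 or /2 suffix, count how often each base name occurs,
# and keep only names seen exactly twice (both mates present).
grep '^>' demo.fasta | sed 's/^>//' | sed 's/\/1$//;s/\/2$//' | sort | uniq -c |
  awk '{if ($1 == 2) print $2"/1\t"$2"/2\tsmall"}' > demo_mates.txt

cat demo_mates.txt
```

This prints one tab-separated line for the readA pair and one for the readB pair; readC does not appear because its name occurs only once.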

If the two mates are in separate FASTA files, run the command on just one of them and change if ($1 == 2) to if ($1 == 1) in the code. Each read name then occurs only once, so you only have to run it for one file.
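A minimal sketch of that single-file variant, assuming a hypothetical file 'demo_1.fasta' that holds only the /1 reads and that every read in it has a mate in the other file (the command cannot check this):

```shell
# Hypothetical demo input: only the /1 reads of two pairs.
printf '>readA/1\nACGT\n>readB/1\nGGCC\n' > demo_1.fasta

# Each base name occurs once here, so the count to test for is 1.
grep '^>' demo_1.fasta | sed 's/^>//' | sed 's/\/1$//' | sort | uniq -c |
  awk '{if ($1 == 1) print $2"/1\t"$2"/2\tsmall"}' > demo_mates_single.txt

cat demo_mates_single.txt
```

If some reads in the first file have no mate in the second one, their pairs will still be written, so this shortcut is only safe when both files are complete.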

This will print the pairs to 'mates.txt'. The only thing left to do is to set your library name and insert sizes at the top of this file. The third column of each pair ('small' in the code above) should match that library name.
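Adding the library line can be scripted as well. The sketch below assumes the Bambus mates convention of a header line of the form "library name min max"; please check this against the documentation of your Bambus version. The sizes 150 and 250 and the file names 'demo_pairs.txt' and 'demo.mates' are placeholders, not recommended values.

```shell
# Placeholder values: replace with your own library name and insert-size range.
LIB=small
MIN=150
MAX=250

# Hypothetical pair list, in the format written by the pipeline above.
printf 'readA/1\treadA/2\tsmall\n' > demo_pairs.txt

# Write the library line first, then append the pair list.
{ printf 'library %s %s %s\n' "$LIB" "$MIN" "$MAX"; cat demo_pairs.txt; } > demo.mates

cat demo.mates
```

Writing to a new file instead of editing 'mates.txt' in place keeps the original pair list intact if the sizes need to be changed later.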

Bambus will probably report a lot of errors, because some read names are not found in the .contig file, but this shouldn't be a problem.

Hope this works; otherwise, ask me.

Hmm, I see: it's 454 data, which doesn't have a suffix like .x or /1. (Sorry, I have never worked with 454 data before.)

Can you tell me what your .contig file looks like?

The read names in the mate file should be the same as the first string on the "#" lines in the .contig file. Each "#" line names a read that has been mapped to the contig (the contig lines themselves start with "##").

So if a "#" line starts with, e.g., FZ92HC102IDBLW, followed by the offset in parentheses, like

#FZ92HC102IDBLW(0)

you should extract the names from both files and put them in the same file.
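To check which read names a .contig file actually uses, something like the following sketch may help. It assumes the layout described above, with "##" starting a contig and a single "#" starting a read line of the form name(offset). The file 'demo.contig', its contents, and the second read name FZ92HC102JK3XQ are made up for illustration.

```shell
# Hypothetical two-read contig: "##" starts a contig, "#" starts a read line.
printf '##contig1 2 8 bases\nACGTACGT\n#FZ92HC102IDBLW(0) [] 4 bases\nACGT\n#FZ92HC102JK3XQ(4) [RC] 4 bases\nACGT\n' > demo.contig

# Keep lines with exactly one leading "#", then drop the "#" and
# everything from the opening parenthesis onward, leaving the read name.
grep '^#[^#]' demo.contig | sed 's/^#//;s/(.*$//' | sort > demo_contig_reads.txt

cat demo_contig_reads.txt
```

The resulting list can be compared with the names in the mate file to see which reads Bambus will not find in the .contig file.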

If this is indeed the case, you can use the script I attached. Use it with:

perl file1 file2

It will generate a text file with the mates. The only thing left to do is to put the library sizes at the top of the file.

more info about .contig file at

Hope this helps.

Attached file: Perl script (.pl, 820 bytes). Last edited by boetsie; 04-15-2010 at 05:25 AM.
