Consed Install

From CSBLwiki

3.5) Put the Consed executable in /usr/local/genome/bin (or $CONSED_HOME/bin)

Read the appropriate section of this document: NOTE TO SOLARIS USERS, NOTE TO MACOSX USERS, NOTE TO LINUX USERS (32 BIT), NOTE TO LINUX USERS (64 BIT) or (if you are running Linux on an Itanium--a big 64 bit box) NOTE TO ITANIUM LINUX USERS.

3.6) In /usr/local/genome/bin (or $CONSED_HOME/bin):

Type: ln -s (consed executable name) consed

where (consed executable name) is the name of one of:

    consed_linux64bit
    consed_linux32bit
    consed_linux_itanium
    consed_mac
    consed_solaris
    consed_solaris_intel

This enables you to just use "consed" instead of consed_linux32bit (or whatever) in all commands to consed. It is also important since the scripts refer to "consed" rather than any of the names such as "consed_linux32bit".
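The linking step can be sketched as a small shell function. This is only an illustration: link_consed is a hypothetical helper name, not something shipped with Consed, and you still have to pick the executable name that matches your platform from the list above.

```shell
#!/bin/sh
# Sketch only: link_consed is a hypothetical helper, not part of Consed.
# Usage: link_consed BIN_DIR EXECUTABLE_NAME
link_consed() {
    bin_dir=$1
    exe=$2
    # Refuse to create a dangling link if the platform binary is missing.
    if [ ! -f "$bin_dir/$exe" ]; then
        echo "link_consed: $bin_dir/$exe not found" >&2
        return 1
    fi
    # -f replaces an existing "consed" link, e.g. after an upgrade.
    # The link target is relative, so it still resolves if the
    # directory is moved as a whole.
    ln -sf "$exe" "$bin_dir/consed"
}

# Example (not run here):
#   link_consed "${CONSED_HOME:-/usr/local/genome}/bin" consed_linux64bit
```

Checking that the binary exists first avoids the confusing "command not found" you would otherwise get later from a dangling link.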

3.7) Make sure that /usr/local/genome/bin (or $CONSED_HOME/bin) is in every Consed user's PATH.
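One way to check a user's PATH from a Bourne-style shell is sketched below. path_has_dir is a hypothetical helper name; it only compares strings against the colon-separated PATH entries and does not verify that the directory exists.

```shell
#!/bin/sh
# Sketch only: path_has_dir is a hypothetical helper.
# Usage: path_has_dir DIR [PATH_STRING]
# Succeeds if DIR is one of the colon-separated entries of PATH_STRING
# (defaults to the current $PATH).
path_has_dir() {
    dir=$1
    path_list=${2-$PATH}
    # Wrapping both sides in colons lets one pattern match the first,
    # a middle, or the last entry without matching partial names.
    case ":$path_list:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example (not run here):
#   path_has_dir /usr/local/genome/bin || echo "add it to PATH"
```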

3.8) Check this by logging on as a user and typing:

rehash
consed -V

(Don't worry if the rehash command says "not found".)

You should see 'Version 19.0'. If you see something else, you have some debugging to do.

3.9) Check that the correct version of cross_match is installed by typing:

cross_match

You should see:

    > cross_match
    cross_match
    cross_match
    cross_match version 1.080812

    cross_match version 1.080812
    Reading parameters ... 1.008 Mbytes allocated -- total 1.008 Mbytes

    Run date:time 081205:135315
    Run date:time 081205:135315
    FATAL ERROR: Sequence files must be specified on command line. See documentation.

    FATAL ERROR: Sequence files must be specified on command line. See documentation.

where the 080812 in 1.080812 is a date in the form YYMMDD (here 12 August 2008). It must be this date or more recent. Otherwise, follow the instructions above for getting cross_match (which is part of the phrap package).
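The date comparison can be sketched in shell as follows. cross_match_date_ok is a hypothetical helper name. It assumes the version string has the form shown above (a prefix, a dot, then six digits) and that all dates fall in the same century, since a two-digit year cannot distinguish 1999 from 2099.

```shell
#!/bin/sh
# Sketch only: cross_match_date_ok is a hypothetical helper.
# Usage: cross_match_date_ok VERSION [MINIMUM_YYMMDD]
# VERSION looks like "1.080812"; the part after the first dot is YYMMDD.
# Returns 0 if the date is at least the minimum, 1 if older, 2 if the
# string does not look like a version.
cross_match_date_ok() {
    version=$1
    minimum=${2:-080812}
    date_part=${version#*.}
    case $date_part in
        [0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "unexpected version string: $version" >&2; return 2 ;;
    esac
    # Strip leading zeros so that no shell reads the value as octal.
    have=$(printf '%s\n' "$date_part" | sed 's/^0*//')
    need=$(printf '%s\n' "$minimum" | sed 's/^0*//')
    [ "${have:-0}" -ge "${need:-0}" ]
}

# Example (not run here); 2>&1 is used because I am not assuming which
# stream cross_match writes its banner to:
#   v=$(cross_match 2>&1 | sed -n 's/^cross_match version \([0-9.]*\).*/\1/p' | head -n 1)
#   cross_match_date_ok "$v" || echo "cross_match is too old"
```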

3.10) SETTING UP TEST DIRECTORIES

Copy the test directories and their contents to some location where the users have write access. Copy--do not move them--because the users will occasionally want a fresh copy. I've written the command to make it easy for you to cut/paste from this document to the command line:

cp -r 454_newbler align454reads align454reads_answer assembly_view \
autofinish solexa_example solexa_example_answer polyphred standard \
selectRegions selectRegionsAnswer \
(new_location)

cd (new_location)
chmod -R a+w *

3.11) PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION

From the (new_location) where you put the test directories, type the following:

cd standard/edit_dir

3.12) start Consed by typing consed

If you get some error such as:

Error: Can't open display:

then the problem probably has nothing to do with Consed, but rather with X. To test this, run some other X application (such as xclock, xterm, xeyes, or xcalc) and see if you get the same error. (If you are running on MACOSX, you must start X11 and then consed in an xterm--see NOTE TO MACOSX USERS below.) The problem may be due to your X emulator. See 'MONITORS AND MICE FOR CONSED' below.

Don't worry about a message like: Warning: Cannot convert string "helvetica" to type FontStruct

Two windows will appear. One of these will have the list of .ace files and say 'select assembly file to open' and 'standard.fasta.screen.ace.1'. Double click on "standard.fasta.screen.ace.1". The first window goes away.

You will now see a list of one contig and a list of reads. This is the 'Consed Main Window'.

Double click on 'Contig1'.

The 'Aligned Reads Window' will appear.

If it does, consider this preliminary test successful.

3.13) Build phd2fasta: Go to the misc/phd2fasta directory and type 'make'. Move the phd2fasta executable to /usr/local/genome/bin (or $CONSED_HOME/bin).

3.14) Build mktrace: Go to the misc/mktrace directory and type 'make'. (If you get any warnings about "gets", ignore them.) Move the mktrace executable to /usr/local/genome/bin (or $CONSED_HOME/bin).

3.15) Build the 454 software: Go to the misc/454 directory and type (see below for solaris)

gcc sff2scf.c -o sff2scf

(or substitute your compiler for gcc). Move sff2scf into /usr/local/genome/bin (or $CONSED_HOME/bin).

For solaris, type:

gcc -DSOLARIS sff2scf.c -o sff2scf

3.16) Move all perl scripts from the scripts directory to /usr/local/genome/bin (or $CONSED_HOME/bin).

Make sure all are executable by typing:

chmod a+x *

Make sure all are readable by typing:

chmod a+r *

3.17) Create a subdirectory /usr/local/genome/lib (or $CONSED_HOME/lib)

3.18) In /usr/local/genome/lib (or $CONSED_HOME/lib), put phredpar.dat which comes with phred

3.19) Create a subdirectory /usr/local/genome/lib/screenLibs. (If you are using a location other than /usr/local/genome for the root of all Phred/Phrap/Consed programs, create $CONSED_HOME/lib/screenLibs).

3.20) From the misc subdirectory, copy the following files to the directory /usr/local/genome/lib/screenLibs (or $CONSED_HOME/lib/screenLibs).

    filter454Reads.fa
    primerCloneScreen.seq
    primerSubcloneScreen.seq
    repeats.fasta
    sffLinkers.fa
    singleVectorForRestrictionDigest.fasta
    vector.seq

filter454Reads.fa is the puc19 vector used to produce 454 reads. 454 reads containing puc19 vector are eliminated.

primerCloneScreen.seq is used to screen candidate primers when you use Consed's function "Pick Primer from Clone Template" (on the Aligned Reads Window).

primerSubcloneScreen.seq is used to screen candidate primers when you use Consed's function "Pick Primer from Subclone Template" (on the Aligned Reads Window).

repeats.fasta is used to tag repeats (to put a blue line under the bases)

vector.seq is used to mask the parts of reads that are from vector rather than insert

sffLinkers.fa contains the linkers for 454 reads that separate the 2 reads of a read pair.

Take a look at files primerCloneScreen.seq, primerSubcloneScreen.seq, repeats.fasta, and vector.seq: They are dummy files indicating the fasta format of the sequences that should be put in them.
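Steps 3.19 and 3.20 can be sketched as one shell function. install_screen_libs is a hypothetical helper name; the file names are the ones listed above, and the two directory arguments are whatever locations you are actually using.

```shell
#!/bin/sh
# Sketch only: install_screen_libs is a hypothetical helper.
# Usage: install_screen_libs MISC_DIR CONSED_HOME
# Creates CONSED_HOME/lib/screenLibs and copies the screening files
# from MISC_DIR into it. Returns non-zero if any file is missing.
install_screen_libs() {
    misc_dir=$1
    consed_home=$2
    dest=$consed_home/lib/screenLibs
    status=0
    mkdir -p "$dest" || return 1
    for f in filter454Reads.fa primerCloneScreen.seq \
             primerSubcloneScreen.seq repeats.fasta sffLinkers.fa \
             singleVectorForRestrictionDigest.fasta vector.seq
    do
        if [ -f "$misc_dir/$f" ]; then
            # Copy rather than move so the distribution stays intact.
            cp "$misc_dir/$f" "$dest/" || return 1
        else
            echo "install_screen_libs: missing $misc_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (not run here):
#   install_screen_libs misc "${CONSED_HOME:-/usr/local/genome}"
```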

3.21) You should put into primerCloneScreen.seq the vector sequence of the cloning vectors you are using (BAC or cosmid) and into primerSubcloneScreen.seq the sequencing vectors you are using (plasmid, M13, etc). Don't be too generous in putting lots of vectors into the files! The larger they are, the slower primer picking will be. Our files are only this big:

-rw-r--r-- 1 root root 29938 Nov 7 1997 primerCloneScreen.seq
-rw-r--r-- 1 root root 7381 Aug 13 1997 primerSubcloneScreen.seq

and primer picking is quite fast enough.

TESTING PRIMER PICKING

3.22) Follow the steps above under PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION to bring up the Aligned Reads Window on Contig1.

Go to some location near the right end of the contig, say base 2470. Click with the right mouse button on the consensus and click on either one of the top strand primer choices (either from subclone template or from clone template). Consed will pause a moment, and then there will appear a selection of primers that pass all of Consed's requirements. (If you get an error message, Consed might not have been correctly installed. See INSTALLING CONSED above.) Templates are also chosen for each primer. You may have to scroll the primer list to the right to see the templates. Consed lists these templates in order of quality--all of them will cover the read you want to make.

3.23) You should put into the file /usr/local/genome/lib/screenLibs/vector.seq (or $CONSED_HOME/lib/screenLibs/vector.seq if you are not using /usr/local/genome for the root of the Phred/Phrap/Consed files) the vector sequences (in FASTA format) that you want to mask out before running phrap. In general, it is the combination of primerCloneScreen.seq and primerSubcloneScreen.seq. I've given you a dummy file, but you should replace it with your real vector.
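If vector.seq really is just the combination of the two primer screening files, it can be built as sketched below. build_vector_seq is a hypothetical helper name, and the backup file name vector.seq.bak is my own choice, not a Consed convention.

```shell
#!/bin/sh
# Sketch only: build_vector_seq is a hypothetical helper.
# Usage: build_vector_seq SCREEN_LIBS_DIR
# Writes vector.seq as primerCloneScreen.seq followed by
# primerSubcloneScreen.seq, keeping the old vector.seq as a backup.
build_vector_seq() {
    dir=$1
    for f in primerCloneScreen.seq primerSubcloneScreen.seq; do
        if [ ! -s "$dir/$f" ]; then
            echo "build_vector_seq: missing or empty $dir/$f" >&2
            return 1
        fi
    done
    if [ -f "$dir/vector.seq" ]; then
        cp "$dir/vector.seq" "$dir/vector.seq.bak" || return 1
    fi
    # "awk 1" copies every line and supplies a final newline if a file
    # lacks one, so the next FASTA header always starts on its own line.
    awk 1 "$dir/primerCloneScreen.seq" "$dir/primerSubcloneScreen.seq" \
        > "$dir/vector.seq"
}

# Example (not run here):
#   build_vector_seq "${CONSED_HOME:-/usr/local/genome}/lib/screenLibs"
```

Run it only after the two primer screening files contain your real vectors; otherwise you just concatenate the dummy files.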

3.24) You should put into the file /usr/local/genome/lib/screenLibs/repeats.fasta (or $CONSED_HOME/lib/screenLibs/repeats.fasta if you are not using /usr/local/genome for the root of the Phred/Phrap/Consed files) any sequences (in FASTA format) that you want to have automatically tagged (visibly marked by a blue line in Consed). These typically are ALU sequences. If you don't want to tag anything, then comment out (put '#' as the first character of the line) the following lines in phredPhrap:

To not tag anything, change:

!system( "$tagRepeats $szAceFileToBeProduced" )
|| die "some problem running $tagRepeats";

to:

#!system( "$tagRepeats $szAceFileToBeProduced" )
#|| die "some problem running $tagRepeats";
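If you would rather not edit phredPhrap by hand, the same change can be sketched with sed. comment_out_tag_repeats is a hypothetical helper name, and the patterns assume the two lines read as quoted above; check the output before replacing your script.

```shell
#!/bin/sh
# Sketch only: comment_out_tag_repeats is a hypothetical helper.
# Usage: comment_out_tag_repeats PHREDPHRAP_FILE > NEW_FILE
# Prints the script with the tagRepeats call commented out.
# Lines that already start with '#' are left alone, so running it on
# its own output changes nothing further.
comment_out_tag_repeats() {
    sed -e '/^[^#]*\$tagRepeats \$szAceFileToBeProduced/s/^/#/' \
        -e '/^[^#]*some problem running \$tagRepeats/s/^/#/' \
        "$1"
}

# Example (not run here):
#   comment_out_tag_repeats phredPhrap > phredPhrap.new
#   diff phredPhrap phredPhrap.new
```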

3.25) You should create a file /usr/local/genome/lib/screenLibs/singleVectorForRestrictionDigest.fasta containing the cloning vector sequence. This is used for doing in-silico restriction digests. Thus this cloning vector must start at precisely the site where you cut the (circular) vector to ligate the insert. It is not sufficient to just download the vector sequence from Genbank because they may start the sequence at a different site.

3.26) ENOUGH MEMORY FOR CONSED

Enough memory is vital with large datasets. Even if you have enough physical memory, the operating system may not allow a single process to use it all.

In csh or tcsh type:

limit

You should see something like this:

    cputime         unlimited
    filesize        unlimited
    datasize        2097148 kbytes
    stacksize       8192 kbytes
    coredumpsize    0 kbytes
    vmemoryuse      unlimited
    descriptors     64

Type:

limit datasize unlimited

Then type:

limit

just to see that the number has changed.
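The commands above are for csh or tcsh. For users of sh or bash, a rough equivalent uses the ulimit builtin, sketched below. raise_data_limit is a hypothetical helper name. It raises the soft data-segment limit only as far as the hard limit, which an ordinary user is always allowed to do; whether that is "unlimited" depends on how your system is configured.

```shell
#!/bin/sh
# Sketch only: raise_data_limit is a hypothetical helper for sh/bash
# users. It must be called in the shell that will start consed, because
# limits are inherited by child processes, not by the parent.
raise_data_limit() {
    # Query the hard (maximum) data segment limit, in kbytes or
    # the word "unlimited".
    hard=$(ulimit -H -d)
    # Raise the soft limit up to the hard limit.
    ulimit -S -d "$hard"
}

# Example (not run here):
#   raise_data_limit && ulimit -S -d    # show the new soft limit
```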

3.27) Make sure you have enough swap space to support the amount of RAM on the computer.

To get you started with the demonstration, I've provided a singleVectorForRestrictionDigest.fasta file (see step 3.25) that will work for the test datasets, but will not work for your own data.

3.28) TESTING THE INSTALLATION

After installing Consed, you should run all the following tests to make sure you have installed everything correctly:

If one of the tests (below) fails with a message like:

"couldn't execute ..."

then you can troubleshoot the problem by going to the directory where this error occurred and typing the command that failed. If the command includes any output redirection (e.g., 2>/dev/null or >>temp or >temp), remove the 2> or > and everything that follows it on the line so that all output comes to your screen.
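The clean-up of a failed command line can be sketched with sed. strip_redirection is a hypothetical helper name. It assumes the whole command has been joined onto one line and that no argument itself contains a '>' character.

```shell
#!/bin/sh
# Sketch only: strip_redirection is a hypothetical helper.
# Reads one command line on standard input and prints it without a
# leading "time" and without any output redirection, so that it can
# be rerun by hand with all messages visible.
strip_redirection() {
    # 1st expression: drop a leading "time ".
    # 2nd expression: drop the first ">" or "2>" and everything after.
    sed -e 's/^[[:space:]]*time[[:space:]][[:space:]]*//' \
        -e 's/[[:space:]]*2\{0,1\}>.*$//'
}

# Example (not run here):
#   echo 'time myprog in.fa >>out.txt 2>/dev/null' | strip_redirection
```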

3.29) TESTING ADDING SOLEXA READS

Follow the 8 steps under "ADDING SOLEXA READS" (below)

Troubleshooting: If you get an error like this:

    couldn't execute time /home/genome/BioSw/consed18/bin/cross_match
    reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0
    -minscore 25 -gap1_only -repeat_screen 2
    >>alignmentFile.081205_130653.cross.0 2>/dev/null

then run it on the command line without "time" and without the ">>" and "2>" so you can see any errors:

/home/genome/BioSw/consed18/bin/cross_match \
reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0 \
-minscore 25 -gap1_only -repeat_screen 2

If this says:

FATAL ERROR: Command line option -gap1_only not recognized

that indicates that you are not running the correct version of cross_match (see above).

3.30) TESTING ADDING 454 READS

Follow the 4 steps under "USING 454 READS (ALIGNING TO REFERENCE SEQUENCE )" (below)

3.31) TESTING 454 READS (NEWBLER ASSEMBLY)

Follow the first 6 steps under "USING 454 READS (NEWBLER ASSEMBLY)" and especially be sure that the traces pop up.

3.32) TESTING ADD NEW READS

It will make your life easier if phred, phrap, and cross_match are all where Consed expects them: in /usr/local/genome/bin

3.33) Decide where to put phred's parameter file phredpar.dat and edit both addReads2Consed.perl and phredPhrap to reflect this location. I generally prefer to put it in /usr/local/genome/lib to keep all of the Phred/Phrap/Consed files in one place.

3.34) Next you should test the ADD NEW READS step in the Quick Tour (below). This step requires that everything be set up correctly and in the correct location. Hopefully the error messages are clear enough to help you if you have set up anything incorrectly.

3.35) TESTING RUNNING CROSS_MATCH FROM ASSEMBLY VIEW

See RUNNING CROSS_MATCH FOR SEQUENCE MATCHES (below) and make sure that step works.

3.36) TEST RUNNING PHREDPHRAP

See the section RUNNING PHRED and PHRAP (below)

3.37) TESTING MINIASSEMBLIES

See PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES) and MINIASSEMBLIES (below) and make sure those steps work.

The newer version of phredPhrap is required for this. If you have invested a lot of work customizing some ancient version of phredPhrap (e.g., 10 years old), and don't want to upgrade, you do have the option of keeping your customized version of phredPhrap for regular assemblies, and using the new version of phredPhrap for miniassemblies. To do this, you must specify the alternate name/location of phredPhrap by the .consedrc parameter:

consed.fullPathnameOfMiniassemblyScript: /usr/local/genome/bin/phredPhrap

(See CONSED CUSTOMIZATION below.)
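Adding such a parameter to a resource file can be sketched as below. set_consedrc_param is a hypothetical helper name; which file to pass it is covered in CONSED CUSTOMIZATION, so the file is left as an argument here. The sketch assumes one "name: value" setting per line, as in the example above.

```shell
#!/bin/sh
# Sketch only: set_consedrc_param is a hypothetical helper.
# Usage: set_consedrc_param FILE NAME VALUE
# Removes any earlier line that starts with "NAME:" and appends
# "NAME: VALUE", so each parameter appears once.
set_consedrc_param() {
    rc_file=$1
    name=$2
    value=$3
    tmp=$rc_file.tmp.$$
    touch "$rc_file" || return 1
    # index() does a plain string comparison, so the dots in the
    # parameter name are not treated as wildcards.
    awk -v key="$name:" 'index($0, key) != 1' "$rc_file" > "$tmp" || return 1
    printf '%s: %s\n' "$name" "$value" >> "$tmp"
    mv "$tmp" "$rc_file"
}

# Example (not run here; the file path is a placeholder):
#   set_consedrc_param path/to/.consedrc \
#       consed.fullPathnameOfMiniassemblyScript /usr/local/genome/bin/phredPhrap
```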

NOTE: You might be done installing consed

The following 4 installation steps are only necessary if you are using autofinish or consed's primer picker *and* if you are using Sanger reads. Otherwise, you can skip:

    MODIFYING determineReadTypes.perl
    TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl
    FAKE READS
    APPENDING EXPID TO THE PHD FILES

3.38) MODIFYING determineReadTypes.perl

Read the comments in determineReadTypes.perl

Phrap, Consed's primer picking, and Consed/Autofinish all need the following information for each read:

         is it a universal primer forward, a universal primer reverse,
            or a walking read?
         what is its template name?

If you are using different libraries that have different insert sizes, then Consed/Autofinish also need the library name for each read.

Generally this information can be determined from the read name, using *your* naming convention. Modify the perl script determineReadTypes.perl to put this information at the end of the phd file using WR info items.

If you don't want to do much perl programming and all your libraries have the same insert size, you have the option of using the St Louis naming convention. In this case, you needn't do anything with determineReadTypes.perl

You must also uncomment (remove the "#"s in column 1) the lines in the phredPhrap script that say roughly:

print "\n";
print "Now running determineReadTypes.perl...\n";
print "--------------------------------------------------------\n";
!system( "$determineReadTypes" ) || die "some problem running determineReadTypes.perl $!\n";

But what is the St Louis naming convention? Most of it (but not all) is explained in the file phrap.doc that comes with phrap. In addition, you must never use an underscore in the name if the read is a universal primer forward or universal primer reverse read. If the read is a walk, then you must have an underscore (_) follow the template name and then have a number (the oligo number).

Examples of reads in the St Louis naming convention:

    read eeq03a01.g1    is univ rev   template: eeq03a01   library: eeq03
    read eeq03a02.b1    is univ fwd   template: eeq03a02   library: eeq03
    read eeq03a02.g1    is univ rev   template: eeq03a02   library: eeq03
    read eeq03a03.b1    is univ fwd   template: eeq03a03   library: eeq03
    read eej45h07_2.i1  is walk       template: eej45h07   library: eej45
    read eej46c12_1.i1  is walk       template: eej46c12   library: eej46

Once you have correctly customized determineReadTypes.perl, then uncomment the line in phredPhrap which calls determineReadTypes.perl

It is fine to assume the St Louis naming convention for the purpose of the sample dataset directories that come with Consed ("standard", "assembly_view", "autofinish", and "polyphred").
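To make the examples above concrete, here is a rough shell sketch that derives the read type, template, and library from a read name. It is not determineReadTypes.perl and does not write WR items. classify_read is a hypothetical helper name, and two of its rules are assumptions inferred only from the six examples: that an extension starting with "b" means universal forward and "g" means universal reverse, and that the library is the first five characters of the template. See phrap.doc for the actual convention.

```shell
#!/bin/sh
# Sketch only: classify_read is a hypothetical helper, not the logic
# of determineReadTypes.perl.
# Usage: classify_read READ_NAME
# Prints: TYPE <tab> TEMPLATE <tab> LIBRARY
classify_read() {
    read_name=$1
    stem=${read_name%%.*}      # part before the first dot
    ext=${read_name#*.}        # part after the first dot, e.g. g1
    template=${stem%%_*}       # walks carry _<oligo number> after it
    # ASSUMPTION (from the examples only): library = first 5 characters.
    library=$(printf '%.5s' "$template")
    case $stem in
        *_*) read_type="walk" ;;   # underscore marks a walking read
        *)
            # ASSUMPTION (from the examples only): b = fwd, g = rev.
            case $ext in
                b*) read_type="univ fwd" ;;
                g*) read_type="univ rev" ;;
                *)  read_type="unknown" ;;
            esac ;;
    esac
    printf '%s\t%s\t%s\n' "$read_type" "$template" "$library"
}

# Example (not run here):
#   classify_read eej45h07_2.i1
```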

3.39) TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl

Consed allows you to check that you have correctly modified determineReadTypes.perl: On the Consed Main Window, point to 'Info', hold down the left mouse button, and release on 'Show Info for Each Read'. Study all the information and check that the information presented is correct. If, for example, Consed thinks that there are templates that have 9 or more reads, it is likely that you have not correctly customized determineReadTypes.perl

You will see a section that looks like this:

template djs736a2_fp04q286 with 2 reads

   djs736a2_fp04q286.x2 term     universal forward (from phd file)
   djs736a2_fp04q286.y2 term     universal reverse (from phd file)

You want to see the "from phd file" part. If, instead of "from phd file", it says "inferred from name", that means that determineReadTypes.perl couldn't figure out what kind of read it was.

If you think you have made a mistake in customizing determineReadTypes.perl, it is best to delete the PHD files (and phd.ball if you are using that) and run phredPhrap again, since otherwise the incorrect WR items will be left in the PHD files.

The script determineReadTypes.perl itself contains more specific documentation on how to customize it.

CUSTOMIZING determineReadTypes.perl: SPECIAL CASES

3.40) FAKE READS

By "fake reads" I mean reads such as those created from a Genbank reference sequence or a consensus from some other assembly... or others for which there is no chromatogram (and there never was any chromatogram). If you don't use any such reads, you can skip this step.

In the past, any read that ended with a .a2 or .c3 (where 2 and 3 could be any numbers), was considered a fake read. Now you can make Autofinish not assume this using the .consedrc parameter (see CONSED CUSTOMIZATION):

consed.fakeReadsSpecifiedByFilenameExtension: false

Instead, you must have determineReadTypes.perl put "fake" into the "type:" field of a "template" WR item. See determineReadTypes.perl for more information.

3.41) APPENDING EXPID TO THE PHD FILES

If you are not using Autofinish, you can skip this step. If you are using Autofinish, and would like Autofinish to tell you how well your reads are succeeding, then the phd files must be appended with the experiment id's. In the 3 Autofinish summary files (*.univReverse, *.univForwards, and *.customPrimers), you will see information like this:

univ rev,,,->,-329,-249,71,Contig1,3,djs228_1034

or this:

tgaagaaatggctgactcc,56,1,->,3258,3338,3658,Contig1,4,djs228_2813,5,djs228_168,6,djs228_1248

The '3' just before the djs228_1034 on the line starting with "univ rev" is an experiment id. There is also an expid '4' just before djs228_2813, an expid '5' before djs228_168, and an expid '6' just before djs228_1248.
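Pulling the expid/name pairs out of such lines can be sketched with awk. list_expids is a hypothetical helper name. It assumes the layout seen in the two example lines: eight comma-separated fields, followed by alternating expid and name fields.

```shell
#!/bin/sh
# Sketch only: list_expids is a hypothetical helper.
# Reads Autofinish summary lines on standard input and prints one
# "EXPID <tab> NAME" pair per line.
# ASSUMPTION: fields 1-8 are the primer/position/contig columns and
# the expid,name pairs start at field 9, as in the examples above.
list_expids() {
    awk -F, '{ for (i = 9; i < NF; i += 2) print $i "\t" $(i + 1) }'
}

# Example (not run here; the file name is a placeholder):
#   list_expids < project.customPrimers
```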

Autofinish doesn't know what you will end up calling these reads it is telling you to make. Autofinish only knows those reads by the numbers 3, 4, 5, and 6. So when you make the reads, Autofinish needs to be informed that this is 'experiment 3' or whatever. You do this by appending in the phd file the following structure:

WR{
expid addExpid 990811:140818
5
}

where WR stands for 'whole read item',

     expid for 'expid'
     addExpid is the name of the program that you will write that
           will append this information
     990811:140818 is the date and time in format YYMMDD:HHMISS
     5 is the expid

This program must be run *after* phred runs to create the phd files. Thus your program must have some method of determining what the expid of each read is. What the University of Washington Genome Center does is to have the finishers put the expid as part of the filename. This makes it easy for a program to look at the phd file and figure out what the expid is and then write the WR item into that phd file.

Alternatively, you could keep a database and, after the phd file is created, look into the database to see what the expid is.
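A minimal sketch of such an addExpid program is shown below. append_expid is a hypothetical helper name. How you obtain the expid (from the file name or from a database) is up to you, so it is simply passed in as an argument. The sketch assumes the WR item is written on four lines (WR{, the header line, the expid, and the closing brace).

```shell
#!/bin/sh
# Sketch only: append_expid is a hypothetical helper standing in for
# the "addExpid" program you would write yourself.
# Usage: append_expid PHD_FILE EXPID
# Run it after phred has created the phd file.
append_expid() {
    phd_file=$1
    expid=$2
    if [ ! -f "$phd_file" ]; then
        echo "append_expid: no such file: $phd_file" >&2
        return 1
    fi
    # Date and time in the YYMMDD:HHMISS form described above.
    stamp=$(date +%y%m%d:%H%M%S)
    # ASSUMPTION: the WR item spans four lines.
    printf 'WR{\nexpid addExpid %s\n%s\n}\n' "$stamp" "$expid" >> "$phd_file"
}

# Example (not run here; the path is a placeholder):
#   append_expid ../phd_dir/someread.phd.1 5
```

The sketch does not check whether the file already has an expid item, so running it twice on the same file appends two items.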

When you have successfully added expid's to the phd files, the next time you run Autofinish on this project, see the 'EVALUATE' section of the Autofinish output file--you will see lots of interesting information about how well the reads succeeded.
