Evolutionary age

From CSBLwiki

{|align="left" cellpadding="25"
| __TOC__
|}
=Evolutionary age of protein domains=
(Based on this reference)
<biblio>Reference pmid=16959887</biblio>
==Data==
===Pfam release 24.0 (October 2009)===
*The [http://pfam.sanger.ac.uk Pfam] database
**ftp [ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/ current_release]
*[http://www.arb-silva.de/ Silva]: SSU rRNA database
==Results==
*Species in Pfam (NCBI taxonomy): 148,925 species
*Genus (taxonomy): 31,949
</pre>
*Now you have a list of Pfam families with their taxonomic distribution (ready to map)
===Building the universal phylogenetic tree===
===Procedure===

Revision as of 13:43, 16 August 2010


Evolutionary age of protein domains

(Based on this reference)

Reference: PMID 16959887

Data

Pfam release 24.0 (October 2009)

<pre>
# download the complete database dump (estimated to take ~2 days)
wget -c ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database.tar &
</pre>
The pfamA_reg_full_significant and pfamA_reg_full_insignificant tables contain, as the names suggest, the significant and insignificant matches respectively. Significant hits are those with a bit score above the curated threshold for the family; insignificant matches score below it. The tables that contain significant data (pfamA_reg_full_significant and pfamA_reg_full) have an extra column called 'in_full'. It is set to 1 for matches that are present in the full alignment for a Pfam family and to 0 for those that are not. Where a fragment match and a full-length match to the same Pfam-A family overlap, only one of them will be present in the full alignment for that family.
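As a rough illustration of the 'in_full' flag described above, the shell sketch below prints a query that keeps only the significant matches included in a family's full alignment. The function name full_alignment_query and the value 42 are hypothetical; the table and column names are the ones mentioned on this page.

```shell
# Hypothetical helper: print a query selecting the significant matches
# that are part of the full alignment (in_full = 1) for one family.
full_alignment_query() {
    # $1: auto_pfamA value of the family (placeholder supplied by the caller)
    printf "SELECT auto_pfamseq FROM pfamA_reg_full_significant WHERE auto_pfamA = '%s' AND in_full = 1;\n" "$1"
}

# Example with a made-up auto_pfamA value; the printed statement can be
# piped to the mysql client once the pfam24 database is loaded.
full_alignment_query 42
```

Changing in_full = 1 to in_full = 0 would instead list the significant matches that were left out of the full alignment.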
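<pre>
# create the database
mysql -u user -p
mysql> CREATE DATABASE pfam24; \q
# load the table definition for pfamseq
mysql -u user -p pfam24 < FULL_PATH/pfamseq.sql
# load the table data
mysql -u user -p
mysql> USE pfam24
mysql> LOAD DATA LOCAL INFILE 'pfamseq.txt' INTO TABLE pfamseq FIELDS ENCLOSED BY "\'";
# or run a prepared load script in the background ('database' is the target database name)
mysql -u user -p'passwd' database < loadscript.sql &
</pre>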
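The contents of loadscript.sql are not shown on this page. One possible way to generate such a script is sketched below, assuming each table's data sits in a file named after the table, as in the pfamseq example. The function name make_loadscript is hypothetical, and the real script may have looked different.

```shell
# Hypothetical helper: write one LOAD DATA statement per table name.
# Assumes the data for each table is in <table>.txt in the current directory,
# mirroring the pfamseq example; the original loadscript.sql may differ.
make_loadscript() {
    for table in "$@"; do
        cat <<EOF
LOAD DATA LOCAL INFILE '${table}.txt' INTO TABLE ${table} FIELDS ENCLOSED BY "\'";
EOF
    done
}

# Print statements for the three tables queried later on this page;
# redirect the output to a file to feed it to the mysql client.
make_loadscript pfamseq pfamA pfamA_reg_full_significant
```

The table definitions (the .sql files) still need to be loaded first, as in the pfamseq step.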

Tree data

Results

<pre>
# pfamA list
SELECT DISTINCT pfamA_id, auto_pfamA FROM pfamA;
# protein sequences of each auto_pfamA (pfamA_id); fill in the auto_pfamA value
SELECT auto_pfamseq FROM pfamA_reg_full_significant WHERE auto_pfamA = '';
# taxonomic distribution; 'auto_pfamseq' stands for a value returned by the previous query
SELECT DISTINCT species, taxonomy, ncbi_code FROM pfamseq WHERE auto_pfamseq = 'auto_pfamseq';
# or, as a single join across the three tables
# (note: PFxxxxx accessions are stored in pfamA_acc; pfamA_id holds the family name)
SELECT DISTINCT species, taxonomy, ncbi_code
  FROM pfamseq seq, pfamA pf, pfamA_reg_full_significant sig
 WHERE pf.pfamA_id = 'PF...'
   AND sig.auto_pfamA = pf.auto_pfamA
   AND seq.auto_pfamseq = sig.auto_pfamseq;
</pre>
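The taxonomic distribution returned by the queries above can be summarised before mapping it onto a tree. The sketch below counts rows per top-level taxon. It assumes tab-separated input with the columns species, taxonomy and ncbi_code, and a taxonomy string whose first ';'-separated field is the top-level group. The function name count_top_level_taxa is hypothetical and the sample rows are illustrative only.

```shell
# Hypothetical helper: count rows per top-level taxon.
# Assumes tab-separated input with columns species, taxonomy, ncbi_code,
# and a taxonomy string whose first ';'-separated field is the top-level group.
count_top_level_taxa() {
    awk -F '\t' '
        {
            split($2, lineage, ";")      # break the lineage into ranks
            top = lineage[1]
            gsub(/^ +| +$/, "", top)     # trim surrounding spaces
            count[top]++
        }
        END { for (t in count) print t "\t" count[t] }
    ' | sort
}

# Illustrative rows only (lineages abbreviated), not real query output.
printf 'Homo sapiens\tEukaryota; Metazoa; Chordata\t9606\nSaccharomyces cerevisiae\tEukaryota; Fungi; Dikarya\t4932\nEscherichia coli\tBacteria; Proteobacteria\t562\n' | count_top_level_taxa
```

With real data, the input could come from the mysql client run in batch mode without column names (options -B and -N), which prints tab-separated rows.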

Building the universal phylogenetic tree

Procedure
