Evolutionary age
From CSBLwiki
Revision as of 05:55, 13 August 2010
Evolutionary age of protein domains
(Based on this reference: PMID 16959887)
Data set
- The Pfam database
- ftp current_release
- User manual (the pfam format)
- Tip: use the MySQL dump, which makes the content easier to handle
    # download the full DB (estimated ~2 days)
    wget -c ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database.tar &
- Loading into MySQL
- Not all tables were imported (only a few key tables for this study)
- pfamseq, genome_seqs, ncbi_taxonomy, genome_species, pfamA, gene_ontology
    mysql -u user -p
    mysql> create DATABASE pfam24; \q
    mysql -u user -p pfam24 < FULL_PATH/pfamseq.sql
    mysql -u user -p
    mysql> use pfam24
    mysql> load data local infile 'pfamseq.txt' into table pfamseq FIELDS ENCLOSED BY "\'";
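- When loading into MySQL, load the tables in the order in which they are created; otherwise an error occurs (a key index links to a table that has not been created yet; related keyword: foreign key constraints, http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html)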
- loading time: a few hours
- Tree data
- Silva (http://www.arb-silva.de/): SSU rRNA database
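The load-order note above can be made concrete with a small sketch. It sorts the imported tables so that every table a foreign key points to is loaded first. The dependency edges in `DEPENDS_ON` are hypothetical and only illustrate the idea; the real edges should be read from the `.sql` files of the Pfam release. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key dependencies: table -> tables it references.
# These edges are illustrative assumptions, NOT taken from the Pfam schema.
DEPENDS_ON = {
    "ncbi_taxonomy": set(),
    "pfamA": set(),
    "pfamseq": {"ncbi_taxonomy"},
    "genome_species": {"ncbi_taxonomy"},
    "genome_seqs": {"pfamseq", "genome_species"},
    "gene_ontology": {"pfamA"},
}


def load_order(depends_on):
    """Return table names so that every referenced table comes first."""
    return list(TopologicalSorter(depends_on).static_order())


order = load_order(DEPENDS_ON)

# Print the schema-loading commands in a safe order
# (same command form as used on this page).
for table in order:
    print(f"mysql -u user -p pfam24 < FULL_PATH/{table}.sql")
```

If the dependency map contains a cycle, `static_order()` raises `graphlib.CycleError`, which would indicate that the tables cannot be loaded in a simple sequence without temporarily disabling the foreign key checks.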
Procedure
- Extract the taxonomic origins of each protein
- Get the taxonomy information for each origin
- Build a non-redundant (NR) set of taxonomic origins
- Collect all Small Subunit (SSU) rRNA sequences of the NR set
- Build the universal tree of life using SSU sequences
- Map each protein belonging to a given domain onto the universal tree
- Identify the node that is the most recent common ancestor (MRCA) of the mapped proteins
- Calculate the branch length between the MRCA and the LCA (last common ancestor)
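The last steps of the procedure can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the method of the reference: the tree, the node names and the branch lengths are made up, and the LCA is assumed to be the root of the tree. In the real analysis the tree would be built from the SSU rRNA sequences.

```python
# Toy rooted tree: child -> (parent, branch length to parent).
# Node names and branch lengths are invented for illustration only.
TREE = {
    "Bacteria": ("root", 0.30),
    "AE": ("root", 0.10),  # hypothetical Archaea + Eukaryota ancestor
    "Archaea": ("AE", 0.25),
    "Eukaryota": ("AE", 0.20),
    "E_coli": ("Bacteria", 0.15),
    "B_subtilis": ("Bacteria", 0.18),
    "Human": ("Eukaryota", 0.12),
    "Yeast": ("Eukaryota", 0.14),
}
ROOT = "root"  # assumed here to play the role of the LCA


def path_to_root(node):
    """List of nodes from `node` up to and including the root."""
    path = [node]
    while node != ROOT:
        node = TREE[node][0]
        path.append(node)
    return path


def mrca(origins):
    """Most recent common ancestor of the given taxonomic origins."""
    nr_origins = sorted(set(origins))  # non-redundant (NR) set
    paths = [path_to_root(o) for o in nr_origins]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from a leaf, the first shared node is the MRCA.
    return next(n for n in paths[0] if n in common)


def branch_length_to_root(node):
    """Sum of branch lengths on the path from `node` up to the root."""
    return sum(TREE[n][1] for n in path_to_root(node)[:-1])


# Taxonomic origins of the proteins carrying one hypothetical domain.
origins = ["Human", "Yeast", "Human"]
node = mrca(origins)
print(node, branch_length_to_root(node))
```

In this sketch a domain whose MRCA is the root gets a branch length of zero, and a domain restricted to one lineage gets a larger value, so the number grows as the domain gets younger.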