
About www.FlyTF.org (z)  

2011-03-11 11:16:32 | Category: Bioinformatics


About www.FlyTF.org - The Drosophila Transcription Factor Database

This database contains information on the manual curation of 1052 FlyBase identifiers, which are putative site-specific transcription factors, based on FlyBase/Gene Ontology annotation or the DBD Transcription Factor Database.

Authors: Boris Adryan and Sarah A. Teichmann



Background
Although the published sequence of the fly genome has been available for many years now, it is still difficult to state the exact number of site-specific transcription factors (TFs). The major obstacles in counting those factors lie in the experimental and computational identification of TFs, as well as in the lack of a unifying resource for the community. We sought to address this challenge by combining powerful computational methods with rigorous manual curation and literature study. The results of this attempt can be seen on www.FlyTF.org.

Curation

Seeding.
We first seeded a list of proteins that were putative transcription factors, identified by one or both of the two following approaches:

  • GO. Transcription factors were identified using the Gene Ontology annotation (September 2005) from FlyBase. We counted as candidate TFs those FlyBase identifiers (FBgn) which had GO annotations as described in the Supplementary Material.

  • DBD. A long-standing interest of our group is the structural identification of transcription factors (for background, see DBD, the DNA Binding Domain Transcription Factor Database, Kummerfeld & Teichmann, 2006). The TFs in this resource are detected using hidden Markov models of a manually curated list of structurally determined DNA binding domains. An evaluation on a large set of annotated protein sequences showed this strategy to be extremely accurate (97% correct) with good coverage (65% identification rate). We used DBD v1.2 for this analysis, which yielded 591 candidates.

There was generally a good overlap between the two approaches. In total, 1052 proteins were primary candidate TFs.
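
The seeding step amounts to taking the union of two candidate sets while remembering which approach found each gene. A minimal sketch in Python, assuming the GO-derived and DBD-derived candidates are already available as sets of FlyBase identifiers; the function name and the identifiers below are made-up placeholders, not real FlyTF entries.

```python
def seed_candidates(go_hits, dbd_hits):
    """Combine two candidate sets and record which approach found each gene."""
    seeded = {}
    for fbgn in sorted(go_hits | dbd_hits):
        sources = []
        if fbgn in go_hits:
            sources.append("GO")
        if fbgn in dbd_hits:
            sources.append("DBD")
        seeded[fbgn] = sources
    return seeded


# Placeholder identifiers for illustration only (not real FlyTF data).
go_hits = {"FBgn_A", "FBgn_B", "FBgn_C"}
dbd_hits = {"FBgn_B", "FBgn_C", "FBgn_D"}

seeded = seed_candidates(go_hits, dbd_hits)
# Genes supported by both approaches form the overlap.
overlap = [g for g, src in seeded.items() if len(src) == 2]
print(len(seeded), "candidates,", len(overlap), "found by both approaches")
```

Keeping the source labels next to each identifier makes it easy to single out the candidates predicted by DBD only, which are handled separately during curation.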

Reading & Verdict.
We focused on two separate aspects of the molecular function of transcription factors: DNA binding and transcription regulatory properties. We assessed the evidence for these two properties of each candidate transcription factor using the FlyBase annotation (mostly the sections on Gene Ontology and References), which included a review of the evidence stated for the GO annotation. Assignments to GO were not treated as evidence as such, but rather as pointers to the literature. Assignments based on sequence similarity or electronic annotation were only accepted if the carefully benchmarked predictions (DBD) or experimental evidence from the literature were in favour of the annotation. Additional information was retrieved from PubMed or the iHOP search tool.
Curated data from Casey Bergman's Drosophila DNase I Footprint Database (v2.0) at www.flyreg.org and from the data-mining project FlyMine were included in our annotation.

  • DNA binding. The curator’s verdict for this property can be ‘yes’, ‘no’ or ‘maybe’. The annotation is ‘yes’ if the protein itself or an orthologue in a different species was shown to bind DNA in an experimental assay. In a few rare cases, the annotation is also ‘yes’ on the basis of non-traceable author statements from the GO entry or, in the case of the Homeodomain, simply on the basis of electronic annotation. We recorded the PubMed ID of each reference and cite a key sentence from the reference, for example "...this experiment shows the binding of X to DNA..." or "...protein Y binds to a hexamer of CTGCTG...", or the reference to the DNase I Footprint Database FlyReg. In cases where there was no direct evidence for DNA binding, but no contradicting evidence either, we noted the possibility by entering ‘maybe’. There were a variety of cases where there was good evidence that the candidate protein is not DNA binding, or is DNA binding but certainly not a TF, and these are annotated as ‘no’.
  • Site-specific TF. The curator’s verdicts for this and the two following properties concern the mode of regulation of the TF rather than its DNA binding. They can be either ‘yes’ or ‘no’. For ‘Site-specific TF’, the verdict ‘yes’ is only granted if there is evidence for transcriptional regulatory activity, either alone or in complex. The large majority of DNA binding proteins with extensive experimental support are site-specific TFs. This means that they regulate one or more specific individual target genes, and this category is best described by GO term GO:0003705 ‘RNA polymerase II transcription factor activity, enhancer binding’.
  • Putative short-range TF. Transcription factors with little or no experimental evidence for their regulatory activity, or classified as ‘maybe’ for DNA-binding, are annotated as ‘yes’ for ‘Putative short-range TF’. Most of the factors contained in this category belong to the GO category GO:0003700 ‘transcription factor activity’.
  • Known or putative long-range TF. Some TFs do not bind in the direct vicinity of their target genes, but instead act on an entire molecular neighborhood. We annotated these proteins as ‘long-range TFs’. Examples of proteins in this group are insulator proteins such as CTCF (Moon et al., 2005) and Su(Hw) (Golovnin et al., 2003). This group is best described by GO category GO:0006355 ‘regulation of transcription, DNA-dependent’, in contrast to proteins that are involved in chromatin remodelling, like histone deacetylases etc.
    Moon,H., Filippova,G., Loukinov,D., Pugacheva,E., Chen,Q., Smith,S.T., Munhall,A., Grewe,B., Bartkuhn,M., Arnold,R. et al. (2005) CTCF is conserved from Drosophila to humans and confers enhancer blocking of the Fab-8 insulator. EMBO Rep. 6, 165-170.
    Golovnin,A., Birukova,I., Romanova,O., Silicheva,M., Parshikov,A., Savitskaya,E., Pirrotta,V. and Georgiev,P. (2003) An endogenous Su(Hw) insulator separates the yellow gene from the Achaete-scute gene complex in Drosophila. Development 130, 3249-3258.

We documented the DNA binding property particularly carefully and consistently. The evidence for transcriptional regulatory activity, on the other hand, was not curated exhaustively for the more obvious cases. For TFs predicted only by DBD, for which there were no further experimental data, the transcriptional regulatory activity was inferred from the structural assignment to a commonly known transcription factor DNA binding domain; they are annotated as ‘maybe’ for the DNA binding property and ‘yes’ for the putative short-range TF property. The C2H2 zinc finger domain family is an exception, since there is currently no known method for reliably assigning its function to DNA binding, RNA binding or protein interaction. These proteins are annotated as ‘maybe’ for the DNA binding property, but no verdict is reached on their potential role in transcription.
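
The default rule for candidates supported only by a DBD prediction can be written down as a small function. This is a sketch of the rule as described above, not the curators' actual tooling; the function name, the dictionary keys and the use of None for "no verdict reached" are assumptions made for illustration.

```python
def default_verdict_dbd_only(domain_family):
    """Default annotation for a candidate predicted by DBD alone,
    with no further experimental data.

    `domain_family` is the DBD domain name, e.g. "zf-C2H2" or "HLH".
    """
    # Without experimental data, DNA binding is never more than 'maybe'.
    verdict = {"dna_binding": "maybe"}
    if domain_family == "zf-C2H2":
        # C2H2 zinc fingers may bind DNA, RNA or protein, so no verdict
        # is reached on a role in transcription (None is an assumed marker).
        verdict["putative_short_range_tf"] = None
    else:
        # Any other known TF DNA binding domain implies a putative TF.
        verdict["putative_short_range_tf"] = "yes"
    return verdict


print(default_verdict_dbd_only("HLH"))
print(default_verdict_dbd_only("zf-C2H2"))
```

Candidates with experimental evidence do not go through this default; their verdicts come from the literature review.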

Database search tips
The entire set of the 1052 originally seeded candidate TFs is available on www.FlyTF.org. The individual researcher can browse catalogues (all candidates, putative TFs, definitive TFs, not a TF) and filter for those criteria which appear most important for them.

Quick guide to the Advanced Search tool

  1. You can select your filter conditions,
  2. press the 'Please find appropriate TFs' button,
  3. browse the list,
  4. and select FlyBase identifiers for further review.

Filtering
You can query our database for a variety of parameters, depending on what you expect from the candidate TFs. A couple of example queries should make its use clear:

  • The predefined catalogues available from the www.FlyTF.org home page were compiled using the Advanced Search tool.

      (1) all candidates >>,

      • This search reveals the entire set of 1052 genes (FlyBase identifiers) evaluated in this study.
      • Filter set: Curator's verdict: none checked. GO term: none checked. TF DB: 'none'. DBD domain architecture: 'none' and 'none'.

      (2) putative and known site-specific TFs >>,

      • This is possibly the least critical filter for 'true' site-specific transcription factors. It evaluates our verdict on the DNA binding capability of the protein, which must be proven, but is not as rigid with our views on it being a transcription factor. For example, this may include zinc finger proteins with the footprint of a sequence-specific TF, for which no further evidence is available.
      • Filter set: Curator's verdict: DNA binding 'YES' and 'maybe'; for the rest, see (1) above.

      (3) well supported site-specific TFs >>,

      • This is essentially the same as the above list, but requires some evidence that the protein is indeed a transcription factor. Please note that for many proteins the mere existence of a DNA binding domain (except zinc finger domains!) was rated as proof.
      • Filter set: See above (2), but DNA binding/sequence specific TF, likely TF, short range and likely TF, long range are 'YES'.

      (4) not a TF in our sense >>,

      • This search retrieves all candidate TFs that are not site-specific TFs in our sense.
      • Filter set: See above (1), but DNA binding is 'NO'.
  • If you know the name of your protein of interest: enter its identifier (FBid, CG number, symbol or name) in the field 'must have identifier'. If this field contains text, no other parameters will be evaluated.
  • If you want to see those with a TF-related GO term, just tick that option.
  • You can retrieve a list of all zinc finger proteins identified by DBD by choosing 'zf-C2H2' from the 'DNA binding domain' pull-down menu. Likewise, you can see all HLH-PAS domain proteins by selecting '(HLH) (PAS)' from the 'simplified architecture' pull-down menu. Please note that the selection of a simplified architecture overrides the selection of a DNA binding domain.

After you have entered your criteria, hit 'Please find appropriate TFs' and wait for the 'list view', which will appear updated in a separate window.
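
The four predefined catalogues can be thought of as filters over the curator's verdicts. The sketch below is illustrative only: the record layout, the field names and the toy candidates are assumptions, and reading catalogue (3) as "at least one of the three TF verdicts is 'yes'" is an interpretation of the filter description rather than something confirmed by the site.

```python
TF_FIELDS = ("site_specific", "short_range", "long_range")

# One predicate per predefined catalogue; each takes a candidate record.
CATALOGUES = {
    "all candidates": lambda c: True,
    "putative and known site-specific TFs":
        lambda c: c["dna_binding"] in ("yes", "maybe"),
    "well supported site-specific TFs":
        lambda c: c["dna_binding"] in ("yes", "maybe")
        and any(c[f] == "yes" for f in TF_FIELDS),
    "not a TF in our sense": lambda c: c["dna_binding"] == "no",
}


def run_catalogue(name, candidates):
    """Return the identifiers of all candidates passing the named filter."""
    keep = CATALOGUES[name]
    return [c["id"] for c in candidates if keep(c)]


def make(cid, dna, site, short, long_):
    return {"id": cid, "dna_binding": dna, "site_specific": site,
            "short_range": short, "long_range": long_}


# Toy records for illustration only (not real FlyTF data).
candidates = [
    make("cand_1", "yes", "yes", "no", "no"),    # well-characterised TF
    make("cand_2", "maybe", "no", "yes", "no"),  # DBD-only prediction
    make("cand_3", "maybe", "no", "no", "no"),   # e.g. a C2H2 zinc finger
    make("cand_4", "no", "no", "no", "no"),      # rejected candidate
]

for name in CATALOGUES:
    print(name, "->", run_catalogue(name, candidates))
```

Note that a candidate such as cand_3 passes filter (2) but not filter (3), which mirrors the zinc finger example given for the second catalogue.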

List view
After querying the database, the list will contain summary information on your candidate TFs. The number of genes (FlyBase identifiers, FBgn) is displayed at the top of the list. Further information includes: FBid, Symbol/Name, Synonym, Architecture (as identified by DBD), Curator's verdict, GO terms (only those related to DNA binding or transcription), TF DB (presence in TF databases). A click on the FBid of a candidate TF opens a detailed compilation of data on this protein.

Candidate view
This is a compilation of data on the candidate TF. Most of the information from the list view is repeated. For your convenience, we provide the sequences of the proteins encoded by the gene. A click on the FlyBase ID opens the FlyBase Web site in a separate window. The 'meaningful phrase' is the evidence that we documented for the call on the DNA binding capabilities. A click on the 'P' opens the appropriate abstract on PubMed.

The GO annotation as it appears in FlyBase is shown as full GO term.
The colour code represents the degree of evidence:

  • 'Inferred by curator' or 'Inferred from direct assay' are black,
  • traceable author statements and computational peer-reviewed information are blue (we tend to trust those four evidence levels),
  • annotation based on sequence similarity or electronic annotation is displayed in red (we tend to question this information),
  • and all other evidence codes are shown in grey.
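
The colour scheme can be summarised as a lookup from GO evidence code to display colour. This is a sketch, not the site's code: the mapping uses the standard GO evidence code abbreviations, and equating 'computational peer-reviewed information' with the RCA code is an assumption.

```python
# GO evidence code -> display colour, following the list above.
EVIDENCE_COLOURS = {
    "IC": "black",   # Inferred by Curator
    "IDA": "black",  # Inferred from Direct Assay
    "TAS": "blue",   # Traceable Author Statement
    "RCA": "blue",   # Reviewed Computational Analysis (assumed match)
    "ISS": "red",    # Inferred from Sequence or Structural Similarity
    "IEA": "red",    # Inferred from Electronic Annotation
}


def evidence_colour(code):
    """Return the display colour for a GO evidence code; grey if unlisted."""
    return EVIDENCE_COLOURS.get(code.upper(), "grey")


for code in ("IDA", "TAS", "IEA", "NAS"):
    print(code, evidence_colour(code))
```

Falling back to grey for any unlisted code matches the last bullet, so evidence codes not named explicitly need no entry of their own.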

The single letters are links to the respective references in 'G' (GO), 'F' (FlyBase), or 'P' (PubMed).


From: http://www.mrc-lmb.cam.ac.uk/genomes/FlyTF/info.html
