
Black

Life is a travel

 
 
 


 
 

Some common file formats in MEGA  

2010-01-02 19:00:27 | Category: Bioinformatics


COMMON FILE CONVERSION ATTRIBUTES

The default input format is determined by a file’s extension (e.g., a file with the extension ".ig" is initially assumed to be in "IG" input format). However, you have the option to specify any format for any file; the file extension is simply used as an initial guide. Note that specifying an incorrect file format most often results in an erroneous conversion or another unexpected error.
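The extension-as-initial-guess rule can be sketched in a few lines of Python. This is only an illustration, not MEGA's own code: the function name `guess_format` and the lookup table are hypothetical, and only the ".ig", ".aln", ".meg" and ".mas" extensions are taken from this post; the remaining entries are common conventions assumed here.

```python
from pathlib import Path

# Hypothetical lookup table. ".ig", ".aln", ".meg" and ".mas" appear in this
# post; ".fas", ".fasta" and ".phy" are common conventions assumed here.
EXTENSION_FORMATS = {
    ".ig": "IG",
    ".aln": "CLUSTAL",
    ".meg": "MEGA",
    ".mas": "MEGA alignment session",
    ".fas": "FASTA",
    ".fasta": "FASTA",
    ".phy": "PHYLIP",
}


def guess_format(filename, override=None):
    """Return the assumed input format for a file.

    The extension is only an initial guide: an explicit override always wins.
    Returns None when the extension is not in the table.
    """
    if override is not None:
        return override
    return EXTENSION_FORMATS.get(Path(filename).suffix.lower())
```

For example, `guess_format("seqs.ig")` gives "IG", while `guess_format("seqs.txt", override="FASTA")` gives "FASTA" regardless of the extension.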

CLUSTAL Format

The sequence alignment outputs from CLUSTAL software often are given the default extension .aln. CLUSTAL is an interleaved format. In a page-wide arrangement the sequence name is in the first column and a part of the sequence’s data is right justified. An example of the CLUSTAL format follows:

[Image: example CLUSTAL file]

The CLUSTAL file above would be converted by MEGA into the following format:

[Image: the same data in MEGA format]
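To make the interleaved layout concrete, here is a minimal Python sketch that joins the blocks of a CLUSTAL alignment back into one string per sequence. It is not MEGA's converter. It assumes the usual CLUSTAL conventions: a header line starting with "CLUSTAL", blocks separated by blank lines, and conservation lines that start with whitespace. The function name `parse_clustal` is made up for this example.

```python
def parse_clustal(text):
    """Join the interleaved blocks of a CLUSTAL alignment into full sequences.

    Returns a dict mapping each sequence name to its concatenated data,
    in the order the names first appear.
    """
    sequences = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and conservation lines
        # (the latter start with whitespace instead of a name).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the name, fields[1] the chunk of sequence data;
        # an optional trailing residue count (fields[2]) is ignored.
        name, chunk = fields[0], fields[1]
        sequences[name] = sequences.get(name, "") + chunk
    return sequences
```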

MEGA Format

For MEGA to read and interpret your data correctly, it should be formatted according to a set of rules. All input data files are basic ASCII text files, which may contain DNA sequence, protein sequence, evolutionary distance, or phylogenetic tree data. Most text editors and word processing packages (e.g., Microsoft Word, WordPerfect, Notepad, and WordPad) allow you to edit and save ASCII text files, which are usually marked with a .txt extension. After creating the file, you should change this extension to .meg, so that you can distinguish your data files from other text files. Because the organizational details vary for different types of data, the data formats for molecular sequences, distances, and phylogenetic trees are discussed separately in the MEGA help. However, there are a number of features that are common to all MEGA data files.

Alignment format

When working in MEGA’s Alignment Explorer you can save the current state of all data and settings to a file, either to archive your work or to resume editing later. An alignment session is a binary file format that is saved with the .MAS file extension.

TreeView Document format

TreeView is a simple program for displaying phylogenies on Apple Macintosh and Windows PCs. TreeView provides a simple way to view the contents of a NEXUS, PHYLIP, Hennig86, Clustal, or other format tree file. While PAUP and MacClade have excellent tree printing facilities, there may be times you just want to view the trees without having to load the data set they were generated from. The PHYLIP package contains tree drawing programs which offer a greater variety of trees than TreeView, but are somewhat clumsy to use. The forthcoming PAUP* for Windows does not have a graphical interface, hence TreeView allows you to create publication quality trees from PAUP files, either directly, or by generating graphics files for editing by other programs.

FASTA format

A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein

ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCKMDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNLLAAVEAQQQMLKLTIWGV

The FASTA file format is very simple and is quite similar to the MEGA file format.
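The similarity between the two formats can be illustrated with a short Python sketch that reads FASTA records and writes them out in a MEGA-style layout. The MEGA layout used here (a `#mega` first line, a `!Title ...;` statement, and each sequence name prefixed with `#`) is not spelled out in this post and is assumed from the MEGA documentation. The function names, the default title, and the choice to keep only the first word of the description line as the label are illustrative simplifications; whether MEGA accepts every character in such a label is not checked.

```python
def parse_fasta(text):
    """Return a list of (description, sequence) pairs from FASTA text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new description line closes the previous record.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_mega(records, title="Untitled", width=60):
    """Format (description, sequence) pairs in an assumed MEGA-style layout."""
    lines = ["#mega", f"!Title {title};", ""]
    for name, seq in records:
        # Simplification: use only the first word of the description as label.
        label = name.split()[0]
        lines.append(f"#{label}")
        # Wrap the sequence so that no line exceeds `width` characters.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
        lines.append("")
    return "\n".join(lines)
```

In both formats a marker character in the first column (">" in FASTA, "#" in the assumed MEGA layout) introduces a sequence, which is why the conversion is little more than a change of prefix plus a header.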

PHYLIP (interleaved) Format

The PHYLIP format is interleaved, similar to the MSF format. It consists of a line of numeric data, which is ignored by MEGA, followed by a group of one or more lines of text. The text begins with a sequence name in the first column and is followed by the initial part of each sequence; the group is terminated by a blank line. The number of lines in subsequent groups of data is similar to the first group. Each line is a continuation of the identified sequence and begins in the same position as in the first group. The following might be observed at the beginning of a PHYLIP data file: 

[Image: beginning of an example PHYLIP file]

MEGA would convert this data as follows:

[Image: the same data in MEGA format]
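The de-interleaving step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not MEGA's implementation: the numeric first line is skipped, the first group (terminated by a blank line) supplies the names, names are separated from the data by whitespace, and lines in later groups carry only sequence data in the same order as the first group. The function name `parse_phylip_interleaved` is hypothetical.

```python
def parse_phylip_interleaved(text):
    """Join the groups of an interleaved PHYLIP file into full sequences."""
    lines = text.splitlines()
    # Skip everything up to and including the numeric header line.
    start = next(i for i, line in enumerate(lines) if line.strip()) + 1
    names, chunks = [], []
    first_group_done = False
    row = 0
    for line in lines[start:]:
        if not line.strip():
            # A blank line ends the current group.
            if names:
                first_group_done = True
            row = 0
            continue
        if not first_group_done:
            # First group: a name followed by the first part of the sequence.
            fields = line.split()
            names.append(fields[0])
            chunks.append("".join(fields[1:]))
        else:
            # Later groups: continuation data only, in the same order.
            chunks[row] += "".join(line.split())
            row += 1
    return dict(zip(names, chunks))
```

Spaces inside the sequence data (PHYLIP files often group residues in tens) are removed when the pieces are joined.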

References:

http://www.ncbi.nlm.nih.gov/blast/fasta.shtml

The help file of MEGA 4.1 (Beta 3)

http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

 
