FASTQ Format Explained

This content is licensed under CC 4.0 BY-SA.

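FASTQ is the standard format for high-throughput sequencing data. It stores nucleic acid sequences together with their quality information. Each read is described by four lines: the ID, the sequence, an optional description line, and the quality scores. The sequence ID uniquely identifies the read, and the quality characters correspond one-to-one with the sequence characters.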

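FASTQ is a text format that stores biological sequences (usually nucleotide sequences) together with their corresponding quality scores.

It is currently the de facto standard format for high-throughput sequencing data.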

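In a FASTQ file, every four lines describe one sequencing read:

Line 1 starts with '@', followed by the sequence ID. This plays the same role as the header line in FASTA format, although FASTA headers start with '>' instead.

Line 2 is the sequence.

Line 3 starts with '+', which may optionally be followed by the sequence ID/description again.

Line 4 holds the quality scores for the sequence on line 2. It has exactly the same number of characters as line 2, with one quality character per base.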
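The four-line layout above can be checked mechanically. Below is a minimal Python sketch, not a reference parser. It assumes every record occupies exactly four lines (no wrapped sequence lines), and it decodes quality characters assuming the Phred+33 offset used by Sanger and Illumina 1.8+ data. This article does not state the encoding, and older Illumina pipelines used an offset of 64. The names `read_fastq`, `FastqRecord` and `phred_scores` are hypothetical.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    header: str    # line 1 without the leading '@'
    sequence: str  # line 2
    plus: str      # line 3 without the leading '+' (often empty)
    quality: str   # line 4, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    Assumes every record occupies exactly four lines (no wrapped sequence
    or quality lines); blank lines between records are skipped.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. a trailing newline
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"line 1 must start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"line 3 must start with '+': {plus!r}")
        if len(qual) != len(seq):
            raise ValueError(f"quality length {len(qual)} != sequence length {len(seq)}")
        yield FastqRecord(header[1:], seq, plus[1:], qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters, assuming the Phred+33 encoding by default."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    data = "@read1\nGATTACA\n+\nIIIII#!\n"
    rec = next(read_fastq(io.StringIO(data)))
    print(rec.header, rec.sequence)   # read1 GATTACA
    print(phred_scores(rec.quality))  # [40, 40, 40, 40, 40, 2, 0]
```

The sketch reads records by position rather than by searching for '@'. This matters because '@' (Phred 31 under Phred+33) can also appear as the first character of a quality line.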

Note: the sequence ID uniquely identifies the read. In the Illumina examples below, it encodes the following information:

Example 1 (older Illumina format): @HWUSI-EAS100R:6:73:941:1973#0/1

HWUSI-EAS100R	the unique instrument name
6	flowcell lane
73	tile number within the flowcell lane
941	'x'-coordinate of the cluster within the tile
1973	'y'-coordinate of the cluster within the tile
#0	index number for a multiplexed sample (0 for no indexing)
/1	the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
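The fields in Example 1 are separated by ':', '#' and '/', so a regular expression can split them apart. Below is a minimal Python sketch that assumes exactly the layout shown above. The instrument name is assumed to contain no ':', and the '/1' or '/2' suffix is optional because only paired reads carry it. `OLD_ILLUMINA_ID` and `parse_old_illumina_id` are hypothetical names and are not part of any library.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the older Illumina read ID shown in Example 1:
#   @<instrument>:<lane>:<tile>:<x>:<y>#<index>[/<pair>]
OLD_ILLUMINA_ID = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)#(?P<index>[^/\s]+)(?:/(?P<pair>[12]))?$"
)


def parse_old_illumina_id(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split an Example-1-style header into named fields, or return None."""
    match = OLD_ILLUMINA_ID.match(line.strip())
    # groupdict() maps each named group to its text; 'pair' is None if absent.
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_old_illumina_id("@HWUSI-EAS100R:6:73:941:1973#0/1")
    print(fields["instrument"], fields["lane"], fields["pair"])  # HWUSI-EAS100R 6 1
```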

Example 2 (Illumina CASAVA 1.8 and later): @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

EAS139	the unique instrument name
136	the run id
FC706VJ	the flowcell id
2	flowcell lane
2104	tile number within the flowcell lane
15343	'x'-coordinate of the cluster within the tile
197393	'y'-coordinate of the cluster within the tile
1	the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y	Y if the read fails filter (read is bad), N otherwise
18	0 when none of the control bits are on, otherwise it is an even number
ATCACG	index sequence
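Example 2 has two colon-separated halves joined by a single space, and the same regular-expression approach applies. Below is a minimal Python sketch that assumes the exact field order in the table above. The index is accepted as any non-space token, so it might be empty or contain a dual index. `CASAVA18_ID` and `parse_casava18_id` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the CASAVA 1.8+ read ID shown in Example 2:
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
CASAVA18_ID = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>[12]):(?P<filtered>[YN]):(?P<control>\d+):(?P<index>\S*)$"
)


def parse_casava18_id(line: str) -> Optional[Dict[str, str]]:
    """Split an Example-2-style header into named fields, or return None."""
    match = CASAVA18_ID.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_casava18_id("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG")
    print(fields["flowcell"], fields["index"])  # FC706VJ ATCACG
    # 'Y' in the filter field means the read failed the filter.
    print("failed filter:", fields["filtered"] == "Y")  # failed filter: True
```

All fields are returned as text. Converting coordinates to integers, or checking that the control field is 0 or an even number, is left to the caller.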