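FASTQ is the raw sequence file format commonly used in second-generation (next-generation) sequencing. Compared with FASTA, it additionally stores a quality (Q) value for every base.
To decompress a fastq.gz file: gunzip [filename]
To view a FASTQ file: less -S [filename]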
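The same inspection can be done from a script without decompressing the file first. The following is only a minimal sketch: it assumes the simple four-lines-per-record layout seen in the records in this note, and the helper names open_fastq and iter_fastq are made up for illustration.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text; .gz files are decompressed on the fly."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of text lines.

    Assumption: every record is exactly four lines
    (@header, sequence, +, quality), as in the records shown in this note.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], seq.rstrip("\r\n"), qual.rstrip("\r\n")


# Usage (hypothetical file name):
# with open_fastq("sample_R1.fastq.gz") as handle:
#     for name, seq, qual in iter_fastq(handle):
#         print(name, len(seq), len(qual))
```

Because iter_fastq only needs an iterable of lines, it works equally on an open file handle or on a plain list of strings.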
R1 reads
@ST-E00298:149:HWCHHCCXX:6:1101:10571:2891 1:N:0:CTCTCTAC
TTTAAGAACAGCCCTCCCATCTTAGCAATGTCCCGGGGTGGCTGGAGCCACGGTCACTTCTTGGTCCTGGTCCAGAACTGTCGGTAGCGCTCCACATGCAAGTCATCACTGAGCTCCTGCTCGTACTCCTTCCTCGACTGGAGCTGAGCC
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKKKFKKK
R2 reads
@ST-E00298:149:HWCHHCCXX:6:1101:10571:2891 1:N:0:CTCTCTAC
TTTAAGAACAGCCCTCCCATCTTAGCAATGTCCCGGGGTGGCTGGAGCCACGGTCACTTCTTGGTCCTGGTCCAGAACTGTCGGTAGCGCTCCACATGCAAGTCATCACTGAGCTCCTGCTCGTACTCCTTCCTCGACTGGAGCTGAGCC
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK
Example:
@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
BBBBCCCC?:
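The header @EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG can therefore be read as follows:
instrument ID: EAS139
run number: 136
flowcell ID: FC706VJ
lane: 2
tile: 5
x_pos: 1000
y_pos: 12850
read: 1, i.e. read 1; for paired-end data this field is 1 or 2, so a value of 1 by itself does not mean the run was single-end
is filtered: Y, the read was flagged by the quality filter (N means it was not)
control number: 18 (0 when no control bits are set)
index sequence: ATCACG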
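The field breakdown above can be sketched in code. This is an illustrative parser only: it assumes the header has exactly the layout of the example (seven colon-separated fields, one space, then four colon-separated fields), and the names IlluminaHeader and parse_header are hypothetical.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run_number: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool
    control_number: int
    index_sequence: str


def parse_header(line):
    """Split a header line of the form shown above into named fields.

    Assumption: '@' + 7 colon-separated fields, a space, 4 colon-separated fields.
    """
    line = line.strip()
    if line.startswith("@"):
        line = line[1:]
    left, right = line.split(" ", 1)
    instrument, run, flowcell, lane, tile, x_pos, y_pos = left.split(":")
    read, filtered, control, index = right.split(":")
    return IlluminaHeader(
        instrument=instrument,
        run_number=int(run),
        flowcell_id=flowcell,
        lane=int(lane),
        tile=int(tile),
        x_pos=int(x_pos),
        y_pos=int(y_pos),
        read=int(read),
        is_filtered=(filtered == "Y"),
        control_number=int(control),
        index_sequence=index,
    )


header = parse_header("@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG")
print(header.instrument, header.lane, header.is_filtered)  # EAS139 2 True
```

A header that does not match this layout makes the tuple unpacking raise ValueError, which is a reasonable way for a sketch like this to fail loudly rather than guess.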
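The Phred quality score measures the quality of each nucleotide call in automated DNA sequencing. It was originally developed for Phred, a base-calling program used in the Human Genome Project.
Formula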
Q = -10 * log10(P) <==> P = 10 ^(-Q/10)
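Here Q is the Phred quality score and P is the base-calling error probability. For example, Q = 20 corresponds to P = 0.01, i.e. one expected error in 100 base calls.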
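To make the formula concrete, here is a small sketch of the conversion in both directions. The function names are illustrative and not taken from any particular library.

```python
import math


def phred_to_error_prob(q):
    """P = 10 ^ (-Q / 10)"""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Q = -10 * log10(P); P must lie in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

Each increase of 10 in Q divides the error probability by 10, so Q = 30 corresponds to one expected error in 1000 base calls.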
ASCII
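ASCII (American Standard Code for Information Interchange) is a character encoding standard for computers. In a FASTQ file, each quality score is stored as a single ASCII character.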
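As a rough illustration of how a quality string maps back to numbers, the sketch below subtracts a fixed offset from each character's ASCII code. The offset of 33 (the Phred+33 convention) is an assumption that this note does not state; older data may use a different offset, so check how your files were produced. The name decode_quality is hypothetical.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 encoding; not stated in this note


def decode_quality(qual, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


# First six quality characters of the R1 record shown earlier.
print(decode_quality("AAFFFK"))  # [32, 32, 37, 37, 37, 42]
```

Under that assumption, the long run of K characters in the R1 record corresponds to Q = 42 for those bases.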