Numerous options are available for converting data to compatible sequence file formats such as FASTQ files, and for downstream analysis of sequencing data. Illumina sequencers are designed so data can be easily streamed into BaseSpace Sequence Hub for cloud-based data management, analysis, and collaboration.
On-premises options are also available. For users interested in additional data analysis options, raw data files are provided in sequence file formats that are compatible with, or easily converted for use with, other software platforms.
FASTQ is a text-based sequencing data file format that stores both raw sequence data and quality scores. FASTQ files have become the standard format for storing NGS data from Illumina sequencing systems, and can be used as input for a wide variety of secondary data analysis solutions.
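To illustrate how a FASTQ file stores sequence data and quality scores together, here is a minimal Python sketch of a reader. It assumes the common four-line record layout (a header line starting with `@`, the sequence, a separator line starting with `+`, and a quality string) and the Phred+33 quality encoding. The names `FastqRecord` and `parse_fastq` are hypothetical, chosen for this example, and are not part of any Illumina software.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: its identifier, bases, and per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ text stream.

    Assumption: qualities are ASCII-encoded with an offset of 33 (Phred+33).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")

        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near header {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")

        # Each quality character encodes one score: ord(char) - offset.
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nGATTACA\n+\nIIIIII#\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

For compressed input, the same function can be given a text-mode handle such as one returned by `gzip.open(path, "rt")` from the Python standard library.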
The MiniSeq and MiSeq Sequencing Systems provide the option to automatically convert data from BCL to FASTQ format, so separate conversion software is not required.
FASTQ ORA is a binary compressed version of the text-based FASTQ sequencing data file format. fastq.ora files are up to 5x smaller than their corresponding fastq.gz files without compromising data integrity. All fastq.ora files can be read using the free decompression software provided by Illumina. Once the software is installed, a single command can pipe the decompressed output on the fly into a wide range of popular mapping tools such as BWA [1], STAR [2], and Bowtie [3].
The binary base call (BCL) sequence file format requires conversion to FASTQ format for use with user-developed or third-party data analysis tools. The NextSeq, HiSeq, and NovaSeq Sequencing Systems generate raw data files in BCL format.
The DRAGEN Bio-IT Platform offers rapid BCL conversion to FASTQ files as part of its suite of pipelines.
Illumina also offers bcl2fastq Conversion Software to convert BCL files to FASTQ files. bcl2fastq is a standalone conversion software solution that demultiplexes data and converts BCL files to standard FASTQ file formats for downstream analysis.
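As a rough illustration of what demultiplexing means, the Python sketch below assigns a read to a sample by comparing its index (barcode) sequence against the known index of each sample, allowing a small number of mismatches. This is a conceptual example only and does not reproduce how bcl2fastq works internally. The function names, the sample names, and the default of one allowed mismatch are assumptions made for this example.

```python
from typing import Dict, Optional


def hamming_distance(first: str, second: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(first) != len(second):
        raise ValueError("Sequences must have the same length")
    return sum(a != b for a, b in zip(first, second))


def assign_sample(
    index_read: str,
    sample_indexes: Dict[str, str],
    max_mismatches: int = 1,
) -> Optional[str]:
    """Return the sample whose index matches the read, or None.

    A read is assigned only when exactly one sample index lies within
    max_mismatches of it; ambiguous or unmatched reads return None.
    """
    matches = [
        sample
        for sample, index in sample_indexes.items()
        if len(index) == len(index_read)
        and hamming_distance(index_read, index) <= max_mismatches
    ]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    # Hypothetical sample-to-index mapping for illustration.
    samples = {"sampleA": "ACGTAC", "sampleB": "TTGCAA"}
    for index_read in ("ACGTAC", "ACGTAA", "GGGGGG"):
        print(index_read, assign_sample(index_read, samples))
```

Returning `None` for ambiguous matches keeps a read from being placed in the wrong sample when two indexes are too similar to tell apart.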
FASTQ files are the typical starting format for sequencing data analysis. However, BaseSpace Sequence Hub can create other file formats that are common to secondary and tertiary analysis programs.
During secondary or tertiary analysis of NGS data, software platforms and apps in the BaseSpace Informatics Suite will often convert raw sequence files from FASTQ to other file formats (eg, .vcf, .bam) as part of the analysis workflow.