When analyzing NGS data, it is always worth running a quality control tool such as FastQC to check the quality of the data. In this post, I summarize cases where diagnostic plots from FastQC can be used to solve problems.

This article is organized by the output modules of FastQC, but the criteria and cases mentioned here should also apply to similar measurements produced by other tools.

Per base sequence quality

No cases yet.

Per sequence quality scores

No cases yet.

Per Base Sequence Content (PBSC)

PBSC plots, for each position in the reads of a FASTQ file, the percentage of calls made for each of the four normal DNA bases. In a random library, you would expect the four lines to run parallel to each other (around 25% each), as there would be little to no difference between the different positions of a sequencing run. The actual distributions may fluctuate, depending on the overall base composition of the genome you study and the capture bias of the assay, but in most cases there should not be a large imbalance.
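To make the quantity being plotted concrete, here is a minimal sketch of the per-position percentages. It is not FastQC's implementation: the function name `per_base_content` and the toy reads are made up for illustration, and N calls are simply left out of the denominator as an assumption.

```python
from collections import Counter

def per_base_content(reads):
    """Return, for each read position, the percentage of A, C, G and T calls."""
    max_len = max(len(r) for r in reads)
    content = []
    for pos in range(max_len):
        # Only reads long enough to cover this position contribute.
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        # Assumption: anything other than A/C/G/T (e.g. N) is ignored.
        total = sum(counts[b] for b in "ACGT")
        content.append(
            {b: (100.0 * counts[b] / total if total else 0.0) for b in "ACGT"}
        )
    return content

# Toy reads, invented for this example.
reads = ["ACGT", "AGCT", "ATTA", "ACGA"]
pbsc = per_base_content(reads)
for pos, row in enumerate(pbsc, start=1):
    print(pos, row)
```

Each printed row corresponds to one x-axis position of the PBSC plot; in a random library every value would sit near 25.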

PBSCs can be used as diagnostics for:

  • biased fragments, like:
    • untrimmed barcodes. In demultiplexed libraries, untrimmed barcodes are fixed sequences, so you would observe sharp peaks at the beginnings or ends of reads. The example below shows 5' barcodes (TGGTCAC) that were not trimmed: [figure: barcode TGGTCAC]
    • template-switching oligos. In cases like this, you can observe a characteristic GGG or CCC near the beginning of reads.
  • overrepresented sequences, like adapter dimers or rRNAs.
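The untrimmed-barcode case can also be checked directly on the reads. The sketch below walks from the 5' end and collects positions where a single base dominates, which is what the sharp peaks in the plot correspond to. The helper name `fixed_prefix`, the 90% cutoff, and the toy reads are assumptions chosen for illustration, not values taken from FastQC.

```python
from collections import Counter

def fixed_prefix(reads, min_fraction=0.9):
    """Return the run of leading positions where one base dominates."""
    prefix = []
    for pos in range(min(len(r) for r in reads)):
        base, count = Counter(r[pos] for r in reads).most_common(1)[0]
        # Stop at the first position that looks like random insert sequence.
        if count / len(reads) < min_fraction:
            break
        prefix.append(base)
    return "".join(prefix)

# Toy reads, invented for this example: a TGGTCAC barcode followed by insert.
reads = [
    "TGGTCACACGTTAGC",
    "TGGTCACGTTACGGA",
    "TGGTCACCAGTGCAT",
    "TGGTCACTGACATCG",
]
print(fixed_prefix(reads))  # prints TGGTCAC
```

A non-empty result of barcode length at the start of the reads suggests the barcode still needs trimming; a short GGG or CCC result would instead point to a template-switching oligo.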

Whitelists:

  • For libraries treated with sodium bisulphite, which converts unmethylated C to T, the percentage of Cs will be very low.

Per sequence GC content

No cases yet.

Per base N content

No cases yet.

Sequence Length Distribution

No cases yet.

Sequence Duplication Levels

No cases yet.

Overrepresented sequences

No cases yet.

Adapter Content (AC)

For libraries in which a significant fraction of the inserts are shorter than the read length (PRO-cap, PRO-seq, etc.), adapter sequence is likely to be included in the final reads. The AC module searches reads for commonly used adapter sequences and plots the cumulative percentage of reads in which each adapter has been seen at each position. Adapter sequences can have a large effect on alignment, so if you see warnings in this section, you may need to trim adapters with cutadapt, fastp, or another tool.
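As a rough illustration of what such an adapter curve measures, the sketch below records where an adapter seed first appears in each read and accumulates those counts along the read. It is a simplification, not FastQC's code: the function name `adapter_content`, the 12-base seed length, and the toy reads are assumptions, and the adapter shown is the common Illumina TruSeq prefix used only as an example.

```python
def adapter_content(reads, adapter, seed_len=12):
    """Cumulative % of reads in which the adapter seed has started by each position."""
    seed = adapter[:seed_len]
    read_len = max(len(r) for r in reads)
    first_hits = [0] * read_len
    for r in reads:
        hit = r.find(seed)  # index of the first exact match, or -1 if absent
        if hit != -1:
            first_hits[hit] += 1
    cumulative, seen = [], 0
    for n in first_hits:
        seen += n
        cumulative.append(100.0 * seen / len(reads))
    return cumulative

# Example adapter prefix (assumption: TruSeq-style library).
adapter = "AGATCGGAAGAGC"

# Toy reads, invented for this example: two short inserts, two without adapter.
reads = [
    "ACGTACGTAGATCGGAAGAGCACA",  # 8-base insert, adapter starts at index 8
    "ACGTAGATCGGAAGAGCACACGTC",  # 4-base insert, adapter starts at index 4
    "ACGTACGTACGTACGTACGTACGT",  # no adapter
    "TTGACCAGTAGGCTAACGTTAGCA",  # no adapter
]
curve = adapter_content(reads, adapter)
print(curve)
```

A curve that rises well before the end of the read, as in this toy data, is the pattern to expect from libraries with many short inserts.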

Reference: FastQC manual