Detection of Somatic Mutations in Tumors

Results



Twenty tumor samples were analyzed for EGFR mutations and a second set of 25 tumors for BRAF and KRAS mutations. These samples had previously been screened for somatic mutations at the known mutational hotspots in codon 600 of BRAF; codons 718, 744, 752, 767, 773, 789, 857 and 860 of EGFR; and codons 11, 12, 13 and 61 of KRAS. In this way, it was possible to compare the known genotypes with those identified using the AgileSMPoint and AgileSMAll programs.

Sensitivity of the Analysis



As AgileSMAll screens all the positions within a PCR product >6 bases from a primer, it detected a number of sequence variants not identified either by the earlier diagnostic screening or by AgileSMPoint (which only screened specified positions of known pathological importance). Most of these were present in 30% or more of the total reads, and were known SNPs. Interestingly, the remaining variants not identified by the previous analysis were present in only one sample of the EGFR cohort. This suggests that this sample may have had an intrinsically higher experimental error, possibly due to chemical modification of the DNA as a result of the formalin fixation process. Although AgileSMPoint and AgileSMAll identified all the variants detected by the prior diagnostic screening, AgileSMPoint also detected the presence of four extra variants occurring at known mutational hotspots and present in approximately 1% of reads. We did not attempt to distinguish whether these were artifacts of PCR or formalin fixation.

When the distribution of the proportion of base calls that differ from the reference sequence at each non-primer position in the BRAF and KRAS amplicons was examined (Supplementary Table S5 and Supplementary Figure S6 http://www.nature.com/labinvest/journal/v94/n10/suppinfo/labinvest201496s1.html), most positions were found to be associated with non-reference base calls. The distribution of these non-reference calls suggests that it will not be possible to discern whether a low allele fraction variant (<2% of reads) identified by AgileSMPoint and AgileSMAll is a biologically genuine mutation or an experimental artifact (created by formalin fixation, PCR error or sequencing error).
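The per-position proportion described above can be illustrated with a minimal sketch. This is not the AgileSM implementation: the function names, the `skip` parameter (the number of bases excluded at each end of the amplicon) and the assumption that reads are ungapped and start at the first base of the amplicon are all hypothetical simplifications introduced for illustration.

```python
from collections import Counter

LOW_FRACTION = 0.02  # the <2% level discussed in the text


def non_reference_fractions(reference, reads, skip):
    """Return {position: proportion of base calls that differ from the reference}.

    Assumes (for illustration only) that every read is ungapped and begins at
    position 0 of the amplicon; `skip` bases at each end are not screened.
    """
    fractions = {}
    for pos in range(skip, len(reference) - skip):
        # Count unambiguous base calls from reads long enough to cover pos.
        calls = Counter(r[pos] for r in reads if len(r) > pos and r[pos] in "ACGT")
        depth = sum(calls.values())
        if depth:
            fractions[pos] = (depth - calls[reference[pos]]) / depth
    return fractions


def low_fraction_positions(fractions, threshold=LOW_FRACTION):
    """Positions with some non-reference calls, but below the threshold."""
    return sorted(p for p, f in fractions.items() if 0 < f < threshold)
```

Positions returned by `low_fraction_positions` are those for which, on the argument above, a genuine mutation cannot be distinguished from an experimental artifact by read proportion alone.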

Optimum Read Depth



The EGFR and BRAF-KRAS data sets consisted of 125 (25 × 5) and 60 (20 × 3) amplicons, respectively. At this degree of multiplexing, very high read depths were obtained for all of the amplicons. As stated above, this does not necessarily increase sensitivity. Therefore, we performed a series of in silico experiments in which the number of reads used in the analysis was reduced. These experiments suggested that it is possible to consistently identify variants at read depths of approximately 2000 reads. However, this analysis was confounded by the difficulty of creating an equimolar pool of amplicons for sequencing: as can be seen from Table 1 and Table 2, the read depths vary several-fold between different amplicons. Uneven pooling is therefore a practical concern when choosing the number of samples to multiplex per lane. When pooling more than ~100 amplicons per lane, the time spent equalizing the representation across samples becomes prohibitive unless a robotic solution is used.
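An in silico read-reduction experiment of this kind can be sketched as follows. The text does not describe how the reads were subsampled, so the random sampling without replacement, the function names and the `min_fraction` detection criterion used here are assumptions made purely for illustration.

```python
import random


def variant_fraction(reads, pos, alt):
    """Proportion of reads covering `pos` that carry base `alt` there."""
    covered = [r for r in reads if len(r) > pos]
    if not covered:
        return 0.0
    return sum(r[pos] == alt for r in covered) / len(covered)


def detection_rate(reads, pos, alt, depth, min_fraction, trials=100, seed=0):
    """Fraction of random subsamples of size `depth` in which the variant
    is still seen in at least `min_fraction` of the reads.

    `reads` must be a list; subsampling is without replacement.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    size = min(depth, len(reads))
    hits = 0
    for _ in range(trials):
        subset = rng.sample(reads, size)
        if variant_fraction(subset, pos, alt) >= min_fraction:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Toy data: a variant present in 10% of 10,000 reads.
    toy = ["ACGT"] * 9000 + ["ATGT"] * 1000
    for depth in (100, 500, 2000, 10000):
        print(depth, detection_rate(toy, 1, "T", depth, min_fraction=0.05))
```

Repeating the subsampling many times at each depth gives an estimate of how consistently a variant is recovered. It does not correct for the unequal representation of amplicons noted above.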

It can also be seen that the number of reads flagged as identifying a specific sequence variant differs between AgileSMPoint and AgileSMAll. This reflects their different approaches to identifying which amplicon a read represents and whether or not a read originates from a pseudogene sequence. As there are no pseudogenes for EGFR, differences in read depths in this data set are a direct consequence of the method each program uses to identify the origin of a read. AgileSMAll detects a slightly higher number of reads per variant as it uses the 5′ part of a read to deduce its origin; this region tends to have higher base-calling quality scores than the sequences used by AgileSMPoint. However, if primers of low synthesis quality and purity are used, the aberrant primer sequences in the amplicon hinder AgileSMAll's ability to identify the origin of a read and can have a major effect on the read depth it reports.

When screening the BRAF and KRAS data sets, which could also contain reads from pseudogene sequences, all the variants were found to have a very similar proportion of supporting reads. This suggests that both programs were equally effective at distinguishing reads originating from the pseudogenes. If the analysis was repeated using amplicon descriptions that lacked information on the divergent positions between the gene and pseudogene, the read depth at each variant position noticeably increased, with a corresponding decrease in the proportion of reads supporting the variant. This suggested that both programs were discounting a large number of pseudogene-derived reads. Manual examination of the retained and discarded reads could not quantify the efficiency with which the reads were filtered, but the similarity of the variant read depth data reported by the programs when filtering out the pseudogene sequences suggested that both filtering mechanisms were robust.
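The effect of omitting the divergent positions can be illustrated with a toy example. The filtering rule below (discard a read when it matches the pseudogene base at more divergent positions than it matches the gene base) is an assumption made for illustration; the text does not specify the rule used by either program, and all names are hypothetical.

```python
def looks_like_pseudogene(read, divergent):
    """`divergent` maps position -> (gene_base, pseudogene_base)."""
    gene_hits = pseudo_hits = 0
    for pos, (gene_base, pseudo_base) in divergent.items():
        if pos < len(read):
            gene_hits += read[pos] == gene_base
            pseudo_hits += read[pos] == pseudo_base
    return pseudo_hits > gene_hits


def variant_support(reads, pos, alt, divergent=None):
    """Return (read depth at pos, proportion of reads supporting alt).

    If `divergent` is given, reads that look pseudogene-derived are
    discarded before counting.
    """
    if divergent:
        reads = [r for r in reads if not looks_like_pseudogene(r, divergent)]
    covered = [r for r in reads if len(r) > pos]
    depth = len(covered)
    supporting = sum(r[pos] == alt for r in covered)
    return depth, (supporting / depth if depth else 0.0)


# Toy data: 8 gene reads (2 carrying a variant at position 1) and
# 8 pseudogene reads that differ from the gene at position 4.
reads = ["ACGTA"] * 6 + ["ATGTA"] * 2 + ["ACGTC"] * 8
unfiltered = variant_support(reads, 1, "T")
filtered = variant_support(reads, 1, "T", divergent={4: ("A", "C")})
```

In this toy data set, the unfiltered analysis reports a depth of 16 with 12.5% of reads supporting the variant, whereas filtering reduces the depth to 8 and raises the supporting proportion to 25%. This is the same direction of change as described above.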

Comparison of Somatic Variant Detection Using Aligned and Unaligned Sequence Data



When the sequence variant data sets produced by AgileSMAll and AgileSMPoint are compared with the sequence variants identified using the BWA/VarScan pipeline (Supplementary Tables S6 and S7 http://www.nature.com/labinvest/journal/v94/n10/suppinfo/labinvest201496s1.html), it can be seen that BWA/VarScan detected all single base substitutions when the variant allele was present in >5% of the total number of reads. However, the BWA/VarScan pipeline did not identify any of the large indel variants present in the EGFR data set.