throughput strategies are recommended. Indeed, the
combination of large genomic datasets, obtained by
massively parallel analysis of multiple genes with
next-generation sequencing (NGS), with clinical and
morphological findings and MRI results has
increased our chances of reaching a precise molecular
definition in CMD and CM (Savarese et al., 2016).
The application of NGS platforms generates an
unprecedented amount of data, and this makes
management, storage and, above all, analysis of the
data a real challenge (Pop et al., 2008). The volume
of data is such that an interconnected system
(a data pipeline) with very high operational capacity is
required to manage and process it (Li
et al., 2008). Moreover, targeted NGS platforms
offer sufficient depth of “coverage” and allow the molecular
definition of causative variants, but they also yield a plethora of
variants that are not clearly pathogenic per se but
may have a modifying effect on the phenotype.
Whilst several public and commercial tools are
available to prioritize rare gene variants emerging in
NGS studies of CMD and CM (Savarese et al., 2016;
Astrea et al., 2018) and attribute causality to a
specific clinical condition, how the myriad
additional, less rare or frequent gene mutations
contribute to a specific disorder remains largely
unexplored.
In this manuscript, we designed a novel targeted
gene panel (MotorPlex7.0) able to analyze over 200
genes in parallel in a subset of CM and CMD patients,
and we processed the resulting dataset using
non-metric multidimensional scaling (nMDS), a
multivariate data mining algorithm that uses the
information about the specific variants found in each
patient to (i) compare CM and CMD groups of
patients; (ii) identify groupings of patients; and (iii)
determine whether specific genes or variants cluster and
are associated with clinical manifestations.
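As an illustration of the kind of input nMDS operates on, the following minimal sketch builds a pairwise dissimilarity matrix between patients from the sets of variants each one carries. The Jaccard dissimilarity, the function names, and the toy variant identifiers are assumptions made for illustration only; the dissimilarity measure is not specified in the text above.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets of variant identifiers."""
    union = a | b
    if not union:
        # Assumption: two patients without any variant are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(patient_variants):
    """Return sorted patient ids and a symmetric matrix of pairwise distances."""
    ids = sorted(patient_variants)
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(patient_variants[ids[i]], patient_variants[ids[j]])
        matrix[i][j] = matrix[j][i] = d
    return ids, matrix


# Hypothetical toy data: patient id -> set of variant identifiers.
patients = {
    "P1": {"GENE_A:v1", "GENE_B:v2"},
    "P2": {"GENE_A:v1", "GENE_C:v3"},
    "P3": {"GENE_D:v4"},
}
ids, dist = dissimilarity_matrix(patients)
```

A matrix of this form could then be handed to any non-metric MDS implementation (for example the one provided by scikit-learn) to obtain a low-dimensional ordination of the patients.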
2 METHODS
We genotyped a sample of 159 patients (71 men and
56 women), with a clinical and morphological
diagnosis of CM (127) and CMD (32) (see Figure 1
for details) using MotorPlex7.0 (Savarese et al.,
2016), a validated targeted gene panel containing 241
muscular genes (for a total of 1.287 Mbp of DNA)
designed with the SureSelect technology (Agilent,
Santa Clara, CA). Among the CM patients, clinical
criteria fully met a definition of “congenital
weakness and slow muscle disease progression”
(North et al., 2014) in 72 cases whereas 54
patients had less specific clinical features
overlapping other neuromuscular conditions or
insufficient data to define a CM disorder (“not specific
myopathies”).
Figure 1: Distribution of patients based on clinical
diagnosis.
Alignment, variant calling, and interpretation
of the data were performed using the following
software: SureCall (Agilent) for the assembly and
alignment phase, and Ingenuity Variant Analysis
(QIAGEN, Hilden, Germany) and wANNOVAR
(wannovar.wglab.org) for the variant calling and
interpretation phase. The following criteria had to be met
to reach a judgment of sequence accuracy: a quality
score greater than 30 and a coverage of at least 80
reads. Freely available software tools (PolyPhen-2,
http://genetics.bwh.harvard.edu/pph2/, and SIFT,
http://sift.jcvi.org/) were used to predict the
pathogenic effect of gene mutations. The minor
allele frequency (MAF) was obtained from the
allele frequencies reported in several open-access
population-based variant databases
(gnomad.broadinstitute.org; exac.broadinstitute.org/;
www.ncbi.nlm.nih.gov/projects/SNP;
www.internationalgenome.org/1000-genomes-browsers/),
and rare variants were selected using allele frequency
thresholds of 0.1% (in an autosomal recessive or an
X-linked model of inheritance) and 0.01% (in an
autosomal dominant model of transmission).
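The accuracy and rarity criteria described above can be sketched as a simple filter. This is a minimal illustration, not the pipeline actually used: the `Variant` record, its field names, the inheritance codes, and the toy values are hypothetical, and treating a variant as rare when its MAF is at or below the threshold is an assumption.

```python
from dataclasses import dataclass

MIN_QUALITY = 30   # from the text: quality score greater than 30
MIN_COVERAGE = 80  # from the text: coverage of at least 80 reads

# MAF thresholds expressed as fractions (0.1% = 0.001, 0.01% = 0.0001).
MAF_THRESHOLD = {
    "AR": 0.001,   # autosomal recessive
    "XL": 0.001,   # X-linked
    "AD": 0.0001,  # autosomal dominant
}


@dataclass
class Variant:
    """Hypothetical record for one called variant."""
    gene: str
    quality: float
    coverage: int
    maf: float          # minor allele frequency as a fraction
    inheritance: str    # "AR", "XL" or "AD"


def passes_quality(v):
    """Quality must exceed 30 and coverage must be at least 80 reads."""
    return v.quality > MIN_QUALITY and v.coverage >= MIN_COVERAGE


def is_rare(v):
    """Assumption: rare means MAF at or below the model-specific threshold."""
    return v.maf <= MAF_THRESHOLD[v.inheritance]


def select_candidates(variants):
    """Keep variants that satisfy both the accuracy and the rarity criteria."""
    return [v for v in variants if passes_quality(v) and is_rare(v)]


# Hypothetical toy data.
variants = [
    Variant("GENE_A", 45.0, 120, 0.0005, "AR"),  # kept
    Variant("GENE_B", 45.0, 120, 0.0005, "AD"),  # too frequent for AD
    Variant("GENE_C", 30.0, 200, 0.0, "AD"),     # quality not above 30
    Variant("GENE_D", 60.0, 79, 0.0, "XL"),      # coverage below 80
]
selected = select_candidates(variants)
```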
The patients were divided into three subgroups
on the basis of the certainty of their molecular
diagnosis. The group with a “definite diagnosis”
contains patients with published pathogenic
mutations and presenting a clinical phenotype
compatible with the mutation identified. The group
with a “probable diagnosis” includes patients having
rare mutations considered to be pathogenic based on
in silico bioinformatic tools and showing clinical
manifestations matching a phenotype that has
already been linked to mutations in that specific
gene. Cases not matching the above criteria were
defined as “no diagnosis established”.
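The three diagnostic subgroups can be expressed as a small decision rule. The sketch below is only an illustration of the criteria as stated above; the `PatientFinding` record and its boolean field names are hypothetical, and in practice each flag would result from expert review rather than an automated check.

```python
from dataclasses import dataclass


@dataclass
class PatientFinding:
    """Hypothetical summary of the molecular and clinical findings in one patient."""
    published_pathogenic: bool        # mutation already published as pathogenic
    phenotype_fits_mutation: bool     # clinical phenotype compatible with that mutation
    rare_predicted_pathogenic: bool   # rare mutation, pathogenic according to in silico tools
    phenotype_fits_gene: bool         # phenotype already linked to mutations in that gene


def classify_diagnosis(f):
    """Assign one of the three certainty groups described in the text."""
    if f.published_pathogenic and f.phenotype_fits_mutation:
        return "definite diagnosis"
    if f.rare_predicted_pathogenic and f.phenotype_fits_gene:
        return "probable diagnosis"
    return "no diagnosis established"


# Hypothetical examples, one per group.
definite = classify_diagnosis(PatientFinding(True, True, False, False))
probable = classify_diagnosis(PatientFinding(False, False, True, True))
unsolved = classify_diagnosis(PatientFinding(False, False, True, False))
```

Checking the "definite" criteria first means that a patient satisfying both sets of criteria is placed in the more certain group.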