
platform for implementing kernels. For this reason, they have been extensively explored recently (Wang et al., 2021; Jäger and Krems, 2023). We are interested in Quantum Embedding Kernels (QEKs), which use quantum circuits to embed data points into a high-dimensional Hilbert space. The overlap between these quantum states is then used to compute the inner product between data points in this feature space. Typically, these kernels are parameterized, making them a form of VQCs. The parameters of these circuits are optimized based on Kernel Target Alignment (KTA), which serves as a metric to align the kernel with the target task (which we will consider to be binary classification) (Hubregtsen et al., 2022). Once the kernel is optimized, it is fed into an SVM to determine the optimal decision boundary for the classification task at hand. This integration leverages the strengths of both quantum and classical computing, aiming to enhance classification performance by combining the expressive power of quantum embeddings with the robust framework of SVMs.
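To make the pipeline concrete, the following is a minimal sketch of such a parameterized quantum embedding kernel, assuming PennyLane as the quantum framework; the ansatz, qubit count, and parameter shapes are illustrative placeholders rather than the exact circuit used in our experiments.

```python
# Minimal sketch of a parameterized quantum embedding kernel (QEK),
# assuming PennyLane; ansatz and shapes are illustrative only.
import pennylane as qml
from pennylane import numpy as np

n_qubits = 2
dev = qml.device("default.qubit", wires=n_qubits)

def embedding(x, params):
    # Data-dependent single-qubit rotations followed by a trainable entangling layer.
    qml.AngleEmbedding(x, wires=range(n_qubits))
    qml.BasicEntanglerLayers(params, wires=range(n_qubits))

@qml.qnode(dev)
def overlap_circuit(x1, x2, params):
    # Embed x1, then un-embed x2: the probability of the all-zeros outcome
    # equals |<phi(x2)|phi(x1)>|^2, i.e. the kernel value between the points.
    embedding(x1, params)
    qml.adjoint(embedding)(x2, params)
    return qml.probs(wires=range(n_qubits))

def quantum_kernel(x1, x2, params):
    return overlap_circuit(x1, x2, params)[0]

# Trainable parameters for one entangling layer over n_qubits wires.
params = np.random.uniform(0, 2 * np.pi, size=(1, n_qubits), requires_grad=True)
```

The trainable parameters would then be optimized to maximize the KTA on the training data, and the resulting kernel passed to a classical SVM (for example scikit-learn's SVC with kernel="precomputed").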
However, the method scales poorly. For instance, using the KTA as the cost function requires calculating the kernel matrix at every single training step, a process that scales quadratically, O(N²), with the training dataset size N. To alleviate this, both the original paper (Hubregtsen et al., 2022) and a follow-up (Sahin et al., 2024) propose using only a subsample of size D ≪ N of the training dataset to compute the KTA at each epoch, effectively making the computation O(D²) and therefore independent of N. Moreover, another paper (Tscharke et al., 2024) proposes using a clustering algorithm to find centroids of the classes and then computing the kernel matrix with respect to these centroids at each training step, bringing the complexity of the computation down to O(N). These methods thus reduce the complexity of training from quadratic in N to either linear in N or entirely independent of N. Nonetheless, the end goal is for the optimized kernel matrix, which contains the pairwise inner products between all training data points, to be fed into the SVM. Thus, at least for this final computation, these methods still require O(N²) computations. This is especially important if we consider that each pairwise inner product requires a quantum circuit execution.
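For illustration, a minimal sketch of the subsampled KTA evaluation is given below, assuming labels in {-1, +1} and some kernel function kernel_fn (for instance the quantum kernel sketched above); the function names are our own and not taken from the cited works.

```python
import numpy as np

def kernel_target_alignment(K, y):
    # Alignment between the kernel matrix K and the ideal kernel y y^T
    # (Frobenius inner product normalized by the Frobenius norms).
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def subsampled_kta(X, y, kernel_fn, D, rng):
    # Evaluate the KTA on a random subsample of size D << N, so that each
    # training step needs O(D^2) kernel evaluations instead of O(N^2).
    idx = rng.choice(len(X), size=D, replace=False)
    Xs, ys = X[idx], y[idx]
    K = np.array([[kernel_fn(a, b) for b in Xs] for a in Xs])
    return kernel_target_alignment(K, ys)
```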
In this paper, we adapt a low-rank matrix approximation known as the Nyström Approximation, which is commonly used in classical kernel methods, to address this scalability issue. This approach reduces the complexity of computing the kernel matrix from O(N²) to O(NM²), where M ≪ N is a hyperparameter that determines the quality of the approximation. The Nyström Approximation is applied to compute the optimized kernel matrix that is fed into the SVM, resulting in a classification pipeline that scales linearly with the training dataset size N in all steps. Consequently, our method facilitates the efficient application of quantum kernel methods to industrially relevant problems.
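The following is a minimal, framework-agnostic sketch of the standard Nyström construction we build on; kernel_fn stands for any (quantum) kernel, M is the number of landmark points, and all names are illustrative rather than our exact implementation.

```python
import numpy as np
from scipy.linalg import sqrtm

def nystrom_features(X, kernel_fn, M, rng):
    # Evaluate the kernel only between all N points and M << N landmark points
    # (plus among the landmarks themselves), i.e. O(N*M) quantum circuit
    # executions instead of the O(N^2) needed for the full kernel matrix.
    landmarks = X[rng.choice(len(X), size=M, replace=False)]
    C = np.array([[kernel_fn(x, l) for l in landmarks] for x in X])          # N x M
    W = np.array([[kernel_fn(a, b) for b in landmarks] for a in landmarks])  # M x M
    # Features Phi such that Phi @ Phi.T approximates the full kernel matrix.
    W_inv_sqrt = np.linalg.pinv(np.real(sqrtm(W)))
    return C @ W_inv_sqrt
```

Since Φ Φᵀ = C W⁺ Cᵀ ≈ K, the approximate kernel matrix can be passed to an SVM with a precomputed kernel, or Φ can be used directly as a feature representation for a linear SVM, keeping every step linear in N.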
The contributions of this paper are as follows:
• Adapted the Nyström Approximation to Quantum Embedding Kernels.
• Empirically verified (on several synthetic datasets) that, in a noiseless setting, the Nyström Approximation method allows for quantum kernels with reduced quantum circuit executions at a small cost in the accuracy of the resulting SVM.
• Empirically tested the performance of the developed method under both coherent and incoherent noise.
2 KERNEL METHODS AND
SUPPORT VECTOR MACHINES
Kernel methods can be used for different tasks, from dimensionality reduction (Schölkopf et al., 1997) to regression (Drucker et al., 1996). We will focus on binary classification using SVMs, which are linear classifiers (Hearst et al., 1998). Nonetheless, they can be used in non-linear classification problems due to the kernel trick, which implicitly maps the data into high-dimensional feature spaces where linear classification is possible. In this section we will go through this pipeline for classification, starting with kernel methods and ending with SVMs.
2.1 Kernel Methods
Consider a dataset X = {(x_i, y_i)}_{i=1}^{n}, where x_i ∈ R^d, with d being the number of features of x_i. A kernel method involves defining a feature map φ : R^d → H, where H is a high-dimensional Hilbert space. The kernel function can then be defined as:

k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩_H    (1)
Put in words, the kernel function computes the inner product between the inputs x_i and x_j in some high-dimensional feature space H. Given the kernel function k, the kernel matrix K is a symmetric matrix that contains the pairwise evaluations of k over all the points in the training dataset X:
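K_ij = k(x_i, x_j), for i, j = 1, ..., n.

For concreteness, the following is a small classical sketch of this construction; the quadratic feature map φ and the data are purely illustrative stand-ins for the quantum embedding discussed later.

```python
import numpy as np

def phi(x):
    # Toy feature map R^2 -> R^3; its inner product reproduces the
    # polynomial kernel k(xi, xj) = (xi . xj)^2.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(xi, xj):
    return phi(xi) @ phi(xj)

X = np.array([[0.5, -1.0], [1.5, 0.3], [-0.2, 0.8]])
K = np.array([[k(a, b) for b in X] for a in X])  # symmetric 3 x 3 kernel (Gram) matrix
```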