Positive-Unlabeled Learning Using Pairwise Similarity and Parametric Minimum Cuts

Torpong Nitayanont (https://orcid.org/0009-0002-6976-1951) and Dorit S. Hochbaum (https://orcid.org/0000-0002-2498-0512)
Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA, U.S.A.
Keywords:
Positive-Unlabeled Learning, Binary Classification, Pairwise Similarity, Parametric Minimum Cut.
Abstract:
Positive-unlabeled (PU) learning is a binary classification problem where the labeled set contains only posi-
tive class samples. Most PU learning methods involve using a prior π on the true fraction of positive samples.
We propose here a method based on Hochbaum’s Normalized Cut (HNC), a network flow-based method, that
partitions samples, both labeled and unlabeled, into two sets to achieve high intra-similarity and low inter-
similarity, with a tradeoff parameter to balance these two goals. HNC is solved, for all tradeoff values, as
a parametric minimum cut problem on an associated graph producing multiple optimal partitions, which are
nested for increasing tradeoff values. Our PU learning method, called 2-HNC, runs in two stages. Stage 1
identifies optimal data partitions for all tradeoff values, using only positive labeled samples. Stage 2 first ranks
unlabeled samples by their likelihood of being negative, according to the sequential order of partitions from
stage 1, and then uses the likely-negative along with positive samples to run HNC. Among all generated parti-
tions in both stages, the partition whose positive fraction is closest to the prior π is selected. An experimental
study demonstrates that 2-HNC is highly competitive compared to state-of-the-art methods.
1 INTRODUCTION
Positive-unlabeled (PU) learning is a variant of bi-
nary classification where labeled samples only come
from the positive class. Each unlabeled sample could
either belong to the positive or negative class. PU
learning is related to the one-class learning problem
in which the model is trained solely on the positive
labeled set, but unlabeled samples are not utilized
(Khan and Madden, 2014). PU learning is also related to semi-supervised learning, where unlabeled samples are used in addition to a labeled set of samples from both classes, giving better performance than one-class learning methods (Lee and Liu, 2003; Li et al., 2010). PU learning is a special case of semi-supervised learning where no negative labeled samples are provided.
PU learning arises in contexts where negative samples are difficult to verify or obtain, and where the absence of a positive label does not necessarily imply that the sample is negative. In personalized advertising (Yi
et al., 2017; Bekker and Davis, 2020), each advertise-
ment that is clicked is a positive sample. However, an
unclicked advertisement is regarded as unlabeled as it
could either be uninteresting (negative) or interesting
but overlooked (positive). In the identification of ma-
lignant genes (Yang et al., 2012; Yang et al., 2014a),
a limited set of genes have been verified to cause dis-
eases (positive) while many other genes have not been
evaluated (unlabeled). Other domains include fake re-
views detection (Li et al., 2014; Ren et al., 2014) and
remote sensing (Li et al., 2010).
A natural way to deal with the absence of negative
labeled samples is to identify unlabeled samples that
are likely negative, and train a traditional classifier us-
ing the positive labeled set and the likely-negative un-
labeled set (Liu et al., 2002; Li and Liu, 2003). An-
other common approach is to train a classifier on a
modified risk estimator, in which each unlabeled sam-
ple can be regarded as positive and negative with dif-
ferent weights. This idea has been adopted in differ-
ent learning methods such as neural network models
(Du Plessis et al., 2014; Du Plessis et al., 2015; Kiryo
et al., 2017), and random forest (Wilton et al., 2022)
with a modified impurity function. Most of these
methods rely on the prior information of the fraction
of positive samples, π, in the dataset.
The method that we propose here is based on a
network flow-based method called Hochbaum’s Nor-
malized Cut (HNC) (Hochbaum, 2010). HNC par-
titions samples into two sets to achieve high intra-
similarity within sets and low inter-similarity between
the two, with a tradeoff parameter that balances the
two goals. The problem was shown in (Hochbaum,
2010) to be solved, for all values of the tradeoff pa-
rameter, as a minimum cut problem on an associated
graph. This method was previously used in binary
classification where both positive and negative la-
beled samples are available (Yang et al., 2014b; Bau-
mann et al., 2019). HNC is applicable in PU learning
since it does not require labeled samples from both
classes. Moreover, it makes use of unlabeled sam-
ples through their similarities with labeled samples
and among themselves, making it advantageous when
labeled data is limited.
As a transductive method, HNC predicts labels
only for the given unlabeled samples. This is differ-
ent from inductive methods that make predictions for
any unlabeled samples, whether they are the given unlabeled samples or an unseen, separate set of unlabeled
samples. Indeed, HNC can be extended and used as
an inductive classifier.
The main contribution of this work is a new
method for PU learning that utilizes the unique fea-
tures of HNC in two stages, called 2-HNC. Stage 1
generates multiple partitions of data samples, corre-
sponding to different tradeoff values, efficiently, with
a parametric cut procedure. We infer from the se-
quence of partitions in stage 1 the likelihood of unla-
beled samples to be negatively labeled. Based on this,
stage 2 generates a set of likely-negative unlabeled
samples and applies HNC using both the positive sam-
ples and the likely-negative samples. Among all parti-
tions generated in both stages, the one whose fraction
of positive samples is closest to the given prior π is
selected as the prediction for unlabeled samples.
An additional and independent contribution here is
the method of extracting likely-negative samples from
the unlabeled set using results of stage 1. This method
has potential uses in settings other than PU learning.
Another contribution of this work is the consideration
of the intra-similarities of both positive and negative
prediction sets, in data partitioning. This is in con-
trast to past uses of HNC, such as in (Baumann et al., 2019; Spaen et al., 2019; Asín Achá et al., 2020), where the scenario considered was to maximize the intra-similarity of the positive prediction set only.
We show via experiments on real data that 2-
HNC outperforms leading methods, which include
two standard benchmarks, uPU (Du Plessis et al.,
2014; Du Plessis et al., 2015) and nnPU (Kiryo et al.,
2017), as well as a recent state-of-the-art tree-based
method, PU ET (Wilton et al., 2022).
2 RELATED WORKS
The main challenge of PU learning is the lack of neg-
ative labeled samples. A number of methods utilize a
preprocessing step to identify a set of unlabeled sam-
ples that are likely to be negative prior to training
a traditional binary classifier. For instance, the Spy
technique (Liu et al., 2002) selects a few positive labeled samples as spies and includes them in the unlabeled set, all of which are treated as negative. With a
binary classifier trained on this data, unlabeled sam-
ples with lower posterior probability than the spies are
considered likely to be negative. The Rocchio method
(Li and Liu, 2003) marks unlabeled samples that are
closer to the centroid of unlabeled samples than that
of positive labeled samples as likely negative. (Lu and
Bai, 2010) used Rocchio to also expand the positive
labeled set when a small labeled set is given.
Another common approach in recent works is to
train a model based on an empirical risk estimator,
modified in the context of PU learning. (Du Plessis
et al., 2014; Du Plessis et al., 2015) proposed uPU, an
unbiased risk estimator for PU data on which neural
network models are trained. (Kiryo et al., 2017) miti-
gates the overfitting nature of uPU via a non-negative
risk estimator in their state-of-the-art method known
as nnPU. There are also works on other classifiers,
besides deep learning models, that apply this simi-
lar idea such as a random forest model called PU ET
(Wilton et al., 2022), in which the impurity function
is modified for PU data. PU ET gives competitive
results, especially on tabular data type where deep
learning PU methods are not always effective.
There are methods, other than the above, which
rely on pairwise similarities between samples. In la-
bel propagation method of (Carnevali et al., 2021), a
graph representation of the data is constructed with
edge weights that reflect pairwise similarities. The
likelihood of being negative for each unlabeled sam-
ple is inferred based on its shortest path distance
on the graph to the positive labeled set. Labels are
then propagated from the positive and likely-negative
unlabeled samples to the remaining unlabeled ones.
(Zhang et al., 2019) presented a maximum margin-
based method that penalizes similar samples that are
classified differently. While methods like (Carnevali et al., 2021; Zhang et al., 2019) utilize a graph representation of the data as well as pairwise similarities, a network flow-based approach, which is a closely related area, has never been utilized in PU learning.
Hochbaum’s Normalized Cut or HNC
(Hochbaum, 2010) has been used in binary classi-
fication, where labeled samples from both classes
are given. It was shown to be competitive in many
applications (Baumann et al., 2019; Spaen et al.,
2019; Yang et al., 2014b). In this work, we devise
a variant of HNC for PU learning, called 2-HNC.
We compare 2-HNC to the following benchmarks:
uPU (Du Plessis et al., 2014), nnPU (Kiryo et al.,
2017) and PU ET (Wilton et al., 2022). uPU and
nnPU are selected as standard PU learning bench-
marks. nnPU exhibited competitive performance
consistently, mostly on image and text data. PU ET, a
recent state-of-the-art method, demonstrated leading
performance, particularly on tabular data where it
outperformed deep learning models. As in most PU methods, the fraction of positive samples in the data, π, is given as prior information to 2-HNC and the benchmark methods.
3 PRELIMINARIES, NOTATION
AND HNC
3.1 Notations
Given a dataset $V$ with a set of positive labeled samples $L^+$ and a set of unlabeled samples $U$, which is a mixture of positive and negative samples, the goal is to predict the label, or class, of each sample in $U$. We formalize the PU-learning task as a graph problem.

Let the directed graph $G = (V, A)$ represent the data, with $V$ the set of vertices that correspond to samples in the data, and $A = \{(i,j) \mid i, j \in V,\ i \neq j\}$ the set of arcs that connect each sample pair. Arcs $(i,j)$ and $(j,i)$ that connect $i$ and $j$ carry the same capacity weight $w_{ij}$, which reflects the symmetry of the pairwise similarity of $i$ and $j$.
3.2 Hochbaum’s Normalized Cut
(HNC)
Given a dataset, with the set of samples $V$ and pairwise similarities $w_{ij}$ for $i, j \in V$, the goal of HNC is to find a partition of $V$ into two non-empty sets $S$ and $\bar{S}$ that optimizes the tradeoff between two objectives: high intra-similarity within the set $S$ and small inter-similarity between $S$ and its complement $\bar{S}$. We denote their inter-similarity by $C(S, \bar{S})$, defined as $\sum_{i \in S, j \in \bar{S}} w_{ij}$. The intra-similarity within $S$ is defined as $C(S, S) = \sum_{i, j \in S,\, i < j} w_{ij}$. HNC, with a tradeoff parameter $\mu \geq 0$, is the following problem:

$$\text{(HNC+)} \qquad \min_{\emptyset \subset S \subset V} \; C(S, \bar{S}) - \mu \, C(S, S) \qquad (1)$$
Because of the symmetry between $S$ and $\bar{S}$, the problem can alternatively be stated as the tradeoff between the intra-similarity within $\bar{S}$ and the inter-similarity between it and its complement:

$$\text{(HNC-)} \qquad \min_{\emptyset \subset S \subset V} \; C(S, \bar{S}) - \mu \, C(\bar{S}, \bar{S}) \qquad (2)$$

One might consider a variant of HNC that incorporates both intra-similarities, $C(S, S)$ and $C(\bar{S}, \bar{S})$, as a more general version of both HNC+ (1) and HNC- (2). This variant, with two tradeoff weights $\alpha \geq 0$ and $\beta \geq 0$, is given as problem (3) below:

$$\min_{\emptyset \subset S \subset V} \; C(S, \bar{S}) - \alpha \, C(S, S) - \beta \, C(\bar{S}, \bar{S}) \qquad (3)$$
However, as proved in the next lemma, problem (3) is equivalent to either HNC+ or HNC-, depending on the relative values of $\alpha$ and $\beta$.

Lemma 3.1. Problem (3) is equivalent to HNC+ (1) with $\mu = \frac{\alpha - \beta}{1 + \beta}$ when $\alpha \geq \beta$, and is equivalent to HNC- (2) with $\mu = \frac{\beta - \alpha}{1 + \alpha}$ when $\alpha < \beta$.
Proof. $C(V, V)$ is a constant, which we denote by $W_V$, and is equal to $C(S, \bar{S}) + C(S, S) + C(\bar{S}, \bar{S})$ for any nonempty $S \subset V$. Hence, the objective function of (3) can be written as
$$C(S, \bar{S}) - \alpha\, C(S, S) - \beta \big( W_V - C(S, \bar{S}) - C(S, S) \big) = (1 + \beta) \Big( C(S, \bar{S}) - \tfrac{\alpha - \beta}{1 + \beta}\, C(S, S) \Big) - \beta W_V .$$
Minimizing this function is equivalent to solving (1) with the tradeoff $\mu = \frac{\alpha - \beta}{1 + \beta} \geq 0$ when $\alpha \geq \beta$.

Alternatively, the objective function of (3) can be written as
$$C(S, \bar{S}) - \alpha \big( W_V - C(S, \bar{S}) - C(\bar{S}, \bar{S}) \big) - \beta\, C(\bar{S}, \bar{S}) = (1 + \alpha) \Big( C(S, \bar{S}) - \tfrac{\beta - \alpha}{1 + \alpha}\, C(\bar{S}, \bar{S}) \Big) - \alpha W_V .$$
Hence, minimizing this objective is equivalent to solving (2) with $\mu = \frac{\beta - \alpha}{1 + \alpha} \geq 0$ when $\alpha < \beta$.
Therefore, instead of solving (3), where the two intra-similarities appear explicitly, it is sufficient to consider either HNC+ or HNC-, depending on whether we put more weight on the intra-similarity of $S$ or of $\bar{S}$. We note that in prior applications of HNC to binary classification, e.g. (Yang et al., 2014b; Baumann et al., 2019), the model considered the intra-similarity in $S$ only, as in HNC+.

When applying HNC in binary classification, where labeled samples from both classes are given, the goal is to partition the data, which consists of the positive and negative labeled sets, $L^+$ and $L^-$, as well as the unlabeled set $U$, into $S$ and $\bar{S}$, and to predict the labels of unlabeled samples in $U$ accordingly. In previous works, e.g. (Yang et al., 2014b; Baumann et al., 2019), the labeled sets are used as seeds and either HNC+ or HNC- is solved with the restriction that $L^+ \subseteq S \subseteq V \setminus L^-$. Unlabeled samples in the optimal $S^*$ and $\bar{S}^*$ are predicted positive and negative, respectively.
4 2-HNC: A TWO-STAGE
METHOD FOR PU LEARNING
In this section, we describe the 2-HNC method, in which HNC is applied in two stages to PU learning, where only the positive labeled set $L^+$ and the unlabeled set $U$ are given. We then show how the optimization problems in 2-HNC are solved as parametric minimum cut problems on associated graphs.
4.1 2-HNC for PU Learning
The 2-HNC method consists of two stages. In stage 1, we solve HNC- using only the given positive labeled set. In stage 2, we first extract likely-negative samples from the unlabeled set based on the result of the first stage, and then solve HNC+ using both the positive labeled set and the likely-negative set. The output is the one data partition, among those generated in both stages, whose fraction of positive samples is closest to the ratio π, given as a prior.
4.1.1 Stage 1: Solving HNC- with Positive
Labeled Samples
The given positive labeled set $L^+$ is used as the seed set for the set $S$ in HNC+ and HNC-. Since no negative labeled samples are provided, $L^- = \emptyset$; that is, no seed sample is required to be in $\bar{S}$. The seed set constraint imposed on HNC+ and HNC- is then $L^+ \subseteq S$.

Without a seed set for $\bar{S}$, HNC+ is not well defined: the optimal solution to HNC+ is always $(S^*, \bar{S}^*) = (V, \emptyset)$ for any tradeoff $\mu \geq 0$. That is, HNC+ has only the trivial solution in which all unlabeled samples are predicted to be positive. HNC+, however, will be used in stage 2, when likely-negative samples are available.

HNC-, on the other hand, with only positive labeled samples, gives non-trivial data partitions for various values of the tradeoff parameter.
The optimal data partition for HNC- depends on the tradeoff $\mu$. We solve HNC-, under the constraint $L^+ \subseteq S$, for all tradeoffs $\mu \geq 0$ as a parametric minimum cut problem on an associated parametric graph. For $\mu = 0$, the optimal partition $(S^*, \bar{S}^*)$ is $(V, \emptyset)$. As $\mu$ increases, the optimal partition gradually changes, for some values of $\mu$, until $\mu$ reaches a sufficiently large value, at which $(S^*, \bar{S}^*)$ is $(L^+, V \setminus L^+)$. The result of the associated parametric minimum cut problem is a sequence of data partitions, $(S_1, \bar{S}_1), (S_2, \bar{S}_2), \ldots, (S_q, \bar{S}_q)$, that correspond to increasing values of $\mu$. Here, $q$ is the number of distinct partitions in the parametric minimum cut solution, and can differ between datasets. This sequence of partitions for increasing values of $\mu$ is, in fact, nested. That is, $\bar{S}_1 \subseteq \bar{S}_2 \subseteq \cdots \subseteq \bar{S}_q$. We discuss the procedure of solving HNC- as a parametric minimum cut problem, as well as the nested cut property, in Section 4.2. Stage 1 ends here, with the sequence of data partitions, which are the optimal solutions to HNC- for the different tradeoff values, as its output.
4.1.2 Stage 2: Solving HNC+ with Positive
Labeled Samples and Likely-Negative
Unlabeled Samples
Solving HNC- in stage 1 does not require negative labeled samples and gives us, for each tradeoff $\mu$, a partition of the data samples into the positive prediction set $S^*$ and the negative prediction set $\bar{S}^*$. However, HNC- only considers the scenario where the intra-similarity of the negative prediction set $\bar{S}$ is given higher importance than that of the positive prediction set $S$. Here, we consider HNC+, before combining the results from both stages as the final step of 2-HNC.

To handle the issue of HNC+ being ill-defined in the absence of a negative labeled set, as discussed in Section 4.1.1, we add to the problem seeds for $\bar{S}$. We select a set of samples that are likely to be negative, denoted $L_N$, from the unlabeled set $U$ as the seed set for $\bar{S}$. The random sampling procedure that forms $L_N$, called SelectNeg, is based on the results of stage 1.
SelectNeg takes as input the sequence of optimal data partitions $(S_1, \bar{S}_1), (S_2, \bar{S}_2), \ldots, (S_q, \bar{S}_q)$, which are the results of solving HNC- in stage 1 for all $\mu \geq 0$, for increasing values of $\mu$. The nested sequence $\bar{S}_1 \subseteq \bar{S}_2 \subseteq \cdots \subseteq \bar{S}_q$ starts from $\bar{S}_1 = \emptyset$ and expands until $\bar{S}_q = V \setminus L^+$, which is the largest possible since we require $L^+$ to be in $S_q$. The implication of the nestedness is that an unlabeled sample that is predicted negative for a particular $\mu$ is also predicted negative for any larger value of $\mu$.

We consider unlabeled samples that belong to the negative prediction set $\bar{S}^*$ for small $\mu$ as likely to be negative. As $\mu$ increases from zero, these samples are predicted negative before other unlabeled samples. Formally, for an unlabeled sample $i \in U$, we denote by $q_i = \max\{\gamma \mid i \in S_\gamma\}$ the index of the last partition in the sequence in which sample $i$ is still in the positive prediction set. Then $\eta(i) = |S_{q_i}|$ is the number of samples that are predicted negative at the same or larger values of the tradeoff $\mu$. A large $\eta(i)$ implies that sample $i$ is more likely to be predicted negative than a large number of samples. In our sampling method SelectNeg, the probability that unlabeled sample $i$ is selected as likely-negative is $\eta(i) / \sum_{j \in U} \eta(j)$. The number of likely-negative samples to be selected, that is, the size of the set $L_N$, is chosen in this work to be equal to the number of positive labeled samples, $|L^+|$.
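To make the sampling step concrete, the following is a minimal Python sketch of SelectNeg under our own naming and data representation (a list of negative-prediction sets from stage 1, with the first set empty); sampling without replacement is an assumption, not something the paper specifies.

```python
import numpy as np

def select_neg(neg_sets, unlabeled, n_total, n_select, rng=None):
    """Sketch of the SelectNeg sampling step described above.

    neg_sets : list of sets [S̄_1, ..., S̄_q] from stage 1, nested
               (S̄_1 is empty and S̄_g ⊆ S̄_{g+1}); positive sets are S_g = V \ S̄_g.
    unlabeled: iterable of unlabeled sample ids (the set U).
    n_total  : |V|, total number of samples.
    n_select : number of likely-negative seeds to draw (|L+| in the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    unlabeled = list(unlabeled)
    eta = np.zeros(len(unlabeled))
    for idx, i in enumerate(unlabeled):
        # q_i: last partition index (0-based here) at which i is still in the positive set
        q_i = max(g for g, neg in enumerate(neg_sets) if i not in neg)
        # eta(i) = |S_{q_i}| = |V| - |S̄_{q_i}|
        eta[idx] = n_total - len(neg_sets[q_i])
    probs = eta / eta.sum()
    # draw likely-negative seeds with probability eta(i) / sum_j eta(j)
    chosen = rng.choice(len(unlabeled), size=n_select, replace=False, p=probs)
    return {unlabeled[c] for c in chosen}
```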
Once the likely-negative set $L_N$ is formed, we use the positive labeled set $L^+$ and the likely-negative set $L_N$ as the seed sets for $S$ and $\bar{S}$, respectively, and solve HNC+ with the seed set constraint $L^+ \subseteq S \subseteq V \setminus L_N$. As a result, the output of stage 2 is another sequence of data partitions, which are the optimal solutions to HNC+ for all nonnegative tradeoffs $\mu$.
4.1.3 Combining Results from Both Stages
Among all data partitions generated in both stages, we select the partition whose positive fraction, computed as $|S^*| / |V|$ for a partition $(S^*, \bar{S}^*)$, is closest to the prior π. Unlabeled samples in $S^*$ of the selected partition are predicted positive, and those in $\bar{S}^*$ negative.
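This selection rule can be summarized by the short sketch below; the representation of candidates as a collection of positive-prediction sets and the function name are illustrative assumptions, not the authors' code.

```python
def select_partition(candidate_pos_sets, n_total, pi):
    """Pick the partition whose positive fraction |S*|/|V| is closest to the prior pi.

    candidate_pos_sets: positive prediction sets S* gathered from stages 1 and 2.
    n_total           : |V|, the total number of samples.
    pi                : prior fraction of positive samples.
    """
    return min(candidate_pos_sets, key=lambda s: abs(len(s) / n_total - pi))
```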
4.2 Solving Parametric Minimum Cut
Problems in 2-HNC
We noted in the previous subsection that 2-HNC involves solving HNC+ and HNC- as parametric minimum cut problems on associated graphs. We first explain, in Subsection 4.2.1, how the two problems are solved for a single tradeoff $\mu \geq 0$ as minimum cut problems. In 2-HNC, we solve them for all tradeoffs $\mu \geq 0$, prior to selecting one partition from all that are generated. We describe how this is done as parametric minimum cut problems in Section 4.2.2. The nested cut property of the partition sequence produced by stage 1 is also discussed there.
4.2.1 Solving HNC+ and HNC- for a Tradeoff
Parameter µ as a Minimum Cut Problem
HNC+ and HNC- are special cases of monotone integer programs (Hochbaum, 2002; Hochbaum, 2021), and as such can be solved as minimum cut problems on an associated graph that is a mapping of the integer programming formulation of both problems (Hochbaum, 2010). Indeed, any monotone integer programming problem can be solved as a minimum cut problem on an associated graph, the construction of which is a mapping from the formulation (Hochbaum, 2002; Hochbaum, 2021).

Using the standard formulations of HNC+ and HNC-, the associated graph has a node for each sample and a node for each pair of samples. As a result, the size of this graph is quadratic in the size of the data. However, there are alternative formulations that are "compact" (Hochbaum, 2010), in that the number of nodes in the associated graph equals the number of samples, $|V|$, only. The alternative formulations of HNC+ and HNC- are given in the following lemma.

Lemma 4.1. HNC+ is equivalent to the following problem:
$$\min_{\emptyset \subset S \subset V} \; C(S, \bar{S}) - \lambda \sum_{i \in S} d_i \qquad (4)$$
and HNC- is equivalent to
$$\min_{\emptyset \subset S \subset V} \; C(S, \bar{S}) - \lambda \sum_{i \in \bar{S}} d_i \qquad (5)$$
where $\lambda = \frac{\mu}{\mu + 2}$ and $d_i = \sum_{j \in V \setminus \{i\}} w_{ij}$ for $i \in V$.
Proof. $C(S, S) = \sum_{i, j \in S,\, i < j} w_{ij} = \frac{1}{2} \sum_{i \in S} \sum_{j \in S \setminus \{i\}} w_{ij}$ since $w_{ij} = w_{ji}$. Furthermore, $\sum_{i \in S} \sum_{j \in S \setminus \{i\}} w_{ij} = \sum_{i \in S} \big( \sum_{j \in V \setminus \{i\}} w_{ij} - \sum_{j \in \bar{S}} w_{ij} \big) = \sum_{i \in S} d_i - C(S, \bar{S})$. Hence, $C(S, S) = \frac{1}{2} \big( \sum_{i \in S} d_i - C(S, \bar{S}) \big)$.

We rewrite the objective of HNC+ as
$$C(S, \bar{S}) - \frac{\mu}{2} \Big( \sum_{i \in S} d_i - C(S, \bar{S}) \Big) = \Big(1 + \frac{\mu}{2}\Big) \Big( C(S, \bar{S}) - \frac{\mu}{\mu + 2} \sum_{i \in S} d_i \Big).$$
Hence, HNC+ can be solved by minimizing (4), $C(S, \bar{S}) - \lambda \sum_{i \in S} d_i$, with $\lambda = \frac{\mu}{\mu + 2}$. The equivalence of HNC- and (5) is shown similarly by rewriting $C(\bar{S}, \bar{S})$ in HNC- as $\frac{1}{2} \big( \sum_{i \in \bar{S}} d_i - C(S, \bar{S}) \big)$.
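As a quick numerical sanity check of the identity used in this proof, the sketch below verifies $C(S,S) = \frac{1}{2}\big(\sum_{i\in S} d_i - C(S,\bar{S})\big)$ on a random symmetric similarity matrix; the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.random((n, n))
W = (W + W.T) / 2            # symmetric pairwise similarities w_ij = w_ji
np.fill_diagonal(W, 0.0)     # no self-similarity

S = np.array([0, 2, 3, 5])                  # an arbitrary nonempty S ⊂ V
Sbar = np.setdiff1d(np.arange(n), S)
d = W.sum(axis=1)                           # d_i = sum over j != i of w_ij

C_S_S = W[np.ix_(S, S)].sum() / 2           # intra-similarity: sum over pairs i < j in S
C_S_Sbar = W[np.ix_(S, Sbar)].sum()         # inter-similarity C(S, S̄)

assert np.isclose(C_S_S, 0.5 * (d[S].sum() - C_S_Sbar))
```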
When both labeled sets $L^+$ and $L^-$ are given, the seed set constraint is $L^+ \subseteq S \subseteq V \setminus L^-$. Under this constraint, the solution to HNC+ for a tradeoff $\mu$, which is now solved via (4) with the tradeoff $\lambda = \frac{\mu}{\mu + 2}$, is obtained from the minimum cut solution of an associated graph, $G^+_{st}(\lambda)$. Let $(\{s\} \cup S^*, \{t\} \cup \bar{S}^*)$ denote the minimum cut solution of $G^+_{st}(\lambda)$. Then $(S^*, \bar{S}^*)$ is the optimal solution to HNC+. The proof, provided in (Hochbaum, 2010), is omitted here.

The construction of $G^+_{st}(\lambda)$ for (4), with the constraint $L^+ \subseteq S \subseteq V \setminus L^-$, is illustrated in Figure 1a and described as follows. We add to the graph $G$, described in Section 3.1, a source node $s$ and a sink node $t$, and connect $s$ to all nodes of samples in $L^+$ with arcs of infinite capacity. Similarly, nodes in $L^-$ are connected to $t$ with arcs of infinite capacity. In addition, each unlabeled sample node, $i \in V \setminus (L^+ \cup L^-)$, or equivalently $i \in U$, has an arc from $s$ to $i$ of capacity $\lambda d_i$.

Let $(\{s\} \cup S^*, \{t\} \cup \bar{S}^*)$ be the minimum cut solution of $G^+_{st}(\lambda)$; then unlabeled samples in $S^*$ are predicted positive, and those in $\bar{S}^*$ negative.
HNC- may also be used for binary classification and can be solved similarly, via (5) for a tradeoff $\lambda = \frac{\mu}{\mu + 2}$, as a minimum cut problem on the associated graph $G^-_{st}(\lambda)$, illustrated in Figure 1b. The only difference between $G^+_{st}(\lambda)$ and $G^-_{st}(\lambda)$ is that, in the latter, each $i \in V \setminus (L^+ \cup L^-)$ is connected to $t$, rather than $s$, with an arc of capacity $\lambda d_i$.
Figure 1: Associated graphs for the HNC+ and HNC- formulations when labeled samples from both classes are given. (a) Graph $G^+_{st}(\lambda)$ for solving HNC+ with the constraint $L^+ \subseteq S \subseteq V \setminus L^-$. (b) Graph $G^-_{st}(\lambda)$ for solving HNC- with the same constraint. Nodes in the middle, outside the blue and yellow shaded areas, correspond to unlabeled samples in $U$.

In PU learning, negative labeled samples are not given and therefore $L^- = \emptyset$. HNC+ and HNC- in this context are then solved, for a tradeoff $\lambda$, as minimum cut problems on the graphs in Figures 2a and 2b, which are $G^+_{st}(\lambda)$ and $G^-_{st}(\lambda)$ with $L^- = \emptyset$. As explained in Section 4.1.1, HNC+ with $L^- = \emptyset$ has only a trivial solution for all $\lambda \geq 0$. This is also reflected in the minimum cut of $G^+_{st}(\lambda)$ (Figure 2a) with $L^- = \emptyset$, which is $(\{s\} \cup V, \{t\})$, as $t$ is disconnected from the other nodes. Hence, in stage 1 we solve only HNC-, using the graph $G^-_{st}(\lambda)$ in Figure 2b. Once the likely-negative samples are used as seed samples in stage 2 (Section 4.1.2), HNC+ can be solved using the graph $G^+_{st}(\lambda)$ in Figure 1a.
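For a single tradeoff value, the construction of $G^+_{st}(\lambda)$ and the resulting prediction can be sketched as follows with networkx. This is a minimal illustration under our own naming, not the parametric pseudoflow solver the paper uses, and the seed arcs use a large finite constant in place of infinite capacity.

```python
import networkx as nx
import numpy as np

def hnc_plus_single_lambda(W, pos_seeds, neg_seeds, lam, inf=1e12):
    """Build G+_st(lambda) for one tradeoff value and return the predicted positive set S*.

    W         : symmetric (n x n) array of pairwise similarities (after sparsification).
    pos_seeds : indices constrained to S (the set L+).
    neg_seeds : indices constrained to S̄ (L- in binary classification, L_N in stage 2).
    lam       : tradeoff lambda = mu / (mu + 2).
    """
    n = W.shape[0]
    d = W.sum(axis=1)                          # d_i = sum_j w_ij
    G = nx.DiGraph()
    s, t = "s", "t"
    for i in range(n):
        for j in range(n):
            if i != j and W[i, j] > 0:         # arcs (i, j) and (j, i) with capacity w_ij
                G.add_edge(i, j, capacity=float(W[i, j]))
    for i in pos_seeds:
        G.add_edge(s, i, capacity=inf)         # seed arcs: L+ stays on the source side
    for i in neg_seeds:
        G.add_edge(i, t, capacity=inf)         # seed arcs: negative seeds stay on the sink side
    unlabeled = set(range(n)) - set(pos_seeds) - set(neg_seeds)
    for i in unlabeled:
        G.add_edge(s, i, capacity=lam * float(d[i]))   # source arcs of capacity lambda * d_i
    _, (source_side, sink_side) = nx.minimum_cut(G, s, t)
    return source_side - {s}                   # S*: samples predicted positive
```

For $G^-_{st}(\lambda)$, the only change would be to attach each unlabeled node to $t$ with capacity $\lambda d_i$ instead of to $s$.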
4.2.2 Solving HNC+ and HNC- for All Tradeoff
Values with a Parametric Minimum Cut
Procedure
Graphs $G^+_{st}(\lambda)$ and $G^-_{st}(\lambda)$ are parametric flow networks, in that the capacities of the source-adjacent and sink-adjacent arcs ($(s,i)$ and $(i,t)$ for $i \in V$) are monotone non-increasing and non-decreasing, respectively, in the parameter $\lambda$, or vice versa. For instance, $G^+_{st}(\lambda)$ in Figure 1a has source-adjacent capacities that can only increase with $\lambda$, and sink-adjacent capacities that are fixed.

Figure 2: Graphs on which we solve HNC+ and HNC- as minimum cut problems in PU learning, where negative labeled samples are not provided. (a) Graph $G^+_{st}(\lambda)$ for solving HNC+ with the constraint $L^+ \subseteq S$, when $L^- = \emptyset$. (b) Graph $G^-_{st}(\lambda)$ for solving HNC- with the constraint $L^+ \subseteq S$, when $L^- = \emptyset$.

The minimum cuts in a parametric flow network are found for all values of the parameter, in the complexity of a single minimum cut procedure, using a parametric cut (flow) algorithm (Gallo et al., 1989; Hochbaum, 1998; Hochbaum, 2008). The first is based on the push-relabel algorithm, and the latter two on the HPF (pseudoflow) algorithm.
For our method 2-HNC, HNC- with no negative seed for $\bar{S}$ ($L^- = \emptyset$) and HNC+ with the seed set $L^- = L_N$ for $\bar{S}$ are solved for all nonnegative tradeoffs $\lambda$, in stages 1 and 2 respectively, with a parametric cut procedure.

In stage 1, HNC- with $L^- = \emptyset$ is solved on the parametric graph $G^-_{st}(\lambda)$ in Figure 2b. As explained in Section 4.1.1, the result is a sequence of minimum cuts, or data partitions, for increasing values of $\mu$ (and hence of $\lambda$), with the nestedness property that motivates how we select likely-negative samples.
Nested Cut Property (Gallo et al., 1989; Hochbaum, 1998; Hochbaum, 2008). Given a parametric flow graph $G(\lambda)$ in which, as the parameter $\lambda$ increases, the capacities of the source-adjacent, sink-adjacent and other arcs are non-increasing, non-decreasing and constant, respectively, and a sequence of values $\lambda_1 < \lambda_2 < \cdots < \lambda_q$, the corresponding minimum cut partitions $(S_1, \bar{S}_1), (S_2, \bar{S}_2), \ldots, (S_q, \bar{S}_q)$ satisfy $\bar{S}_1 \subseteq \bar{S}_2 \subseteq \cdots \subseteq \bar{S}_q$.
Since the parametric graph $G^-_{st}(\lambda)$ in Figure 2b is a parametric flow graph, the nested cut property applies. Let the sequence of partitions according to the parametric minimum cut of $G^-_{st}(\lambda)$, for increasing values $\lambda_1 < \lambda_2 < \cdots < \lambda_q$, be $(S_1, \bar{S}_1), (S_2, \bar{S}_2), \ldots, (S_q, \bar{S}_q)$. It follows that $\bar{S}_1 \subseteq \bar{S}_2 \subseteq \cdots \subseteq \bar{S}_q$. This nested sequence of data partitions, which is the output of stage 1, is then used in stage 2 (Section 4.1.2) to find the likely-negative set $L_N$. An example of a nested sequence is shown in Figure 3.

At the end of stage 2, we obtain the predictions for unlabeled samples by selecting the one partition, among all partitions generated in stages 1 and 2, whose positive fraction is closest to the prior π.
A general drawback of using minimum cuts in very dense graphs is that the solution tends not to favor "balanced" partitions. In a balanced partition, there is a constant fraction $f < 1$ of nodes on one side, and the number of edges between the two sets in the partition is $f n \cdot (1 - f) n$, which is $O(n^2)$. In that case, even if many edges in the partition have small capacities, their sheer number makes the capacity of such cuts much higher than that of cuts with a small number of nodes on one side. In the graphs we study, all pairwise similarities are evaluated; such graphs are therefore complete and dense. The standard approach to obtaining meaningful cut partitions is to apply graph sparsification. There are many approaches to graph sparsification in the context of semi-supervised learning, as studied by (de Sousa et al., 2013). Among the approaches evaluated therein, we select the method that was shown to give the best performance, the k-nearest neighbor (kNN) sparsification (Blum and Chawla, 2001), in which samples $i$ and $j$ are connected only if $i$ is among the $k$ nearest neighbors of $j$, or vice versa. This results in a graph representation $G = (V, E)$, where $E$ is the set of arcs between similar sample pairs according to the kNN sparsification.
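A minimal sketch of this kNN sparsification with scikit-learn is shown below; the symmetrization step implements the "i is among the k nearest neighbors of j, or vice versa" rule, and the function name is ours.

```python
from sklearn.neighbors import kneighbors_graph

def knn_sparsify(X, k):
    """Return a sparse symmetric adjacency matrix: i ~ j if i is among the
    k nearest neighbors of j, or vice versa (kNN sparsification)."""
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)
    return A.maximum(A.T)   # keep the pair if either direction is a kNN relation
```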
5 IMPLEMENTATION OF 2-HNC
This section includes the specification of several im-
plementation details. First, we give a brief descrip-
tion of the parametric minimum cut solver used in this
work. Second, we describe the choice of k in the k-
nearest neighbor graph sparsification method, as men-
tioned in the previous section. Finally, we specify the
pairwise similarity measure between pairs of samples.
5.1 Parametric Minimum Cut Solver
Solving HNC, via (4) and (5), for all nonnegative tradeoffs $\lambda$ as a parametric minimum cut problem can be done with the pseudoflow algorithm, given in (Hochbaum, 2008) as a fully parametric minimum cut solver that identifies all tradeoff values at which the optimal partition changes as the tradeoff increases. In this work, we use an implementation of the pseudoflow algorithm (https://riot.ieor.berkeley.edu/Applications/Pseudoflow/parametric.html) that is a simple parametric minimum cut solver. It takes as input the list of values of $\lambda$ for which we solve for the minimum cut of $G^+_{st}(\lambda)$ and $G^-_{st}(\lambda)$. The $\lambda$ values we use are $\{0, 0.001, 0.002, \ldots, 0.500\}$. The simple parametric minimum cut solver finds the minimum cuts for all the listed $\lambda$ values efficiently, in the complexity of a single minimum cut procedure.
5.2 Graph Sparsification
As described in Section 4.2.2, we apply kNN sparsification to $G^+_{st}(\lambda)$ and $G^-_{st}(\lambda)$, on which we solve the parametric minimum cut problem. For each dataset, we use multiple values of $k$ and find the partitions for all of them prior to selecting one for the prediction. For data of size less than 10000, we use $k \in \{5, 10, 15, 20, 25\}$. For larger data, we use $k \in \{5, 10\}$.

The procedure to select a data partition from those generated by all values of $k$ is as follows: For each $k$, we find the parametric minimum cut on the kNN-sparsified graph and select the partition whose positive fraction is closest to π as the candidate partition. Among the candidate partitions from all $k$, we choose the one with the largest $k$ whose positive fraction is within 2% of π. Larger $k$ is preferred since it retains more pairwise information. If no candidate partition has a positive fraction within 2% of π, we choose the one with the fraction closest to π.
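The selection over $k$ can be sketched as below, assuming a per-$k$ candidate partition has already been chosen as the one closest to π, and interpreting "within 2%" as an absolute difference of 0.02; the data structure (a dict mapping $k$ to its candidate's positive fraction and partition) is an illustrative assumption.

```python
def select_over_k(candidates, pi, tol=0.02):
    """candidates: dict k -> (positive_fraction, partition) of the per-k candidate.

    Prefer the largest k whose positive fraction is within `tol` of pi;
    otherwise fall back to the candidate whose fraction is closest to pi.
    """
    within = [k for k, (frac, _) in candidates.items() if abs(frac - pi) <= tol]
    if within:
        return candidates[max(within)][1]
    best_k = min(candidates, key=lambda k: abs(candidates[k][0] - pi))
    return candidates[best_k][1]
```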
Here, only smaller values of $k$ are evaluated on large data. This is because, as discussed in Section 4.2.2, large datasets, with dense graph representations, often yield highly unbalanced cuts. These large datasets benefit from a higher degree of sparsification; hence, smaller values of $k$ are applied.
5.3 Pairwise Similarity Computation
Figure 3: An example of a nested sequence of data partitions resulting from solving the parametric minimum cut problem in stage 1 of 2-HNC, illustrated on the graph $G^-_{st}(\lambda)$. The sets of nodes in blue and yellow are the sets of positive and negative predictions, respectively, for increasing tradeoff values $\lambda$.

Given $H$-dimensional vector representations of samples $i$ and $j$, $x_i, x_j \in \mathbb{R}^H$, we compute their distance $d_{ij}$ as the Euclidean distance between $x_i$ and $x_j$. The pairwise similarity $w_{ij}$ is then computed using the Gaussian kernel, which is commonly used in methods that rely on pairwise similarities (Jebara et al., 2009; de Sousa et al., 2013; Baumann et al., 2019), as $w_{ij} = \exp(-d_{ij}^2 / 2\sigma^2)$. We use $\sigma = 0.75$ for data with fewer than 10000 samples. For larger datasets, we use $\sigma = 0.25$. Again, large datasets require a higher degree of graph sparsification; hence, a smaller $\sigma$ is applied so that similarities of distant pairs are brought closer to zero, for the same effect as the sparsification technique discussed in Sections 4.2.2 and 5.2.
In addition to the standard Euclidean distance, we also use a weighted Euclidean distance as an alternative: $d_{ij} = \sqrt{\sum_{h=1}^{H} \rho_h (x_{ih} - x_{jh})^2}$, where $\rho = [\rho_1, \ldots, \rho_H]$ is the weight vector for the feature vector of size $H$. $\rho$ is scaled so that $\sum_{h=1}^{H} \rho_h = H$. We use the feature importance from a random forest-based PU learning method (Wilton et al., 2022) as the weight $\rho$. Features with high importance contribute to high impurity reduction at tree node splits in the random forest.
We refer to 2-HNC with the unweighted Euclidean
distance as 2-HNC(EU) and the variant with feature
importance as 2-HNC(FI).
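The similarity computation for both variants can be sketched as follows; the function name and the optional feature-importance argument are our own illustration, written under the definitions above.

```python
import numpy as np

def pairwise_similarity(X, sigma, rho=None):
    """Gaussian-kernel similarities w_ij = exp(-d_ij^2 / (2 sigma^2)).

    X    : (n, H) array of sample vectors.
    sigma: kernel width (0.75 or 0.25 in the paper, depending on data size).
    rho  : optional feature weights; rescaled so that they sum to H (2-HNC(FI)).
    """
    if rho is not None:
        rho = np.asarray(rho, dtype=float)
        rho = rho * len(rho) / rho.sum()        # scale so that sum_h rho_h = H
        Xw = X * np.sqrt(rho)                   # weighted Euclidean distance
    else:
        Xw = X                                  # unweighted Euclidean distance (2-HNC(EU))
    sq_norms = (Xw ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * Xw @ Xw.T
    d2 = np.maximum(d2, 0.0)                    # guard against small negative values
    return np.exp(-d2 / (2 * sigma ** 2))
```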
6 TIME COMPLEXITY
ANALYSIS
Let $N$ denote the data size, that is, $N = |L^+| + |U|$. The Scikit-Learn implementation, using the k-d tree data structure for kNN sparsification and distance computation, runs in $O(N \log N)$ (Pedregosa et al., 2011). The similarity weight computation takes $O(N)$ time since $O(N)$ pairs remain after sparsification.

The pseudoflow algorithm, known as HPF or Hochbaum's PseudoFlow, solves the parametric minimum cut problem in the complexity of a single minimum cut procedure (Hochbaum, 2008). The complexity of HPF on a graph with $n$ nodes and $m$ arcs, denoted by $T(n, m)$, depends on the implementation. For instance, (Hochbaum and Orlin, 2013) provides a version of HPF that runs in $O(mn \log(\frac{n^2}{m}))$. The number of nodes in the graphs of both stages is at most $N$, and the number of arcs is at least $2kN$ and at most $4kN$ due to the kNN sparsification. Hence, solving HNC in both stages runs in $O(N^2 \log N)$. This runtime dominates the other steps; therefore, the time complexity of 2-HNC is $O(N^2 \log N)$.
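For completeness, the substitution behind this bound, under the assumption of $m = O(kN)$ arcs with $k$ a constant and $n \leq N$ nodes, is:
$$T(n, m) = O\Big(mn \log \tfrac{n^2}{m}\Big) = O\Big(kN \cdot N \log \tfrac{N^2}{kN}\Big) = O\Big(kN^2 \log \tfrac{N}{k}\Big) = O\big(N^2 \log N\big).$$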
7 EXPERIMENTS
We evaluate 2-HNC against benchmark methods on real data. A test of the methods' robustness against misspecification of the prior π is also included.
7.1 Datasets
Datasets are listed in Table 1, with the total number of samples, the numbers of labeled and unlabeled samples, the number of features, and the fraction of positive samples (π) of each dataset. All datasets are from the UCI ML Repos-
itory (Kelly et al., ), except for CIFAR10 from Keras
(Chollet et al., 2015), and 20News and MNIST from
Scikit-learn (Pedregosa et al., 2011). Samples in each
dataset are assigned labels (positive vs negative) as
follows: Vote: {Democrat} vs {Republican}, Obesity:
{Obesity Type I, II and III} vs {Insufficient, Normal,
Overweight}, Mushroom: {Edible} vs {Poisonous},
20News: {alt., comp., misc., rec.} vs {sci., soc.,
talk.}, Letter: {A-M} vs {N-Z}, CIFAR10: {bird,
cat, deer, dog, frog, horse} vs {airplane, automobile,
ship, truck}, MNIST: {1,3,5,7,9} vs {0,2,4,6,8}. Fol-
lowing (Kiryo et al., 2017; Wilton et al., 2022), we
use a pre-trained GloVe word embedding (Penning-
ton et al., 2014) to map each document in 20News to
a 300-dimension vector.
For each dataset, except for Vote, we randomly
sample 10% of the positive samples (with the num-
ber rounded to the nearest hundred) as the positive
labeled set $L^+$. All the remaining samples are used as
unlabeled samples, or the set U. For Vote, as a small
dataset, we randomly select 40 samples as the posi-
tive labeled set. We run the experiments 5 times, with
different sampling of labeled samples.
As described in the introduction, 2-HNC is a
Table 1: Datasets: 10% of positive samples are randomly selected as labeled samples. The unlabeled set consists of negative
samples and the remaining 90% of positive samples. π is the fraction of positive samples in each dataset.
Name # Samples # Labeled # Unlabeled # Feature π
Vote 435 40 395 16 0.61
Obesity 2111 100 2011 19 0.46
Mushroom 8124 400 7724 112 0.52
20News 18846 1000 17846 300 0.56
Letter 20000 1000 19000 16 0.50
CIFAR10 60000 3600 56400 3072 0.60
MNIST 70000 3500 66500 784 0.51
Table 2: Classification accuracy (%) average (and standard error) across 5 runs of both variants of 2-HNC and benchmark
methods. Number in bold for each data is the highest accuracy.
Data uPU nnPU PU ET 2-HNC(EU) 2-HNC(FI)
Vote 51.60 (1.42) 84.08 (6.80) 92.51 (2.93) 90.33 (0.46) 94.99 (1.22)
Obesity 85.92 (7.23) 91.92 (1.23) 92.74 (0.66) 89.64 (1.78) 96.54 (1.18)
Mushroom 87.92 (4.54) 98.94 (0.64) 99.35 (0.66) 99.67 (0.18) 99.85 (0.09)
20News 58.83 (1.23) 70.90 (0.73) 84.89 (0.41) 76.63 (0.54) 86.03 (1.46)
Letter 81.33 (1.97) 87.50 (0.76) 86.21 (0.55) 88.88 (1.51) 87.92 (2.09)
CIFAR10 43.00 (0.01) 87.98 (0.65) 81.55 (0.11) 78.46 (0.85) 77.81 (0.41)
MNIST 72.65 (1.85) 94.25 (0.91) 95.30 (0.12) 96.44 (0.07) 94.87 (1.17)
Table 3: F1 score (%) average (and standard error) across 5 runs of both variants of 2-HNC and benchmark methods. Number
in bold for each data is the highest F1 score.
Data uPU nnPU PU ET 2-HNC(EU) 2-HNC(FI)
Vote 26.26 (11.81) 88.85 (4.67) 93.20 (3.22) 91.54 (0.40) 95.63 (1.00)
Obesity 74.59 (13.70) 86.26 (7.75) 91.04 (0.88) 87.94 (2.24) 94.63 (2.80)
Mushroom 85.19 (6.93) 98.91 (0.66) 99.34 (0.69) 99.67 (0.19) 99.85 (0.09)
20News 20.34 (4.38) 70.97 (3.39) 85.81 (0.23) 78.55 (0.61) 87.11 (1.28)
Letter 74.97 (3.26) 85.72 (1.82) 83.49 (0.73) 88.34 (1.64) 87.42 (2.14)
CIFAR10 21.02 (10.11) 89.44 (1.19) 83.99 (0.11) 81.05 (0.77) 80.92 (0.49)
MNIST 30.75 (7.96) 93.27 (2.13) 95.11 (0.16) 96.27 (0.07) 94.63 (1.09)
transductive method that predicts specifically for sam-
ples in the given unlabeled set. Hence, we evaluate
the models on their predictions of unlabeled samples
in U that the models are trained on. The metrics that
we use are the classification accuracy and F1 score,
averaged over 5 experiments on each dataset.
7.2 Benchmark Methods
2-HNC is compared against the following bench-
marks: uPU (Du Plessis et al., 2014; Du Plessis et al.,
2015), nnPU (Kiryo et al., 2017) and PU ET (Wilton
et al., 2022), as discussed in Section 2.
The choices of neural networks of uPU and nnPU
are similar to (Kiryo et al., 2017): a 6-layer MLP
with Softsign activation function for 20News, a 13-
layer CNN with a ReLU final layer for CIFAR10
and MNIST, and a 6-layer MLP with ReLU for other
datasets. For PU ET, we use the default hyperparam-
eters as suggested in (Wilton et al., 2022). We use the
available implementations of these methods (uPU, nnPU: https://github.com/kiryor/nnPUlearning; PU ET: https://github.com/jonathanwilton/PUExtraTrees).
As explained in Section 5.3, we use two variants
of 2-HNC: 2-HNC(EU) and 2-HNC(FI) that use un-
weighted and feature importance-weighted Euclidean
distance, respectively.
7.3 Results
The accuracy and F1 score of both variants of 2-HNC and the benchmark models are reported in Tables 2 and 3. 2-HNC(FI) yields the best result on the tabular data (Vote, Obesity, Mushroom) and the text data (20News). 2-HNC(EU) outperforms all methods on Letter and MNIST. However, nnPU has the best performance on CIFAR10. The relative performance of the models is similar for both accuracy and F1 score.
We also test the statistical significance of the out-
performance of 2-HNC over other methods. The best
Table 4: P-values for the t-tests on the performances of 2-HNC and the best benchmarks. (*) denotes p-values where 2-HNC outperforms with high statistical significance (α = 0.05). P-values on CIFAR10 are not shown as 2-HNC does not give the highest performance on CIFAR10.
Data Best 2-HNC variant Best benchmark P-values: accuracy P-values: f1 score
Vote 2-HNC(FI) PU ET 0.0903 0.0875
Obesity 2-HNC(FI) PU ET 0.0017* 0.0119*
Mushroom 2-HNC(FI) PU ET 0.0312* 0.0302*
20News 2-HNC(FI) PU ET 0.1025 0.0465*
Letter 2-HNC(EU) nnPU 0.0251* 0.0269*
MNIST 2-HNC(EU) PU ET 1.2584e-5* 8.4961e-5*
Figure 4: Average accuracy (with the shaded regions as error bars) of each PU learning method when the prior of the positive fraction π is misspecified, compared to the results when the correct π is provided. Panels: (a) Vote, (b) Obesity, (c) Mushroom, (d) 20News, (e) Letter, (f) CIFAR10, (g) MNIST.
variant between 2-HNC(EU) and 2-HNC(FI) is compared to the best among the three benchmarks for each dataset. P-values of the paired t-tests are reported in Table 4. 2-HNC outperforms the other methods with high statistical significance (significance level of 0.05) on most data for both metrics. The exceptions are accuracy and F1 score on Vote, where the p-values are 0.0903 and 0.0875, and accuracy on 20News, with a p-value of 0.1025. Even so, these p-values still correspond to a significance level of around 0.1.
7.4 Sensitivity Analysis
We evaluate the models' sensitivity to misspecification of the prior positive fraction π. For each dataset, with true positive fraction $\pi_0$, we over-specify and under-specify the prior using $\pi = 1.1\pi_0$ and $\pi = 0.9\pi_0$, respectively. Results are shown in Figure 4, with uPU omitted for clarity of the plots as uPU achieves the lowest accuracy in all cases. In this analysis, we use the better variant of 2-HNC for each dataset, according to the results from the previous subsection.

As shown in Figure 4, 2-HNC exhibits higher robustness than the other methods when π is under-specified, for all datasets except Mushroom and CIFAR10. For CIFAR10, 2-HNC yields performance similar to PU ET, with the two methods achieving accuracies of 80.43 ± 0.77 and 80.35 ± 0.25, respectively. Moreover, the rate of accuracy decline for 2-HNC is lower than that of nnPU and PU ET on many datasets, such as Vote, Obesity, 20News and Letter.

When π is over-specified, 2-HNC is not as robust as the other methods. On data such as Obesity and 20News, its improvements over the benchmarks become smaller.
8 CONCLUSIONS
Our PU learning method called 2-HNC is a two-stage
variant of a network flow-based Hochbaum’s Normal-
ized Cut that was previously used in binary classifica-
tion with labeled samples of both classes. The output
of 2-HNC is the partition of samples into the positive
and negative prediction sets.
Both stages of 2-HNC generate nested sequences of data partitions for varying tradeoffs between the inter-similarity of the positive and negative prediction sets and the intra-similarity within the sets, solved as parametric minimum cut problems. Stage 1 puts more weight on the intra-similarity of the negative prediction set, whereas stage 2 emphasizes that of the positive one. Stage 2 utilizes a set of likely-negative unlabeled samples, determined by the order in which unlabeled samples enter the negative prediction set in the nested sequence of stage 1. The partition whose positive fraction approximates the prior π most closely is selected as the prediction for the unlabeled samples.
Experiments on real datasets demonstrate that 2-HNC outperforms benchmark methods in terms of accuracy and F1 score, and exhibits better robustness to under-specification of the prior π.

Future research directions include methods that learn an accurate pairwise similarity measure from the PU data, as the current similarity measure is unsupervised. Another potential direction is the selection of likely-negative samples from the unlabeled set. While an approach based on the nested partition sequence is employed in this work, other techniques are also worth further investigation.
ACKNOWLEDGEMENTS
This research was supported in part by the AI Institute
NSF Award 2112533. We would also like to thank
William Pham for his help with the literature review.
REFERENCES
Asín Achá, R., Hochbaum, D. S., and Spaen, Q. (2020). HNCcorr: Combinatorial optimization for neuron identification. Annals of Operations Research, 289:5–32.
Baumann, P., Hochbaum, D. S., and Yang, Y. T. (2019).
A comparative study of the leading machine learning
techniques and two new optimization algorithms. Eu-
ropean journal of operational research, 272(3):1041–
1057.
Bekker, J. and Davis, J. (2020). Learning from positive
and unlabeled data: A survey. Machine Learning,
109(4):719–760.
Blum, A. and Chawla, S. (2001). Learning from labeled and
unlabeled data using graph mincuts.
Carnevali, J. C., Rossi, R. G., Milios, E., and de An-
drade Lopes, A. (2021). A graph-based approach for
positive and unlabeled learning. Information Sciences,
580:655–672.
Chollet, F. et al. (2015). Keras. https://keras.io.
de Sousa, C. A. R., Rezende, S. O., and Batista, G. E.
(2013). Influence of graph construction on semi-
supervised learning. In Machine Learning and Knowl-
edge Discovery in Databases: European Conference,
ECML PKDD 2013, Prague, Czech Republic, Septem-
ber 23-27, 2013, Proceedings, Part III 13, pages 160–
175. Springer.
Du Plessis, M., Niu, G., and Sugiyama, M. (2015). Convex
formulation for learning from positive and unlabeled
data. In International conference on machine learn-
ing, pages 1386–1394. PMLR.
Du Plessis, M. C., Niu, G., and Sugiyama, M. (2014). Anal-
ysis of learning from positive and unlabeled data. Ad-
vances in neural information processing systems, 27.
Gallo, G., Grigoriadis, M. D., and Tarjan, R. E. (1989). A
fast parametric maximum flow algorithm and applica-
tions. SIAM Journal on Computing, 18(1):30–55.
Hochbaum, D. S. (1998). The pseudoflow algorithm and
the pseudoflow-based simplex for the maximum flow
problem. In International Conference on Integer
Programming and Combinatorial Optimization, pages
325–337. Springer.
Hochbaum, D. S. (2002). Solving integer programs
over monotone inequalities in three variables: A
framework for half integrality and good approxima-
tions. European Journal of Operational Research,
140(2):291–321.
Hochbaum, D. S. (2008). The pseudoflow algorithm: A new
algorithm for the maximum-flow problem. Operations
research, 56(4):992–1009.
Hochbaum, D. S. (2010). Polynomial time algorithms for
ratio regions and a variant of normalized cut. IEEE
transactions on pattern analysis and machine intelli-
gence, 32(5):889–898.
Hochbaum, D. S. (2021). Applications and efficient algo-
rithms for integer programming problems on mono-
tone constraints. Networks, 77(1):21–49.
Hochbaum, D. S. and Orlin, J. B. (2013). Simplifications
and speedups of the pseudoflow algorithm. Networks,
61(1):40–57.
Jebara, T., Wang, J., and Chang, S.-F. (2009). Graph con-
struction and b-matching for semi-supervised learn-
ing. In Proceedings of the 26th annual international
conference on machine learning, pages 441–448.
Kelly, M., Longjohn, R., and Nottingham, K. The UCI Machine Learning Repository. https://archive.ics.uci.edu.
Khan, S. S. and Madden, M. G. (2014). One-class classifi-
cation: taxonomy of study and review of techniques.
The Knowledge Engineering Review, 29(3):345–374.
Kiryo, R., Niu, G., Du Plessis, M. C., and Sugiyama, M.
(2017). Positive-unlabeled learning with non-negative
risk estimator. Advances in neural information pro-
cessing systems, 30.
Lee, W. S. and Liu, B. (2003). Learning with positive
and unlabeled examples using weighted logistic re-
gression. In ICML, volume 3, pages 448–455.
Li, H., Chen, Z., Liu, B., Wei, X., and Shao, J. (2014). Spot-
ting fake reviews via collective positive-unlabeled
learning. In 2014 IEEE international conference on
data mining, pages 899–904. IEEE.
Li, W., Guo, Q., and Elkan, C. (2010). A positive and unla-
beled learning algorithm for one-class classification of
remote-sensing data. IEEE transactions on geoscience
and remote sensing, 49(2):717–725.
Li, X. and Liu, B. (2003). Learning to classify texts using
positive and unlabeled data. In IJCAI, volume 3, pages
587–592. Citeseer.
Liu, B., Lee, W. S., Yu, P. S., and Li, X. (2002). Partially
supervised classification of text documents. In ICML,
volume 2, pages 387–394. Sydney, NSW.
Lu, F. and Bai, Q. (2010). Semi-supervised text catego-
rization with only a few positive and unlabeled doc-
uments. In 2010 3rd International conference on
biomedical engineering and informatics, volume 7,
pages 3075–3079. IEEE.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer,
P., Weiss, R., Dubourg, V., Vanderplas, J., Passos,
A., Cournapeau, D., Brucher, M., Perrot, M., and
Duchesnay, E. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Pennington, J., Socher, R., and Manning, C. D. (2014).
Glove: Global vectors for word representation. In
Empirical Methods in Natural Language Processing
(EMNLP), pages 1532–1543.
Ren, Y., Ji, D., and Zhang, H. (2014). Positive unlabeled
learning for deceptive reviews detection. In Proceed-
ings of the 2014 conference on empirical methods in
natural language processing (EMNLP), pages 488–
498.
Spaen, Q., Asín-Achá, R., Chettih, S. N., Minderer, M., Harvey, C., and Hochbaum, D. S. (2019). HNCcorr: A novel combinatorial approach for cell identification in calcium-imaging movies. eNeuro, 6(2).
Wilton, J., Koay, A., Ko, R., Xu, M., and Ye, N. (2022).
Positive-unlabeled learning using random forests via
recursive greedy risk minimization. Advances in
Neural Information Processing Systems, 35:24060–
24071.
Yang, P., Li, X., Chua, H.-N., Kwoh, C.-K., and Ng, S.-
K. (2014a). Ensemble positive unlabeled learning for
disease gene identification. PloS one, 9(5):e97079.
Yang, P., Li, X.-L., Mei, J.-P., Kwoh, C.-K., and Ng, S.-K.
(2012). Positive-unlabeled learning for disease gene
identification. Bioinformatics, 28(20):2640–2647.
Yang, Y. T., Fishbain, B., Hochbaum, D. S., Norman, E. B.,
and Swanberg, E. (2014b). The supervised normal-
ized cut method for detecting, classifying, and identi-
fying special nuclear materials. INFORMS Journal on
Computing, 26(1):45–58.
Yi, J., Hsieh, C.-J., Varshney, K. R., Zhang, L., and Li, Y.
(2017). Scalable demand-aware recommendation. Ad-
vances in neural information processing systems, 30.
Zhang, C., Ren, D., Liu, T., Yang, J., and Gong, C.
(2019). Positive and unlabeled learning with label dis-
ambiguation. In IJCAI, pages 4250–4256.