particularly challenging, either to optimality, or very
close to optimality. All this is done within a small
fraction of the running time required by competing
approaches, such as an integer programming solver,
or by state-of-the-art meta-heuristics.
The Lagrangean relaxation of the budget constraint
has been used previously in the context of the
quadratic knapsack problem, most recently by (Spiers
et al., 2023). The quadratic knapsack problem, which
is identical to the maximum diversity and maximum
facility dispersion problems, can be formulated as IPM
with a budget constraint. However, the methods
developed for the quadratic knapsack problem were ad
hoc and suitable for this case only, and the general
structure has not been recognized until now. In
addition, all the literature that uses this approach,
going back to (Chaillou et al., 1989; Pisinger, 2006),
solves problems to optimality only for at most a few
hundred variables, and only for selected, relatively
high values of the budget (accommodating 10%-50% of
the variables), whereas the harder problems, which are
more prevalent in applications, are those with smaller
budgets. Moreover, the running times of the current
approaches do not scale well. Indeed, in the results
reported in (Spiers et al., 2023), the largest
instances are GKD benchmark instances with up to 2000
nodes, and these instances were shown in (Hochbaum
et al., 2023) to be particularly easy. In contrast,
new insights in (Hochbaum et al., 2023) on how to
handle harder problems, using perturbation, provided
solutions that are often optimal, or very close to
optimal, within a tiny fraction of the running time of
competing approaches, including integer programming
software.
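As a point of reference, the Lagrangean relaxation of a budget constraint mentioned above takes the following generic form. The notation is assumed here for illustration only and is not taken from the formulations in the cited works: f is the objective, X is the set of solutions satisfying the remaining constraints, b_i is the size of item i, and B is the budget.

```latex
\begin{align*}
z(B) &= \max \Big\{ f(x) \;:\; \sum_{i} b_i x_i \le B,\ x \in X \Big\}, \\
L(\lambda) &= \max_{x \in X} \Big\{ f(x) - \lambda \sum_{i} b_i x_i \Big\}, \qquad \lambda \ge 0, \\
z(B) &\le L(\lambda) + \lambda B \quad \text{for every } \lambda \ge 0 .
\end{align*}
```

Minimizing the right-hand side over \lambda gives, as a function of B, the concave envelope of z. A solution that attains L(\lambda) for some \lambda \ge 0 is optimal for the budget value it uses exactly, which is why the breakpoints of the envelope are of interest.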
Here we demonstrate that the breakpoints algorithm
is applicable to a vast collection of hard problems,
with the potential of providing optimal, or very close
to optimal, solutions. One example explored here is
the text summarization problem, also known as
multi-document summarization, which is modeled, under
the MMR criterion, as combining two goals. The first
goal is maximizing the sum of dissimilarities in the
selected set (of sentences), to enhance the diversity
of the selected set and eliminate redundancy. The
second goal is maximizing the similarities between the
selected set and its complement. Optimizing this
combined objective is NP-hard even without the budget
constraint on the total size of the sentences
selected. A straightforward formulation of this
optimization problem, given in (Lin and Bilmes, 2010),
is reported to take 17 hours to solve, using integer
programming software, for an instance with 178
sentences. Our approach is to model the problem as
budgeted IPM simply by replacing the similarity
weights by dissimilarity weights, e.g. by taking the
reciprocal of the similarities. Once the problem is
modeled as IPM with a budget constraint, the framework
presented here can utilize the concave envelope to
solve the problem effectively, and with a highly
scalable algorithm. This is discussed in detail in
Section 3.
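The replacement of similarity weights by their reciprocals can be illustrated with a minimal Python sketch. It is not the formulation of Section 3: the function names, the `eps` guard against zero similarities, and the use of sentence lengths as the budgeted quantity are assumptions made for illustration.

```python
from itertools import combinations

def dissimilarity_weights(similarity, eps=1e-9):
    """Turn pairwise similarities into dissimilarities by taking reciprocals.

    eps is an assumed guard against division by zero; it is not from the paper.
    """
    n = len(similarity)
    return [
        [0.0 if i == j else 1.0 / max(similarity[i][j], eps) for j in range(n)]
        for i in range(n)
    ]

def diversity(selected, dissim):
    """Sum of pairwise dissimilarities within the selected set of sentences."""
    return sum(dissim[i][j] for i, j in combinations(sorted(selected), 2))

def within_budget(selected, lengths, budget):
    """Budget constraint: total size of the selected sentences is at most budget."""
    return sum(lengths[i] for i in selected) <= budget

# Hypothetical toy data: three sentences with pairwise similarities in (0, 1].
sim = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.1],
    [0.25, 0.1, 1.0],
]
lengths = [12, 20, 9]  # assumed sentence sizes (e.g. word counts)

dissim = dissimilarity_weights(sim)
chosen = {0, 2}
print(diversity(chosen, dissim), within_budget(chosen, lengths, budget=25))
```

The sketch only evaluates a candidate selection. Finding the best selection under the budget is the hard part that the concave envelope is meant to address.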
There is a close relationship between ratio problems
and budgeted IPM problems. This relationship is
reflected in the concave envelope: the first
breakpoint, the one for the smallest budget value, is
an optimal solution to the respective ratio problem,
and it corresponds to a generalization of the maximum
density subgraph problem.
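The link between the first breakpoint and the ratio problem can be checked on a toy instance. The sketch below is an illustration under stated assumptions, not the parametric cut procedure: it uses a sum-of-pairwise-weights objective with a cardinality budget, finds the best value for every budget by brute-force enumeration, and builds the upper concave envelope with a standard hull scan. The instance `W` and all function names are hypothetical.

```python
from itertools import combinations

def best_value_per_size(weights, n):
    """g[k] = maximum total pairwise weight over all subsets of size k (brute force)."""
    g = [0.0] * (n + 1)
    for k in range(2, n + 1):
        g[k] = max(
            sum(weights[i][j] for i, j in combinations(subset, 2))
            for subset in combinations(range(n), k)
        )
    return g

def upper_concave_envelope(g):
    """Return the points (k, g[k]) that are vertices of the upper concave envelope."""
    hull = []
    for k, value in enumerate(g):
        # Drop the last vertex while it lies on or below the chord to the new point.
        while len(hull) >= 2:
            (k1, v1), (k2, v2) = hull[-2], hull[-1]
            if (v2 - v1) * (k - k2) <= (value - v2) * (k2 - k1):
                hull.pop()
            else:
                break
        hull.append((k, value))
    return hull

# Hypothetical instance: nodes 0, 1, 2 form a heavy triangle, all other pairs are light.
n = 5
core = {0, 1, 2}
W = [
    [0 if i == j else (10 if i in core and j in core else 1) for j in range(n)]
    for i in range(n)
]

g = best_value_per_size(W, n)
hull = upper_concave_envelope(g)

# Ratio problem on the same instance: maximize value per selected element.
best_ratio_size = max(range(1, n + 1), key=lambda k: g[k] / k)

print(hull)
print(best_ratio_size)
```

On this instance the envelope bends at sizes 3 and 5, and the first breakpoint after the origin, size 3, is also the size that maximizes the ratio of value to size, i.e. the densest subset.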
Our contributions here include:
• Introducing a large class of NP-hard problems that
are formulated as monotone integer programming
with a budget constraint: budgeted IPM.
• Demonstrating that for all budgeted IPM prob-
lems the concave envelope (for maximization,
convex for minimization) related to the La-
grangean relaxation of the budget constraint is
constructed as the output of a (parametric) min-
imum cut procedure on a respective graph.
• Showing that the breakpoints in the concave
envelope are optimal solutions for the respective
budget values, and correspond to nested solutions.
• Showing that the perturbation concept of (Hochbaum
et al., 2023) applies to the class of budgeted IPM
problems, and can increase the number of breakpoints
and enhance their distribution around the budget
values of interest.
• Relating budgeted IPM problems to the respective
IPM ratio problems, which are solvable in polynomial
time with a parametric cut procedure, and showing that
the first (leftmost for maximization) breakpoint
solves the ratio problem optimally. The first
breakpoint is shown to generalize the concept of
maximum density subgraph.
• Introducing the new procedure incremental-para,
which solves IPM ratio problems, given an initial
feasible solution, in the complexity of a single
minimum cut procedure, and generates all breakpoints
sequentially.
• Showing that all budgeted IPM problems are
amenable to the breakpoints algorithm of (Hochbaum
et al., 2023), which bodes well for obtaining a
scalable algorithm that delivers high-quality
solutions.
• Demonstrating a new formulation for the text
summarization problem that renders it a budgeted
IPM problem with the potential of new scalable
methods for the problem.
KDIR 2023 - 15th International Conference on Knowledge Discovery and Information Retrieval