
our proposed approach. Additionally, our model requires 0.4 fewer GFLOPs than the Attention UNet. This highlights our model’s efficiency in terms of both parameter count and computational workload.
4.4 Qualitative Analysis
We visually compare our outputs with those of Kim et al. (2018) using examples from all datasets except the random lines. Although both methods segment the correct stroke instances in most cases, each has distinct shortcomings.
Figure 8 presents examples where our method performed qualitatively better than that of Kim et al. (2018). As shown in Figure 8k and Figure 8l, despite a missing pixel in the overlapping regions of the ground truth, our method still maintains continuity in the strokes. A user can draw a cat in various styles, such as using a single stroke for the whole face, two separate strokes for the ears and face, or individual strokes for each ear and the face. In Figure 8n, our model not only segments the instances but also infers the user’s drawing style. This stands in contrast to Kim et al. (2018), who, as shown in Figure 8i, erroneously predict the face and ears as one stroke (i.e. the most common drawing style).
Figure 9 shows some shortcomings of our method when compared to Kim et al. (2018). In the character dataset (shown in Figure 9k and Figure 9l), multiple strokes are merged into a single stroke instance, resulting in a phenomenon known as “vector soup”. However, if human intervention is allowed as a postprocessing step, the number of required edits would be minimal; thus our output could still be successfully utilized for further editing. In Figure 9m and Figure 9n, some of the instances were mislabeled; however, our model accurately identifies circles in both categories, unlike Kim et al. (2018). Overall, we conclude that our model is able to follow three main perceptual grouping principles: similarity, continuity, and closure.
5 CONCLUSIONS
We propose a novel segmentation method that processes the input raster image by labeling each pixel as belonging to a particular stroke instance. Our novel architecture, named Multi-Focus Attention UNet, builds upon Attention UNet by introducing multi-focus attention gates and highway skip connections, two architectural elements that play a key role in capturing global and local image context and in achieving high computational efficiency. Our loss function includes a margin-regularised component that allows us to successfully handle a heavy-tailed label distribution, as well as correctly infer the user’s drawing style. Our approach is also significantly faster, exceeding the state of the art by seven orders of magnitude. Future work involves extending our focus to complex line-drawing art, which would significantly benefit from image vectorization when shown on large displays.
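The margin-regularised loss component builds on the label-distribution-aware margin (LDAM) loss of Cao et al. (2019), which enlarges the decision margin for rare classes in proportion to n_j^(-1/4). The following is a minimal NumPy sketch of that idea only; the function names and the `max_margin` value are illustrative and do not reflect the paper's exact implementation:

```python
import numpy as np

def ldam_margins(class_counts, max_margin=0.5):
    # LDAM-style per-class margins proportional to n_j^(-1/4),
    # rescaled so the rarest class receives max_margin.
    m = np.power(np.asarray(class_counts, dtype=float), -0.25)
    return m * (max_margin / m.max())

def margin_cross_entropy(logits, label, margins):
    # Subtract the class-dependent margin from the true-class logit
    # before the softmax: rare classes must then be predicted with a
    # larger score gap, which counteracts a heavy-tailed label
    # distribution during training.
    z = np.asarray(logits, dtype=float).copy()
    z[label] -= margins[label]
    z = z - z.max()  # numerical stability for the log-sum-exp
    return np.log(np.exp(z).sum()) - z[label]
```

Because the margin grows as a class becomes rarer, the loss penalises under-confident predictions on infrequent stroke labels more heavily than standard cross-entropy would.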
REFERENCES
Bartolo, A., Camilleri, K. P., Fabri, S. G., Borg, J. C., and Farrugia, P. J. (2007). Scribbles to vectors: Preparation of scribble drawings for CAD interpretation. In Proceedings of the 4th Eurographics Workshop on Sketch-Based Interfaces and Modeling, pages 123–130.
Bouton, G. D. (2014). CorelDRAW X7. McGraw-Hill Education Europe.
Cao, K., Wei, C., Gaidon, A., Arechiga, N., and Ma, T. (2019). Learning imbalanced datasets with label-distribution-aware margin loss. Advances in Neural Information Processing Systems, 32.
de Casteljau, P. (1963). Courbes et surfaces à pôles. Technical report, Citroën, Paris.
Dori, D. (1997). Orthogonal zig-zag: An algorithm for vectorizing engineering drawings compared with Hough transform. Advances in Engineering Software, 28(1):11–24.
Favreau, J.-D., Lafarge, F., and Bousseau, A. (2016). Fi-
delity vs. simplicity: a global approach to line drawing
vectorization. ACM Transactions on Graphics (TOG),
35(4):1–10.
Graves, A. (2012). Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks, pages 37–45. Springer.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Hilaire, X. and Tombre, K. (2006). Robust and accu-
rate vectorization of line drawings. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
28(6):890–904.
Inoue, N. and Yamasaki, T. (2019). Fast instance seg-
mentation for line drawing vectorization. In 2019
IEEE Fifth International Conference on Multimedia
Big Data (BigMM), pages 262–265. IEEE.
Jimenez, J. and Navalon, J. L. (1982). Some experiments in image vectorization. IBM Journal of Research and Development, 26(6):724–734.
Kim, B., Wang, O., Öztireli, A. C., and Gross, M. (2018). Semantic segmentation for line drawing vectorization using neural networks. In Computer Graphics Forum, volume 37, pages 329–338. Wiley Online Library.
Kingma, D. P. and Ba, J. (2017). Adam: A method for
stochastic optimization.
Single-Class Instance Segmentation for Vectorization of Line Drawings