social media, and fake news sources prolific in spread-
ing misinformation. The unifying property of these
networks is that normal agents rarely link to aberrant
ones. We call this aberrant linking behavior.
We formulated the detection problem in a novel
way: as a directed Markov Random Field (MRF) prob-
lem. This formulation balances obeying any given
prior information with minimizing the links from nor-
mal to aberrant agents. We discussed how the formula-
tion is solved optimally and efficiently.
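A minimal sketch of how a formulation of this kind can be solved exactly as a minimum s-t cut is given below. It is an illustration under stated assumptions, not the authors' implementation: the objective used here (a fixed penalty `prior_weight` for contradicting a prior label, plus a penalty `lam` for each link from a normal to an aberrant agent), the function name `min_cut_labels`, and the toy graph are all hypothetical. The maximum flow is computed with a plain Edmonds-Karp routine for self-containment; a faster algorithm would be used at scale.

```python
from collections import defaultdict, deque

def min_cut_labels(nodes, links, priors, lam=1, prior_weight=10):
    """Label nodes 'normal' or 'aberrant' via a minimum s-t cut.

    Assumed objective (hypothetical, for illustration only):
      prior_weight * (#priors contradicted)
      + lam * (#links from a normal node to an aberrant node).
    Source side of the cut = normal, sink side = aberrant.
    """
    S, T = "_source", "_sink"
    cap = defaultdict(lambda: defaultdict(float))

    for u, v in links:
        cap[u][v] += lam   # cut only when u is normal and v is aberrant
        cap[v][u] += 0     # make sure the residual arc exists
    for node, label in priors.items():
        if label == "normal":
            cap[S][node] += prior_weight   # paid if node ends up aberrant
            cap[node][S] += 0
        else:
            cap[node][T] += prior_weight   # paid if node ends up normal
            cap[T][node] += 0

    def reachable_from_source():
        # BFS over arcs with positive residual capacity.
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if T not in parent:
            break  # no augmenting path left: the flow is maximum
        # Find the bottleneck along the path, then push flow along it.
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Nodes still reachable from the source form the normal side.
    # Ties between equal-cost cuts are broken toward the smallest normal set.
    return {n: ("normal" if n in parent else "aberrant") for n in nodes}

# Toy graph: a, b, c link mostly among themselves; x, y link to each other
# and to the normal nodes; c -> x is a single normal-to-aberrant link.
nodes = ["a", "b", "c", "x", "y"]
links = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "a"),
         ("x", "y"), ("y", "x"), ("x", "a"), ("y", "b"), ("c", "x")]
priors = {"a": "normal", "x": "aberrant"}
labels = min_cut_labels(nodes, links, priors)
print(labels)
```

In this sketch only links from the source (normal) side to the sink (aberrant) side contribute to the cut, which is what makes the penalty directed: links from aberrant to normal agents cost nothing.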
To compare the performance of the algorithms, we
developed a new, asymmetric variant of the modularity
metric for directed graphs, addressing a known short-
coming of the existing metric. We showed that our
metric has desirable properties and proved that max-
imizing it is NP-hard. We also used several ad-hoc
metrics to better understand properties of the solutions.
In our empirical evaluation, we found that the MRF
method outperforms competitors such as PageRank,
TrustRank, AntiTrustRank, and Random. The solu-
tions returned by MRF had the largest modularity score
on thirteen of the twenty-three datasets tested. The
modularity for MRF was, on average, 25 percent better
than the modularity returned by TrustRank or
AntiTrustRank.
ACKNOWLEDGEMENTS
D. S. Hochbaum was supported in part by National
Science Foundation (NSF) award CMMI 1760102.
M. Velednitsky was supported by the National Physical
Science Consortium (NPSC).
REFERENCES
Abernethy, J., Chapelle, O., and Castillo, C. (2010). Graph
regularization methods for web spam detection. Ma-
chine Learning, 81(2):207–225.
Ahuja, R. K., Hochbaum, D. S., and Orlin, J. B. (2003). Solv-
ing the convex cost integer dual network flow problem.
Management Science, 49(7):950–964.
Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and
Baeza-Yates, R. (2008). Web spam detection: Link-
based and content-based techniques. In The European
Integrated Project Dynamically Evolving, Large Scale
Information Systems (DELIS): proceedings of the final
workshop, volume 222, pages 99–113.
Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and
Baeza-Yates, R. (2006). Link-based characterization and
detection of web spam. In 2nd International Work-
shop on Adversarial Information Retrieval on the Web,
AIRWeb 2006-29th Annual International ACM SIGIR
Conference on Research and Development in Informa-
tion Retrieval, SIGIR 2006.
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-
parameter optimization. Journal of Machine Learning
Research, 13(Feb):281–305.
Bergstra, J. S., Bardenet, R., Bengio, Y., and Kégl, B. (2011).
Algorithms for hyper-parameter optimization. In Ad-
vances in neural information processing systems, pages
2546–2554.
Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer,
M., Nikoloski, Z., and Wagner, D. (2006). Maximizing
modularity is hard. arXiv preprint physics/0608255.
Castillo, C., Donato, D., Becchetti, L., Boldi, P., Leonardi,
S., Santini, M., and Vigna, S. (2006). A reference
collection for web spam. SIGIR Forum, 40(2):11–24.
http://chato.cl/webspam/datasets/.
Castillo, C., Donato, D., Gionis, A., Murdock, V., and Sil-
vestri, F. (2007). Know your neighbors: Web spam
detection using the web topology. In Proceedings of
the 30th annual international ACM SIGIR conference
on Research and development in information retrieval,
pages 423–430. ACM.
Erdélyi, M., Garzó, A., and Benczúr, A. A. (2011). Web
spam classification: a few features worth more. In Pro-
ceedings of the 2011 Joint WICOW / AIRWeb Workshop
on Web Quality, pages 27–34. ACM.
Fire, M., Kagan, D., Elyashar, A., and Elovici, Y. (2014).
Friend or foe? fake profile identification in online
social networks. Social Network Analysis and Mining,
4(1):194.
Gallo, G., Grigoriadis, M. D., and Tarjan, R. E. (1989). A fast
parametric maximum flow algorithm and applications.
SIAM Journal on Computing, 18(1):30–55.
Gan, Q. and Suel, T. (2007). Improving web spam classifiers
using link structure. In Proceedings of the 3rd interna-
tional workshop on Adversarial information retrieval
on the web, pages 17–20. ACM.
Garey, M. R., Johnson, D. S., and Stockmeyer, L. (1974).
Some simplified NP-complete problems. In Proceed-
ings of the sixth annual ACM symposium on Theory of
computing, pages 47–63. ACM.
Geman, S. and Geman, D. (1984). Stochastic relaxation,
Gibbs distributions, and the Bayesian restoration of im-
ages. IEEE Transactions on Pattern Analysis and Ma-
chine Intelligence, 6(6):721–741.
Ghosh, S., Viswanath, B., Kooti, F., Sharma, N. K., Korlam,
G., Benevenuto, F., Ganguly, N., and Gummadi, K. P.
(2012). Understanding and combating link farming in
the Twitter social network. In Proceedings of the 21st
international conference on World Wide Web, pages
61–70. ACM.
Goldberg, A. V. and Tarjan, R. E. (1988). A new approach
to the maximum-flow problem. Journal of the ACM
(JACM), 35(4):921–940.
Gori, M. and Pucci, A. (2006). Research paper recommender
systems: A random-walk based approach. In Web Intel-
ligence, 2006. WI 2006. IEEE/WIC/ACM International
Conference on, pages 778–781. IEEE.
Gyöngyi, Z., Garcia-Molina, H., and Pedersen, J. (2004).
Combating web spam with TrustRank. In Proceedings
of the Thirtieth international conference on Very large
data bases-Volume 30, pages 576–587. VLDB Endow-
ment.
KDIR 2019 - 11th International Conference on Knowledge Discovery and Information Retrieval