Fig. 9 compares hypothesis testing with the Jaccard
coefficient. Lines (1)-(a) and (1)-(b) are contours of
equal similarity under the Jaccard coefficient, whereas
lines (2)-(a) and (2)-(b) are contours of equal
similarity under hypothesis testing. Lines (1)-(a) and
(2)-(a) correspond to a similarity of 0.6, and lines
(1)-(b) and (2)-(b) to a similarity of 0.4.
It is questionable, however, to equate the case of
n = 4, k = 3 with the case of n = 20, k = 15, even
though both have the same ratio k/n = 0.75. The former
case would arise by chance more often than the latter.
With the Jaccard coefficient, a small value of n
therefore leads to worse results. In other words, a
high-precision, high-recall recommendation system has
to exclude accidental co-occurrences.
Hypothesis testing yields a small value when n or k is
small and a large value when both n and k are large.
Eq. 3 can therefore calculate similarity while
excluding accidental co-occurrences.
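
The contrast between the two cases can be illustrated with a minimal sketch. It is not Eq. 3 itself: it only computes the binomial tail probability of observing at least k successes in n trials under a null hypothesis. The null success probability `P_NULL = 0.5` and the helper name `binomial_tail` are assumptions made for illustration and do not come from the method described above.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed null probability, chosen only for illustration.
P_NULL = 0.5

# The two cases discussed in the text: both have k/n = 0.75.
for n, k in [(4, 3), (20, 15)]:
    tail = binomial_tail(n, k, P_NULL)
    print(f"n={n:2d}, k={k:2d}, k/n={k / n:.2f}, P(X >= k)={tail:.4f}")
```

Under this assumed null, the tail probability is about 0.31 for n = 4, k = 3 but only about 0.02 for n = 20, k = 15, although the ratio k/n is identical. A ratio-based measure cannot separate the two cases, whereas a test-based measure can.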

We proposed a novel recommendation system using SBM
data. Several conventional folksonomy-based systems
have focused on actual tag names. In contrast, we
focused on item clusters, which are sets of items
tagged by each SBM user. We assumed that SBM users’
behavior follows a binomial distribution and used
hypothesis testing to calculate the similarity between
two item clusters. We also evaluated our recommendation
system, and the results showed high recall and
precision. A comparison with systems that use actual
tag names showed that our proposed system was more
appropriate. We further compared our similarity
calculation based on hypothesis testing with a
conventional similarity calculation and verified that
the resulting similarities were better than the
conventional ones.
