by examples can be found in (Kozmina and
Solodovnikova, 2011).
In practice, two types of similarity coefficient are calculated: fact-based (i.e., the value of hierarchical similarity is calculated for each report for measures, fact tables, and schemas) and dimension-based (i.e., for attributes, hierarchies, dimensions, and schemas). It was decided to distinguish two types of similarity coefficient because of the well-known characteristics of the data stored in data warehouses, i.e., quantifying (measures) and qualifying (attributes). However, the essence of any data warehouse lies in its facts, while the describing attributes provide auxiliary information. Thereby, the recommendations are filtered (i) first by the value of the fact-based similarity coefficient, (ii) then by the value of the dimension-based similarity coefficient, and (iii) finally by the aggregate function DOI, as sketched below.
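This ordering amounts to a three-level descending sort. The following minimal sketch illustrates it in Python; the Recommendation record and the field names (fact_sim, dim_sim, agg_doi) are illustrative assumptions, not the actual data structures of the tool.

from dataclasses import dataclass

@dataclass
class Recommendation:
    report_name: str
    fact_sim: float   # fact-based similarity coefficient
    dim_sim: float    # dimension-based similarity coefficient
    agg_doi: float    # aggregate function DOI, used only as a tie-breaker

def order_recommendations(recs):
    # Descending sort: (i) fact-based similarity, (ii) dimension-based
    # similarity, (iii) aggregate function DOI.
    return sorted(recs, key=lambda r: (r.fact_sim, r.dim_sim, r.agg_doi),
                  reverse=True)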
Recommendations generated in Activity mode for one of the reports – Total monthly students’ grade count by course (i.e., Kopējais vērtējumu skaits mēnesī pa kursiem) – are presented in Figure 2. The usage scenario includes 10 recommendations sorted in descending order, first by the fact-based similarity coefficient value, then by the dimension-based similarity coefficient, and finally by the aggregate function DOI. As the fact-based and dimension-based similarity coefficient values may differ considerably, both are shown to the user, for instance, to make him/her aware of a high degree of dimension-based similarity even when the fact-based similarity is average (e.g., reports #1: Monthly distribution of students’ grade types by course, #3: Total monthly grade count by course and by professor, #4: Total monthly students’ final grade count by course, #5: Total monthly students’ interim grade count by course, and #6: Total monthly students’ grade count by course) or low (e.g., reports #7: Gradebook usage by course, #9: Students’ tasks by course, and #10: Total monthly students’ task count by course and by professor). The remaining examples have average fact-based similarity and low dimension-based similarity (report #2: Monthly distribution of students’ grade types by study program) or low values of both fact-based and dimension-based similarity (report #8: Gradebook usage by course category).
In its turn, the aggregate function DOI coefficient is hidden from the user, as it is considered less informative; however, it is helpful for sorting when two or more reports have the same fact-based and dimension-based similarity coefficient values, e.g., reports #4–#6 have equal fact-based and dimension-based similarity values (0.512 and 0.679, respectively). Such coefficient values illustrate that all three reports consist of logical metadata with a similar total DOI value, whereas the restrictions on data in these reports may vary.
The cold-start method is composed of two steps: (i) performing a structural analysis of the existing reports, and (ii) revealing the likeness between pairs of reports. To be more precise, a pair of reports consists of the report currently executed by the user and any other report that the user has the right to access.
Here, the report structure means all elements of the data warehouse schema (e.g., attribute, measure, fact table, dimension, hierarchy), the schema itself, and the acceptable aggregate functions that are related to the items of some report. In terms of the structural analysis, each report is represented as a Report Structure Vector (RSV). In its turn, each coordinate of the RSV is a binary value that indicates the presence (1) or absence (0) of an instance of a report structure element. For example, in the RSV of the report Total monthly grade count by course and by professor, the only element instances marked with 1 are: attributes Month, Course, and Professor; measure Grade count; dimensions Time, Course, and Person; fact table Students’ grades; schema Gradebook; and aggregate function SUM. All the other element instances are marked with 0. Note that all report structure elements are ordered in the same way in all reports. If any change occurs, for instance, a report is altered or a new report is created, the RSV of each report has to be created anew.
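To make the RSV construction concrete, the following minimal sketch builds such a binary vector; the fixed ordering of element instances and the element names are illustrative assumptions based on the example report above, not the actual metadata of the tool.

# Illustrative fixed ordering of all report structure element instances;
# the same ordering is used for every report.
ALL_ELEMENT_INSTANCES = [
    "attribute:Month", "attribute:Course", "attribute:Professor",
    "attribute:Study program",
    "measure:Grade count", "measure:Task count",
    "dimension:Time", "dimension:Course", "dimension:Person",
    "fact_table:Students' grades", "schema:Gradebook",
    "aggregate:SUM", "aggregate:COUNT",
]

def build_rsv(report_elements):
    # Each coordinate is 1 if the element instance occurs in the report,
    # and 0 otherwise.
    present = set(report_elements)
    return [1 if e in present else 0 for e in ALL_ELEMENT_INSTANCES]

# RSV of the example report "Total monthly grade count by course and by professor"
rsv = build_rsv([
    "attribute:Month", "attribute:Course", "attribute:Professor",
    "measure:Grade count", "dimension:Time", "dimension:Course",
    "dimension:Person", "fact_table:Students' grades",
    "schema:Gradebook", "aggregate:SUM",
])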
To reveal the likeness between pairs of reports by calculating the similarity coefficient, it is proposed to use Cosine/Vector similarity. It was introduced by Salton and McGill (1983) in the field of information retrieval to calculate the similarity between a pair of documents by interpreting each document as a vector of term frequency values. Later it was adopted by Breese et al. (1998) in collaborative filtering, with users instead of documents and users’ item rating values instead of term frequency values.
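A minimal sketch of this calculation for two RSVs is given below; the function name and the handling of zero-length vectors are assumptions made for illustration.

from math import sqrt

def cosine_similarity(rsv_a, rsv_b):
    # sim(A, B) = (A . B) / (|A| * |B|), where A and B are RSVs
    # sharing the same element ordering.
    dot = sum(a * b for a, b in zip(rsv_a, rsv_b))
    norm_a = sqrt(sum(a * a for a in rsv_a))
    norm_b = sqrt(sum(b * b for b in rsv_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)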
In the recommender systems literature, Cosine/Vector similarity is extensively used (Vozalis and Margaritis, 2004; Rashid et al., 2005; Adomavicius et al., 2011) to compute a similarity coefficient for a pair of users in collaborative filtering, or for a pair of items in content-based filtering. So, the Cosine/Vector similarity of a pair of