is confidential. Similarly, the database does not answer any query whose statistic, for
example an average, is calculated from a single record, i.e. a query concerning only one
individual. Consequently, it refuses to answer, for example, the query "What is the
average salary of the women employees who work for the computer science department?"
because the average here is calculated from only one record.
A query on an SDB R computes a subset of R using a characteristic formula
C, which is a logical formula built from the values of the attributes of R by using the
logical operators ∧ (and), ∨ (or), and ¬ (not). For example, the subset of records
representing the women employees who work for the computer science department can be
represented by the following characteristic formula:
C = (sex=F) ∧ (department=computer science).
The set of records which satisfy the characteristic formula C, denoted by X_C, is called
the result of the query. Applying the formula C to the relation R given in Table 1, we
get: COUNT(C) = 1, AVG(Age, C) = 31 and SUM(Salary, C) = 3200.
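The evaluation of a characteristic formula can be illustrated with a minimal Python sketch. The rows below are hypothetical and are not the paper's Table 1; only the single record matching C is chosen to agree with the results stated above. The helper names (eq, AND, OR, NOT, query_set, count, total, avg) are likewise assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    sex: str
    department: str
    age: int
    salary: int


# Hypothetical relation R (NOT the paper's Table 1).
R: List[Record] = [
    Record("F", "computer science", 31, 3200),
    Record("M", "computer science", 45, 4100),
    Record("M", "computer science", 28, 2900),
    Record("F", "mathematics", 39, 3600),
    Record("M", "mathematics", 52, 4500),
]

Formula = Callable[[Record], bool]


def eq(attr: str, value: object) -> Formula:
    """Atomic condition: attribute == value."""
    return lambda r: getattr(r, attr) == value


def AND(a: Formula, b: Formula) -> Formula:
    return lambda r: a(r) and b(r)


def OR(a: Formula, b: Formula) -> Formula:
    return lambda r: a(r) or b(r)


def NOT(a: Formula) -> Formula:
    return lambda r: not a(r)


def query_set(c: Formula) -> List[Record]:
    """X_C: the records of R that satisfy the formula c."""
    return [r for r in R if c(r)]


def count(c: Formula) -> int:
    return len(query_set(c))


def total(attr: str, c: Formula) -> int:
    return sum(getattr(r, attr) for r in query_set(c))


def avg(attr: str, c: Formula) -> float:
    # Raises ZeroDivisionError when X_C is empty.
    return total(attr, c) / count(c)


# C = (sex=F) ∧ (department=computer science)
C = AND(eq("sex", "F"), eq("department", "computer science"))
print(count(C), avg("age", C), total("salary", C))
```

Because X_C contains a single record here, AVG and SUM return that individual's own age and salary, which is precisely the situation the database is expected to refuse.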
Generally, a statistical query taken in isolation does not allow confidential
information to be deduced. For this reason, a user with good intentions should be able
to form any characteristic formula of interest, and to carry out any statistical
measurement on the resulting set of records. However, a user may form statistical queries
which can be used to deduce specific values of a field of the database, which is not
acceptable if the values represent confidential information. In this case, we say that the
database has been compromised.
A characteristic formula used to compromise a database is called a tracker
[2, 3]. This formula is chosen so that its result is a set X_C whose size is equal
to 1. Denning et al. [2] have shown that for any real database, a tracker can always be
found.
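To make the risk concrete, the following Python sketch shows how a value tied to a result set of size 1 can be recovered even when the database refuses queries on a single record. The data, the threshold MIN_QUERY_SET_SIZE, the function guarded_sum and the differencing step (subtracting two permitted answers) are assumptions made for illustration, in the spirit of the tracker literature; they are not a construction given in this text.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Formula = Callable[[Row], bool]

# Hypothetical data (not the paper's Table 1).
ROWS: List[Row] = [
    {"sex": "F", "department": "computer science", "salary": 3200},
    {"sex": "M", "department": "computer science", "salary": 4100},
    {"sex": "M", "department": "computer science", "salary": 2900},
    {"sex": "F", "department": "mathematics", "salary": 3600},
    {"sex": "M", "department": "mathematics", "salary": 4500},
]

# Assumed protection rule: refuse any query whose result set is too small.
MIN_QUERY_SET_SIZE = 2


def guarded_sum(attr: str, c: Formula) -> Optional[int]:
    """Return SUM(attr, c), or None when the query is refused."""
    matching = [row for row in ROWS if c(row)]
    if len(matching) < MIN_QUERY_SET_SIZE:
        return None
    return sum(row[attr] for row in matching)


def c1(row: Row) -> bool:
    return row["department"] == "computer science"


def c2(row: Row) -> bool:
    return row["sex"] == "F"


def target(row: Row) -> bool:
    # C = C1 ∧ C2 selects exactly one record, so it is refused.
    return c1(row) and c2(row)


def complement(row: Row) -> bool:
    # C1 ∧ ¬C2 selects enough records to be answered.
    return c1(row) and not c2(row)


direct = guarded_sum("salary", target)          # refused
sum_c1 = guarded_sum("salary", c1)              # answered
sum_rest = guarded_sum("salary", complement)    # answered
assert sum_c1 is not None and sum_rest is not None

# SUM(C1 ∧ C2) = SUM(C1) − SUM(C1 ∧ ¬C2)
inferred = sum_c1 - sum_rest
print(direct, inferred)
```

Each individual query is permitted, yet their difference equals the salary of the single record selected by C, so refusing small result sets alone does not prevent the compromise in this toy setting.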
In the next section, we propose a new strategy to prevent attacks based on trackers.
3 Our approach
In everyday life, and particularly in the medical field, the results of medical analyses
are generally expressed by linguistic descriptions (for example, body temperature is
elevated, normal, etc.). Such descriptions are used especially for non-specialists in the
medical field. In this paper, we take this practice as a starting point to deal with the
illegitimate inference problem in statistical databases. More precisely, we replace the
results of the statistical queries (quantitative answers) by linguistic descriptions
(qualitative answers) in order to limit the risk of illegitimate inference.
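The replacement of a quantitative answer by a qualitative one can be sketched as follows. This is only an illustration under stated assumptions: the class boundaries, the labels and the function name describe are hypothetical, and the classes are crisp intervals, so the fuzzy-logic formalization is not modeled here.

```python
from bisect import bisect_right
from typing import List

# Hypothetical class boundaries: [0, 5), [5, 20), [20, +inf).
UPPER_BOUNDS: List[float] = [5, 20]
# Hypothetical linguistic descriptions, one per class.
LABELS: List[str] = ["low", "medium", "high"]


def describe(value: float) -> str:
    """Map a non-negative numerical answer to the label of its class."""
    if value < 0:
        raise ValueError("answers are assumed to be non-negative")
    # bisect_right returns the index of the interval containing value.
    return LABELS[bisect_right(UPPER_BOUNDS, value)]


for answer in (1, 4, 5, 10, 20, 57):
    print(answer, "->", describe(answer))
```

With these assumed boundaries, the answers 1 and 4 both become "low", so a user can no longer tell from the answer alone that a result set contains exactly one record.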
To this end, we replace the numerical answers (e.g. number of patients
= 10) by linguistic descriptions (e.g. medium) formalized in the fuzzy logic framework.
Intuitively, each numerical answer is assigned to a given class, and a qualitative
answer is then associated with each class. Thus, the formalization of our approach
requires two steps: classification and fuzzification. Let us recall these two concepts:
– Classification is the procedure which decomposes the scale of the numerical
values used into non-empty classes so that each numerical value belongs to
one and only one class.
Let I be a set of elements. We say that Q(I) is a partition of I if there exists a set