5 CONCLUSIONS
We conclude that FIN-based calculation of the
similarity between documents is a novel method that
can address several problems in CLIR systems. Our
experiments verified that, for documents
(dissertations etc.) related to different search
terms, FIN-based techniques calculate the distance
(similarity) between documents correctly.
These techniques include the following:
1. Use partitions (“collections”) of the sample (e.g.
the NLP-Collection and the IR-Collection) and
calculate the distance of the unclassified
document from each partition. This distance is an
average of the distances of the document from all
the elements of the collection.
2. Use partitions (“collections”) of the sample and
fix the number of “hits” x (e.g. top-4 or top-5).
Calculate the distances of the unclassified
document from the documents of each partition
and keep only the top-x documents, i.e. those
closest to the unclassified one. Then calculate
the average distance of the unclassified
document from these top-x documents.
3. Use increased weights for the search terms that
are contained in the retrieved documents.
4. Use a positive weight when the search term is
included in a document of the partition and a
penalty (negative weight) when it is not.
5. The method gives better results for longer
documents.
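Techniques 1 and 2 can be illustrated with a minimal Python sketch. It assumes that the FIN distances between the unclassified document and each classified document have already been computed; the function names, the collection labels and the distance values below are hypothetical and serve only as an illustration, and a plain arithmetic mean stands in for the average.

```python
from statistics import mean


def collection_distance(distances):
    """Technique 1: average distance from the unclassified document
    to all documents of one collection (arithmetic mean assumed)."""
    return mean(distances)


def top_x_distance(distances, x):
    """Technique 2: average over the x smallest distances only."""
    if x < 1:
        raise ValueError("x must be at least 1")
    nearest = sorted(distances)[:x]  # the x closest documents
    return mean(nearest)


def nearest_collection(distances_by_collection, x=None):
    """Return the label of the collection with the smallest score.
    With x=None the full average is used, otherwise the top-x average."""
    def score(label):
        d = distances_by_collection[label]
        return collection_distance(d) if x is None else top_x_distance(d, x)
    return min(distances_by_collection, key=score)


# Hypothetical, precomputed distances of one unclassified document.
distances = {
    "NLP": [0.2, 0.9, 0.4, 0.8],
    "IR": [0.5, 0.5, 0.6, 0.5],
}
print(nearest_collection(distances))       # IR  (0.525 vs 0.575)
print(nearest_collection(distances, x=2))  # NLP (0.3 vs 0.5)
```

With these illustrative values the two techniques disagree: a few very close documents in one collection can be outweighed by distant ones when all distances are averaged.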
A strategy tailored to the specific sample of
classified and unclassified documents can be
defined. For example, techniques 1 and 2 can be
combined and, if necessary, supplemented with
weights and penalties: if the general (average)
distance of a document from a collection
(calculated from the distances of the document
from all the classified documents of the
collection) and the partial one (calculated from
the top-x distances) are not “consistent”, weights
must be applied following the appropriate
technique.
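The consistency check in this strategy could be sketched as follows. The text does not define “consistent” formally, so the sketch assumes that the general and the partial distance are consistent when both select the same nearest collection. The names Decision and compare_criteria, the collection labels and the distance values are hypothetical.

```python
from statistics import mean
from typing import NamedTuple


class Decision(NamedTuple):
    general: str      # nearest collection by average over all documents
    partial: str      # nearest collection by average over the top-x documents
    consistent: bool  # True when both criteria select the same collection


def compare_criteria(distances_by_collection, x):
    """Compare the general (all documents) and the partial (top-x)
    average distance. Assumed reading of "consistent": both criteria
    pick the same collection as the nearest one."""
    def general(label):
        return mean(distances_by_collection[label])

    def partial(label):
        return mean(sorted(distances_by_collection[label])[:x])

    g = min(distances_by_collection, key=general)
    p = min(distances_by_collection, key=partial)
    return Decision(g, p, g == p)


# Hypothetical, precomputed distances of one unclassified document.
distances = {
    "NLP": [0.2, 0.9, 0.4, 0.8],
    "IR": [0.5, 0.5, 0.6, 0.5],
}
decision = compare_criteria(distances, x=2)
if not decision.consistent:
    # Here the weights and penalties of techniques 3 and 4 would be applied
    # and the distances recomputed before the document is classified.
    print("inconsistent:", decision.general, "vs", decision.partial)
```

Returning both labels rather than a single answer keeps the decision about how to reweight outside the check itself, since the appropriate weighting technique depends on the sample.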
ACKNOWLEDGEMENTS
This work was co-funded 75% by the European
Social Fund and 25% by National Resources
(EPEAEK-II)-Archimedes.
FUZZY INTERVAL NUMBER (FIN) TECHNIQUES FOR MULTILINGUAL AND CROSS LANGUAGE
INFORMATION RETRIEVAL