model and with the possibility of re-modelling included, a satisfactory selection of the parameters is made in all cases, enabling us to make the appropriate decisions about their values prior to execution.
4 CONCLUSIONS AND FUTURE WORK
The use of modelling techniques can help to improve the decisions taken to reduce the execution time of the routines. Modelling allows us to introduce information about the behavior of the routine into the tuning process, and so to guide that process.
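As a minimal sketch of how a model can guide the tuning, the following Python fragment fits the coefficients of a time model to a few sample timings and then selects the parameter value with the lowest predicted time. The model form t(n, b) = k1·n³/b + k2·n²·b, the function names, the coefficient values and the synthetic timings are all assumptions made for illustration; they are not the model or the data used in this work.

```python
# Illustrative only: the model form, the names and the numbers below are
# assumptions for this sketch, not taken from the paper.

def features(n, b):
    """Terms of a hypothetical time model t(n, b) = k1*n^3/b + k2*n^2*b."""
    return n**3 / b, n**2 * b


def fit_coefficients(timings):
    """Least-squares estimate of (k1, k2) from {(n, b): seconds} samples.

    Solves the 2x2 normal equations of the linear model in closed form.
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for (n, b), t in timings.items():
        f1, f2 = features(n, b)
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        r1 += f1 * t
        r2 += f2 * t
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("samples do not determine both coefficients")
    k1 = (r1 * s22 - r2 * s12) / det
    k2 = (s11 * r2 - s12 * r1) / det
    return k1, k2


def predicted_time(n, b, k1, k2):
    """Execution time predicted by the fitted model."""
    f1, f2 = features(n, b)
    return k1 * f1 + k2 * f2


def select_block_size(n, candidates, k1, k2):
    """Pick the candidate parameter value with the lowest predicted time."""
    return min(candidates, key=lambda b: predicted_time(n, b, k1, k2))


# Synthetic "installation-time" timings generated from assumed coefficients;
# a real installation would measure these on the target machine.
TRUE_K1, TRUE_K2 = 2e-9, 1e-9
timings = {
    (n, b): TRUE_K1 * n**3 / b + TRUE_K2 * n**2 * b
    for n in (200, 400, 800)
    for b in (16, 32, 64)
}

k1, k2 = fit_coefficients(timings)
best_b = select_block_size(1000, [8, 16, 32, 64, 128], k1, k2)
print(k1, k2, best_b)
```

Only nine small timings are taken at installation time in this sketch, and the choice for a larger problem size is made from the model alone, with no further executions.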
The modelling time must be small, because at least part of this process may be carried out at each installation of the routines. Different ways of reducing it have therefore been studied here, with satisfactory results.
At present, our research group is working on the inclusion of meta-heuristic techniques in the modelling (Martínez-Gallar et al., 2006), and on applying the same methodology to other types of routines and algorithmic schemes (Cuenca et al., 2005) and (Carmo-Boratto et al., 2006).
ACKNOWLEDGEMENTS
This work has been partially supported by the Consejería de Educación de la Región de Murcia, Fundación Séneca 02973/PI/05.
REFERENCES
Carmo-Boratto, M. D., Giménez, D., and Vidal, A. M. (2006). Automatic parametrization on divide-and-conquer algorithms. In proceedings of International Congress of Mathematicians.
Caron, E., Desprez, F., and Suter, F. (2005). Parallel extension of a dynamic performance forecasting tool. Scalable Computing: Practice and Experience, 6(1):57–69.
Chen, Z., Dongarra, J., Luszczek, P., and Roche, K. (2004). LAPACK for clusters project: An example of self adapting numerical software. In proceedings of the HICSS’04, page 90282.1.
Cuenca, J., Giménez, D., and González, J. (2004). Architecture of an automatic tuned linear algebra library. Parallel Computing, 30(2):187–220.
Cuenca, J., Giménez, D., and Martínez-Gallar, J. P. (2005). Heuristics for work distribution of a homogeneous parallel dynamic programming scheme on heterogeneous systems. Parallel Computing, 31:735–771.
Dongarra, J., Du Croz, J., and Duff, I. S. (1988). A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Software, 14:1–17.
Dongarra, J. and Eijkhout, V. (2002). Self-adapting numerical software for next generation applications. In ICL Technical Report, ICL-UT-02-07.
Douglas, C. C., Heroux, M., Slishman, G., and Smith, R. M. (1994). GEMMW: A portable level 3 BLAS Winograd variant of Strassen’s matrix–matrix multiply algorithm. J. Comp. Phys., 110:1–10.
Frigo, M. (1998). FFTW: An adaptive software architecture for the FFT. In proceedings of the ICASSP conference, volume 3, pages 1381–1384.
Huss-Lederman, S., Jacobson, E. M., Tsao, A., Turnbull, T., and Johnson, J. R. (1996). Implementation of Strassen’s algorithm for matrix multiplication. In proceedings of Supercomputing ’96, page 32.
Katagiri, T., Kise, K., Honda, H., and Yuba, T. (2003). FIBER: A generalized framework for auto-tuning software. Springer LNCS, 2858:146–159.
Katagiri, T., Kise, K., Honda, H., and Yuba, T. (2005). ABCLib_DRSSED: A parallel eigensolver with an auto-tuning facility. Parallel Computing, 32:231–250.
Lastovetsky, A., Reddy, R., and Higgins, R. (2006). Building the functional performance model of a processor. In proceedings of the SAC’06, pages 23–27.
Martínez-Gallar, J. P., Almeida, F., and Giménez, D. (2006). Mapping in heterogeneous systems with heuristical methods. In proceedings of PARA’06.
Singer, B. and Veloso, M. (2000). Learning to predict performance from formula modeling and training data. In proceedings of the 17th International Conference on Mach. Learn., pages 887–894.
Strassen, V. (1969). Gaussian elimination is not optimal. Numerische Mathematik, 3(14):354–356.
Tanaka, T., Katagiri, T., and Yuba, T. (2006). d-spline based incremental parameter estimation in automatic performance tuning. In proceedings of the PARA’06.
Vadhiyar, S. S., Fagg, G. E., and Dongarra, J. J. (2000). Automatically tuned collective operations. In proceedings of Supercomputing 2000, pages 3–13.
Vuduc, R., Demmel, J. W., and Bilmes, J. (2001). Statistical models for automatic performance tuning. In proceedings of ICCS’01, LNCS, volume 2073, pages 117–126.
Whaley, R. C., Petitet, A., and Dongarra, J. J. (2001). Automated empirical optimizations of software and the ATLAS project. Parallel Computing, 27(1-2):3–35.
Wolski, R., Spring, N. T., and Hayes, J. (1999). The network weather service: a distributed resource performance forecasting service for metacomputing. Journal of Future Generation Computing System, 15(5–6):757–768.
INCLUDING IMPROVEMENT OF THE EXECUTION TIME IN A SOFTWARE ARCHITECTURE OF LIBRARIES
WITH SELF-OPTIMISATION