this work, the objective is not building a crawler, but
rather effective service discovery, initially, we
decided to use an off-the-shelf crawler and examined
several existing open-source crawlers, including
Java-based and .NET-based crawlers. In
order to provide a seed to the crawlers, we used the
results obtained from search engines (Google and
Yahoo). However, this approach could retrieve only
a few web service description files. For this reason,
we decided to try other ways to collect service
descriptions.
As an alternative approach, we considered using
search engines directly to find sources of Web
service descriptions. In this approach, we used the
file-extension filtering capability of the Google and
Yahoo search engines while searching Web
content. Most Web service descriptions are
written in WSDL (wsdl, 2001) and stored with the
“.wsdl” extension. At this point, as described in more
detail in the next paragraph, we made a further
extension: since “.wsdl” file type filtering did not
yield satisfactory results, we chose to filter on the
“.asmx” extension instead.
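The extension-based search step can be illustrated with a minimal Python sketch. It is not the tooling used in this work: the function names are hypothetical, the `filetype:` query operator is assumed to be understood by the search engine, and the mapping from an “.asmx” page to its description assumes the common ASP.NET convention that appending `?WSDL` to the service URL returns the WSDL document.

```python
from urllib.parse import urlsplit, urlunsplit


def build_query(keyword: str, extension: str) -> str:
    """Compose a search query restricted to one file extension."""
    # Assumption: the engine understands a `filetype:` operator.
    return f"{keyword} filetype:{extension.lstrip('.')}"


def has_extension(url: str, extension: str) -> bool:
    """True if the URL path ends with the extension (case-insensitive)."""
    path = urlsplit(url).path.lower()
    return path.endswith("." + extension.lstrip(".").lower())


def description_url(url: str) -> str:
    """Map a search result URL to the URL of its service description."""
    if has_extension(url, "asmx"):
        parts = urlsplit(url)
        # Assumption: ASP.NET convention, `service.asmx?WSDL` yields the WSDL.
        return urlunsplit((parts.scheme, parts.netloc, parts.path, "WSDL", ""))
    return url
```

Keeping the extension check separate from query construction allows results to be re-filtered locally, since a search engine may return pages that do not actually end with the requested extension.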
A Web service description file in WSDL format
contains many constructs, all of which are expressed
as XML-based text. Therefore, a WSDL file
must be parsed in order to retrieve the web
services advertised at a given URL. To this end, we
implemented a customized WSDL parser that
extracts the inputs, outputs and documentation of a
web service, together with the complex types defined
as either input or output. We observed that not all of
the descriptions are well-formed, and the parser
returns error messages in such cases. Examining
the error messages, we identified three types of
problems with the service descriptions:
- Service description can be specified in an
  earlier version of WSDL.
- Service description can be a malformed
  WSDL document.
- Service description is not given in WSDL at
  all; it is specified as an ordinary html
  document with wsdl extension.
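The three error categories above can be told apart with a short check before full parsing. The following Python sketch is illustrative only and is not the customized parser described in this work: the function name and the returned labels are hypothetical, the parser is assumed to target the WSDL 1.1 namespace, and a `definitions` root in any other namespace is treated, as a heuristic, as a different WSDL version.

```python
import xml.etree.ElementTree as ET

# Assumption: the parser targets WSDL 1.1, whose namespace is given below.
WSDL_11_NS = "http://schemas.xmlsoap.org/wsdl/"


def classify_description(text: str) -> str:
    """Return 'ok', 'other-version', 'malformed' or 'html' for a document."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML: either an HTML page or a broken WSDL file.
        head = text.lstrip()[:200].lower()
        if head.startswith("<!doctype html") or "<html" in head:
            return "html"
        return "malformed"
    # ElementTree writes qualified names as "{namespace}localname".
    if root.tag == f"{{{WSDL_11_NS}}}definitions":
        return "ok"
    local_name = root.tag.rsplit("}", 1)[-1].lower()
    if local_name == "html":
        return "html"
    if local_name == "definitions":
        # Heuristic: same root element, different namespace.
        return "other-version"
    return "malformed"
```

Only the documents labelled `ok` would be passed on to the extraction of inputs, outputs and complex types.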
The validation task is applied to the parsable
descriptions. The next concern is to identify the
input types in order to provide suitable parameter
values. Input types may be primitive or complex.
Handling primitive-typed parameters is
straightforward and uses the following rules:
- For the primitive types enumeration, character,
  boolean, integer and floating point number, the
  input is set to the value 1.
- String-typed parameter names are checked
  to see whether they contain the word “date”. If a
  name contains the word “date”, the date that has
  the same month and day value and that is
  closest to the current date is used as the
  parameter value.
- For other strings, the parameter is set to a
  simple text such as “text”.
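The rules for primitive types can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the type names and function names are hypothetical, the default for the first group of types is taken to be the number 1, “the same month and day value” is read as the month number being equal to the day number (1/1, 2/2, ..., 12/12), the name check is case-insensitive, and the date is rendered in ISO format.

```python
from datetime import date


def closest_equal_month_day(today: date) -> date:
    """Closest date to `today` whose month and day numbers are equal."""
    candidates = [
        date(year, n, n)
        for year in (today.year - 1, today.year, today.year + 1)
        for n in range(1, 13)
    ]
    return min(candidates, key=lambda d: abs((d - today).days))


def default_value(type_name: str, param_name: str, today: date):
    """Pick a test value for a primitive-typed parameter."""
    # Hypothetical type names standing in for the WSDL primitive types.
    if type_name in ("enum", "char", "boolean", "int", "float"):
        return 1
    if type_name == "string":
        if "date" in param_name.lower():
            # Assumption: ISO 8601 text; the source does not fix a format.
            return closest_equal_month_day(today).isoformat()
        return "text"
    raise ValueError(f"not a primitive type: {type_name}")
```

For example, `default_value("string", "startDate", date(2010, 6, 20))` returns `"2010-06-06"`, while a string parameter named `city` receives `"text"`.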
Compared with primitive types, handling
complex-typed parameters is challenging.
While dealing with complex types, we
encountered the following problematic situations:
- The complex type of a parameter may refer to
  another complex type.
- The parameter type is an array of a complex type.
- The parameter type is a reference type.
- The parameter type is an interface.
- The parameter type is a nullable primitive.
For each of these situations, we used the following
solutions:
- In order to create an instance of a complex-typed
  parameter, the Reflection utility of the .NET
  framework is used. A reference to a
  previously created type may lead to an infinite
  loop. For this reason, constructed complex type
  instances are kept in a list, and we refer to this
  list when setting values where necessary.
- For complex-typed arrays, we create a
  one-dimensional array with a single element. Its
  instance is constructed and set by using the
  procedure described above.
- When a parameter is of a reference type, only an
  instance of the top-level type is created, and we
  do not set any values for its attributes.
- In order to provide interface-typed parameters,
  all previously defined complex types are
  searched until a type implementing that
  interface is found. An instance of that type is
  then created and its values are set.
- Since primitive types cannot be set to null by
  nature, an instance of the object type that
  corresponds to the given primitive type is
  constructed instead of a primitive-typed
  instance, and its value is set.
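The handling of complex types can be illustrated with a small Python sketch. It does not reproduce the .NET Reflection based implementation of this work: type descriptions are modelled here as a hypothetical dictionary schema, instances are plain dictionaries, and all names (`build`, `SCHEMA`, the `kind` labels, the example types) are assumptions made for the illustration. The `built` dictionary plays the role of the list of already constructed instances.

```python
PRIMITIVES = {"int": 1, "float": 1.0, "boolean": True, "string": "text"}


def build(type_name, schema, built=None):
    """Create a test value for `type_name`; `built` holds created instances."""
    built = {} if built is None else built
    if type_name in PRIMITIVES:
        return PRIMITIVES[type_name]
    if type_name in built:
        # Already constructed: reuse it instead of recursing forever.
        return built[type_name]
    spec = schema[type_name]
    kind = spec["kind"]
    if kind == "nullable":
        # Nullable primitive: supply a value of the underlying type.
        return build(spec["base"], schema, built)
    if kind == "array":
        # One-dimensional array holding a single element.
        return [build(spec["item"], schema, built)]
    if kind == "reference":
        # Only the top-level instance is created; attributes stay unset.
        return {"__type__": spec["target"]}
    if kind == "interface":
        # Search the defined types for one implementing the interface.
        for name, other in schema.items():
            if type_name in other.get("implements", ()):
                return build(name, schema, built)
        raise LookupError(f"no type implements {type_name}")
    # Complex type: register the instance before filling its fields, so a
    # field that refers back to this type finds it in `built`.
    instance = {"__type__": type_name}
    built[type_name] = instance
    for field, field_type in spec["fields"].items():
        instance[field] = build(field_type, schema, built)
    return instance


# Hypothetical schema: Order and Customer refer to each other.
SCHEMA = {
    "Order": {"kind": "complex", "fields": {
        "id": "int", "customer": "Customer", "lines": "LineArray",
        "payment": "IPayment", "discount": "NullableFloat",
        "previous": "OrderRef"}},
    "Customer": {"kind": "complex",
                 "fields": {"name": "string", "lastOrder": "Order"}},
    "Line": {"kind": "complex", "fields": {"quantity": "int"}},
    "LineArray": {"kind": "array", "item": "Line"},
    "IPayment": {"kind": "interface"},
    "Card": {"kind": "complex", "implements": ["IPayment"],
             "fields": {"number": "string"}},
    "NullableFloat": {"kind": "nullable", "base": "float"},
    "OrderRef": {"kind": "reference", "target": "Order"},
}

order = build("Order", SCHEMA)
```

Registering an instance before its fields are filled is what breaks the cycle between `Order` and `Customer`: the nested `lastOrder` field receives the instance that is still under construction rather than triggering a new one.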
After solving all the problems with the complex
types, web services are invoked one by one and the
responses are analyzed to find the validated (active)
services. For the filtering with “.wsdl” files,
although a long source URL list is obtained, most of
the descriptions cause parse errors. Among the
parsable ones, the number of validated services is
ICEIS 2010 - 12th International Conference on Enterprise Information Systems