data:image/s3,"s3://crabby-images/0084e/0084e8046b1a2db1c69e6c34154ec7392665f996" alt=""
summing, counting, and averaging. Other programs store or retrieve data through the corresponding interfaces.
Platform Service Layer (PSL): to provide common
programming routines, data mining algorithm
libraries, standard business process procedures,
parametric models, and so on.
Software Service Layer (SSL): to provide well-
defined and easy-to-use applications for end users.
Although the four internal service layers look hierarchical, no service layer is required to be built on top of another; the layers are flat. The data service portal deploys many programs that provide various types of public financial data services to end users.
3.2 Data Resource Center
The data resource center consists of a distributed file system and a distributed scalable non-relational database. It also contains some standalone relational databases on specific hosts. The distributed scalable non-relational database holds most of the data, and the distributed file system stores files. We collect data (including files) from different sources, for example by crawling the Web, importing from external databases, and accepting uploads or manual input from users. These data cover several subjects, such as government notices, economic news, market/trade data, indices, and forum threads.
FIND integrates these data and stores them in the distributed scalable non-relational database. For structured data that come from a relational database, we either redesign the schema and transform the tuples by copying or mapping attributes, or simply import the schema into the distributed scalable non-relational database, using a column family for the relation and column qualifiers for the attributes of that relation. The latter approach is easy to implement but may be difficult to use. For structured data that do not come from relational databases, we design some common column families, one per subject, and append the attributes of the structured data to the corresponding column family.
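The two mappings above can be illustrated with a small sketch. It models the non-relational store as a nested dictionary in the style of a wide-column database (row key, then column family, then column qualifier); this in-memory model, the row-key scheme, and the function names `import_relation` and `append_to_subject` are hypothetical and are not FIND's actual storage API.

```python
# A toy in-memory stand-in for a wide-column store (HBase/Bigtable style):
#   table[row_key][column_family][column_qualifier] -> value
from typing import Dict, List

Table = Dict[str, Dict[str, Dict[str, str]]]


def import_relation(table: Table, relation: str, primary_key: str,
                    rows: List[dict]) -> None:
    """Direct schema import: the relation becomes a column family and
    each attribute becomes a column qualifier inside that family."""
    for row in rows:
        # Hypothetical row-key scheme: "<relation>:<primary key value>".
        row_key = f"{relation}:{row[primary_key]}"
        family = table.setdefault(row_key, {}).setdefault(relation, {})
        for attribute, value in row.items():
            family[attribute] = str(value)  # values are stored as strings here


def append_to_subject(table: Table, subject: str, row_key: str,
                      record: dict) -> None:
    """Structured data from non-relational sources: append the record's
    attributes to the column family representing its subject."""
    family = table.setdefault(row_key, {}).setdefault(subject, {})
    for attribute, value in record.items():
        family[attribute] = str(value)


table: Table = {}
import_relation(table, "trade", "trade_id",
                [{"trade_id": 1, "symbol": "ABC", "price": 10.5}])
append_to_subject(table, "market", "ABC:2012-01-05", {"close": 10.6})
print(table["trade:1"]["trade"]["price"])  # 10.5
print(table["ABC:2012-01-05"])             # {'market': {'close': '10.6'}}
```

The sketch shows why the direct import is easy to implement (one loop, no schema design) but can be awkward to use: every query still has to know the original relation name and attribute names.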
For unstructured data, such as a text file or a multimedia clip, we save the file to the distributed file system and save the file name, length, keywords, and other retrievable embedded attributes (if they exist, such as version information or document variables) to the non-relational database. We treat semi-structured data as unstructured data, but extract more attributes and most of the content from it.
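A minimal sketch of the unstructured-data path, assuming a local directory stands in for the distributed file system and a dictionary stands in for the non-relational metadata table. The keyword extraction (most frequent words of four or more letters), the SHA-1 row key, and the name `store_unstructured` are illustrative assumptions, not the paper's method.

```python
import hashlib
import re
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, Optional


def store_unstructured(dfs_root: Path, name: str, content: bytes,
                       metadata_table: Dict[str, Dict[str, object]],
                       embedded: Optional[Dict[str, str]] = None,
                       top_k: int = 5) -> str:
    """Save the raw file under dfs_root (stand-in for the distributed file
    system) and record its searchable metadata in metadata_table."""
    path = dfs_root / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)

    # Naive keyword extraction (assumption): the top_k most frequent
    # lowercase words with at least four letters.
    text = content.decode("utf-8", errors="ignore").lower()
    words = re.findall(r"[a-z]{4,}", text)
    keywords = [word for word, _ in Counter(words).most_common(top_k)]

    row_key = hashlib.sha1(name.encode("utf-8")).hexdigest()  # hypothetical key scheme
    metadata_table[row_key] = {
        "file_name": name,
        "length": len(content),
        "keywords": keywords,
        **(embedded or {}),  # e.g. version information, document variables
    }
    return row_key


with tempfile.TemporaryDirectory() as d:
    meta: Dict[str, Dict[str, object]] = {}
    key = store_unstructured(Path(d), "news/report.txt",
                             b"market report: market index rose; index closed higher",
                             meta, {"version": "1.0"}, top_k=2)
    print(meta[key]["file_name"], meta[key]["length"], meta[key]["keywords"])
```

For semi-structured data, the same function could be extended to parse the document and add more fields (and most of the text) to the metadata record.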
The data resource center also contains many component libraries, internal functions, and interfaces that other programs use. It provides the following internal functions:
Data Query: to query data in the data resource center by keyword or by specified conditions.
Data Download: to download data that match specified conditions.
Data Upload / Save: to let users or programs upload or save data to the data resource center.
Data Retrieve/Import: to retrieve data from other data sources, especially from the Internet or other file systems. It also supports importing data from an external relational database or a record file.
Statistical Analytics: to provide basic statistical algorithms, such as summing, counting, and selecting the maximum or minimum.
Data Integration: to integrate particular sets of data (in most cases retrieved from different data sources) by performing data cleaning, extraction, transformation, loading, and other necessary steps.
Data Mining: to provide libraries of advanced data analysis and data mining algorithms. It supports more complex statistical analysis and common data mining algorithms, including pattern matching, clustering, classification, and outlier detection. Other programs invoke these algorithms to achieve their data mining goals.
Built-in Components: include most of the program libraries and components, parametric models, script interpreters, and so on. Parametric model libraries provide sophisticated parametric data processing models and business models. A programmer only needs to set the parameter values, and the model then produces the result.
Interfaces: include program interfaces, web service interfaces, and cloud management interfaces. Program interfaces support loading, running, suspending, and terminating a program. Web service interfaces allow a program to communicate with other services or programs. Cloud management interfaces support most of FIND's internal management operations, such as job dispatching, task scheduling, message queuing, and maintaining intermediate data.
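The Data Query and Statistical Analytics functions listed above compose naturally: a query selects records by keyword or condition, and the basic statistics summarize a numeric field of the result. The sketch below assumes a simple in-memory `Record` type; the record fields and the names `query` and `basic_stats` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Record:
    symbol: str
    subject: str
    text: str
    price: float


def query(records: Iterable[Record], keyword: Optional[str] = None,
          condition: Optional[Callable[[Record], bool]] = None) -> List[Record]:
    """Return records whose text contains `keyword` (case-insensitive)
    and that satisfy `condition`; either filter may be omitted."""
    result = []
    for r in records:
        if keyword is not None and keyword.lower() not in r.text.lower():
            continue
        if condition is not None and not condition(r):
            continue
        result.append(r)
    return result


def basic_stats(values: Iterable[float]) -> Dict[str, float]:
    """Summing, counting, max/min selection, and averaging."""
    values = list(values)
    if not values:
        raise ValueError("no values to summarize")
    total = sum(values)
    return {"sum": total, "count": len(values), "max": max(values),
            "min": min(values), "mean": total / len(values)}


records = [
    Record("ABC", "market", "ABC shares rose", 10.0),
    Record("ABC", "market", "ABC shares fell", 12.0),
    Record("XYZ", "news", "XYZ earnings report", 5.0),
]
hits = query(records, keyword="abc", condition=lambda r: r.subject == "market")
print(basic_stats(r.price for r in hits))
# {'sum': 22.0, 'count': 2, 'max': 12.0, 'min': 10.0, 'mean': 11.0}
```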
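The program interfaces (load, run, suspend, terminate) suggest a small lifecycle state machine. The following is a sketch under assumptions: the state names, the allowed transitions, and the `load`/`ProgramHandle` names are hypothetical, since the paper does not specify FIND's program-control semantics.

```python
from enum import Enum


class State(Enum):
    LOADED = "loaded"
    RUNNING = "running"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# Assumed policy: which states each operation may start from, and where it leads.
ALLOWED_FROM = {
    "run": {State.LOADED, State.SUSPENDED},
    "suspend": {State.RUNNING},
    "terminate": {State.LOADED, State.RUNNING, State.SUSPENDED},
}
TARGET = {"run": State.RUNNING, "suspend": State.SUSPENDED,
          "terminate": State.TERMINATED}


class ProgramHandle:
    """Hypothetical handle for a program loaded through the program interface."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.LOADED  # loading leaves the program ready to run

    def _apply(self, op: str) -> None:
        if self.state not in ALLOWED_FROM[op]:
            raise RuntimeError(f"cannot {op} {self.name!r} in state {self.state.value}")
        self.state = TARGET[op]

    def run(self) -> None:
        self._apply("run")

    def suspend(self) -> None:
        self._apply("suspend")

    def terminate(self) -> None:
        self._apply("terminate")


def load(name: str) -> ProgramHandle:
    return ProgramHandle(name)


p = load("market-replay")
p.run()
p.suspend()
p.run()
p.terminate()
print(p.state)  # State.TERMINATED
```

Rejecting invalid transitions (for example, running a terminated program) keeps the scheduler's view of each program consistent.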
3.3 Data Service Portal
The data service portal deploys a wide range of software and provides public data services to end users. Users can use the software without any customization. Depending on the execution policy of each piece of software, either any user may execute it or only authorized users may. The software implements basic, common functions that users need, for example market replay, search, data mining, and information publishing.
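The two execution policies described above can be expressed as a simple access check. This is a minimal sketch assuming each deployed application carries a policy flag and a set of authorized user IDs; the `ExecutionPolicy`, `Software`, and `may_execute` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class ExecutionPolicy(Enum):
    PUBLIC = "public"          # any user may execute
    AUTHORIZED = "authorized"  # only listed users may execute


@dataclass
class Software:
    name: str
    policy: ExecutionPolicy
    authorized_users: Set[str] = field(default_factory=set)


def may_execute(software: Software, user: str) -> bool:
    """Return True if `user` is allowed to execute `software`."""
    if software.policy is ExecutionPolicy.PUBLIC:
        return True
    return user in software.authorized_users


search = Software("search", ExecutionPolicy.PUBLIC)
miner = Software("data-mining", ExecutionPolicy.AUTHORIZED, {"alice"})
print(may_execute(search, "bob"), may_execute(miner, "bob"), may_execute(miner, "alice"))
# True False True
```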
FIND also provides some business models, for
FIND - A Data Cloud Platform for Financial Data Services