
 
summing, counting, averaging. Other programs store 
or retrieve the data via certain interfaces. 
Platform Service Layer (PSL): to provide common 
programming routines, data mining algorithm 
libraries, standard business process procedures, 
parametric models, and so on. 
Software Service Layer (SSL): to provide well-
defined and easy-to-use applications for end users. 
Although the four internal service layers look 
hierarchical, no service layer has to be built on 
another one; the layers are flat. The data service 
portal deploys many programs that provide various 
types of public financial data services to end users. 
3.2  Data Resource Center 
The data resource center consists of a distributed file 
system and a distributed scalable non-relational 
database. It also contains some standalone relational 
databases on specific hosts. The distributed 
scalable non-relational database holds most of the 
data, and the distributed file system stores files. We 
retrieve data (including files) from different sources, 
for example by crawling the Web, importing from 
external databases, or accepting uploads and manual 
input from users. These data cover several subjects, 
such as government notices, economic news, 
market/trade data, indices, forum threads, and so on. 
FIND integrates these data and puts them into the 
distributed scalable non-relational database. For 
structured data that come from a relational database, 
we either re-design the schema and transform tuples 
by copying or mapping attributes, or simply import 
the schema into the distributed scalable 
non-relational database, using a column family as the 
relation and column qualifiers as the attributes of that 
relation. The latter is easy to implement but may be 
difficult to use. For other structured data that do not 
come from relational databases, we design some 
common column families as subjects, and append the 
attributes of the structured data to the corresponding 
column family. For unstructured data, such as a text 
file or a multimedia clip, we save the file to the 
distributed file system, and save the file name, 
length, keywords, and other retrievable embedded 
attributes (if any, such as version information or 
document variables) to the non-relational database. 
We treat semi-structured data like unstructured data, 
but extract more attributes and most of the content 
from it. 
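The simple import described above, with a column family standing in for the relation and column qualifiers for its attributes, can be sketched as follows. This is a minimal illustration in Python, not FIND's actual code: the function names, the row-key scheme, the `file` family name, and the sample table and values are all assumptions made for the example, and an in-memory dictionary stands in for the non-relational database.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical cell address: (row_key, "family:qualifier").
Cell = Tuple[str, str]


def tuple_to_cells(relation: str, key_attr: str, row: Dict[str, Any]) -> Dict[Cell, str]:
    """Map one relational tuple to cells, using the relation name as the
    column family and each attribute name as a column qualifier."""
    row_key = str(row[key_attr])
    return {
        (row_key, f"{relation}:{attr}"): str(value)
        for attr, value in row.items()
        if attr != key_attr and value is not None  # NULL attributes are not stored
    }


def file_to_cells(path: str, length: int, keywords: List[str]) -> Dict[Cell, str]:
    """Describe an unstructured file by its retrievable attributes only.
    The file itself is assumed to live in the distributed file system."""
    return {
        (path, "file:name"): path.rsplit("/", 1)[-1],
        (path, "file:length"): str(length),
        (path, "file:keywords"): ",".join(keywords),
    }


# Illustrative data only; a dict stands in for the non-relational database.
store: Dict[Cell, str] = {}
store.update(tuple_to_cells(
    "trade", "trade_id",
    {"trade_id": 1001, "symbol": "ABC", "price": 12.5, "note": None}))
store.update(file_to_cells("/reports/2012/q1.pdf", 20480, ["quarterly", "report"]))
```

Because absent attributes are simply not written, relations with many optional columns stay sparse, which is one reason this direct mapping is easy to implement. The drawback noted above remains: queries that would join several relations must be expressed by the calling program.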
The data resource center also offers many 
component libraries, internal functions and 
interfaces for use by other programs. It 
provides the following internal functions: 
Data Query: to query data from the data resource 
center, by keyword or by given conditions. 
Data Download: to download data that matches the 
given conditions. 
Data Upload / Save: to let users or programs 
upload or save data to the data resource center. 
Data Retrieve / Import: to retrieve data from other 
data sources, especially from the Internet or other 
file systems. It also supports importing data from an 
external relational database or a record file. 
Statistical Analytics: to provide some basic 
statistical algorithms, such as summing, counting, 
selecting the maximum or minimum, etc. 
Data Integration: to integrate particular sets of data 
(in most cases retrieved from different data 
sources) by performing data cleaning, extracting, 
translating, loading and other necessary steps. 
Data Mining: to provide libraries of advanced data 
analysis and data mining algorithms. It supports 
more complex statistical analysis and common data 
mining algorithms, including pattern matching, 
clustering, classification, outlier detection, etc. Other 
programs invoke these algorithms to reach their 
data mining goals. 
Built-in Components: to include most program 
libraries and components, parametric models, script 
interpreters, etc. Parametric model libraries provide 
some sophisticated parametric data processing 
models and business models. A programmer only 
needs to set the parameter values, and the model 
then produces the result. 
Interfaces: to include program interfaces, web service 
interfaces and cloud management interfaces. 
Program interfaces support loading, running, 
suspending or terminating a program. Web service 
interfaces let a program communicate with other 
services or programs. Cloud management interfaces 
support most internal management operations of 
FIND, such as job dispatching, task scheduling, 
message queuing, intermediate data maintenance, etc. 
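Two of the functions listed above, Data Query and Statistical Analytics, can be illustrated with a small sketch. The names `query` and `basic_stats`, the record layout, and the sample values are hypothetical and chosen only for this example; a plain Python list stands in for the data resource center, so this shows the idea rather than FIND's real interfaces.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def query(records: Iterable[Record],
          keyword: Optional[str] = None,
          condition: Optional[Callable[[Record], bool]] = None) -> List[Record]:
    """Return the records that contain the keyword in any string field
    (if a keyword is given) and satisfy the condition (if one is given)."""
    result = []
    for rec in records:
        if keyword is not None and not any(
                isinstance(v, str) and keyword in v for v in rec.values()):
            continue
        if condition is not None and not condition(rec):
            continue
        result.append(rec)
    return result


def basic_stats(records: Iterable[Record], field: str) -> Dict[str, float]:
    """Count, sum, maximum and minimum of one numeric field."""
    values = [rec[field] for rec in records if field in rec]
    if not values:
        return {"count": 0}
    return {"count": len(values), "sum": sum(values),
            "max": max(values), "min": min(values)}


# Illustrative records only.
records = [
    {"subject": "market data", "symbol": "ABC", "price": 12.5},
    {"subject": "market data", "symbol": "XYZ", "price": 7.0},
    {"subject": "economy news", "title": "ABC announces results"},
]
hits = query(records, keyword="ABC")                      # keyword search
priced = query(records, condition=lambda r: "price" in r)  # conditional search
stats = basic_stats(priced, "price")
```

Keeping the query result as an ordinary list lets the statistical functions run directly on it, which mirrors how other programs are described as combining these internal functions.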
3.3  Data Service Portal 
The data service portal deploys a lot of software and 
provides public data services to end users. Users 
can use the software without any customization. 
Depending on the software execution policy, either 
any user may execute the software or only 
authorized users may. The software 
implements the basic and common functions that 
users want to use, for example, market replaying, 
searching, data mining or information publishing. 
FIND also provides some business models, for 
FIND - A Data Cloud Platform for Financial Data Services