$X_2$ - pick-up area - key,
$X_3$ - (driver, pick-up area) - key combination,
$X_4$ - car class - key,
$X_5$ - car number - key,
$X_6$ - (driver, car number) - key combination.
The following attributes act as indicators (Y)
(possible values are indicated in brackets):
$Y_1$ - order served (1 or 0),
$Y_2$ - delivery time (time interval of the car delivery from the moment the order is received),
$Y_3$ - passenger refusal (1 or 0),
$Y_4$ - driver refusal (1 or 0),
$Y_5$ - the car got into a road accident (1 or 0),
$Y_6$ - the car was stopped by the police (1 or 0),
$Y_7$ - negative feedback from the passenger (1 or 0).
The stream delivers a completed order <keys, indicators>. The key values driver, pick-up area, car class and car number ($X_1$, $X_2$, $X_4$, $X_5$) are extracted from it, and two key combinations are built: (driver, pick-up area) and (driver, car number) ($X_3$, $X_6$). The number $n_{X_i}$ is read from each hash table $i$. This number is used to update the cells at offset $n_{X_i} - 1$ of all vectors $i.j$ ($j = 1..7$): the value of the indicator $Y_j$ is added (summed) to the cell. If there is no corresponding record in hash table $i$, a record is inserted and the number $n_{X_i}$ is assigned to it.
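The per-order update can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each hash table $i$ is modelled as a Python dict that assigns consecutive numbers starting from 1, the vector size N = 1024 is assumed, and the order field names (`driver`, `area`, `car_class`, `car_number`) as well as `KeyIndex`, `key_values` and `process` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

N = 1024            # vector size n (assumed value)
NUM_INDICATORS = 7  # Y1..Y7


class KeyIndex:
    """Hash table i together with its vectors i.1 .. i.7 (hypothetical name)."""

    def __init__(self, size: int = N, indicators: int = NUM_INDICATORS) -> None:
        self.size = size
        self.numbers: Dict[Hashable, int] = {}  # key -> n_Xi, numbered from 1
        self.vectors: List[List[float]] = [[0.0] * size for _ in range(indicators)]

    def number_for(self, key: Hashable) -> int:
        # If the key has no record yet, insert it with the next free number.
        if key not in self.numbers:
            if len(self.numbers) >= self.size:
                raise OverflowError("more distinct keys than vector cells")
            self.numbers[key] = len(self.numbers) + 1
        return self.numbers[key]

    def add(self, key: Hashable, y: Tuple[float, ...]) -> None:
        offset = self.number_for(key) - 1  # cell offset n_Xi - 1
        for j, value in enumerate(y):      # vectors[j] holds vector i.(j+1)
            self.vectors[j][offset] += value


def key_values(order: dict) -> Dict[int, Hashable]:
    """Extract X1..X6 from an order (field names are hypothetical)."""
    return {
        1: order["driver"],
        2: order["area"],
        3: (order["driver"], order["area"]),
        4: order["car_class"],
        5: order["car_number"],
        6: (order["driver"], order["car_number"]),
    }


def process(order: dict, y: Tuple[float, ...], tables: Dict[int, KeyIndex]) -> None:
    """Apply one completed order <keys, indicators> to all six hash tables."""
    for i, key in key_values(order).items():
        tables[i].add(key, y)
```

For example, `tables = {i: KeyIndex() for i in range(1, 7)}` creates the six hash tables, and `process(order, (1, 7.5, 0, 0, 0, 0, 0), tables)` records one served order with a delivery time of 7.5 time units under every key and key combination of that order.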
Below are examples of select statements that conform to specification (9).
1. Find the average taxi arrival time for each driver in each pick-up area:
SELECT X3, X3.Y2 / X3.Y1
FROM vector 3.1, vector 3.2;          (10)
All records of hash table 3 are scanned (see Figure 5). For each key $X_3$ = (driver, pick-up area), the number $n_{X_3}$ is read. This number is used to read the values $Y_2$ (delivery time) from vector 3.2 and $Y_1$ (order served) from vector 3.1, and the first value is divided by the second. Since both cells hold sums over the window, the quotient is the average delivery time per served order. This is analogous to grouping by the composite key $X_3$.
2. Display the performance indicators of all
drivers involved in a road accident:
SELECT X1.*
FROM vector 1.*
WHERE X1.Y5 > 0;          (11)
All records of hash table 1 are scanned. For each key $X_1$ = (driver), the number $n_{X_1}$ is read. This number is used to read the value of the indicator $Y_5$. If it is greater than 0, all performance indicators of this driver are output from vectors 1.j, j = 1..7.
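The scans behind queries (10) and (11) can be sketched in Python. As an assumption for illustration, a hash table is a dict mapping each key to its number $n_{X_i}$ (starting from 1) and the vectors $i.1, \dots, i.7$ are plain lists; `avg_delivery_time` and `drivers_in_accidents` are hypothetical names and the data are made up. The sketch also skips keys whose $Y_1$ sum is 0 to avoid division by zero, a case the text does not address.

```python
from typing import Dict, Hashable, List

Numbers = Dict[Hashable, int]   # hash table: key -> n_Xi (numbered from 1)
Vectors = List[List[float]]     # vectors[j] is vector i.(j+1)


def avg_delivery_time(numbers: Numbers, vectors: Vectors) -> Dict[Hashable, float]:
    """Query (10) over hash table 3: Y2 / Y1 per (driver, pick-up area) key."""
    result: Dict[Hashable, float] = {}
    for key, n in numbers.items():        # scan all records of hash table 3
        served = vectors[0][n - 1]        # vector 3.1: Y1, orders served
        total_time = vectors[1][n - 1]    # vector 3.2: Y2, summed delivery time
        if served > 0:                    # assumption: skip keys with no served orders
            result[key] = total_time / served
    return result


def drivers_in_accidents(numbers: Numbers, vectors: Vectors) -> Dict[Hashable, List[float]]:
    """Query (11) over hash table 1: all indicators of drivers with X1.Y5 > 0."""
    result: Dict[Hashable, List[float]] = {}
    for key, n in numbers.items():        # scan all records of hash table 1
        if vectors[4][n - 1] > 0:         # vector 1.5: Y5, road accidents
            result[key] = [vec[n - 1] for vec in vectors]  # vectors 1.1 .. 1.7
    return result


# Small hand-filled example (hypothetical data, vector size 4).
table3: Numbers = {("d1", "a1"): 1, ("d2", "a1"): 2}
vectors3: Vectors = [[0.0] * 4 for _ in range(7)]
vectors3[0][0], vectors3[1][0] = 2.0, 12.0   # d1 in a1: 2 orders, 12 time units in total
vectors3[0][1], vectors3[1][1] = 1.0, 5.0    # d2 in a1: 1 order, 5 time units

table1: Numbers = {"d1": 1, "d2": 2}
vectors1: Vectors = [[0.0] * 4 for _ in range(7)]
vectors1[0][0] = 3.0                         # d1 served 3 orders
vectors1[4][1] = 1.0                         # d2 had one road accident

print(avg_delivery_time(table3, vectors3))     # {('d1', 'a1'): 6.0, ('d2', 'a1'): 5.0}
print(drivers_in_accidents(table1, vectors1))  # {'d2': [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]}
```

Both functions touch each record of the hash table once and read the vectors by offset, so their cost depends on the number of distinct keys rather than on the number of orders in the window.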
Queries are executed when the window is moved: the indicator values collected over the window time interval are read. The following algorithm is applied:
1) set $T = 0$ and $W = W_0$, the initial window size (over time, the window may become floating);
2) reset all hash tables and vectors "vector i.j";
3) if the number of distinct elements in the stream becomes greater than n, set the size of the current floating window to $W = t - T$;
4) at time $t = T + W$, start the program that executes the queries and displays the results for the current window; these values are added to the previous results to obtain trends;
5) set $T = t$ and $W = W_0$, and go to step 2 of the algorithm.
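Steps 1) to 5) can be simulated as a simple event loop. This is a simplified sketch, not the authors' program: it assumes a single hash table keyed by driver with one indicator vector, events sorted by time, a "query" that just reads every key's sum, and that the event which would overflow the vector is counted in the next window. `run_windows` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Tuple

Event = Tuple[float, Hashable, float]  # (time t, driver key X1, indicator value)


def run_windows(events: List[Event], w0: float, n: int):
    """Simulate steps 1)-5) for one hash table and one indicator vector."""
    T, W = 0.0, w0                       # step 1: T = 0, W = W0
    numbers: Dict[Hashable, int] = {}    # hash table: key -> number (from 1)
    vector = [0.0] * n                   # indicator vector of size n
    windows = []                         # (window start, window end, per-key sums)
    trend: Dict[Hashable, float] = {}    # window results added up over time

    def close(end: float) -> None:
        # Step 4: run the query (here: read every key's sum) and update the trend.
        result = {key: vector[m - 1] for key, m in numbers.items()}
        windows.append((T, end, result))
        for key, value in result.items():
            trend[key] = trend.get(key, 0.0) + value
        numbers.clear()                  # step 2: reset the hash table ...
        vector[:] = [0.0] * n            # ... and the vector

    for t, key, y in events:
        while t >= T + W:                # regular end of the window at t = T + W
            close(T + W)
            T, W = T + W, w0             # step 5 (T = t at the window end)
        if key not in numbers and len(numbers) >= n:
            W = t - T                    # step 3: the window becomes floating and ends now
            close(t)                     # t equals T + W here
            T, W = t, w0                 # step 5
        if key not in numbers:
            numbers[key] = len(numbers) + 1
        vector[numbers[key] - 1] += y
    close(T + W)                         # flush the last, possibly partial, window
    return windows, trend
```

For instance, with $W_0 = 4$ and $n = 2$, a third distinct driver arriving at $t = 3$ closes the first window early at $t = 3$, and the next window then runs from 3 to 7.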
WebSockets can be used to access the in-memory data store (see Data Access Layer). When the client sends the "slow down" command (i.e. the client is overloaded), the window size can be increased automatically, which reduces the load λ on the client. The number of distinct elements in the stream must be monitored, since it may exceed the vector size n (see the previous algorithm).
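One possible way to react to the "slow down" command is sketched below. The multiplicative growth factor, the cap `w_max` and the class name `WindowController` are hypothetical, and the WebSocket transport itself is omitted. The sketch reads λ as the rate of result batches sent to the client, which is an assumption: with one batch per window, that rate is about 1/W, so enlarging W lowers it.

```python
class WindowController:
    """Hypothetical policy: grow the window W when the client asks to slow down.

    The growth factor and the upper bound are assumptions, not from the text.
    """

    def __init__(self, w0: float, factor: float = 2.0, w_max: float = 3600.0) -> None:
        self.w0 = w0
        self.w = w0
        self.factor = factor
        self.w_max = w_max

    def on_client_message(self, message: str) -> float:
        # A "slow down" message means the client is overloaded: enlarge W.
        if message == "slow down":
            self.w = min(self.w * self.factor, self.w_max)
        return self.w

    def result_rate(self) -> float:
        # One batch of query results per window, so about 1 / W batches per time unit.
        return 1.0 / self.w
```

Capping W keeps a single window from growing without bound; as the text notes, a longer window also collects more distinct elements, so the check against the vector size n still applies.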
A sliding window is not practical here: maintaining the hash tables and vectors over a sliding window interval becomes much more complicated, because after each slide one would have to determine what must be deleted from the hash tables and vectors.
The linearity of the vectors should be exploited here: vectors can be updated on different servers and then combined on the coordinating server at the end.
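Because each cell is a sum of indicator values, partial vectors computed on different servers can simply be added. The sketch below assumes, hypothetically, that every server keeps its own hash table with locally assigned numbers, so the coordinator matches cells by key before adding them; if all servers shared the same key-to-number mapping, element-wise addition of the vectors would be enough. `merge` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Tuple

Numbers = Dict[Hashable, int]          # key -> number (from 1) on one server
Vectors = List[List[float]]            # vectors i.1 .. i.J on one server
Part = Tuple[Numbers, Vectors]


def merge(parts: List[Part], size: int) -> Part:
    """Combine one hash table and its vectors from several servers by key."""
    num_vectors = len(parts[0][1])
    merged_numbers: Numbers = {}
    merged_vectors: Vectors = [[0.0] * size for _ in range(num_vectors)]
    for numbers, vectors in parts:
        for key, n in numbers.items():
            if key not in merged_numbers:
                if len(merged_numbers) >= size:
                    raise OverflowError("merged key count exceeds vector size")
                merged_numbers[key] = len(merged_numbers) + 1
            m = merged_numbers[key]
            for j in range(num_vectors):
                # Linearity: the global sum is the sum of the per-server sums.
                merged_vectors[j][m - 1] += vectors[j][n - 1]
    return merged_numbers, merged_vectors
```

Since addition is associative and commutative, the coordinator may merge the parts in any order, or merge them hierarchically, and obtain the same cell values.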
The volume of the vectors stored in the node RAM is small. Suppose $n = w \cdot d = 2^7 \cdot 2^3 = 2^{10}$ (the size of one sketch). The volume of one vector is $v_1 = n \cdot 4$ bytes $= 4$ KB. Let the number of hash tables be 6 (the number of keys and key combinations) and the number of indicators be 7. Then the volume of all vectors in the RAM of the node is $V = 4\ \text{KB} \cdot 6 \cdot 7 = 168$ KB.
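The estimate can be checked with a few lines of Python. A 4-byte cell corresponds to a 32-bit counter and, as in the text, only the vectors are counted, not the hash tables themselves. The function names are hypothetical.

```python
def vector_bytes(w: int, d: int, cell_bytes: int = 4) -> int:
    """Volume of one vector with n = w * d cells of cell_bytes bytes each."""
    return w * d * cell_bytes


def node_vector_bytes(w: int, d: int, hash_tables: int, indicators: int,
                      cell_bytes: int = 4) -> int:
    """Volume of all vectors i.j held in the RAM of one node."""
    return vector_bytes(w, d, cell_bytes) * hash_tables * indicators


v1 = vector_bytes(2**7, 2**3)              # 4096 bytes = 4 KB
V = node_vector_bytes(2**7, 2**3, 6, 7)    # 172032 bytes = 168 KB
print(v1 // 1024, V // 1024)               # 4 168
```

Adding a new indicator or a new key combination grows V linearly, by one vector per hash table or by 7 vectors respectively.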
The proposed approach for the Analysis Layer
implementation has the following advantages:
- Select statement (9) provides greater search
capabilities than ordinary sketches (Psaltis et al.,
2017; Cormode et al., 2005; Cormode et al., 2011;
Chen et al., 2017).
- New Y parameter vectors can be included (or
excluded) dynamically.
- Hash tables with new keys or their combinations
(X) can be included (or excluded) dynamically.
- It is possible to build key combinations, which allows select statements to be executed on these combinations.
- Floating window can be used if the number of
different elements in the stream exceeds n. This saves