reaches elevated positive rewards by selecting
feasible storage positions that are optimal in terms of
product synergy, turnover frequency, and the
expected distance to the warehouse exit.
To validate the performance of the LinUCB agent,
the learned storage allocation policy is compared to
that of a conventional ABC-analysis-based allocation
strategy. To this end, based on their normalized
turnover frequency, all product types are categorized
as A-, B-, or C-type products. If a product has a
normalized turnover frequency higher than 80 %, the
product is of A-type; if it is between 20 % and 80 %,
the product is of B-type; all other products are C-type.
Additionally, each storage position in the warehouse
is categorized by industry experts as either A-, B-, or
C-type. The ABC-analysis-based allocation strategy
then positions A-type products in A-type spatial
clusters, B-type products in B-type spatial clusters,
and so on. If several spatial clusters belong
to the same type, priority is given to the cluster that
has the smallest expected distance to the warehouse
exit.
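The ABC baseline can be sketched as follows. The A/B/C thresholds are taken from the text. The following are assumptions made for illustration: treating a normalized turnover of exactly 20 % as B-type, skipping clusters without remaining capacity, and the hypothetical `Cluster` record with its field names.

```python
from dataclasses import dataclass
from typing import List


def abc_class(normalized_turnover: float) -> str:
    """Classify a product by normalized turnover frequency (thresholds from the text).

    Treating exactly 0.2 as B-type is an assumption.
    """
    if normalized_turnover > 0.8:
        return "A"
    if normalized_turnover >= 0.2:
        return "B"
    return "C"


@dataclass
class Cluster:
    """Hypothetical record for one spatial cluster."""
    cluster_id: int
    abc_type: str              # expert-assigned "A", "B" or "C"
    expected_distance: float   # normalized expected distance to the warehouse exit
    remaining_capacity: int    # free storage slots (hypothetical unit)


def abc_ranking(normalized_turnover: float, clusters: List[Cluster]) -> List[int]:
    """Rank same-type clusters by expected distance, closest to the exit first.

    Skipping clusters with no remaining capacity is an assumption.
    """
    product_type = abc_class(normalized_turnover)
    candidates = [
        c for c in clusters
        if c.abc_type == product_type and c.remaining_capacity > 0
    ]
    candidates.sort(key=lambda c: c.expected_distance)
    return [c.cluster_id for c in candidates]
```

For a product with a normalized turnover of 0.985, `abc_ranking` returns only A-type clusters, ordered by increasing expected distance to the exit.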
In the following, the decision-making of the
LinUCB agent is evaluated using two storage
allocation examples. In both cases, the warehouse
has about 30 % remaining capacity.
Figure 4: Results of the learning process of the LinUCB
agent over 2.25 𝑥 10
iterations.
3.2.3 An Exemplary Storage Allocation for an A-Type Product
First, we consider the storage allocation suggestion of
the LinUCB agent for an incoming product of product
type “K20”. This product has a normalized turnover
frequency of 0.985 (A-type) and thus constitutes one
of the most frequently ordered product types. The
action valuation of the LinUCB agent for a subset of
the 50 spatial clusters is shown in Figure 5. In the first
column, the normalized expected distance of each
cluster is shown.
Figure 5: Example of the action valuation of the LinUCB
agent for the A-type product “K20”.
Moreover, information about the remaining
capacity of each cluster is provided in the second
column. Information about the product synergy is
given in the remaining columns. Specifically, these
columns give, from left to right, the percentage of
products of synergy classes 0 to 3 that are located in
each cluster.
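Assuming that the columns of Figure 5 also correspond to the context features the agent observes for each cluster (the text does not state this explicitly), one row of the table can be represented and flattened into a feature vector as below. The `ClusterRow` record, its field names, and the rescaling of synergy percentages to [0, 1] are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterRow:
    """Hypothetical representation of one row of an action-valuation table."""
    cluster_id: int
    expected_distance: float    # first column: normalized expected distance to the exit
    remaining_capacity: float   # second column: remaining capacity of the cluster
    # remaining columns: % of stored products in synergy classes 0, 1, 2, 3
    synergy_shares: Tuple[float, float, float, float]

    def to_context(self) -> List[float]:
        """Flatten the row; rescaling percentages to [0, 1] is an assumption."""
        return [
            self.expected_distance,
            self.remaining_capacity,
            *(share / 100.0 for share in self.synergy_shares),
        ]
```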
In this particular example, the agent indicates
cluster 18 as the best allocation target. This decision
is reasonable, given the low expected distance to the
warehouse exit, the sufficient remaining storage
capacity, and the high product synergy of “K20” with
the products already stored in this spatial cluster.
As the second-best choice, the agent indicates
cluster 10, which has the same expected distance to
the warehouse exit as cluster 18. Note that cluster 10
has a considerably higher product synergy than
cluster 18. However, its remaining storage capacity is
smaller than that of cluster 18, although it would
suffice to store product “K20”. This raises the
question of why the LinUCB agent still prefers cluster
18 over cluster 10. The reason is that the
agent experienced during training that the selection of
a cluster with zero capacity leads to a negative
reward. It thus learned a linear dependence between
the received reward and the remaining storage
capacity of the selected cluster. Therefore, the
LinUCB agent associates clusters with low
remaining storage capacity with a higher probability
of receiving a negative reward.
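How a learned linear dependence on remaining capacity lowers a cluster's valuation can be illustrated with a minimal LinUCB sketch. The exact LinUCB variant, features, and reward used here are not stated in this section, so the following are assumptions: a single parameter vector is shared by all clusters (a simplification of the per-arm form of standard LinUCB), the feature vector is a hypothetical [bias, remaining capacity, expected distance, synergy], and the hypothetical reward is negative for (almost) full clusters. The model is fitted on randomly sampled observations rather than through a full training run.

```python
import numpy as np


class SharedLinUCB:
    """Minimal LinUCB sketch with one parameter vector shared by all clusters."""

    def __init__(self, n_features: int, alpha: float = 0.1) -> None:
        self.alpha = alpha                 # weight of the exploration bonus
        self.A = np.eye(n_features)        # I + sum of x x^T (ridge-regularized)
        self.b = np.zeros(n_features)      # sum of reward * x

    def theta(self) -> np.ndarray:
        """Ridge-regression estimate of the linear reward coefficients."""
        return np.linalg.solve(self.A, self.b)

    def scores(self, X: np.ndarray) -> np.ndarray:
        """Upper confidence bound theta^T x + alpha * sqrt(x^T A^-1 x) per row of X."""
        A_inv = np.linalg.inv(self.A)
        mean = X @ (A_inv @ self.b)
        width = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
        return mean + self.alpha * width

    def update(self, x: np.ndarray, reward: float) -> None:
        """Incorporate one observed (feature vector, reward) pair."""
        self.A += np.outer(x, x)
        self.b += reward * x


def hypothetical_reward(capacity: float, distance: float, synergy: float,
                        rng: np.random.Generator) -> float:
    """Hypothetical reward: negative if the selected cluster is (almost) full."""
    if capacity < 0.05:
        return -1.0
    return 0.5 + 0.5 * synergy - 0.5 * distance + rng.normal(0.0, 0.1)


def train_on_synthetic_data(n_steps: int = 2000, seed: int = 0) -> SharedLinUCB:
    """Fit the sketch on randomly sampled hypothetical cluster observations."""
    rng = np.random.default_rng(seed)
    agent = SharedLinUCB(n_features=4)
    for _ in range(n_steps):
        capacity, distance, synergy = rng.uniform(0.0, 1.0, size=3)
        x = np.array([1.0, capacity, distance, synergy])  # bias + features
        agent.update(x, hypothetical_reward(capacity, distance, synergy, rng))
    return agent
```

Because zero-capacity selections are penalized, the fitted capacity coefficient `agent.theta()[1]` becomes positive. As a result, `agent.scores(...)` values a cluster with more remaining capacity above an otherwise identical cluster with less, even though both would be feasible.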
Cluster 25 constitutes the lowest-valued option.
This is once again reasonable, considering that this
cluster has neither sufficient remaining storage
capacity nor existing product synergies.
The conventional ABC-analysis-based allocation
strategy suggests storing product “K20” in cluster 10.
The second and third best choices are clusters 18 and
49. Note that the LinUCB agent ranks cluster 49
as the fifth-best choice. These results indicate that the
suggestions of the LinUCB agent match those of the