colony algorithm to solve the problem (Shi, 2010). To reduce
logistics costs and improve distribution efficiency, Zhao Hairu
and Chen Ling built a multi-objective logistics node location
model whose objective functions are minimum logistics system
operating cost and maximum customer time satisfaction
(Zhao, 2016).
Wang Jiaju systematically set out the advantages,
disadvantages, and applicable scope of the center-of-gravity
method for locating logistics centers, providing a theoretical
basis for research on logistics center location (Wang,
2008). Wang Feifei and Lin Wen addressed the
limitations of the center-of-gravity method in logistics
center location, established a model that yields the optimal
location, and verified the model's correctness with a
worked example (Wang, 2014).
3 K-MEANS ALGORITHM CLUSTERING ANALYSIS
Clustering is the process of grouping data members
according to the similarity of their characteristics.
K-means clustering analysis, also known as K-center
clustering, is a statistical analysis method that groups
similar observations of continuous variables together
through an iterative process.
Distance is usually taken as the index of similarity. The
data set $U = \{u_i, i = 1, 2, \ldots, n\}$ is divided into
K classes $C = \{c_j, j = 1, 2, \ldots, K\}$, and each class
$c_j$ has a clustering center $m_j$. The sum of squared
distances from each data point to the center of its cluster,
taken over all clusters, is

$$D(C) = \sum_{j=1}^{K} \sum_{u_i \in c_j} \lVert u_i - m_j \rVert^2 \tag{1}$$

The clustering result is obtained by minimizing $D(C)$
through iterative operation.
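A minimal Python sketch of the objective $D(C)$ in Eq. (1), assuming Euclidean distance and clusters supplied as lists of coordinate tuples; the function names `squared_distance` and `clustering_objective` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(u: Sequence[float], m: Sequence[float]) -> float:
    """Squared Euclidean distance ||u - m||^2 (Euclidean metric is an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(u, m))


def clustering_objective(clusters: List[List[Point]], centers: List[Point]) -> float:
    """D(C) from Eq. (1): for every cluster c_j, add ||u_i - m_j||^2 over its points u_i."""
    return sum(
        squared_distance(u, m_j)
        for c_j, m_j in zip(clusters, centers)
        for u in c_j
    )


if __name__ == "__main__":
    # Two toy clusters with their centers m_1 and m_2.
    clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 12.0)]]
    centers = [(1.0, 0.0), (10.0, 11.0)]
    print(clustering_objective(clusters, centers))  # prints 4.0
```

Each of the four points lies at distance 1 from its center, so $D(C) = 4$; K-means iterations try to drive this value down.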
Because K-means converges quickly and requires little
computation, it is well suited to analyzing and processing
large samples, where it effectively reduces computation
time and improves operating efficiency.
In warehousing logistics, the logistics network is
usually divided into distribution areas, each centered
on a warehouse with the distribution distance as its
radius. Because the K-means algorithm partitions data
by distance, it is well suited to dividing warehousing
logistics regions. Here K, the number of clusters in
K-means, is the number of warehousing logistics
distribution regions.
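The distance-based region division described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: distances are straight-line (Euclidean) rather than road distances, the warehouse locations are taken as already known (for example, the cluster centers produced by K-means), and `divide_regions` and `region_radius` are hypothetical helper names. Each delivery point is assigned to its nearest warehouse, so K equals the number of warehouses, and a region's radius is taken to be the distance to its farthest assigned point.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def divide_regions(points: List[Point], warehouses: List[Point]) -> Dict[int, List[Point]]:
    """Assign each delivery point to its nearest warehouse (ties go to the lower index).

    Assumption: Euclidean distance stands in for the real distribution distance.
    """
    regions: Dict[int, List[Point]] = {j: [] for j in range(len(warehouses))}
    for p in points:
        nearest = min(range(len(warehouses)), key=lambda j: math.dist(p, warehouses[j]))
        regions[nearest].append(p)
    return regions


def region_radius(region: List[Point], warehouse: Point) -> float:
    """Distribution radius: distance from the warehouse to its farthest assigned point."""
    return max((math.dist(p, warehouse) for p in region), default=0.0)


if __name__ == "__main__":
    # Hypothetical customer coordinates and K = 2 warehouse locations.
    customers = [(1.0, 1.0), (2.0, 0.0), (9.0, 9.0), (10.0, 12.0)]
    warehouses = [(0.0, 0.0), (10.0, 10.0)]
    regions = divide_regions(customers, warehouses)
    for j, members in regions.items():
        print(j, members, region_radius(members, warehouses[j]))
```

`math.dist` requires Python 3.8 or later; swapping in a road-network distance would only change the `key` function.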
In the K-means algorithm, the number of clusters K is
chosen arbitrarily (Xu, 2019). Although the algorithm is
fast and simple, an arbitrarily chosen K affects the
clustering result and can lower clustering quality. A
good clustering produces clusters whose samples lie
close together, while different clusters lie far apart.
The Calinski-Harabasz (CH) clustering evaluation
index (Calinski, 1974) accounts for both: the similarity
of samples within each cluster and the separation
between clusters. For each candidate K between 2 and
the number of samples, the CH index is used to
evaluate the clustering result; the K with the highest
CH value (a higher CH indicates better clustering) is
taken as the optimal number of clusters, which
determines the number of warehouses.
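A minimal sketch of the CH index for a hard partition, using the standard definition $CH = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(n-K)}$, where $\operatorname{tr}(W)$ is the within-cluster sum of squares and $\operatorname{tr}(B)$ is the size-weighted scatter of the cluster centers around the overall mean. The function name `calinski_harabasz` is hypothetical; scikit-learn offers an equivalent `sklearn.metrics.calinski_harabasz_score(X, labels)`. Because the denominator contains $n - K$, the index is only defined for $2 \le K < n$.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def _sq(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """CH = [tr(B)/(K-1)] / [tr(W)/(n-K)]; larger values mean better clustering."""
    n = len(points)
    k = len(set(labels))
    if not 2 <= k < n:
        raise ValueError("CH needs 2 <= K < n")
    overall = _mean(points)
    within = 0.0   # tr(W): scatter of points around their own cluster center
    between = 0.0  # tr(B): size-weighted scatter of centers around the overall mean
    for lab in set(labels):
        members = [p for p, q in zip(points, labels) if q == lab]
        center = _mean(members)
        within += sum(_sq(p, center) for p in members)
        between += len(members) * _sq(center, overall)
    if within == 0.0:
        return float("inf")  # every point sits exactly on its center
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(calinski_harabasz(pts, [0, 0, 1, 1]))  # compact, well separated: 50.0
    print(calinski_harabasz(pts, [0, 1, 0, 1]))  # spread out, overlapping: about 0.08
```

The two labelings of the same four points show the intended behaviour: the partition into two tight, distant pairs scores far higher than the partition that mixes them.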
4 K-MEANS ALGORITHM CLUSTERING ANALYSIS WITH A DETERMINED K VALUE
A cluster number K that is too large or too small
degrades the clustering result of the K-means algorithm,
so determining K becomes the most important problem
to solve when applying K-means clustering.
4.1 Algorithm Steps
This paper mainly optimizes the first step of the K-means
algorithm: instead of choosing K arbitrarily, it selects
the best K with a clustering evaluation index, which
yields better clustering. The steps are as follows:
1. Determine K using the clustering evaluation index;
2. Calculate the distance between each data point and each cluster center, and assign each data point to the cluster whose center is nearest;
3. Recalculate the K cluster centers, taking the arithmetic mean of the points in each cluster as its new center;
4. If no cluster center changes, go to Step 5; otherwise, go to Step 2;
5. Output the classification result of the dataset.
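The five steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the candidate range K = 2 to a user-supplied `k_max` (which must stay below the number of samples because CH divides by n - K), the deterministic farthest-first choice of initial centers, the iteration cap, and all function names are assumptions, since the text does not specify them. Step 1 keeps the K with the highest CH value.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def _mean(points: Sequence[Point]) -> Point:
    """Arithmetic mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def _init_centers(points: List[Point], k: int) -> List[Point]:
    # Assumption: deterministic farthest-first seeding; the text does not say
    # how the initial centers are chosen.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_sq(p, c) for c in centers)))
    return centers


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Steps 2-4: assign, update, repeat until no center moves (or max_iter is hit)."""
    centers = _init_centers(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Step 2: each point joins the cluster whose center is nearest.
        labels = [min(range(k), key=lambda j: _sq(p, centers[j])) for p in points]
        # Step 3: new center = arithmetic mean of the cluster's points
        # (an empty cluster keeps its old center).
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(_mean(members) if members else centers[j])
        # Step 4: stop if no center changed, otherwise go back to Step 2.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """CH = [tr(B)/(K-1)] / [tr(W)/(n-K)]; larger is better."""
    n, k = len(points), len(set(labels))
    overall = _mean(points)
    within = between = 0.0
    for lab in set(labels):
        members = [p for p, q in zip(points, labels) if q == lab]
        center = _mean(members)
        within += sum(_sq(p, center) for p in members)
        between += len(members) * _sq(center, overall)
    return math.inf if within == 0.0 else (between / (k - 1)) / (within / (n - k))


def choose_k(points: List[Point], k_max: int) -> int:
    """Step 1: run K-means for K = 2..k_max and keep the K with the highest CH."""
    if not 2 <= k_max < len(points):
        raise ValueError("need 2 <= k_max < number of samples")
    best_k, best_score = 2, -math.inf
    for k in range(2, k_max + 1):
        labels, _ = kmeans(points, k)
        if len(set(labels)) < 2:
            continue  # degenerate run (e.g. duplicate points); CH is undefined
        score = calinski_harabasz(points, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


if __name__ == "__main__":
    # Hypothetical delivery-point coordinates forming three tight groups.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
           (0.0, 10.0), (0.0, 11.0), (1.0, 10.0)]
    k = choose_k(pts, k_max=4)
    labels, centers = kmeans(pts, k)  # Step 5: output the classification result
    print(k, labels)
```

Farthest-first seeding was picked only to make the sketch deterministic; random or k-means++ seeding would fit the same loop, at the cost of results that vary between runs.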