When estimating the service rate process, the situation
gets more complicated. One possible complication
arises if the start and finish times
of chat dialogues are not recorded. In our data-sets
only the number of initiated chats per agent and in-
terval is available. Since an agent can serve several
customers in parallel we make the assumption that
the service rate per customer is a non-increasing function
of the number of currently served customers. In
(Bekker et al., 2004) the authors explore varying service
levels and in (Bekker et al., 2011) adapting service
rates are investigated. In computer systems, processor
sharing is a common phenomenon, see (Cohen,
1979). Our model is inspired by both of the previous
situations, where an agent has capacity to perform si-
multaneous tasks but at varying rates. A further com-
plication is due to data often only being available on
an aggregated level. Thus it is not possible to dis-
cern the actual (pointwise) workload distribution for
the interval. We suggest a missing data approach,
via the expectation-maximization algorithm and Gibbs
sampling, to handle this problem.
There might also exist general information about
the system, such as how likely it is that there are cus-
tomers waiting in the queue and the arrival rate from
a previous estimation. One might also include data
from other chat systems and assume that there are
similarities. Hence we propose to model the service rate
per customer as a continuous non-increasing function,
depending on the state of the chat system and
the specific agent. Such a function can provide answers
about the maximum number of parallel chats allowed in order to
fulfill some quality-of-service goal, such as maximizing
system throughput, or it can support staffing
decisions.
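As an illustration of how such a function could be used, the following minimal sketch assumes a hypothetical parametric form for the per-customer service rate, mu(n) = mu_single / (1 + slowdown (n - 1)). The functional form, the parameter values and all function names are assumptions made for this example only; they are not the model estimated in this paper.

```python
def service_rate_per_customer(n, mu_single=1.0, slowdown=0.25):
    """Hypothetical per-customer service rate when an agent serves n chats.

    Assumed form: mu_single / (1 + slowdown * (n - 1)), which is
    non-increasing in n for slowdown >= 0.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return mu_single / (1.0 + slowdown * (n - 1))


def agent_throughput(n, **params):
    """Total completion rate of one agent serving n chats in parallel."""
    return n * service_rate_per_customer(n, **params)


def max_parallel_chats(min_rate, n_cap=10, **params):
    """Largest n <= n_cap whose per-customer rate is still >= min_rate.

    Returns 0 if even a single chat cannot meet the requirement.
    Relies on the rate being non-increasing in n, so the search can
    stop at the first violation.
    """
    best = 0
    for n in range(1, n_cap + 1):
        if service_rate_per_customer(n, **params) >= min_rate:
            best = n
        else:
            break
    return best


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, service_rate_per_customer(n), agent_throughput(n))
    print("max parallel chats for min_rate=0.6:", max_parallel_chats(0.6))
```

Under the assumed form, throughput per agent keeps growing with n while the rate experienced by each customer drops, so a quality-of-service threshold on the per-customer rate is what bounds the number of parallel chats in this sketch.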
In Section 2, data is discussed and the data-sets are
presented. In Section 3, the proposed queueing-based
state-space model is introduced and parametrized.
The parameters to be estimated are also stated. In
Section 4, the estimation models for the arrival process
are explained and the hypothesis testing is shown
for specific data-sets. The missing data approach
for estimation of the service rates is also presented.
2 DATA CHARACTERISTICS
What can be achieved in terms of reliable estimates, in
a contact center environment, is highly dependent on
the amount and quality of the available data. There-
fore, it makes sense to categorize data in terms of
quality. We identify three major aspects that determine
the overall quality: 1. the number of data records,
2. the level of detail, and 3. the relevant data-sets.
The data-sets can in turn be split into three subsets that
are important for estimation in queueing systems: general
system data, agent-specific data and customer-specific
data.
The number of data records is an important factor
in determining the level of accuracy of estimates. The
level of detail determines how easily one can perform
estimations. Furthermore, in the context of queue-
ing systems, it is meaningful to differentiate between
three types of data-subsets. The first set concerns
data on a system level, such as offered load per interval.
The second subset pertains to agent-specific data,
such as agent-id and number of initiated chat dialogues
per interval. The third subset of data records
contains information on individual customers, such as
customer-id, arrival time to the system and waiting
time in queue.
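The three data subsets can be pictured as three record types. The sketch below is only an illustration: the class and field names are hypothetical and chosen to mirror the examples given in the text, not the schema of the actual data-sets.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class SystemRecord:
    """System-level data: offered load per day and intraday interval."""
    day: date
    interval: int
    offered_load: int


@dataclass
class AgentRecord:
    """Agent-specific data: chats initiated by one agent in one interval."""
    day: date
    interval: int
    agent_id: str
    initiated_chats: int


@dataclass
class CustomerRecord:
    """Customer-specific data: one row per individual customer."""
    customer_id: str
    arrival_time: datetime
    waiting_time_seconds: float


def missing_subsets(system, agent, customer):
    """Return the names of the data subsets that contain no records."""
    subsets = {"system": system, "agent": agent, "customer": customer}
    return [name for name, rows in subsets.items() if not rows]
```

For example, a data-set with system and agent records but no customer records would give `missing_subsets(...) == ["customer"]`.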
Few data records, a low level of detail, or the absence
of one or more of the three subcategories leads to
uncertainty in the estimations. This type
of uncertainty has to be managed, which motivates
the need for methods that provide reliable estimates in
the face of poor data quality.
The given data-sets, on which this paper is based,
come in two subsets, where the first subset contains
general queue data and the second contains agent-specific
data. Thus customer-specific data is missing
in all cases. The data deemed useful in the context
of this paper is presented, while other data fields not
deemed to influence the proceedings are suppressed.
After discussing the matter with the responsible database
administrators, it was found that the data is not
completely machine generated and thus may contain
errors due to human factors. This type of problem
requires serious attention but for the purposes of this
text it is ignored apart from some pre-processing with
respect to outliers and records with low information
content.
2.1 General Queue Related Data
In the first type of data subset, the important data fields
are the ones representing date, intraday interval and
offered load. The data is given per date and per interval,
thus we introduce $d \in \mathcal{D} = \{1, \ldots, D\}$ indexing
the days, $i \in \mathcal{I} = \{1, \ldots, I\}$ indexing the intraday
intervals and $w_d \in \{1, \ldots, W\}$ indexing the day of the
week, where $W = 7$. Let $N_{d,i} \in \mathbb{N}$ represent the number
of arrivals on day $d$ in interval $i$, i.e., the offered load.
The notation was inspired by (Gans et al., 2009).
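To make the indexing concrete, the following sketch builds the arrival counts per day and interval, together with the day-of-week index, from a list of raw records. The record layout, the function name and the convention that Monday corresponds to 1 are assumptions for this example; the sample records are invented and are not taken from the data-sets.

```python
from datetime import date

# Hypothetical raw records: (calendar date, interval index in 1..I, arrivals)
records = [
    (date(2024, 1, 1), 1, 4),
    (date(2024, 1, 1), 2, 7),
    (date(2024, 1, 2), 1, 5),
    (date(2024, 1, 2), 2, 3),
]


def build_offered_load(records, num_intervals):
    """Return (N, w): N[(d, i)] is the number of arrivals on day d in
    interval i, and w[d] is the day of the week of day d.

    Days are indexed 1..D in calendar order of the observed dates.
    Assumption: w uses ISO weekdays, so Monday = 1 and Sunday = 7.
    Assumption: intervals with no record are counted as zero arrivals.
    """
    days = sorted({day for day, _, _ in records})
    day_index = {day: d for d, day in enumerate(days, start=1)}
    N = {
        (day_index[day], i): 0
        for day in days
        for i in range(1, num_intervals + 1)
    }
    for day, i, arrivals in records:
        N[(day_index[day], i)] += arrivals
    w = {day_index[day]: day.isoweekday() for day in days}
    return N, w


if __name__ == "__main__":
    N, w = build_offered_load(records, num_intervals=2)
    print(N)
    print(w)
```

Storing the counts in a dictionary keyed by (d, i) keeps the sketch free of external dependencies; an array with D rows and I columns would serve equally well.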