Deep Reinforcement Agent for Efficient Instant Search

Ravneet Singh Arora¹, Sreejith Menon², Ayush Jain¹ and Nehil Jain¹
¹Bloomberg, U.S.A.
²Google, U.S.A.

Keywords: Instant Search, Deep Learning, Reinforcement Learning, Information Retrieval, Search.
Abstract:
Instant Search is a paradigm where a search system retrieves answers on the fly while typing. The naïve
implementation of an Instant Search system would hit the search back-end for results each time a user types
a key, imposing a very high load on the underlying search system. In this paper, we propose to address the
load issue by identifying tokens that are semantically more salient toward retrieving relevant documents and
utilizing this knowledge to trigger an instant search selectively. We train a reinforcement agent that interacts
directly with the search engine and learns to predict the word’s importance in relation to the search engine.
Our proposed method treats the search system as a black box and is more universally applicable to diverse
architectures. To further support our work, a novel evaluation framework is presented to study the trade-off
between the number of triggered searches and the system’s performance. We utilize the framework to evaluate
and compare the proposed reinforcement method with other baselines. Experimental results demonstrate the
efficacy of the proposed method in achieving a superior trade-off.
1 INTRODUCTION
Interactivity in search engines has substantially grown
in popularity in recent years. To further enrich the
user experience, most modern search engines such as
Google and Bing provide instant search capabilities
(Venkataraman et al., 2016b). Instant search retrieves
results on the fly at every keystroke compared to con-
ventional search engines that trigger search at the end
of the query. Analyses of query logs performed by (Cetindil et al., 2012) have shown that instant search improves user experience by reducing the overall time and effort needed to retrieve relevant results and helps users find information when they are not sure of the exact query. This feature is especially relevant to mobile applications. Recently, these systems have also
become extremely popular in Social Networking web-
sites such as Linkedin (Venkataraman et al., 2016a).
Instant Answers is another variation of this paradigm that is now common in search engines: it allows users to view answers instantly while typing questions such as "how is the weather today?" This feature is also handy in open-domain question answering, where user needs are ambiguous.
The implementation of instant search systems
faces a significant challenge in the form of immense
load on the back-end search engine. Instant search can increase the number of queries issued in a single search session by tens or even hundreds, and the problem becomes more severe for longer natural-language queries. Managing such load is problematic for several reasons: the software or hardware might not be able to cope with the high query throughput during spikes of requests, the extra queries might cause high energy consumption on the servers, or they might consume computational resources needed by other processes such as indexing.
Several approaches have been proposed to im-
prove the performance and scalability of instant
search. Many of these studies are based on design-
ing more efficient index data structures for faster re-
trieval of results (Bast and Weber, 2006; Fafalios and
Tzitzikas, 2011; Li et al., 2012; Ji et al., 2009; Li
et al., 2011; Wang et al., 2010). These data struc-
tures are examined together with techniques such as caching (Fafalios et al., 2012) for their ability to improve search-engine query throughput. Caching
has been further extensively applied for large-scale
traditional search systems in various studies such as
(Markatos, 2001; Saraiva et al., 2001; Dean, 2009;
Gan and Suel, 2009; Fagni et al., 2006; Long and
Suel, 2006). New index data-structures and file sys-
tem formats for boosting the overall speed of search engines have also been explored (Brin and Page, 1998; Dean, 2009).

Figure 1: Behaviors of the Traditional, Instant, and proposed Instant Search systems for the query "Flight service from New York to Barcelona Spain". Searches are triggered at the tokens marked green. The whole prefix is forwarded to the search engine as a query.
In this paper, we propose a new method to solve
the instant search paradigm’s scalability challenges.
Our approach stems from the idea that a subset of to-
kens heavily influences the retrieval of the most rele-
vant results. This subset generally includes keywords
that are either topical or tokens that can alter the se-
mantic meaning of the query. We have applied this
idea towards training a reinforcement agent that pre-
dicts if a typed token is salient and selectively trig-
gers search only for such tokens. This is illustrated
in Figure 1. Searches are triggered at tokens marked
green. A traditional search system would wait till the
last token before issuing the search; an instant search
system, on the other hand, queries at every new to-
ken. In addition to skipping common stopwords, our proposed approach skips the search at the word New because it is very common and needs more context (York in this case) to retrieve the correct answer. Likewise, since there is only one city named Barcelona in Spain, the word Spain does not influence the returned results and is also skipped. The new approach treats the
underlying retrieval engine as a black box and is de-
coupled from the internal implementation. During the
training, the agent updates weights based on the feed-
back received during its interaction with the search
system. This methodology has the following advan-
tages: a) More universal application to a diverse set
of modern architectures; b) No need to scale up in-
dividual components of complicated search and QA
pipelines such as (Yang et al., 2019); c) Easy integra-
tion with the existing techniques such as caching.
Reinforcement learning provides the framework
to integrate and experiment with different reward
functions. Furthermore, the decisions taken by the algorithm can lead to a large number of distinct states, and it is not easy to compute exact true labels for a purely supervised setting. Recently, reinforcement learning has been successfully applied to a closely related problem in the field of Simultaneous Machine Translation (SMT) (Grissom II et al., 2014; Satija and Pineau, 2016; Gu et al., 2016). SMT is defined as the
task of producing a real-time translation of a sentence
while typing. The goal here is to achieve a good trade-
off between the quality and delay of the translation.
We further evaluate the loss in instant-search quality introduced by the proposed reinforcement agent. Quality is measured following studies that have compared instant search systems with traditional ones (Cetindil et al., 2012; Chandar et al., 2019). Instant search query logs
have been analyzed by (Cetindil et al., 2012) to un-
derstand the properties of instant search that lead to
a better user experience. Recently (Chandar et al.,
2019) combined user-query interaction logs with user
interviews and proposed new metrics that can eval-
uate user satisfaction for an instant search system.
Both studies propose result quality and user effort as the two primary metrics for measuring the improvement in user experience. Quality measures how relevant the search system's results are to the user query, whereas Effort captures how quickly the relevant results can be retrieved using the search engine.
We use these metrics to estimate how well the pro-
posed methods can reduce the overall system load
while preserving the performance. Experiments are
performed on three different combinations of datasets
with two retrieval systems. Our experiments show
that the proposed model achieves a superior trade-off, attaining near-optimal performance while reducing the number of triggered searches by 50%.
2 BASELINES
This section introduces the baselines that are evalu-
ated and compared with the proposed model.
Search at Every Token: SET issues a search for every
new token. This method represents the true instant
search paradigm.
Search at Last Token: SLT waits for the entire query
and triggers a single search request at the end. This
baseline mimics the behavior of a regular retrieval en-
gine.
Skip Stop-Words: SS simply issues a search at every
token except the stop-words.
Similarity Matching Pre-Trained Model: SM pre-
trained model issues a query only when the query’s
semantic meaning has changed by more than a cer-
tain threshold. We utilize the pre-trained Universal
Sentence Encoder model (Cer et al., 2018) to gener-
ate an embedding for the query at every new token and
compare the similarity with the embedding of the pre-
viously searched sub-query. We use CosineDistance between sentence embedding vector pairs to measure the similarity. A sentence pair (S_1, S_2) is considered to be semantically different if CosineDistance(S_1, S_2) ≥ threshold (Gomaa et al., 2013). We treat the threshold as a hyper-parameter, and the actual value is later stated in Section 4. Algorithm 1 describes this approach in more detail.
Q ← query; N ← number of tokens in query Q; D ← set of retrieved documents
q_searched ← sequence of tokens previously searched
V_searched ← embedding vector of q_searched
q_current ← current sequence of tokens
V_current ← embedding vector of q_current
for i ← 1 to N do
    q_current ← Q[1, i]
    V_current ← GetEmbedding(q_current)
    if CosineDistance(V_searched, V_current) ≥ threshold then
        q_searched ← q_current
        V_searched ← V_current
        D ← RetrieveDocuments(q_current)
    end
end
Algorithm 1: Inference using the Similarity Matching pre-trained model.
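For concreteness, a minimal Python sketch of Algorithm 1 follows; get_embedding and retrieve_documents are hypothetical helpers standing in for the sentence encoder and the search back-end, and are not part of the original paper.

```python
import numpy as np

def cosine_distance(u, v):
    # 1 - cosine similarity between two embedding vectors
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def similarity_matching_search(tokens, get_embedding, retrieve_documents, threshold=0.1):
    """Issue a search only when the typed prefix has drifted semantically
    away from the last searched prefix by at least `threshold`."""
    docs = []
    v_searched = None  # embedding of the last searched prefix
    for i in range(1, len(tokens) + 1):
        q_current = " ".join(tokens[:i])
        v_current = get_embedding(q_current)
        if v_searched is None or cosine_distance(v_searched, v_current) >= threshold:
            v_searched = v_current
            docs = retrieve_documents(q_current)
    return docs
```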
3 REINFORCEMENT AGENT
Deep Q Networks: In Q-learning (Watkins and
Dayan, 1992), the environment is formulated as a se-
quence of state transitions (s_t, a_t, r_t, s_{t+1}) of a Markov Decision Process (MDP). At a given time-step t for state s_t, the agent takes an action a_t and in response receives the reward r_t. As a result, the environment transitions into state s_{t+1}. The agent chooses action a_t for the state s_t by referring to a state-action value function Q(s_t, a_t), which measures the action's expected long-term reward. The algorithm updates the Q-function by interacting with the environment and obtaining rewards. In large environments, it is impractical to maintain a Q-function for a substantially large number of states. DQN (Mnih et al., 2013) solves this problem by approximating Q(s, a) using a deep neural network, which takes state s as input and calculates a value for every state/action pair.
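As a reference point (the paper does not spell out its exact update rule), a standard DQN training target for a transition (s_t, a_t, r_t, s_{t+1}) with discount factor γ and network parameters θ is

$$y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta), \qquad L(\theta) = \big(y_t - Q(s_t, a_t; \theta)\big)^2,$$

with y_t = r_t at the terminal step of an episode.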
Environment: The environment yields new words
for the agent and also interacts with the underlying
search engine. For a given query, the agent receives a new word x_t from the environment at every time-step t and, in response, takes action a_t. Based on the action, the environment issues a request to the underlying retrieval engine, and the agent is provided feedback in the form of a reward r_t. An episode terminates at the last token x_T of the query.
State: The state represents the portion of the query
that is already observed by the Agent. For a given
query q, let us assume that the agent has received to-
kens x
1
, ··· , x
t
denoted by partial query q
0
. The envi-
ronment maintains two sequences of tokens for every
q
0
:
q
0
1
: the list of tokens x
1
, ··· , x
t
0
used in the last
search query submitted to the system.
q
0
2
: the list of tokens x
t
0
+1
··· , x
t
the system has
seen since it last submitted a search query.
This state formulation allows the agent to learn
the overall importance of q
0
conditioned on already
searched sequence q
0
1
. At every time-step t, the agent
receives a new token x
t
which is then appended to
the unseached sub-sequence q
0
2
: q
0
2
= q
0
2
x
t
. After
a search is triggered, q
0
2
is appended to the searched
sub-sequence q
0
1
and q
0
2
is cleared back to empty.
Actions: For every new token x_t, the agent chooses one of the following actions:
WAIT: Instant search is not triggered, and the agent waits for the next token.
SEARCH: The typed query q' is issued to the underlying search system, and new results are retrieved. A SEARCH action results in the following state transition: q'_1 = q'_1 ⊕ q'_2; q'_2 = ∅.
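A small Python sketch of this state bookkeeping follows; the class and method names are illustrative, and retrieve_documents is a hypothetical callable for the search back-end.

```python
class InstantSearchState:
    """Tracks the searched prefix (q1) and the unsearched suffix (q2)."""

    def __init__(self):
        self.q1 = []  # tokens included in the last issued search query
        self.q2 = []  # tokens received since the last search

    def receive_token(self, token):
        # Every new token is appended to the unsearched sub-sequence.
        self.q2.append(token)

    def apply_action(self, action, retrieve_documents):
        if action == "WAIT":
            return None  # nothing is issued to the search engine
        # SEARCH: the whole prefix q1 + q2 is issued as the query,
        # then q2 is merged into q1 and cleared.
        query = " ".join(self.q1 + self.q2)
        self.q1.extend(self.q2)
        self.q2 = []
        return retrieve_documents(query)
```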
Reward: During training, at every time-step t, the agent receives reward r_t based on (s_t, a_t). The reward function is designed to encourage the agent to improve the search results' quality while keeping the number of searches issued to the underlying retrieval engine low. The agent receives a positive reward if a SEARCH (S) action leads to an improvement in Mean Average Precision (MAP) by more than a fixed threshold R_th. Otherwise, a constant penalty of -1 is imposed. The positive reward is directly proportional to the improvement in MAP, denoted ΔMAP. We treat the threshold R_th as a hyper-parameter, and the actual value is later stated in Section 4. Since the WAIT (W)
action does not affect the Quality or the Total Searches, the reward is set to 0. The following equation summarizes the reward function:

$$R = \begin{cases} 0, & \text{action} = W \\ 1 + \Delta\mathrm{MAP}, & \text{action} = S \text{ and } \Delta\mathrm{MAP} \geq R_{th} \\ -1, & \text{action} = S \text{ and } \Delta\mathrm{MAP} < R_{th} \end{cases}$$
Figure 2: The Bi-LSTM Siamese DQN network for calculating rewards for both the WAIT and SEARCH actions. The numbers represent the dimensions of the outputs generated by each layer.

Bi-LSTM Deep Q Network: This section describes the base network architecture, shown in Figure 2, that calculates rewards for a given state. The input to the model is the state, formulated as a pair of sub-queries (q_1, q_2). Input tokens for each sub-query are represented using pre-trained GloVe (Pennington et al., 2014) word embeddings, which are then passed to a Bi-Directional Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber, 1997) Siamese encoder. Since both sequences originate from the same query, it is intuitive to apply a Siamese network that shares the Bi-LSTM weights across them. The output vectors for the two sub-queries are concatenated, and the resulting feature vector is fed to a fully-connected layer that generates a two-dimensional vector representing the rewards for the two actions. The whole network can be summarized by the following equations:

$$u_1 = f_{\text{Bi-LSTM}}(q_1); \quad u_2 = f_{\text{Bi-LSTM}}(q_2)$$
$$v_1 = \mathrm{relu}(W_1 \cdot u_1 + b_1); \quad v_2 = \mathrm{relu}(W_1 \cdot u_2 + b_1)$$
$$H = [v_1; v_2]; \quad R = W_2 \cdot H + b_2$$
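A PyTorch-style sketch of this architecture follows. The hidden sizes are placeholders (the exact dimensions are given in Figure 2 and not reproduced here), and the module assumes GloVe embeddings are looked up outside of it; this is an illustrative reconstruction, not the authors' code.

```python
import torch
import torch.nn as nn

class SiameseBiLSTMDQN(nn.Module):
    """Shared Bi-LSTM encoder over (q1, q2); outputs estimated rewards
    for the WAIT and SEARCH actions."""

    def __init__(self, embedding_dim=300, hidden_dim=128, proj_dim=64):
        super().__init__()
        self.encoder = nn.LSTM(embedding_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, proj_dim)  # shared W1, b1
        self.out = nn.Linear(2 * proj_dim, 2)            # W2, b2 -> [R_WAIT, R_SEARCH]

    def encode(self, q):
        # q: (batch, seq_len, embedding_dim) pre-trained GloVe vectors
        _, (h_n, _) = self.encoder(q)
        # concatenate the final forward and backward hidden states
        u = torch.cat([h_n[0], h_n[1]], dim=-1)
        return torch.relu(self.proj(u))

    def forward(self, q1, q2):
        v1, v2 = self.encode(q1), self.encode(q2)
        return self.out(torch.cat([v1, v2], dim=-1))
```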
Inference: For every typed token during an instant search session, a state is prepared as a pair of sub-queries: the prefix that has already been searched and the suffix that still needs to be searched. The state is passed as input to the trained model described in Figure 2.
The base model generates rewards for both WAIT and
SEARCH actions. The agent picks the action with the
best reward, and based on that, the search to the un-
derlying system is either skipped or triggered using
the query entered so far. The state for the agent is
updated accordingly, and the agent waits for the next
token. An episode terminates at the end of the query
session.
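Putting the pieces together, a sketch of greedy inference for one query session, reusing the hypothetical helpers and state class from the earlier sketches (embed_tokens is assumed to map a possibly empty token list to a padded embedding tensor):

```python
def run_instant_search(tokens, state, agent, embed_tokens, retrieve_documents):
    """Greedy inference for one query session: at every token, pick the
    action with the higher predicted reward."""
    results = []
    for token in tokens:
        state.receive_token(token)
        q1 = embed_tokens(state.q1)   # (1, len(q1), embedding_dim), padded if empty
        q2 = embed_tokens(state.q2)
        rewards = agent(q1, q2)       # shape (1, 2): [R_WAIT, R_SEARCH]
        action = "SEARCH" if rewards[0, 1] > rewards[0, 0] else "WAIT"
        retrieved = state.apply_action(action, retrieve_documents)
        if retrieved is not None:
            results = retrieved
    return results
```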
4 METRICS AND
EXPERIMENTAL SETUP
Metrics: We utilize the following metrics to evaluate
the performance of the proposed methods.
Average Number of Triggered Searches (TS) - System
Load: This metric represents the load on the search
system and is measured as the average number of re-
quests made to the search system during an instant
search session.
Average Effort: Studies (Cetindil et al., 2012; Chandar et al., 2019) have found Effort to be a crucial factor that differentiates the instant-search user experience from that of a traditional search system. Effort
is defined as the minimum number of tokens that a
user would have to type to retrieve the best possible
ranking of results. Ranking quality is measured using
Mean Average Precision (MAP) and the best ranking
achieves the maximum MAP. Let N_q be the number of tokens in a given query q, and let n_q be the minimum number of tokens needed to retrieve the best possible ranking for query q. The metric effort is the average effort across all queries Q in the dataset and is computed as follows:

$$\mathit{effort} = \frac{1}{|Q|} \sum_{q \in Q} \frac{n_q}{N_q}$$
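In code, the metric reduces to a mean of per-query ratios (a sketch; the per-query values are assumed to have been measured already):

```python
def average_effort(per_query):
    # per_query: list of (n_q, N_q) pairs, where n_q is the minimum number of
    # tokens needed to reach the best possible ranking and N_q is the total
    # number of tokens in the query.
    return sum(n_q / N_q for n_q, N_q in per_query) / len(per_query)
```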
Quality: We use MAP to capture the quality of the
results. MAP is calculated using the open source
PyTREC-Eval (Van Gysel and de Rijke, 2018) library.
Evaluation Procedure: To measure the TS vs. Ef-
fort trade-off, we simulate an action function in a real
instant search session for every query and keep track
of both metrics. The action function returns an action (WAIT or SEARCH) at every new token based on the decision taken by the method being evaluated. For instance, the SET method (Section 2) would return SEARCH for every token in the query. TS is incremented, and the results are updated, at every search.
For every query, we invoke the action function un-
til the retrieval has achieved the best possible MAP or
has reached the last token. The total number of tokens
used to achieve the best MAP is added to the Effort at
the end of the query session. For Quality, we keep
track of the MAP achieved at every token position for
all the queries.
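A sketch of this per-query simulation loop, assuming a hypothetical search_and_score helper that issues the given prefix to the retriever and returns the MAP of the ranking it produces:

```python
def simulate_query(tokens, action_fn, search_and_score, best_map):
    """Returns (triggered_searches, n_q): how many searches the method issued
    and how many tokens were typed before the best possible MAP was reached."""
    triggered, current_map = 0, 0.0
    for i, token in enumerate(tokens, start=1):
        if action_fn(token) == "SEARCH":
            triggered += 1
            current_map = search_and_score(" ".join(tokens[:i]))
        if current_map >= best_map:
            return triggered, i
    return triggered, len(tokens)
```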
Datasets: We have evaluated the methods on three IR datasets: MS Marco passage ranking (Nguyen et al., 2016), Wiki IR (Frej et al., 2019) (the 59k version), and InsuranceQA (Feng et al., 2015). InsuranceQA is adapted to a pure document-retrieval task following (Tran and Niederée, 2018) and is used to test how well the methods generalize to different domains. To ensure that the underlying search
engine can retrieve relevant documents in top 1000
for enough queries, we have reduced the total number of documents to 400k and 500k for MS Marco and Wiki IR, respectively, by random sampling. For InsuranceQA, we use the full set of 27,413 documents. Evaluation sets of 1,000 queries are held out (unseen) for all three datasets.
Retrievers: We conduct experiments using both BM25 (Robertson and Zaragoza, 2009) and a semantic-matching retrieval system. For semantic retrieval, we use a transformer-based pre-trained sentence-encoding model, the Universal Sentence Encoder (USE) (Cer et al., 2018), to represent queries and documents as embeddings, and we rank results by cosine similarity.
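A sketch of this semantic ranking step, assuming query and document embeddings have already been computed with the sentence encoder (the helper name and matrix layout are illustrative):

```python
import numpy as np

def rank_by_cosine(query_embedding, doc_embeddings, top_k=1000):
    # doc_embeddings: (num_docs, dim) matrix of pre-computed USE embeddings
    q = query_embedding / np.linalg.norm(query_embedding)
    d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:top_k]  # indices of the top-ranked documents
```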
Hyper-Parameters: For SM, we set a threshold of 0.1. For the proposed DQN agent, we trained the model with the following settings: future-reward discount γ = 0.05, ε = 1, ε_decay = 0.995, learning rate α = 0.01, and ε_min = 0.7. Furthermore, weights are learned using the Adam optimizer (Kingma and Ba, 2014) with a batch size of 32. The reward threshold R_th introduced in the Reward section for determining the action is set to 0 for MS Marco and Wiki IR and to 0.0001 for InsuranceQA.
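For convenience, the same settings can be collected into a single configuration object (values as reported above; the key names are illustrative):

```python
DQN_TRAINING_CONFIG = {
    "gamma": 0.05,            # future-reward discount
    "epsilon_start": 1.0,     # initial exploration rate
    "epsilon_decay": 0.995,
    "epsilon_min": 0.7,
    "learning_rate": 0.01,
    "optimizer": "adam",
    "batch_size": 32,
    # reward threshold R_th per dataset
    "reward_threshold": {"ms_marco": 0.0, "wiki_ir": 0.0, "insurance_qa": 0.0001},
}
```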
5 RESULTS
TS vs. Effort: Table 1 shows the drop in Average
Number of Triggered Searches achieved by different
methods and compares it with extra Average Effort
introduced in the system. The top two rows high-
light the absolute values achieved by two basic search
systems: SLT, which mimics a traditional search en-
gine, and SET, representing a true instant search sys-
tem. These systems set the upper and lower bounds
respectively on Effort and TS. The bottom three rows
list the percentage change in the metrics introduced by the proposed methods with respect to a true instant search system (SET).
The Skip Stop-Words method manages to achieve optimal Effort. This can be attributed to the fact that stop-words are generally not deemed salient in common language usage, so SS does not miss a search for any salient word. On the other hand, since SS influences only a limited and fixed set of tokens, the achieved TS falls short of the optimum. The overall performance of SM is comparable to that of SS; the results also show that the pre-trained model is unable to transfer its knowledge to this new task.
While all the methods are able to retain the Ef-
fort within 5% of the SET, the proposed DQN method
manages to reduce the overall TS on average by more
than 20% across all the datasets compared to other
baselines. Compared to a pure instant search system,
DQN reduces the overall load by more than 50%. The
performance of the DQN agent improves with the size of the training data and is therefore highest for MS Marco.
Impact on Quality: We capture the loss of result quality at every token position by plotting the average MAP over all queries for the proposed DQN method and comparing it with the ideal SET instant search. Figure 3 plots the average MAP (y-axis) at every time-step t (x-axis) for both SET and DQN. The plot shows that the MAP achieved by DQN is very close to that of SET at all token positions; hence the loss in quality introduced is minimal.
Subjective Analysis: Lastly, we subjectively analyzed the predictions made by the learned model on unseen queries. Figure 4 lists queries with the tokens at which a search is triggered marked in green. We also report the incremental difference in MAP introduced by each triggered search (highlighted in blue) to capture the quality of the search action.
For MS Marco, besides stop-words, the agent
waits for the words “cost” , “install” and “purpose”.
MS Marco is a large QA dataset with verbose pas-
sages. It is difficult for a basic BM25 algorithm to re-
trieve a good ranking without additional context early
in the query; thus the model decides to wait. For the
first InsuranceQA query, the agent decides to execute the search for the token "a" because, in insurance jargon, "vest a retirement plan" is a common phrase, and a semantic model such as the Universal Sentence Encoder does not ignore it as a stop-word. The same is
not true for the second sentence where the search is
skipped for the same token. For WikiIR’s first query,
the phrase “chief justice” is often present as a whole
in documents, and any improvement in ranking con-
tributed by the phrase itself is already captured by the
first word "chief". In the second example, the name
“juan carlos” is unique enough to retrieve relevant
documents; therefore, the token “i” is skipped.
Table 1: Metrics achieved by different methods. Effort and TS are averaged over all queries. The top two rows give the absolute values achieved by the two base search systems; the bottom three rows list the percentage change in the metrics introduced by each method with respect to a true instant search system. *Statistical significance is tested using a two-tailed paired t-test; improvements are marked significant when p < 0.01.

Method                   MS Marco - BM25      Wiki IR - BM25       Insurance QA - USE
                         Effort      TS       Effort      TS       Effort      TS
SLT (Regular Search)     10.76       1        5.83        1        8.25        1
SET (Instant Search)     8.24        8.24     4.74        4.74     7.70        7.70

Percentage change in metrics with respect to SET (pure instant search)
                         Effort(%)   TS(%)    Effort(%)   TS(%)    Effort(%)   TS(%)
SS (Baseline)            0           -49.75   0           -22.62   0.59        -39.25
SM (Baseline)            4.00        -45.43   3.24        -26.88   1.50        -40.42
DQN (Proposed)           4.00        -74.15*  3.94        -44.88*  1.37        -55.47*
Figure 3: Average MAP achieved by DQN vs. SET at every token position. X-axis is token index and Y-axis is MAP averaged
over all the queries.
Figure 4: Predictions of DQN Network on unseen queries.
6 CONCLUSION
This paper has introduced a Reinforcement Agent that
relieves the load on the back-end search system in
an instant search paradigm. The proposed agent achieves this goal by learning word importance from the search system's behavior and uses this knowledge to judiciously issue searches to the underlying retrieval system. We further evaluated the trade-
off between system load and performance. Experi-
ments demonstrate the ability of the proposed agent
to achieve near-optimal trade-off.
REFERENCES
Bast, H. and Weber, I. (2006). Type less, find more: fast
autocompletion search with a succinct index. In Pro-
ceedings of the 29th annual international ACM SIGIR
conference on Research and development in informa-
tion retrieval, pages 364–371.
Brin, S. and Page, L. (1998). The anatomy of a large-scale
hypertextual web search engine.
Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., John,
R. S., Constant, N., Guajardo-Cespedes, M., Yuan,
S., Tar, C., et al. (2018). Universal sentence encoder.
arXiv preprint arXiv:1803.11175.
Cetindil, I., Esmaelnezhad, J., Li, C., and Newman, D.
(2012). Analysis of instant search query logs. In
WebDB, pages 7–12. Citeseer.
Chandar, P., Garcia-Gathright, J., Hosey, C., St. Thomas,
B., and Thom, J. (2019). Developing evaluation met-
rics for instant search using mixed methods.
In Proceedings of the 42nd International ACM SIGIR
Conference on Research and Development in Informa-
tion Retrieval, pages 925–928.
Dean, J. (2009). Challenges in building large-scale informa-
tion retrieval systems. In Keynote of the 2nd ACM In-
ternational Conference on Web Search and Data Min-
ing (WSDM), volume 10.
Fafalios, P., Kitsos, I., and Tzitzikas, Y. (2012). Scal-
able, flexible and generic instant overview search. In
Proceedings of the 21st International Conference on
World Wide Web, pages 333–336.
Fafalios, P. and Tzitzikas, Y. (2011). Exploiting available
memory and disk for scalable instant overview search.
In International Conference on Web Information Sys-
tems Engineering, pages 101–115. Springer.
Fagni, T., Perego, R., Silvestri, F., and Orlando, S. (2006).
Boosting the performance of web search engines:
Caching and prefetching query results by exploiting
historical usage data. ACM Transactions on Informa-
tion Systems (TOIS), 24(1):51–78.
Feng, M., Xiang, B., Glass, M. R., Wang, L., and Zhou, B.
(2015). Applying deep learning to answer selection:
A study and an open task. In 2015 IEEE Workshop
on Automatic Speech Recognition and Understanding
(ASRU), pages 813–820. IEEE.
Frej, J., Schwab, D., and Chevallet, J.-P. (2019). Wikir:
A python toolkit for building a large-scale wikipedia-
based english information retrieval dataset. arXiv
preprint arXiv:1912.01901.
Gan, Q. and Suel, T. (2009). Improved techniques for re-
sult caching in web search engines. In Proceedings of
the 18th international conference on World wide web,
pages 431–440.
Gomaa, W. H., Fahmy, A. A., et al. (2013). A survey of
text similarity approaches. International Journal of
Computer Applications, 68(13):13–18.
Grissom II, A., He, H., Boyd-Graber, J., Morgan, J., and Daumé III, H. (2014). Don't until the final verb wait: Reinforcement learning for simultaneous machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1342–1352.
Gu, J., Neubig, G., Cho, K., and Li, V. O. (2016). Learning
to translate in real-time with neural machine transla-
tion. arXiv preprint arXiv:1610.00388.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Ji, S., Li, G., Li, C., and Feng, J. (2009). Efficient interac-
tive fuzzy keyword search. In Proceedings of the 18th
international conference on World wide web, pages
371–380.
Kingma, D. P. and Ba, J. (2014). Adam: A
method for stochastic optimization. arXiv preprint
arXiv:1412.6980.
Li, G., Ji, S., Li, C., and Feng, J. (2011). Efficient
fuzzy full-text type-ahead search. The VLDB Journal,
20(4):617–640.
Li, G., Wang, J., Li, C., and Feng, J. (2012). Supporting
efficient top-k queries in type-ahead search. In Pro-
ceedings of the 35th international ACM SIGIR con-
ference on Research and development in information
retrieval, pages 355–364.
Long, X. and Suel, T. (2006). Three-level caching for ef-
ficient query processing in large web search engines.
World Wide Web, 9(4):369–395.
Markatos, E. P. (2001). On caching search engine query
results. Computer Communications, 24(2):137–143.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing atari with deep reinforcement learn-
ing. arXiv preprint arXiv:1312.5602.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary,
S., Majumder, R., and Deng, L. (2016). Ms marco:
A human-generated machine reading comprehension
dataset.
Pennington, J., Socher, R., and Manning, C. D. (2014).
Glove: Global vectors for word representation. In
Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP),
pages 1532–1543.
Robertson, S. and Zaragoza, H. (2009). The probabilistic
relevance framework: BM25 and beyond. Now Pub-
lishers Inc.
Saraiva, P. C., Silva de Moura, E., Ziviani, N., Meira, W.,
Fonseca, R., and Ribeiro-Neto, B. (2001). Rank-
preserving two-level caching for scalable search en-
gines. In Proceedings of the 24th annual international
ACM SIGIR conference on Research and development
in information retrieval, pages 51–58.
Satija, H. and Pineau, J. (2016). Simultaneous machine
translation using deep reinforcement learning. In
ICML 2016 Workshop on Abstraction in Reinforce-
ment Learning.
Tran, N. K. and Niederée, C. (2018). Multihop attention networks for question answer matching. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pages 325–334.
Van Gysel, C. and de Rijke, M. (2018). Pytrec eval: An ex-
tremely fast python interface to trec eval. In The 41st
International ACM SIGIR Conference on Research
& Development in Information Retrieval, pages 873–
876.
Venkataraman, G., Lad, A., Guo, L., and Sinha, S. (2016a).
Fast, lenient and accurate: Building personalized in-
stant search experience at linkedin. In 2016 IEEE In-
ternational Conference on Big Data (Big Data), pages
1502–1511. IEEE.
Venkataraman, G., Lad, A., Ha-Thuc, V., and Arya, D.
(2016b). Instant search: A hands-on tutorial. In Pro-
ceedings of the 39th International ACM SIGIR con-
ference on Research and Development in Information
Retrieval, pages 1211–1214.
Wang, J., Cetindil, I., Ji, S., Li, C., Xie, X., Li, G., and Feng,
J. (2010). Interactive and fuzzy search: a dynamic way
to explore medline. Bioinformatics, 26(18):2321–
2327.
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine
learning, 8(3-4):279–292.
Yang, W., Xie, Y., Lin, A., Li, X., Tan, L., Xiong, K.,
Li, M., and Lin, J. (2019). End-to-end open-domain
question answering with bertserini. arXiv preprint
arXiv:1902.01718.