As an example, the set command would have a latency of l_2 under SDC, while under SDS a set would take (1/r) × l_local + ((r−1)/r) × l_3, because some portion of the keys would be local. SDR would take l_3, because multiple sets can be issued simultaneously. Snoop would need l_local + l_3, with the data being sent to the local rack and messages sent to all other racks.
Dir requires a round trip to the directory to learn about other racks which may already contain the same key. Thus the formula for a set using Dir is more complex: 2 × l_2 + ps × l_local + (1 − ps) × l_3 + l_local.
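For concreteness, these per-scheme estimates can be expressed as a small set of functions. The following is a minimal sketch; the function and parameter names (l2, l3, l_local, r, ps) are ours, not part of any original implementation:

# Sketch of the estimated latency of a set under each scheme, per the
# formulas above. l2/l3 are round trips over two- and three-device paths,
# l_local is the intra-rack round trip, r is the number of racks, and ps
# is the probability that the key is held in the local rack.

def set_latency_sdc(l2, l3, l_local, r, ps):
    return l2                                    # single central cache

def set_latency_sds(l2, l3, l_local, r, ps):
    # 1/r of keys hash to the local rack, the rest to remote racks
    return (1.0 / r) * l_local + ((r - 1.0) / r) * l3

def set_latency_sdr(l2, l3, l_local, r, ps):
    return l3                                    # replicated sets issued simultaneously

def set_latency_snoop(l2, l3, l_local, r, ps):
    return l_local + l3                          # local write plus messages to other racks

def set_latency_dir(l2, l3, l_local, r, ps):
    # round trip to the directory, then a local or remote update, plus the local write
    return 2 * l2 + ps * l_local + (1 - ps) * l3 + l_local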
Using a set of variables, here taken from a run of MediaWiki (mediawiki, 2011), we can vary parameters to gain an understanding of the performance space under different environments.
We first look at how network switch speed can affect performance. Recall that we assumed network latency scales linearly with the number of devices traversed, so we vary the single-device latency between 12.7µs and 1.85ms, with an additional 4.4ms OS delay, in the Figure 4 plots (at end of document). Latency measures round-trip time, so our X axis varies from 0.025ms to 3.7ms. Three plots are shown, with ps values of 10%, 50%, and 90% and command weightings derived from our MediaWiki profile.
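The shape of such a sweep is easy to reproduce in outline. The sketch below reuses the latency functions above and tabulates the estimates across switch latencies and ps values; the mapping l_n = n × switch latency + OS delay is our reading of the linear-latency assumption, and the rack count is a placeholder, so this is illustrative rather than the code behind Figure 4:

# Illustrative sweep over per-device round-trip latency, reusing the
# set-latency functions sketched above. RACKS and the l_n mapping are
# assumptions for illustration only.

OS_DELAY_MS = 4.4
RACKS = 4

SCHEMES = {
    "SDC": set_latency_sdc, "SDS": set_latency_sds, "SDR": set_latency_sdr,
    "Snoop": set_latency_snoop, "Dir": set_latency_dir,
}

for switch_rtt in (0.025, 0.3, 1.0, 3.7):        # ms, matching the X-axis range
    l_local = 1 * switch_rtt + OS_DELAY_MS
    l2 = 2 * switch_rtt + OS_DELAY_MS
    l3 = 3 * switch_rtt + OS_DELAY_MS
    for ps in (0.1, 0.5, 0.9):
        estimates = {name: round(f(l2, l3, l_local, RACKS, ps), 2)
                     for name, f in SCHEMES.items()}
        print(f"switch={switch_rtt}ms ps={ps}: {estimates}")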
As seen in Figure 4, ps compresses the plots vertically, showing improved performance for the location-aware schemes at higher ps values. When ps=0.9 and the switch latency is 0.3ms, SDS and Snoop are equivalent, with Snoop performing better as switch latency increases further.
Next we take a closer look at how ps changes re-
sponse time in Figure 5 using a fixed switch latency
of 1.0ms and our MediaWiki usage profile.
Predictably, all three location-averse schemes (SDC, SDS, and SDR) exhibit no change in performance as ps increases. Snoop and Dir, in contrast, improve as ps increases, with Snoop eventually overtaking SDC at ps=0.86.
So far we've analyzed performance using MediaWiki's usage profile. Now we look at the more general case, splitting the 20 possible commands into two types: read and write, where a read is a get request (hit or miss) and a write is any command which changes data. MediaWiki had 51% reads when fully caching, or about one read per write. Figure 6 varies the read/write ratio while looking at three ps values.
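Under this two-type split, a scheme's expected response time is simply the rw-weighted mix of its read and write latencies. A minimal sketch, in which the read and write latency values are placeholders rather than measured figures:

# Expected latency under the read/write split: rw is the fraction of
# requests that are reads; the latency arguments are placeholders.

def expected_latency(rw, read_latency_ms, write_latency_ms):
    return rw * read_latency_ms + (1.0 - rw) * write_latency_ms

# MediaWiki-like mix: 51% reads when fully caching.
print(expected_latency(0.51, 5.0, 9.0))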
With high read/write ratios Snoop is able to outperform SDC, here at rw = 0.75 when switch=1.0ms. These plots show that when ps is near one and slow switches are used, Snoop outperforms all other configurations. In some situations, such as session storage (ps = 1) across a large or heavily loaded datacenter, Snoop may make larger gains. From an estimated-latency standpoint Dir does not perform well, though as we'll see in the next section its low network usage can overcome this.
6 EXPERIMENTAL RESULTS
To validate our model and performance estimation formula, we implemented our alternate Memcache schemes and ran a real-world web application, MediaWiki (mediawiki, 2011), on real hardware with simulated user traffic. Three MediaWiki configurations were used:
1. Full - All caching options were enabled and set to
use Memcache.
2. Limited - Message and Parser caches were dis-
abled, with all other caches using Memcache.
3. Session - Only session data was stored in Mem-
cache.
The simulated traffic consisted of 100 users registering for an account, creating 20 pages each with text and links to other pages, browsing 20 random pages on the site, and finally logging out. Traffic was generated with JMeter 2.5, producing 9600 pages per run. The page request rate was tuned to stress Memcache the most, keeping all webservers busy, which resulted in less-than-optimal average page generation times. A run consisted of a specific MediaWiki configuration paired with a Memcache configuration.
The mock datacenter serving the content consisted of 23 Dell PowerEdge 350 servers (800MHz processors, 1GB RAM) running CentOS 5.3, Apache 2.2.3 with PHP 5.3, APC 3.1, and PECL Memcache 3.0, partitioned into 4 racks of 5 servers each. The remaining 3 servers ran the HAProxy load balancer, a central Memcache server, and a MySQL server, respectively. Four servers in each rack produced web pages, with the remaining server acting as the rack's Memcache server.
To measure Memcache network traffic accurately, the secondary network card in each server was placed in a separate subnet carrying Memcache traffic only. This subnet was joined by one FastEthernet switch per rack, with each rack connected to a managed FastEthernet (10/100 Mb/s) central switch. Thus, we could measure intra-rack Memcache traffic using SNMP, isolated from all other traffic. To explore how our configurations behaved under a more heavily utilized network, we reran all experiments with the central switch set to Ethernet (10 Mb/s) speed for Memcache traffic.
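As an illustration of the kind of SNMP polling this involves, the sketch below reads standard IF-MIB octet counters with net-snmp's snmpget; the switch host name, interface index, and community string are placeholders, and the exact counters and tooling used for the measurements are not specified above.

# Minimal sketch: read per-port octet counters from a rack switch via SNMP.
# Requires net-snmp's snmpget on the path; host, community, and interface
# index are placeholders.

import subprocess

IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10"     # IF-MIB::ifInOctets
IF_OUT_OCTETS = "1.3.6.1.2.1.2.2.1.16"    # IF-MIB::ifOutOctets

def read_counter(switch_host, base_oid, if_index, community="public"):
    # snmpget output ends in the counter value, e.g. "... Counter32: 12345"
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, switch_host,
         f"{base_oid}.{if_index}"],
        capture_output=True, text=True, check=True).stdout
    return int(out.strip().rsplit(" ", 1)[-1])

# Example: bytes in and out on port 3 of a hypothetical rack switch.
print(read_counter("rack1-switch", IF_IN_OCTETS, 3),
      read_counter("rack1-switch", IF_OUT_OCTETS, 3))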