Hypervisor based Memory Introspection: Challenges, Problems and Limitations

Andrei Lutas¹, Daniel Ticle¹ and Octavian Cret²

¹Bitdefender, 1 Cuza Voda Str., City Business Center, 400107, Cluj-Napoca, Romania
²Computer Science Department, Technical University of Cluj-Napoca, 26-28 Gh. Baritiu Str., 400027, Cluj-Napoca, Cluj, Romania
{vlutas, dticle}@bitdefender.com, octavian.cret@cs.utcluj.ro
Keywords: Hypervisor, Introspection, Challenges, Limitations, Solutions.

Abstract: Hypervisor-based memory introspection is a well-known topic, in both academia and the industry. It is accepted that this technique brings great advantages from a security perspective, but it is also known that these come at the cost of greater implementation complexity and a performance penalty. While the most obvious challenges, such as the semantic gap, have been discussed at length in the literature, we aim to elaborate on the engineering and implementation challenges encountered while developing a hypervisor-based memory introspection solution and to offer theoretical and practical solutions for them.
1 INTRODUCTION
Traditionally, one thinks about security as a piece of
software that runs inside the operating system (OS),
providing services such as file scanning or application
monitoring. Due to the increasing complexity of mal-
ware and attacks, security solutions had to rely more
on isolation, thus certain components were moved
from the user space into the kernel space, making at-
tacks more challenging. Lately, an increasing number
of attacks rely on complex techniques such as exploits
and privilege escalation that can easily render a secu-
rity solution inert. Thanks to the latest advances in
hardware virtualization, designers can now take ad-
vantage of features such as hardware-enforced isola-
tion and use several extensions in order to provide in-
creased security in a hardware-isolated environment.
Garfinkel and Rosenblum first proposed the
memory introspection technique in 2003 (Garfinkel
and Rosenblum, 2003). It involves moving the se-
curity solution outside the OS, thus isolating it from
possible attacks from within the virtual machine. The
main challenge of this technique, the semantic gap,
was thoroughly discussed in papers such as (Carbone
et al., 2009), (Baliga et al., 2008), (Cozzie et al.,
2008), (Dolan-Gavitt et al., 2009) or (Lin et al., 2011),
but creating an HVI (hypervisor-based memory intro-
spection) solution involves some low-level engineer-
ing challenges that have not been discussed in detail
so far.
In this paper, we aim to detail the engineering challenges encountered while developing a real-time hypervisor-based memory introspection engine. This is a serious engineering effort that requires in-depth knowledge from many domains: operating systems, computer security, low-level programming, etc. We
have personally faced these challenges while develop-
ing U-HIPE (Lutas et al., 2015a): a hypervisor-based
memory introspection engine, capable of protecting
both the kernel space of the OS and the user space
of the applications. The next section contains a brief introduction to hardware virtualization. The third section details how memory introspection works, while the fourth section presents some of the most important engineering challenges encountered. The conclusions are drawn in section five.
2 HARDWARE VIRTUALIZATION: A BRIEF OVERVIEW
Hardware-based virtualization was first introduced in the 1960s, with the experimental IBM M44/44X. In 2005, it was introduced in x86 CPUs, as the SVM (Secure Virtual Machine) extensions on AMD (AMD Corporation, 2005) and VT-x (Virtualization Technology) on Intel (Intel Corporation, 2016b). The key
role of virtualization is to allow multiple guest OSs, or
virtual machines (named from here on VMs or guests)
to run concurrently on a host system. A virtual-
machine monitor, named from here on VMM or hy-
pervisor, controls all these VMs. A hypervisor gen-
erally uses a trap & emulate architecture, where the
CPU generates an event (VM exit) whenever it needs
special handling from the hypervisor. Examples of
VM exits include executing a privileged instruction,
accessing restricted I/O ports or MSRs, an external
interrupt or accessing restricted memory pages. Af-
ter finishing handling the event, the hypervisor returns
the control to the interrupted guest via a VM entry.
The core structure of any hypervisor is the VMCS (Virtual-Machine Control Structure) on Intel or the VMCB (Virtual Machine Control Block) on AMD (named from here on VMCS), which represents a virtual CPU (or VCPU). The VMCS contains all the essential information about the VCPU: the host state,
the guest state, the guest control area, VM exit and
entry control and VM exit information. This structure
contains the saved state of the guest or hypervisor on
VM entries and VM exits, and control fields that con-
figure how the CPU should handle various events and
instructions.
VT-x and SVM were further extended with memory virtualization capabilities: a second-level address translation that allows the hypervisor to directly con-
figure a mapping from guest-physical addresses to
host-physical addresses. The second level address
translation (SLAT), named EPT on Intel (Extended
Page Tables) and NPT (Nested Page Tables) on AMD,
has a structure similar to that of the legacy IA page ta-
bles, with entries containing control bits that config-
ure read, write or execute access. A VM has its own
SLAT structure just like a regular process has its own
page table hierarchy. The SLAT is fully controlled by
the hypervisor, and thus enables it to enforce page-
level access restrictions over the guest-physical mem-
ory, without interfering with the guest page tables.
When SLAT is in use, there are three different types
of addresses on a host system: guest-virtual addresses
(1) are those addresses normally used by programs in-
side the VM; these translate via the legacy IA page ta-
bles into guest-physical addresses (2), which are fur-
ther translated into host-physical addresses (3) using
the SLAT, which are then accessed by the hardware.
Figure 1 shows the memory translation mechanisms
in a virtualized system.
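To make the two translation stages concrete, the following sketch (in C) walks an address through a four-level guest page-table hierarchy and then through a four-level SLAT, assuming 4K pages only; the entry-layout constants and the read_host_phys64 helper are illustrative assumptions, not the exact format or interface of any particular hypervisor.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_OFFSET(a)  ((a) & 0xFFFULL)
#define ENTRY_PRESENT   0x1ULL
#define PFN_MASK        0x000FFFFFFFFFF000ULL

/* Assumed helper: reads one 64-bit paging entry located at a host-physical address. */
extern uint64_t read_host_phys64(uint64_t hpa);

/* Stage 2: guest-physical -> host-physical, walking the SLAT (EPT/NPT) tables. */
static bool translate_gpa_hw(uint64_t slat_root_hpa, uint64_t gpa, uint64_t *hpa)
{
    uint64_t table = slat_root_hpa;

    /* Four levels (PML4E -> PDPTE -> PDE -> PTE); large pages ignored for brevity. */
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t entry = read_host_phys64(table + ((gpa >> shift) & 0x1FF) * 8);
        if (!(entry & ENTRY_PRESENT))
            return false;                       /* no SLAT mapping for this range */
        table = entry & PFN_MASK;
    }
    *hpa = table | PAGE_OFFSET(gpa);
    return true;
}

/* Stage 1 + stage 2: guest-virtual -> host-physical. Every guest page-table
 * entry is itself addressed by a guest-physical address, so each of the four
 * levels needs its own stage-2 translation before the entry can be read. */
static bool translate_gva_hw(uint64_t slat_root_hpa, uint64_t guest_cr3,
                             uint64_t gva, uint64_t *hpa)
{
    uint64_t table_gpa = guest_cr3 & PFN_MASK;

    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t entry_hpa;
        if (!translate_gpa_hw(slat_root_hpa,
                              table_gpa + ((gva >> shift) & 0x1FF) * 8, &entry_hpa))
            return false;
        uint64_t entry = read_host_phys64(entry_hpa);
        if (!(entry & ENTRY_PRESENT))
            return false;                       /* page not present in the guest */
        table_gpa = entry & PFN_MASK;
    }
    return translate_gpa_hw(slat_root_hpa, table_gpa | PAGE_OFFSET(gva), hpa);
}
```

In practice, large pages, permission bits and the various control flags would have to be validated at every level of both walks.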
Figure 1: Address translation with SLAT active.

3 VIRTUAL MACHINE INTROSPECTION

Hypervisor-based VM introspection is a technique of analysing the state and behaviour of a VM from the outside (from the hypervisor's level). It involves accessing the hardware state (CPU registers) and the physical memory of the analysed VM. However, this information is not sufficient; one needs to correlate this
low-level data with OS specific structures and events
in order to gain knowledge about the VM state. This
process is known as bridging the semantic gap, and
several solutions were proposed for it, as mentioned
in section 1. Once meaningful structures have been
identified inside the guest VM, the hypervisor could
configure, using the SLAT, restricted access for cer-
tain structures. In general, the process of protecting
memory sits at the heart of hypervisor based intro-
spection, and the challenges that are discussed here
refer mainly to it. In addition, VM introspection can be done both on demand (for instance, memory analysis of a live VM or of a dump, for forensics) and in real time (where the guest behaviour is continuously monitored and where performance is critical). In this paper, we will focus on the second approach.
By using the SLAT, the introspection solution can
enforce restricted access to various areas of the mem-
ory. For example, kernel code-pages could be marked
as being non-writeable, thus preventing rootkits (ad-
vanced kernel malware) (G. Hoglund and J. Butler,
2005) from placing inline hooks inside them. On the
other hand, the introspection solution may enforce a no-execute policy on certain areas such as the stacks or heaps, thus intercepting any attempt to execute code from those areas; such attempts are almost always indicative of an attack. No-read policies can also be used, for ex-
ample, to hide code or data that the introspection so-
lution protects inside the guest (for example, from an
in-guest security solution). The normal flow of events
on a protected VM looks like the one depicted in Fig-
ure 2. There are two main possibilities for handling
each event intercepted by the introspection logic: it
will either be emulated if it is considered legitimate or
blocked, if it is deemed to be malicious. While block-
ing an attack is simple (the faulting instruction can
simply be skipped), legitimate accesses must be emu-
lated. In general, there are two types of such cases:
1. Benign accesses inside protected structures. In
some cases, the introspection logic wishes to al-
low certain components (usually belonging to the
OS itself) to access the protected structure;
2. Benign accesses to other data in the same page. Since the protection works by restricting access to whole 4K guest-physical pages inside the SLAT, a protected structure smaller than 4K (the minimum page size) may share its page with unrelated structures, and accesses to those structures will trigger faults as well;
Creating a security solution that runs outside the
protected VM is a very complex task, but there are
several reasons why it is worth implementing:
1. Hardware-enforced isolation. A traditional secu-
rity solution runs within the VM, and is suscepti-
ble to attacks: even if it runs in the most privileged
mode inside the OS kernel, an attack may employ
complex techniques such as privilege escalation
(ref, a) in order to gain higher privileges. When
the malware runs at the same privilege level as the
security solution, it can easily bypass or disable it.
2. Ability to monitor CPU-level events. Certain
events cannot be monitored from within the VM.
For example, it is not possible to be notified when
a hardware register is being modified. When run-
ning inside a hypervisor, there are multiple CPU-
level events that the introspection logic can inter-
cept in order to provide security; for instance, the
system call registers may be modified by rootkits
in order to place a system-wide syscall hook.
3. Ability to monitor memory without interfering
with the OS. It is not possible to monitor memory
accesses inside the VM without heavily interfer-
ing with the OS. When running inside a hypervi-
sor, one can leverage the SLAT in order to place
restrictions on guest-physical pages, beyond the
OS capabilities.
4. Increased usage of virtualized environments.
Cloud service providers heavily rely on virtualiza-
tion, which offers the capability of securing mul-
tiple VMs without having to install a security so-
lution in each one of them.
It is therefore clear that hypervisor-based introspection is a solution to current security, scalability and deployability demands. In addition, CPU vendors keep adding new features that can help improve performance for such an application: virtualization exceptions, VM functions and probably more are yet to be revealed (D. Durham, 2014).

Figure 2: Introspection events handling flow.
4 CHALLENGES AND SOLUTIONS
As discussed earlier, creating a hypervisor-based
memory introspection solution has significant advan-
tages: isolation, greater visibility inside the monitored
system and the possibility to provide CPU and mem-
ory protection that would otherwise be very difficult
to achieve. Aside from the obvious challenge of the seman-
tic gap, there are several other low-level engineering
challenges that arise when developing such a solution.
4.1 Accessing the Guest Memory
Challenges. As already detailed in the previous section, the SLAT mechanism introduces a new level of address translation, and the introspection logic has to deal with accessing both guest-virtual and guest-physical memory. In both
cases, the translation of the accessed memory must
be handled by the introspection logic (Lutas et al.,
2015b), in order to ensure that the page is present and
to obtain the final host-physical address.
Mapping and translation events can be described by two functions: f for mappings and g for translations. We first define the function f_{hpa}, which maps host-physical memory. This function needs to find a free region inside the hypervisor virtual address space and add a new entry for it, thus making it accessible. Translation Lookaside Buffer (TLB) invalidations on all the physical CPUs would further be required, either when mapping or unmapping the desired page. Once such a function is available, we need a function f_{gpa} that maps guest-physical memory. In order to do that, we also need a function g_{gpa} that translates the given guest-physical address into a host-physical address using the SLAT tables. This involves walking each level of translation and validating the results at each step. Finally, in order to map guest-virtual memory, we need f_{gpa} together with a new function, g_{gva}, capable of translating the guest-virtual address into a guest-physical address, using the guest legacy IA page tables. If we monitor a guest OS running in long mode, the legacy IA page tables will be four levels deep, meaning that mapping one guest-virtual page v would lead to the following sequence of function calls:

f_{gva}(v) = f_{gpa}(g_{gva}(v))    (1)
Each translation involves more mappings: translating the guest-virtual address v into a guest-physical address p involves four physical address mappings, one for each level of translation (page map level 4, page directory pointer table, page directory and page table), assuming long-mode paging:

g_{gva}(v) = f_{gpa}(pml4) + f_{gpa}(pdp) + f_{gpa}(pd) + f_{gpa}(pt)    (2)

The f_{gpa} function implies two steps as well:

f_{gpa}(p) = f_{hpa}(g_{gpa}(p))    (3)
The final sequence of function calls is complex, involving mapping five different guest-physical pages, which in turn translates to mapping five different host-physical pages. Such events may be rare in some scenarios, but when a high-performance, real-time, user-mode memory introspection solution is desired, the events may be dense enough to pose performance issues. Figure 3 shows the time spent, on average, inside memory-mapping routines for each event type, on a 64-bit Windows 8.1 system, in a normal usage scenario. It is worth mentioning that some intervals include each other: for example, the Map GVA interval includes portions of the Translate GVA and Map GPA intervals, since mapping a guest-virtual page involves multiple physical mappings.
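The composition in equations (1)-(3) can be written down almost literally; below is a minimal sketch in which translate_gpa, translate_gva and map_host_phys stand for g_{gpa}, g_{gva} and f_{hpa}, all of them assumed primitives rather than the actual U-HIPE interfaces.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed primitives (hypothetical names):
 *   map_host_phys  - f_hpa: maps one host-physical page into the hypervisor
 *   translate_gpa  - g_gpa: walks the SLAT tables
 *   translate_gva  - g_gva: walks the guest legacy IA page tables; in long
 *                    mode this internally performs four map_guest_phys calls,
 *                    one per paging level                                     */
extern void *map_host_phys(uint64_t hpa);
extern int   translate_gpa(uint64_t gpa, uint64_t *hpa);
extern int   translate_gva(uint64_t guest_cr3, uint64_t gva, uint64_t *gpa);

/* f_gpa(p) = f_hpa(g_gpa(p)) -- equation (3) */
void *map_guest_phys(uint64_t gpa)
{
    uint64_t hpa;

    if (translate_gpa(gpa, &hpa) != 0)
        return NULL;                  /* no SLAT mapping for this page */
    return map_host_phys(hpa);
}

/* f_gva(v) = f_gpa(g_gva(v)) -- equation (1); together with the four
 * per-level mappings hidden inside translate_gva, a single call ends up
 * mapping five guest-physical (and five host-physical) pages. */
void *map_guest_virt(uint64_t guest_cr3, uint64_t gva)
{
    uint64_t gpa;

    if (translate_gva(guest_cr3, gva, &gpa) != 0)
        return NULL;                  /* page not present in the guest */
    return map_guest_phys(gpa);
}
```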
Solutions. There are several optimization options:
1. Keeping the entire host-physical space mapped in-
side the host virtual space. This eliminates the
need to map host-physical memory, as it would already be mapped at a predetermined address. This
solution considerably improves the performance,
but it has the drawback of using a significant por-
tion of the host virtual address space to map the
guest space. Furthermore, while modern CPUs
use 48 bits of virtual addresses (in long mode),
the width of the physical addresses may exceed
40 bits (it is currently defined to be up to 52 bits),
thus making this approach unscalable.
2. Using caches, as a trade-off between scalability
and keeping pages mapped inside the introspec-
tion memory space. According to our tests, it
is clear that the improvement is significant (see
Figure 4). But caches increase the code-base of
the introspection solution and may be complex to
implement, as they need to be flushed on certain
events (for example, translation modifications).
The caches we implemented simply maintain a batch of often-accessed pages mapped inside the introspection solution's address space, thus avoiding costly translations and memory mappings (a minimal sketch of this approach follows after this list).
3. Dedicated CPU instructions. This would be the
most efficient solution, bringing native perfor-
mance for guest memory access. Some research
in this direction has already been done (Serebrin
and Haertel, 2008). The CPU could implement
instructions capable of reading guest-physical or
guest-virtual memory. The access would be done
in the context of the current VMCS, thus directly
exposing both the guest CR3 and the SLAT tables
used by that VM. We cannot accurately estimate
the complexity of such a solution, but it should of-
fer native performance, similar to regular memory
accesses (Lutas et al., 2015b).
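Below is a minimal sketch of the caching approach from option 2, assuming a small direct-mapped cache keyed by guest-physical page frame; the helper names (map_guest_phys_uncached, unmap_page) and the eviction policy are illustrative, not the actual U-HIPE implementation.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define GPA_CACHE_SIZE 256           /* power of two, one slot per hash bucket */
#define PAGE_MASK      0xFFFULL

/* Assumed primitives: the uncached mapping path described by equation (3). */
extern void *map_guest_phys_uncached(uint64_t gpa_page);
extern void  unmap_page(void *hva);

typedef struct {
    uint64_t gpa_page;               /* guest-physical page frame, 0 = empty   */
    void    *hva;                    /* where the page is mapped in the        */
                                     /* introspection engine's address space   */
} gpa_cache_entry_t;

static gpa_cache_entry_t gpa_cache[GPA_CACHE_SIZE];

/* Return a host-virtual pointer for a guest-physical address, reusing a
 * previously established mapping whenever possible. */
void *map_guest_phys_cached(uint64_t gpa)
{
    uint64_t page = gpa & ~PAGE_MASK;
    gpa_cache_entry_t *slot = &gpa_cache[(page >> 12) & (GPA_CACHE_SIZE - 1)];

    if (slot->gpa_page != page) {
        if (slot->hva != NULL)
            unmap_page(slot->hva);   /* evict the previous occupant            */
        slot->hva = map_guest_phys_uncached(page);
        slot->gpa_page = (slot->hva != NULL) ? page : 0;
    }
    return (slot->hva != NULL) ? (uint8_t *)slot->hva + (gpa & PAGE_MASK) : NULL;
}

/* Must be called on events that change translations (e.g. SLAT or guest
 * page-table modifications), otherwise stale mappings would be returned. */
void gpa_cache_flush(void)
{
    for (int i = 0; i < GPA_CACHE_SIZE; i++) {
        if (gpa_cache[i].hva != NULL)
            unmap_page(gpa_cache[i].hva);
    }
    memset(gpa_cache, 0, sizeof(gpa_cache));
}
```

The flush routine is the price of correctness: it must be invoked whenever a cached translation may have changed, for example on SLAT or guest page-table modifications.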
Figure 3: Percent of time spent in memory-mapping rou-
tines.
Figure 4: Percent of time spent in memory-mapping rou-
tines (with caches).
4.2 Protecting Pageable Memory
Challenges. Protecting memory inside SLAT works
only for guest-physical pages; however, the OS and
the applications use guest-virtual addresses, which
translate to guest-physical addresses. In order to pro-
vide protection for any given guest-virtual page, the
introspection solution has to first translate it into a
guest-physical page, by using the in-guest legacy IA
page tables. This strategy is effective as long as the
translation does not change; if it does, the introspec-
tion solution must find a way to maintain protection
on the guest-virtual address. There are three cases
that must be handled (Lutas et al., 2015a):
1. Page is removed from memory (swapped-out);
2. Page is brought in memory (swapped-in);
3. Page is moved somewhere else.
Another problem may be maintaining page table
protection among various virtual address spaces: each
process has its own, private virtual address space, rep-
resented by a dedicated page table hierarchy. While
modern OSs, such as Windows and Linux, share the kernel space amongst all the processes (meaning that
the kernel space is global and identical in every virtual
address space), an attacker may build custom page ta-
ble hierarchies that would lead to malicious transla-
tions. In order to avoid this, introspection logic would
have to intercept CR3 loads and make sure it inter-
cepts the page tables in every existing virtual address
space. This has a significant negative impact on the
performance: the number of CR3 loads will increase
linearly with the number of existing processes.
Solutions. If not handled properly, these situations
may lead to undesired effects, such as losing protection on the page or protecting an undesired page.
It is imperative to find a way to intercept such swap
events. The most obvious way to do this is by inter-
cepting the guest page tables. The implementation of
this method is complex and it leads to significant performance penalties, as for every guest-virtual page, one needs to intercept writes in up to 5 guest-physical pages: the guest-physical page that the protected guest-virtual page translates to, plus up to four more pages, one for each page-table level that translates the given guest-virtual address. By intercepting the en-
tire hierarchy of page tables, the introspection logic
ensures that any translation modification would be
trapped via an EPT violation, thus allowing the pro-
tection to be adjusted. On each page table write, the
introspection logic needs to decode the written value,
in order to analyse the modification type. While the
OS may modify several bits inside the entries (for example, the accessed or dirty flags), only three types of events are of particular interest (a sketch classifying them follows this list):
1. The new value has the present bit set, while the
old value has the present bit 0: this is a swap-in
operation, meaning that the guest-virtual page has
just been mapped back into the memory; the intro-
spection logic must add protection on the newly
mapped guest-physical page;
2. The new entry has the present bit 0, and the old
entry has the present bit set; this means that the
page is being swapped out, and the introspec-
tion logic has to unprotect the underlying guest-
physical page;
3. Both the new and the old entry have the present bit
set, but the guest-physical addresses are different;
in this case, the introspection logic must unprotect
the old page and add protection on the new page.
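A minimal sketch of classifying an intercepted page-table write is shown below; it assumes the usual x86 entry layout (bit 0 is the present bit, bits 12-51 hold the page frame number), and the three callbacks are hypothetical hooks into the introspection logic.

```c
#include <stdint.h>

#define PTE_PRESENT  0x1ULL
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL

/* Hypothetical introspection callbacks for the three cases described above. */
extern void on_page_swapped_in(uint64_t new_gpa);
extern void on_page_swapped_out(uint64_t old_gpa);
extern void on_page_remapped(uint64_t old_gpa, uint64_t new_gpa);

/* Called after the introspection logic has decoded the value written into a
 * monitored page-table entry (old_pte is the previous content of the entry). */
void classify_pte_write(uint64_t old_pte, uint64_t new_pte)
{
    int was_present = (old_pte & PTE_PRESENT) != 0;
    int is_present  = (new_pte & PTE_PRESENT) != 0;
    uint64_t old_gpa = old_pte & PTE_PFN_MASK;
    uint64_t new_gpa = new_pte & PTE_PFN_MASK;

    if (!was_present && is_present) {
        /* Case 1: swap-in - protection must be added on the new page. */
        on_page_swapped_in(new_gpa);
    } else if (was_present && !is_present) {
        /* Case 2: swap-out - the old guest-physical page must be unprotected. */
        on_page_swapped_out(old_gpa);
    } else if (was_present && is_present && old_gpa != new_gpa) {
        /* Case 3: remap - move the protection from the old page to the new one. */
        on_page_remapped(old_gpa, new_gpa);
    }
    /* Everything else (A/D bit updates, permission changes) is ignored here. */
}
```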
4.3 Accessing Swapped-out Memory
Challenges. In kernel mode, most of the critical data structures are always present in physical memory; they are non-paged, which means that the OS will never swap them out. Doing kernel introspection is, therefore, usually very straightforward. In the case of process memory introspection, it is usually the reverse: only the most accessed pages are committed and present inside the physical memory, while all the other pages are swapped out. The introspection
logic may need to access such swapped-out pages in
order to properly identify the process-specific struc-
tures, but it cannot do so if these pages are not present
inside the physical memory.
Solutions. Here are some ways to solve this issue:
1. Directly access the swapped out data inside the
swap file. This can be very difficult, since the for-
mat of the swap file is highly specific to the OS
and the introspection logic would need access to
the storage device where the swap file resides;
2. Intercept writes inside the IA page tables entries
that translate the needed page and wait for it to be
swapped in. This has the advantage of simplic-
ity, as it only relies on permission modifications
inside SLAT, but it does not ensure that the page
will ever be swapped in;
3. Forcefully inject a page-fault exception inside the
guest (Lutas et al., 2015a). When employing the
technique from point 2, one can force the needed
page to be swapped in by injecting a page-fault
inside the guest. This adds complexity to the pre-
vious technique, as the page-fault must be injected
with care; the correct process context must be
loaded and the OS must accept the fault. There are cases, for example high IRQL (M. Russinovich and D. Solomon and A. Ionescu, 2012) on Windows, where a page-fault would cause an OS crash; in user mode, however, the OS will always accept page-faults. This method has already been described in detail in (Lutas et al., 2015a); a minimal sketch of the injection step follows.
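The sketch below assumes Intel VT-x, helper routines (vmcs_write, set_guest_cr2) and VMCS field encodings taken from the SDM, which should be checked against the current manual; the error code is chosen by the caller to match the access being forced (for a user-mode read of a non-present page, 0x4 would be a typical value).

```c
#include <stdint.h>

/* VMCS field encodings (Intel SDM); assumed wrappers around VMWRITE and
 * around writing the guest CR2 register. */
#define VM_ENTRY_INTR_INFO      0x4016
#define VM_ENTRY_EXCEPTION_EC   0x4018

extern void vmcs_write(uint64_t field, uint64_t value);
extern void set_guest_cr2(uint64_t value);   /* CR2 is not loaded from the VMCS,
                                                so it is written directly before
                                                the next VM entry               */

/* Interruption-information layout: vector | type | deliver-EC | valid. */
#define INTR_TYPE_HW_EXCEPTION  (3U << 8)
#define INTR_INFO_DELIVER_EC    (1U << 11)
#define INTR_INFO_VALID         (1U << 31)
#define PF_VECTOR               14

/* Queue a #PF for the faulting address so the guest OS swaps the page in.
 * The caller has already verified that the right process context is loaded
 * and that the guest is executing user-mode code (so the fault is safe). */
void inject_page_fault(uint64_t gva, uint32_t error_code)
{
    set_guest_cr2(gva);
    vmcs_write(VM_ENTRY_EXCEPTION_EC, error_code);
    vmcs_write(VM_ENTRY_INTR_INFO,
               PF_VECTOR | INTR_TYPE_HW_EXCEPTION |
               INTR_INFO_DELIVER_EC | INTR_INFO_VALID);
    /* The exception is delivered by the CPU on the next VM entry. */
}
```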
4.4 4K Granularity
Challenges. As previously mentioned, one of the
key roles of hypervisor-based memory introspection
is memory monitoring. This can be done by using the SLAT, as it is fully in the hypervisor's control. One way to provide protection and monitor the guest VM is by modifying guest-physical page permissions inside the SLAT. For example, one could intercept all
write operations inside a given guest-physical page
by setting the write bit to 0. This way, any instruc-
tion or event that would cause a memory store inside
that guest-physical page would trigger a SLAT viola-
tion. The memory introspection logic can analyse the
event and decide whether it is legitimate or not.
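A minimal sketch of write-protecting one guest-physical page in an EPT-like table follows, assuming the common layout in which bits 0-2 of a leaf entry encode the read/write/execute permissions; the walk and invalidation helpers (slat_leaf_entry, invept_all) are hypothetical placeholders for the hypervisor's own primitives.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define EPT_READ    0x1ULL
#define EPT_WRITE   0x2ULL
#define EPT_EXECUTE 0x4ULL

/* Assumed helpers: locate the leaf EPT entry for a guest-physical page and
 * invalidate cached translations on all physical CPUs after changing it. */
extern uint64_t *slat_leaf_entry(uint64_t gpa);
extern void      invept_all(void);

/* Intercept all writes to the 4K guest-physical page containing gpa: any
 * store to it will now trigger an EPT violation handled by the introspection
 * logic, while reads and instruction fetches remain unaffected. */
bool write_protect_gpa_page(uint64_t gpa)
{
    uint64_t *entry = slat_leaf_entry(gpa);
    if (entry == NULL)
        return false;

    *entry &= ~EPT_WRITE;     /* clear the write permission bit */
    invept_all();             /* flush stale EPT translations   */
    return true;
}
```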
The problem is that the minimum protection granularity is the size of an EPT page (4K). In order to
intercept writes inside a smaller range, one has to han-
dle writes outside the given range, as well. While
there are multiple ways to address this issue, the num-
ber of writes inside the 4K page, but outside the pro-
tected range, might induce a significant performance
impact. Depending on the specifics of the introspection logic, this may severely limit either the functionality or the performance of the solution. A simple example is the protection of small, heap- or pool-allocated objects, such as processes or threads. These
structures are usually smaller than a single page, and
even inside these structures, one may not wish to in-
tercept writes on every field. A security application
may protect, for example, a field that links the struc-
ture into a global list of such objects, or a field that
contains security related information, in order to pre-
vent malicious modifications. In this case, interception at word-size granularity, or a small multiple of it, would be useful. However, protecting such a small range
inside the page may incur a significant performance
penalty, as the same page can contain multiple other
structures allocated in it. Even in the same protected
structure, there may be portions that are written very
often. In many cases, the performance impact of in-
tercepting a small range of any structure inside a page
of memory will be difficult to predict: it may be near
zero if that page does not contain any other allocated
structures or it may be extremely high if volatile read-
write structures are allocated there.
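Because only whole pages can be protected, the violation handler has to filter out writes that land inside the protected 4K page but outside the interesting range; a minimal sketch of such a filter, with hypothetical range bookkeeping, is shown below.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* One protected sub-range inside a write-protected guest-physical page. */
typedef struct {
    uint64_t gpa_start;   /* guest-physical start of the protected field(s) */
    uint64_t length;      /* protected length in bytes                      */
} protected_range_t;

/* Decide whether an intercepted write actually touches protected data.
 * Writes landing elsewhere in the same 4K page are benign from the
 * introspection logic's point of view and only cost us the VM exit. */
bool write_hits_protected_range(const protected_range_t *ranges, size_t count,
                                uint64_t fault_gpa, uint64_t access_size)
{
    for (size_t i = 0; i < count; i++) {
        /* Ranges are tracked only for pages we already write-protected,
         * so a simple overlap test per range is enough. */
        uint64_t start = ranges[i].gpa_start;
        uint64_t end   = start + ranges[i].length;
        if (fault_gpa < end && (fault_gpa + access_size) > start)
            return true;
    }
    return false;
}
```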
Solutions. Here is a list of possible solutions:
1. Make sure each protected structure is allocated at
a page boundary. This approach has 3 drawbacks:
The memory allocator must be intercepted;
Writes inside the structure can still take place;
Increased memory usage.
This solution can be a compromise between com-
plexity and performance. However, it may not
be very scalable, as forcing many structures to
be allocated into their own page can cause se-
rious memory consumption. For example, 100 structures of 128 bytes each would cause a total of almost 400KB of space to be wasted. In addition,
simply intercepting the memory allocator may in-
duce additional performance overhead, although
deterministic and not noticeable, according to our
tests (Lutas et al., 2015a).
2. CPU extensions. Ideally, all the unwanted exits
would be eliminated if the CPU provided a bit-level protection mechanism. This is probably extremely difficult to achieve in today's hardware, but a granularity of several bytes (Sahita et al., 2014) (for example, 128 bytes) would be more than enough to offer a significant performance gain for existing solutions and to offer the possibility to protect new structures. We have used the 128B (2 x cache-line size) granularity and we have measured the potential performance improvement for various data structures, on Windows; the results are illustrated in Figure 5.
Figure 5: 128 bytes granularity protection stats.
4.5 Accessed and Dirty Bits
Challenges. Whenever the CPU does a page walk,
it sets the accessed and dirty bits (A/D bits) inside
the page tables. These writes trigger a SLAT vio-
lation if the page tables are marked as non-writable.
Handling these events by the introspection logic can
be done either by single stepping the instruction that
triggered it, or by emulating the entire page walk.
The A/D bits may not be of significant importance for the introspection logic: in our particular imple-
mentation, we ignore them entirely. In order to assess the performance impact induced by the page-walker,
we have simulated a high memory pressure scenario,
where large amounts of memory are allocated, used
and freed frequently. We ran the simulation for 10
minutes, on Windows 8.1 x64, and we plotted the
number and percentage of A/D-bit-induced EPT violations in each second, creating two histograms: one with the absolute number of A/D-bit-induced faults and one with the fraction of A/D-bit-induced faults out of the total number of EPT violations. The results can
be seen in Figure 6 and Figure 7. One can notice that the number of page-walker-caused faults often exceeds 100,000 per second (Figure 6) and that, many times, such events account for the totality of the EPT violations (Figure 7).
Solutions. Unlike the other challenges discussed
so far, this one is difficult to avoid directly in software.
Here are some of the possible solutions:
1. Always keep A/D bits set. This ensures that once
the CPU sets the A/D bits inside a page table, the
OS will never reset them. This has the disadvan-
tage of being invasive and it may even cause dis-
ruptions to the OS memory manager.
Figure 6: Frequency of A/D bit induced EPT violations in a
10 minutes time frame.
Figure 7: Fraction of A/D bit induced faults in a 10 minutes
time frame.
2. Hardware Solution. Some CPUs (specifically Intel) already provide a dedicated bit inside a VMCS field that indicates when an EPT violation was caused by the CPU page walker (Intel Corporation, 2016b, Vol. 3, Ch. 27). It may be possible
to avoid these exits altogether, and simply ignore
the page walker whenever it accesses the page ta-
bles. Just like the other hardware extensions, this
is also speculative with regard to the complex-
ity of the possible implementation, but the im-
provement would be significant: not only would
the A/D bit exits be removed, but a page-walker
would no longer be needed inside the introspec-
tion logic.
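A minimal sketch of filtering such exits using the EPT-violation exit qualification is shown below; it assumes the bit layout documented in recent Intel SDM revisions (bit 7: guest linear address valid, bit 8: the access was to the final translation rather than to a paging-structure entry) and a vmcs_read helper, all of which should be verified against the manual.

```c
#include <stdint.h>
#include <stdbool.h>

#define VMCS_EXIT_QUALIFICATION   0x6400        /* assumed VMCS field encoding  */
#define EPTQ_GLA_VALID            (1ULL << 7)
#define EPTQ_ACCESS_TO_GPA        (1ULL << 8)   /* 0 = access to a paging-
                                                   structure entry (page walk)  */

extern uint64_t vmcs_read(uint64_t field);

/* Returns true when the current EPT violation was generated by the CPU
 * page walker updating accessed/dirty bits or reading paging structures,
 * in which case the introspection logic may choose to ignore it. */
bool ept_violation_is_page_walk(void)
{
    uint64_t qual = vmcs_read(VMCS_EXIT_QUALIFICATION);

    return (qual & EPTQ_GLA_VALID) != 0 &&
           (qual & EPTQ_ACCESS_TO_GPA) == 0;
}
```

With such a check, page-walker accesses can be treated as benign without decoding any instruction, which is exactly the filtering the proposed hardware extension would perform on its own.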
4.6 Instruction Decoding
Challenges. When dealing with SLAT faults, the in-
trospection logic must decide whether the access is
legitimate or not. Many times, in order to do so, the
instruction that caused the fault must be analysed, in
order to obtain some information such as the size of
the access or the new value stored in memory, so an
instruction decoder must be part of the introspection
logic. While simply decoding an instruction in order to determine the operands and access size is not a very computationally intensive task, there are some special cases and events, due to the CISC character of the x86 instruction set, that need special handling and cannot be generically treated:
1. IDT (Interrupt Descriptor Table) accesses as part
of an exception or interrupt delivery. There is
no dedicated field inside the VMCS to indicate
this kind of access, and intercepting reads in-
side the page that contains the IDT may lead to
such events. This kind of event could be handled
by checking dedicated VMCS fields indicative of
event delivery, which means that the fault took
place as a result of such an event;
2. GDT (Global Descriptor Table) or LDT (Local
Descriptor Table) accesses as part of loading a
segment descriptor or setting an accessed bit in-
side a segment descriptor. This kind of event can be handled by inspecting the faulting instruction -
any instruction that loads a segment, for example,
will also load the underlying descriptor;
3. TSS (Task State Segment) accesses as part of a
task switch or delivery of an interrupt or excep-
tion. Certain branch instructions may lead to a
task switch in certain conditions; this can be in-
ferred from the instruction itself. A trickier case is a task switch that takes place as a result of
an exception or interrupt;
4. BTS (Branch Trace Store) and PEBS (Precise
Event Based Sampling) memory stores. These
take place asynchronously, so decoding the instruction at whose boundary the fault took place will not be helpful. A possible solution is to check the memory range of the faulted address (which can be done by inspecting certain MSRs) and see whether it lies within the BTS or PEBS region;
5. PT (Processor Trace) stores. These are very similar to the BTS/PEBS events, and they may be
detected by checking the faulted address against
the PT memory range.
These events, although peculiar with respect to
SLAT faults, may be expected and properly detected
if the instruction is carefully inspected or if additional
validations are made, but in some cases, it is very dif-
ficult to determine the size of such a memory access.
In the case of IDT, GDT, LDT or TSS, the size of the
access would be obvious (one or two memory words);
in case of a BTS, PEBS or PT store the size may not
be directly obtainable, leading to a possible problem
in handling the fault event. In addition to these events,
there are also instructions that need special handling:
1. XSAVE/XRSTOR instructions. These instruc-
tions operate on multiple register sets and the ac-
cess size cannot be directly determined. Instead,
an introspection logic has to query for enabled features in XCR0 (Extended Control Register 0) or the IA32_XSS MSR. In addition, it has to do several CPUID queries to determine exactly the offsets inside the XSAVE area and the size of each saved component;
2. MPX instructions. BNDLDX and BNDSTX have
the documented side-effect of doing a load or a
store inside the bounds tables. This structure is
somewhat similar to the IA page tables, and it is not encoded in the instruction; its base can be extracted from the BNDCFGU (in user-mode) or
BNDCFGS (in kernel-mode) register. It is two
levels deep, so special care must be taken when
checking for bound access, in order to properly
handle both bound directory and bound table ac-
cesses;
3. CET enabled instructions. If Intel Control flow
Enforcement Technology is enabled (Intel Cor-
poration, 2016a), the behaviour of some instruc-
tions changes. The most important modification
appears at the level of procedure call and return
instructions: these access a new CPU structure,
called the Shadow Stack. Procedure calls au-
tomatically store the return address on this new
structure, and return instructions load it, in order
to make sure it hasn’t been altered. The shadow
stack is pointed to by a new CPU register, called the
SSP (Shadow Stack Pointer), and this capability
can be enabled for both user and supervisor code.
Initial documentation doesn’t indicate whether a
shadow stack access is flagged in the VMCS, so
specific range checks can be made, in order to see
if the accessed address lies within that range;
4. Scatter-gather instructions. This is a special
class of instructions introduced in the AVX2 and
AVX512F instruction sets that are capable of
accessing multiple separate memory addresses
- these addresses can even reside in separate
pages. For example, the instruction VPSCAT-
TERDD [RAX + ZMM0], ZMM1 can write to up
to 16 different memory locations;
5. String instructions. Instructions such as MOVS
both read and write memory, so special handling
is needed, since they cause both read and write
SLAT faults;
6. Instructions that may trigger another type of VM
exit before doing the memory access. These in-
structions will trigger a VM exit before actual execution (before accessing the memory and getting a chance to cause a SLAT fault). The introspection logic needs to make sure that on every event pertaining to the execution of such an instruction it will also validate the accessed guest-
virtual address and guest-physical address against
the legacy IA page tables and SLAT permissions.
For example, the guest VM may attempt to execute the XSAVES instruction, which is configured to
cause a VM exit. The hypervisor may emulate the
instruction without fully validating the SLAT, and
thus it may bypass protections established by the
introspection logic. Instructions that can cause a
specific VM exit and that can also access memory
include string I/O instructions, VMX instructions
or descriptor table accesses.
Solution. Although there are many special cases that must be handled by the introspection logic, usually only a small number of instructions actually trigger events. In our tests, we have discovered that more than 95% of all the instructions that ever trigger an EPT violation are simple MOV instructions. This behaviour is consistent on both 32 and 64-bit OSs and has been confirmed on both Windows and Linux.
In addition to the logic handling challenges,
instruction decoding constitutes another problem.
While this can be done quite easily, and there are several disassemblers available (e.g. Capstone, diStorm, udis86), the introspection logic has to repeat this
task on every VM exit. Mapping memory and decod-
ing the instruction on each exit can be very expensive,
so an instruction cache can be used: save tuples (in-
struction pointer, decoded instruction) on each event,
which allows a fast lookup of the faulting instruction
on future events. If a new VM exit is triggered from
an instruction pointer that has already been cached,
the introspection logic can retrieve the decoded in-
struction directly from there, thus significantly im-
proving performance. In addition, the CPU might
provide basic information to the hypervisor, such as
the access size that caused the fault or the instruction
bytes, thus relieving the hypervisor or the introspec-
tion logic from doing costly decodes.
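A minimal sketch of such an instruction cache is given below, keyed here by the guest RIP and the guest CR3 so that identical addresses in different address spaces do not collide; the decoded-instruction type and the decoder entry point are placeholders for whatever disassembler the introspection engine uses.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define ICACHE_SIZE 1024             /* power of two */

/* Placeholder for the output of the disassembler in use. */
typedef struct {
    uint8_t length;
    uint8_t mem_access_size;         /* size of the memory operand, in bytes */
    /* ... operand details needed by the introspection logic ... */
} decoded_insn_t;

typedef struct {
    uint64_t       rip;
    uint64_t       cr3;
    bool           valid;
    decoded_insn_t insn;
} icache_entry_t;

static icache_entry_t icache[ICACHE_SIZE];

/* Assumed decoder front-end: reads the instruction bytes from guest memory
 * at rip (in the address space identified by cr3) and decodes them. */
extern bool decode_guest_instruction(uint64_t cr3, uint64_t rip,
                                     decoded_insn_t *out);

/* Returns the decoded instruction for the faulting RIP, decoding it only on
 * a cache miss; repeated faults from the same instruction become cheap. */
bool lookup_or_decode(uint64_t cr3, uint64_t rip, decoded_insn_t *out)
{
    icache_entry_t *e = &icache[(rip ^ (rip >> 10)) & (ICACHE_SIZE - 1)];

    if (!e->valid || e->rip != rip || e->cr3 != cr3) {
        if (!decode_guest_instruction(cr3, rip, &e->insn))
            return false;
        e->rip = rip;
        e->cr3 = cr3;
        e->valid = true;
    }
    *out = e->insn;
    return true;
}

/* Entries must be invalidated if the guest overwrites cached code pages. */
void icache_invalidate_all(void)
{
    memset(icache, 0, sizeof(icache));
}
```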
4.7 Instruction Emulation
Challenge. Legitimate memory accesses that trig-
ger SLAT faults must be handled by the introspection
logic. There are two possible solutions to this:
1. Instruction Emulation. An instruction emulator
must simulate the behaviour of the emulated in-
struction entirely. Not only is this complex, but
it should also be able to handle every instruc-
tion supported by the target ISA. In the case of
the x86 architecture, the ISA has been greatly ex-
tended over the years and it now contains several
thousand different instructions, some of which are
very complex to handle. The code base of a com-
plete emulator would be of significant size, there-
fore increasing the attack surface and making the
introspection logic or the hypervisor vulnerable
(ref, b). In addition, an emulator would have
to pay extreme attention to special cases such as
cross-page accesses, time of check vs. time of use, or instructions that access multiple pages and are capable of triggering more than one SLAT fault;
2. Single stepping. This approach has the great
advantage of being generic: it can handle any
instruction without needing special knowledge
about it. In addition, this mechanism does not
have to specially handle instructions that cause
multiple faults: its incremental nature (granting
permission for each page once it is validated)
ensures that an arbitrary number of concomitant
faults can be handled. This mechanism works by temporarily removing the protection from the accessed page, thus allowing the instruction to com-
plete successfully. If the instruction triggers a new fault, the mechanism can be invoked in
a recursive manner. While the protection is dis-
abled, the introspection logic has to make sure
that other VCPUs won’t modify the content of the
pages while the single-stepping occurs. This can
be done in at least two ways:
Pause all the other VCPUs while the faulting
one single-steps the instruction. This ensures
that only the faulting VCPU runs code while
the protection is removed from SLAT;
Create a dedicated single-step SLAT that would
be used only while single-stepping the faulting
instruction. This allows us to load a new SLAT on the VCPU that will single-step the access.
In each case, a significant performance impact
would accompany this technique: at least two sep-
arate VM exits would have to be handled for each
single-stepped instruction - one for the actual event
that triggered the single-stepping and one more when finishing single-stepping, which allows us to resume all other VCPUs or restore the original SLAT. A better approach would be to use an instruction emulator for the most common instructions (for example, the MOV instruction, which accounts for more than 90% of all the VM exits) and use the single-step mechanism only for unsupported instructions, thus combining the advantages of both techniques. A minimal sketch of this hybrid dispatch is shown below.
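The sketch uses hypothetical helpers for emulation, pausing VCPUs and toggling the SLAT protection; the real handling (re-validating the instruction against cross-VCPU modification, recursing for multi-fault instructions) is omitted for brevity.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct decoded_insn decoded_insn_t;    /* from the instruction decoder */

/* Assumed primitives provided by the hypervisor / introspection engine. */
extern bool is_simple_mov(const decoded_insn_t *insn);
extern bool emulate_mov(uint32_t vcpu, const decoded_insn_t *insn);
extern void pause_other_vcpus(uint32_t vcpu);
extern void resume_other_vcpus(uint32_t vcpu);
extern void slat_relax_page(uint64_t gpa);     /* temporarily restore access   */
extern void slat_restore_page(uint64_t gpa);   /* re-apply the protection      */
extern void arm_single_step(uint32_t vcpu);    /* e.g. MTF on Intel VT-x       */

/* Handle a legitimate access that hit a protected page: emulate the common
 * case, fall back to single-stepping for everything else. */
void handle_legitimate_access(uint32_t vcpu, uint64_t fault_gpa,
                              const decoded_insn_t *insn)
{
    if (is_simple_mov(insn) && emulate_mov(vcpu, insn))
        return;                       /* most of the exits end here            */

    /* Generic path: let the instruction run once with the protection lifted,
     * while no other VCPU can touch the page. */
    pause_other_vcpus(vcpu);
    slat_relax_page(fault_gpa);
    arm_single_step(vcpu);
    /* ...on the single-step exit (not shown), the handler calls
     * slat_restore_page(fault_gpa) and resume_other_vcpus(vcpu). */
}
```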
A security issue with both emulation and single-
stepping is the cross-VCPU instruction modification
attack (ref, b). Such an attack would leverage multi-processor systems in order to modify an instruction
at the right moment: after it has generated a VM exit
and has been analysed, but before being emulated or
single-stepped. Such an attack can be prevented by
properly validating the instruction before emulation
or single-stepping.
Since single-stepping is the most desired approach, the hardware could help in achieving this more efficiently by allowing the hypervisor to temporarily override SLAT permissions. Such a mechanism
would involve one or more fields in the VMCS that
would contain a guest-physical address together with
the override bits. While enabled, such a mechanism
would allow the faulting instruction to access a given
address if it is in the override list, even if SLAT would
otherwise deny it. The single-stepping mechanism
could be automatically disabled by the CPU once the
faulting instruction has been executed. Therefore,
we would not need an emulator or a specific single-
stepping mechanism, and one could handle any in-
struction by simply writing to some fields inside the
VMCS. In order to handle instructions that cause mul-
tiple faults, several such guest-physical override fields
could be present. In our tests, we discovered that
in more than 57% of the cases, the instruction can
be single-stepped in a single iteration (the instruction
triggers only one SLAT fault), and about 20% of the instructions trigger two or three faults. The detailed
results can be seen in Figure 8. Less than 1% of the
instructions trigger 5, 6 or 7 faults, and we did not en-
counter any instruction that triggers 8 or more faults.
Figure 8: Probability of multiple SLAT-faults events.
The reason why multiple faults can be triggered
by a single instruction is that multiple guest-physical
pages are accessed when executing any given instruc-
tion. For instance (ignoring the accesses made to
fetch the instruction), an instruction that writes a sin-
gle byte inside a given page may cause five different
guest-physical accesses in long mode: four accesses
inside the IA page tables and one access inside the ac-
tual page. An instruction such as MOVS that makes a
page-boundary access may access 2 pages when read-
ing, 2 pages when writing and up to 16 page tables (2
x 4 + 2 x 4). These cases are rare, however, and if they
occur, they could be handled directly in software.
5 CONCLUSIONS
We presented in this paper low-level engineering
challenges that we have encountered during the de-
velopment of a hypervisor-based memory introspec-
tion engine. While we do not claim that this is a com-
plete list or that the solutions that we implemented
or proposed are the only ones possible, we think they
are representative, relatively easy to implement and
effective.
The most problematic challenges are the hard-
ware limitations that also induce the highest perfor-
mance impact: page granularity protection and ac-
cessed/dirty bits. While we cannot assess the complex-
ity or costs of implementing the proposed extensions
in hardware, we strongly believe we will see them in
the future assisting memory introspection solutions.
Other challenges relate to instruction decoding
and emulation, and while they can be handled in soft-
ware using caches and emulators, they involve deep
knowledge of the instruction set and behaviour of the
CPU. Problems such as protecting paged memory or
accessing swapped out pages may not appear in a
kernel-mode memory introspection scenario, but are
very common when dealing with user-mode memory
introspection. While the solutions are not necessarily complex, they are neither obvious nor straightforward
to implement and may not be very effective.
We have discussed various possible improvements
that could be made inside the CPU itself in order
to aid memory introspection tasks, and while they
are purely theoretical, they may bring significant im-
provement to such applications, both from the imple-
mentation complexity and performance perspective.
The complexity of implementing these in the CPU,
however, may vary significantly, although emulators
such as Bochs or QEMU and simulation tools such as
PIN may provide an overview on how such extensions
may improve memory introspection. Many hardware
extensions were implemented recently for various al-
gorithms, like AES, SHA or CRC, showing an obvi-
ous trend of moving as much logic as possible on the
chip.
The software improvements that we have dis-
cussed were implemented and tested in U-HIPE and
some of them were presented in papers such as (Lutas
et al., 2015a) and (Lutas et al., 2015b), and, while the
performance increases, so do the attack surface and
the implementation complexity.
It is worth mentioning that currently, introspection
solutions are somewhat ahead of their time: they are
complex software that leverage the latest CPU inno-
vations in order to provide security, although the vast
majority of these extensions were not created for this
specific purpose. We keep seeing significant improve-
ments in hardware, especially in security & virtualiza-
tion fields, and we think that future CPU generations
will include extensions that may help fix at least some
of these issues, making hypervisor-based memory introspection solutions easier to develop and deploy, and much more efficient.
REFERENCES

AMD Corporation (2005). AMD64 Virtualization Codenamed Pacifica Technology. Secure Virtual Machine Architecture Reference Manual.

Baliga, A., Ganapathy, V., and Iftode, L. (2008). Automatic Inference and Enforcement of Kernel Data Structure Invariants. In Proc. Annual Computer Security Applications Conference, pages 77–86.

Carbone, M., Cui, W., Lu, L., Lee, W., Peinado, M., and Jiang, X. (2009). Mapping kernel objects to enable systematic integrity checking. In Proc. 16th ACM Conference on Computer and Communications Security, pages 555–565.

Cozzie, A., Stratton, F., Xue, H., and King, S. T. (2008). Digging for data structures. In Proc. 8th USENIX Conference on Operating Systems Design and Implementation, pages 255–266.

D. Durham (2014). Mitigating Exploits, Rootkits and Advanced Persistent Threats.

Dolan-Gavitt, B., Srivastava, A., Traynor, P., and Giffin, J. (2009). Robust signatures for kernel data structures. In Proc. 16th ACM Conference on Computer and Communications Security, pages 566–577.

G. Hoglund and J. Butler (2005). Rootkits: Subverting the Windows Kernel.

Garfinkel, T. and Rosenblum, M. (2003). A Virtual Machine Introspection Based Architecture for Intrusion Detection. In Proc. Network and Distributed Systems Security Symposium, pages 191–206.

Intel Corporation (2016a). Control-flow Enforcement Technology Preview.

Intel Corporation (2016b). Intel 64 and IA-32 Architectures Software Developer's Manual. Number 325462-060US.

Lin, Z., Rhee, J., Zhang, X., Xu, D., and Jiang, X. (2011). Graph-based signatures for kernel data structures. In Proc. 12th Annual Information Security Symposium, article no. 21.

Lutas, A., Colesa, A., Lukacs, S., and Lutas, D. (2015a). U-HIPE: hypervisor-based protection of user-mode processes in Windows.

Lutas, A., Lukacs, S., Colesa, A., and Lutas, D. (2015b). Proposed Processor Extensions for Significant Speedup of Hypervisor Memory Introspection. In Trust and Trustworthy Computing, pages 249–267.

M. Russinovich and D. Solomon and A. Ionescu (2012). Windows Internals, 6th edition.

Sahita, R., Shanbhogue, V., Neiger, G., Edwards, J., Ouziel, I., Huntley, B., Shwartsman, S., Durham, D. M., Anderson, A., and LeMay, M. (2014). Method and apparatus for fine grain memory protection. US20150378633.

Serebrin, B. and Haertel, M. (2008). Alternate address space to permit virtual machine monitor access to guest virtual address space. US20090187726.