Computer Viruses: The Abstract Theory Revisited

Nikolai Gladychev

Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland

Keywords: Computer Virus, Computability, Abstract Theory, Recursion Theorem, Companion Virus, Document Virus, Computer Virology.

Abstract:

Identifying new viral threats, and developing long-term defences against current and future computer viruses, requires an understanding of their behaviour, structure and capabilities. This paper aims to advance this understanding by further developing the abstract theory of computer viruses. A method of providing abstract definitions for classes of viruses is presented, which addresses inadequacies of previous techniques. Formal definitions for some classes of viruses are then provided, which correspond to existing informal definitions. The use of the proposed method in studying the fundamental properties of computer viruses is discussed.

1 INTRODUCTION

Current antiviral detection methods and techniques are largely reactive, with antivirus software being updated in response to new viruses and threats as they are discovered (Filiol, 2005; Dechaux and Filiol, 2016). There is an “arms race” between computer virus and antivirus writers (Kramer and Bradfield, 2010), and any antiviral technique developed for current computer viruses is ultimately bypassed by new and more advanced viral behaviours. A more proactive approach would be to detect new threats before they emerge in the real world, which requires a thorough understanding of the possible behaviours, structures and capabilities of computer viruses. The aim of the abstract theory of computer viruses is to view the underlying mechanisms and principles of computer viruses independently of implementation complexities, and to provide more general results about computer viruses. The aim of this paper is to further this abstract understanding of computer viruses.

Author ORCID: https://orcid.org/0000-0002-9744-6169

Malware is a more general concept than viruses, and computer viruses are commonly understood as malware which has some kind of self-replicating or self-propagating mechanism (Filiol, 2005; Cohen, 1986). Nevertheless, computer viruses remain highly relevant in the modern context, with network-propagating trojans (worms) and botnets relating to viruses. Computer virus models can be extended to capture malware in general by allowing for non-replicating programs (Adleman, 1990); however, this paper is concerned with the self-replicating case. The main contributions of this paper are as follows.

• A formal method for specifying computer viruses, which has its roots in computability theory, is proposed. The possible specifications are broader in scope and more expressive than those possible using previous methods.

• A number of formal descriptions of virus classes are presented, which correspond to existing informal classifications of viruses, and which could not previously be described in a formal way.

Section 2 of the paper reviews how recursion theory (also known as computability theory) relates to self-replicating programs and computer viruses. Related work is discussed in Section 3.1, and some inadequacies are noted. A formal framework and methodology for specifying computer viruses, which addresses these inadequacies, is presented in Section 3.2. Section 4 goes on to use this framework to provide formal counterparts to a number of informal classifications of computer viruses. In particular, classes which could not be formally specified using previous methods are presented. Section 4.2 demonstrates how the framework in this paper can be used to study fundamental aspects of the structure and behaviour of computer viruses. Section 5 provides concluding remarks.


Gladychev, N.

Computer Viruses: The Abstract Theory Revisited.

DOI: 10.5220/0008942704060414

In Proceedings of the 6th International Conference on Information Systems Security and Privacy (ICISSP 2020), pages 406-414

ISBN: 978-989-758-399-5; ISSN: 2184-4356

2 COMPUTER VIRUSES IN RECURSION THEORY

This paper will describe computer viruses in functional terms using standard mathematical notation. In the real world, like any program, computer viruses appear as some sequence of instructions. Partial recursive functions are those functions which can be computed by some sequence of instructions (Rogers, 1987), and the theory of recursive functions allows for some manipulation of these sequences. Of particular interest in this paper is how Kleene’s second recursion theorem can be used to produce viruses. This approach has been used before (Zuo and Zhou, 2004; Bonfante et al., 2006); however, while constructive proofs for the existence of certain viruses have been produced, generation of concrete programs from these proofs is not straightforward (Bilar and Filiol, 2009). To partially bridge the gap between this abstract approach and the realities of implementation, this section provides an intuitive explanation of the proof of Kleene’s second recursion theorem, and outlines how programs can be produced from this construction.

In formal models of computer viruses thus far, a defining feature of a computer virus is the ability to self-replicate (Cohen, 1986; Adleman, 1990). As a trivial example of a program with a self-replicating element, consider a program which outputs its own sequence of instructions. To construct it, the partial recursive function f which takes two arguments and outputs its first argument (i.e. f(x, y) = x) can be used. In pseudocode the instructions for f could be:

f(x, y)
  Begin
  return x
  End

Any input can be given as x, including the sequence of instructions for f.

Notes on the preceding paragraphs. On partial recursive functions: more correctly, these are the functions that can be computed using some system of Turing-complete data-manipulation rules. On the program which outputs its own sequence of instructions: this kind of program is known as a “quine”.

Consider now a function which is the same as f, except that it has its own sequence of instructions “hardcoded” into its sequence of instructions, so that it takes only the one argument y. Let e denote the sequence of instructions for this function, and let $\varphi_e$ denote the function computed by e. The naive approach of “hardcoding” is quite troublesome (Bilar and Filiol, 2009). This approach has the structure:

$\varphi_e$(y)
  Begin
  x ← sequence of instructions e
  return x
  End

which expands infinitely into

$\varphi_e$(y)
  Begin
  x ← “Begin; x ← “Begin; x ← ...””
  return x
  End

The solution instead lies in including an algorithm within the instructions which performs the hardcoding itself. This approach has the structure:

$\varphi_e$(y)
1: Begin
2: x ← (sequence of instructions e with line 2 omitted)
3: out ← (everything up to line 2)
4: out ← out + “x ← ”
5: out ← out + x
6: out ← out + (rest of the instructions starting at line 3)
7: return out
8: End

This solution for the construction of e explains the essence of Kleene’s recursion theorem, when it is observed that the outputs f(e, y) and $\varphi_e(y)$ are the same thing: the sequence of instructions e. Whereas a specific f was given above, Kleene’s theorem captures the general case.
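The self-hardcoding structure can be tried out directly. The following is a minimal Python sketch, not taken from the paper: the names `TEMPLATE`, `build_quine` and `run` are illustrative assumptions, and Python's `%r` formatting plays the role of the step that writes out the quoted assignment to x. The two-line program it builds prints exactly its own text.

```python
import contextlib
import io

# Template of a two-line program. The %r slot is where the template's own
# quoted text is spliced in; "%%" becomes a literal "%" after formatting.
TEMPLATE = 's = %r\nprint(s %% s)'


def build_quine() -> str:
    """Return the text of a program that prints its own text."""
    return TEMPLATE % TEMPLATE


def run(source: str) -> str:
    """Execute program text in a fresh namespace and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


quine_source = build_quine()
output = run(quine_source)  # the program text followed by print's newline
```

The design mirrors the pseudocode: the program carries a copy of itself with the hardcoding line left as a slot, and fills that slot at run time instead of containing an infinitely nested literal.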

THEOREM 1 (Kleene’s 2nd Recursion Theorem). If f is a partial recursive function, then there is a sequence of instructions e such that

\[ \varphi_e(x) = f(e, x). \tag{1} \]

Kleene’s theorem states that an e can be found for any f. The method for constructing e is similar to the method shown above for the specific instance of f. It consists of finding the sequence of instructions for a function similar to f except that it has a hardcoded value instead of its first argument (hence the function takes one fewer argument), where the hardcoded value is that same sequence of instructions. An algorithmic solution is required, whereby the hardcoding process is included in the sequence of instructions (as shown above). A graphical interpretation of the construction for the proof of Kleene’s recursion theorem appears in Figure 1, where the informal notation code(f) is used to denote the sequence of instructions for f, so that for all x and y, $\varphi_{\mathrm{code}(f)}(x, y) = f(x, y)$.

[Figure 1 not reproduced. Labels recoverable from the figure: e; f(·, x); code(f(·, x)); “hardcoded”; “naive expansion” code(f(code(f(code(f(...)))))); f(e, x).]

Figure 1: Depiction of the construction for Theorem 1.
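The general construction behind Theorem 1 can also be sketched in code. The block below is an illustration under stated assumptions rather than the construction used in the cited proofs: program texts are Python source strings, `interpret` plays the role of $\varphi$, and the names `TEMPLATE`, `fixed_point`, `interpret` and `F_SOURCE` are hypothetical. Given source text defining a two-argument function f, `fixed_point` returns a program text e whose function `phi` satisfies phi(x) = f(e, x).

```python
# Template for the program e. The two %r slots receive the quoted text of f
# and of the template itself; "%%" becomes a literal "%" in the result.
TEMPLATE = (
    "F = %r\n"
    "T = %r\n"
    "exec(F)\n"
    "def phi(x):\n"
    "    return f(T %% (F, T), x)\n"
)


def fixed_point(f_source: str) -> str:
    """Return program text e such that phi_e(x) == f(e, x).

    f_source must be Python source that defines a function f(e, x).
    """
    return TEMPLATE % (f_source, TEMPLATE)


def interpret(program: str):
    """Play the role of phi: run program text and return its function phi."""
    namespace = {}
    exec(program, namespace)
    return namespace["phi"]


# Illustrative f: report the length of the program text together with x.
F_SOURCE = "def f(e, x):\n    return (len(e), x)\n"

e = fixed_point(F_SOURCE)
phi_e = interpret(e)
```

Inside e, the expression `T % (F, T)` rebuilds the text of e at run time, which is the algorithmic hardcoding step; no part of e contains a full literal copy of itself.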

To see how this relates to computer viruses, define a rudimentary computer system environment as a tuple consisting of some number of data files and some number of program files: $(d_1, \ldots, d_n, p_1, \ldots, p_m)$. Then define a program within a system environment as a sequence of instructions which computes a function that takes that system environment and outputs the same environment with some possible modifications. An example of a file-overwriting virus would be a sequence of instructions which computes a function that takes a system environment and returns that system environment with all the program files replaced with the virus. Kleene’s theorem constructs this virus as follows: take the function f defined as

\[ f(x, d_1, \ldots, d_n, p_1, \ldots, p_m) = (d_1, \ldots, d_n, x, \ldots, x). \tag{2} \]

Apply the theorem to obtain a program e which satisfies:

\[ \varphi_e(d_1, \ldots, d_n, p_1, \ldots, p_m) = (d_1, \ldots, d_n, e, \ldots, e). \tag{3} \]

This is a program which, when “executed” (its instructions are carried out), takes a system environment and returns that system environment with all the programs replaced by the program e (i.e. it is a virus).
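As a worked instance of equation (2), the function f can be written over plain tuples. This is only a sketch of the abstract model: the environment is held in memory as two Python tuples of opaque strings, no files are read or written, and the file names and the placeholder `"E"` (standing in for the program text e that Theorem 1 provides) are assumptions made for illustration.

```python
from typing import Tuple

Files = Tuple[str, ...]


def f(x: str, data: Files, programs: Files) -> Tuple[Files, Files]:
    """Equation (2): keep the data files, put x in every program slot."""
    return data, tuple(x for _ in programs)


# A toy system environment: two data files and three program files.
data = ("notes.txt", "table.csv")
programs = ("prog_a", "prog_b", "prog_c")

# With x fixed to the program text e, this is the right-hand side of (3).
result = f("E", data, programs)
```

The data part of the environment is returned unchanged, while the number of program slots is preserved and each slot holds the same value.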

The theorem can prove the existence of viruses for any Turing-complete system, and does not assume the existence of an operating system, or the ability to read or write files. In practice, applying the theorem to real computer programs is simpler, since a program may simply read its own sequence of instructions from a file instead of “hardcoding” them. However, Theorem 1 guarantees that this mechanism is not absolutely necessary.

Note on the file-overwriting virus above: in this case the infected form is exactly the viral sequence of instructions. The host program is completely overwritten.

3 ABSTRACT DESCRIPTION OF COMPUTER VIRUSES

This section presents a novel way of describing the behaviour of various viruses in terms of partial recursive functions, such that real computer viruses can be constructed from these descriptions with Theorem 1 (as discussed in the previous section). The abstraction in this approach allows for the study of defining traits which classify types of viruses independently of their implementation.

3.1 Related Work

The abstract theory of computer viruses was established by Cohen (Cohen, 1986) and Adleman (Adleman, 1990). Cohen used a Turing machine formalism, and loosely defined a “virus” as a sequence of symbols which, when interpreted in a given environment, causes another sequence of symbols to be modified to contain a (possibly evolved) form of the virus. This definition is very general and implicitly captures viruses for any mode of infection. Following Cohen’s work and with reference to specific viral behaviours, Adleman used partial recursive functions to provide a definition for computer viruses. That method was in turn extended by Zuo and Zhou (Zuo and Zhou, 2004), where more specific objects describing aspects of computer viruses were introduced, which allowed for the formal definition of some classes of viruses. Adleman and Zuo and Zhou viewed viruses as mappings from programs into “infected programs”, and did not consider viruses independently of a host program. On the other hand, viral programs do appear independently of a host in another recursion-theoretic approach described in (Bonfante et al., 2006).

Note on Cohen’s “loose” definition above: Cohen also provides a formal definition, which is considerably more involved than its “loose” counterpart. However, the same essential idea is captured.

There is a concept in (Bonfante et al., 2006) that is thought of as an infected form, called the “propagation vector”, denoted B(v, p). Viruses are defined with respect to a propagation vector, which describes how a virus infects a program. However, while B(v, p) is viewed as the “infected form” of the program p by the virus v within the formalism, it is argued in this paper that it does not adequately correspond to the informal notion and practical reality of an “infected form”. To demonstrate this, the definition of a virus in (Bonfante et al., 2006) is reproduced here:

DEFINITION 1 (Virus w.r.t. Propagation Vector). Assume that B is a partial recursive function. A virus w.r.t. B is a program v such that for any tuple of programs $(p, x_1, \ldots, x_n)$,

\[ \varphi_v(p, x_1, \ldots, x_n) = \varphi_{B(v,p)}(x_1, \ldots, x_n). \tag{1} \]

If B(v, p) were the infected form, this definition would essentially describe that the virus and the infected form behave in the same way given the same input $(x_1, \ldots, x_n)$. However, this should not be the case for infected forms. Consider a virus v which appends its instructions to the end of another program, and let $i(p)$ denote the infected form of a program p. Now consider that within the system environment $(p_1, p_2, \ldots, p_n)$, the program $p_1$ deletes all files in the system environment, i.e.

\[ \varphi_{p_1}(p_2, \ldots, p_n) = (), \tag{2} \]

where () is an empty tuple. Then when the instructions of the infected form $i(p_1)$ are carried out, first all of the files in the system are removed, after which the virus cannot infect any files, i.e.

\[ \varphi_{i(p_1)}(p_2, \ldots, p_n) = (). \tag{3} \]

If the virus v is defined so that it infects all programs, i.e.

\[ \varphi_v(p_1, p_2, \ldots, p_n) = (i(p_1), \ldots, i(p_n)), \tag{4} \]

then it is the case that

\[ \varphi_v(p_1, p_2, \ldots, p_n) \neq \varphi_{i(p_1)}(p_2, \ldots, p_n). \tag{5} \]

Hence B(v, p) cannot correspond to $i(p)$, the infected form of p by v. Thus the definition in (Bonfante et al., 2006) describes the viral program, but not the infected form of a program. More recent research has also viewed the infected form of a program by a virus as equivalent to the virus (see (Filiol, 2007) as an example). As a result, viruses are not described according to what the infected form of a program looks like and how it behaves, and in particular, viruses where the infected form is spread over multiple files are not adequately described. A key factor in the expressive power unique to the framework presented in this paper is that it considers the virus and the infected form separately and as non-equivalent.

Note on the reading of Definition 1 in Section 3.1 (the virus and the infected form behaving in the same way given the same input): more correctly, that a virus given a system environment outputs the same system environment as an infected form within that environment, with the rest of the system environment as its input.

3.2 Proposed Alternative

Concepts discussed so far are now made more precise. The set of all words over some fixed alphabet is denoted as D, and it is assumed that since any sequence of instructions can be viewed as some sequence of symbols, it will be an element of D. Data are also taken as sequences of symbols and as elements of D. The symbol $\varphi$ can be thought of as the object which carries out instructions, and if $x \in D$, then $\varphi_x$ will denote the partial recursive function from D to D computed by assuming x is a sequence of instructions and following them. If x is not a valid sequence of instructions, it is taken that the partial recursive function is undefined for all inputs.

It is assumed that there exists a bijective (total) recursive function $\langle \cdot, \cdot \rangle$ which takes two elements of D and produces a single element of D. Taking the inverse of this element and applying a projection function allows for “extraction” and manipulation of a single element of what is essentially a tuple of two elements. Similarly, the expression $\langle x_1, x_2, \ldots, x_n \rangle$ denotes a bijective recursive function from $D^n$ to D, and can be thought of as an “encoding” of a tuple of elements into a single element in such a way that each element of this encoded tuple can be individually manipulated. For any function $f : D \to D$, the expression f(x, y, z) is taken always to mean $f(\langle x, y, z \rangle)$. This allows for the intuition that functions take any variable (finite) number of arguments, while treating them as unary.
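One concrete choice for such an encoding is the Cantor pairing function. The sketch below is an illustration, not the encoding fixed by the paper: it assumes that elements of D are identified with natural numbers, and the names `pair`, `unpair`, `encode` and `decode` are hypothetical. Longer tuples are encoded by nesting pairs, and each component can be recovered individually.

```python
from math import isqrt
from typing import Tuple


def pair(x: int, y: int) -> int:
    """Cantor pairing: a total bijection from pairs of naturals to naturals."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair: recover (x, y) from the single number z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index of the diagonal containing z
    y = z - w * (w + 1) // 2          # offset of z along that diagonal
    return w - y, y


def encode(*items: int) -> int:
    """Encode a non-empty tuple by nesting pairs: <x1, <x2, <..., xn>>>."""
    *init, code = items
    for item in reversed(init):
        code = pair(item, code)
    return code


def decode(code: int, n: int) -> Tuple[int, ...]:
    """Recover the n components of a tuple produced by encode."""
    items = []
    for _ in range(n - 1):
        head, code = unpair(code)
        items.append(head)
    items.append(code)
    return tuple(items)
```

With this convention, a call such as `g(encode(x, y, z))` corresponds to writing g(x, y, z) for a unary g, which is the reading adopted in the text.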

Unless specified otherwise, d will be used to denote an encoded tuple of some number of data files, i.e. $d = \langle d_1, \ldots, d_n \rangle$ where for each $1 \le j \le n$, $d_j \in D$ and $d_j$ is an invalid sequence of instructions according to $\varphi$. Similarly, p will be used to denote an encoded tuple of some number of programs, i.e. $p = \langle p_1, \ldots, p_m \rangle$. The $\langle \cdot, \cdot \rangle$ notation can be applied to d and p, so that $\langle d, p \rangle = \langle d_1, \ldots, d_n, p_1, \ldots, p_m \rangle$.

For any $f : D \to D$, the symbolic expression $[p \xleftarrow{r} f(\underline{p_1})]$ (the label r marking replacement) is used to denote the encoded value of the tuple represented by p, but where the element $p_1$ in the tuple represented by p is replaced with $f(p_1)$. For example, if $p = \langle p_1, p_2, \ldots, p_m \rangle$, then

\[ [p \xleftarrow{r} f(\underline{p_1})] = \langle f(p_1), p_2, \ldots, p_m \rangle. \tag{6} \]

The expression $[p \xleftarrow{r} f(x, \underline{p_1}, \underline{p_2})]$ is the encoding of the tuple represented by p where the elements $p_1$ and $p_2$ are replaced by $f(x, p_1)$ and $f(x, p_2)$ respectively. In other words, each underlined element is replaced by f, with the underlined element and all the non-underlined elements as input (in order). Therefore, $[p \xleftarrow{r} f(x, \underline{S(p)})]$ is the encoded tuple represented by p where each element j in S(p) (i.e. some encoded tuple of programs) is replaced with f(x, j). It is assumed that this operation $[n \xleftarrow{r} f(\ldots)]$ is defined only where the underlined elements are contained within the encoded tuple n.

Note on “total” in the definition of $\langle \cdot, \cdot \rangle$ above: “total” means that the function is defined for every input.

On the other hand, the symbolic expression $[d \xleftarrow{a} f(x)]$ (the label a marking addition) denotes the encoded tuple represented by d where the element f(x) is “added” at some position within the tuple. The conventions are the same as they were for $[n \xleftarrow{r} f(\ldots)]$, so that $[d \xleftarrow{a} f(\underline{S(d)})]$ is the encoded tuple represented by d where for each element j in S(d) (i.e. some encoded tuple of data files), f(j) is added in some way to the tuple.
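The replacement and addition operations just described can be sketched over plain Python tuples. This is an illustration under assumptions: tuples of strings stand in for encoded tuples, "added at some position" is taken to mean appended at the end, any non-underlined arguments such as x are assumed to be closed over by the callable, and the names `replace_elements`, `add_elements` and `search` are hypothetical.

```python
from typing import Callable, Tuple

Items = Tuple[str, ...]


def replace_elements(n: Items, underlined: Items,
                     f: Callable[[str], str]) -> Items:
    """Replacement: each underlined element j of n is replaced by f(j)."""
    if any(j not in n for j in underlined):
        # The operation is only defined when the underlined elements are in n.
        raise ValueError("underlined elements must be contained in n")
    return tuple(f(j) if j in underlined else j for j in n)


def add_elements(n: Items, underlined: Items,
                 f: Callable[[str], str]) -> Items:
    """Addition: f(j) is added to n for each underlined element j."""
    return n + tuple(f(j) for j in underlined)


def search(p: Items) -> Items:
    """Hypothetical search function S: select the first two programs."""
    return p[:2]


p = ("p1", "p2", "p3")
replaced = replace_elements(p, search(p), lambda j: "v+" + j)
added = add_elements(p, search(p), lambda j: j)
```

Replacement keeps the length of the tuple fixed, whereas addition grows it by one element per selected target; this difference is what separates descriptions that modify existing files from those that create new ones.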

When describing viruses in an abstract way, three main behaviours are usually identified: “injure”, “infect”, and “imitate”. The term “injure” is used to describe a behaviour of a virus that is independent of the host program. Typically this is some kind of “payload” action, such as performing some malicious function on the host system, or inserting a non-replicating malicious program. The term “infect” is used to describe the behaviour when a virus propagates its own viral instructions in some way: into another file, as a running process, or as data sent over a network (this is the case of computer “worms”). Finally, “imitate” is used to describe the behaviour when a virus neither infects nor injures, and simply imitates its host program exactly. This paper will only consider the infection behaviour of a virus. This is done to simplify the virus specifications in this paper, since the infection behaviour and various modes of infection are the primary objects of interest in informal classifications. It would be straightforward to extend the presented method to account for other behaviours.

Note on the term “malicious” above: it is possible to use self-replicating programs for beneficial purposes also, see (Filiol, 2005).

The behaviour of the virus in the case of infection is represented by a function $\beta_I$, which takes some number of objects and operates on them in some way, such that a system environment is returned. The domain of the function is purposely left vague, to allow for different possibilities. It always takes a system environment as input, but $\beta_I$ can take additional objects such as sets or even functions. When defined in viral descriptions, it will simply be written $\beta_I(\ldots) = \mathit{expression}$, where the domain required for $\beta_I$ should be clear from the expression, or from the behaviour it is intended to represent. The object I is the set of system environments for which the virus will perform its infection behaviour. Informally it can be thought of as the infection condition. The behaviour of a virus v is then described with the structure of

\[ \varphi_v(d, p) = \begin{cases} \beta_I(v, d, p) & \text{if } \langle d, p \rangle \in I; \\ \ldots & \text{otherwise.} \end{cases} \tag{7} \]

The “otherwise” case is meant to abstract away the other behaviours, such as a recursive function $\beta_T$ for injury behaviour, with its corresponding set of system environments T for which this behaviour occurs. For any realistic virus, $\varphi_v$ should be defined for most if not all values of the domain (all possible system environments). By taking a function with the structure of

\[ f(x, d, p) = \begin{cases} \beta_I(x, d, p) & \text{if } \langle d, p \rangle \in I; \\ \ldots & \text{otherwise.} \end{cases} \tag{8} \]

the virus can be constructed with an application of Kleene’s recursion theorem, provided $\beta_I$ (and any other behaviour function) is a partial recursive function (as it will be for the specifications in this paper). Henceforth, unless specified otherwise, this structure will be assumed for any description of $\varphi_v$, and only three definitions will make up the abstract description of a computer virus: the viral infection behaviour $\beta_I$, the infected form $i$, and the behaviour of the infected form $\varphi_{i(j)}$.
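The two-case structure of equation (8) can be written as a small higher-order function. This is a sketch over an in-memory model only: the environment is a pair of tuples of opaque strings, and the particular infection behaviour, infection condition and "otherwise" behaviour chosen here (`beta_i`, `in_i`, `leave_unchanged`) are hypothetical stand-ins, since the text deliberately leaves the otherwise case unspecified.

```python
from typing import Callable, Tuple

# A system environment modelled as (data files, program files); plain tuples
# of strings stand in for the encoded tuple <d, p> (an assumption).
Env = Tuple[Tuple[str, ...], Tuple[str, ...]]
Behaviour = Callable[[str, Env], Env]


def make_f(beta_i: Behaviour, in_i: Callable[[Env], bool],
           otherwise: Behaviour) -> Behaviour:
    """Build f(x, d, p) with the two-case structure of equation (8)."""
    def f(x: str, env: Env) -> Env:
        if in_i(env):                # <d, p> is in I: infection behaviour
            return beta_i(x, env)
        return otherwise(x, env)     # every other behaviour ("...")
    return f


# Hypothetical choices, for illustration only.
def beta_i(x: str, env: Env) -> Env:
    data, programs = env
    return data, tuple(x for _ in programs)   # overwriter-style behaviour


def in_i(env: Env) -> bool:
    return len(env[1]) > 0                     # some program file exists


def leave_unchanged(x: str, env: Env) -> Env:
    return env                                 # placeholder for "..."


f = make_f(beta_i, in_i, leave_unchanged)
infected = f("X", (("a.txt",), ("p1", "p2")))
untouched = f("X", (("a.txt",), ()))
```

Keeping the condition and the behaviours as separate arguments reflects the way the descriptions in this paper vary only $\beta_I$ and related objects while the surrounding structure stays fixed.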

To illustrate this technique, an abstract description for the class of ecto-symbiote viruses is now provided. This is a virus which preserves the functionality of its host program, where the sequences of instructions of the virus and the host program are combined and perhaps modified in some way. Appender, prepender, and parasitic viruses all relate to this class. These and other variants are described in (Szor, 2005). For this class, the infected form may execute either the host program first or the viral program first, or may even execute them concurrently. Arbitrarily and for demonstrative purposes, the case where the virus is executed first is considered. It is taken that $S : D \to D$ is a partial recursive function which, when given an encoded tuple, returns some certain elements of that tuple (also encoded). Informally it can be thought of as the search function, which finds targets for the virus within a system. It is also taken that $\delta$ is a very general concatenation function which takes two sequences of symbols and combines them in some way (possibly adding symbols). A more specific concatenation function would be one where the viral sequence of symbols is always added to the end of the host sequence of symbols (this would be the behaviour of an appender virus). Ecto-symbiote viruses can be described as follows.

Ecto-symbiote Virus
For all $j, d, p \in D$,

\[ \beta_I(\ldots) = \langle d, [p \xleftarrow{r} i(\underline{S(p)})] \rangle; \tag{9} \]
\[ i(j) = \delta(v, j) \text{ such that} \tag{10} \]
\[ \varphi_{i(j)}(d, p) = \varphi_j(\varphi_v(d, p)). \tag{11} \]

Note on the set T introduced for the “otherwise” case above: such a set T would have to be disjoint from the set I.

This describes that when the instructions of a virus are followed, a system environment is taken, and each program j found by the search function S is replaced in the environment with its infected form $i(j)$. Executing the infected form is equivalent to executing the virus on the system environment, and then executing the host program on the resulting system environment. A simple implementation of a bash virus which conforms to this description appears in Listing 1.

Where S and $\delta$ subsequently appear, their definitions will be the same as defined above. The objects that will be common to all virus descriptions in this paper are $\{\beta_I, I, S, i, \varphi_{i(j)}\}$, which can be seen as a set of abstract structural aspects at the core of most viruses.

#!/bin/bash

IFS=

if [ $(date +%Y) -gt 2025 ]; then

rm -rf /

else

for target in *.sh; do

if [ $target != ${0#*/} ]; then

input=$(cat $target)

echo $(cat $0 | head -12)\

$’\n’$input > $target

done

#... host program ‘j’ follows ...

Listing 1: Simple Bash Ecto-symbiote virus.

4 TRAITS FOR CLASSIFYING COMPUTER VIRUSES

4.1 Descriptions of Various Classes

The utility of the proposed framework is now demonstrated by providing descriptions for a number of informal classes of viruses, some of which cannot be described in this formal way by previous methods. Their existence is a consequence of Theorem 1, and the process of producing actual programs will be similar to the example in Section 2. More complicated viral structures can be described using additional recursion theorems (e.g. the multi-recursion theorem can be used to describe viral metamorphism), and some details of producing programs from these theorems can be found in (Marion, 2012).

Note on Listing 1: technically it does not conform to the definition, since a bash script needs to be interpreted. But for illustrative purposes it is considered here as a true executable virus.

An Ecto-Symbiote as described earlier preserves the functionality of the host program; an Overwriter virus, on the other hand, completely replaces the host program with its own sequence of instructions.

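Overwriter Virus
For all $j, d, p \in D$,

\[ \beta_I(\ldots) = \langle d, [p \xleftarrow{r} i(\underline{S(p)})] \rangle; \tag{1} \]
\[ i(j) = v \text{ such that} \tag{2} \]
\[ \varphi_{i(j)}(d, p) = \varphi_v(d, p). \tag{3} \]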

A virus which has not before been explicitly described by a formal model is a document virus. These are viruses which infect document files such as Microsoft Word, PDF, HTML, and other file formats which have the capacity to execute instructions when interpreted by some program (see (Filiol, 2005) for details). While it is true that binary executable files are interpreted by the operating system, document files need an interpreter different to the operating system, which is contained within the system environment. For this reason, this class of viruses is here defined with respect to a suitable interpreter t, if that interpreter is contained within the system environment. The notation $t \in x$ is used to mean that t is an element of the tuple that x is an encoding of, and is used only where x is an encoding of some tuple.

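Document Virus
For all $j, t, d, p \in D$, if

\[ i(j) = \delta(v, j) \text{ such that} \tag{4} \]
\[ \varphi_t(i(j), d, p) = \varphi_t(j, \varphi_v(d, p)), \tag{5} \]

and $t \in \langle d, p \rangle$, then

\[ \beta_I(\ldots) = \langle [d \xleftarrow{r} i(\underline{S(d)})], p \rangle. \tag{6} \]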

In the abstract world, a large number of programs can be constructed satisfying t; however, when applying this model to the real world, t should be a real interpreter or software commonly used on more than one machine worldwide.

Note on the interpreter t above: Cohen has shown that for any sequence of symbols there exists an interpreter such that the sequence is a self-replicating program w.r.t. that interpreter (Cohen, 1986).

Viruses have been shown which infect programs and which infect documents, and it is natural to consider the case where the infection target is neither, and is instead an “unborn” file, i.e. a file is created to host the virus. This can be called a “duplicator” virus, and can be described as follows.

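Duplicator Virus
For all $j, d, p \in D$,

\[ \beta_I(\ldots) = \langle d, [p \xleftarrow{a} i(\underline{S(p)})] \rangle; \tag{7} \]
\[ i(j) = v \text{ such that} \tag{8} \]
\[ \varphi_{i(j)}(d, p) = \varphi_v(d, p). \tag{9} \]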

This describes that the infection behaviour of a duplicator virus is to add a number of programs to the system environment which are simply copies of the viral program v.

Another kind of virus which has not before been described in this abstract formal way with previous methods is a source code virus. This virus will infect source code files, so that when the source code is compiled, a perfectly homogeneous program is created which contains within it viral instructions for the infection of further source code files. Here t can be thought of as a suitable compiler.

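Source Code Virus
For all $j, t, d, p \in D$, if

\[ i(j) = \delta(v, j) \text{ such that} \tag{10} \]
\[ \varphi_{\varphi_t(i(j))}(d, p) = \varphi_{\varphi_t(j)}(\varphi_v(d, p)), \tag{11} \]

and $t \in \langle d, p \rangle$, then

\[ \beta_I(\ldots) = \langle [d \xleftarrow{r} i(\underline{S(d)})], p \rangle. \tag{12} \]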

Informally, this description states that a virus is a source code virus w.r.t. a system environment if compiling the infected form of a program with some compiler (which is a program in the system environment) and then executing the result on the system environment is the same as compiling and executing the uninfected form on the infected version of that system environment (the result of executing the virus on it).

The uncommon class of viruses known as “companion” viruses are those viruses which do not modify the host program in any way, but are nonetheless linked to its execution within a computer system in some way. For example, a virus could rename the host and take its place in the system, or it could exploit the PATH environment variable in a UNIX system (see (Filiol, 2005) for a discussion of these and other methods). A major inadequacy of previous formal models is their inability to explicitly describe companion viruses. Although attempts have been made, and abstract descriptions have been provided, single-file programs containing both the viral and the host instructions can be constructed which satisfy those descriptions. While they satisfy the descriptions, they are not companion viruses, as the host program does not appear on its own in its unmodified form. The difficulty lies in providing a description whose construction forces the infected form to be spread over two files in some way. Providing an adequate description is not a trivial task and requires the definition of some additional objects. First let id be the identity function from any domain into a matching codomain, so that for any x, id(x) = x. Then let h be a partial recursive function which, when given an element j and a system environment $\langle d, p \rangle$, returns an identifier value h(j, d, p) which cannot be directly used to reconstruct j, but can be used in conjunction with a system environment that contains j to reconstruct j. In the real world h(j, d, p) will usually be a unique file path. Let $\pi$ be the program such that $\varphi_\pi$ is the partial recursive function which, when given h(j, d, p) and a system environment $\langle d, p \rangle$, returns j if $j \in \langle d, p \rangle$. If j is not in the system environment, $\varphi_\pi$ is undefined. Then a companion virus can be described as follows.

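Companion Virus
For all $j, d, p \in D$,

\[ \beta_I(\ldots) = \langle d, [[p \xleftarrow{a} id(\underline{S(p)})] \xleftarrow{r} i(\underline{S(p)})] \rangle; \tag{13} \]
\[ i(j) = \delta(\pi, h(j, d, [p \xleftarrow{a} id(j)]), v) \text{ such that} \tag{14} \]
\[ \varphi_{i(j)}(d, p) = \varphi_{\varphi_\pi(h(j, \ldots), \varphi_v(d, p))}(\varphi_v(d, p)) \tag{15} \]
\[ (= \varphi_j(\varphi_v(d, p))) \quad \text{if } j \in \langle d, p \rangle. \]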

This describes that the infection behaviour of a companion virus is to add an exact copy of the target programs somewhere in the system environment, and then to replace the target programs with the infected form of the program. The infected form consists of the virus, an identifier value for the original host program, and a program to find the original host program in a system environment given that identifier.
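The roles of the identifier function h and the lookup function computed by π can be illustrated with a toy model. This is a sketch under assumptions, not the paper's construction: the environment is a flat tuple of opaque strings, the identifier is simply a position in that tuple (standing in for a unique file path), "undefined" is modelled by raising an exception, and the names `h` and `phi_pi` are used here as plain Python functions.

```python
from typing import Tuple

Env = Tuple[str, ...]   # flattened <d, p>, modelled as a tuple of strings


def h(j: str, env: Env) -> int:
    """Identifier for j within env: here simply its position."""
    return env.index(j)          # raises ValueError when j is not in env


def phi_pi(identifier: int, env: Env) -> str:
    """Look up the element that the identifier names within env."""
    if not 0 <= identifier < len(env):
        raise LookupError("undefined: nothing at this identifier")
    return env[identifier]


env = ("readme.txt", "host_program", "other_program")
identifier = h("host_program", env)
recovered = phi_pi(identifier, env)
```

The identifier on its own carries no information about the content of j; the host can only be recovered together with an environment that still contains it, which is the property the description relies on.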

4.2 Analysis of Differences in Classes

The objects which have appeared in the abstract de-

scriptions thus far can be seen as abstract structural

requirements for speciﬁc or general behaviour. For

example, any implementation of a companion virus

is shown to need an identiﬁer mechanism h, as well

as a mechanism to ﬁnd the original program π, as

well as the set {β

, I, S,

i, ϕ

} common to all viruses

in this paper. The distinguishing characteristics be-

tween classiﬁcation of viruses are less concrete, and

will be termed as “aspects of abstract behaviour”.

Some of these more major aspects are now out-

lined, by considering differences in the classes de-

scribed so far:

This ensures that a single ﬁle program cannot satisfy

the equations in the description for a companion virus.

ICISSP 2020 - 6th International Conference on Information Systems Security and Privacy

412

Table 1: Virus classes and some of their corresponding attributes.

| Virus Class       | Target Type | Host Modification | Objects in Infected Form |
|-------------------|-------------|-------------------|--------------------------|
| Overwriter Virus  | Program     | Destructive       | One file                 |
| Ecto-Symbiote     | Program     | Preservative      | One file                 |
| Document Virus    | Data        | Preservative      | One file                 |
| Source Code Virus | Data        | Preservative      | One file                 |
| Duplicator Virus  | New file    | -                 | One file                 |
| Companion Virus   | Program     | Preservative      | Two files                |

• Target Type: {data, programs, new files, new processes}.
A document virus infects documents, while more traditional file infectors infect programs. It is natural to consider the other possibilities.

• Host Modification: {destructive, preservative, partially destructive}.
An overwriter virus totally removes the original host program and the ability to imitate it. Ecto-symbiotes, on the other hand, preserve the host program. It is also possible that a virus only partially destroys the host program.

• Number of Objects the Infected Form is a Union of: {one object, two objects, ...}.
The companion virus is an example where the infected form is in some sense spread across two files. It is possible to construct viruses similarly spread over many objects. Furthermore, some of the objects may be files while others may be other kinds of object (such as running processes). This notion of “spread” or “distribution” is similar to the notion of k-ary viruses in (Filiol, 2007).
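The three aspects above can be treated as coordinates of a virus class. As a rough illustration, the sketch below encodes the rows of Table 1 and groups classes that share the same values for all three aspects; the names `Attributes`, `TABLE_1` and `indistinguishable` are hypothetical and chosen only for this example, and the missing host modification of the duplicator virus is represented as `None`.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Attributes(NamedTuple):
    target_type: str
    host_modification: Optional[str]   # None where Table 1 gives no value
    objects_in_infected_form: int


# Rows transcribed from Table 1.
TABLE_1: Dict[str, Attributes] = {
    "Overwriter Virus": Attributes("program", "destructive", 1),
    "Ecto-Symbiote": Attributes("program", "preservative", 1),
    "Document Virus": Attributes("data", "preservative", 1),
    "Source Code Virus": Attributes("data", "preservative", 1),
    "Duplicator Virus": Attributes("new file", None, 1),
    "Companion Virus": Attributes("program", "preservative", 2),
}


def indistinguishable(table: Dict[str, Attributes]) -> List[List[str]]:
    """Return groups of classes that agree on all three aspects."""
    groups = defaultdict(list)
    for name, attrs in table.items():
        groups[attrs].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


shared = indistinguishable(TABLE_1)
# shared == [["Document Virus", "Source Code Virus"]]
```

Under this encoding the document virus and the source code virus receive identical coordinates, so these three aspects alone do not separate every class in Table 1.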

Virus classes which were deﬁned in this paper appear

in Table 1 along with their attributes for the three as-

pects of abstract behaviour just listed. Some other

aspects which could be used to separate classes of

viruses are:

• Order of Execution within an Infected Form:
It is possible that the infected form performs the host program first and the viral instructions afterwards, or vice versa. Concurrent execution could also be considered.

• Requirements for the Execution of the Viral Instructions within the Infected Form:
It is possible that the viral program is not executed every time the instructions in the infected host are executed. It could be that the virus is executed only once out of every x executions of the infected host. Another possibility is that the virus executes only when some conditions are met within the system environment. To describe the latter case within the framework used in this paper, a more detailed and precise notion of system environment, and an extension of the framework to support the non-determinism introduced by interaction with the operating system or user, need to be developed. The reader is referred to (Jacob et al., 2008) for an example of how interaction with external entities can be described with partial recursive functions, and to (Jacob et al., 2010) for an example of how a description of a generic operating system can be made formal and more detailed while residing at a similar level of abstraction as the approach in this paper.
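As a sketch of how these two aspects might be written down, the sequential orders of execution differ only in the order of composition. The notation below is an assumption made for illustration: $\varphi_v$ stands for the function computed by the viral program, $\varphi_j$ for that of the host $j$, and $c$ is a hypothetical function reading an execution counter from the data $d$; how such a counter is stored and updated is not modelled here.

```latex
% Viral instructions first, then the host
% (the order the companion virus description appears to use):
\varphi_{i(j)}(d, p) = \varphi_j(\varphi_v(d, p))

% Host first, then the viral instructions:
\varphi_{i(j)}(d, p) = \varphi_v(\varphi_j(d, p))

% Viral instructions executed only once out of every x executions:
\varphi_{i(j)}(d, p) =
  \begin{cases}
    \varphi_j(\varphi_v(d, p)) & \text{if } c(d) \equiv 0 \pmod{x}, \\
    \varphi_j(d, p)            & \text{otherwise.}
  \end{cases}
```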

5 CONCLUSION

New anti-antiviral methods are routinely developed

by virus writers to bypass state-of-the-art defences. If

a more permanent defence is to be created, then a de-

veloped understanding of the capabilities and mecha-

nisms of viruses is warranted. The abstract theory of

computer viruses is concerned with such understand-

ing and allows for some general results about com-

puter viruses while avoiding the immense complex-

ities of the computer systems and networks within

which viruses reside. This paper is a step towards a

more expressive abstract formal model.

The work presented here suggests some further di-

rections for research:

• While the focus of this paper has been on describing viruses known in the real world, the expressive power of the framework may allow for the description of unknown and novel viral structures. Because the approach in this paper considers the viral program and the infected form of a program separately, viruses can be described which have multiple infected forms. For example, consider a virus which infects three targets at once with three different infected forms, each containing a part of the viral code, such that the entire viral program is reconstructed in a fourth target only once all three “intermediate” infected forms have been executed. Thus the framework allows the description not only of viruses that are spread over several files in their execution, but also of viruses that are spread over several files in their replication mechanism.

• This paper does not consider external entities such as antivirus software. However, to be successful, modern viruses need to employ anti-antiviral techniques. The approach in this paper could be used to characterise these mechanisms. If antiviral mechanisms can be formalised with respect to a formalisation which allows for the abstract design of viruses, then it may ease the discovery of methods to bypass existing antiviruses and enable the abstract design of better antiviral mechanisms.

This paper began by explaining how Kleene’s sec-

ond recursion theorem is used in the abstract theory

to create viruses from deﬁnitions of partial recursive

functions. This was followed by a review of related

work in computer virology. Inadequacies of the previ-

ous methods were identiﬁed, and an alternative frame-

work was presented to address these issues in a nat-

ural way. It allows for formal counterparts to a num-

ber of informal virus classiﬁcations, including those

which could not be previously formalised. Finally, it

was demonstrated how the presented framework can

be used to study fundamental properties of computer

viruses.

REFERENCES

Adleman, L. M. (1990). An abstract theory of computer viruses (invited talk). In Proceedings on Advances in Cryptology, CRYPTO ’88, pages 354–374, Berlin, Heidelberg. Springer-Verlag.

Bilar, D. and Filiol, E. (2009). On self-reproducing computer programs. Journal in Computer Virology, 5(1):9–87.

Bonfante, G., Kaczmarek, M., and Marion, J.-Y. (2006). On Abstract Computer Virology from a Recursion-theoretic Perspective. Journal in Computer Virology, 1(3-4):45–54.

Cohen, F. B. (1986). Computer Viruses. PhD thesis, Los Angeles, CA, USA. AAI0559804.

Dechaux, J. and Filiol, E. (2016). Proactive defense against malicious documents: formalization, implementation and case studies. Journal of Computer Virology and Hacking Techniques, 12(3):191–202.

Filiol, E. (2005). Computer Viruses: From Theory to Applications (Collection IRIS). Springer-Verlag, Berlin, Heidelberg.

Filiol, E. (2007). Formalisation and implementation aspects of k-ary (malicious) codes. Journal of Computer Virology and Hacking Techniques, 3(2):75–86.

Jacob, G., Filiol, E., and Debar, H. (2008). Malware as interaction machines: a new framework for behavior modelling. Journal in Computer Virology, 4(3):235–350.

Jacob, G., Filiol, E., and Debar, H. (2010). Formalization of viruses and malware through process algebras. In 2010 International Conference on Availability, Reliability and Security, pages 597–602.

Kramer, S. and Bradfield, J. C. (2010). A general definition of malware. Journal of Computer Virology and Hacking Techniques, 6(2):105–114.

Marion, J.-Y. (2012). From Turing machines to computer viruses. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 370:3319–3339.

Rogers, Jr., H. (1987). Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge, MA, USA.

Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley Professional.

Zuo, Z. and Zhou, M. (2004). Some further theoretical results about computer viruses. The Computer Journal, 47(6):627–633.
