// [HINT] type of 'x' is ambiguous in
// expression at right-hand of
// assignment: assuming
// initialization type num[2]
// [WARNING] possible truncation
// detected in assignment:
// num[3] :> num[2]
}
where x : num[2] := 11
In the first statement, where x is incremented by 1, the type of the variable
is annotated both where it is used as an expression term and where it is the
target on the left side of the assignment. On the right-hand side its type is
the initialization type num[2] that appears in the global declaration, which
happens to be its current type at the beginning of the program. On the
left-hand side x should be given a wider numeric type, because the sum of a
num[2] and a literal whose type is num[1] would lead to num[3]¹. The result,
however, is truncated to fit the initialization type, as the COBOL run-time
would do; since the resulting storage class is still num, the final type of x
is equivalent to its initialization type.
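
The typing of the increment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analyzer itself: the names Ty, add_type and assign are hypothetical, and the sum rule used here (one digit more than the wider operand) is a guess that reproduces the num[2] + num[1] = num[3] example; the actual rules are those of table 6.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Ty:
    """A flat type: storage class plus format length, printed as num[2]."""
    cls: str
    length: int

    def __str__(self) -> str:
        return f"{self.cls}[{self.length}]"


def add_type(a: Ty, b: Ty) -> Ty:
    # ASSUMPTION: the result of a sum is one digit longer than the wider
    # operand. This stands in for the real rule given in table 6.
    if a.cls != "num" or b.cls != "num":
        raise TypeError(f"cannot add {a} and {b}")
    return Ty("num", max(a.length, b.length) + 1)


def assign(source: Ty, init: Ty) -> Tuple[Ty, List[str]]:
    """Type given to the target after an assignment, plus any warnings.

    The value is cut down to the length of the initialization type, while
    the storage class of the source is kept.
    """
    warnings: List[str] = []
    if source.length > init.length:
        warnings.append(
            f"possible truncation detected in assignment: {source} :> {init}"
        )
        source = Ty(source.cls, init.length)
    return source, warnings


# x : num[2], incremented by a literal of type num[1]
x_init = Ty("num", 2)
rhs = add_type(x_init, Ty("num", 1))    # type of the sum
new_x, warnings = assign(rhs, x_init)   # type of x after the assignment
print(rhs, new_x, warnings)
```

Here rhs is num[3] and new_x is num[2], with a single truncation warning. Under the same assumption, assigning an alpha[3] value to the same variable gives alpha[2].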
The system tracks the types that variables are supposed to have from a
type-flow point of view, i.e. data movements are followed across expressions
and statements, and the type of what each variable is supposed to contain is
recorded.
Encountering the if statement makes the analyzer descend into its then block.
A truncation is detected there, since alpha[3] is certainly wider than the
target type num[2], and the truncated type alpha[2], which fits the
initialization type, is given to x. This information must then be merged with
what was collected before branching, which is why the type of x in the
expression at the right-hand side of the assignment after the if block is not
a simple type. The flow-type has grown here due to the merge and now consists
of all the types x might have at that point. That makes the choice ambiguous
when typing the sum operation, so the system falls back to the initial type
declaration. This might seem odd, but it is a viable solution: in COBOL every
variable strictly adheres to its picture declaration, so falling back to it is
not an unsafe decision when better information cannot be reconstructed.
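
The merge at the end of the if block and the fallback that follows can be illustrated with a minimal Python sketch. It assumes that a flow-type can be modelled as a set of flat types; the names Ty, merge and resolve are hypothetical and the hint text only mimics the analyzer output shown in the listing.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Ty(NamedTuple):
    """A flat type: storage class plus format length, printed as num[2]."""
    cls: str
    length: int

    def __str__(self) -> str:
        return f"{self.cls}[{self.length}]"


FlowType = FrozenSet[Ty]  # ASSUMPTION: a flow-type is a set of candidate types


def merge(before: FlowType, branch: FlowType) -> FlowType:
    """Join what was known before branching with what the branch produced."""
    return before | branch


def resolve(name: str, flow: FlowType, init: Ty) -> Tuple[Ty, List[str]]:
    """Pick the type of a variable used as an expression term.

    A single candidate is used as is; otherwise the choice is ambiguous and
    the initialization type is assumed, with a hint for the user.
    """
    if len(flow) == 1:
        return next(iter(flow)), []
    hint = (
        f"type of '{name}' is ambiguous in expression at right-hand of "
        f"assignment: assuming initialization type {init}"
    )
    return init, [hint]


init = Ty("num", 2)
before_if = frozenset({Ty("num", 2)})      # x after the first statement
then_block = frozenset({Ty("alpha", 2)})   # x after the truncated assignment
after_if = merge(before_if, then_block)    # both candidates survive

chosen, hints = resolve("x", after_if, init)
print(chosen, hints)
```

After the merge the flow-type of x holds both num[2] and alpha[2], so resolve returns the initialization type num[2] together with one hint. Keeping the fallback in a single function mirrors the design choice discussed above: no constraint solving is needed, only a lookup of the declared type.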
1.2 Comparisons and Motivation
As already mentioned, the legacy software analysis system thoroughly presented
in (Moonen, 2003)² relies on mechanisms for producing information about types
that mainly serve Program Understanding techniques, Concept Analysis (Kuipers
and Moonen, 2000) and other high-level elaborations. In general, its scope is
wider than ours and does not entirely overlap with it. Nonetheless the two have
something in common, namely giving interesting types to COBOL variables, which
can be taken into consideration for making a comparison with what we believe is
the most advanced type-based system for COBOL analysis available to date.
• We translate COBOL into a simpler intermediate language, as (van Deursen and
Moonen, 1998) does, though without leaving out important language constructs
whose behavior is relevant to typing real-world programs, such as the goto,
perform and perform-thru jump statements, call-by-reference procedure calls
and if statements.
• Our type syntax is more complete, clearer and open to more orthodox type
manipulation, as it doesn’t provide just a plain AST-ization of COBOL picture
declarations³.
• The type inference⁴ rules given in (van Deursen and Moonen, 2001) are
sometimes trivial. We define a type system that reconstructs more detailed
type information: e.g. our type rules for arithmetic operators in table 6
recalculate the length of the resulting type format, in order to include
within the type itself as much information as possible about changes in value
ranges.
• We don’t infer a type equivalence when two or more types are expected to be
the same (as would happen in ML in a homogeneous binary application, for
example). Our system rather falls back to a variable's initialization type in
case a type mismatch or ambiguity is detected. This trade-off makes type
derivations simpler, does not necessarily imply a loss of information and
reflects COBOL run-time semantics better.
¹ In general, a number made of 2 digits plus a number made of 1 digit could
possibly lead to a number made of 3 digits, as in 99 + 9 = 108. See type rules
for expressions in table 6 for details on how arithmetic operations formally
affect numeric type formats.
² That is a Ph.D. thesis collecting previous works on the same subject and
anticipating some that had yet to come. In general, that system has been
proposed several times across several articles with some additions; we might
therefore refer to any of (van Deursen and Moonen, 1998), (van Deursen and
Moonen, 2000), (van Deursen and Moonen, 2001), (Kuipers and Moonen, 2000) or
(Moonen, 2003).
³ The syntax of types in (van Deursen and Moonen, 1998) oddly carries along the
variable identifiers and picture format strings as is, leaving unclear how the
type environment and type comparisons formally relate to them.
⁴ That system uses the word inference, with a clear reference to the world of
ML and functional languages, though we’d prefer reconstruction, as there is
actually no use of type variables and unification for resolving a set of
constraints over type equations.
ICSOFT 2011 - 6th International Conference on Software and Data Technologies