Remark 4. For better accuracy, $m_{M_i}$ can be evaluated using the formulas from Section 2.1, which avoids a double sign bit in the general case (given by the +1 in eq. (24)). Moreover, if $\ell_{c_i} + \ell_{v_i} > \ell_{M_i}$, then $\ell_{c_i} + \ell_{v_i} - \ell_{M_i}$ zeros are added to the right of the $p_i$s to ensure $\ell_i = \ell_f - \delta$ for these $p_i$s.
Error evaluation. Adding two numbers in FxP arithmetic requires aligning them onto the same LSB using right-shifts. Each such shift may round the operand and thus introduce a numerical error. After the second step of LSB formatting, where rounding errors may be introduced, all $p_i$s have the same LSB, i.e. $\ell_f - \delta$. No right-shift is needed to align the operands of the additions, and therefore no additional rounding errors are introduced by the global sum of the $p_i$s.
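The alignment step can be illustrated with a minimal Python sketch. It assumes each product is stored as an integer mantissa together with its LSB position, so that its value is `mantissa * 2**lsb`; the helper name `align_to_lsb` and the sample values are hypothetical and only serve the illustration. Truncation is used as the rounding mode.

```python
def align_to_lsb(mantissa: int, lsb: int, target_lsb: int) -> int:
    """Re-express mantissa * 2**lsb on the LSB position target_lsb."""
    if target_lsb >= lsb:
        # Right-shift: drops (target_lsb - lsb) low bits. Python's >> floors,
        # which matches truncation in two's complement.
        return mantissa >> (target_lsb - lsb)
    # The product is coarser than the target: pad zeros on the right (exact).
    return mantissa << (lsb - target_lsb)


# Hypothetical products given as (mantissa, LSB position).
products = [(37, -6), (-21, -9), (5, -3)]
target_lsb = -5  # plays the role of l_f - delta

aligned = [align_to_lsb(m, l, target_lsb) for m, l in products]

# All mantissas now share the same LSB, so the sum is a plain integer
# addition and introduces no further rounding error.
total = sum(aligned)
print(aligned, total)
```

Only the first two products lose bits here; the third one is zero-padded and stays exact.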
The total number of right-shifts involved in the
first step of our method can be bounded as follows.
Proposition 3. With this LSB formatting technique and for an $n$th-order SoP, the number of right-shifts is bounded by $n + 1$: at most one right-shift per multiplier (denoted $d_i$ for multiplier $M_i$) and exactly one final right-shift (denoted $d_f$). Their values are:
$$d_i \triangleq \ell_f - \delta - \ell_{c_i} - \ell_{v_i} \quad \forall i \in I \tag{26}$$
and
$$d_f \triangleq \delta \tag{27}$$
where $I = \{\, i \mid 1 \leqslant i \leqslant n \text{ and } \ell_f - \delta > \ell_{c_i} + \ell_{v_i} \,\}$.
Proof: $d_i$ is the right-shift in multiplier $M_i$ if a right-shift is necessary, i.e. if $\ell_f - \delta > \ell_{c_i} + \ell_{v_i}$, so the number of bits to remove to ensure $\ell_i = \ell_f - \delta$ is $\ell_f - \delta - \ell_{c_i} - \ell_{v_i}$. All the additions are computed on $w_f + \delta$ bits, and the result is $w_f$ bits long, so the final right-shift value is $\delta$.
Remark 5. In Proposition 3 only non-zero right-shifts are considered. If $d_i$ is defined as $\max(\ell_f - \delta - \ell_{c_i} - \ell_{v_i}, 0)$ rather than just $\ell_f - \delta - \ell_{c_i} - \ell_{v_i}$, all multipliers have a right-shift, possibly null, and so the exact number of right-shifts is $n + 1$.
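Equations (26) and (27), together with the variant of Remark 5, can be sketched in a few lines of Python. The function names and the sample LSB positions below are hypothetical; the inputs are the LSB positions of the coefficients $c_i$ and variables $v_i$, the final LSB $\ell_f$ and $\delta$.

```python
def right_shifts(lsb_c, lsb_v, lsb_f, delta):
    """Non-zero shifts of eq. (26), keyed by 1-based index i in I, and d_f of eq. (27)."""
    target = lsb_f - delta
    d = {}
    for i, (lc, lv) in enumerate(zip(lsb_c, lsb_v), start=1):
        if target > lc + lv:          # membership test for the set I
            d[i] = target - lc - lv   # eq. (26)
    return d, delta                   # eq. (27): d_f = delta


def right_shifts_all(lsb_c, lsb_v, lsb_f, delta):
    """Remark 5 variant: one (possibly null) shift per multiplier."""
    target = lsb_f - delta
    return [max(target - lc - lv, 0) for lc, lv in zip(lsb_c, lsb_v)], delta


# Hypothetical third-order example.
lsb_c = [-7, -6, -2]
lsb_v = [-4, -3, -1]
d, d_f = right_shifts(lsb_c, lsb_v, lsb_f=-6, delta=2)
d_all, _ = right_shifts_all(lsb_c, lsb_v, lsb_f=-6, delta=2)
print(d, d_f, d_all)
```

In this example only two multipliers need a shift, so the count of non-zero right-shifts is 3, below the bound $n + 1 = 4$.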
Finally, it is possible to bound the error introduced by our method. As seen in (Hilaire and Lopez, 2013), right-shifting a variable $x$ (with $(m, \ell)$ as FPF) by $d$ bits is equivalent to adding an interval error $[e] = [\underline{e}; \overline{e}]$ with
$$
\begin{array}{c|c|c}
 & \text{Truncation} & \text{Round to the nearest} \\ \hline
[\underline{e}; \overline{e}] & [-2^{\ell+d} + 2^{\ell};\ 0] & [-2^{\ell+d-1} + 2^{\ell};\ 2^{\ell+d-1}]
\end{array}
\tag{28}
$$
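The intervals of (28) can be checked numerically. The sketch below is an illustration under stated assumptions: values are modelled as integer mantissas scaled by $2^{\ell}$, round-to-nearest is implemented by adding half an output LSB before shifting (ties round up), and all function names are hypothetical. Exact rational arithmetic avoids floating-point noise.

```python
from fractions import Fraction


def pow2(e: int) -> Fraction:
    """Exact power of two, also for negative exponents."""
    return Fraction(2) ** e


def shift_error_interval(lsb: int, d: int, mode: str = "truncation"):
    """Interval [e_low, e_high] of (28) for a right-shift of d >= 1 bits."""
    if mode == "truncation":
        return (-pow2(lsb + d) + pow2(lsb), Fraction(0))
    if mode == "nearest":
        return (-pow2(lsb + d - 1) + pow2(lsb), pow2(lsb + d - 1))
    raise ValueError(f"unknown rounding mode: {mode}")


def shift_right(mantissa: int, d: int, mode: str = "truncation") -> int:
    """Right-shift an integer mantissa by d bits in the given rounding mode."""
    if mode == "nearest":
        mantissa += 1 << (d - 1)  # assumption: ties are rounded up
    return mantissa >> d


lsb, d = -6, 3
observed = {}
for mode in ("truncation", "nearest"):
    # Error = shifted value minus original value, over a full range of mantissas.
    errors = [
        shift_right(m, d, mode) * pow2(lsb + d) - m * pow2(lsb)
        for m in range(-64, 64)
    ]
    observed[mode] = (min(errors), max(errors))
    print(mode, observed[mode], shift_error_interval(lsb, d, mode))
```

With these assumptions the observed extreme errors coincide with the interval bounds, which suggests the bounds of (28) are tight for this rounding convention.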
The global interval error of the LSB formatting technique can therefore be evaluated with the following proposition.
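Proposition 4. The global interval error using the LSB formatting technique is $[e] = [\underline{e}; \overline{e}]$ with:
Truncation:
$$\underline{e} = \sum_{i \in I} \left( -2^{\ell_i + d_i} + 2^{\ell_i} \right) - 2^{\ell_f} + 2^{\ell_f - \delta} \tag{29}$$
$$\overline{e} = 0 \tag{30}$$
Round to nearest:
$$\underline{e} = \sum_{i \in I} \left( -2^{\ell_i + d_i - 1} + 2^{\ell_i} \right) - 2^{\ell_f - 1} + 2^{\ell_f - \delta} \tag{31}$$
$$\overline{e} = \sum_{i \in I} \left( 2^{\ell_i + d_i - 1} \right) + 2^{\ell_f - 1} \tag{32}$$
with $I = \{\, i \mid 1 \leqslant i \leqslant n \text{ and } \ell_f - \delta > \ell_{c_i} + \ell_{v_i} \,\}$ and $\ell_i = \ell_{c_i} + \ell_{v_i}$, where $\ell_{c_i}$ and $\ell_{v_i}$ are the positions of the LSBs of $c_i$ and $v_i$ respectively.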
Proof: By using (28) on a multiplier $i$ in round-to-nearest mode, $\underline{e}$ equals $-2^{\ell_i + d_i - 1} + 2^{\ell_i}$ (and $-2^{\ell_i + d_i} + 2^{\ell_i}$ in truncation mode), where $d_i$ is the right-shift value given by Proposition 3 and $\ell_i$ is the initial LSB of the multiplier result, i.e. the optimal LSB, which is the sum of the LSBs of the product operands $c_i$ and $v_i$. For the final right-shift, the initial LSB equals $\ell_f - \delta$ and the final result is right-shifted by $\delta$ bits.
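The bounds (29) to (32) can be evaluated with a short Python sketch. The function name and the sample LSB positions are hypothetical, and exact rationals are used so that the sums of powers of two are not affected by floating-point rounding.

```python
from fractions import Fraction


def pow2(e: int) -> Fraction:
    """Exact power of two, also for negative exponents."""
    return Fraction(2) ** e


def global_error_interval(lsb_c, lsb_v, lsb_f, delta, mode="truncation"):
    """Return (e_low, e_high) following eqs. (29)-(32)."""
    if mode not in ("truncation", "nearest"):
        raise ValueError(f"unknown rounding mode: {mode}")
    nearest = mode == "nearest"
    half = 1 if nearest else 0      # the "-1" in the exponents of (31)-(32)
    target = lsb_f - delta
    low = high = Fraction(0)
    for lc, lv in zip(lsb_c, lsb_v):
        li = lc + lv                # optimal LSB of the product
        if target > li:             # i belongs to I: this multiplier is shifted
            di = target - li        # eq. (26)
            low += -pow2(li + di - half) + pow2(li)
            if nearest:
                high += pow2(li + di - 1)
    # Contribution of the final right-shift of delta bits.
    low += -pow2(lsb_f - half) + pow2(target)
    if nearest:
        high += pow2(lsb_f - 1)
    return low, high


# Hypothetical third-order example.
lsb_c = [-7, -6, -2]
lsb_v = [-4, -3, -1]
trunc = global_error_interval(lsb_c, lsb_v, lsb_f=-6, delta=2, mode="truncation")
near = global_error_interval(lsb_c, lsb_v, lsb_f=-6, delta=2, mode="nearest")
print(trunc, near)
```

For these sample values the truncation interval is $[-35/2048; 0]$ and the round-to-nearest interval is $[-11/2048; 24/2048]$.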
Remark 6. The precise bounds of the global interval error given in Proposition 4 can be enclosed by powers of 2. Indeed, for the truncation rounding mode the global interval error is included in $]-2^{\ell_f + 1}; 0]$, whereas for the round-to-nearest rounding mode it is included in $]-2^{\ell_f}; 2^{\ell_f}[$.
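For the truncation case, the enclosure can be sketched as the following short derivation. It relies on a condition on $\delta$ that is an assumption here, since $\delta$ is defined earlier in the paper: the number of shifted multipliers $|I|$ must satisfy $|I| - 1 \leqslant 2^{\delta}$.

```latex
% By (26), \ell_i + d_i = \ell_f - \delta for every i \in I, so (29) becomes
\underline{e}
  = -|I|\,2^{\ell_f-\delta} + \sum_{i\in I} 2^{\ell_i} - 2^{\ell_f} + 2^{\ell_f-\delta}
  = -2^{\ell_f} - (|I|-1)\,2^{\ell_f-\delta} + \sum_{i\in I} 2^{\ell_i}.
% If I is empty, \underline{e} = -2^{\ell_f} + 2^{\ell_f-\delta} > -2^{\ell_f+1}.
% Otherwise the last sum is strictly positive, hence
\underline{e} > -2^{\ell_f} - (|I|-1)\,2^{\ell_f-\delta}
  \geqslant -2^{\ell_f} - 2^{\ell_f} = -2^{\ell_f+1}
  \quad \text{whenever } |I| - 1 \leqslant 2^{\delta}.
```

The round-to-nearest case follows the same pattern, with its own condition relating $|I|$ and $\delta$.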
In (Lopez et al., 2012), all $p_i$s have different LSBs, and therefore the global error depends on the order of the additions. Consequently, all the different evaluation schemes (ES), i.e. all the different possible orders of the additions, are generated and the choice is made mainly for the ES with a minimal error. Here, all ES have the same global error value (Proposition 4), so the error cannot be a criterion for choosing the best ES representing the sum. The criterion chosen in Section 5 is the infinite parallelism criterion, i.e. the most parallelizable ES.
3.3 MSBs Formatting
The MSBs of the $p_i$s located at a position greater than the final MSB $m_f$ can be removed using a new formalization of Jackson's Rule (Jackson, 1970). This rule states that in consecutive additions and/or subtractions in two's complement arithmetic, some intermediate results and operands may overflow. As long as the final result representation can handle the final result without overflow, the result is valid.
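The rule can be observed with a minimal Python sketch that emulates 8-bit two's complement wrap-around. The helper name `wrap` is hypothetical, and the operands are the same as in the example that follows.

```python
def wrap(value: int, bits: int = 8) -> int:
    """Reduce value modulo 2**bits into the signed two's complement range."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


acc = 0
for term in (104, 82, -94):
    acc = wrap(acc + term)  # every partial sum is kept on 8 bits
    print(term, acc)

# The partial sum 104 + 82 wraps around to -70, yet the final accumulator
# holds the exact result because 92 fits in an 8-bit signed word.
print(acc)
```

The intermediate overflow is harmless because all additions are performed modulo $2^8$ and the exact final sum lies in $[-128; 127]$.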
Example 2. Let us consider a sum $S$ of three 8-bit integers with two's complement arithmetic, for example $104 + 82 - 94$. The result $S = 92$ is in the range of 8-bit signed numbers, but the intermediate sum $104 + 82$
PECCS 2014 - International Conference on Pervasive and Embedded Computing and Communication Systems