successive tuples of the quotient. As before, at these
instances (t > 0), higher order tuples ($T_{n-2}$, $T_{n-3}$, $T_{n-4}$,
$T_{n-5}$, etc.) are progressively loaded into register 2.
The subtractor then outputs the partial quotient and
remainder for each time instant, while the
decrementor and the second subtractor normalize the
partial quotient and the remainder which are fed to
the output queue and to register 1 for the processing
of the next sample instant, respectively. The cycle
generates the quotient tuple bits for the whole
(non-fractional) part in the case of both integer and
floating point division.
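The per-instant cycle described above can be sketched in software. The sketch below is only an illustration under stated assumptions, not the circuit itself: it assumes the divisor is $2^p + 1$, that the dividend arrives as $p$-bit tuples with the most significant tuple first, and that the normalization step amounts to a decrement of the partial quotient and a subtraction from the divisor on a borrow. The names `to_tuples` and `serial_divide` are hypothetical.

```python
def to_tuples(value, p):
    """Split a non-negative integer into p-bit tuples, most significant first."""
    mask = (1 << p) - 1
    digits = []
    while value:
        digits.append(value & mask)
        value >>= p
    return digits[::-1] or [0]


def serial_divide(tuples, p):
    """Divide by 2**p + 1, consuming one dividend tuple per time instant.

    Assumption: this mirrors the register/subtractor description only loosely.
    Returns (quotient tuples, final remainder).
    """
    divisor = (1 << p) + 1
    remainder = 0              # plays the role of register 1
    quotient = []              # plays the role of the output queue
    for t in tuples:           # t plays the role of register 2
        # remainder * 2**p + t == remainder * divisor + (t - remainder)
        diff = t - remainder                   # first subtraction
        if diff >= 0:
            q = remainder
            remainder = diff
        else:
            # Borrow: decrement the partial quotient and normalize the
            # remainder with a second subtraction from the divisor.
            q = remainder - 1
            remainder = divisor - (remainder - t)
        quotient.append(q)
    return quotient, remainder
```

For example, with p = 4 the dividend 1000 splits into the tuples [3, 14, 8]; the sketch yields the quotient tuples [0, 3, 10] (that is, 58) and the remainder 14, in agreement with 1000 = 17 × 58 + 14. Only the current tuple and the previous partial remainder are needed at each step.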
The processing is terminated when all the input
dividend tuples have been processed. In the case of
integer division, the process remainder equals the
partial remainder obtained from the second
subtractor for the last time sequence instant (t = n-1).
In the case of floating point division, the fractional
part of the quotient is obtained as non-terminating,
recurring tuple bits computed by repeating the tuple
bits representing the partial remainder less 1 and its
$(2^p - 1)$ complement. In other words, the fractional part
consists of repeating sets of 2 tuples, the first of
which is one less than the partial remainder, $R - 1$,
and the second of which is $(2^p - 1) - (R - 1)$.
As before, the fractional tuples can be
generated to any length based on the required levels
of system accuracy and precision.
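The repeating pair of fractional tuples can be checked numerically. The following sketch assumes the divisor is $2^p + 1$ and that the final remainder satisfies $1 \le R \le 2^p$ (for $R = 0$ there is no fractional part); the function names are hypothetical and exact rational arithmetic is used only for verification.

```python
from fractions import Fraction


def fractional_tuples(r, p, count):
    """Return `count` fractional tuples (base 2**p) of r / (2**p + 1).

    Assumption: 0 <= r <= 2**p, i.e. r is a valid final remainder.
    """
    base = 1 << p
    if r == 0:
        return [0] * count                  # exact division, no fraction
    pair = (r - 1, (base - 1) - (r - 1))    # (R - 1) and its (2^p - 1) complement
    return [pair[i % 2] for i in range(count)]


def tuples_to_fraction(tuples, p):
    """Interpret tuples as digits after the radix point in base 2**p."""
    base = 1 << p
    return sum(Fraction(d, base ** (i + 1)) for i, d in enumerate(tuples))
```

With p = 4 and R = 5, the tuples are 4, 11, 4, 11, and so on, i.e. the hexadecimal fraction 0.4B4B..., which converges to 5/17. Truncating after k tuples leaves an error smaller than $2^{-pk}$, so the tuple count sets the precision directly.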
4 ANALYSIS
Numerically, the algorithm presented in this paper
has an analogue in (Guei, 1985). However, unlike
the (Guei, 1985) algorithm, which requires the entire
input stream for the computation, our scheme
requires only two tuples at any time sequence
instant and gives out one tuple of quotient bits along
with the partial remainder, based on a carry/borrow
calculation. Thus, our algorithm is well suited for
serial processing. Moreover, the circuit requirements
are constant for varying input bit lengths.
Additionally, our algorithm is well suited for integer
as well as floating point divisions and can generate
fractional results with arbitrary accuracy/precision.
Since tuples can be processed serially using a
single adder and 2 incrementors (example case of
division by $2^p - 1$), the constant circuit can be
efficiently implemented in hardware. As noted
earlier, the requirements do not change with increase
in input bit length and the same circuit can be
replicated for operation in parallel mode, in which
case the number of such computation units will be
equal to n – 1. The design also provides a natural
way to trade off speed against circuit requirements,
since a serial mode of operation can work on multiple
tuples (parallel tuples) at the same time.
The possibility of calculating the fractional part
of the quotient to any arbitrary length with full
accuracy supports the use of the algorithm for
constant divider circuits in DSPs and other
embedded systems.
Since the number of computational units in the
pipeline is less than, or at least comparable to, that
of other state-of-the-art methods, the computation
time can also be expected to favor our approach.
5 CONCLUSIONS AND FUTURE WORK
In this paper, we have presented the design for a
constant divider circuit for divisors of the form $2^p \pm 1$.
Analyses have also been presented to demonstrate
the constant computation requirements of the
approach. The method is well suited for processing
serialized inputs and dividends of a priori unknown
bit length, while producing full-precision,
full-accuracy, floating point capable results. The
next step would be to implement the design using
VHDL/Verilog for simulation and testing followed
by actual implementation in VLSI for a thorough
evaluation of timing, power requirements, memory
footprint and chip area estimation. This is expected
to be followed by performance evaluation of the
circuitry in conjunction with a DSP, embedded
system or GPU targeted at applications such as
image processing, signal processing and
statistical/mathematical computation and modeling.
REFERENCES
A. Th. Schwarzbacher, M. Brutscheck, O. Schwingel, J. B.
Foley, ‘Constant Divider Structures of the Form $2^n \pm 1$’,
pp. 368-375, Irish Signals And Systems
Conference, 2000.
P. Srinivasan, F. E. Petry, ‘Constant-Division Algorithms’,
IEEE Proc. Computers and Digital Techniques, Vol.
141, No. 6, 2007 (1994).
A. Th. Schwarzbacher, P. A. Comiskey and J. B. Foley,
‘Reduction of the power consumption at the
algorithmic level of CMOS circuits’, Electronic
Systems and Devices Conference, pp. 5-8, June 1998.
B. Al-Besher, A. Bouridane, A. S. Ashur, ‘An RNS-based
Division Architecture for Constant Divisors of the