Designing of ALU Block for RISC-V-Based Processor Core with Low Power

K. Madhava Rao, T. Vasudeva Reddy and Uthej

B. V. Raju Institute of Technology, BVRIT, Narsapur, Medak, India

Keywords: RISC-V Architecture, Parallel-Prefix Adders, Kogge-Stone Adder, Galois-Field Multiplier, Approximate

Computing, Instruction Sets.

Abstract: Speed, power, and area are the three most crucial factors in VLSI chip design. Adders and multipliers are key components in a variety of arithmetic and logical operations. For binary addition, the Parallel Prefix Adder (PPA) has evolved into an effective and useful circuit. The fast Kogge-Stone Adder (KSA) is a Parallel Prefix Adder used for fast addition. A Galois Field Multiplier (GFM), which works on a shift-operation technique, is also developed for fast multiplication. The objective of this research is to integrate the fast Kogge-Stone adder (FKS) and the GFM to improve system performance. Both modules are integrated into the ALU block of a RISC-V processor. The design has been implemented for a 32-bit word length using the Xilinx ISE and OpenLane toolsets.

1 INTRODUCTION

In the emerging digital world, in systems such as microprocessors and in data processing tasks such as filtering, object recognition, convolution, image and video compression, and modulation, the addition and multiplication of (say, two) binary values are among the most fundamental operations of the arithmetic logic unit (ALU). The binary adders and multipliers in these systems are therefore crucial parts of an ALU. An adder in a VLSI system needs to be fast and reliable while meeting area and power constraints, and the adder architecture is primarily responsible for the performance of these operations. The architecture considered here is RISC-V because, in contrast to most other ISA designs, it is an open-source definition of an Instruction Set Architecture (ISA).

Nowadays, we do not want to waste valuable time on a single task, and the same goes for processors. Nobody wants an electronic device with a slow processor, but how are adders and multipliers related to processors? The answer is simple: the better the adder and multiplier design techniques, the faster the processor works. We measure the speed of a processor by the time it takes to complete a task, and this time is largely determined by the overall propagation delay in the circuit. The circuit or processor becomes slower as the propagation delay increases, and faster as it decreases. Carry propagation is a key concern for binary addition, because the length of the carry chain grows with the operand size. The modern PPA architecture consists of Pre-Processing (Felzmann et al 2020), Carry Look-Ahead, and Post-Processing parts, which are used to avoid the problem of carry chain propagation. These adders are used to build the most efficient circuits for binary addition in digital systems (Shilpa et al 2016); here we use an ALU block based on the RISC architecture. The performance and latency of these adders are mostly determined by the total number of levels in carry generation. To reduce propagation latency, we added two modules to the CPU's ALU block: the Kogge-Stone adder and the Galois field multiplier.

https://orcid.org/0000-0003-3115-6385

Rao, K., Reddy, T. and Uthej, .

Designing of ALU Block for RISC-V-Based Processor Core with Low Power.

DOI: 10.5220/0012559700003739

Paper published under CC license (CC BY-NC-ND 4.0)

In Proceedings of the 1st International Conference on Artificial Intelligence for Internet of Things: Accelerating Innovation in Industry and Consumer Electronics (AI4IoT 2023), pages 71-76

ISBN: 978-989-758-661-3

2 RISC-V PROCESSOR UNIT

The Reduced Instruction Set Computer (RISC) processor has a small set of instructions and optimizes its logic for the most frequently used instructions, so that they can be accessed quickly and executed faster. Compared to CISC, in a RISC processor the software must handle a greater number of operations. The main advantage of RISC processors is faster instruction execution. They are also relatively inexpensive to design, test, and manufacture. When the instruction control unit occupies most of the chip area and instructions access memory, execution speed decreases. The RISC processor was developed for this reason: a small set of instructions simplifies the design and improves the overall performance of the processor.

In this paper, we present a RISC processor built from three major components, summarized in Figure 1: the data path unit, the control unit, and the memory unit.

2.1 Data Path Unit

The Data Path Unit is the hardware responsible for all the vital operations, including the ALU operations, the internal bus communication, and the registers.

2.2 Control Unit

The Control Unit is the hardware that decides which data path is selected for a given operation and controls the movement of data between the ALU components.

Figure 1: System Architecture.

2.3 Memory Unit

Data is stored in the Memory Unit, also known as the Storage Unit. The amount of data stored is expressed in bytes.

2.4 Operation

The overall processor architecture is shown in Figure 2. It consists of five stages of operation: fetch, decode, execute, memory, and write back.

Figure 2: Data Flow model.

Since we are integrating the modules into the ALU block, our design focuses mainly on the execute stage.

3 INTEGRATING ADDITIONAL LOGIC INTO THE EXISTING ALU BLOCK

The existing RISC-V-based ALU block considered here is the Risk-5 module. Two modules are integrated into this ALU block: the Kogge-Stone Adder, a fast adder adopted in the industry, and the Galois Field Multiplier, which is well proven in cryptographic applications.

3.1 Kogge-Stone Adder

The KS adder is a carry look-ahead adder (CLA) in parallel-prefix form. It generates the carries in O(log2 N) time for N-bit operands, and is commonly regarded as the fastest adder in the industry because of its performance in arithmetic and logic circuits. The Kogge-Stone adder achieves faster computation by computing in parallel, at the cost of a small area overhead that is acceptable considering the area-to-performance ratio. The carry outputs are generated quickly because the computations are done in parallel: the depth of the prefix network is low and each result depends only on the initial inputs. The design and operation of the architecture can be understood by considering three sections:

a. Pre-Processing Block

b. Carry Look Ahead Generation Block

c. Post-Processing block
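
As a rough illustration of the logarithmic carry depth described above, the following Python sketch compares the number of carry stages in a ripple-carry chain with the number of prefix levels in a Kogge-Stone network. The function names are hypothetical, and the counts are logic levels under idealized assumptions, not measured delays from this design.

```python
def ripple_carry_stages(width: int) -> int:
    """Worst case: the carry ripples through every bit position."""
    return width


def kogge_stone_levels(width: int) -> int:
    """Prefix levels when the combined span doubles at each level.

    (width - 1).bit_length() equals ceil(log2(width)) for width >= 1.
    """
    return (width - 1).bit_length()


for width in (8, 16, 32):
    print(width, ripple_carry_stages(width), kogge_stone_levels(width))
```

For a 32-bit operand this gives 5 prefix levels against 32 ripple stages, which is the reason the prefix network dominates the adder's latency advantage.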

Figure 3: Stages of KS parallel prefix adder.

The Kogge-Stone Adder consists of the three blocks discussed above. The figure below shows a basic 4-bit Kogge-Stone Adder.

1. Pre-Processing. This stage creates a pair of generate and propagate signals for each bit of the A and B inputs. The logic equations for the propagate and generate signals are:

$p_i = A_i \oplus B_i$

$g_i = A_i \land B_i$
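
The pre-processing equations can be sketched in Python as follows. This is only a bit-level illustration; the function name `preprocess` and the 4-bit example operands are assumptions, not part of the paper's Verilog design.

```python
def preprocess(a: int, b: int, width: int = 4):
    """Return per-bit propagate and generate lists (index 0 is the LSB)."""
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]  # p_i = A_i xor B_i
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]  # g_i = A_i and B_i
    return p, g


p, g = preprocess(0b1011, 0b0110)
print("p =", p)
print("g =", g)
```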

3.2 CLA Network

This block gives the KSA its advantage over other adders and is the source of its high performance. The carry for each bit is calculated in this stage, using the intermediate propagate and generate signals. The logic equations for the group propagate and generate signals are:

$P_{i:j} = P_{i:k+1} \land P_{k:j}$

$G_{i:j} = G_{i:k+1} \lor (P_{i:k+1} \land G_{k:j})$
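
A minimal sketch of the group operator defined by these two equations is given below. The names `combine` and `group_signals` are hypothetical. For clarity the operator is applied serially here; a Kogge-Stone network would evaluate the same operator in parallel across logarithmically many levels.

```python
def combine(hi, lo):
    """Merge (G, P) of the upper group i:k+1 with (G, P) of the lower group k:j."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def group_signals(g, p):
    """Return (G, P) for the group i:0 at every bit position i (serial evaluation)."""
    out = [(g[0], p[0])]
    for i in range(1, len(g)):
        out.append(combine((g[i], p[i]), out[-1]))
    return out


# Per-bit signals for A = 0b1011 and B = 0b0110, LSB first.
g = [0, 1, 0, 0]
p = [1, 0, 1, 1]
carries = [G for G, _ in group_signals(g, p)]
print(carries)  # carry out of each bit position
```

With a carry-in of zero, the group generate signal for bits i:0 is the carry out of bit i.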

3.3 Post Processing

Figure 4: KS Parallel Prefix Adder Architecture.

All adders of this family (carry look-ahead) go through this closing step, which computes the sum bits. The logic equation below computes the sum bits:

$S_i = p_i \oplus C_{i-1}$

The difference between KSA and other adders becomes greater as more bits are required.
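
Putting the three stages together, the following Python sketch models a Kogge-Stone addition at word level, with each prefix level applied to all bit positions at once through shifts. It is a behavioural model under the assumption of zero carry-in; the function name and the default 32-bit width are illustrative and are not taken from the paper's Verilog source.

```python
def kogge_stone_add(a: int, b: int, width: int = 32):
    """Return (sum, carry_out) of two width-bit operands, carry-in assumed 0."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-processing: per-bit propagate and generate.
    p = a ^ b
    g = a & b

    # Prefix network: the span covered by each position doubles per level.
    G, P = g, p
    dist = 1
    while dist < width:
        G = (G | (P & (G << dist))) & mask  # G_{i:j} = G_hi or (P_hi and G_lo)
        P = (P & (P << dist)) & mask        # P_{i:j} = P_hi and P_lo
        dist <<= 1

    # Post-processing: the carry into bit i is the group generate of bits i-1:0.
    carry_in = (G << 1) & mask
    total = p ^ carry_in
    carry_out = (G >> (width - 1)) & 1
    return total, carry_out


print(kogge_stone_add(0xFFFFFFFF, 1))
print(kogge_stone_add(1234567, 7654321))
```

The loop runs five times for a 32-bit width, matching the logarithmic number of prefix levels.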

3.4 Galois Field Multiplier

Efficient Galois field arithmetic logic over GF(2^m) is widely used in error decoding, error control, and cryptosystems. An efficient implementation of GF(2^m) multiplication can greatly reduce system complexity and improve timing performance.

The field GF(q), a field with a finite number of elements, is extended by a primitive polynomial to form GF(q^m). GF addition is an XOR operation, and GF multiplication is polynomial multiplication modulo p(x) (Chen et al 2014). The product of two symbols a and b is therefore taken modulo the primitive polynomial. AND gates multiply the individual bits, and the resulting partial products are combined by XOR to form the product output X.

After the product output X is obtained, the result is divided by p(x) and reduced to the remainder, represented by LPR (Chen et al 2014; Torres et al. 2024).

This circuit performs multiplication of two elements of the Galois field GF(2^m). Each element of the field is represented as an m-bit binary number. When two elements are given to the binary multiplier, a (2m-1)-bit partial product is generated; it is divided by the generator polynomial of the Galois field to produce the final m-bit product.

Let a(x) and b(x) be two field elements and s(x) their product:

$s(x) = a(x)\,b(x) \bmod p(x)$ (1)
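
A minimal sketch of the product in equation (1), using a shift-and-XOR loop with the reduction interleaved, is shown below. The field GF(2^4) and the polynomial p(x) = x^4 + x + 1 (binary 10011) are assumed purely for illustration; the paper does not state which polynomial its 32-bit multiplier uses, and the function name `gf_mul` is hypothetical.

```python
def gf_mul(a: int, b: int, m: int = 4, poly: int = 0b10011) -> int:
    """Multiply two GF(2^m) elements modulo poly (a degree-m polynomial)."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a      # GF(2) addition is XOR
        b >>= 1
        a <<= 1              # multiply a(x) by x
        if a & (1 << m):
            a ^= poly        # reduce as soon as the degree reaches m
    return result


# x * x^3 = x^4 = x + 1 under the assumed polynomial.
print(bin(gf_mul(0b0010, 0b1000)))
# (x^2 + x + 1) * (x^2 + 1) reduces to x^3.
print(bin(gf_mul(0b0111, 0b0101)))
```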


Figure 5: Cell-View of RISC-V design in Synopsys ICC2.

Figure 6: Routing-View of RISC-V design in Synopsys ICC2.

Computing the product involves two steps: polynomial multiplication, followed by reduction modulo an irreducible polynomial, as shown in figure 1 for a 4-bit multiplier. The polynomial d(x) = a(x)b(x), of degree at most 2m-2, is the product of the polynomials representing the field elements a(x) and b(x). In the modular reduction $s(x) = d(x) \bmod p(x)$, the degree-(2m-2) polynomial d(x) is iteratively reduced to a degree below m by the irreducible polynomial p(x) (Chiou et al 2007). Hence, modular reduction may be simplified by a suitable choice of the irreducible polynomial p(x). It is usually more efficient to use irreducible polynomials with fewer nonzero terms.
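
The two steps can also be kept separate, as in the following sketch: a carry-less polynomial multiplication that yields d(x), followed by an iterative reduction modulo p(x). As before, GF(2^4) with p(x) = x^4 + x + 1 is an assumed example, and the function names are hypothetical.

```python
def poly_mul(a: int, b: int) -> int:
    """Step 1: carry-less product d(x) = a(x) b(x) over GF(2)."""
    d = 0
    while b:
        if b & 1:
            d ^= a
        a <<= 1
        b >>= 1
    return d


def poly_reduce(d: int, m: int = 4, poly: int = 0b10011) -> int:
    """Step 2: reduce d(x) modulo poly, clearing one high-order term per pass."""
    for deg in range(d.bit_length() - 1, m - 1, -1):
        if d & (1 << deg):
            d ^= poly << (deg - m)  # subtract x^(deg-m) * p(x)
    return d


d = poly_mul(0b1000, 0b1000)      # x^3 * x^3 = x^6, degree 2m-2 for m = 4
print(bin(d), bin(poly_reduce(d)))
```

Each set bit of `poly` below the leading term contributes one XOR per reduction pass, which gives an intuition for why polynomials with fewer nonzero terms tend to be cheaper to reduce by.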

4 IMPLEMENTATION

Figures 5 and 6 show the physical implementation of the RISC-V design with the in-house developed Kogge-Stone Adder and Galois-Field Multiplier, in cell view and in routing view respectively.

5 RESULTS

The results obtained for the Kogge-Stone Adder and the Galois Field Multiplier report the power, area, and timing parameters. The design functions as required, with low power consumption and reasonable area, and it meets timing in the worst case (Kaiyrakhmet 2024).

Table 1: Kogge_stone_32bit Project Statistics.

Figure 8: Galois_field_32bit Project Statistics.

6 COMPARISONS

Comparison of different types of Adders:

• Table 1 - Comparison of 8/16/32-Bit Kogge-

Stone Adders.

• Table 2 - Comparison of Parallel Adders.

• Table 3 - Comparison of 32-Bit KSA with

RCA, CLA, and Basic Adder.

• Table 4 - Comparison of 8/16/32-Bit Galois-

Field Multipliers

• Table 5 - Comparison of 32-Bit Multipliers

Table 2: Comparison of 32-Bit Parallel Adders.

Table 3: Comparison of 32-Bit Kogge-Stone Adder with

other Adders.

Table 4: Comparison of 8-bit, 16-Bit, 32-Bit Galois-Field.

Table 5: Comparison of different 32-Bit Multipliers.

7 CONCLUSION

A fast Kogge-Stone adder and a Galois-Field Multiplier are designed here and simulated using Xilinx ISE 14.7. The resulting gate-level netlist is integrated with the ALU block, which is based on the RISC-V architecture. The design is physically implemented in a 0.18 µm process technology. The high-speed adder and multiplier are validated by comparison with other adder and multiplier architectures, including RCA, CLA and Wallace Tree (Anjana et al. 2014).

All architectures were implemented in Verilog and synthesized using Xilinx ISE 14.7 as the front-end tool, with the back-end tools OpenLane and Qflow used to generate the ASIC layouts (Lee Mei et al 2018). Compared with the other architectures, KSA and GFM show a significant improvement in delay, area and power consumption, which can help accelerate complex applications.

ACKNOWLEDGMENT

This research and implementation were supported by Mr. K. Madhava Rao and Mr. T. Vasudeva Reddy, who helped design and integrate the fast adder and multiplier modules. The Xilinx ISE 14.7 and ModelSim tools were used, and the design was implemented in the Verilog HDL and Python programming languages.


REFERENCES

Felzmann, I., Filho, J. F., & Wanner, L. (November 2020).

Risk-5: Controlled approximations for RISC-V.

Shilpa C. N, Kunjan D. Shinde, Nithin H. V. (2016).

Design, Implementation and Comparative Analysis of

Kogge Stone Adder using CMOS and GDI design: A

VLSI Based Approach. PESITM, Shivamogga.

Lee Mei Xiang, Muhammad Mun’im Ahmad Zabidi, Ainy

Haziyah Awab, Ab Al-Hadi Ab Rahman. (2018). VLSI

Implementation of a Fast Kogge-Stone Parallel-Prefix

Adder. Journal of Physics: Conference Series.

Kim, N.S., & Karpuzcu, U.R. (2019). Approximate ultra-

low voltage many-core processor design.

Vimukth John, Shylu Sam, S. Radha, P. Sam Paul, Joel

Samuel. (2020). Design of a power-efficient Kogge–

Stone adder by exploring new OR gate in 45nm CMOS

process. Circuit World.

Sunil M, Ankith R D, Manjunath G D, & Premananda B S.

(Jan 2014). Design and Implementation of Faster

Parallel Prefix Kogge Stone Adder.

Song, L., & Parhi, K.K. (1998). Low-Energy Digit-

Serial/Parallel Finite Field Multipliers.

Kaiyrakhmet, O. (2024). Next generation storage: Non-

Volatile Memory. SPAST Reports, 2(2).

https://spast.org/ojspath/article/view/23

Chiou, C.W., Lee, C.Y., & Lin, J. M. (2007). Finite field

polynomial multiplier with linear feedback shift

Chen, R.J., Fan, J.W., & Liao, C.H. (2014). Reconfigurable

Galois Field multiplier.

Rosa Elia Martínez Torres, Thakur, S., Oruganti, S. K., &

Nita Sukdeo. (2024). Ring Resonator Measurement:

Characterizing High Dielectric Constant Materials for

the Fourth Industrial Revolution. SPAST Reports, 2(2).

https://spast.org/ojspath/article/view/4952

R Anjana, B Abishna, M. S Harshitha, E Abhishek, V

Ravichandra, M S Suma. (2014). Implementation of

vedic multiplier using Kogge-stone adder. In 2014

International Conference on Embedded Systems

(ICES).
