FPGA Implementation of Low Power and High Speed Vedic Multiplier using Vedic Mathematics.

Anju & V.K. Agrawal,
Department of ECE, I.M.S. Engineering College, NH-24, Adhyatmik Nagar, Ghaziabad – 201009, U.P., India

Abstract: The performance of a high speed processor depends greatly on its multiplier, which is one of the key hardware blocks in most digital signal processing systems as well as in general purpose processors. This paper proposes the design of an 8x8 bit Vedic multiplier based on the vertical and crosswise structure of ancient Indian Vedic mathematics. The proposed architecture multiplies two 8-bit numbers; the multiplier and the multiplicand are each split into 4-bit groups, so that the operation decomposes into 4x4 multiplication modules. This allows a modular design in which smaller blocks are used to build bigger ones. Further, the Urdhva Tiryagbhyam sutra for 8x8 bit multiplication has been coded in VHDL and implemented on an FPGA using the Xilinx synthesis tool.

Keywords- FPGA, Urdhva Tiryagbhyam sutra, Vedic mathematics, Xilinx.

I. Introduction

With the latest advancements in VLSI technology, the demand for portable and embedded digital signal processing (DSP) systems has increased significantly. Multipliers are key components of many high performance systems such as FIR filters, microprocessors and digital signal processors. Higher order multiplications require a large number of adders and other components. The speed of a multiplier depends on the number of partial products and the speed of the adder.

Vedic mathematics was reconstructed from the ‘Vedas’ by Sri Bharti Krishna Tirthaji (1884-1960) after eight years of research on the Vedas [1]. According to him, Vedic mathematics is based on sixteen principles or word-formulae, termed sutras. In this paper, after a brief introduction to the Urdhva Tiryagbhyam sutra, the multiplier architecture is discussed and illustrated with two 8-bit numbers.

The multiplier and the multiplicand are each split into 4-bit groups, so that the multiplication decomposes into 4x4 multiplication modules. After decomposition, the vertical and crosswise method is applied to carry out the multiplication, starting with the first 4x4 multiply module [2]. A multiplier based on this sutra has the advantage that, as the number of bits increases, gate delay and area increase slowly compared to conventional multipliers. The Vedic sutras are applied to binary multipliers using carry save adders. The Vedic multiplier implemented and discussed in this paper performs partial product generation and addition in parallel, and therefore performs better in terms of speed and area.

II. Vedic Multiplier using ‘Urdhva Tiryagbhyam’ Sutra

Vedic mathematics is useful because it reduces otherwise cumbersome calculations in conventional mathematics to very simple ones. This is because the Vedic formulae are claimed to be based on the natural principles on which the human mind works. Vedic mathematics is a set of arithmetic rules that allow fast implementations. Urdhva Tiryagbhyam is the general formula applicable to all cases of multiplication.

The words ‘Urdhva’ and ‘Tiryagbhyam’ come from Sanskrit. ‘Urdhva’ means ‘vertically’ and ‘Tiryagbhyam’ means ‘crosswise’ [3]. The sutra is based on a novel concept in which all partial products are generated concurrently with the addition of these partial products.

The sutra is illustrated in Fig. 1, in which two decimal numbers, 325 x 738, are multiplied; the hardware architecture is depicted in Fig. 2.

The digits on both sides of each line are multiplied and added to the carry from the previous step. This generates one digit of the result and a carry. This carry is added in the next step, and so the process goes on. If a step has more than one line, all the products are added to the previous carry. In each step, the least significant digit acts as the result digit and all other digits act as the carry for the next step. Initially the carry is taken to be zero.

Example 1: $325 \times 738 = 239850$

Fig. 1 Multiplication of two decimal numbers by Urdhva Tiryagbhyam
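The step-by-step procedure described above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation; the function name `urdhva_decimal` and the digit-list representation are assumptions made for this sketch.

```python
def urdhva_decimal(x: int, y: int) -> int:
    """Multiply two non-negative integers digit by digit,
    vertically and crosswise (illustrative sketch)."""
    # Digits are stored least significant first and padded to equal length.
    a = [int(d) for d in str(x)][::-1]
    b = [int(d) for d in str(y)][::-1]
    n = max(len(a), len(b))
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))

    result = 0
    carry = 0  # initially the carry is taken to be zero
    for k in range(2 * n - 1):
        # Step k: multiply every digit pair joined by a line, add the carry.
        step = carry + sum(a[i] * b[k - i] for i in range(n) if 0 <= k - i < n)
        result += (step % 10) * 10 ** k  # least significant digit is the result digit
        carry = step // 10               # the remaining digits carry into the next step
    # Whatever carry is left forms the leading digits of the product.
    return result + carry * 10 ** (2 * n - 1)


print(urdhva_decimal(325, 738))  # 239850
```

For 325 x 738 the five steps produce the sums 40, 35, 68, 29 and 23, giving the result digits 0, 5, 8, 9, 3 and a final carry of 2.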

The Urdhva Tiryagbhyam algorithm can be applied to the binary number system in the same way as to the decimal number system. To illustrate the multiplication algorithm, consider the multiplication of two 4-bit numbers $a_3 a_2 a_1 a_0$ and $b_3 b_2 b_1 b_0$. The result of this multiplication is expressed as $c_6 r_6 r_5 r_4 r_3 r_2 r_1 r_0$.

The least significant bit $r_0$ is obtained by multiplying the least significant bits of the multiplicand and the multiplier, as shown in Fig. 3. The bits on both sides of each line are multiplied and added to the carry from the previous step [4 - 7]. This generates one of the bits of the result ($r_n$) and a carry ($c_n$). This carry is added in the next step, and so the process goes on.

Thus the following expressions (1) to (7) are derived:

1. $r_0 = a_0 b_0$
2. $c_1 r_1 = a_1 b_0 + a_0 b_1$
3. $c_2 r_2 = c_1 + a_2 b_0 + a_1 b_1 + a_0 b_2$
4. $c_3 r_3 = c_2 + a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3$
5. $c_4 r_4 = c_3 + a_3 b_1 + a_2 b_2 + a_1 b_3$
6. $c_5 r_5 = c_4 + a_3 b_2 + a_2 b_3$
7. $c_6 r_6 = c_5 + a_3 b_3$

Here $c_n r_n$ denotes the sum formed in step $n$, with $r_n$ as its least significant bit and $c_n$ as the remaining carry, and $c_6 r_6 r_5 r_4 r_3 r_2 r_1 r_0$ is the final product. Partial products are calculated in parallel, and hence the delay involved is just the time it takes for the signal to propagate through the gates.
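As a rough software model of the column-wise scheme behind expressions (1) to (7), the following Python sketch forms each column sum from AND-ed bit pairs plus the incoming carry. The function name `urdhva_4bit` is hypothetical, and the loop evaluates the columns sequentially, whereas the hardware generates the partial products in parallel.

```python
def urdhva_4bit(a: int, b: int) -> int:
    """Multiply two 4-bit unsigned values column by column (illustrative sketch)."""
    a_bits = [(a >> i) & 1 for i in range(4)]  # a_0 .. a_3
    b_bits = [(b >> i) & 1 for i in range(4)]  # b_0 .. b_3

    product = 0
    carry = 0
    for n in range(7):  # columns 0 .. 6, one per expression
        # Sum of all partial products a_i * b_(n-i) in this column, plus the carry.
        column = carry + sum(
            a_bits[i] & b_bits[n - i] for i in range(4) if 0 <= n - i < 4
        )
        product |= (column & 1) << n  # r_n: least significant bit of the column sum
        carry = column >> 1           # c_n: everything above the least significant bit
    # The carry out of the last column is the most significant product bit.
    return product | (carry << 7)


print(urdhva_4bit(0b1111, 0b1111))  # 225
```

Note that an intermediate carry can be wider than one bit (for example, the middle column can sum to more than 3), which is why the sketch keeps the carry as an integer rather than a single bit.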

Fig. 3. Line diagrams for Urdhva multiplication of 2, 3 and 4 digits

III. The Proposed Multiplier Architecture

The hardware architectures of the 4x4 and 8x8 bit Vedic multiplier modules are presented in the sections below. Here the Urdhva Tiryagbhyam (vertically and crosswise) sutra is used to propose an architecture for the multiplication of two binary numbers. The beauty of the Vedic multiplier is that partial product generation and addition are done concurrently. Hence it is well adapted to parallel processing, which makes it attractive for binary multiplication. This in turn reduces delay, which is the primary motivation behind this work.

3.1. Vedic Multiplier for 4x4 bit module.

The 4x4 bit Vedic multiplier module is implemented using four 2x2 bit Vedic multiplier modules. Consider a 4x4 multiplication with \( A = A_3 A_2 A_1 A_0 \) and \( B = B_3 B_2 B_1 B_0 \). The output lines for the multiplication result are \( S_7 S_6 S_5 S_4 S_3 S_2 S_1 S_0 \). Divide \( A \) and \( B \) into two parts each: \( A_3 A_2 \) and \( A_1 A_0 \) for \( A \), and \( B_3 B_2 \) and \( B_1 B_0 \) for \( B \). Using the fundamentals of Vedic multiplication, taking two bits at a time and using a 2-bit multiplier block, we obtain the structure for multiplication shown in Fig. 4.

\[
\begin{aligned}
&A_3 A_2 \quad A_1 A_0 \\
&B_3 B_2 \quad B_1 B_0
\end{aligned}
\]

Fig 4: Algorithm of 4x4 bit Vedic Multiplier.

Each block shown above is a 2x2 bit Vedic multiplier. The first 2x2 bit multiplier has inputs \( A_3 A_2 \) and \( B_3 B_2 \). The last block is a 2x2 bit multiplier with inputs \( A_1 A_0 \) and \( B_1 B_0 \). The middle one represents two 2x2 bit multipliers with inputs \( A_3 A_2 \) & \( B_1 B_0 \) and \( A_1 A_0 \) & \( B_3 B_2 \). The final result of the multiplication is the 8-bit value \( S_7 S_6 S_5 S_4 S_3 S_2 S_1 S_0 \). To clarify the concept, the block diagram of the 4x4 bit Vedic multiplier is shown in Fig. 5. To obtain the final product \( (S_7 S_6 S_5 S_4 S_3 S_2 S_1 S_0) \), four 2x2 bit Vedic multipliers and three 4-bit Ripple-Carry (RC) adders are required.

The proposed Vedic multipliers can be used to reduce delay. Earlier literature describes Vedic multipliers based on array multiplier structures. In contrast, we propose a new architecture that is efficient in terms of speed. The arrangement of RC adders shown in Fig. 5 helps to reduce delay. Moreover, 8x8 Vedic multiplier modules are easily implemented using four 4x4 multiplier modules [8-10].
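The decomposition of a 4x4 multiplication into four 2x2 blocks and three additions can be modelled behaviourally in Python as below. The names `mult2x2` and `mult4x4` are hypothetical, and the order of the three additions is only one plausible arrangement; the exact adder wiring of Fig. 5 is not reproduced here, and Python integers stand in for the ripple-carry adders, so adder widths are not modelled.

```python
def mult2x2(a: int, b: int) -> int:
    """2x2 bit vertical-and-crosswise block built from AND terms and small sums."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    r0 = a0 & b0                      # vertical: least significant bits
    cross = (a1 & b0) + (a0 & b1)     # crosswise terms
    r1, c1 = cross & 1, cross >> 1
    top = c1 + (a1 & b1)              # vertical: most significant bits plus carry
    r2, c2 = top & 1, top >> 1
    return r0 | (r1 << 1) | (r2 << 2) | (c2 << 3)


def mult4x4(a: int, b: int) -> int:
    """4x4 bit multiplier from four 2x2 blocks and three additions."""
    a_lo, a_hi = a & 0b11, (a >> 2) & 0b11
    b_lo, b_hi = b & 0b11, (b >> 2) & 0b11

    q0 = mult2x2(a_lo, b_lo)   # A1A0 x B1B0
    q1 = mult2x2(a_hi, b_lo)   # A3A2 x B1B0 (cross term)
    q2 = mult2x2(a_lo, b_hi)   # A1A0 x B3B2 (cross term)
    q3 = mult2x2(a_hi, b_hi)   # A3A2 x B3B2

    mid = q1 + q2              # addition 1: the two cross terms
    mid = mid + (q0 >> 2)      # addition 2: upper two bits of q0
    high = q3 + (mid >> 2)     # addition 3: upper bits of the middle sum

    # S1S0 come from q0, S3S2 from the middle sum, S7..S4 from the last addition.
    return (high << 4) | ((mid & 0b11) << 2) | (q0 & 0b11)


print(mult4x4(13, 11))  # 143
```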

3.2. Vedic Multiplier for 8x8 bit module.

The 8x8 bit Vedic multiplier module, shown in the block diagram in Fig. 6, can be easily implemented using four 4x4 bit Vedic multiplier modules. Consider an 8x8 multiplication with \( A = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0 \) and \( B = B_7 B_6 B_5 B_4 B_3 B_2 B_1 B_0 \). The output of the multiplication is 16 bits wide, \( S_{15} S_{14} S_{13} S_{12} S_{11} S_{10} S_{9} S_{8} S_{7} S_{6} S_{5} S_{4} S_{3} S_{2} S_{1} S_{0} \). Divide \( A \) and \( B \) into two parts each: the 8-bit multiplicand \( A \) is decomposed into a pair of 4-bit halves \( A_{\text{H}} \) and \( A_{\text{L}} \). Similarly, the multiplier \( B \) is decomposed into \( B_{\text{H}} \) and \( B_{\text{L}} \). The 16-bit product can be written:

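\[ P = A \times B = (2^4 A_{\text{H}} + A_{\text{L}}) \times (2^4 B_{\text{H}} + B_{\text{L}}) = 2^8 A_{\text{H}} B_{\text{H}} + 2^4 (A_{\text{H}} B_{\text{L}} + A_{\text{L}} B_{\text{H}}) + A_{\text{L}} B_{\text{L}} \]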

Using the fundamentals of Vedic multiplication, taking four bits at a time and using a 4-bit multiplier block, we can perform the multiplication. The outputs of the 4x4 bit multipliers are added accordingly to obtain the final product [11-13]. In total, three 8-bit Ripple-Carry adders are required, as shown in Fig. 6.
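A bit-level Python sketch of this 8x8 composition is given below, with the three 8-bit ripple-carry adders modelled one full adder at a time. All names are hypothetical, `mult4x4` is a plain stand-in for the 4x4 Vedic block, and the adder arrangement is one possible wiring that uses exactly three 8-bit adders; it is not taken from Fig. 6.

```python
def ripple_carry_add(x: int, y: int, width: int = 8):
    """Add two width-bit values with a chain of full adders.
    Returns (sum, carry_out)."""
    total, carry = 0, 0
    for i in range(width):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        total |= (xi ^ yi ^ carry) << i
        carry = (xi & yi) | (carry & (xi ^ yi))
    return total, carry


def mult4x4(a: int, b: int) -> int:
    # Stand-in (assumption) for the 4x4 bit Vedic multiplier block.
    return (a & 0xF) * (b & 0xF)


def vedic_mult8x8(a: int, b: int) -> int:
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF

    q0 = mult4x4(a_lo, b_lo)   # AL x BL
    q1 = mult4x4(a_hi, b_lo)   # AH x BL
    q2 = mult4x4(a_lo, b_hi)   # AL x BH
    q3 = mult4x4(a_hi, b_hi)   # AH x BH

    # Adder 1: sum of the two cross products.
    t1, c1 = ripple_carry_add(q1, q2)
    # Adder 2: add the upper nibble of q0 (zero-extended to 8 bits).
    t2, c2 = ripple_carry_add(t1, q0 >> 4)
    # At most one of c1 and c2 can be set, so OR-ing them recovers bit 8
    # of the middle sum; it is placed above the upper nibble of t2.
    upper_in = ((c1 | c2) << 4) | (t2 >> 4)
    # Adder 3: add the upper part of the middle sum to AH x BH.
    t3, _ = ripple_carry_add(q3, upper_in)

    # S3..S0 from q0, S7..S4 from adder 2, S15..S8 from adder 3.
    return (t3 << 8) | ((t2 & 0xF) << 4) | (q0 & 0xF)


print(vedic_mult8x8(255, 255))  # 65025
```

The carry out of the third adder is discarded because the product of two 8-bit operands always fits in 16 bits.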

IV. Design Verification and Implementation

In this work, the 8x8 bit Vedic multiplier is designed in VHDL (Very High Speed Integrated Circuit Hardware Description Language). Logic synthesis and simulation are done in Xilinx ISE 8.2i Project Navigator and the ISim simulator integrated in the Xilinx package. The performance of the circuit is evaluated on the Xilinx device family Spartan 2, device XC2S50, package TQ144. The device description of the FPGA used is summarized in Table 1.

Table 1: Summary of FPGA Features

<table>
<thead>
<tr>
<th>Device Family</th>
<th>Vertex</th>
</tr>
</thead>
<tbody>
<tr>
<td>Device</td>
<td>XC2S50</td>
</tr>
<tr>
<td>Package</td>
<td>TQ 144</td>
</tr>
<tr>
<td>Speed Grade</td>
<td>-6</td>
</tr>
</tbody>
</table>

4.1. Simulation Results

After successful compilation, the generated RTL view is shown in Fig. 7(a) and 7(b).

Fig. 8 and Fig. 9 show the simulation results of the 8x8 bit Vedic multiplier for unsigned binary and decimal numbers respectively.

Fig. 8(a) and 8(b) show the simulation result when the multiplicand Vedic_mult 8x8/multi = 11111111 (255) and the multiplier Vedic_mult 8x8/mplc = 11111111 (255). The 16-bit output obtained is Vedic_mult 8x8/p-out = 1111111000000001 (65025).

Fig. 9(a) and Fig. 9(b) show the simulation result in binary and decimal respectively. The 8-bit operands are unsigned, and the product is 00001111 (15) x 00001111 (15) = 0000000011100001 (225).

Fig. 8 Simulation waveforms of 8x8 bit Vedic Multiplier for multiplicand = 11111111 (255) and multiplier = 11111111 (255)

Fig. 9 Simulation waveforms of 8x8 bit Vedic Multiplier for multiplicand = 00001111 (15) and multiplier = 00001111 (15)
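The two simulated test vectors can be cross-checked against ordinary integer arithmetic with a short Python snippet. The helper name `expected_product` is hypothetical; it only converts the 8-bit operand strings and formats the 16-bit reference product, and does not model the multiplier itself.

```python
def expected_product(a_bits: str, b_bits: str):
    """Return (a, b, product, 16-bit product string) for two binary operand strings."""
    a, b = int(a_bits, 2), int(b_bits, 2)
    p = a * b
    return a, b, p, format(p, "016b")


print(expected_product("11111111", "11111111"))  # (255, 255, 65025, '1111111000000001')
print(expected_product("00001111", "00001111"))  # (15, 15, 225, '0000000011100001')
```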

4.2 Synthesis Results

Table 2 below shows the synthesis report of the Vedic multiplier with the logic resource utilization.

<table>
<thead>
<tr>
<th>Device utilization Summary (estimated values)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Logic utilization</td>
</tr>
<tr>
<td>Number of Slices</td>
</tr>
<tr>
<td>Number of Slice Flip Flops</td>
</tr>
<tr>
<td>Number of 4 input LUTs</td>
</tr>
<tr>
<td>Number of bonded IOBs</td>
</tr>
<tr>
<td>Number of GCLKs</td>
</tr>
</tbody>
</table>

Synthesis was done using Xilinx ISE 8.2i. The device chosen for synthesis is 2s15cs144-6. The computation path delay for the proposed 8x8 bit Vedic multiplier is found to be 8.460 ns.

The power consumption was measured using the XPower option available in Project Navigator in ISE 6.1. Power consumption is 9.29 mW. Total memory usage is 166744 kilobytes.

V. Conclusion

This paper presents an efficient method of multiplication, the Urdhva Tiryagbhyam sutra, based on Vedic mathematics. It provides a method for hierarchical multiplier design and clearly indicates the computational advantages offered by Vedic methods.

The computational path delay for the proposed 8x8 bit Vedic multiplier is found to be less than that of the conventional multiplier. Hence our motivation to reduce delay is fulfilled. The Vedic multiplier requires fewer gates for a given 8x8 bit multiplier, so its power dissipation is very small compared to other multiplier architectures. In terms of area also, the proposed multiplier is better than the conventional multiplier. Awareness of Vedic mathematics can be effectively increased if it is included in engineering education, which may lead to significant improvement in many areas where fast arithmetic computations are critical, such as real time DSP applications.

References


Ms. Anju is an M.Tech. scholar in VLSI Design at IMS Engineering College, Ghaziabad, affiliated to Mahamaya Technical University, Noida.

V.K. Agrawal is a senior faculty member at IMS Engineering College, Ghaziabad, and is the project guide of Ms. Anju. His research interests are VLSI design and allied branches. He has published many research papers in international journals and conferences.