Choice of Adders for Multimedia Processing Applications:
Comparison of Various Existing and a Novel 1-Bit Full Adder

Parameshwara M. C*, and H. C. Srinivasaiah+
*Department of Electronics and Communication Engineering, Vemana Institute of Technology, Bangalore, India.
+Department of Telecommunication Engineering, Dayananda Sagar College of Engineering, Bangalore, India.

Abstract: We propose a full adder (FA) circuit and compare its performance metrics with those of 13 other 1-bit FA circuits reported to date. When used in a 32-bit ripple carry adder (RCA) benchmark circuit, the proposed 1-bit adder cell architecture (labelled AL-31) is better by 10% in power delay product (PDP) than one of the 13 other architectures (labelled TGA), although the TGA cell has the best PDP among all 14 1-bit cell architectures when simulated standalone. The PDP is calculated from two performance metrics: worst-case delay and worst-case power. The work in this paper shows that the proposed 1-bit FA is a better choice for implementing long word adders. This study is based on Cadence’s Spectre simulation using a generic 90nm Process Design Kit (PDK). All the standalone 1-bit FA cells and their corresponding 32-bit RCA based benchmarks are simulated at a supply voltage $V_{DD} = 1.2V$, with at least 2fF loading at their outputs.

Keywords: Adder cell architecture, Ripple carry adder, Power dissipation, Propagation delay, Power delay product, Carry propagation, Carry skip adder.

I. Introduction

Modern portable devices demand high speed (equivalently, low delay), low power, and the lowest possible area; these performance metrics compete with one another during the design process, leading to challenges. The low power and high speed (or high frequency of operation) requirements are coupled through the ‘power delay product’ (PDP) design metric. To meet a specified value of PDP, one has to minimize both the delay (equivalently, maximize the speed) and the power consumed. Portable devices require the lowest possible PDP in a given technology node for longer battery life.

The characterization of such portable devices is based on 3 fundamental design metrics: power, delay, and size of the design. The main focus of this paper is the development of an efficient transistor level architecture for the most basic building block, the adder cell, targeted at long word size adders of the order of 128 bits based on the carry skip adder (CSA) architecture [1]. Such adders are required in special application areas, e.g. modern Internet technology running security algorithms in which security keys with a large number of digits need to be operated upon [2]. In this work, 14 different FA cells, comprising 13 cells reported to date and the proposed 1-bit FA cell, are benchmarked in a 32-bit RCA chain.

The comparative study among a total of 14 1-bit FA cells [3-12] is done in 2 stages, viz. the standalone level and the benchmarking level, with respect to the power, delay, and PDP design metrics. The standalone level observations, which lead to certain claims for the 1-bit adders reported to date, need further validation at higher architecture levels. Therefore, in the second stage we validate the earlier observations and claims for all 14 1-bit adder cells in a 32-bit RCA chain, for their power, delay (in the critical carry path), and PDP design metrics. In this approach the performance ranking of the 14 1-bit FA cells obtained from the standalone study is reordered based on the observations made during benchmarking in the 32-bit RCA, with a view to choosing a cell for long word adder architecture design. In the context of long word adders, our ‘28-transistor’ (28T) 1-bit FA cell architecture is better suited for integration in carry skip/carry bypass long chain adders than the other 13 1-bit FA cells of varying transistor counts reported to date.

The rest of this paper is organized as follows: Section II introduces the methodology followed in this paper for studying the performance of all 14 1-bit FA cell architectures. Section III presents a comparative study of the 14 different 1-bit FA cells at the standalone level and at the 32-bit RCA based benchmarking level, and discusses the observations made on a comparative basis. Finally, we draw conclusions in Section IV, based on the overall observations made during this study.

II. Simulation Environment And Methodology

The comparison among the 14 different adder architectures is done in 2 stages. In the first stage, we compare the performance metrics of all 14 1-bit FAs, which include the proposed 1-bit FA (labelled AL-31). In this stage we extract power, delay, and PDP through standalone simulation of all 14 1-bit FA circuits. The term standalone in this context means simulation of the 1-bit adders in isolation. The power, delay, and PDP parameters extracted through simulation are compared in subsequent sections.

In the second stage we compare the performance (design) metrics of all 14 1-bit FA circuits in a 32-bit ripple carry adder (RCA) chain, in which each instance of the 1-bit FA subcircuit interacts with other instances of itself; this accounts for the various parasitics within the RCA chain. This interactive environment evaluates the driving strength of each of the 14 FA circuits within the 32-bit RCA. In this manner all 14 architectures are compared under identical conditions. This has led to a different conclusion: the 1-bit FA architecture that performs best at the standalone level is not the best in the benchmark circuit, as discussed in the subsequent sections. This study has also shed light on other facts; e.g. the low transistor count 1-bit FA cells fail when used in a long word adder chain, demanding buffers and other adjunct circuitry at intermediate stages. This adds extra transistors, which offsets the merit of the low transistor count.

Generally, long word adders required for special applications depend on efficient architectures, e.g. the carry skip adder (CSA) [13]. In a CSA the addend and augend bits are considered in groups. This grouping requires computation of group propagate-generate (PG) signals and bit PG signals, which in turn requires group PG and bit PG logic circuitry. The proposed 1-bit FA cell provides the bit PG circuitry and the XOR circuitry for sum and carry bit calculation at its intermediate nodes; this avoids adding extra circuits for that purpose. All these adders are designed and simulated using the industry standard Cadence Electronic Design Automation (EDA) tool, based on a 90nm Generic Process Design Kit (PDK).
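To make the bit PG signals concrete, the following is a minimal behavioural sketch of a 1-bit full adder that exposes the bit propagate and bit generate signals alongside the sum and carry-out. It uses the textbook Boolean definitions only; the function name `full_adder_pg` and the returned dictionary keys are illustrative assumptions, and the sketch says nothing about the transistor-level structure of AL-31.

```python
from itertools import product


def full_adder_pg(a: int, b: int, c_in: int) -> dict:
    """Behavioural 1-bit full adder exposing bit propagate/generate signals.

    Hypothetical helper for illustration; not the AL-31 transistor circuit.
    """
    p = a ^ b               # bit propagate: carry-in passes through when A != B
    g = a & b               # bit generate: carry-out is 1 regardless of carry-in
    s = p ^ c_in            # sum bit reuses the propagate (XOR) signal
    c_out = g | (p & c_in)  # carry-out from generate and propagate
    return {"p": p, "g": g, "sum": s, "c_out": c_out}


# Exhaustive check against integer addition for all 8 input vectors.
for a, b, c_in in product((0, 1), repeat=3):
    r = full_adder_pg(a, b, c_in)
    assert 2 * r["c_out"] + r["sum"] == a + b + c_in
```

The point of the sketch is that the XOR needed for the sum already yields the propagate signal, which is why a cell that exposes this intermediate node needs no extra bit PG logic.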

To evaluate the performance of each of the 14 1-bit adder circuits under close to realistic conditions, a common test setup as suggested in [9] is used, with a 3-input vector [Ai, Bi, Ci] and a 2-output vector [Si, Ci+1], where Ai, Bi, and Ci are the augend, addend, and carry-input bits respectively, and Si and Ci+1 are the sum and carry-output bits respectively of the ith stage adder. The supply voltage is $V_{DD}$ = 1.2V and the maximum frequency applied at each input is 200MHz, under a nominal load capacitance $C_L$ = 2fF. The inputs Ai, Bi, and Ci are applied to the FA under test through minimum sized buffers.

To evaluate the worst-case delay of each of the 14 1-bit FA circuits, a standard input test pattern as suggested in [3, 4] is used, which is a combination of 56 input vector transitions. The term vector transition refers to a change in the bit levels of the vector [Ai, Bi, Ci] from one combination to any of the 7 remaining combinations during the smallest time interval/slot. With 3 inputs we have $2^3$=8 input vectors, and each of the 8 vectors can transit to any of the remaining 7 input vectors, giving a total of 8x7=56 input vector transitions. For worst-case delay measurement, the delay at Si and Ci+1 is measured for each of the 56 input vector transitions. One of these 56 vector transitions yields the worst-case (maximum) delay, at either the Si or the Ci+1 output. Thus the worst-case delay is the maximum of the delays at Si and Ci+1, where each delay is the time measured from the 50% point of the change in the input signals Ai, Bi, and Ci (0.6V for $V_{DD}$=1.2V) to the 50% point of the corresponding change in the output signals Si and Ci+1.
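The counting argument above can be sketched in a few lines: enumerate the 8 input vectors, form every ordered pair of distinct vectors, and take the maximum delay over both outputs. The `worst_case_delay` helper and the shape of its `delays` table are assumptions for illustration; in practice the entries would come from simulator measurements.

```python
from itertools import permutations, product

# All 2^3 = 8 input vectors [Ai, Bi, Ci].
vectors = list(product((0, 1), repeat=3))

# Every ordered (from, to) pair with from != to: 8 x 7 = 56 transitions.
transitions = list(permutations(vectors, 2))


def worst_case_delay(delays):
    """Return the maximum delay over all measured transitions and both outputs.

    `delays` is a hypothetical table mapping (from_vec, to_vec) to a tuple
    (delay_at_Si, delay_at_Ci_plus_1) in seconds.
    """
    return max(max(t_sum, t_carry) for t_sum, t_carry in delays.values())
```

Transitions that leave an output unchanged have no delay at that output, so a real measurement table would only contain entries for outputs that actually switch.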

For worst-case power estimation, we consider 3 frequencies as suggested in [3], [4]: $f_{MAX} = f_H = 200MHz$, $f_{M} = f_H/2 = 100MHz$, and $f_{L} = f_H/4 = 50MHz$. These 3 frequencies are applied at the 3 inputs Ai, Bi, and Ci in a time-division-multiplexed manner over time slots, with each time slot equal to the period of the signal with frequency $f_i$. In each time slot, the frequencies listed are the frequencies of inputs Ai, Bi, and Ci respectively. The input combinations are $f_i$, $f_{M}$, and $f_i$ in time slot 1, $f_{M}$, $f_i$, and $f_{M}$ in time slot 2, etc. In addition to these 6 frequency combinations, three more input frequency combinations are generated using the input signal frequency $f_{MD}$, where $f_{MD}$ is $f_H$ with an offset of 50% pulse width of $f_H$, i.e. 2.5ns. These three frequency combinations, mapped to time slots 7 to 9, are $f_{DF}f_{MD}f_{H}$, $f_{MD}f_{DF}f_{H}$, and $f_{DF}f_{MD}f_{H}$. The average power consumed over the 9 combinations of the 3 frequencies is taken as the worst-case power.
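The bookkeeping behind this power estimate can be sketched as follows. The sketch assumes that the six base combinations are the six orderings of the three frequencies across the inputs (Ai, Bi, Ci); that reading is an assumption, as are the names `base_combinations` and `worst_case_power`. The per-slot power values would come from simulation and are not supplied here.

```python
from itertools import permutations

F_H = 200e6      # f_H = f_MAX = 200 MHz
F_M = F_H / 2    # f_M = 100 MHz
F_L = F_H / 4    # f_L = 50 MHz

# Assumption: the six base combinations are the orderings of
# (f_H, f_M, f_L) assigned to the inputs (Ai, Bi, Ci).
base_combinations = list(permutations((F_H, F_M, F_L)))

# Half of the f_H period, i.e. the 2.5 ns offset quoted in the text.
half_period_f_h = 0.5 / F_H


def worst_case_power(slot_powers):
    """Average the per-slot power over the nine time slots (hypothetical helper)."""
    if len(slot_powers) != 9:
        raise ValueError("expected one power value per time slot (9 slots)")
    return sum(slot_powers) / len(slot_powers)
```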

In the second stage, the 32-bit RCA simulation, we connect 32 1-bit FA cells of the same type in a ripple carry chain to form a 32-bit RCA, and repeat this for each of the 14 cell types. This RCA is used as the benchmark circuit to assess the performance metrics of each of the 14 different 1-bit adder cells. The worst-case delay is measured along the carry path from the LSB adder cell to the carry-out bit of the MSB adder cell.

To begin with, we initialize all augend, addend, and carry-in inputs to zero; this sets the sum bits Si and carry bits Ci+1 to zero over the entire 32-bit adder chain. Subsequently, all the augend bits Ai, $0 \le i \le 31$, are set to 1, keeping the addend bits Bi, $0 \le i \le 31$, and C0 at logic 0. This sets the sum bits Si, $0 \le i \le 31$, to logic 1, while the carry-out C32 of the last 1-bit cell remains at logic 0. In the next step the carry-in C0 of the LSB 1-bit adder cell is switched to logic 1, which activates carry propagation along the longest critical path from C0 to C32 and eventually sets C32 to logic 1. The worst-case delay is the time that elapses between the 50% of $V_{DD}$ points on the rising edges of C0 and C32. The average power is calculated during this critical path activation (from C0 to C32), in which all the output sum bits toggle from logic 1 to logic 0.
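The logic-level behaviour of this stimulus can be checked with a short bit-level model of the 32-bit RCA. This is a functional sketch only, with the hypothetical helper `ripple_carry_add`; it carries no timing or power information, which in the study come from circuit simulation.

```python
N = 32  # word length of the benchmark RCA


def ripple_carry_add(a_bits, b_bits, c0):
    """Bit-level ripple carry addition, LSB first.

    Returns (sums, carries) where carries[i] is Ci and carries[N] is C32.
    """
    carries = [c0]
    sums = []
    for a, b in zip(a_bits, b_bits):
        c = carries[-1]
        sums.append(a ^ b ^ c)                   # Si
        carries.append((a & b) | ((a ^ b) & c))  # Ci+1
    return sums, carries


a_bits = [1] * N  # all augend bits at logic 1
b_bits = [0] * N  # all addend bits at logic 0

# Before the C0 edge: every Si is 1 and C32 is 0.
sums_before, carries_before = ripple_carry_add(a_bits, b_bits, 0)

# After C0 switches to 1: the carry ripples through all 32 cells.
sums_after, carries_after = ripple_carry_add(a_bits, b_bits, 1)
```

With every cell in propagate mode (Ai = 1, Bi = 0), a single edge on C0 must traverse all 32 stages before C32 changes, which is what makes this the longest carry path.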

III. Performance Metrics Of 13 Existing 1-Bit Full Adders And The Proposed 1-Bit Full Adder

Fig. 1 shows the estimated power, delay, and PDP performance metrics in the form of a normalized bar chart for all 14 1-bit FA cells under consideration, including the proposed 1-bit FA cell, labelled AL-31 in the chart. Each of the 3 estimated performance metrics is normalized with respect to its lowest value among the 14 cell types. The normalized performance metrics of each type of 1-bit FA cell are grouped together for clarity, and the groups are sorted in increasing order of normalized PDP. The 1-bit FA cell labelled TGA has the lowest normalized PDP of 1, the circuit labelled MB has the highest normalized PDP of 2.53, and AL-31 has a normalized PDP of 1.2. The power and delay performance metrics are computed using the method explained in the earlier section. For a given PDP, power can be traded off against delay.
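The normalization and sorting described above can be sketched as follows, taking PDP as the product of power and delay. The function name `normalize_metrics`, the cell names, and the numeric values are placeholders chosen for illustration; they are not the measured results of this study.

```python
def normalize_metrics(cells):
    """Normalize power, delay, and PDP to their lowest values and sort by PDP.

    `cells` maps a cell name to (power_in_watts, delay_in_seconds).
    Returns rows of (name, norm_power, norm_delay, norm_pdp).
    """
    pdp = {name: p * d for name, (p, d) in cells.items()}
    min_power = min(p for p, _ in cells.values())
    min_delay = min(d for _, d in cells.values())
    min_pdp = min(pdp.values())
    rows = [
        (name, p / min_power, d / min_delay, pdp[name] / min_pdp)
        for name, (p, d) in cells.items()
    ]
    return sorted(rows, key=lambda row: row[3])  # increasing normalized PDP


# Hypothetical placeholder values, not results from the paper.
example = {
    "cell_A": (2.0e-6, 100e-12),
    "cell_B": (1.0e-6, 300e-12),
    "cell_C": (4.0e-6, 125e-12),
}
table = normalize_metrics(example)
```

Because each metric is normalized to its own minimum, the cell with the lowest PDP need not be the one with the lowest power or the lowest delay, as the placeholder data illustrates.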

The fourth group from the origin in the chart of Fig. 1 is the proposed 1-bit FA cell, labelled AL-31, which has approximately 20% higher PDP than the adder cell labelled TGA. A low PDP for a circuit implies low energy spent during the addition operation. Some of the low transistor count 1-bit FA cells, like the 14T architecture [5], function correctly when studied in isolation (standalone) but fail to meet the requirements of long chain adders because of poor driving capability and severe logic level degradation.

Fig. 2 shows the transistor level circuits of 3 1-bit FA cells picked from the set of 14 1-bit FAs under consideration based on the PDP criterion. Fig. 2(a) is the circuit of the proposed AL-31 1-bit FA cell, which is 28T. This architecture is best suited for long word adder architectures, e.g. the carry skip adder (CSA) [13]. In a CSA the required bit-propagate and bit-generate logic circuits are readily derived from the intermediate nodes of the AL-31 circuit, which simplifies the implementation of a CSA using the 1-bit AL-31 architecture. Owing to the mixed logic style of the AL-31 adder, which combines static CMOS NAND, NOR, and NOT gates with transmission gate (TG) based XOR and MUX implementations, the fan-out of AL-31 is found to be relatively better than that of the other 13 types of 1-bit FA cells. Its inherent buffering and the topological features that complement the implementation of a CSA give the AL-31 architecture a considerable advantage over the other 13 1-bit FA cells.
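As an illustration of how bit-propagate signals are used in a carry skip adder, the following behavioural sketch models one carry-skip group: the group propagate is the AND of the bit propagates, and when it is 1 the carry-in is forwarded directly to the group carry-out. The function `carry_skip_block` and the group size used in the check are assumptions for illustration, not the CSA design studied by the authors.

```python
def carry_skip_block(a_bits, b_bits, c_in):
    """One carry-skip group, LSB first (hypothetical behavioural model).

    Returns (sum_bits, c_out, group_propagate).
    """
    p_bits = [a ^ b for a, b in zip(a_bits, b_bits)]  # bit propagates
    group_p = int(all(p_bits))                        # group propagate
    sums = []
    c = c_in
    for a, b, p in zip(a_bits, b_bits, p_bits):
        sums.append(p ^ c)          # sum bit from propagate and carry
        c = (a & b) | (p & c)       # ripple carry inside the group
    # Skip multiplexer: forward c_in when every bit in the group propagates.
    c_out = c_in if group_p else c
    return sums, c_out, group_p


# Exhaustive check of a 4-bit group against integer addition.
for a in range(16):
    for b in range(16):
        for c_in in (0, 1):
            a_bits = [(a >> i) & 1 for i in range(4)]
            b_bits = [(b >> i) & 1 for i in range(4)]
            s, c_out, _ = carry_skip_block(a_bits, b_bits, c_in)
            total = sum(bit << i for i, bit in enumerate(s)) + (c_out << 4)
            assert total == a + b + c_in
```

Functionally the skip path returns the same value as the ripple path; its benefit in hardware is that the carry bypasses the group instead of rippling through every cell.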

Fig. 2(b) is the circuit schematic of the 14T 1-bit FA cell labelled MB, which has the highest PDP, as shown in the last group of the bar chart of Fig. 1. Fig. 2(c) shows the circuit schematic of the 20T 1-bit FA cell architecture labelled TGA, which has the lowest PDP and appears in the first group of the bar chart in Fig. 1. Although TGA has the lowest PDP as a standalone circuit, when connected as a 32-bit RCA it has a relatively high PDP (4<sup>th</sup> group of the bar chart in Fig. 3) compared to the 32-bit RCA built from the AL-31 1-bit FA cell (2<sup>nd</sup> group of the bar chart in Fig. 3). The relative position of the 32-bit RCA built from the 1-bit MB cell in Fig. 3 remains approximately the same as that of the 1-bit MB cell in Fig. 1.
All the 1-bit FA cells are simulated under identical conditions.

**Fig. 2:** Some of the important 1-bit full adder circuits picked from a set of 14 different FA architectures; (a) Proposed 1-bit (AL-31), 28 transistor (28T) full adder circuit, (b) A 14T MB 1-bit full adder circuit [6], and (c) A 20T 1-bit transmission gate adder (TGA) [7].

**Fig. 3:** Comparison of power, delay, and PDP of the 14 32-bit RCAs implemented using the 14 1-bit FAs.

IV. Conclusions

In this paper we have compared the performance of 14 different 1-bit FA cell architectures, which include the proposed 1-bit FA cell. Three performance metrics, namely power, delay, and PDP, of all 14 1-bit FA circuits are compared on a scale normalized with respect to the lowest power, delay, and PDP among the 14 circuits. To do this we have designed and simulated all 14 1-bit FA cell architectures standalone. The standalone performance is shown not to necessarily reflect the performance of a given 1-bit FA cell in a long word adder chain; this fact is highlighted by the study in this paper. For example, the proposed 1-bit FA circuit labelled AL-31 stands 4th best in PDP when simulated as a standalone circuit; when this cell is tested in the 32-bit RCA benchmark circuit, it stands 2nd, almost on par with the 1st ranked 1-bit FA cell architecture, labelled TFA, with respect to PDP. Further, the proposed 1-bit FA circuit has the merit of making the intermediate variables required for bit-wise propagate-generate (PG) logic available for the CSA implementation. This eliminates the need for additional circuitry to build the PG logic. Preliminary observations from the ongoing CSA study suggest that the AL-31 architecture performs best in terms of PDP as well as speed for long word adders; an extensive study to confirm this is underway. Finally, the low transistor count 1-bit FA cell architectures, from 10T to 20T, show degraded performance in long word adder chains unless they are augmented by buffers at every intermediate and output stage.

Acknowledgement

The authors acknowledge the management of Dayananda Sagar Group of Institutions (DSI), Bangalore, for all its support in pursuing this research in the Research Center, Department of Telecommunication Engineering, Dayananda Sagar College of Engineering, Bangalore. Our special thanks are due to Dr. Premachandra Sagar, Vice-chairman, DSI, for all his encouragement for research.

References