An Efficient VLSI Architecture of 1D/2D and 3D for DWT Based Image Compression and Decompression Using a Lifting Scheme

Venkateshappa¹, P.H.Sunitha², Dr. Cyril Prasanna Raj P.³

¹²Research Scholar, M.S. Engineering College, Bangalore, Karnataka, India, 562110
³Professor & Dean (R&D), M.S. Engineering College, Bangalore, Karnataka, India, 562110

Abstract: An efficient architecture is proposed in this paper for high speed Discrete Wavelet Transform computing. The proposed architecture includes Line Buffers, PIPO and Lifting Block. This architecture works in non-separable fashion using a lifting scheme computes 1D, 2D and 3D-DWT at different resolution levels. The lifting scheme represents the fastest implementation of the DWT. A Verilog model is described and synthesized using Xilinx 14.4. The architecture has regular systolic structure, simple control flow for data extraction and small embedded buffers.

Keywords: DWT, HDL, FSM, lifting scheme, Verilog, PIPO.

I. Introduction

Multilevel one dimensional, two dimensional and three dimensional - discrete wavelet transform (1D-DWT, 2D-DWT, 3D-DWT) is used for digital signal processing (DSP) and image processing applications. Lifting based one dimensional- discrete wavelet transform (1D-DWT) provides less image resolution and processing time is also high. The wavelet transformation is an extensively used technique for image processing applications. Unlike traditional transforms such as the fast Fourier transform (FFT) and discrete cosine transform (DCT), the discrete wavelet transform (DWT) holds both time and frequency data, based on a multi-resolution analysis technique. This is a powerful approach to signal processing and analysis. As its name implies, multi-resolution theory is concerned with the representation and analysis of signals or images at more than one resolution. This facilitates improved quality of reconstructed picture for the same compression than is possible by other transforms. In order to implement the real time codec based on DWT, it needs to be targeted on a fast device. Field programmable gate array (FPGA) implementation of DWT results in higher processing speed and lower costs when compared to other implementations such as advanced RISC (reduced instruction set computer) machines (ARM) processors, DSPs etc. The Discrete wavelet transform is therefore increasingly used for image coding.

This is because the DWT can decompose the signals into different sub-bands with both time and frequency information and facilitate to arrive a high compression ratio. It supports features like progressive image transmission (by quality, by resolution), ease of compressed image manipulation, region of interest coding, etc.

The rest of the paper is structured as follows. Section 2, DWT structure that summarizes the lifting scheme discrete wavelet transforms. In section 3 and 4, the high efficient architecture for the (5, 3) filter based 1D, 2D and 3D lifting-based DWT is proposed, followed by the implementation and performance analysis in section 5, and section 6 concludes the work.

II. DWT Structure

The wavelet transform provides a time-frequency domain representation for the analysis of signals. Therefore, there are two main methods to produce and implement wavelet transforms, 1. The frequency based method is Filter Banks (FB) 2. The time based one is called Lifting Scheme (LS)

2.1 Lifting Scheme

Wim Sweldens developed a lifting scheme for the construction of bi-orthogonal wavelets. The main feature of the lifting scheme is that all constructions are derived in the spatial domain. Lifting scheme is a simple and an efficient algorithm to calculate wavelet transforms as a sequence of lifting steps. Constructing wavelets using lifting scheme comprises three steps:

1. Split step: The original signal, input image X (n), is split into odd and even samples.
2. Lifting step: This step is executed as N sub steps depending on the type of the filter, where the odd and even samples are filtered by the prediction and update filters, p(z) and u(z).
3. Normalization or Scaling step: After N lifting steps, scaling coefficients K and 1/K are applied respectively to the odd and even samples in order to obtain the low pass sub band i.e. significance coefficient Y_L(i) and the high-pass sub band i.e. detailed coefficient Y_H(i).
An Efficient VLSI Architecture of 1D/2D and 3D for DWT Based Image Compression and......

![Figure 1: Forward Lifting Scheme using 5/3 wavelets transform](image1)

Fig. 1 shows the lifting scheme of wavelet filter computing one dimension signal. The inverse transform could easily be found by exchanging the sign of the predict step and the update step and applying all operations in reverse order as shown in Fig. 2.

![Figure 2: Backward Lifting Scheme using 5/3 wavelets transform](image2)

Lifting based Inverse transform (IDWT) is simple and involves the reversal of the order of operations in DWT. Therefore the same resources can be reused to define a general programmable architecture for forward and inverse DWT.

2.2 General Algorithm of Lifting Based, 5/3 Wavelets Transform:

The word prediction is used here because P function predicts odd samples using even samples, defined in the equation P1. The difference between this prediction and the actual value of odd sample creates the high frequency part of the signal which is called "detail" coefficients (d), defined in the equation D.

Then update function, defined in equation U1, is applying on detail signal and combining the result with even samples update them so that the output coefficients (s) have the desired properties. Usually the desired properties of ‘s’ are the same as the properties of input signal (x) but with half size. So the s signal is an approximation for x and is called significance coefficient, defined in equation S.

The lifting steps lead to the following equations:

- **P1**: $Y_{i,2j} = a \times (X_{i,2j} + X_{i,2j+1}) + X_{i,2j+1}; 0 \leq i \leq N-1, 1 \leq 2j + 1 \leq N-2$
- **U1**: $Y_{i,2j} = b \times (Y_{i,2j-1} + Y_{i,2j+1}) + X_{i,2j}; 0 \leq i \leq N-1, 2 \leq 2j \leq N-3$
- **D**: $Z_{i,2j} = k \times (Z_{i,2j}) ; 0 \leq i \leq N-1, 0 \leq 2j \leq N-1$
- **S**: $Z_{i,2j+1} = k^{-1} \times (Z_{i,2j+1}) ; 0 \leq i \leq N-1, 1 \leq 2j + 1 \leq N-2$

Where, N – size of the cell (Number of pixels)
(i, j) – (row, column)
Table 1: Filters and Co-efficient for 5/3 Wavelets Transform

<table>
<thead>
<tr>
<th>Filters for 5/3 wavelets transform</th>
<th>Co-efficient</th>
</tr>
</thead>
<tbody>
<tr>
<td>Prediction first filter</td>
<td>a ≈ -1.58613432</td>
</tr>
<tr>
<td>Update first filter</td>
<td>b ≈ -0.05298011854</td>
</tr>
<tr>
<td>Scaling (Normalization filter)</td>
<td>k ≈ 1.14904398</td>
</tr>
</tbody>
</table>

Filters and co-efficient used in the scheme are defined in the Table 1.

III. Proposed Algorithm of 5/3 Wavelets Based Dwt

The proposed work is specialized for the DWT 5/3 wavelet in lifting scheme implementation. ‘X’ be the input image, which has predefined pixels.

Let, \( X = [X(1), X(2) ... X(2n)] \) be an array of length 2n

In this work we begin with the "poly-phase decomposition," splitting \( X \) into two sub-bands, each of length \( N \). the original signal i.e. input image pixels are split into even and odd pixels in split step of the design.

\( X_0 = [X(1), X(3), X(5) ... X(2n-1)] \)
\( X_e = [X(2), X(4), X(6) ... X(2n)] \)

There are four stages in the lifting scheme architecture which is summarized by the equations as follows:

\[
P_1(n) = X_0(n) + a(X_e(n) + X_e(n+1))
\]

\[
U_1(n) = X_e(n) + b(P_1(n) + X_e(n+1))
\]

\[
dc(n) = k \times P_1(n)
\]

\[
sc(n) = \frac{1}{k} \times U_1(n)
\]

\( P_1(n) \) and \( U_1(n) \) are scaled by the constant \( K \) and \( K^{-1} \) respectively, for normalizing their magnitude. Filter co-efficient are described in table 1. The inverse transform is done by performing the lifting steps in the reverse order and with \( a \), \( b \) and \( k \) negated.

IV. Architecture of Dwt, Using 5/3 Wavelet Transform

![Figure 3: Top level architecture of 1D-DWT](image1)

Architecture of 1D-DWT consist of clock and reset as control inputs and even_in[7:0] and odd_in[7:0] as even and odd pixels extracted from input image. Output signal are Detailed co-efficient dc_out1[27:0] and dc_out2[27:0], significance co-efficient sc_out1[23:0] and sc_out2[23:0].

![Figure 4: Building blocks of 1D-DWT](image2)
Proposed architecture of DWT consist of two line buffers, one for even data extraction and one for odd data extraction, two PIPO’s to capture the data, a lifting block with 5/3 wavelet transform as shown in the figure 4. To compute N point 1D-DWT it takes N/2 cycles.

Figure 5: Architecture of 2D- DWT

1D-DWT is the basic functional unit of 2D-DWT and 3D-DWT. 2D-DWT does row operation and column operation as well. Whereas 1D-DWET does only column operation as shown in the figure 4. dc_out1[27:0] and dc_out2[27:0] are low frequency bands and sc_out1[23:0], are sc_out2[23:0] are high frequency bands, are separated from input samples by 1D-DWT. These are again spitted in to LL, LH, HL and HH by 2D-DWT as shown in the figure 5.

Figure 6: Architecture of 3D-DWT
Same as 2D-DWT process, the outputs of 2D-DWT is further processed and decomposes to eight cells, with one low frequency and seven high frequency bands. LL, LH, HL and HH samples are decomposed to LLL, LLH, LHL, LHH, HLL, HHL, HHH as shown in the figure 6.

V. Experimental Results and RTL Schematic

As per the proposed algorithm, 1D-DWT design is modeled in Verilog HDL for 640*480 pixels standard lena image. even_in[7:0] and odd_in[7:0] are the even and odd pixels, separated by the 640*480 pixels.

DWT model is consist of FSM based line buffers, PIPO registers and lifting block, dc_out1[27:0] and dc_out2[27:0] are low pass values and sc_out1[23:0] and sc_out2[23:0] are high pass values. RTL schematic view of the DWT, using 5/3 wavelets transform is shown in the figure 7 and figure 8.
An Efficient VLSI Architecture of 1D/2D and 3D for DWT Based Image Compression and......

As explained in the architecture of 2D and 3D-DWT in figure 5 and 6, 2D-DWT does the row and column operation, and splits the sample pixels into four frequency coefficients as shown in figure 9.

3D-DWT does the row and column operation, and row operation, splits the sample pixels into eight frequency coefficients as shown in figure 10.
5.1 Decomposition Levels of Image in 1D, 2D and 3D-DWT

1D-DWT is the basic functional unit of 2D-DWT and 3D-DWT. 2D-DWT does row operation and column operation as well. Whereas, 1D-DWT does only column operation. These L, H are again splitted to LL, LH, HL and HH by 2D-DWT. Same as 2D-DWT process, the outputs of 2D-DWT is further processed and decomposes to eight cells, with one low frequency and seven high frequency bands. LL, LH, HL and HH samples are decomposed to LLL, LLH, LHL, LHH, HLL, HLH, HHL and HHH as shown in the figure 4.

![Image decomposition in 1D, 2D and 3D-DWT](image)

**Figure 11:** Image decomposition in 1D, 2D and 3D-DWT

![Decomposition of images 1D, 2D and 3D-DWT](image)

(a) boats (512*512)  
(b) Decomposition 1D-DWT(512*256)  
(c) Decomposition 2D-DWT(256*256)  
(d) Decomposition 3D-DWT(256*128)

**Figure 5:** Decomposition of images 1D, 2D and 3D-DWT
VI. Synthesis Report

Table 2: Synthesis report

<table>
<thead>
<tr>
<th>Resources</th>
<th>1D-DWT</th>
<th>2D-DWT</th>
<th>3D-DWT</th>
</tr>
</thead>
<tbody>
<tr>
<td>Slice Logic Utilization:</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Number of Slice Registers:</td>
<td>24</td>
<td>244</td>
<td>244</td>
</tr>
<tr>
<td>Slice Logic Distribution:</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Number of LUT Flip Flop pairs used:</td>
<td>24</td>
<td>1749</td>
<td>1749</td>
</tr>
<tr>
<td>Number with an unused Flip Flop</td>
<td>0</td>
<td>1505</td>
<td>1505</td>
</tr>
<tr>
<td>Number with an unused LUT</td>
<td>24</td>
<td>124</td>
<td>124</td>
</tr>
<tr>
<td>Number of fully used LUT-FF pairs</td>
<td>0</td>
<td>120</td>
<td>120</td>
</tr>
<tr>
<td>Number of unique control sets</td>
<td>1</td>
<td>18</td>
<td>18</td>
</tr>
<tr>
<td>IO Utilization:</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Number of IOs:</td>
<td>34</td>
<td>107</td>
<td>107</td>
</tr>
<tr>
<td>Number of bonded IOBs:</td>
<td>34</td>
<td>107</td>
<td>107</td>
</tr>
<tr>
<td>Timing summary:</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Clock period:</td>
<td>0.515ns</td>
<td>3.928ns</td>
<td>3.928ns</td>
</tr>
<tr>
<td>Total number of paths / destination ports:</td>
<td>16 / 16</td>
<td>216/216</td>
<td>216/216</td>
</tr>
</tbody>
</table>

The synthesis results slice logic utilization, slice logic distribution and the maximum frequency of operation for the proposed architecture is shown in the table 2.

6. Simulation Results

Simulation result of lifting block is shown in the figure 8. Clock, reset and enable are the control inputs of the design and \(x_1[7:0], x_2[7:0], x_3[7:0]\) are input even pixels and \(x_0[7:0], \hat{x}_0[7:0]\) are input odd pixels. ‘a’ and ‘b’ are the lifting co-efficient. Ack_even and ack_odd are the triggering signal for lifting block.

\[P_1_{\text{out}}[18:0], P_1_{\text{out}}[18:0], U_1_{\text{out}}[14:0] \text{ and } U_1_{\text{out}}[40:0]\] predict and update function from which detailed co-efficient \(\text{dc}_{\text{out}}[27:0], \text{dc}_{\text{out}}[27:0]\), and significance co-efficient \(\text{sc}_{\text{out}}[23:0], \text{sc}_{\text{out}}[23:0]\) are evaluated, where ‘k’ is a scaling co-efficient.

VII. Conclusion

We have proposed high speed 5/3 DWT architecture for image compression. A Verilog model of 1D, 2D and 3D-DWT has been developed targeting FPGA design. Synthesis results on Xilinx Spartan-6-Lower Power XC6SLX4L FPGA implementation shows a frequency of 1939.864MHz for 1D-DWT and 254.58MHz for 4 level 2D-DWT. The architecture has been verified for fast computing and simple control flow.

References

An Efficient VLSI Architecture of 1D/2D and 3D for DWT Based Image Compression and...


