We continue with combinational circuit design exercises in SystemVerilog. This time we are going to do exercises on number representation formats using a simplified floating point format.

## Table of Contents

SystemVerilog Study Notes Chapters

- Gate-Level Combinational Circuit
- RTL Combinational Circuit Operators
- RTL Combinational Circuit - Concurrent and Control Constructs
- Hex-Digit to Seven-Segment LED Decoder RTL Combinational Circuit
- Barrel Shifter RTL Combinational Circuit
- Simplified Floating Point Arithmetic. RTL Combinational Circuit
- BCD Number Format. RTL Combinational Circuit
- DDFS. Direct Digital Frequency Synthesis for Sound
- FPGA ADSR envelope generator for sound synthesis
- AMD Xilinx 7 series FPGAs XADC
- Building FPGA-Based Music Instrument Synthesis: A Simple Test Bench Solution

# Floating point arithmetic

Floating-point (FP) arithmetic represents real numbers approximately, using a formula that trades precision for range. Floating point is another format to represent a number. With the same number of bits, the range in floating-point format is much larger than in signed integer format. In general, a floating-point number is represented approximately with a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. A number that can be represented exactly is of the following form:

significand * base^exponent

where significand is an integer, base is an integer greater than or equal to two, and exponent is also an integer.

For example:

1.2345 = 12345 * 10^-4

SystemVerilog has built-in floating-point data types, but they are too complex to be synthesized automatically.

# Simplified 13-bit format

For these exercises we will use a simplified 13-bit format, ignoring the round-off error.

The representation consists of

- 1-bit sign, s: which indicates the sign of the number (1'b1 for negative)
- 4-bit exponent field, e: which represents the exponent
- 8-bit significand field, f: which represents the significand or the fraction

In this format the value of a floating point number is

(-1)^s * .f * 2^e

The .f*2^e is the magnitude of the number.

(-1)^s is a formal way to state that s equal to 1 means a negative number. The sign bit is kept separate from the rest of the number.

When the MSB of the significand field is 1, the number is in normalized representation.

The smallest normalized nonzero magnitude in this number format representation is

0.1000_0000 * 2^0000

We also make the following assumptions:

- Both exponent and significand fields are in unsigned format
- The representation has to be either normalized or zero. If the magnitude of a computation result is smaller than the smallest normalized nonzero magnitude, it must be converted to zero.

A floating-point number consists of two fixed-point components, whose ranges depend only on the number of bits in each field. The floating-point range grows linearly with the significand range and exponentially with the exponent range, which is what gives the format its much wider range.

Under the above assumptions, the largest and smallest nonzero magnitudes for our simplified 13-bit format are 0.1111_1111 * 2^1111 and 0.1000_0000 * 2^0000. In decimal, the magnitudes range from 0.5 (0.5 * 2^0) to 32,640 (255/256 * 2^15).
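To double-check these limits, here is a minimal Python sketch that decodes the three fields into a decimal value. It is only a software model for checking the arithmetic, not part of the design, and the function name `fp13_value` is our own choice.

```python
def fp13_value(sign: int, exp: int, frac: int) -> float:
    """Decode the simplified 13-bit format: (-1)^s * .f * 2^e."""
    assert sign in (0, 1) and 0 <= exp <= 15 and 0 <= frac <= 255
    # .f reads the 8 bits as a binary fraction, so f is divided by 2^8
    magnitude = (frac / 256) * (2 ** exp)
    return -magnitude if sign else magnitude


largest = fp13_value(0, 0b1111, 0b1111_1111)
smallest = fp13_value(0, 0b0000, 0b1000_0000)
print(largest, smallest)  # 32640.0 0.5
```

Note that the 8-bit field is a binary fraction: 8'b1111_1111 means 255/256, not 0.255.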

# Simplified floating point adder

We are going to design a floating point adder that follows the same steps as when we do the addition manually when working with scientific notation.

The computation is done in several steps as indicated in the diagram:

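| | operands | after sort | after align | after add/sub | after normalize |
| --- | --- | --- | --- | --- | --- |
| eg. 1 | +0.54e3, -0.87e4 | -0.87e4, +0.54e3 | -0.87e4, +0.05e4 | -0.82e4 | -0.82e4 |
| eg. 2 | +0.54e3, -0.55e3 | -0.55e3, +0.54e3 | -0.55e3, +0.54e3 | -0.01e3 | -0.10e2 |
| eg. 3 | +0.54e0, -0.55e0 | -0.55e0, +0.54e0 | -0.55e0, +0.54e0 | -0.01e0 | 0 |
| eg. 4 | +0.56e3, … | +0.56e3, … | +0.56e3, … | … | … |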

- Sorting: puts the number with the larger magnitude on the top and the number with the smaller magnitude on the bottom. The results are big_number and small_number.
- Alignment: aligns the two numbers so that they have the same exponent. It adjusts the exponent of small_number to match the exponent of big_number. The significand of small_number has to be shifted to the right according to the difference in exponents.
- Addition/subtraction: adds or subtracts the significands of the two aligned numbers.
- Normalization: adjusts the result to normalized format if
  - after subtraction the result contains leading zeros
  - or after subtraction the result is too small to be normalized, so it needs to be converted to zero
  - or after addition the result generates a carry-out bit

We will ignore rounding: during alignment and normalization, the lower bits of the significand are simply discarded when they are shifted out.
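The four steps can be tried out in software before writing any hardware. The following Python sketch mimics them on two-digit decimal numbers in scientific notation. It is an illustration under our own assumptions: a number is a `(sign, digits, exp)` tuple meaning sign * 0.digits * 10^exp, the exponent cannot go below zero, and the function name `add_decimal` is hypothetical.

```python
def add_decimal(a, b):
    """Add two numbers given as (sign, digits, exp) = sign * 0.digits * 10**exp.

    digits is a two-digit integer (10..99 when normalized), exp >= 0.
    """
    # Sort: larger magnitude first (compare exponent, then digits)
    big, small = (a, b) if (a[2], a[1]) >= (b[2], b[1]) else (b, a)
    # Align: shift the small significand right, dropping shifted-out digits
    small_digits = small[1] // 10 ** (big[2] - small[2])
    # Add/sub: same signs add, different signs subtract
    if big[0] == small[0]:
        digits = big[1] + small_digits
    else:
        digits = big[1] - small_digits
    exp = big[2]
    # Normalize
    if digits >= 100:                       # carry-out: shift right
        digits, exp = digits // 10, exp + 1
    else:
        while 0 < digits < 10 and exp > 0:  # leading zero: shift left
            digits, exp = digits * 10, exp - 1
        if digits < 10:                     # too small: convert to zero
            digits, exp = 0, 0
    return (big[0], digits, exp)


print(add_decimal((+1, 54, 3), (-1, 87, 4)))  # (-1, 82, 4)
print(add_decimal((+1, 54, 3), (-1, 55, 3)))  # (-1, 10, 2)
print(add_decimal((+1, 54, 0), (-1, 55, 0)))  # (-1, 0, 0)
```

The three calls correspond to eg. 1, eg. 2 and eg. 3: a plain subtraction, a result with a leading zero, and a result too small to be normalized.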

## Floating point packed struct

In SystemVerilog we can create structured data types which we use to group a number of related variables together. We will create a packed structure to group the data that represents a 13-bit floating point number of the format that we have previously defined.

```systemverilog
package FloatingPointPkg;

  // 13-bit floating point
  // 1-bit sign, s: which indicates the sign of the number (1'b1 for negative)
  // 4-bit exponent field, e: which represents the exponent
  // 8-bit significand field, f: which represents the significand or the fraction
  typedef struct packed {
    logic       sign;
    logic [3:0] exp;
    logic [7:0] frac;
  } fp_t;

endpackage: FloatingPointPkg
```

The new type, fp_t, groups the three fields of the 13-bit floating point format:

- 1-bit sign, sign: which indicates the sign of the number (1'b1 for negative)
- 4-bit exponent field, exp: which represents the exponent
- 8-bit significand field, frac: which represents the significand or the fraction

We can define all our types inside a `package` and simply import them wherever we want in our code. We will save the code with the new data type as a new file, "fp_types.sv", so that all modules that use it can import it.

```systemverilog
package FloatingPointPkg;
  typedef struct packed {
    logic       sign;
    logic [3:0] exp;
    logic [7:0] frac;
  } fp_t;
endpackage: FloatingPointPkg
```

To import:

```systemverilog
import FloatingPointPkg::fp_t;
```
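Because the struct is packed, its members are stored side by side in a single 13-bit vector, with the first member (sign) in the most significant bit. The following Python sketch models that layout. The helper names `pack_fp` and `unpack_fp` are our own and only serve to illustrate the bit positions.

```python
def pack_fp(sign: int, exp: int, frac: int) -> int:
    """Pack the fields as a packed struct does: first member in the MSBs."""
    return ((sign & 0x1) << 12) | ((exp & 0xF) << 8) | (frac & 0xFF)


def unpack_fp(word: int) -> tuple:
    """Split a 13-bit word back into (sign, exp, frac)."""
    return (word >> 12) & 0x1, (word >> 8) & 0xF, word & 0xFF


word = pack_fp(1, 0b0011, 0b1010_0000)
print(f"{word:013b}")   # 1001110100000
print(unpack_fp(word))  # (1, 3, 160)
```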

## Floating point sorter module

We design the adder in stages. The first stage sorts the two numbers by magnitude, ignoring the sign, just as we write the larger number on top and the smaller one below when we subtract by hand.

The sorter module assigns the number with the larger magnitude to the big_number output and the number with the smaller magnitude to the small_number output.

Here is one possible implementation in SystemVerilog. Note that we use the structure we have created to represent floating point numbers for code clarity.

```systemverilog
// Assigns the number with the larger magnitude to the big_number output
// and assigns the number with the smaller magnitude to the small_number output
module fp_sorter(
    input  fp_t a,
    input  fp_t b,
    output fp_t big_number,
    output fp_t small_number);

  // Concatenating {exp, frac} lets a single comparison order the magnitudes
  assign big_number   = ({a.exp, a.frac} >= {b.exp, b.frac}) ? a : b;
  assign small_number = ({a.exp, a.frac} <  {b.exp, b.frac}) ? a : b;

endmodule
```

The module uses our floating point data type, so we need to import it:

```systemverilog
import FloatingPointPkg::fp_t;
```

A possible testbench

```systemverilog
module fp_sorter_testbench;
  fp_t a;
  fp_t b;
  fp_t bign;
  fp_t smalln;

  fp_sorter uut(.a(a), .b(b), .big_number(bign), .small_number(smalln));

  initial begin
    a = '{1'b0, 4'b1111, 8'b1111_1111};
    b = '{1'b0, 4'b0001, 8'b1111_0000};
    #10;
    a = '{1'b0, 4'b0000, 8'b0000_0000};
    b = '{1'b0, 4'b0001, 8'b1111_0000};
    #10;
    a = '{1'b0, 4'b0000, 8'b0000_0000};
    b = '{1'b1, 4'b0001, 8'b1111_0000};
    #10;
    a = '{1'b0, 4'b0001, 8'b1111_0000};
    b = '{1'b0, 4'b1111, 8'b1111_1111};
    #10;
    $stop;
  end
endmodule
```

Simulation

The new sorter module returns the largest and smallest number in magnitude regardless of sign.
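To predict what the simulation should show, we can write a small reference model of the sorter. This Python sketch is our own checking aid, not part of the design. It assumes a number is a `(sign, exp, frac)` tuple and compares the same concatenated {exp, frac} value as the module.

```python
def fp_sort(a, b):
    """a and b are (sign, exp, frac) tuples; returns (big_number, small_number)."""
    def magnitude_key(x):
        # Same ordering as the {exp, frac} concatenation in the module
        return (x[1] << 8) | x[2]

    return (a, b) if magnitude_key(a) >= magnitude_key(b) else (b, a)


# Third test vector of the testbench: the sign is ignored when sorting
big, small = fp_sort((0, 0b0000, 0b0000_0000), (1, 0b0001, 0b1111_0000))
print(big, small)  # (1, 1, 240) (0, 0, 0)
```

Comparing the concatenation works because the numbers are normalized or zero, so a larger exponent always means a larger magnitude.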

Schematic

Two comparators compare the fractional parts and exponents of both numbers. Based on the output signals of the two comparators, four 2-to-1 multiplexers route the fractional and exponent part signals of the two numbers to the outputs representing the largest and smallest number in our number sorter.

## Alignment module

The alignment module aligns the two numbers so that they have the same exponent. It adjusts the exponent of small_number to match the exponent of big_number. The significand of small_number has to be shifted to the right according to the difference in exponents.

```systemverilog
`timescale 1ns / 1ps

import FloatingPointPkg::fp_t;

module fp_aligment(
    input  fp_t bign,
    input  fp_t smalln,
    output fp_t aligned
    );

  logic [3:0] exp_diff;

  always_comb begin
    exp_diff     = bign.exp - smalln.exp;
    aligned.frac = smalln.frac >> exp_diff;
    aligned.exp  = bign.exp;
    aligned.sign = smalln.sign;
  end

endmodule
```

Simulation

Schematic

The difference in exponents is passed to a right shifter that shifts the significand of the small_number. The exponent of the aligned result is set to the value of the exponent of the big number.
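A quick software check of the alignment step can look like the following Python sketch. It is a reference model under our own assumptions: numbers are `(sign, exp, frac)` tuples and the input operands are hypothetical values chosen for illustration.

```python
def fp_align(bign, smalln):
    """Return smalln rewritten with the exponent of bign.

    Both arguments are (sign, exp, frac) tuples, already sorted by magnitude.
    """
    exp_diff = (bign[1] - smalln[1]) & 0xF  # 4-bit difference, as in the module
    frac = (smalln[2] >> exp_diff) & 0xFF   # bits shifted out are discarded
    return (smalln[0], bign[1], frac)


aligned = fp_align((0, 0b0011, 0b1010_0000), (0, 0b0010, 0b1001_0000))
print(aligned, f"{aligned[2]:08b}")  # (0, 3, 72) 01001000
```

When the difference in exponents is 8 or more, every bit of the small significand is shifted out and the aligned significand becomes zero.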

## Add/subtract module

This module adds or subtracts the significands of two aligned numbers.

```systemverilog
`timescale 1ns / 10ps

import FloatingPointPkg::fp_t;

// This module adds or subtracts the significands of two aligned numbers (same exponent).
// It assumes the numbers are ordered: big first, then small.
module fp_sum_significands (
    input  fp_t bign,
    input  fp_t smalln,
    output logic [8:0] sum);

  // Extend both operands to 9 bits so that sum[8] holds the carry-out
  assign sum = (bign.sign == smalln.sign) ?
               {1'b0, bign.frac} + {1'b0, smalln.frac} :
               {1'b0, bign.frac} - {1'b0, smalln.frac};

endmodule
```

Testbench

```systemverilog
module fp_sum_significands_testbench;
  fp_t bign;
  fp_t smalln;
  logic [8:0] sum;

  fp_sum_significands uut(.sum(sum), .bign(bign), .smalln(smalln));

  initial begin
    bign   = '{1'b0, 4'b0011, 8'b1111_1111};
    smalln = '{1'b0, 4'b0011, 8'b1111_0000};
    #10;
    bign   = '{1'b1, 4'b0011, 8'b1111_1111};
    smalln = '{1'b1, 4'b0011, 8'b0011_0000};
    #10;
    bign   = '{1'b1, 4'b0011, 8'b1111_1111};
    smalln = '{1'b0, 4'b0011, 8'b0011_0000};
    #10;
    bign   = '{1'b0, 4'b0011, 8'b1111_1111};  // sign bit literal must be 1'b0
    smalln = '{1'b1, 4'b0011, 8'b0011_0000};
    #10;
    $stop;
  end
endmodule
```

Simulation

A 2-to-1 multiplexer selects the output based on the sign bits of both numbers: if the signs are equal it routes the addition result, and if they are different it routes the subtraction result.
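The expected values for the testbench vectors can be worked out with a short reference model. This Python sketch is our own checking aid and assumes `(sign, exp, frac)` tuples that are already sorted and aligned.

```python
def fp_sum_significands(bign, smalln):
    """Return the 9-bit sum or difference of two aligned significands.

    Bit 8 of the result is the carry-out of the addition.
    """
    if bign[0] == smalln[0]:
        total = bign[2] + smalln[2]  # same sign: magnitudes add
    else:
        total = bign[2] - smalln[2]  # different sign: never negative after sorting
    return total & 0x1FF


same_sign = fp_sum_significands((0, 3, 0b1111_1111), (0, 3, 0b1111_0000))
diff_sign = fp_sum_significands((1, 3, 0b1111_1111), (0, 3, 0b0011_0000))
print(f"{same_sign:09b}", f"{diff_sign:09b}")  # 111101111 011001111
```

Since the sorter guarantees that the big significand is at least as large as the aligned small one, the subtraction never borrows and bit 8 is only set by an addition.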

## Leading 0s counter module

This module counts the number of leading zeros in an 8-bit number, much like a priority encoder. It assumes that at least one bit is high (1'b1). If no bit is high, it returns the highest count, 7.

This does not affect the next stage, because the result is only used to shift the number to the left by the number of leading zeros. When all bits are zero, the shifted significand is zero whatever count is returned, so the value is irrelevant.

```systemverilog
`timescale 1ns / 1ps

// Outputs the number of leading zeros in an 8-bit number.
// Assumes that there is at least one high bit (value 1'b1);
// if no bit is high it returns the highest count, 7.
module fp_leading_zeros(
    input  logic [7:0] number,
    output logic [2:0] lead0s
    );

  always_comb begin
    if      (number[7]) lead0s = 3'o0;
    else if (number[6]) lead0s = 3'o1;
    else if (number[5]) lead0s = 3'o2;
    else if (number[4]) lead0s = 3'o3;
    else if (number[3]) lead0s = 3'o4;
    else if (number[2]) lead0s = 3'o5;
    else if (number[1]) lead0s = 3'o6;
    else                lead0s = 3'o7;
  end

endmodule
```

Test-bench

```systemverilog
module fp_leading_zeros_testbench;
  logic [7:0] number;
  logic [2:0] lead0s;

  fp_leading_zeros uut(.*);

  initial begin
    number = 8'b1111_1111; #10;
    number = 8'b0111_1111; #10;
    number = 8'b0011_1111; #10;
    number = 8'b0001_1111; #10;
    number = 8'b0000_1111; #10;
    number = 8'b0000_0111; #10;
    number = 8'b0000_0011; #10;
    number = 8'b0000_0001; #10;
    number = 8'b0000_0000; #10;
    $stop;
  end
endmodule
```

Simulation

Schematics

As in a priority encoder, the priority network is implemented by a chain of 2-to-1 multiplexers.
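The same priority search can be written as a loop in software, which is handy for checking the simulation results. This Python sketch is our own reference model and the function name is an assumption. It scans from the MSB down and saturates at 7, like the module.

```python
def leading_zeros(number: int) -> int:
    """Count leading zeros of an 8-bit value, saturating at 7 like the module."""
    for count in range(8):
        if number & (0x80 >> count):  # test bit 7 first, then bit 6, ...
            return count
    return 7                          # all bits low: same answer as the final else


for n in (0b1111_1111, 0b0001_1111, 0b0000_0001, 0b0000_0000):
    print(f"{n:08b} -> {leading_zeros(n)}")
# 11111111 -> 0
# 00011111 -> 3
# 00000001 -> 7
# 00000000 -> 7
```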

## Normalization module

The normalization module adjusts the result to normalized format in three cases:

- after subtraction, the result contains leading zeros
- after subtraction, the result is too small to be normalized and needs to be converted to zero
- after addition, the result generates a carry-out bit

A possible SystemVerilog implementation follows. In the general case it shifts the significand to the left according to the number of leading zeros and decreases the exponent by the same amount.

```systemverilog
// Normalizes an unnormalized floating point number with a carry-out signal
module fp_normalize(
    input  logic carry_out,
    input  fp_t  unnormalized,
    output fp_t  normalized
    );

  logic [2:0] lead_zeros;  // leading zeros, not including the carry out

  fp_leading_zeros lead_zeros_unit(.number(unnormalized.frac), .lead0s(lead_zeros));

  always_comb begin
    if (carry_out) begin
      // with carry out, shift frac to the right
      // (exponent overflow is not handled: 4'b1111 + 1 wraps around)
      normalized.exp  = unnormalized.exp + 1;
      normalized.frac = {1'b1, unnormalized.frac[7:1]};
    end
    else if (lead_zeros > unnormalized.exp) begin
      // too small to be normalized: set to zero
      normalized.exp  = 0;
      normalized.frac = 0;
    end
    else begin
      // shift significand according to the leading zeros
      normalized.exp  = unnormalized.exp - lead_zeros;
      normalized.frac = unnormalized.frac << lead_zeros;
    end
    normalized.sign = unnormalized.sign;
  end

endmodule
```

Testbench

```systemverilog
module fp_normalize_testbench;
  logic carry_out;
  fp_t unnormalized;
  fp_t normalized;

  fp_normalize uut(.*);

  initial begin
    carry_out = 1;
    unnormalized = '{1'b1, 4'b0011, 8'b0000_1000};
    #10;
    carry_out = 1;
    unnormalized = '{1'b1, 4'b0011, 8'b1000_1000};
    #10;
    carry_out = 0;
    unnormalized = '{1'b1, 4'b0111, 8'b0000_1000};
    #10;
    carry_out = 0;
    unnormalized = '{1'b0, 4'b1011, 8'b1000_1000};
    #10;
    $stop;
  end
endmodule
```

Simulation
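To know which values to expect in the waveform, we can model the three branches of the normalizer in software. This Python sketch is our own reference model, with `(sign, exp, frac)` tuples and helper names chosen for illustration.

```python
def leading_zeros(number: int) -> int:
    """Leading zeros of an 8-bit value, saturating at 7."""
    return min(8 - number.bit_length(), 7)


def fp_normalize(carry_out: int, unnormalized):
    """Normalize a (sign, exp, frac) tuple given the adder carry-out."""
    sign, exp, frac = unnormalized
    if carry_out:
        # Carry-out: shift right, insert the carry as new MSB, bump the exponent
        return (sign, (exp + 1) & 0xF, 0x80 | (frac >> 1))
    lead = leading_zeros(frac)
    if lead > exp:
        # Too small to be normalized: convert to zero
        return (sign, 0, 0)
    # Leading zeros: shift left and decrease the exponent
    return (sign, exp - lead, (frac << lead) & 0xFF)


print(fp_normalize(1, (1, 0b0011, 0b1000_1000)))  # (1, 4, 196)
print(fp_normalize(0, (1, 0b0111, 0b0000_1000)))  # (1, 3, 128)
print(fp_normalize(0, (0, 0b0010, 0b0000_1000)))  # (0, 0, 0)
```

The first two calls use vectors from the testbench. The third uses a hypothetical input with an exponent smaller than the number of leading zeros, to exercise the convert-to-zero branch.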

## Putting it all together: Top Floating point Adder module

Finally we instantiate and connect the modules that we have designed previously:

```systemverilog
// circuit for reordering the inputs
fp_sorter sort(.a(a), .b(b), .big_number(bign), .small_number(smalln));

// circuit for aligning the smallest number
fp_aligment align(.aligned(small_aligned), .bign(bign), .smalln(smalln));

// circuit for adding/subtracting the significands; sum MSB (9th bit) is the carry-out
fp_sum_significands sum_significands(.sum(sum), .bign(bign), .smalln(small_aligned));

// circuit for normalizing the output
fp_normalize normalize(.carry_out(sum[8]), .unnormalized(unnormalized), .normalized(result));

// connect the addition/subtraction result with the normalizer
assign unnormalized = '{bign.sign, bign.exp, sum[7:0]};
```

SystemVerilog Code

```systemverilog
`timescale 1ns / 10ps

import FloatingPointPkg::fp_t;

// binary floating point adder
module fp_adder (
    input  fp_t a,
    input  fp_t b,
    output fp_t result
    );

  fp_t bign;           // big operand in absolute magnitude after sorting
  fp_t smalln;         // small operand in absolute magnitude after sorting
  fp_t small_aligned;  // small operand aligned with the big one, same exponents
  logic [8:0] sum;     // sum of the two aligned significands with carry out
  fp_t unnormalized;   // result before normalization

  // circuit for reordering the inputs
  fp_sorter sort(.a(a), .b(b), .big_number(bign), .small_number(smalln));

  // circuit for aligning the smallest number
  fp_aligment align(.aligned(small_aligned), .bign(bign), .smalln(smalln));

  // circuit for adding/subtracting the significands
  fp_sum_significands sum_significands(.sum(sum), .bign(bign), .smalln(small_aligned));

  // circuit for normalizing the output
  fp_normalize normalize(.carry_out(sum[8]), .unnormalized(unnormalized), .normalized(result));

  // connect the addition/subtraction result with the normalizer
  assign unnormalized = '{bign.sign, bign.exp, sum[7:0]};

endmodule
```

Test bench

```systemverilog
module fp_adder_testbench;
  fp_t a;
  fp_t b;
  fp_t c;

  fp_adder uut(.a(a), .b(b), .result(c));

  initial begin
    a = '{1'b0, 4'b0001, 8'b1000_0000};
    b = '{1'b0, 4'b0001, 8'b1000_0000};
    #10;
    a = '{1'b0, 4'b0111, 8'b1000_0000};
    b = '{1'b0, 4'b0001, 8'b1000_0000};
    #10;
    a = '{1'b0, 4'b0011, 8'b1010_0000};
    b = '{1'b0, 4'b0010, 8'b1001_0000};
    #10;
    // 0.625 * 2^3 + 0.5625 * 2^2 = 5 + 2.25 = 7.25
    // expected: 0,0011,1110_1000 = 0.90625 * 2^3 = 7.25
    a = '{1'b0, 4'b0000, 8'b1000_0000};
    b = '{1'b0, 4'b0001, 8'b1000_0000};
    #10;
    a = '{1'b0, 4'b0000, 8'b1000_0000};
    b = '{1'b1, 4'b0001, 8'b1111_0000};
    #10;
    a = '{1'b0, 4'b0001, 8'b1111_0000};
    b = '{1'b0, 4'b0011, 8'b1111_1111};
    #10;
    $stop;
  end
endmodule
```

Simulation
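To check the waveform against expected values, the whole adder can be modeled in a few lines of software. The following Python sketch chains the four stages in the same order as the top module. It is our own reference model, not generated from the design, and it works on `(sign, exp, frac)` tuples with hypothetical helper names.

```python
def leading_zeros(number: int) -> int:
    """Leading zeros of an 8-bit value, saturating at 7."""
    return min(8 - number.bit_length(), 7)


def fp_add(a, b):
    """Add two (sign, exp, frac) numbers in the simplified 13-bit format."""
    # Sort by magnitude: compare exponent first, then significand
    big, small = (a, b) if (a[1], a[2]) >= (b[1], b[2]) else (b, a)
    # Align: shift the small significand right by the exponent difference
    aligned = (small[2] >> ((big[1] - small[1]) & 0xF)) & 0xFF
    # Add or subtract the significands (9-bit result, bit 8 is the carry-out)
    total = big[2] + aligned if big[0] == small[0] else big[2] - aligned
    carry, frac = (total >> 8) & 1, total & 0xFF
    sign, exp = big[0], big[1]
    # Normalize
    if carry:
        return (sign, (exp + 1) & 0xF, 0x80 | (frac >> 1))
    lead = leading_zeros(frac)
    if lead > exp:
        return (sign, 0, 0)
    return (sign, exp - lead, (frac << lead) & 0xFF)


def fp_value(x) -> float:
    """Decimal value of a (sign, exp, frac) tuple."""
    magnitude = (x[2] / 256) * 2 ** x[1]
    return -magnitude if x[0] else magnitude


result = fp_add((0, 0b0011, 0b1010_0000), (0, 0b0010, 0b1001_0000))
print(result, fp_value(result))  # (0, 3, 232) 7.25
```

The call reproduces the third test vector: 5 + 2.25 gives 7.25, encoded as exponent 0011 and significand 1110_1000.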

Schematics

In the schematic we can see the four main blocks of our adder: the sorting circuit, the alignment circuit, the addition/subtraction circuit and the normalization circuit, all of them interconnected.

Expanded view
