SystemVerilog Study Notes. Simplified Floating Point Arithmetic. RTL Combinational Circuit

javagoza
1 Sep 2022

We continue with combinational circuit design exercises in SystemVerilog. This time we are going to do exercises on number representation formats using a simplified floating point format.

Table of Contents

  • Floating point arithmetic
  • Simplified 13-bit format
  • Simplified floating point adder
    • Floating point packed struct
    • Floating point sorter module
    • Alignment module
    • Add/subtract module
    • Leading 0s counter module
    • Normalization module
    • Putting all together: Top Floating point Adder module
  • SystemVerilog Study Notes Chapters

SystemVerilog Study Notes Chapters

  1.  Gate-Level Combinational Circuit 
  2.  RTL Combinational Circuit Operators 
  3.  RTL Combinational Circuit - Concurrent and Control Constructs 
  4.  Hex-Digit to Seven-Segment LED Decoder RTL Combinational Circuit 
  5.  Barrel Shifter RTL Combinational Circuit 
  6.  Simplified Floating Point Arithmetic. RTL Combinational Circuit 
  7.  BCD Number Format. RTL Combinational Circuit 
  8.  DDFS. Direct Digital Frequency Synthesis for Sound 
  9.  FPGA ADSR envelope generator for sound synthesis 
  10.  AMD Xilinx 7 series FPGAs XADC 
  11.  Building FPGA-Based Music Instrument Synthesis: A Simple Test Bench Solution 

Floating point arithmetic

Floating-point arithmetic (FP) is arithmetic using formulaic representation of real numbers as an approximation to support a trade-off between range and precision. Floating point is another format to represent a number. With the same number of bits, the range in floating-point format is much larger than in signed integer format. In general, a floating-point number is represented approximately with a fixed number of significant digits (the significand) and scaled using an exponent in some fixed base; the base for the scaling is normally two, ten, or sixteen. A number that can be represented exactly is of the following form:

significand * base^exponent

where significand is an integer, base is an integer greater than or equal to two, and exponent is also an integer.

For example:

image

SystemVerilog has a built-in floating point data type, but it is too complex to be synthesized automatically.

Simplified 13-bit format

For these exercises we will use a simplified 13-bit format, ignoring the round-off error.

The representation consists of:

  • 1-bit sign, s: which indicates the sign of the number (1'b1 for negative)
  • 4-bit exponent field, e: which represents the exponent
  • 8-bit significand field, f: which represents the significand or the fraction

In this format the value of a floating point number is

(-1)^s * .f * 2^e

 The .f*2^e is the magnitude of the number.

(-1)^s is a formal way to state that s equal to 1 implies a negative number. The sign bit is kept separate from the rest of the number.

When the MSB of the significand field is 1, the number is in normalized representation.

The smallest normalized nonzero magnitude in this number format representation is

0.1000_0000 * 2^0000

We also make the following assumptions:

  • Both exponent and significand fields are in unsigned format
  • The representation has to be either normalized or zero. If the magnitude of a computation result is smaller than the smallest normalized nonzero magnitude, it must be converted to zero.

A floating-point number consists of two fixed-point components, and the range of each depends only on the number of bits or digits in its representation. The floating-point range grows linearly with the significand range but exponentially with the exponent range, which is why the format covers a much wider range than an integer of the same width.

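Under the above assumptions, the largest and smallest nonzero magnitudes for our simplified 13-bit format are 0.1111_1111 * 2^1111 and 0.1000_0000 * 2^0000. Since the binary fraction 0.1111_1111 is 255/256 and 0.1000_0000 is 1/2, these correspond to 0.99609375 * 2^15 = 32640 and 0.5 * 2^0 = 0.5 in decimal.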
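As a quick sanity check of the format definition, the value of a number can be computed in software. The following is a minimal Python sketch, not part of the original design; the helper name fp_value is an assumption made for illustration.

```python
def fp_value(sign: int, exp: int, frac: int) -> float:
    """Decimal value of (-1)^s * .f * 2^e for the simplified 13-bit format."""
    assert sign in (0, 1) and 0 <= exp <= 15 and 0 <= frac <= 255
    magnitude = (frac / 256) * 2 ** exp  # .f means frac / 2^8
    return -magnitude if sign else magnitude


largest = fp_value(0, 0b1111, 0b1111_1111)   # 0.1111_1111 * 2^1111
smallest = fp_value(0, 0b0000, 0b1000_0000)  # 0.1000_0000 * 2^0000
print(largest, smallest)  # 32640.0 0.5
```

The fraction is divided by 256 because the 8-bit field sits entirely to the right of the binary point.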

Simplified floating point adder

We are going to design a floating point adder that follows the same steps as when we do the addition manually when working with scientific notation.
The computation is done in several steps as indicated in the diagram:

image

eg.   inputs             sort               align              add/sub   normalize
1     +0.54e3, -0.87e4   -0.87e4, +0.54e3   -0.87e4, +0.05e4   -0.82e4   -0.82e4
2     +0.54e3, -0.55e3   -0.55e3, +0.54e3   -0.55e3, +0.54e3   -0.01e3   -0.10e2
3     +0.54e0, -0.55e0   -0.55e0, +0.54e0   -0.55e0, +0.54e0   -0.01e0   -0.00e0
4     +0.56e3, +0.52e3   +0.56e3, +0.52e3   +0.56e3, +0.52e3   +1.08e3   +0.10e4

  1. Sorting: puts the number with the larger magnitude on the top and the number with the smaller magnitude on the bottom. The results are big_number and small_number.
  2. Alignment: aligns the two numbers so that they have the same exponent. The exponent of the small_number is adjusted to match the exponent of the big_number, and the significand of the small_number is shifted to the right by the difference in exponents.
  3. Addition/subtraction: adds or subtracts the significands of the two aligned numbers.
  4. Normalization: adjusts the result to normalized format if
    1. after subtraction the result contains leading zeros
    2. or after subtraction the result is too small to be normalized, so needs to be converted to zero
    3. or after addition the result generates a carry-out bit

We will ignore rounding: during alignment and normalization, the lower bits of the significand are simply discarded when they are shifted out.

Floating point packed struct

In SystemVerilog we can create structured data types which we use to group a number of related variables together. We will create a packed structure to group the data that represents a 13-bit floating point number of the format that we have previously defined.

package FloatingPointPkg;

// 13-bit floating point
// 1-bit sign, s: which indicates the sign of the number (1'b1 for negative)
// 4-bit exponent field, e: which represents the exponent
// 8-bit significand field, f: which represents the significand or the fraction
typedef struct packed {
    logic sign;
    logic [3:0] exp;
    logic [7:0] frac;
} fp_t;

endpackage: FloatingPointPkg

This defines a new type, fp_t, for the 13-bit floating point struct with the following fields:

  • 1-bit sign, sign: which indicates the sign of the number (1'b1 for negative)
  • 4-bit exponent field, exp: which represents the exponent
  • 8-bit significand field, frac: which represents the significand or the fraction

We can define all our types inside a package and simply import them wherever we want in our code. We will save the code with the new data type as a new file, "fp_types.sv", so that all modules that use it can import it.


package FloatingPointPkg;

typedef struct packed {
    logic sign;
    logic [3:0] exp;
    logic [7:0] frac;
} fp_t;

endpackage: FloatingPointPkg

To import:


import FloatingPointPkg::fp_t;
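
In a SystemVerilog packed struct the first member occupies the most significant bits, so fp_t is laid out as {sign, exp, frac} in a single 13-bit vector. The Python sketch below mirrors that layout; the function names pack and unpack are hypothetical helpers, not part of the design.

```python
def pack(sign: int, exp: int, frac: int) -> int:
    """Build the 13-bit word {sign, exp[3:0], frac[7:0]}; sign is bit 12."""
    return ((sign & 0x1) << 12) | ((exp & 0xF) << 8) | (frac & 0xFF)


def unpack(word: int):
    """Split a 13-bit word back into (sign, exp, frac)."""
    return (word >> 12) & 0x1, (word >> 8) & 0xF, word & 0xFF


word = pack(1, 0b0011, 0b1010_0000)
print(f"{word:013b}")  # 1001110100000
print(unpack(word))    # (1, 3, 160)
```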

Floating point sorter module

We design the adder in stages. The first stage orders the two numbers by magnitude, ignoring the sign, just as we write the larger number above the smaller one before subtracting by hand.

The sorter module assigns the number with the larger magnitude to the big_number output and the number with the smaller magnitude to the small_number output.

One possible implementation in SystemVerilog. Note that we use the structure we have created to represent floating point numbers for code clarity.

// Assigns the number with the larger magnitude to big_number output
// and assigns the number with the smaller magnitude to small_number output
module fp_sorter(
    input fp_t a,
    input fp_t b,
    output fp_t big_number,
    output fp_t small_number);

    assign big_number = ({a.exp, a.frac} >= {b.exp, b.frac})? a: b;
    assign small_number = ({a.exp, a.frac} < {b.exp, b.frac})? a: b;
endmodule
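
The comparison works on the concatenation {exp, frac} because, for normalized numbers, a larger exponent always means a larger magnitude and the fraction only breaks ties. A rough Python model of the same comparison, with numbers written as (sign, exp, frac) tuples (the tuple representation and the name fp_sort are assumptions for illustration):

```python
def fp_sort(a, b):
    """Return (big, small) ordered by magnitude, ignoring the sign."""
    key_a = (a[1] << 8) | a[2]  # same as {a.exp, a.frac}
    key_b = (b[1] << 8) | b[2]  # same as {b.exp, b.frac}
    # On a tie, a goes to big and b to small, as in the RTL.
    return (a, b) if key_a >= key_b else (b, a)


big, small = fp_sort((0, 0b0001, 0b1111_0000), (1, 0b1111, 0b1111_1111))
print(big, small)  # (1, 15, 255) (0, 1, 240)
```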

The module uses fp_t, so the file must import our floating point data type before the module declaration.


import FloatingPointPkg::fp_t;

A possible testbench

module fp_sorter_testbench;
    fp_t a;
    fp_t b;
    fp_t bign;
    fp_t smalln;

    fp_sorter uut(.a(a), .b(b), .big_number(bign), .small_number(smalln));

    initial
    begin
        a ='{1'b0, 4'b1111, 8'b1111_1111};   b ='{1'b0, 4'b0001, 8'b1111_0000}; #10;
        a ='{1'b0, 4'b0000, 8'b0000_0000};   b ='{1'b0, 4'b0001, 8'b1111_0000}; #10;
        a ='{1'b0, 4'b0000, 8'b0000_0000};   b ='{1'b1, 4'b0001, 8'b1111_0000}; #10;
        a ='{1'b0, 4'b0001, 8'b1111_0000};   b ='{1'b0, 4'b1111, 8'b1111_1111}; #10;
        $stop;
    end

endmodule

Simulation

The new sorter module returns the largest and smallest number in magnitude regardless of sign.

image

Schematic

Two comparators compare the fractional parts and exponents of both numbers. Based on the output signals of the two comparators, four 2-to-1 multiplexers route the fractional and exponent part signals of the two numbers to the outputs representing the largest and smallest number in our number sorter.

image

Alignment module

The alignment module aligns the two numbers so that they have the same exponent. It adjusts the exponent of the small_number to match the exponent of the big_number, and shifts the significand of the small_number to the right by the difference in exponents.

`timescale 1ns / 1ps

import FloatingPointPkg::fp_t;

module fp_aligment(
    input fp_t bign,
    input fp_t smalln,
    output fp_t aligned );
    
    logic [3:0] exp_diff;
    always_comb
    begin
        exp_diff = bign.exp - smalln.exp;
        aligned.frac = smalln.frac >> exp_diff;
        aligned.exp = bign.exp;
        aligned.sign = smalln.sign;        
    end
endmodule
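
To see the effect of the shift on concrete bits, here is a small Python sketch of the same operation on (sign, exp, frac) tuples. The name fp_align and the tuple representation are illustrative assumptions; bits shifted out on the right are lost, as in the hardware.

```python
def fp_align(big, small):
    """Give small the exponent of big by shifting its fraction right."""
    exp_diff = (big[1] - small[1]) & 0xF  # 4-bit difference, as in the RTL
    return (small[0], big[1], small[2] >> exp_diff)


# 0.1001_0000 * 2^2 rewritten with exponent 3 becomes 0.0100_1000 * 2^3
aligned = fp_align((0, 3, 0b1010_0000), (0, 2, 0b1001_0000))
print(aligned, f"{aligned[2]:08b}")  # (0, 3, 72) 01001000
```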

Simulation

image

Schematic

 The difference in exponents is passed to a right shifter that shifts the significand of the small_number. The exponent of the aligned result is set to the value of the exponent of the big number.

image

Add/subtract module

This module adds or subtracts the significands of two aligned numbers.

`timescale 1ns / 10ps

import FloatingPointPkg::fp_t;

// This module adds or subtracts the significands of two aligned numbers (same exponent)
// assumes the numbers are ordered big then small
module fp_sum_significands (
    input fp_t bign,
    input fp_t smalln,
    output logic [8:0] sum);
    
    assign sum = (bign.sign == smalln.sign) ?
     {1'b0, bign.frac} + {1'b0, smalln.frac}
     : {1'b0, bign.frac} - {1'b0, smalln.frac};
    
endmodule
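
The 9-bit result keeps the carry out of the addition in bit 8. A hedged Python sketch of the same arithmetic, using the first and third vectors of the testbench that follows (the tuple representation and the function name are assumptions for illustration):

```python
def fp_sum_significands(big, small_aligned):
    """Add fractions when signs match, otherwise subtract; 9-bit result."""
    if big[0] == small_aligned[0]:
        total = big[2] + small_aligned[2]
    else:
        total = big[2] - small_aligned[2]
    return total & 0x1FF  # bit 8 is the carry out


same = fp_sum_significands((0, 3, 0b1111_1111), (0, 3, 0b1111_0000))
diff = fp_sum_significands((1, 3, 0b1111_1111), (0, 3, 0b0011_0000))
print(f"{same:09b} {diff:09b}")  # 111101111 011001111
```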

Testbench

module fp_sum_significands_testbench;

    fp_t bign;
    fp_t smalln;
    logic [8:0] sum;

    fp_sum_significands uut(.sum(sum), .bign(bign), .smalln(smalln));

    initial
    begin
        bign ='{1'b0, 4'b0011, 8'b1111_1111};   smalln ='{1'b0, 4'b0011, 8'b1111_0000}; #10;
        bign ='{1'b1, 4'b0011, 8'b1111_1111};   smalln ='{1'b1, 4'b0011, 8'b0011_0000}; #10;
        bign ='{1'b1, 4'b0011, 8'b1111_1111};   smalln ='{1'b0, 4'b0011, 8'b0011_0000}; #10;
        bign ='{1'b0, 4'b0011, 8'b1111_1111};   smalln ='{1'b1, 4'b0011, 8'b0011_0000}; #10;
        $stop;
    end

endmodule

Simulation

image

The 2-to-1 multiplexer selects the output based on the sign bits of both numbers: if the signs are equal it routes the addition result, and if they differ it routes the subtraction result.

image

Leading 0s counter module

This module counts the number of leading zeros in an 8-bit number, much like a priority encoder. It assumes that at least one bit is high (value 1'b1); if no bit is high it returns the highest count it can represent, 7.

This won't affect the next stage, because the count is only used to shift the number to the left by the number of leading zeros. When all bits are zero, the shifted significand is zero whatever count is returned.

`timescale 1ns / 1ps

// outputs the number of leading zeros in an 8-bit number
// assumes that there is at least one high bit (value 1'b1)
// if no bit is high it returns the highest count, 7
module fp_leading_zeros(
    input logic [7:0] number,
    output logic [2:0] lead0s
);

    always_comb
    begin
        if(number[7])
            begin
                lead0s = 3'o0;
            end
        else if (number[6])
            begin
                lead0s = 3'o1;
            end
        else if (number[5])
            begin
                lead0s = 3'o2;
            end
        else if (number[4])
            begin
                lead0s = 3'o3;
            end
        else if (number[3])
            begin
                lead0s = 3'o4;
            end
        else if (number[2])
            begin
                lead0s = 3'o5;
            end
        else if (number[1])
            begin
                lead0s = 3'o6;
            end
        else
            begin
                lead0s = 3'o7;
            end
    end
endmodule
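
The if/else chain scans from bit 7 downwards and stops at the first 1. The same priority scan can be sketched in Python as a loop; the function name leading_zeros is an assumption, and the all-zero case returns 7 to match the 3-bit output of the module.

```python
def leading_zeros(number: int) -> int:
    """Leading zeros of an 8-bit value, saturating at 7 as in the RTL."""
    for count in range(8):
        if number & (0x80 >> count):  # test bit 7, then 6, ... down to 0
            return count
    return 7  # no bit set: the 3-bit output cannot represent 8


for value in (0b1111_1111, 0b0001_1111, 0b0000_0001, 0b0000_0000):
    print(f"{value:08b} -> {leading_zeros(value)}")
```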

Test-bench

module fp_leading_zeros_testbench;

    logic [7:0] number;
    logic [2:0] lead0s;

    fp_leading_zeros uut(.*);

    initial
    begin
        number = 8'b1111_1111; #10;
        number = 8'b0111_1111; #10;
        number = 8'b0011_1111; #10;
        number = 8'b0001_1111; #10;
        number = 8'b0000_1111; #10;
        number = 8'b0000_0111; #10;
        number = 8'b0000_0011; #10;
        number = 8'b0000_0001; #10;
        number = 8'b0000_0000; #10;
        $stop;
    end

endmodule

Simulation

image

Schematics

As in a priority encoder, the priority network is implemented by a sequence of 2-to-1 multiplexers.

image

Normalization module

The normalization module adjusts the result to normalized format in three cases: after subtraction the result contains leading zeros; after subtraction the result is too small to be normalized and must be converted to zero; or after addition the result generates a carry-out bit.

A possible SystemVerilog implementation.

It first counts the leading zeros of the significand with the fp_leading_zeros module, then shifts the significand and adjusts the exponent accordingly.

// normalizes an unnormalized floating point number with carry out signal
module fp_normalize(
    input logic carry_out,
    input fp_t unnormalized,
    output fp_t normalized  );

    logic [2:0] lead_zeros;
    // leading zeros not including the carry out
    fp_leading_zeros lead_zeros_unit(.number(unnormalized.frac),.lead0s(lead_zeros));

    always_comb
    begin
        if(carry_out) // with carry out, shift frac to the right
            begin
                normalized.exp = unnormalized.exp + 1;
                normalized.frac = {1'b1, unnormalized.frac[7:1]};
            end else if(lead_zeros > unnormalized.exp)
            begin
                normalized.exp = 0; // set to zero
                normalized.frac = 0;
            end else
            begin
                normalized.exp = unnormalized.exp - lead_zeros;
                normalized.frac = unnormalized.frac << lead_zeros; // shift significand according to leading 0s
            end
        normalized.sign = unnormalized.sign;
    end
endmodule
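
The three branches can be checked against concrete values with a small software model. This Python sketch follows the branch order of the module above; the function names and the (sign, exp, frac) tuple are illustrative assumptions, and the exponent increment wraps at 4 bits just as the hardware addition would.

```python
def leading_zeros(number: int) -> int:
    """Leading zeros of an 8-bit value, saturating at 7."""
    for count in range(8):
        if number & (0x80 >> count):
            return count
    return 7


def fp_normalize(carry_out: int, sign: int, exp: int, frac: int):
    """Return the normalized (sign, exp, frac)."""
    lead = leading_zeros(frac)
    if carry_out:   # carry: shift right, carry becomes the new MSB
        return (sign, (exp + 1) & 0xF, 0x80 | (frac >> 1))
    if lead > exp:  # too small to normalize: flush to zero
        return (sign, 0, 0)
    return (sign, exp - lead, (frac << lead) & 0xFF)


print(fp_normalize(1, 1, 0b0011, 0b0000_1000))  # (1, 4, 132)
print(fp_normalize(0, 1, 0b0111, 0b0000_1000))  # (1, 3, 128)
print(fp_normalize(0, 0, 0b0010, 0b0000_1000))  # (0, 0, 0)
```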

Testbench

module fp_normalize_testbench;

    logic carry_out;
    fp_t unnormalized;
    fp_t normalized;


    fp_normalize uut(.*);

    initial
    begin
           carry_out = 1; unnormalized='{1'b1, 4'b0011, 8'b0000_1000}; #10;
           carry_out = 1; unnormalized='{1'b1, 4'b0011, 8'b1000_1000}; #10;
           carry_out = 0; unnormalized='{1'b1, 4'b0111, 8'b0000_1000}; #10;
           carry_out = 0; unnormalized='{1'b0, 4'b1011, 8'b1000_1000}; #10;
        $stop;
    end

endmodule

Simulation

image

image

Putting all together: Top Floating point Adder module

Finally we instantiate and connect the modules that we have designed previously:


// circuit for reordering the inputs
fp_sorter sort(.a(a), .b(b), .big_number(bign), .small_number(smalln));

// circuit for aligning the smallest number
fp_aligment align(.aligned(small_aligned), .bign(bign), .smalln(smalln));

// circuit for adding/subtracting the significands; the MSB (9th bit) of sum is the carry out
fp_sum_significands sum_significands(.sum(sum), .bign(bign), .smalln(small_aligned));

// circuit for normalizing the output
fp_normalize normalize(.carry_out(sum[8]), .unnormalized(unnormalized), .normalized(result));

// connect addition/subtraction result with the normalizer
assign unnormalized = '{bign.sign, bign.exp, sum[7:0]};

SystemVerilog Code

`timescale 1ns / 10ps

import FloatingPointPkg::fp_t;

// binary floating point adder
module fp_adder (
    input fp_t a,
    input fp_t b,
    output fp_t result
);
    fp_t bign;  // big operand in absolute magnitude after sorting
    fp_t smalln; // small operand in absolute magnitude after sorting
    fp_t small_aligned; // small operand aligned with the big one, same exponents
    logic [8:0] sum;  // sum of the two aligned significands with carry out
    fp_t unnormalized; // result before normalization

    // circuit for reordering the inputs
    fp_sorter sort(.a(a), .b(b), .big_number(bign), .small_number(smalln));
    // circuit for aligning the smallest number
    fp_aligment align(.aligned(small_aligned), .bign(bign), .smalln(smalln));
    // circuit for adding/subtracting the significands
    fp_sum_significands sum_significands(.sum(sum), .bign(bign), .smalln(small_aligned));
    // circuit for normalizing the output
    fp_normalize normalize(.carry_out(sum[8]), .unnormalized(unnormalized), .normalized(result));
    
    // connect addition/subtraction result with the normalizer
    assign unnormalized = '{bign.sign, bign.exp, sum[7:0]};

endmodule
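
To cross-check simulation results, the whole datapath can be modelled in a few lines of software. The Python sketch below chains the four stages in the same order as fp_adder; it is an illustrative reference model under the assumptions of this article (no rounding, numbers as (sign, exp, frac) tuples), and the names fp_add and fp_value are hypothetical.

```python
def leading_zeros(number: int) -> int:
    for count in range(8):
        if number & (0x80 >> count):
            return count
    return 7


def fp_add(a, b):
    """Sort, align, add/subtract and normalize two (sign, exp, frac) tuples."""
    # 1. sort by magnitude using {exp, frac}
    key_a, key_b = (a[1] << 8) | a[2], (b[1] << 8) | b[2]
    big, small = (a, b) if key_a >= key_b else (b, a)
    # 2. align the small operand to the big exponent
    aligned_frac = small[2] >> (big[1] - small[1])
    # 3. add or subtract the significands (9-bit result)
    if big[0] == small[0]:
        total = big[2] + aligned_frac
    else:
        total = big[2] - aligned_frac
    carry_out, frac = (total >> 8) & 1, total & 0xFF
    # 4. normalize; the result takes the sign of the big operand
    sign, exp = big[0], big[1]
    if carry_out:
        return (sign, (exp + 1) & 0xF, 0x80 | (frac >> 1))
    lead = leading_zeros(frac)
    if lead > exp:
        return (sign, 0, 0)
    return (sign, exp - lead, (frac << lead) & 0xFF)


def fp_value(n) -> float:
    magnitude = (n[2] / 256) * 2 ** n[1]
    return -magnitude if n[0] else magnitude


result = fp_add((0, 0b0011, 0b1010_0000), (0, 0b0010, 0b1001_0000))
print(result, fp_value(result))  # (0, 3, 232) 7.25
```

Here 0.625 * 2^3 = 5 plus 0.5625 * 2^2 = 2.25 gives 0.90625 * 2^3 = 7.25, which is exact because no bits were lost in the alignment shift.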

Test bench

module fp_adder_testbench;
    fp_t a;
    fp_t b;
    fp_t c;

     fp_adder uut(.a(a), .b(b), .result(c));
    

    initial
    begin  
        a ='{1'b0, 4'b0001, 8'b1000_0000};   b ='{1'b0, 4'b0001, 8'b1000_0000}; #10; 
        a ='{1'b0, 4'b0111, 8'b1000_0000};   b ='{1'b0, 4'b0001, 8'b1000_0000}; #10; 
        a ='{1'b0, 4'b0011, 8'b1010_0000};   b ='{1'b0, 4'b0010, 8'b1001_0000}; #10; // 0.625 * 2^3 + 0.5625 * 2^2 = 5 + 2.25 = 7.25
                                                                                     // expected 0, 0011, 1110_1000 = 0.90625 * 2^3 = 7.25
        a ='{1'b0, 4'b0000, 8'b1000_0000};   b ='{1'b0, 4'b0001, 8'b1000_0000}; #10;
        a ='{1'b0, 4'b0000, 8'b1000_0000};   b ='{1'b1, 4'b0001, 8'b1111_0000}; #10;
        a ='{1'b0, 4'b0001, 8'b1111_0000};   b ='{1'b0, 4'b0011, 8'b1111_1111}; #10;
        $stop;
    end

endmodule

Simulation

image

Schematics

In the schematic we can see the four main blocks of our adder, all of them interconnected: the sorting circuit, the alignment circuit, the addition/subtraction circuit and the normalization circuit.

image

Expanded view

image


    DAB over 2 years ago

    Nice walk through of the logic.
