FPGA: Waves: 4 Tinker, Taylor, Soldier, Sine

jc2048

14 Oct 2019

Previous blogs:

FPGA: Making Waves
FPGA: Waves 2: Simple Sinewave
FPGA: Waves 3: Computed Sinewave Oscillators

Introduction

I have a small (Brevia 2)(Brevia 2) development board [1] from Lattice Semiconductor featuring one of their XP2-family FPGAs. I'm using it to explore, in a very simple, basic kind of way, digital signal generation and processing. These blogs aren't tutorials and there's no structured path to any of it: it's just me experimenting and trying things out.

The iterated sine oscillators that I looked at in Waves 3 are 'ballistic' in nature; that is, you launch them and hope they remain on course with the oscillation neither growing or shrinking. Curiously, at a fixed frequency, the two forms I looked at did maintain a constant amplitude, though the time I examined them for wouldn't suffice if you wanted to be sure of them operating for an extended period of time. It was very obvious, however, that changing the frequency of operation on the fly did affect the amplitude. Although it's possible to add some form of AGC [Automatic Gain Control] - the digital equivalent of what you'd do with a Wien bridge oscillator to stabilise the amplitude - that rapidly complicates something that was supposed to be very simple. A further problem with the iterated oscillators is that the control parameters are in terms of either sines or cosines, so changing frequency to anything other than fixed, pre-computed values starts to get complicated as well. As far as I can see, they are best suited to fixed-frequency tones, whether of short duration, or continuous if you can convince yourself of the long-term stability.

If you're interested in these kinds of oscillator, there's an interesting paper by Clay S. Turner [2] which analyses them in terms of a rotation matrix operating on a pair of variables. It gives the popular variants, with their advantages and disadvantages, and enough information to make up your very own unique oscillator if you've a mind to.

Other Approaches

The most obvious alternative to an interative approach gets called 'phase accumulation'. Here the phase increment for each sample period is accumulated and the sine is then directly calculated from that. One simple and very popular way of doing that is via a sine look-up table, which is where I started my journey before I got diverted. Another way to calculate a sine, which was suggested in the comments of a previous blog, is via a CORDIC approach. That looks interesting and I may still try that but, for the moment, I'm going to try another idea that I came across and that is to calculate the sine using a Taylor's series (this one is really 'back to basics' - the maths is so far back I can't even remember being taught it).

Taylor's Series Sine Representation

On the surface this looks a bit crazy, with all the calculations that seem to be involved, but, because of the factorials, the series converges quickly (only for small numbers) so that not too many terms need to be used.

Programming books would steer you away from this one for general use for calculating sines because of the way the powers balloon quickly but, in this case, for use as an oscillator, I can work with the interval -pi/2 to +pi/2 and that keeps them to fairly modest values. A bit of work with a spreadsheet suggests that using six terms would result in a worst-case error of about half a bit for a 24-bit output (worst case error is when the sine is close to one). I won't achieve that, the way I'm going to do it with the 32-bit multipliers available in the DSP blocks in the FPGA and fixed-point arithmetic, but it will be more than good enough for the 12-bit DAC I'm using.

Apparently, calculating it is more efficient if you factor the series first using Horner's alorithm, but I like the simplicity of the basic form and so I'm going to try using that directly and see if I can get it working.

Data-flow Diagram

Here's how I've structured it:

Incidentally, today's apple is an Egremont Russet.

The first multiplier keeps multiplying the theta value (the phase of the sine wave at that sample) by itself to generate each power in turn [all the powers, not just the odd ones that we want]. The second multiplier then multiplies the result by a fractional coefficient which is the reciprocal of the factorial (from a small table). Finally, the results are added up to give the sine value. Skipping the even powers is achieved simply by having a value of zero for the particular coefficient. The alternating additions and subtractions come from working with signed numbers and building the required sign into the coefficient.

As a scheme, it's obviously not as efficient as it could be, but it has the virtue of simplicity. Two 32-bit multipliers, an adder and a small, 16-value coefficient table seems quite reasonable to me for a sine to almost 24-bit accuracy.

To accommodate the powers, my fixed point format needs to be able to handle numbers up to (pi/2)^11 which is around 144, so I have 8 bits ahead of the implied point like the top format here:

The coefficients in the table have a maximum value of 1, so I've stored them in the look-up table in the lower format. The multiplier outputs get mangled in order to cope with the shifts in the point placement.

Coding and Testing

Here's the VHDL I arrived at. Keep in mind I'm not a specialist in programmable logic and my coding style is probably a bit odd and quirky compared to the kind of thing a professional would come up with. Also, it hasn't been tested thoroughly nor had the output validated in any way. It's an experiment: use it for real at your peril!

------------------------------------------------------------------
--    ***** waves_taylor *****       --
--  Test of Taylor expansion sine calculation                   --
------------------------------------------------------------------
-- JC 9th October 2019               --
------------------------------------------------------------------
-- Rev    Date         Comments         --
-- 01     09-Oct-2019            --
------------------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity wave_taylor_con is port(
  clk_in:   in std_logic;        --- system clock in (50 MHz oscillator)
  --- DAC connections
  mcp4821_ncs: out std_logic;        --- DAC cs
  mcp4821_sck: out std_logic;        --- DAC sck
  mcp4821_sdi: out std_logic;        --- DAC sdi
  mcp4821_nldac: out std_logic;        --- DAC load
  --- misc control signals on evaluation board that it might be good to hold at fixed levels
  spi_cs:   out std_logic;        --- 
  hold_n:   out std_logic;        --- 
  sram_cen:  out std_logic;        --- 
  sram_oen:  out std_logic;        --- 
  sram_wen:  out std_logic;        --- 
  uart_tx:  out std_logic);        --- 
end wave_taylor_con;

architecture arch_wave_taylor of wave_taylor_con is

signal spi_send, spi_ncs, spi_ncs_del,taylor_reset,taylor_enable: std_logic;
signal spi_output_sr_bit_count: std_logic_vector (5 downto 0);
signal spi_output_sr: std_logic_vector (15 downto 0);
signal interval_count: std_logic_vector (8 downto 0);
signal acc_count: std_logic_vector (7 downto 0);
signal taylor_count: std_logic_vector (3 downto 0);
signal mult1_product: std_logic_vector (71 downto 0);
signal mult2_product: std_logic_vector (71 downto 0);
signal add1_product: std_logic_vector (35 downto 0);
signal adder2_result: std_logic_vector (35 downto 0);
signal theta: std_logic_vector (35 downto 0);
signal phase_increment: std_logic_vector (35 downto 0);
signal n1: std_logic_vector (35 downto 0);
signal n2: std_logic_vector (35 downto 0);
signal n3: std_logic_vector (35 downto 0);
signal n4: std_logic_vector (35 downto 0);
signal n5: std_logic_vector (35 downto 0);
signal sine_value: std_logic_vector (35 downto 0);
signal osc_reset: std_logic_vector (1 downto 0);
signal coeff: std_logic_vector (35 downto 0);

--- declare the multiplier, the adder, and the table modules as components

component mult_module is
    port (
        DataA: in  std_logic_vector(35 downto 0); 
        DataB: in  std_logic_vector(35 downto 0); 
        Result: out  std_logic_vector(71 downto 0));
end component;

component adder_module is
    port (
        DataA: in  std_logic_vector(35 downto 0); 
        DataB: in  std_logic_vector(35 downto 0); 
        Result: out  std_logic_vector(35 downto 0));
end component;

component rom_module is
    port (
        Address: in  std_logic_vector(3 downto 0); 
        Q: out  std_logic_vector(35 downto 0));
end component;

begin
 wave_taylor_stuff: process (clk_in)
  begin
   if (clk_in'event and clk_in='1') then
   
    --- interval_count counts at the clock rate (50MHz)
    --- counts down by 500, so spi_send occurs every 10us (100kHz)

    if (interval_count(8 downto 0) = b"000000000") then     --- if zero
     interval_count(8 downto 0) <= b"111110011";      --- preset to 499
     spi_send <= '1';
    else
     interval_count(8 downto 0) <= interval_count(8 downto 0) - 1; --- count down
     spi_send <= '0';
    end if;

    --- spi ncs goes low when triggered by spi_send, goes hi again when bitcount reaches 31
    
    if (spi_send = '1') then
     spi_ncs <= '0';
    elsif (spi_output_sr_bit_count(5 downto 0) = b"111111") then
     spi_ncs <= '1';
    end if;
    
    spi_ncs_del <= spi_ncs;

    --- taylor reset occurs at end of spi cs

    if (spi_ncs = '1' and spi_ncs_del = '0') then
     taylor_reset <= '1';
    else
     taylor_reset <= '0';
    end if;

    --- taylor count runs through the series
    
    if (taylor_reset = '1') then
     taylor_enable <= '1';
    elsif (taylor_count(3 downto 0) = b"1010") then
     taylor_enable <= '0';
    end if;
    
    if (taylor_enable = '1') then
     taylor_count(3 downto 0) <= taylor_count(3 downto 0) + 1;
    else
     taylor_count(3 downto 0) <= b"0000";
    end if;
    
    --- spi output bit count only counts when enable spi cs is low
    
    if (spi_ncs = '0') then
     spi_output_sr_bit_count(5 downto 0) <= spi_output_sr_bit_count(5 downto 0) + 1;
    else
     spi_output_sr_bit_count(5 downto 0) <= b"000000";
    end if;

    --- phase accumulator

    if (spi_send = '1') then
     if (acc_count(7 downto 0) = b"0000000") then     --- if zero
      acc_count(7 downto 0) <= b"01100100";      --- preset to 100
      theta(35 downto 0) <= b"1_11111110_011011011110000001001010110"; --- set to -pi/2
     else
      acc_count(7 downto 0) <= acc_count(7 downto 0) - 1;   --- count down
      theta(35 downto 0) <= adder2_result(35 downto 0);
     end if;
    end if;

    --- taylor expansion

    if (taylor_reset = '1') then
     n2(35 downto 0) <= theta(35 downto 0);
     sine_value(35 downto 0) <= b"0_00000000_000000000000000000000000000";
    elsif (taylor_enable = '1') then
     n2(35 downto 0) <= n1(35 downto 0);
     sine_value(35 downto 0) <= n4(35 downto 0);
    end if;

    --- spi output shift register
    
    if (spi_send = '1') then             --- load...
     spi_output_sr(10 downto 0) <= sine_value(26 downto 16);     ---   
     spi_output_sr(11) <= not sine_value(35);        --- inverse of sign bit  
     spi_output_sr(15 downto 12) <= b"0011";         ---   dac control bits
    elsif (spi_ncs = '0' and spi_output_sr_bit_count(1 downto 0) = b"11") then --- shift...
     spi_output_sr(15 downto 1) <= spi_output_sr(14 downto 0);    ---  the register contents
     spi_output_sr(0) <= '0';
    end if;
   end if;
   
   phase_increment(35 downto 0) <= b"0_00000000_000010000000101011011111110"; --- pi/100
   --- multiplier output connection mangling
   --- this corrects for the output shifts incurred by using fixed point binary representation
   --- (no point actually shifting it - we can simply just rewire it)
   
   n1(34 downto 0) <= mult1_product(61 downto 27);
   n1(35) <= mult1_product(71);
   n3(34 downto 0) <= mult2_product(68 downto 34);
   n3(35) <= mult2_product(71);
   
   ---
   
   mcp4821_ncs <= spi_ncs;
   mcp4821_sck <= spi_output_sr_bit_count(1);
   mcp4821_sdi <= spi_output_sr(15);
   mcp4821_nldac <= '0';
  
   --- hold these device control pins at a fixed level to stop them flapping around
   spi_cs <= '1';
   hold_n <= '1';
   sram_cen <= '1';
   sram_oen <= '1';
   sram_wen <= '1';
   uart_tx <= '1';
  end process wave_taylor_stuff;
 
  --- instantiate the multipliers, adder-subtractor, and table
  
  mult_module1: mult_module
   port map(
    DataA => theta(35 downto 0),
    DataB => n2(35 downto 0),
    Result => mult1_product(71 downto 0));
    
  mult_module2: mult_module
   port map(
    DataA => n2(35 downto 0),
    DataB => coeff(35 downto 0),
    Result => mult2_product(71 downto 0));
    
  adder1: adder_module
   port map(
    DataA => sine_value(35 downto 0), 
    DataB => n3(35 downto 0),
    Result => n4(35 downto 0));
    
  adder2: adder_module
   port map(
    DataA => phase_increment(35 downto 0), 
    DataB => theta(35 downto 0),
    Result => adder2_result(35 downto 0));

  rom_module1: rom_module
   port map(
    Address => taylor_count(3 downto 0),
    Q => coeff(35 downto 0));

end arch_wave_taylor;

I used IPExpress to generate a signed 32-bit multiplier with no registers, a 32-bit signed adder with no registers, and a 32-bit wide distributed ROM table with 16 entries, which you'll see treated as components in the code. The ROM table was given this initialisation file - this is the coefficients in S1.34 binary format.

010000000000000000000000000000000000
000000000000000000000000000000000000
111101010101010101010101010101010101
000000000000000000000000000000000000
000000001000100010001000100010001001
000000000000000000000000000000000000
111111111111110010111111110011000000
000000000000000000000000000000000000
000000000000000000001011100011101111
000000000000000000000000000000000000
111111111111111111111111111001010010
000000000000000000000000000000000000
000000000000000000000000000000000000
000000000000000000000000000000000000
000000000000000000000000000000000000
000000000000000000000000000000000000

And here it is generating the fourth and first quadrants of a sinewave. I've got 200 steps from -pi/2 to +pi/2, so the time is 10uS x 200 = 2mS before it repeats. Generating a complete sinewave would then require using symmetry to get the values in the other two quadrants (and all the bother of manipulating the phase which is in terms of radians and more difficult to deal with than with a sine table which would normally have a power of 2 entries and thereby be very amenable to modulo arithmetic).

Conclusion

Using programmable logic in an FPGA to experiment with computational algorithms is very interesting. Casting things in a form suitable for the logic leads to a somewhat different mindset to how you'd look at them if you were computing using a processor, both from the point of view of the choice of algorithm but also in how it gets implemented.

Tinker, Taylor, Soldier, Sine

For any of you who aren't British, the title is a rather convoluted pun on a pun. 'Tinker, tailor, soldier, sailor,...' is a traditional children's rhyme from centuries past. It was used last century as the title of a famous spy novel, 'Tinker, Tailor, Soldier, Spy', by the author John le Carre. And I'm now reusing it in the sense of tinkering with Taylor to work out a sine.

[1] http://www.latticesemi.com/en/Products/DevelopmentBoardsAndKits/LatticeXP2Brevia2DevelopmentKit.aspx
[2] http://www.claysturner.com/dsp/2nd_OSC_paper.pdf

Top Comments

DAB over 6 years ago

Very good post.

DAB
- Cancel
- Vote Up +2 Vote Down
- Sign in to reply
- More
- Cancel
clem57 over 6 years ago

The sin functions can be interesting to program and crazy to optimise. Makes me think I need to test this out.. Maybe a blog coming on in my head.
- Cancel
- Vote Up +2 Vote Down
- Sign in to reply
- More
- Cancel
genebren over 6 years ago

Jon,

Great blog! This is yet another clever way to generate your sinewaves!

Well done.
Gene
- Cancel
- Vote Up +2 Vote Down
- Sign in to reply
- More
- Cancel