Introduction
Some more simple DSP [digital signal processing] stuff with my lovely little Lattice Brevia-2 board with its XP2 FPGA.
A couple of years back, I did a collection of blogs called 'Waves' using this board (see the links at the end). For those blogs I wired up a Microchip 12-bit DAC, but I thought it might be interesting to get a converter that could give me a bit more resolution for doing audio stuff with, so I splashed out and bought myself a Pimoroni Pico Audio board [PIM544] as a quick way to prototype without having to design a PCB. The board is intended for use with a Pico, but the interface is standard I2S, so I don't suppose it will prove to be too difficult to drive it from an FPGA instead. The device they've used on the board is a PCM5100A from TI. It's not a full CODEC, just the DAC side, and it can apparently manage to work to 32 bits at the slower sampling rates.
To start with I'm going to wire it up and then, as a test, see if I can generate a pair of computed, fixed-frequency, quadrature sinusoids on the left and right channels using the Taylor-series method I experimented with in one of the Waves blogs [4]. I'll need to extend what I did previously because, back then, I stopped short and didn't fully implement the phase accumulator that will be necessary to generate a continuous waveform.
Keep in mind that I'm not a DSP specialist and don't have a professional background in this area, so this is me experimenting and trying out stuff: it's most definitely not intended as a tutorial. Also, I'm self-taught with the HDL, so I may be passing on some bad habits here.
Hardware
The hardware side is very simple and consists of a carrier board, made out of stripboard, to connect the Brevia board to the Pico Audio board. Not very elegant, but it will work well enough for a prototype. The apple is a Red Pippin.
The extra red wire that you can see in the photograph above is 5V, which I've had to bring across from the USB area of the Brevia board. That's needed because the Pimoroni board has a regulator on it, presumably to get a clean 3.3V rail, but there's only +3.3V on the Brevia's header. The additional chip and crystal that you can see on the stripboard form a 12.288MHz oscillator that I'm going to feed to the FPGA and use as an alternative to the Brevia's 50MHz oscillator. I tried to derive this within the FPGA, but I couldn't get an exact 12.288MHz clock from the 50MHz using the PLL. The reason for wanting 12.288MHz is that it's an exact multiple of common audio sample rates like 48ksps and 96ksps.
FPGA Logic Design
The design of the logic in the FPGA falls naturally into several parts: a 'phase accumulator', to keep track of where we are on the waveform; the combined sine and cosine calculation, to generate the next pair of output samples given the phase; and the I2S interface, to transmit the left and right samples (sine and cosine results) to the external DAC.
These will all have to be kept 'in step' so that output samples arrive at the DAC at the right time. As the I2S output has to run continuously, I'm going to design it so that the I2S interface 'free runs' on the clock and, at a fixed point in the output cycle, triggers an update of the phase accumulator and then the subsequent sine and cosine calculation. Because, in this case, the sine and cosine have the same frequency, they can be calculated together, with some of the calculation shared.
To make life easy for myself, I'm going to have everything synchronous to one clock. Rather than that being directly the 12.288MHz clock that comes from my oscillator, I'm going to 'perch' an internal PLL [phase-locked loop] on top of it. That PLL will multiply the frequency by four and generate an internal 'master' clock of 49.152MHz that will drive all my logic. That's then still a multiple of the common sample rates used for audio (I'm actually using 96ksps here) but has the advantage over the 12.288MHz clock that I can do a good bit more calculation between the sample times (this particular XP2 part only has a small number of multipliers to work with, so I'm thinking ahead as to how I might make best use of them). It is, though, still a modest enough frequency that I'm not going to run into any real timing issues within the FPGA (provided I'm reasonably careful with the way I design) and I can focus most of my attention on what the logic does.
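As a quick sanity check of the clock plan described above, here is a small Python sketch of the arithmetic. The divide-by-8 prescaler and the 64 bit times per stereo frame are taken from the VHDL later in this post; the constant names are just mine for illustration.

```python
# Clock plan arithmetic: 12.288MHz crystal -> PLL x4 -> prescaler /8 -> 64-bit frames.
OSC_HZ = 12_288_000          # external crystal oscillator
PLL_MULTIPLIER = 4           # internal PLL multiplies by four
PRESCALE = 8                 # master clocks per I2S bit clock
BITS_PER_FRAME = 64          # 2 x 32-bit samples per stereo frame

MASTER_HZ = OSC_HZ * PLL_MULTIPLIER
BIT_CLOCK_HZ = MASTER_HZ // PRESCALE
SAMPLE_RATE_HZ = BIT_CLOCK_HZ // BITS_PER_FRAME
CYCLES_PER_SAMPLE = MASTER_HZ // SAMPLE_RATE_HZ

print(MASTER_HZ, BIT_CLOCK_HZ, SAMPLE_RATE_HZ, CYCLES_PER_SAMPLE)

# The Brevia's 50MHz oscillator does not divide down exactly to 96ksps.
print(50_000_000 % SAMPLE_RATE_HZ)
```

So there are 512 master-clock cycles available between samples, of which the series calculation only needs a couple of dozen.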
The I2S interface is little more than a shift register to turn the computed samples into a serial bitstream as per the I2S standard. The main challenge will be to make sure that the bits come out in the right order and that they line up properly with the word clock - I should be able to manage that with the help of an oscilloscope. One small curiosity, from going back and reading the original Philips spec for I2S, is that it doesn't seem to strictly specify which of the two samples comes out first (left or right channel of the stereo pair).
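To make the bit ordering concrete, here is a rough Python model of one stereo frame on the wire. It is only a sketch under my own assumptions: word clock low means the left channel, each word goes out MSB first, and the MSB appears one bit clock after the word clock changes. The function name `i2s_frame` and the `prev_lsb` parameter are made up for the illustration and don't correspond to anything in the FPGA design.

```python
def i2s_frame(left, right, prev_lsb=0, bits=32):
    """Return (lrck, data) lists covering the 2*bits bit times of one frame.

    Assumed convention: lrck = 0 for left, 1 for right; MSB first; data is
    delayed by one bit time relative to the lrck edge, so the first bit time
    still carries the last bit of the previous frame (prev_lsb).
    """
    def msb_first(word):
        return [(word >> (bits - 1 - n)) & 1 for n in range(bits)]

    stream = msb_first(left) + msb_first(right)   # undelayed bit stream
    lrck = [0] * bits + [1] * bits                # word clock for each bit time
    data = [prev_lsb] + stream[:-1]               # apply the one-bit delay
    return lrck, data


lrck, data = i2s_frame(0x80000000, 0x00000001)
print(data[:4], lrck[30:34])
```

With this convention the LSB of the right-hand word spills over into the first bit time of the following frame, which is the sort of thing that is much easier to check on an oscilloscope than to reason about.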
For the computed-sinewave oscillator, I'm going to do a small elaboration of the Taylor-series oscillator that I designed previously.
This is how I'm generating the powers within the FPGA, given the value for theta (the phase that the samples are being calculated for):
each value in the sequence is simply the last result, held by the register, multiplied again by the phase value theta. The register will be preset to 1 to start things off.
With the previous blog, I took the odd powers as they came along, to give me a sinewave, and discarded the even ones. That wasted cycles but made for a fairly simple structure in the FPGA. If we look at the respective series for sine and cosine, though,
we can see that directing the powers alternately to the sine and the cosine calculation to build each respective result could be reasonably efficient and would nicely elaborate the way my original algorithm worked, so I'm going to try that and see how I get on.
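For reference, the two series in question are the standard Maclaurin expansions, written out here in LaTeX. The even powers of theta all land in the cosine and the odd powers all land in the sine, with the signs alternating in pairs.

```latex
\sin\theta = \theta - \frac{\theta^{3}}{3!} + \frac{\theta^{5}}{5!} - \frac{\theta^{7}}{7!} + \cdots
           = \sum_{k=0}^{\infty} \frac{(-1)^{k}\,\theta^{2k+1}}{(2k+1)!}

\cos\theta = 1 - \frac{\theta^{2}}{2!} + \frac{\theta^{4}}{4!} - \frac{\theta^{6}}{6!} + \cdots
           = \sum_{k=0}^{\infty} \frac{(-1)^{k}\,\theta^{2k}}{(2k)!}
```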
The other thing I'm going to do, that's slightly different to what I did before, is some 'normalisation' of the powers and constants. I'll explain that in a moment.
The starting value for the computation, the phase (theta) of the waveform for the next sample time, comes from a 'phase accumulator'. For a single, fixed frequency, like I'm aiming for here, the phase increment that gets added to the accumulator each time will be a constant. Change it to a different constant and we will step along our imaginary sinewave, that we imagine ourselves sampling, at a different pace and get a different, fixed output frequency as a result. Start varying the increment and we'll have a varying frequency, which sounds like it might be fun to play with.
That, then, leads to a fundamental and interesting question that I don't know the proper answer to, not having a background in real DSP stuff. How do I represent the phase? One possible approach might be to simply use a signed fixed-point or signed floating-point number and wrap the accumulator when we get to a value of pi (back to -pi). That, though, requires some awkward arithmetic to detect when we run past pi and subtract 2*pi to make it cyclic. An alternative method, and the one I'm going to try here, is to normalise the phase so that the value in the phase accumulator represents the fraction of a full turn. That way, the phase accumulator arithmetic will be operating modulo a power of 2 and will simply wrap round without me having to do any comparisons at all. The normalisation is simply to work with theta/pi rather than theta, so that the phase runs from -1 to +1 instead of from -pi to +pi; the price is a factor of pi^n that has to be folded into the coefficient of the nth power in each series.
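Here is a minimal Python sketch of that normalised phase accumulator, assuming a 32-bit accumulator and a 96ksps sample rate as used in this design. The function names are my own, purely for illustration.

```python
ACC_BITS = 32
ACC_MASK = (1 << ACC_BITS) - 1


def phase_increment(f_out_hz, f_sample_hz):
    """Increment per sample for a given output frequency, rounded to nearest."""
    return (f_out_hz * (1 << ACC_BITS) + f_sample_hz // 2) // f_sample_hz


def advance(acc, increment):
    """Add the increment; wrapping is just a matter of discarding the carry."""
    return (acc + increment) & ACC_MASK


def as_signed_fraction(acc):
    """Read the accumulator as two's complement in [-1, 1), i.e. theta/pi."""
    signed = acc - (1 << ACC_BITS) if acc & (1 << (ACC_BITS - 1)) else acc
    return signed / (1 << (ACC_BITS - 1))


inc = phase_increment(1000, 96_000)
print(hex(inc))                         # increment for 1kHz at 96ksps
print(as_signed_fraction(0x40000000))   # a quarter of a turn
```

For 1kHz at 96ksps this gives the same bit pattern as the constant phase increment in the VHDL later in this post. Reading the top bit as a sign bit is what turns 'fraction of a full turn' into a value between -1 and +1 without any extra arithmetic.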
Finally, I need to consider the factorials in both equations. There are the two issues of where they come from and how we achieve what looks on paper like a division. Although the factorials could be computed by the FPGA, that's wasteful as they are exactly the same every time the calculation is done. Since there aren't too many of them, it's going to be more straightforward to precompute them and store them in a ROM. The division I'm going to avoid altogether by the simple expedient of calculating the reciprocal to put in the ROM: that can then be multiplied with the power using a multiplier, so avoiding all the messiness of division. The normalisation can also be done by precomputation, simply by adjusting those ROM values. So, in the end, what looks like a complex piece of computation, turns out to be something that can be done with two multipliers, a ROM (to hold the coefficients), a bit of addition (I'm going to use 2's complement, fixed-point arithmetic, so even the subtractions can be baked into the constants in the ROM by simply making them negative numbers where appropriate, rather than positive), some registers for temporary storage, and a bit of multiplexing to steer results to where we want them. After trundling through 20 cycles (10 terms for each of sine and cosine), I should then have my sine and cosine values.
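The scheme described above can be sketched in a few lines of Python. This is a floating-point model of the idea only, not a bit-accurate model of the FPGA: one running power of the normalised phase, one table of signed coefficients pi^n/n!, and the terms steered alternately to the cosine and sine sums. The function name and the use of floating point are my own assumptions for the illustration.

```python
import math


def taylor_sin_cos(x, terms=20):
    """Return (sin(pi*x), cos(pi*x)) for x in [-1, 1) using a shared power chain."""
    sine = 0.0
    cosine = 0.0
    power = 1.0                                   # the 'register', preset to 1
    for n in range(terms):
        sign = 1.0 if n % 4 in (0, 1) else -1.0   # signs go + + - - + + ...
        coeff = sign * math.pi ** n / math.factorial(n)   # what the ROM would hold
        term = power * coeff
        if n % 2 == 0:
            cosine += term                        # even powers build the cosine
        else:
            sine += term                          # odd powers build the sine
        power *= x                                # next power of the phase
    return sine, cosine


s, c = taylor_sin_cos(0.25)
print(s, c)   # both should be close to 1/sqrt(2)
```

With 20 terms the truncation error is largest at the ends of the interval, where the first cosine term left out, pi^20/20!, is a few parts in 10^9.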
VHDL
Here's the VHDL. It's specific to Lattice XP2 because I've used the IPexpress generator to give me the multipliers, adders, and ROM as easy-to-use components. All the components flow through without registers except for the ROM, which has the input address latched with the clock (that's inherent in the way the block RAM works when used as a ROM).
The first section is the I2S stuff, followed by the series computation. I'm calculating the sine to about 32 bits accuracy (give or take a bit). The accuracy is best at zero and decreases until it's at its worst at plus or minus 180 degrees. That's inherent in the way that the series approximates the sine (look at the nice, animated picture on Wikipedia if you want to see why visually) and is why I'm using the interval from -pi to +pi, rather than going from zero up to 2*pi. Better accuracy could come from using a smaller interval, and making use of the waveform symmetry, but I wanted to keep the simplicity of just starting with the phase value and obtaining the sample values from it without further manipulation.
------------------------------------------------------------------
-- ***** fpga_sinewave *****                                    --
--                                                              --
-- Target: LFXP2-5E-B2-EVN (Lattice Brevia 2 evaluation board)  --
--                                                              --
-- Taylor-expansion calculated sine and cosine                  --
-- as NCO (numerically-controlled oscillator)                   --
-- driving out via I2S connection to Pico Audio board.          --
-- Sample rate 96kSPS [needs to be within about 2% for audio    --
-- DAC to properly detect rate being used].                     --
-- Sinewave on left channel, cosine on right.                   --
--                                                              --
-- Uses following components generated with IPexpress:          --
-- 2 x multiplier: 36 signed x 36 signed = result 72 signed     --
-- 2 x adder: 36 signed, result 36 signed                       --
-- 1 x ROM: 32 x 36-bit numbers in BRAM (from coeffs3.mem)      --
-- 1 x PLL: multiplies 12.288MHz input clock by 4               --
------------------------------------------------------------------
-- (c)Jon Clift 30th October 2022                               --
--                                                              --
-- Feel free to use however you want, but bear in mind this is  --
-- just me messing around with an FPGA and there is no guarantee--
-- as to correctness, no claim as to fitness for any purpose,   --
-- and no promise of any support.                               --
------------------------------------------------------------------
-- Rev   Date         Comments                                  --
-- 01    30-Oct-2022                                            --
------------------------------------------------------------------

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity fpga_sinewave is
    port(
        --- input clocks
        clk_in: in std_logic;       --- on-board 50MHz oscillator (not used in this design)
        clk_12_288: in std_logic;   --- clock from my 12.288MHz crystal oscillator
        --- I2S connections - these are the connections out to the Pico Audio board
        mute_n: out std_logic;      --- mute
        i2s_data: out std_logic;    --- i2s data
        i2s_bck: out std_logic;     --- i2s bit clock
        i2s_lrck: out std_logic;    --- i2s left/right word clock
        --- various miscellaneous control signals on evaluation board that it might be good to hold at fixed levels
        --- but that don't play a part in the design
        spi_cs: out std_logic;      ---
        hold_n: out std_logic;      ---
        sram_cen: out std_logic;    ---
        sram_oen: out std_logic;    ---
        sram_wen: out std_logic;    ---
        uart_tx: out std_logic);    ---
end fpga_sinewave;

architecture arch_fpga_sinewave of fpga_sinewave is

    --- declare all the signals we'll use
    signal clk_49_152: std_logic;
    signal pll_locked: std_logic;
    signal prescale_count: std_logic_vector (2 downto 0) := b"000";
    signal i2s_bit_count: std_logic_vector (5 downto 0) := b"000000";
    signal i2s_sr: std_logic_vector (31 downto 0);
    signal i2s_bit_en_falling,i2s_load: std_logic;
    signal taylor_reset,taylor_enable: std_logic := '0';
    signal taylor_enable_del: std_logic := '0';
    signal acc_count: std_logic_vector (7 downto 0);
    signal taylor_count: std_logic_vector (4 downto 0);
    signal mult1_product: std_logic_vector (71 downto 0);
    signal mult2_product: std_logic_vector (71 downto 0);
    signal add1_product: std_logic_vector (35 downto 0);
    signal adder1_result: std_logic_vector (31 downto 0);
    signal theta: std_logic_vector (31 downto 0);
    signal adjusted_theta: std_logic_vector (35 downto 0);
    signal phase_increment: std_logic_vector (31 downto 0);
    signal adjusted_product1: std_logic_vector (35 downto 0);
    signal powers: std_logic_vector (35 downto 0);
    signal adjusted_product2: std_logic_vector (35 downto 0);
    signal sine_sum: std_logic_vector (35 downto 0);
    signal cosine_sum: std_logic_vector (35 downto 0);
    signal sine_out: std_logic_vector (35 downto 0);
    signal cosine_out: std_logic_vector (35 downto 0);
    signal osc_reset: std_logic_vector (1 downto 0);
    signal coeff: std_logic_vector (35 downto 0);

    --- declare the PLL, multiplier, adder, and ROM module as components
    --- these are components that were created with IPexpress
    component pll_module
        port (
            CLK: in std_logic;
            CLKOP: out std_logic;
            LOCK: out std_logic);
    end component;

    component mult_module is
        port (
            DataA: in std_logic_vector(35 downto 0);
            DataB: in std_logic_vector(35 downto 0);
            Result: out std_logic_vector(71 downto 0));
    end component;

    component adder_module is
        port (
            DataA: in std_logic_vector(35 downto 0);
            DataB: in std_logic_vector(35 downto 0);
            Result: out std_logic_vector(35 downto 0));
    end component;

    component unsigned_adder_module is
        port (
            DataA: in std_logic_vector(31 downto 0);
            DataB: in std_logic_vector(31 downto 0);
            Result: out std_logic_vector(31 downto 0));
    end component;

    component block_rom_module
        port (
            Address: in std_logic_vector(4 downto 0);
            OutClock: in std_logic;
            OutClockEn: in std_logic;
            Reset: in std_logic;
            Q: out std_logic_vector(35 downto 0));
    end component;

begin

    fpga_sinewave_stuff: process (clk_49_152)
    begin
        if (clk_49_152'event and clk_49_152 = '1') then

            --- prescaler divides by 8 to give I2S bit rate of 6.144M
            --- prescale_count(2) will be the I2S bck
            prescale_count(2 downto 0) <= prescale_count(2 downto 0) + 1;   --- count up

            --- generate an enable that precedes the I2S bit clock falling edge for one (49.152MHz clock) cycle
            if (prescale_count = "110") then     --- is this count 6?
                i2s_bit_en_falling <= '1';       --- yes: high on next cycle (count 7)
            else
                i2s_bit_en_falling <= '0';
            end if;

            --- bit_count now counts off the 64 bit times in the I2S cycle
            --- that's 2 x 32-bit samples for dual-channel 96ksps
            --- i2s_bit_count(5) will be the I2S lrck
            if (i2s_bit_en_falling = '1') then   --- qualify with the enable
                i2s_bit_count(5 downto 0) <= i2s_bit_count(5 downto 0) + 1;   --- count up
            end if;

            --- my shift register for the data is 32 bits
            --- this load signal then has to occur twice during the I2S cycle
            --- once for the left sample and once for the right
            if (i2s_bit_en_falling = '1') then   --- qualify with the enable
                if (i2s_bit_count = b"011111" or i2s_bit_count = b"111111") then
                    i2s_load <= '1';
                else
                    i2s_load <= '0';
                end if;
            end if;

            --- i2s output shift register
            --- for any bit cycle, either loads a new sample or does a shift
            --- the sample can be either left or right value, the multiplexing is bundled into the code
            if (i2s_bit_en_falling = '1') then               --- qualify with the bit enable
                if (i2s_load = '1') then                     --- load?
                    if (i2s_bit_count(5) = '0') then         --- use lrclk to select what to load...
                        --- i2s_sr(31) <= sine_out(35);                      --- sign bit
                        --- i2s_sr(30 downto 0) <= sine_out(31 downto 1);    --- sine value
                        i2s_sr(31) <= sine_out(35);                          --- sign bit
                        i2s_sr(30) <= sine_out(35);                          --- sign bit
                        i2s_sr(29) <= sine_out(35);                          --- sign bit
                        i2s_sr(28) <= sine_out(35);                          --- sign bit
                        i2s_sr(27 downto 0) <= sine_out(31 downto 4);        --- sine value
                    else                                                     --- or
                        i2s_sr(31) <= cosine_out(35);                        --- sign bit
                        i2s_sr(30 downto 0) <= cosine_out(31 downto 1);      --- cosine value
                    end if;
                else                                                         --- else
                    i2s_sr(31 downto 1) <= i2s_sr(30 downto 0);              --- shift out the register contents
                    i2s_sr(0) <= '0';                                        --- lsb
                end if;
            end if;

            --- the rest of the code is the sine and cosine calculation

            --- taylor reset occurs after second sample has loaded into the I2S shift register
            if ((i2s_bit_en_falling = '1') and (i2s_bit_count = b"100000")) then
                taylor_reset <= '1';
            else
                taylor_reset <= '0';
            end if;

            --- taylor_count counts the terms of the series
            if (taylor_reset = '1') then
                taylor_enable <= '1';
            elsif (taylor_count(4 downto 0) = b"10011") then
                taylor_enable <= '0';
            end if;

            taylor_enable_del <= taylor_enable;

            if (taylor_enable = '1') then
                taylor_count(4 downto 0) <= taylor_count(4 downto 0) + 1;
            else
                taylor_count(4 downto 0) <= b"00000";
            end if;

            --- phase accumulator update
            if (taylor_reset = '1') then
                theta(31 downto 0) <= adder1_result(31 downto 0);
            end if;

            --- taylor expansion calculations
            if (taylor_reset = '1') then
                powers(35 downto 0) <= b"0_001_00000000000000000000000000000000";
            elsif (taylor_enable_del = '1') then
                powers(35 downto 0) <= adjusted_product1(35 downto 0);
            end if;

            if (taylor_reset = '1') then
                sine_out(35 downto 0) <= b"0_000_00000000000000000000000000000000";
            elsif (taylor_enable_del = '1' and taylor_count(0) = '0') then
                sine_out(35 downto 0) <= sine_sum(35 downto 0);
            end if;

            if (taylor_reset = '1') then
                cosine_out(35 downto 0) <= b"0_000_00000000000000000000000000000000";
            elsif (taylor_enable_del = '1' and taylor_count(0) = '1') then
                cosine_out(35 downto 0) <= cosine_sum(35 downto 0);
            end if;

        end if;

        --- at this stage in developing my design, I'm just going to have a constant increment for the phase
        --- this gives me a 1kHz sinewave
        phase_increment(31 downto 0) <= b"00000010101010101010101010101011";

        --- adapt 32 bit theta to 36 bit
        adjusted_theta(0) <= '0';
        adjusted_theta(31 downto 1) <= theta(30 downto 0);
        adjusted_theta(32) <= theta(31);
        adjusted_theta(33) <= theta(31);
        adjusted_theta(34) <= theta(31);
        adjusted_theta(35) <= theta(31);

        --- multiplier output connection mangling
        --- corrects for the shift in the position of the 'fixed' point at the output of the multiplier
        --- (there is no point in using a physical shift register when we can simply rewire the output)
        --- note that the sign bit has to be handled separately in this case because it always
        --- sits at the msb position of the multiplier output and hasn't moved
        adjusted_product1(34 downto 0) <= mult1_product(66 downto 32);
        adjusted_product1(35) <= mult1_product(71);
        adjusted_product2(34 downto 0) <= mult2_product(66 downto 32);
        adjusted_product2(35) <= mult2_product(71);

        --- connect the external control signals to signals in the design
        mute_n <= '1';                    --- mute
        i2s_bck <= prescale_count(2);     --- i2s bit clock
        i2s_lrck <= i2s_bit_count(5);     --- i2s left/right word clock
        i2s_data <= i2s_sr(31);           --- i2s data

        --- finally, hold these device control pins at a fixed level to stop them flapping around
        spi_cs <= '1';
        hold_n <= '1';
        sram_cen <= '1';
        sram_oen <= '1';
        sram_wen <= '1';
        uart_tx <= '1';

    end process fpga_sinewave_stuff;

    --- instantiate the actual PLL, multipliers, adders, and ROM table components
    --- that we need for the design and 'wire up' their ports with signals that
    --- connect them to the code above
    pll_module1: pll_module
        port map (
            CLK => clk_12_288,
            CLKOP => clk_49_152,
            LOCK => pll_locked);

    mult_module1: mult_module
        port map(
            DataA => adjusted_theta(35 downto 0),
            DataB => powers(35 downto 0),
            Result => mult1_product(71 downto 0));

    mult_module2: mult_module
        port map(
            DataA => powers(35 downto 0),
            DataB => coeff(35 downto 0),
            Result => mult2_product(71 downto 0));

    adder1: unsigned_adder_module
        port map(
            DataA => phase_increment(31 downto 0),
            DataB => theta(31 downto 0),
            Result => adder1_result(31 downto 0));

    adder2: adder_module
        port map(
            DataA => sine_out(35 downto 0),
            DataB => adjusted_product2(35 downto 0),
            Result => sine_sum(35 downto 0));

    adder3: adder_module
        port map(
            DataA => cosine_out(35 downto 0),
            DataB => adjusted_product2(35 downto 0),
            Result => cosine_sum(35 downto 0));

    rom_module1 : block_rom_module
        port map (
            Address(4 downto 0) => taylor_count(4 downto 0),
            OutClock => clk_49_152,
            OutClockEn => taylor_enable,
            Reset => '0',
            Q(35 downto 0) => coeff(35 downto 0));

end arch_fpga_sinewave;
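The 'connection mangling' at the multiplier outputs is perhaps easier to follow with a small software model. The Python sketch below mimics what the wiring does to the product of two 36-bit numbers in 4.32 format: the 72-bit result has 64 fractional bits, so bits 66 down to 32 are kept and bit 71 is reused as the sign. The helper names are my own and this is only an illustration of the bit selection, not something generated from the design.

```python
FRAC_BITS = 32     # fractional bits in the 4.32 format
WORD_BITS = 36     # total width including sign


def to_fixed(value):
    """Convert a float to a signed integer holding a 4.32 fixed-point value."""
    return int(round(value * (1 << FRAC_BITS)))


def to_float(word):
    """Convert a signed 4.32 integer back to a float."""
    return word / (1 << FRAC_BITS)


def adjusted_product(a, b):
    """Pick bits 66..32 of the 72-bit product and take bit 71 as the sign."""
    product = (a * b) & ((1 << 72) - 1)            # 72-bit two's complement pattern
    low = (product >> 32) & ((1 << 35) - 1)        # bits 66 downto 32
    sign = (product >> 71) & 1                     # bit 71
    word = (sign << (WORD_BITS - 1)) | low         # reassembled 36-bit pattern
    return word - (1 << WORD_BITS) if sign else word


print(to_float(adjusted_product(to_fixed(-0.5), to_fixed(0.5))))
print(to_float(adjusted_product(to_fixed(3.0), to_fixed(-2.0))))
```

The selection only gives a sensible answer while the true product stays inside the range of the 4.32 format, which it does here because the powers of the normalised phase never exceed 1 in magnitude and the coefficients are all smaller than 8.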
ROM Coefficients
I needed a way to generate the file that initialises the ROM with the coefficient values, as I didn't fancy trying to get it right doing it by hand. Rather than use C, which I would normally turn to for something like this, I decided to try GNU Octave. I had hoped there would be some support for working with large binary numbers, but I ended up programming it in much the way that I would have had to do with C. For what it's worth, here is the code (it's not all that nice, with the way I use double floats to do the conversion to binary, but it seems to work).
#------------------------------------------------------------------------------
# GNU Octave program
# (c) Jon Clift October 2022
# Use however you like. No guarantee as to accuracy, no warranty, no obligation
# to support.
#------------------------------------------------------------------------------
# creates initialisation file for Lattice XP2 FPGA ROM in a BRAM
# coefficients generated are for Taylor series sinewave generation
# each 36-bit coefficient is in two's complement 4.32 form (fixed point binary)
#------------------------------------------------------------------------------

filename = "coeffs3.mem";
fid = fopen(filename,"w");

i = 0;
coeff = 0.0;

# loop 32 times to generate each coefficient in turn
do
  # calculate coefficient
  if(i==0)
    coeff = 1.0;
  else
    coeff = (pi ^ i)/factorial( i);
  endif

  # print to console in decimal
  printf("%.24f\n", coeff);

  # test whether coeff needs to be +ve or -ve
  # if negative, fiddle to give 2's complement
  if((rem(i,4)==0) || (rem(i,4)==1))
    place = 8.0;
  else
    coeff = 16.0 - coeff;
    place = 8.0 - (1/(2^33));
  endif

  # work along all 36 bits generating the 1s and 0s
  for j=0:35
    if(coeff >= place)
      fprintf(fid,"1");
      coeff = coeff - place;
    else
      fprintf(fid,"0");
    endif
    place = place/2.0;
  endfor

  fprintf(fid,"\n");
  i++;
until (i==32)

fclose(fid);
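As a cross-check on the Octave output, the same table can be produced with exact integer arithmetic. The Python sketch below is an alternative I'm suggesting for comparison, not the method used above: it rounds each coefficient to the nearest 4.32 value, so individual entries may differ from the Octave file in the last bit. The function name is hypothetical, and the value of pi is the usual double-precision one.

```python
import math
from fractions import Fraction

FRAC_BITS = 32     # fractional bits in the 4.32 format
WORD_BITS = 36     # total width including sign


def coefficient_words(count=32):
    """Return `count` strings of 36 binary digits, one per ROM location."""
    pi_exact = Fraction(math.pi)                 # exact value of the double nearest pi
    words = []
    for i in range(count):
        magnitude = pi_exact ** i / math.factorial(i)
        sign = 1 if i % 4 in (0, 1) else -1      # + + - - + + - - ...
        scaled = round(sign * magnitude * (1 << FRAC_BITS))
        # masking a negative Python int yields its two's complement bit pattern
        words.append(format(scaled & ((1 << WORD_BITS) - 1), "036b"))
    return words


for word in coefficient_words()[:4]:
    print(word)
```

Writing the strings out one per line would give a file in the same layout as the one the Octave program creates.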
And here are the values that it generates

Results
Here are the two waveforms that get generated. This is the 'line' output from the board. I was really surprised at the high level of the signals (for a 3.3V chip) but, after looking at the datasheet, realised that the PCM5100A has charge pumps to power the output buffer.
I was quite pleased with that (after I'd spent far too long fiddling around to get it working).
This is what it looks like to the oscilloscope's FFT.
The fundamental at 1kHz is fine, but it's also noticeable that there is some 3rd harmonic distortion too, and maybe some second. Whether that's down to my sine generation or the DAC needs a little more investigation.
References
[1] fpga-making-waves
[2] fpga-waves-2-simple-sinewave
[3] fpga-waves-3-computed-sinewave-oscillators
[4] fpga-waves-4-tinker-taylor-soldier-sine
[5] fpga-waves-5-cordic-sine
[6] fpga-waves-6-reconstruction-filter
[7] fpga-waves-7-random-sequence-generator
[8] fpga-waves-8-fast-cordic-sine-and-cosine
[9] Lattice XP2 Brevia 2 Development Kit
[10] I2S (Wikipedia)
[11] Pimoroni: Pico Audio Pack
[12] Taylor_series (Wikipedia)
[13] Sine and Cosine (Wikipedia)