Posted by Al Watt in Programmable Logic on May 25, 2019 1:06:43 PM
Sadly this project is not yet working. I need a bit more time but I will definitely get there!
Project Day: 16,083. I thought I should put all other work aside today, try to get a bit further with this project and fire up Quartus 18.0 Lite on this PC. But all I saw was a white square about 5 x 5 cm and nothing exciting was happening. It seemed that I also had Quartus 17.0 on the hard drive and had to go into its Uninstall directory and click on both the quartus and modelsim uninstall.exe files. Windows 10 didn't offer me the app removal option. Anyway, end of problem!

I then loaded the last Vidor FPGA project, which was TEXT_DEMO. That compiled OK and usage was LEs: 1335, regs: 433 and memory: 90112 bits.

Next move was to investigate the built-in library functions for NCO, MAC, ADD and CORDIC. The NCO setup looked exciting at first, as an option to have multiple channels appeared. Great, I thought, let's put in 10,000 and the job is almost done! However, the red error came up and said that I should instead choose between 1 and 8. The other functions did not really offer an easy NCO, so I found many examples of Verilog NCO code and tried one from Zipcpu.com. It compiled with no errors but usage had not changed, so I suspect that I have not wired the NCO up correctly and the compiler has probably decided it is not doing anything...

The plan is to drive the NCO's increment/phase register from the text_demo MCU monitor port as an experiment. Once I have one NCO running, I will try to instantiate said NCO N times, where N allows the system to fit the Vidor's FPGA. Finally, add the MIDI input code. Again, many examples of that are available in Verilog.

I have tried to make an array of two 8-bit data ports but I very much doubt it will work. Looks nice though ;-) Each data output port will ultimately be combined and/or used as a left/right stereo pair.

wire [7:0] af_out[2];

I am the first to admit that I am a complete n00b at Verilog (as proved by the above line!), so this will be an interesting learning curve.

module TEXT_Demo
( input Clk_48MHz,
output [3:0] tmds_out_p,
output [3:0] tmds_out_n,
input SPI_CLK,
input SPI_MOSI,
input SPI_CS
);
wire PixClk;
wire PixClk5;
wire HSync;
wire VSync;
wire Video;
wire [9:0] encRed;
wire [9:0] encGreen;
wire [9:0] encBlue;
wire [3:0] tmds_out;
wire [23:0] Pixel;
wire [10:0] Row;
wire [10:0] Col;
wire i_ld;
reg [15:0] i_dphase;
wire [7:0] af_out[2];
PLL ClockGen(.inclk0(Clk_48MHz), .c0(PixClk), .c1(PixClk5));
Syncro SYN(.PixClk(PixClk), .HSync(HSync), .VSync(VSync), .Video(Video), .Row(Row), .Col(Col));
TEXT TXT(.PixClk(PixClk), .Row(Row), .Col(Col), .Pixel(Pixel), .PixClk5(PixClk5), .SPI_CLK(SPI_CLK), .SPI_MOSI(SPI_MOSI), .SPI_CS(SPI_CS));
TMDS_encoder ENC(.inRed(Pixel[23:16]), .inGreen(Pixel[15:8]), .inBlue(Pixel[7:0]), .Hsync(HSync), .Vsync(VSync), .PixClk(PixClk), .Video(Video), .outRed(encRed), .outGreen(encGreen), .outBlue(encBlue));
TMDS_Serializer SER(.RedEncoded(encRed), .BlueEncoded(encBlue), .GreenEncoded(encGreen), .PixClk(PixClk), .PixClk5(PixClk5), .TMDS(tmds_out));
DiffBuf B_DB(.datain(tmds_out[0]), .dataout(tmds_out_p[0]), .dataout_b(tmds_out_n[0]));
DiffBuf G_DB(.datain(tmds_out[1]), .dataout(tmds_out_p[1]), .dataout_b(tmds_out_n[1]));
DiffBuf R_DB(.datain(tmds_out[2]), .dataout(tmds_out_p[2]), .dataout_b(tmds_out_n[2]));
DiffBuf C_DB(.datain(tmds_out[3]), .dataout(tmds_out_p[3]), .dataout_b(tmds_out_n[3]));
// Verilog is case-sensitive: the clock input is Clk_48MHz, not clk_48MHz
nco NCO1(.i_clk(Clk_48MHz), .i_ld(i_ld), .i_dphase(i_dphase), .o_val(af_out[0]));
nco NCO2(.i_clk(Clk_48MHz), .i_ld(i_ld), .i_dphase(i_dphase), .o_val(af_out[1]));
endmodule

Same date! Here is the NCO:

module nco(i_clk, i_ld, i_dphase, o_val);
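parameter LGTBL = 8, // log2 of the table size (the line declaring LGTBL is missing from this listing; 8 is an assumed value)
W = 32, // word size
OPW = 8; // output word size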
localparam P = LGTBL;
input wire i_clk;
input wire i_ld;
input wire [W-1:0] i_dphase; // phase step, W bits wide to match r_step
output wire [OPW-1:0] o_val;
reg [W-1:0] r_step;
initial r_step = 0;
always @(posedge i_clk)
if (i_ld)
r_step <= i_dphase;
reg [W-1:0] r_phase;
initial r_phase = 0;
always @(posedge i_clk) r_phase <= r_phase + r_step;
ROMtable
stbl( r_phase[(W-1):(W-OPW)], i_clk,o_val);
endmodule
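
For the "instantiate said NCO N times" step, a generate loop is one common way to do it in Verilog. The sketch below is only an illustration, not tested on the Vidor: the wrapper name nco_bank, the parameter N and the flattened buses are made up for this example, and it assumes the nco module above takes a W-bit i_dphase and exposes W and OPW as parameters.

```verilog
// Hypothetical wrapper: N copies of the nco module, one per channel.
// nco_bank, N, i_dphase_flat and o_val_flat are illustrative names only.
module nco_bank #(
    parameter N   = 2,   // number of oscillators
    parameter W   = 32,  // phase word size, as in nco
    parameter OPW = 8    // output word size, as in nco
) (
    input  wire             i_clk,
    input  wire [N-1:0]     i_ld,          // one load strobe per channel
    input  wire [N*W-1:0]   i_dphase_flat, // N phase steps, packed end to end
    output wire [N*OPW-1:0] o_val_flat     // N outputs, packed end to end
);
    // Plain Verilog-2001 cannot pass unpacked arrays through ports,
    // so each channel gets a slice of a flat bus instead.
    genvar g;
    generate
        for (g = 0; g < N; g = g + 1) begin : ch
            nco #(.W(W), .OPW(OPW)) u_nco (
                .i_clk   (i_clk),
                .i_ld    (i_ld[g]),
                .i_dphase(i_dphase_flat[g*W +: W]),
                .o_val   (o_val_flat[g*OPW +: OPW])
            );
        end
    endgenerate
endmodule
```

Whatever the value of N, the outputs have to end up driving a pin or some other logic that is kept, otherwise synthesis removes the oscillators and the resource usage does not change. The phase step for a wanted frequency is roughly 2^W x f / f_clk, so about 39371 for 440Hz with W = 32 and a 48MHz clock.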
- Day 16084
- No progress as a high-priority job came up: mowing the lawn! But that took some hardware work, as the petrol engine had a completely broken starter mechanism and a defective petrol tank valve. By valve, I do not mean a beam-tetrode! Anyway, you can see the before and after pictures here...
- Day 16085. Further thoughts: the clock seems to be 48MHz. If we use an audio sampling rate of, say, 48kHz (DAT, Digital Audio Tape), then there are 48,000,000 / 48,000 = 1,000 clock cycles per audio sample. So, if we really wanted, we could time-share one 32-bit NCO across 1,000 channels, provided it had zero latency and could do the MAC (multiply and accumulate) operation in one cycle. The 32-bit phase and accumulator registers would each consist of a 1,000-word block of RAM sequentially addressed by a data address generator. That storage is 32 x 1,000 x 2 = 64,000 bits and is well within the 504kbits available. So the question is how many clock cycles are needed to perform the MAC. Needless to say, these are approximate figures, and reducing the audio sampling rate to 44.1kHz (CD, Compact Disc) gives about 1,088 cycles per sample, which would allow us to use a nicer number of 1024 for our blocks of registers, again assuming single-cycle MAC performance. If it takes, say, 3 clock cycles for each MAC, then we simply use more MAC units, as we could have 28 36x36 multipliers. I don't really need a multiplier unit to simply add a phase register to an accumulator register, but in the world of DSP you tend to get given a MAC in the silicon and can use the coefficient of 1 as the 'other' input. But the FPGA designers are far brighter than me and they can offer soft multipliers that probably do exactly what I need. So I had better investigate the MegaWizard and see what things like altmemmult will do.
- Project day 16086. Here is a block diagram of the system. It is a 1024-channel stereo sine wave generator. However, I have not wired up the inputs, outputs or clocks and one or two other bits. Also, I have not yet worked out how to add two 10-bit data address generator (DAG) blocks for the multiplexed NCO. The subtlety is that the second DAG needs to count 7 behind the other one, as 7 is apparently the latency of the MAC. If anyone can assist I would appreciate it! It assumes the NCO can run in one cycle. The NCO is multiplexed over 1024 channels. A sine look-up table converts the top part of each NCO output to a sine wave. The sine wave is multiplied by the left gain for that channel and accumulated. Same goes for the right stereo channel. After 1024 accumulates the left and right accumulators need to be cleared, which again is something left off. Everything is pipelined and each block should run in one clock cycle. It should be easy to copy this block N times until we either run out of silicon or get to 8,000 channels, which should be enough! That is if it fits, of course...
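
The two ideas from days 16085 and 16086 (one adder shared across all channels, and a second address generator trailing the first by the MAC latency) can be sketched in a few lines of Verilog. This is a rough, untested sketch under stated assumptions: the module names nco_tdm and dag_pair, all port names and the parameters are invented for illustration, the channel count is taken as exactly 1024, and the latency of 7 is simply the figure quoted above.

```verilog
// Sketch only: module names, ports and parameters are invented for illustration.

// One adder time-shared over CH channels (the day 16085 idea).
module nco_tdm #(
    parameter CH = 1024,  // channels served per audio sample
    parameter AW = 10,    // address width, needs 2**AW >= CH
    parameter W  = 32     // phase word size
) (
    input  wire          i_clk,
    input  wire          i_we,     // write strobe for one step word
    input  wire [AW-1:0] i_waddr,  // channel whose step is written
    input  wire [W-1:0]  i_wstep,  // new phase step for that channel
    output reg  [AW-1:0] o_ch,     // channel that o_phase belongs to
    output reg  [W-1:0]  o_phase   // updated phase for channel o_ch
);
    reg [W-1:0]  step_mem  [0:CH-1];
    reg [W-1:0]  phase_mem [0:CH-1];
    reg [AW-1:0] ch;
    integer k;

    initial begin
        ch = 0; o_ch = 0; o_phase = 0;
        for (k = 0; k < CH; k = k + 1) begin
            step_mem[k]  = 0;
            phase_mem[k] = 0;
        end
    end

    // Combinational read of both memories, so one channel is updated per clock.
    wire [W-1:0] next_phase = phase_mem[ch] + step_mem[ch];

    always @(posedge i_clk) begin
        if (i_we)
            step_mem[i_waddr] <= i_wstep;
        phase_mem[ch] <= next_phase;
        o_ch    <= ch;
        o_phase <= next_phase;
        ch      <= (ch == CH-1) ? {AW{1'b0}} : ch + 1'b1;
    end
endmodule

// Two address generators, the second LATENCY counts behind the first
// (the day 16086 idea). Assumes exactly 2**AW channels.
module dag_pair #(
    parameter AW      = 10, // 2**10 = 1024 channels
    parameter LATENCY = 7   // pipeline depth quoted in the post
) (
    input  wire          i_clk,
    input  wire          i_rst,
    output reg  [AW-1:0] o_lead_addr, // address entering the pipeline
    output wire [AW-1:0] o_lag_addr,  // address leaving it, LATENCY behind
    output wire          o_frame_end  // high while the last channel leaves
);
    initial o_lead_addr = 0;
    always @(posedge i_clk)
        if (i_rst) o_lead_addr <= {AW{1'b0}};
        else       o_lead_addr <= o_lead_addr + 1'b1; // wraps 1023 -> 0

    // Subtraction modulo 2**AW gives the trailing count without a second counter.
    assign o_lag_addr  = o_lead_addr - LATENCY;
    assign o_frame_end = (o_lag_addr == {AW{1'b1}});
endmodule
```

A couple of caveats. The memories in nco_tdm are read combinationally, which as far as I know will not map onto the FPGA's block RAM (that wants registered addresses), so a real version would need the read pipelined and the lagging address used for the write-back. In dag_pair, the first LATENCY values of o_lag_addr after reset do not correspond to real data, so a valid flag would be needed while the pipeline fills. o_frame_end is one possible place to hang the "clear the left and right accumulators" step.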