Hi to all, welcome back.
Eight days left. WOW, the time has flown by. I hope you are all where you wanted to be with your projects.
I knew my project would throw many obstacles at me, but that is the purpose of a challenge. The next posts will be the last ones, so I will try to show and better explain the techniques that I used to solve problems and push past the limits where possible. But first I would like to bring your attention to one (potential?) issue that I am having with DMA transfers and the SRAM, which is shared by the two cores.
I already explained that my emulator shows the gameplay through a VGA video driver. To do that it needs to render pixels into a video frame buffer, which lives in the SRAM. If I am not wrong, the SRAM address space is always accessible by both cores, unless we protect address ranges with the memory protection unit at a low level, or assign a range to one core in the linker script at a high level.
That said, I used the cm0+ core, clocked at 75MHz, to generate the video and audio and to handle the CapSense joypad, while I left the cm4, clocked at 150MHz, to run the entire emulation, which needs more processing power.
Now, at this point, we understand that the cm0+ owns a frame buffer and streams the pixels stored in it, line by line, to the GPIO port connected to the VGA monitor through a resistor DAC. But at the same time, the cm4 core needs to draw graphics into that frame buffer, and this is where the problem arises.
The VGA driver streams pixel data using a DMAC channel. A pixel clock triggers the DMA, which on each trigger quickly transfers data from the buffer to the GPIO (@75MHz). The cm4 should render the "next" line in a memory position subsequent to the one used by the VGA driver. In other words, the cm4 prepares the next line to be streamed by the cm0+.
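The "render one line ahead" idea can be sketched roughly as below. This is only an illustration: the buffer height, the one-byte-per-pixel format, and the names `frame_buffer`, `next_render_line` and `render_target` are assumptions, not taken from my actual driver.

```c
#include <stdint.h>

#define FB_WIDTH 256u /* pixels per line, as stated in the post */
#define FB_LINES 240u /* assumed number of buffered lines (placeholder) */

/* One byte per pixel is an assumption; the pixel format is not stated. */
static uint8_t frame_buffer[FB_LINES][FB_WIDTH];

/* Index of the line the renderer should fill while the DMA is
 * streaming `streaming_line`; wraps around at the end of the buffer. */
static uint32_t next_render_line(uint32_t streaming_line)
{
    return (streaming_line + 1u) % FB_LINES;
}

/* Start address of the line the cm4 would write to. */
static uint8_t *render_target(uint32_t streaming_line)
{
    return frame_buffer[next_render_line(streaming_line)];
}
```

Note that this only keeps the cm4 writes off the line currently being streamed. Both lines still live in the same SRAM, so the accesses still go through the same bus arbiter.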
I have read that the bus that connects the SRAM to the cores and peripherals has an arbiter that governs access between the different masters. What I didn't expect is that the DMA does NOT seem to have priority over a normal access by a core (buffer[x] = byte, I mean).
I was able to notice this because one pixel, at a resolution of 256 pixels in width, is 95ns wide. This means that even a small delay in the DMA's access to the memory is clearly visible on the monitor.
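To put the 95ns figure in perspective, here is a small back-of-the-envelope calculation using the clocks mentioned earlier (150MHz for the cm4, 75MHz for the cm0+). The function names are made up for this example.

```c
#include <stdint.h>

/* Whole core clock cycles that fit into one pixel period. */
static uint32_t cycles_per_pixel(uint32_t core_hz, uint32_t pixel_ns)
{
    /* 64-bit intermediate: 150 MHz * 95 ns does not fit in 32 bits. */
    return (uint32_t)(((uint64_t)core_hz * pixel_ns) / 1000000000u);
}

/* Duration of the visible part of one scanline, in nanoseconds. */
static uint32_t active_line_ns(uint32_t width_px, uint32_t pixel_ns)
{
    return width_px * pixel_ns;
}
```

With these numbers a pixel lasts about 14 cm4 cycles or 7 cm0+ cycles, and the 256 visible pixels of a line take 24.32us. A stall of just a few bus cycles is therefore already a large fraction of a pixel.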
This snapshot taken from the logic analyzer shows how the DMA transfer (red) is affected by memory writes from the cm4 (yellow).
The result is a noisy image on the screen. I have worked hard to try to coordinate these accesses, but it is almost impossible to use IPC since the DMA can't pause the cm4, and the cm4 spends a lot of cycles rendering graphics and running the emulated CPU (Z80).
The best I could do was to instruct the VGA driver to show the even lines and leave the odd lines black. This creates a nice old-fashioned CRT effect and leaves time for the cm4 to render graphics in peace without showing a shaky scanline. Another trick I used is to render graphics in between the displayed scanlines and to run the emulated machine during the blanking area at the bottom of the video frame.
This looks better, and it was possible by using a shared memory location (used like a memory-mapped register) instead of IPC. In other words, both the cm0+ and the cm4 read and write a specific memory location and act accordingly, like IPC semaphores but faster and without interrupts.
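The split of work per scanline described above could be sketched like this. The line counts assume standard 640x480 VGA timing (480 visible lines out of 525 in total), which is my assumption for the example, and `task_for_line` and the enum names are hypothetical.

```c
#include <stdint.h>

#define VISIBLE_LINES 480u /* assumed: standard 640x480 VGA timing */
#define TOTAL_LINES   525u /* assumed: visible lines plus vertical blanking */

typedef enum {
    LINE_DISPLAY, /* even visible line: the DMA streams pixels */
    LINE_RENDER,  /* odd visible line: left black, cm4 may draw */
    LINE_EMULATE  /* vertical blanking: cm4 runs the emulated machine */
} line_task_t;

/* Decide what the current scanline is used for. */
static line_task_t task_for_line(uint32_t line)
{
    line %= TOTAL_LINES;
    if (line >= VISIBLE_LINES) {
        return LINE_EMULATE;
    }
    return (line % 2u == 0u) ? LINE_DISPLAY : LINE_RENDER;
}
```

In a real driver the cm0+ would publish the current line (or the resulting task) in the shared memory location, and the cm4 would poll it to decide when it is safe to touch the frame buffer.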
The next two snapshots show the difference between toggling a GPIO using a shared memory location and using an IPC semaphore. Please correct me if I am doing something wrong.
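// cm0+
for (;;) {
#if 0
    // Shared memory variant: raise the flag and the pin, then wait for the cm4 to clear the flag
    _SHARED_gCommand = 1;
    GPIO_PRT10->OUT |= 1;
    while (_SHARED_gCommand) {};
#else
    // IPC variant: take semaphore 16, raise the pin, then wait for the cm4 to release it
    Cy_IPC_Sema_Set(16, false);
    GPIO_PRT10->OUT |= 1;
    while (Cy_IPC_Sema_Status(16) == CY_IPC_SEMA_STATUS_LOCKED) { }
#endif
}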
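// cm4
for (;;) {
#if 0
    // Shared memory variant: wait for the flag, clear it, then lower the pin
    while (_SHARED_gCommand == 0) {};
    _SHARED_gCommand = 0;
    GPIO_PRT10->OUT &= 0xFE;
#else
    // IPC variant: wait until semaphore 16 is taken, lower the pin, then release it
    while (Cy_IPC_Sema_Status(16) == CY_IPC_SEMA_STATUS_UNLOCKED) { }
    GPIO_PRT10->OUT &= 0xFE;
    Cy_IPC_Sema_Clear(16, false);
#endif
}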
Shared memory:
IPC semaphore:
In any case, even if the two processes are synchronized as shown, there are still problems as long as there is access to the SRAM.
That said, if I am not wrong, this is an issue for real-time projects or whenever timing is tight. For graphics it's possible to use a TFT module, since many controllers have their own frame buffer. But if you want a VGA or composite video signal, things can become difficult.
The next posts will be the conclusion of this adventure.