element14 Community
Author: avnrdf
Date Created: 1 Dec 2018 4:11 PM
Path to Programmable Blog 4 - Adding a PL Peripheral & using PS DMA

avnrdf
1 Dec 2018

In the preceding posts, we had a quick look at what Zynq-7000 is (Path to Programmable Blog 1 - Getting Started), the workflow (Path to Programmable Blog 2 - Xilinx Tool Flow & Getting Started with Zynq-7000) and we configured a couple of PS peripherals and ran tests (Path to Programmable Blog 3 - PS Peripheral Configuration & TCL).

Now comes the important part: making the PL talk to the PS & DRAM, something that probably every design targeting Zynq will need.

 

HW Chapter 6 video: Merging the PS & PL

 

There are two types of interfaces between the PS & PL:

  • Functional interfaces which include AXI interconnect, EMIO, interrupts, DMA flow control, clocks, and debug interfaces. IP blocks in the PL can connect to these.
  • Configuration signals which include the processor configuration access port, configuration status, single event upset & Program/Done/Init. These signals are connected to fixed logic within the PL configuration block, providing PS control.

 

The functional interfaces that carry data are the 4 General Purpose AXI ports, the 4 High Performance AXI ports & the Accelerator Coherency Port.

[Images]

  • The 4 General Purpose AXI ports come in 2 types:
    • 2x M_AXI_GPx - where the PS is the Master & the PL is the Slave
    • 2x S_AXI_GPx - where the PL is the Master & the PS is the Slave
    • This allows either the PS or the PL to act as the initiator (Master), depending on the use case.
    • The PL can access the PS IOP (I/O peripherals) and other PS slaves through the 2 Slave GP ports (S_AXI_GPx). Memory access is also possible this way, but it's slow.
    • The PS can access designs in the PL through the 2 Master GP ports (M_AXI_GPx).

 

  • The 4x S_AXI_HPx ports allow the PL to directly access the PS OCM & Memory Controller with very low latency & have FIFOs built into the interface for streaming.
  • The S_AXI_ACP (Accelerator Coherency Port) can access all PS memory & peripherals and has very low latency, since it is connected to the Snoop Control Unit, which is one hop away from the L1 & L2 caches. This also keeps its accesses coherent with the CPU caches.
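The rules of thumb in the lists above can be condensed into a small lookup. This is a simplified sketch of my own, not anything from Xilinx: the enum names and the `pick_port()` helper are hypothetical, and real designs also weigh bandwidth, the number of free ports and the interconnect topology.

```c
#include <stdbool.h>

/* Which side starts the transaction (the AXI master). */
typedef enum { INITIATOR_PS, INITIATOR_PL } initiator_t;

/* What the transaction targets. */
typedef enum {
    TARGET_PL_LOGIC,      /* e.g. the AXI BRAM Controller in the fabric */
    TARGET_PS_PERIPHERAL, /* PS IOP: UART, GPIO, ... */
    TARGET_PS_MEMORY      /* OCM or DDR */
} target_t;

typedef enum {
    PORT_NONE,      /* transfer never crosses the PS-PL boundary */
    PORT_M_AXI_GP,  /* PS master -> PL slave */
    PORT_S_AXI_GP,  /* PL master -> PS slave, general purpose */
    PORT_S_AXI_HP,  /* PL master -> OCM/DDR, high bandwidth with FIFOs */
    PORT_S_AXI_ACP  /* PL master -> memory, coherent with the CPU caches */
} ps_pl_port_t;

/* Hypothetical helper: pick the PS-PL port group that suits a transfer. */
ps_pl_port_t pick_port(initiator_t who, target_t what, bool needs_coherency)
{
    if (who == INITIATOR_PS)
        /* The PS reaches fabric logic through a Master GP port; its own
         * peripherals and memory are internal to the PS. */
        return what == TARGET_PL_LOGIC ? PORT_M_AXI_GP : PORT_NONE;

    /* From here on the PL is the master. */
    switch (what) {
    case TARGET_PL_LOGIC:
        return PORT_NONE;       /* stays inside the fabric */
    case TARGET_PS_PERIPHERAL:
        return PORT_S_AXI_GP;   /* PS IOP via a Slave GP port */
    case TARGET_PS_MEMORY:
    default:
        /* Slave GP ports can reach memory too, but slowly, so prefer
         * HP for bulk data and ACP when cache coherency matters. */
        return needs_coherency ? PORT_S_AXI_ACP : PORT_S_AXI_HP;
    }
}
```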

 

[Images]

All of these interfaces are based on the AMBA AXI3 protocol, but Xilinx IP appears to handle the conversion transparently and exposes AXI4 to the user.
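One thing such a conversion has to deal with is burst length: an AXI3 burst carries at most 16 beats, while an AXI4 INCR burst can carry up to 256. Below is a minimal sketch of splitting one long AXI4 INCR burst into AXI3-legal pieces. `split_incr_burst()` and `burst_t` are hypothetical names, and a real protocol converter also handles IDs, responses and write strobes, which are ignored here.

```c
#include <stddef.h>
#include <stdint.h>

#define AXI3_MAX_BEATS 16u   /* AXI3 bursts carry 1..16 beats */

typedef struct {
    uint32_t addr;   /* start address of this burst */
    uint32_t beats;  /* number of data transfers in this burst */
} burst_t;

/* Split one AXI4 INCR burst into AXI3-sized bursts.
 * bytes_per_beat is the data bus width in bytes (e.g. 4 for 32-bit).
 * Writes at most max_out entries to out and returns how many were written. */
size_t split_incr_burst(uint32_t addr, uint32_t beats, uint32_t bytes_per_beat,
                        burst_t *out, size_t max_out)
{
    size_t n = 0;
    while (beats > 0 && n < max_out) {
        uint32_t chunk = beats < AXI3_MAX_BEATS ? beats : AXI3_MAX_BEATS;
        out[n].addr = addr;
        out[n].beats = chunk;
        n++;
        /* INCR bursts advance the address by one bus width per beat */
        addr += chunk * bytes_per_beat;
        beats -= chunk;
    }
    return n;
}
```

For example, a 40-beat burst on a 32-bit bus starting at 0x1000 becomes bursts of 16, 16 and 8 beats starting at 0x1000, 0x1040 and 0x1080.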

The 3 types of AXI4 interfaces available to the user are AXI4 (high-performance, memory-mapped), AXI4-Lite (simple, low throughput, e.g. control registers) & AXI4-Stream (streaming data with no addresses).
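The defining feature of AXI4-Stream is its valid/ready handshake: a beat (TDATA) moves only on a clock cycle where the source's TVALID and the sink's TREADY are both high, and TLAST marks the final beat of a packet. Here is a cycle-level sketch of that rule; `axis_simulate()` is a hypothetical function, and the `ready[]` array is a made-up TREADY pattern standing in for a real sink.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulates an AXI4-Stream link for n_cycles clock cycles.
 * The source asserts TVALID whenever it still has data; the sink's TREADY
 * for each cycle comes from ready[]. A beat transfers only on cycles where
 * TVALID && TREADY are both high, and TLAST is raised on the final beat.
 * Returns the number of beats received; *last_seen reports whether TLAST
 * was observed. */
size_t axis_simulate(const uint32_t *src, size_t src_len,
                     const bool *ready, size_t n_cycles,
                     uint32_t *dst, bool *last_seen)
{
    size_t sent = 0;
    *last_seen = false;
    for (size_t cycle = 0; cycle < n_cycles; cycle++) {
        bool tvalid = sent < src_len;   /* source has data to offer */
        bool tready = ready[cycle];     /* sink can accept this cycle */
        if (tvalid && tready) {
            bool tlast = (sent == src_len - 1);
            dst[sent] = src[sent];      /* sink captures TDATA */
            sent++;
            if (tlast)
                *last_seen = true;
        }
        /* otherwise the source holds TDATA stable and retries next cycle */
    }
    return sent;
}
```

With three beats and a sink that is only ready on every other cycle, all three beats arrive within five cycles, with TLAST on the last one.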

 

The interconnect is complex (and very interesting). You can find more information over here:

Xilinx UG1037: Vivado AXI Reference Guide

Xilinx UG761: AXI Reference Guide

Xilinx UG585: Zynq-7000 Technical Reference Manual 

 

Lab 5 - Adding a PL Peripheral

 

This lab involved adding a peripheral to the PL (Block RAM) and connecting it to the PS via the AXI Interconnect.

Picking up from where we left off, we add the AXI BRAM Controller IP:

[Image]

After making a couple of changes to the IP (bus width, etc.), run Block Automation, which automatically adds a Block Memory Generator.

[Images]

The PS doesn't have an AXI Master port enabled yet, so edit the PS7 configuration: enable M_AXI_GP0 and FCLK_CLK0 (50 MHz). Then run Connection Automation once more:

[Image]
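FCLK_CLK0 is derived from one of the PS PLLs through two cascaded 6-bit dividers (DIVISOR0 and DIVISOR1 in the Zynq-7000 TRM), and Vivado picks the divisor values when you type in a requested frequency. The sketch below brute-forces a divisor pair for a target frequency. The 1000 MHz IO PLL value is an assumption, so check the PS7 Clock Configuration page for the real source frequency; `fclk_find_divisors()` is a hypothetical helper and Vivado's own choice of divisors may differ.

```c
#include <stdint.h>

/* Assumption: the PLL feeding FCLK_CLK0 runs at 1000 MHz. */
#define IO_PLL_HZ 1000000000u

typedef struct {
    uint32_t div0, div1;   /* two cascaded 6-bit dividers, 1..63 each */
    uint32_t actual_hz;    /* frequency this divisor pair produces */
} fclk_divs_t;

/* Try every divisor pair and keep the one closest to target_hz.
 * Ties keep the first pair found. */
fclk_divs_t fclk_find_divisors(uint32_t src_hz, uint32_t target_hz)
{
    fclk_divs_t best = {1, 1, src_hz};
    uint32_t best_err = UINT32_MAX;
    for (uint32_t d0 = 1; d0 <= 63; d0++) {
        for (uint32_t d1 = 1; d1 <= 63; d1++) {
            uint32_t f = src_hz / (d0 * d1);
            uint32_t err = f > target_hz ? f - target_hz : target_hz - f;
            if (err < best_err) {
                best_err = err;
                best = (fclk_divs_t){d0, d1, f};
            }
        }
    }
    return best;
}
```

Under that assumption, 50 MHz comes out exactly with a total division of 20 (for instance 1 x 20).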

Vivado automatically adds the AXI Interconnect block and the Processor System Reset block, and Designer Assistance makes the connections between the BRAM Controller, AXI Interconnect & PS7. It also wires up the clock & reset.

The Address Editor tab shows us the address to which the BRAM Controller has been mapped.

[Image: Address Editor]
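Behind the Address Editor, the AXI Interconnect routes each transaction by comparing its address against each slave's base address and range. A minimal sketch of that decode step follows. The 0x40000000 base and 8K range are hypothetical placeholders (use whatever your Address Editor shows), and `decode()` is illustrative, not Xilinx code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t base;   /* "Offset Address" column in the Address Editor */
    uint32_t range;  /* "Range" column: a power of two, base aligned to it */
} addr_seg_t;

/* Hypothetical map with a single slave: the BRAM controller. */
const addr_seg_t addr_map[] = {
    { "axi_bram_ctrl_0", 0x40000000u, 0x2000u },
};
#define ADDR_MAP_LEN (sizeof addr_map / sizeof addr_map[0])

/* Returns the index of the slave whose window contains addr, or -1 if no
 * slave matches (an interconnect typically answers such an access with a
 * DECERR response). Because range is a power of two and base is aligned
 * to it, only the upper address bits need to be compared. */
int decode(const addr_seg_t *map, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if ((addr & ~(map[i].range - 1u)) == map[i].base)
            return (int)i;
    }
    return -1;
}
```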

Here's what the 'high level' schematic looks like. The BRAM Generator is 2nd from the left, followed by the AXI BRAM Controller, AXI Interconnect & Zynq7.

[Image: high-level schematic]

Since this is an implemented design, Vivado lets you look inside each of those blocks. The BRAM Generator contains a couple of flip-flops & LUTs which eventually connect to a RAMB36E1, the primitive for a "36K-bit Configurable Synchronous Block RAM", or Block RAM. Its output width is 32 bits, and since we had set the width of the BRAM Generator to 64 bits, 2 of these are connected in parallel.

[Image]
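The "2 in parallel" arithmetic generalizes: primitives placed side by side cover the width, and stacked ones cover the depth. Here is a back-of-the-envelope estimator, assuming each RAMB36E1 is used as 1024 words x 32 bits. The Block Memory Generator can choose other aspect ratios and use the parity bits, so treat `ramb36_count()` (a hypothetical helper) as a rough estimate only.

```c
#include <stdint.h>

/* Simplified model: each RAMB36E1 used as 1024 words x 32 bits,
 * ignoring parity bits and other aspect ratios the tools may pick. */
#define RAMB36_WIDTH 32u
#define RAMB36_DEPTH 1024u

static uint32_t ceil_div(uint32_t a, uint32_t b) { return (a + b - 1u) / b; }

/* Estimate the RAMB36E1 count for a width_bits x depth_words memory:
 * primitives in parallel for the width, times primitives stacked for the depth. */
uint32_t ramb36_count(uint32_t width_bits, uint32_t depth_words)
{
    return ceil_div(width_bits, RAMB36_WIDTH) * ceil_div(depth_words, RAMB36_DEPTH);
}
```

Under this model a 64-bit x 1024-word memory needs 2 primitives, and doubling the depth to 2048 words doubles that to 4.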

I tried tracing the path of the data bus from the BRAM to the AXI interface. This involves opening up the lower levels of the components, which shows the actual primitives the design is mapped to in hardware. It also exposes many internal datapaths, and after expanding the cone a couple of times, Vivado was already displaying over 10,000 nets. For reference, the image on the right is a zoomed-in view of the right-hand part of the highlighted section on the left. Thanks to Block Automation, we don't need to wire all of this up manually!

[Images: BRAM-to-AXI datapath (left) and zoomed-in detail (right)]

This is what the implemented design looks like:

[Images: implemented design]

We don't have much of a design in the PL (technically, only the BRAM), but the BRAM Controller & AXI Interconnect still use up some of the programmable logic.

 

[Images]

 

 

HW Chapter 7 video: Zynq PS DMA Controller

 

Now that the BRAM in the PL is connected to the PS, it's time to consider how it'll be used. Since it's mapped to a memory address, the simplest approach is to use pointers to copy data between that address and an array. However, this isn't the best option for performance: the data has to travel from the PL to the Central Interconnect via the Master GP port, then to the On-Chip Memory and L2 Cache before making it to the DRAM. The CPU handles every transfer, so it is held up as well. With DMA, not only is the CPU free to continue executing, but the data path is also shorter, since it bypasses the cache.
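The "simplest way" with pointers looks something like the loop below. It is a sketch, not the lab's code: `cpu_copy_words()` is a hypothetical name, and on the board `src` would point at the BRAM controller's base address (for example a macro from the generated xparameters.h, whose exact name depends on your design). Every word passes through the CPU, which is exactly why the core stays busy for the whole transfer.

```c
#include <stddef.h>
#include <stdint.h>

/* CPU-driven copy: the processor reads every word over the bus and writes
 * it to dst, so it can do nothing else until the loop finishes.
 * 'volatile' stops the compiler from merging, caching or reordering the
 * reads from the memory-mapped source. */
void cpu_copy_words(volatile const uint32_t *src, uint32_t *dst, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++)
        dst[i] = src[i];
}
```

On a host PC the same function accepts an ordinary array as `src`, which is handy for checking the logic off-target.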

 

[Images]

The DMA controller itself is complex: transfers are controlled by the DMA instruction execution engine, which has its own instruction set. It supports up to 8 channels, each with its own thread, and uses round-robin arbitration to give all channels equal priority.

[Images]
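Round-robin arbitration itself is simple to model: start searching from the channel after the last one served, and grant the first channel that has a request pending. The sketch below shows only that idea, using a hypothetical `rr_arbitrate()` function; it is not how the PS DMA controller is implemented internally.

```c
#include <stdint.h>

#define DMA_CHANNELS 8

/* Round-robin arbiter: bit i of 'pending' is set when channel i wants the
 * bus. The search starts at the channel after last_grant (0..7, or -1 before
 * the first grant), so every requesting channel gets a turn before any
 * channel is served twice.
 * Returns the granted channel, or -1 if nothing is pending. */
int rr_arbitrate(uint8_t pending, int last_grant)
{
    for (int k = 1; k <= DMA_CHANNELS; k++) {
        int ch = (last_grant + k) % DMA_CHANNELS;
        if (pending & (1u << ch))
            return ch;
    }
    return -1;
}
```

Feed each returned channel back in as the next `last_grant`: with channels 0 and 2 both requesting, the grants alternate 0, 2, 0, 2, and so on.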

As usual, the Zynq-7000 TRM contains details like the instruction microcode, DMA initialization, interrupts etc.

PL based DMA controllers can also be added in the form of IP, and these can make use of the AXI_HP ports to interface directly with the DRAM Controller.

[Image]

 

Lab 6 - Improving Data flow between PL and PS utilizing PS DMA

 

Continuing from Lab 5, we export the hardware from Vivado & launch Xilinx SDK.

Create a new BSP and note that the 'system.hdf' file lists the BRAM & AXI interfaces that were added to the design in Vivado.

[Image]

Next, we import the 'dma_test.c' file that was provided with the training material and add it to the source of a new application in SDK, which gets built automatically.

Since we've got a design (BRAM, AXI Interconnect etc.) that needs to be mapped to the FPGA fabric (PL), we need to program the PL, which is done using the bitstream.

We can do this manually, or tick the checkbox in the debug/run configuration that does it automatically.

[Image: run configuration]

After this, open a serial terminal and click 'Run'.

 

The 'dma_test.c' file that we imported lets the user choose between different types of transfers (BRAM to BRAM, BRAM to DDR3 or DDR3 to DDR3) of varying sizes.

After initializing the hardware, it first performs the transfer using the CPU, then repeats it using DMA. The clock cycles taken are logged and printed out.
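If the elapsed counts come from the standalone BSP's global timer (`XTime_GetTime()` from xtime_l.h, which counts at `COUNTS_PER_SECOND`), turning the raw numbers into comparable figures is simple arithmetic. The helpers below are hypothetical and assume counts from a single free-running timer; it is an assumption that dma_test.c measures time this way.

```c
#include <stdint.h>

/* Elapsed timer counts for the same transfer done both ways.
 * Assumption: both come from one free-running counter. */
typedef struct {
    uint64_t cpu_ticks;  /* counts taken by the CPU copy */
    uint64_t dma_ticks;  /* counts taken by the DMA copy */
} copy_timing_t;

/* Convert elapsed counts into microseconds for a timer running at counts_per_sec. */
double ticks_to_us(uint64_t ticks, uint64_t counts_per_sec)
{
    return (double)ticks * 1e6 / (double)counts_per_sec;
}

/* How many times faster DMA was than the CPU copy (0 if no DMA time recorded). */
double dma_speedup(copy_timing_t t)
{
    return t.dma_ticks ? (double)t.cpu_ticks / (double)t.dma_ticks : 0.0;
}

/* Effective throughput in MB/s: bytes per microsecond equals 10^6 bytes per second. */
double throughput_mbps(uint64_t bytes, uint64_t ticks, uint64_t counts_per_sec)
{
    return (double)bytes / ticks_to_us(ticks, counts_per_sec);
}
```

Reporting speedup and throughput rather than raw counts makes results comparable across transfer sizes and boards with different timer frequencies.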

 

Here's the part of the code that initializes the CPU & DMA transfers and this is what the results look like:

[Images: CPU & DMA transfer code, and results]

Unsurprisingly, DMA is a lot quicker.

There's no doubt that Zynq peripherals are complex and take a while to understand and work with at first, but Xilinx provides drivers to make things easier.

[Images]

 

Progress:

 

  • HW Chapter 6 video: Merging the PS & PL
  • Lab 5 - Adding a PL Peripheral
  • HW Chapter 7 video: Zynq PS DMA Controller
  • Lab 6 - Improving Data flow between PL and PS utilizing PS DMA

  • kriswil, over 5 years ago

    Does anyone know of a tutorial on using PL DMA to stream data from PL IO straight to DDR (which is much faster than PS DMA)? I've been trying to find one for a while. There are some examples, like the one for the Zybo, but it uses custom locked IP cores which I can't access. They're using AXI Video In / Video Out for an HDMI video signal. I'm looking to stream raw video data directly from a video CMOS chip, not HDMI format.

     

    So it would look like this:

    input : PL IOs -> (something - either IP Core or Verilog code) -> PL DMA -> AXI -> PS -> DDR3

     

    I can't figure out how to send data from a custom PL block (IP Core or Verilog) to the PL DMA.

     

     

    Then I need to stream it out from DDR3 to Ethernet:

    DDR3 -> PS -> Ethernet+DMA : output

     

    Anyone seen any tutorial like that?

     

    Thank you!

  • avnrdf, over 5 years ago, in reply to kriswil

    Try using the AXI DMA IP Core.

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • More
    • Cancel
Comment
  • avnrdf
    avnrdf over 5 years ago in reply to kriswil

    Try using the AXI DMA IP Core.

    • Cancel
    • Vote Up 0 Vote Down
    • Sign in to reply
    • More
    • Cancel
Children
No Data

An Avnet Company © 2025 Premier Farnell Limited. All Rights Reserved.
