Tria Vitis Platforms — Adding support for Hailo-8

albertabeef
18 Nov 2024

Introduction

This project is part 4 of a 5 part series of projects, where we will progressively create AI enabled platforms for the following Tria development boards:

  • ZUBoard
  • Ultra96-V2
  • UltraZed-7EV

These projects can be rebuilt using the source code on github.com:

  • github.com/AlbertaBeef/tria-vitis-platforms/tree/2023.2

The following series of Hackster projects describes how the above github repository was created, and serves as documentation:

  • Part 1 : Tria Vitis Platforms — Building the Foundational Designs
  • Part 2 : Tria Vitis Platforms — Creating a Common Platform
  • Part 3 : Tria Vitis Platforms — Adding support for Vitis-AI
  • Part 4 : Tria Vitis Platforms — Adding support for Hailo-8
  • Part 5 : Tria Vitis Platforms — Adding support for ROS2

The motivation of this series of projects is to enable users to create their own custom AI applications.

Introduction Part IV

In the previous projects ( Part 2, Part 3 ), we created Vitis overlays that were augmented with the PL-based DPU engine. We saw that the DPU was a scalable engine, adapting to the available PL resources, but at the expense of reduced performance, as shown in the following table:

image
Vitis-AI PL-based DPU — Performance Comparison (Camera: AlbertaBeef)

In this project, we propose to add an AI engine as an external device : the Hailo-8 Acceleration module. This will offer a constant 26 TOPS (peak) of performance, regardless of what logic resources are available in the PL.

To integrate the Hailo embedded driver, run-time, and TAPPAS, I have defined the following three milestones:

  • Milestone 1 — Hailo-8 detected on PCI express bus
  • Milestone 2 — Hailo-8 detected by driver and runtime
  • Milestone 3 — Hailo-8 working with TAPPAS
image
Hailo-8 Integration Milestones (Camera: AlbertaBeef)

Milestone 1 — Hailo-8 detected on PCI express bus

image
Hailo-8 Integration — Milestone 1 (Camera: AlbertaBeef)

In order to achieve the first milestone, we need to attach the Hailo-8 acceleration module(s) to our hardware targets, using the PCIe enabled designs (apps):

  • ZUBoard : tria-zub1cg-base, tria-zub1cg-dualcam
  • UltraZed-EV : tria-uz7ev-nvme

For the ZUBoard, we will be using the M.2 HSIO to attach the Hailo-8 B+M Key module:

image
ZUBoard + M.2 HSIO + Hailo-8 (Camera: AlbertaBeef)
image
Hailo-8 mounted on ZUBoard via M.2 HSIO (Camera: AlbertaBeef)

For the UltraZed-EV, we will be using the Opsero M.2 Stack FMC to attach the Hailo-8 M-Key module:

image
UltraZed-EV + M.2 Stack FMC + Hailo-8 (Camera: AlbertaBeef)
image
Hailo-8 mounted on Opsero M.2 Stack FMC (Camera: AlbertaBeef)

After booting our target hardware, we want to load our PCIe enabled designs, and query the PCIe bus with the “lspci” utility.

On the UltraZed-EV

root@uz7ev-evcc-2023-2:~# xmutil loadapp tria-uz7ev-nvme
...
tria-uz7ev-nvme: loaded to slot 0

root@uz7ev-evcc-2023-2:~# lspci
0000:00:00.0 PCI bridge: Xilinx Corporation Device d021
0001:00:00.0 PCI bridge: Xilinx Corporation Device 9132
0001:01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)

On the ZUBoard

root@zub1cg-sbc-2023-2:~# xmutil loadapp tria-zub1cg-dualcam
...
tria-zub1cg-dualcam: loaded to slot 0

root@zub1cg-sbc-2023-2:~# lspci
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)

Use the “lspci -vv” variant of the command to list the details of the Hailo-8 AI Processor.

root@zub1cg-sbc-2023-2:~# lspci -vv
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 111
Region 0: Memory at 600000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 600008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
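
The LnkCap and LnkSta lines in the listing above deserve a second look: the module advertises 8GT/s x4, but the link on the ZUBoard trained at 5GT/s x2. The following is a minimal Python sketch, assuming the output of “lspci -vv” has been captured as text, that extracts both values and estimates the raw bandwidth of each. The function names and the efficiency table are illustrative and are not part of any Hailo or AMD tool.

```python
import re

# Line-coding efficiency per PCIe link speed: 8b/10b at 2.5 and 5 GT/s,
# 128b/130b at 8 GT/s. Packet/protocol overhead is ignored (assumption).
ENCODING_EFFICIENCY = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}

def parse_link(lspci_text, field):
    """Return (speed in GT/s, lane count) from a 'LnkCap:' or 'LnkSta:' line."""
    match = re.search(field + r":.*?Speed ([\d.]+)GT/s.*?Width x(\d+)", lspci_text)
    if match is None:
        raise ValueError(field + " line not found")
    return float(match.group(1)), int(match.group(2))

def raw_bandwidth_gbps(speed, width):
    """Raw one-direction bandwidth in Gbit/s after line coding."""
    return speed * width * ENCODING_EFFICIENCY[speed]

# Two lines copied from the listing above.
sample = """\
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
"""

for field in ("LnkCap", "LnkSta"):
    speed, width = parse_link(sample, field)
    print(field, speed, "GT/s x", width, "->",
          round(raw_bandwidth_gbps(speed, width), 2), "Gbit/s")
```

The x2 width matches the 2-lane B+M Key module used on the ZUBoard. The lower speed suggests that the root port, rather than the module, sets the limit, but that is an inference from the listing and not something measured here.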

We are doing great !

Before we dive into the remaining two milestones, it is important to get an overview of the Hailo AI SW Suite, and determine which version we will be integrating.

Hailo AI Software Suite Overview

Hailo’s AI Software Suite allows users to deploy models to the Hailo AI accelerators.

image
Hailo AI Software Suite — Workflow (Camera: Hailo)

In addition to the Hailo AI accelerator devices, Hailo offers a scalable range of PCIe Gen 3.0 compatible M.2 AI accelerator modules:

image
Hailo AI Acceleration Modules (Camera: AlbertaBeef)

This project will only cover the following Hailo AI acceleration modules:

  • Hailo-8 : M.2 M Key (PCIe Gen 3.0, 4 lanes), 26 TOPS
  • Hailo-8 : M.2 B+M Key (PCIe Gen 3.0, 2 lanes), 26 TOPS

The Hailo AI Software Suite supports the following frameworks:

  • TensorFlow Lite
  • ONNX

Hailo chose TensorFlow Lite, not because of its popular use for “reduced set of instructions and quantized” models, but rather because it is a “more stable” exportable format that also supports full floating-point models.

Other frameworks are indirectly supported by exporting to the TF-Lite or ONNX formats.

The deployment involves the following tasks:

  • Model Parsing
  • Model Optimization & Resource Allocation
  • Model Compilation

The Model Parsing task translates models from industry-standard frameworks to the Hailo Archive (HAR) format. It allows the user to identify layers, or sequences of layers, in the model that are not supported by the compiler. This step is crucial when training our own custom model, since we can adapt the model architecture to use layers that are supported by the target compiler prior to training, thus saving hours (or days) in our deployment flow.

The Model Optimization and Resource Allocation tasks convert the model to the internal representation, using state-of-the-art quantization, then allocate this internal representation to the available resources in the Hailo AI accelerator. In order to perform this analysis and conversion, a subset of the training dataset is required as calibration data. The size of the required calibration data is typically on the order of several thousand samples.

The Model Compilation task converts the quantized model to micro-code that can be run on the Hailo AI acceleration device. The result is a Hailo Executable Format (HEF) file, such as the .hef file benchmarked later in this project.

Current versions of the Hailo AI SW Suite are listed in the following table, which can be found on the Hailo Developer Zone:

image
Hailo AI Software Suite — Release versions compatibility (Camera: Hailo)

Choosing a Hailo AI SW Suite Version

In a previous project, I integrated the Hailo embedded stack in a 2022.2 petalinux project using the meta-hailo yocto recipes:

  • Supercharge Your ZUBoard with the Hailo-8 AI Accelerator

In that version, the integration of the Hailo embedded software went really well, since the yocto versions of Petalinux 2022.2 and HailoRT 4.15 were aligned:

  • Petalinux 2022.2 : yocto = honister, kernel = 5.15
  • HailoRT 4.15 : yocto = honister, kernel = 5.14

With Petalinux 2023.2, however, we do not have an alignment of the yocto versions with HailoRT 4.19:

  • Petalinux 2023.2 : yocto = langdale, kernel = 6.1
  • HailoRT 4.19 : yocto = kirkstone or mickledore, kernel = 6.1

The following yocto release diagram illustrates this alignment and mis-alignment for various versions of Petalinux versus HailoRT :

image
Yocto roadmap — Petalinux versus HailoRT (Camera: AlbertaBeef)
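
The alignment question can also be written down as a small check. The sketch below is only an illustration: it hard-codes the order of the four Yocto releases mentioned in this project, and the function names are made up for this example. For a given Petalinux release, it reports whether a matching meta-hailo branch exists and, if not, which branches sit on either side of it.

```python
# Yocto release code names mentioned in this project, oldest first.
YOCTO_RELEASES = ["honister", "kirkstone", "langdale", "mickledore"]

def is_aligned(petalinux_release, hailo_branches):
    """True when a meta-hailo branch exists for the Petalinux yocto release."""
    return petalinux_release in hailo_branches

def nearest_branches(petalinux_release, hailo_branches):
    """Return the closest (older, newer) meta-hailo branches, None if absent."""
    idx = YOCTO_RELEASES.index(petalinux_release)
    older = [r for r in YOCTO_RELEASES[:idx] if r in hailo_branches]
    newer = [r for r in YOCTO_RELEASES[idx + 1:] if r in hailo_branches]
    return (older[-1] if older else None, newer[0] if newer else None)

# Petalinux 2022.2 (honister) against the honister recipes: aligned.
print(is_aligned("honister", ["honister"]))
# Petalinux 2023.2 (langdale) against HailoRT 4.19 branches: not aligned.
print(is_aligned("langdale", ["kirkstone", "mickledore"]))
print(nearest_branches("langdale", ["kirkstone", "mickledore"]))
```

For Petalinux 2023.2, no branch matches, and the two neighbours are exactly the “kirkstone” and “mickledore” branches used in the rest of this project.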

For the reasons above, I have decided to attempt to integrate the latest 2024-10 version of Hailo AI SW Suite, which includes HailoRT 4.19:

image
Hailo AI Software Suite — Selected version (Camera: Hailo)

The following two branches (mickledore, kirkstone) of the meta-hailo recipe were integrated into our repository as github sub-modules in the “common” directory:

image
Adding the “kirkstone” and “mickledore” branches of the “meta-hailo” recipes (Camera: AlbertaBeef)

Milestone 2 — Hailo-8 detected by driver and runtime

image
Hailo-8 Integration — Milestone 2 (Camera: AlbertaBeef)

In order to integrate the Hailo driver and run-time, I chose to use the latest “mickledore” version of the yocto recipes.

For the Hailo driver and firmware, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:

image
Adding support for the Hailo driver and firmware recipes (Camera: AlbertaBeef)

project-spec/meta-hailo/meta-hailo-accelerator/conf/layer.conf

...
LAYERDEPENDS_meta-hailo-accelerator = "core"
LAYERSERIES_COMPAT_meta-hailo-accelerator = "mickledore langdale"

Similarly, for the Hailo run-time APIs, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:

image
Adding support for the Hailo run-time API recipes (Camera: AlbertaBeef)

project-spec/meta-hailo/meta-hailo-libhailort/conf/layer.conf

...
LAYERDEPENDS_meta-hailo-libhailort = "core"
LAYERSERIES_COMPAT_meta-hailo-libhailort = "mickledore langdale"
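
Before starting a long petalinux build, it can be worth confirming that every modified layer.conf really lists “langdale”. The following is a rough Python sketch of such a check, run on the build host. The helper names are hypothetical, and the list of files to check is supplied by the caller.

```python
import re
from pathlib import Path

def compat_series(layer_conf_text):
    """Return the release names listed in LAYERSERIES_COMPAT_<layer>."""
    match = re.search(r'^LAYERSERIES_COMPAT_[\w-]+\s*=\s*"([^"]*)"',
                      layer_conf_text, re.M)
    return match.group(1).split() if match else []

def layers_missing(release, conf_paths):
    """List the layer.conf files that do not declare the given release."""
    return [str(p) for p in conf_paths
            if release not in compat_series(Path(p).read_text())]

# Content of the libhailort layer.conf shown above.
example = '''LAYERDEPENDS_meta-hailo-libhailort = "core"
LAYERSERIES_COMPAT_meta-hailo-libhailort = "mickledore langdale"
'''
print(compat_series(example))
```

In a project laid out as above, layers_missing("langdale", [...]) would be called with the paths of the two layer.conf files. An empty result means both layers declare the release.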

With these recipes in place, we can integrate them into our petalinux project by adding the following lines to our user-rootfsconfig file:

project-spec/meta-user/conf/user-rootfsconfig

...
CONFIG_hailo-pci
CONFIG_hailo-firmware

CONFIG_hailortcli
CONFIG_libhailort
CONFIG_pyhailort
CONFIG_libgsthailo

With these packages enabled, we can rebuild our petalinux project, boot our new images on the hardware target, and test the driver and run-time APIs.

During linux boot, the Hailo-8 driver now generates the following new kernel messages:

root@zub1cg-sbc-2023-2:~# dmesg | grep hailo
[ 8.162743] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11632
[ 8.162801] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[ 8.162813] hailo 0000:01:00.0: Probing: Device enabled
[ 8.162875] hailo 0000:01:00.0: Probing: mapped bar 0 - 00000000321ef596 16384
[ 8.162889] hailo 0000:01:00.0: Probing: mapped bar 2 - 00000000465a8748 4096
[ 8.162904] hailo 0000:01:00.0: Probing: mapped bar 4 - 000000007855d716 16384
[ 8.162921] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[ 8.162965] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[ 8.162973] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[ 8.162981] hailo 0000:01:00.0: Disabling ASPM L0s
[ 8.162997] hailo 0000:01:00.0: Successfully disabled ASPM L0s
[ 8.163204] hailo 0000:01:00.0: Writing file hailo/hailo8_fw.bin
[ 8.248885] hailo 0000:01:00.0: File hailo/hailo8_fw.bin written successfully
[ 8.248920] hailo 0000:01:00.0: Writing file hailo/hailo8_board_cfg.bin
[ 8.250291] Failed to write file hailo/hailo8_board_cfg.bin
[ 8.250305] hailo 0000:01:00.0: File hailo/hailo8_board_cfg.bin written successfully
[ 8.250312] hailo 0000:01:00.0: Writing file hailo/hailo8_fw_cfg.bin
[ 8.250353] Failed to write file hailo/hailo8_fw_cfg.bin
[ 8.250358] hailo 0000:01:00.0: File hailo/hailo8_fw_cfg.bin written successfully
[ 8.348872] hailo 0000:01:00.0: Firmware loaded successfully
[ 8.401807] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
root@zub1cg-sbc-2023-2:~#
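
For an automated boot test, the two messages that matter most in this log are the firmware confirmation and the creation of the device node. The Python sketch below assumes the “dmesg | grep hailo” output has been captured as text. The function name is made up for this example, and the matched strings are taken from the log above.

```python
import re

def hailo_probe_status(dmesg_text):
    """Summarize the Hailo driver boot messages found in captured dmesg output."""
    node = re.search(r"Added board \S+, (/dev/hailo\d+)", dmesg_text)
    return {
        "firmware_loaded": "Firmware loaded successfully" in dmesg_text,
        "device_node": node.group(1) if node else None,
    }

# Two lines copied from the log above.
sample = """\
[ 8.348872] hailo 0000:01:00.0: Firmware loaded successfully
[ 8.401807] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
"""
print(hailo_probe_status(sample))
```

The sketch deliberately ignores the two “Failed to write file” lines for the board and firmware configuration files, since in the log above they are followed by a successful firmware load.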

The presence of the Hailo-8 driver can be confirmed with the “lsmod” command.

root@zub1cg-sbc-2023-2:~# lsmod
Module Size Used by
...
hailo_pci 77824 0
...

Also, the “lspci” command will output information regarding the PCI devices, and their kernel drivers:

root@zub1cg-sbc-2023-2:~# lspci
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)

root@zub1cg-sbc-2023-2:~# lspci -k
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Kernel driver in use: hailo
Kernel modules: hailo_pci

root@zub1cg-sbc-2023-2:~# lspci -v
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Flags: bus master, fast devsel, latency 0, IRQ 61
Memory at 600000000 (64-bit, prefetchable) [size=16K]
Memory at 600008000 (64-bit, prefetchable) [size=4K]
Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express Endpoint, MSI 00
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [f8] Power Management version 3
Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108] Latency Tolerance Reporting
Capabilities: [110] L1 PM Substates
Capabilities: [128] Alternative Routing-ID Interpretation (ARI)
Capabilities: [200] Advanced Error Reporting
Capabilities: [300] Secondary PCI Express
Kernel driver in use: hailo
Kernel modules: hailo_pci

root@zub1cg-sbc-2023-2:~# lspci -vv
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 61
Region 0: Memory at 600000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 600008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel driver in use: hailo
Kernel modules: hailo_pci

We can verify the Hailo-8 run-time with the “hailortcli” command, as shown below:

root@zub1cg-sbc-2023-2:~# hailortcli fw-control identify
Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.19.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: HLLWMB0214600101
Part Number: HM218B1C2LA
Product Name: HAILO-8 AI ACCELERATOR M.2 B+M KEY MODULE

We have successfully detected the Hailo-8 AI Accelerator M.2 B+M Key module !
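
If this check needs to be scripted, the “Key: Value” lines of the identify output are easy to turn into a dictionary. This is a small sketch which assumes the command output has been captured as text. parse_identify is a hypothetical helper, not part of HailoRT.

```python
def parse_identify(output):
    """Turn the 'Key: Value' lines of 'hailortcli fw-control identify' into a dict."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as 0000:01:00.0 stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info

# A few lines copied from the output above.
sample = """\
Executing on device: 0000:01:00.0
Identifying board
Firmware Version: 4.19.0 (release,app,extended context switch buffer)
Device Architecture: HAILO8
"""
info = parse_identify(sample)
print(info["Device Architecture"], info["Firmware Version"].split()[0])
```

One use for this is to compare the firmware version against the version of the installed run-time. In this project both report 4.19.0.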

We can already run some benchmarks with the run-time.

First, download a pre-compiled model from the Hailo model zoo:

  • https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/PUBLIC_MODELS.rst

If your embedded hardware is connected to the internet, this can be done directly on the embedded platform with the “wget” utility:

wget https://hailo-tappas.s3.eu-west-2.amazonaws.com/v3.30/general/hefs/resnet_v1_50.hef

Then run some benchmarks with the “hailortcli” utility:

root@zub1cg-sbc-2023-2:~# hailortcli benchmark resnet_v1_50.hef
Starting Measurements...
Measuring FPS in HW-only mode
Network resnet_v1_50/resnet_v1_50: 100% | 20573 | FPS: 1370.97 | ETA: 00:00:00
Measuring FPS (and Power on supported platforms) in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network resnet_v1_50/resnet_v1_50: 100% | 20365 | FPS: 1357.09 | ETA: 00:00:00
Measuring HW Latency
Network resnet_v1_50/resnet_v1_50: 100% | 4559 | HW Latency: 3.09 ms | ETA: 00:00:00

=======
Summary
=======
FPS (hw_only) = 1370.99
(streaming) = 1357.1
Latency (hw) = 3.09044 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 4.02865 W
(max) = 4.06264 W
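
Two derived figures help put this summary in perspective: the energy spent per inference, and the number of frames that must be in flight to sustain the measured throughput. The short Python calculation below uses only the numbers from the summary above. The second figure applies Little's law (throughput x latency) and assumes that the reported hardware latency also holds under full load, which the benchmark output does not state.

```python
# Values copied from the benchmark summary above.
fps_hw_only = 1370.99        # frames per second, HW-only mode
fps_streaming = 1357.1       # frames per second, streaming mode
latency_s = 3.09044e-3       # hardware latency, in seconds
power_w = 4.02865            # average power in streaming mode, in watts

# Energy per inference in millijoules: power divided by throughput.
energy_per_frame_mj = power_w / fps_streaming * 1e3

# Little's law: average number of frames inside the device at once.
frames_in_flight = fps_hw_only * latency_s

print(round(energy_per_frame_mj, 2), "mJ per frame")
print(round(frames_in_flight, 2), "frames in flight")
```

A frames-in-flight value above 1 indicates that the device overlaps the processing of several frames, which is why the throughput is higher than the inverse of the latency.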

The easiest way to validate the “pyhailort” package is to import “hailo_platform” within python, as shown below:

python3

>>> import hailo_platform
>>> print(hailo_platform.__version__)
4.19.0
>>> exit()
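
The same check can be wrapped in a function that does not raise when the package is missing, which is convenient in a test script. This is a minimal sketch. It relies only on the hailo_platform module name and the __version__ attribute shown above, and the function name is an assumption for this example.

```python
import importlib.util

def pyhailort_version():
    """Return the hailo_platform version, or None if the package is not installed."""
    if importlib.util.find_spec("hailo_platform") is None:
        return None
    import hailo_platform
    return hailo_platform.__version__

print(pyhailort_version())
```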

Another (longer) way to validate the “pyhailort” package is to install and run the “blaze_app_python” application which provides Hailo-8 accelerated versions of the mediapipe models:

  • Accelerating the MediaPipe models with Hailo-8
  • https://github.com/AlbertaBeef/blaze_app_python

Milestone 3 — Hailo-8 working with TAPPAS

image
Hailo-8 Integration — Milestone 3 (Camera: AlbertaBeef)

In order to integrate the Hailo TAPPAS, I chose to use the previous “kirkstone” version of the yocto recipes, since the “mickledore” branch did not contain these recipes.

For the TAPPAS, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:

image
Adding support for the Hailo tappas (Camera: AlbertaBeef)

project-spec/meta-hailo/meta-hailo-tappas/conf/layer.conf

...
LAYERDEPENDS_meta-hailo-tappas = "core meta-hailo-libhailort"
LAYERSERIES_COMPAT_meta-hailo-tappas = "kirkstone langdale"

One of the included recipes, “tappas-apps”, attempts to discover whether we are targeting the Hailo-8 or Hailo-15 device. This discovery only recognizes an IMX8 target as Hailo-8 and treats every other MACHINE as Hailo-15, so it will not work for our targets. We need to modify the following recipe to always select the Hailo-8 (IMX8) settings:

project-spec/meta-hailo/meta-hailo-tappas/recipes-gstreamer/tappas-apps/tappas-apps_3.30.0.bb

...
IMX8_DIR = "${APPS_DIR_PREFIX}/h8/gstreamer/imx8/"
#HAILO15_DIR = "${APPS_DIR_PREFIX}/h15/gstreamer/"

REQS_PATH = "${FILE_DIRNAME}/files/"
REQS_IMX8_FILE = "${REQS_PATH}download_reqs_imx8.txt"
#REQS_HAILO15_FILE = "${REQS_PATH}download_reqs_hailo15.txt"

REQS_FILE = ""
ARM_APPS_DIR = ""
python () {
    #if 'imx8' in d.getVar('MACHINE'):
    d.setVar('REQS_FILE', d.getVar('REQS_IMX8_FILE'))
    d.setVar('ARM_APPS_DIR', d.getVar('IMX8_DIR'))
    #else:
    #    d.setVar('REQS_FILE', d.getVar('REQS_HAILO15_FILE'))
    #    d.setVar('ARM_APPS_DIR', d.getVar('HAILO15_DIR'))
    #    d.appendVar('DEPENDS', " libmedialib-api xtensor")
}

#IS_H15 = "${@ 'true' if 'hailo15' in d.getVar('MACHINE') else 'false'}"
IS_H15 = "false"
...
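
To see why the unmodified recipe picks the wrong settings, the selection logic can be reproduced outside of bitbake. The following plain Python sketch is a simplified stand-in for the recipe's anonymous python block. The function and the machine names are hypothetical, and the force_h8 flag mimics the effect of the modification above.

```python
def select_apps_dir(machine, force_h8=False):
    """Mimic the recipe: 'imx8' in MACHINE selects Hailo-8, anything else Hailo-15."""
    if force_h8 or "imx8" in machine:
        return "h8/gstreamer/imx8/"
    return "h15/gstreamer/"

# Hypothetical MACHINE names, for illustration only.
print(select_apps_dir("imx8mq-evk"))                  # Hailo-8 apps
print(select_apps_dir("zub1cg-sbc"))                  # falls through to Hailo-15 apps
print(select_apps_dir("zub1cg-sbc", force_h8=True))   # modified behaviour
```

Since the MACHINE name of a Zynq UltraScale+ target would not normally contain “imx8”, the original test falls through to the Hailo-15 branch, which is why the condition is commented out in the recipe above.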

With these recipes in place, we can integrate them into our petalinux project by adding the following lines to our user-rootfsconfig file:

project-spec/meta-user/conf/user-rootfsconfig

...
CONFIG_libgsthailotools
CONFIG_hailo-post-processes
CONFIG_tappas-apps

With these packages enabled, we can rebuild our petalinux project, boot our new images on the hardware target, and test the TAPPAS.

The presence of the gstreamer plug-ins from Hailo can be validated as follows:

root@zub1cg-sbc-2023-2:~# gst-inspect-1.0 hailotools
Plugin Details:
Name hailotools
Description hailo tools plugin
Filename /usr/lib/gstreamer-1.0/libgsthailotools.so
Version 3.30.0
License unknown
Source module gst-hailo-tools
Binary package gst-hailo-tools
Origin URL https://hailo.ai/

hailoaggregator: hailoaggregator - Cascading
hailocounter: hailocounter - postprocessing element
hailocropper: hailocropper
hailoexportfile: hailoexportfile - export element
hailoexportzmq: hailoexportzmq - export element
hailofilter: hailofilter - postprocessing element
hailogallery: Hailo gallery element
hailograytonv12: hailograytonv12 - postprocessing element
hailoimportzmq: hailoimportzmq - import element
hailomuxer: Muxer pipeline merging
hailonv12togray: hailonv12togray - postprocessing element
hailonvalve: HailoNValve element
hailooverlay: hailooverlay - overlay element
hailoroundrobin: Input Round Robin element
hailostreamrouter: Hailo Stream Router
hailotileaggregator: hailotileaggregator
hailotilecropper: hailotilecropper - Tiling
hailotracker: Hailo object tracking element

18 features:
+-- 18 elements

root@zub1cg-sbc-2023-2:~# gst-inspect-1.0 hailo
Plugin Details:
Name hailo
Description hailo gstreamer plugin
Filename /usr/lib/gstreamer-1.0/libgsthailo.so
Version 1.0
License unknown
Source module hailo
Binary package GStreamer
Origin URL http://gstreamer.net/

hailodevicestats: hailodevicestats element
hailonet: hailonet element
synchailonet: sync hailonet element

3 features:
+-- 3 elements

root@zub1cg-sbc-2023-2:~#
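
When this validation is repeated after each rebuild, it is handy to check for specific elements rather than reading the whole listing. The Python sketch below assumes the “gst-inspect-1.0” output has been captured as text. The helper names and the choice of required elements are assumptions for this example.

```python
import re

def listed_elements(gst_inspect_output):
    """Extract the element names from 'gst-inspect-1.0 <plugin>' output."""
    return set(re.findall(r"^\s*(\w+): ", gst_inspect_output, re.M))

def missing_elements(gst_inspect_output, required):
    """Return the required element names absent from the output, sorted."""
    return sorted(set(required) - listed_elements(gst_inspect_output))

# Excerpt of the "hailo" plug-in listing above.
sample = """\
Plugin Details:
Name hailo
hailodevicestats: hailodevicestats element
hailonet: hailonet element
synchailonet: sync hailonet element

3 features:
+-- 3 elements
"""
print(sorted(listed_elements(sample)))
print(missing_elements(sample, ["hailonet", "hailooverlay"]))
```

Here “hailooverlay” is reported as missing only because it belongs to the “hailotools” plug-in. Running the same check against that listing would find it.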

We may want to revisit the examples created in the home “apps” directory, to modify them for our targeted Zynq UltraScale+ device.

Known Issues

The current version of this project has the following known issues:

  • The examples in the “apps” directory need to be modified to run on the Zynq-UltraScale+ targets.

Conclusion

I hope this tutorial helped you understand how to add Hailo-8 functionality to your custom platform.

If you would like to have the pre-built SD card image for this project, please let me know in the comments below.

Revision History

2024/11/18: Preliminary Version

  • nauman
    nauman 2 months ago in reply to albertabeef

    hi, yes i get the similar output. Problem is with my SSD inserted i get this

    ":lspci
    00:00.0 PCI bridge: Xilinx Corporation Device d011
    01:00.0 Non-Volatile memory controller: Micron Technology Inc Device 5413 (rev 03)"

    which shows that my PCIe is working correctly but with Hailo i just get  Xilinx Corporation Device d011 output. I also get the link is down message. Complete pcie log when hailo is inserted is this

    "[ 2.355723] nwl-pcie fd0e0000.pcie: host bridge /axi/pcie@fd0e0000 ranges:
    [ 2.362633] nwl-pcie fd0e0000.pcie: MEM 0x00e0000000..0x00efffffff -> 0x00e0000000
    [ 2.370641] nwl-pcie fd0e0000.pcie: MEM 0x0600000000..0x07ffffffff -> 0x0600000000
    [ 2.378756] nwl-pcie fd0e0000.pcie: Link is DOWN
    [ 2.383606] nwl-pcie fd0e0000.pcie: PCI host bridge to bus 0000:00
    [ 2.389784] pci_bus 0000:00: root bus resource [bus 00-ff]
    [ 2.395266] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff]
    [ 2.402142] pci_bus 0000:00: root bus resource [mem 0x600000000-0x7ffffffff pref]
    [ 2.409648] pci 0000:00:00.0: [10ee:d011] type 01 class 0x060400
    [ 2.415666] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff]
    [ 2.421988] pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot
    [ 2.429976] pci 0000:00:00.0: BAR 0: assigned [mem 0xe0000000-0xe00fffff]
    [ 2.436773] pci 0000:00:00.0: PCI bridge to [bus 01-0c]" 

  • albertabeef
    albertabeef 2 months ago in reply to nauman

    The U96V2 does not have the capability to add an M.2 module.

    Which platform are you targetting ?
    Can you share the output of the "lspci" on your platform ? 
    Even without the Hailo plugged-in, you should see an item for the PCIe controller.
    Should look something like:

    00:00.0 Bridge: Xilinx Corporation Device d011
  • nauman
    nauman 2 months ago

    Hi albertabeef, great article, help me a lot. I observed that in here you did not use Ultra96-V2 for Hailo, was there any complication in integrating hailo with this board? I am currently working on similar board which uses ZU3EG. I have integrated the hailo driver using your article but I am not able to see the hailo on this board with pcie enable. My pcie port is working as i can detect the SSD with it but not the hailo. 

  • cstanton
    cstanton 10 months ago

    Hi albertabeef , looks like the first image you have embedded doesn't quite load - would you mind manually re-uploading and re-inserting it into the blog?
