Introduction
This project is part 4 of a 5-part series of projects in which we progressively create AI-enabled platforms for the following Tria development boards:
- ZUBoard
- Ultra96-V2
- UltraZed-7EV
These projects can be rebuilt using the source code on github.com:
The following series of Hackster projects describes how the above GitHub repository was created and serves as its documentation:
- Part 1 : Tria Vitis Platforms — Building the Foundational Designs
- Part 2 : Tria Vitis Platforms — Creating a Common Platform
- Part 3 : Tria Vitis Platforms — Adding support for Vitis-AI
- Part 4 : Tria Vitis Platforms — Adding support for Hailo-8
- Part 5 : Tria Vitis Platforms — Adding support for ROS2
The motivation for this series of projects is to enable users to create their own custom AI applications.
Introduction Part IV
In the previous projects ( Part 2, Part 3 ), we created Vitis overlays augmented with the PL-based DPU engine. We saw that the DPU is a scalable engine that adapts to the available PL resources, but at the expense of reduced performance, as shown in the following table:
In this project, we propose to add an AI engine as an external device: the Hailo-8 acceleration module. This offers a constant 26 TOPS (peak) of performance, regardless of the logic resources available in the PL.
In order to integrate the Hailo embedded driver, run-time, and TAPPAS, I have defined the following three milestones:
- Milestone 1 — Hailo-8 detected on PCI express bus
- Milestone 2 — Hailo-8 detected by driver and runtime
- Milestone 3 — Hailo-8 working with TAPPAS
Milestone 1 — Hailo-8 detected on PCI express bus
In order to achieve the first milestone, we need to attach the Hailo-8 acceleration module(s) to our hardware targets, using the PCIe-enabled designs (apps):
- ZUBoard : tria-zub1cg-base, tria-zub1cg-dualcam
- UltraZed-EV : tria-uz7ev-nvme
For the ZUBoard, we will be using the M.2 HSIO to attach the Hailo-8 B+M Key module:
For the UltraZed-EV, we will be using the Opsero M.2 Stack FMC to attach the Hailo-8 M-Key module:
After booting our target hardware, we load our PCIe-enabled designs and query the PCIe bus with the “lspci” utility.
On the UltraZed-EV
root@uz7ev-evcc-2023-2:~# xmutil loadapp tria-uz7ev-nvme
...
tria-uz7ev-nvme: loaded to slot 0
root@uz7ev-evcc-2023-2:~# lspci
0000:00:00.0 PCI bridge: Xilinx Corporation Device d021
0001:00:00.0 PCI bridge: Xilinx Corporation Device 9132
0001:01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
On the ZUBoard
root@zub1cg-sbc-2023-2:~# xmutil loadapp tria-zub1cg-dualcam
...
tria-zub1cg-dualcam: loaded to slot 0
root@zub1cg-sbc-2023-2:~# lspci
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Use the “lspci -vv” variant of the command to list the details of the Hailo-8 AI Processor.
root@zub1cg-sbc-2023-2:~# lspci -vv
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 111
Region 0: Memory at 600000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 600008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
We are doing great!
Before we dive into the remaining two milestones, it is important to get an overview of the Hailo AI SW Suite, and determine which version we will be integrating.
Hailo AI Software Suite Overview
Hailo’s AI Software Suite allows users to deploy models to the Hailo AI accelerators.
In addition to the Hailo AI accelerator devices, Hailo offers a scalable range of PCIe Gen 3.0 compatible M.2 AI accelerator modules:
This project will only cover the following Hailo AI acceleration modules:
- Hailo-8 : M.2 M Key (PCIe Gen 3.0, 4 lanes), 26 TOPS
- Hailo-8 : M.2 B+M Key (PCIe Gen 3.0, 2 lanes), 26 TOPS
The Hailo AI Software Suite supports the following frameworks:
- TensorFlow Lite
- ONNX
Hailo chose TensorFlow Lite not because of its popular use for reduced and quantized models, but rather because it is a “more stable” exportable format that also supports full floating-point models.
Other frameworks are indirectly supported by exporting to the TF-Lite or ONNX formats.
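For example, a PyTorch model can be brought into this flow by first exporting it to ONNX. The following is a minimal sketch (the ResNet-50 model and 224x224 input shape are placeholders; substitute your own trained network):

import torch
import torchvision

# Placeholder model; replace with your own trained network
model = torchvision.models.resnet50(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX so the Hailo tools can parse it
torch.onnx.export(model, dummy_input, "resnet50.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=13)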
The deployment involves the following tasks:
- Model Parsing
- Model Optimization & Resource Allocation
- Model Compilation
The Model Parsing task translates models from industry-standard frameworks into Hailo’s internal representation, stored as a Hailo archive (HAR) file. It allows the user to identify layers, or sequences of layers, in the model that are not supported by the compiler. This step is crucial when training our own custom model, since we can adapt the model architecture to use only supported layers prior to training, saving hours (or days) in our deployment flow.
The Model Optimization and Resource Allocation tasks quantize the model using state-of-the-art techniques, then allocate the resulting representation to the available resources in the Hailo AI accelerator. In order to perform this analysis and conversion, a subset of the training dataset is required; the calibration data is typically on the order of several thousand samples.
The Model Compilation task converts the quantized model into microcode, packaged as a HEF file, that can be run on the Hailo AI acceleration device.
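On the host side, these three tasks map onto the Hailo Dataflow Compiler Python API. The following is a minimal sketch, assuming the hailo_sdk_client package from the AI SW Suite and a random placeholder calibration set; method names and arguments may differ slightly between suite versions:

from hailo_sdk_client import ClientRunner
import numpy as np

# Model Parsing: translate the ONNX model into Hailo's internal representation (HAR)
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("resnet50.onnx", "resnet50")

# Model Optimization: quantize using a calibration subset of the training data
# (placeholder random data here; use real samples in practice)
calib_data = np.random.rand(1024, 224, 224, 3).astype(np.float32)
runner.optimize(calib_data)

# Model Compilation: generate the HEF microcode for the Hailo-8
hef = runner.compile()
with open("resnet50.hef", "wb") as f:
    f.write(hef)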
Current versions of the Hailo AI SW Suite are listed in the following table, which can be found on the Hailo Developer Zone:
Choosing a Hailo AI SW Suite Version
In a previous project, I integrated the Hailo embedded stack in a 2022.2 petalinux project using the meta-hailo yocto recipes.
In that version, the integration of the Hailo embedded software went really well, since we had alignment of the yocto versions between Petalinux 2022.2 and HailoRT 4.15:
- Petalinux 2022.2 : yocto = honister, kernel = 5.15
- HailoRT 4.15 : yocto = honister, kernel = 5.14
With Petalinux 2023.2, however, we do not have an alignment of the yocto versions with HailoRT 4.19:
- Petalinux 2023.2 : yocto = langdale, kernel = 6.1
- HailoRT 4.19 : yocto = kirkstone or mickledore, kernel = 6.1
The following yocto release diagram illustrates this alignment and misalignment for various versions of Petalinux versus HailoRT:
For the reasons above, I decided to integrate the latest 2024-10 version of the Hailo AI SW Suite, which includes HailoRT 4.19:
The following two branches (mickledore, kirkstone) of the meta-hailo recipes were integrated into our repository as GitHub submodules in the “common” directory, as sketched below:
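As a rough illustration, the submodules could be added along these lines (the destination paths under “common” are assumptions based on this repository’s layout; adapt them to your own tree):

git submodule add -b mickledore https://github.com/hailo-ai/meta-hailo common/meta-hailo-mickledore
git submodule add -b kirkstone  https://github.com/hailo-ai/meta-hailo common/meta-hailo-kirkstone
git submodule update --init --recursive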
Milestone 2 — Hailo-8 detected by driver and runtime
In order to integrate the Hailo driver and run-time, I chose to use the latest “mickledore” version of the yocto recipes.
For the Hailo driver and firmware, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:
project-spec/meta-hailo/meta-hailo-accelerator/conf/layer.conf
...
LAYERDEPENDS_meta-hailo-accelerator = "core"
LAYERSERIES_COMPAT_meta-hailo-accelerator = "mickledore langdale"
Similarly, for the Hailo run-time APIs, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:
project-spec/meta-hailo/meta-hailo-libhailort/conf/layer.conf
...
LAYERDEPENDS_meta-hailo-libhailort = "core"
LAYERSERIES_COMPAT_meta-hailo-libhailort = "mickledore langdale"
With these recipes in place, we can integrate them into our petalinux project by adding the following lines to our user-rootfsconfig file:
project-spec/meta-user/conf/user-rootfsconfig
...
CONFIG_hailo-pci
CONFIG_hailo-firmware
CONFIG_hailortcli
CONFIG_libhailort
CONFIG_pyhailort
CONFIG_libgsthailo
With these packages enabled, we can rebuild our petalinux project, boot our new images on the hardware target, and test the driver and run-time APIs.
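For reference, a typical rebuild cycle looks something like the following (a minimal sketch; the exact petalinux-package arguments depend on your project and boot configuration):

# Enable the CONFIG_hailo-* packages under "user packages"
petalinux-config -c rootfs
# Rebuild the project and repackage the boot files
petalinux-build
petalinux-package --boot --u-boot --force
# Copy the generated boot files and root file system to the SD card, then boot the target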
You may already have noticed the following new Hailo-8 driver messages during Linux boot.
root@zub1cg-sbc-2023-2:~# dmesg | grep hailo
[ 8.162743] hailo 0000:01:00.0: Probing: Allocate memory for device extension, 11632
[ 8.162801] hailo 0000:01:00.0: enabling device (0000 -> 0002)
[ 8.162813] hailo 0000:01:00.0: Probing: Device enabled
[ 8.162875] hailo 0000:01:00.0: Probing: mapped bar 0 - 00000000321ef596 16384
[ 8.162889] hailo 0000:01:00.0: Probing: mapped bar 2 - 00000000465a8748 4096
[ 8.162904] hailo 0000:01:00.0: Probing: mapped bar 4 - 000000007855d716 16384
[ 8.162921] hailo 0000:01:00.0: Probing: Setting max_desc_page_size to 4096, (page_size=4096)
[ 8.162965] hailo 0000:01:00.0: Probing: Enabled 64 bit dma
[ 8.162973] hailo 0000:01:00.0: Probing: Using userspace allocated vdma buffers
[ 8.162981] hailo 0000:01:00.0: Disabling ASPM L0s
[ 8.162997] hailo 0000:01:00.0: Successfully disabled ASPM L0s
[ 8.163204] hailo 0000:01:00.0: Writing file hailo/hailo8_fw.bin
[ 8.248885] hailo 0000:01:00.0: File hailo/hailo8_fw.bin written successfully
[ 8.248920] hailo 0000:01:00.0: Writing file hailo/hailo8_board_cfg.bin
[ 8.250291] Failed to write file hailo/hailo8_board_cfg.bin
[ 8.250305] hailo 0000:01:00.0: File hailo/hailo8_board_cfg.bin written successfully
[ 8.250312] hailo 0000:01:00.0: Writing file hailo/hailo8_fw_cfg.bin
[ 8.250353] Failed to write file hailo/hailo8_fw_cfg.bin
[ 8.250358] hailo 0000:01:00.0: File hailo/hailo8_fw_cfg.bin written successfully
[ 8.348872] hailo 0000:01:00.0: Firmware loaded successfully
[ 8.401807] hailo 0000:01:00.0: Probing: Added board 1e60-2864, /dev/hailo0
root@zub1cg-sbc-2023-2:~#
The presence of the Hailo-8 driver can be confirmed with the “lsmod” command.
root@zub1cg-sbc-2023-2:~# lsmod
Module Size Used by
...
hailo_pci 77824 0
...
Also, the “lspci” command will output information regarding the PCI devices, and their kernel drivers:
root@zub1cg-sbc-2023-2:~# lspci
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
root@zub1cg-sbc-2023-2:~# lspci -k
00:00.0 Bridge: Xilinx Corporation Device d011
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Kernel driver in use: hailo
Kernel modules: hailo_pci
root@zub1cg-sbc-2023-2:~# lspci -v
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Flags: bus master, fast devsel, latency 0, IRQ 61
Memory at 600000000 (64-bit, prefetchable) [size=16K]
Memory at 600008000 (64-bit, prefetchable) [size=4K]
Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express Endpoint, MSI 00
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [f8] Power Management version 3
Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108] Latency Tolerance Reporting
Capabilities: [110] L1 PM Substates
Capabilities: [128] Alternative Routing-ID Interpretation (ARI)
Capabilities: [200] Advanced Error Reporting
Capabilities: [300] Secondary PCI Express
Kernel driver in use: hailo
Kernel modules: hailo_pci
root@zub1cg-sbc-2023-2:~# lspci -vv
00:00.0 Bridge: Xilinx Corporation Device d011
...
01:00.0 Co-processor: Hailo Technologies Ltd. Hailo-8 AI Processor (rev 01)
Subsystem: Hailo Technologies Ltd. Hailo-8 AI Processor
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 61
Region 0: Memory at 600000000 (64-bit, prefetchable) [size=16K]
Region 2: Memory at 600008000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at 600004000 (64-bit, prefetchable) [size=16K]
Capabilities: [80] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <1us, L1 <2us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [e0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [f8] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [110 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [200 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Kernel driver in use: hailo
Kernel modules: hailo_pci
We can verify the Hailo-8 run-time with the “hailortcli” command, as shown below:
root@zub1cg-sbc-2023-2:~# hailortcli fw-control identify
Executing on device: 0000:01:00.0
Identifying board
Control Protocol Version: 2
Firmware Version: 4.19.0 (release,app,extended context switch buffer)
Logger Version: 0
Board Name: Hailo-8
Device Architecture: HAILO8
Serial Number: HLLWMB0214600101
Part Number: HM218B1C2LA
Product Name: HAILO-8 AI ACCELERATOR M.2 B+M KEY MODULE
We have successfully detected the Hailo-8 AI Accelerator M.2 B+M Key module!
We can already run some benchmarks with the run-time.
First, download a pre-compiled model from the Hailo Model Zoo:
If your embedded hardware is connected to the internet, this can be done directly on the embedded platform with the “wget” utility:
wget https://hailo-tappas.s3.eu-west-2.amazonaws.com/v3.30/general/hefs/resnet_v1_50.hef
Then run some benchmarks with the “hailortcli” utility:
root@zub1cg-sbc-2023-2:~# hailortcli benchmark resnet_v1_50.hef
Starting Measurements...
Measuring FPS in HW-only mode
Network resnet_v1_50/resnet_v1_50: 100% | 20573 | FPS: 1370.97 | ETA: 00:00:00
Measuring FPS (and Power on supported platforms) in streaming mode
[HailoRT] [warning] Using the overcurrent protection dvm for power measurement will disable the overcurrent protection.
If only taking one measurement, the protection will resume automatically.
If doing continuous measurement, to enable overcurrent protection again you have to stop the power measurement on this dvm.
Network resnet_v1_50/resnet_v1_50: 100% | 20365 | FPS: 1357.09 | ETA: 00:00:00
Measuring HW Latency
Network resnet_v1_50/resnet_v1_50: 100% | 4559 | HW Latency: 3.09 ms | ETA: 00:00:00
=======
Summary
=======
FPS (hw_only) = 1370.99
(streaming) = 1357.1
Latency (hw) = 3.09044 ms
Device 0000:01:00.0:
Power in streaming mode (average) = 4.02865 W
(max) = 4.06264 W
The easiest way to validate the “pyhailort” package is to import “hailo_platform” within python, as shown below:
python3
>>> import hailo_platform
>>> print(hailo_platform.__version__)
4.19.0
>>> exit()
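Going one step further, the run-time can be exercised directly from Python. The following is a minimal sketch based on the pyhailort inference API, using the resnet_v1_50.hef downloaded earlier and a dummy input frame; parameter and vstream names may differ between HailoRT versions:

import numpy as np
from hailo_platform import (HEF, VDevice, HailoStreamInterface, ConfigureParams,
                            InputVStreamParams, OutputVStreamParams, InferVStreams, FormatType)

# Load the compiled model and configure it on the first available device
hef = HEF("resnet_v1_50.hef")
with VDevice() as target:
    configure_params = ConfigureParams.create_from_hef(hef=hef, interface=HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    input_params = InputVStreamParams.make(network_group, format_type=FormatType.UINT8)
    output_params = OutputVStreamParams.make(network_group, format_type=FormatType.FLOAT32)

    # Dummy input frame with the shape expected by the model
    input_info = hef.get_input_vstream_infos()[0]
    frame = np.zeros((1,) + tuple(input_info.shape), dtype=np.uint8)

    with InferVStreams(network_group, input_params, output_params) as pipeline:
        with network_group.activate(network_group_params):
            results = pipeline.infer({input_info.name: frame})
            print({name: out.shape for name, out in results.items()})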
Another (longer) way to validate the “pyhailort” package is to install and run the “blaze_app_python” application, which provides Hailo-8 accelerated versions of the MediaPipe models:
Milestone 3 — Hailo-8 working with TAPPAS
In order to integrate the Hailo TAPPAS, I chose to use the previous “kirkstone” version of the yocto recipes, since the “mickledore” branch did not contain these recipes.
For the TAPPAS, I created symbolic links to the content, and added a modified version of the layer.conf file to add support for the “langdale” yocto version:
project-spec/meta-hailo/meta-hailo-tappas/conf/layer.conf
...
LAYERDEPENDS_meta-hailo-tappas = "core meta-hailo-libhailort"
LAYERSERIES_COMPAT_meta-hailo-tappas = "kirkstone langdale"
One of the included recipes, “tappas-apps”, attempts to discover whether we are targeting a Hailo-8 or Hailo-15 device. Since this discovery was implemented for an IMX8 target, it will not work for our targets. We need to modify the following recipe to handle this properly:
project-spec/meta-hailo/meta-hailo-tappas/recipes-gstreamer/tappas-apps/tappas-apps_3.30.0.bb
...
IMX8_DIR = "${APPS_DIR_PREFIX}/h8/gstreamer/imx8/"
#HAILO15_DIR = "${APPS_DIR_PREFIX}/h15/gstreamer/"
REQS_PATH = "${FILE_DIRNAME}/files/"
REQS_IMX8_FILE = "${REQS_PATH}download_reqs_imx8.txt"
#REQS_HAILO15_FILE = "${REQS_PATH}download_reqs_hailo15.txt"
REQS_FILE = ""
ARM_APPS_DIR = ""
python () {
    #if 'imx8' in d.getVar('MACHINE'):
    d.setVar('REQS_FILE', d.getVar('REQS_IMX8_FILE'))
    d.setVar('ARM_APPS_DIR', d.getVar('IMX8_DIR'))
    #else:
    #    d.setVar('REQS_FILE', d.getVar('REQS_HAILO15_FILE'))
    #    d.setVar('ARM_APPS_DIR', d.getVar('HAILO15_DIR'))
    #    d.appendVar('DEPENDS', " libmedialib-api xtensor")
}
#IS_H15 = "${@ 'true' if 'hailo15' in d.getVar('MACHINE') else 'false'}"
IS_H15 = "false"
...
With these recipes in place, we can integrate them into our petalinux project by adding the following lines to our user-rootfsconfig file:
project-spec/meta-user/conf/user-rootfsconfig
...
CONFIG_libgsthailotools
CONFIG_hailo-post-processes
CONFIG_tappas-apps
With these packages enabled, we can rebuild our petalinux project, boot our new images on the hardware target, and test the TAPPAS.
The presence of the gstreamer plug-ins from Hailo can be validated as follows:
root@zub1cg-sbc-2023-2:~# gst-inspect-1.0 hailotools
Plugin Details:
Name hailotools
Description hailo tools plugin
Filename /usr/lib/gstreamer-1.0/libgsthailotools.so
Version 3.30.0
License unknown
Source module gst-hailo-tools
Binary package gst-hailo-tools
Origin URL https://hailo.ai/
hailoaggregator: hailoaggregator - Cascading
hailocounter: hailocounter - postprocessing element
hailocropper: hailocropper
hailoexportfile: hailoexportfile - export element
hailoexportzmq: hailoexportzmq - export element
hailofilter: hailofilter - postprocessing element
hailogallery: Hailo gallery element
hailograytonv12: hailograytonv12 - postprocessing element
hailoimportzmq: hailoimportzmq - import element
hailomuxer: Muxer pipeline merging
hailonv12togray: hailonv12togray - postprocessing element
hailonvalve: HailoNValve element
hailooverlay: hailooverlay - overlay element
hailoroundrobin: Input Round Robin element
hailostreamrouter: Hailo Stream Router
hailotileaggregator: hailotileaggregator
hailotilecropper: hailotilecropper - Tiling
hailotracker: Hailo object tracking element
18 features:
+-- 18 elements
root@zub1cg-sbc-2023-2:~# gst-inspect-1.0 hailo
Plugin Details:
Name hailo
Description hailo gstreamer plugin
Filename /usr/lib/gstreamer-1.0/libgsthailo.so
Version 1.0
License unknown
Source module hailo
Binary package GStreamer
Origin URL http://gstreamer.net/
hailodevicestats: hailodevicestats element
hailonet: hailonet element
synchailonet: sync hailonet element
3 features:
+-- 3 elements
root@zub1cg-sbc-2023-2:~#
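As a quick end-to-end check of the hailonet, hailofilter, and hailooverlay elements, a detection pipeline can be assembled along the following lines. This is only a sketch: the video file, HEF path, and post-process library path are assumptions, and it presumes the standard GStreamer base plugins are installed; substitute the files actually present on your target:

gst-launch-1.0 filesrc location=video.mp4 ! decodebin ! videoconvert ! \
    hailonet hef-path=/home/root/yolov5m.hef ! \
    hailofilter so-path=/usr/lib/hailo-post-processes/libyolo_post.so qos=false ! \
    hailooverlay ! videoconvert ! autovideosink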
We may want to revisit the examples created in the home “apps” directory and modify them for our targeted Zynq UltraScale+ devices.
Known Issues
The current version of this project has the following known issues:
- The examples in the “apps” directory need to be modified to run on the Zynq UltraScale+ targets.
Conclusion
I hope this tutorial helped you understand how to add Hailo-8 functionality to your custom platform.
If you would like to have the pre-built SD card image for this project, please let me know in the comments below.
Revision History
2023/11/18 — Preliminary Version