Blazing Fast Models

albertabeef
4 Aug 2024

An exploration of the mediapipe models, and how they are still relevant today.

Introduction

This is the first project in a series on deploying the MediaPipe models to the edge on embedded platforms.

I will start by giving a general overview of the MediaPipe models, and some of the challenges that can be expected when running on embedded hardware.
Then I will dive into the architecture of its most popular models (hand/face/pose detection and landmarks).
Finally, I will introduce a custom Python application that I created as a tool for running, debugging, and profiling alternate implementations of these models.

Status of the MediaPipe models

MediaPipe is a framework created by Google that implements detection and landmark models for the following applications:

  • palm detection and hand landmarks
  • face detection and landmarks
  • pose detection and landmarks
  • etc…
[Image] MediaPipe models (Camera: Google)
The MediaPipe framework supports/requires the following tools:
  • TF-Lite (models)
  • Bazel (build tools)

The first public versions of the MediaPipe models appeared in 2019, which can be considered a very long time ago with respect to AI/ML’s rapid evolution.

The latest version of the MediaPipe framework, as of this writing, is v0.10.xx, which is still considered a development version.

Since its first public release, Google has made continual improvements to its MediaPipe framework.

One of the most notable improvements, documented in 2021, was to the hand landmarks for individual fingers, which is especially important for sign language applications.

[Video] MediaPipe hand landmarks model improvements (Video camera: Google)

In the latest version of the framework (v0.10.14), Google introduced a new MediaPipe LLM Inference API enabling select large language models (Gemma 2B, Falcon 1B, Phi 2, and Stable LM 3B).

It is interesting that Google has never published a 1.0 version, which would have been considered stable … most probably because they are continually improving the models and framework.

Challenges with the MediaPipe models

If using the MediaPipe framework on a modern computer, all that is required is a "pip install mediapipe", and everything works out of the box!
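
As a point of reference, here is a minimal sketch of what "out of the box" looks like with the official Python package (using the mp.solutions.hands API; the webcam capture and window handling are just for illustration):

```python
# Minimal sketch: hand landmarks with the stock "pip install mediapipe" package.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)  # default webcam
with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV delivers BGR
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("MediaPipe Hands", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # ESC to quit
            break
cap.release()
cv2.destroyAllWindows()
```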

Re-building the MediaPipe framework for a custom target, however, can be a challenge if you are not familiar with the Bazel build tools.

Re-training the models is not possible, since the datasets used to train them were never published by Google.

Deployment to specialized targets, which usually requires a subset of the training dataset, presents the challenge of coming up with this data yourself.

Furthermore, many targets do not support the TF-Lite framework, so the models need to be converted to an alternate framework prior to deployment.
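
As an example of what that involves, it helps to first inspect a model's input and output tensors with the TF-Lite interpreter before planning a conversion (a minimal sketch; the model filename is illustrative):

```python
# Inspect a MediaPipe TF-Lite model's tensors before planning a conversion.
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="palm_detection_lite.tflite")  # illustrative path
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"])
```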

One example of this is OpenVINO, where the models are first converted to TensorFlow2 using PINTO’s model conversion utilities, before being fed into the OpenVINO framework.
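
As a rough sketch of the last step of that flow, assuming a recent OpenVINO release with the openvino Python API (the SavedModel path is illustrative):

```python
# Convert the TensorFlow2 model produced by PINTO's utilities to OpenVINO IR.
import openvino as ov

model = ov.convert_model("palm_detection_saved_model")  # illustrative SavedModel directory
ov.save_model(model, "palm_detection.xml")              # writes the .xml/.bin IR pair
```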

Another example I ran into was deploying to Vitis-AI, where I had to work with models that had been converted to PyTorch. But this is a subject for another project …

Architecture Overview

Each of these MediaPipe solutions is implemented with two models (a detector followed by a landmark model), so they can be called multi-inference pipelines. If we take the example of the palm detector and hand landmarks, we have the following dual-inference pipeline:

[Image] Dual-Inference Pipeline : Palm Detection + Hand Landmarks (Camera: AlbertaBeef)
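
In pseudo-Python, this dual-inference pipeline boils down to the structure below (the function and model names are illustrative, not the actual blaze_app_python API):

```python
# Sketch of the detector + landmarks dual-inference pipeline for one frame.
def hand_pipeline(frame):
    # 1) Detector: find palms anywhere in the full frame
    detections = palm_detector.predict(frame)      # boxes, keypoints, scores
    hands = []
    for det in detections:
        # 2) Extract a rotated, resized ROI around each detected palm
        roi = extract_roi(frame, det)
        # 3) Landmark model: 21 hand landmarks within the ROI
        landmarks = hand_landmarks.predict(roi)
        # 4) Map the landmarks from ROI coordinates back to the full frame
        hands.append(map_to_frame(landmarks, det))
    return hands
```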

The final hand landmarks are illustrated in the figures below:

[Image] Hand Landmarks (Camera: Google)

Re-Inventing the wheel with Python

Before attempting to deploy the MediaPipe models to embedded platforms, I took it upon myself to create a pure Python implementation of the pre-processing and post-processing, explicitly running inference with the individual TF-Lite models.

  • https://github.com/AlbertaBeef/blaze_app_python
[Video] blaze_app_python - Example Output (Video camera: AlbertaBeef)

Take the time to appreciate the details captured in the above video:

  • the top three windows illustrate overall detection and landmarks for hand, face, and pose
  • the bottom three “debug” windows illustrate the regions of interest (ROIs) that are extracted by the detector model, for the landmark model

The ROIs are extracted such that the hands, face, and even human pose are presented in the same orientation and size to the landmark model.

Notice, for the pose models, how the cropping is selected to include the entire body, even when it is not fully visible in the frame.
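
To illustrate that normalization step, the ROI extraction can be implemented as a rotate-then-crop with OpenCV (a simplified sketch, not the exact blaze_app_python code):

```python
# Simplified ROI extraction: rotate the frame so the detection is upright,
# then crop a square around its center and resize it for the landmark model.
import cv2

def extract_roi(frame, center_xy, rotation_deg, box_size, out_size=256):
    cx, cy = center_xy
    box_size = int(box_size)
    M = cv2.getRotationMatrix2D((cx, cy), rotation_deg, 1.0)
    rotated = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
    half = box_size // 2
    x0, y0 = max(int(cx) - half, 0), max(int(cy) - half, 0)
    crop = rotated[y0:y0 + box_size, x0:x0 + box_size]
    return cv2.resize(crop, (out_size, out_size))
```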

The following diagram illustrates the main components of the "blaze_app" Python application, as well as the main components of the dual-inference pipelines. Although the "extract ROI" step is part of the landmark model's pre-processing, it is identified separately for profiling purposes.
[Image] blaze_app_python - Block Diagram (Camera: AlbertaBeef)

The "blaze_app" python application provides a common base for performing the following experiments during my exploration:

  • Compare output from different versions of the same models (0.07 versus 0.10)
  • Compare output from alternate versions of the models (PyTorch, …) with the reference versions (TFLite)
  • Identify and Profile latency of each component (model inference time, pre-processing, post-processing, etc…)
  • View intermediate results (ROIs, detection scores wrt threshold, etc…)

How Fast is Fast?

On modern computers, the MediaPipe framework runs VERY fast!

On embedded platforms, however, things quickly break down, resulting in one frame per second in some cases …

The following benchmarks were gathered with my single-threaded, Python-only implementation in order to identify where time is being spent in each component on various platforms:

[Image] Palm Detection + Hand Landmarks pipeline (v0.10 lite) benchmarks on various platforms (Camera: AlbertaBeef)
[Image] Palm Detection + Hand Landmarks pipeline (v0.07) benchmarks on various platforms (Camera: AlbertaBeef)
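
The timing itself is nothing exotic; here is a minimal sketch of the kind of per-component instrumentation used to collect numbers like these (the component functions are illustrative names):

```python
# Accumulate wall-clock time per pipeline component, single-threaded.
import time
from collections import defaultdict

profile = defaultdict(float)

def timed(label, fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    profile[label] += time.perf_counter() - start
    return result

# Inside the per-frame loop (illustrative component functions):
#   tensor     = timed("detector pre-processing",  detector_preprocess, frame)
#   raw        = timed("detector inference",       detector_infer, tensor)
#   detections = timed("detector post-processing", detector_postprocess, raw)
#   roi        = timed("extract ROI",              extract_roi, frame, detections)
```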

On an embedded Arm Cortex-A53 processor, the Palm Detection and Hand Landmark models run 3x slower than on a modern laptop, and 20x slower than on a modern workstation!

This is a case where the models cannot be used out of the box; they need to be further accelerated. But once again, this is a subject for another project …

Going Further

For a more in-depth overview of the dual-inference pipeline architecture, and for instructions on installing the Python application, refer to my full write-up on Hackster:

  • [Hackster] Blazing Fast Models

I hope this project will inspire you to implement your own custom application.

Acknowledgements

I want to thank Google (https://www.hackster.io/google) for making the following available publicly:

  • MediaPipe

References

  • [Google] MediaPipe Solutions Guide : https://ai.google.dev/edge/mediapipe/solutions/guide
  • [Google] MediaPipe Source Code : https://github.com/google-ai-edge/mediapipe
  • [Google] SignALL SDK : https://developers.googleblog.com/en/signall-sdk-sign-language-interface-using-mediapipe-is-now-available-for-developers/
  • [AlbertaBeef] blaze_app_python : https://github.com/AlbertaBeef/blaze_app_python
  • [Hackster] Blazing Fast Models
