Author: albertabeef
Date created: 12 Aug 2024 1:44 PM
Insightful Datasets for ASL recognition


An exploration of the Kaggle datasets for ASL Recognition.

Introduction

This project takes a deep dive into the Kaggle datasets provided by Google for their 2023 competitions related to ASL recognition:

  • Google — Isolated Sign Language Recognition
  • Google — American Sign Language Fingerspelling Recognition

Since an image is worth a thousand words, a video is certainly worth even more. To that end, I created viewers for these two datasets, as well as a similar viewer for a live USB camera feed. I chose to display the landmarks generated by the MediaPipe framework (holistic model, including pose, face, hands), as well as a time lapse of the most important landmarks.

Video: MediaPipe holistic landmark viewer (Video camera: AlbertaBeef)

In the video, I have captured myself executing two styles of sign language:

  • sign language (i.e., words and phrases) : THANK YOU
  • fingerspelling : A, V, N, E, T

The Kaggle datasets provide captured examples for each style. Fingerspelling is used for content that is not covered by sign language itself, such as phone numbers, web URLs, etc…

The Competitions

In 2023, Google launched two open-source competitions on Kaggle, totaling $300,000 in prizes.

Google — Isolated Sign Language Recognition

  • https://www.kaggle.com/competitions/asl-signs/overview
  • Feb-May 2023
  • $100,000 Prize

Figure: [Kaggle] Isolated Sign Language Recognition (Camera: Kaggle)

Google — American Sign Language Fingerspelling Recognition

  • https://www.kaggle.com/competitions/asl-fingerspelling
  • May-Aug 2023
  • $200,000 Prize

Figure: [Kaggle] Fingerspelling Recognition (Camera: Kaggle)

It is very interesting to analyze the winning solution for the Fingerspelling Recognition competition:

  • https://www.kaggle.com/competitions/asl-fingerspelling/discussion/434485

Figure: [1st place solution] Improved Squeezeformer + TransformerDecoder + Clever augmentations (Camera: Kaggle)

The input to the solution is the following subset of 130 landmarks:

  • 21 key points from each hand
  • 6 pose key points from each arm
  • 76 from the face (lips, nose, eyes)

In other words, the hand, face, and pose landmarks output by the MediaPipe framework (holistic model) are used as input to the very complex problem of sign language recognition.

A time-lapse of the landmarks is used to generate tokens, which are then used as input for natural language processing.

The Datasets

The datasets for these competitions are now public on the Kaggle platform.

Kaggle datasets can be downloaded directly from the website with a valid user account. Kaggle also provides a programming API and access keys that can be used to download datasets programmatically, as follows:

kaggle competitions download -c asl-signs
kaggle competitions download -c asl-fingerspelling

It is important to know the size of the datasets before you attempt to download them. Make sure you have enough room for both the archives and the extracted data.

If you do not have enough room for the datasets, you can still view the landmarks from a USB camera.

asl-signs

  • archive size : 40GB
  • extracted size : 57GB
  • link : https://www.kaggle.com/competitions/asl-signs/data

asl-fingerspelling

  • archive size : 170GB
  • extracted size : 190GB
  • link : https://www.kaggle.com/competitions/asl-fingerspelling/data
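
The sizes above can be checked against the free space on your drive before downloading. The following is a minimal sketch, assuming the archive and the extracted data must both fit on the same drive during extraction. The helper names (peak_space_gb, has_room) are hypothetical and not part of the Kaggle tooling.

```python
import shutil

# Sizes quoted above, in gigabytes: (archive, extracted).
DATASET_SIZES_GB = {
    "asl-signs": (40, 57),
    "asl-fingerspelling": (170, 190),
}

def peak_space_gb(name):
    """Space needed while the archive and the extracted data coexist."""
    archive, extracted = DATASET_SIZES_GB[name]
    return archive + extracted

def has_room(name, path="."):
    """True if the drive holding `path` has enough free space for `name`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= peak_space_gb(name)

if __name__ == "__main__":
    for name in DATASET_SIZES_GB:
        status = "ok" if has_room(name) else "not enough room"
        print(f"{name}: needs about {peak_space_gb(name)} GB ({status})")
```

Once extraction is complete, the archive can be deleted to recover its share of the space.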

landmarks

The two datasets do not contain any images or video, but rather landmarks captured with the MediaPipe models. There are a total of 543 landmarks for each sample, broken down as follows:

  • face : 468 landmarks
  • left hand : 21 landmarks
  • right hand : 21 landmarks
  • pose : 33 landmarks

When landmarks are absent/missing, they are represented as NaN in the datasets.
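
As an illustration of how missing landmarks might be detected, here is a small sketch that works on a single sample stored as a (543, 3) NumPy array. The ordering of the landmark types inside the array (face, left hand, pose, right hand) is an assumption made for this example, and the helper names are hypothetical.

```python
import numpy as np

# Landmark counts per type, as listed above. The order is an assumption.
COUNTS = {"face": 468, "left_hand": 21, "pose": 33, "right_hand": 21}

def build_slices(counts):
    """Map each landmark type to its slice within one 543-landmark sample."""
    slices, start = {}, 0
    for name, n in counts.items():
        slices[name] = slice(start, start + n)
        start += n
    return slices

SLICES = build_slices(COUNTS)

def missing_types(frame):
    """Return the landmark types that are entirely NaN in a (543, 3) sample."""
    return [name for name, s in SLICES.items() if np.isnan(frame[s]).all()]

# Synthetic sample: everything present except the left hand.
frame = np.zeros((543, 3), dtype=np.float32)
frame[SLICES["left_hand"]] = np.nan
print(missing_types(frame))  # ['left_hand']
```

A hand that is out of view is the most common reason for a block of NaN values, so a check like this is a cheap way to skip or flag such samples.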

Time Lapse of Landmarks

All of the viewers have a time-lapse of the top 130 landmarks.


Figure: Time Lapse of Top 130 Landmarks (Camera: AlbertaBeef)

The choice of the 130 landmarks can be traced back to the winner of the first competition, Hoyeol Sohn (https://www.kaggle.com/hoyso48), and corresponds to the following landmarks:

  • face (lips, nose, left eye, right eye) : 76 landmarks
  • left hand : 21 landmarks
  • right hand : 21 landmarks
  • pose (left arm, right arm) : 12 landmarks
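
A quick sanity check of the numbers above shows how much the subset reduces the input. This sketch only uses the counts quoted in this article; it does not list the actual landmark indices chosen by the winners.

```python
# Landmark counts for the full MediaPipe holistic output and the 130 subset.
FULL = {"face": 468, "left_hand": 21, "right_hand": 21, "pose": 33}
SUBSET = {"face": 76, "left_hand": 21, "right_hand": 21, "pose": 12}

total_full = sum(FULL.values())
total_subset = sum(SUBSET.values())

print(total_full, total_subset)            # 543 130
print(total_full * 3, total_subset * 3)    # 1629 390 (x, y, z per landmark)
print(f"{total_subset / total_full:.1%}")  # 23.9%
```

Both hands are kept in full, while the face and pose are trimmed heavily, which leaves roughly a quarter of the original coordinates.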

The winners of the second competition, Darragh Hanley (https://www.kaggle.com/darraghdog) and Christof Henkel (https://www.kaggle.com/christofhenkel), reused the same choice of 130 landmarks.

I think we can assume that this is a good and relevant selection for analysis and further processing.

Installing the Viewers

The dataset viewers are available in the following GitHub repository:

git clone https://github.com/AlbertaBeef/aslr_exploration
cd aslr_exploration

If you have not already done so, download the datasets to the “aslr_exploration” directory, either from the Kaggle website or via the Kaggle API, as follows:

kaggle competitions download -c asl-signs
kaggle competitions download -c asl-fingerspelling

Extract the “asl-signs” dataset as follows:

mkdir asl-signs
cd asl-signs
unzip ../asl-signs.zip
cd ..

Extract the “asl-fingerspelling” dataset as follows:

mkdir asl-fingerspelling
cd asl-fingerspelling
unzip ../asl-fingerspelling.zip
cd ..
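
If unzip is not available on your system, the same extraction can be done with Python's standard library. This is a minimal sketch, assuming the archives sit in the current directory under the names used above. The extract_archive helper is hypothetical.

```python
import zipfile
from pathlib import Path

def extract_archive(archive, dest):
    """Extract `archive` into `dest`, creating the directory if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest

if __name__ == "__main__":
    for name in ("asl-signs", "asl-fingerspelling"):
        archive = Path(f"{name}.zip")
        if archive.exists():
            extract_archive(archive, name)
            print(f"extracted {archive} to {name}/")
        else:
            print(f"{archive} not found, skipping")
```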

You are all set!

Viewing the “asl-signs” Dataset

The “asl-signs” dataset contains sequences from various participants for 250 words.

The “asl-signs” dataset is provided in the following format:

train (94,481 total sequences)

  • train.csv
  • train_landmark_files/[participant_id]/[sequence_id].parquet
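
As a first look at train.csv, the number of sequences per sign can be counted with the standard library. This is a sketch under assumptions: the column names (path, participant_id, sequence_id, sign) are what I expect the file to contain, and the rows below are made-up stand-ins, not real dataset entries.

```python
import csv
import io
from collections import Counter

def count_by_sign(csv_file):
    """Count how many sequences exist for each sign label."""
    reader = csv.DictReader(csv_file)
    return Counter(row["sign"] for row in reader)

# Tiny in-memory stand-in for train.csv (all values are invented).
sample = io.StringIO(
    "path,participant_id,sequence_id,sign\n"
    "train_landmark_files/1/100.parquet,1,100,hello\n"
    "train_landmark_files/1/101.parquet,1,101,thankyou\n"
    "train_landmark_files/2/200.parquet,2,200,hello\n"
)
counts = count_by_sign(sample)
print(counts.most_common(1))  # [('hello', 2)]
```

With the real file, replace the in-memory sample with open("asl-signs/train.csv", newline="").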

The parquet files have the following format:

  • frame : int16 (frame number in sequence)
  • row_id : string (unique identifier, descriptor)
  • type : string (face, pose, left_hand, right_hand)
  • landmark_index : int16
  • x/y/z : double

Each sample corresponds to 543 rows (one per landmark, for a total of 543 * 3 = 1,629 coordinate values) having a common frame value in the parquet file.
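
To work with a sequence frame by frame, the long table can be reshaped into a 3D array. The following is a minimal sketch, assuming the file holds one row per landmark per frame (543 rows per frame, each with x, y and z), sorted by frame. The function names are hypothetical, and load_sequence needs pandas with a parquet engine such as pyarrow.

```python
import numpy as np

ROWS_PER_FRAME = 543  # landmarks per frame

def to_frames(xyz):
    """Reshape a flat (n_rows, 3) array of x/y/z values to (n_frames, 543, 3)."""
    xyz = np.asarray(xyz, dtype=np.float32)
    if xyz.shape[0] % ROWS_PER_FRAME != 0:
        raise ValueError("row count is not a multiple of 543")
    return xyz.reshape(-1, ROWS_PER_FRAME, 3)

def load_sequence(parquet_path):
    """Load one sequence file and return it as (n_frames, 543, 3)."""
    import pandas as pd  # imported lazily so to_frames works without pandas
    df = pd.read_parquet(parquet_path, columns=["x", "y", "z"])
    return to_frames(df.to_numpy())

# Synthetic check with two empty frames.
demo = to_frames(np.zeros((2 * ROWS_PER_FRAME, 3)))
print(demo.shape)  # (2, 543, 3)
```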

The “asl-signs” viewer can be launched as follows:

python3 asl_signs_viewer.py

Video: asl-signs Viewer (Video camera: AlbertaBeef)

Viewing the “asl-fingerspelling” Dataset

The “asl-fingerspelling” dataset contains the sequences for various sentences performed by various participants. These sentences are typically explicitly spelled out (fingerspelling) since they contain numbers, addresses, web URLs, etc…

The “asl-fingerspelling” dataset is provided in the following format:

train (67,213 total sequences)

  • train.csv
  • train_landmarks/[file_id].parquet (each file holds many sequences)

supplemental_metadata (52,958 total sequences)

  • supplemental_metadata.csv
  • supplemental_landmarks/[file_id].parquet (each file holds many sequences)

By default, the viewer will use the train.csv file. Feel free to edit the Python script to use supplemental_metadata.csv instead.

The parquet files have the following format:

  • sequence_id : integer (unique identifier of sequence)
  • frame : integer (frame number in sequence)
  • [x/y/z]_[type]_[landmark_index] : double (1,629 spatial coordinate columns for the x, y and z coordinates for each of the 543 landmarks, where type is one of face, pose, left_hand, right_hand)

Each sample corresponds to one row in the parquet file.
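
The column naming pattern can be reproduced in a few lines, which is handy for selecting a subset of columns when loading a file. This is a sketch that follows the pattern described above; the order of the columns in the actual files is an assumption, and the helper name is hypothetical.

```python
# Landmark counts per type, as described in the landmarks section.
COUNTS = {"face": 468, "left_hand": 21, "pose": 33, "right_hand": 21}

def coordinate_columns(counts=COUNTS, axes=("x", "y", "z")):
    """Build column names of the form [x/y/z]_[type]_[landmark index]."""
    return [
        f"{axis}_{kind}_{i}"
        for axis in axes
        for kind, n in counts.items()
        for i in range(n)
    ]

cols = coordinate_columns()
print(len(cols))          # 1629
print(cols[0], cols[-1])  # x_face_0 z_right_hand_20

# Example: keep only the right-hand columns (21 landmarks * 3 axes).
right_hand = [c for c in cols if "_right_hand_" in c]
print(len(right_hand))    # 63
```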

The “asl-fingerspelling” viewer can be launched as follows:

python3 asl_fingerspelling_viewer.py

Video: asl-fingerspelling Viewer (Video camera: AlbertaBeef)

Viewing a “Live Feed”

It may be interesting to view custom data, or capture additional data for your custom applications. The MediaPipe holistic viewer can be used for this purpose.

The “MediaPipe holistic” viewer can be launched as follows:

python3 mediapipe_holistic_viewer.py

Video: mediapipe holistic Viewer (Video camera: AlbertaBeef)
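
For readers who want to capture their own data in the same 543-landmark layout, the following sketch shows one way it might be done with the MediaPipe holistic solution and OpenCV. It is not the code of mediapipe_holistic_viewer.py. The helper names, the ordering of the landmark types, and the use of camera index 0 are assumptions made for illustration. The run_camera function needs the opencv-python and mediapipe packages.

```python
import numpy as np

def landmarks_to_array(landmark_list, count):
    """Convert a MediaPipe landmark list to (count, 3), all NaN when absent."""
    if landmark_list is None:
        return np.full((count, 3), np.nan, dtype=np.float32)
    points = [[p.x, p.y, p.z] for p in landmark_list.landmark]
    return np.array(points, dtype=np.float32)

def holistic_to_frame(results):
    """Stack face, left hand, pose and right hand into one (543, 3) array."""
    return np.concatenate([
        landmarks_to_array(results.face_landmarks, 468),
        landmarks_to_array(results.left_hand_landmarks, 21),
        landmarks_to_array(results.pose_landmarks, 33),
        landmarks_to_array(results.right_hand_landmarks, 21),
    ], axis=0)

def run_camera(index=0):
    """Collect landmark frames from a USB camera until 'q' is pressed."""
    import cv2              # imported lazily: only needed for live capture
    import mediapipe as mp
    frames = []
    cap = cv2.VideoCapture(index)
    with mp.solutions.holistic.Holistic() as holistic:
        while cap.isOpened():
            ok, bgr = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB images, OpenCV delivers BGR.
            results = holistic.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
            frames.append(holistic_to_frame(results))
            cv2.imshow("holistic", bgr)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    cap.release()
    cv2.destroyAllWindows()
    return np.stack(frames) if frames else np.empty((0, 543, 3), np.float32)

if __name__ == "__main__":
    print(run_camera().shape)
```

Representing undetected parts as NaN keeps the captured data consistent with the convention used in the Kaggle datasets.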

Going Further

For installation instructions for the viewers, refer to my full write-up on Hackster:

  • [Hackster] Insightful Datasets for ASL recognition

I hope that these viewers will inspire you to implement your own custom application.

Acknowledgements

I want to thank Google for making the following available publicly:

  • MediaPipe
  • asl-signs : ASL Isolated Signs Dataset
  • asl-fingerspelling : ASL Fingerspelling Dataset

I also want to thank Kaggle for hosting the competitions and datasets, as well as the following winning Kaggle members for their insight into these datasets:

  • Hoyeol Sohn (https://www.kaggle.com/hoyso48)
  • Darragh Hanley (https://www.kaggle.com/darraghdog)
  • Christof Henkel (https://www.kaggle.com/christofhenkel)

References

    • [Kaggle] Isolated Sign Language Recognition : https://www.kaggle.com/competitions/asl-signs/overview
    • [Kaggle] Fingerspelling Recognition : https://www.kaggle.com/competitions/asl-fingerspelling
    • [AlbertaBeef] aslr_exploration : https://github.com/AlbertaBeef/aslr_exploration
    • [Hackster] Insightful Datasets for ASL recognition
