Sign Language Interpreter Final Blog: Putting everything together

cbohra00627
10 Dec 2024

Introduction
Two months back, I took part in the Eye on Intelligence Challenge. My project is to create a sign language interpreter that uses ML on the Arty Z7 board to classify hand signs into characters in real time. Having very little prior knowledge of ML and PYNQ, I started working on the project and learning along the way. I have shared my learning journey in the blogs below:

  1. Sign Language Interpreter Blog 1: Project Overview and Kit Unboxing

  2. Sign Language Interpreter Blog 2: Hands on with Arty Z7

  3. Sign Language Interpreter Blog 3: Running PYNQ on Arty Z7

  4. Sign Language Interpreter Blog 4: Using PYNQ overlays with HDMI and USB webcam on Arty Z7

  5. Sign Language Interpreter Blog 5: Running CNN image classification on Arty Z7 using PYNQ

In this blog, I will put together everything I have learned so far.

ASL Dataset
First of all, we are going to need a dataset. For that, I plan to use Python to capture a 30-second video of my hand with the webcam for each letter and randomly select 200 frames from it as training data and 40 frames as test data. The image resolution will be 266x200.
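
As a rough sanity check on those numbers (the recording script below writes at 20 fps, so this is only an estimate):

    fps, seconds = 20.0, 30               # VideoWriter settings used in the capture code below
    total_frames = int(fps * seconds)     # roughly 600 frames per 30-second clip
    sampled = 200 + 40                    # 240 frames sampled per letter
    print(total_frames, sampled)          # 600 240: plenty of frames to sample 240 from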

Getting the video

  1. Creating capture object
    #Imports and create capture object
    import cv2
    import time
    
    # Define the desired resolution
    desired_width = 640
    desired_height = 480
    
    # Initialize webcam
    cap = cv2.VideoCapture(1)
    
    # Check if the webcam is opened correctly
    if not cap.isOpened():
        print("Error: Could not open webcam.")
        exit()
    
    # Set the desired resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, desired_width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, desired_height)
  2. Recording the video and saving it to a file
    # Define the codec and create VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'XVID')
    output_path = 'train_video/Z_video.avi'  # Specify the file path
    out = cv2.VideoWriter(output_path, fourcc, 20.0, (desired_width, desired_height))
    
    recording = False
    start_time = None
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
    
        # Display the resulting frame
        cv2.imshow('Webcam Video', frame)
    
        # Check for key press
        key = cv2.waitKey(1) & 0xFF
        if key == ord('r'):  # Press 'r' to start recording
            recording = True
            start_time = time.time()
            print("Recording started...")
    
        # Record video if 'r' was pressed
        if recording:
            out.write(frame)
            # Stop recording after 30 seconds
            if time.time() - start_time >= 30:
                recording = False
                print("Recording stopped after 30 seconds.")
    
        # Press 'q' to exit the loop
        if key == ord('q'):
            break
    
    out.release()
    cv2.destroyAllWindows()
  3. Releasing the video capture object.
    # Release everything when the job is finished
    cap.release()
    out.release()
    cv2.destroyAllWindows()
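
The script above records a single letter's clip (Z_video.avi). A minimal sketch of how the same capture could be repeated for every letter, assuming the same webcam index, resolution and train_video/ directory as above:

    # Sketch: record one 30-second clip per letter into train_video/<label>_video.avi
    import cv2
    import time
    import os

    def record_letter(label, seconds=30, cam_index=1, size=(640, 480)):
        os.makedirs('train_video', exist_ok=True)
        cap = cv2.VideoCapture(cam_index)
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, size[0])
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, size[1])
        fourcc = cv2.VideoWriter_fourcc(*'XVID')
        out = cv2.VideoWriter(f'train_video/{label}_video.avi', fourcc, 20.0, size)
        start = time.time()
        while time.time() - start < seconds:
            ret, frame = cap.read()
            if not ret:
                break
            out.write(frame)
            cv2.imshow(f'Recording {label}', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):  # allow early exit
                break
        cap.release()
        out.release()
        cv2.destroyAllWindows()

    for label in ['A', 'B', 'C']:  # extend to all 26 letters
        input(f'Show the sign for {label} and press Enter to start recording...')
        record_letter(label)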

Saving images

  1. The code below randomly collects 240 frames from each video and saves them in folders named after the class. The frames are split into train and test sets (200 for training, 40 for testing). The directory structure is shown below:
    [current dir] > train > A > A_1.jpg, A_3.jpg, A_9.jpg ... 200 images
                            B > B_2.jpg, B_5.jpg ... 200 images
    [current dir] > test  > A > A_5.jpg, A_10.jpg ... 40 images
                            B > B_1.jpg, B_6.jpg ... 40 images
    import cv2
    import random
    import os
    
    count = 0
    # Function to resize and save a frame
    def save_frame(frame, frame_index, label):
        resized_frame = cv2.resize(frame, (266, 200))
        global count
        count = count + 1
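        # the first 200 sampled frames per letter go to the train set, the remaining 40 to the test set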
        if count <= 200:
            output_path = os.path.join(output_train_dir, f'{label}_{frame_index}.jpg')
        else:
            output_path = os.path.join(output_test_dir, f'{label}_{frame_index}.jpg')
        cv2.imwrite(output_path, resized_frame)
    
    labels = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
    
    for label in labels:
        # Define the video file path
        video_path = 'train_video/' + label + '_video.avi'
        output_train_dir = 'train/' + label
        output_test_dir = 'test/' + label
    
        # Create the output directory if it doesn't exist
        if not os.path.exists(output_train_dir):
            os.makedirs(output_train_dir)
        if not os.path.exists(output_test_dir):
            os.makedirs(output_test_dir)
    
        # Open the video file
        cap = cv2.VideoCapture(video_path)
    
        # Get the total number of frames in the video
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        print(total_frames)
    
        # Select 240 random frame indices
        random_frames = random.sample(range(total_frames), 240)
    
        # Read and save selected frames
        for frame_index in random_frames:
            cap.set(cv2.CAP_PROP_POS_FRAMES, frame_index)
            ret, frame = cap.read()
            if ret:
                save_frame(frame, frame_index, label)
    
        # Release the video capture object
        cap.release()
        count = 0
    
        print("Saved 240 randomly selected frames as JPEG files.")
    
  2. Some example images are shown below
    (example frames)
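
Before training, a quick sanity check of the split helps catch missing or miscounted frames; a minimal sketch, assuming the train/ and test/ layout created above:

    import os

    for split in ('train', 'test'):
        for label in sorted(os.listdir(split)):
            n = len(os.listdir(os.path.join(split, label)))
            print(f'{split}/{label}: {n} images')  # expect 200 per class in train, 40 in test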

Training the model
So, now we have our train and test datasets, and we need to train the model. I am using my laptop for training and for saving the model. I created a custom Tinier-YOLO classifier with some help from the internet, following the steps below.

  1. Import
    #Set up the environment
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torchvision import datasets, transforms
  2. Define the model architecture
    #Define the model architecture
    class TinierYOLO(nn.Module):
        def __init__(self):
            super(TinierYOLO, self).__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
                nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2, stride=2),
                # Additional layers can be added as needed
            )
            self.classifier = nn.Sequential(
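                # 200x266 input halved twice by the 2x2 max-pools: 16 channels x 50 x 66 (266 // 4 = 66)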
                nn.Linear(16 * 50 * 66, 128),  # Adjust dimensions based on input size
                nn.ReLU(),
                nn.Linear(128, 26),  # Number of classes (26 classes)
            )
    
        def forward(self, x):
            x = self.features(x)
            x = x.view(x.size(0), -1)
            x = self.classifier(x)
            return x
    
    # Initialize the model
    model = TinierYOLO()
  3. Preparing the dataset. My dataset consists of 266x200 images.
    #Prepare the dataset
    # Define transformations for your dataset
    transform = transforms.Compose([
        transforms.Resize((200, 266)),
        transforms.ToTensor(),
    ])
    
    # Load your custom dataset
    # Replace 'YourDatasetClass' with your custom dataset class if you have one
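    # Note: ImageFolder assigns class indices alphabetically by folder name (A=0 ... Z=25),
    # which matches the order of the classes list used later for inference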
    train_dataset = datasets.ImageFolder(root='train', transform=transform)
    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
    
    test_dataset = datasets.ImageFolder(root='test', transform=transform)
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
  4. Training and saving the weights.
    #Train the model
    # Initialize the model, loss function, and optimizer
    model = TinierYOLO()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Training loop
    for epoch in range(10):  # Number of epochs
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f'Epoch {epoch+1}, Loss: {running_loss / len(train_loader)}')
    
    # Save the trained model
    torch.save(model.state_dict(), 'tinier_yolo_weights.pth')

    (screenshot of the training output)
  5. Evaluating the model
    #Evaluate the model
    # Evaluation loop
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    
    print('Total: ' + str(total))
    print(f'Accuracy: {100 * correct / total}%')

    (screenshot of the evaluation output)
  6. Deploy the model
    #Deploy the model
    # Save the model for deployment
    torch.save(model.state_dict(), 'tinier_yolo_deployment.pth')

Performing classification on real-time video
Now let's perform image classification on real-time video on the laptop before running it on the FPGA board.

  1. Load the model and weights
    #Perform classification on real time video.
    import cv2
    import PIL.Image  # 'import PIL' alone may not expose PIL.Image, which is used below
    # torch, nn and transforms are reused from the training cells above
    
    # Load the model
    model = TinierYOLO()
    model.load_state_dict(torch.load('tinier_yolo_weights.pth', weights_only=True))
    model.eval()
    
    # Define the classes (change as per your dataset)
    classes = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
    
    # Define image transformations
    transform = transforms.Compose([
        transforms.Resize((200, 266)),
        transforms.ToTensor(),
    ])
  2. Run classification
    def classify_frame(frame, model, transform, classes, threshold=0.8):
        # Preprocess the frame
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        pil_frame = PIL.Image.fromarray(frame_rgb)
        input_tensor = transform(pil_frame).unsqueeze(0)
    
        # Perform the classification
        with torch.no_grad():
            output = model(input_tensor)
        
        # Get confidence scores
        scores = nn.functional.softmax(output, dim=1)
        max_score, predicted = torch.max(scores, 1)
        
        # Check if confidence score is above threshold
        if max_score.item() >= threshold:
            class_name = classes[predicted.item()]
        else:
            class_name = None
    
        return class_name
    
    # Initialize webcam
    cap = cv2.VideoCapture(1)
    
    # Check if the webcam is opened correctly
    if not cap.isOpened():
        print("Error: Could not open webcam.")
        exit()
    
    while True:
        ret, frame = cap.read()
        if not ret:
            break
    
        # Resize the frame to 266x200 before classification
        resized_frame = cv2.resize(frame, (266, 200))
    
        # Perform classification
        class_name = classify_frame(resized_frame, model, transform, classes)
    
        # Display the resulting frame with the class name if a match is found
        if class_name:
            cv2.putText(resized_frame, class_name, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Webcam Video', resized_frame)
    
        # Press 'q' to exit the loop
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    
    # Release everything when done
    cap.release()
    cv2.destroyAllWindows()
    
  3. Here is a video of the live classification in action. I think it's not that accurate: it is only able to correctly classify B, D and W. For now, I will move forward with this.

Saving the network architecture and weights
Now, we need to save the network architecture in JSON format, along with the weights, so that the classification can be run on PYNQ.

  1. To dump the network architecture, I used the code below.
    import json
    
    def model_to_json(model):
        model_json = {
            "network": "tinier-yolo",
            "input_image": "../../tests/Test_image/tinier-yolo/input.bin",
            "verification_image": "../../tests/Test_image/tinier-yolo/verification.bin",
            "binparam": "binparam-tinier-yolo-nopool",
            "use_binparams": True,
            "binparam_skip": 1,
            "layer_skip": 0,
            "layers": []
        }
    
        for name, module in model.named_modules():
            if name == "":
                continue  # Skip the top-level module itself
            layer = {
                "name": name,
                "func": "",
                "input_bits": 3,
                "output_bits": 3,
                "weight_bits": 1,
                "threshold_bits": 16,
                "kernel_shape": None,
                "kernel_stride": None,
                "input_channels": None,
                "input": None,
                "output_channels": None,
                "output": None,
                "padding": None
            }
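            # Only Conv2d, MaxPool2d and Linear layers get their fields filled in below;
            # ReLU layers and the Sequential containers are appended with an empty "func"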
            if isinstance(module, nn.Conv2d):
                layer["func"] = "conv_layer"
                layer["kernel_shape"] = module.kernel_size[0]
                layer["kernel_stride"] = module.stride[0]
                layer["input_channels"] = module.in_channels
                layer["input"] = [module.in_channels, None, None]  # Example placeholder
                layer["output_channels"] = module.out_channels
                layer["output"] = [module.out_channels, None, None]  # Example placeholder
                layer["padding"] = module.padding[0]
            elif isinstance(module, nn.MaxPool2d):
                layer["func"] = "maxpool_layer"
                layer["kernel_shape"] = module.kernel_size
                layer["kernel_stride"] = module.stride
                layer["input_channels"] = None  # Example placeholder
                layer["input"] = [None, None, None]  # Example placeholder
                layer["output_channels"] = None  # Example placeholder
                layer["output"] = [None, None, None]  # Example placeholder
                layer["padding"] = module.padding
            elif isinstance(module, nn.Linear):
                layer["func"] = "fc_layer"
                layer["input_channels"] = module.in_features
                layer["input"] = [module.in_features]
                layer["output_channels"] = module.out_features
                layer["output"] = [module.out_features]
            
            model_json["layers"].append(layer)
    
        return model_json
    
    # Initialize the model
    model = TinierYOLO()
    
    # Convert the model to JSON
    model_json = model_to_json(model)
    
    # Save the JSON to a file
    with open('tinier_yolo_layers.json', 'w') as json_file:
        json.dump(model_json, json_file, indent=4)
    
  2. To dump the weights and biases, I used the code below.
    import os
    import numpy as np
    
    # Create directory to save weights
    os.makedirs('tinier_yolo_weights', exist_ok=True)
    
    # Extract and save weights and biases
    for name, param in model.named_parameters():
        if 'weight' in name:
            weight_np = param.data.cpu().numpy()
            np.save(f'tinier_yolo_weights/{name.replace(".", "-")}-W.npy', weight_np)
        elif 'bias' in name:
            bias_np = param.data.cpu().numpy()
            np.save(f'tinier_yolo_weights/{name.replace(".", "-")}-bias.npy', bias_np)
    
    print("Weights and biases have been saved in NumPy format.")

Unfortunately, I couldn't complete my project beyond this point. I am still stuck on how to get my custom network architecture onto the Arty Z7 and run it with the overlays, and there is no more time left.

Summary
Although I couldn't complete my project on time due to some unforeseen reasons, I had a very good time working on it and I learned a lot. There is still much more to learn, and I will be continuing the project.

bidrohini, 8 months ago:

    Thanks for the series. 
