Intel Neural Compute Stick 2 - Review


RoadTest: Intel Neural Compute Stick 2

Author: vishwanathan

Creation date:

Evaluation Type: Development Boards & Tools

Did you receive all parts the manufacturer stated would be included in the package?: True

What other parts do you consider comparable to this product?:
1. Google Coral: https://coral.ai/products/accelerator
2. NVIDIA Jetson Nano: https://developer.nvidia.com/embedded/jetson-nano-developer-kit

What were the biggest problems encountered?: As the product requires pre-trained models in IR format, the biggest challenge was converting a custom pre-trained model that OpenVINO does not currently support. This requires in-depth knowledge of the pre-trained model, which one often lacks unless one has trained the model from scratch oneself.

Detailed Review:

Background

The basics of working with the Intel® Neural Compute Stick 2 (NCS2), getting started, and a sample demo are very well explained in the document AI at the Edge: AI Prototyping on the Edge with the Intel® Neural Compute Stick 2. The Benchmarking Edge Computing article is another excellent resource, with a detailed comparison of NCS2 against comparable products.

 

Let's dive in to build the first chess engine accelerated by NCS2.

 

For this review we will use the open-source project CrazyAra, a chess engine trained to play Crazyhouse, a chess variant played on Lichess. The project supports various inference engines, and its wiki documents all the details for developers and users very well. To test NCS2, the OpenVINO inference engine was integrated into it. This was done in three steps, as explained below:

Converting the model to IR format

 

Model Optimizer is a cross-platform command-line tool that facilitates the transition between the training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices. Here it is used, through DL Workbench, to convert the custom model to the Intermediate Representation (IR) format.
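DL Workbench runs Model Optimizer behind its web UI. For readers who prefer the command line, the sketch below assembles an equivalent Model Optimizer invocation in Python. It assumes the `mo` entry point installed by the `openvino-dev` package (older releases ship it as `mo.py`) and its `--input_model`, `--input`, `--input_shape`, `--data_type` and `--output_dir` options. The model file `crazyara.onnx`, the input name `data` and the shape `[1,34,8,8]` are hypothetical placeholders, not values taken from the CrazyAra project. FP16 is chosen because that is the precision the MYRIAD plugin used by NCS2 executes in.

```python
import shlex
from typing import List, Sequence


def build_mo_command(input_model: str, input_name: str, input_shape: Sequence[int],
                     output_dir: str, data_type: str = "FP16") -> List[str]:
    """Assemble a Model Optimizer command line (2021-era flags assumed)."""
    # Model Optimizer expects the shape as a bracketed, comma-separated list.
    shape = "[" + ",".join(str(dim) for dim in input_shape) + "]"
    return [
        "mo",
        "--input_model", input_model,   # trained model file to convert
        "--input", input_name,          # name of the network's input layer
        "--input_shape", shape,         # fixed input shape, batch first
        "--data_type", data_type,       # FP16 weights for the MYRIAD plugin
        "--output_dir", output_dir,     # where the .xml/.bin IR pair is written
    ]


if __name__ == "__main__":
    # Hypothetical file, layer name and shape, for illustration only.
    cmd = build_mo_command("crazyara.onnx", "data", (1, 34, 8, 8), "ir_fp16")
    print(shlex.join(cmd))
```

Later OpenVINO releases changed how FP16 conversion is requested, so check `mo --help` for the installed version before relying on `--data_type`.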

 

Model Architecture

The details of the model architecture used from the CrazyAra project are as follows:

DL Workbench

DL Workbench is a web-based graphical environment that enables you to visualize, fine-tune, and compare the performance of deep learning models on various Intel® architecture configurations, such as CPU, Intel® Processor Graphics (GPU), Intel® Movidius™ Neural Compute Stick 2 (NCS 2), and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs. I used the Docker method to set up DL Workbench.

 

The following picture gallery walks through the steps to convert a custom model to IR format. Refer to the caption of each image for more details.

 

 

Gallery: OpenVINO DL Workbench

$ sudo ./start_workbench.sh -ENABLE_MYRIAD

OpenVINO DL Workbench landing page

Importing the model

 

Add the input layer name, shape and color space details

Creating configuration to convert model to IR format.

Converted Model Summary

 

IR model and Runtime Graph Visualization

 

Execution time by layer summary

Model performance summary

Packaging the model for deployment

The converted models are available in the forked GitHub repository.
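The latency and throughput figures in the performance summary can be cross-checked outside DL Workbench. The helper below is a generic sketch, not DL Workbench's own method: it times any zero-argument inference callable (for example, a hypothetical wrapper that runs one OpenVINO inference) with `time.perf_counter`, discards a few warm-up calls, and derives throughput from the median latency.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(infer: Callable[[], object], warmup: int = 5, runs: int = 50,
              batch_size: int = 1) -> Dict[str, float]:
    """Time `infer` repeatedly and report median latency and throughput."""
    for _ in range(warmup):  # first calls often pay one-time setup costs
        infer()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    median_ms = statistics.median(latencies_ms)
    return {
        "median_latency_ms": median_ms,
        # inputs processed per second when requests run one after another
        "throughput_fps": batch_size * 1000.0 / median_ms,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call that runs one inference.
    print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

Because it runs one request at a time, this measures synchronous latency; DL Workbench can report higher throughput when it keeps several inference requests in flight on the device.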

Integrating the OpenVINO inference engine into CrazyAra

 

Testing the engine from the command line:

 

Running chess engine inference on NCS2

Cutechess is a GUI that can run standard UCI chess engines and is useful for debugging them. Here is a demo video replaying a game played against the engine.
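While Cutechess runs an engine, the engine reports search progress in UCI `info` lines such as `info depth 12 nodes 4200 nps 1400 ...`. Pulling the `nps` (nodes per second) field out of those lines is one simple way to compare how fast the same engine searches with its CPU and MYRIAD backends. The parser below is a minimal sketch that handles the common integer fields, `score cp|mate`, `pv` and `string`; other tokens are skipped.

```python
from typing import Dict, List, Tuple, Union

# Integer-valued fields defined by the UCI protocol for "info" lines.
INT_FIELDS = {"depth", "seldepth", "multipv", "nodes", "nps", "time",
              "hashfull", "tbhits", "currmovenumber", "cpuload"}

InfoValue = Union[int, Tuple[str, int], List[str], str]


def parse_info(line: str) -> Dict[str, InfoValue]:
    """Extract common fields from a UCI 'info' line."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError(f"not a UCI info line: {line!r}")
    fields: Dict[str, InfoValue] = {}
    i = 1
    while i < len(tokens):
        key = tokens[i]
        if key in INT_FIELDS and i + 1 < len(tokens):
            fields[key] = int(tokens[i + 1])
            i += 2
        elif key == "score" and i + 2 < len(tokens):
            fields["score"] = (tokens[i + 1], int(tokens[i + 2]))  # ("cp", 35) or ("mate", 3)
            i += 3
        elif key == "pv":
            fields["pv"] = tokens[i + 1:]  # principal variation runs to end of line
            break
        elif key == "string":
            fields["string"] = " ".join(tokens[i + 1:])  # free text runs to end of line
            break
        else:
            i += 1  # skip tokens this sketch does not interpret (e.g. lowerbound, currmove)
    return fields


if __name__ == "__main__":
    info = parse_info("info depth 12 score cp 35 nodes 4200 nps 1400 pv e2e4 e7e5")
    print(info["nps"])  # 1400
```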

 

Summary

  • NCS2 enables deploying neural network applications on the edge. Because of its USB interface, any computing device capable of running Linux can be interfaced with NCS2 to accelerate neural network inference. Its size, cost and form factor make it an ideal device for computing on the edge.
  • The OpenVINO toolkit has detailed documentation that lets anyone implement inference on the edge. There are demos for many computer vision applications and standard neural network models, available in both C++ and Python. The setup of the development environment is also well documented. All the information required to take an idea to a final implementation is available in the documentation.
  • Model Optimizer is the heart of any application that runs inference on NCS2. There are several pre-trained models already in IR format, and documentation on converting models from many standard frameworks to IR format.
  • A custom pre-trained model can also be converted to IR with Model Optimizer. The important thing to check is whether the operations the model uses in its training framework are supported. Converting a model with operations that Model Optimizer does not support will increase the development time of the application, and it may require implementing custom operations for the model to work as designed.
  • Devices like this make it possible to accelerate applications such as a chess engine, which unlocks the potential of integrating them into smart chess boards for on-board computation.
  • Overall, this product is ideal for developers who want to prototype their ideas and deploy them to turn their ideas into reality.
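Before converting a custom model, it helps to list which of its operation types fall outside what Model Optimizer supports for the source framework. The sketch below does that with a set difference and counts each unsupported type. `SUPPORTED_ONNX_OPS` is a hypothetical, deliberately small subset used only for illustration; the authoritative list is the Model Optimizer supported-layers documentation for each framework. For an ONNX model, the operation list can be obtained with the `onnx` package as `[node.op_type for node in onnx.load(path).graph.node]`.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Hypothetical, deliberately tiny subset for illustration; consult the
# Model Optimizer supported-layers documentation for the real list.
SUPPORTED_ONNX_OPS: Set[str] = {
    "Conv", "BatchNormalization", "Relu", "Add", "Mul", "Sigmoid",
    "GlobalAveragePool", "Flatten", "Gemm", "Tanh", "Softmax",
}


def unsupported_ops(model_ops: Iterable[str],
                    supported: Set[str] = SUPPORTED_ONNX_OPS) -> Dict[str, int]:
    """Return each operation type not in `supported`, with its occurrence count."""
    counts = Counter(model_ops)
    return {op: n for op, n in sorted(counts.items()) if op not in supported}


if __name__ == "__main__":
    ops = ["Conv", "BatchNormalization", "Relu", "Conv",
           "HypotheticalGate", "Add", "HypotheticalGate"]
    print(unsupported_ops(ops))  # {'HypotheticalGate': 2}
```

An empty result only means every operation appears in the given list; shape constraints and device-specific limits can still cause conversion or runtime errors.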
Anonymous
  • Interesting to see how one can quickly use the Neural Stick to run a game engine.

     

    It would have been interesting to see some speed improvement results from your tests running the game engine on CPU/GPU/Neural Stick.

    • You can see the list of application demos here: https://docs.openvinotoolkit.org/latest/omz_demos_README.html
    • A custom model can be trained in any supported framework (ONNX, MXNet, TensorFlow, Kaldi). To use it on NCS2, if all its operations are supported (https://docs.openvinotoolkit.org/latest/openvino_docs_ops_opset.html), Model Optimizer can convert the model, which will then be ready to run on NCS2.
    • Some speed comparison on similar devices are published in Benchmarking Edge Computing.
    • As NCS2 runs OpenVINO, you can start developing your application on any Intel CPU/GPU using OpenVINO. Once your application is ready, you can choose the device that accelerates the results. With minimal code changes, the same code developed for CPU/GPU will run seamlessly on NCS2. All the application demos have the following option to run the same code on different hardware:

     

    -d DEVICE, --device DEVICE
      Optional. Specify target device for infer: CPU, GPU,
      FPGA, HDDL or MYRIAD. Default: CPU
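The `-d/--device` option above is an ordinary command-line argument whose value is handed to the inference engine when the network is loaded. A minimal `argparse` sketch of the same option follows; argparse derives the `-d DEVICE, --device DEVICE` form shown above from the option names. The parser description and the `print` stand-in for model loading are hypothetical; in the OpenVINO 2021 Python API the value would be passed as the `device_name` argument of `IECore.load_network`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser mirroring the demos' -d/--device option."""
    parser = argparse.ArgumentParser(description="Demo-style inference app (hypothetical)")
    parser.add_argument(
        "-d", "--device", default="CPU",
        help="Optional. Specify target device for infer: CPU, GPU, "
             "FPGA, HDDL or MYRIAD. Default: CPU")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Placeholder for model loading; with the OpenVINO 2021 Python API this would be
    # IECore().load_network(network=net, device_name=args.device)
    print(f"Would load the network on {args.device}")
```

Running the script with `-d MYRIAD` targets NCS2, while omitting the flag falls back to CPU: the device is a runtime parameter rather than a code change.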

  • I am curious, what types of applications would you use the device for?

    How much time is needed to build up a custom model and use it?

    How does the speed of this device compare to a similar implementation on a DSP?

    How easy is it for a newbie to pick it up and put it to use?

     

    DAB