After watching Ben Heck's two episodes on automating his workshop with Bob Badley (Episodes 325 & 326) I was inspired to have a go at it myself. But what should I connect it up to?
Someone I've been following online on the subject of machine learning and computer vision, Dr Adrian Rosebrock at PyImageSearch, recently wrote a blog post on creating a scalable Keras + deep learning REST API.
I've been looking for some way to improve classification speed on my robot Zed. You may have seen him in some other posts on using OpenCV to track objects and LIDAR scanning. My main problem is the CPU power of the Raspberry Pi. I did look into a Cluster HAT, but the TensorFlow wheel only supports ARMv8 and the chips on the Zeros are only ARMv7 (and no, you can't just rename the wheel to make it install!). Another option I looked into was NVIDIA's Jetson TX2, but for the chip and companion card you're looking at almost £1000, for a grand total of 256 CUDA cores.

So when Adrian posted his blog on the classifier API I was very excited at the idea of offloading the computational side of identifying an image with TensorFlow. Running a classification on a Raspberry Pi can take up to 70 seconds, but by sending the image to the API running on my nine-year-old repurposed Intel Core 2 Quad 2.5GHz Ubuntu server we can reduce this to under 0.8 of a second.
I've been having a go at replicating his project and wrapping it up in a docker build, and I wondered: can I connect both Ben's Alexa automation project and Adrian's image classifier API? It turns out we can!
Using the Alexa API as Ben did in his videos, we can set up a new skill and push it to our Alexa/Echo device, have it send an HTTPS POST to a web service running on the Pi instructing it to take a picture with the Pi Camera, then have the Pi send that photo to the TensorFlow classifier API, return the results to the Pi, and have Alexa read them out for us. I noticed a few comments under Ben's video about the issue of running this service over HTTPS with SSL, so hopefully I can also cover how we overcome that later in this post.
There is quite a bit going on here, so I'm going to split it up into three sections.
Part 1 - Configuring the Alexa skill in the Amazon Developer portal
Just as Ben & Bob did in their video, we are going to start with the Alexa skill. Using your Amazon account details, head over to https://developer.amazon.com/ and log in to the portal with the link in the top right corner.
You should now have the following screen up..
Click the Alexa item and then in the top right you want the "Your Alexa Consoles" > "Skills" Menu item
In the next screen you should get an option to "Switch to the new console". We will do that, but if you're reading this some time down the line you may not need to.
The next screen will show a list of our Alexa skills; here we click on Create Skill and begin the process.
Let's call our new skill Hello 14 and click next to continue..
In the next screen we select "Custom" and then Create Skill, and we should end up with the main page looking like the below..
Here is where we cheat a little. We won't be setting up any "Intents" as Ben did in his film. Instead we will use the simple invocation of our skill "Hello 14" to launch the Python code on the Pi to classify the image.
On the left-hand side of the screen go into the JSON editor, clear everything that is in there and paste in the following code. I have included one example intent in the JSON; this is only because the editor gets a bit funny if we don't have any specified.
{ "languageModel": { "invocationName": "hello fourteen", "intents": [ { "name": "YesIntent", "slots": [], "samples": [ "yes", "sure" ] } ], "types": [] } }
Once done, we save and build the Alexa model and wait a few minutes for it to complete.
This is as far as we can take it for now. Leave this open as we will need to come back to it later, once we have set up the Raspberry Pi. We will then have a URL that we need to enter as the endpoint.
Part 2 - Setting up the REST API based on Adrian's blog, using docker
I like to automate things, so after reading how to set this up I had a go at scripting the install. For this part, as mentioned earlier, I took an old PC, set up a fresh install of Ubuntu using the flash drive network installer, took all the defaults and then installed docker. Docker is a tool to run apps and services inside containers. Getting it set up is pretty easy and can be done with a few apt-get commands. An install example can be found here
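For a rough idea of those apt-get commands, something along these lines is a minimal sketch; it installs the docker.io package from the Ubuntu repos rather than Docker's own repository, so adjust to taste:

sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # lets you run docker without sudo; log out and back in first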
With docker set up, we create a folder in our home directory; I called it "docker". We need to add three files to build our image. The first one is the run_keras Python file in Adrian's download. All we do is rename it to app.py and change the last line from
app.run()
to
app.run(host='0.0.0.0')
This will make the service listen on all interfaces in the docker instance.
The next file we need to make is our Dockerfile. Create a file called "Dockerfile" (case sensitive) and enter the following
# Start from the official Python 3.6 image
FROM python:3.6

# Build dependencies for OpenCV
RUN apt-get update && \
    apt-get install -y \
    build-essential \
    cmake \
    git \
    wget \
    unzip \
    yasm \
    pkg-config \
    libswscale-dev \
    libtbb2 \
    libtbb-dev \
    libjpeg-dev \
    libpng-dev \
    libtiff-dev \
    libjasper-dev \
    libavformat-dev \
    libpq-dev

# Python packages for the Keras REST API
RUN pip install numpy scipy h5py tensorflow redis keras flask gevent imutils requests Pillow

# Build redis from source
RUN wget http://download.redis.io/redis-stable.tar.gz && tar xvzf redis-stable.tar.gz
WORKDIR redis-stable
RUN make && make install
WORKDIR /

# Build OpenCV from source
ENV OPENCV_VERSION="3.4.0"
RUN wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip \
    && unzip ${OPENCV_VERSION}.zip \
    && mkdir /opencv-${OPENCV_VERSION}/cmake_binary \
    && cd /opencv-${OPENCV_VERSION}/cmake_binary \
    && cmake -j4 -DBUILD_TIFF=ON \
    -DBUILD_opencv_java=OFF \
    -DWITH_CUDA=OFF \
    -DENABLE_AVX=ON \
    -DWITH_OPENGL=ON \
    -DWITH_OPENCL=ON \
    -DWITH_IPP=ON \
    -DWITH_TBB=ON \
    -DWITH_EIGEN=ON \
    -DWITH_V4L=ON \
    -DBUILD_TESTS=OFF \
    -DBUILD_PERF_TESTS=OFF \
    -DCMAKE_BUILD_TYPE=RELEASE \
    -DCMAKE_INSTALL_PREFIX=$(python3.6 -c "import sys; print(sys.prefix)") \
    -DPYTHON_EXECUTABLE=$(which python3.6) \
    -DPYTHON_INCLUDE_DIR=$(python3.6 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") \
    -DPYTHON_PACKAGES_PATH=$(python3.6 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") .. \
    && make install \
    && rm /${OPENCV_VERSION}.zip \
    && rm -r /opencv-${OPENCV_VERSION}

# Copy in our app and the startup script, then launch both services via run.sh
COPY . /app
WORKDIR /app
ADD run.sh /usr/local/bin/run.sh
RUN chmod +x /usr/local/bin/run.sh
CMD ["/usr/local/bin/run.sh"]
This is my first docker script, so if you're a docker veteran don't judge me! My housemate already has; apparently I've done it all wrong and redis should be running in its own container for a start, but hey, it works.
What this does is start with an image running Python 3.6, update it with everything OpenCV needs, then grab a copy of the source for redis and OpenCV and build/roll our own installation. Once that is all said and done, at the end we execute a script that will launch our two processes, the redis server and the TensorFlow API. You can only launch one executable with docker, so we use the shell script to get around that.
So we need to create one more file, run.sh, and add the following
#!/bin/bash
redis-server &
python3 app.py
At the end of the Dockerfile there is a command that will chmod +x this file and then launch it to fire up our two services.
Before we build this, if you're following this post for a project, note that it will download around 4GB of data. So if you're on a slow connection it may take some time for docker to a) pull everything down and b) compile OpenCV/redis and build our docker image.
The next step is to build it. Type the following inside your docker folder
docker build .
**Note the period on the end; this denotes the current folder.
Each line in the Dockerfile creates a "layer" in the docker system, so it will eat up some disk space. Every time you change this file and rebuild, the higher up in the file your changes are, the more layers it will rebuild, so please refer to the docker documentation so that you can clean up your hard drive as you go.
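If you want to reclaim space between builds, a couple of standard docker commands are worth knowing (check what they report before confirming, as they delete things):

docker image prune     # remove dangling image layers left behind by rebuilds
docker system prune    # also remove stopped containers and unused networks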
Once that has built successfully it should end with an ID like the following
this is our "Container Id" we can now run it with the following plus the "Id"
docker run -p 5000:5000
So in my case above it would be
docker run -p 5000:5000 0e42345fbea7
If we add the -d flag we can background the process, but for this post we will leave it running in this window for now.
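As a side note, you can save yourself copying IDs around by tagging the image at build time and then running it detached by name. A sketch, using keras-api as a made-up tag:

docker build -t keras-api .
docker run -d -p 5000:5000 --name keras-api keras-api
docker logs -f keras-api    # follow the output as if it were running in the foreground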
Give it a minute and this should fire up our TensorFlow docker image, running on <server_ip>:5000
Along with the server code Adrian has also included a client to test our server with. If you are running it from the same server you can use localhost as the URL, but you should also be able to run it from a remote Linux machine, or even use curl from a Raspberry Pi.
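If you would rather poke it with curl than the Python client, something along these lines should work against the /predict endpoint (dog.jpg is just a placeholder image name):

curl -X POST -F image=@dog.jpg http://<server_ip>:5000/predict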
If everything started up OK but you can't connect remotely, you can try flushing any firewall rules you may have***. You can do this with
sudo iptables -F
*** However, don't do this if the device is not behind a firewall.
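A gentler option, if you have ufw installed, is to open just the port the API listens on rather than flushing everything:

sudo ufw allow 5000/tcp
sudo ufw status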
A successful test will look like the following, clearly identifying his pet as a beagle with a 94% result.
Now that we have our TensorFlow API set up and running, let's look at the Raspberry Pi setup..
Part 3 - Building a Raspberry Pi
Starting with a fresh install of Raspbian Stretch. We are looking at getting up and running as quickly as we can here, so:
On boot, open up raspi-config and enable SSH, VNC (both for remote access) and the camera interface. Save and reboot. We don't need to expand the file system any more; that is now done automatically on first boot in Stretch.
After the reboot, power off the Pi and install your Pi Camera. Make sure you get the ribbon around the right way; there are lots of examples online if you are unsure.
Boot up the Pi and let's update some things.
A little tip before you apt-get update the Pi: remove the stuff we don't need..
sudo apt-get purge libreoffice wolfram-engine scratch -y
sudo apt-get autoremove
Then once that's done, run an update, so the standard...
sudo apt-get update && sudo apt-get upgrade -y
(this one might take a bit)
Even though we don't need that much disk space, removing these apps speeds up the update process and cuts down on unnecessary downloads for the update command.
Now it's time to prepare the hot sauce..
We need to install Flask and a library called flask-ask. Flask is a web engine that helps us handle HTTP POSTs a little more easily, and flask-ask is a tool-set that we leverage to receive the Alexa commands. Once we have this set up we will create an SSL proxy using ngrok. They offer a free service where we can run a process and get a free SSL window that opens up a connection to the Amazon cloud and forwards the data through to our Pi, so we don't have to forward any ports. The free edition only gives us an 8 hour window / address, so every time we reset our dev environment we need to update the endpoint URL in the Alexa skill; we will cover that later in this section.
Let's start with pip
wget https://bootstrap.pypa.io/get-pip.py
sudo python get-pip.py
Then Flask and flask-ask
pip install flask-ask
sudo pip install flask
Now the Python picamera libs..
sudo apt-get install python-picamera -y
Now for the ngrok system. This is pretty easy
wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-arm.zip
unzip ngrok-stable-linux-arm.zip
Let's leave that terminal there; we will use it to run ngrok in a minute.
Let's open up a new terminal and get our Flask server working to receive the Alexa requests and post them to the TensorFlow API.
Create a new file in your home folder (you can place all this under another folder if you wish!). Let's call it run_server.py. My editor of choice is nano (sorry VIM users!), so....
nano run_server.py
And let's add the following
#!/usr/bin/python
import logging
import time

from flask import Flask
from flask_ask import Ask, statement, question, session
import picamera
import requests

# this is the ip of our keras server, here mine is .116
KERAS_REST_API_URL = "http://192.168.1.116:5000/predict"
IMAGE_PATH = "alexa.jpg"

app = Flask(__name__)
ask = Ask(app, "/")
logging.getLogger("flask_ask").setLevel(logging.DEBUG)


@ask.launch
def new_game():
    camera = picamera.PiCamera()
    camera.vflip = True  # I have to flip my camera, comment this out if you don't need to.
    camera.capture(IMAGE_PATH)
    time.sleep(.3)  # Give it a tad to write the file. If it fails to write and you
                    # get a blank image TensorFlow will report it as a website.
    camera.close()

    image = open(IMAGE_PATH, "rb").read()
    payload = {"image": image}

    r = requests.post(KERAS_REST_API_URL, files=payload).json()

    tensor_text = ""
    if r["success"]:
        for (i, result) in enumerate(r["predictions"]):
            tensor_text = result["label"]
            tensor_text = tensor_text.replace("_", " ")
            break
    else:
        tensor_text = "Whoops!, Something broke"

    msgText = "I think it's a " + tensor_text
    return statement(msgText)


if __name__ == '__main__':
    app.run(debug=True)
What we do here is expose a basic "view" to Alexa. As I mentioned at the top of this section, it's a bit of a quick and nasty way to do it: instead of using OpenCV to read the camera image and post it directly to the server, we use the Python picamera library to save the image to disk and then read it back off the disk. I don't like this bit and will probably refactor it when I move it to the robot.
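For what it's worth, picamera can capture straight into an in-memory stream, so the disk round trip could be dropped without even bringing OpenCV in. A rough, untested sketch of that refactor (classify_frame is a made-up helper name, and the URL is the same server address as above); for this post we will stick with the file-based version:

#!/usr/bin/python
import io

import picamera
import requests

KERAS_REST_API_URL = "http://192.168.1.116:5000/predict"  # same Keras server as run_server.py


def classify_frame():
    # capture a JPEG into an in-memory stream instead of writing alexa.jpg to disk
    stream = io.BytesIO()
    with picamera.PiCamera() as camera:
        camera.vflip = True  # same flip as before, drop if you don't need it
        camera.capture(stream, format='jpeg')
    stream.seek(0)
    # post the raw JPEG bytes to the same /predict endpoint and return the parsed JSON
    return requests.post(KERAS_REST_API_URL, files={"image": stream.getvalue()}).json()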
Now once we save that, let's make the file executable with chmod, so..
chmod +x run_server.py
Now we can start the server by executing the file. This is a shortcut for running python file.py, as we have a shebang at the top of our .py file and have set its properties to +x.
./run_server.py
This should output the following and show us what's going on..
Our Alexa server is running! Let's hook up that SSL business. So in that other terminal we left open earlier, let's crank up ngrok
./ngrok http 5000
You should see something come up like this..
The yellow line is my doing! This is an important one. In Part 1, where we set up the Alexa skill, we need to set this URL as our endpoint. You will also see that the second line item shows our session status; this is our secure access, as mentioned earlier. It's time limited, but it suits us for this demo at least (and saves buying a certificate!).
Back in the Amazon developer portal click on the Endpoints option in the left menu
Set box Z to HTTPS and then configure the other options as follows:
Box A is the address that we get from ngrok
For box B we select the second option, for the certificate sub-domain that we get through the ngrok service
Save and then build the model and wait for it to complete.
Once it's complete, download and install the Alexa app on your phone (if you don't have it already), then log in, hit the burger button next to "Home", select "Skills", then "Your Skills", and you should see our "Hello 14" skill. Select it and make sure it's enabled; enable it if it's not. If it won't enable, check your HTTPS settings. Using the app is a good way to check the setup is working, and saves screaming at the thing and not having it work!
->> Bringing it all together...
So, in part one we set up our Alexa skill and gave it some of the most basic settings one can add to get the thing ticking over; in part two we wrapped up all of Adrian's fine work into an automated docker install script and tested our API; and in the third section we built a Raspberry Pi with a service to receive the Alexa calls and post the data to our image classifier API, and finally tucked it behind a free HTTPS proxy provided by the ngrok platform.
Here is a short video of the project in action, you might need some volume to hear the sound..
Next Steps:
Using OpenCV to read the image and create the data to post to the server is a must before I move it onto the robot; that's giving me the niggles, and I would like to move on to training my own models. I think I still want to swap out my ClusterHAT for a Jetson TX2 though.
Matt.