Time is flying and the project submission deadline is approaching quickly. To finish the project as completely as possible, I had a quick look at the original plan in my application:
to add the Intel Edison to this display, in order to display a six-digit number fetched over the Intel Edison's wifi connection to the internet. An IoT nixie display, so to say. The number displayed can be anything: the current local time from a time server, or the local time elsewhere in the world. It can be the temperature and humidity at the closest weather station, or the forecast for tomorrow. It can be the number of visitors to the project webpage, the position of the ISS from space.com, you name it.
Last week I wrote software to display time, date, temperature, humidity, pressure and rainfall on the nixie tubes. The time and date come directly from the Edison itself, which in turn is synchronised with a standard time server. Weather information is retrieved from the OpenWeatherMap service using a standardised API.
As you can see in the application, I promised to display some arbitrary information for which no API or protocol is available. The technique for doing this is called web scraping. As an example I chose to display the number of views and likes of my most recent blog post in this challenge.
Web Scraping with Node.js
Two modules are very useful for web scraping with Node.js: Request and Cheerio. While Node.js does provide simple methods of downloading data from the Internet via HTTP and HTTPS interfaces, you have to handle them separately, to say nothing of redirects and other issues that appear when you start working with web scraping. The Request module merges these methods, abstracts away the difficulties and presents you with a single unified interface for making requests. We’ll use this module to download web pages directly into memory.
Cheerio enables you to work with downloaded web data using the same syntax that jQuery employs. To quote the copy on its home page, “Cheerio is a fast, flexible and lean implementation of jQuery designed specifically for the server.” Bringing in Cheerio enables us to focus on the data we download directly, rather than on parsing it.
The first step is to install both modules:
root@edison_nixie:~# npm install request
request@2.81.0 node_modules/request
root@edison_nixie:~# npm install cheerio
cheerio@0.22.0 node_modules/cheerio
Next we need a web link to retrieve the information from. This web page is of course the content page from the Upcycle_It challenge:
https://www.element14.com/community/community/design-challenges/upcycleit/content
All the 'Upcycle Nixie Display' blogs are tagged with upcycled_nixie, so by selecting this tag we get only the ones we need here.
Finally, we put the most recent blog on top by selecting 'Sort by date created: Newest first'.
The resulting URL is then copied from the address bar and put in the code:
    // Web scraping
    var request = require("request"),
        cheerio = require("cheerio"),
        url = "https://www.element14.com/community/community/design-challenges/upcycleit/content?filterID=contentstatus%5Bpublished%5D~language~language%5Bcpl%5D&filterID=contentstatus%5Bpublished%5D~tag%5Bupcycled_nixie%5D&sortKey=contentstatus%5Bpublished%5D~creationDateDesc&sortOrder=0";
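To see what the tag filter and the sort order look like inside that long query string, the URL can be decoded with Node's built-in `URL` class. This is only an illustrative sketch and not part of the display software: it assumes a reasonably recent Node.js in which `URL` is available as a global, and `describeQuery` and `blogUrl` are names made up for this example.

```javascript
// Illustrative only: decode the query string of the blog index URL.
// describeQuery and blogUrl are hypothetical names, not project code.
var blogUrl = "https://www.element14.com/community/community/design-challenges/upcycleit/content" +
    "?filterID=contentstatus%5Bpublished%5D~language~language%5Bcpl%5D" +
    "&filterID=contentstatus%5Bpublished%5D~tag%5Bupcycled_nixie%5D" +
    "&sortKey=contentstatus%5Bpublished%5D~creationDateDesc&sortOrder=0";

function describeQuery(address) {
    var params = new URL(address).searchParams;
    return {
        filters: params.getAll("filterID"), // one entry per selected filter
        sortKey: params.get("sortKey"),     // which column the list is sorted on
        sortOrder: params.get("sortOrder")
    };
}

// Prints the two decoded filters (language and tag) plus the sort settings
console.log(describeQuery(blogUrl));
```

The second filter decodes to `contentstatus[published]~tag[upcycled_nixie]`, which is the tag selection, and the sort key ends in `creationDateDesc`, which is the 'Newest first' choice.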
Finally I wrote a function to retrieve the information from the website:
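    function getBlogCounts() {
        request(url, function (error, response, body) {
            if (!error) {
                // Load the downloaded HTML so it can be queried with jQuery-style selectors
                var $ = cheerio.load(body);
                // The first match belongs to the most recent blog post
                var views = $('td.j-td-views').children().first().text();
                // The text looks like "Show 7 likes7"; the second word is the count
                var likes = $('td.j-td-likes').children().first().text().split(' ')[1];
                console.log('Views:', views, 'Likes:', likes);
                // Pack both numbers into one six-digit value: VVVLLL
                blog_counts = views * 1000 + likes * 1;
            } else {
                console.log("We’ve encountered an error: " + error);
            }
        });
    }

    getBlogCounts();
    setInterval(getBlogCounts, 600000); // interval of 10 minutes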
So, what are we doing here?
First, we use the Request module to download the page at the URL specified above via the request function. We pass in the URL that we want to download and a callback that will handle the results of our request. When the data is returned, the callback is invoked with three arguments: error, response and body. If Request encounters a problem downloading the web page and can’t retrieve the data, it passes an error object to the function and the body variable will not contain any data. Before we begin working with our data, we check that there aren’t any errors; if there are, we just log them so we can see what went wrong.
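One caveat with this check: as far as I know, Request only reports an error for transport-level failures (DNS lookup, refused connection, timeout), so an error page such as a 404 still arrives with error unset. A minimal sketch of a stricter check is shown below. `isUsableResponse` is a hypothetical helper name and not part of the project code, and accepting only status 200 is an assumption about what the site returns for a good page.

```javascript
// Hypothetical helper: decide whether the arguments handed to a
// request() callback describe a page that is worth parsing.
function isUsableResponse(error, response) {
    if (error) {
        return false;                   // network-level failure
    }
    if (!response) {
        return false;                   // defensive: nothing came back
    }
    return response.statusCode === 200; // assumption: only a plain "OK" is accepted
}

// Plain objects stand in for real responses, so no network is needed.
console.log(isUsableResponse(null, { statusCode: 200 }));         // true
console.log(isUsableResponse(null, { statusCode: 404 }));         // false
console.log(isUsableResponse(new Error("ETIMEDOUT"), undefined)); // false
```

Inside getBlogCounts, the `if (!error)` test could then become `if (isUsableResponse(error, response))`.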
If all is well, we pass our data off to Cheerio. Then we can handle the data like we would any other web page, using standard jQuery syntax. To find the data we want, we have to build a selector that grabs the element(s) we’re interested in from the page. If you navigate to the URL I’ve used for this example in your browser and start exploring the page with developer tools, you’ll see the following:
Notice that the number of views is selected with td.j-td-views, while the likes are selected with td.j-td-likes. Each selector matches a whole list of elements, one for each blog post. Therefore we take only the first one (children().first()), which is the most recent blog in this case. The text of the likes element contains the count twice, in this case "Show 7 likes7"; the first number is extracted by splitting the string on spaces and taking the second element (.split(' ')[1]).
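The split can be tried on its own, without downloading anything. The sketch below also shows a slightly more forgiving alternative that picks the first run of digits with a regular expression and falls back to 0. It assumes the likes text always has the form shown above; `parseLikes` is a hypothetical helper, not part of the project code.

```javascript
// The approach from the post: split on spaces, take the second word.
var sample = "Show 7 likes7";
console.log(sample.split(' ')[1]); // 7 (still a string at this point)

// Hypothetical alternative: take the first run of digits and return
// a real number, or 0 when the text contains no digits at all.
function parseLikes(text) {
    var match = /\d+/.exec(text);
    return match ? parseInt(match[0], 10) : 0;
}

console.log(parseLikes(sample)); // 7
console.log(parseLikes(""));     // 0
```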
Finally, now that we’ve got hold of our elements, it’s a simple matter of showing the data on the six nixie tubes: the number of views is multiplied by 1000 and the number of likes is added, so the views occupy the left three digits and the likes the right three. The function for grabbing the information from the blog list runs at an interval of ten minutes. The showAll loop is extended with an extra entry for showing the information:
    case 6:
        showNumber(blog_counts);
        break;
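The views-times-1000-plus-likes packing works as long as both counts stay below 1000. A post with 1000 or more views would spill into a seventh digit, and a missing likes value would turn the result into NaN. The sketch below shows one possible way to guard against that by clamping each count to 999. Both helpers, `packCounts` and `toSixDigits`, are hypothetical and not taken from the project source, and clamping is just one design choice among several.

```javascript
// Hypothetical helper: combine views and likes into one VVVLLL number.
// Non-numeric input counts as 0 and each field is clamped to 0..999.
function packCounts(views, likes) {
    var v = Math.min(Math.max(Number(views) || 0, 0), 999);
    var l = Math.min(Math.max(Number(likes) || 0, 0), 999);
    return v * 1000 + l;
}

// Hypothetical helper: left-pad with zeros to exactly six digits.
function toSixDigits(n) {
    return ("000000" + n).slice(-6);
}

console.log(toSixDigits(packCounts("123", "5")));     // 123005
console.log(toSixDigits(packCounts("7", undefined))); // 007000
console.log(toSixDigits(packCounts("1500", "2")));    // 999002
```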
As proof of the pudding here is an image of the display showing the view count (123) and 5 likes for [Upcycle It] Nixie Display #10 - Software stuff at the time of this writing (Fri May 26 15:12:27 CEST 2017).
Works like a charm. For displaying other information, such as the internet speed, the position of the ISS from space.com, or whatever you want, the software can be adapted accordingly.
Updated function table
The table below shows an update of the available functions with their units and the number format.
Number | Function | Unit | Format |
---|---|---|---|
0 | Time | | hhmmss |
1 | Date | | YYMMDD |
2 | Temperature | °C | 0000CC |
3 | Humidity | % | 0000HH |
4 | Pressure | hPa | 00PPPP |
5 | Rain volume, last 3 h | | RRRRRR |
6 | Blog views and likes | | VVVLLL |
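
To make the hhmmss and YYMMDD formats concrete, here is a small sketch of how a JavaScript Date could be turned into those six-digit numbers. `timeNumber` and `dateNumber` are hypothetical helpers written for this illustration; the project's own code may compute these values differently.

```javascript
// Hypothetical helper: local time as the number hhmmss.
function timeNumber(d) {
    return d.getHours() * 10000 + d.getMinutes() * 100 + d.getSeconds();
}

// Hypothetical helper: local date as the number YYMMDD.
// getMonth() counts from 0, hence the + 1.
function dateNumber(d) {
    return (d.getFullYear() % 100) * 10000 + (d.getMonth() + 1) * 100 + d.getDate();
}

var example = new Date(2017, 4, 26, 15, 12, 27); // 26 May 2017, 15:12:27 local time
console.log(timeNumber(example)); // 151227
console.log(dateNumber(example)); // 170526
```

Values with fewer than six digits, such as the time just after midnight, still need leading zeros before they go to the tubes.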
The full source is on GitHub: https://github.com/AgriVision/nixie_display. In this code the URL of the blog index page has been moved to settings.json, which is a better place for it than main.js.
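Reading a value such as the URL from a JSON settings file takes only a few lines in Node.js. The sketch below is a generic illustration, not a copy of the code in the repository: the key name `blogIndexUrl` and the helper `readBlogUrl` are assumptions, and the real settings.json may be organised differently.

```javascript
var fs = require("fs");

// Hypothetical helper: read the blog index URL from a JSON settings file.
// The key name blogIndexUrl is an assumption made for this example.
function readBlogUrl(settingsPath) {
    var settings = JSON.parse(fs.readFileSync(settingsPath, "utf8"));
    if (typeof settings.blogIndexUrl !== "string") {
        throw new Error("settings file has no blogIndexUrl entry");
    }
    return settings.blogIndexUrl;
}
```

With a file containing `{ "blogIndexUrl": "https://..." }`, calling `readBlogUrl("settings.json")` returns that string, so the address can be changed without touching main.js.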
This finishes the blog of this week. I will keep an eye on my nixie display to be informed on the views and likes.
Next week is for some final tweaks and wrap up.
Stay tuned!