XMOS startKIT: Introduction, Terminology/Architecture, Getting Started and Example Programs

shabaz

5 May 2014

This is part 1 in a series of XMOS startKIT posts. The following topics are covered here:

Introduction/Terminology/Architecture
Running Tasks in Parallel
XMOS and IoT
Installing the development environment xTIMEcomposer
Building and running an Example Program
Example: Controlling a Servo
Example: Multi-tasking
Troubleshooting/Debugging

Part 2 is called XMOS startKIT: Building an XMOS and Raspberry Pi Robot XMP-1 and it covers these topics:

Robot introduction. Hardware and Software overview and Circuit/Wiring diagram
Serial Peripheral Interface (SPI) introduction
Connecting the Raspberry Pi to the XMOS startKIT
SPI implementation on the XMOS and Raspberry Pi boards
Implementing multiple servo control
Creating a Raspberry Pi web server and application using Javascript and Node.js
Hardware assembly of a robot

Part 3 is called XMOS startKIT: XMOS and Raspberry Pi Oscilloscope XAE 1000 and it covers these topics:

Oscilloscope Introduction
SPI transmit and receive between the Raspberry Pi and the XMOS startKIT
Using the analog to digital converter (ADC) on the startKIT
Real-time traces/graphics in a web browser

Introduction

This post documents how to get started with the XMOS startKIT, an interesting board released recently (well, 5 months ago). The board is only slightly larger than a credit card (less than a centimetre longer) and packs an XMOS chip with multiple processors (more on that later), a built-in 4-channel ADC, touch sensors, an integrated programmer/debugger, LEDs, a switch and places for connectors. It costs under £12 including VAT (also available from Newark).

It was found that the XMOS startKIT is fantastic at handling timed events and multitasking. The development environment is fairly straightforward to use (it is based on Eclipse), and the programming language is C with a few extensions to handle multitasking requirements. Today, if you want to handle input/output at a relatively high speed with very high accuracy and ease-of-use, XMOS devices are very high up on the list of suitable devices.

The startKIT has a connector placeholder which is ideal for communicating with the Raspberry Pi; however, for these first steps the board was used standalone. (Note: to use it with a Raspberry Pi, a DIL pin header strip and a ribbon cable assembly are required; the board is not intended to be plugged on top.)

For some initial experiments the XMOS board was used to connect to a hobby servo, and the on-board LEDs were used to test multitasking capabilities. A great debugging tool called XScope was also tried out briefly. Information on all that is further below.

First, some terminology needs to be addressed. It can be confusing, because the terminology used to describe XMOS technology has changed over time. So, the terminology and architecture are discussed briefly below. This can be skipped if you just want to get on with setting up the XMOS programming environment and running some demo code. You probably do want to read it at a later stage though.

XMOS Terminology and Architecture

We are used to hearing about multi-cored chips which have multiple processors per chip, or at least multiple instances of most parts of a complete processor. In XMOS terminology, the term ‘core’ means multiple instances of some parts of a complete processor and the sharing of some things not present on non-XMOS processors. This may sound like a bad thing, but read on and you'll find it isn't. In fact the implementation is a good thing for certain tasks; the kind of tasks that embedded systems often require. (I’ll call these cores ‘XMOS cores’ from now on, to make it clear). Totally unique non-shared instances of groups of XMOS cores can be placed on the same silicon; these unique instances are called ‘tiles’ in XMOS terminology.

A diagram can say a thousand words; the diagram below (taken from the xCORE architecture PDF document) shows in red the XMOS cores (8) per tile (two tiles in this diagram).

The device on the startKIT also happens to contain two tiles, each with 8 XMOS cores. One tile is available for user programs (tile #1), the other one (tile #0) is dedicated as a debugger on the startKIT.

XMOS have taken many of the essential bits of operating systems (i.e. traditionally software-based) and have implemented them in hardware. This means that you don’t need to run (say) uC/OS or other lightweight real-time OS’s, and you can still achieve multi-tasking.

In addition, you get ultra-low jitter, latency and the ability to prescribe timing (things still extremely difficult to achieve with traditional real-time OS’s on conventional processors) for one of the main things you’d want an embedded processor to handle - input/output.

This is the kind of stuff that is possible with FPGAs, and with non-XMOS processors running at a very high speed, but XMOS devices allow you to do this automatically with no software overhead, and with a standard programming language – C – with a few extensions. The extensions are designed to be as simple as possible to use (almost FPGA test-bench like) for handling I/O timing, and the message-passing/process synchronisation concepts are expressed as close as possible to the way conventional OS’s handle this.

This all sounds impressive (and it is), but XMOS devices take this a step further and make the message-passing/process synchronisation transparent to the programmer, so that the actual execution can occur between XMOS cores residing on the same tile, or on separate tiles on the same physical chip, or on separate chips connected with wire links. The hardware responsible for all that is known as xCONNECT (shown in blue on the right side in the diagram above). Again, some terminology needs explaining; xCONNECT is a term that appears to encompass links and switches. The links and switches allow the connection between the cores, regardless of where they reside. The links and switches are used to create point-to-point virtual circuits or ‘channels’ which can be opened up for as long as required, to pass information. The channels can traverse multiple physical chips to get to the correct destination. Each chip is connected to its closest neighbours using one or more sets of wire links (for a short distance of the order of several centimetres; for a longer distance additional circuitry would be required). At any one time multiple channels can exist; there is a ‘quality of service’ scheme that exists too.

Back to the XMOS cores in a tile; as mentioned earlier, each XMOS core has some dedicated hardware, and some shared hardware. The dedicated part is a separate set of registers per XMOS core. The shared part is memory and pipeline, but also a hardware scheduler used for implementing the process state and dispatching instructions into the pipeline in a round-robin fashion. Another major part of the secret sauce concerns the ability to timestamp events. XMOS instructions can be used to let an XMOS core sleep until one of a number of events occurs (such as a pin going high); when the event does occur, there is no delay waiting for a context-switch (which entails copying registers to memory and vice-versa) as there is with a regular software-based OS. The scheduler merely needs to change the state so that the sleeping XMOS core can execute. This is therefore fast and saves many clock cycles.

You may be wondering how timing can be guaranteed if XMOS cores can sleep. It is true that when many XMOS cores are asleep, the remaining ones will run faster. However, you can guarantee that each running XMOS core will receive a minimum of 1/x of the processor cycles, where x is the total number of XMOS cores currently running within a tile. The guarantee allows you to know beforehand if your code can execute in time to handle input/output. Actual I/O will still be timed precisely because I/O is time stamped. You can schedule input to be captured at specific times, or output to be sent at specific times. If the XMOS core needs to wait until that time occurs, then it will do so, allowing more processing time for the remaining running XMOS cores.
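The 1/x guarantee can be turned into a quick worst-case budget check. The sketch below is plain C for a desktop compiler, not XC for the target. The function names `guaranteed_mips` and `fits_deadline` are hypothetical, the tile instruction rate is a figure the caller must supply, and the model follows the 1/x rule literally (it ignores any per-core ceiling the hardware may impose when very few cores are running).

```c
/* Worst-case instruction rate (in MIPS) for one XMOS core, following the
 * 1/x rule from the text: x is the number of cores currently running on
 * the tile. tile_mips is an assumed figure supplied by the caller. */
double guaranteed_mips(double tile_mips, unsigned running_cores)
{
    if (running_cores == 0) {
        return 0.0; /* nothing is running, so there is no share to report */
    }
    return tile_mips / (double)running_cores;
}

/* Returns 1 if a handler needing `instructions` instructions can finish
 * within `deadline_us` microseconds at the guaranteed rate, otherwise 0.
 * One MIPS is one instruction per microsecond. */
int fits_deadline(unsigned instructions, double deadline_us,
                  double tile_mips, unsigned running_cores)
{
    double mips = guaranteed_mips(tile_mips, running_cores);
    return mips > 0.0 && (double)instructions <= mips * deadline_us;
}
```

For example, assuming a 500 MIPS tile with all 8 XMOS cores running, `guaranteed_mips(500.0, 8)` gives 62.5, so a 600-instruction handler fits a 10 microsecond deadline while a 700-instruction one does not.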

The timestamping is simply a breeze to use. The following two lines of code instruct the device to set the pin ‘led’ logical high when the port time reaches 100, and then low a further 100 later (at port time 200). Put into a loop, the LED would flash with precise timing.

led @ 100 <: 1;
led @ 200 <: 0;

In short, this is by far the easiest to use yet most effective way of implementing timed I/O that I’ve ever seen with any microprocessor and OS. With non-XMOS devices high-level constructs could be used to implement a similar style of I/O programming, but they would not be able to define timing down to the nanosecond level with low jitter. With XMOS, you can.

PWM could be implemented in a similar way. Here, a PWM output is produced with on-time equal to the period ‘width’ (set to a range between 0 and 1000 in this example). There is a real-world example of this further down in the post, controlling a servo.

while(1)
{
    t+=width;       outpin @ t <: 0;
    t+=1000-width;  outpin @ t <: 1;
}
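To see which edges the loop above asks the hardware to produce, the schedule can be modelled off-target. This is a minimal sketch in plain C, not XC; `struct pwm_edge` and `pwm_schedule` are hypothetical names, time starts at 0, and the model ignores the wrap-around of the real port counter.

```c
/* One scheduled PWM edge: the time at which the pin changes, and the
 * level it changes to. */
struct pwm_edge {
    unsigned time;
    int level;
};

/* Fills `edges` with the 2*periods edges that the PWM loop would schedule,
 * starting from t = 0. `edges` must have room for 2*periods entries.
 * Returns the number of edges written, or 0 if width exceeds period. */
unsigned pwm_schedule(unsigned width, unsigned period, unsigned periods,
                      struct pwm_edge *edges)
{
    unsigned t = 0;
    unsigned n = 0;

    if (width > period) {
        return 0;
    }
    for (unsigned i = 0; i < periods; i++) {
        t += width;           /* end of the on-time: drive the pin low */
        edges[n].time = t;
        edges[n].level = 0;
        n++;
        t += period - width;  /* end of the period: drive the pin high */
        edges[n].time = t;
        edges[n].level = 1;
        n++;
    }
    return n;
}
```

With `width` = 250 and `period` = 1000, the falling edges land at 250 and 1250 and the rising edges at 1000 and 2000, so the on-time is `width` and the duty cycle is width/period (25% here).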

Running Tasks in Parallel

Another common task is of course the handling of multiple tasks. As is clear from the earlier text, XMOS cores run in a preemptive manner with round robin scheduling. From the main() function it is possible to assign processes to run on separate XMOS cores using the following syntax:

    par
    {
      task1();
      task2();
    }

It is possible to select specific tiles with some modified syntax (the tiles can be off-board if a configuration file is modified to contain the node routing identifiers and routing table).

Since you may want to achieve some process synchronisation/communication between the cores, there are a few different methods of doing this. One simple way is to create a channel (the concept of a channel was described earlier) and send data down it. The receiving task can sit and wait for the message using a select statement.

The main() function can define a channel using the ‘chan’ variable type, and the tasks can receive it as a parameter.

int
main(void)
{
    chan c;

    par
    {
      task1(c);
      task2(c);
    }

    // ...
}

To send a message from one task to another:

c <: value;

To receive the message:

select
{
    case c :> inval:
        printf("message received: %d\n", inval);
        break;
}

The select statement will cause that XMOS core to wait until a message is received on the channel. When a message is received, the hardware scheduler will make the XMOS core active again.

XMOS and IoT

Similar to any other device selection, a determination would need to be made if the XMOS device is suitable for the desired application. The device on the startKIT has 8 XMOS cores and 64kbyte of memory on-chip. This is fine for many applications (e.g. audio, motor control) but not ideal for some other applications (for example video capture may require a larger buffer). Where XMOS devices would absolutely excel in an Internet of Things (IoT) world would be for handling the hundreds of protocols in systems that exist today that are not connected to the Internet. XMOS devices make superb protocol engines since timing is easy to implement, and quick to code up without relying on interrupt service routines like with conventional CPUs.

The rest of this post covers installing and getting started with the XMOS startKIT. A few example programs are explored, as well as some debugging capabilities.

Installing xTIMEcomposer

For this, you’ll need to register at xmos.com and download “xTIMEcomposer” Community version from their support pages. (The screenshots below are for Windows, but the Linux version looked identical. Windows, Mac and Linux are supported, but for Linux check to ensure your particular distribution is supported).

Double-click to install the software, and accept the defaults where prompted. When installed, start it up.

For documentation, there will be a link in the Windows start menu, which opens a browser with all the online (mostly PDF) documentation, and this will be used later. All the documentation there is useful and relevant.

At the workspace prompt, put in something appropriate for your needs, and select it to be a default if you wish.

Import an Example Program

As a first step, import an example project. To do this, select the ‘Community’ tab as shown here, then traverse to ‘The spinning bar’, right-click and import it into the workspace.

The example is imported from github. This requires a folder for the import. I created and selected a folder as shown here.

Once the project has been imported, you can expand the project hierarchy to view the source code (main.xc file) as shown below.

Building the project

To build the project, first select the top level of the project hierarchy (app_spinning_bar) and then click on the hammer icon, or select Project->Build Project.

If all goes well with the build, success is reported at the console at the bottom, as shown below. You’ll also see the built file (app_spinning_bar.xe) in the bin folder in the project hierarchy, and you can view it in Windows Explorer too if desired as shown in the screenshot here.

Plugging in the startKIT

Now you can plug in the startKIT into your PC if you’ve not already done so. Note that there is currently a software bug that means that the USB port on the PC must be USB 2.0, not USB 3.0. Some modern laptops have mostly USB 3.0 ports but may have a USB 2.0 port perhaps on the back (e.g. Lenovo does this with some ThinkPads). So, plug the startKIT into a USB 2.0 port and avoid any USB 3.0 ports for now.

When plugged into a port, you’ll see a message appear to indicate that the device is correctly installed.

Windows device manager could also be checked to confirm all is ok:

Running a Program

In order to run the code, first select Run->Run Configurations, and then double-click on xCORE Application as shown here.

Once you’ve double-clicked, you should see a display as shown below. You should not have to make any changes here, but just observe to make sure all the highlighted areas look similar. Notice the Target drop-down should say something like “XMOS startKIT connected to …”. If it does not, then click ‘Refresh List’. If that doesn’t work either, then perhaps the startKIT is plugged into a USB 3.0 port by mistake.

When ‘Run’ is clicked, the spinning bar program will get transferred to the startKIT and will begin executing.

From now on, to run or stop the program directly from the main view, you can use the run and stop icons as shown here:

Programming the FLASH

To permanently store the program in FLASH, select Run->Flash Configurations. In the window that appears, double-click on xCORE Application as shown here:

The display should look as shown here, but confirm all the marked items, and then click Apply and then click Flash.

When you click Flash, the console displays some text in red. The warning can be ignored:

Once the console says “xflash succeeded” as shown above, the USB cable can be disconnected and reconnected to make the program run.

To re-program the FLASH after making any code changes and recompiling, the lightning shaped icon can be used:

Controlling a Servo

I wrote some quick code to experiment with timing with XMOS. XMOS handles timing quite uniquely, almost hardware test-bench language style. Typically with a microcontroller, a clock is set up to interrupt the microcontroller and execute an interrupt service routine that can then be used to control input/output. For easier control of the exact timing some microcontrollers have dedicated PWM units with quite a bit of flexibility, but nothing like XMOS have implemented.

As an example with XMOS, for servo PWM, it is possible to use these lines in a loop, where width is a variable corresponding to microseconds in this case:

t+=width;        servo @ t <: 0;
t+=20000-width;  servo @ t <: 1;

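The code above sets the servo PWM signal low once ‘width’ microseconds have elapsed, and high again after a further 20000-width microseconds. The result is a high pulse of ‘width’ microseconds repeated every 20 milliseconds.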

The entire code is pasted below. To use it, you can connect up a servo to the startKIT as shown in the photo:

I used a small, especially low-power servo. For a larger servo you should use an external power supply; do not rely on the USB supply.

This was the code that was used (note: copying will result in hidden non-text characters in this html content being copied, so either manually type, or use the text file attached to this post):

#include <xs1.h>
#include <stdio.h>

clock clk = XS1_CLKBLK_1;
out buffered port:1 servo = XS1_PORT_1F;

int main(void)
{
    int i;
    unsigned int t=0;
    unsigned int width=1500;

    configure_out_port(servo, clk, 0);
    set_clock_ref(clk);
    set_clock_div(clk, 50); // 1MHz
    start_clock(clk);

    while(1)
    {
        printf("Enter servo position in microseconds (1500 is center):\n");
        scanf("%u", &width); // width is unsigned, so read it with %u
        for (i=0; i<200; i++) // loop for a while
        {
            t+=width;   servo @ t <: 0;
            t+=20000-width;  servo @ t <: 1;
        }
        t+=width;   servo @ t <: 0;
    }

    return 0; // warning on this line is ok
}

The code is designed to be run using the xTIMEcomposer console, and when run it will prompt the user to enter a value in microseconds. When this is done, the servo will move to the corresponding position. The code is not practical as a finished product, because when disconnected from the PC there is no console capability. However, it is absolutely great for code debugging or for testing out some hardware such as a servo, as in this example.
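The numbers in the servo program can be checked with a little arithmetic. The sketch below is plain C for a desktop compiler, not XC. It assumes a 100 MHz reference clock and assumes that a non-zero divider value d divides the source clock by 2*d, which is consistent with the "// 1MHz" comment next to `set_clock_div(clk, 50)`. The helper names are hypothetical, and the 1000 to 2000 microsecond range is a common hobby-servo convention rather than something stated in this post (only the 1500 microsecond centre is).

```c
#define REF_CLOCK_HZ 100000000u /* assumed 100 MHz reference clock */

/* Frequency of a clock block fed from the reference clock.
 * Assumption: a divider value d > 0 divides the source by 2*d, and
 * d == 0 leaves the source clock undivided. */
unsigned clock_block_hz(unsigned div)
{
    return div == 0 ? REF_CLOCK_HZ : REF_CLOCK_HZ / (2u * div);
}

/* Maps an angle in degrees (-90..+90) to a pulse width in microseconds,
 * centred on 1500 us. The +/-500 us span is an assumed convention.
 * Out-of-range angles are clamped. */
unsigned servo_width_us(int angle_deg)
{
    if (angle_deg < -90) {
        angle_deg = -90;
    }
    if (angle_deg > 90) {
        angle_deg = 90;
    }
    return (unsigned)(1500 + (angle_deg * 500) / 90);
}
```

With a divider of 50 the clock block runs at 1 MHz, so one port tick is one microsecond and the values typed at the console can be used directly as pulse widths. Under the assumed range, `servo_width_us(45)` gives 1750.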

Enter the code into xTIMEcomposer, build it as before, and then run it. The console will prompt in red text. Click inside the console window, and type an integer value as shown here:

Once the value is entered, the servo will move to an appropriate position.

A Multitasking Example

In order to experiment with multiple tasks, the simple program here was written, which flashes an ‘X’ symbol and a ‘+’ symbol in a pattern. It will flash the ‘X’ a few times, then flash the ‘+’ a few times, and then repeat. The code could be written as a single process of course, but it is implemented as two tasks to see how multiple XMOS cores are handled.

The first bit of the code just defines the LED patterns that will be used:

#include <xs1.h>

/*
* LEDs: the patterns for each bit are:
*   0x80000 0x40000 0x20000
*   0x01000 0x00800 0x00400
*   0x00200 0x00100 0x00080
*
* To get the desired value, OR the bits that
* you want to remain OFF
*/
port leds = XS1_PORT_32A;
unsigned int leds_value[]={
        0xE1F80, // Blank
        0xA0280, // Plus symbol
        0x41500}; // X symbol

port pin = XS1_PORT_1F; // for debugging with a scope
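The three values in leds_value[] can be derived from the bit table in the comment rather than worked out by hand. This is a minimal sketch in plain C (not XC) that can be run on a PC; the function name `led_mask` and the 9-character pattern format are hypothetical, chosen only for illustration.

```c
/* Bit assigned to each LED in the 3x3 grid, row by row, copied from the
 * comment in the post's code. */
static const unsigned led_bit[9] = {
    0x80000, 0x40000, 0x20000,
    0x01000, 0x00800, 0x00400,
    0x00200, 0x00100, 0x00080,
};

/* Builds the port value for a pattern given as 9 characters, row by row.
 * '#' means the LED should be lit; any other character means it stays off.
 * Following the post's comment, the bits of the OFF LEDs are ORed together.
 * The pattern is expected to be exactly 9 characters long; a shorter string
 * leaves the remaining LEDs lit. */
unsigned led_mask(const char *pattern)
{
    unsigned value = 0;

    for (int i = 0; i < 9 && pattern[i] != '\0'; i++) {
        if (pattern[i] != '#') {
            value |= led_bit[i];
        }
    }
    return value;
}
```

Calling `led_mask(".#.###.#.")` reproduces the plus value 0xA0280, `led_mask("#.#.#.#.#")` reproduces the X value 0x41500, and `led_mask(".........")` reproduces the blank value 0xE1F80.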

The next part of the code defines a task which just periodically sends a message to another task:

void task1(chanend c)
{
  while(1)
  {
      delay_milliseconds(1000);
      c <: 2;
      delay_milliseconds(1000);
      c <: 1;
  }
}

The receiving task is shown below. It’s not particularly tidy code, but the interesting bit is the select statement. Notice that it waits for two different types of events. The first event is timer expiry every 100 msec. The other event is a message received via the channel. When either event occurs, the hardware scheduler will automatically put the task into running state.

void task2(chanend c)
{
    int i;
    unsigned char led_sel=1;
    unsigned char toggle=0;
    timer t;
    unsigned int time;
    t:>time;
      while(1)
      {
          select
          {
              case t when timerafter(time+(100*1E5)) :> time: // 100*1E5 is 100msec
                  if (toggle & 0x01)
                  {
                      leds <: leds_value[led_sel];
                      pin <: 1;
                  }
                  else
                  {
                      leds <: leds_value[0];
                      pin <: 0;
                  }
                  toggle^=0x01;
                  break;
              case c :> i:
                  if (i==1)
                      led_sel=1;
                  else
                      led_sel=2;
                  break;
          } // end select
      } // end while

}
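The timing constants in the two tasks can be sanity-checked off-target. The sketch below is plain C, not XC. It assumes a 100 MHz timer tick rate, which is what the "100*1E5 is 100msec" comment implies, and the helper names `ms_to_ticks` and `flashes_per_symbol` are hypothetical.

```c
#define TIMER_HZ 100000000u /* assumed 100 MHz timer tick rate */

/* Converts milliseconds to timer ticks. With a 32-bit unsigned result this
 * is only valid for delays up to about 42 seconds. */
unsigned ms_to_ticks(unsigned ms)
{
    return ms * (TIMER_HZ / 1000u);
}

/* Number of complete on/off flashes shown for each symbol: task1 sends a
 * message every message_interval_ms, task2 toggles the LEDs every
 * toggle_period_ms, and one flash needs two toggles. */
unsigned flashes_per_symbol(unsigned message_interval_ms,
                            unsigned toggle_period_ms)
{
    if (toggle_period_ms == 0) {
        return 0;
    }
    return message_interval_ms / toggle_period_ms / 2u;
}
```

Here `ms_to_ticks(100)` gives 10000000, the same value as 100*1E5, and `flashes_per_symbol(1000, 100)` gives 5, so each symbol should flash roughly five times before the other one takes over.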

The main() function is responsible for indicating that both tasks need to run in parallel:

int
main(void)
{
    chan c;

    par
    {
      task2(c);
      task1(c);
    }

    return(0);
}

That’s it! When run, the behaviour defined earlier will execute.

Troubleshooting XMOS programs

The printf and scanf statements were used earlier, and they are great for testing out software. Other tools are available too. Of particular interest is a tool that can be used for debugging, profiling and pulling out useful performance detail. It was explored very briefly. The tool is called XScope.

It can of course be difficult troubleshooting multi-threaded programs because you don’t know where each thread has got to, so typically people tend to use printf statements everywhere. With XScope it is possible to plot user-defined states.

To use XScope, the header file xscope.h needs to be included in the program. Also, double-click on the Makefile in the Project Explorer, and you should see an “XMOS Application Makefile Editor” appear.

Traverse to the ‘Xcc Flags’ section and append ‘-fxscope’ to the flags list as shown above, and then click File->Save.

Then, go to Run->Run Configurations and change the Target I/O options setting from JTAG to xSCOPE.

Next, click on the XScope tab, and change the Mode from ‘Disabled’ to ‘Offline [XScope] Mode’ as shown below. Then, click Apply and then Close.

Now the code can be modified to insert the instrumentation. In the main() function, the following line can be added:

xscope_register (1, XSCOPE_STATEMACHINE , " State Transitions ", XSCOPE_UINT, " State ");

Now, whenever there is a state change, the following code can be added, where x is a number that will represent the state.

xscope_probe_data(0, x);

I inserted such a line in the main() function, twice in task1 (after each delay_milliseconds function) and in task2 to record the LED toggled state.

When the code is compiled and run, a file called xscope.xmt is created. Click Terminate to stop the program, and you will see the file in the Project Explorer. Double-click the file, and the ‘Offline Scope’ will appear where the console view was. Check the State checkbox. You will see a view similar to the one shown below (you can use the zoom keys to the right of the console view to zoom in). This shows the states and the time spent in each state. The states are shown on the vertical axis (marked 0, 1, 2, etc). Time is on the horizontal axis.

XScope is capable of a lot more. It can also plot analog values. For example, the xscope_register function can be called as

xscope_register(1, XSCOPE_CONTINUOUS , " Continuous Value 1", XSCOPE_UINT , " Value ");

The same xscope_probe_data command as before will now plot analog values. Multiple sets of data can be captured simultaneously using this syntax:

xscope_register (3, XSCOPE_STATEMACHINE , " State Transitions ", XSCOPE_UINT, " State ",
                        XSCOPE_CONTINUOUS , " Continuous Value 1", XSCOPE_UINT , " Value ",
                        XSCOPE_CONTINUOUS , " Continuous Value 2", XSCOPE_UINT , " Value ");

Here is an example of analog values output:

Summary

It was fairly painless to begin to use the startKIT, and although some learning is needed to make full use of it, I think it will be worth it.

At under £12 the startKIT is an extremely low cost board, and will excel at anything timing related.

For IoT projects, the startKIT would be the ideal device for implementing the hundreds of protocols that exist with industrial equipment, some of which would be timing critical.

For home use, the startKIT is perfect for connecting to the Raspberry Pi (this will be explored later).

Attachments:

servo_test_program.txt.zip

Top Comments

Former Member over 10 years ago

Would there be any chance that you could develop wholly on something like a raspberry pi?
Problemchild over 10 years ago in reply to Former Member

I assume you mean could you actually develop the code on the RPI directly rather than using a PC for the IDE??

Yes in theory but you would need to rebuild the whole of the IDE which uses Eclipse as a framework.
Since this would represent a massive amount of development and also it really needs at least 1GB to use Eclipse then in practice not ... Sorry !
shabaz over 10 years ago in reply to Problemchild

I was going to reply a similar thing, that it would be a major task. I'm sure some of the tools are open source (the compiler is based on open source I think), but there could be the possibility that some tools are not, so all in all I too agree it is better to use a PC (running Windows or x86 Linux).
Problemchild over 10 years ago in reply to shabaz

This is a very similar situation to wanting to run the Valent LogiPI FPGA development software on the RPI when it comes as an 11GB download for intel devices.
Sometimes the IDE is just too big to run on platform. Thank goodness that PCs are fast and cheap