Programmable logic gives the user the ability to accelerate functions by leveraging its highly parallel nature, freeing us from the sequential world that constrains software. However, not every algorithm or function within a programmable logic design requires a parallel implementation. Some elements require sequential processing, such as communication protocols over RS232, or control and sequencing structures. Digital designers will know that these sequential structures can be implemented using finite state machines, counters, and shift registers as appropriate.
However, using state machines for all sequential and control functionality quickly becomes limiting: making changes is time consuming, and the approach constrains the size and complexity of the application. In many applications where higher levels of control and communication are required, a better solution is to use a processor for these sequential structures.
Of course, using discrete processor and programmable logic devices further complicates the circuit card design. Additional design time is required, which increases non-recurring engineering (NRE) cost, and the cost of the Bill of Materials (BoM) also rises. The simplest and cheapest solution is therefore to use a processor internal to the programmable logic device.
The choice faced by the engineer then becomes one of using a programmable logic device with a hard silicon processor or implementing a soft IP processor within the programmable logic. Both solutions have their pros and cons, which will depend upon the application requirements and challenges.
2. Objectives
In this Essentials course we examine what hard and soft-core processors are and their common development flows, and we identify different types of processors and their use cases. By the end of the module you should understand:
The differences between hard and soft processors
The benefits and disadvantages of both hard and soft processors
The history of processors in programmable logic
The different types of hard and soft processors available
A typical development flow for hard and soft processors
Multi-processor environments and how you can work in them
The difference between hard and soft processors in programmable logic devices is very distinct. When the processor is implemented as a hard processor, the processor and often supporting infrastructure are fabricated directly in the silicon of the device during manufacture. As such, the actual design and most of the configuration of the hard processor are determined by the programmable device manufacturers. Implementing the processor directly in the silicon offers significant performance benefits and can accelerate development time, but it also comes with some disadvantages, as we will see as this course progresses.
Alternatively, soft processors are implemented using the logic resources available within the programmable logic device. This gives the designer more freedom in how the soft processor is implemented, even down to which processor is used. However, as we will see, there are also drawbacks.
If you are not familiar with the history of processing within programmable logic, you may think that it is a relatively recent phenomenon. However, both hard and soft processors have been available within programmable logic since the late 1990s and early 2000s. Early hard-core processors implemented in programmable logic included the PowerPC 405 and 440 cores, while early soft-core processors included MicroBlaze and NIOS. Engineers have therefore been using both hard and soft processing within logic designs for nearly 20 years, although integrating, leveraging, and working with these processors has become significantly easier over the generations.
- 4.1 Comparing Processor Performance
Comparing different processors can be difficult, especially when it comes to like-for-like performance. At a high level we can compare the peripherals available, power modes, and I/O capabilities. When it comes to comparing processor performance, it is common to use industry-standard benchmarks. Two of the most popular measures classify processor performance by the number of integer and floating-point operations that can be performed per second. These are Dhrystone Millions of Instructions Per Second (DMIPS), derived from the Dhrystone integer benchmark, and Floating-Point Operations Per Second (FLOPS).
These benchmarks enable us to compare the processing capabilities of different processors regardless of manufacturer, and regardless of whether they are implemented as hard or soft processors. For each of the processor cores in this Essentials course we present the DMIPS/MHz figure. This figure normalizes DMIPS to clock frequency and so enables a like-for-like comparison of performance capability.
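DMIPS is derived from a raw Dhrystone score by dividing by the score of the reference VAX 11/780 (1757 Dhrystone iterations per second), which was treated as a 1 MIPS machine. Dividing again by the clock frequency gives DMIPS/MHz. The Python sketch below shows that arithmetic. The measured score and clock rate are hypothetical values chosen only for illustration.

```python
# Reference: the VAX 11/780, treated as a 1 MIPS machine, scored
# 1757 Dhrystone iterations per second.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec: float) -> float:
    """Convert a raw Dhrystone score (iterations per second) into DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Normalize DMIPS to the core clock so different clock rates compare fairly."""
    return dmips(dhrystones_per_sec) / clock_mhz


if __name__ == "__main__":
    # Hypothetical measurement: 351,400 Dhrystones/s on a core clocked at 100 MHz.
    score = 351_400
    print(f"{dmips(score):.1f} DMIPS, {dmips_per_mhz(score, 100.0):.2f} DMIPS/MHz")
    # prints "200.0 DMIPS, 2.00 DMIPS/MHz"
```

Because the result is normalized per MHz, a core with a higher DMIPS/MHz figure does more work per clock cycle. Its absolute performance still depends on the clock rate it achieves.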
- 4.2 Understanding Hard Processors
As outlined above, hard processors are implemented by the programmable logic device manufacturer during the design and manufacture phase of the actual programmable logic device. This creates a new class of device which combines the processor and the programmable logic, called a heterogeneous SoC. Often these heterogeneous SoCs include multiple processor instantiations and can also include multiple different types of hard processor implementations.
Modern programmable logic devices by and large implement high-performance Arm processing cores when hard processors are required. The exact Arm processor core implemented varies from device family to device family; however, typical processor cores implemented include:
Cortex-A72: 64-bit, three-way out-of-order superscalar application processor implementing the Armv8-A architecture. Performance-wise, Cortex-A72 cores can achieve up to 4.72 DMIPS/MHz with clock rates up to 2.5 GHz per core.
Cortex-A53: 64-bit superscalar application processor implementing the Armv8-A architecture. Performance-wise, Cortex-A53 cores can achieve up to 2.24 DMIPS/MHz with clock rates up to 1.5 GHz per core.
Cortex-A9: 32-bit superscalar application processor implementing the Armv7-A architecture. Performance-wise, Cortex-A9 cores can achieve up to 2.5 DMIPS/MHz with clock rates up to 1 GHz per core.
Cortex-R5: 32-bit processor designed for real-time, safety-critical applications, implementing the Armv7-R architecture. Performance-wise, Cortex-R5 cores can achieve up to 1.67 DMIPS/MHz with clock rates up to 600 MHz per core.
As you can see, the combination of high maximum clock frequencies and high DMIPS/MHz figures shows that hard processors can offer very high-performance implementations.
This high-performance capability is necessary when we are working with high-level operating systems such as Linux, and with frameworks used for machine learning, signal processing, and image processing.
Along with the performance benefits of hard processor implementations, there are several other benefits. The most significant of these is the creation of a complete processing system around the implemented cores. This includes caches, interrupt controllers, and memory controllers for DDR and non-volatile memories, along with a range of interfacing options (e.g., Gigabit Ethernet, SPI, I2C, UART). The result is a true processing solution in one part of the device that does not use precious logic resources for its implementation. The device manufacturers also include several high-performance interfaces between the processing system and the programmable logic; these are crucial for accelerating applications in the programmable logic.
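To see what these figures mean in absolute terms, multiply DMIPS/MHz by the clock rate. The Python sketch below uses only the values quoted in the list above. Treat the result as a theoretical peak: the clock rate achieved in a specific device family may be lower than the maximum, and real throughput also depends on the memory system. The four-core cluster is a hypothetical example.

```python
# (DMIPS/MHz, maximum clock in MHz) per core, as quoted in the list above.
HARD_CORES = {
    "Cortex-A72": (4.72, 2500),
    "Cortex-A53": (2.24, 1500),
    "Cortex-A9": (2.5, 1000),
    "Cortex-R5": (1.67, 600),
}


def peak_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Theoretical upper bound: assumes every core runs flat out at its
    maximum clock and that performance scales perfectly with core count."""
    return dmips_per_mhz * clock_mhz * cores


if __name__ == "__main__":
    for name, (dpm, mhz) in HARD_CORES.items():
        print(f"{name:<11} {peak_dmips(dpm, mhz):>7.0f} DMIPS per core")
    # Hypothetical cluster size, chosen only for illustration.
    print(f"4 x Cortex-A53: {peak_dmips(*HARD_CORES['Cortex-A53'], cores=4):.0f} DMIPS")
```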
Because the processing system looks more like a traditional processing solution to the software development team, the development flow is more aligned with a traditional software development flow.
In fact, when working with a heterogeneous SoC that contains a hard processing system, the programmable logic is a slave peripheral of the processing system, and the boot sequence is exactly like that of a normal processor. This means that from day one of the project, the software team can start developing the solution while the programmable logic development progresses in parallel.
As the processing system and programmable logic are distinctly separate systems, they can be treated as being decoupled from each other. This has several advantages, including:
Partial Reconfiguration: The ability to change all or part of the contents of the programmable logic as the application demands. This makes field updates much easier as standards evolve, and even allows different programmable logic designs to be loaded at different points in the application.
Power efficient operation: The processors can be placed into low-power operating modes, and the programmable logic can be powered down when it is not needed. This enables the system to scale its power demand with the use case.
Security: The processing system contains the infrastructure needed to provide confidentiality, integrity, and authentication of the application, using the AES, SHA, and RSA algorithms.
Safety: The decoupling of the processing system and programmable logic enables safety solutions to be implemented using diverse approaches, which (with careful design) do not contain a single point of failure.
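On a real device these security checks run in vendor-specific hardware and boot flows. As a hedged illustration of the integrity idea only, the Python sketch below uses the standard `hashlib` module to check that an image, such as a partial bitstream or application binary, matches an expected SHA-256 digest before it is loaded. The function name and the example image are hypothetical. The sketch does not show encryption (AES) or signature-based authentication (RSA). In practice the expected digest must itself come from a trusted source, such as a signed header.

```python
import hashlib
import hmac


def image_is_intact(image: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the image's SHA-256 digest matches the expected value.

    hmac.compare_digest performs a constant-time comparison, so the time
    taken does not reveal how many leading characters matched.
    """
    digest = hashlib.sha256(image).hexdigest()
    return hmac.compare_digest(digest, expected_sha256_hex.lower())


if __name__ == "__main__":
    # Hypothetical image contents and reference digest, for illustration only.
    image = b"abc"
    expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    print(image_is_intact(image, expected))            # True
    print(image_is_intact(image + b"\x00", expected))  # False: image altered
```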
A typical development flow for a hard-macro processor is:
The architecture is defined, and functions are partitioned between the processing system and the programmable logic.
The software team starts developing, using software development tools such as Eclipse, and development boards to create the boot, configuration, and the majority of the application. This is possible because the hard processor configuration already exists in the device.
In parallel, the logic design can be conducted. To the software team, all elements in the programmable logic that need to be under software control appear within the device memory map. The programmable logic design team can provide this memory map to the software team once the logic design itself is complete. The software team does not have to wait until the programmable logic design has a bit file that achieves timing closure, which further parallelizes the development process.
Once the programmable logic design meets timing, the programming file can be provided to the software team and integration / debugging of the design can occur.
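The memory map is effectively the contract between the two teams. As a hedged illustration, the Python sketch below models a driver written only against a register map, with a mock bus standing in for the hardware. This lets the driver logic be exercised before a bit file exists. The base address, register offsets, and GPIO peripheral are all hypothetical. On the target, such a driver would typically be written in C, with the mock bus replaced by real memory-mapped accesses.

```python
# Hypothetical memory map supplied by the logic team. The base address and
# register offsets are illustrative only, not taken from any real design.
PL_GPIO_BASE = 0xA000_0000
REG_DATA = 0x00  # read/write: GPIO data
REG_DIR = 0x04   # write: direction mask, 1 = output (hypothetical encoding)


class MockBus:
    """Stands in for memory-mapped I/O until a bitstream exists."""

    def __init__(self) -> None:
        self.regs = {}  # address -> 32-bit register value

    def write32(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFF_FFFF  # registers are 32 bits wide

    def read32(self, addr: int) -> int:
        return self.regs.get(addr, 0)  # unwritten registers read as zero


class GpioDriver:
    """Driver written purely against the memory map, not the bitstream."""

    def __init__(self, bus, base: int) -> None:
        self.bus = bus
        self.base = base

    def set_direction(self, output_mask: int) -> None:
        self.bus.write32(self.base + REG_DIR, output_mask)

    def write(self, value: int) -> None:
        self.bus.write32(self.base + REG_DATA, value)

    def read(self) -> int:
        return self.bus.read32(self.base + REG_DATA)


if __name__ == "__main__":
    gpio = GpioDriver(MockBus(), PL_GPIO_BASE)
    gpio.set_direction(0xFF)
    gpio.write(0x5A)
    print(hex(gpio.read()))  # 0x5a
```

Once the bit file is available, only the bus layer changes; the driver code that was developed in parallel is reused during integration.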
When applications span both the processing system and the programmable logic, debugging can be a challenge. As such, many heterogeneous SoC providers offer tool chains that enable cross-triggering between the programmable logic and the processing system. Cross-triggering makes it possible to set breakpoints in the software that, when hit, trigger events in the programmable logic. For example, an internal logic analyzer can be triggered to start capturing data when a breakpoint is hit. This provides a system-level view of what is occurring between the processor and the programmable logic when the design does not behave as expected. It is also possible to work in the reverse direction, using a trigger in the programmable logic to halt the software just as a breakpoint would.
- 4.3 Understanding Soft Processors in Programmable Logic
Soft-core processors, instead of being fabricated in the silicon of the device, are implemented using the look-up tables, Block RAMs, and flip-flops within the programmable logic.
While this means that soft-core processors might not be able to achieve the performance of dedicated hard processors, they do have many advantages, including the ability to select the actual processor. The engineer can select a processor core from the programmable logic device manufacturer, from other IP vendors, or even from open-source projects. The size of the processor can also scale with the demands of the application, providing a very flexible solution. It is therefore common to see several different implementations of a soft-core processor available, depending upon the application need.
The maximum clock frequency of a soft-core processor depends not only upon the processor design, but also upon the programmable logic architecture and the utilization of the programmable logic device. The logic resources required by the soft processor will also be a determining factor in device selection.
As engineers we have the choice of a range of soft-core processors, including:
MicroBlaze: 32-bit Reduced Instruction Set Computer (RISC) processor offered by Xilinx. MicroBlaze is offered in three configurations: Microcontroller, Real Time, and Application, offering 1.1 DMIPS/MHz, 1.3 DMIPS/MHz, and 1.4 DMIPS/MHz respectively.
NIOS II: 32-bit RISC processor offered by Intel. NIOS II is offered in three configurations: NIOS II Fast, NIOS II Standard, and NIOS II Economy. NIOS II Fast offers 0.9 DMIPS/MHz and NIOS II Economy offers 0.1 DMIPS/MHz.
Arm Cortex-M1 and Cortex-M3: 32-bit processors based on the Armv6-M and Armv7-M architectures, respectively. The Cortex-M1 offers 0.8 DMIPS/MHz, while the Cortex-M3 offers 1.25 DMIPS/MHz.
RISC-V is not actually a processor itself. Instead, it is an open Instruction Set Architecture (ISA) that enables the development of processors, including open-source ones, compliant with the RISC-V ISA. As such, there are several providers of RISC-V cores for implementation in programmable logic, with each implementation providing a different solution. The SiFive E31 RISC-V implementation, for example, offers between 1.61 and 2.58 DMIPS/MHz.
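The clock rate a soft core achieves depends on the device family and its utilization, so a useful sanity check is to invert the calculation. The question becomes: what clock frequency would a soft core need to match a single hard core? The Python sketch below uses only figures quoted in this section and ignores cache and memory effects, so treat its output as a rough indication. A required frequency in the GHz range, as in the Cortex-A9 comparison, strongly suggests that a soft core will not be a like-for-like replacement.

```python
def required_clock_mhz(target_dmips: float, dmips_per_mhz: float) -> float:
    """Clock frequency a core would need to reach target_dmips."""
    return target_dmips / dmips_per_mhz


# Peak single-core figures from the hard processor list earlier.
CORTEX_R5_DMIPS = 1.67 * 600    # one core at 600 MHz
CORTEX_A9_DMIPS = 2.5 * 1000    # one core at 1 GHz

# DMIPS/MHz figures quoted above for soft cores.
SOFT_CORES = {
    "MicroBlaze (Application)": 1.4,
    "Cortex-M3": 1.25,
    "SiFive E31 (upper figure)": 2.58,
}

if __name__ == "__main__":
    for name, dpm in SOFT_CORES.items():
        r5 = required_clock_mhz(CORTEX_R5_DMIPS, dpm)
        a9 = required_clock_mhz(CORTEX_A9_DMIPS, dpm)
        print(f"{name}: {r5:.0f} MHz to match a Cortex-R5, {a9:.0f} MHz to match a Cortex-A9")
```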
While a soft-core processor provides the ability to implement the most efficient solution for the application at hand, using one has some implications. Along with the lower performance, a soft-core processor reduces the logic resources available for the logic design itself.
This is due to the need to implement the entire processor support architecture with programmable logic resources. It is not only the processor core itself that requires logic resources. More complex implementations also require DDR interfaces, communication peripherals, and interfaces to the programmable logic design. For small processor solutions, however, none of this is required, and the program can execute from Block RAM.
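Whether a program fits in Block RAM is a matter of simple arithmetic. The Python sketch below makes two assumptions, both to check against your device family and processor configuration. First, it assumes 36 Kb blocks used 32 bits wide, giving 32 Kib of usable data per block with the parity bits unused. Second, it optionally rounds the memory size up to a power of two, which is a common constraint for soft-core local memories. The application sizes in the example are hypothetical.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def brams_needed(code_bytes: int, data_bytes: int,
                 bits_per_block: int = 32 * 1024,
                 round_to_power_of_two: bool = True) -> int:
    """Estimate Block RAM count for a soft-core program held entirely on-chip.

    bits_per_block defaults to 32 Kib, i.e. a 36 Kb block RAM used 32 bits
    wide with its parity bits unused (an assumption; adjust per device).
    """
    total = code_bytes + data_bytes
    if total <= 0:
        return 0
    if round_to_power_of_two:
        total = next_power_of_two(total)
    bits = total * 8
    return -(-bits // bits_per_block)  # ceiling division in integers


if __name__ == "__main__":
    # Hypothetical application: 24 KiB of code plus 10 KiB of data and stack.
    print(brams_needed(24 * 1024, 10 * 1024))                               # 16
    print(brams_needed(24 * 1024, 10 * 1024, round_to_power_of_two=False))  # 9
```

The gap between the two results shows why a small overrun past a power-of-two boundary can be expensive in Block RAM.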
Unlike hard processor implementations, soft processors are tightly coupled with the programmable logic, and this has several interesting implications:
The programmable logic device is the master; it must be configured first to implement the soft-core processor. Once the programmable logic is configured, the soft-core processor can load its boot loader and application software.
Depending upon the size of the application, the soft-core processor's application may be contained entirely within Block RAMs provided by the programmable logic. This removes the need for an external non-volatile memory for the software application.
As the processor is located within the programmable logic, it is not possible to reconfigure the entire programmable logic at run time without also removing the processor. However, it is possible to use partial reconfiguration to reconfigure other regions of the programmable logic as required.
Power management cannot power down the programmable logic, since the processor resides within it. However, techniques such as clock gating and switching the remaining logic elements to slower clock frequencies can be used to reduce power.
While we cannot easily create a single-device implementation that is free from single points of failure, we can relatively easily implement triple modular redundancy (TMR) soft processor implementations with voting and synchronization.
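At the heart of TMR is a majority voter. In hardware it is combinational logic in the fabric. The Python sketch below models only the bitwise 2-of-3 voting function and flags which channel disagrees. How the three processors are kept in lockstep and resynchronized after an upset is design-specific and is not shown. The bus values in the example are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at
    least two of the three redundant channels."""
    return (a & b) | (b & c) | (a & c)


def disagreeing_channels(a: int, b: int, c: int) -> list:
    """Indices of channels whose value differs from the voted result, i.e.
    the processors that would need to be resynchronized."""
    voted = majority_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


if __name__ == "__main__":
    # Hypothetical bus words from three lockstep soft processors; a single
    # event upset has flipped bit 0 in channel 2.
    a, b, c = 0x1234, 0x1234, 0x1235
    print(hex(majority_vote(a, b, c)))    # 0x1234
    print(disagreeing_channels(a, b, c))  # [2]
```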
A typical development flow for a soft processing system is:
The architecture is defined, and functions are partitioned between the processor and the programmable logic design.
Implementation of the processor within the programmable logic. The objective during this stage is to create a soft-core processor that is connected to the necessary peripherals, builds correctly, and can be connected to with a debugger. Creating this processor then enables software development to begin.
Implementation of the software design targeting the processor in the programmable logic.
Implementation of the remaining digital design; this may be done in parallel with the processor creation, depending upon the design.
Integration of the hardware and software design.
As you can see, developing the processor in the programmable logic can extend the design time, especially if the processor is not a standard one for the flow. This impact on the development timeline may be mitigated by using a development board, if one is available for the processor being developed.
- 4.4 Should I Use a Hard or Soft Processor?
There is no hard and fast rule when choosing between a hard or soft-core processor; the choice depends upon the application demands. For example, the need to run a high-level operating system such as Linux, or a machine learning or image processing framework, will typically weight the decision towards a hard processor, given the performance such software demands.
The table below shows some of the major comparison points between hard and soft processors, which can be used in conjunction with project requirements to help decision-making.
Parameter | Hard processor | Soft processor | Comment |
---|---|---|---|
Performance | High | Medium to low | |
Impact on Logic Resources | Low | Medium to High | Depends on additional supporting components required. |
Customizability | Medium | High | Hard processors have limited configurability |
Security | High | Medium | Soft implementations can still use bitstream encryption |
Power Efficiency | High | Medium | |
Portability | Low | High | Can be very portable if an open-source core is used |
Ease of Development | Medium | Low | Need to create the processor in the programmable logic first |
Table 1: Comparison of Hard and Soft Processors
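One way to combine Table 1 with project requirements is a simple weighted score. The Python sketch below does this under explicit assumptions. The High/Medium/Low ratings are mapped to 3/2/1, "Medium to low" and "Medium to High" are scored 1.5, and the "Impact on Logic Resources" row is inverted so that a low impact scores well. The per-project weights are hypothetical. As the section notes, there is no hard and fast rule, so treat the score as a prompt for discussion, not a decision.

```python
# Desirability scores transcribed from Table 1 (3 = most favourable, 1 = least).
# Assumptions: "Medium to low"/"Medium to High" are scored 1.5, and the
# "Impact on Logic Resources" row is inverted so a low impact scores 3.
SCORES = {
    "performance":         {"hard": 3.0, "soft": 1.5},
    "logic_resources":     {"hard": 3.0, "soft": 1.5},
    "customization":       {"hard": 2.0, "soft": 3.0},
    "security":            {"hard": 3.0, "soft": 2.0},
    "power_efficiency":    {"hard": 3.0, "soft": 2.0},
    "portability":         {"hard": 1.0, "soft": 3.0},
    "ease_of_development": {"hard": 2.0, "soft": 1.0},
}


def weighted_score(option: str, weights: dict) -> float:
    """Weighted average of the table scores for 'hard' or 'soft'."""
    total_weight = sum(weights.values())
    return sum(SCORES[param][option] * w for param, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical requirement weightings for two very different projects.
    linux_vision = {"performance": 3, "power_efficiency": 2, "ease_of_development": 1}
    small_controller = {"portability": 3, "customization": 2,
                        "logic_resources": 1, "performance": 1}
    for label, weights in (("Linux vision", linux_vision),
                           ("small controller", small_controller)):
        print(f"{label}: hard {weighted_score('hard', weights):.2f}, "
              f"soft {weighted_score('soft', weights):.2f}")
```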
- 4.5 Multi-Processor Systems
Of course, I should point out here that using a hard or soft processor is not mutually exclusive when a heterogeneous SoC is used. In this instance, the hard processor can be used alongside one or more soft cores implemented within the programmable logic. This enables processing to be offloaded from the high-performance application processor to a dedicated processor in the programmable logic. An example of this might be motor control or sensor interfacing handled by dedicated soft-core processors, with the application processor performing the high-level analytics and algorithm implementation.
We can also use different soft processor implementations within the same programmable logic design. For example, a medium performance MicroBlaze could be working with an Arm Cortex-M1 which is dedicated to sensor interfacing.
Such solutions require the correct implementation of multi-processor design techniques. These techniques include:
Mailbox: Allows bi-directional communication between multiple processors using a First In, First Out (FIFO) based approach to messaging.
Mutex: Implements mutual exclusion locks, which allow processors to lock shared resources, preventing simultaneous access by more than one processor.
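The behaviour of these two primitives can be sketched in a few lines. The Python below is a single-threaded model, not driver code. A mailbox is represented as a pair of bounded FIFOs, one per direction. The mutex records which CPU owns it, loosely following a common hardware pattern in which a CPU writes its ID to claim the lock and reads back to check whether it succeeded. The class names, message contents, and CPU IDs are hypothetical, and this is not the register interface of any particular IP core. In real hardware the claim-and-check step is atomic, a property this model cannot demonstrate.

```python
from collections import deque


class Mailbox:
    """One direction of a mailbox: a bounded FIFO of messages.
    A bi-directional mailbox is modelled as a pair of these."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._fifo = deque()

    def send(self, message) -> bool:
        if len(self._fifo) >= self.depth:
            return False  # FIFO full: the sender must retry later
        self._fifo.append(message)
        return True

    def receive(self):
        return self._fifo.popleft() if self._fifo else None  # None = empty


class HwMutex:
    """A lock that records which CPU owns it (hypothetical model)."""

    def __init__(self) -> None:
        self.owner = None

    def try_lock(self, cpu_id: int) -> bool:
        if self.owner is None:
            self.owner = cpu_id
        return self.owner == cpu_id

    def unlock(self, cpu_id: int) -> None:
        if self.owner != cpu_id:
            raise PermissionError(f"CPU {cpu_id} does not own this mutex")
        self.owner = None


if __name__ == "__main__":
    to_soft, to_hard = Mailbox(depth=4), Mailbox(depth=4)
    to_soft.send(("set_speed", 1200))      # application processor -> soft core
    print(to_soft.receive())               # ('set_speed', 1200)
    to_hard.send(("speed_ack", 1200))      # soft core -> application processor
    print(to_hard.receive())               # ('speed_ack', 1200)

    shared_buffer_lock = HwMutex()
    print(shared_buffer_lock.try_lock(0))  # True: CPU 0 takes the lock
    print(shared_buffer_lock.try_lock(1))  # False: CPU 1 must wait
    shared_buffer_lock.unlock(0)
    print(shared_buffer_lock.try_lock(1))  # True
```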
An in-depth look at multi-processor communication is an Essentials course on its own; frameworks such as OpenAMP exist to help enable multi-processor systems.
Both hard- and soft-core processors have their place in designs. It is up to the engineer to determine the best approach for each application. Hopefully, having read through this Essentials course, you are now familiar with the pros and cons of each type of processor and will be able to start making informed decisions when selecting the best processor for each use case. You will also understand a little more about multi-processor systems and how processors can communicate effectively within your designs.
*Trademark. Xilinx is a trademark of Xilinx Inc. Other logos, product and/or company names may be trademarks of their respective owners.
Test Your Knowledge
Are you ready to demonstrate your Hard and Soft Processors Essentials knowledge? Then take this 15-question quiz to see how much you've learned.
To earn the Essentials Programmable Devices 4 Badge, read through the learning module, attain 100% on the Quiz, leave us some feedback in the comments section, and give the learning module a star rating.