SMCS: an efficient compression algorithm for microcontrollers

One of the things that is driven from the top at SEGGER is that we can always do better. Not satisfied with standard schemes, we wanted to optimize emCompress, SEGGER’s compression library, for:

  • Very fast decompression
  • High compression ratio (uncompressed size divided by compressed size)
  • Small decompressor
  • Limited state in RAM when decompressing

After some experimentation, we arrived at a coding technique we named SMCS, the Small Microcontroller Compression Scheme (also known as the SEGGER Microcontroller Compression Scheme). SMCS compresses as efficiently as DEFLATE, uses only a few words of decompressor state, and has an ultra-small, ultra-fast, ultra-simple decompressor.

Why do this?

In discussing the linker and the future direction of our emCompress product, we examined the possibility of improving emCompress and folding that improvement into the SEGGER Linker. The company ethos is that we develop and use our own software, so it is natural to use emCompress components to compress data sections held in flash that must be copied to RAM on startup. When we improve emCompress, we also improve the SEGGER Linker, which is a win all round!

LZSS, the ubiquitous algorithm

Standard compression schemes, such as DEFLATE, LZMA, LZJU90, and LZ4, use an LZSS-style kernel to parse the input string into a stream of literals and matches, a match being a back reference to a substring that can be substituted at the cursor position. Effective compression relies on being able to encode the literals and references in fewer bits than the string being replaced.

Once literals and matches are identified, the encoder takes each and emits them to the bitstream. Typical coding is a leading flag bit, identifying a literal or a match, followed by an 8-bit literal or a (distance, length) pair referencing previous content behind the cursor.
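
Sketched below is a minimal LZSS-style decode loop of the kind described above. It is an illustration only, not the emCompress or SMCS decoder; the bit-reader helpers get_bit() and get_bits(), and the fixed 16-bit distance and 8-bit length, are assumptions made for the sketch.

  /* Minimal LZSS-style decode loop: one flag bit selects a literal or a
     (distance, length) back reference. Illustration only; field widths
     and helpers are assumptions, not an actual format. */
  extern unsigned get_bit(void);          /* hypothetical: next flag bit      */
  extern unsigned get_bits(unsigned n);   /* hypothetical: next n stream bits */

  void lzss_decode(unsigned char *out, unsigned out_len) {
    unsigned pos = 0;
    while (pos < out_len) {
      if (get_bit() == 0) {
        out[pos++] = (unsigned char)get_bits(8);   /* literal byte */
      } else {
        unsigned dist = get_bits(16) + 1;          /* distance behind cursor */
        unsigned len  = get_bits(8)  + 2;          /* match length           */
        while (len != 0 && pos < out_len) {
          out[pos] = out[pos - dist];              /* byte-by-byte copy so     */
          pos++;                                   /* overlapping matches work */
          len--;
        }
      }
    }
  }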

The main difference between all these algorithms is the way the literals and (distance, length) pairs are encoded, and the constraints that this encoding places on the distance and length of a match.

Encoding with standard algorithms…explained

LZSS-based encoders try to reduce the coding of literals, distances, and lengths using an additional compression scheme:

  • Plain binary encoding does nothing to reduce literals, distances, and lengths, relying entirely on the string parser to detect redundancies.
  • DEFLATE uses two separate dynamic Huffman codes, one to encode literals and lengths and the other to encode distances.
  • LZJU90 uses two bit-oriented prefix encodings for distance and length and leaves literals uncompressed.
  • LZ4 uses a byte-oriented prefix encoding for distance and length and leaves literals uncompressed.
  • LZMA uses a single range coder to encode literals, distances, and lengths, dynamically adapting its probability model as it encodes.

Let’s examine these for suitability.

Plain binary coding

Fixed-width binary coding of the distance and length does not produce an outstanding result, but it is fast and requires no additional state memory when decompressing. We can discount plain binary encoding as it does not meet the “good compression” requirement.
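
As a rough illustration of why, assume a hypothetical fixed layout (not any particular format) of one flag bit per item, a 16-bit distance, and an 8-bit length:

  literal cost = 1 + 8 = 9 bits per byte
  match cost   = 1 + 16 + 8 = 25 bits
  2-byte match: 25 bits vs. 2 x 9 = 18 bits as literals -> 7 bits worse
  3-byte match: 25 bits vs. 3 x 9 = 27 bits as literals -> only 2 bits better
  4-byte match: 25 bits vs. 4 x 9 = 36 bits as literals -> 11 bits better

Short matches, which are by far the most common, barely pay for themselves or do not pay at all, so most of the available redundancy goes unexploited.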

DEFLATE

DEFLATE offers good compression, and decompression runs in a few kilobytes of RAM. Unfortunately, it is quite complex to decompress, using two Huffman codes that are themselves encoded with a canonical Huffman code. It is a clever solution to compress match distance and length, and it even compresses literals, which LZ4 and LZJU90 do not.
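
To give a feel for the bookkeeping involved, the sketch below assigns canonical Huffman codes from a list of code lengths in the style of RFC 1951; it is not the emCompress implementation. Before a DEFLATE decoder can produce a single literal or match, it must build tables like this, in RAM, for both of its Huffman codes.

  /* Canonical Huffman code assignment from code lengths, in the style of
     RFC 1951 (sketch; not the emCompress implementation). */
  #define MAX_BITS 15   /* DEFLATE code lengths are at most 15 bits */

  static void build_codes(const unsigned char len[], unsigned n_symbols,
                          unsigned short code[]) {
    unsigned bl_count[MAX_BITS + 1]  = { 0 };
    unsigned next_code[MAX_BITS + 1] = { 0 };
    unsigned bits, i, c = 0;

    for (i = 0; i < n_symbols; i++) {          /* count codes of each length */
      bl_count[len[i]]++;
    }
    bl_count[0] = 0;
    for (bits = 1; bits <= MAX_BITS; bits++) { /* first code of each length  */
      c = (c + bl_count[bits - 1]) << 1;
      next_code[bits] = c;
    }
    for (i = 0; i < n_symbols; i++) {          /* assign codes to symbols    */
      if (len[i] != 0) {
        code[i] = (unsigned short)next_code[len[i]]++;
      }
    }
  }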

DEFLATE is specified with a minimum match length of three bytes. It stands to reason that you can find many more matches with a length of two than with a length of three. In our experiments, however, using a nonstandard minimum match length of two rather than three results in worse compression, so the default is the best choice.

We can discount DEFLATE as a candidate as it is complex to decode (around 800 lines of C code), and even 2K of memory is more than you would want to set aside on a small microcontroller at startup or during application decompression.

LZMA

LZMA offers excellent compression, without question. It has a minimum match length of two and encodes everything very efficiently using a range coder and with some items encoded in a bit tree. Being able to compress efficiently requires very large tables, from 6 kilobytes to 6 megabytes. And unfortunately, the decoder needs to maintain an identical model to the encoder, so decoding also needs from 6 kilobytes to 6 megabytes.

Not only is it RAM-hungry, the decoder is also very complex and is unsuited to assembly coding: the LZMA decoder in emCompress is approximately 1,000 lines of code excluding common emCompress library code.
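
The heart of that complexity is an adaptive binary range decoder. The sketch below shows one decode step in the general style used by LZMA (simplified, and not the emCompress decoder). Every context bit needs its own probability word like *prob, and LZMA maintains thousands of them, which is where the large decoder state comes from.

  /* One adaptive binary range-decoder step, in the general style of LZMA
     (simplified sketch, not the emCompress decoder). *prob is the current
     estimate of the probability that this bit is 0. */
  typedef struct {
    unsigned             range;   /* current range of the arithmetic coder */
    unsigned             code;    /* value read so far from the stream     */
    const unsigned char *in;      /* compressed input cursor               */
  } RangeDec;

  static unsigned decode_bit(RangeDec *rc, unsigned short *prob) {
    unsigned bound = (rc->range >> 11) * *prob;   /* 11-bit probabilities */
    unsigned bit;

    if (rc->code < bound) {
      rc->range = bound;
      *prob += (2048 - *prob) >> 5;   /* bit is 0: nudge P(0) up   */
      bit = 0;
    } else {
      rc->range -= bound;
      rc->code  -= bound;
      *prob -= *prob >> 5;            /* bit is 1: nudge P(0) down */
      bit = 1;
    }
    if (rc->range < (1u << 24)) {     /* renormalize */
      rc->range <<= 8;
      rc->code   = (rc->code << 8) | *rc->in++;
    }
    return bit;
  }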

The complexity of the decoder alone is enough for immediate rejection. This just isn’t going to fly on a small microcontroller: it is too complex and too RAM-hungry.

What it does do is demonstrate that a minimal match of two bytes with efficient encoding does a much better job of compression than a minimal match of three bytes.

LZJU90

This scheme uses prefix coding of length and distance, leaving literals uncompressed. The prefix coding schemes for distance and length are different, tuned to the restricted set of match lengths (3 to 256) and distances (1 to 32,255).
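
The decoder for such a code can be tiny. The sketch below decodes a generic Elias-gamma-style prefix code as an illustration only; LZJU90's actual codes are tuned differently, and the get_bit() helper is an assumption. Small values get short codes, which is exactly why prefix coding beats fixed-width binary for typical match lengths and distances.

  /* Generic Elias-gamma-style prefix decode (illustration; not LZJU90's
     exact code). A unary prefix gives the number of value bits to read. */
  extern unsigned get_bit(void);   /* hypothetical bit-reader */

  static unsigned decode_gamma(void) {
    unsigned n_bits = 0;
    unsigned value  = 1;

    while (get_bit() == 0) {       /* count leading zeros */
      n_bits++;
    }
    while (n_bits-- != 0) {        /* read the remaining value bits */
      value = (value << 1) | get_bit();
    }
    return value;                  /* 1 costs 1 bit, 2..3 cost 3 bits, 4..7 cost 5 bits, ... */
  }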

The restricted match lengths and distances, imposed by the prefix encoding, limit the effectiveness of the format. For instance, consider two identical 128K files concatenated and then compressed: it is impossible for LZJU90 to encode a 128K match distance and a 128K match length to describe that the second 128K of the 256K combined file is identical to the first 128K. We would like to allow this capability, so unbounded match lengths and distances are a win.

Although LZJU90 is simple and therefore small, and requires no complex decoder state, its compression is simply not as efficient as DEFLATE, which is one of our goals. We must, therefore, reject LZJU90 as a candidate, but it does provide a basis for a new scheme.

LZ4

LZ4 is optimized for speed and sacrifices its ability to compress well: its minimum match length is 4 and so it has no possibility of producing well-compressed images. It does not require additional complex decode state and has a simple decoder.  But for us, LZ4’s byte-prefix encoding scheme can be rejected because it does not meet the “excellent compression” requirement.
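
For reference, the sketch below decodes one sequence of the documented LZ4 block format (simplified; error handling and the end-of-block rules are omitted). Everything is byte aligned, which is what makes LZ4 so fast, and the 4-bit length nibbles plus the fixed 4-byte minimum match are also what limit its compression.

  /* Decode one LZ4 sequence: token, literals, offset, match copy.
     Simplified sketch of the documented block format. */
  static const unsigned char *lz4_sequence(const unsigned char *in,
                                           unsigned char **outp) {
    unsigned char *out = *outp;
    unsigned token     = *in++;
    unsigned lit_len   = token >> 4;             /* high nibble: literal run  */
    unsigned match_len = (token & 0x0F) + 4;     /* low nibble + minmatch (4) */
    unsigned offset, b;

    if (lit_len == 15) {                         /* literal-length extension  */
      do { b = *in++; lit_len += b; } while (b == 255);
    }
    while (lit_len-- != 0) {                     /* copy literals             */
      *out++ = *in++;
    }
    offset = in[0] | ((unsigned)in[1] << 8);     /* 2-byte little-endian      */
    in += 2;
    if ((token & 0x0F) == 15) {                  /* match-length extension    */
      do { b = *in++; match_len += b; } while (b == 255);
    }
    while (match_len-- != 0) {                   /* byte-by-byte match copy   */
      *out = *(out - offset);
      out++;
    }
    *outp = out;
    return in;
  }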

Surely there must be a modern encoder that uses the LZSS scheme but achieves really good compression with a near-zero decode footprint?

SMCS, a new encoding technique

From the above we can characterize what is required, and what must be rejected, when constructing an effective compression scheme:

  • It’s much more common to find short matches than long matches, and LZMA demonstrates excellent compression with a minimum match length of two bytes. Therefore, we require a minimum match length of two bytes, not three or four, to find as many potential matches as possible and to encode short matches efficiently.
  • No complex range coder or Huffman codes, as these introduce a large RAM state into the decoder along with additional decoder complexity and therefore a larger ROM footprint.
  • It must be possible to encode unbounded match lengths and distances, to cover both far-flung matches and very long matches.
  • The prefix coding schemes of LZJU90 offer fairly good compression of lengths and distances, but the limited prefix coding is not suitable for two-byte minimum lengths and unbounded length and distance values.
  • Compressing literals should not be necessary for random binary data, such as machine code; we assume that each byte of literal data has an equal probability of occurring. This is an assumption, but a valid one.

We decided to use a prefix encoding, inspired by LZJU90, in our SMCS algorithm. Match lengths of 2 to 4 bytes, which are very common, are specifically coded with high density (just two bits), with longer lengths using a modified prefix encoding where the “continue/end” flag is infix rather than prefix and is always present (unlike LZJU90). The distance encoding has a similar style of prefix coding, optimized for these values.
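
To make the idea of an infix flag concrete, here is a hypothetical decoder for such a code. It illustrates the principle only and is not the actual SMCS bit layout (which is not spelled out in this article); the get_bit() helper is likewise an assumption. A value bit and a continue/end flag alternate, so the value can grow without bound while small, common values stay short.

  /* Hypothetical infix-flag code, for illustration only (not the actual
     SMCS bitstream). After each value bit, a flag bit says whether more
     value bits follow. */
  extern unsigned get_bit(void);   /* hypothetical bit-reader */

  static unsigned decode_infix(void) {
    unsigned value = get_bit();          /* first value bit          */
    while (get_bit() != 0) {             /* infix flag: 1 = continue */
      value = (value << 1) | get_bit();  /* next value bit           */
    }
    return value;
  }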

Using a set of standard benchmark inputs, and some drawn from typical linked object code in the SEGGER Linker, we settled on a scheme that delivers excellent results. Here’s a sample from the SEGGER Linker, when compressing approximately 50K of code and data that must run in RAM:

LZMA:    30,704 bytes, 60.4% of original
DEFLATE: 32,592 bytes, 64.1% of original
SMCS:    32,878 bytes, 64.7% of original
LZJU90:  34,793 bytes, 68.4% of original
LZ4:     38,175 bytes, 75.1% of original

Although this is a single sample, it is representative of the effectiveness of each of the schemes. Of course, some compression algorithms will do very well on degenerate data whilst others will fare less well. However, SMCS achieves an overall compression comparable to DEFLATE whilst at the same time being much simpler and requiring a much smaller decoder state. SMCS and DEFLATE trade places depending upon input data, but are usually within 1% of each other on identical data sets, with 0.5% being common.

Whilst SMCS is not as efficient as LZMA, it is an order of magnitude less complex, measured by source code lines, and does not require decode state measured in multi-kilobytes or even megabytes for efficient compression.

When time permits, and with the final release of SMCS made, a follow-up post will detail the relative decompressor performance and object code sizes of the various algorithms.

What’s next for SMCS?

There is scope to improve the compression effectiveness of SMCS further by tuning the string parser that finds matches, although it is already fast and efficient using some coding heuristics. Finding an optimal encoding efficiently is not possible without significant effort, because the bit length of a (distance, length) pair depends on its value rather than being constant.

There are a few other tweaks to the string parser, the compressor, and the encoded format that we haven’t described here. Nothing revolutionary, but small efficiencies add up to a significant result.

Conclusion

SMCS compresses well, decodes quickly, requires only a few words of decoder state held in machine registers, and is small. Its compression is better than LZJU90 and LZ4 and comparable with DEFLATE. The decoder is far simpler than those of these formats, delivering an ultra-small ROM footprint.

SMCS is ideal for minimal-footprint decompression on the target. It will be added to the SEGGER Linker and to the startup code in Embedded Studio, for decompression of constants and code copied to RAM during startup, and to emCompress, our general-purpose compression software for embedded systems and desktop computers.

For more information regarding SEGGER, please contact Glenn Moloney at INDES-IDS.

28-12-2017

   
 
 
