Code compression could lead to less overall system die area and therefore less cost. This is significant in the embedded system field where cost is very sensitive. In most of the recent approaches for code compression, only instruction ROM is compressed. Decompression is done between the cache and memory, and instruction cache is kept uncompressed. Additional saving could be achieved if the decompression unit is moved to between CPU and cache, and keeps the instruction cache compressed as well. In this paper, we explored the issues for compressing the instruction cache. In the process, we discuss a high level implementation for a 64-bit, 5-stage pipeline MIPS like processor with compressed instruction cache.
The discussion is carried with the help of a compression algorithm with instruction level random access within the compressed file. In addition we present a branch compensation cache, a small cache mechanism to alleviate the unique branching penalties that branch prediction cannot reduce. Code compression is a general technique. The most commonly used algorithms for block compression are variations of Huffman and arithmetic that compresses serially within the block, byte wise. A feature of Huffman coding is how the variable length codes can be packed together. A more sophisticated version of the Huffman approach is called arithmetic encoding. In this scheme, sequences of characters are represented by individual codes, according to their probability of occurrence. This has the advantage of better data compression, say 5-10%. Run-length encoding followed by either Huffman or arithmetic encoding is also a common strategy.
In this paper we modify the Code Compressed RISC Processor (CCRP) algorithm to compress the instruction cache. The original CCRP algorithm faces the problem of additional stages of DEC and CLB for single 64-bit instruction and granularity of random access. The modified algorithm can solve the above two problems. But branch penalty is another problem encountered by the modified algorithm. In order to avoid this, we suggest a method called Branch Compensation Cache.
A common method for code compression is the Compressed Code RISC Processor (CCRP), which is show in figure
It breaks the program into blocks of n, where n corresponds to the cache line size. Each block is compressed individually by a suitable algorithm. A Line Address Table (LAT) is used to record the address of each compressed cache line within the compressed program. Cache Look aside Buffer (CLB) is added to alleviate LAT lookup cost. During program execution, upon each cache miss, LAT entry cached in CLB is used to calculate the compressed cache line address. The compressed cache line is then fetched from memory, decompressed, and placed into the instruction cache.
The details of rest of the paper is as follows:
- Necessary modifications for the processor
- Compression algorithm
- Branch compensation cache