Reduced Instruction Set Computer (RISC)

RISC Definition

Reduced Instruction Set Computer (RISC) is a modern computer architecture built around a small set of simple instructions and simple modes of execution.

RISC is used in high-performance computers such as vector computers, and it has also been implemented in many other processors, including several Intel i960 microprocessors, DEC's Alpha AXP, the MIPS R4x00, and IBM's (International Business Machines) PowerPC and POWER architecture. Intel's Itanium (IA-64) is often listed alongside them, although it is usually classified as an EPIC design. RISC is also used in ARM (Advanced RISC Machine) and StrongARM (including Intel XScale), Sun Microsystems' SPARC and UltraSPARC, and Hewlett-Packard's PA-RISC. The main alternative approach to central processing unit (CPU) design is CISC (Complex Instruction Set Computing).

RISC History

RISC was first proposed in 1974 by John Cocke, an IBM researcher in Yorktown Heights, New York, who showed that about 20% of a processor's instructions handle about 80% of its total work. The first computer built on the RISC concept was IBM's experimental 801, and the first commercial IBM RISC computer was the IBM RT PC, released in 1986. The term RISC itself was popularized by David Patterson, a professor at the University of California, Berkeley.

RISC Processor Development Phase

The basic idea of the RISC processor can be traced back to a suggestion made by von Neumann in 1946: electronic circuits for logical functions should be implemented only when they are necessary for the system to function or when they are used frequently enough (Heudin, 1992: 18). The central idea of RISC, simplifying the processor hardware by delegating most tasks to software, was therefore already present in the first electronic computers. Like RISC processors, those first electronic computers were direct-execution machines with simple, easy-to-decode instructions.

Seymour Cray, a supercomputer specialist, held the same view. In 1975, based on his studies, Cray concluded that using registers as the place where data is manipulated allowed the instruction design to be very simple. At the time, other processor designers were adding more instructions that referred to memory rather than to registers as in Cray's designs. Until the late 1980s, Cray's computers, the Cray series of supercomputers, were among the highest-performance computers available.

In 1975, a research group at IBM led by George Radin began designing a computer based on John Cocke's concept. After studying how often each instruction appears in compiled programs, Cocke suggested that a high-performance processor does not need to implement complex instructions in hardware if they can be built from the simple instructions it already has. The group produced the 801 computer, which used fixed-format instructions that could each be executed in one clock cycle (Robinson, 1987: 143). The 801, built with ECL (emitter-coupled logic) technology, had 32 registers and separate caches for data and instructions, and was completed in 1979. Because it was experimental, it was never sold commercially.

Berkeley RISC Processor

David Patterson's group at the University of California, Berkeley started the RISC project in 1980 to counter the trend of designing processors with ever more complex instruction sets, which in turn required ever more complex control circuits. Their hypothesis was that implementing complex instructions in the processor's instruction set actually had a negative effect, because most programs rarely use those instructions (Heudin, 1992: 22). Moreover, complex instructions can generally be composed from the simple instructions that are already available.
The RISC-1 processor was designed to support the C language, which was chosen because of its popularity and large user base.

Patterson's group completed the design in 6 months. The chip was fabricated through MOSIS and Xerox using 2-micron NMOS (N-channel metal-oxide-semiconductor) silicon technology. The result was an integrated circuit with 44,500 transistors (Heudin, 1992: 230). The RISC-1 chip was completed in the summer and executed one instruction every 2 microseconds (at a clock frequency of 1.5 MHz), 4 times slower than the target speed. The shortfall was caused by a minor design error, which was later worked around by modifying the assembler.

Based on the evaluation results, even though it only worked at a clock frequency of 1.5 MHz and contained design errors, RISC-1 was proven to be able to execute C language programs faster than several CISC processors, namely the MC68000, Z8002, VAX-11/780, and PDP-11/70.

Almost simultaneously with the RISC-1 fabrication process, another Berkeley team began work on designing RISC-2. The resulting chip was error-free and achieved the targeted operating speed of 330 nanoseconds per instruction (Heudin, 1992: 27-28).

RISC-2 requires only 25% of the chip area of RISC-1, yet has 75% more registers. Although its instruction set is the same as that of RISC-1, its hardware microarchitecture differs. RISC-2 has 138 registers arranged as 8 register windows, whereas RISC-1 has 78 registers arranged as 6 register windows. The pipelines are also organized differently: RISC-1 has a simple two-stage pipeline that overlaps instruction fetch with execution, while RISC-2 has a three-stage pipeline, with one stage each for instruction fetch, operand read and execution, and writing the result back to a register.
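Register windows let each procedure call switch to a fresh set of registers, with the caller's outgoing registers physically overlapping the callee's incoming ones, so parameters are passed without copying. The sketch below maps a logical register number to a physical register under an assumed RISC-II-style layout: 10 global registers plus 8 windows, each contributing 16 new physical registers (10 locals and 6 registers shared with the next window), which gives the 138 registers mentioned above. The constant names, the logical numbering, and the choice that a call increments the window pointer `cwp` are assumptions for illustration only.

```python
NUM_GLOBALS = 10          # r0-r9, visible in every window (assumed layout)
NUM_WINDOWS = 8           # RISC-2 has 8 register windows
NEW_REGS_PER_WINDOW = 16  # 10 locals + 6 registers shared with the next window (assumed)
WINDOWED_REGS = NUM_WINDOWS * NEW_REGS_PER_WINDOW  # 128, used as a circular buffer
TOTAL_REGS = NUM_GLOBALS + WINDOWED_REGS           # 138


def physical_register(logical: int, cwp: int) -> int:
    """Map logical register r0-r31 in window `cwp` to a physical register index.

    Assumed logical layout of each window:
      r0-r9   globals
      r10-r15 "in" registers (same physical registers as the caller's r26-r31)
      r16-r25 locals
      r26-r31 "out" registers (become the callee's r10-r15)
    """
    if not 0 <= logical < 32:
        raise ValueError("logical register must be in 0..31")
    if logical < NUM_GLOBALS:
        return logical  # globals are never windowed
    offset = logical - NUM_GLOBALS  # position 0..21 inside the window
    # Each window starts 16 registers after the previous one, so the last
    # 6 registers of window w are the first 6 registers of window w + 1.
    return NUM_GLOBALS + (cwp * NEW_REGS_PER_WINDOW + offset) % WINDOWED_REGS


if __name__ == "__main__":
    # The caller's r26 (window 0) is the callee's r10 (window 1).
    print(physical_register(26, 0), physical_register(10, 1))
```

Both calls in the usage example return 26, showing the overlap. Real hardware must also save windows to memory when calls nest deeper than the available windows; this sketch simply wraps around and ignores that overflow case.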

The success of both projects spurred the Berkeley team to start the SOAR (Smalltalk on RISC) project in 1983. Its goal was to find out whether RISC architectures work well with the Smalltalk programming language, which made SOAR the first attempt to apply the RISC approach to symbolic processing.

The first version of the SOAR microprocessor was implemented using 4-micron NMOS technology. The resulting chip had 35,700 transistors and worked at a speed of 300 nanoseconds per instruction. The second version designed in 1984-1985 used CMOS (Complementary Metal-oxide Semiconductor) technology. Several RISC architecture processors were heavily influenced by the SOAR microprocessor design, such as the SPARC microprocessor (from Sun Microsystems Inc.) and the KIM20 designed by the French Department of Defense.

Following SOAR, the Berkeley group worked on the SPUR (Symbolic Processing Using RISC) project, which began in 1985. SPUR aimed to design a multiprocessor workstation as part of research on parallel processing (Robinson, 1987: 145), and it also covered research on integrated circuits, computer architecture, operating systems, and programming languages. The SPUR system was built with 6-12 high-performance processors connected to each other and to memory and input/output devices through a modified NuBus. System performance was improved by giving each processor a 128-kilobyte cache, which reduces traffic on the bus and makes memory access more efficient (Heudin, 1992: 31).

Stanford RISC Processor

While Patterson's group at Berkeley was working on RISC-1 and RISC-2, John Hennessy of Stanford University began the MIPS (Microprocessor without Interlocked Pipeline Stages) project in 1981. The key to the MIPS project was combining research experience in compiler optimization with RISC hardware technology. The main goal was to produce a 32-bit general-purpose microprocessor chip designed to execute compiled code efficiently (Heudin, 1992: 34).

The MIPS processor instruction set consists of 31 instructions divided into 4 groups: load and store instructions, arithmetic and logic instructions, control-flow instructions, and miscellaneous instructions. MIPS uses a five-stage pipeline without hardware interlocks between the stages, so the executed code itself must be completely free of conflicts between pipeline stages.
Implemented in 2-micron NMOS technology, the MIPS processor has 24,000 transistors and can execute one instruction every 500 nanoseconds. Because it uses a five-stage pipeline, its control section takes up twice the chip area of the control section of the Berkeley RISC processor.
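Without hardware interlocks, the compiler or assembler must guarantee that no instruction uses a result before the pipeline has produced it. The sketch below shows the simplest form of that guarantee for a single kind of conflict. It assumes a one-cycle load delay: when an instruction reads a register loaded by the instruction immediately before it, a NOP is inserted between them. The `Instr` record, the register names, and the one-cycle delay are hypothetical, chosen for illustration rather than taken from the Stanford MIPS design.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str             # assembly text, for display only
    dest: Optional[str]   # register written, or None
    srcs: tuple           # registers read
    is_load: bool = False


NOP = Instr("nop", None, ())


def insert_load_delay_nops(program: list[Instr]) -> list[Instr]:
    """Insert a NOP wherever an instruction reads a register that was loaded
    by the instruction immediately before it (assumed one-cycle load delay)."""
    scheduled: list[Instr] = []
    for instr in program:
        prev = scheduled[-1] if scheduled else None
        if prev is not None and prev.is_load and prev.dest in instr.srcs:
            scheduled.append(NOP)  # software fills the slot the hardware won't stall for
        scheduled.append(instr)
    return scheduled


if __name__ == "__main__":
    program = [
        Instr("lw r1, 0(r2)", "r1", ("r2",), is_load=True),
        Instr("add r3, r1, r4", "r3", ("r1", "r4")),  # needs r1 right after the load
        Instr("sub r5, r3, r6", "r5", ("r3", "r6")),
    ]
    for instr in insert_load_delay_nops(program):
        print(instr.text)
```

Padding with NOPs is the crudest strategy. A more capable scheduler would first try to move an independent instruction into the delay slot and fall back to a NOP only when none is available.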

MIPS has 16 registers, compared with 138 in RISC-2. This is not a significant problem, because MIPS was designed to move hardware complexity into software, resulting in much simpler and more efficient hardware. Simple hardware reduces design, implementation, and debugging time.
The Stanford team followed the success of MIPS by designing a more sophisticated microprocessor, MIPS-X. The design was carried out by the original MIPS research team plus 6 students and began in the summer of 1984. MIPS-X improved substantially on MIPS and RISC-2, with several key differences:

All MIPS-X instructions are single operations and execute in one clock cycle.
All MIPS-X instructions have a fixed format with an instruction length of 32-bits.
MIPS-X features efficient and simple coprocessor support.
MIPS-X includes support for use as a base processor in shared-memory multiprocessor systems.
MIPS-X features a fairly large on-chip instruction cache (2 kilobytes).
MIPS-X is fabricated with 2 micron CMOS technology.

Like MIPS, MIPS-X is a processor with a pipeline without hardware interlocks. Its software is designed to follow the instruction timing so that there are no conflicts between pipelines (Heudin, 1992: 36-37).

The first chip worked well at 16 MHz, below the 20 MHz target, because of flaws in the branch instructions. A 25 MHz version was built using 1.6-micron CMOS technology. Including the on-chip cache, MIPS-X contained nearly 150,000 transistors on a chip measuring 8 x 8.5 mm (Heudin, 1992: 38).

Characteristics of RISC Processors

RISC processors are not defined only by having few, simple instructions, as their name suggests; they also share many other characteristics, not all of which the designers themselves agree on. There is, however, broad agreement on a number of characteristics that distinguish RISC processors from non-RISC processors:

RISC processors execute one instruction per clock cycle (Robinson, 1987: 144; Johnson, 1987: 153). Research by IBM (International Business Machines) shows that complex instructions appear very rarely in compiled code compared with simple instructions. With good design, simple instructions can be made to execute in one clock cycle. This does not automatically mean, however, that RISC processors execute programs faster than CISC processors.
Instructions on RISC processors have a fixed format, so the instruction control circuitry is simpler, which saves semiconductor chip area. Whereas CISC processors (for example, the Motorola 68000 or Zilog Z8000) use 50%-60% of their chip area for control circuitry, RISC processors need only 6%-10%. The simpler circuitry also makes instruction execution faster (Robinson, 1987: 144; Johnson, 1987: 153).
Only load and store instructions access memory; all other instructions operate on the processor's internal registers. This simplifies the addressing modes and makes it easier to re-execute an instruction when a special condition requires it (Robinson, 1987: 144; Johnson, 1987: 153). For the same reason, designers put more registers on the processor chip; 100 or more registers are common in RISC processors. Because manipulating data in registers is generally faster than manipulating it in memory, RISC processors can potentially run faster.
RISC processors require longer compilation times than CISC processors. Because RISC processors offer a limited choice of instructions and addressing modes, the compiler must be carefully optimized so that it translates programs into efficient sequences of simple instructions suited to the chosen programming language. The close relationship between RISC processor design and programming language makes it possible to design a compiler optimized for the target language.
A single instruction size.
That size is typically 4 bytes.
A small number of data addressing modes, usually fewer than five.
No indirect addressing, which would require one memory access to obtain the address of another operand in memory.
No operations that combine load/store with arithmetic, such as add to memory or add from memory.
No more than one memory-addressed operand per instruction.
No support for arbitrary alignment of data for load/store operations.
At most one use of the memory management unit (MMU) for a data address in an instruction.
An integer register specifier of 5 or more bits, meaning that at least 32 integer registers can be explicitly referenced at once.
A floating-point register specifier of 4 or more bits, meaning that at least 16 floating-point registers can be explicitly referenced at once.

Some processors that implement RISC architectures are the AMD 29000, MIPS R2000, SPARC, Motorola MC88000, HP PA, IBM RT PC, IBM RS/6000, Intel i860, and PowerPC G5.
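With a single 4-byte instruction size, every field sits at the same bit position in every instruction, so decoding reduces to shifting and masking, with no need to first work out how long the instruction is. As an illustration, the sketch below decodes the register-to-register (R-type) format of the MIPS R2000 (MIPS I) instruction set, one of the processors listed above, chosen only because its layout is well documented. Its three 5-bit register fields are examples of the 5-bit integer register specifier that allows 32 registers to be addressed. The function name `decode_r_type` is hypothetical.

```python
# MIPS I R-type layout, from bit 31 down to bit 0:
# opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)

def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-type instruction word into its fixed fields."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26 (0 for R-type)
        "rs": (word >> 21) & 0x1F,      # bits 25-21, first source register
        "rt": (word >> 16) & 0x1F,      # bits 20-16, second source register
        "rd": (word >> 11) & 0x1F,      # bits 15-11, destination register
        "shamt": (word >> 6) & 0x1F,    # bits 10-6, shift amount
        "funct": word & 0x3F,           # bits 5-0, selects the ALU operation
    }


if __name__ == "__main__":
    # 0x012A4020 encodes the MIPS instruction add $t0, $t1, $t2
    print(decode_r_type(0x012A4020))
```

For 0x012A4020 the decoder yields rs = 9 ($t1), rt = 10 ($t2), rd = 8 ($t0), and funct = 0x20 (add). A variable-length CISC decoder, by contrast, has to inspect the first bytes before it knows where the next fields begin.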

Instruction Execution Characteristics

One of the major developments in computing has been the evolution of programming languages. High-level languages (HLLs) allow programmers to express algorithms more concisely, handle much of the detail for them, and support structured programming. They also introduced a new problem, the semantic gap: the difference between the operations provided by an HLL and those provided by the computer architecture. The gap shows up as inefficient execution, large machine programs, and complex compilers. Designers responded with more complex architectures, featuring large instruction sets, dozens of addressing modes, and HLL statements implemented in hardware. These complex instruction sets are intended to:

Make the compiler's job easier.
Improve execution efficiency, because complex operations can be implemented in microcode.
Provide support for more complex and sophisticated HLLs.

To understand RISC, it is therefore necessary to look at the characteristics of instruction execution. The aspects of computation studied are:

Operations: Assignment statements dominate, which indicates that the simple movement of data is important. Such results matter to instruction set designers because they show which types of instructions occur most often and therefore must be supported optimally.
Operands: Patterson's research [PATT82a] examined the dynamic frequency of occurrence of different classes of variables. The results for Pascal and C programs were consistent: most references are to scalar variables. This research examined the dynamic behavior of HLL programs independently of any particular architecture. The [LUND77] study examined DEC-10 instructions dynamically and found that each instruction references an average of 0.5 operands in memory and an average of 1.4 registers. These numbers depend on the architecture and compiler, but they are enough to show that operands are accessed frequently, and therefore that an architecture must support efficient operand access.
Procedure calls: In HLLs, procedure call and return are important because they are among the most time-consuming operations in a compiled program, so it is worth considering how to implement them efficiently. The key factors are the number of parameters and local variables associated with a procedure and the depth of nesting.
Implications: In general, the research suggests three elements that determine the character of RISC architectures:

A large number of registers, to optimize operand referencing;
Careful attention to the design of the instruction pipeline: because conditional branch and procedure call instructions make up a high proportion of instructions, a straightforward instruction pipeline would be inefficient;
A simplified (reduced) instruction set.

RISC Pipelining

Pipelining is a method of performing several tasks at the same time, each at a different stage, with tasks fed continuously to the processing unit so that it is always busy.

This technique can be applied at various levels of a computer system, from a high level, such as an application program, down to a low level, such as the instructions executed by a microprocessor.

In a microprocessor without a pipeline, each instruction runs to completion before the next one can start. In a pipelined microprocessor, while one instruction is being processed, the next instruction can be processed at the same time, but in a different stage. Each instruction therefore passes through a sequence of stages.

Pipelining therefore increases the performance of the microprocessor, because several instructions are in progress at the same time.

Ideally, if processing one instruction takes K stages, a pipelined microprocessor approaches a K-fold speedup over one without a pipeline.
Because several instructions are processed at once, they may need the same resource at the same time, so careful scheduling is required to keep execution correct and smooth. Data dependencies can also arise, for example when an instruction needs a result that a preceding instruction has not yet produced. Jumps need attention too: when an instruction jumps to another memory location, the program counter changes, while the instructions already in the earlier pipeline stages were fetched on the assumption that it would not.
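The K-fold figure is a limit rather than an exact result. Assuming every stage takes one clock cycle and no conflicts occur, n instructions need n × K cycles without a pipeline but only K + (n − 1) cycles with one: K cycles to fill the pipeline, then one completed instruction per cycle. The speedup nK / (K + n − 1) therefore approaches K only as n grows. The sketch below computes this and draws a stage-by-stage timeline; the function names and the stage labels IF, EX, and WB (loosely modelled on RISC-2's three stages) are assumptions for illustration.

```python
def pipeline_cycles(n: int, k: int) -> int:
    """Cycles for n instructions on an ideal k-stage pipeline:
    k cycles to fill it, then one instruction completes per cycle."""
    return 0 if n == 0 else k + n - 1


def unpipelined_cycles(n: int, k: int) -> int:
    """Cycles when each instruction runs all k stages before the next starts."""
    return n * k


def speedup(n: int, k: int) -> float:
    return unpipelined_cycles(n, k) / pipeline_cycles(n, k)


def timeline(n: int, stages: list[str]) -> list[str]:
    """One row per instruction showing the stage it occupies in each cycle."""
    total = pipeline_cycles(n, len(stages))
    rows = []
    for i in range(n):
        cells = ["--"] * total
        for s, name in enumerate(stages):
            cells[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append(f"I{i + 1}: " + " ".join(cells))
    return rows


if __name__ == "__main__":
    for row in timeline(3, ["IF", "EX", "WB"]):
        print(row)
    print(speedup(100, 5))
```

For three instructions on three stages the timeline shows each instruction starting one cycle after the previous one, finishing all three in 5 cycles instead of 9. With 100 instructions and 5 stages the speedup is 500 / 104, about 4.81, still short of 5; conflicts of the kinds described next reduce it further.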

Pipelining applied to a microprocessor can thus be regarded as a distinct architecture: unlike a non-pipelined design, it keeps several instructions in progress at once, each in a different stage of processing.

Two-Stage Pipelining Procedure:

The first stage fetches an instruction and buffers it.
When the second stage is free, the first stage passes it the buffered instruction.
While the second stage is executing that instruction, the first stage uses memory cycles that are not otherwise needed to fetch and buffer the next instruction.
Three difficulties are often encountered when using pipelining:
Concurrent use of the same resources.
Data dependencies.
Handling jumps to another memory location.
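The two-stage fetch/execute procedure above can be modelled by counting clock cycles. This sketch assumes that fetching an instruction takes one cycle, that each instruction executes for a given number of cycles (at least one, so a prefetch always fits inside it), and that a taken jump discards the prefetched instruction because the program counter has changed, which is the third difficulty listed. The function name and its parameters are hypothetical.

```python
FETCH_CYCLES = 1  # assumed cost of fetching one instruction


def two_stage_cycles(exec_cycles, prefetch=True, taken_jumps=frozenset()):
    """Count cycles for a two-stage (fetch, execute) machine.

    exec_cycles: execution time of each instruction in program order (each >= 1)
    prefetch:    if True, stage 1 fetches the next instruction while stage 2 executes
    taken_jumps: indices of instructions that jump, invalidating the prefetched one
    """
    cycles = 0
    buffered = False  # is the next instruction already waiting in the buffer?
    for i, e in enumerate(exec_cycles):
        if e < FETCH_CYCLES:
            raise ValueError("execution must last at least as long as a fetch")
        if not buffered:
            cycles += FETCH_CYCLES  # stage 2 waits while stage 1 fetches
        cycles += e                 # stage 2 executes the instruction
        # During execution, stage 1 used a free memory cycle to prefetch the
        # next sequential instruction, unless this instruction jumped elsewhere.
        buffered = prefetch and i not in taken_jumps
    return cycles


if __name__ == "__main__":
    program = [1, 1, 1]
    print(two_stage_cycles(program, prefetch=False))  # fetch never overlaps
    print(two_stage_cycles(program))                  # fetch hidden after the first
    print(two_stage_cycles(program, taken_jumps={0})) # jump wastes one prefetch
```

For three one-cycle instructions the model gives 6 cycles without prefetching, 4 with it, and 5 when the first instruction is a taken jump, which shows how jumps erode the benefit of overlapping fetch with execution.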

Difference between RISC and CISC

A simple way to see the strengths and weaknesses of the RISC (Reduced Instruction Set Computers) architecture is to directly compare it with its predecessor, the CISC (Complex Instruction Set Computers) architecture.

CISC Approach

The main goal of the CISC architecture is to complete a task in as few lines of machine language as possible. This is achieved by building processor hardware that can understand and execute a sequence of operations. For our example, suppose a CISC processor has a special instruction, which we will call MULT. When executed, this instruction loads two values from memory (here, locations 2:3 and 5:2) into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate location. The whole task therefore takes a single instruction:

MULT 2:3, 5:2

MULT is what is known as a "complex instruction". It operates directly on computer memory and does not require separate load or store instructions. One advantage of this approach is that the compiler has very little work to do to translate a high-level language statement into machine language. Because the code is relatively short, only a small amount of RAM is needed to store the instructions.

RISC Approach

RISC processors use only simple instructions that can be executed in one cycle. The 'MULT' instruction described earlier is therefore divided into three different instructions: "LOAD", which moves data from memory into a register; "PROD", which multiplies two operands held in registers (not in memory); and "STORE", which moves data from a register back to memory. On a RISC processor, the following four lines of machine language produce the same result as the single "MULT" instruction:

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A

At first this seems less efficient: the more lines of instructions there are, the more RAM is needed to store them, and the compiler must also translate each high-level language statement into this four-line sequence.
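The two approaches can be compared on a toy machine. The sketch below is hypothetical throughout: memory is a Python dictionary keyed by location labels such as 2:3, the registers are named A and B, and timing is not modelled at all. It runs the single memory-to-memory MULT and the four-instruction load/store sequence on separate copies of the same memory and checks that both leave the same product in location 2:3.

```python
Memory = dict[str, int]  # hypothetical memory keyed by location labels like "2:3"


def cisc_mult(mem: Memory, dst: str, src: str) -> int:
    """CISC-style MULT: one instruction reads both operands from memory,
    multiplies them, and writes the product back to memory.
    Returns the number of instructions executed."""
    mem[dst] = mem[dst] * mem[src]
    return 1


def run_risc(mem: Memory, program: list[tuple]) -> int:
    """Execute a tiny load/store program; only LOAD and STORE touch memory.
    Returns the number of instructions executed."""
    regs: dict[str, int] = {}
    for op, *args in program:
        if op == "LOAD":     # LOAD reg, addr: memory -> register
            reg, addr = args
            regs[reg] = mem[addr]
        elif op == "PROD":   # PROD r1, r2: r1 = r1 * r2, registers only
            r1, r2 = args
            regs[r1] = regs[r1] * regs[r2]
        elif op == "STORE":  # STORE addr, reg: register -> memory
            addr, reg = args
            mem[addr] = regs[reg]
        else:
            raise ValueError(f"unknown opcode {op}")
    return len(program)


RISC_MULT = [("LOAD", "A", "2:3"), ("LOAD", "B", "5:2"),
             ("PROD", "A", "B"), ("STORE", "2:3", "A")]

if __name__ == "__main__":
    cisc_mem = {"2:3": 6, "5:2": 7}
    risc_mem = dict(cisc_mem)
    print("CISC:", cisc_mult(cisc_mem, "2:3", "5:2"), "instruction(s)", cisc_mem)
    print("RISC:", run_risc(risc_mem, RISC_MULT), "instruction(s)", risc_mem)
```

Both runs store 42 in location 2:3; the RISC version simply spells out as separate instructions the load, multiply, and store steps that the CISC MULT performs internally.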

CISC

Emphasis on hardware
Including complex multi-clock instructions
Memory-to-memory: “LOAD” and “STORE” are built into other instructions
Small code size, low speed
Transistors are used to store complex instructions.
Lots of watts
2-5 GHz
Mapped
PC based selection via BIOS
Needs fans; FCC/CE approval can be an issue
Like PC
Load and Go

RISC

Emphasis on software
Single-clock, reduced instructions only
Register to register: “LOAD” and “STORE” are separate instructions.
Large code size, (relatively) high speed
More transistors are devoted to registers.
A few hundred milliwatts
200-520 MHz
Direct, 32 bit
Custom
High Temp, Low EM Emissions
Custom, efficient and very fast
Difficult; requires a low-level BSP (board support package).

Some examples of RISC and CISC are as follows: