
Tipovi procesora (Processor Types) - Chapter 2 -

Different classes of processors: in order to achieve efficient designs, there exist different classes of processors, such as microcontrollers, RISC processors, Digital Signal Processors (DSP), and multimedia processors.



Presentation Transcript


  1. Tipovi procesora - Chapter 2 -

  2. Different classes of processors
  In order to achieve efficient designs, there exist different classes of processors:
  • Microcontrollers
  • RISC processors
  • Digital Signal Processors (DSP)
  • Multimedia processors
  • Application Specific Instruction Set Processors (ASIP)
  • Other classes

  3. Microcontrollers – classes of embedded processors
  • relatively slow
  • very area efficient
  • intended for control-intensive applications
  • microprogrammed CISC architecture
  • the number of clock cycles differs for various instructions
  • limited computational and storage resources
  • relatively small word-length datapath (8 or 16 bits)

  4. Microcontrollers – classes of embedded processors (cont.)
  • complex instruction set – provides a convenient programming interface, i.e. dense code
  • control-oriented application domain
  • rich set of instructions for bit-level data manipulation and for peripheral components such as timers or serial I/O ports (a small sketch follows this slide)
  • simple processors, such as the 8051 or 6502
  • nowadays reused in customized form as microcontrollers for ESs (embedded systems)
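
  To make the bit-level and peripheral point concrete, here is a minimal timer sketch for an 8051-class microcontroller. It assumes the Keil C51 toolchain and its reg51.h register definitions (TMOD, TH0, TL0, TR0, TF0); the reload value is purely illustrative and not taken from the slides.

    #include <reg51.h>                  /* 8051 special-function register definitions (Keil C51) */

    /* Configure timer 0 as a 16-bit timer and busy-wait for one overflow. */
    void timer0_delay(void)
    {
        TMOD = (TMOD & 0xF0) | 0x01;    /* timer 0, mode 1 (16-bit); leave timer 1 bits untouched */
        TH0  = 0x3C;                    /* illustrative reload value, high byte */
        TL0  = 0xB0;                    /* illustrative reload value, low byte  */
        TR0  = 1;                       /* bit-addressable SFR bit: start timer 0 */
        while (!TF0)                    /* poll the overflow flag */
            ;
        TF0 = 0;                        /* clear the flag */
        TR0 = 0;                        /* stop the timer */
    }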

  5. Microcontroller – block diagram -

  6. Microcontroller – detailed block diagram -

  7. Timer as constituent of microcontroller

  8. RISC – classes of embedded processors
  • evolved from CISC architectures
  • Harvard architecture – separate data and instruction memories
  • pipelined instruction execution
  • offer only a very basic set of instructions
  • instructions are executed at very high speed
  • all instructions have the same size and require the same number of clock cycles to execute

  9. RISC – classes of embedded processors (cont.)
  • load/store architecture
  • large number of general-purpose registers – reduces the number of memory accesses in a machine program
  • for a fixed application, the code size for a RISC exceeds the code size for a CISC
  • popular RISC processors for ESs are the ARM RISC core, the MIPS RISC core, and TriCore
  • low power consumption (around 100 mW), suitable for portable systems with battery supply

  10. Clock Frequency Versus Year for Various Representative Machines

  11. Fundamental attributes
  The key metrics for characterizing a microprocessor include:
  • performance
  • power consumption
  • cost (chip area)
  • high availability (fault tolerance)

  12. Instruction Level Parallelism – Definition The next step in performance enhancement beyond pipelining calls for executing several instructions in parallel. Instruction-Level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations, such as memory loads and stores, integer additions, and floating-point multiplications, to execute in parallel.

  13. Parallel processor systems Parallel processor systems tend to take one of two forms:
  • multiprocessors – relatively large tasks, such as procedures or loop iterations, are executed in parallel
  • instruction-level parallel (ILP) processors – individual instructions are executed in parallel

  14. ILP processors Processors that exploit ILP have been much more successful than multiprocessors in the general-purpose workstation/PC market because they can provide performance improvements on conventional programs, which has not been possible with multiprocessors. The two most common architectures for ILP are:
  • superscalar processors
  • Very Long Instruction Word (VLIW) processors

  15. The structure of ILP processors In the structure of an ILP processor, some of the execution units execute integer operations while the others execute floating-point operations.

  16. What is ILP? ILP processors exploit the fact that many of the instructions in a sequential program do not depend on the instructions that immediately precede them in the program. Let us consider the following sequence:
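
  The slide shows the sequence as a figure that this transcript does not reproduce. As a stand-in, here is a hypothetical five-statement fragment in C (names are illustrative) with the dependency pattern described on the next slide: statements 1, 3 and 5 form a chain, while statements 2 and 4 are independent of all the others.

    /* Hypothetical fragment; variable names are illustrative only. */
    void fragment(int a, int *b, int r5, int r7)
    {
        int r1, r2, r4, r6;
        r1 = a;              /* 1: load                       */
        r4 = r5 + 1;         /* 2: independent of 1, 3 and 5  */
        r2 = r1 * 2;         /* 3: needs the result of 1      */
        r6 = r7 - 3;         /* 4: independent of 1, 3 and 5  */
        *b = r2;             /* 5: needs the result of 3      */
        (void)r4; (void)r6;  /* silence unused-variable warnings */
    }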

  17. What is ILP? – continued The dependencies require that instructions 1, 3, and 5 be executed in order to generate the correct result, but instructions 2 and 4 can be executed before, after, or in parallel with any of the other instructions without changing the result of the program fragment.

  18. Division of responsibilities between the compiler and the hardware If ILP is to be achieved, the following functions must be performed, divided between the compiler and the runtime hardware:
  • the dependencies between operations must be determined (a sketch of such a test follows this list)
  • the operations that do not depend on any operation that has not yet completed must be identified, and
  • these independent operations must be scheduled to execute at some particular time, on some specific functional unit, and must be assigned a register into which the result may be deposited
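
  As an illustration of the first task, here is a minimal sketch (not from the slides) of testing two register-based operations for independence. An operation is modelled as one destination and two source registers; the three checks correspond to the true, anti- and output dependencies discussed on a later slide.

    #include <stdbool.h>

    typedef struct {
        int dest;           /* destination register number */
        int src1, src2;     /* source register numbers     */
    } Op;

    /* Returns true if 'later' may be executed in parallel with 'earlier'. */
    bool independent(Op earlier, Op later)
    {
        if (later.src1 == earlier.dest || later.src2 == earlier.dest)
            return false;   /* true (flow) dependence: read after write */
        if (earlier.src1 == later.dest || earlier.src2 == later.dest)
            return false;   /* antidependence: write after read         */
        if (later.dest == earlier.dest)
            return false;   /* output dependence: write after write     */
        return true;
    }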

  19. Breakdown of tasks between compiler and runtime hardware

  20. Superscalar processors – basic principle Superscalar processors contain hardware that examines a sequential program to locate instructions that can be executed in parallel. This allows them to maintain compatibility between generations and to achieve speedups on programs that were compiled for sequential processors. However, the window of instructions that the hardware examines to select instructions for parallel execution is limited, which can reduce performance. Superscalar processors can achieve speedups when running programs that were compiled for execution on sequential (non-ILP) processors, without requiring recompilation.

  21. Superscalar execution Instead of 'scalar' execution, where in each cycle only one instruction can be resident in each pipeline stage, 'superscalar' execution is used, where two or more instructions can be in the same pipe stage in the same cycle. Superscalar execution allows multiple instructions that are adjacent in program order to be in some stage of processing simultaneously. Superscalar designs require significant replication of resources in order to support fetching, decoding, executing, and writing back multiple instructions in every cycle.

  22. General superscalar organization

  23. Superpipelining – an alternative approach An alternative approach to achieving greater performance is referred to as 'superpipelining'. Superpipelining exploits the fact that many pipeline stages perform tasks that require less than half a clock cycle; such stages can be subdivided so that the pipeline can be clocked at a higher internal rate.

  24. Superscalar vs Superpipeline

  25. Limitations The superscalar approach depends on the ability to execute multiple instructions in parallel. ILP refers to the degree to which, on average, the instructions of a program can be executed in parallel. A combination of compiler-based optimization and hardware techniques can be used to maximize ILP.

  26. Fundamental limitations Fundamental limitations to parallelism with which the system must cope are:
  • data dependencies: true data dependencies, output dependencies, antidependencies (illustrated in the sketch below)
  • procedural dependencies (control dependencies)
  • resource conflicts (structural dependencies)
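
  A minimal illustration of the three data-dependency kinds, using C statements in place of machine instructions (the variable names are hypothetical, not from the slides):

    void dependence_kinds(int x, int w)
    {
        int y, z;

        y = x + 1;      /* (a) write of y                                           */
        z = y * 2;      /*     true data dependence on (a): read after write (RAW)  */

        y = 0;          /* (b) antidependence: this write of y must not move above
                               the read of y in the previous statement (WAR)        */

        y = w - 3;      /* (c) output dependence: writes the same location as (a)
                               and (b); the order fixes the final value (WAW)       */
        (void)z; (void)y;
    }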

  27. Effect of dependencies

  28. Data dependencies

  29. Design issues: ILP versus Machine Parallelism ILP and Machine Parallelism (MP) are two related concepts in processor design, so it is very important to make a clear distinction between them: ILP exists when instructions in a sequence are independent and thus can be executed in parallel by overlapping. ILP is a measure of how many instructions can be executed together on an infinitely wide superscalar-type machine.

  30. ILP vs Machine Parallelism MP is a measure of the ability of the processor to take advantage of ILP. MP is determined by the number of instructions that can be fetched and executed at the same time (the number of parallel pipelines) and by the speed and sophistication of the mechanisms that the processor uses to find independent instructions. Both ILP and MP are important factors in enhancing performance.

  31. Example for ILP and MP The code
  for (i = 0; i < 100; i++) a[i] = a[i] + 1;
  has a considerable amount of parallelism, since every iteration is independent of the others. If we built a machine with 100 functional units and enough memory ports, it would give us a 100x speedup.
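
  A sketch of how that parallelism could be exposed: the loop below performs the same computation unrolled by four, so four independent additions are visible in every iteration. The unroll factor of 4 and the function name are arbitrary illustrations, not something fixed by the slides.

    void increment_all(int a[100])
    {
        /* original form: for (i = 0; i < 100; i++) a[i] = a[i] + 1; */
        for (int i = 0; i < 100; i += 4) {
            /* the four statements below do not depend on each other, so a
               sufficiently wide machine could issue them in the same cycle */
            a[i]     = a[i]     + 1;
            a[i + 1] = a[i + 1] + 1;
            a[i + 2] = a[i + 2] + 1;
            a[i + 3] = a[i + 3] + 1;
        }
    }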

  32. Example for ILP and MP – continued In many cases the amount of ILP is simply the ratio of dependencies (data and structural) and control dependencies to other types of instructions. Fewer branches and true data dependencies will increase ILP. More functional units will increase MP.

  33. Instruction issue and instruction issue policy Machine parallelism is not simply a matter of having multiple instances of each pipeline stage. The processor must also be able to identify ILP and to orchestrate the fetching, decoding, and execution of instructions in parallel. The term instruction issue refers to the process of initiating instruction execution in the processor's functional units. The term instruction issue policy refers to the protocol used to issue instructions.

  34. Instruction issue policies Superscalar instruction issue policies can be grouped into the following three categories:
  • In-order issue with in-order completion
  • In-order issue with out-of-order completion
  • Out-of-order issue with out-of-order completion

  35. Instruction issue policy – examples We assume a superscalar pipeline capable of fetching and decoding two instructions at a time, having three separate functional units, and having two instances of the write-back pipeline stage. The examples assume the following constraints on a six-instruction code fragment:
  • I1 requires two cycles to execute
  • I3 and I4 conflict for the same functional unit
  • I5 depends on the value produced by I4
  • I5 and I6 conflict for a functional unit

  36. In Order Issue and in Order Completion

  37. In Order Issue Out of Order Completion

  38. Out of Order Issue and Out of Order Completion

  39. Another Example of out-of-order execution

  40. Conceptual Description of Superscalar Processing

  41. Superscalar processor - How execution progresses

  42. Superscalar Internal Structure

  43. Another Superscalar Internal Structure

  44. Instruction Flow, Register and Memory Dataflow

  45. VLIW processors – basic principles A VLIW processor architecture requires that programs be recompiled for the new architecture, but it achieves very good performance on programs written in sequential languages such as C or Fortran when these programs are recompiled for a VLIW processor. VLIW is one particular style of processor design that tries to achieve high levels of ILP by executing long instruction words composed of multiple operations. Contrary to the superscalar approach, VLIW processors take a different approach to ILP, relying on the compiler to determine which instructions may be executed in parallel and to provide that information to the hardware.

  46. VLIW instruction & VLIW processor In VLIW processors, each instruction specifies several independent operations that are executed in parallel by the hardware.

  47. Scheduling a sequence of operations for execution on a VLIW processor with 3 execution units – Example Let us consider the following sequence and the VLIW schedule produced for it:
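
  The slide's operation sequence and its schedule appear as figures that this transcript does not reproduce. As a stand-in, here is a hypothetical five-operation sequence and one way a compiler could pack it into long instruction words for a 3-unit VLIW, assuming every operation takes one cycle.

    Operation sequence (names are illustrative):
      op1: t1 = a + b
      op2: t2 = c * d
      op3: t3 = e - f
      op4: t4 = t1 + t2    (needs op1 and op2)
      op5: t5 = t3 * 2     (needs op3)

    One possible VLIW schedule (3 operations per long instruction word):
      word 1:  [ op1 | op2 | op3 ]
      word 2:  [ op4 | op5 | nop ]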

  48. VLIW – different flavours of parallelism The number of operations in a VLIW instruction is equal to the number of execution units in the processor. Each operation specifies what its execution unit will do in the cycle in which the VLIW instruction is issued. There is no need for the hardware to examine the instruction stream to determine which instructions may be executed in parallel. The compiler is responsible for ensuring that all of the operations in an instruction can be executed simultaneously.

  49. Pros and cons of VLIW – advantages The main advantages of VLIW architectures are:
  • simpler instruction issue logic, which often allows VLIW processors to fit more execution units onto a given amount of chip space than superscalar processors
  • the compiler generally has a larger-scale view of the program than the issue logic in a superscalar processor, and is therefore generally better at finding instructions to execute in parallel

  50. Pros and cons of VLIW – disadvantages The most significant disadvantage of VLIW processors is that VLIW programs only work correctly when executed on a processor with the same number of execution units and the same instruction latencies as the processor they were compiled for. Code written for a machine with 4 concurrent integer units could not exploit additional execution units in a later model. Likewise, code optimized for a newer VLIW with 8 concurrent integer units would not function correctly on an older machine with fewer units.
