
Processor Architecture




Presentation Transcript


  1. Processor Architecture Chapter 4

  2. Better Utilization of Memory • There are several concepts relating to memory organization. We have already discussed: • Cache memory • Virtual memory • But to make better use of the hardware, there are further concepts, covered next.

  3. Parts of the Execution • A process is an isolated entity that is managed by the operating system. • A task is a part of a job, sometimes the only part. • A job is a script that is executed to perform a specific set of tasks. • A task represents the execution of a single process or of multiple processes on a compute node. • A thread is the smallest unit of execution; it lies within a process and shares that process's resources with the other threads. • So: job -> one or more tasks -> one or more processes -> one or more threads. Note: a thread can itself split into two or more simultaneously running threads. The sketch below illustrates the process/thread distinction.
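
A minimal C sketch of that distinction (POSIX threads and fork are assumed; compile with -pthread; the names are illustrative): a write made by a thread is visible to its whole process, while a write made by a forked child changes only the child's copy.

#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared = 0;                    /* one copy per process, visible to all its threads */

void *worker(void *arg) {
    (void)arg;
    shared = 42;                   /* a thread's write is seen by the whole process */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(&t, NULL);
    printf("after thread:  shared = %d\n", shared);   /* prints 42 */

    shared = 0;
    pid_t pid = fork();            /* a new process gets its own copy of 'shared' */
    if (pid == 0) {
        shared = 42;               /* modifies the child's copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("after process: shared = %d\n", shared);   /* still prints 0 */
    return 0;
}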

  4. More Concepts • Threads share their process's resources, whereas separate processes do not. • Context switching • This is the act of switching a processor from executing one process (or thread) to executing another. It is called a context switch because the saved state of each process is called its context. A context switch is performed in the kernel by a routine called the dispatcher or scheduler. • A context switch happens when the OS (running preemptively) swaps the state of the processor (the registers, mode, and stack) from one process's or thread's context to another's. At any moment, the processor is at a certain line of code in one thread, with temporary data in registers, a stack pointer into a certain region of memory, and other state information. A preemptive OS can store this state (either in kernel memory or on the process's stack) and load the previously saved state of another process. A user-level illustration is sketched below.
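
The real save/restore happens in architecture-specific kernel assembly, but the obsolescent POSIX <ucontext.h> API permits a user-level sketch of the same mechanism (names here are illustrative): swapcontext() saves the current registers and stack pointer and loads another context's, exactly the dispatcher's job.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];

static void task(void) {
    printf("task: running; switching back\n");
    swapcontext(&task_ctx, &main_ctx);  /* save this context, load main's */
}

int main(void) {
    getcontext(&task_ctx);              /* initialize, then repoint it */
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;       /* where to resume if task() returns */
    makecontext(&task_ctx, task, 0);

    printf("main: switching to task\n");
    swapcontext(&main_ctx, &task_ctx);  /* save main's state, load task's */
    printf("main: resumed\n");
    return 0;
}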

  5. Pipelining vs. Parallel Processing • In both, multiple "things" are processed by multiple functional units. • Pipelining: a computation is broken into a sequence of stages, and each stage is handled by a different functional unit. • Parallel processing: each computation is processed entirely by a single functional unit, with several units running at once. • Example: graphics, sound, etc. can each be handled separately and run at the same time.

  6. Parallel Processing • Multiple outputs are computed in parallel within a clock period. • The effective processing speed is increased by the level of parallelism. • Parallelism can also be used to reduce power consumption. • Two types of parallelism: • Instruction level – implemented through program code; known as parallel or concurrent programming (in OS terms, fork-join operations; see the sketch below). • Thread level – two or more complete processors fabricated on the same silicon chip (multi-core processors), which can execute instructions from two or more programs/threads at the same time. • Examples: the Intel Core Duo has two x86 processors on one chip; the Xbox 360 has 3 PowerPC cores; the PS3 has 9 cores – 1 general-purpose and 8 special-purpose SIMD (Single Instruction, Multiple Data) processors.
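
A minimal fork-join sketch using OpenMP (compile with -fopenmp; the pragma "forks" a team of threads for the loop iterations and "joins" them at the implicit barrier when the loop ends):

#include <stdio.h>

int main(void) {
    enum { N = 1000000 };
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)  /* fork: iterations split across threads */
    for (int i = 0; i < N; i++)
        sum += i * 0.5;                        /* each thread accumulates a partial sum */
                                               /* join: implicit barrier after the loop */
    printf("sum = %.1f\n", sum);
    return 0;
}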

  7. Data Parallelism • Single Instruction, Single Data (SISD): one instruction operates on one data item at a time. • Single Instruction, Multiple Data (SIMD): one instruction operates on several data items at once. The sketch below contrasts the two.
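
A small C sketch of the contrast on x86 (availability of SSE intrinsics is assumed):

#include <stdio.h>
#include <xmmintrin.h>

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4], d[4];

    /* SISD: one instruction handles one data item at a time (four separate adds) */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    /* SIMD: one instruction (addps) adds all four float lanes at once */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(d, _mm_add_ps(va, vb));

    printf("SISD: %g %g %g %g\n", c[0], c[1], c[2], c[3]);
    printf("SIMD: %g %g %g %g\n", d[0], d[1], d[2], d[3]);
    return 0;
}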

  8. Pipelining • Pipelining improves the performance of a system by letting its different stages operate concurrently. • A key feature of pipelining is that it increases the throughput of the system, that is, the number of customers served per unit time, but it may also slightly increase the latency, that is, the time required to serve an individual customer. • For example, a customer in a cafeteria who only wants a salad could pass through a non-pipelined system very quickly, stopping only at the salad stage. A customer in a pipelined system who attempts to go directly to the salad stage risks incurring the wrath of other customers.

  9. Pipelining • In the traditional (unpipelined) system, each instruction uses the whole circuit, and the next instruction starts only after the previous one has finished. In contemporary logic design, circuit delays are measured in picoseconds ("ps", 10^-12 seconds). • Here, we assume the combinational logic requires 300 ps, while loading the result register requires 20 ps, for a total of 320 ps per instruction. In the timing diagram, time flows from left to right, and a series of instructions (named I1, I2, and I3) is written from top to bottom.

  10. Throughput • Throughput is calculated as instructions completed per unit time. For the unpipelined system: throughput = 1 instruction / 320 ps ≈ 3.12 GIPS (giga-instructions per second). • The total time required to perform a single instruction from beginning to end is known as the latency. Here, the latency is 320 ps, the reciprocal of the throughput.

  11. Pipeline • Let us divide the computation performed by our system into three stages, A, B, and C, each requiring 100 ps. • Put pipeline registers between the stages so that each instruction moves through the system in three steps, requiring three complete clock cycles from beginning to end. • I2 is allowed to enter stage A as soon as I1 moves from A to B, and so on. In steady state, all three stages are active, with one instruction leaving and a new one entering the system every clock cycle. We can see this during the third clock cycle of the pipeline diagram, where I1 is in stage C, I2 is in stage B, and I3 is in stage A (simulated in the sketch below).
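
A tiny C simulation of that diagram (stage and instruction names follow the slide):

#include <stdio.h>

int main(void) {
    const char *instr[] = {"I1", "I2", "I3"};
    const char stages[] = {'A', 'B', 'C'};
    /* instruction i occupies stage (cycle - i) during a given clock cycle */
    for (int cycle = 0; cycle < 5; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int i = 0; i < 3; i++) {
            int s = cycle - i;
            if (s >= 0 && s < 3)
                printf("  %s in %c", instr[i], stages[s]);
        }
        printf("\n");
    }
    return 0;
}

Cycle 3 prints "I1 in C  I2 in B  I3 in A", matching the steady state described above.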

  12. The increased latency is due to the time overhead of the added pipeline registers. • In this system, we can cycle the clock every 100 + 20 = 120 ps, giving a throughput of around 8.33 GIPS. Since processing a single instruction requires 3 clock cycles, the latency of this pipeline is 3 × 120 = 360 ps. We have increased the throughput of the system by a factor of 8.33/3.12 ≈ 2.67 at the expense of some added hardware and a slight increase in latency (360/320 ≈ 1.12). The sketch below checks these numbers.
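
A short worked check of the slide's arithmetic (1 instruction per X ps equals 1000/X GIPS):

#include <stdio.h>

int main(void) {
    double logic_ps = 300.0, reg_ps = 20.0;

    /* unpipelined: the whole 320 ps path per instruction */
    double seq_latency = logic_ps + reg_ps;          /* 320 ps */
    double seq_gips    = 1000.0 / seq_latency;       /* ~3.12 GIPS */

    /* pipelined: three 100 ps stages, each followed by a 20 ps register */
    double cycle_ps     = logic_ps / 3.0 + reg_ps;   /* 120 ps */
    double pipe_gips    = 1000.0 / cycle_ps;         /* ~8.33 GIPS */
    double pipe_latency = 3.0 * cycle_ps;            /* 360 ps */

    printf("throughput: %.2f -> %.2f GIPS (x%.2f)\n",
           seq_gips, pipe_gips, pipe_gips / seq_gips);
    printf("latency:    %.0f -> %.0f ps (x%.2f)\n",
           seq_latency, pipe_latency, pipe_latency / seq_latency);
    return 0;
}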

  13. Steps of Pipelining • To introduce pipelining in a processor P, the following steps must be followed: • Subdivide the input process into a sequence of subtasks. These subtasks form the stages of the pipeline, which are also known as segments. • Each stage Si of the pipeline performs its subtask's operation on a distinct set of operands. • When stage Si has completed its operation, the results are passed to the next stage Si+1 for the next operation. • Stage Si then receives a new set of inputs from the previous stage Si-1.

  14. Pipeline Processor • A pipeline processor can be defined as a processor that consists of a sequence of processing circuits, called segments, through which a stream of operands (data) is passed. In each segment, partial processing of the data stream is performed, and the final output is produced once the stream has passed through the whole pipeline. • Any operation that can be decomposed into a sequence of well-defined subtasks can be realized through the pipelining concept, as sketched below.
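
A minimal software analogy (the segment operations are made up for illustration): a stream of operands flows through three segment functions, and on each "clock" every segment partially processes a different element of the stream.

#include <stdio.h>

#define STAGES 3
#define N 5

static int apply(int stage, int x) {  /* partial processing done by each segment */
    switch (stage) {
    case 0:  return x + 1;
    case 1:  return x * 2;
    default: return x - 3;
    }
}

int main(void) {
    int stream[N] = {10, 20, 30, 40, 50};
    /* operand i sits in segment (clock - i) during a given clock tick */
    for (int clock = 0; clock < N + STAGES - 1; clock++)
        for (int i = 0; i < N; i++) {
            int s = clock - i;
            if (s >= 0 && s < STAGES)
                stream[i] = apply(s, stream[i]);
        }
    for (int i = 0; i < N; i++)
        printf("out[%d] = %d\n", i, stream[i]);  /* ((x + 1) * 2) - 3 */
    return 0;
}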

  15. Problems in Pipelining • Data dependencies between successive tasks: pipelining becomes difficult when there are dependencies between instructions in the pipeline, i.e. one instruction cannot start until a previous instruction returns its result. Another problem arises when two instructions try to modify the same data object. These are called data hazards (see the sketch below). • Resource constraints: when a resource is not available at the time of execution, delays are introduced in the pipeline. For example, if one common memory is used for both data and instructions, and a data read/write and an instruction fetch are needed at the same time, only one can be carried out and the other has to wait. • Branch instructions and interrupts in the program: branch instructions alter the normal flow of the program, which stalls the pipeline and hurts performance. Similarly, interrupts postpone the execution of the next instruction until the interrupt has been serviced. Both branches and interrupts have damaging effects on pipelining.
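
A small C illustration of a data dependency (the arrays and values are arbitrary): the second loop carries a read-after-write dependency from one iteration to the next, so its operations cannot overlap the way the first loop's can.

#include <stdio.h>

int main(void) {
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    /* independent iterations: these operations can overlap in a pipeline */
    for (int i = 0; i < 8; i++)
        x[i] = x[i] * 2.0;

    /* dependent iterations (a data hazard in spirit): x[i] cannot be
       computed until x[i - 1] from the previous iteration is ready */
    for (int i = 1; i < 8; i++)
        x[i] = x[i] + x[i - 1];

    printf("x[7] = %g\n", x[7]);   /* prints 72 */
    return 0;
}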

  16. Interrupts • An interrupt is a signal to the processor, emitted by hardware or software, indicating an event that needs immediate attention. • It can be of two types: • Hardware interrupt: raised by an I/O device, e.g. a keyboard or mouse asking the processor to read its input. • Software interrupt: raised by events occurring during program execution; different interrupts may be triggered, e.g. from assembly-language instructions. • Exceptions: exceptional conditions that occur in the code while it executes and hinder the normal functioning of the program. A user-space analogy is sketched below.
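
A user-space analogy using POSIX signals (SIGALRM stands in for a hardware timer interrupt; the handler plays the role of the interrupt service routine):

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void handler(int sig) {     /* the "interrupt service routine" */
    (void)sig;
    got_signal = 1;                /* keep handler work minimal */
}

int main(void) {
    signal(SIGALRM, handler);
    alarm(1);                      /* ask the kernel to "interrupt" us in 1 second */

    while (!got_signal)
        pause();                   /* normal flow is suspended until the signal arrives */

    printf("interrupt serviced; normal execution resumes\n");
    return 0;
}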
