Goals

Goals • Provide an overview of the 8260 device • Allow a quick start of an 8260 design cycle • Gain familiarity with debug issues particular to the 8260 • Create the basis to build further experience

Outline • 8260 Architecture • Application examples • Debug considerations

Outline • 8260 Architecture • Device overview • Core CPU • SIU • CPM

Device overview (block diagram) • EC603e PowerPC Core: 16 KB I-Cache, IMMU, 16 KB D-Cache, DMMU • System Interface Unit: 60x Bus Interface Unit, PowerPC-to-Local Bridge, Local Bus Interface Unit, Memory Controller, Time Counter/PIT, Bus Arbiter, L2 Cache Controller, System Functions • Communications Processor Module: Four Timers, Interrupt Controller, Internal Memory Space, Serial DMAs, Virtual IDMAs, Parallel I/O, 32-bit RISC and Program ROM, Baud Rate Generators • Serial controllers: MCC1, MCC2, FCC1, FCC2, FCC3, SCC1, SCC2, SCC3, SCC4, SMC1, SMC2, SPI, I2C • Serial Interface with Time Slot Assigner: 8 TDMs, MII, 2 UTOPIA

CPU • Based on the MPC603e core • Up to two instructions fetched per clock • Up to three instructions issued and retired per clock • Up to five instructions in execution per clock • Most instructions execute in one clock • Branches can execute in zero clocks

Programming Model • GPR0-GPR31 (32 bits each) • FPR0-FPR31 (64 bits each) • CR, XER, FPSCR • MSR, PVR • CTR, LR • TBU, TBL • SRR0, SRR1 • DEC • SPRn, SPRx

MSR • Bit 0 is the MSB, bit 31 is the LSB; bits not listed are shown as 0 • POW (bit 13): Power management enabled • ILE (bit 15): Interrupt little endian mode • EE (bit 16): External interrupt enable • PR (bit 17): Privilege level • FP (bit 18): Floating point available • ME (bit 19): Machine check enable • FE0 (bit 20) and FE1 (bit 23): Floating point exception mode [0,1] • SE (bit 21): Single step trace enabled • BE (bit 22): Branch trace enabled • IP (bit 25): Exception [interrupt] prefix • IR (bit 26): Instruction address translation enabled • DR (bit 27): Data address translation enabled • RI (bit 30): Recoverable exception • LE (bit 31): Little endian mode
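Because PowerPC numbers bits from the MSB, turning an MSR bit name into a mask is a common source of off-by-one mistakes during debug. The following is a minimal sketch, assuming the bit positions read off the MSR layout on this slide (13 leading zeros, then POW, and so on); the helper names `msr_mask` and `decode_msr` are hypothetical and only illustrate the arithmetic.

```python
# PowerPC bit numbering: bit 0 is the MSB and bit 31 is the LSB.
# Positions are taken from the MSR layout on the slide.
MSR_BITS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25, "IR": 26,
    "DR": 27, "RI": 30, "LE": 31,
}


def msr_mask(name):
    """Return the conventional (LSB = bit 0) mask for a named MSR bit."""
    return 1 << (31 - MSR_BITS[name])


def decode_msr(value):
    """Return the names of all listed MSR bits that are set in a 32-bit value."""
    return [name for name in MSR_BITS if value & msr_mask(name)]


if __name__ == "__main__":
    print(hex(msr_mask("EE")))
    print(decode_msr(0x00001032))
```

A debugger register dump can be passed straight to `decode_msr` to see which modes are active.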

CPU Overview (block diagram) • Instruction Unit: Sequential Fetcher, Instruction Queue, Branch Processing (CTR, CR, LR), Dispatch • Inst. MMU and Inst. Cache • Integer Unit (/ + *, XER) with GPR File (R0-R31) and GP Rename Regs • Floating Point Unit (/ + *) with FPR File (FPR0-FPR31) and FP Rename Regs • Load/Store Unit • System Register Unit • Completion Unit • Data MMU and Data Cache • Main Memory

Execution Units • Execution units operate in parallel • Fetch / Branch • Integer • Floating Point • Load / Store • System • Completion

Fetch / Dispatch • Instructions are fetched in pairs • Non-branch instructions enter the instruction queue • Branch instructions are redirected to the branch unit • Two instructions can be sent to the execution units and one to the branch unit for a total of three issued instructions per clock • All instructions “appear” to execute sequentially

Fetch / Dispatch (diagram) • On each CPU clock, a 64 bit wide transfer arrives from the instruction cache • Instructions fall through to the first open location in the queue • The branch instruction closest to the bottom of the queue is issued to the branch unit (CTR, CR, LR) on each clock • The bottom two non-branch instructions are dispatched to available execution units

Branch • Branches are pre-executed, giving an effective execution time of zero clocks • Instruction queue provides look ahead to determine data dependencies • Unresolved conditional branches are statically predicted under control of the compiler

Subroutine Control Flow • The stack is maintained by software (GPR1) • A branch to a subroutine places the return address into the Link Register (LR) • Instructions save the LR to the stack to allow nested function calls • The LR is then reused for another call • The LR is recalled from the stack to allow a return from the subroutine • Branching to the contents of the LR is the return instruction

Integer • Integer unit directly accesses the GPR file • Rename registers prevent stalls and allow instructions to be un-executed • Most instructions execute in one clock • Divides have been optimized over the 603 to reduce latency by 50%

Floating Point • Floating point unit directly accesses the FPR file • Rename registers prevent stalls and allow instructions to be un-executed (The same as in the integer GPR file) • Supports single (32 bit) and double (64 bit) precision operands • Three stage pipeline accepts one instruction per clock • Supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity) in hardware, eliminating the latency incurred by software exception routines

Load/Store • Responsible for all transfers between the GPR file and main memory • Instructions appear to execute in order • Actual accesses can occur out of order • Loads from cache execute in one clock with a two clock latency • Stores to cache execute in one clock with a latency of three clocks • Speculative loads are placed in the rename registers • Speculative stores remain in the store queue

System • Performs moves to and from SPRs • Doubles as an auxiliary integer unit • Executes add / compare instructions • Executes condition register logical operations • Instructions that affect processor mode force serialization of the processor

Completion • Holds instructions executed in parallel or out of order until they can be retired in order • Retiring an instruction commits its results to the processor state • Simply discarding an instruction from the completion queue effectively un-executes it • Two instructions can be retired per clock

Instruction Set • 68K instructions were based on an accumulator, direct memory model • Example: add (0x00035300).L, D4 adds the memory operand at 0x00035300 directly into data register D4 (registers D0-D7)

Instruction Set • PowerPC instructions are based on a triadic, load/store model • Example: lwz r2,0x00035300 loads the operand into GPR2, then add r6,r2,r4 places GPR2 + GPR4 into GPR6 (registers GPR0-GPR31)

Exceptions • All exceptions cause processing to vector to a predetermined memory location • The base address of the vector table is controlled by the [IP] bit in the MSR • Each vector is placed at a page boundary • 64 instructions can be placed at a vector before hitting the next vector • Reset = 0xnnn00100 • Machine Check = 0xnnn00200 • External Interrupt = 0xnnn00500 • Decrementer = 0xnnn00900 • Etc.

Exceptions • MSR[IP] = 1: vector table in Flash, reset vector at 0xFFF00100 • MSR[IP] = 0: vector table in RAM, reset vector at 0x00000100 • Vector offsets: Reset 0x100, Machine Check 0x200, DSI 0x300, ISI 0x400, External 0x500 • Each vector has room for 64 instructions
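The vector arithmetic on the two slides above can be sketched as follows. The offsets and the two table bases come from the slides; the function names are hypothetical, and the spacing of 0x100 bytes is inferred from the listed Reset through External offsets together with the four-byte PowerPC instruction size.

```python
# Vector offsets as listed on the slides.
VECTOR_OFFSETS = {
    "reset": 0x100,
    "machine_check": 0x200,
    "dsi": 0x300,
    "isi": 0x400,
    "external": 0x500,
    "decrementer": 0x900,
}

VECTOR_SPACING = 0x100      # distance between adjacent vectors, in bytes
INSTRUCTION_BYTES = 4       # every PowerPC instruction is 32 bits wide


def vector_address(name, msr_ip):
    """Return the address the core vectors to for a given MSR[IP] setting."""
    base = 0xFFF00000 if msr_ip else 0x00000000
    return base + VECTOR_OFFSETS[name]


def instructions_per_vector():
    """Number of instructions that fit before the next vector begins."""
    return VECTOR_SPACING // INSTRUCTION_BYTES


if __name__ == "__main__":
    print(hex(vector_address("reset", msr_ip=True)))
    print(hex(vector_address("external", msr_ip=False)))
    print(instructions_per_vector())
```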

Exceptions • Only the Decrementer and the External Interrupt can be masked by the [EE] bit in the MSR • Machine Check exceptions can vector to a routine or force Checkstop state • All other exceptions are synchronous (caused by instruction execution) and are unmaskable

Nesting Exceptions • When an exception occurs, return state is stored in the processor • There is no automated stacking of critical registers • The address of the return instruction is stored in SRR0 • The MSR prior to the exception is in SRR1 • The [EE] bit of the MSR is cleared • Software must save these registers and any other GPRs it uses to a software-maintained stack • The EABI specifies GPR1 to be the stack pointer • The [RI] bit in the MSR is set by software when enough information is saved to allow recovery from a nested exception

Exception Control Flow • The stack is maintained by software (GPR1) • An exception after the completion of an instruction causes flow to be directed to the ISR • The return address is placed into SRR0 by the hardware • MSR[RI] is cleared by the exception hardware and set by software after the SRRs have been saved • Instructions save the SRRs to the stack to allow nested exceptions • While MSR[RI] is set, it is safe for exceptions to occur • An exception while MSR[RI] is cleared causes a machine check event • Breakpoints are exceptions! • Software clears MSR[RI] just before it restores the SRRs • The SRRs are recalled from the stack to allow a return from the exception • rfi returns to the interrupted code
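The save and restore ordering described on the last two slides can be modelled in a few lines. This is a simplified sketch, not a description of the silicon: the `Core` class and its method names are hypothetical, SRR1 is modelled as a full copy of the MSR, and an exception taken while MSR[RI] is clear is modelled as a Python error standing in for the machine check event mentioned on the slide.

```python
MSR_EE = 0x8000  # external interrupt enable (PowerPC bit 16)
MSR_RI = 0x0002  # recoverable exception (PowerPC bit 30)


class Core:
    """Toy model of SRR0/SRR1 handling around an exception."""

    def __init__(self, pc, msr):
        self.pc, self.msr = pc, msr
        self.srr0 = self.srr1 = 0
        self.stack = []          # software-maintained stack

    def take_exception(self, vector):
        """Hardware step: save return state, clear EE and RI, vector."""
        if not self.msr & MSR_RI:
            raise RuntimeError("exception taken while MSR[RI] is clear")
        self.srr0, self.srr1 = self.pc, self.msr
        self.msr &= ~(MSR_EE | MSR_RI)
        self.pc = vector

    def prologue(self):
        """Software step: stack the SRRs, then declare the state recoverable."""
        self.stack.append((self.srr0, self.srr1))
        self.msr |= MSR_RI

    def epilogue_and_rfi(self):
        """Software step: clear RI, restore the SRRs, then rfi."""
        self.msr &= ~MSR_RI
        self.srr0, self.srr1 = self.stack.pop()
        self.pc, self.msr = self.srr0, self.srr1


if __name__ == "__main__":
    core = Core(pc=0x1000, msr=MSR_EE | MSR_RI)
    core.take_exception(0x500)   # outer exception
    core.prologue()
    core.take_exception(0x300)   # nested exception, safe because RI is set
    core.prologue()
    core.epilogue_and_rfi()      # back in the outer handler
    core.epilogue_and_rfi()      # back in the interrupted code
    print(hex(core.pc), hex(core.msr))
```

Since breakpoints are exceptions too, the window between `take_exception` and `prologue` is exactly where a debugger breakpoint would destroy the saved return state in this model.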

Cache • Independent instruction and data caches implement an internal Harvard architecture • Each cache is 16 KB, four-way set associative • Caching of separate memory areas is controlled by the MMU

Cache Organization • 128 sets, each with four ways (Way 0 to Way 3), giving Block 0 through Block 511 • Each block holds an address tag, state bits, and eight words (Words 0-7) • The 32-bit address (bits 31 to 0) splits into: address tag (20 bits), set select (7 bits), word select, byte select
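The address split on this slide can be checked with a short sketch. The tag (20 bits) and set select (7 bits) widths come from the slide; the word (3 bits) and byte (2 bits) widths are inferred from eight words of four bytes per block. The function names are hypothetical, and the shifts use conventional LSB-based arithmetic.

```python
BYTES_PER_WORD = 4
WORDS_PER_BLOCK = 8
SETS = 128
WAYS = 4


def split_address(addr):
    """Split a 32-bit address into (tag, set index, word, byte)."""
    byte = addr & 0x3                 # 2 bits: byte within the word
    word = (addr >> 2) & 0x7          # 3 bits: word within the block
    set_index = (addr >> 5) & 0x7F    # 7 bits: one of 128 sets
    tag = (addr >> 12) & 0xFFFFF      # 20 bits: stored in the address tag
    return tag, set_index, word, byte


def cache_size_bytes():
    """Total capacity implied by the organization."""
    return SETS * WAYS * WORDS_PER_BLOCK * BYTES_PER_WORD


if __name__ == "__main__":
    print(split_address(0x00035300))
    print(cache_size_bytes())
```

Two addresses with the same set index but different tags compete for the same four ways, which is worth remembering when a small loop unexpectedly thrashes the cache.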

Cache Operation • Each cache block (or line) can be in one of three states (MEI protocol) • M = modified (or dirty) • Resides in cache and differs from memory • E = exclusive (resident and clean) • Resides in cache and is identical to memory • I = invalid (not resident) • The “shared” state of the full MESI protocol is not supported • Would allow synchronization of multiply cached blocks • There is no cache coherency for the instruction cache

Cache control • Hardware implementation dependent registers (HIDn) control cache function • Enabling • Invalidate • Locking • Supervisor instructions provide block level control • Allocate, flush, invalidate, store, touch, zero • Ability to store a given block of memory into the cache is controlled by the MMU • Each block or page in the MMU has WIMG bits • (Write-through, Inhibited, Global, Guarded)

MMU • The MMU provides for both memory translation and access control • The system boots in Real (un-translated) mode • To effectively use the caches, the MMU must be used in block or page mode • Effectively, a null translation is performed

Protection • The primary use of the MMU in embedded applications is for cache control and access protection • The WIMG bits are set for each page • W = write-through (applicable only to data cache) • I = inhibited • M = memory coherency supported in hardware • G = guarded (indicates that memory is ill-behaved) • I/O spaces • All accesses are forced to be in order • No speculative reads or pre-fetches
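As a small illustration of the WIMG attributes, the sketch below decodes a 4-bit WIMG value. Treating the field as a 4-bit number with W as the most significant bit is an assumption made for this example, and the helper names are hypothetical.

```python
WIMG_NAMES = ("W", "I", "M", "G")


def decode_wimg(bits):
    """Decode a 4-bit WIMG value (W assumed to be the most significant bit)."""
    return {name: bool(bits & (1 << (3 - i))) for i, name in enumerate(WIMG_NAMES)}


def is_cacheable(bits):
    """A page is cacheable unless I (inhibited) is set."""
    return not decode_wimg(bits)["I"]


def allows_speculative_access(bits):
    """Guarded memory forbids speculative reads and pre-fetches."""
    return not decode_wimg(bits)["G"]


if __name__ == "__main__":
    print(decode_wimg(0b0011))
    print(is_cacheable(0b0100), allows_speculative_access(0b0001))
```

An I/O page would typically be described with both I and G set in this encoding, so that accesses stay uncached and in order.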

Translation • Block or page translation allows the full use of a virtual memory model • Block translation provides a memory space of 2^32 bytes • Page translation provides a virtual memory space of 2^52 bytes • System must be debugged with RTOS tools • Emulators and hardware debuggers don’t support it

Real mode • The 32-bit logical address is used directly as the 32-bit physical address • WIMG: W = 0 (write-back), I = 0 (cache enabled), M = 1 (data is global), G = 1 (memory is guarded)

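BAT mode • Logical address fields: 4, 11 and 17 bits • BAT register n holds BEPI (15 bits), BL (11 bits), BRPN and WIMG • The upper logical address bits are compared with BEPI under the BL mask (&) and combined with BRPN (+) to form the physical address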

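Page mode • Logical address fields: segment select (4 bits), page index (16 bits), byte offset (12 bits) • The selected segment register supplies 24 bits, giving a virtual address of 24 + 16 + 12 bits • The upper 40 bits are looked up in the TLB / page table, which returns WIMG and a 20-bit physical page number • Physical address: page number (20 bits) + byte offset (12 bits)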
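The field widths in the page mode diagram (4/16/12 in, 24/16/12 virtual, 20/12 out) can be traced with a sketch like the one below. It is only an arithmetic illustration: the function names are hypothetical, `segment_ids` stands in for the 16 segment registers, and the page table is modelled as a plain dictionary keyed by the 40-bit virtual page number rather than the hashed table real hardware walks.

```python
def split_logical(ea):
    """Split a 32-bit logical address into its 4/16/12-bit fields."""
    segment = (ea >> 28) & 0xF
    page_index = (ea >> 12) & 0xFFFF
    offset = ea & 0xFFF
    return segment, page_index, offset


def virtual_address(ea, segment_ids):
    """Form the 52-bit virtual address: 24 bits from the segment register + 16 + 12."""
    segment, page_index, offset = split_logical(ea)
    segment_id = segment_ids[segment] & 0xFFFFFF
    return (segment_id << 28) | (page_index << 12) | offset


def physical_address(ea, segment_ids, page_table):
    """Look up the 40-bit virtual page number and rebuild the 32-bit address."""
    vpn = virtual_address(ea, segment_ids) >> 12
    physical_page = page_table[vpn] & 0xFFFFF
    return (physical_page << 12) | (ea & 0xFFF)


if __name__ == "__main__":
    segment_ids = [0] * 16
    segment_ids[3] = 0xABCDEF                 # hypothetical segment register value
    page_table = {0xABCDEF0012: 0x00400}      # hypothetical mapping
    print(hex(virtual_address(0x30012345, segment_ids)))
    print(hex(physical_address(0x30012345, segment_ids, page_table)))
```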

Reset operation

Reset Types • Power-on reset is used to align all logic from a chaotic state after Vcc stabilizes • The PLL then begins to lock • Hard reset is analogous to the normal reset on other processors • The PLL is not affected • Soft reset can be used to initiate a warm start • Not commonly used • Not driven or monitored by the emulator • Basically, a non-returnable exception to the reset vector

Reset Sequence • Power-on reset: POR asserted, HRESET & SRESET asserted, PLL locks, RSTCONF sampled, internal logic reset, HRESET & SRESET negated • Hard reset: HRESET asserted, HRESET & SRESET asserted, RSTCONF sampled, internal logic reset, HRESET & SRESET negated • Soft reset: SRESET asserted, internal logic reset, SRESET negated

Memory Map (figure) • Three maps are shown: the startup boot map before the Config Word, the boot map after the Config Word, and the application target map • At boot, CS0 is active for one of two large areas of the address space. All other chip selects are invalid. • The IMMR appears in all three maps • In the target map, CS0 selects Flash, CSi selects I/O, and CSx,y,z select RAM

Memory Map Implications • Since the Flash memory accessed by CS0 occupies one of two large areas in the address space, boot code can be linked to execute in a number of different locations • Any branches will change the NIA from the boot location to the linked location • All other chip selects are off • IMMR RAM is still available • CS0 must be reduced in scope before activating other chip selects • Be careful not to pull the rug out from under the boot code when reducing CS0 • BSP re-entry issues: • Altering chip select option registers while assuming the value in the Valid bit • Can the chip selects to the RAM and Flash be altered while running out of either?

Memory Map Init Issues • Three different factors can enhance (confuse) the boot process: • The MSR[IP] • The reset vector can be 0x0000_0100 or 0xfff0_0100 • Determined by the Reset Configuration Word • Not changed by an SRESET • CS0 scope • CS0 responds to either the upper or lower end of the memory map • It must be changed while it is being used • It may have already been reduced by a previous pass through the BSP • Code link results • Execution can start in code that is linked to a different address than the boot vector • Only the address lines within the memory device are significant • PC Relative addressing will solve this, right? WRONG! • The first branch will set the NIA MSBs to the current execution value
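The aliasing behind these issues can be illustrated with a small sketch. It is one reading of the slide, under stated assumptions: a hypothetical 1 MB Flash, a hypothetical link base of 0xFE000000, and helper names invented for the example. Only the address lines inside the device are decoded, so the boot address and the linked address reach the same Flash location, while the upper NIA bits depend on which kind of branch was taken.

```python
FLASH_SIZE = 0x00100000          # hypothetical 1 MB boot Flash
LINK_BASE = 0xFE000000           # hypothetical address the code was linked at
BOOT_VECTOR = 0xFFF00100         # reset vector with MSR[IP] = 1


def flash_offset(addr):
    """Only the address lines within the device are significant."""
    return addr & (FLASH_SIZE - 1)


def relative_branch(nia, displacement):
    """A PC-relative branch keeps the upper bits of the current NIA."""
    return (nia + displacement) & 0xFFFFFFFF


def absolute_branch(target):
    """An absolute branch loads the linked address into the NIA."""
    return target & 0xFFFFFFFF


if __name__ == "__main__":
    linked_target = LINK_BASE + 0x200
    via_relative = relative_branch(BOOT_VECTOR, 0x100)
    via_absolute = absolute_branch(linked_target)
    print(hex(via_relative), hex(via_absolute))
    print(flash_offset(via_relative) == flash_offset(via_absolute))
```

Both branches fetch the same instruction while CS0 still covers the large boot area, but after CS0 is reduced only one of the two NIA values still falls inside a valid chip select.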

RTOS Boot Sequences (figure) • Flash holds the Boot Code, the BSP, and optionally a compressed application image • Boot code loads an external application image over a communication channel or backplane • Alternatively, boot code decompresses and relocates the application from Flash • RAM holds the uncompressed application image along with data, stack, heap, etc. • The IMMR and I/O occupy their own regions • Chip Select x is defined by a Base Register (Base Address, V) and an Option Register (Mask, Options), set up by the BSP

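Endian Bus Connections (8 bit device) • 68K: bus bits 31-24 form the MS byte lane and 7-0 the LS byte lane; the 8 bit device sits on the MS byte lane (bus 31 to device 7, bus 24 to device 0) • X86: same bus numbering; the 8 bit device sits on the LS byte lane (bus 7 to device 7, bus 0 to device 0) • PPC: bus bits 0-7 form the MS byte lane and 24-31 the LS byte lane; the 8 bit device sits on the MS byte lane (bus 0 to device 7, bus 7 to device 0)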

Big Endian Bus • 8 bit device: bus bits 0-7 (device bits 7-0) • 16 bit device: bus bits 0-7, 8-15 (device bits 15-8, 7-0) • 32 bit device: bus bits 0-7, 8-15, 16-23, 24-31 (device bits 31-24, 23-16, 15-8, 7-0) • 8260 64 bit bus, PowerPC numbering (conventional numbering): 0-7 (63-56) MS byte lane, 8-15 (55-48), 16-23 (47-40), 24-31 (39-32), 32-39 (31-24), 40-47 (23-16), 48-55 (15-8), 56-63 (7-0) LS byte lane
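When wiring a schematic or reading a logic analyzer trace, the two numbering schemes on this slide have to be converted constantly. A minimal sketch of that conversion follows; the function names are hypothetical, and lane 0 is taken to be the MS byte lane as in the figure.

```python
def ppc_to_conventional(bit, width=64):
    """Convert a PowerPC bit number (0 = MSB) to conventional numbering (0 = LSB)."""
    return width - 1 - bit


def byte_lane(ppc_bit):
    """Return the byte lane index for a PowerPC data bit (lane 0 = MS byte lane)."""
    return ppc_bit // 8


def lane_bits(lane, width=64):
    """Return ((first, last) in PowerPC numbering, (first, last) in conventional numbering)."""
    first, last = lane * 8, lane * 8 + 7
    return (first, last), (ppc_to_conventional(first, width),
                           ppc_to_conventional(last, width))


if __name__ == "__main__":
    for lane in range(8):
        print(lane, lane_bits(lane))
```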

Configuration Word • The configuration word is latched from Flash memory during the reset cycle • A 32 bit value is loaded 8 bits at a time from the high order bits of the data bus • Immune to boot memory width • The RSTCONF pin allows the configuration word to be forced to all zero • Multiple 8260s can access the same memory device

Configuration Word Contents • First row of fields: EARB, EXMC, CDIS, EBM, BPS, CIP, ISPS, L2CPC, DPPC, -, ISB • EARB: External arbitration • EXMC: External memory controller • CDIS: Core disable • EBM: External bus mode • BPS: Boot port size • CIP: Core initial prefix • ISPS: Internal space port size • L2CPC: L2 cache control pins • DPPC: Data parity pin configuration • ISB: Internal space base address • Second row of fields: BMS, BBD, MMR, LBPC, APPC, CS10PC, -, MODCK_H • BMS: Boot memory space • BBD: Busy bus disable • MMR: Mask masters request • LBPC: Local bus pin configuration • APPC: Address parity pin configuration • CS10PC: CS10 pin configuration • MODCK_H: MODCK high order bits

Configuration Word Format

8 bit wide boot device
Address offset from CS0    60x bus MSB byte lane (0-7)
0x00                       Byte 0
0x01                       Ignored
0x08                       Byte 1
0x09                       Ignored
0x10                       Byte 2
0x11                       Ignored
0x18                       Byte 3
0x19                       Ignored

32 bit wide boot device
Address offset from CS0    Lane (0-7)    Lane (8-15)    Lane (16-23)    Lane (24-31)
0x00                       Byte 0        Ignored        Ignored         Ignored
0x04                       Ignored       Ignored        Ignored         Ignored
0x08                       Byte 1        Ignored        Ignored         Ignored
0x0C                       Ignored       Ignored        Ignored         Ignored
0x10                       Byte 2        Ignored        Ignored         Ignored
0x14                       Ignored       Ignored        Ignored         Ignored
0x18                       Byte 3        Ignored        Ignored         Ignored
0x1C                       Ignored       Ignored        Ignored         Ignored
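The table implies that the four configuration bytes sit at offsets 0x00, 0x08, 0x10 and 0x18 from the start of the boot device regardless of its width. The sketch below assembles the 32 bit word from a byte-addressed Flash image on that basis. Treating Byte 0 as the most significant byte is an assumption, and the function name and sample values are hypothetical.

```python
# Offsets of Byte 0..Byte 3 from the base of CS0, as listed in the table.
CONFIG_BYTE_OFFSETS = (0x00, 0x08, 0x10, 0x18)


def read_config_word(flash):
    """Assemble the 32-bit configuration word from a byte-addressed Flash image.

    Byte 0 is assumed to be the most significant byte of the word.
    """
    word = 0
    for offset in CONFIG_BYTE_OFFSETS:
        word = (word << 8) | flash[offset]
    return word


if __name__ == "__main__":
    image = bytearray(0x20)
    # Hypothetical configuration bytes, for illustration only.
    image[0x00], image[0x08], image[0x10], image[0x18] = 0x0C, 0xB2, 0x36, 0x45
    image[0x01] = 0xFF           # an ignored location has no effect
    print(hex(read_config_word(image)))
```

Because only one byte lane is sampled, the same image layout works whether the boot device is 8 or 32 bits wide, which is the "immune to boot memory width" property noted earlier.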

Configuring a single 8260 (figure) • Two arrangements are shown: an 8260 with RSTCONF tied to Vcc, and an 8260 whose A bus and D bus connect to the Boot Flash

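Configuring multiple 8260s (figure) • The master 8260 connects its A bus and D bus to the Boot Flash • Slave 1 through Slave 7 share the buses • Each slave RSTCONF input is driven by a master address line: A0 for Slave 1 through A6 for Slave 7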

SIU • The SIU contains the logic to interface the external system components to the 8260 • Contains all of the glue logic needed for a typical embedded application
