KeyStone 1 + ARM device memory System MPBU Application team
Agenda
• Overview of the 6614 TeraNet
• Memory System – DSP core point of view
  • Overview of memory map
  • MSMC and external memory
• Memory System – ARM point of view
  • Overview of memory map
  • ARM subsystem access to memory
• ARM-DSP communication
TCI6614 Functional Architecture
[Block diagram. Recoverable labels: ARM Cortex-A8 (32KB L1 P-Cache, 32KB L1 D-Cache, 256KB L2 Cache); C66x CorePac (32KB L1 P-Cache, 32KB L1 D-Cache, 1024KB L2 Cache), cores @ 1.0 GHz / 1.2 GHz; Memory Subsystem (MSMC, 2MB MSM SRAM, 64-bit DDR3 EMIF); Coprocessors (RAC, TAC, RSA, VCP2, TCP3d, FFTC, BCP); Debug & Trace, Boot ROM, Semaphore, Power Management, PLL, EDMA; HyperLink; TeraNet; Multicore Navigator (Queue Manager, Packet DMA); Network Coprocessor (Security Accelerator, Packet Accelerator); peripheral blocks.]
C6616 TeraNet Data Connections
[Connection diagram. Recoverable labels: CPUCLK/2 256-bit TeraNet connecting the EDMA_0 transfer controllers, DebugSS, HyperLink, MSMC, DDR3 and Shared L2; CPUCLK/3 128-bit TeraNet connecting the EDMA_1,2 transfer controllers, the cores (via XMC), SRIO, Network Coprocessor, TCP3e, TCP3d, TAC, RAC, FFTC, VCP2 (x4), AIF, QMSS and PCIe. M = master port, S = slave port.]
• C6616 TeraNet facilitates high-bandwidth communication links between DSP cores, subsystems, peripherals, and memories.
• TeraNet supports parallel orthogonal communication links.
• To evaluate the potential throughput of a communication link, consider the peripheral bit width and the speed of the TeraNet.
• Most communication links are possible, but some are not, and some are supported only by particular Transfer Controllers. Details are provided in the C6616 Data Manual.
C6614 TeraNet Data Connections
[Connection diagram. Recoverable labels: CPUCLK/2 256-bit TeraNet 2A connecting the EDMA_0 transfer controllers, DebugSS, HyperLink, MSMC, DDR3 and Shared L2; CPUCLK/2 256-bit TeraNet 2B connecting the ARM (through an MPU) and the cores (via XMC); CPUCLK/3 128-bit TeraNet 3A connecting the EDMA_1,2 transfer controllers, SRIO, Network Coprocessor, TCP3e, TCP3d, TAC, RAC, FFTC, VCP2 (x4), AIF, QMSS and PCIe. M = master port, S = slave port.]
MSMC Block Diagram
[Block diagram. Recoverable labels: CorePac 0 to CorePac 3, each with its own XMC and MPAX, connected over 256-bit links to four CorePac slave ports; MSMC datapath with arbitration; system slave port for shared SRAM (SMS) and system slave port for external memory (SES), each 256 bits wide with its own Memory Protection and Extension Unit (MPAX); Shared RAM, 2048 KB, with EDC; MSMC core and events; MSMC system master port (256-bit) to SCR_2_B and the TeraNet; MSMC EMIF master port (256-bit) to the DDR.]
XMC – External Memory Controller
The XMC is responsible for:
• Address extension/translation
• Memory protection for addresses outside the C66x CorePac
• Shared memory access path
• Cache and pre-fetch support
User control of the XMC:
• MPAX registers – Memory Protection and Extension Registers
• MAR registers – Memory Attributes Registers
Each core has its own set of MPAX and MAR registers!
The MPAX Registers
• Translate between logical (32-bit) and physical (36-bit) addresses
• 16 registers (64 bits each) control up to 16 memory segments
• Each register translates logical memory into physical memory for its segment.
• Segment definition in the MPAX registers:
  • Segment size = 5 bits; power of 2; smallest segment size 4K, up to 4GB
  • Logical base address (up to 20 bits) holds the upper bits of the logical segment base address. The lower N bits are zero, where N is determined by the segment size:
    • For segment size 4K, N = 12 and the base address uses 20 bits.
    • For segment size 8K, N = 13 and the base address uses only 19 bits.
    • For segment size 1G, N = 30 and the base address uses only 2 bits.
The MPAX Registers
• Segment definition in the MPAX registers (continued):
  • Physical (replacement) base address (up to 24 bits) holds the upper bits of the physical segment base address. The lower N bits are zero, where N is determined by the segment size:
    • For segment size 4K, N = 12 and the base address uses up to 24 bits.
    • For segment size 8K, N = 13 and the base address uses up to 23 bits.
    • For segment size 1G, N = 30 and the base address uses up to 6 bits.
  • Permission types allowed in this address range:
    • Three bits are dedicated to supervisor mode (read, write, execute)
    • Three bits are dedicated to user mode (read, write, execute)
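The segment rules above can be sketched as plain arithmetic. This is a host-side illustration only, not TI code: the helper names are hypothetical, and the size encoding (SEGSZ = log2(size) - 1) is inferred from the values used in this deck (1M = 0x13, 1G = 0x1D).

```c
#include <stdint.h>

/* Hypothetical helper: encode a power-of-two segment size (>= 4 KB).
 * Assumed encoding, inferred from the slides: SEGSZ = log2(size) - 1. */
static unsigned mpax_segsz(uint64_t size_bytes)
{
    unsigned bits = 0;
    while ((1ULL << bits) < size_bytes)
        bits++;
    return bits - 1;
}

/* N: number of low address bits that pass through untranslated. */
static unsigned mpax_offset_bits(unsigned segsz)
{
    return segsz + 1;
}

/* True when a 32-bit logical address falls inside the segment. */
static int mpax_match(uint32_t logical, uint32_t logical_base, unsigned segsz)
{
    uint32_t mask = (uint32_t)((1ULL << mpax_offset_bits(segsz)) - 1);
    return (logical & ~mask) == (logical_base & ~mask);
}

/* 36-bit physical address: replacement base plus the low N logical bits. */
static uint64_t mpax_translate(uint32_t logical, uint64_t phys_base,
                               unsigned segsz)
{
    uint64_t mask = (1ULL << mpax_offset_bits(segsz)) - 1;
    return (phys_base & ~mask) | ((uint64_t)logical & mask);
}
```

For a 1M segment at logical 0xE000 0000 mapped to physical 0x0 0C00 0000, the low 20 bits of the logical address are carried over unchanged and the upper bits are replaced.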
The MPAX Registers The following table summarizes the names and addresses of the MPAX registers:
The MAR Registers
• MAR = Memory Attributes Registers
• 256 registers (32 bits each) control 256 memory segments
• Each segment is 16 MBytes (4 GB / 256), covering logical addresses 0x00000000 to 0xffffffff
• The first 16 registers are read only. They control the core's internal memories.
• Each register controls the cache-ability of the segment (bit 0) and the pre-fetch-ability (bit 3). All other bits are reserved and set to 0
• All MAR bits are set to zero after reset
The MAR Registers
The following table gives the names, segments and addresses of some of the MAR registers:
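The mapping from a logical address to its MAR register can be sketched as follows. This is an illustration with hypothetical helper names; it assumes one MAR per 16 MB (index = address shifted right by 24 bits), a 4-byte register stride, and a MAR block base of 0x01848000, which matches MAR 224 sitting at 0x01848380.

```c
#include <stdint.h>

#define MAR_BASE_ADDR 0x01848000u /* assumed base: MAR224 = base + 224 * 4 */
#define MAR_PC_BIT    (1u << 0)   /* cache-ability, bit 0 */
#define MAR_PFX_BIT   (1u << 3)   /* pre-fetch-ability, bit 3 */

/* Index (0..255) of the MAR that covers a logical address. */
static unsigned mar_index(uint32_t logical_addr)
{
    return logical_addr >> 24;
}

/* Memory-mapped address of MAR number `index`. */
static uint32_t mar_reg_addr(unsigned index)
{
    return MAR_BASE_ADDR + 4u * index;
}

/* Register value for the requested attributes; reserved bits stay zero. */
static uint32_t mar_value(int cacheable, int prefetchable)
{
    uint32_t v = 0;
    if (cacheable)
        v |= MAR_PC_BIT;
    if (prefetchable)
        v |= MAR_PFX_BIT;
    return v;
}
```

With these helpers, logical address 0xE000 0000 maps to MAR 224, and enabling both cache and pre-fetch gives the value 0x9.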
Example 1: Enable L2 Cache for MC Shared Memory – Assumptions
• Shared memory (MSMC RAM, addresses 0x0c00 0000 to 0x0c3f ffff) is L1 cacheable, but not L2 cacheable.
• User requirements:
  • Make the first 1M of it L2 cacheable (and thus make it L3 memory).
  • Protect this memory so that user and supervisor can read and write, but not execute, from this memory.
• The user must configure the MPAX and the MAR registers.
Example 1: Enable L2 Cache for MC Shared Memory – Configuring MPAX
• Configuring the MPAX register:
  • Use any MPAX register that is available (e.g., Register 3).
  • Configure the segment size to be 1M.
  • Give a different logical address to the first 1 Mbyte of shared L2.
  • The logical address should point to memory that does not exist on the board. For example: if there are 512 Mbytes of external memory (from address 0xc000 0000 to address 0xdfff ffff), choose the logical address to start at 0xe000 0000.
  • The protection bits are 00110110 (two reserved bits, then supervisor read, write, execute, then user read, write, execute).
• Segment 3 registers are at addresses 0x0800 0018 (low register, XMPAXL3) and 0x0800 001c (high register, XMPAXH3).
• As in the CSL structures shown later in this presentation, the high register holds the logical base address and the segment size, and the low register holds the replacement address and the permissions. Segment 3 has the following values:
  • Size = 1M = 10011b = 0x13, the 5 LSBs of the high register
  • 7 bits reserved, written as zeros: 0000000b
  • Logical base address: upper 12 bits 0xE00; with the 20 zero bits from the segment size, the logical base address is 0xE000 0000 and the 20-bit field holds 0xE0000. So the high register at address 0x0800 001c is: 1110 0000 0000 0000 0000 0000 0001 0011
  • Physical (replacement) base address: upper 16 bits 0x00C0; with the 20 zero bits from the segment size, the 36-bit physical base address is 0x0 0c00 0000 and the 24-bit field holds 0x00C000. So the low register at address 0x0800 0018 is: 0000 0000 1100 0000 0000 0000 0011 0110
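The register values for this example can be cross-checked with a small packing sketch. It is not the CSL implementation; the function names are hypothetical, and it assumes the layout XMPAXH = BADDR in bits 31:12 plus SEGSZ in bits 4:0, and XMPAXL = RADDR in bits 31:8 plus permissions in bits 5:0 (the split suggested by the CSL_XMC_XMPAXH and CSL_XMC_XMPAXL structures).

```c
#include <stdint.h>

/* Permission bits, in the order given on the slide (bits 7:6 reserved). */
#define PERM_SR (1u << 5) /* supervisor read */
#define PERM_SW (1u << 4) /* supervisor write */
#define PERM_SX (1u << 3) /* supervisor execute */
#define PERM_UR (1u << 2) /* user read */
#define PERM_UW (1u << 1) /* user write */
#define PERM_UX (1u << 0) /* user execute */

/* High register: upper 20 bits of the logical base, SEGSZ in bits 4:0. */
static uint32_t xmpaxh_pack(uint32_t logical_base, unsigned segsz)
{
    return (logical_base & 0xFFFFF000u) | (segsz & 0x1Fu);
}

/* Low register: upper 24 bits of the 36-bit physical base, then permissions. */
static uint32_t xmpaxl_pack(uint64_t phys_base, unsigned perm)
{
    uint32_t raddr = (uint32_t)((phys_base >> 12) & 0xFFFFFFu);
    return (raddr << 8) | (perm & 0x3Fu);
}
```

For Example 1 (logical 0xE000 0000, physical 0x0 0C00 0000, 1M, read/write only) this yields 0xE0000013 for the high register and 0x00C00036 for the low register.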
Example 1: Enable L2 Cache for MC Shared Memory – Configuring MAR
• Configuring the MAR register:
  • The MAR register that corresponds to logical address 0xe000 0000 is MAR 224 at address 0x01848380.
  • This register controls 16M of memory, from 0xe000 0000 to 0xe0ff ffff, even though only 1M of this range is mapped onto "real" physical memory.
  • Assume that the user wants to enable both the cache and the pre-fetch. The value of the MAR register is then set to: 0000 0000 0000 0000 0000 0000 0000 1001
Example 2: Disable L1 Cache from MC Shared Memory
• Shared memory (MSMC RAM, addresses 0x0c00 0000 to 0x0c3f ffff) is L1 cacheable. Coherency between the L1 cache and shared memory is not guaranteed.
• If the user wants to use the shared memory to communicate between cores, they must either manage the L1 coherency manually or disable the cache-ability of the shared memory.
• This example uses the same MPAX registers as Example 1. However, the value of the corresponding MAR register (MAR 224 at address 0x01848380) is changed to disable cache and pre-fetch.
• Thus, the MAR register is set to the value 0x0000 0000.
Example 3: Sharing Very Large DDR for Different Cores
• The DDR controller supports up to 8GB of external memory.
• Each core's logical address is limited to 32 bits, and the external memory starts at address 0x8000 0000.
• So the maximum external memory addressable from each core is 2G.
• If the user needs more external memory, each core can be given a separate area in the external memory. For example, four cores with 2G each can use 8G of memory.
• The following example shows how each of eight cores maps 1G of logical external memory to a different part of the 8G physical external memory. This configuration can be used for multi-channel applications where the same code runs on all cores, each on different channels.
• To configure the MPAX register for each core:
  • Use any MPAX register that is available, say register 1
  • Configure the segment size to be 1G
  • The logical address range is 0x8000 0000 to 0xbfff ffff
  • The physical address depends on the core number
  • Assume full permissions for the memory (read/write/execute)
Example 3: Sharing Very Large DDR for Different Cores • Core 0 physical address will be from address 0x0 0000 0000 to address 0x0 3fff ffff • Core 1 physical address will be from address 0x0 4000 0000 to address 0x0 7fff ffff • Core 2 physical address will be from address 0x0 8000 0000 to address 0x0 bfff ffff • Core 3 physical address will be from address 0x0 C000 0000 to address 0x0 ffff ffff • Core 4 physical address will be from address 0x1 0000 0000 to address 0x1 3fff ffff • Core 5 physical address will be from address 0x1 4000 0000 to address 0x1 7fff ffff • Core 6 physical address will be from address 0x1 8000 0000 to address 0x1 bfff ffff • Core 7 physical address will be from address 0x1 c000 0000 to address 0x1 ffff ffff
Example 3: Sharing Very Large DDR for Different Cores
• Segment 1 registers are at addresses 0x0800 0008 (low register, XMPAXL1) and 0x0800 000c (high register, XMPAXH1).
• Segment 1 has the following values:
  • Size = 1G = 11101b = 0x1D; the 5 LSBs of the high register
  • 7 bits reserved, written as zeros: 0000000b
  • Logical base address: upper 2 bits 10b; with the 30 zero bits from the segment size, the logical base address is 0x8000 0000 and the 20-bit field holds 0x80000
  • So the high register at address 0x0800 000c for ALL the cores is: 1000 0000 0000 0000 0000 0000 0001 1101
• The low register is a function of the core number:
  • Core 0: physical (replacement) base address, upper 6 bits 000000b; with the 30 zero bits from the segment size, the physical base address is 0x0 0000 0000 and the 24-bit field holds 0x000000
  • So the low register at address 0x0800 0008 for Core 0 is: 0000 0000 0000 0000 0000 0000 0011 1111
Example 3: Sharing Very Large DDR for Different Cores
• Core 1: physical (replacement) base address, upper 6 bits 000001b; with the 30 zero bits from the segment size, the physical base address is 0x0 4000 0000 and the 24-bit field holds 0x040000
  • So the low register at address 0x0800 0008 for Core 1 is: 0000 0100 0000 0000 0000 0000 0011 1111
• Core 2: physical (replacement) base address, upper 6 bits 000010b; the physical base address is 0x0 8000 0000 and the 24-bit field holds 0x080000
  • So the low register at address 0x0800 0008 for Core 2 is: 0000 1000 0000 0000 0000 0000 0011 1111
• Core 7: physical (replacement) base address, upper 6 bits 000111b; the physical base address is 0x1 c000 0000 and the 24-bit field holds 0x1C0000
  • So the low register at address 0x0800 0008 for Core 7 is: 0001 1100 0000 0000 0000 0000 0011 1111
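The per-core values follow a simple pattern that can be sketched in code. The function names are hypothetical, and the sketch assumes the replacement address occupies bits 31:8 of the low register with the six permission bits in bits 5:0, all set for full access.

```c
#include <stdint.h>

#define SEGMENT_BYTES (1ULL << 30) /* 1G per core, as in the example */
#define PERM_ALL      0x3Fu        /* supervisor and user: read/write/execute */

/* 36-bit physical base of the 1G window assigned to a core (0..7). */
static uint64_t core_phys_base(unsigned core)
{
    return (uint64_t)core * SEGMENT_BYTES;
}

/* Assumed low-register value: 24-bit replacement field, then permissions. */
static uint32_t core_xmpaxl(unsigned core)
{
    uint32_t raddr = (uint32_t)((core_phys_base(core) >> 12) & 0xFFFFFFu);
    return (raddr << 8) | PERM_ALL;
}
```

Each core would run the same code and differ only in the core number it passes in, which fits the multi-channel use case described above.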
Using Software to Configure XMC • Verify that the following path exists in your project (if not, add it): • PDK_INSTALL\packages • Where PDK_INSTALL is the path to the directory where the latest PDK was installed. • A typical path looks like: C:\Program Files\Texas Instruments\pdk_C6678_1_0_0_11\packages • Include the CSL Auxiliary include file:#include <ti/csl/csl_cacheAux.h>
Using Software to Configure XMC
• Manipulate the MAR registers (defined in csl_cacheAux.h):
  • CSL_IDEF_INLINE void CACHE_enableCaching (Uint8 mar)
  • CSL_IDEF_INLINE void CACHE_disableCaching (Uint8 mar)
  • CSL_IDEF_INLINE void CACHE_setMemRegionInfo (Uint8 mar, Uint8 pcx, Uint8 pfx)
• mar is the 8-bit number (0 to 255) of the MAR register. It is the segment base address shifted 24 places to the right.
• pcx controls cache-ability
• pfx controls pre-fetching
• Example 1: Enable cache for DDR3 memory 0x8000 0000 to 0x80ff ffff
  #define MAPPED_VIRTUAL_ADDRESS0 0x80000000
  CACHE_enableCaching ((MAPPED_VIRTUAL_ADDRESS0) >> 24);
• Example 2: Disable cache for DDR3 memory 0x8100 0000 to 0x81ff ffff
  #define MAPPED_VIRTUAL_ADDRESS1 0x81000000
  CACHE_disableCaching ((MAPPED_VIRTUAL_ADDRESS1) >> 24);
• Example 3: Disable cache and enable pre-fetch for DDR3 memory 0x8100 0000 to 0x81ff ffff
  #define MAPPED_VIRTUAL_ADDRESS1 0x81000000
  CACHE_setMemRegionInfo ((MAPPED_VIRTUAL_ADDRESS1) >> 24, 0, 1);
• Note 1: If CACHE_setMemRegionInfo is used, there is no need to call CACHE_disableCaching or CACHE_enableCaching
• Note 2: Reset values (MAR 15 to 255): pre-fetch enabled, cache disabled
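Because each call above covers a single MAR segment, a larger region needs one call per segment. The following host-side sketch computes which MAR numbers cover an address range; the helper names are hypothetical and it assumes the shift-by-24 indexing described above.

```c
#include <stdint.h>

/* First MAR number covering the range that starts at `start`. */
static unsigned mar_first(uint32_t start)
{
    return start >> 24;
}

/* Number of MAR registers covering [start, end_inclusive]. */
static unsigned mar_count(uint32_t start, uint32_t end_inclusive)
{
    return (end_inclusive >> 24) - (start >> 24) + 1u;
}

/* Fill `out` with the MAR numbers for the range; returns how many were
 * written (never more than `max_out`). */
static unsigned mar_list(uint32_t start, uint32_t end_inclusive,
                         uint8_t *out, unsigned max_out)
{
    unsigned n = mar_count(start, end_inclusive);
    unsigned i;
    if (n > max_out)
        n = max_out;
    for (i = 0; i < n; i++)
        out[i] = (uint8_t)(mar_first(start) + i);
    return n;
}
```

On the target, each number in the list could then be passed to CACHE_enableCaching, CACHE_disableCaching or CACHE_setMemRegionInfo in a loop.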
Using Software to Configure XMC
Manipulate the MPAX registers:
• Defined in csl_xmcAux.h
  CSL_IDEF_INLINE void CSL_XMC_setXMPAXL ( Uint32 index, CSL_XMC_XMPAXL * mpaxl )
• index is one of the MPAX registers, 0 to 15, and CSL_XMC_XMPAXL is a structure that is defined on the next slide:
Definition: CSL_XMC_XMPAXL
typedef struct CSL_XMC_XMPAXL
{
    /** Replacement Address */
    Uint32 rAddr;
    /** When set, supervisor may read from segment */
    Uint32 sr;
    /** When set, supervisor may write to segment */
    Uint32 sw;
    /** When set, supervisor may execute from segment */
    Uint32 sx;
    /** When set, user may read from segment */
    Uint32 ur;
    /** When set, user may write to segment */
    Uint32 uw;
    /** When set, user may execute from segment */
    Uint32 ux;
} CSL_XMC_XMPAXL;
Using Software to Configure XMC
Manipulate the MPAX registers:
• Defined in csl_xmcAux.h
  CSL_IDEF_INLINE void CSL_XMC_setXMPAXH ( Uint32 index, CSL_XMC_XMPAXH * mpaxh )
• index is one of the MPAX registers, 0 to 15, and CSL_XMC_XMPAXH is a structure that is defined as follows:
typedef struct CSL_XMC_XMPAXH
{
    /** Base Address */
    Uint32 bAddr;
    /** Encoded Segment Size */
    Uint8 segSize;
} CSL_XMC_XMPAXH;
Implementation of Example 1 using CSL API
MPAX registers for Example 1 from the beginning of the presentation:
• Use MPAX register 3
• Segment size 1M (0x13 = 10011b)
• Logical address 0xe000 0000 (upper 12 bits 0xE00; 20-bit base address field 0xE0000)
• Protection for supervisor and user: read, write, no execution (00110110)
• Physical memory starts at 0x0c00 0000 (upper 16 bits 0x00C0; 24-bit replacement address field 0x00C000)
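Implementation of Example 1 using CSL API
• Load the CSL structures (there are APIs to load them with the appropriate values):
CSL_XMC_XMPAXL lowerStructure;
lowerStructure.rAddr = 0x00C000;  /* replacement (physical) address field */
lowerStructure.sr = 1;            /* supervisor read */
lowerStructure.sw = 1;            /* supervisor write */
lowerStructure.sx = 0;            /* no supervisor execute */
lowerStructure.ur = 1;            /* user read */
lowerStructure.uw = 1;            /* user write */
lowerStructure.ux = 0;            /* no user execute */
CSL_XMC_XMPAXH higherStructure;
higherStructure.bAddr = 0xE0000;  /* logical base address field */
higherStructure.segSize = 0x13;   /* 1M segment */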
Implementation of Example 1 using CSL API
• Call the CSL functions to set the MPAX registers (both take a pointer to the structure):
CSL_XMC_setXMPAXH (3, &higherStructure);
CSL_XMC_setXMPAXL (3, &lowerStructure);
ARM Subsystem Ports
• 32-bit ARM addressing (MMU or kernel)
• 31 bits address the external memory
  • The ARM can address ONLY 2GB of external DDR (no MPAX translation), from 0x8000 0000 to 0xffff ffff
• The other half of the address space (also 31 bits of addressing) is used to access SoC memories or internal memories (ROM)
So what can the ARM see through the VBUS connection?
• It can see the QMSS data at address 0x3400 0000
• It can see HyperLink data at address 0x4000 0000
• It can see PCIe data at address 0x6000 0000
• It can see shared L2 at address 0x0c00 0000
• It can see EMIF16 data at address 0x7000 0000
  • NAND
  • NOR
  • Asynchronous SRAM
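The list above can be turned into a small lookup for quick sanity checks. This is only a sketch: the slide gives region start addresses but not region sizes, so the function below reports the nearest listed base at or below an address and does not prove that the address is actually mapped. The enum and function names are hypothetical, and the DDR entry at 0x8000 0000 is taken from the previous slide.

```c
#include <stddef.h>
#include <stdint.h>

enum arm_region {
    REGION_NONE,
    REGION_SHARED_L2,
    REGION_QMSS,
    REGION_HYPERLINK,
    REGION_PCIE,
    REGION_EMIF16,
    REGION_DDR3
};

struct region_base {
    uint32_t base;
    enum arm_region id;
};

/* Start addresses from the slides, in ascending order. */
static const struct region_base arm_bases[] = {
    { 0x0C000000u, REGION_SHARED_L2 },
    { 0x34000000u, REGION_QMSS },
    { 0x40000000u, REGION_HYPERLINK },
    { 0x60000000u, REGION_PCIE },
    { 0x70000000u, REGION_EMIF16 },
    { 0x80000000u, REGION_DDR3 }
};

/* Nearest listed region whose base is at or below `addr`. */
static enum arm_region arm_region_for(uint32_t addr)
{
    enum arm_region found = REGION_NONE;
    size_t i;
    for (i = 0; i < sizeof arm_bases / sizeof arm_bases[0]; i++) {
        if (addr >= arm_bases[i].base)
            found = arm_bases[i].id;
    }
    return found;
}
```

A production table would add an end address per region from the device data manual so that gaps between regions are reported as unmapped.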
ARM Access to SoC Memory
• Do you see a problem with HyperLink access?
  • Addresses in the 0x4000 0000 range are part of the internal ARM memory map
• What about the cache and data from the Shared Memory and the Async EMIF16?
  • The next slide presents a page from the device errata
Read the Errata
[Table of contents of the device errata; dot leaders and page numbers omitted]
• Introduction
• Device and Development Support Tool Nomenclature
• Package Symbolization and Revision Identification
• Silicon Updates
• Advisory 1: HyperLink Temporary Blocking Issue
• Advisory 2: BCP DNT Support for HSUPA 10ms TTI With Spreading Factor Two Issue
• Advisory 3: BCP DIO Reading From DDR Memory Issue
• Advisory 4: DDR3 Excessive Refresh Issue
• Advisory 5: TAC P-CCPCH QPSK Symbol Data Mode with STTD Issue
• Advisory 6: SRIO Control Symbols Are Sent More Often Than Required Issue
• Advisory 7: Corruption of Control Characters In SRIO Line Loopback Mode Issue
• Advisory 8: SerDes Transit Signals Pass ESD-CDM up to ±150 V Issue
• Advisory 9: AIF2 CPRI 8x UL Peak BW Issue
• Advisory 10: AIF2 SERDES Lane Aggregation Issue
• Advisory 11: ARM L2 Cache Content Corruption Issue
• Advisory 12: L2 Cache Corruption During Block and Global Coherence Operations Issue
• Advisory 13: System Reset Operation Disconnects the SoC from CCS Issue
• Advisory 14: Power Domains Hang When Powered Up Simultaneously with RESET (Hard Reset) Issue
• Usage Note 1: TAC DL TPC Timing Usage Note
• Usage Note 2: Packet DMA Clock-Gating for AIF2 and Packet Accelerator Subsystem Usage Note
• Usage Note 3: VCP2 Back-to-Back Debug Read Usage Note
• Usage Note 4: DDR3 ZQ Calibration Usage Note
• Usage Note 5: I2C Bus Hang After Master Reset Usage Note
• Usage Note 6: MPU Read Permissions for Queue Manager Subsystem Usage Note
• Usage Note 7: Queue Proxy Access Usage Note
• Usage Note 8: TAC E-AGCH Diversity Mode Usage Note
• Usage Note 9: Minimizing Main PLL Jitter Usage Note
• Usage Note 10: MSMC and Async EMIF Accesses from ARM Core Usage Note
• Usage Note 11: OTP Efuse Controller Does Not Operate at Full Speed Usage Note
One more comment about the ARM
• The ARM uses only Little Endian
• The DSP can use Little Endian or Big Endian
• Using Big Endian on the DSP requires a little extra attention to detail
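One example of that extra attention: a 32-bit word written to shared memory by a Big Endian DSP appears byte-reversed when a Little Endian ARM reads it, so one side has to swap. A minimal portable sketch (the function name is hypothetical; on real targets a compiler intrinsic or library routine would usually be preferred):

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

Applying the swap twice returns the original value, so the same helper serves both directions of the exchange.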
Moving Messages/Data between DSP cores and ARM
• Data to exchange can reside in the DDR, shared L2 or other memories
  • Only DDR data is cacheable
• Send/receive messages via two one-directional buffers, with interrupts or polling
• Use the Navigator to communicate. The Navigator was designed for this use case
• Communication between the ARM and the DSP:
  • Standard interface to and from a DSP core, regardless of whether the message arrives from another core or from the ARM
  • Kernel space does physical addressing; user-space applications call a kernel-space driver
Requirements
• Runs directly on the KeyStone Navigator
• Shall support communication between application processes on the same core, on different cores, and on different devices
  • Note: inter-QMSS over Ethernet/SRIO can be done later
• Shall provide options to minimize either:
  • Application-level latency (from the writer's context PUT to the reader's context GET, including message cache operations). The goal is <300 cycles for inter-core.
  • Number of interrupt context switches (e.g. through message accumulation)
• Shall support management and abstraction of hardware resources
  • SoC resources are managed by a distributed resource manager.
  • Writer and reader are generally unaware of the details of the communication channel that is being set up. No changes in application software are required when the underlying plumbing is replaced (assuming the same blocking/non-blocking method is used).
• Shall support both zero-copy and CPPI DMA copy operations (for scattering/gathering and memory management)
• Shall support both blocking and non-blocking operations
• Shall support PDSP-based accumulation/interrupt pacing
• Shall support the following options for callback-based notification:
  • None (assuming the reader will read/poll at its convenience)
  • Implicit (each channel has a dedicated non-empty interrupt line, e.g. QPEND)
  • Explicit (out-of-band method; the writer explicitly notifies the reader that there are messages pending)
Types of Channel communications • Examples of the DMA-Copy constructions • Used for ARM (user’s Space) to Core communication • Examples of the Zero-Copy constructions • Used for Core to Core communication
Case 1 – Generic Channel Communication: Zero-Copy-Based Constructions, Core to Core
Note – logical functions only
READER:
hCh = Create("MyCh1");
Tibuf *msg = Get(hCh);
PktLibFree(msg);
Delete(hCh);
WRITER:
hCh = Find("MyCh1");
Tibuf *msg = PktLibAlloc(hHeap);
Put(hCh, msg);
• The reader creates a channel ahead of time with a given name
• When the writer has information to write, it looks for the channel (find)
• The writer asks for a buffer and writes the message into the buffer
• The writer puts the buffer. The Navigator does its magic
• When the reader calls get, it gets the message
• It is the reader's responsibility to free the message after it is done reading
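The call sequence above can be exercised on a host with a small single-threaded mock. This is not the Navigator or any TI library: the names ch_create, ch_find, ch_put, ch_get and ch_delete are hypothetical stand-ins for the logical Create/Find/Put/Get/Delete functions, and a fixed-size ring of pointers stands in for a hardware queue.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHANNELS 4
#define QUEUE_DEPTH  8

typedef struct {
    char name[16];
    void *slots[QUEUE_DEPTH]; /* message pointers only: zero copy */
    unsigned head;
    unsigned count;
    int in_use;
} channel_t;

static channel_t channels[MAX_CHANNELS];

/* Reader side: create a named channel ahead of time. */
static channel_t *ch_create(const char *name)
{
    size_t i;
    for (i = 0; i < MAX_CHANNELS; i++) {
        channel_t *c = &channels[i];
        if (!c->in_use) {
            memset(c, 0, sizeof *c);
            strncpy(c->name, name, sizeof c->name - 1);
            c->in_use = 1;
            return c;
        }
    }
    return NULL;
}

/* Writer side: look the channel up by name. */
static channel_t *ch_find(const char *name)
{
    size_t i;
    for (i = 0; i < MAX_CHANNELS; i++) {
        if (channels[i].in_use && strcmp(channels[i].name, name) == 0)
            return &channels[i];
    }
    return NULL;
}

/* Enqueue a message pointer; returns 0 on success, -1 if the queue is full. */
static int ch_put(channel_t *c, void *msg)
{
    if (c->count == QUEUE_DEPTH)
        return -1;
    c->slots[(c->head + c->count) % QUEUE_DEPTH] = msg;
    c->count++;
    return 0;
}

/* Dequeue the oldest message pointer, or NULL if the channel is empty. */
static void *ch_get(channel_t *c)
{
    void *msg;
    if (c->count == 0)
        return NULL;
    msg = c->slots[c->head];
    c->head = (c->head + 1) % QUEUE_DEPTH;
    c->count--;
    return msg;
}

/* Release the channel so that its name can no longer be found. */
static void ch_delete(channel_t *c)
{
    c->in_use = 0;
}
```

Only the pointer moves through the queue, which is the essence of the zero-copy construction: the reader receives the same buffer the writer filled and is responsible for freeing it.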
Case 2 – Low-Latency Channel Communication: Zero-Copy-Based Constructions, Core to Core
Note – logical functions only
READER (one per channel, shown for MyCh2 and MyCh3):
hCh = Create("MyCh2");
Get(hCh); or Pend(MySem);
PktLibFree(msg);
WRITER:
hCh = Find("MyCh2");
Tibuf *msg = PktLibAlloc(hHeap);
Put(hCh, msg);
chRx (driver): posts the internal semaphore and/or the callback posts MySem
• The reader creates a channel, based on one of the pending queues, ahead of time with a given name. The reader waits for the message by pending on a (software) semaphore
• When the writer has information to write, it looks for the channel (find)
• The writer asks for a buffer and writes the message into the buffer
• The writer puts the buffer. The Navigator generates an interrupt. The ISR posts the semaphore of the correct channel
• The reader starts processing the message
• The virtual channel structure enables the use of a single interrupt to post a semaphore to one of many channels
Case 3 – Reduce Context Switching: Zero-Copy-Based Constructions, Core to Core
Note – logical functions only
READER:
hCh = Create("MyCh4");
Tibuf *msg = Get(hCh);
PktLibFree(msg);
Delete(hCh);
WRITER:
hCh = Find("MyCh4");
Tibuf *msg = PktLibAlloc(hHeap);
Put(hCh, msg);
Between writer and reader: Accumulator and chRx (driver)
• The reader creates a channel, based on one of the accumulator queues, ahead of time with a given name
• When the writer has information to write, it looks for the channel (find)
• The writer asks for a buffer and writes the message into the buffer
• The writer puts the buffer. The Navigator adds the message to an accumulator queue
• When the number of messages reaches a watermark, or after a pre-defined timeout, the accumulator sends an interrupt to the core
• The reader starts processing the messages and frees each one after it is done
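The watermark-or-timeout behaviour can be illustrated with a small host-side model. It is not the PDSP accumulator firmware: the type and function names are hypothetical, time is modelled as abstract ticks, and the model only counts messages and interrupts.

```c
typedef struct {
    unsigned watermark;     /* messages needed to raise an interrupt */
    unsigned timeout_ticks; /* ticks to wait once a message is pending */
    unsigned count;         /* messages currently accumulated */
    unsigned ticks_waiting; /* ticks elapsed while messages are pending */
    unsigned interrupts;    /* interrupts raised so far */
    unsigned last_batch;    /* messages delivered by the last interrupt */
} accumulator_t;

static void acc_init(accumulator_t *a, unsigned watermark,
                     unsigned timeout_ticks)
{
    a->watermark = watermark ? watermark : 1;
    a->timeout_ticks = timeout_ticks;
    a->count = 0;
    a->ticks_waiting = 0;
    a->interrupts = 0;
    a->last_batch = 0;
}

/* Hand the accumulated batch to the reader with a single interrupt. */
static void acc_fire(accumulator_t *a)
{
    a->last_batch = a->count;
    a->count = 0;
    a->ticks_waiting = 0;
    a->interrupts++;
}

/* Writer put: returns 1 if this message triggered an interrupt. */
static int acc_put(accumulator_t *a)
{
    a->count++;
    if (a->count >= a->watermark) {
        acc_fire(a);
        return 1;
    }
    return 0;
}

/* Timer tick: returns 1 if pending messages timed out and were delivered. */
static int acc_tick(accumulator_t *a)
{
    if (a->count == 0)
        return 0;
    a->ticks_waiting++;
    if (a->ticks_waiting >= a->timeout_ticks) {
        acc_fire(a);
        return 1;
    }
    return 0;
}
```

With a watermark of 3, three puts cost the reader one interrupt instead of three, while the timeout bounds the latency of a message that arrives alone.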