
APIC Tutorial --- Architecture and Hardware John DeHart Washington University jdd@arl.wustl





Presentation Transcript


  1. APIC Tutorial --- Architecture and Hardware John DeHart Washington University jdd@arl.wustl.edu http://www.arl.wustl.edu/~jdd

  2. Coverage • APIC is a complicated device • No way we can cover everything today. • in the original workshop we spent one whole day on the APIC architecture and hardware and a second day on the software • Lots more details in Zubin’s slides from the original workshop: • http://www.arl.wustl.edu/gigabitkits/kits.html • go to “Course Slides & Papers” in left margin • Also, papers and documentation from web site.

  3. Our Original Goals for the APIC • Build a high speed ATM host interface • Single Chip • Low cost • High Bandwidth • Gigabit all the way to the application • Low Latency • Zero copy • Support for Quality of Service

  4. APIC Features Overview • 32 bit and 64 bit PCI at 33MHz • All of our cards are 32 bit. • Point-to-Point, Multipoint and Loopback VCs • AAL5 Segmentation and Reassembly • AAL0: Raw ATM (RATM) • Support for multiple traffic types • Batching of cells in PCI transactions • Control via PCI bus and remotely via control cells • Multiple DMA modes • Interrupts and Notification List for efficient interrupt handling • Flow Control: UTOPIA and ATM GFC field

  5. APIC Internal Design [Figure: block diagram showing data paths and control paths. Components: Utopia Ports 0, 1 and 2, each with an Input Port, Input Sync, Output Sync and Output Port; VC Translation Table (VCXT); Cell Store; Tx Sync; Rx Sync; Requestor; Pacer; Register Manager; DataPath; Bus Interface; Interrupt/Notification Manager; PCI-32/64 Bus.]

  6. APIC Internal Design: 6 Clock Regions [Figure: the internal block diagram divided into six clock regions, labeled A to F.] • A, B, C, D: Link Clocks (typically 62.5 MHz) • E: Bus Clock (PCI: 33 MHz) • F: Internal Clock (85 MHz)

  7. APIC Transit Path: ATM Port → ATM Port [Figure: internal block diagram with this data path highlighted.]

  8. APIC Receive Path: ATM Port → Memory [Figure: internal block diagram with this data path highlighted.]

  9. APIC Transmit Path: Memory → ATM Port [Figure: internal block diagram with this data path highlighted.]

  10. APIC Multipoint Receive Path: ATM Port → * [Figure: internal block diagram with this data path highlighted.]

  11. APIC Multipoint Transmit Path: Memory → * [Figure: internal block diagram with this data path highlighted.]

  12. APIC Loopback Path: Memory → Memory [Figure: internal block diagram with this data path highlighted.]

  13. APIC Multipoint Loopback Path: Memory → * [Figure: internal block diagram with this data path highlighted.]

  14. APIC Control and Response Cell Path [Figure: internal block diagram with the control and response cell path highlighted.]

  15. APIC and AALs • AAL5 • Frames up to 65535 bytes. • Used for IP Packets • Format on next slide • AAL0 • Host can send and receive individual ATM Cells • Used for: • communication with raw ATM devices • sending specially formatted control cells • APIC uses 56 byte cell format shown on a future slide.

  16. AAL5 Frames • Frame layout; the total frame length is a multiple of 48 bytes: • Packet data: 1 to 65535 bytes • Padding: 0 to 47 bytes • User-to-User: 1 byte • Reserved: 1 byte • Length: 2 bytes • CRC: 4 bytes
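
Using only the field sizes listed for the AAL5 frame (an 8-byte trailer and 48-byte cell payloads), the padding and the number of cells a frame occupies follow directly from the data length. This is a minimal C sketch; the function and macro names are hypothetical, not part of any APIC driver.

```c
#include <stdint.h>

/* AAL5 trailer: User-to-User (1) + Reserved (1) + Length (2) + CRC (4). */
#define AAL5_TRAILER_BYTES 8u
#define ATM_PAYLOAD_BYTES  48u

/* Padding (0..47 bytes) so that data + padding + trailer is a multiple of 48. */
static uint32_t aal5_padding(uint32_t data_len)
{
    uint32_t rem = (data_len + AAL5_TRAILER_BYTES) % ATM_PAYLOAD_BYTES;
    return rem == 0 ? 0 : ATM_PAYLOAD_BYTES - rem;
}

/* Number of 48-byte cell payloads occupied by the complete AAL5 frame. */
static uint32_t aal5_cells(uint32_t data_len)
{
    return (data_len + aal5_padding(data_len) + AAL5_TRAILER_BYTES)
           / ATM_PAYLOAD_BYTES;
}
```

For example, a 41-byte packet needs 47 bytes of padding and spills into a second cell, while a 40-byte packet fits exactly in one cell with no padding.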

  17. AAL0 Frames • One cell = 56 bytes: APIC AAL0 header (4 bytes), ATM header (4 bytes), payload (48 bytes) • Internally the cell is 56 bytes; when it goes out onto the ATM link it is, of course, 53 bytes • APIC AAL0 header (32 bits, bit 31 down to bit 0): pIn, pOut, C, L, ChanId • pIn: Port In • pOut: Port Out • C: Control Cell • L: Low Delay • ChanId: Channel Id

  18. APIC Traffic Types • Transmit • Low Delay • highest priority • transmitted at link rate (APIC Global Pacing Rate) • Paced • transmitted at the rate configured for the channel • rates independently configurable for each channel • Best Effort • lowest priority • can use whatever bandwidth is left after low delay and paced channels • Receive • Low Delay • strictly higher priority than Normal Delay • Normal Delay • only serviced when all Low Delay queues are empty

  19. APIC Descriptors and Buffers [Figure: a chain of descriptors, starting at the Current Descriptor, pointing to full buffers, a partially filled buffer and an empty buffer.] • A Buffer Descriptor points to a buffer queued for sending data from or receiving data into • A Buffer Descriptor contains: • Address of buffer • physical address: the PCI bus operates on physical, not virtual, memory • Buffer Length • Link to next descriptor • Flags

  20. Buffer Details • Receive Buffers: • 8-byte aligned and a multiple of 8 bytes in length • CAVEAT: RX Sync Bug • AAL0 buffers should be a multiple of 56 bytes in length • AAL5 buffers should be a multiple of 48 bytes in length • A single AAL5 frame can span multiple buffers • No buffer can contain data from more than one AAL5 frame • EndOfFrame bit (E) set in the buffer containing the last 8 bytes of the AAL5 frame • given the caveat above, this is the buffer containing the last cell of the AAL5 frame • Multiple AAL0 frames can occupy the same buffer • A single AAL0 frame can span multiple buffers • BUT because of the caveat above, this won’t happen • Buffers for AAL0 will be completely filled

  21. Buffer Details • Transmit Buffers: • Need not be aligned on word boundaries • But our drivers always align them… • Can be of any length • A single AAL5 frame can span multiple buffers • No buffer can contain data from more than one AAL5 frame • EndOfFrame bit (E) set in the buffer containing the first byte of the last cell of the AAL5 frame • Multiple AAL0 frames can occupy the same buffer • A single AAL0 frame can span multiple buffers • All buffers will be completely transmitted unless there is an error

  22. Descriptor Details • All descriptors MUST reside in a block of contiguous physical memory of 1MB or less • All descriptors MUST be 16-byte aligned • The APIC global Descriptor Area Pointer register must contain the address of this block of memory • Think of the descriptor area as an array of descriptors • The nextDescOfs field in each descriptor is an index into the descriptor array • 16-bit index → 65536 descriptors possible • 65536 descriptors * 16 bytes per descriptor = 1MB
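
Because the descriptor area is an array of 16-byte entries and nextDescOfs is a 16-bit index into it, a descriptor's physical address is just the area base plus the index times 16. A minimal C sketch, assuming the driver already knows the physical base address it programmed into the Descriptor Area Pointer register; the function names are hypothetical.

```c
#include <stdint.h>

#define APIC_DESC_BYTES 16u   /* every descriptor is 16 bytes, 16-byte aligned */

/* Physical address of the descriptor whose index is idx (e.g. a nextDescOfs). */
static uint64_t apic_desc_phys(uint64_t area_base, uint16_t idx)
{
    return area_base + (uint64_t)idx * APIC_DESC_BYTES;
}

/* Sanity check for a descriptor area: 16-byte aligned base and at most
 * 65536 descriptors (65536 * 16 bytes = 1MB). */
static int apic_desc_area_ok(uint64_t area_base, uint32_t ndesc)
{
    return (area_base % APIC_DESC_BYTES) == 0 && ndesc >= 1 && ndesc <= 65536u;
}
```

The 16-bit index is what bounds the area to 1MB: the largest index, 65535, lands on the last 16-byte slot of that megabyte.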

  23. APIC Receive Descriptor [Figure: layout of the 16-byte receive descriptor.] • Fields: Match/TCP_Checksum, V, I, S, O, E, C, L, X, T, Y, BufLen, NextDescOfs, BufAddrLo (physical address), BufAddrHi (physical address) • We’ll look at the Y field … • For more details, see Zubin’s original workshop slides

  24. APIC Transmit Descriptor [Figure: layout of the 16-byte transmit descriptor.] • Fields: Match, TCRC, V, I, S, O, E, T, Y, BufLen, NextDescOfs, BufAddrLo (physical address), BufAddrHi (physical address) • We’ll look at the Y field next … • For more details, see Zubin’s original workshop slides

  25. Sync Bits (Y Field) of APIC Descriptor • Sync (Y) Bits: Implement Ready/Done • 0 = DONE_VALIDLINK • APIC is finished with this descriptor and its link to the next descriptor is valid • 1 = DONE_INVALIDLINK • APIC is done with this descriptor BUT its link to the next descriptor is not valid! • Be careful of this one • 2 = NOT_READY • Not ready for the APIC to use • The last descriptor in a chain is always marked NOT_READY by the driver • 3 = READY • Ready for the APIC to use • Set in Receive Descriptors in a chain for the APIC to use • Set in Transmit Descriptors that are ready for the APIC to send
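
The four Y values map naturally onto an enum, and the distinction the slide warns about (done, but with an invalid link) can be made explicit in two small predicates. This is a hedged C sketch: the enum values come from the slide, but the names and the policy that a driver should only follow nextDescOfs after DONE_VALIDLINK are assumptions for illustration, and the bit position of Y inside the descriptor is not modeled.

```c
/* Values of the 2-bit Sync (Y) field, as listed on the slide. */
enum apic_sync {
    APIC_DONE_VALIDLINK   = 0,
    APIC_DONE_INVALIDLINK = 1,
    APIC_NOT_READY        = 2,
    APIC_READY            = 3
};

/* The APIC has finished with this descriptor, so its buffer can be reclaimed. */
static int apic_desc_done(enum apic_sync y)
{
    return y == APIC_DONE_VALIDLINK || y == APIC_DONE_INVALIDLINK;
}

/* Assumed driver policy: only trust nextDescOfs when the link is reported valid. */
static int apic_desc_link_usable(enum apic_sync y)
{
    return y == APIC_DONE_VALIDLINK;
}
```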

  26. APIC DMA Modes • Simple DMA • Separate queue of buffer descriptors for each connection • works well for transmit • Inefficient for receive • no sharing of receive buffers and descriptors • Pool DMA • multiple connections share a pool of buffer descriptors • works well for receive • caveat: one connection can use up all the buffer descriptors • obviously, does not work for transmit • Protected DMA • queueing operations executed by user-space driver • pair of descriptors associated with each buffer: • kernel descriptor • user descriptor • See details in Zubin’s original workshop slides.

  27. Simple DMA

  28. Pool DMA

  29. APIC Interrupts and Notifications • Interrupts are used to report an asynchronous event: • completion of transmission/reception of a frame • error condition • Interrupts can be enabled/disabled per channel • The Notification List contains a list of channels that have had events. • The APIC issues an interrupt and disables further interrupts until the processor re-enables them. • Subsequent events just set an entry in the notification list. • This reduces the frequency of interrupts • This can also help reduce the overhead of interrupt processing.

  30. APIC Memory Mapped Register Space

  31. APIC Register Addresses • Global Registers (i.e. not per channel), address fields from high to low: 00 (2 bits) | 00000000000000 (14 bits) | RegID (9 bits) | 00 (2 bits) • 27 bit address space • On the PCI bus, the high-order 5 bits are the device select • These are programmed into the APIC PCI Configuration space at boot time by the BIOS
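
With the global layout above (2 zero bits, 14 zero bits, a 9-bit RegID, 2 zero bits), a global register offset is RegID shifted left by 2; for instance the APIC Pacing Parameter register at 0x208 has RegID 0x82. The C sketch below shows that arithmetic plus combining the 27-bit offset with the 5 high-order device-select bits. The function names are hypothetical, and treating the BIOS-assigned base as "upper 5 bits, lower 27 bits zero" is an assumption.

```c
#include <stdint.h>

#define APIC_OFFSET_MASK 0x07FFFFFFu   /* low 27 bits: APIC register space */

/* Global register offset: bits 10..2 hold the 9-bit RegID, all others zero. */
static uint32_t apic_global_reg(uint32_t reg_id)
{
    return (reg_id & 0x1FFu) << 2;
}

/* Recover RegID from a global register offset. */
static uint32_t apic_global_reg_id(uint32_t offset)
{
    return (offset >> 2) & 0x1FFu;
}

/* Full PCI address: 5 device-select bits from the assigned base (assumed)
 * combined with the 27-bit register offset. */
static uint32_t apic_pci_addr(uint32_t base, uint32_t offset)
{
    return (base & ~APIC_OFFSET_MASK) | (offset & APIC_OFFSET_MASK);
}
```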

  32. APIC Register Addresses (continued) • Kernel Access Per-channel Registers, address fields from high to low: 10 | t | CID | 00000000 | RegID | 00 • User Access Per-channel Registers: 11 | t | CID | 00000000 | RegID | 00 • (field widths as labeled in the figure: 2, 8, 8, 9, 2) • t=0 → Rx Channel, t=1 → Tx Channel • CID: Channel Index or VCI

  33. APIC Pacing: General Stuff • Pacing is for Transmit Channels only • Cells are NOT Paced out onto the wire • Not Exactly • Pacing is done on the PCI bus • Pacing is not a Guarantee, it is just a Restriction • Pacing Calculations include the ATM headers • But not the APIC header

  34. APIC Pacing: General Stuff • Two pacer controls: • Global Pacing • APIC Pacing Parameter register (Global, 0x208) • Per VC Pacing • TX Channel Pacing Parameter Register (TX, 0x500XX68) • XX is the Channel ID • Three types of Channels: • Low Delay (Highest Priority) • Paced • Best Effort (Lowest Priority) • All channels are paced by the Global Pacing • Paced Channels also use Per VC Pacing

  35. APIC Data Transfers • APIC pulls data from memory across the PCI bus in Batches of cells. • The number of cells in a Batch is controlled by a register • The Pacer identifies when it is time to transmit data and which connection should transmit • Pacer “wakes up” every 14 PCI Bus clock ticks • checks to see if it is time to transmit • Controlled by the Global APIC Pacing Parameter (APP) • If it is time to transmit, it takes the first connection off the previously sorted list of keys and transmits its data. • A lot of gory details about keys and heap storage of connections is not going to be included here. Read Rex’s documentation and/or read the VHDL if you want that level of detail

  36. Global Pacing Parameter • Pacing parameters are 24 bits • 16 bits of integer part • 8 bits of fractional part • Global APIC Pacing Parameter (APP):
$$\mathrm{APP} = \frac{256 \cdot \mathrm{BatchSz} \cdot 53 \cdot 8 \cdot 8192 \cdot \mathrm{InternalClockMhz}}{14 \cdot \mathrm{ClockEstimate} \cdot \mathrm{LinkRateMbps}}$$
[Items in formula explained on next slide]

  37. Explanation of Expression
$$\mathrm{APP} = \frac{256 \cdot \mathrm{BatchSz} \cdot 53 \cdot 8 \cdot 8192 \cdot \mathrm{InternalClockMhz}}{14 \cdot \mathrm{ClockEstimate} \cdot \mathrm{LinkRateMbps}}$$
• 256: shifts left by 8 bits to set the binary point (the 8 fractional bits) • BatchSz: how many cells per transfer • 53*8: translates cells/second into bits/second • 8192, InternalClockMhz (85 MHz), ClockEstimate: • The APIC counts how many of its internal 85 MHz clock ticks take place during the time it takes for 8192 PCI bus clock ticks. This value is the ClockEstimate. • PCI bus clock rate in MHz = $\frac{8192 \cdot 85}{\mathrm{ClockEstimate}}$ • 14: number of PCI bus ticks in a pacer period • LinkRateMbps: our target rate [Example on next 2 slides]

  38. Example: Units in the APP Formula
$$\mathrm{APP} = \frac{256 \cdot \mathrm{BatchSz} \cdot 53 \cdot 8 \cdot 8192 \cdot \mathrm{InternalClockMhz}}{14 \cdot \mathrm{ClockEstimate} \cdot \mathrm{LinkRateMbps}}$$
$$\mathrm{APP} = \frac{256 \cdot \text{cells} \cdot \frac{\text{bytes}}{\text{cell}} \cdot \frac{\text{bits}}{\text{byte}} \cdot 8192 \cdot \frac{\text{M}}{\text{sec}}}{14 \cdot 1 \cdot \frac{\text{Mbits}}{\text{sec}}}$$

  39. Example: APP for 1Gb/s Link Rate
$$\mathrm{APP} = \frac{256 \cdot \mathrm{BatchSz} \cdot 53 \cdot 8 \cdot 8192 \cdot \mathrm{InternalClockMhz}}{14 \cdot \mathrm{ClockEstimate} \cdot \mathrm{LinkRateMbps}}$$
• BatchSz = 8 • 53*8: translates cells/second into bits/second • InternalClockMhz = 85 MHz • ClockEstimate = 20954 (typical value) • LinkRateMbps = 1000 (1000 Mb/s == 1 Gb/s)
$$\mathrm{APP} = \frac{256 \cdot 8 \cdot 53 \cdot 8 \cdot 8192 \cdot 85}{14 \cdot 20954 \cdot 1000} = 2061.15$$
APP = 2061 = 0x80D
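
The APP formula is easy to get wrong by hand, and its numerator (about 6 × 10^11 for this example) overflows 32-bit arithmetic. A minimal C sketch that evaluates it with 64-bit integers and truncates to an integer, as in the worked example; the function names are hypothetical and the result would still need to be written to the Global register at 0x208 by whatever register-access routine a driver uses.

```c
#include <stdint.h>

/* APP = (256 * BatchSz * 53 * 8 * 8192 * InternalClockMhz)
 *     / (14 * ClockEstimate * LinkRateMbps), truncated.
 * 64-bit arithmetic: the numerator does not fit in 32 bits. */
static uint32_t apic_app(uint32_t batch_sz, uint32_t internal_clock_mhz,
                         uint32_t clock_estimate, uint32_t link_rate_mbps)
{
    uint64_t num = 256ull * batch_sz * 53u * 8u * 8192u * internal_clock_mhz;
    uint64_t den = 14ull * clock_estimate * link_rate_mbps;
    return (uint32_t)(num / den);
}

/* PCI bus clock in MHz implied by ClockEstimate: 8192 * InternalClockMhz / ClockEstimate. */
static double apic_pci_clock_mhz(uint32_t clock_estimate, uint32_t internal_clock_mhz)
{
    return 8192.0 * internal_clock_mhz / clock_estimate;
}
```

With the slide's values, `apic_app(8, 85, 20954, 1000)` gives 2061 (0x80D), and the ClockEstimate of 20954 corresponds to a PCI clock of roughly 33.2 MHz.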

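  40. Example: APP for 1Gb/s Link Rate • APP = 2061 = 0x80D. Since 2061/256 ≈ 8, this means that every 14*8 = 112 PCI bus clock ticks the APIC will be able to pull 8 cells worth of data across the PCI bus. • With a PCI clock period of about 30 ns (33 MHz):
$$\frac{8\ \text{cells}}{112 \cdot 30\ \text{ns}} = \frac{3392\ \text{bits}}{3360\ \text{ns}} \approx 1\ \text{Gb/s}$$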
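
Going the other way, an APP value can be turned back into PCI ticks per batch and an effective rate, which is a handy check that a programmed value does what was intended. This C sketch assumes the interpretation used on this slide: the pacer period is 14 PCI ticks, APP/256 (the 8 fractional bits) counts pacer periods per batch, and the 53-byte ATM cell is what is counted. Function names are hypothetical.

```c
#include <stdint.h>

/* PCI clock ticks between batches: 14 ticks per pacer period times APP/256. */
static double apic_ticks_per_batch(uint32_t app)
{
    return 14.0 * app / 256.0;
}

/* Effective rate in Mb/s for a batch size and PCI clock period in ns,
 * counting full 53-byte ATM cells. */
static double apic_paced_rate_mbps(uint32_t app, uint32_t batch_sz, double pci_period_ns)
{
    double bits = (double)batch_sz * 53.0 * 8.0;          /* bits per batch */
    double ns   = apic_ticks_per_batch(app) * pci_period_ns;
    return bits / ns * 1000.0;   /* bits/ns is Gb/s; x1000 gives Mb/s */
}
```

For APP = 2061 this gives about 112.7 ticks per batch (the slide rounds to 112) and, at 30 ns per tick, a rate just over 1000 Mb/s.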

  41. Per VC Pacing • Per VC Pacing Parameter • What portion of the full link rate can be used • e.g. an integer value of 2 means that this channel can use half the link rate • Conceptually like this: 33 MHz PCI Bus Clock → Count to 14 → Count to APP → Count to TX Channel Pacing Parameter → This Tx Channel is Ready to Transmit BATCH Cells
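
Since every pacing parameter is a 24-bit fixed-point value (16 integer bits, 8 fractional bits), the share of the global rate a paced channel gets is the global rate divided by the parameter's value. A minimal C sketch under that reading; the function names are hypothetical and `vc_param` must be nonzero.

```c
#include <stdint.h>

/* Value of a 24-bit pacing parameter: 16 integer bits, 8 fractional bits. */
static double apic_pacing_value(uint32_t param)
{
    return (param & 0xFFFFFFu) / 256.0;
}

/* Rate available to a paced channel: global rate / parameter value.
 * vc_param must be nonzero. */
static double apic_vc_rate_mbps(double global_rate_mbps, uint32_t vc_param)
{
    return global_rate_mbps / apic_pacing_value(vc_param);
}
```

An integer value of 2 is encoded as `2 << 8`, and on a 1000 Mb/s global rate it yields 500 Mb/s, matching the "half the link rate" example.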

  42. Per VC Pacing [Figure: timeline marked in APIC pacing periods, with vcPacingParameter ~ 10; the current pacedTime is marked, and expired connections are shown as X.] • oldExpirationTime + vcPacingParameter → newExpirationTime

  43. pacedTime • pacedTime is incremented every global pacing cycle in which a non-LowDelay connection wins contention • Example with two connections: • (L) Low Delay at 1/24th of the global rate • (P) Paced at 1/6th of the global rate (.1666667) [Figure: expected schedule, with L firing four times and P firing at pacedTime 0, 6, 12, …, 84.]

  44. pacedTime (continued) [Figure: the expected schedule from the previous slide.] • We might expect the Paced channel to miss its exact turn and fire on the next global pacing interval, but keep its next expiration on the (0,6,12,18,…) boundaries. • But…

  45. pacedTime (continued) [Figure: the same two connections shown on two time axes, pacedTime and "real" time, with tick sequences 0, 6, 12, …, 84 and 0, 5, 11, 17, 22, 28, 34, 40, 45, 51, 57, 63, 68, 74, 80.] • Actual rate for the Paced connection:
$$\text{GlobalRate} \cdot \frac{3 \cdot \frac{1}{6} + 1 \cdot \frac{1}{7}}{4} = \text{GlobalRate} \cdot 0.1607$$
• For a Global Rate of 24 Mb/s (DQ test example): 24 * 0.1607 = 3.8568 Mb/s

  46. Example of a Pacing Oddity • Suppose we have a channel on which we are sending single-cell packets at a rate of 2 cells every pacing period for that channel, and the BATCH size is 1 cell, so the channel should only send 1 cell during each pacing period. [Figure: timeline of single-cell data arrivals (D).] • You would expect the connection to build up a backlog, but it doesn’t…

  47. Example of a Pacing Oddity (con’t) • It turns out the driver does a RESUME each time it puts data in an empty transmit queue, to restart it. • A RESUME causes the ExpireTime to be set to the current PacedTime. • This causes the channel to be expired at the very next Pacer Period. • Thus the channel transmits at twice its expected rate [Figure: timeline of data arrivals (D), transmissions (T) and RESUMEs (R); every D is followed by a T.]
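
The oddity can be reproduced with a toy model of one paced channel. This C sketch is a simplification, not the APIC's pacer: it assumes pacedTime advances by one every pacer period (no competing connections), BATCH = 1, one single-cell packet arrives per pacer period (2 per channel period when vc_param = 2), and a channel may send when its expiration time is at or before pacedTime, after which expiration += vc_param. The `resume_resets_expire` flag switches between the RESUME behavior described on the slide and a hypothetical RESUME that leaves the expiration time alone.

```c
#include <stdbool.h>

/* Toy single-channel pacing model; returns the number of cells sent. */
static int simulate_sends(int periods, int vc_param, bool resume_resets_expire)
{
    int queued = 0, expire = 0, sent = 0;
    for (int t = 0; t < periods; t++) {       /* t plays the role of pacedTime */
        if (queued == 0 && resume_resets_expire)
            expire = t;            /* RESUME on empty queue: ExpireTime = pacedTime */
        queued++;                  /* driver enqueues one single-cell packet */
        if (expire <= t) {         /* channel expired: send one batch (1 cell) */
            queued--;
            sent++;
            expire += vc_param;    /* next expiration one channel period later */
        }
    }
    return sent;
}
```

Over 10 pacer periods with vc_param = 2, the RESUME-resets model sends 10 cells (twice the intended rate, never backlogging), while the hypothetical alternative sends 5 and lets a backlog build.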

  48. APIC Bugs and Caveats: RxSync • RxSync lockup when buffers are too short • The APIC is receiving data for a connection. • The APIC runs out of buffers while there is still data left • If this happens repeatedly, under certain conditions the APIC’s Rx-Sync module can lock up. Example: if we have three 16-byte buffers set up to receive one 56-byte AAL0 cell (remember that the APIC AAL0 cell size is 56 bytes), then each time we receive a cell with these buffers there are 8 bytes left over that the APIC SHOULD throw away. After the eighth time we use this chain of buffers to receive a cell, the APIC locks up. • A similar problem exists for AAL5. • The bug has not been identified in the VHDL • Workarounds: • For AAL0, always allocate buffers in multiples of 56 bytes. • For AAL5, always allocate buffers in multiples of 48 bytes.
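
The workaround amounts to rounding every receive-buffer length up to a whole number of 56-byte (AAL0) or 48-byte (AAL5) units. Both units are multiples of 8, so the 8-byte length rule for receive buffers is satisfied as well. A minimal C sketch; the function and macro names are hypothetical.

```c
#include <stddef.h>

#define APIC_AAL0_UNIT 56u   /* APIC AAL0 cell size */
#define APIC_AAL5_UNIT 48u   /* ATM cell payload size */

/* Round a requested receive-buffer length up to a whole number of units
 * (56 for AAL0, 48 for AAL5); a zero request still gets one unit. */
static size_t apic_rx_buf_len(size_t requested, size_t unit)
{
    if (requested == 0)
        return unit;
    return ((requested + unit - 1) / unit) * unit;
}
```

In the lockup example, a 48-byte chain meant for AAL0 would become a single 56-byte buffer, leaving no 8-byte remainder for the APIC to discard.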

  49. APIC Bugs and Caveats: Word Swap • APIC swaps contiguous 32bit words when receiving data into host memory. • Exists in APIC when used in Intel architectures • Exists only in 32bit PCI mode • Bug has been identified in VHDL but we aren’t going to respin the chip… • Work-arounds: • Driver performs a word swap on all data received. • painful and costly data touch
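
The driver-side workaround is a pass over every received buffer that puts the 32-bit words back in order. This C sketch assumes the swap exchanges each pair of adjacent words within an 8-byte group (word 0 with word 1, word 2 with word 3, and so on); that pairing, and the function name, are assumptions. With receive buffers sized in 48- or 56-byte units the word count is always even.

```c
#include <stddef.h>
#include <stdint.h>

/* Undo the assumed APIC word swap: exchange each pair of adjacent
 * 32-bit words in place. nwords is expected to be even. */
static void apic_unswap_words(uint32_t *buf, size_t nwords)
{
    for (size_t i = 0; i + 1 < nwords; i += 2) {
        uint32_t tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

This is exactly the "painful and costly data touch" the slide mentions: every received byte is read and written once more by the CPU.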

  50. APIC Bugs and Caveats: ILR • Bug in APIC decode of Interrupt Line Register address on writes • ILR is at 0x3C • BIOS writes IRQ value to ILR register and then reads it back to see if this is a functioning PCI device. If it doesn’t read back properly, it “removes” this device from the PCI bus • BIOS write to 0x3C enters APIC as write to 0x7C • reads of 0x3C are ok. • Bug has been identified in VHDL. • Work-around implemented on NICs and SPCs • you should never have to worry about this one…
