What’s needed to receive? A look at the minimum steps required for programming our anchor NICs to receive packets
A disappointment • Our former ‘nicwatch.cpp’ application does not seem to work reliably to show packets being received by the 82573L controller • It was based on the ‘raw sockets’ protocol implemented within the Linux kernel’s vast networking subsystem, thus offering us the prospect of a ‘hardware-independent’ tool -- if only it would show us all the packets!
Two purposes… • So let’s discard ‘nicwatch.cpp’ in favor of writing our own hardware-specific module that WILL be able to show us all the nic’s received packets, independently of Linux’s various layers of networking protocol code • And let’s keep it as simple as possible, so we can see which programming steps are the truly essential ones for the 82573L nic
Accessing 82573L registers • Device registers are hardware-mapped to a range of addresses in physical memory • We can get the location and extent of this memory-range from a BAR register in the 82573L device’s PCI Configuration Space • We then request the Linux kernel to set up an I/O ‘remapping’ of this memory-range to ‘virtual’ addresses within kernel-space
i/o-memory remapping [Diagram: the physical address-space (dynamic ram, vram, nic registers, IO-APIC and Local-APIC registers) drawn beside the ‘virtual’ address-space (3-GB user space below a 1-GB kernel region holding kernel code/data, vram, and the remapped nic registers)]
Kernel memory allocation • The NIC requires some host memory for packet-buffers and receive descriptors • The kernel provides a ‘helper function’ for reserving a suitable region of memory in kernel-space which is both ‘non-pageable’ and ‘physically contiguous’ (i.e., kzalloc()) • It’s our job to decide how much memory our network controller hardware will need
Ethernet packet layout • Total size normally can vary from 64 bytes up to 1522 bytes (unless ‘jumbo’ packets and/or ‘undersized’ packets are enabled) • The NIC expects a 14-byte packet ‘header’ and it appends a 4-byte CRC check-sum
offset 0: destination MAC address (6 bytes)
offset 6: source MAC address (6 bytes)
offset 12: Type/length (2 bytes)
offset 14: the packet’s data ‘payload’ goes here (usually varies from 46 to 1500 bytes)
after the payload: Cyclic Redundancy Checksum (4 bytes)
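The header offsets above can be illustrated with a small parsing routine. This is only a sketch in portable C: the names `eth_header_t` and `parse_eth_header` are hypothetical and are not taken from ‘nicspy.c’.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN  6    /* bytes in one MAC address */
#define ETH_HDR_LEN  14    /* destination + source + type/length */

/* Hypothetical helper type for illustration only */
typedef struct {
    uint8_t  dst[ETH_ADDR_LEN];   /* offset 0  */
    uint8_t  src[ETH_ADDR_LEN];   /* offset 6  */
    uint16_t type_len;            /* offset 12, converted to host order */
} eth_header_t;

/* Returns 0 on success, -1 if the frame is too short to hold a header */
static int parse_eth_header(const uint8_t *frame, size_t len, eth_header_t *out)
{
    if (frame == NULL || out == NULL || len < ETH_HDR_LEN)
        return -1;
    memcpy(out->dst, frame + 0, ETH_ADDR_LEN);
    memcpy(out->src, frame + 6, ETH_ADDR_LEN);
    /* the type/length field travels in network (big-endian) byte order */
    out->type_len = (uint16_t)((frame[12] << 8) | frame[13]);
    return 0;
}
```

A caller would pass the raw bytes of a received packet and then inspect `type_len` (for example 0x0800 for IPv4).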
Rx-Descriptor Ring-Buffer [Diagram: a circular buffer of descriptors at offsets 0x00, 0x10, 0x20, ... 0x80 from the RDBA base-address; RDLEN gives the buffer’s length in bytes; RDH (head) and RDT (tail) separate the descriptors owned by hardware (nic) from those owned by software (cpu)] Circular buffer (128 bytes minimum, and must be a multiple of 128 bytes)
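The ring arithmetic implied by this slide can be sketched as follows. The helper names are hypothetical; the 16-byte descriptor size comes from the descriptor format shown later in these slides, and the 128-byte rule from the slide above.

```c
#include <assert.h>
#include <stdint.h>

#define RX_DESC_SIZE 16u   /* bytes per legacy Rx descriptor */

/* RDLEN is given in bytes: at least 128, and a multiple of 128 */
static int rdlen_is_valid(uint32_t rdlen_bytes)
{
    return rdlen_bytes >= 128u && (rdlen_bytes % 128u) == 0u;
}

/* Number of descriptors that fit in a ring of the given byte length */
static uint32_t ring_count(uint32_t rdlen_bytes)
{
    return rdlen_bytes / RX_DESC_SIZE;
}

/* Advance a head or tail index by one, wrapping at the end of the ring */
static uint32_t ring_next(uint32_t index, uint32_t count)
{
    assert(count > 0u);
    return (index + 1u) % count;
}
```

With RDLEN = 256 the ring holds 16 descriptors, so index 15 wraps back to index 0.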
Our ‘nicspy.c’ module • It will be a ‘character-mode’ device-driver • It will only implement ‘read()’ and ‘ioctl()’ • The ‘read()’ function will cause a task to sleep until a network packet has arrived • An interrupt-handler will wake up the task • A ‘get_info’ function will be provided as a debugging aid, so the NIC’s Rx descriptor-queue can be conveniently inspected
Sixteen packet-buffers • Our ‘nicspy.c’ driver allocates 16 buffers of size 1536 bytes (i.e., for normal ethernet) [Diagram: the 32-KB allocation, showing the region for the sixteen packet-buffers, the region for the Rx Descriptor Queue (256 bytes), and the unused remainder]
#define KMEM_SIZE 0x8000 // 32KB = size of kernel memory allocation
void *kmem = kzalloc( KMEM_SIZE, GFP_KERNEL );
if ( !kmem ) return -ENOMEM;
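The sizes on this slide can be checked with a little arithmetic. The sketch below ASSUMES the sixteen packet-buffers come first and the descriptor queue follows immediately after them; the slide does not state the actual order used in ‘nicspy.c’, and all macro and function names other than KMEM_SIZE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define KMEM_SIZE     0x8000u   /* 32KB, from the slide */
#define N_RX_BUFFERS  16u
#define RX_BUF_SIZE   1536u     /* bytes per packet-buffer */
#define RX_DESC_SIZE  16u       /* bytes per Rx descriptor */

/* ASSUMPTION: buffers at offset 0, descriptor queue right after them */
#define RX_BUFS_OFFSET   0u
#define RX_QUEUE_OFFSET  (N_RX_BUFFERS * RX_BUF_SIZE)    /* 0x6000 */
#define RX_QUEUE_SIZE    (N_RX_BUFFERS * RX_DESC_SIZE)   /* 256 bytes */

/* Both regions must fit inside the 32KB allocation */
_Static_assert(RX_QUEUE_OFFSET + RX_QUEUE_SIZE <= KMEM_SIZE,
               "buffers plus descriptor queue must fit in KMEM_SIZE");

/* Byte offset of packet-buffer i from the start of the allocation */
static size_t rx_buffer_offset(unsigned int i)
{
    assert(i < N_RX_BUFFERS);
    return RX_BUFS_OFFSET + (size_t)i * RX_BUF_SIZE;
}
```

Under this assumed layout 24KB holds the buffers, 256 bytes hold the queue, and the rest of the 32KB stays unused.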
Format for an Rx Descriptor (16 bytes) Fields: Base-address (64 bits), Packet-length, Packet-checksum, status, errors, VLAN tag • The device-driver initializes the ‘base-address’ field with the physical address of a packet-buffer • The network controller will ‘write-back’ the values for the remaining fields when it has transferred a received packet’s data into this packet-buffer
Suggested C syntax
typedef struct {
    unsigned long long base_address;
    unsigned short packet_length;
    unsigned short packet_cksum;
    unsigned char desc_status;
    unsigned char desc_errors;
    unsigned short VLAN_tag;
} RX_DESCRIPTOR;
‘Legacy Format’ for the Intel Pro1000 network controller’s Receive Descriptors
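Because the hardware reads and writes these descriptors directly, the struct must occupy exactly 16 bytes with no padding. A compile-time check along the following lines can confirm that. It restates the struct with fixed-width types, which is a substitution for the slide’s `unsigned long long` and friends rather than the author’s own spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base_address;    /* physical address of the packet-buffer */
    uint16_t packet_length;   /* written back by the controller */
    uint16_t packet_cksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t VLAN_tag;
} RX_DESCRIPTOR;

/* The build fails if the compiler inserts padding or reorders sizes */
_Static_assert(sizeof(RX_DESCRIPTOR) == 16, "descriptor must be 16 bytes");
_Static_assert(offsetof(RX_DESCRIPTOR, packet_length) == 8, "length at byte 8");
_Static_assert(offsetof(RX_DESCRIPTOR, desc_status) == 12, "status at byte 12");
_Static_assert(offsetof(RX_DESCRIPTOR, VLAN_tag) == 14, "VLAN tag at byte 14");
```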
RxDesc Status-field
bit 7: PIF = Passed In-exact Filter (1=yes, 0=no) shows if software must check
bit 6: IPCS = IPv4 Checksum calculated on packet (1=yes, 0=no)
bit 5: TCPCS = TCP Checksum calculated on packet (1=yes, 0=no)
bit 4: UDPCS = UDP Checksum calculated on packet (1=yes, 0=no)
bit 3: VP = VLAN Packet match (1=yes, 0=no)
bit 2: IXSM = Ignore Checksum Indications (1=yes, 0=no)
bit 1: EOP = End Of Packet (1=yes, 0=no) shows if this packet is logically last
bit 0: DD = Descriptor Done (1=yes, 0=no) shows if nic is finished with descriptor
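The status bits translate directly into masks. In this sketch the `RXD_STAT_*` names and the `rx_packet_ready` helper are hypothetical. Requiring both DD and EOP is a slightly stricter test than simply checking for a nonzero status, and is shown here only as one possible choice.

```c
#include <assert.h>
#include <stdint.h>

/* One mask per status bit, in the order shown on the slide */
enum {
    RXD_STAT_DD    = 1 << 0,   /* Descriptor Done */
    RXD_STAT_EOP   = 1 << 1,   /* End Of Packet */
    RXD_STAT_IXSM  = 1 << 2,   /* Ignore Checksum Indications */
    RXD_STAT_VP    = 1 << 3,   /* VLAN Packet match */
    RXD_STAT_UDPCS = 1 << 4,   /* UDP checksum calculated */
    RXD_STAT_TCPCS = 1 << 5,   /* TCP checksum calculated */
    RXD_STAT_IPCS  = 1 << 6,   /* IPv4 checksum calculated */
    RXD_STAT_PIF   = 1 << 7    /* Passed In-exact Filter */
};

/* True when the nic has finished with the descriptor and it ends a packet */
static int rx_packet_ready(uint8_t status)
{
    uint8_t need = RXD_STAT_DD | RXD_STAT_EOP;
    return (status & need) == need;
}
```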
RxDesc Error-field
bit 7: RXE = Received-data Error (1=yes, 0=no)
bit 6: IPE = IPv4-checksum error (1=yes, 0=no)
bit 5: TCPE = TCP/UDP checksum error (1=yes, 0=no)
bit 4: reserved (=0)
bit 3: reserved (=0)
bit 2: SEQ = Sequence error (1=yes, 0=no)
bit 1: SE = Symbol Error (1=yes, 0=no)
bit 0: CE = CRC Error or alignment error (1=yes, 0=no)
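The error bits can be tested the same way. The grouping below into ‘link-level’ and ‘checksum’ errors is only an illustrative choice made for this sketch, and the `RXD_ERR_*` names and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One mask per error bit; bits 3 and 4 are reserved */
enum {
    RXD_ERR_CE   = 1 << 0,   /* CRC or alignment error */
    RXD_ERR_SE   = 1 << 1,   /* Symbol Error */
    RXD_ERR_SEQ  = 1 << 2,   /* Sequence error */
    RXD_ERR_TCPE = 1 << 5,   /* TCP/UDP checksum error */
    RXD_ERR_IPE  = 1 << 6,   /* IPv4 checksum error */
    RXD_ERR_RXE  = 1 << 7    /* Received-data Error */
};

/* Errors detected while the frame was being received */
static int rx_has_link_error(uint8_t errors)
{
    return (errors & (RXD_ERR_CE | RXD_ERR_SE | RXD_ERR_SEQ | RXD_ERR_RXE)) != 0;
}

/* Errors reported by the controller's checksum logic */
static int rx_has_checksum_error(uint8_t errors)
{
    return (errors & (RXD_ERR_TCPE | RXD_ERR_IPE)) != 0;
}
```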
Essential ‘receive’ registers
enum {
    E1000_CTRL   = 0x0000, // Device Control
    E1000_STATUS = 0x0008, // Device Status
    E1000_ICR    = 0x00C0, // Interrupt Cause Read
    E1000_IMS    = 0x00D0, // Interrupt Mask Set
    E1000_IMC    = 0x00D8, // Interrupt Mask Clear
    E1000_RCTL   = 0x0100, // Receive Control
    E1000_RDBAL  = 0x2800, // Rx Descriptor Base Address Low
    E1000_RDBAH  = 0x2804, // Rx Descriptor Base Address High
    E1000_RDLEN  = 0x2808, // Rx Descriptor Length
    E1000_RDH    = 0x2810, // Rx Descriptor Head
    E1000_RDT    = 0x2818, // Rx Descriptor Tail
    E1000_RXDCTL = 0x2828, // Rx Descriptor Control
    E1000_RA     = 0x5400, // Receive address-filter Array
};
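These enum values are byte offsets from the start of the remapped register window. The sketch below shows the offset arithmetic using an ordinary array as a stand-in for the device, so it can run in user space. The helper names `reg_read32` and `reg_write32` are hypothetical. A real kernel module would normally go through the kernel’s own accessors (such as `ioread32()` and `iowrite32()`) on the address returned by `ioremap()`, rather than dereferencing a pointer like this.

```c
#include <assert.h>
#include <stdint.h>

/* A few of the offsets from the slide, repeated so this compiles alone */
enum {
    E1000_RDLEN = 0x2808,   /* Rx Descriptor Length */
    E1000_RDH   = 0x2810,   /* Rx Descriptor Head */
    E1000_RDT   = 0x2818    /* Rx Descriptor Tail */
};

/* Read one 32-bit register at a byte offset from the window's base */
static uint32_t reg_read32(volatile uint8_t *io, uint32_t offset)
{
    return *(volatile uint32_t *)(io + offset);
}

/* Write one 32-bit register at a byte offset from the window's base */
static void reg_write32(volatile uint8_t *io, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(io + offset) = value;
}
```

In a simulation, `io` can point at a zeroed array of 32-bit words large enough to cover the highest offset used.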
Receive Control (0x0100) [Diagram: bit-field layout of the 32-bit RCTL register, bits 31..16 above bits 15..0; ‘R =0’ marks reserved bits]
EN = Receive Enable
DTYP = Descriptor Type
DPF = Discard Pause Frames
SBP = Store Bad Packets
MO = Multicast Offset
PMCF = Pass MAC Control Frames
UPE = Unicast Promiscuous Enable
BAM = Broadcast Accept Mode
BSEX = Buffer Size Extension
MPE = Multicast Promiscuous Enable
BSIZE = Receive Buffer Size
SECRC = Strip Ethernet CRC
LPE = Long Packet reception Enable
VFE = VLAN Filter Enable
FLXBUF = Flexible Buffer size
LBM = Loopback Mode
CFIEN = Canonical Form Indicator Enable
RDMTS = Rx-Descriptor Minimum Threshold Size
CFI = Canonical Form Indicator bit-value
We used 0x0000801C in RCTL to prepare the ‘receive engine’ prior to enabling it
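The value 0x0000801C can be rebuilt from individual bits. The bit positions below are an assumption based on Intel’s published RCTL layout for this controller family (EN at bit 1, SBP at bit 2, UPE at bit 3, MPE at bit 4, BAM at bit 15). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions, per Intel's documented RCTL layout */
#define RCTL_EN   UINT32_C(0x00000002)   /* Receive Enable */
#define RCTL_SBP  UINT32_C(0x00000004)   /* Store Bad Packets */
#define RCTL_UPE  UINT32_C(0x00000008)   /* Unicast Promiscuous Enable */
#define RCTL_MPE  UINT32_C(0x00000010)   /* Multicast Promiscuous Enable */
#define RCTL_BAM  UINT32_C(0x00008000)   /* Broadcast Accept Mode */

/* The value used to prepare the receive engine, with EN still clear */
static uint32_t rctl_prepare_value(void)
{
    return RCTL_BAM | RCTL_MPE | RCTL_UPE | RCTL_SBP;
}

/* The same value with the receive engine switched on */
static uint32_t rctl_enable(uint32_t rctl)
{
    return rctl | RCTL_EN;
}
```

Read this way, the slide’s value accepts broadcasts, is promiscuous for unicast and multicast, and stores bad packets, which suits a tool meant to show every packet.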
Device Control (0x0000) [Diagram: bit-field layout of the 32-bit CTRL register for the 82573L, bits 31..16 above bits 15..0; ‘R =0’ and ‘R =1’ mark reserved bits]
FD = Full-Duplex
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
GIOMD = GIO Master Disable
ADVD3WUC = Advertise Cold Wake Up Capability
SLU = Set Link Up
D/UD = Dock/Undock status
RFCE = Rx Flow-Control Enable
FRCSPD = Force Speed
RST = Device Reset
TFCE = Tx Flow-Control Enable
FRCDPLX = Force Duplex
PHYRST = Phy Reset
VME = VLAN Mode Enable
We used 0x040C0241 to initiate a ‘device reset’ operation
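A few fields of the reset value 0x040C0241 can be picked apart as follows. The bit positions are an assumption based on Intel’s published CTRL layout (FD at bit 0, SLU at bit 6, SPEED at bits 9:8, RST at bit 26). The names are hypothetical, and the remaining set bits in the value are left uninterpreted here.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions, per Intel's documented CTRL layout */
#define CTRL_FD           UINT32_C(0x00000001)   /* Full-Duplex */
#define CTRL_SLU          UINT32_C(0x00000040)   /* Set Link Up */
#define CTRL_SPEED_MASK   UINT32_C(0x00000300)   /* SPEED field, bits 9:8 */
#define CTRL_SPEED_SHIFT  8
#define CTRL_RST          UINT32_C(0x04000000)   /* Device Reset */

/* Extract the 2-bit SPEED field (0=10Mbps, 1=100Mbps, 2=1000Mbps) */
static unsigned int ctrl_speed_field(uint32_t ctrl)
{
    return (unsigned int)((ctrl & CTRL_SPEED_MASK) >> CTRL_SPEED_SHIFT);
}

/* True when the value would ask the device to reset itself */
static int ctrl_requests_reset(uint32_t ctrl)
{
    return (ctrl & CTRL_RST) != 0;
}
```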
Device Status (0x0008) [Diagram: bit-field layout of the 32-bit STATUS register for the 82573L, bits 31..16 above bits 15..0; fields shown include GIO Master EN, PHYRA, ASDV, SPEED, TXOFF, Function ID, LU and FD; bit 31 is marked ‘?’ (some undocumented functionality?)]
FD = Full-Duplex
LU = Link Up
TXOFF = Transmission Paused
SPEED (00=10Mbps, 01=100Mbps, 10=1000Mbps, 11=reserved)
ASDV = Auto-negotiation Speed Detection Value
PHYRA = PHY Reset Asserted
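Reading the link state out of a STATUS value might look like this. The bit positions are an assumption based on Intel’s published STATUS layout (FD at bit 0, LU at bit 1, SPEED at bits 7:6), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED bit positions, per Intel's documented STATUS layout */
#define STATUS_FD           UINT32_C(0x00000001)   /* Full-Duplex */
#define STATUS_LU           UINT32_C(0x00000002)   /* Link Up */
#define STATUS_SPEED_MASK   UINT32_C(0x000000C0)   /* SPEED field, bits 7:6 */
#define STATUS_SPEED_SHIFT  6

/* True when the controller reports an established link */
static int link_is_up(uint32_t status)
{
    return (status & STATUS_LU) != 0;
}

/* Map the 2-bit SPEED field to the text used on the slide */
static const char *status_speed_name(uint32_t status)
{
    static const char *const names[4] = {
        "10Mbps", "100Mbps", "1000Mbps", "reserved"
    };
    return names[(status & STATUS_SPEED_MASK) >> STATUS_SPEED_SHIFT];
}
```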
PCI Bus Master DMA [Diagram: the 82573L (i/o-memory, on-chip RX descriptors, on-chip TX descriptors, RX and TX FIFOs of 32-KB total) uses DMA to reach the Rx Descriptor Queue and the packet-buffers in the host’s Dynamic Random Access Memory]
Our ‘read()’ algorithm
unsigned int rx_curr;
ssize_t my_read( struct file *file, char *buf, size_t len, loff_t *pos )
{
    // our global variable ‘rx_curr’ is the descriptor-array index
    // for the next receive-buffer descriptor to be processed
    if ( this descriptor’s status is zero )
        put calling task to sleep;
    // wakeup the task when a fresh packet has been received
    copy received data from the packet-buffer to user’s buffer
    clear this descriptor’s status
    advance our global variable ‘rx_curr’ to the next descriptor
    return the number of data-bytes transferred
}
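The steps of this algorithm can be tried out in user space with a simulated descriptor ring. This is only a sketch and is not the code of ‘nicspy.c’. Here `sim_read` returns 0 at the point where the driver would put the task to sleep, and `sim_receive` is a hypothetical stand-in for the nic’s write-back of a descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define N_RX_DESC    16
#define RX_BUF_SIZE  1536

typedef struct {
    uint64_t base_address;
    uint16_t packet_length;
    uint16_t packet_cksum;
    uint8_t  desc_status;
    uint8_t  desc_errors;
    uint16_t VLAN_tag;
} RX_DESCRIPTOR;

static RX_DESCRIPTOR rx_ring[N_RX_DESC];          /* simulated queue */
static uint8_t rx_bufs[N_RX_DESC][RX_BUF_SIZE];   /* simulated buffers */
static unsigned int rx_curr;                      /* next descriptor to process */

/* Returns bytes copied, or 0 where the real driver would sleep */
static size_t sim_read(uint8_t *buf, size_t len)
{
    RX_DESCRIPTOR *d = &rx_ring[rx_curr];
    size_t n;

    if (d->desc_status == 0)
        return 0;                          /* no fresh packet yet */
    n = d->packet_length;
    if (n > len) n = len;                  /* never overrun the caller */
    if (n > RX_BUF_SIZE) n = RX_BUF_SIZE;  /* never overrun our buffer */
    memcpy(buf, rx_bufs[rx_curr], n);
    d->desc_status = 0;                    /* clear this descriptor's status */
    rx_curr = (rx_curr + 1) % N_RX_DESC;   /* advance with wrap-around */
    return n;
}

/* Hypothetical stand-in for the nic writing back descriptor idx */
static void sim_receive(unsigned int idx, const uint8_t *data, uint16_t n)
{
    assert(idx < N_RX_DESC && n <= RX_BUF_SIZE);
    memcpy(rx_bufs[idx], data, n);
    rx_ring[idx].packet_length = n;
    rx_ring[idx].desc_status = 0x03;       /* DD and EOP set */
}
```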
‘nicspy.cpp’ • This application calls our device-driver’s ‘read()’ function repeatedly, and displays the ‘raw’ ethernet packet-data each time • It requires our ‘nicspy.c’ device-driver to be installed in the kernel, obviously • There’s no ‘clash’ of filenames here – and their similarity helps keep them together: nicspy.c and nicspy.ko (the kernel-side) nicspy.cpp and nicspy ( the user-side )
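Displaying ‘raw’ packet-data usually means printing each byte in hexadecimal. The helper below is a hypothetical sketch of one such row formatter, written in C rather than taken from ‘nicspy.cpp’.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats up to 16 bytes as "XX XX XX ..." into out.
 * Returns the number of characters written, not counting the final NUL.
 */
static int format_hex_row(const unsigned char *data, size_t n,
                          char *out, size_t outsz)
{
    size_t i;
    int used = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    if (n > 16)
        n = 16;                            /* one display row at most */
    for (i = 0; i < n; i++) {
        size_t room = outsz - (size_t)used;
        int w = snprintf(out + used, room, "%s%02X",
                         i ? " " : "", data[i]);
        if (w < 0 || (size_t)w >= room)
            break;                         /* output buffer is full */
        used += w;
    }
    return used;
}
```

An application could call this once per 16 bytes of the data returned by ‘read()’ and print each row on its own line.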
in-class demo • We can install ‘nicspy.ko’ on one of our anchor machines – making sure ‘eth1’ is ‘down’ before we do our module-install – and then we run ‘nicspy’ on that machine • Next we install our ‘nicping.ko’ module on some other anchor machine – be sure its ‘eth1’ interface is ‘down’ beforehand – and then use ‘cat /proc/nicping’ for a transmit