Designing Campus Networks for Manageability • Nick Feamster, Georgia Tech
Campus Networks: Large and Complex • 3 campuses: Atlanta; Savannah; Metz, France • 87,000 ports, 155 buildings, 1,800 fiber miles • 2,300 wireless APs (3,900 radios) • Campus-wide Layer 2 mobility • Transit for the Southeast • Internet2, National LambdaRail (NLR), Southern Light Rail (SLR), Southern Crossroads (SoX)
Problems in Campus Networks • Security: Access control is complex and dynamic • Example: the campus wireless network has 4-6k active clients at any time, with considerable churn • Resonance: Dynamic access control for enterprise networks • Virtual Network Configuration • Today: Virtual LANs • Today: Effects of VLAN-induced sharing • Next steps
Dynamic Access Control in Enterprise Networks • Ankur Nayak, Hyojoon Kim, Nick Feamster, Russ Clark (Georgia Tech)
Motivation • Enterprise and campus networks are dynamic • Hosts continually join and leave the network • Hosts may become infected • Today, access control is static and poorly integrated with the network layer itself • Resonance: Dynamic access control • Track the state of each host on the network • Update the forwarding state of switches for each host as that host's state changes
State of the Art • Today's networks have many components "bolted on" after the fact • Firewalls, VLANs, web authentication portals, vulnerability scanners • Separate (and competing) devices perform the following functions • Registration (based on MAC addresses) • Scanning • Filtering and rate limiting traffic • Correctness depends on state that is distributed across the network. Many chances for incorrect or unsynchronized state!
Example: User Authentication • [Diagram; actors: new host, VMPS, web portal] • 1. New host appears with a new MAC address • 2. Switch sends a VQP query to the VMPS • 3. Host is placed on a VLAN with a private IP • 4. Host authenticates via the web portal • 5. Authentication result is returned • 6. Host is moved to a VLAN with a public IP • 7. Host must reboot (to obtain a new address via DHCP)
Problems with Current Architecture • Access control is too coarse-grained • Static, inflexible, and prone to misconfigurations • Must rely on VLANs to isolate infected machines • Cannot dynamically remap hosts to different portions of the network • Remapping requires a new DHCP request, which for a Windows user typically means a reboot • Correctness depends on consistent state • Monitoring is not continuous
Resonance: Main Ideas • Idea #1: Access control should incorporate dynamic information about hosts • Actions should depend not only on the changing state of a host, but also on that host's security class • Idea #2: Distributed dependencies should be minimized • Incorrect behavior often results from unsynchronized state; build a consistent view of the network
Resonance: Approach • Step 1: Specify policy: a state machine for moving hosts from one state to another • Step 2: Associate each host with a state and a security class • Step 3: Adjust forwarding state in switches based on the current state of each machine • Actions from other network elements, and distributed inference, can affect network state
Example: Simple User Authentication • [State-machine diagram] States: Registration, Scanning, Operation, Quarantined • Labeled transitions: successful authentication, failed authentication, not infected, vulnerability detected, infection removed, still infected after an update
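As a concrete illustration, here is a minimal sketch of one plausible encoding of this state machine. The state and transition names come from the slide above, but the exact edges are an assumption rather than the definitive Resonance policy:

```python
# One plausible wiring of the state machine on the slide above.
# State and transition names mirror the diagram; the exact edges are an
# assumption, not the definitive Resonance policy.

START = "registration"

# (current_state, event) -> next_state
TRANSITIONS = {
    ("registration", "successful_authentication"):   "scanning",
    ("registration", "failed_authentication"):       "registration",
    ("scanning",     "not_infected"):                "operation",
    ("scanning",     "vulnerability_detected"):      "quarantined",
    ("operation",    "vulnerability_detected"):      "quarantined",
    ("quarantined",  "infection_removed"):           "operation",
    ("quarantined",  "still_infected_after_update"): "quarantined",
}

class Host:
    """Security state for one host, keyed by MAC address."""
    def __init__(self, mac: str):
        self.mac = mac
        self.state = START

    def handle(self, event: str) -> str:
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state

if __name__ == "__main__":
    h = Host("00:11:22:33:44:55")
    for ev in ("successful_authentication", "vulnerability_detected",
               "infection_removed"):
        print(f"{ev} -> {h.handle(ev)}")
```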
Resonance: Step by Step • [Diagram; components: web portal, DHCP server, controller, Internet] • 1. DHCP request • 2. Authentication • 3. Scanning • 4. To the Internet
Implementation: OpenFlow/NOX • OpenFlow: flow-based control over the forwarding behavior of switches and routers • Components: switches, a centralized controller, and end hosts • Switches communicate with the controller through an open protocol over a secure channel
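To make the division of labor concrete, here is a minimal sketch of controller-side logic that maps a host's security state (from the state machine above) to a per-flow forwarding decision. The policy table, service names, and the on_new_flow hook are illustrative assumptions, not the actual NOX/OpenFlow API:

```python
# Illustrative controller logic: map a host's security state to a
# forwarding decision for each new flow. The policy table and function
# names are assumptions for exposition, not the NOX/OpenFlow API.

# Per-state policy: which destinations a host in that state may reach.
POLICY = {
    "registration": {"dhcp", "dns", "web_portal"},  # walled garden
    "scanning":     {"dhcp", "dns", "scanner"},
    "operation":    {"any"},
    "quarantined":  {"remediation"},
}

def classify(dst_service: str, host_state: str) -> str:
    """Return the flow-table action for this flow: 'forward' or 'drop'."""
    allowed = POLICY[host_state]
    return "forward" if ("any" in allowed or dst_service in allowed) else "drop"

def on_new_flow(mac: str, dst_service: str, host_states: dict) -> str:
    """Called when a switch sends the first packet of a flow to the
    controller; decides the flow-table entry for that (host, service)."""
    state = host_states.get(mac, "registration")
    action = classify(dst_service, state)
    print(f"install rule: {mac} -> {dst_service}: {action} (state={state})")
    return action

if __name__ == "__main__":
    states = {"00:11:22:33:44:55": "quarantined"}
    on_new_flow("00:11:22:33:44:55", "web_portal", states)   # drop
    on_new_flow("00:11:22:33:44:55", "remediation", states)  # forward
```

Because the controller holds all host state centrally, changing a host's state (e.g., quarantining it) immediately changes the rules installed for its future flows, with no VLAN remapping or reboot.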
Why OpenFlow? • Much complexity, many bugs, and much unexpected behavior result from distributed state • Solution: centralize control and state • Specify a single, centralized security policy • Coordinate the mechanisms across switches • Finer granularity of control • Separation of the control plane from the data plane
Immediate Challenges • Scale • How many forwarding entries per switch? • How much traffic at the controller? • Performance • Responsiveness: How long to set up a flow? • Security • MAC address spoofing • Securing the controller (and control framework)
Length of Each Flow • [CDF of flow durations] Much DNS traffic is quite short; it is not tenable to keep a flow-table entry for each such flow.
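For context, per-flow durations like those in the figure can be computed by grouping packets on their five-tuple; a self-contained sketch follows (the packet records are fabricated for illustration):

```python
# Sketch: compute per-flow durations from (timestamp, five-tuple) records,
# as one would to produce a flow-length distribution. The sample records
# below are fabricated for illustration.
from collections import defaultdict

packets = [
    # (time_sec, src_ip, dst_ip, proto, src_port, dst_port)
    (0.000, "10.0.0.1", "8.8.8.8", "udp", 53001, 53),  # short DNS exchange
    (0.012, "10.0.0.1", "8.8.8.8", "udp", 53001, 53),
    (0.000, "10.0.0.1", "1.2.3.4", "tcp", 41000, 80),  # longer HTTP transfer
    (4.700, "10.0.0.1", "1.2.3.4", "tcp", 41000, 80),
]

# Track the first and last packet time seen for each five-tuple.
first_last = defaultdict(lambda: [float("inf"), float("-inf")])
for t, *flow in packets:
    span = first_last[tuple(flow)]
    span[0] = min(span[0], t)
    span[1] = max(span[1], t)

for key, (start, end) in first_last.items():
    print(key, f"duration = {end - start:.3f}s")
```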
Overhead: Flow Setup and Transfer • RTT delays < 20 ms • Secure copy overhead < 1%
Summary • A new architecture for dynamic access control • Preliminary design • Application to Georgia Tech campus network • Preliminary evaluation • Many challenges remain • Scaling • Performance • Policy languages • Complexity metrics
Big Challenge: Policy Languages • Holy grail: elevate policy configuration from mechanism to a high-level language • Our belief: Centralized control can simplify this • Various programming-language projects (e.g., Nettle) may hold some promise • Perhaps also formally checkable • Maybe ConfigAssure could help?
Understanding VLAN-Induced Sharing in a Campus Network • Mukarram bin Tariq, Ahmed Mansy, Nick Feamster, Mostafa Ammar (Georgia Tech)
Virtual LANs (VLANs) • [Diagram: VLAN1, VLAN2, and VLAN3 running over a single Ethernet VLAN core] • Multiple LANs on top of a single physical network • Typically map to IP subnets • Flexible design of IP subnets • Administrative ease • Sharing of infrastructure among separate networks, e.g., for departments or experiments • Sharing: multiple IP networks may depend on the same Ethernet infrastructure
Informal Operator Survey • "I wish for insight. Better visibility into operational details" (lack of cross-layer visibility) • "[users] can end up on ports configured for the wrong VLAN ... difficult for end users to determine why their network isn't working ('but I have a link light!')" (need for diagnostic tools for VLANs) • "deploy tomography tool [for the campus to isolate faulty switches]" (shared failure modes among networks) • "Using only the information the switch can give [it is difficult to determine] which VLAN or VLANs are the busy ones"
Key Questions and Contributions • How can we obtain visibility into the sharing of Ethernet infrastructure among IP networks? • EtherTrace: a tool for discovering the Ethernet devices on an IP path • Passive discovery using bridge tables; does not require CDP or LLDP • How much sharing is there in a typical network? • Analysis of VLANs in the Georgia Tech network: 1,358 switches, 1,542 VLANs • We find significant sharing • How much does Ethernet visibility help? • Network tomography: 2x improvement in binary tomography when using Ethernet visibility
EtherTrace: IP to Ethernet Paths • [Diagram: hosts H1 and H2, switches A-F] • Due to spanning tree, frames from H1 and H2 are received on separate ports of the same VLAN at switches on the Layer-2 path between them; at off-path switches, frames from both hosts arrive on the same port • EtherTrace automates discovery of the Ethernet path by analyzing bridge and ARP tables, iterating over each IP hop of an IP traceroute • Works well for stable networks • Available at: http://www.gtnoise.net/ethertrace
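A simplified sketch of EtherTrace's core test, assuming bridge tables are exposed as a (switch, MAC) to port mapping; the data layout below is illustrative:

```python
# Simplified version of the EtherTrace inference: a switch lies on the
# Layer-2 path between the two endpoints of an IP hop iff its bridge table
# learned the two MAC addresses on *different* ports; off-path switches see
# both MACs through the same port. The bridge-table format is illustrative.

# bridge_tables[switch][mac] = port on which `mac` was learned
bridge_tables = {
    "A": {"h1": 1, "h2": 2},  # on path: different ports
    "B": {"h1": 3, "h2": 3},  # off path: same port
    "C": {"h1": 1, "h2": 4},  # on path: different ports
}

def on_path_switches(mac1, mac2, tables):
    """Switches whose bridge tables place mac1 and mac2 on different ports."""
    return [switch for switch, table in tables.items()
            if mac1 in table and mac2 in table and table[mac1] != table[mac2]]

# For a full IP path, repeat this per IP hop: resolve each hop's IP address
# to a MAC address via the ARP tables, then find the switches between each
# pair of consecutive hops in the IP traceroute.
if __name__ == "__main__":
    print(on_path_switches("h1", "h2", bridge_tables))  # ['A', 'C']
```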
Georgia Tech: Network Dataset • Dataset: 1,358 switches, 31 routers, 79 monitoring nodes; one-day snapshot on March 25, 2008 • Data sources: bridge tables collected every 4 hours, ARP tables every hour, IP traceroutes among monitoring nodes every 5 minutes • Analysis: obtain the Ethernet devices for IP traceroutes using EtherTrace; quantify the sharing of Ethernet devices among IP hops and paths
Ethernet Hops Shared Among IP Hops • Maximum number of IP hops traversing a single Ethernet hop: 34 (17 when considering only disjoint IP hops) • 57% of Ethernet hops are shared by at least 2 disjoint IP hops
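A small sketch of how such sharing statistics could be computed from a mapping of Ethernet hops to the IP hops that traverse them; the topology data here is fabricated:

```python
# Sketch of the sharing analysis: given which IP hops traverse each Ethernet
# hop, compute the maximum sharing and flag Ethernet hops shared by two
# *disjoint* IP hops (IP hops with no router endpoint in common).

# eth_hop -> list of IP hops, each IP hop = (router_a, router_b)
traversals = {
    ("sw1", "sw2"): [("r1", "r2"), ("r1", "r3"), ("r4", "r5")],
    ("sw2", "sw3"): [("r1", "r2")],
}

def has_disjoint_pair(ip_hops):
    """True if some pair of IP hops shares no router endpoint."""
    for i, a in enumerate(ip_hops):
        for b in ip_hops[i + 1:]:
            if not set(a) & set(b):
                return True
    return False

max_sharing = max(len(hops) for hops in traversals.values())
shared = sum(has_disjoint_pair(hops) for hops in traversals.values())
print(f"max IP hops per Ethernet hop: {max_sharing}")
print(f"Ethernet hops with >=2 disjoint IP hops: {shared}/{len(traversals)}")
```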
Application: Network Tomography • [Diagram: a monitor probing targets over paths that share links x and y] • Send end-to-end probes through the network • Monitor paths for differences in reachability • Infer the location of the reachability problem from these differences
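A minimal sketch of binary tomography under these assumptions: the diagnosed set is the set of links common to all failed paths, minus any link that appears on a working path (the topology and probe results below are fabricated):

```python
# Minimal binary-tomography sketch: links on every failed path but on no
# working path form the diagnosed set. Paths can be expressed as IP hops
# or, with EtherTrace, as Ethernet hops; the finer Ethernet view is what
# shrinks the diagnosed set. Topology and probe outcomes are fabricated.

paths = {
    # path name -> (links traversed, probe succeeded?)
    "m1->t1": ({"x", "y"}, False),
    "m1->t2": ({"x", "z"}, False),
    "m2->t1": ({"w", "y"}, True),
}

failed  = [links for links, ok in paths.values() if not ok]
working = [links for links, ok in paths.values() if ok]

# Intersect the failed paths, then remove links seen on any working path.
candidates = set.intersection(*failed) if failed else set()
for links in working:
    candidates -= links

print("diagnosed set:", candidates)  # {'x'}
```

With only IP-level paths the link sets are coarser, so the diagnosed set is larger; mapping each path to its Ethernet hops via EtherTrace is what yields the improvement quantified on the next slide.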
Improving Diagnosis Accuracy • Experiment: simulate the failure of a random Ethernet hop; determine the IP paths affected by the failure; use binary tomography to localize the faulty hop • Accuracy (is the failed hop in the diagnosed set?): 54% using IP-level information only vs. 100% when incorporating Layer-2 visibility • Specificity (size of the diagnosed set relative to the number of failed hops): average 3.7 vs. 1.48; 95th percentile 9 vs. 1
Summary • A surprising amount of sharing • On average, an Ethernet hop affects ~30 IP hops • 57% of Ethernet hops affect two or more disjoint IP hops • The failure of an Ethernet device affects, on average, as many IP paths as the failure of an IP device, and there are two orders of magnitude more Ethernet devices • Cross-layer visibility improves diagnosis: 2x improvement in accuracy and specificity • EtherTrace: www.gtnoise.net/ethertrace
Next Steps • Better understand the tasks that VLANs are used for on campus • Ease of mobility • Separation of access for security reasons • Flattening the layers: explore eliminating VLANs where they are not needed • E.g., can an OpenFlow-capable network eliminate the need for VLANs?
Problems in Campus Networks • Security: Access control is complex and dynamic • Example: the campus wireless network has 4-6k active clients at any time, with considerable churn • Resonance: Dynamic access control for enterprise networks • Next steps: scaling, policy languages • Virtual Network Configuration • Today: Virtual LANs • Today: Effects of VLAN-induced sharing • Next steps: VLAN alternatives