
Application of AI- and ML-Techniques to Fault-Tolerant Routing

Application of AI- and ML-Techniques to Fault-Tolerant Routing. Arjun Rao CS 717 November 16 and 18, 2004. Papers Covered. [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies”



Presentation Transcript


  1. Application of AI- and ML-Techniques to Fault-Tolerant Routing Arjun Rao CS 717 November 16 and 18, 2004

  2. Papers Covered • [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies” • [2] Loh, Shaw., “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor Networks”

  3. Papers Covered (cont.) • [3] Loh, Schröder, Hsu., “Fault-Tolerant Routing on Complete Josephus Cubes” (not AI-related but interesting nevertheless) If time permits, also: • [4] Bradley, Tyrrell., “Immunotronics: Hardware Fault Tolerance Inspired by the Immune System”

  4. The Problem of Routing • Communication between nodes • Servers • Microprocessors • Desire shortest, most efficient paths • Multiprocessor network topologies, e.g. hypercubes, Josephus cubes, etc. • Desire availability of paths • What to do when links/nodes fail? • How to remain (close to) optimal?

  5. Intro to Fault-Tolerant Routing • Current algorithms adaptive but non-minimal • Misrouting • Routing strategies tied to specific topologies • k-ary n-cubes, meshes, etc.: Regular structures and symmetry • Constrained by fault number and types • More general strategies vulnerable to deadlock and livelock

  6. “Turn Model” [Glass, Ni] • Widest application scope • k-ary n-cubes, nD-meshes, torus geometries, etc. • “West-First” algorithm (on 2D-mesh) • Messages prevented from turning “west” again • Prevents cycles → no deadlocks • Routing along virtual channels in strictly decreasing or increasing order
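The west-first rule can be sketched for a fault-free 2D mesh as follows. This is a minimal illustration, assuming nodes are (x, y) coordinates with x growing eastward and y growing northward; the function name `west_first_directions` is hypothetical and does not come from the papers.

```python
def west_first_directions(cur, dst):
    """Return the directions a packet may take next under west-first routing.

    Assumptions (not from the slides): nodes are (x, y) mesh coordinates,
    x grows eastward, y grows northward, and no links are faulty.
    """
    (cx, cy), (dx, dy) = cur, dst
    if dx < cx:
        # Every westward hop is taken first, so the packet never has to
        # turn west again later on.
        return ["W"]
    options = []
    if dx > cx:
        options.append("E")
    if dy > cy:
        options.append("N")
    if dy < cy:
        options.append("S")
    return options  # an empty list means the packet has arrived


print(west_first_directions((3, 3), (1, 5)))
print(west_first_directions((1, 1), (3, 0)))
```

Once the westward hops are done, the packet may choose adaptively among the remaining directions, which is where the partial adaptivity of the turn model comes from.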

  7. Turn Model and Channel Numbering

  8. Turn Model (cont.) • Three examples of routing • “F” = FAILURE • Full adaptation w/o deadlock and livelock requires more global info → more overhead

  9. AI Search Techniques • Arbitrary topology → Search space • Search space → Search tree(s) • Adaptive but still non-minimal • Characteristic recursion impractical on loosely-coupled, distributed network

  10. AI Logical Abstraction • Abstraction: • S: Problem space • O: Set of objectives • P: Search paths • S = (O, P), where o_i ∈ O and p_j ∈ P, each p_j connects tuple (o_k, o_l), k ≠ l • Abstraction used to model…

  11. Multiprocessor Network w/ Generic Topology • Network • N: Nodes • L: Links between nodes • G = (N, L), where n_i ∈ N and l_j ∈ L, each l_j connects tuple (n_k, n_l), k ≠ l • Objective → Node • Search path → Link

  12. Abstract Routing Model • Search: • (o_s, o_t): S × S → S*, where S = (O, P) and S* = (O*, P*) • o_x, o_y ∈ O and o_x, o_y ∈ O* → Successful search • o_x, o_y ∈ O and o_x ∈ O*, o_y ∉ O* → Unsuccessful search • Routing attempt R: • R(n_s, n_d): G × G → G*, where G = (N, L) and G* = (N*, L*) • n_i, n_j ∈ N and n_i, n_j ∈ N* → Complete route • n_i, n_j ∈ N and n_i ∈ N*, n_j ∉ N* → Incomplete route

  13. Routing Analogy • AI search equivalent to routing attempt • Successful search → Route between source and destination nodes • Unsuccessful search → Incomplete route to destination

  14. Caveats of Analogy • No specific search algorithm → No routing strategy • No optimality constraints • Nothing about deadlocks/livelocks • Nothing about fault tolerance!!

  15. Fault-Tolerant Routing Model • Model considers two aspects: • Routing system configuration • Must be generic enough! • Message propagation protocols and policies • Following slides introduce what is needed for AI searches (w/ physical message backtracking)

  16. FT Routing Model (cont.)

  17. FT Routing Model (cont.) • Eager readership of input messages • Single input buffer to avoid polling • Multiple output buffers to accommodate different delivery rates • Router process: • AI/FT routing strategy implemented here • Physical message backtracking → Increased message sizes • Increased message sizes/overhead → Requires communications router at each node

  18. Communications Router

  19. Communications Router (cont.) • Communication router constitutes router process and connections • Main components: LCM and CP • ROM: Stores link management and routing software • RAM: Stores routing table, link status table, associated link lists

  20. CR Data Structure: Routing Table

  21. CR Routing Table • For each node, up to n links • For each link: • Connected with status OK and node ID of neighbor • Not connected with status NC and node ID –1 • Link fault represented by timeout: • Status reset to NC • Processor fault represented by timeouts in neighbors
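The per-node routing table described above can be sketched as a small data structure. This is only an illustration: the helper names (`make_routing_table`, `connect`, `on_timeout`, `live_neighbors`) are hypothetical, and resetting the neighbor ID to -1 on a timeout is an assumption, since the slide only states that the status is reset to NC.

```python
OK, NC = "OK", "NC"  # link status values named on the slide


def make_routing_table(n_links):
    # Every link starts as not connected, with neighbor node ID -1.
    return [{"status": NC, "neighbor": -1} for _ in range(n_links)]


def connect(table, link, neighbor_id):
    table[link] = {"status": OK, "neighbor": neighbor_id}


def on_timeout(table, link):
    # A link fault shows up as a timeout: the status is reset to NC.
    # Assumption: the neighbor ID is also reset to -1.
    table[link] = {"status": NC, "neighbor": -1}


def live_neighbors(table):
    return [entry["neighbor"] for entry in table if entry["status"] == OK]


table = make_routing_table(8)   # up to 8 links per node, as in the simulation
connect(table, 0, 5)
connect(table, 3, 9)
on_timeout(table, 0)
print(live_neighbors(table))
```

A processor fault needs no extra state in this sketch: each neighbor of the failed node simply sees its own link to it time out.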

  22. CR Data Structures: Link Status Table, Lists

  23. Message Packets • Six fields: • Router Control (4 bits): Type of message, including NORMAL and BACKTRACK • Destination Node ID (10 bits): Supports network of size up to 1024 nodes • Pending Nodes (20 bytes): Stack of node IDs that may receive packet but have not yet • Traversed Nodes (20 bytes): Stack of nodes traversed, with most recent on top

  24. Message Packets (cont.) • Traversed Nodes Index (10 bits): Index to previous traversed nodes field. Supports simulation of physical message backtracking • Data Field (n-bit pointer): Points to information content of packet
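The six packet fields can be modelled as a small record. This is a sketch for illustration only: the class and attribute names are hypothetical, Python lists stand in for the fixed-size 20-byte stacks, and the bit widths are only reflected in the destination range check.

```python
from dataclasses import dataclass, field
from typing import Any, List

NORMAL, BACKTRACK = "NORMAL", "BACKTRACK"   # two of the Router Control types
MAX_NODES = 2 ** 10                          # a 10-bit ID addresses up to 1024 nodes


@dataclass
class Packet:
    destination: int                                      # Destination Node ID
    control: str = NORMAL                                 # Router Control
    pending: List[int] = field(default_factory=list)      # Pending Nodes stack
    traversed: List[int] = field(default_factory=list)    # Traversed Nodes stack
    traversed_index: int = 0                              # Traversed Nodes Index
    data: Any = None                                      # stands in for the data pointer

    def __post_init__(self):
        if not 0 <= self.destination < MAX_NODES:
            raise ValueError("destination ID does not fit in 10 bits")


packet = Packet(destination=42)
packet.traversed.append(7)   # most recently traversed node goes on top
print(packet)
```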

  25. (Finally) AI Search Strategies • Brute Force: • Depth-First Search • Random Climbing • Heuristic: • Hill Climbing • Best-First Search • A*

  26. AI Search Strategies (cont.) • In presence of network faults: • Prevent cycles → No deadlocks • Prevent more than two traversals of nodes/links → No livelocks and necessary for AI searches • Adaptations of search algorithms • Problems: • Recursion? Nope (PMB) • Overhead? Fixed (Well, mostly…)

  27. Common Beginning
      Extract header and disassemble it
      IF Destination Node is reached, pass packet to host processor
      ELSE IF Router Control is BACKTRACK
          IF Pending Nodes top node is directly linked
              Route packet to that node
              Set Router Control to NORMAL
          ELSE Backtrack packet to previous node in Traversed Nodes
      Pop current node ID from Pending Nodes
      Push current node ID onto Traversed Nodes

  28. Depth-First Search • Travel as far as possible • Do not consider alternative paths just yet • If fault or dead-end, backtrack to most recent possible path

  29. DFS (cont.) Following common beginning:
      Look for directly linked successor nodes
          IF they are already traversed, ignore
          ELSE IF they are in Pending Nodes, ignore
          ELSE push them onto Pending Nodes
      Read top node of Pending Nodes
          IF directly linked (no fault), route packet to it
          ELSE set BACKTRACK and route to last traversed node
      END
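The DFS steps above can be simulated in a few lines. This is a centralized sketch rather than the per-node router code: it assumes the whole topology is available as a dictionary `links`, that faulty links are given as a set of `frozenset({a, b})` pairs, and it keeps a separate `trail` stack for backtracking. All of these names are hypothetical.

```python
def dfs_route(links, src, dst, faulty=frozenset()):
    """Simulate DFS routing with physical message backtracking.

    links:  dict mapping node -> list of neighbors (assumed representation)
    faulty: set of frozenset({a, b}) links that are down
    Returns every node the packet physically visits, or None if dst is unreachable.
    """
    def linked(a, b):
        return b in links[a] and frozenset((a, b)) not in faulty

    pending = []        # stack of candidate nodes not yet visited
    traversed = {src}   # nodes the packet has already visited
    trail = [src]       # current path, used to backtrack hop by hop
    hops = [src]
    cur = src
    while cur != dst:
        for n in links[cur]:
            if linked(cur, n) and n not in traversed and n not in pending:
                pending.append(n)
        if not pending:
            return None                  # nothing left to try
        if linked(cur, pending[-1]):
            cur = pending.pop()          # NORMAL: move to top pending node
            traversed.add(cur)
            trail.append(cur)
        else:
            trail.pop()                  # BACKTRACK: return to previous node
            if not trail:
                return None
            cur = trail[-1]
        hops.append(cur)
    return hops


links = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(dfs_route(links, 0, 3))   # node 2 is a dead end, so the packet returns to 0
```

The returned hop list includes the backtracking steps, which makes the non-minimal nature of the route visible.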

  30. DFS Example

  31. DFS Example (cont.)

  32. Random Climbing Following the common beginning: … ELSE Select a successor node randomly Push unselected successor nodes onto Pending Nodes …

  33. Hill Climbing • Heuristic: Estimated remaining distance Following common beginning: … ELSE Sort successor nodes according to est. remaining distance Push sorted nodes onto Pending Nodes …
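The hill-climbing step differs from DFS only in the order in which successors are pushed. A minimal sketch, assuming the heuristic is supplied as a function `est_distance` (a hypothetical name) and that Pending Nodes is a Python list used as a stack:

```python
def push_sorted_successors(pending, successors, est_distance):
    """Push successors so the one with the smallest estimate ends up on top."""
    # Farthest first, so the closest candidate is pushed last and read first.
    for node in sorted(successors, key=est_distance, reverse=True):
        pending.append(node)
    return pending


# Example heuristic (an assumption): Manhattan distance on a 2D mesh.
destination = (3, 3)

def manhattan(node):
    return abs(node[0] - destination[0]) + abs(node[1] - destination[1])

stack = push_sorted_successors([], [(0, 1), (2, 3), (1, 1)], manhattan)
print(stack[-1])   # the node the router would try next
```

Only the current node's immediate neighbors are sorted here; entries pushed earlier keep their positions lower in the stack.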

  34. Best-First Search • Resumes partial routes not previously considered • Looks at immediate neighbors, neighbors of predecessors • Sorts by est. remaining distance • Leads to non-minimal routes!

  35. BFS (cont.) … ELSE Push (directly linked successor nodes) onto Pending Nodes Sort Pending Nodes according to est. remaining distance …

  36. A* • Two heuristics: • Estimated remaining distance: h • Path length traversed: g • Partial paths sorted by f = g + h • When no faults, always finds minimal route

  37. A* (cont.) After current ID processing: Record path length traversed, g … ELSE Calculate and store f for new successor nodes Push them onto Pending Nodes sorted by f …
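For comparison, textbook A* over a fault-free graph looks like this. It is a centralized sketch, not the packet-based variant from the paper: it assumes unit link cost, a caller-supplied heuristic `h`, and a priority queue in place of the sorted Pending Nodes stack. The route is minimal only if `h` never overestimates the remaining distance.

```python
import heapq


def a_star(links, src, dst, h):
    """Return a shortest path from src to dst, or None if none exists.

    links: dict mapping node -> list of neighbors (assumed representation)
    h:     estimated remaining distance from a node to dst
    """
    frontier = [(h(src), 0, src, [src])]   # entries are (f, g, node, path)
    best_g = {src: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == dst:
            return path
        for n in links[node]:
            g2 = g + 1                      # path length traversed so far
            if g2 < best_g.get(n, float("inf")):
                best_g[n] = g2
                heapq.heappush(frontier, (g2 + h(n), g2, n, path + [n]))
    return None


links = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(a_star(links, 0, 4, h=lambda n: 0))
```

With `h` fixed at zero the search degenerates to uniform-cost search, which still finds a minimal route but explores more nodes.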

  38. Performance Testing • Simulated 125-node multiprocessor network • Max 8 links per node (maps to many topologies) • Faulty links and processors • Pre-specified or dynamically generated • Testing: • Messages between every pair of nodes • 20 trials at 0%, 5%, 10%, 15%, 20% faulty links • 125 x 125 x 20 x 6 = 1,875,000 tests (??)
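The test count on the slide can be checked directly. The slide does not say what the factor 6 counts, and only five fault percentages are listed, so the sketch below simply computes both products without assuming which one is intended.

```python
NODES = 125
TRIALS = 20
FAULT_LEVELS = [0, 5, 10, 15, 20]   # percentages listed on the slide

pairs = NODES * NODES                # the slide multiplies 125 x 125
slide_total = pairs * TRIALS * 6     # the slide's own product
listed_total = pairs * TRIALS * len(FAULT_LEVELS)

print(slide_total)    # product as written on the slide
print(listed_total)   # product using the five listed fault levels
```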

  39. Test Results • As faults increase, heuristic strategies fare better (esp. > 15%) • A* best search technique but slow • Hill climbing and BFS do not consider nodes traversed • Hill climbing considers only immediate neighbors

  40. Test Results (cont.)

  41. Main Point Using AI search techniques, we abstract from routing in networks to searching in trees (topology-independent, quantity and type of faults irrelevant)

  42. Next Paper • [1] Loh, Peter K.K., “Artificial Intelligence Search Techniques as Fault-Tolerant Routing Strategies” • [2] Loh, Shaw., “A Genetic-Based Fault-Tolerant Routing Strategy for Multiprocessor Networks”

  43. Our Little Problem… • AI search techniques topology- and fault-type independent… • …but non-minimal routes utilized • Follow-up work shows how genetic algorithms (combined with heuristics) can find minimal routes in presence of network faults

  44. Genetic Algorithms: Overview • Optimization strategy • Population of potential solutions evolve over series of generations • Each element of population is chromosome; each unit of chromosome is gene • Chromosomes undergo crossover and mutation • Most fit chromosomes selected for next generation, based upon fitness function

  45. Abstract Model • Same as before (including definitions of S and G) • Pure abstraction suffers from same caveats as before • Basic idea: Instead of AI search for adaptive route, optimize over population of routes to find best

  46. Message Packets • Simplified version:

  47. Chromosome • Route → Chromosome • Node on route → Gene in chromosome • Length of route → Size of chromosome • Chromosome size directly reflects routing performance! • Distance traversed basis of fitness

  48. Population Creation

  49. Mutation and Crossover • Mutation: Swap and/or shift • Normal crossover destroys routes, messes with source and destination; problem w/ different lengths • Use one-point random crossover
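One way to keep offspring usable as routes is to cut both parents at a node they share. The sketch below takes that approach as an assumption; the slides do not specify how the crossover point is chosen, so this should not be read as the paper's exact operator. The function name and the list-of-node-IDs representation are hypothetical.

```python
import random


def one_point_crossover(route_a, route_b, rng=random):
    """Swap route tails at a randomly chosen shared intermediate node.

    Source and destination are excluded from the cut point, so both
    children keep the parents' endpoints even when lengths differ.
    """
    common = [n for n in route_a[1:-1] if n in route_b[1:-1]]
    if not common:
        return route_a[:], route_b[:]     # no shared node: copy the parents
    cut = rng.choice(common)
    i, j = route_a.index(cut), route_b.index(cut)
    return route_a[:i] + route_b[j:], route_b[:j] + route_a[i:]


a = [0, 1, 3, 4, 7]
b = [0, 2, 3, 5, 6, 7]
print(one_point_crossover(a, b))
```

Children produced this way may still revisit a node, so a real implementation would likely need a repair or rejection step.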

  50. Fitness Function • F = (D_max − D_route) / D_max + constant • D_max: Maximum distance between source and destination • D_route: Distance traveled by specific route • Constant: Predefined value to ensure non-zero fitness • Higher value → More fit
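The fitness function translates directly into code. In this sketch the parameter name `offset` and its default of 0.01 are assumptions; the slide only says the added term is a predefined value that keeps the fitness non-zero.

```python
def fitness(d_route, d_max, offset=0.01):
    """Fitness of a route: shorter routes score higher.

    d_route: distance traveled by this route
    d_max:   maximum distance between source and destination
    offset:  predefined constant keeping fitness non-zero (value assumed)
    """
    return (d_max - d_route) / d_max + offset


print(fitness(4, 10))    # shorter route
print(fitness(8, 10))    # longer route
print(fitness(10, 10))   # worst case still has non-zero fitness
```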
