External Memory Value Iteration

Stefan Edelkamp and Shahid Jabbar, Chair for Programming Systems, University of Dortmund, Germany; Blai Bonet, Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela.


Presentation Transcript


  1. External Memory Value Iteration. Stefan Edelkamp, Shahid Jabbar (Chair for Programming Systems, University of Dortmund, Germany); Blai Bonet (Departamento de Computacion, Universidad Simon Bolivar, Caracas, Venezuela)

  2. Motivation: Reinforcement Learning
     [Figure: agent-environment loop labeled with s_t, c_t, and a_t.]
     • Aim: write a controller that acts successfully in the environment.
     • Minimize cost / maximize rewards.
     Edelkamp, Jabbar & Bonet

  3. Motivation: External Reinforcement Learning
     • Cover deterministic, non-deterministic, and probabilistic environments (and games).
     • But what to do if the agent's state space or policy space is too large to be computed and stored in RAM?
     • Disk space is cheap (500 GB ≈ $100) → external memory algorithm.

  4. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  6. Uniform Search Model
     • Deterministic
     • Non-deterministic
     • Probabilistic

  7. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  8. Internal Memory Value Iteration
     • ε-optimal for solving MDPs, AND/OR trees, …
     • Problem: needs the whole state space in main memory.
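
To make the memory problem concrete, here is a minimal sketch of in-memory value iteration for a cost-minimizing MDP. The function name, the `actions` layout, and the three-state example are assumptions made for illustration and do not come from the slides; the point is only that the value table `h` for every state is held in RAM at once.

```python
def value_iteration(states, goals, actions, epsilon=1e-4):
    """Synchronous value iteration for a cost-minimizing MDP.

    actions[s] is a list of (cost, outcomes) pairs, where outcomes is a
    list of (probability, successor) pairs. Goal states keep value 0.
    """
    h = {s: 0.0 for s in states}  # the whole value table lives in RAM
    while True:
        new_h = {}
        residual = 0.0
        for s in states:
            if s in goals:
                new_h[s] = 0.0
                continue
            # Bellman backup: cheapest action by expected cost-to-go.
            new_h[s] = min(
                cost + sum(p * h[t] for p, t in outcomes)
                for cost, outcomes in actions[s]
            )
            residual = max(residual, abs(new_h[s] - h[s]))
        h = new_h
        if residual < epsilon:  # stop once no value changed by epsilon or more
            return h


# Hypothetical example: from "a" the only action reaches "b" with
# probability 0.5 and otherwise stays in "a"; "b" reaches the goal "g".
example_actions = {
    "a": [(1.0, [(0.5, "b"), (0.5, "a")])],
    "b": [(1.0, [(1.0, "g")])],
}
example_h = value_iteration(["a", "b", "g"], {"g"}, example_actions)
```

In this toy instance the expected cost from "a" approaches 3 and from "b" is 1.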

  9. Why External Memory Algorithms?
     • Search algorithms perform well only as long as they fit in RAM!
     • Virtual memory slows down the performance!
     [Figure: virtual address space from 0x000…000 to 0xFFF…FFF and a memory page, annotated "7 I/Os".]

  10. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  11. External Memory Model [Vitter and Shriver, 94]
     • If the input size is very large, running time depends on the number of I/Os rather than on the number of instructions.
     [Figure: internal memory of size M, block size B, input of size N >> M.]
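
For reference, these are the standard I/O bounds in this model for scanning and sorting N items, with M the internal memory size and B the block size, both counted in items. The slide does not spell them out, so this is background rather than a transcription:

```latex
\mathrm{scan}(N) = \Theta\!\left(\frac{N}{B}\right), \qquad
\mathrm{sort}(N) = \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right)
```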

  12. External Breadth-First Search [Munagala & Ranade, SODA'99]
     [Figure: worked example on nodes A to E. Open(2) is generated from Open(1), externally sorted, compacted, and then stripped of duplicates w.r.t. the 2 previous layers, Open(1) and Open(0).]
     • For undirected graphs, subtracting two layers is enough [Munagala & Ranade, 99].
     • For directed graphs, the longest back-edge has to be taken into account [Zhou & Hansen, 05].
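
The sort, compact, and subtract steps can be sketched as follows. This is an in-memory illustration only: Python lists stand in for the files on disk, and the names `subtract_sorted`, `bfs_layers`, and the 4-cycle example are assumptions, not part of the presentation.

```python
def subtract_sorted(layer, previous):
    """Drop from sorted `layer` every element of sorted `previous` in one scan."""
    out, j = [], 0
    for x in layer:
        while j < len(previous) and previous[j] < x:
            j += 1
        if j == len(previous) or previous[j] != x:
            out.append(x)
    return out


def bfs_layers(init, succ, locality=2):
    """Layered BFS with delayed duplicate detection.

    `locality` is the number of previous layers to subtract
    (2 suffices for undirected graphs).
    """
    layers = [sorted(set(init))]
    while layers[-1]:
        # Expand the last layer, then sort (stands in for an external sort).
        candidates = sorted(t for s in layers[-1] for t in succ(s))
        # Compact: duplicates are adjacent after sorting.
        compact = [x for i, x in enumerate(candidates)
                   if i == 0 or x != candidates[i - 1]]
        # Subtract the previous `locality` layers.
        for back in range(1, min(locality, len(layers)) + 1):
            compact = subtract_sorted(compact, layers[-back])
        layers.append(compact)
    return layers[:-1]  # drop the empty final layer


# Hypothetical undirected 4-cycle 0-1-2-3-0.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
example_layers = bfs_layers([0], neighbors.__getitem__)
```

Because duplicates are only removed after a whole layer has been generated and sorted, no hash table of visited states has to fit in memory.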

  13. External Memory Algorithms for Implicit Graphs
     • Frontier Search [Korf, 03]
     • External A* [Edelkamp, Jabbar, Schrödl, 04]
     • Structured Duplicate Detection [Zhou & Hansen, 04]
     • Cost-Optimal External Planning [Edelkamp, Jabbar, 06]
     • Model Checking for Linear Temporal Logic
       • [Jabbar & Edelkamp, 05] for safety error detection
       • [Edelkamp & Jabbar, 06] for liveness detection (cycle)
       • [Barnat, Brim, Simecek, 07] for liveness detection (cycle)
     • Real-Time Model Checking/Scheduling [Edelkamp, Jabbar, 06]

  14. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  15. External Memory Algorithm for Value Iteration
     • What makes value iteration different from the usual external memory search algorithms?
     • Answer: propagation of information from states to predecessors!
       → Edges are more important than the states.
     • Ext-VI works on edges.

  16. External Memory Value Iteration, Phase I: generate the edge space by external BFS.
         Open(0) = Init; i = 1
         while (Open(i-1) != empty)
             Open(i) = Succ(Open(i-1))
             Externally-Sort-and-Remove-Duplicates(Open(i))
             for loc = 1 to Locality(Graph)        // remove previous layers
                 Open(i) = Open(i) \ Open(i - loc)
             i++
         endwhile
     Merge all BFS layers into one edge list on disk:
         Opent = Open(0) ∪ Open(1) ∪ … ∪ Open(DIAM)
         Temp = Opent
         Sort Opent w.r.t. the successors; sort Temp w.r.t. the predecessors
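
The merge and the two sort orders at the end of Phase I can be sketched as below. The edge set is the one shown on the following slide; grouping the edges into layers by the BFS depth of their predecessor, and writing the dummy predecessor Ø as 0, are assumptions made for this illustration. In-memory lists again stand in for files on disk.

```python
from itertools import chain

# Assumed layering: edges grouped by BFS depth of the predecessor,
# deliberately unsorted inside each layer. 0 stands for the dummy node Ø.
layers = [
    [(0, 1)],
    [(1, 4), (1, 2), (1, 3)],
    [(3, 8), (2, 5), (4, 6), (2, 3), (3, 4)],
    [(6, 9), (5, 7), (5, 6)],
    [(9, 10), (7, 8), (9, 8), (7, 10)],
]

# Opent = Open(0) ∪ Open(1) ∪ … ∪ Open(DIAM); Temp is a copy of it.
opent = list(chain.from_iterable(layers))
temp = list(opent)

# Temp is ordered by predecessor, Opent by successor (ties broken by
# the other endpoint so that the order is deterministic).
temp.sort(key=lambda e: (e[0], e[1]))
opent.sort(key=lambda e: (e[1], e[0]))
```

Both lists contain the same 16 edges; only the order differs, which is what lets Phase II process them with two synchronized scans.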

  17. Working of Ext-VI, Phase II
     [Figure: example graph on states 1 to 10 annotated with h-values; I marks the initial state and T the terminal states.]
     Temp: edge list on disk, sorted on predecessors (h is the value of each edge's successor)
         {(Ø,1), (1,2), (1,3), (1,4), (2,3), (2,5), (3,4), (3,8), (4,6), (5,6), (5,7), (6,9), (7,8), (7,10), (9,8), (9,10)}
         h  = 3 2 2 2 2 1 2 0 1 1 1 1 0 0 0 0
     Opent: edge list on disk, sorted on successors
         {(Ø,1), (1,2), (1,3), (2,3), (1,4), (3,4), (2,5), (4,6), (5,6), (5,7), (3,8), (7,8), (9,8), (6,9), (7,10), (9,10)}
         h  = 3 2 2 2 2 2 1 1 1 1 0 0 0 1 0 0
         h' = 3 2 1 1 2 2 2 2 2 1 0 0 0 1 0 0
     Alternate sorting and update until residual < epsilon.
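
One backward update can be sketched as two synchronized scans over the edge lists from this slide. The sketch assumes a deterministic graph with unit edge costs, which is consistent with the h and h' rows shown; the function names, the record layout `(pred, succ, h_of_succ)`, and the use of 0 for Ø are assumptions, and lists stand in for files.

```python
from itertools import groupby

EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 4), (3, 8),
         (4, 6), (5, 6), (5, 7), (6, 9), (7, 8), (7, 10), (9, 8), (9, 10)]
H0 = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1, 6: 1, 7: 1, 8: 0, 9: 1, 10: 0}


def backward_update(temp, opent, cost=1):
    """One pass: temp is sorted on preds, opent on succs.

    Returns opent with updated h-values and the residual of this pass.
    """
    # Scan of temp: for each state, min over its outgoing edges.
    best = ((u, min(cost + h for _, _, h in group))
            for u, group in groupby(temp, key=lambda e: e[0]))
    state, value = next(best, (None, None))
    updated, residual = [], 0
    for u, v, h in opent:
        # Advance the temp scan until it reaches state v (or passes it).
        while state is not None and state < v:
            state, value = next(best, (None, None))
        new_h = value if state == v else h  # no outgoing edges: keep h
        residual = max(residual, abs(new_h - h))
        updated.append((u, v, new_h))
    return updated, residual


def ext_vi(edges, h0, epsilon=1e-4):
    """Alternate sorting and updating until the residual drops below epsilon."""
    opent = sorted(((u, v, h0[v]) for u, v in edges),
                   key=lambda e: (e[1], e[0]))
    while True:
        temp = sorted(opent)  # re-sort the updated records on predecessors
        opent, residual = backward_update(temp, opent)
        if residual < epsilon:
            return {v: h for _, v, h in opent}


opent0 = sorted(((u, v, H0[v]) for u, v in EDGES), key=lambda e: (e[1], e[0]))
first_pass, first_residual = backward_update(sorted(opent0), opent0)
final_h = ext_vi(EDGES, H0)
```

Under these assumptions the first pass yields the h' row listed on the slide, and the values stop changing after three passes.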

  18. Complexity Analysis
     • Phase I: external memory breadth-first search.
       • Expansion: scanning the red bucket: O(scan(|E|))
       • Duplicate removal: sorting the green bucket, which has one state for every edge from the red bucket, then scanning and compaction: O(sort(|E|))
       • Subtraction: removing the states of the blue buckets (duplicate-free) from the green one: O(l × scan(|E|)), with l the locality of the graph
     • Complexity of Phase I: O(l × scan(|E|) + sort(|E|)) I/Os

  19. Complexity Analysis
     • Phase II: backward update.
       • Update: simple block-wise scanning. Scanning time for the red and green files: O(scan(|E|)) I/Os
       • External sort: sorting the blue file with the updated values, to be used as the red file later: O(sort(|E|)) I/Os
       • Fast external sort: if |E| / M < max file pointers, O(scan(|E|)) I/Os
     [Figure: three files labeled "Sorted on preds", "Sorted on states", and "Updated h-values".]
     • Total complexity of Phase II: for tmax iterations, O(tmax × sort(|E|)) I/Os
     • With fast external sort: O(tmax × scan(|E|)) I/Os
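
The "fast external sort" condition can be read as follows: sorting memory-sized chunks produces about |E| / M sorted runs, and if all of them can be opened at once they merge in a single pass, so the sort costs only a constant number of scans. The helper below is a hypothetical sketch of that counting argument; its name and parameters are assumptions.

```python
import math


def merge_passes(num_items, mem_items, max_open_files):
    """Number of merge passes needed after forming memory-sized sorted runs."""
    runs = math.ceil(num_items / mem_items)
    passes = 0
    while runs > 1:
        # Each pass merges up to max_open_files runs into one.
        runs = math.ceil(runs / max_open_files)
        passes += 1
    return passes


# 100 runs and 1024 file pointers: one merge pass.
single = merge_passes(10**9, 10**7, 1024)
# 1000 runs but only 16 file pointers: several passes.
several = merge_passes(10**9, 10**6, 16)
```

With one merge pass every record is read and written a constant number of times, which matches the O(scan(|E|)) bound stated for the fast case.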

  20. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  21. Experiments: 3x3 Sliding Tiles Puzzle
     • The number of iterations differs!

  22. 3x4 Sliding Tile Puzzle with p = 0.9 (state space: 12!/2 ≈ 239 × 10^6)
     • On 2 gigabytes, VI could not generate the state space.
     • External VI finished:
       • Took 45 GB of disk space for the edges.
       • Total of 1,357,171,197 edges.
       • Took 437 hours and 72 iterations to converge.
       • ε = 0.0001
       • RAM used: 1.4 gigabytes
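
A quick sanity check of the figures on this slide. The derived quantities (bytes per edge, edges per state, hours per iteration) are not stated in the presentation, and treating 1 GB as 10^9 bytes is an assumption.

```python
import math

states = math.factorial(12) // 2      # reachable states of the 3x4 puzzle
edges = 1_357_171_197                 # edge count reported on the slide
disk_bytes = 45 * 10**9               # assumption: 1 GB = 10^9 bytes

bytes_per_edge = disk_bytes / edges
edges_per_state = edges / states
hours_per_iteration = 437 / 72

print(f"states              = {states:,}")
print(f"bytes per edge      = {bytes_per_edge:.1f}")
print(f"edges per state     = {edges_per_state:.2f}")
print(f"hours per iteration = {hours_per_iteration:.2f}")
```

On these assumptions each edge record takes a few dozen bytes on disk and one iteration takes roughly six hours.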

  23. Race Track Domain
     • Example

  24. Overview
     • Uniform Search Model
     • Internal Memory Value Iteration
     • Existing External Memory Model and BFS
     • External Memory Value Iteration
     • Experimental Highlights
     • Summary & Outlook

  25. Summary
     Achievements
     • First I/O-efficient disk-based algorithm for solving Markov Decision Processes.
     • I/O complexity analysis.
     Features
     • General cost model.
     • Can pause and resume execution to add more hard disks.
     Refinements
     • Disk space eaten by duplicate states → start "early" delayed duplicate detection.

  26. Outlook
     • Application to Bellman-Ford.
     • Parallel External Value Iteration: during the time of the internal update, the hard disk is not in use.

  27. Thank You! Questions?
