catrin
Uploaded by
17 SLIDES
322 VUES
170LIKES

Understanding VLIW and Software-Driven ILP Techniques

DESCRIPTION

This presentation, delivered by Anshul Kumar from CSE IITD, explores the fundamentals of Very Long Instruction Word (VLIW) architectures and software-driven Instruction-Level Parallelism (ILP). It covers topics such as pipeline scheduling, loop unrolling, branch prediction, and techniques for detecting and enhancing loop-level parallelism. The session also discusses practical examples and strategies for multi-issue processors, focusing on improving performance through advanced scheduling techniques and removing false dependencies.

1 / 17

Télécharger la présentation

Understanding VLIW and Software-Driven ILP Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS718 : VLIW - Software Driven ILP Introduction 23rd Mar, 2006 Anshul Kumar, CSE IITD

  2. Outline • Pipeline scheduling and loop unrolling • Branch prediction with static scheduling • Basic VLIW approach • Detecting and enhancing loop level parallelism • Software pipelining • Global scheduling • Hardware support • Real examples Anshul Kumar, CSE IITD

  3. Approaches for multi-issue processors Anshul Kumar, CSE IITD

  4. Pipeline scheduling example for (i=1000; i>0; i--) x[i] = x[i] + s; Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) DADDUI R1, R1, #-8 BNE R1, R2, Loop Anshul Kumar, CSE IITD

  5. Latency due to data hazards Assume no structural hazards Anshul Kumar, CSE IITD

  6. Straight forward scheduling Loop: L.D F0, 0(R1) 1 stall 2 ADD.D F4, F0, F2 3 stall 4 stall 5 S.D F4, 0(R1) 6 DADDUI R1, R1, #-8 7 stall 8 BNE R1, R2, Loop 9 stall 10 Anshul Kumar, CSE IITD

  7. A better schedule Loop: L.D F0, 0(R1) 1 DADDUI R1, R1, #-8 2 ADD.D F4, F0, F2 3 stall 4 BNE R1, R2, Loop 5 S.D F4, 0(R1) 6 Anshul Kumar, CSE IITD

  8. A better schedule Loop: L.D F0, 0(R1) 1 DADDUI R1, R1, #-8 2 ADD.D F4, F0, F2 3 stall 4 BNE R1, R2, Loop 5 S.D F4, 8(R1) 6 Anshul Kumar, CSE IITD

  9. Loop unrolling Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) 6 L.D F0, -8(R1) ADD.D F4, F0, F2 S.D F4, -8(R1) 12 L.D F0, -16(R1) ADD.D F4, F0, F2 S.D F4, -16(R1) 18 L.D F0, -24(R1) ADD.D F4, F0, F2 S.D F4, -24(R1) 24 DADDUI R1, R1, #-32 BNE R1, R2, Loop 28 28/4=7 Anshul Kumar, CSE IITD

  10. Removing false dependences Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) 6 L.D F6, -8(R1) ADD.D F8, F6, F2 S.D F8, -8(R1) 12 L.D F10, -16(R1) ADD.D F12, F10, F2 S.D F12, -16(R1) 18 L.D F14, -24(R1) ADD.D F16, F14, F2 S.D F16, -24(R1) 24 DADDUI R1, R1, #-32 BNE R1, R2, Loop 28 28/4=7 Anshul Kumar, CSE IITD

  11. Re-scheduling Loop: L.D F0, 0(R1) L.D F6, -8(R1) L.D F10, -16(R1) L.D F14, -24(R1) 4 ADD.D F4, F0, F2 ADD.D F8, F6, F2 ADD.D F12, F10, F2 ADD.D F16, F14, F2 8 S.D F4, 0(R1) S.D F8, -8(R1) 10 DADDUI R1, R1, #-32 S.D F12, -16(R1) 12 BNE R1, R2, Loop S.D F16, -24(R1) 14 14/4=3.5 Anshul Kumar, CSE IITD

  12. Decisions and transformations • Can S.D move after DADDUI and BNE? • Adjust S.D offset. • Are loop iterations independent? • Do register renaming. • Remove extra loop termination tests, adjust the code. • Analyze addresses. Can loads/stores be reordered? • Schedule the code, preserving dependences. Anshul Kumar, CSE IITD

  13. Dependences in unrolled loop Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) DADDUI R1, R1, #-8; drop BNE L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) DADDUI R1, R1, #-8; drop BNE L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) DADDUI R1, R1, #-8; drop BNE L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1) DADDUI R1, R1, #-8 BNE R1, R2, Loop Anshul Kumar, CSE IITD

  14. Remove extra DADDUI Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1); drop DADDUI and BNE L.D F0, -8(R1) ADD.D F4, F0, F2 S.D F4, -8(R1); drop DADDUI and BNE L.D F0, -16(R1) ADD.D F4, F0, F2 S.D F4, -16(R1); drop DADDUI and BNE L.D F0, -24(R1) ADD.D F4, F0, F2 S.D F4, -24(R1) DADDUI R1, R1, #-32 BNE R1, R2, Loop offsets in loads/stores adjusted Anshul Kumar, CSE IITD

  15. False dependences Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1); drop DADDUI and BNE L.D F0, -8(R1) ADD.D F4, F0, F2 S.D F4, -8(R1); drop DADDUI and BNE L.D F0, -16(R1) ADD.D F4, F0, F2 S.D F4, -16(R1); drop DADDUI and BNE L.D F0, -24(R1) ADD.D F4, F0, F2 S.D F4, -24(R1) DADDUI R1, R1, #-32 BNE R1, R2, Loop Anshul Kumar, CSE IITD

  16. Removing false dependences Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1); drop DADDUI and BNE L.D F6, -8(R1) ADD.D F8, F6, F2 S.D F8, -8(R1); drop DADDUI and BNE L.D F10, -16(R1) ADD.D F12, F10, F2 S.D F12, -16(R1); drop DADDUI and BNE L.D F14, -24(R1) ADD.D F16, F14, F2 S.D F16, -24(R1) DADDUI R1, R1, #-32 BNE R1, R2, Loop Anshul Kumar, CSE IITD

  17. True dependences Loop: L.D F0, 0(R1) ADD.D F4, F0, F2 S.D F4, 0(R1); drop DADDUI and BNE L.D F6, -8(R1) ADD.D F8, F6, F2 S.D F8, -8(R1); drop DADDUI and BNE L.D F10, -16(R1) ADD.D F12, F10, F2 S.D F12, -16(R1); drop DADDUI and BNE L.D F14, -24(R1) ADD.D F16, F14, F2 S.D F16, -24(R1) DADDUI R1, R1, #-32 BNE R1, R2, Loop Anshul Kumar, CSE IITD

More Related
SlideServe
Audio
Live Player
Audio Wave
Play slide audio to activate visualizer