
Programming the Velocity Engine



    1. Programming the Velocity Engine. Bing-Chang Lai, Phillip John McKerrow, University of Wollongong

    2. Introduction. What is a Vector Processor? The Velocity Engine. Programming the Velocity Engine. Discuss Examples 1 to 3 only. Q&A.

    3. What is a Vector Processor? It supports Single Instruction, Multiple Data (SIMD) instructions. Originally used in supercomputers for crunching scientific programs, it is now popular on the desktop as well, for crunching multimedia applications.

    4. What is a Vector Processor? On the desktop, it is usually part of a larger processor. Examples of vector processor technologies: MMX, SSE, 3DNow!, AltiVec.

    5. The Velocity Engine. Apple's name for AltiVec technology. What is AltiVec technology, then? It refers to the technique Motorola used to add vector processing capabilities to the G4 (74xx) family of processors.

    6. The Velocity Engine. The G4 processor contains a Load/Store Unit, an Integer Unit, a Floating Point Unit, and a Vector Unit (AltiVec).

    7. Programming the Velocity Engine. Specifications: AltiVec Technology Programming Interface Manual, available from http://e-www.motorola.com/brdata/PDFDB/MICROPROCESSORS/32_BIT/POWERPC/ALTIVEC/ALTIVECPIM.pdf and http://www.altivec.org/tech_specifications/altivec_pim.pdf

    8. Programming the Velocity Engine. Compilers: Apple's AltiVec-related patches to GCC 2.95.2; Metrowerks CodeWarrior. Vector types: all vectors are 128 bits long. A vector type starts with the keyword vector or __vector, followed by the element type, e.g. unsigned char, unsigned int, signed int, and so on.
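Because every vector is 128 bits, the element type alone fixes how many elements a vector holds. Declaring real AltiVec types needs a PowerPC compiler with AltiVec enabled, so the sketch below assumes the GCC/Clang generic vector extension (`__attribute__((vector_size(16)))`) as a portable stand-in; the typedef names `v16u8`, `v8s16`, `v4u32`, `v4f32` and the `LANES` macro are hypothetical.

```c
#include <stddef.h>

/* Hypothetical portable stand-ins for AltiVec vector types. An AltiVec
   compiler spells these `vector unsigned char`, `vector signed short`, ...;
   here the GCC/Clang generic vector extension declares a 16-byte (128-bit)
   vector with the given element type instead. */
typedef unsigned char v16u8 __attribute__((vector_size(16))); /* ~ vector unsigned char */
typedef signed short  v8s16 __attribute__((vector_size(16))); /* ~ vector signed short  */
typedef unsigned int  v4u32 __attribute__((vector_size(16))); /* ~ vector unsigned int  */
typedef float         v4f32 __attribute__((vector_size(16))); /* ~ vector float         */

/* Every vector is 128 bits, so the element count depends only on element size. */
#define LANES(vec_type, elem_type) (sizeof(vec_type) / sizeof(elem_type))
```

With the usual type sizes this gives 16 chars, 8 shorts, 4 ints or 4 floats per vector.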

    9. Programming the Velocity Engine Vector types

    10. Programming the Velocity Engine Vector types

    11. Programming the Velocity Engine Vector types

    12. Programming the Velocity Engine. Vector operations. Arithmetic operations: vec_abs (absolute value), vec_add (addition), vec_sub (subtraction), ... Boolean operations: vec_and (logical AND), vec_or (logical OR), ..., vec_cmpeq (equality), vec_cmple (less than or equal to).
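Each of these operations acts on all elements of a 128-bit vector at once. As a portable sketch (not AltiVec code), the GCC/Clang generic vector extension provides element-wise operators that mirror vec_add, vec_sub, vec_and, vec_or and vec_cmpeq; the type `v4si` and the wrapper names below are hypothetical.

```c
/* Hypothetical stand-in for AltiVec's `vector signed int` (four 32-bit
   lanes), using the GCC/Clang generic vector extension. */
typedef int v4si __attribute__((vector_size(16)));

v4si vadd(v4si a, v4si b) { return a + b; } /* like vec_add: lane-wise sum        */
v4si vsub(v4si a, v4si b) { return a - b; } /* like vec_sub: lane-wise difference */
v4si vand(v4si a, v4si b) { return a & b; } /* like vec_and: bitwise AND per lane */
v4si vor(v4si a, v4si b)  { return a | b; } /* like vec_or: bitwise OR per lane   */

/* Like vec_cmpeq: each lane becomes all ones (-1) where a == b, else 0. */
v4si vcmpeq(v4si a, v4si b) { return a == b; }

/* Reads one lane, for checking results in scalar code. */
int lane(v4si v, int i) { return v[i]; }
```

The all-ones/all-zeros result of a comparison is what makes it usable as a mask for the Boolean operations.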

    13. Programming the Velocity Engine. Vector operations, continued. Miscellaneous operations: vec_perm (permutation), vec_mergeh and vec_mergel (interleave elements of two vectors into one), ... Memory operations: vec_st (store), vec_ld (load), ... Data stream operations: vec_dst (Vector Data Stream Touch), vec_dss (Vector Data Stream Stop), ...
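vec_perm is the most general of these: every result byte is chosen from the 32 bytes of its two inputs by the low 5 bits of a control vector, so a merge is one particular permutation. The scalar model below (plain C on 16-byte arrays, not AltiVec code; `perm16` and `mergeh_control` are hypothetical names) builds the control that makes vec_perm behave like vec_mergeh on bytes.

```c
#include <string.h>

/* Scalar model of AltiVec vec_perm(a, b, c) on 16-byte arrays (array index
   = AltiVec element number): each result byte is picked from the 32-byte
   concatenation a||b using the low 5 bits of the matching byte of c. */
void perm16(const unsigned char a[16], const unsigned char b[16],
            const unsigned char c[16], unsigned char out[16])
{
    unsigned char both[32];   /* copy first, so out may alias a or b */
    memcpy(both, a, 16);
    memcpy(both + 16, b, 16);
    for (int i = 0; i < 16; i++)
        out[i] = both[c[i] & 0x1F];
}

/* Control vector that makes vec_perm act like vec_mergeh on bytes:
   interleave the first 8 bytes of a with the first 8 bytes of b,
   giving a0, b0, a1, b1, ..., a7, b7. */
void mergeh_control(unsigned char c[16])
{
    for (int i = 0; i < 8; i++) {
        c[2 * i] = (unsigned char)i;            /* byte i of a */
        c[2 * i + 1] = (unsigned char)(16 + i); /* byte i of b */
    }
}
```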

    14. Programming the Velocity Engine. Constraints: vector operations always work on exactly 128 bits at a time, no more and no less. vec_ld (load) and vec_st (store) only operate on 16-byte (128-bit) boundaries: the 4 least significant bits of the effective address are ignored, which leads to data-alignment issues. Loading data from memory into the processor is one of the main bottlenecks, so use the cache functions to mark data for loading before the operation takes place.
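The alignment issue comes from how vec_ld forms its address: asking for 16 bytes starting at an unaligned address silently returns the aligned 16 bytes that contain it. The sketch below is a scalar model in plain C (not real AltiVec code; `model_vec_ld` is a hypothetical name) of that address truncation.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of what vec_ld(offset, ptr) reads: the effective address
   ptr + offset is truncated to a 16-byte boundary, so its 4 least
   significant bits are ignored. */
void model_vec_ld(long offset, const void *ptr, unsigned char out[16])
{
    uintptr_t ea = (uintptr_t)ptr + (uintptr_t)offset;
    const unsigned char *block = (const unsigned char *)(ea & ~(uintptr_t)15);
    memcpy(out, block, 16);
}
```

With a 16-byte aligned buffer `buf`, `model_vec_ld(5, buf, v)` returns `buf[0..15]`, not `buf[5..20]`.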

    15. Programming the Velocity Engine. The following examples from the paper will be discussed: Example 1: Element-by-Element Access; Example 2: Alignment; Example 3: Unaligned Loads and Stores. The Image Addition program in the Appendix will not be discussed.

    16. Programming the Velocity Engine Example 1: Element-by-Element Access

    17. Programming the Velocity Engine. Example 1: Element-by-Element Access. Outputs 0123456789ABCDEF. Instead of using the union, you can also access elements by taking the vector's address and casting it to a pointer to the element type.
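The paper's Example 1 code is not reproduced in this transcript, so the following is only a sketch of the union technique and the address-and-cast alternative. It assumes a GCC/Clang generic 16-byte vector in place of AltiVec's `vector unsigned char`; `VecBytes`, `format_elements` and `element_via_cast` are hypothetical names.

```c
/* Hypothetical portable stand-in for AltiVec's `vector unsigned char`. */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Union overlaying the vector with a 16-element array, so scalar code can
   read and write individual elements. */
typedef union {
    v16u8 v;
    unsigned char e[16];
} VecBytes;

/* Writes each element as one hex digit into buf (17 bytes including NUL).
   Only the low 4 bits of each element are used. */
void format_elements(const VecBytes *u, char buf[17])
{
    for (int i = 0; i < 16; i++)
        buf[i] = "0123456789ABCDEF"[u->e[i] & 0xF];
    buf[16] = '\0';
}

/* The alternative: take the vector's address and cast it to an element pointer. */
unsigned char element_via_cast(const v16u8 *v, int i)
{
    return ((const unsigned char *)v)[i];
}
```

Initialising `u.v` with the values 0 to 15 and printing the result of `format_elements` with `printf("%s\n", buf)` gives the output shown on the slide.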

    18. Programming the Velocity Engine. Example 2: Alignment. 16-byte aligned locations have addresses whose 4 least significant bits are 0, e.g. 0xf0, 0x10, and so on. The AltiVec specification specifies vec_malloc and vec_free for allocating 16-byte aligned blocks for vectors. The code finds the aligned address by setting the 4 least significant bits to 0 and then adding 16. Note that Apple's GCC aligns everything to 16-byte boundaries.
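The alignment rule amounts to two bit operations on an address. A minimal sketch in plain C (the helper names are hypothetical) shows that "clear the 4 l.s.b., then add 16" always lands on a 16-byte boundary strictly above the original address and at most 16 bytes past it, even when the address was already aligned.

```c
#include <stdint.h>

/* True if the address has its 4 least significant bits clear. */
int is_aligned16(uintptr_t addr)
{
    return (addr & 15u) == 0;
}

/* Clear the 4 l.s.b., then add 16: always 16-byte aligned, always greater
   than addr, and never more than 16 bytes past it. */
uintptr_t next_aligned16(uintptr_t addr)
{
    return (addr & ~(uintptr_t)15) + 16;
}
```

For example, 0x13 maps to 0x20, and an already aligned 0x10 also maps to 0x20. This is why a block allocated for this purpose needs 16 spare bytes.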

    19. Programming the Velocity Engine Example 2: Alignment - Allocate

    20. Programming the Velocity Engine Example 2: Alignment - Deallocate
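The allocate and deallocate code from these slides is not in the transcript. The following is a hypothetical vec_malloc/vec_free-style pair built on the rule from slide 18. It is not the AltiVec specification's implementation nor the paper's code. It over-allocates with malloc and stores the original pointer just below the aligned block so the matching free can recover it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical aligned allocator. Reserves room for one stored pointer plus
   up to 16 bytes of alignment slack, then applies "clear 4 l.s.b., add 16". */
void *aligned_malloc16(size_t size)
{
    unsigned char *raw = malloc(size + 16 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Start past the pointer slot so it always fits below the aligned block. */
    uintptr_t start = (uintptr_t)(raw + sizeof(void *));
    unsigned char *aligned = (unsigned char *)((start & ~(uintptr_t)15) + 16);
    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return aligned;
}

/* Matching deallocator: frees the original malloc block. */
void aligned_free16(void *p)
{
    if (p != NULL)
        free(((void **)p)[-1]);
}
```

The aligned address lies between `sizeof(void *) + 1` and `sizeof(void *) + 16` bytes past `raw`, so both the stored pointer and the `size` usable bytes stay inside the allocation.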

    21. Programming the Velocity Engine Example 2: Alignment - Using
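Once data sits on 16-byte boundaries, a pointer to it can be treated directly as a pointer to vectors. The sketch below assumes a GCC/Clang generic vector type marked `may_alias` in place of AltiVec's `vector unsigned char`. It also assumes the caller guarantees 16-byte alignment and a length that is a multiple of 16. `add_bytes_aligned` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical stand-in for `vector unsigned char`; may_alias lets it
   legally view plain byte buffers. */
typedef unsigned char v16u8 __attribute__((vector_size(16), may_alias));

/* Adds src into dst 16 bytes at a time. Assumes both pointers are 16-byte
   aligned and len is a multiple of 16; each vector access is then a whole,
   aligned 128-bit load or store. Byte additions wrap modulo 256, like
   vec_add on unsigned char. */
void add_bytes_aligned(unsigned char *dst, const unsigned char *src, size_t len)
{
    v16u8 *d = (v16u8 *)dst;
    const v16u8 *s = (const v16u8 *)src;
    for (size_t i = 0; i < len / 16; i++)
        d[i] = d[i] + s[i];
}
```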

    22. Programming the Velocity Engine Example 3: Unaligned Loads and Store

    23. Programming the Velocity Engine Example 3: Unaligned Loads and Store

    24. Programming the Velocity Engine Example 3: Unaligned Loads and Store
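The Example 3 code is missing from the transcript. A common AltiVec pattern for an unaligned load reads the two aligned blocks that cover the data with vec_ld(0, p) and vec_ld(15, p). It builds a shift control with vec_lvsl(0, p) and combines the blocks with vec_perm. The sketch below models those four steps in plain C; it is an illustration, not the paper's code. All function names are hypothetical. Like the real instructions, the model reads whole aligned blocks, so in C both blocks must lie inside the caller's buffer.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vec_ld(offset, ptr): the 16 bytes at (ptr + offset)
   rounded down to a 16-byte boundary. */
static void model_ld(long offset, const unsigned char *ptr, unsigned char out[16])
{
    uintptr_t ea = ((uintptr_t)ptr + (uintptr_t)offset) & ~(uintptr_t)15;
    memcpy(out, (const unsigned char *)ea, 16);
}

/* Scalar model of vec_lvsl(0, ptr): control {sh, sh+1, ..., sh+15},
   where sh = ptr & 15. */
static void model_lvsl(const unsigned char *ptr, unsigned char out[16])
{
    unsigned sh = (unsigned)((uintptr_t)ptr & 15);
    for (unsigned i = 0; i < 16; i++)
        out[i] = (unsigned char)(sh + i);
}

/* Scalar model of vec_perm(a, b, c): byte i of the result is byte
   (c[i] & 31) of the 32-byte concatenation a||b. */
static void model_perm(const unsigned char a[16], const unsigned char b[16],
                       const unsigned char c[16], unsigned char out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned idx = c[i] & 31u;
        out[i] = idx < 16 ? a[idx] : b[idx - 16];
    }
}

/* Unaligned 16-byte load built only from aligned loads. */
void unaligned_load16(const unsigned char *ptr, unsigned char out[16])
{
    unsigned char msq[16], lsq[16], mask[16];
    model_ld(0, ptr, msq);   /* aligned block holding ptr[0]  */
    model_ld(15, ptr, lsq);  /* aligned block holding ptr[15] */
    model_lvsl(ptr, mask);   /* where ptr sits inside its block */
    model_perm(msq, lsq, mask, out);
}
```

Using offset 15 rather than 16 for the second load means an already aligned pointer reads the same block twice instead of touching the following one.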

    25. Resources. The code for this paper will be available at http://www.bclai.net (probably by the end of the week). Email me at bl12@uow.edu.au. Other important resources: AltiVec Information Source at http://www.altivec.org (email group list); Apple's AltiVec homepage at http://developer.apple.com/hardware/ve/ (tutorials, vector libraries); AlienOrb AltiVec Page at http://www.alienorb.com/AltiVec/ (AltiVec tutorial, AltiVec code examples on lookup tables, streaming data fetch instructions, ...).

    26. References. Bing-Chang Lai and Phillip John McKerrow, Programming the Velocity Engine, AUC, 2001. Motorola, Inc., AltiVec Technology Programming Interface Manual, 1999, see http://e-www.motorola.com/brdata/PDFDB/MICROPROCESSORS/32_BIT/POWERPC/ALTIVEC/ALTIVECPIM.pdf. Ian Ollmann, Ph.D., AltiVec, 2001, see http://www.alienorb.com/AltiVec/Altivec.pdf.

    27. Q&A
