Sony Playstation: Graphics in a Crunch • By Paul Zimmons
PSX Sources: • http://www.classicgaming.com/aec/css/html/psx_development.html • http://psx.rules.org/ • http://dev.paradogs.com/ • http://www.eetimes.com • More sources listed for PS2
History • Play Station was announced in June 1991 at the CES • Part of a deal with Nintendo to add a CD-ROM attachment to the Super Nintendo system • The deal fell through • 1993 - PlayStation-X announced • December 3, 1994 - Japan release - $387 • 300,000 units in 30 days • September 9, 1995 - US release - $299
December 1994 • Onyx RealityEngine 2 was king ~ $100K • Indigo just starting ~ $15-30K • PC - Pentium 90, 16 MB RAM, 1 MB video, 500 MB HD • Mac - 100 MHz PPC604, 16 MB RAM, 2 MB video, 500 MB HD announced for Q2 1995 • Doom II came out October 10, 1994
People of PSX • Ken Kutaragi • ‘Father’ of the Playstation • Designed audio chip for initial defunct SNES Nintendo deal • Headed Playstation project • Masakazu Suzuoki (suzu@rd.scei.sony.co.jp) • More technical • Design started in May 1992 (CD, 1M transistors goals)
PSX HW Overview • MIPS R3000A main CPU at 33.8688 MHz • 32 bit processor, 4 KB I cache, 1 KB D cache • 2 MB of main system memory • 1 MB of video memory, 2 KB texture cache • 32 bit bus, 132 MB/sec • 2x CD-ROM (300 KB/sec) • 512 KB sound buffer, 24 simultaneous sounds • ~300K unlit, untextured polygons/sec
CPU Notes • Some ranges of logical memory space benefit from the I cache, some don’t • D cache is accessible by programmers • Between the CPU and main memory are: • I cache, R buffer (reads), W buffer (writes) • Buffers speed up operations but make exact prediction of program timing nearly impossible
Sound Notes • Pitch can be changed on the fly • Pitch of one sound can be varied by the volume of another sound • Pseudo random noise built in (changed by clock changes) • Attack, decay, sustain, release • Separate reverb unit • Buffer is used for reverb, SPU, xfers to MM
Geometry Transform Engine • Coordinate Transforms • Light Source Calculations • Coprocessor for the CPU (separate chip) • Fixed point matrix and vector ops • Sin, Cos on R3000 (rot matrices) • Operates in parallel • 1 sign bit, 3 integer bits, 12 fraction bits
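The 1 sign / 3 integer / 12 fraction bit layout above can be illustrated with a minimal Python sketch of 16-bit fixed-point arithmetic. The helper names (`to_fixed`, `to_float`, `fixed_mul`) are hypothetical and chosen for illustration; they are not GTE instructions or library calls, and saturating on overflow is an assumption of this sketch.

```python
FRAC_BITS = 12            # 12 fraction bits, as on the slide
ONE = 1 << FRAC_BITS      # 1.0 is stored as the integer 4096
MIN_RAW, MAX_RAW = -32768, 32767   # signed 16-bit range: 1 sign + 3 integer + 12 fraction bits

def to_fixed(x: float) -> int:
    """Convert a float to a raw 1.3.12 value, saturating at the 16-bit limits (assumed behavior)."""
    raw = int(round(x * ONE))
    return max(MIN_RAW, min(MAX_RAW, raw))

def to_float(raw: int) -> float:
    """Convert a raw 1.3.12 value back to a float."""
    return raw / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two 1.3.12 values: the raw product carries 24 fraction bits, so shift right by 12."""
    return (a * b) >> FRAC_BITS

half = to_fixed(0.5)
quarter = fixed_mul(half, half)   # 0.5 * 0.5 in fixed point
```

Rotation-matrix entries (sines and cosines) lie in [-1, 1], so they fit comfortably in the roughly [-8, 8) range this format can represent.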
GPU • Receives drawing instructions from the CPU • Addresses from the CPU map to the frame buffer • Textures and palettes are sent from the CPU to the frame buffer • GPU uses that info with the coordinates and color info (lighting) from the GTE
Frame Buffer • Dual ported, can also go direct MM to FB • Can draw while you display
Frame Buffer • Supports 256x240 up to 640x480 • Can do interlaced or not • 15 bit color or 24 bit color (no ops in 24)
Frame Buffer • 640x480x24 is really done as 960x480x16 • Virtual bits per pixel • GPU only works on 16 bits so no drawing while showing a movie sequence • Frame buffer is used for current output, drawing area, textures and color tables • Color palette = color table = CLUT = color look up table = color index mode • No separate texture memory
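The 640x480x24 versus 960x480x16 equivalence is just a matter of counting bits per scanline. A small sketch of that arithmetic follows; the function names are hypothetical, and the 1 MB figure is the video memory size quoted in the hardware overview.

```python
VRAM_BYTES = 1 * 1024 * 1024   # 1 MB of video memory, from the hardware overview

def width_in_16bit_pixels(width: int, bits_per_pixel: int) -> int:
    """Width of one scanline when reinterpreted as 16-bit pixels."""
    total_bits = width * bits_per_pixel
    if total_bits % 16 != 0:
        raise ValueError("scanline does not divide evenly into 16-bit pixels")
    return total_bits // 16

def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed to hold one frame at the given size and depth."""
    return width * height * bits_per_pixel // 8

w16 = width_in_16bit_pixels(640, 24)   # 640 pixels * 24 bits = 960 pixels * 16 bits
one_frame = frame_bytes(640, 480, 24)
```

One 640x480x24 frame takes 921,600 bytes, so a single frame fits in the 1 MB frame buffer but two do not, which leaves little room for a separate drawing area or textures in that mode.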
Typical Frame Buffer Layout • Can also change drawing environment properties while drawing
Primitive Types • Data is in Main Memory • Drawing primitives - seen on screen • Special primitives - change drawing parameters while drawing occurs • Polygon Primitive • 3 or 4 sides, flat or Gouraud, textured or not • Line Primitive • Line (A,B), (A,B,C), (A,B,C,D), gradient or not
Primitive Types • Sprite Primitive - rectangle • Sprite - tex map, Tile - no tex map • free, 1x1, 8x8, 16x16 pixels • Special Primitive • Change window, window clipping, texture window, drawing offset
Drawing Primitives • Draw on pixel centers, pixel center inside OK • Pixel center outside has rules • If pixel to right is inside -> draw • If pixel to left is inside -> no draw • If pixel above is inside -> no draw • If pixel below is inside -> draw • Don’t draw boundaries more than once
Ordering Tables • Like Word diagrams (grouping, order) • Similar to a linked list (draw1->draw2) • GPU renders while CPU goes on
Z Sorting with Ordering Tables • Calculate the primitive’s position in the table based on its Z value • GTE creates ordering table while converting (x,y,z) coordinates to (xs,ys), zw/4 returned • 256 entries: AddPoly(ot+256-z, poly0) • i.e. Painter’s algorithm style back to front • Last -> Next to Last -> Before that -> ..closest
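The bucket-by-depth idea can be sketched in a few lines of Python. This is an illustration of the technique, not the PlayStation library: `make_ot`, `add_prim`, and `draw_order` are hypothetical names, each table entry is a plain list rather than a linked-list node, and the index is `size - 1 - z` (not the slide's `256 - z`) so that z = 0 stays inside the table.

```python
OT_SIZE = 256   # number of table entries, as on the slide

def make_ot(size: int = OT_SIZE) -> list:
    """Create an empty ordering table: one bucket of primitives per depth slot."""
    return [[] for _ in range(size)]

def add_prim(ot: list, z: int, prim) -> None:
    """File a primitive under its depth; far primitives (large z) land near the start."""
    z = max(0, min(len(ot) - 1, z))       # clamp depth to the table range
    ot[len(ot) - 1 - z].append(prim)

def draw_order(ot: list) -> list:
    """Walk the table front to back of the array, which is back to front in depth."""
    order = []
    for bucket in ot:
        order.extend(bucket)
    return order

ot = make_ot()
add_prim(ot, 10, "near")
add_prim(ot, 200, "far")
add_prim(ot, 100, "mid")
result = draw_order(ot)
```

Inserting is constant time per primitive, so the sort costs no comparisons; the price is that primitives sharing a bucket are drawn in insertion order rather than true depth order.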
Reverse OT • Normal OT approaches 0 close to viewer (!) • Let’s try 1/z • Index into OT with 1/z • But then we are indexing in reverse • Reverse the order of the table • OT: ot[0]->ot[1]->… ot[SIZE-1] • ROT: ot[SIZE-1]->ot[SIZE-2] … ->ot[0]
Transform to Screen • (Wx,Wy,Wz) - World, (Sx,Sy,Sz) - Screen • (m00, m01.. m22) - rotation matrix
Now Perspective Xform • Project onto screen • Distance h from the viewer • (Figure: projection geometry relating Sy, Sz, and h)
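The two slides above describe a rotation into screen coordinates followed by a projection at distance h. A minimal sketch under stated assumptions: the translation vector `t` and the function names are hypothetical (the slide only names the rotation matrix), and the projection uses the usual similar-triangles relation x = h * Sx / Sz, y = h * Sy / Sz.

```python
def rotate_translate(m, t, w):
    """Apply a 3x3 matrix m and an assumed translation t to world point w, giving (Sx, Sy, Sz)."""
    return tuple(
        sum(m[row][col] * w[col] for col in range(3)) + t[row]
        for row in range(3)
    )

def project(s, h):
    """Project (Sx, Sy, Sz) onto a screen at distance h from the viewer."""
    sx, sy, sz = s
    if sz <= 0:
        raise ValueError("point is at or behind the viewer")
    return (h * sx / sz, h * sy / sz)

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
screen_point = rotate_translate(IDENTITY, (0, 0, 0), (2, 4, 8))
projected = project(screen_point, 4)   # h = 4
```

The divide by Sz is the only non-linear step, which is why everything before it can be folded into a single matrix multiply and vector add.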
Packet Buffers • PB = area in memory for the OT and primitives • With a single packet buffer, the CPU and GPU cannot work in parallel • So keep two sets of packet buffers
Texture Mapping • Textures are stored in the frame buffer • Textures are in Texture Pages (256 x 256) • X coord multiple of 64, Y multiple of 256 • 4, 8, 16 bit (15+1) • 4, 8 use CLUT (palette of 16 or 256 colors)
Texture Cache • GPU has 2K of texture cache • Faster than the frame buffer reads • Texture reads fill the texture cache • Subsequent reads hit the cache • Depends on your bits per pixel • 4 bpp ==> 64x64 pixels saved • 8 bpp ==>64x32 pixels saved • 16 bpp ==> 32x32 pixels saved
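The three block sizes above all follow from dividing the 2 KB cache by the texel size. A quick sketch of that arithmetic, with a hypothetical function name:

```python
CACHE_BYTES = 2 * 1024   # 2K texture cache, as on the slide

def texels_in_cache(bits_per_texel: int) -> int:
    """Number of texels that fit in the texture cache at a given depth."""
    return CACHE_BYTES * 8 // bits_per_texel

counts = {bpp: texels_in_cache(bpp) for bpp in (4, 8, 16)}
```

Halving the bits per texel doubles the cached texel count, which is one reason 4-bit CLUT textures are attractive for small repeated patterns.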
Performance • Larger polygons directly mean longer render time • Semi-transparent takes longer (R+W vs. W) • Frame time is set by the number of reads and writes • 4 bit means 4 texels per read
Performance Calculation • Cycles for 100x100 texture to 1/2 size 50x100 in 4 bit mode, cache misses always • Read: 100x100/4 = 2500 • Write: 50x100 = 5000 • Total: 7500 • So half texture size does not mean half the time • Ratio gets bad if texture is oblique to viewer
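The slide's cycle count can be reproduced with a small cost model. This is a sketch of the slide's arithmetic only: the function name is hypothetical, it assumes one cycle per read and per write with every source texel fetched (cache always missing), and extending "4 texels per read" to 2 and 1 texels per read at 8 and 16 bits is an assumption based on a 16-bit read.

```python
def texture_draw_cycles(src_w, src_h, dst_w, dst_h, bits_per_texel):
    """Estimate cycles as (texture reads) + (pixel writes), assuming constant cache misses."""
    texels_per_read = 16 // bits_per_texel             # 4 bit -> 4 texels per read
    src_texels = src_w * src_h
    reads = -(-src_texels // texels_per_read)          # ceiling division
    writes = dst_w * dst_h                             # one write per destination pixel
    return reads + writes

half_size = texture_draw_cycles(100, 100, 50, 100, 4)    # the slide's example
full_size = texture_draw_cycles(100, 100, 100, 100, 4)   # same texture drawn 1:1
```

Under this model the half-size draw costs 7500 cycles against 12500 for the full-size draw, or 60% of the time for 50% of the pixels, because the read cost does not shrink with the destination.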
More Performance • Enlarging textures is faster than reducing • Texture cache receives multiple hits per pixel • No filtering • If you always hit the cache (never read from the frame buffer), 4, 8, 16 bits all take same time • Or if you make a repeated pattern small enough • Clipping generates empty cycles • Above and below OK but not side to side
Texture Mapping • No Z used, no perspective: U = a0*x + a1*y + a2, V = b0*x + b1*y + b2 • Perspective correct is: U = (a0*x + a1*y + a2*z + a3) / (c0*x + c1*y + c2*z + c3), V = (b0*x + b1*y + b2*z + b3) / (c0*x + c1*y + c2*z + c3)
Texture Mapping • If z value doesn’t vary much then OK • Otherwise you get: • Diagonal distortions of textures (split polys)
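The distortion can be seen by comparing the two interpolation schemes along a single edge. The sketch below uses the standard formulation of perspective-correct interpolation (interpolate u/z and 1/z linearly in screen space, then divide); the function names are hypothetical and this is not code from the PlayStation libraries.

```python
def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def affine_u(u0: float, u1: float, t: float) -> float:
    """Texture coordinate with no Z used: plain screen-space interpolation."""
    return lerp(u0, u1, t)

def perspective_u(u0: float, z0: float, u1: float, z1: float, t: float) -> float:
    """Perspective-correct coordinate: interpolate u/z and 1/z, then divide."""
    numerator = lerp(u0 / z0, u1 / z1, t)
    denominator = lerp(1.0 / z0, 1.0 / z1, t)
    return numerator / denominator

# Edge running from depth 1 to depth 3, sampled at the screen-space midpoint.
flat = affine_u(0.0, 1.0, 0.5)
correct = perspective_u(0.0, 1.0, 1.0, 3.0, 0.5)
# Same edge with constant depth: the two schemes agree.
same_depth = perspective_u(0.0, 2.0, 1.0, 2.0, 0.5)
```

At the screen midpoint the affine mapping samples the texture's halfway point, while the correct sample is only a quarter of the way along; when z is constant the two match, which is why small depth variation is tolerable.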
Geometry Transforms • Local coord -> World coord -> Screen • Local has rot and trans to world • World has rot and trans to viewer • Can concatenate to 3x3 mult and vec add • Then divide to do perspective correct • Lighting is similar (loc normal to world)
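Concatenating the local-to-world and world-to-viewer steps into one 3x3 multiply and one vector add can be sketched as follows. All names are hypothetical; the derivation is view = R2 * (R1 * p + t1) + t2 = (R2 * R1) * p + (R2 * t1 + t2).

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested sequences."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3))
        for r in range(3)
    )

def mat_vec(m, v):
    """3x3 matrix times 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def vec_add(a, b):
    """Component-wise sum of two 3-vectors."""
    return tuple(x + y for x, y in zip(a, b))

def concat(r_lw, t_lw, r_wv, t_wv):
    """Fold local->world (r_lw, t_lw) and world->viewer (r_wv, t_wv) into one rotation and translation."""
    return mat_mul(r_wv, r_lw), vec_add(mat_vec(r_wv, t_lw), t_wv)

def apply(r, t, p):
    """Apply a rotation r and translation t to point p."""
    return vec_add(mat_vec(r, p), t)

ROT_Z_90 = ((0, -1, 0), (1, 0, 0), (0, 0, 1))   # 90 degree rotation about z
p = (1, 0, 0)
two_step = apply(ROT_Z_90, (10, 0, 0), apply(ROT_Z_90, (1, 2, 3), p))
r, t = concat(ROT_Z_90, (1, 2, 3), ROT_Z_90, (10, 0, 0))
one_step = apply(r, t, p)
```

The combined matrix and vector are computed once per object, so each vertex then costs a single multiply and add before the perspective divide.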
Normal Line Clipping • Back face culling (in screen space) • Order of vertices in screen space seems to determine whether they are back face or not • (Figure: triangle with vertices labeled 0, 1, 2)
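A common way to test vertex order in screen space is the sign of the 2D cross product of two triangle edges. The sketch below shows that test; the function names are hypothetical, and which sign counts as back facing is an assumption that depends on the winding convention and the direction of the screen's y axis.

```python
def signed_area2(p0, p1, p2):
    """Twice the signed area of the screen-space triangle (p0, p1, p2)."""
    return (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p2[0] - p0[0]) * (p1[1] - p0[1])

def is_back_face(p0, p1, p2):
    """Assumed convention: non-positive signed area means back facing (or edge-on)."""
    return signed_area2(p0, p1, p2) <= 0

front = is_back_face((0, 0), (1, 0), (0, 1))   # vertices in one winding order
back = is_back_face((0, 0), (0, 1), (1, 0))    # same triangle, order reversed
```

Swapping any two vertices flips the sign, so the same triangle seen from behind fails the test without any normal vector being transformed.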
Depth Cueing • Use vertex colors (and blend) • Only done with black or if texture and back are close together • Use texture • Use MIP MAP but farther maps are darker • MIP MAP based on ‘size’ of polygon => depth • Can also change CLUT based on drawing order
MIP MAP • Must be within the same texture page
Meshing • Strip Mesh • Round Mesh
<PS-X OS> • Environment for developer and program control • Takes up a maximum of 64 KB of RAM • OS system tables are not hidden, for speed • Requires careful programming • Multi-tasking OS (background music + drawing) • Multiple file systems (through drivers)
PS2 • Totally different
PS 2 Sources • 2 Main Papers: • “A Microprocessor with a 128-Bit CPU, Ten Floating-Point MAC’s, Four Floating-Point Dividers, and an MPEG-2 Decoder” by Masakazu Suzuoki et al. In IEEE Journal of Solid-State Circuits, Vol. 34, No. 11, November 1999. Pages 1608-1618. • “Designing and Programming the Emotion Engine” by Masaaki Oka and Masakazu Suzuoki. IEEE Micro, November-December 1999. Pages 20-28. • Chip Images • http://fuji.stanford.edu/seminars/spring99/slides/may13/sld001.html, Slides 29, 30, 36 • GPU Info: http://www.g-o-l.com/ck/speciali/altro/graphics-synthesizer.pdf
CPU (Emotion Engine) • 250 MHz • 32 MB RAMBUS • 2 GB/sec bus • MIPS core with vector coprocessors • 128 bit internal and external pathways • VLIW • 10.5 M transistors
New Design Goals • Behavior synthesis • Dynamics (distance, Newton iterations) • Geometry • More surface processing • Texture compression
Three in One • RISC core with floating point • 128 bit registers • Two floating point vector units • VPU0 for behavior and physics • VPU1 for geometry • Independent physics and geometry • Because of penalties with vector processors
Acronyms • IPU = MPEG2 decoder • DMAC = direct memory access • EFU = elementary function unit • SPR = scratch pad RAM • Used for communicating between subprocs • GIF = graphics synthesizer interface unit • Interprets display lists
VPU • 4D quantities (x,y,z,w), (r,g,b,a) • 4 multiply accumulators (FMAC) • Big penalties for branches and context switches and interrupts • Cache • Swap out big chunks at a time • VLIW bad efficiency • Break into two parts