Explore the intricate graphics architecture of the original Sony PlayStation, designed under the guidance of Ken Kutaragi, the "Father of PlayStation." This comprehensive overview delves into key technical features such as the MIPS R3000A CPU, Geometry Transform Engine, and the graphics processing unit (GPU). Learn how these components enabled the PlayStation to push the boundaries of gaming visuals and audio, setting a new standard in the industry. From its initial announcement in 1991 to its groundbreaking release in Japan and the US, uncover the legacy of PlayStation in video game history.
Sony PlayStation: Graphics in a Crunch • By Paul Zimmons
PSX Sources: • http://www.classicgaming.com/aec/css/html/psx_development.html • http://psx.rules.org/ • http://dev.paradogs.com/ • http://www.eetimes.com • More sources listed for PS2
History • Play Station was announced in June 1991 at the CES • Part of a deal with Nintendo to add a CD-ROM attachment to the Super Nintendo system • The deal fell through • 1993 - PlayStation-X announced • December 3, 1994 - Japan release - $387 • 300,000 units in 30 days • September 9, 1995 - US release - $299
December 1994 • Onyx RealityEngine 2 was king ~ $100K • Indigo just starting ~ $15-30K • PC - Pentium 90, 16M RAM, 1M Video, 500M HD • Mac - 100 MHz PPC604, 16M RAM, 2M Video, 500M HD announced for Q2 1995 • Doom II came out October 10, 1994
People of PSX • Ken Kutaragi • ‘Father’ of the Playstation • Designed audio chip for initial defunct SNES Nintendo deal • Headed Playstation project • Masakazu Suzuoki (suzu@rd.scei.sony.co.jp) • More technical • Design started in May 1992 (CD, 1M transistors goals)
PSX HW Overview • MIPS R3000A main CPU at 33.8688 MHz • 32 bit processor, 4 KB I cache, 1 KB D cache • 2 MB of main system memory • 1 MB of video memory, 2 KB texture cache • 32 bit bus, 132 Mbyte/sec • 2x CD-ROM (300 KB/sec) • 512 KB sound buffer, 24 simultaneous sounds • ~300K unlit, untextured polygons/sec
CPU Notes • Some ranges of logical memory space benefit from the I cache, some don’t • D cache is accessible by programmers • Between the CPU and main memory are the I cache, R buffer (reads), and W buffer (writes) • Buffers speed up ops but make exact prediction of program timing nearly impossible
Sound Notes • Pitch can be changed on the fly • Pitch of one sound can be varied by the volume of another sound • Pseudo random noise built in (changed by clock changes) • Attack, decay, sustain, release • Separate reverb unit • Buffer is used for reverb, SPU, xfers to MM
Geometry Transform Engine • Coordinate Transforms • Light Source Calculations • Coprocessor for the CPU (separate chip) • Fixed point matrix and vector ops • Sin, Cos on R3000 (rot matrices) • Operates in parallel • 1 sign bit, 3 integer bits, 12 fraction bits
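The 1 sign / 3 integer / 12 fraction bit layout above can be illustrated with a minimal sketch in C. The helper names (`fx16`, `fx_from_double`, `fx_mul`, `fx_mat_vec`) are hypothetical and are not part of any Sony library; the code only shows how arithmetic in such a format could work, not how the GTE implements it.

```c
#include <stdint.h>

/* Hypothetical helpers; only the 1.3.12 bit layout comes from the slide. */
#define FX_FRAC_BITS 12
#define FX_ONE (1 << FX_FRAC_BITS)      /* 1.0 is stored as 4096 */

typedef int16_t fx16;   /* 1 sign bit, 3 integer bits, 12 fraction bits */

/* Convert a double to fixed point, rounding to nearest (no saturation). */
static fx16 fx_from_double(double v) {
    return (fx16)(v * FX_ONE + (v >= 0 ? 0.5 : -0.5));
}

/* Convert fixed point back to a double. */
static double fx_to_double(fx16 v) {
    return (double)v / FX_ONE;
}

/* Multiply two 1.3.12 values: widen to 32 bits, then drop 12 fraction bits.
   Assumes >> on a negative value is an arithmetic shift, which holds on
   common compilers but is implementation-defined in C. */
static fx16 fx_mul(fx16 a, fx16 b) {
    return (fx16)(((int32_t)a * (int32_t)b) >> FX_FRAC_BITS);
}

/* Rotate an integer vector by a 3x3 fixed-point matrix: out = m * v. */
static void fx_mat_vec(fx16 m[3][3], int32_t v[3], int32_t out[3]) {
    for (int r = 0; r < 3; r++) {
        int64_t acc = 0;
        for (int c = 0; c < 3; c++)
            acc += (int64_t)m[r][c] * v[c];
        out[r] = (int32_t)(acc >> FX_FRAC_BITS);
    }
}
```

With 12 fraction bits the smallest step is 1/4096 and the representable range is roughly -8 to just under +8, which is enough for rotation matrix entries in [-1, 1].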
GPU • Receives drawing instructions from the CPU • Addresses from the CPU map to the frame buffer • Textures and palettes are sent from the CPU to the frame buffer • GPU uses that info with the coordinates and color info (lighting) from the GTE
Frame Buffer • Dual ported, can also go direct MM to FB • Can draw while you display
Frame Buffer • Supports 256x240 up to 640x480 • Can do interlaced or not • 15 bit color or 24 bit color (no ops in 24)
Frame Buffer • 640x480x24 is really done as 960x480x16 • Virtual bits per pixel • GPU only works on 16 bits so no drawing while showing a movie sequence • Frame buffer is used for current output, drawing area, textures and color tables • Color palette = color table = CLUT = color look up table = color index mode • No separate texture memory
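The 640x480x24 to 960x480x16 equivalence is just a change of units, which a small sketch in C can make explicit. The function names are hypothetical; the only assumptions are that the frame buffer is addressed in 16-bit units and that video memory is 1 MB, as stated earlier in the deck.

```c
/* Hypothetical helpers: express a display mode in 16-bit frame buffer units. */

/* Number of 16-bit units needed for one row of `width` pixels at `bpp` bits. */
static int fb_width_units(int width, int bpp) {
    return (width * bpp + 15) / 16;     /* round up to a whole unit */
}

/* Bytes of video memory occupied by a width x height image at `bpp` bits. */
static long fb_bytes(int width, int height, int bpp) {
    return (long)fb_width_units(width, bpp) * height * 2;
}
```

For example, `fb_width_units(640, 24)` gives 960, and `fb_bytes(640, 480, 24)` gives 921600 bytes, so a 24-bit 640x480 image fits in the 1 MB of video memory but leaves little room for anything else.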
Typical Frame Buffer Layout • Can also change drawing environment properties while drawing
Primitive Types • Data is in Main Memory • Drawing primitives - seen on screen • Special primitives - change drawing parameters while drawing occurs • Polygon Primitive • 3 or 4 sides, flat or Gouraud, textured or not • Line Primitive • Line (A,B), (A,B,C), (A,B,C,D), gradient or not
Primitive Types • Sprite Primitive - rectangle • Sprite - tex map, Tile - no tex map • free, 1x1, 8x8, 16x16 pixels • Special Primitive • Change window, window clipping, texture window, drawing offset
Drawing Primitives • Draw on pixel centers, pixel center inside OK • Pixel center outside has rules • If pixel to right is inside -> draw • If pixel to left is inside -> no draw • If pixel above is inside -> no draw • If pixel below is inside -> draw • Don’t draw boundaries more than once
Ordering Tables • Like Word diagrams (grouping, order) • Similar to a linked list (draw1->draw2) • GPU renders while CPU goes on
Z Sorting with Ordering Tables • Calculate the primitive’s position in the table based on its Z value • GTE creates ordering table while converting (x,y,z) coordinates to (xs,ys), zw/4 returned • 256 entries: AddPoly(ot+256-z, poly0) • i.e. Painter’s algorithm style back to front • Last -> Next to Last -> Before that -> ..closest
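The bucket-by-depth idea can be sketched in C as an array of linked lists. This is an illustration only: `Prim`, `OrderingTable`, `ot_clear`, `ot_add`, and `ot_draw_order` are hypothetical names, not the actual PlayStation library calls, and the sketch assumes larger z means farther from the viewer.

```c
#include <stddef.h>

#define OT_SIZE 256                 /* 256 entries, as on the slide */

typedef struct Prim {
    struct Prim *next;              /* link to the next packet in the bucket */
    int id;                         /* stand-in for the real primitive data */
} Prim;

typedef struct {
    Prim *head[OT_SIZE];            /* one list head per depth bucket */
} OrderingTable;

/* Empty every bucket; done once per frame before adding primitives. */
static void ot_clear(OrderingTable *ot) {
    for (int i = 0; i < OT_SIZE; i++)
        ot->head[i] = NULL;
}

/* Bucket a primitive by depth. z is clamped to [0, OT_SIZE - 1]. */
static void ot_add(OrderingTable *ot, int z, Prim *p) {
    if (z < 0) z = 0;
    if (z >= OT_SIZE) z = OT_SIZE - 1;
    p->next = ot->head[z];
    ot->head[z] = p;
}

/* Walk from the farthest bucket to the nearest (painter's algorithm),
   writing primitive ids in draw order. Returns the number written. */
static int ot_draw_order(const OrderingTable *ot, int *out, int max) {
    int n = 0;
    for (int z = OT_SIZE - 1; z >= 0; z--)
        for (const Prim *p = ot->head[z]; p != NULL && n < max; p = p->next)
            out[n++] = p->id;
    return n;
}
```

Insertion is constant time per primitive, which is the appeal over a full sort: depth resolution is limited to the number of buckets, and primitives within one bucket are drawn in an arbitrary relative order.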
Reverse OT • Normal OT approaches 0 close to viewer (!) • Let’s try 1/z • Index into OT with 1/z • But then we are indexing in reverse • Reverse the order of the table • OT: ot[0]->ot[1]->… ot[SIZE-1] • ROT: ot[SIZE-1]->ot[SIZE-2] … ->ot[0]
Transform to Screen • (Wx,Wy,Wz) - World, (Sx,Sy,Sz) - Screen • (m00, m01.. m22) - rotation matrix
Now Perspective Xform • Project onto screen • Distance h from the viewer • [Figure: side view of the projection, labeling Sy, Sz, and the screen distance h]
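The two steps above (rotate and translate into screen space, then project onto a screen at distance h) can be sketched in C. The equations on the original slides did not survive, so this sketch assumes the usual form S = M * W + T followed by a similar-triangles divide. The translation vector `t` and all function names are assumptions, and doubles are used for readability where the hardware would use fixed point.

```c
typedef struct { double x, y, z; } Vec3;

/* Assumed form of the transform: S = M * W + T, with M the 3x3 rotation
   matrix (m00 .. m22) and T a translation vector. */
static Vec3 world_to_screen(double m[3][3], Vec3 t, Vec3 w) {
    Vec3 s;
    s.x = m[0][0] * w.x + m[0][1] * w.y + m[0][2] * w.z + t.x;
    s.y = m[1][0] * w.x + m[1][1] * w.y + m[1][2] * w.z + t.y;
    s.z = m[2][0] * w.x + m[2][1] * w.y + m[2][2] * w.z + t.z;
    return s;
}

/* Perspective projection by similar triangles: py / h = Sy / Sz.
   Returns 0 (and writes nothing) for points at or behind the viewer. */
static int project(Vec3 s, double h, double *px, double *py) {
    if (s.z <= 0.0)
        return 0;
    *px = h * s.x / s.z;
    *py = h * s.y / s.z;
    return 1;
}
```

A point at (2, 4, 8) in screen space with h = 4 projects to (1, 2): doubling the depth halves the projected size.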
Packet Buffers • PB=Area in memory for OT and primitives • CPU and graphics are not in parallel • Have two sets of Packet Buffers
Texture Mapping • Textures are stored in the frame buffer • Textures are in Texture Pages (256 x 256) • X coord multiple of 64, Y multiple of 256 • 4, 8, 16 bit (15+1) • 4, 8 use CLUT (palette of 16 or 256 colors)
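The alignment rule for texture pages can be sketched in C. The sketch assumes the 1 MB frame buffer is organized as 1024 x 512 sixteen-bit units, which is not stated on the slide, and the function names are hypothetical. It also shows one plausible reason for the 64-unit X alignment: a 256-texel-wide page at 4 bits per texel occupies exactly 64 sixteen-bit units.

```c
/* Assumption: 1 MB of video memory viewed as 1024 x 512 sixteen-bit units. */
#define VRAM_W 1024
#define VRAM_H 512

/* A texture page may start only where X is a multiple of 64 and
   Y is a multiple of 256, inside the frame buffer. */
static int tpage_origin_ok(int x, int y) {
    return x >= 0 && x < VRAM_W && y >= 0 && y < VRAM_H &&
           x % 64 == 0 && y % 256 == 0;
}

/* Number the legal page origins row by row (hypothetical numbering). */
static int tpage_index(int x, int y) {
    return x / 64 + (y / 256) * (VRAM_W / 64);
}

/* Width of a 256-texel page in 16-bit frame buffer units. */
static int tpage_width_units(int bpp) {
    return 256 * bpp / 16;
}
```

Under these assumptions there are 16 x 2 = 32 legal page origins, and a 16-bit page is four times as wide in the frame buffer as a 4-bit page.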
Texture Cache • GPU has 2K of texture cache • Faster than the frame buffer reads • Texture reads fill the texture cache • Subsequent reads hit the cache • Depends on your bits per pixel • 4 bpp ==> 64x64 pixels saved • 8 bpp ==>64x32 pixels saved • 16 bpp ==> 32x32 pixels saved
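The three cached-area figures all follow from the same 2 KB budget, as this small C sketch shows. `cache_texels` is a hypothetical helper name; the cache size is taken from the slide.

```c
#define TEX_CACHE_BYTES 2048    /* 2K of texture cache, from the slide */

/* Number of texels that fit in the cache at a given bit depth. */
static int cache_texels(int bpp) {
    return TEX_CACHE_BYTES * 8 / bpp;
}
```

`cache_texels(4)` is 4096 (64x64), `cache_texels(8)` is 2048 (64x32), and `cache_texels(16)` is 1024 (32x32), matching the list above.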
Performance • Large polygons directly mean longer render time • Semi-transparent takes longer (R+W vs. W) • # of reads and writes is frame time • 4 bit means 4 texels per read
Performance Calculation • Cycles for 100x100 texture to 1/2 size 50x100 in 4 bit mode, cache misses always • Read: 100x100/4 = 2500 • Write: 50x100 = 5000 • Total: 7500 • So half texture size does not mean half the time • Ratio gets bad if texture is oblique to viewer
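The arithmetic above can be written as a small cost model in C. It is a sketch under the slide's own assumptions: every texel read misses the cache, each read fetches one 16-bit unit, and each destination pixel costs one write. The function names are hypothetical.

```c
/* Texels delivered by one 16-bit frame buffer read (4 bpp gives 4). */
static long texels_per_read(int bpp) {
    return 16 / bpp;
}

/* Estimated frame buffer accesses to draw a src_w x src_h texture into a
   dst_w x dst_h area, assuming the cache always misses. */
static long draw_cost(int src_w, int src_h, int dst_w, int dst_h, int bpp) {
    long per_read = texels_per_read(bpp);
    long reads = ((long)src_w * src_h + per_read - 1) / per_read;
    long writes = (long)dst_w * dst_h;
    return reads + writes;
}
```

Drawing the 100x100 texture at full size costs 2500 + 10000 = 12500, while the half-size 50x100 version costs 7500, so halving the drawn area saves only 40% because the read cost does not shrink.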
More Performance • Enlarging textures is faster than reducing • Texture cache receives multiple hits per pixel • No filtering • If you always hit the cache (never read from the frame buffer), 4, 8, 16 bits all take same time • Or if you make a repeated pattern small enough • Clipping generates empty cycles • Above and below OK but not side to side
Texture Mapping • No Z used, no perspective: U = a0*x + a1*y + a2, V = b0*x + b1*y + b2 • Perspective correct is: U = (a0*x + a1*y + a2*z + a3) / (c0*x + c1*y + c2*z + c3), V = (b0*x + b1*y + b2*z + b3) / (c0*x + c1*y + c2*z + c3)
Texture Mapping • If z value doesn’t vary much then OK • Otherwise you get: • Diagonal distortions of textures (split polys)
Geometry Transforms • Local coord -> World coord -> Screen • Local has rot and trans to world • World has rot and trans to viewer • Can concatenate to 3x3 mult and vec add • Then divide to do perspective correct • Lighting is similar (loc normal to world)
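Concatenating the local-to-world and world-to-viewer steps into a single 3x3 multiply and vector add can be sketched in C as follows. The `Xform` type and function names are hypothetical, and doubles stand in for the fixed-point values the hardware would use.

```c
/* A rotation plus translation: p' = r * p + t. */
typedef struct {
    double r[3][3];
    double t[3];
} Xform;

/* Apply a transform to a point. `out` must not alias `in`. */
static void xform_apply(const Xform *a, const double in[3], double out[3]) {
    for (int i = 0; i < 3; i++) {
        out[i] = a->t[i];
        for (int j = 0; j < 3; j++)
            out[i] += a->r[i][j] * in[j];
    }
}

/* Concatenate "a first, then b" into one transform:
   r = b.r * a.r and t = b.r * a.t + b.t. */
static Xform xform_concat(const Xform *b, const Xform *a) {
    Xform o;
    for (int i = 0; i < 3; i++) {
        o.t[i] = b->t[i];
        for (int j = 0; j < 3; j++) {
            o.r[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                o.r[i][j] += b->r[i][k] * a->r[k][j];
        }
        for (int k = 0; k < 3; k++)
            o.t[i] += b->r[i][k] * a->t[k];
    }
    return o;
}
```

The concatenation is done once per object, after which every vertex needs only one matrix multiply and one vector add before the perspective divide.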
Normal Line Clipping • Back face culling (in screen space) • Order of vertices in screen space seems to determine whether they are back face or not • [Figure: triangle with vertices labeled 0, 1, 2]
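A common way to test vertex order in screen space is the sign of the triangle's signed area, sketched below in C. The slide does not say which winding counts as front facing, so the convention here (y axis pointing down, positive area meaning front facing) is an assumption, as are the names `Pt`, `signed_area2`, and `is_back_face`.

```c
typedef struct { int x, y; } Pt;

/* Twice the signed area of triangle (v0, v1, v2), computed as the
   z component of the cross product (v1 - v0) x (v2 - v0). */
static long signed_area2(Pt v0, Pt v1, Pt v2) {
    return (long)(v1.x - v0.x) * (v2.y - v0.y)
         - (long)(v2.x - v0.x) * (v1.y - v0.y);
}

/* Assumed convention: with y pointing down the screen, a positive area
   means the vertices appear clockwise and the triangle faces the viewer.
   Zero area (edge-on) is treated as a back face and culled. */
static int is_back_face(Pt v0, Pt v1, Pt v2) {
    return signed_area2(v0, v1, v2) <= 0;
}
```

Swapping any two vertices flips the sign of the area, which is why the vertex order alone decides whether a polygon is culled.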
Depth Cueing • Use vertex colors (and blend) • Only done with black or if texture and back are close together • Use texture • Use MIP MAP but farther maps are darker • MIP MAP based on ‘size’ of polygon => depth • Can also change CLUT based on drawing order
MIP MAP • Must be within the same texture page
Meshing • Strip Mesh • Round Mesh
<PS-X OS> • Environment for developer and program control • Takes up max of 64 KB of RAM • OS system tables are not hidden, for speed • Careful programming required • Multi-tasking OS (background music + drawing) • Multiple file systems (through driver)
PS2 • Totally different
PS 2 Sources • 2 Main Papers: • “A Microprocessor with a 128-Bit CPU, Ten Floating-Point MAC’s, Four Floating-Point Dividers, and an MPEG-2 Decoder” by Masakazu Suzuoki et al. In IEEE Journal of Solid-State Circuits Vol. 34, No. 11, November 1999. Pages 1608 - 1618 • “Designing and Programming the Emotion Engine” by Masaaki Oka and Masakazu Suzuoki. IEEE Micro November-December 1999. Pages 20-28. • Chip Images • http://fuji.stanford.edu/seminars/spring99/slides/may13/sld001.html, Slides 29, 30, 36 • GPU Info: http://www.g-o-l.com/ck/speciali/altro/graphics-synthesizer.pdf
CPU (Emotion Engine) • 250 MHz • 32 MB RAMBUS • 2 GB/sec bus • MIPS core with vector coprocessors • 128 bit internal and external pathways • VLIW • 10.5 M transistors
New Design Goals • Behavior synthesis • Dynamics (distance, Newton iterations) • Geometry • More surface processing • Texture compression
Three in One • RISC core with floating point • 128 bit registers • Two floating point vector units • VPU0 for behavior and physics • VPU1 for geometry • Independent physics and geometry • Because of penalties with vector processors
Acronyms • IPU = MPEG2 decoder • DMAC = direct memory access • EFU = elementary function unit • SPR = scratch pad RAM • Used for communicating between subprocs • GIF = graphics synthesizer interface unit • Interprets display lists
VPU • 4D quantities (x,y,z,w), (r,g,b,a) • 4 multiply accumulators (FMAC) • Big penalties for branches and context switches and interrupts • Cache • Swap out big chunks at a time • VLIW bad efficiency • Break into two parts