1 / 11

Converse BlueGene Emulator

Converse BlueGene Emulator. Gengbin Zheng Parallel Programming Lab 2/27/2001. Objective. Completely rewritten the previous Charm++ Blue Gene emulator; Bluegene emulator for architecture studying (PetaFLOPS computers); Performance estimation (with proper time stamping)

crwys
Télécharger la présentation

Converse BlueGene Emulator

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Converse BlueGene Emulator Gengbin Zheng Parallel Programming Lab 2/27/2001

  2. Objective • Completely rewritten the previous Charm++ Blue Gene emulator; • Bluegene emulator for architecture studying (PetaFLOPS computers); • Performance estimation (with proper time stamping) • Provide API for building Charm++ on top of it.

  3. Node(x2,y2,z2) Node(x1,y1,z1) Node(x3,y3,z3) Big picture - emulator • 34x34x36 nodes • 25 processors per node • 8 threads per processor Emulator Processor

  4. Bluegene Emulator Communication threads Worker thread inBuffer Affinity message queue Non-affinity message queue Node Structure

  5. Communication Threads • Communication threads get messages from inbuffer • If small work, execute the task itself. • If affinity message, put to the thread’s local queue; • If non-affinity message, put to the node queue;

  6. Worker threads • Worker threads examine messages from two queues: affinity queue and non-affinity queue; • Compare the receive-time of two messages and pick the one that comes first and execute it;

  7. Low-level API • Class NodeInfo: id, x, y, z, udata, commThQ, workThQ • Class ThreadInfo: (thread private variable) id, type, myNode, currTime • Class BgMessage: node, threadID, handlerID, type, sendTime, recvTime, data • getFullBuffer() • checkReady() • addBgNodeMessage() • addBgThreadMessage() • sendPacket()

  8. User’s API • BgGetXYZ() • BgGetSize(), BgSetSize() • BgGetNumWorkThread(), BgSetNumWorkThread() • BgGetNumCommThread(), BgSetNumCommThread() • BgRegisterHandler() • BgGetNodeData(), BgSetNodeData() • BgGetThreadID(), BgGetGlobalThreadID() • BgGetTime() • BgSendPacket(), etc • BgShutdown() • BgEmulatorInit(), BgNodeStart()

  9. Bluegene application example - Ring void BgEmulatorInit(int argc, char **argv) { if (argc < 6) CmiAbort("Usage: <ring> <x> <y> <z> <numCommTh> <numWorkTh>\n"); BgSetSize(atoi(argv[1]), atoi(argv[2]), atoi(argv[3])); BgSetNumCommThread(atoi(argv[4])); BgSetNumWorkThread(atoi(argv[5])); passRingID = BgRegisterHandler(passRing); } void BgNodeStart(int argc, char **argv) { int x,y,z; int nx, ny, nz; int data=888; BgGetXYZ(&x, &y, &z); nextxyz(x, y, z, &nx, &ny, &nz); if (x == 0 && y==0 && z==0) BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data); } void passRing(char *msg) { int x, y, z; int nx, ny, nz; int data = *(int *)msg; BgGetXYZ(&x, &y, &z); nextxyz(x, y, z, &nx, &ny, &nz); if (x==0 && y==0 && z==0) if (++iter == MAXITER) BgShutdown(); BgSendPacket(nx, ny, nz, passRingID, LARGE_WORK, sizeof(int), (char *)&data); }

  10. Performance • Pingpong • Close to Converse pingpong; • 81-103 us v.s. 92 us RTT • Charm++ pingpong • 116 us RTT • Charm++ Bluegene pingpong • 134-175 us RTT

  11. Charm++ on top of Emulator • BlueGene thread represents Charm++ node; • Name conflict: • Cpv, Ctv • MsgSend, etc • CkMyPe(), CkNumPes(), etc

More Related