
Compiler++ Evolving the compiler - C2.DLL

Compiler++ Evolving the compiler - C2.DLL. Jim Radigan - Architect C++ Optimizer. Mission: Evolving the C++ compiler. Evolve the red arrow. $87.7B. 1. ~Absolute Correctness 2. Compiler throughput 3. Code size 4. Code quality. $100.0B+. 3,100,000 Transistors. Ivy Bridge.



Presentation Transcript


  1. Compiler++ Evolving the compiler - C2.DLL Jim Radigan - Architect C++ Optimizer

  2. Mission: Evolving the C++ compiler

  3. Evolve the red arrow • $87.7B • 1. ~Absolute correctness • 2. Compiler throughput • 3. Code size • 4. Code quality • $100.0B+

  4. 3,100,000 Transistors

  5. Ivy Bridge 1.4 Billion Transistors

  6. Tegra 3: 5 cores / 128-bit vector instructions

  7. Haswell C++

  8. Built with C++ • Windows, SQL, Office • Mission-critical correctness and compile time

  9. Compiler++ “Evolving the compiler” • How we work • Core Technologies • Where we are going

  10. Full compile and test build of Windows: N hours on 24 cores + 32 GB memory + 3 drives in RAID 0

  11. … if you’re in a hurry – 40 cores

  12. x86, ARM, x64: retail and checked builds

  13. N Applications - then stress a compiler’s build

  14. Compiler developer – bad day

  15. Win8 improved, but it is still a work/life balance thing

  16. Compiler++ “Evolving the compiler” • How we work • Core Technologies • Where we are going

  17. “Compiler Business” • Absolutely NO new compiler optimization switches • Each switch would cost millions of dollars

  18. Core Technologies • Code size / stack size / data alignment • Vectorization/Parallelization of existing C++ • Security • Parallelizing C++ control flow • Alias analysis • FOR ALL HARDWARE & RUNTIMES!!

  19. Code Size / Stack Size
      Foo(int p1, int p2, int p3) {
          int w, x, y, z;
          …
          if (flag) {
              w = …
              x = w + z;
              …
              return x;
          }
          else {
              y = …
          }
      Stack frame:
      [ebp+10] Parameter 3
      [ebp+0C] Parameter 2
      [ebp+08] Parameter 1
      [ebp+04] Return address
      [ebp+00] Old ebp
      [ebp-04] Local 1 // w
      [ebp-08] Local 2 // x
      [ebp-0C] Local 3 // z or y

  20. Stack Packing

  21. It’s all about… CACHE LINES
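Stack packing can be illustrated as giving frame slots to locals whose live ranges never overlap. In the frame for Foo, for example, z and y share [ebp-0C]. The sketch below is a minimal greedy slot assignment over hypothetical live intervals, measured in instruction order. It is not the MSVC algorithm. `Local`, `packSlots`, and `slotOf` are names invented for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A 4-byte local with a hypothetical live range [start, end] in instruction order.
struct Local {
    std::string name;
    int start;
    int end;
    int slot = -1;  // frame slot chosen by packSlots
};

// Greedy stack packing: visit locals by increasing start and reuse the first slot
// whose previous occupant died before this local becomes live. For intervals,
// this first-fit order uses the minimum number of slots.
int packSlots(std::vector<Local>& locals) {
    std::sort(locals.begin(), locals.end(),
              [](const Local& a, const Local& b) { return a.start < b.start; });
    std::vector<int> slotBusyUntil;  // end of the current occupant of each slot
    for (Local& l : locals) {
        assert(l.start <= l.end && "live range must be well formed");
        for (std::size_t s = 0; s < slotBusyUntil.size(); ++s) {
            if (slotBusyUntil[s] < l.start) {  // previous occupant is already dead
                l.slot = static_cast<int>(s);
                slotBusyUntil[s] = l.end;
                break;
            }
        }
        if (l.slot < 0) {  // no reusable slot: grow the frame
            l.slot = static_cast<int>(slotBusyUntil.size());
            slotBusyUntil.push_back(l.end);
        }
    }
    return static_cast<int>(slotBusyUntil.size());
}

// Look up a local's slot by name after packing (-1 if absent).
int slotOf(const std::vector<Local>& locals, const std::string& name) {
    for (const Local& l : locals)
        if (l.name == name) return l.slot;
    return -1;
}
```

Consider the hypothetical ranges w = [2,3], x = [3,5], z = [1,3], and y = [6,7]. With these, `packSlots` needs 3 slots instead of 4 and places y in z's slot, which mirrors the "z or y" entry in the frame. Fewer slots mean a smaller frame, and a smaller frame touches fewer cache lines.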

  22. NTSTATUS
      NtfsCommonRead (
          PIRP_CONTEXT IrpContext,
          PIRP Irp,
          BOOLEAN AcquireScb
          ) {
          NTSTATUS Status;
          PIO_STACK_LOCATION IrpSp;
          PFILE_OBJECT FileObject;
          TYPE_OF_OPEN TypeOfOpen;
          PVCB Vcb;
          PFCB Fcb;
          PSCB Scb;
          PCCB Ccb;
          ATTRIBUTE_ENUMERATION_CONTEXT AttrContext;
          EOF_WAIT_BLOCK EofWaitBlock;
          PFSRTL_ADVANCED_FCB_HEADER Header;
          PTOP_LEVEL_CONTEXT TopLevelContext;
          VBO StartingVbo;
          LONGLONG ByteCount;
          LONGLONG ByteRange;
          ULONG RequestedByteCount;
          PCOMPRESSION_SYNC CompressionSync = ((void *)0);
          BOOLEAN FoundAttribute = 0;
          BOOLEAN PostIrp = 0;
          BOOLEAN OplockPostIrp = 0;
          BOOLEAN ScbAcquired = 0;
          BOOLEAN ReleaseScb;
          BOOLEAN PagingIoAcquired = 0;
          BOOLEAN DoingIoAtEof = 0;
          BOOLEAN Wait;
          BOOLEAN PagingIo;
          BOOLEAN NonCachedIo;
          BOOLEAN SynchronousIo;
          BOOLEAN CompressedIo = 0;

  23. Try regions (diagram labels: ROOT; TRY, EXCEPT; TRY, FINALLY)
              __try {
                  NtfsPrePostIrp( IrpContext, Irp );
                  if (( (((Fcb->FcbState) & ((0x00000004)))) ) &&
                      ( (((Scb->ScbState) & ((0x00000010)))) )) {
                      FsRtlPostPagingFileStackOverflow( IrpContext, Event, NtfsStackOverflowRead );
                  } else {
                      FsRtlPostStackOverflow( IrpContext, Event, NtfsStackOverflowRead );
                  }
                  (void) KeWaitForSingleObject( Event, Executive, KernelMode, 0, ((void *)0) );
                  Status = ((NTSTATUS)0x00000103L);
              } __finally {
                  if (Resource != ((void *)0)) {
                      (ExReleaseResourceLite(Resource));
                  }
                  ExFreeToNPagedLookasideList( &NtfsKeventLookasideList, Event );
              }
          } else {
              if (Irp->Tail.Overlay.AuxiliaryBuffer != ((void *)0)) {
                  IrpContext->Union.AuxiliaryBuffer =
                      (PFSRTL_AUXILIARY_BUFFER)Irp->Tail.Overlay.AuxiliaryBuffer;
                  if (!( (((IrpContext->Union.AuxiliaryBuffer->Flags) & (0x00000001))) )) {
                      Irp->Tail.Overlay.AuxiliaryBuffer = ((void *)0);
                  }
              }
              Status = NtfsCommonRead( IrpContext, Irp, 1 );
          }
          break;
      }
      __except (NtfsExceptionFilter( IrpContext, (struct _EXCEPTION_POINTERS *)_exception_info() )) {
          NTSTATUS ExceptionCode;
          ExceptionCode = _exception_code();
          if (ExceptionCode == ((NTSTATUS)0xC0000123L)) {
              IrpContext->ExceptionStatus = ExceptionCode = ((NTSTATUS)0xC0000011L);
              Irp->IoStatus.Information = 0;
          }
      }

  24. Try Region Graph: asynchronous lifetimes
      int x, y;
      _try {
          _try {
              x = …
          } _finally {
          }
          … = x + …
          y = …
      } _except (filter()) {
          … = y
      }
      Region graph: ROOT → TRY (… = x), EXCEPT; outer TRY → TRY (x = …), FINALLY

  25. Recall …Compiler dev. primary concern

  26. C++ Core Technologies • Code size / stack size / data alignment • Vectorization/Parallelization of existing C++ • Security • Parallelizing C++ control flow • Alias analysis

  27. C++ Compiler - Auto Parallelism

  28. Vector: all loads before all stores • “addps xmm1, xmm0” computes xmm1 = xmm1 + xmm0, four floats at a time

  29. Simple vector add loop (unaligned)
      Source:
          for (i = 0; i < 1000; i++)
              A[i] = B[i] + C[i];
      Vectorized, 4 floats per iteration:
          for (i = 0; i < 1000/4; i++) {
              movups xmm0, [ecx]
              movups xmm1, [eax]
              addps  xmm0, xmm1
              movups [edx], xmm0
          }
      The compiler looks across loop iterations!

  30. Auto Parallelism/Vectorization for C++
      for (iv1 = 0; iv1 <= U1; iv1++) {
          for (iv2 = 0; iv2 <= U2; iv2++) {
              ...
                  for (ivn = 0; ivn <= Un; ivn++) {
                      t13 = OPLOAD [a1*iv1 + a2*iv2 + ... + an*ivn + sym_expression]
                  }
              ...
          }
      }

  31. Math in the compiler: legal to vectorize?
      FOR (j = 2; j <= 5; j++)
          A(j) = A(j-1) + A(j+1)
      Not equal!!
      A(2:5) = A(1:4) + A(3:6)
      A(3) = ?

  32. Vector Semantics • ALL loads before ALL stores
      A(2:5) = A(1:4) + A(3:6)
      VR1 = LOAD(A(1:4))
      VR2 = LOAD(A(3:6))
      VR3 = VR1 + VR2        // A(3) = F(A(2), A(4))
      STORE(A(2:5)) = VR3

  33. Vector Semantics • Instead: load, store, load, store, ...
      FOR (j = 2; j <= 257; j++)
          A(j) = A(j-1) + A(j+1)
      A(2) = A(1) + A(3)
      A(3) = A(2) + A(4)     // A(3) = F(A(1), A(2), A(3), A(4))
      A(4) = A(3) + A(5)
      A(5) = A(4) + A(6)
      …
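The difference between the two semantics can be checked directly. The sketch below runs the slide's loop A(j) = A(j-1) + A(j+1) for j = 2..5 twice. The first pass uses scalar C++ semantics, where each store feeds the next iteration. The second uses vector semantics, where every load happens before any store. The input array A(i) = i is hypothetical, and C++ index i stands in for the 1-based A(i).

```cpp
#include <array>
#include <cassert>

// Index 0 is unused; A(1..6) are read and A(2..5) are written.
using Array = std::array<int, 8>;

// Scalar (C/C++) semantics: each store is visible to the next iteration's loads.
Array sequentialSemantics(Array a) {
    for (int j = 2; j <= 5; ++j)
        a[j] = a[j - 1] + a[j + 1];
    return a;
}

// Vector semantics: all loads happen before any store.
Array vectorSemantics(Array a) {
    const Array old = a;  // snapshot plays the role of "load everything first"
    for (int j = 2; j <= 5; ++j)
        a[j] = old[j - 1] + old[j + 1];
    return a;
}
```

With A(i) = i, both versions agree on A(2) = 4. They diverge at A(3), which is 8 under sequential semantics and 6 under vector semantics. A compiler that vectorized this loop as written would therefore change the program's result.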

  34. Doubled the optimizer • A(a1 * I + c1) ?= A(a2 * I’ + c2)
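One classic, conservative way to answer A(a1 * I + c1) ?= A(a2 * I’ + c2) is the GCD test. The two references can touch the same element only if a1 * I - a2 * I’ = c2 - c1 has an integer solution. That happens exactly when gcd(a1, a2) divides c2 - c1. The sketch below implements only that test and ignores loop bounds. It illustrates the question on the slide and is not necessarily the analysis the MSVC optimizer performs; `mayDepend` is a name invented here.

```cpp
#include <cassert>
#include <numeric>  // std::gcd (C++17)

// GCD dependence test for subscripts A(a1*I + c1) and A(a2*I' + c2).
// Returns false only when the references can never touch the same element;
// true means "may depend" (loop bounds are ignored, so this is conservative).
bool mayDepend(long a1, long c1, long a2, long c2) {
    const long g = std::gcd(a1, a2);  // gcd of |a1| and |a2|
    if (g == 0)                       // both subscripts are constants
        return c1 == c2;
    return (c2 - c1) % g == 0;        // is a1*I - a2*I' = c2 - c1 solvable in integers?
}
```

The call `mayDepend(1, 0, 1, -1)` models the earlier loop, which writes A(j) and reads A(j-1), so a dependence is reported. The call `mayDepend(2, 0, 2, 1)` compares even elements with odd elements and proves they are independent.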

  35. for (size_t j = 0; j < numBodies; j++) {
          D3DXVECTOR4 r;
          r.x = A[j].pos.x - pos.x;
          r.y = A[j].pos.y - pos.y;
          r.z = A[j].pos.z - pos.z;
          float distSqr = r.x*r.x + r.y*r.y + r.z*r.z;
          distSqr += softeningSquared;
          float invDist = 1.0f / sqrt(distSqr);
          float invDistCube = invDist * invDist * invDist;
          float s = fParticleMass * invDistCube;
          acc.x += r.x * s;
          acc.y += r.y * s;
          acc.z += r.z * s;
      }
      Legal math? Complex C++, not just arrays!

  36. Legal? Where’s the base of the array?
      void foo(int n, float *a, float *b, float *c) {
          for (int j = 0; j < n; j++) {
              *a++ = *b++ + *c++;
          }
      }

  37. …and where’s the IV? A(a1 * I + c1) ?= A(a2 * I’ + c2)
      STL source code:
      void
      transform1(int* first1, int* last1, int* first2, int* result) {
          while (first1 != last1) {
              *result++ = *first1++ + *first2++;
          }
      }

  38. Parallelizing C++ requires transformation to analyze

  39. STL source code:
      while (first1 != last1) {
          *result++ = *first1++ + *first2++;
      }
      After introducing a synthetic induction variable:
      int synthetic_i;
      int synthetic_upper = last1 - first1;   // trip count in elements
      for (synthetic_i = 0; synthetic_i < synthetic_upper; synthetic_i++) {
          result[synthetic_i] = first1[synthetic_i] + first2[synthetic_i];
      }
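The rewrite from pointer bumping to a synthetic induction variable can be checked by running both forms side by side. The sketch below keeps the slide's names `synthetic_i` and `synthetic_upper`. It assumes the trip count is the pointer difference `last1 - first1`, which on `int*` is already an element count. The function names `transformPointers` and `transformIndexed` are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Original STL-style loop: no visible array base or induction variable.
void transformPointers(const int* first1, const int* last1, const int* first2, int* result) {
    while (first1 != last1) {
        *result++ = *first1++ + *first2++;
    }
}

// Same loop with a synthetic induction variable, so every access has the
// analyzable form base[1 * i + 0].
void transformIndexed(const int* first1, const int* last1, const int* first2, int* result) {
    const std::ptrdiff_t synthetic_upper = last1 - first1;  // trip count in elements
    for (std::ptrdiff_t synthetic_i = 0; synthetic_i < synthetic_upper; synthetic_i++) {
        result[synthetic_i] = first1[synthetic_i] + first2[synthetic_i];
    }
}
```

Both versions write the same elements in the same order. The indexed form, however, exposes the array bases and the induction variable that dependence tests need.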

  40. Now …C++ vector code gen • We don’t know if the array bases overlap • We don’t know what the target ISA is • We don’t know if the trip count is divisible by 4

  41. j = 0;
      if (!overlap(result, first1) && !overlap(result, first2)) {
          if (_ISA_AVAILABLE(AVX2)) {
              for (i = 0; i + 3 < synthetic_upper; i += 4) {   // Vector + Parallel loop
                  result[i : i + 3] = first1[i : i + 3] + first2[i : i + 3];
              }
              j = i;   // first element the vector loop did not cover
          }
      }
      for (; j < synthetic_upper; j++) {   // Sequential or cleanup loop
          result[j] = first1[j] + first2[j];
      }
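The runtime-guarded loop on this slide can be made concrete in portable C++. In the sketch below, the 4-wide body is written as plain C++: it does all four loads, then all four stores, where a real compiler would emit SIMD instructions. The slide's ISA check is left out. `overlap` and `addArrays` are names assumed for the illustration. `std::less` is used because comparing unrelated pointers with `<` is unspecified in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>  // std::less gives a total order on pointers

// Do [p, p + n) and [q, q + n) share any element?
bool overlap(const int* p, const int* q, std::ptrdiff_t n) {
    std::less<const int*> lt;
    return lt(p, q + n) && lt(q, p + n);
}

// Guarded "vector" loop followed by a scalar cleanup loop.
void addArrays(const int* first1, const int* first2, int* result, std::ptrdiff_t n) {
    std::ptrdiff_t j = 0;
    if (n > 0 && !overlap(result, first1, n) && !overlap(result, first2, n)) {
        for (; j + 4 <= n; j += 4) {  // vector loop: whole groups of 4
            int lanes[4];
            for (int k = 0; k < 4; ++k) lanes[k] = first1[j + k] + first2[j + k];  // all loads
            for (int k = 0; k < 4; ++k) result[j + k] = lanes[k];                  // then all stores
        }
    }
    for (; j < n; ++j) {  // sequential or cleanup loop
        result[j] = first1[j] + first2[j];
    }
}
```

With n = 10 and disjoint arrays, two groups of four take the vector path and the last two elements go to the cleanup loop. When `result` overlaps an input, the whole loop runs sequentially, which preserves scalar semantics.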

  42. Maps C++ to all forms of Parallelism • Vector • Vector + Parallel • SPMD

  43. Don’t BSOD… it’s all about lifestyle choices

  44. C++ Core Technologies • Code size / stack size / data alignment • Vectorization/Parallelization of existing C++ • Security • Parallelizing C++ control flow • Alias analysis

  45. Heap overflow vulnerability
      HRESULT CDocManager::IsValidWMToolsStream(bool* pfValid)
      {
          long cbSize;
          if (FAILED(hr = ExtractDataSize(strPath, &cbSize)))
              return S_OK;
          CSmartPtr<BYTE> pBuffer = new BYTE[cbSize];
          ExtractData(strPath, pBuffer, cbSize);
          long dwCheckSum = DwChecksumFromLpvCb(0, pBuffer, cbSize);
          long dwStreamCnt = GetStreamCount(m_pVisitedTree);
          if (FAILED(hr = ExtractDataSize(kszCheckSumStream, &cbSize))) {
              return S_OK;
          }
          //ExtractData(kszCheckSumStream, pBuffer, cbSize);
          for (int i = 0; i < cbSize; i++) {
              *pBuffer++ = *kszCheckSumStream++;
          }
      }
      1. cbSize is assigned 4470.
      2. The buffer is allocated with 4470 bytes.
      3. cbSize is re-assigned 4496: heap overflow! Leads to hijack.
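The missing step is a check that the second cbSize still fits the buffer allocated from the first one. The sketch below shows only that bound check. The names `Byte` and `copyChecksum` are hypothetical, a `std::vector` stands in for `CSmartPtr<BYTE>`, and this is not the fix that shipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Byte = unsigned char;

// Hypothetical checked copy: refuse to write more bytes than the buffer holds,
// which is the condition the original loop never tested.
bool copyChecksum(std::vector<Byte>& buffer, const Byte* src, std::size_t cbSize) {
    if (cbSize > buffer.size()) {
        return false;  // e.g. buffer sized for 4470 bytes, 4496 requested
    }
    std::copy_n(src, cbSize, buffer.begin());
    return true;
}
```

With a 4470-byte buffer, a 4470-byte copy succeeds. The 4496-byte request from step 3 is rejected instead of overflowing the heap.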

  46. IE Aurora: dangling pointer vulnerability (on the slide, red marks C++ called from JavaScript)
      <html><head><script>
      var e1;
      function f1(evt) {
          e1 = document.createEventObject(evt);
          document.getElementById("sp").innerHTML = "";
          window.setInterval(f2, 50);
      }
      function f2() {
          var t = e1.srcElement;
      }
      </script></head>
      <body>
      <span id="sp">
      <img src="any.gif" onload="f1(event)">
      </span>
      </body></html>
      1. Pass the onload event (evt) to f1.
      2. Copy evt, but fail to AddRef the CTreeNode!
      3. Destroy the img tag in the span, leading to a free when evt falls out of scope.
      4. Call f2 asynchronously so evt goes out of scope.
      Hijack! Vtable call via the freed CTreeNode.

  47. Vulnerability: “use after free” (diagram: a heap pointer to an object whose vtable lists function_1 and function_2; once the object is freed and its memory is refilled with attack data, the vtable call lands in attack code)

  48. Illegal control flow or writes • What if the C++ compiler generated code to check for them? • It would have to be always on • and must NOT degrade performance!! • An example of Hardware + Language + Compiler co-design
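One way to picture a compiler-generated check on illegal control flow is a guard before every indirect call that verifies the target is a registered function. The sketch below is a hypothetical software model: `checkedCall`, `validTargets`, and `attack_code` are invented names. It only shows the shape of such a check. It does not describe how any shipping compiler or hardware feature implements it, and it does nothing to meet the slide's performance requirement.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

using Handler = int (*)(int);

int function_1(int v) { return v + 1; }
int function_2(int v) { return v * 2; }
int attack_code(int) { return -1; }  // simulated attacker target, never registered

// Targets the toolchain would register as legal indirect-call destinations.
const std::vector<Handler> validTargets = {function_1, function_2};

// Guarded indirect call: the check inserted before "call [vtable slot]".
std::optional<int> checkedCall(Handler target, int arg) {
    if (std::find(validTargets.begin(), validTargets.end(), target) == validTargets.end()) {
        return std::nullopt;  // illegal control flow: refuse instead of jumping
    }
    return target(arg);
}
```

Calls through `function_1` or `function_2` proceed normally. A corrupted vtable slot that points at `attack_code` is refused before the jump. A linear search keeps the model simple, but it is exactly the kind of cost an always-on check could not afford.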

  49. C++ Core Technologies • Code size / stack size / data alignment • Vectorization/Parallelization of existing C++ • Security • Parallelizing C++ control flow • Alias analysis
