
Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation

Phillip Porras and Hassen Saidi, Computer Science Lab, SRI International


Presentation Transcript


  1. Malgram Malware Analysis: Malware Unpacking, Static Analysis, Code Deobfuscation, Decompilation. Phillip Porras and Hassen Saidi, Computer Science Lab, SRI International

  2. Objectives • We now have various ways of knowing what the malware does when it runs on an infected system. Our next aim is to answer two fundamental questions: • How does it do it? • What is the full capability of the malware, covering both observed behavior and behavior that has yet to be triggered?

  3. Dynamic vs Static Malware Analysis • Dynamic Analysis • Techniques that profile the actions of a binary at runtime • Provides only a partial “effects-oriented profile” of the malware's potential • Static Analysis • Techniques that apply program analysis to the binary code • Can provide complementary insights • Potential for a more comprehensive assessment

  4. Malgram Report • …go interactive

  5. From Binary To Semantically Rich C Code: Raw Binary Disassembly

  6. From Binary To Semantically Rich C Code: Complete Disassembly

  7. From Binary To Semantically Rich C Code: Decompiled C code

  8. Challenges in Static Analysis: Raw Binary Disassembly → Complete Disassembly → Decompiled C code

  9. Malware Obfuscation • Most malware is obfuscated • Packing is the most widely used obfuscation technique • Packing is often combined with other, more advanced forms of obfuscation: • Binary rewriting, which creates semantically equivalent code with a vastly different structure • Call obfuscation in general, and API obfuscation in particular • Chunking, or “code spaghettisation” • Dead (functionally irrelevant) code

  10. Challenges in Static Analysis (Raw Binary Disassembly) • Challenge: Does the binary represent the full logic of the malware?

  11. Unpacking Result

  12. Packed vs Unpacked • go interactive…

  13. Coarse-grained Execution Monitoring • Generalized unpacking principle • Execute the binary until it has sufficiently revealed itself • Dump the process execution image for static analysis • Monitoring execution progress • Eureka uses a Windows driver that hooks the SSDT (System Service Dispatch Table) • A callback is invoked on each NTDLL system call • Calls are filtered by the PID of the malware process
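The generalized unpacking principle can be sketched as a per-system-call callback. The following is a user-space simulation built on stated assumptions, not Eureka's kernel driver. The `syscall_event` record, the `enum syscall_id` values, and the rule "dump when the tracked process is about to terminate" are all hypothetical. They stand in for the SSDT hook and for whatever dump criterion the real system applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical system-call ids; real SSDT numbers differ across Windows builds. */
enum syscall_id { SYS_OPEN_FILE, SYS_WRITE_FILE, SYS_CREATE_PROCESS, SYS_TERMINATE_PROCESS };

/* One event as a hypothetical SSDT-hook callback might report it. */
struct syscall_event {
    unsigned pid;          /* process that issued the call */
    enum syscall_id id;    /* which system service was requested */
};

/* Monitor state: the process we track and what we have seen from it. */
struct monitor {
    unsigned target_pid;
    size_t calls_seen;     /* system calls issued by the target so far */
    bool dumped;           /* set once the image has been dumped */
};

/* Placeholder for "dump the process execution image for static analysis". */
static void dump_image(struct monitor *m) { m->dumped = true; }

/* Invoked for every system call; returns true if this call triggered the dump. */
static bool on_syscall(struct monitor *m, const struct syscall_event *ev)
{
    assert(m != NULL && ev != NULL);
    if (ev->pid != m->target_pid || m->dumped)
        return false;                 /* filter: ignore other processes */
    m->calls_seen++;
    /* Assumed trigger: the target is about to exit, so anything it unpacked
     * is still present in its address space and can be captured now. */
    if (ev->id == SYS_TERMINATE_PROCESS) {
        dump_image(m);
        return true;
    }
    return false;
}
```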

  14. Statistics-based Unpacking • Observations • The statistical properties of a packed executable differ from those of an unpacked executable • As the malware executes, its code-to-data ratio increases • Complications • Code and data sections are interleaved in PE executables • Data directories (e.g., import tables) look like data but are often found in code sections • The properties of data sections vary from packer to packer
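One way to turn the code-to-data observation into a number is to count byte pairs that are common in compiled x86 code but rare in compressed or encrypted data, then re-measure the process image as it runs. The sketch below counts two such pairs:

- `FF 15` (`call dword ptr [disp32]`), the usual form of a call through an import-table slot.
- `8B EC` (`mov ebp, esp`), part of the standard function prologue.

The choice of bigrams and the hits-per-kilobyte threshold in the hypothetical `looks_unpacked` are illustrative assumptions. They are not the statistic used in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte pairs that are frequent in compiled 32-bit x86 code:
 *   FF 15 = call dword ptr [disp32]   (call through an import-table slot)
 *   8B EC = mov ebp, esp              (standard function prologue)
 * Packed or compressed bytes should contain them only at chance frequency. */
static const uint8_t CODE_BIGRAMS[][2] = { {0xFF, 0x15}, {0x8B, 0xEC} };

/* Count occurrences of the code bigrams in a memory region. */
static size_t count_code_bigrams(const uint8_t *buf, size_t len)
{
    size_t hits = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++)
        for (size_t b = 0; b < sizeof CODE_BIGRAMS / sizeof CODE_BIGRAMS[0]; b++)
            if (buf[i] == CODE_BIGRAMS[b][0] && buf[i + 1] == CODE_BIGRAMS[b][1])
                hits++;
    return hits;
}

/* Hypothetical decision rule: the region "looks unpacked" once it holds at
 * least min_per_kb bigram hits per 1024 bytes (integer-only comparison). */
static int looks_unpacked(const uint8_t *buf, size_t len, size_t min_per_kb)
{
    if (len == 0)
        return 0;
    return count_code_bigrams(buf, len) * 1024 >= min_per_kb * len;
}
```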

  15. Statistics-based Unpacking (3)

  16. Evaluation (ASPack)

  17. Evaluation (MoleBox)

  18. API Resolution • User-level malware programs need system calls to perform malicious actions • They use the Win32 API to access user-level libraries • Obfuscation impedes analysis with disassemblers and decompilers • Packers use non-standard linking and loading of DLLs • Obfuscated API resolution

  19. Standard API Resolution • IDA identifies the imports in the IAT by reading the Import Table

  20. Resolving API Calls Using Dataflow Analysis • Identify register-based indirect calls (figure: def-use chain resolving an indirect call to GetEnvironmentStringW)
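The def-use idea can be illustrated on a toy instruction list. The approach is to track the most recent definition of each register. When an indirect `call reg` appears, the register is followed back to the memory slot it was loaded from, and that slot is looked up in a table of known API addresses.

The three-opcode IR, the `slot_name` table, and the slot address 0x401000 used in the test are all hypothetical. A real implementation works on disassembly and must also handle control flow, which this straight-line sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum reg { EAX, EBX, ECX, EDX, ESI, EDI, NREGS };

/* Hypothetical straight-line IR with just enough operations to show def-use. */
enum op { MOV_REG_MEM, MOV_REG_REG, CALL_REG };
struct insn {
    enum op op;
    enum reg dst;      /* destination (MOV_*) or call target register (CALL_REG) */
    enum reg src;      /* source register for MOV_REG_REG */
    uint32_t addr;     /* memory operand for MOV_REG_MEM */
};

/* A memory slot known to hold the address of a named API. */
struct slot_name { uint32_t addr; const char *api; };

/* For each instruction, out[i] receives the API name an indirect call reaches,
 * or NULL if the instruction is not a call or its register cannot be resolved. */
static void resolve_calls(const struct insn *code, size_t n,
                          const struct slot_name *slots, size_t nslots,
                          const char **out)
{
    uint32_t def_addr[NREGS];   /* slot each register was last loaded from */
    int def_known[NREGS];       /* does def_addr hold a valid definition? */
    assert(out != NULL || n == 0);
    for (int r = 0; r < NREGS; r++) { def_addr[r] = 0; def_known[r] = 0; }

    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &code[i];
        out[i] = NULL;
        switch (in->op) {
        case MOV_REG_MEM:              /* def: reg <- [addr] */
            def_addr[in->dst] = in->addr;
            def_known[in->dst] = 1;
            break;
        case MOV_REG_REG:              /* def: reg inherits the source's definition */
            def_addr[in->dst] = def_addr[in->src];
            def_known[in->dst] = def_known[in->src];
            break;
        case CALL_REG:                 /* use: look up the reaching definition */
            if (def_known[in->dst])
                for (size_t s = 0; s < nslots; s++)
                    if (slots[s].addr == def_addr[in->dst])
                        out[i] = slots[s].api;
            break;
        }
    }
}
```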

  21. Evaluation Metrics • Measuring analyzability • Code-to-data ratio • Use a disassembler to separate code and data • Most successfully unpacked malware samples have a code-to-data ratio above 50% • API resolution success • The percentage of call sites, out of all call sites, whose API calls have been resolved • A higher percentage implies that the malware is more amenable to static analysis
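Once the disassembler has classified bytes and call sites, both metrics reduce to simple percentages. This sketch assumes that "code-to-data ratio" means the share of classified bytes that are code, that is, code / (code + data). That is one reading of the slide, not a confirmed definition. The 50% cut-off in the hypothetical `likely_unpacked` is the rough figure quoted above, not a hard rule.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: percentage of classified bytes that are code. */
static double code_ratio_pct(size_t code_bytes, size_t data_bytes)
{
    size_t total = code_bytes + data_bytes;
    return total ? 100.0 * (double)code_bytes / (double)total : 0.0;
}

/* Percentage of call sites whose target was resolved to a named API. */
static double api_resolution_pct(size_t resolved_calls, size_t total_calls)
{
    assert(resolved_calls <= total_calls);
    return total_calls ? 100.0 * (double)resolved_calls / (double)total_calls : 0.0;
}

/* Rough heuristic from the slide: successfully unpacked samples usually exceed 50%. */
static int likely_unpacked(size_t code_bytes, size_t data_bytes)
{
    return code_ratio_pct(code_bytes, data_bytes) > 50.0;
}
```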

  22. Challenges in Static Analysis (Complete Disassembly) • Challenge: Can we isolate subroutines?

  23. Binary Rewrites • go interactive …

  24. From Raw Binary To Decompiled C Code: Raw Binary Disassembly → Complete Disassembly → Decompiled C code

  25. Renaissance: Improving C Code Readability

    Hex Rays:

    void *sub_9AB966(int a1, void *source, unsigned int a3)
    {
      int v3, v4, v5, v6, v8;

      v3 = a1;
      v4 = *(_DWORD *)(a1 + 20) + 8 * a3;
      v5 = (*(_DWORD *)(a1 + 20) >> 3) & 0x3F;
      *(_DWORD *)(a1 + 20) = v4;
      if ( v4 < 8 * a3 )
        ++*(_DWORD *)(a1 + 24);
      *(_DWORD *)(a1 + 24) += a3 >> 29;
      if ( v5 + a3 <= 0x3F )
      {
        v6 = 0;
      }
      else
      {
        v6 = 64 - v5;
        memcpy((void *)(v5 + a1 + 28), source, 64 - v5);
        sub_9A9F13(a1, (void *)(a1 + 28));
        if ( v6 + 63 < a3 )
        {
          v8 = v6 + 63;
          do
          {
            sub_9A9F13(v3, (char *)source + v8 - 63);
            v8 += 64;
            v6 += 64;
          }
          while ( v8 < a3 );
        }
        v5 = 0;
      }
      return memcpy((void *)(v5 + v3 + 28), (char *)source + v6, a3 - v6);
    }

    Hex Rays + Renaissance:

    void *sub_9AB966(unsigned int *destination1, unsigned int *source, size_t num1)
    {
      unsigned int *destination2;
      size_t num3, num2, num4, num5;

      destination2 = destination1;
      num3 = destination1[20] + 8 * num1;
      num2 = (destination1[20] >> 3) & 0x3F;
      destination1[20] = num3;
      if ( num3 < 8 * num1 )
        ++destination1[24];
      destination1[24] += num1 >> 29;
      if ( num2 + num1 <= 0x3F )
      {
        num4 = 0;
      }
      else
      {
        num4 = 64 - num2;
        memcpy( &destination1[num2 + 28], source, 64 - num2);
        sub_9A9F13( destination1, &destination1[28] );
        if ( num4 + 63 < num1 )
        {
          num5 = num4 + 63;
          do
          {
            sub_9A9F13( destination2, &source[num5 - 63] );
            num5 += 64;
            num4 += 64;
          }
          while ( num5 < num1 );
        }
        num2 = 0;
      }
      return memcpy( &destination2[num2 + 28], &source[num4], num1 - num4 );
    }
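The decompiled sub_9AB966 manipulates a recognizable layout:

- a 64-bit bit counter stored as 32-bit words at byte offsets 20 and 24, with a carry between them;
- a 64-byte buffer at byte offset 28;
- one call to sub_9A9F13 per full 64-byte block.

That shape resembles the "update" step of a block hash with a five-word state, such as SHA-1. This identification is an inference, not something the slides state.

Under that assumption, the sketch below shows a hand-written equivalent using the kind of names an analyst ultimately wants to see. `hash_ctx`, `hash_update`, and `block_transform` are hypothetical names. `block_transform` is a stub standing in for sub_9A9F13 that only counts blocks.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout matching the byte offsets in the Hex-Rays output:
 * state at 0 (5 words = 20 bytes), bit count at 20 and 24, buffer at 28. */
struct hash_ctx {
    uint32_t state[5];
    uint32_t count[2];     /* count[0] = low 32 bits, count[1] = high bits */
    uint8_t  buffer[64];
};

static unsigned blocks_processed;   /* lets the stub below be observed */

/* Hypothetical stand-in for sub_9A9F13; a real hash would mix the block into state. */
static void block_transform(uint32_t state[5], const uint8_t block[64])
{
    (void)state;
    (void)block;
    blocks_processed++;
}

/* Hypothetical readable equivalent of sub_9AB966: absorb len bytes into ctx. */
static void *hash_update(struct hash_ctx *ctx, const uint8_t *data, uint32_t len)
{
    uint32_t used = (ctx->count[0] >> 3) & 0x3F;   /* bytes already buffered */
    uint32_t i;

    assert(ctx != NULL && (data != NULL || len == 0));
    ctx->count[0] += len << 3;                     /* add len * 8 bits ... */
    if (ctx->count[0] < (len << 3))
        ctx->count[1]++;                           /* ... carrying into the high word */
    ctx->count[1] += len >> 29;

    if (used + len > 63) {
        i = 64 - used;                             /* complete and flush the partial block */
        memcpy(&ctx->buffer[used], data, i);
        block_transform(ctx->state, ctx->buffer);
        for (; i + 63 < len; i += 64)              /* whole blocks straight from the input */
            block_transform(ctx->state, &data[i]);
        used = 0;
    } else {
        i = 0;
    }
    return memcpy(&ctx->buffer[used], &data[i], len - i);   /* buffer the tail */
}
```

In this form, the pointer arithmetic that both decompiler listings express through raw offsets becomes ordinary field accesses. That is the readability goal the slide describes.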

  26. 1. Typing and naming variables (same Hex Rays / Hex Rays + Renaissance code as slide 25)

  27. 2. Highlighting important vars (same Hex Rays / Hex Rays + Renaissance code as slide 25)

  28. 3. Improvements to decompilation (same Hex Rays / Hex Rays + Renaissance code as slide 25)

  29. 4. Caller → Callee type info (same Hex Rays / Hex Rays + Renaissance code as slide 25)

  30. Evaluation

  31. Challenges in Static Analysis: Raw Binary Disassembly → Complete Disassembly → Decompiled C code

  32. The Need for Rapid Crypto-Algorithm Isolation
    • AES: Truecrypt, Waledac
    • SSL: Agobot (IRC over SSL)
    • Serpent: Truecrypt
    • Twofish: Truecrypt
    • Cascades: Truecrypt
    • Hash Whirlpool: Truecrypt
    • Hash MD6: Conficker B, C
    • Hash SHA1: Conficker A, Truecrypt
    • RC4: Rustock, Zeus, Conficker
    • Custom crypto / encoding: Pushdo, Kraken, Mebroot, Mega-D
    • XOR-custom: Lethic, Virut, Hydraq, Torpig
    • RSA variants: Nugache, Conficker, Waledac
    • Blowfish (448-bit): Clampi

  33. Intra-module Analyzer
    IntraModule isCrypto():
    • isCrypto score = isConst + isPadded + Crypt API + fn(LargeVar, Loop Detection, Opcodes, BigMath)
    • Constant Detector: constant data loading
    • Padding Analysis
    • Crypt API: Microsoft CryptoAPI, CAPICOM
    • cryptoFnDetection(): Large Local Variables, Loop Detection, Big Number Math, Opcode Analysis; at least 2 matches → Unknown Computation
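The isCrypto() score can be sketched as a sum of yes/no tests over one subroutine. The `fn_features` fields are hypothetical names for the detectors listed on the slide, and treating every term as 0 or 1 with equal weight is an assumption. Only two things come from the slide: the list of terms, and the "at least 2 matches" rule for cryptoFnDetection().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-subroutine detector results (hypothetical field names). How each one is
 * computed (constant scan, padding analysis, import check, code heuristics)
 * is outside this sketch. */
struct fn_features {
    bool is_const;        /* references or loads known crypto constants */
    bool is_padded;       /* padding analysis matched */
    bool crypt_api;       /* calls Microsoft CryptoAPI / CAPICOM */
    bool large_locals;    /* unusually large local variables */
    bool has_loop;        /* loop detection */
    bool crypto_opcodes;  /* opcode analysis matched (criteria assumed) */
    bool big_math;        /* big-number arithmetic */
};

/* cryptoFnDetection(): flag an "unknown computation" when at least 2 of the
 * 4 code heuristics match. */
static int crypto_fn_detection(const struct fn_features *f)
{
    int matches = f->large_locals + f->has_loop + f->crypto_opcodes + f->big_math;
    return matches >= 2;
}

/* Assumed unweighted score: isConst + isPadded + CryptAPI + cryptoFnDetection(). */
static int is_crypto_score(const struct fn_features *f)
{
    assert(f != NULL);
    return f->is_const + f->is_padded + f->crypt_api + crypto_fn_detection(f);
}
```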

  34. Constant detection: Direct Reference • Blowfish, Camellia, CAST, CAST256, CRC32, DES, GOST, HAVAL, MARS, MD2, PKCS_MD2, PKCS_MD5, PKCS_RIPEMD160, PKCS_SHA256, PKCS_SHA384, PKCS_SHA512, PKCS_Tiger, RawDES, RC2, Rijndael, SAFER, SHA1, SHA256, SHA512, SHARK, SKIPJACK, Square, Tiger, Twofish, WAKE, Whirlpool, zlib, AES, MD6
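Direct-reference detection amounts to searching a binary's data for byte sequences that only appear in specific algorithms. The sketch below uses three well-known signatures:

- the first 16 bytes of the AES S-box;
- the first four 32-bit entries of the standard CRC-32 lookup table (reflected polynomial 0xEDB88320), stored little-endian;
- the four-word initial state 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, stored little-endian, which MD5 and SHA-1 share.

The signature table and the `find_constants` interface are illustrative. A tool covering the slide's list would carry full tables for every algorithm. It would also need a separate search for "sparse" constants spread across instruction immediates, which this contiguous-array scan does not find.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct crypto_sig {
    const char *name;        /* label to report */
    const uint8_t *bytes;    /* pattern as it appears in memory */
    size_t len;
};

static const uint8_t AES_SBOX_PREFIX[] = {
    0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
    0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76 };
/* 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, little-endian */
static const uint8_t CRC32_TABLE_PREFIX[] = {
    0x00, 0x00, 0x00, 0x00, 0x96, 0x30, 0x07, 0x77,
    0x2c, 0x61, 0x0e, 0xee, 0xba, 0x51, 0x09, 0x99 };
/* 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, little-endian (MD5 and SHA-1) */
static const uint8_t MD5_SHA1_IV[] = {
    0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
    0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32, 0x10 };

static const struct crypto_sig SIGS[] = {
    { "AES S-box",               AES_SBOX_PREFIX,    sizeof AES_SBOX_PREFIX },
    { "CRC32 table",             CRC32_TABLE_PREFIX, sizeof CRC32_TABLE_PREFIX },
    { "MD5/SHA-1 initial state", MD5_SHA1_IV,        sizeof MD5_SHA1_IV },
};

/* offsets[s] receives the first offset where SIGS[s] occurs, or -1 if absent. */
static void find_constants(const uint8_t *buf, size_t len, long offsets[])
{
    assert(buf != NULL || len == 0);
    for (size_t s = 0; s < sizeof SIGS / sizeof SIGS[0]; s++) {
        offsets[s] = -1;
        for (size_t i = 0; i + SIGS[s].len <= len; i++)
            if (memcmp(buf + i, SIGS[s].bytes, SIGS[s].len) == 0) {
                offsets[s] = (long)i;
                break;
            }
    }
}
```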

  35. Indirect Load (same constant list as slide 34) • Load array: the data array contains known crypto content • Unknown computation: this could be encryption or decryption

  36. Inter-module Analyzer

    func ColorNode(Subgraph) {
      if (exists uncolored subgraph)
        ColorNode(subgraph)
      foreach leaf in subgraph {
        isCrypto(leaf)
      }
      if (exists green leaf) then color root green
      if (exists orange leaf) then color root orange
      if (exists > 2 red leaves) then color root red
    }

    func cryptoString(per subroutine)
      if node contains known crypto implementation substring,
        label node with corresponding crypto library.

    (figure labels: AES, MD6, Vowpalwabbit)
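A minimal C sketch of the ColorNode() recursion over a call graph stored as an array of nodes. It makes two simplifying assumptions:

- Leaf colors are taken as given, standing in for isCrypto().
- Each caller inspects the colors of its direct callees after they have been colored, rather than enumerating every leaf of its subgraph.

The slide does not say what green, orange, and red denote, so the sketch treats them as opaque labels. The three rules are applied in the order written, so a later match overrides an earlier one. A `visited` flag stops recursive call cycles from looping forever. The struct layout and `MAX_CALLEES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 8

enum color { UNCOLORED, GREEN, ORANGE, RED };

struct node {
    enum color color;              /* leaves: preset (as if from isCrypto()) */
    int visited;                   /* guards against call-graph cycles */
    size_t ncallees;
    size_t callees[MAX_CALLEES];   /* indices into the node array */
};

/* Post-order: color every callee first, then derive this node's color. */
static enum color color_node(struct node *g, size_t n)
{
    struct node *v;
    size_t green = 0, orange = 0, red = 0;

    assert(g != NULL);
    v = &g[n];
    if (v->visited || v->ncallees == 0)   /* leaf, or already in progress/done */
        return v->color;
    v->visited = 1;

    for (size_t i = 0; i < v->ncallees; i++) {
        switch (color_node(g, v->callees[i])) {
        case GREEN:  green++;  break;
        case ORANGE: orange++; break;
        case RED:    red++;    break;
        default:               break;
        }
    }
    /* Rules in slide order; later matches override earlier ones. */
    if (green > 0)  v->color = GREEN;
    if (orange > 0) v->color = ORANGE;
    if (red > 2)    v->color = RED;
    return v->color;
}
```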

  37. IDA Pro Call Graph w/ Crypto-routine detection

  38. Example Running SRI Crypt Finder
    (c) SRI International
    Finding crypto constants and subroutines in binary files
    automatic discovery of crypto functions as unknown computations
    4BABF1: found sparse constants for SHA-1
    50C254: found const array sbox_AES (used in AES)
    50E354: found const array rsbox_AES (used in AES)
    50F574: found const array Twofish_q (used in Twofish)
    50F7A4: found const array MARS_Sbox (used in MARS)
    510EA4: found const array zinflate_lengthExtraBits (used in zlib)
    510F18: found const array zinflate_distanceExtraBits (used in zlib)
    511918: found const array CRC32_m_tab (used in CRC32)
    514F98: found const array CRC32_m_tab (used in CRC32)
    Found 9 known constant arrays in total.
    Scanning code for crypto subroutines
    found crypto in Function @ 407334
    found crypto in Function @ 40E5B4
    found crypto in Function @ 47D954
    found crypto in Function @ 47ED34
    found crypto in Function @ 4816F4
    found crypto in Function @ 4B6624
    found crypto in Function @ 4B9980
    found crypto in Function @ 4CCBD4
    found crypto in Function @ 4CCD4C
    found crypto in Function @ 4CE208
    found crypto in Function @ 4CE7CC
    found crypto in Function @ 4CEBE8
    found crypto in Function @ 4D9B00
    found crypto in Function @ 4D9EE4
    Done labelling crypto subroutines
    Found 14 subroutine(s) with possible crypto

  39. Running SRI Crypt Finder

  40. Report Generation • go interactive
