This report outlines the progress made in decompiling .NET bytecode as part of a Computer Science Part II Project at Trinity Hall, Cambridge. It covers the structure of a decompiler, including the front end, control-flow analysis, and code generation. Examples illustrate the analysis of CIL bytecode, the generation of control-flow graphs, and the reconstruction of high-level language constructs. The current-status section lists the implemented features, the remaining tasks, and plans for extending decompilation to other architectures and languages. The report also notes the legal and ethical ramifications of recovering source code.
Decompilation of .NET bytecode
Stephen Horne, Trinity Hall
Computer Science Part II Project: Progress Report
http://hal.trinhall.cam.ac.uk/~srh38/project
10th February 2004
The .NET framework
• .NET and the Common Language Runtime
  • Microsoft’s answer to Java
  • The CLR is the .NET equivalent of the JVM
  • Lots of useful metadata provided in assemblies
[Figure: C#, J#, Managed C++ and VB .NET source each pass through their own compiler (C# compiler, J# compiler, Managed C++ compiler, VB .NET compiler) to CIL and metadata, which run on the Common Language Runtime]
• What about reversing the compilation process?
  • Sometimes we want to recover source from a binary:
    • Language translation
    • Lost source recovery
    • Checking for malicious code
  • Obvious legal and ethical ramifications
Slide 2
Structure of a decompiler
Executable
  ↓
Decompiler
  Front end
  • Reads in bytecode
  • Divides into basic blocks
    ↓ low-level intermediate code (unstructured control-flow graph)
  UDM
  • Data-flow analysis
  • Control-flow analysis
    ↓ high-level intermediate code (structured control-flow graph)
  Back end
  • Code generation
  ↓
Source
Slide 3
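The front end's task of dividing bytecode into basic blocks can be sketched with the usual leader rule: the first instruction, every branch target, and every instruction that follows a branch or return starts a new block, and each block's last instruction determines its successors. The Python below is a minimal sketch under assumptions of our own, not the project's code: instructions are (offset, opcode, operand) tuples with branch operands already resolved to offsets, only a few branch opcodes are classified (no switch or unsigned variants), and find_basic_blocks and CONTROL_EXAMPLE are hypothetical names. The input is the ControlExample bytecode from the example decompilation.

```python
# Hypothetical instruction model: (offset, opcode, operand), with branch
# operands already resolved to target offsets. Only a few opcodes are classified.
UNCONDITIONAL = {"br", "br.s"}
CONDITIONAL = {name + suffix
               for name in ("beq", "bge", "bgt", "ble", "blt", "bne.un", "brtrue", "brfalse")
               for suffix in ("", ".s")}
RETURN = {"ret"}


def find_basic_blocks(code):
    """Split (offset, opcode, operand) tuples into basic blocks.

    Returns (blocks, succs): blocks maps each leader offset to its instructions,
    succs maps each leader offset to the leader offsets of its successor blocks.
    """
    leaders = {code[0][0]}  # the first instruction always starts a block
    for i, (_, op, target) in enumerate(code):
        if op in UNCONDITIONAL or op in CONDITIONAL:
            leaders.add(target)  # every branch target starts a block
        if op in UNCONDITIONAL or op in CONDITIONAL or op in RETURN:
            if i + 1 < len(code):
                leaders.add(code[i + 1][0])  # so does the instruction after a branch
    blocks, current = {}, None
    for ins in code:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)
    starts = sorted(blocks)
    succs = {}
    for idx, start in enumerate(starts):
        _, op, target = blocks[start][-1]
        fall = [starts[idx + 1]] if idx + 1 < len(starts) else []  # next block in memory
        if op in UNCONDITIONAL:
            succs[start] = [target]
        elif op in CONDITIONAL:
            succs[start] = [target] + fall
        elif op in RETURN:
            succs[start] = []
        else:
            succs[start] = fall
    return blocks, succs


CONTROL_EXAMPLE = [
    (0x00, "ldc.i4.0", None), (0x01, "stloc.0", None),
    (0x02, "ldc.i4.0", None), (0x03, "stloc.1", None), (0x04, "br.s", 0x23),
    (0x06, "ldc.i4.3", None), (0x07, "ldloc.1", None), (0x08, "mul", None),
    (0x09, "ldarg.0", None), (0x0A, "bge.s", 0x12),
    (0x0C, "ldloc.0", None), (0x0D, "ldc.i4.1", None), (0x0E, "sub", None),
    (0x0F, "stloc.0", None), (0x10, "br.s", 0x16),
    (0x12, "ldloc.0", None), (0x13, "ldc.i4.1", None), (0x14, "add", None),
    (0x15, "stloc.0", None),
    (0x16, "ldloc.0", None), (0x17, "call", "Math::Abs(int32)"),
    (0x1C, "ldloc.1", None), (0x1D, "blt.s", 0x06),
    (0x1F, "ldloc.1", None), (0x20, "ldc.i4.1", None), (0x21, "add", None),
    (0x22, "stloc.1", None),
    (0x23, "ldloc.1", None), (0x24, "ldarg.0", None), (0x25, "blt.s", 0x06),
    (0x27, "ldloc.0", None), (0x28, "stloc.2", None), (0x29, "br.s", 0x2B),
    (0x2B, "ldloc.2", None), (0x2C, "ret", None),
]
```

On that listing the sketch finds nine blocks, starting at offsets 0x00, 0x06, 0x0C, 0x12, 0x16, 0x1F, 0x23, 0x27 and 0x2B; the entry block's only successor is the loop test at 0x23, and the conditional blocks at 0x06, 0x16 and 0x23 each get a taken and a fall-through successor.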
Example decompilation
Process:
• Divide code into basic blocks and create CFG
• Data-flow analysis
  • Register copy propagation
• Control-flow analysis
  • Divide graph into intervals
  • Loops induced by back-edges within intervals
  • Nesting of intervals → nesting of loops
  • Conditionals found by common follow nodes
  • Order of nodes → nesting of conditionals
• Generate code from structured CFG

CIL bytecode, divided into numbered basic blocks:
Block 1
    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.0
    IL_0003: stloc.1
    IL_0004: br.s IL_0023
Block 3
    IL_0006: ldc.i4.3
    IL_0007: ldloc.1
    IL_0008: mul
    IL_0009: ldarg.0
    IL_000a: bge.s IL_0012
Block 4
    IL_000c: ldloc.0
    IL_000d: ldc.i4.1
    IL_000e: sub
    IL_000f: stloc.0
    IL_0010: br.s IL_0016
Block 5
    IL_0012: ldloc.0
    IL_0013: ldc.i4.1
    IL_0014: add
    IL_0015: stloc.0
Block 6
    IL_0016: ldloc.0
    IL_0017: call Math::Abs(int32)
    IL_001c: ldloc.1
    IL_001d: blt.s IL_0006
Block 7
    IL_001f: ldloc.1
    IL_0020: ldc.i4.1
    IL_0021: add
    IL_0022: stloc.1
Block 2
    IL_0023: ldloc.1
    IL_0024: ldarg.0
    IL_0025: blt.s IL_0006
Block 8
    IL_0027: ldloc.0
    IL_0028: stloc.2
    IL_0029: br.s IL_002b
Block 9
    IL_002b: ldloc.2
    IL_002c: ret

Control-flow graph (figure): Entry → 1, 1 → 2, 2 → 3, 2 → 8, 3 → 4, 3 → 5, 4 → 6, 5 → 6, 6 → 3, 6 → 7, 7 → 2, 8 → 9, 9 → Exit
Slide 4
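The interval steps in the process above can be sketched as follows. This is a minimal Python sketch of Allen-Cocke interval partitioning and the derived sequence of graphs, assuming every node is reachable from the entry; intervals, derived_graph and find_loops are hypothetical names, and the graph is the example's CFG written with the IL offset of each block's first instruction as its node name. A back edge into an interval header at level 1 marks an innermost loop; one found at a higher level, where each node stands for a whole collapsed interval, marks an enclosing loop, which is how nesting of intervals gives nesting of loops.

```python
from collections import deque

# ControlExample's CFG; each node is named by its block's first IL offset.
CFG = {
    "IL_0000": ["IL_0023"],
    "IL_0006": ["IL_0012", "IL_000c"],
    "IL_000c": ["IL_0016"],
    "IL_0012": ["IL_0016"],
    "IL_0016": ["IL_0006", "IL_001f"],
    "IL_001f": ["IL_0023"],
    "IL_0023": ["IL_0006", "IL_0027"],
    "IL_0027": ["IL_002b"],
    "IL_002b": [],
}


def intervals(succs, entry):
    """Partition a CFG into intervals; each interval is a list led by its header.
    Assumes every node is reachable from entry."""
    preds = {n: [] for n in succs}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)
    queue, headers, assigned, result = deque([entry]), {entry}, set(), []
    while queue:
        h = queue.popleft()
        members, changed = [h], True
        while changed:  # absorb nodes whose predecessors all lie in this interval
            changed = False
            for n in succs:
                if (n not in members and n not in headers and n not in assigned
                        and preds[n] and all(p in members for p in preds[n])):
                    members.append(n)
                    changed = True
        assigned.update(members)
        result.append(members)
        for n in succs:  # nodes entered from this interval become new headers
            if (n not in assigned and n not in headers
                    and any(p in members for p in preds[n])):
                headers.add(n)
                queue.append(n)
    return result


def derived_graph(succs, ivs):
    """Collapse each interval to its header; keep edges between different intervals."""
    header_of = {n: members[0] for members in ivs for n in members}
    derived = {members[0]: [] for members in ivs}
    for members in ivs:
        for n in members:
            for s in succs[n]:
                t = header_of[s]
                if t != members[0] and t not in derived[members[0]]:
                    derived[members[0]].append(t)
    return derived


def find_loops(succs, entry):
    """Walk the derived sequence; report (level, header, latch) for each back
    edge that stays inside one interval."""
    loops, graph, level = [], succs, 1
    while True:
        ivs = intervals(graph, entry)
        for members in ivs:
            header = members[0]
            loops.extend((level, header, n) for n in members if header in graph[n])
        if len(ivs) == len(graph):  # one node left, or irreducible: stop
            return loops
        graph, level = derived_graph(graph, ivs), level + 1
```

For the example, the first-level intervals are headed by IL_0000, IL_0023 and IL_0006. find_loops reports the do-while loop (header IL_0006, latch IL_0016) at level 1 and, at level 2, the enclosing loop headed by IL_0023 and closed by the collapsed interval headed by IL_0006; the decompiler emits that outer loop as while (local1 < x).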
Current status
• Features implemented:
  • Analysis for basic conditional and looping structures
  • Control-flow graph generation
  • C# code generation
  • Almost half the CIL instruction set
  • Decompiles very basic applications
• Remaining tasks (lots!):
  • Local variable names
  • Basic language features (arrays, switching, breaks etc.)
  • Advanced features (custom indexers, operator overloading, properties)
  • Object-oriented features
• Extensions:
  • Decompilation for other stack-based architectures (e.g. Java)
  • Code generation for other languages (e.g. VB .NET)
  • Graphical user interface

Original:
    public static int ControlExample(int x)
    {
        int y = 0;
        for(int i = 0; i < x; i++)
        {
            do
            {
                if(3 * i < x)
                    y--;
                else
                    y++;
            } while(Math.Abs(y) < i);
        }
        return y;
    }

Decompiled:
    public static Int32 ControlExample(Int32 x)
    {
        Int32 local0;
        Int32 local1;
        Int32 local2;
        local0 = 0;
        local1 = 0;
        while (local1 < x)
        {
            do
            {
                if (((3 * local1) < x))
                {
                    local0 = (local0 - 1);
                }
                else
                {
                    local0 = (local0 + 1);
                }
            } while (Math.Abs(local0) < local1);
            local1 = (local1 + 1);
        }
        local2 = local0;
        return local2;
    }
Slide 5
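Statements in the decompiled output such as local0 = (local0 - 1) have to be rebuilt from CIL's stack operations. One common way to do this, shown here as an assumption rather than the project's method, is to execute each basic block symbolically: loads push expression strings, arithmetic pops two and pushes a parenthesised expression, stores emit an assignment, and a conditional branch pops its operands into a condition. The Python below is a minimal sketch; translate_block, slot and the (opcode, operand) block format are hypothetical, only the opcodes in the example are handled, and calls are assumed to be static, non-void methods written as "Type::Name(argtypes)".

```python
# Only the opcodes needed for the ControlExample blocks are handled; anything else raises.
BINARY_OPS = {"add": "+", "sub": "-", "mul": "*"}
COMPARE_BRANCHES = {"beq": "==", "bge": ">=", "bgt": ">", "ble": "<=", "blt": "<"}


def slot(op):
    """Return N for short forms such as 'ldloc.2' or 'ldc.i4.3', else None."""
    suffix = op.rsplit(".", 1)[-1]
    return int(suffix) if suffix.isdigit() else None


def translate_block(block, param_names):
    """Symbolically execute one basic block of (opcode, operand) pairs.

    Returns (statements, branch): C#-style assignment strings, plus None or
    (condition, target) for the branch that ends the block.
    """
    stack, statements, branch = [], [], None
    for op, operand in block:
        n = slot(op)
        if op.startswith("ldc.i4.") and n is not None:
            stack.append(str(n))
        elif op.startswith("ldloc.") and n is not None:
            stack.append(f"local{n}")
        elif op.startswith("ldarg.") and n is not None:
            stack.append(param_names[n])  # parameter names are kept in metadata
        elif op.startswith("stloc.") and n is not None:
            statements.append(f"local{n} = {stack.pop()};")
        elif op in BINARY_OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {BINARY_OPS[op]} {right})")
        elif op == "call":
            # Arity is read from the signature string; assumes a static, non-void method.
            params = operand[operand.index("(") + 1 : operand.rindex(")")]
            arity = len(params.split(",")) if params else 0
            args = stack[len(stack) - arity :]
            del stack[len(stack) - arity :]
            name = operand[: operand.index("(")].replace("::", ".")
            stack.append(f"{name}({', '.join(args)})")
        elif op.removesuffix(".s") in COMPARE_BRANCHES:
            right, left = stack.pop(), stack.pop()
            branch = (f"{left} {COMPARE_BRANCHES[op.removesuffix('.s')]} {right}", operand)
        elif op in ("br", "br.s"):
            branch = ("true", operand)
        else:
            raise NotImplementedError(op)
    return statements, branch
```

Given the block at IL_000c and param_names ["x"], the sketch returns the statement local0 = (local0 - 1); and an unconditional branch to IL_0016. For the block at IL_0006 it returns the condition (3 * local1) >= x. The decompiled if tests the opposite, ((3 * local1) < x), because the fall-through path (y--) becomes the then branch, so a later structuring pass has to negate the branch condition.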