Software Security Through Code Obfuscation

Outline
• Introduction
• Definition
• Problem Statement
• Code Obfuscation Process
• Transformations
• Metrics for Obfuscation Transformations
• Classification of Transforms
• De-Obfuscation
• Commonly Employed Techniques
• The Power of Obfuscation

Why Code Obfuscation?
Intellectual Protection
• Legal Protection
• Technical Protection
  • Encryption
  • Obfuscation
  • Server-side Execution
  • Trusted Native Code

Justification
If Bob is able to retrieve Alice’s original source, he can obtain proprietary information such as data structures, algorithms, etc.
[Figure: Alice compiles her source to object code, obfuscates it, and the obfuscated object code is sent from the server to the client for execution; Bob de-obfuscates the obfuscated object code and de-compiles it back to source.]

Code Obfuscation Process
Determining Potency vs. Cost:
• Potency: The level of obfuscation applied to the code.
• Cost: The maximum execution time/space overhead that the obfuscated code adds to the application.
To decide which level of obfuscation we want, we must first determine how much program efficiency we are willing to give up; hence the relation: Potency vs. Cost.

Source Pre-Processing
Much like a compiler, this step gathers information about the application in order to determine which transformations will lead to the desired level of obfuscation.
Types of Information Gathered:
• Symbol Table
• Data-Flow
• Data-Dependence
• Language Constructs
• Programming Idioms

Transformations
The source goes through a number of pre-defined Obfuscating Transformations until the desired relation of potency vs. cost is reached.
Definition of an Obfuscating Transformation:
Let P → P’ be a transformation of a source program P into a target program P’. P → P’ is an obfuscating transformation if P and P’ have the same observable behavior. More precisely, in order for P → P’ to be a legal transformation the following must hold:
• If P fails to terminate or terminates with an error, then P’ may or may not terminate.
• Otherwise, P’ must terminate and produce the same output as P.
We classify an obfuscating transformation according to the type of information it targets and its level of potency.

Evaluation of Obfuscating Transforms (3 Metrics)
• Measure of Potency
• Measure of Resilience
• Measure of Execution Cost
Formal definition of the quality of an obfuscating transform:
Tqual(P) = [Tpot(P), Tres(P), Tcost(P)]

Measure of Potency
Let T be a behavior-conserving transformation such that T: P → P’ transforms a source program P into a target program P’. Let E(P) be the complexity of P, as defined by known software complexity metrics.
Tpot(P) is defined as E(P’)/E(P) - 1. T is a potent obfuscating transformation if Tpot(P) > 0.
From here on, we rate the potency of a transform on the scale <low, medium, high>.
In order for a transform to be sufficiently potent, it should:
• Increase overall program size and introduce new classes/methods
• Introduce new predicates and increase the nesting level of conditional/looping constructs
• Increase the number of method arguments and inter-class instance variable dependencies
• Increase the height of the inheritance tree
• Increase long-range variable dependencies
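As a rough illustration of the potency formula, the sketch below plugs in a deliberately crude complexity metric E: one plus the number of branching keywords in the source text. The metric and the helper names `complexity` and `potency` are assumptions made for this example; the slides only require that E be some known software complexity metric.

```python
import re

def complexity(source: str) -> int:
    """Crude stand-in for E(P): one plus the number of branching keywords."""
    return 1 + len(re.findall(r"\b(?:if|while|for)\b", source))

def potency(original: str, obfuscated: str) -> float:
    """Tpot(P) = E(P') / E(P) - 1."""
    return complexity(obfuscated) / complexity(original) - 1

original = "x = a + b;"
# Hypothetical obfuscated version: the same assignment guarded by a predicate.
obfuscated = "if (a * (a + 1) % 2 == 0) { x = a + b; } else { x = a - b; }"

print(potency(original, obfuscated))  # 1.0, so the transform counts as potent
```

Under this metric a transform that adds no predicates has potency 0 and would not count as potent, which is why a real obfuscator would combine several metrics from the list above.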

Measure of Resilience
Resilience (according to Merriam-Webster):
1: the capability of a strained body to recover its size and shape after deformation caused especially by compressive stress
2: an ability to recover from or adjust easily to misfortune or change
A transform is potent if it manages to confuse a human reader, but it is resilient if it confuses an automatic de-obfuscator.
We base resilience primarily on the scope of a transform's effect. That is, a transform that affects an entire program is more likely to give us a resilient program than one with only a local effect.
Resilience is measured on a scale from trivial to one-way, where one-way denotes a transformation that produces code P’ from which it is impossible to recover P.

Measure of Execution Cost
The third component in describing the quality of a transformation is cost: the execution time/space penalty that the transformation imposes on the obfuscated application.
Cost is measured on a four-point scale:
• Dear: if executing P’ requires exponentially more resources than P
• Costly: if executing P’ requires O(n^p), p > 1, more resources than P
• Cheap: if executing P’ requires O(n) more resources than P
• Free: if executing P’ requires O(1) more resources than P

Classification of Transformations: Layout Transformations
Trivial but irreversible transformations.
Example: Formatting Removal
Tqual(P) = [low, one-way, free]
Removes source code formatting such as indentation and line breaks. This is a free yet irreversible transformation.
Code:
voltage = current * resistance;
power = (voltage * voltage) * resistance;
→
voltage=current*resistance;power=(voltage*voltage)*resistance;

Classification of Transformations: Layout Transformations
Example: Scrambling Identifier Names
Tqual(P) = [medium, one-way, free]
Removes the pragmatic information inherent in identifier names, thus providing a higher level of potency; once applied, it cannot be undone.
Code:
voltage=current*resistance;power=(voltage*voltage)*resistance;
→
v4=i12*r15; p6=(v4*v4)*r15;
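The two layout transformations can be sketched in a few lines of Python operating on a C-like source string. This is a toy under stated assumptions, not a real obfuscator: `scramble_identifiers` and `strip_formatting` are hypothetical helpers, the regular expression would also rename keywords and the contents of string literals, and removing all whitespace is only safe for snippets like this one that contain no keywords or strings.

```python
import re

def scramble_identifiers(source: str) -> str:
    """Replace each distinct identifier with a meaningless name (v0, v1, ...)."""
    mapping = {}

    def rename(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    return re.sub(r"\b[A-Za-z_]\w*\b", rename, source)

def strip_formatting(source: str) -> str:
    """Remove all whitespace (safe only for keyword-free snippets)."""
    return re.sub(r"\s+", "", source)

code = "voltage = current * resistance;\npower = (voltage * voltage) * resistance;"
scrambled = scramble_identifiers(code)
print(strip_formatting(scrambled))  # v0=v1*v2;v3=(v0*v0)*v2;
```

Because the name mapping is discarded after the call, nothing in the output allows the original names to be recovered, which is what makes the transformation one-way.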

Classification of Transformations: Control Transformations
The purpose is to obscure the control flow of the source application.
• Control Aggregation Transformations break up computations that logically belong together or merge computations that do not.
• Control Ordering Transformations randomize the order in which computations are carried out.
• Control Computation Transformations insert new redundant or dead code, or make algorithmic changes.
Transformations which alter the flow of control have the largest computational overhead.

Opaque Predicates
The real challenge in designing control-altering transformations is to make them both cheap and resistant to de-obfuscation. To accomplish this, many transformations are based upon opaque variables and opaque predicates.
A variable V is opaque if it has some property q which is known a priori to the obfuscator, but is difficult for a de-obfuscator to deduce.
Likewise, a predicate P (a boolean expression) is opaque if its outcome is known to the obfuscator, but a de-obfuscator can deduce it only with great difficulty.
Creating opaque variables and predicates that are difficult for a de-obfuscator to crack yet use few resources is a major area of research within code obfuscation, and is the key to highly resilient control transformations.
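A minimal sketch of an opaque predicate, assuming a simple number-theoretic identity is acceptable for illustration: the product of two consecutive integers is always even. The function name `opaquely_true` is hypothetical, and a predicate this well known would be easy for a pattern-matching de-obfuscator to spot, so it shows the idea rather than a resilient construct.

```python
def opaquely_true(x: int) -> bool:
    """Always True for integers: x * (x + 1) is a product of consecutive integers."""
    return (x * (x + 1)) % 2 == 0

# The obfuscator knows the outcome in advance; a reader of the code must prove it.
print(all(opaquely_true(x) for x in range(-1000, 1000)))  # True
```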

Aggregation Transformations
Example of an applied Control Aggregation Transformation: Cloned Methods
A reverse engineer, when trying to understand the purpose of a subroutine, will often examine its signature and body as well as the different environments in which it is called.
To counter this, we apply a transform which obscures a method’s call sites, making it appear that different routines are being called.
We create several different versions of a method by applying various transformations to the original code. At runtime we use different predicates to select which version to run.
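The cloned-method idea can be sketched as follows. The functions `area_v1`, `area_v2`, and `area` are hypothetical names for this example, and the selecting predicate (parity of the argument sum) is an arbitrary choice: since both clones compute the same result, either outcome of the predicate is correct.

```python
def area_v1(width: int, height: int) -> int:
    """Original version of the method."""
    return width * height

def area_v2(width: int, height: int) -> int:
    """Clone computing the same product by repeated addition."""
    total = 0
    for _ in range(abs(height)):
        total += width
    return total if height >= 0 else -total

def area(width: int, height: int) -> int:
    # Runtime predicate selects a clone; call sites appear to reach different routines.
    if (width + height) % 2 == 0:
        return area_v1(width, height)
    return area_v2(width, height)

print(area(3, 4), area(3, 5))  # 12 15
```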

Aggregation Transformations
In object-oriented languages such as Java, control is organized around data structures rather than the reverse. Therefore, the most important part of reverse engineering programs in such languages is recovering their data structures. Aggregation Transforms alter how data is grouped in arrays and objects.
Example: Restructuring Arrays
Here a number of transformations are performed to obscure an array.
• First, we split an array into several sub-arrays [statements (1-2)].
• We then merge two arrays into one array [statements (3-5)].
• Folding an array increases its number of dimensions [statements (6-7)].
• Finally, flattening an array reduces its number of dimensions [statements (8-9)].
Splitting and folding greatly increase the complexity of our array structures, while merging and flattening decrease it. The purpose is to introduce structure into a program where little existed before, and to remove structure where it once existed. As a result, the obscurity of the program is greatly increased.
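The four array restructurings can be sketched with plain Python lists. The helper names and the particular layouts (even/odd split, interleaved merge, row-major fold) are assumptions chosen for this example; other index mappings would serve equally well.

```python
def split(a):
    """Split one array into two: even-indexed and odd-indexed elements."""
    return a[0::2], a[1::2]

def split_get(evens, odds, i):
    """Read logical element i from the split representation."""
    return evens[i // 2] if i % 2 == 0 else odds[i // 2]

def merge(b, c):
    """Merge two equal-length arrays into one by interleaving their elements."""
    merged = []
    for x, y in zip(b, c):
        merged.extend((x, y))
    return merged

def fold(a, cols):
    """Fold a 1-D array into rows of `cols` elements (length must divide evenly)."""
    return [a[r:r + cols] for r in range(0, len(a), cols)]

def flatten(m):
    """Flatten a 2-D array back into one dimension."""
    return [x for row in m for x in row]

a = list(range(10))
evens, odds = split(a)
print(split_get(evens, odds, 7))  # 7
print(fold(a, 5)[7 // 5][7 % 5])  # 7
```

In each case the program logic must route every access through the new index mapping, which is what hides the original structure from a reader.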

Loop Transformations
(a) Loop Blocking is applied to the given loop. Loop Blocking aims to improve the cache behavior of a loop by breaking up the iteration space so that the inner loop fits into the cache.
(b) Loop Unrolling replicates the body of the loop one or more times. If the loop bounds are known at compile time, the loop can be unrolled in its entirety.
(c) Loop Fission turns a loop with a compound body into several loops over the same iteration space.
All three types of Loop Transformations increase the source application's total size and number of conditions, while the first transformation also introduces extra nesting. Applied in isolation, these methods provide little resilience. When several are applied in combination, however, the resilience of the total transformation increases dramatically, requiring significant analysis by a de-obfuscator.
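The three loop transformations can be sketched on one small loop. All function names are hypothetical, the loop body is an arbitrary stand-in, and the blocked version is a one-dimensional simplification (blocking is normally applied to nested loops over multi-dimensional data).

```python
def original(a):
    n = len(a)
    b, c = [0] * n, [0] * n
    for i in range(n):
        b[i], c[i] = a[i] * 2, a[i] + 1
    return b, c

def blocked(a, block=4):
    """(a) Blocking: iterate block by block, adding one level of nesting."""
    n = len(a)
    b, c = [0] * n, [0] * n
    for start in range(0, n, block):
        for i in range(start, min(start + block, n)):
            b[i], c[i] = a[i] * 2, a[i] + 1
    return b, c

def unrolled(a):
    """(b) Unrolling: the body is replicated twice per iteration."""
    n = len(a)
    b, c = [0] * n, [0] * n
    i = 0
    while i + 1 < n:
        b[i], c[i] = a[i] * 2, a[i] + 1
        b[i + 1], c[i + 1] = a[i + 1] * 2, a[i + 1] + 1
        i += 2
    if i < n:  # leftover iteration when n is odd
        b[i], c[i] = a[i] * 2, a[i] + 1
    return b, c

def fissioned(a):
    """(c) Fission: one compound loop becomes two loops over the same range."""
    n = len(a)
    b, c = [0] * n, [0] * n
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n):
        c[i] = a[i] + 1
    return b, c

data = list(range(11))
print(blocked(data) == unrolled(data) == fissioned(data) == original(data))  # True
```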

Computation Transformations
Example of an applied Control Computation Transformation (Inserting Dead or Irrelevant Code):
A) We insert an opaque predicate Pt into S (= S1…Sn), essentially splitting it in two. The predicate is irrelevant because it always evaluates to True. A trivial predicate of this form would be an if-statement such as:
if (1 < 5) <evaluate left>; else <evaluate right>;
B) We again break S into two halves, and create two different obfuscated versions Sa and Sb of the second half by applying different computational transforms to it. As a result, it is not directly obvious to a reverse engineer that Sa and Sb perform the same function. We use a predicate P? to select between the two at runtime.

C) Finally, we proceed as in (B), but we introduce a bug into Sb and make sure that the predicate Pt always selects Sa. Thus, a de-obfuscation that keeps Sb would lead to incorrect, non-functioning source code.
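Variant (C) can be sketched as follows. The predicate used here relies on the fact that 7y² - 1 is never a perfect square for integers x and y; the function names, the choice of computing a mean, and the default values of `x` and `y` are all assumptions made for this example.

```python
def opaque_true(x: int, y: int) -> bool:
    """Pt: 7*y*y - 1 is never a perfect square, so this always holds for integers."""
    return 7 * y * y - 1 != x * x

def mean_original(values):
    total = sum(values)          # first half of S
    return total / len(values)   # second half of S

def mean_obfuscated(values, x=3, y=5):
    total = sum(values)          # first half of S, unchanged
    if opaque_true(x, y):
        # Sa: the correct second half, always selected
        return total / len(values)
    else:
        # Sb: deliberately buggy clone, never executed
        return total / (len(values) + 1)

print(mean_obfuscated([2, 4, 6]))  # 4.0
```

A de-obfuscator that cannot evaluate the predicate and guesses the wrong branch ends up with the buggy Sb.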

Data Transformations
These aim to obscure the data structures used in the source application. They are the most important transformations for keeping proprietary structures hidden from a reverse engineer.
• Storage Transformations: Attempt to choose an unnatural storage class for dynamic as well as static data, thus making it difficult for a de-obfuscator to determine the type of data stored.
• Encoding Transformations: Attempt to choose an unnatural encoding for common data types.

Data Transformations
• Example: Change Encoding
Here we encode a simple variable i by transforming it into i’ = c1 * i + c2, where c1 and c2 are constants. In the example, we choose c1 to be a power of 2 for efficiency, and let c1 = 8, c2 = 3. This transformation adds a small amount of execution time, while obscuring the original purpose of i.
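The encoding i’ = 8 * i + 3 can be sketched on a simple summation loop. The functions `sum_original` and `sum_encoded` are hypothetical examples; only the constants c1 = 8 and c2 = 3 come from the slide.

```python
C1, C2 = 8, 3  # constants from the slide: i' = C1 * i + C2

def encode(i: int) -> int:
    return C1 * i + C2

def decode(i_enc: int) -> int:
    return (i_enc - C2) // C1

def sum_original(a):
    total = 0
    i = 0
    while i < len(a):
        total += a[i]
        i += 1
    return total

def sum_encoded(a):
    total = 0
    i_enc = encode(0)           # i = 0 becomes i' = 3
    limit = encode(len(a))      # the loop bound is encoded as well
    while i_enc < limit:
        total += a[decode(i_enc)]
        i_enc += C1             # i += 1 becomes i' += c1
    return total

print(sum_encoded([5, 6, 7]))  # 18
```

Choosing c1 as a power of two lets a compiled implementation decode with a shift instead of a division, which keeps the added cost small.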

Ordering Transformations
• Randomize the order in which data structures are declared in a source application. In particular, we aim to randomize the order of methods and instance variables within classes, and of formal parameters within methods.
• Example: Opaque Encoding Function

De-obfuscation Techniques
• Identifying Opaque Constructs
• Identifying and evaluating opaque constructs is the most difficult part of de-obfuscation. Opaque constructs fall under three main categories:
  • Local:
  • Global:
  • Inter-procedural:

De-obfuscation Techniques
• Identification by Pattern Matching
• Uses knowledge of the strategies employed by obfuscators to identify opaque predicates. This knowledge can be gathered by de-compiling and analyzing popular obfuscation programs. To prevent this attack, avoid using canned opaque constructs. Also, choose constructs that are syntactically similar to those used in the real application.

De-obfuscation Techniques
• Identification by Program Slicing
• Used by a reverse engineer to counter the problem that logically related pieces of code have been broken up and dispersed over the program. Also used to separate “live” code from “dead” code.
• Countering this technique requires adding parameter aliases and variable dependencies to increase the slice size, thus making de-obfuscation a more computationally intensive process.

Statistical Analysis
• Used to analyze the outcome of all predicates in an obfuscated system. Any predicate that always evaluates to true over multiple test runs is flagged, as it may turn out to be an opaque predicate.
• A powerful way to prevent this attack is to design opaque predicates so that several predicates have to be cracked at the same time in order to retrieve information.

Statistical Analysis
• Example: Protecting Against Statistical Analysis
• Here we aim to thwart statistical analysis by giving our opaque predicates side effects. In the example, the obfuscator has determined that S1 and S2 must always execute the same number of times. The statements are then obfuscated using opaque predicates that call functions Q1 and Q2, which increment and decrement a global variable k, respectively. Now, if a de-obfuscator tries to replace one of the predicates with True, k will overflow. Thus, the de-obfuscated program will always terminate with an error.
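The side-effect defense can be sketched as follows. Python integers do not overflow, so this sketch simulates overflow with a small bound `LIMIT`; that bound, the class `Guard`, and both runner functions are assumptions for illustration, and the error appears only once the loop has run more than `LIMIT` iterations.

```python
LIMIT = 3  # assumed bound standing in for the range of a machine integer

class Guard:
    """Holds the shared counter k that the opaque predicates modify."""
    def __init__(self):
        self.k = 0

    def _check(self):
        if abs(self.k) > LIMIT:
            raise OverflowError("k left its valid range")

    def q1(self) -> bool:
        self.k += 1      # side effect: increment k
        self._check()
        return True      # opaquely true

    def q2(self) -> bool:
        self.k -= 1      # side effect: decrement k
        self._check()
        return True      # opaquely true

def run_obfuscated(n):
    g, trace = Guard(), []
    for i in range(n):
        if g.q1():
            trace.append(("S1", i))
        if g.q2():
            trace.append(("S2", i))
    return trace

def run_naively_deobfuscated(n):
    g, trace = Guard(), []
    for i in range(n):
        if True:         # attacker replaced Q1 with True, losing its side effect
            trace.append(("S1", i))
        if g.q2():       # k now only decreases and eventually leaves its range
            trace.append(("S2", i))
    return trace

print(len(run_obfuscated(10)))  # 20
```

Calling `run_naively_deobfuscated(10)` raises `OverflowError`, because the increments that balanced the decrements were removed together with the predicate.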

The Power of Obfuscation
In reality, an obfuscated program consists of two programs merged into one: a real program which performs a useful task and a bogus program which computes useless information. The sole purpose of the bogus program is to confuse reverse engineers by hiding the real program behind irrelevant code.
Encryption vs. Obfuscation:
• Both are attempts at hiding data from “prying” eyes.
• Both have a shelf life lasting until it is possible to “crack” the given protection.
Future Areas of Research:
• New obfuscating transformations
• Interaction and ordering between different transformations (optimization)
• Relationship between potency and cost (which has the most “bang-for-the-buck”)
Other Uses of Obfuscation:
• Tracing of Software Piracy: A differently obfuscated version of the same code would be sold to each customer, making it easy to identify which customer distributed the application to others.
• Mobile Agent Security: Enforcing “black box” security techniques on untrusted hosts.
