How to prevent your jar packages from being decompiled?

Posted by 程式設計師泥瓦匠 on 2023-02-14

Java compiles to bytecode that runs on a virtual machine, and that bytecode keeps a high level of abstraction, so it is easy to decompile. Naturally, there are also measures to prevent decompilation. Today I read an article on the subject and learned a lot from it. I am interested in Java decompilation because I often need to learn from other people's work while studying (you know...). Decompiling other people's code is perhaps not very ethical, but well...
Without further ado, the text is as follows.

Common protection techniques

Because Java bytecode keeps a high level of abstraction, it is relatively easy to decompile. This section describes several common methods for protecting Java bytecode from decompilation. None of these methods can absolutely prevent a program from being decompiled; they only make decompilation more difficult, and each has its own applicable environment and weaknesses.

01. Isolating Java programs   

The simplest way is to make the Java class files inaccessible to the user. This is the most fundamental method, and there are various ways to implement it. For example, developers can keep critical Java classes on the server side; the client obtains services from the server through an interface instead of accessing the class files directly. An attacker then has no class files to decompile. More and more standards and protocols provide services through such interfaces, for example HTTP, Web Services, and RPC. However, many applications are not suited to this kind of protection: a stand-alone program, for example, cannot be isolated this way. This type of protection is shown in Figure 1.
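The isolation idea can be sketched with the JDK's built-in HTTP server. Everything here is an assumption made for illustration: the `/validate` endpoint, the `serial` query parameter, and the placeholder rule inside `isValidSerial` do not come from the original article. The point is only that the sensitive check runs on the server, while the client ships nothing but a URL.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class IsolationSketch {
    /** Server side only: this logic is never distributed to clients. Placeholder rule. */
    static boolean isValidSerial(String serial) {
        return serial.startsWith("SCJP-") && serial.length() == 9;
    }

    /** Starts a local server on a free port that exposes the check over HTTP. */
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/validate", exchange -> {
            String query = exchange.getRequestURI().getQuery(); // e.g. serial=SCJP-1234
            String serial = (query != null && query.startsWith("serial="))
                    ? query.substring("serial=".length()) : "";
            byte[] body = String.valueOf(isValidSerial(serial)).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Client side: knows the interface (a URL), not the implementation. */
    static boolean validateRemotely(int port, String serial) throws IOException {
        URI uri = URI.create("http://127.0.0.1:" + port + "/validate?serial=" + serial);
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            return Boolean.parseBoolean(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            System.out.println(validateRemotely(port, "SCJP-1234"));
        } finally {
            server.stop(0);
        }
    }
}
```

In a real deployment the server would run on a separate machine and the connection would use HTTPS; both run in one process here only to keep the sketch self-contained.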

02. Encryption of Class files   

To prevent class files from being decompiled directly, many developers encrypt a few key class files, such as the classes that handle registration codes or serial number management. Before these encrypted classes can be used, the program must first decrypt them and then load them into the JVM. The decryption can be done either in hardware or in software.

In practice, developers usually load the encrypted classes through a custom ClassLoader (note that Applets cannot use custom ClassLoaders, for security reasons). The custom ClassLoader first locates the encrypted class, then decrypts it, and finally loads the decrypted class into the JVM. The custom ClassLoader is therefore the critical class in this protection method. Because it is not encrypted itself, it is likely to be an attacker's first target. If the decryption key and algorithm are compromised, the encrypted classes can be decrypted easily as well. See Figure 2 for a schematic diagram of this type of protection.
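The find, decrypt, load sequence described above might look roughly like the sketch below. The class name `DecryptingClassLoader`, the in-memory map of encrypted bytes, and the XOR routine are all assumptions for illustration; XOR is only a stand-in for a real cipher and offers no actual protection.

```java
import java.util.Map;

/** Hypothetical loader that decrypts class bytes before defining them in the JVM. */
final class DecryptingClassLoader extends ClassLoader {
    private final Map<String, byte[]> encryptedClasses; // class name -> encrypted class file
    private final byte[] key;

    DecryptingClassLoader(ClassLoader parent, Map<String, byte[]> encryptedClasses, byte[] key) {
        super(parent);
        this.encryptedClasses = encryptedClasses;
        this.key = key.clone();
    }

    /** Toy XOR "cipher" (assumption); applying it twice restores the input. */
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    /** Called by loadClass only after the parent loader has failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] encrypted = encryptedClasses.get(name);   // 1. find the encrypted class
        if (encrypted == null) {
            throw new ClassNotFoundException(name);
        }
        byte[] plain = xor(encrypted, key);              // 2. decrypt it
        return defineClass(name, plain, 0, plain.length); // 3. load it into the JVM
    }
}
```

A caller would use `new DecryptingClassLoader(parent, store, key).loadClass("some.ClassName")`. Note that the key and the decryption routine sit in plain bytecode inside the loader, which is exactly the weakness the paragraph above points out.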


03. Converting to native code

Converting a program to native code is also an effective way to prevent decompilation, because native code is usually much harder to decompile. Developers can convert the entire application to native code or convert only the key modules. If only the critical modules are converted, the Java program has to call them through JNI.

Of course, using this technique sacrifices Java's cross-platform nature. A separate version of the native code has to be maintained for each platform, which increases the support and maintenance workload. For some critical modules, however, this solution is often necessary.

To ensure that the native code has not been modified or replaced, it usually needs to be digitally signed. Before the native code is used, its signature is verified to confirm that it has not been altered by an attacker. If the signature check passes, the relevant JNI method is invoked. See Figure 3 for a schematic of this protection approach.
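A minimal sketch of the verify-then-load step, assuming the vendor signs the native library with an RSA key and ships a detached signature file next to it. The class name `NativeLibraryVerifier`, the detached-signature layout, and the choice of `SHA256withRSA` are assumptions; the article does not say which scheme was used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;

final class NativeLibraryVerifier {
    /** Returns true only if signatureBytes is a valid signature of libraryBytes under vendorKey. */
    static boolean isAuthentic(byte[] libraryBytes, byte[] signatureBytes, PublicKey vendorKey)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(vendorKey);
        verifier.update(libraryBytes);
        return verifier.verify(signatureBytes);
    }

    /** Verifies the library file and only then hands it to the JVM for JNI use. */
    static void loadVerified(Path library, Path signatureFile, PublicKey vendorKey)
            throws IOException, GeneralSecurityException {
        byte[] lib = Files.readAllBytes(library);
        byte[] sig = Files.readAllBytes(signatureFile);
        if (!isAuthentic(lib, sig, vendorKey)) {
            throw new SecurityException("Native library failed signature check: " + library);
        }
        System.load(library.toAbsolutePath().toString()); // JNI methods become callable after this
    }
}
```

This sketch leaves a gap between checking the file and loading it, during which the file could be swapped; a hardened version would need to close that window, and the Java-side verifier itself is still ordinary bytecode that can be patched.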

04. Code obfuscation

Code obfuscation reorganizes and transforms class files so that the processed code performs the same function (has the same semantics) as the original code. The obfuscated code, however, is hard to decompile: the decompiled output is obscure and very difficult to understand, so it is hard for someone using a decompiler (for example DecompilerTool, https://www.decompilertool.com/) to recover the real semantics of the program. In theory, obfuscated code can still be cracked by an attacker with enough time, and some people are even working on deobfuscation tools. From a practical point of view, however, the variety of obfuscation techniques and the maturity of obfuscation theory mean that obfuscated Java code is still reasonably well protected against decompilation. Because obfuscation is an important technique for protecting Java programs, it is described in detail below. Figure 4 shows a diagram of code obfuscation.

Summary of several techniques

Each of the above techniques has a different application environment and each has its own weaknesses. Table 1 shows a comparison of the relevant features.

Introduction to obfuscation techniques   

Obfuscation is by far the most basic protection method for Java programs, and there are many Java obfuscation tools, including commercial, free, and open source ones; Sun also provides its own obfuscation tools. Most of them obfuscate class files, but a few first process the source code and then process the class files, which strengthens the obfuscation. Some of the more commercially successful obfuscation tools include the 1stBarrier series from JProof, JShrink from Eastridge, and SourceGuard from 4thpass.com. Obfuscation techniques can be classified by their target into symbol obfuscation (Lexical Obfuscation), data obfuscation (Data Obfuscation), control obfuscation (Control Obfuscation), and preventive obfuscation (Preventive Transformation).

05. Symbolic Obfuscation

A class file contains a lot of information that has nothing to do with executing the program, such as method names and variable names, and these symbol names often carry meaning. For example, a method named getKeyLength() most likely returns the length of a key. Symbol obfuscation scrambles this information and turns it into meaningless names, for example by numbering all variables from variable_001 and all methods from method_001. This makes the decompiled code harder to read. For private methods and local variables, the names can usually be changed without affecting the program. But for interface names, public methods, and member variables that external modules refer to, the names often have to be kept; otherwise the external modules cannot find the methods and variables. For this reason, most obfuscation tools offer a rich set of options that let users choose whether and how each symbol is obfuscated.
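As a small before-and-after illustration, the two nested classes below behave identically, but only the first tells a reader what it does. The names `KeyInfo`, `keyBytes`, and the renamed `A`/`a` are made up for this sketch; a real obfuscator applies the renaming to the compiled class file rather than to the source.

```java
final class SymbolObfuscationDemo {
    /** Original: the names reveal the intent. */
    static final class KeyInfo {
        private final byte[] keyBytes;

        KeyInfo(byte[] keyBytes) {
            this.keyBytes = keyBytes;
        }

        int getKeyLength() {
            return keyBytes.length;
        }
    }

    /** After symbol obfuscation: same behavior, meaningless names. */
    static final class A {
        private final byte[] a;

        A(byte[] a) {
            this.a = a;
        }

        int a() { // a field and a method may share a name in Java
            return a.length;
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        System.out.println(new KeyInfo(key).getKeyLength() == new A(key).a());
    }
}
```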

06. Data obfuscation

Data obfuscation obfuscates the data a program uses. The methods fall into two groups: changing how data is stored and encoded (Store and Encode Transform), and changing how data is accessed (Access Transform).

Changing data storage and encoding disguises the way the program stores its data. For example, an array with 10 members can be split into 10 separate variables with scrambled names, or a two-dimensional array can be transformed into a one-dimensional array. Complex data structures can be broken up as well, for example by replacing one complex class with several classes.

The other way is to change how data is accessed. For example, when indexing into an array, an extra calculation can be applied to the index; an example is shown in Figure 5.

In practice, the two approaches are usually combined, scrambling both the data storage and the way the data is accessed. Once the data has been obfuscated, the semantics of the program become harder to follow, which increases the difficulty of decompiling.
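The two transforms can be combined in a few lines. In the sketch below, a logical two-dimensional grid is stored in a one-dimensional array (storage transform), and every access goes through an index calculation that permutes the slots (access transform). The class name `ScrambledGrid` and the stride value are assumptions for illustration; this is not the example from Figure 5.

```java
final class ScrambledGrid {
    private static final int STRIDE = 7; // must share no common factor with rows * cols

    private final int cols;
    private final int[] cells; // the 2D grid, flattened into one array

    ScrambledGrid(int rows, int cols) {
        if (rows <= 0 || cols <= 0 || gcd(STRIDE, rows * cols) != 1) {
            throw new IllegalArgumentException("rows * cols must be positive and coprime with " + STRIDE);
        }
        this.cols = cols;
        this.cells = new int[rows * cols];
    }

    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Maps (row, col) to a scrambled slot; a one-to-one mapping because STRIDE is coprime with the size. */
    private int slot(int row, int col) {
        long linear = (long) row * cols + col;
        return (int) ((linear * STRIDE) % cells.length);
    }

    int get(int row, int col) {
        return cells[slot(row, col)];
    }

    void set(int row, int col, int value) {
        cells[slot(row, col)] = value;
    }
}
```

Callers still write `grid.get(r, c)`, but the decompiled code no longer shows a recognizable two-dimensional layout. Bounds checks on `row` and `col` are omitted to keep the sketch short.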

07. Control Obfuscation

Control obfuscation obfuscates the control flow of a program to make it more difficult to decompile. Changing the control flow usually requires some additional computation and branching, so it can hurt performance, and there is sometimes a trade-off between the performance of the program and the degree of obfuscation. Control obfuscation is the most complex category and contains the largest number of techniques. They can be divided into the following categories.

Adding obfuscation control

By adding extra, complex control flow, the original semantics of the program can be hidden. For example, for two statements A and B that are executed in sequence, we can add a control condition that decides whether B is executed. This makes decompiling more difficult. None of the added controls, however, may change whether B actually executes. Figure 6 gives three ways to add obfuscation control to this example.
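One common way to add such a condition is an opaque predicate: an expression whose value is always the same but is not obvious at a glance. The sketch below is one possible instance, not one of the three variants from Figure 6. It relies on the fact that the product of two consecutive integers is always even, which also holds under Java's wrapping `int` arithmetic.

```java
final class OpaquePredicateDemo {
    /** Original: statement A followed by statement B. */
    static int original(int x) {
        int y = x + 1; // statement A
        y = y * 2;     // statement B
        return y;
    }

    /** Obfuscated: B is guarded by a condition that is always true. */
    static int obfuscated(int x) {
        int y = x + 1;                // statement A
        if ((x * (x + 1)) % 2 == 0) { // opaque predicate: x * (x + 1) is always even
            y = y * 2;                // statement B, executed on every call
        } else {
            y = y - x;                // decoy branch, never executed
        }
        return y;
    }

    public static void main(String[] args) {
        System.out.println(original(20) == obfuscated(20));
    }
}
```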

Control flow reorganization

Reorganizing control flow is also an important obfuscation method. For example, where a program calls a method, the obfuscator can embed (inline) the method's code in the caller. Conversely, a piece of code in a program can be turned into a method call. In addition, a loop can be split into several loops or transformed into a recursive procedure. This method is the most complex one, and it is the subject of a great deal of research.
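A small sketch of two of these rewrites applied together: the helper call is inlined, and the loop is turned into recursion. The method names are invented for illustration, and a real obfuscator performs this on bytecode rather than source.

```java
final class ControlFlowReorgDemo {
    /** Original: a plain loop that calls a helper method. */
    static int sumOfSquares(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    static int square(int v) {
        return v * v;
    }

    /** Reorganized: helper inlined, loop replaced by recursion. */
    static int sumOfSquaresObfuscated(int n) {
        return step(n, 0);
    }

    private static int step(int i, int acc) {
        return i <= 0 ? acc : step(i - 1, acc + i * i); // i * i is the inlined square(i)
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10) == sumOfSquaresObfuscated(10));
    }
}
```

The recursive form uses one stack frame per iteration, which illustrates the performance cost that control obfuscation can carry.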

08. Preventive obfuscation

This type of obfuscation is usually designed for some specialized decompilers. In general, these techniques exploit weaknesses or bugs in decompilers to design obfuscation schemes. For example, some decompilers do not decompile instructions after Return, while some obfuscation schemes place the code exactly after the Return statement. The effectiveness of this obfuscation varies from decompiler to decompiler. A good obfuscation tool will usually use a combination of these obfuscation techniques.

09. Case Studies

In practice, protecting a large Java program often requires using a combination of these methods rather than a single method. This is because each method has its own weaknesses and application context. The combined use of these methods makes the protection of Java programs more effective. In addition, we often need to use other related security techniques, such as secure authentication, digital signatures, PKI, etc.  

The example in this article is a Java application: exam simulation software for the SCJP (Sun Certified Java Programmer) exam. The application ships with a large number of simulation questions, all of which are encrypted and stored in a file. Because the question bank is the core of the software, the classes that read and access it are the most critical ones. Once those classes are decompiled, the entire question bank can be cracked. Now let's consider how to protect the question bank and the related classes.

In this example, we consider using a combination of protection techniques, which include native code and obfuscation techniques. Since the software is mainly distributed on Windows, the conversion to native code requires maintaining only one version of the native code. In addition, obfuscation is also very effective for Java programs and is applicable to such standalone distributed applications.  

In the concrete design, we divide the program into two parts: a module written in native code for question access, and a module developed in Java for everything else. This gives the question management module a higher degree of protection against decompilation. The modules developed in Java still have to be obfuscated. See Figure 7 for a diagram of this scheme.

For the question management module, since the program is mainly used under Windows, the question bank access module is developed in C++ and provides a certain access interface. In order to protect the interface for question bank access, we also added an initialization interface for initialization before each use of the question bank access interface. Its interfaces are divided into two main categories.

10. Initialization interface

Before using the question bank module, the client must call the initialization interface and pass a random number as a parameter. From this random number, the question bank management module and the client each generate the same SessionKey using an agreed algorithm, and this key is used to encrypt all subsequent input and output data. In this way, only an authorized (valid) client can establish a proper connection and generate the correct SessionKey for accessing the question bank. An illegitimate client can hardly generate the correct SessionKey and therefore cannot access the question bank. If a higher level of confidentiality is needed, two-way authentication can also be used.
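The article does not say which algorithm turns the random number into the SessionKey. One plausible sketch, assuming both sides hold a pre-shared secret, is to compute an HMAC of the random number under that secret. The class name, the pre-shared secret, and the choice of HMAC-SHA256 are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SessionKeyDerivation {
    /** Both the client and the question bank module run this with the same inputs. */
    static byte[] deriveSessionKey(byte[] sharedSecret, byte[] clientRandom)
            throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
        return mac.doFinal(clientRandom); // 32-byte session key
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] secret = "pre-shared secret (assumption)".getBytes(StandardCharsets.UTF_8);
        byte[] random = new byte[16];
        new SecureRandom().nextBytes(random); // the random number passed to the initialization interface

        byte[] clientKey = deriveSessionKey(secret, random);
        byte[] moduleKey = deriveSessionKey(secret, random);
        System.out.println(Arrays.equals(clientKey, moduleKey));
    }
}
```

A client that does not know the secret cannot compute the same key even if it observes the random number, which matches the property the paragraph describes.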

11. Data access interface

After authentication is completed, the client can access the question bank data normally. However, all input and output data is encrypted with the SessionKey, so only a client holding the correct SessionKey can use the question bank management module. The sequence diagram in Figure 8 shows the interaction between the question bank management module and the other parts.
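Encrypting the data that crosses the interface with the SessionKey could be sketched as follows. The article does not name a cipher; AES in GCM mode is an assumption here, as are the class name `SessionCipher` and the message layout (a 12-byte IV followed by the ciphertext).

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class SessionCipher {
    private static final int IV_LENGTH = 12; // bytes
    private static final int TAG_BITS = 128;

    /** Encrypts with a fresh random IV; sessionKey must be 16, 24, or 32 bytes long. */
    static byte[] encrypt(byte[] sessionKey, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(sessionKey, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] message = new byte[IV_LENGTH + ciphertext.length];
        System.arraycopy(iv, 0, message, 0, IV_LENGTH);
        System.arraycopy(ciphertext, 0, message, IV_LENGTH, ciphertext.length);
        return message;
    }

    /** Decrypts a message produced by encrypt; throws if the key is wrong or the data was altered. */
    static byte[] decrypt(byte[] sessionKey, byte[] message) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(sessionKey, "AES"),
                new GCMParameterSpec(TAG_BITS, message, 0, IV_LENGTH));
        return cipher.doFinal(message, IV_LENGTH, message.length - IV_LENGTH);
    }
}
```

Because GCM authenticates the data, a caller with the wrong SessionKey gets an exception from `decrypt` instead of garbage, which is one way to realize the rule that only a client with the correct key can use the module.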

Source: WeChat official account 「程式設計師泥瓦匠」
Blog: https://bysocket.com/

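Content covers research and knowledge sharing on Java back-end technology, Spring Boot, Spring Cloud, microservice architecture, operations and development, system monitoring, and related topics.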
