Learn Java Reverse Engineering

From The Bytecode Club Wiki

This page is here to help people get a better understanding of Java Reverse Engineering in the hopes they will learn it.

Introduction

What is Java Bytecode? It's the compiled form of Java source code; I like to think of having the Bytecode as almost having the source code. If you can read and write Bytecode well, you'll be able to mod/patch/whatever just about any Java application, even those obfuscated with advanced obfuscation techniques. Reverse Engineering (RE) is not limited to Bytecode editing; RE is the art of figuring out exactly how a piece of code works and being able to alter part of it. In this tutorial you'll learn how to Reverse Engineer by modifying the class files (Bytecode) directly.

I'll be assuming you have a basic to intermediate understanding of Java programming; if you don't, go learn Java first. If you're planning on doing advanced Bytecode edits, you'll also need an understanding of how the JVM works (the stack, etc.).

Tools

The first thing you need for Java reverse engineering is the tools. I recommend you go and download Bytecode Viewer & JBE.

I'd also recommend getting a code highlighting text editor, like Notepad++ if you're on Windows.

Once you download the tools, I recommend you read the discussion threads for both of them.

Explaining The Classfile

Structure

There are 10 basic sections to the Java Class File structure:

  • Magic Number: 0xCAFEBABE
  • Version of Class File Format: the minor and major versions of the class file
  • Constant Pool: Pool of constants for the class
  • Access Flags: for example whether the class is public, final, abstract, an interface, etc.
  • This Class: The name of the current class
  • Super Class: The name of the super class
  • Interfaces: Any interfaces implemented by the class
  • Fields: Any fields in the class
  • Methods: Any methods in the class
  • Attributes: Any attributes of the class (for example the name of the sourcefile, etc.)

General Layout

Because the class file contains variable-sized items and does not also contain embedded file offsets (or pointers), it is typically parsed sequentially, from the first byte toward the end. At the lowest level the file format is described in terms of a few fundamental data types:

  • u1: an unsigned 8-bit integer
  • u2: an unsigned 16-bit integer in big-endian byte order
  • u4: an unsigned 32-bit integer in big-endian byte order
  • table: an array of variable-length items of some type. The number of items in the table is identified by a preceding count number, but the size in bytes of the table can only be determined by examining each of its items.

Some of these fundamental types are then re-interpreted as higher-level values (such as strings or floating-point numbers), depending on context. There is no enforcement of word alignment, and so no padding bytes are ever used.
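To make the sequential layout a bit more concrete, here's a minimal sketch that reads only the first few fixed-size fields of a class file (magic number, minor version, major version, and the constant pool count). The class name ClassFileHeader and its parse method are made up for this example and aren't part of any tool mentioned on this page. It leans on java.io.DataInputStream, which already reads big-endian values, so readInt covers a u4 and readUnsignedShort covers a u2.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration: reads the fixed-size fields at the start of a class file.
final class ClassFileHeader {
    final int minor;             // u2 minor_version
    final int major;             // u2 major_version
    final int constantPoolCount; // u2 constant_pool_count

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Parses the header sequentially, exactly in the order the fields appear in the file.
    static ClassFileHeader parse(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            int magic = in.readInt();               // u4, big-endian
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file: bad magic number");
            }
            int minor = in.readUnsignedShort();     // u2
            int major = in.readUnsignedShort();     // u2
            int cpCount = in.readUnsignedShort();   // u2
            return new ClassFileHeader(minor, major, cpCount);
        } catch (IOException e) {
            // Also covers truncated input (EOFException).
            throw new UncheckedIOException(e);
        }
    }
}
```

You'd call it with the raw bytes of a class file, for example ClassFileHeader.parse(Files.readAllBytes(Paths.get("Main.class"))), where Main.class is just a placeholder file name. A major version of 52 corresponds to Java 8.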

The Constant Pool

The constant pool table is where most of the literal constant values are stored. This includes values such as numbers of all sorts, strings, identifier names, references to classes and methods, and type descriptors. All indexes, or references, to specific constants in the constant pool table are given by 16-bit (type u2) numbers, where index value 1 refers to the first constant in the table (index value 0 is invalid).

Due to historic choices made during the file format development, the number of constants in the constant pool table is not actually the same as the constant pool count which precedes the table. First, the table is indexed starting at 1 (rather than 0), but the count should actually be interpreted as the maximum index plus one. Additionally, two types of constants (longs and doubles) take up two consecutive slots in the table, although the second such slot is a phantom index that is never directly used.
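The counting quirk is easier to see in code. Here's a rough sketch that walks the constant pool and counts the real entries, skipping the phantom slot after every long and double. ConstantPoolWalker and countEntries are names invented for this example; the tag numbers and entry sizes are the ones defined by the JVM specification. It only measures each entry, it doesn't decode the contents.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration: counts the real entries in a class file's constant pool.
final class ConstantPoolWalker {
    static int countEntries(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            in.readFully(new byte[8]);            // skip magic, minor and major version
            int count = in.readUnsignedShort();   // constant_pool_count = highest index + 1
            int entries = 0;
            for (int index = 1; index < count; index++) {   // index 0 is never used
                int tag = in.readUnsignedByte();
                int size;                         // payload bytes that follow the tag
                switch (tag) {
                    case 1:                       // Utf8: u2 length, then that many bytes
                        size = in.readUnsignedShort();
                        break;
                    case 3: case 4:               // Integer, Float
                        size = 4;
                        break;
                    case 5: case 6:               // Long, Double: 8 bytes and two slots
                        size = 8;
                        index++;                  // skip the phantom index
                        break;
                    case 7: case 8: case 16: case 19: case 20:
                        size = 2;                 // Class, String, MethodType, Module, Package
                        break;
                    case 9: case 10: case 11: case 12: case 17: case 18:
                        size = 4;                 // Fieldref, Methodref, InterfaceMethodref,
                        break;                    // NameAndType, Dynamic, InvokeDynamic
                    case 15:                      // MethodHandle
                        size = 3;
                        break;
                    default:
                        throw new IllegalArgumentException(
                                "Unknown constant tag " + tag + " at index " + index);
                }
                in.readFully(new byte[size]);
                entries++;
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a pool holding one Utf8, one long, and one integer, the stored count is 5 (highest index 4, plus one), but this method returns 3.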

There are only two integral constant types, integer and long. Other integral types appearing in the high-level language, such as boolean, byte, char, and short, must be represented as integer constants.

Class names in Java, when fully qualified, are traditionally dot-separated, such as "java.lang.Object". However within the low-level Class reference constants, an internal form appears which uses slashes instead, such as "java/lang/Object".
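Converting between the two forms is just a character swap. A tiny sketch, with ClassNames as a made-up helper name:

```java
// Hypothetical helper for illustration: converts between dotted and internal class names.
final class ClassNames {
    // "java.lang.Object" -> "java/lang/Object"
    static String toInternalName(String dottedName) {
        return dottedName.replace('.', '/');
    }

    // "java/lang/Object" -> "java.lang.Object"
    static String toDottedName(String internalName) {
        return internalName.replace('/', '.');
    }
}
```

Nested classes keep their '$' in both forms, so Map.Entry.class.getName() gives "java.util.Map$Entry" and the internal form is "java/util/Map$Entry".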

Decompiling the Classfile

For the most part the process of decompiling is automatic. If the classfile or jar you're trying to decompile hasn't been obfuscated in any way, you'll usually be able to decompile it completely.

For those of you that don't know, decompiling means converting the classfile's Bytecode back to Java Source Code.

The tool I recommend for decompiling is Bytecode Viewer; it contains 3 different modern Java Decompilers. It can also display the Bytecode right beside the source code, which makes it easier for new users to learn Bytecode and for experienced users to quickly view the source code.

You can download it at https://github.com/Konloch/bytecode-viewer/releases

Bytecode 101

Bytecode is constructed using Opcodes (instructions). Here's some example Bytecode:

getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Hello World"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V

This is from a Hello World example:

System.out.println("Hello World");
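The odd-looking strings in that Bytecode, like (Ljava/lang/String;)V, are type descriptors: the parameter types go inside the parentheses and the return type follows, with V meaning void and Lsome/Class; meaning an object type. If you want to check how a signature maps to a descriptor, the standard java.lang.invoke.MethodType class can print one for you. The Descriptors wrapper below is just a made-up convenience for this example.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper for illustration: prints the descriptor for a given method signature.
final class Descriptors {
    // Builds a method descriptor from a return type and parameter types.
    static String describe(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }
}
```

For example, Descriptors.describe(void.class, String.class) gives "(Ljava/lang/String;)V", which is the descriptor of println(String) in the listing above.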

Bytecode instructions fall into a number of broad groups:

  • Load and store (e.g. aload_0, istore)
  • Arithmetic and logic (e.g. ladd, fcmpl)
  • Type conversion (e.g. i2b, d2i)
  • Object creation and manipulation (new, putfield)
  • Operand stack management (e.g. swap, dup2)
  • Control transfer (e.g. ifeq, goto)
  • Method invocation and return (e.g. invokespecial, areturn)

There are also a few instructions for a number of more specialized tasks such as exception throwing, synchronization, etc.

Many instructions have prefixes and/or suffixes referring to the types of operands they operate on. These are as follows:

Prefix/Suffix Operand Type:

  • i integer
  • l long
  • s short
  • b byte
  • c character
  • f float
  • d double
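  • z boolean (only seen as the descriptor letter Z; no instruction uses z as a prefix, and boolean values are handled as integers on the stack)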
  • a reference

For example, "iadd" will add two integers, while "dadd" will add two doubles. The "const", "load", and "store" instructions may also take a suffix of the form "_n", where n is a number from 0–3 for "load" and "store". The maximum n for "const" differs by type.

The "const" instructions push a value of the specified type onto the stack. For example "iconst_5" will push an integer 5, while "dconst_1" will push a double 1. There is also an "aconst_null", which pushes "null". The n for the "load" and "store" instructions specifies the location in the variable table to load from or store to. The "aload_0" instruction pushes the object in variable 0 onto the stack (this is usually the "this" object). "istore_1" stores the integer on the top of the stack into variable 1. For variables with higher numbers the suffix is dropped and operands must be used.
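Here are a few small methods with the Bytecode that javac typically produces for them written in the comments, so you can see the prefixes and the "_n" suffixes in action. BytecodeNotes and its methods are made up for this example, and the exact output can vary a little between compilers, so treat the comments as a guide and check with your own tools.

```java
// Hypothetical class for illustration; the comments show the typical javac output.
final class BytecodeNotes {
    private int value = 37;

    // Static method: parameter x is local variable 0, y is local variable 1.
    static int addFive(int x) {
        int y = x + 5;  // iload_0, iconst_5, iadd, istore_1
        return y;       // iload_1, ireturn
    }

    // Instance method: local variable 0 holds "this".
    int getValue() {
        return value;   // aload_0, getfield, ireturn
    }

    // Same shape as addFive, but the "d" prefix is used for doubles.
    static double addOne(double d) {
        return d + 1;   // dload_0, dconst_1, dadd, dreturn
    }

    // Parameters fill variables 0-3, so e lands in variable 4, which has no "_n" form.
    static int fifth(int a, int b, int c, int d) {
        int e = a;      // iload_0, istore 4
        return e;       // iload 4, ireturn
    }
}
```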

Basic Cracking (IFEQ/IFNE)

From here on out we'll be working with Bytecode. If you're not sure what Bytecode is, it's essentially what the .java files are compiled down to (the .class files). There are a few mild ways to obfuscate the Bytecode (Flow Obfuscation, etc.), but they don't really do much; for the most part, once you learn to read/write Bytecode, you'll be able to crack just about anything written in Java.

So, let's look at the basic protection system:

    public static void main(String[] args) {
        String key = args[0];
        if(key.equalsIgnoreCase("securepassword")) {
            System.out.println("Cracked");
        }
    }

There are two obvious ways to crack this. One is to just give out the 'securepassword' you find by decompiling, but we're going to do something different and patch the check itself.

Here's the Bytecode of that method:

aload_0
iconst_0
aaload
astore_1
aload_1
ldc "securepassword"
invokevirtual java/lang/String/equalsIgnoreCase(Ljava/lang/String;)Z
ifeq 12
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Cracked"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
return

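Now it's time to learn two simple Opcodes: IFEQ and IFNE. Both pop an integer off the stack; IFEQ jumps if it equals zero (false) and IFNE jumps if it doesn't equal zero (true). Here the integer is the boolean returned by equalsIgnoreCase, so "ifeq 12" skips the println whenever the key doesn't match. Utilizing these two Opcodes will allow you to crack most simple forms of serial/login based authentication, since the program ends up having to compare the serial/login response to something.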

So, as you can see, even if the String 'securepassword' were dynamic, as long as the security relies on simple if checks, all you need to do is replace the ifeqs that guard the protected code with ifnes (or vice versa if needed).

Cracked Bytecode:

aload_0
iconst_0
aaload
astore_1
aload_1
ldc "securepassword"
invokevirtual java/lang/String/equalsIgnoreCase(Ljava/lang/String;)Z
ifne 12
getstatic java/lang/System/out Ljava/io/PrintStream;
ldc "Cracked"
invokevirtual java/io/PrintStream/println(Ljava/lang/String;)V
return

And the Java source code of this would be:

    public static void main(String[] args) {
        String key = args[0];
        if(!key.equalsIgnoreCase("securepassword")) {
            System.out.println("Cracked");
        }
    }

Isn't cracking easy? You'd be surprised by how many programs rely on ifeq/ifne (Almost all indie programs/games will).
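At the byte level the patch is a single byte: ifeq is opcode 0x99 and ifne is opcode 0x9A. Here's a rough sketch of the flip on a raw code array. BranchFlipper is a made-up name, and the constant pool indexes in sampleCode (#2 to #6) are invented, since the real ones depend on the compiled class. Also note that the listings write the jump target as 12, which appears to count instructions, while the raw code array stores a byte offset relative to the branch (here +11, landing on the return at offset 21).

```java
// Hypothetical helper for illustration: inverts an ifeq/ifne in a method's raw code array.
final class BranchFlipper {
    static final int IFEQ = 0x99; // jump if the popped int is zero
    static final int IFNE = 0x9A; // jump if the popped int is not zero

    // Returns a patched copy; offset must point at the start of the branch instruction.
    static byte[] flip(byte[] code, int offset) {
        int opcode = code[offset] & 0xFF;
        byte[] patched = code.clone();
        if (opcode == IFEQ) {
            patched[offset] = (byte) IFNE;
        } else if (opcode == IFNE) {
            patched[offset] = (byte) IFEQ;
        } else {
            throw new IllegalArgumentException("No ifeq/ifne at offset " + offset);
        }
        return patched;
    }

    // Code array matching the example method; constant pool indexes are assumptions.
    static byte[] sampleCode() {
        return new byte[] {
            0x2A,                     //  0: aload_0
            0x03,                     //  1: iconst_0
            0x32,                     //  2: aaload
            0x4C,                     //  3: astore_1
            0x2B,                     //  4: aload_1
            0x12, 0x02,               //  5: ldc #2 ("securepassword")
            (byte) 0xB6, 0x00, 0x03,  //  7: invokevirtual #3 (equalsIgnoreCase)
            (byte) 0x99, 0x00, 0x0B,  // 10: ifeq +11 (jumps to 21)
            (byte) 0xB2, 0x00, 0x04,  // 13: getstatic #4 (System.out)
            0x12, 0x05,               // 16: ldc #5 ("Cracked")
            (byte) 0xB6, 0x00, 0x06,  // 18: invokevirtual #6 (println)
            (byte) 0xB1               // 21: return
        };
    }
}
```

BranchFlipper.flip(BranchFlipper.sampleCode(), 10) changes only the byte at offset 10 and leaves the jump target alone. Don't just search a whole class file for 0x99 and replace it blindly, because the same byte value also shows up in operands and in the constant pool; tools like JBE find the instruction for you.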

Tips

  • You'll really need to think of it like this: you have control of the source code. If you can read/write Bytecode well enough, you could even develop in it (highly unlikely, but possible), and this means modding/patching/whatever can be done without even needing access to the source code. For example, say there is a method that returns a String which is used to generate a unique key made only for your computer; we could simply edit the classfile's Bytecode quickly and make the method return whatever we want.
  • A jar file is essentially just a ZIP file with a META-INF/ folder and Java class files inside of it. Because of this, you can open it with any Zip archive tool.
  • If you're on Windows and you come across a jar file with aa.class, AA.class, Aa.class, etc., the names collide on the case-insensitive filesystem and you won't be able to fully extract the jar. Try reobfuscating it without strong obfuscation first, so every class gets a distinct name.
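Since a jar is just a ZIP, the standard java.util.zip classes can read it too. Here's a small sketch that lists the entries of a jar and flags names that differ only by case, which are the ones that will clash when extracted on Windows. JarLister and its two methods are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration: lists jar entries and finds case-only name clashes.
final class JarLister {
    // Reads every entry name from a jar (ZIP) stream, in archive order.
    static List<String> entryNames(InputStream jar) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(jar)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    // Returns the lower-cased names that occur more than once when case is ignored.
    static Set<String> caseCollisions(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> collisions = new TreeSet<>();
        for (String name : names) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (!seen.add(lower)) {
                collisions.add(lower);
            }
        }
        return collisions;
    }
}
```

You could feed it a file with JarLister.entryNames(new FileInputStream("app.jar")), where app.jar is a placeholder name; if caseCollisions comes back non-empty, extracting the jar on Windows will overwrite some of the classes.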