Like contemporary computers, the JVM is essentially a stored-program machine. It cannot execute a class file directly; the file must first be mapped into memory. In the previous week we read that most JVM implementations follow a stack-based architecture. One exception is the Dalvik VM, a close kin of the JVM, which is register-based as a trade-off, favouring memory footprint and speed over portability. Still, the stack-based architecture has given Java a platform-agnostic and compact instruction set. Since most existing hardware is register-based, blocks of JIT-compiled bytecode ultimately get converted to native register-based instructions at runtime. But let’s first see how a typical JVM implementation organizes its data and code in memory.

Types and Objects

Unlike register-based assembly languages, where data is treated as raw bytes and words, the JVM is type-aware. It supports all the primitive types of the Java language (byte, short, int, long, char, boolean, float and double), though not with full parity: byte, short, char and boolean values are mostly handled as the int type by the compiler and the JVM. Besides primitive types, the JVM supports two more datatypes – returnAddress and reference. returnAddress values point to bytecode instructions and are used by the subroutine jump and return instructions. reference values point to class, interface or array instances, or hold null. (By the way, the specification does not mandate any concrete value for null; it is left to implementors.) The instruction set is also designed in such a way that the type of the operand can be inferred from the instruction itself.
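This int-handling of the smaller types is visible at the language level too: arithmetic on byte operands is performed as int on the operand stack, so the result must be explicitly narrowed. A minimal sketch:

```java
public class TypeWidening {
    public static void main(String[] args) {
        byte a = 10, b = 20;
        // Both operands are widened to int on the operand stack and
        // added with iadd, so the int result needs an explicit cast
        // back down to byte.
        byte sum = (byte) (a + b);
        System.out.println(sum); // prints 30
    }
}
```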

Unlike other compiled object-oriented languages, where object-orientation vanishes once the code is compiled to native instructions, the JVM continues to support objects explicitly, i.e., an object is still treated as an ‘object’ at runtime. Inside the JVM, an object is an instance of either a class or an array. The reference type acts like a pointer type for locating an object. Usually, array objects are stored in contiguous memory locations, while ordinary objects are stored as a bunch of pointers – one pointing to the Class object, one pointing to the method table and one pointing to the actual data on the heap.
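The reference semantics can be observed directly in Java: assigning an object to another variable copies the reference, not the object. A small sketch (the Point class is purely illustrative):

```java
public class ReferenceDemo {
    static class Point { int x, y; }

    public static void main(String[] args) {
        Point p = new Point();          // object allocated on the heap
        Point q = p;                    // copies the reference, not the object
        q.x = 42;
        System.out.println(p.x);        // prints 42: one object, two references
        int[] arr = new int[3];         // array object, elements stored contiguously
        System.out.println(arr.length); // prints 3
    }
}
```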

Memory Organization

Conventionally, memory allocated to a process is characterized as code/text, data and stack areas. The JVM does something similar; it stores runtime data broadly in three areas – the method area (for code), the heap (for data) and thread stacks (for Java and native threads). Physically, these may not be distinct; the method area may be part of the heap for ease of management. But for now, let us consider them to be logically separate. Additionally, the JVM uses a set of program counter registers to keep track of instruction execution.
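In HotSpot-style implementations, these areas map to familiar tuning flags. A sketch (the flag names are HotSpot-specific; the values and the class name MyApp are placeholders):

```shell
# Illustrative HotSpot flags:
#   -Xms / -Xmx          : initial and maximum heap size
#   -Xss                 : per-thread JVM stack size
#   -XX:MaxMetaspaceSize : cap on method-area metadata (Metaspace, JDK 8+)
java -Xms256m -Xmx1g -Xss512k -XX:MaxMetaspaceSize=128m MyApp
```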

We could visualize JVM memory roughly like the one shown in the diagram. But then, implementations are free to follow their own approach to organizing memory, in keeping with the public design, private implementation style of the JVM specification.

Let’s look at what is stored in each of these areas.

Method Area

The method area is a shared memory area that stores the metadata of active classes and interfaces. It consists of type metadata, the runtime constant pool, field and method information, bytecode for methods (including constructors and the synthetic methods used for class/interface and instance initialization), non-final class variables and the method lookup table. Type metadata includes the qualified name of the class/interface, its modifiers, its parent class and its interfaces. The runtime constant pool (called a symbol table in other languages) holds type-specific compile-time and runtime constants along with symbolic references to other types. Field and method information covers the name, type and modifiers of each field, and the name, modifiers, parameter types and return type of each method. Besides these, each class/interface holds references to its Class and ClassLoader instances for programmatic lookup of metadata and dynamic linking of new types.
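The Class reference mentioned above is what the reflection API reads from. A quick sketch querying method-area metadata at runtime:

```java
public class MetadataPeek {
    public static void main(String[] args) {
        Class<?> c = String.class;
        // Qualified name and parent class, straight from type metadata
        System.out.println(c.getName());                 // java.lang.String
        System.out.println(c.getSuperclass().getName()); // java.lang.Object
        // Core classes are loaded by the bootstrap loader, reported as null
        System.out.println(c.getClassLoader());
    }
}
```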

Heap

The heap is also a shared memory area, common to all threads. Memory for arrays and objects is allocated from the heap at runtime. Since the JVM has instructions for memory allocation but none for deallocation, a separate memory management system called the garbage collector is present in virtually all standard implementations, with options for the user to configure the minimum and maximum memory to be allocated for the heap.
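A short sketch of both points: objects are allocated with new and simply abandoned, leaving reclamation to the garbage collector, while the configured heap bounds can be queried through Runtime:

```java
public class HeapSketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Maximum heap the JVM will attempt to use (-Xmx, or a default)
        System.out.println(rt.maxMemory() > 0);
        for (int i = 0; i < 10_000; i++) {
            byte[] scratch = new byte[1024]; // allocated on the heap
        } // no explicit deallocation: unreachable arrays are GC'd
        System.out.println("done");
    }
}
```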

Thread Stacks

A thread stack, or JVM stack, stores the runtime state of a thread by keeping track of the methods the thread executes. A thread stack is not shared across threads. Each of its elements is called a stack frame, and a stack frame stores the state of a single method invocation. When a new method is called, a new stack frame is pushed onto the thread stack. This stack frame holds local variables, frame data and another inner stack called the operand stack.

When a method is compiled to bytecode, the compiler replaces local variables with indices into a runtime local variable array, and field and method references with indices into the constant pool. If the method has parameters, they too are stored in the local variable array, in order of declaration. Frame data stores a reference to the type’s runtime constant pool and an exception table for handling abrupt method completion. The operand stack comes into action when calculations are to be performed. It is worth noting that the sizes of the local variable array and the operand stack are determined at compile time, and an implementation can cleverly use this information to overlay a method’s local variable array onto the previous method’s operand stack, saving memory and time during method invocation.

Let’s look at an example to better understand the role of stack frame in method invocation. Consider the following snippet taken from the specification:

void spin() {
  for (int i = 0; i < 100; i++) {
    ; // empty block
  }
}

Its equivalent bytecode would be something like:

0: iconst_0 // Push the integer constant 0 onto the operand stack
1: istore_1 // Pop the top value into local variable index 1 (i = 0)
2: goto 8 // Jump to the condition check, skipping the first increment
5: iinc 1 1 // Increment local variable at index 1 by one (i++)
8: iload_1 // Push the value of local variable index 1 onto the operand stack (i)
9: bipush 100 // Push 100 onto the operand stack
11: if_icmplt 5 // Pop and compare the two values; loop back if less than (i < 100)
14: return // Return void when done

In the above code, the local variable i has been replaced by local variable index 1 (in instance methods, the ‘this’ reference occupies index 0 by default). Notice also that the prefixes of the instructions (i, b, etc.) denote the operand type – iconst_x pushes the integer constant x onto the operand stack, bipush pushes an immediate byte (widened to int) and so on. For the above code, the stack frame after executing instruction 9 would be something like the one shown below:

Native Method Stacks

The native method stack helps in the execution of native methods inside the JVM. It, too, is local to a thread and not shared. This stack is managed by the JVM’s native method interface module and may be absent if the implementation does not support native methods.
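Whether a given method is native can be checked via reflection, since native methods carry the ACC_NATIVE flag in their metadata. A small sketch using System.currentTimeMillis, which is native in mainstream JDKs:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class NativeCheck {
    public static void main(String[] args) throws Exception {
        Method m = System.class.getDeclaredMethod("currentTimeMillis");
        // true: this method's body is native code, so its frames go on
        // the native method stack rather than the JVM stack
        System.out.println(Modifier.isNative(m.getModifiers()));
    }
}
```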

PC (Program Counter) Registers

The PC register is perhaps the only register the JVM uses in the real sense. It is sometimes called the IP (instruction pointer) register, since it points to the instruction currently being executed. The JVM assigns one PC register to each thread, and it may or may not be mapped to a hardware register. At any instant, the PC register holds the address of a bytecode instruction of the method being executed by its thread. If the thread is executing a native method, the PC register’s value is undefined.