The role of the JVM’s execution engine can be compared to that of a microprocessor. Its duty is to execute the instructions fed to it in an acceptable format. It does this either by interpreting the bytecode instructions or by compiling them to native instructions, as the need may be. In a way, bytecode plays the role of an assembly language for the JVM, and those who are familiar with assembly language can easily relate the JVM’s instruction set to that of modern microprocessors. In this final instalment of the ‘Disassembling JVM’ series, we will look at how the JVM executes the contents of a class file.
The Instruction Set
The executable portions of a class file are its methods, be they user-written or synthetic. After compilation, the instructions belonging to each method are stored as a Code attribute in the attribute info section of the .class file. During class loading, these are mapped to the method area. When a method is invoked, its instructions are executed sequentially, one by one, unless a control transfer instruction changes the flow.
A JVM instruction consists of a one-byte opcode followed by zero or more operands. Because the opcode is a single byte, the instruction set can have at most 256 opcodes, of which roughly 200 are actually assigned. The opcode specifies the operation to be performed, such as add, subtract, object instantiation or method invocation, and the operands specify the data to be used for the operation. Not every opcode has operands associated with it; hence bytecode instructions are of variable length. An instruction can be considered to have the format: opcode [operand1 operand2 ...].
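The variable-length format can be illustrated with a toy disassembler. This is only a minimal sketch: the class name TinyDisassembler is made up for this article, and it recognises just five opcodes (their numeric values are the ones assigned in the JVM specification). It walks a byte array, reads an opcode, and then consumes as many operand bytes as that opcode requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any JDK API.
final class TinyDisassembler {
    static final int ICONST_1 = 0x04; // no operands
    static final int BIPUSH   = 0x10; // one operand byte (signed)
    static final int SIPUSH   = 0x11; // two operand bytes (signed, big-endian)
    static final int IADD     = 0x60; // no operands
    static final int IRETURN  = 0xac; // no operands

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff; // opcodes are unsigned bytes
            switch (opcode) {
                case ICONST_1:
                    lines.add(pc + ": iconst_1");
                    pc += 1;
                    break;
                case BIPUSH:
                    lines.add(pc + ": bipush " + code[pc + 1]);
                    pc += 2; // opcode + 1 operand byte
                    break;
                case SIPUSH:
                    int value = (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    lines.add(pc + ": sipush " + value);
                    pc += 3; // opcode + 2 operand bytes
                    break;
                case IADD:
                    lines.add(pc + ": iadd");
                    pc += 1;
                    break;
                case IRETURN:
                    lines.add(pc + ": ireturn");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("Opcode not handled in this sketch: " + opcode);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] code = {0x10, 100, 0x11, 0x01, 0x2c, 0x60, (byte) 0xac};
        for (String line : disassemble(code)) {
            System.out.println(line);
        }
    }
}
```

Note how the offsets in the output jump by one, two or three depending on the instruction, which is the same pattern visible in the left-hand column of javap output.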
Most of the JVM instructions are typed, i.e., we can infer the type of the operands from the opcode itself. For example, iadd means add two integers and fmul means multiply two floating-point numbers. Because of the one-byte limitation, not every operation is available for every type; the byte, short, char and boolean types in particular have very few dedicated opcodes and are mostly handled using the integer opcodes. Also, most of the instructions manipulate the operand stack and the local variable table. There are shortcut opcodes such as iconst_0, iconst_1, iconst_2 and iconst_m1, which push the frequently used constants 0, 1, 2 and -1 onto the stack, and iload_0, iload_1, iload_2 etc., which push the values at frequently used local variable indices onto the stack.
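To make the interplay between opcodes, the operand stack and the local variable table concrete, here is a minimal sketch of an interpreter loop. It is not how any real JVM is implemented; the class TinyInterpreter and its run method are invented for illustration, and only a handful of integer opcodes (with their real numeric values) are handled.

```java
// Hypothetical toy interpreter for illustration; handles only a few int opcodes.
final class TinyInterpreter {
    static int run(byte[] code, int[] locals) {
        int[] stack = new int[16]; // operand stack (fixed size for this sketch)
        int sp = 0;                // next free stack slot
        int pc = 0;                // program counter
        while (true) {
            int opcode = code[pc++] & 0xff;
            switch (opcode) {
                case 0x02:                       // iconst_m1
                    stack[sp++] = -1;
                    break;
                case 0x03: case 0x04: case 0x05: // iconst_0, iconst_1, iconst_2
                    stack[sp++] = opcode - 0x03;
                    break;
                case 0x10:                       // bipush <byte>
                    stack[sp++] = code[pc++];
                    break;
                case 0x1a: case 0x1b: case 0x1c: // iload_0, iload_1, iload_2
                    stack[sp++] = locals[opcode - 0x1a];
                    break;
                case 0x3b: case 0x3c: case 0x3d: // istore_0, istore_1, istore_2
                    locals[opcode - 0x3b] = stack[--sp];
                    break;
                case 0x60: {                     // iadd
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case 0x68: {                     // imul
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case 0xac:                       // ireturn
                    return stack[--sp];
                default:
                    throw new IllegalArgumentException("Opcode not handled in this sketch: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // iload_0, iload_1, iadd, iconst_2, imul, ireturn  =>  (a + b) * 2
        byte[] code = {0x1a, 0x1b, 0x60, 0x05, 0x68, (byte) 0xac};
        System.out.println(run(code, new int[] {3, 4, 0}));
    }
}
```

With locals 0 and 1 set to 3 and 4, the program above pushes both values, adds them, pushes the constant 2, multiplies, and returns the top of the stack.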
The instruction set can be broadly classified into ten categories:
- Load and store: These are used to transfer values between the local variable table and the operand stack, and to push constants onto the stack. E.g.: iload, istore, bipush, iconst_x, ldc
- Arithmetic and logic: These are used to perform integer and floating-point arithmetic and logic operations. E.g.: iadd, iinc, fdiv, dmul, iand, ineg, dcmpl
- Type conversion: They are used for numeric type conversion, especially to widen and narrow byte, short and char types as we do not have many instructions for them. E.g.: i2l, i2f, i2b, i2c
- Object manipulation: They are used to create and manipulate class instances and arrays. E.g.: new, newarray, getfield, putfield, aaload, aastore, arraylength, getstatic, putstatic, instanceof, checkcast
- Operand stack management: They are used to manipulate the operand stack directly. E.g.: pop, pop2, dup, dup_x1, swap
- Control transfer: They are used to transfer control conditionally or unconditionally. E.g.: ifeq, ifne, ifnull, ifnonnull, if_icmpeq, tableswitch, lookupswitch, goto, jsr, ret
- Method invocation: They are used to invoke methods and return from them. E.g.: invokevirtual, invokespecial, invokestatic, invokeinterface, return, ireturn, freturn
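- Exception handling: An exception can be thrown explicitly using the athrow instruction. Exceptions such as NullPointerException and ArithmeticException are thrown by the JVM itself when an instruction detects an abnormal condition.
- Synchronization: They are used to manipulate the object monitor for synchronizing blocks of instructions. The two operations available for this are monitorenter and monitorexit, which are executed when a synchronized block is entered and exited. Synchronized methods do not normally use these opcodes; they are marked with the ACC_SYNCHRONIZED flag, and the JVM acquires and releases the monitor as part of method invocation and return.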
- Reserved: These are opcodes reserved for internal use and for future extension of the virtual machine and may not appear in method bytecode. They are impdep1, impdep2 and breakpoint.
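The point about byte, short and char being handled with integer opcodes shows up directly in Java source. The following sketch (the class name IntTypedArithmetic is made up for this article) contains casts that javac is required to emit precisely because the arithmetic itself is carried out on ints; the comments name the conversion instruction each cast typically compiles to.

```java
// Hypothetical example class; the comments describe the bytecode javac typically emits.
final class IntTypedArithmetic {
    static byte addBytes(byte a, byte b) {
        // a + b is computed with iadd on ints; the cast becomes i2b (narrowing)
        return (byte) (a + b);
    }

    static char nextChar(char c) {
        // c + 1 is an int addition; the cast becomes i2c (narrowing)
        return (char) (c + 1);
    }

    static long widen(int i) {
        // implicit widening conversion; compiles to i2l
        return i;
    }

    public static void main(String[] args) {
        System.out.println(addBytes((byte) 100, (byte) 100)); // 200 does not fit in a byte
        System.out.println(nextChar('A'));
        System.out.println(widen(-1));
    }
}
```

Removing the cast in addBytes makes the code fail to compile, since the result of a + b is an int.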
It would be difficult to explain each and every instruction in this article. For a detailed explanation, please refer to the specification. Let’s look at a sample program and try to understand its bytecode instructions before concluding.
The equivalent bytecode, disassembled using the javap tool, is given below with unnecessary details removed. You will find that the instructions are pretty much straightforward, except for some cryptic numbers prefixed with ‘#’. They are nothing but references to entries in the constant pool of the class. The constant pool itself is not shown here for the sake of brevity.
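As an illustrative stand-in (the class Sample below is hypothetical and is not the original listing from this series), consider a class with one static method and a main method that calls it. The comments show the instructions that javap -c typically prints for each method body. The constant pool index after ‘#’ depends on the compiler, so it is left as a placeholder.

```java
// Hypothetical sample class used only to illustrate javap output.
final class Sample {
    static int add(int a, int b) {
        return a + b;
        // Code:
        //    0: iload_0        push local variable 0 (a)
        //    1: iload_1        push local variable 1 (b)
        //    2: iadd           pop both, push a + b
        //    3: ireturn        return the int on top of the stack
    }

    public static void main(String[] args) {
        int r = add(2, 3);
        // Code:
        //    0: iconst_2                 push constant 2
        //    1: iconst_3                 push constant 3
        //    2: invokestatic #<index>    Method add:(II)I, resolved via the constant pool
        //    5: istore_1                 store the result in local variable 1 (r)
        //    6: return
    }
}
```

The invokestatic instruction occupies three bytes (the opcode plus a two-byte constant pool index), which is why the next instruction starts at offset 5.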
A Word on Multi-threading
When the JVM is invoked, it runs as a new process on the host operating system and starts a new non-daemon thread to execute the main method of the specified class. This JVM instance continues to live until the last non-daemon thread exits or until Runtime.exit() is called. During its lifetime it may start separate threads internally for its own housekeeping activities such as garbage collection, management and monitoring. So we can visualize a live JVM as a group of threads, each trying to execute some task, user-defined or system-defined.
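The difference between daemon and non-daemon threads can be seen with a small experiment. This is a minimal sketch; the class DaemonDemo and its newWorker helper are invented for illustration, while Thread.setDaemon and Thread.isDaemon are standard java.lang.Thread methods.

```java
// Hypothetical demo class; only java.lang.Thread APIs are used.
final class DaemonDemo {
    static Thread newWorker(String name, boolean daemon, Runnable task) {
        Thread t = new Thread(task, name);
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // A housekeeping-style thread that never finishes on its own.
        Thread housekeeping = newWorker("housekeeping", true, () -> {
            while (true) {
                try {
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        // A user task running on a non-daemon thread.
        Thread worker = newWorker("worker", false,
                () -> System.out.println("user task done"));

        housekeeping.start();
        worker.start();
        worker.join();
        // main returns here. The JVM can exit even though the housekeeping
        // thread is still looping, because only daemon threads remain.
    }
}
```

If the housekeeping thread were created with daemon set to false, the process would keep running after main returns.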
So how does the JVM handle multiple threads? Does it emulate threads or actually create native threads? If you have read the previous articles, the answer would be obvious: it’s implementation specific! As a matter of fact, the specification doesn’t even say how threads should be implemented. Early JVMs, in the Java 1.1 era, implemented multi-threading using green threads. These days, implementations may use green (emulated) threads, native threads, or a combination of both, depending on the environment or user configuration. Green threads are particularly useful when the native operating system doesn’t support multi-threading. But they come with the additional burden of implementing scheduling, memory management etc. inside the JVM, and hence may not be suitable for performance-intensive scenarios. Because green threads share the underlying native thread, the whole application may stall if one of them blocks on I/O, virtually bringing the throughput down to zero. Native threads, on the other hand, are efficient and can take advantage of the software and hardware optimizations of the host system.
We have only covered the tip of the iceberg here. Most of the information was taken from the Java Virtual Machine Specification, Java SE 7 Edition. Developers are encouraged to go through it to gain a better understanding of the JVM and to appreciate the effort that has gone into developing such a superb piece of software.