In the case of natively compiled languages, linking is the last step in generating an executable from source files. The linker takes the object files output by the compiler and builds an executable file that can be run without any further dependencies. This approach of statically linking related code had a considerable impact on memory usage (every executable carried its own copy of the library code) and limited the reuse of shared code. Dynamic linkers overcame this problem by loading shared code at runtime and combining it with the running process. In native applications, dynamic loading and linking of shared libraries are handled by the operating system itself, which sometimes also supports multiple versions of the same library.
Dynamic loading and linking, though done differently, happen almost implicitly in Java, at least until one starts seeing ClassNotFoundException or NoClassDefFoundError. In the context of Java, loading is the process of finding the binary representation of a class/interface and creating the class/interface from it, and linking is the process of integrating it into the runtime state of the JVM. Together they essentially map a class/interface into JVM memory: the appropriate type definition is added to the method area and a new instance of Class is created for it on the heap. (Linking should not be confused with binding, which happens when a native method is invoked. Binding is handled by the native interface and won't be discussed in this series.)
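To make the two errors concrete, here is a small sketch (Java 9+; the class name com.example.DoesNotExist and the nested Broken class are hypothetical) showing one easy-to-reproduce way each arises. ClassNotFoundException is thrown when a reflective lookup finds no class with the requested name. NoClassDefFoundError is shown here for a class whose static initialization already failed once; in practice it is just as often caused by a class that was present at compile time but is missing at runtime, which is harder to reproduce in a single file.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingClassDemo {

    // Hypothetical helper whose static initialization always fails,
    // leaving the class in an erroneous state.
    static class Broken {
        static final int VALUE = fail();

        static int fail() {
            throw new IllegalStateException("static initialization failed");
        }
    }

    // Returns the simple names of the throwables observed, in order.
    static List<String> observe() {
        List<String> seen = new ArrayList<>();

        // 1. Reflective lookup of a name that no classloader can find.
        try {
            Class.forName("com.example.DoesNotExist"); // hypothetical class name
        } catch (ClassNotFoundException e) {
            seen.add(e.getClass().getSimpleName());
        }

        // 2. Use Broken twice: the first use fails inside its <clinit>
        //    (wrapped in ExceptionInInitializerError), the second finds the
        //    class already marked as unusable (NoClassDefFoundError).
        for (int i = 0; i < 2; i++) {
            try {
                seen.add("value " + Broken.VALUE);
            } catch (Throwable t) {
                seen.add(t.getClass().getSimpleName());
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```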
Since most Java developers are familiar with classloading principles, rather than explaining the entire topic again, we will cover only the essential details here.
Types of Classloaders
A class/interface is loaded into the JVM on its first use, and it can be unloaded by the garbage collector once its defining classloader is no longer reachable (classes loaded by the bootstrap classloader are never unloaded). The part of the JVM that loads classes/interfaces is called the classloader, and it is abstracted by the java.lang.ClassLoader class. Developers can extend ClassLoader to create specialized classloaders that load classes from different sources or instrument them before use.
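As a minimal sketch of such a specialized classloader, the hypothetical InMemoryClassLoader below treats an in-memory map of class names to .class bytes as its "source" (for example, bytes produced at runtime by a compiler or an instrumentation step). It only overrides findClass and relies on the inherited loadClass for delegation; the map-based source and all names here are assumptions for illustration, not a standard API.

```java
import java.util.HashMap;
import java.util.Map;

public class InMemoryClassLoaderDemo {

    // Hypothetical classloader backed by an in-memory map from binary
    // class names to .class file bytes.
    static class InMemoryClassLoader extends ClassLoader {
        private final Map<String, byte[]> classBytes = new HashMap<>();

        InMemoryClassLoader(ClassLoader parent) {
            super(parent);
        }

        void put(String binaryName, byte[] bytes) {
            classBytes.put(binaryName, bytes);
        }

        // The inherited loadClass() calls findClass() only after the parent
        // chain has failed to find the class.
        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            byte[] bytes = classBytes.get(name);
            if (bytes == null) {
                throw new ClassNotFoundException(name);
            }
            // A specialized loader could transform (instrument) the bytes here.
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Returns the loaded class, or null if no loader in the chain finds it.
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        InMemoryClassLoader loader = new InMemoryClassLoader(ClassLoader.getSystemClassLoader());
        System.out.println(tryLoad(loader, "java.lang.String"));      // found through the parent chain
        System.out.println(tryLoad(loader, "com.example.Generated")); // hypothetical name, not in the map: null
    }
}
```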
Broadly, classloaders can be classified into the bootstrap (primordial) classloader and user-defined classloaders.
- Bootstrap classloader: This classloader is responsible for loading the core Java classes (up to Java 8, those located in jre/lib, such as rt.jar, or on the path set by the -Xbootclasspath option). It is implemented in native code and hence cannot be extended.
- User-defined classloaders: These load all other, application-specific classes and are subclasses of ClassLoader. This category includes the system (application) classloader, which loads classes found on the classpath (as given by the java.class.path property), and custom classloaders written by developers.
With the introduction of the extension mechanism in Java 1.2, most implementations also provide a third type of classloader, called the extension classloader. It loads optional packages (those located in jre/lib/ext or in the directories named by the java.ext.dirs property) that add functionality to the core platform. (Java 9 removed the extension mechanism; the extension classloader was replaced by the platform classloader.)
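A quick way to see this hierarchy on a given JVM is to walk the getParent() chain upward from the system classloader. The sketch below assumes the default launcher setup (no custom java.system.class.loader); the bootstrap classloader is represented by null, so the walk stops there.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderHierarchyDemo {

    // Walks the delegation chain upward from the system (application)
    // classloader. The bootstrap classloader is represented by null,
    // so the loop ends when getParent() returns null.
    static List<String> delegationChain() {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = ClassLoader.getSystemClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        delegationChain().forEach(System.out::println);
        // Core classes report null as their classloader, i.e. the bootstrap classloader.
        System.out.println("String loaded by: " + String.class.getClassLoader());
    }
}
```

On a Java 8 runtime the chain typically lists sun.misc.Launcher$AppClassLoader followed by sun.misc.Launcher$ExtClassLoader; from Java 9 onwards it typically lists jdk.internal.loader.ClassLoaders$AppClassLoader followed by jdk.internal.loader.ClassLoaders$PlatformClassLoader.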
The standard classloading mechanism used by the JVM (since 1.2) is called the parent-delegation or parent-first model. In this arrangement, every classloader other than the bootstrap classloader has a parent, either a user-defined classloader or the bootstrap classloader. (Remember, this is a delegation hierarchy and is not based on inheritance.) When a class/interface is to be loaded, the initiating classloader first checks whether it has already loaded the type. If it has not, it delegates the request to its parent, which does the same, so the request travels up the chain until it reaches the bootstrap classloader. Only if the bootstrap classloader cannot load the type do the classloaders back down the chain try their own sources, each in turn; if the initiating classloader also fails, ClassNotFoundException is thrown. Applications can override this pattern and follow their own classloading logic if their functionality demands it. A good example is web application classloaders, which follow a parent-last model so that libraries in WEB-INF take precedence over classpath libraries.
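The parent-first steps can be sketched as an explicit loadClass override. The hypothetical TracingLoader below only loosely mirrors the shape of the JDK's ClassLoader.loadClass(String, boolean); it is a simplified illustration, not the actual JDK code. When the parent is null it asks the bootstrap classloader via Class.forName(name, false, null), and it records which step produced the class.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationSketch {

    // Hypothetical loader that spells out the parent-first steps and
    // records which step produced the class.
    static class TracingLoader extends ClassLoader {
        final List<String> trace = new ArrayList<>();

        TracingLoader(ClassLoader parent) {
            super(parent); // null means "delegate straight to bootstrap"
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                // 1. Already loaded by this classloader?
                Class<?> c = findLoadedClass(name);
                if (c != null) {
                    trace.add("cached");
                } else {
                    // 2. Delegate upward first.
                    try {
                        c = (getParent() != null)
                                ? getParent().loadClass(name)
                                : Class.forName(name, false, null); // bootstrap lookup
                        trace.add("parent");
                    } catch (ClassNotFoundException notInParents) {
                        // 3. Only now try our own source; the inherited
                        //    findClass() simply throws ClassNotFoundException.
                        c = findClass(name);
                        trace.add("self");
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    // Loads a name through a fresh TracingLoader and reports the outcome.
    static String attempt(ClassLoader parent, String name) {
        TracingLoader loader = new TracingLoader(parent);
        try {
            loader.loadClass(name);
            return String.join(",", loader.trace);
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        ClassLoader app = DelegationSketch.class.getClassLoader();
        System.out.println(attempt(app, "java.lang.String"));                // parent
        System.out.println(attempt(null, DelegationSketch.class.getName())); // ClassNotFoundException
    }
}
```

A parent-last loader would swap steps 2 and 3, typically still delegating java.* names to the parent first, because defining classes in the java.* packages from a custom classloader is prohibited.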
Inside the JVM, a class/interface is identified jointly by its fully qualified name and the classloader that defined it. At any instant, the JVM can contain only one Class instance for a given class/interface within the same classloader namespace, but the same class/interface can exist at runtime in several different classloader namespaces. To avoid confusion and guarantee type safety (and security) in such scenarios, the JVM imposes special checks called loading constraints: when a type name is used across classloader boundaries (for example, in the signature of a method that a class defined by one classloader calls on a class defined by another), both classloaders must resolve that name to the same class/interface definition.
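The effect of namespaces can be demonstrated by defining the same .class bytes in two separate classloaders. In the sketch below, the hypothetical IsolatingLoader uses the bootstrap classloader as its parent (so application classes are not found by delegation) and reads the bytes through the application classloader's resources. It assumes Java 9+ (InputStream.readAllBytes) and that the demo's own .class files are readable as classpath resources, as they are in a normal compile-and-run setup.

```java
import java.io.IOException;
import java.io.InputStream;

public class NamespaceDemo {

    // Hypothetical loader: reads .class bytes through another loader's
    // resources but defines the class itself. Parent = bootstrap (null).
    static class IsolatingLoader extends ClassLoader {
        private final ClassLoader resources;

        IsolatingLoader(ClassLoader resources) {
            super(null);
            this.resources = resources;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = resources.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // The class that will be defined in separate namespaces.
    public static class Payload {
    }

    // Loads Payload through a fresh IsolatingLoader (a new namespace).
    static Class<?> loadIsolated() {
        try {
            return new IsolatingLoader(NamespaceDemo.class.getClassLoader())
                    .loadClass(Payload.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> a = loadIsolated();
        Class<?> b = loadIsolated();
        System.out.println(a.getName().equals(b.getName())); // true: same name
        System.out.println(a == b || a == Payload.class);    // false: different defining classloaders
        System.out.println(a.isInstance(new Payload()));     // false: distinct runtime types
    }
}
```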
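It is also worth mentioning that arrays are handled differently from classes and interfaces: array classes are not loaded from a binary representation but are created directly by the JVM. An array class with a primitive component type is recorded as defined by the bootstrap classloader, while an array class with a reference component type is recorded as defined by the classloader that defined its component type. The component type of the latter is loaded through the regular classloading mechanism, though.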
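This is visible through Class.getClassLoader(), whose documented behaviour for arrays is to report the classloader of the component type (and null, i.e. bootstrap, for primitive component types). A minimal sketch, where Element is just a hypothetical application class:

```java
public class ArrayLoaderDemo {

    // Hypothetical application class used as an array component type.
    static class Element {
    }

    public static void main(String[] args) {
        // Primitive component type: reported as the bootstrap classloader (null).
        System.out.println(int[].class.getClassLoader());
        // Reference component types: the classloader of the component type.
        System.out.println(String[].class.getClassLoader()); // null, String is a core class
        System.out.println(Element[].class.getClassLoader() == Element.class.getClassLoader());
        // The array class is synthesized by the JVM; its name uses the descriptor form.
        System.out.println(Element[].class.getName());
    }
}
```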
3-Step Classloading Process
Classloading happens in three distinct steps: loading, linking and initialization. Loading involves building an implementation-specific internal representation of a class/interface. This occurs either during runtime constant pool resolution (which in turn happens on method invocation, field access, etc. from an already loaded class/interface) or during a reflective call. Linking makes sure that the loaded type (including its superclass and superinterfaces) can be safely used. Finally, initialization sets the initial values of static fields. The specification mandates that any error detected during linking or initialization be thrown at a point in the program that actually uses (directly or indirectly) the class/interface involved, and nowhere else.
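The separation between loading and initialization can be observed with Class.forName(name, false, loader), which loads a class without initializing it; the static initializer then runs only at the first active use. In this sketch, Config and the event strings are hypothetical, and the class is looked up by its binary name string so that no class literal touches it beforehand.

```java
import java.util.ArrayList;
import java.util.List;

public class LazyInitDemo {

    static final List<String> events = new ArrayList<>();

    // Hypothetical class whose static initializer records when it runs.
    static class Config {
        static {
            events.add("Config initialized");
        }

        static int value = 42; // not a compile-time constant, so reading it requires initialization
    }

    static List<String> run() {
        try {
            // Load Config without initializing it (initialize = false).
            Class<?> c = Class.forName(LazyInitDemo.class.getName() + "$Config", false,
                    LazyInitDemo.class.getClassLoader());
            events.add("loaded " + c.getSimpleName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        // First active use: reading the static field triggers initialization here.
        events.add("value " + Config.value);
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

On the first call, the recorded order is "loaded Config", then "Config initialized", then "value 42": loading happened well before the static initializer ran.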
The sequence of steps for classloading can be represented as:
- Load: This is the first step in creating the type definition of a class/interface. Here, the binary representation of the type is parsed, validated against the standard .class format and stored in the method area, and finally a Class object is created. The direct superclass and superinterfaces are also loaded, recursively, up to java.lang.Object.
- Link: Linking involves three sub-steps: verification, preparation and resolution.
- Verification consists of elaborate checks to ensure the security and type safety of the bytecode. This includes structural and well-formedness checks of the .class file, and checks that the code cannot be used for potential exploits, performed by following well-defined type-checking and type-inference rules. Since this is a costly operation, .class files for memory-constrained devices (Java ME) are usually pre-verified ahead of time by a separate preverifier tool; since Java 6, the compiler itself also emits StackMapTable attributes that make verification by type checking cheaper.
- Preparation involves creating static fields and initializing them to their default values.
- Resolution determines concrete values for the symbolic references in the runtime constant pool. The specification leaves its timing flexible: it may be done eagerly during linking, but implementations often resolve a reference lazily, when the field or method is used for the first time.
- Initialize: This involves executing the class/interface initialization method <clinit>. For a class, the superclass is initialized first (along with any superinterfaces that declare non-abstract, non-static methods, such as default methods), and only then does the class's own <clinit> run. The <clinit> method is a special method generated by the compiler that combines, in textual order, the static field initializers and static initializer blocks of the class/interface.
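Preparation and initialization can both be observed from Java code. In the sketch below (all class and field names are hypothetical), Derived's <clinit> reads a static field before its initializer has run and sees the default value set during preparation; Base is initialized before Derived, and the Marker interface, which declares no default methods, is not initialized at all.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {

    static final List<String> log = new ArrayList<>();

    // Records an event and returns a dummy value usable in field initializers.
    static int note(String event) {
        log.add(event);
        return 1;
    }

    // Interface without default methods: NOT initialized when Derived is.
    interface Marker {
        int TOKEN = InitOrderDemo.note("Marker <clinit>"); // non-constant field
    }

    static class Base {
        static {
            note("Base <clinit>");
        }
    }

    static class Derived extends Base implements Marker {
        // Field initializers and static blocks run in textual order in <clinit>.
        static int early = peekLimit(); // runs before `limit` is assigned
        static int limit = 7;

        static {
            note("Derived static block, limit = " + limit);
        }

        static int peekLimit() {
            // Sees the default value (0) assigned during preparation.
            return note("Derived sees limit = " + limit);
        }
    }

    static List<String> run() {
        new Derived(); // first active use of Derived triggers initialization
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The log contains "Base <clinit>", "Derived sees limit = 0" and "Derived static block, limit = 7", in that order, with no entry for Marker.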
Since the JVM supports multithreading, more than one thread may try to load the same class/interface at the same time. It is the responsibility of the implementation to guarantee that a classloader defines a given class/interface only once and that the class/interface is initialized only once. ClassLoader.loadClass serializes loading of a given name through the lock returned by getClassLoadingLock (the classloader itself, or a per-class-name lock object for parallel-capable classloaders), and the JVM uses a per-class initialization lock so that other threads wait until <clinit> has completed.
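The initialization guarantee can be checked with a small race: several threads read a static field of a class that has not been initialized yet, and its static initializer counts how many times it runs. This is a sketch with hypothetical names; the Thread.sleep only widens the window in which threads overlap and is not required for correctness.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentInitDemo {

    static final AtomicInteger initCount = new AtomicInteger();

    // Hypothetical class whose <clinit> is slow and counts its executions.
    static class Shared {
        static final int VALUE;

        static {
            initCount.incrementAndGet();
            try {
                Thread.sleep(50); // widen the window so threads overlap
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            VALUE = 42;
        }
    }

    // Starts several threads that all race to trigger Shared's initialization.
    static List<Integer> race(int threads) {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> seen.add(Shared.VALUE));
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(race(8) + ", <clinit> ran " + initCount.get() + " time(s)");
    }
}
```

Every thread observes the fully initialized value, because threads that arrive while another thread is running <clinit> wait for it to finish instead of running it again.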