Today, most code search engines offer full-text search, and the results consist of textual occurrences of the search keywords. They have no way of knowing whether a searched keyword is a type, a field, a method, and so on. This information is what we call search context.

Questions like: what is that keyword, a method or a type? Where does it occur? Which other keywords surround it? This context makes all the difference to the search and helps produce the desired result.

The differentiating factor of KodeBeagle is that it takes into consideration what surrounds a keyword, i.e. its context. This is what makes KodeBeagle stand out.

A typical KodeBeagle client is either an IDE plugin or a web client. The KodeBeagle server is completely agnostic to the type of client in question. An IDE plugin has much more context about the user's code than a web-based client, which makes it a much more powerful client.

What goes on under the hood?

KodeBeagle Client

Step 1. Client-side keyword extraction.

The highlighted text shows the keywords taken into account while searching for results.

Screen 1

We intend it to work like this, but the current implementation of the plugin is not able to detect methods. This sort of optimization is definitely on our roadmap.
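As a rough illustration of the extraction step, the sketch below collects the imported types from a snippet of user code with a simple regex. The class and method names here are hypothetical, not KodeBeagle's actual plugin API, which works against the IDE's own syntax tree.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of client-side keyword extraction: collect the
// fully qualified names of imported types from a snippet of user code.
public class KeywordExtractor {
    private static final Pattern IMPORT =
        Pattern.compile("^\\s*import\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    public static List<String> extractImports(String source) {
        // LinkedHashSet de-duplicates while preserving the order of
        // appearance in the source snippet.
        Set<String> fqns = new LinkedHashSet<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            fqns.add(m.group(1));
        }
        return new ArrayList<>(fqns);
    }
}
```

A real plugin would walk the IDE's parse tree instead of matching text, which is what allows it to tell methods and types apart.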

Step 2. Make a service call to our servers.

Screen 3

While making a service call, Fully Qualified Names (FQNs) are sent as keywords. In this step, we take special care to eliminate any reference to imports internal to the project. Furthermore, a user can configure additional excluded imports in the plugin's settings. These imports are removed from the query even if the selected region of code includes them. This helps us optimize the query and get a closer picture of the part of the code context that matters to the search.
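The filtering described above can be sketched as follows. The class name, the project-package parameter, and the example prefixes are assumptions for illustration, not the plugin's actual implementation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of import filtering: drop any FQN that belongs to
// the project itself or matches a user-configured exclude prefix, so only
// externally meaningful keywords are sent to the server.
public class ImportFilter {
    public static List<String> filterImports(List<String> imports,
                                             String projectPackage,
                                             List<String> excludePrefixes) {
        return imports.stream()
            // Imports internal to the project carry no reusable context.
            .filter(fqn -> !fqn.startsWith(projectPackage))
            // User-configured excludes from the plugin settings.
            .filter(fqn -> excludePrefixes.stream().noneMatch(fqn::startsWith))
            .collect(Collectors.toList());
    }
}
```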

Internals of the KodeBeagle server.


The picture explains the intermediate steps involved in the making of the KodeBeagle service.

Stage one

Our GitHub crawlers are minimalistic servers running GitHubRepoCrawlerApp. A very trivial script to upload the content to HDFS can be run in parallel.
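A trivial upload script could look like the sketch below. The directory layout, archive format, and the `hdfs dfs` wrapper variable are assumptions, not KodeBeagle's actual script.

```shell
# Hypothetical sketch of the HDFS upload step: push crawled repository
# archives into HDFS. HDFS_CMD can be overridden (e.g. HDFS_CMD=echo)
# for a dry run.
upload_to_hdfs() {
  src_dir="$1"
  dest_dir="$2"
  for archive in "$src_dir"/*.zip; do
    # Skip the literal glob when the directory is empty.
    [ -e "$archive" ] || continue
    ${HDFS_CMD:-hdfs dfs} -put -f "$archive" "$dest_dir/"
  done
}
```

Because each crawler writes independent archives, several instances of such a script can run in parallel against the same HDFS destination.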

Stage Two

Here the downloaded GitHub content is parsed via a Spark job and converted to Elasticsearch's bulk upload format.

Stage Three

Here an Elasticsearch server is set up with a sample configuration, and the upload script is run. Just these two simple steps are enough to set up the entire KodeBeagle backend locally. You will obviously need a more advanced setup to host this over the internet.
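For a local setup, the index settings can stay minimal. The fragment below is only an illustration of the shape such a configuration might take; the index name, field names, and mapping are assumptions, not KodeBeagle's actual schema, and the exact mapping syntax depends on your Elasticsearch version.

```json
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "repo":   { "type": "keyword" },
      "file":   { "type": "keyword" },
      "tokens": { "type": "text" }
    }
  }
}
```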

Final Stage

This is relevant only if you would like to open your KodeBeagle service to the World Wide Web, and there are good online guides for achieving this. The most important aspect of any public hosting is disallowing all HTTP verbs except GET. This sort of simplicity lets us eliminate the need for extra middleware that typically sits in front of Elasticsearch to provide security and high-availability features.
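One common way to enforce this is a reverse proxy in front of Elasticsearch. As a sketch, an nginx fragment along these lines would do it (the port and server layout are assumptions; note that nginx's `limit_except GET` implicitly allows HEAD as well):

```nginx
# Hypothetical reverse-proxy fragment: only GET (and HEAD) reach
# Elasticsearch; all other verbs are rejected.
server {
    listen 80;

    location / {
        limit_except GET {
            deny all;
        }
        proxy_pass http://localhost:9200;
    }
}
```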