In my previous post, I wrote about the different types of boosting and also introduced the concept of scoring. In that post I promised a series of posts on how to achieve custom scoring. There are many ways to achieve custom scoring, far too many to cover in a single blog post. In this post we will look at an often-used technique: a custom query used in conjunction with a custom score provider.

Prerequisites:

1. The reader is expected to be familiar with basic Lucene concepts such as documents, indexing and analysis, tokens, terms and querying.
2. The reader should, at minimum, be acquainted with the basic Lucene API objects such as IndexReader, IndexWriter, Query and Directory.

Code Samples

The code for this example can be found here.

Notes to set up and run the demo program

1. Download the source code. The code uses Lucene 4.6, the latest version at the time of writing.
2. Run mvn package, which generates the JAR boost-imaginea-demo-1.0.jar.
3. Place this JAR, along with the following JARs, in a folder, say “C:\Imaginea-Boost-Demo”:
         a. lucene-analyzers-common-4.6.0.jar
         b. lucene-core-4.6.0.jar
         c. lucene-queryparser-4.6.0.jar
         d. lucene-queries-4.6.0.jar
4. The program usage is as follows:

Param 1: Type of scoring:
customscorequery — Custom Score Query and Custom Score Provider Demo

Image copyright: Wikimedia and the person who posted it there.

I must admit I am obsessed with SUVs (affording one under the Indian tax regime is another matter, though) and like to sneak them into my technical blog pursuits as well. I will reuse my previous examples of SUVs boosted on white colour and origin. Please proceed to the technical content below once you are done ogling the white Scorpio above in all its beauty.

Sometimes documents need to be scored individually at query time. In the previous post we saw how to achieve query-time boosting by assigning a higher score to a specific data set in the query. But what if there is a lot of scoring logic to apply to the data you encounter while querying? It may not be possible to express all of that logic in the query itself. This is where a custom score query comes in: together with a custom score provider, it gives us a neat place to put our custom scoring logic. Better still, Lucene hands the score it has already calculated for each document over to your code. You can manipulate that score and return it as the final score, or pass your adjusted inputs on to the superclass and let it compute the final score from them.
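To make that hand-off concrete, here is a minimal plain-Java sketch with no Lucene dependency. `BaseScoreHook`, `FixedScoreHook` and `DoublingScoreHook` are hypothetical names; the base method mirrors the default behaviour of Lucene 4.x's CustomScoreProvider.customScore, which multiplies the sub-query score by the value-source score. A subclass can either return its own final score or adjust the inputs and defer to the superclass.

```java
// Hypothetical stand-in for Lucene's CustomScoreProvider; no Lucene classes are used.
class BaseScoreHook {
    // Default behaviour: combine the two incoming scores by multiplication.
    float customScore(int doc, float subQueryScore, float valSrcScore) {
        return subQueryScore * valSrcScore;
    }
}

// Option 1: ignore the incoming score and decide the final score yourself.
class FixedScoreHook extends BaseScoreHook {
    @Override
    float customScore(int doc, float subQueryScore, float valSrcScore) {
        // Hypothetical rule for illustration only: even doc ids score higher.
        return doc % 2 == 0 ? 3.0f : 1.0f;
    }
}

// Option 2: adjust the inputs, then let the superclass compute the final score.
class DoublingScoreHook extends BaseScoreHook {
    @Override
    float customScore(int doc, float subQueryScore, float valSrcScore) {
        return super.customScore(doc, subQueryScore * 2.0f, valSrcScore);
    }
}
```

The article's provider takes the first route: it returns its own number and discards the score Lucene computed.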

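Let’s write a custom score query now, shall we? The class you write should extend CustomScoreQuery, which in Lucene 4.6 lives in the org.apache.lucene.queries package (shipped in lucene-queries-4.6.0.jar).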

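import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.CustomScoreProvider;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.search.Query;

// Wraps any query so that our own provider decides each hit's final score.
public class ImagineaDemoCustomScoreQuery extends CustomScoreQuery {

    public ImagineaDemoCustomScoreQuery(Query subQuery) {
        super(subQuery);
    }

    // Lucene calls this once per index segment, passing that segment's context.
    @Override
    public CustomScoreProvider getCustomScoreProvider(final AtomicReaderContext atomicContext) {
        return new ImagineaDemoCustomScoreProvider(atomicContext);
    }

}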
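Lucene 4.x calls getCustomScoreProvider once per index segment, passing that segment's AtomicReaderContext, and the doc number later handed to customScore is relative to that segment. The plain-Java model below is a sketch of why a provider should hold its segment's reader in an instance field rather than a static one. `Segment` and `SegmentProvider` are hypothetical stand-ins, not Lucene classes.

```java
import java.util.List;

// Hypothetical stand-in for one index segment: stored "originOfItem" values by segment-local doc id.
final class Segment {
    final List<String> origins;

    Segment(List<String> origins) {
        this.origins = origins;
    }
}

// Hypothetical stand-in for a custom score provider: one instance per segment.
final class SegmentProvider {
    // Instance field: each provider keeps its own segment, even when several providers exist at once.
    private final Segment segment;

    SegmentProvider(Segment segment) {
        this.segment = segment;
    }

    // localDoc is relative to this provider's segment, as Lucene's doc numbers are per segment.
    float customScore(int localDoc) {
        return "India".equalsIgnoreCase(segment.origins.get(localDoc)) ? 3.0f : 1.0f;
    }
}
```

If `segment` were static, constructing the second provider would silently point the first one at the second segment's data.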

That’s it. We have just written a custom score query and overridden the method that hands out a custom score provider. Now let’s write our own custom score provider and fit the pieces together.

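import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.CustomScoreProvider;

public class ImagineaDemoCustomScoreProvider extends CustomScoreProvider {

    // One provider is created per segment, so the reader must be an instance field, not static.
    private final AtomicReader atomicReader;

    public ImagineaDemoCustomScoreProvider(AtomicReaderContext context) {
        super(context);
        atomicReader = context.reader();
    }

    // doc is relative to this segment; subQueryScore is the score Lucene computed for it.
    @Override
    public float customScore(int doc, float subQueryScore, float valSrcScore)
            throws IOException {
        // Load the stored fields of the matching document ("originOfItem" must be a stored field).
        Document docAtHand = atomicReader.document(doc);
        String[] itemOrigin = docAtHand.getValues("originOfItem");
        boolean originIndia = false;
        for (String origin : itemOrigin) {
            if (origin != null && origin.equalsIgnoreCase("India")) {
                originIndia = true;
                break;
            }
        }
        // Replace (rather than scale) Lucene's score: 3.0f for Indian SUVs, 1.0f for the rest.
        return originIndia ? 3.0f : 1.0f;
    }

}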

The custom score provider is in place too. In its overridden customScore method, each matching document is loaded and checked to see whether the SUV in question originates in India. Such documents get a score of 3.0f, while all others get a flat score of 1.0f. Note that the score Lucene calculated (subQueryScore) is discarded rather than scaled. Now that the custom score query and the custom score provider are in place, let’s write the code that uses them to provide customized scoring.
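The decision logic inside customScore does not depend on Lucene, so it can be pulled into a small helper and checked on its own. The sketch below is a hypothetical refactoring (`OriginScoring` is not part of the demo code). `replaceScore` follows the provider above by returning a flat 3.0f or 1.0f, while `scaleScore` shows an alternative that keeps Lucene's relevance score and multiplies it by the boost. That alternative is what you would get by passing the boost as the third argument to super.customScore, whose default implementation multiplies the two scores.

```java
// Hypothetical helper that isolates the scoring rule from the Lucene plumbing.
final class OriginScoring {
    static final float INDIA_BOOST = 3.0f;
    static final float DEFAULT_SCORE = 1.0f;

    private OriginScoring() {
    }

    // True if any stored "originOfItem" value equals "India", ignoring case; null entries never match.
    static boolean isIndianOrigin(String[] origins) {
        if (origins == null) {
            return false;
        }
        for (String origin : origins) {
            if ("India".equalsIgnoreCase(origin)) {
                return true;
            }
        }
        return false;
    }

    // Same rule as the provider above: the incoming Lucene score is discarded.
    static float replaceScore(String[] origins) {
        return isIndianOrigin(origins) ? INDIA_BOOST : DEFAULT_SCORE;
    }

    // Alternative: keep Lucene's relevance score and scale it by the boost.
    static float scaleScore(float subQueryScore, String[] origins) {
        return subQueryScore * (isIndianOrigin(origins) ? INDIA_BOOST : 1.0f);
    }
}
```

With `scaleScore`, Indian SUVs still rise above the rest, but their relevance to the query continues to order them among themselves.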

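// ramDirectory, analyzer and queryToRun are assumed to be set up earlier in the demo.
IndexReader idxReader = DirectoryReader.open(ramDirectory);
IndexSearcher idxSearcher = new IndexSearcher(idxReader);
// Parse the user query against the "itemType" field.
Query queryToSearch = new QueryParser(Version.LUCENE_46, "itemType", analyzer)
        .parse(queryToRun);

// Wrap the parsed query so that our provider sets each hit's final score.
CustomScoreQuery customQuery = new ImagineaDemoCustomScoreQuery(queryToSearch);

// Top 10 hits, ranked by the custom scores.
ScoreDoc[] hitsTop = idxSearcher.search(customQuery, 10).scoreDocs;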

Note that the constructor of our custom query accepts a query as a parameter. Internally, Lucene runs this sub-query, calculates each matching document's score, and then calls the customScore method of our custom score provider for that document, letting us manipulate the score.
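The flow described above can be modelled in plain Java. This is a deliberately simplified sketch, not Lucene's implementation: `Hit`, `ScoreFunction` and `RescoreModel.rescoreTopN` are hypothetical, and the model assumes each matching document's sub-query score is already known. It passes every (doc, score) pair through a custom scoring function, sorts by the new score and keeps the top n, breaking ties on the lower doc id, which is how Lucene's default top-docs collection orders equal scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a scored hit: a doc id and its score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

// Hypothetical stand-in for the provider's customScore hook.
interface ScoreFunction {
    float customScore(int doc, float subQueryScore);
}

final class RescoreModel {
    private RescoreModel() {
    }

    // Replace each hit's score with the custom score, then keep the best n.
    static List<Hit> rescoreTopN(List<Hit> matches, ScoreFunction fn, int n) {
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : matches) {
            rescored.add(new Hit(h.doc, fn.customScore(h.doc, h.score)));
        }
        // Highest score first; equal scores fall back to ascending doc id.
        rescored.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.doc));
        return rescored.subList(0, Math.min(n, rescored.size()));
    }
}
```

Since the demo's provider gives every Indian SUV exactly 3.0f, their order relative to one another comes from the tie-break alone rather than from relevance.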

Now let us run the example ourselves and see how this works on some sample data.

The command to be used is as follows:

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar;lucene-queries-4.6.0.jar com.imaginea.scoring.ScoringExamples customscorequery

The example code first runs a simple query without any custom scoring, and the documents all get a similar score of 0.8. With our custom score query, all the SUVs from India are boosted to the top with a score of 3.0f each. Simple, isn’t it?