Key Cloud Computing Statistics

 

In the infographic, we find some key cloud computing statistics that highlight the growth and adoption trends of this strategic technology.

Learn more about Cloud spending, Cloud adoption, Cloud data and Cloud impacts from this infographic.

http://www.imaginea.com/cloud/index.html

Cloud Computing

Thanks to http://meship.com/Blog/2014/03/10/key-cloud-computing-statistics/

 

Android App development – 5 points to consider

Read some cool tips and tricks in Android App development from our experts in the following infographic.

We worked with a social shopping company to create an Android app to accompany their already successful iPhone version. Working with their amazingly talented team, we guided them towards creating an app that genuinely adapted to the Android platform instead of being a mere copy of the iPhone version.

5 points for developing Android App

http://www.imaginea.com/mobile-app-android/

Using PaaS and going beyond

In this infographic, you will find details of a survey that suggests fast-growing demand for PaaS from organizations looking for faster development and deployment cycles.
Please go through the infographic in detail …

 

Using PaaS - results from a Survey

For more details on the solution, you can visit www.imaginea.com

 

TestLink setup on Windows:

Prerequisites:

 

Configuration Setup:

           

Old path:

$tlCfg->log_path = '/var/testlink/logs/';

$g_repositoryPath = '/var/testlink/upload_area/';

 

Updated path:

$tlCfg->log_path = 'D:/xampp/htdocs/testlink-1.9.7/logs/';

$g_repositoryPath = 'D:/xampp/htdocs/testlink-1.9.7/upload_area/';

 


Cloud Security – Part 2: “Security with multiple tenants sharing same infrastructure”

Welcome to Part 2 of the blog series 'Cloud Security'. In Part 1 of the series, we raised some important questions about security in the cloud. In this blog post, we would like to answer one of the most important questions we encounter when we talk about cloud security.

“How secure is my data when multiple tenants share the same infrastructure?”

This is a tricky question that keeps cropping up again and again. In this blog post, we put a few of the different components in perspective to see which areas need to be addressed. First, why do multiple tenants share the same infrastructure at all? Because organizations want price and performance advantages, and sharing infrastructure is how they get them.

Let us understand the term 'multi-tenancy'. It simply means that many tenants share the same resources, which turns out to be very efficient and scalable. In IaaS, tenants share infrastructure resources such as hardware, servers, and data storage devices. In SaaS, tenants consume the same application (for example, Salesforce.com), which means that the data of multiple tenants is likely stored in the same database and may even share the same tables. When it comes to security, the risks of multi-tenancy must be addressed at every layer.
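In the SaaS case, isolation inside shared tables is typically enforced by scoping every query with a tenant identifier. Here is a minimal sketch in plain Java (illustrative only; the class and field names are made up, not any vendor's actual implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TenantScopingSketch {
    // Each row in the shared table is {tenantId, data}. Every read is
    // filtered by the caller's tenant id, so co-tenants sharing the same
    // table never see each other's rows.
    static List<String[]> rowsFor(String tenantId, List<String[]> sharedTable) {
        List<String[]> visible = new ArrayList<>();
        for (String[] row : sharedTable) {
            if (row[0].equals(tenantId)) {
                visible.add(row);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
                new String[]{"acme", "invoice-1"},
                new String[]{"globex", "invoice-2"});
        // Only acme's row is visible to acme.
        System.out.println(rowsFor("acme", table).size());
    }
}
```

In real systems this filter lives in the data access layer or is enforced by the database itself (for example, row-level security), not in application loops, but the principle is the same.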

 

Shared Premises / Shared Data centers:

In a 'shared premises' context, a dedicated rack is the safest unit you can own. However, we need to ensure that the power cables are secure and that redundant power paths are available. We should likewise check that the network cables are secure and that redundant network paths exist. Also make sure the rack is always locked, and that cameras monitor it with playback available for a defined retention period.


 

In a 'shared racks' context, by contrast, there is always an element of risk because multiple tenants have access to the rack. Ideally, make it a managed service and grant access only to the service provider. Doing so ensures that untrained or semi-trained hands cannot affect the services of a co-tenant.

Shared Hardware:

In an instance where one cannot afford dedicated hardware, one has to settle for one of the following:

Of the options above, a separate VM is the next most secure element. To ensure the VM is secure, we first need to encrypt the VM image and enforce a BIOS password so that no one can tamper with the boot order. For additional security, we should also enforce a boot loader password.

 

Looking further at the shared hardware scenario, there are other elements we need to be careful about, such as disks, processors, memory, and hypervisors.

Let us look at each one of them in detail:

Disk

First, the disk should be encrypted with a key held by the administrator, and no user-end encryption should be enabled. This arrangement is common and exists to facilitate data recovery when the employee who owns sensitive or important data is unavailable. Another important security measure is to dispose of or reassign the disk only after due cleanup.

Processor

The processor should have a secure ring architecture so that the hypervisor operates in a higher-privilege zone than the VMs.

OS

When multiple tenants share the same infrastructure, the OS needs special scrutiny. We need to provide jails / chrooted environments for the different tenants, so that one tenant cannot see another's data.

Hypervisor

A hypervisor, or virtual machine monitor (VMM), is a piece of computer software, firmware, or hardware that creates and runs virtual machines. The hypervisor's main job is to map traffic from VMs to the underlying host hardware so that it can make its way through the data center and out to the Internet, and vice versa. Because the hypervisor intercepts all traffic between VMs and VM hosts, it is the natural place to introduce segmentation for the resources of IaaS tenants whose VMs might be housed within the same VM host or host cluster. We should not give VMs direct access to any devices.
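The segmentation idea can be sketched abstractly in a few lines of Java (an illustration with made-up names, not any hypervisor's real API): a hypervisor-level switch forwards traffic between two VMs only if they belong to the same tenant.

```java
import java.util.HashMap;
import java.util.Map;

public class SegmentationSketch {
    // Maps each VM name to its owning tenant.
    private final Map<String, String> vmTenant = new HashMap<>();

    void register(String vm, String tenant) {
        vmTenant.put(vm, tenant);
    }

    // A hypervisor-level check: allow VM-to-VM traffic only within a tenant,
    // so co-located VMs of different customers stay isolated.
    boolean allowTraffic(String srcVm, String dstVm) {
        String src = vmTenant.get(srcVm);
        String dst = vmTenant.get(dstVm);
        return src != null && src.equals(dst);
    }

    public static void main(String[] args) {
        SegmentationSketch vswitch = new SegmentationSketch();
        vswitch.register("vm-a1", "tenantA");
        vswitch.register("vm-a2", "tenantA");
        vswitch.register("vm-b1", "tenantB");
        System.out.println(vswitch.allowTraffic("vm-a1", "vm-a2")); // same tenant
        System.out.println(vswitch.allowTraffic("vm-a1", "vm-b1")); // cross-tenant
    }
}
```

Real hypervisors implement this with virtual switches, VLANs, and firewall policy rather than a lookup table, but the policy decision they make per packet is essentially this one.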

 

Another major security concern in virtualized infrastructure is that neighboring machines are owned by different customers. These machines can escalate the risk of many types of breaches, such as unauthorized connection monitoring, unmonitored application login attempts, and malware propagation.

VM segmentation and isolation is an absolute requirement for VMs containing regulated, compliance-sensitive data such as employee details and customer information. Most regulatory mandates, such as ISO 27001, SAS 70, the Payment Card Industry Data Security Standard (PCI DSS), SSAE 16, and the Health Insurance Portability and Accountability Act (HIPAA), require that access be limited to a business need to know, and that control policies be set in place to block unwarranted access.

Hope this post has answered the question. If you have any further queries, do not hesitate to contact us. You can also comment and share your observations on the topic here.

We are waiting…

Security in the Cloud – Part 1

Security in the Cloud 

We will publish a series of blog posts on Cloud Security. This is the first blog post in the series.

One of the 'security-as-a-service' providers surveyed 2,200 of its customers about cyber-attacks. The results are startling: they reveal that cyber-attacks on cloud environments are increasing at an alarming rate as more and more enterprises move their data to the public cloud. According to the report, as enterprises transfer their data and processing activities to the cloud, traditional on-premises cyber-attacks have followed them there. The report highlights a 14-percentage-point year-on-year increase in brute-force attacks, while vulnerability scans on cloud setups have risen by 17 percentage points year-on-year. More information about the report can be found here.

 

Enterprises and businesses have always been reluctant to move away from traditional IT to the cloud model. They have been skeptical about data security, and their doubt is genuine: is the data protected to the same level as in an on-premises setup?

 

This brings us to a very important point: who controls the data hosted in the cloud? Before the public cloud came into the picture, enterprise data was safe within the premises, and IT had complete control over it. With the cloud, data remains under organizational control, but it physically rests elsewhere and is managed by someone else.

 

Questions such as the following arise:

 

You  can share your answers / ideas / solutions in the comment box.

We are waiting for your response…

 

Top CIOs concerns about Cloud Deployment

In this infographic, you will learn about the top challenges of cloud deployment for CIOs across the globe.

You will also find out some interesting facts about Cloud.

Go on and view the infographic…

 

 

 

Leveraging DevOps Automation to Achieve Maximum Benefits!

What if… you could reduce standard infrastructure setup and configuration time from 8 hours to 15 minutes?

What if… you could achieve YoY engineering cost reduction of 37 percent?

What if… you could reduce time to market for new features by 20 percent?

Aren't these stats interesting? Here's how a San Francisco-based IT company leveraged DevOps automation to achieve them (Download Case Study)!

The company was looking to increase its market share by lowering subscription prices and offering more product features while maintaining profitability. This was quite a challenge, and developers and architects struggled to achieve the goal. They spent too much time migrating code between local, development, user acceptance testing, and production environments, and without much success: 20% of development time was lost to fine-tuning.

The company then approached the Imaginea Cloud Services team to look into the problem and suggest solutions that could shorten the development cycle and improve asset utilization. They also wanted to cut OpEx and CapEx.

Imaginea's solution helped developers achieve continuous integration and continuous delivery, automating and improving the software delivery process and thus accelerating the build, deploy, test, and release cycle.

Imaginea helped the company automate much of their development grunt work and significantly reduced operational rework. This IT team completely redefined its existing perception of “how much can be done in how much time”.

Interested in reading more about this? You can visit us to know more.

You can also download the complete case study here.

 

A staggering fact, but true! A Gartner survey estimates that downtime caused by incorrect manual configurations cost small and medium sized businesses $42,000 an hour, with figures in the millions for larger enterprises.

 

Lucene Custom Scoring – Custom Score Query and Custom Score Provider

In my previous post I wrote about the different types of boosting and also provided an introduction to the concept of scoring. I promised a series of posts on how to achieve custom scoring. There are many means to achieve custom scoring, too many, in fact, to cover them all in a single blog post. In this post we will take a look at the oft-used technique of using a custom score query in conjunction with a custom score provider.

Prerequisites:

1. It is expected that the reader is aware of the basic concepts of Lucene like Document, Indexing and Analyzing, tokens, terms and querying.
2. Reader should at minimum be acquainted with the use of the basic Lucene API objects like IndexReader, IndexWriter, Query, Directory etc.
Code Samples

The code for this example can be found here.

Notes to set up and run the demo program

1. Download the source code. The code uses the latest version of Lucene as of this writing: 4.6.
2. Run mvn package, which will generate the JAR boost-imaginea-demo-1.0.jar.
3. Place this JAR along with the following JARs in a folder, say "C:\Imaginea-Boost-Demo":
         a. lucene-analyzers-common-4.6.0.jar
         b. lucene-core-4.6.0.jar
         c. lucene-queryparser-4.6.0.jar
         d. lucene-queries-4.6.0.jar
4. The program usage is as below:

Param 1: Type of scoring:
customscorequery — Custom Score Query and Custom Score Provider Demo

Image copyright Wikimedia and the person who posted it there.

I must admit, I am obsessed with SUVs (affording them under the Indian tax regime is another thing though) and wish to sneak them into my technical blog pursuits as well. I will reuse my previous examples of SUVs boosted on white colour and origin. Please proceed to the technical content below once you are done ogling at the white Scorpio above in all its beauty.

It sometimes becomes necessary to score documents individually at query time. In the previous post we saw how to achieve query-time boosting by assigning a higher score to a specific data set in the query. But what if you have a lot of scoring logic to run on top of the data you encounter while querying? It may not be possible to express all of that logic in the query itself. This is where a custom score query comes in. In conjunction with a custom score provider, it offers a neat way to plug in our custom scoring logic. Better still, Lucene hands your custom code the scores it calculated itself, which you can further manipulate to produce a final score, or pass on to the superclass so it can factor your manipulated inputs into its own calculation.

Let’s write a custom score query now shall we? The class you write should extend CustomScoreQuery.

public class ImagineaDemoCustomScoreQuery extends CustomScoreQuery {

    public ImagineaDemoCustomScoreQuery(Query subQuery) {
        super(subQuery);
    }

    @Override
    public CustomScoreProvider getCustomScoreProvider(final AtomicReaderContext atomicContext) {
        return new ImagineaDemoCustomScoreProvider(atomicContext);
    }
}

That’s it. We have just written a custom score query and overridden a method which in turn hands out a custom score provider. Now, let’s write our own custom score provider and fit the pieces together.

public class ImagineaDemoCustomScoreProvider extends CustomScoreProvider {

    private static AtomicReader atomicReader;

    public ImagineaDemoCustomScoreProvider(AtomicReaderContext context) {
        super(context);
        atomicReader = context.reader();
    }

    @Override
    public float customScore(int doc, float subQueryScore, float valSrcScore)
            throws IOException {
        Document docAtHand = atomicReader.document(doc);
        String[] itemOrigin = docAtHand.getValues("originOfItem");
        boolean originIndia = false;
        for (int counter = 0; counter < itemOrigin.length; counter++) {
            if (itemOrigin[counter] != null
                    && itemOrigin[counter].equalsIgnoreCase("India")) {
                originIndia = true;
                break;
            }
        }
        if (originIndia) {
            return 3.0f;
        } else {
            return 1.0f;
        }
    }
}

The custom score provider is in place too. In the overridden customScore method, we access each individual document and check whether the SUV at hand has its origin in India. Such documents are given a score of 3.0f, while the others keep their original score of 1.0f. Now that the custom score query and the custom score provider are in place, let us write the code that employs them to provide customized scoring.

 

IndexReader idxReader = DirectoryReader.open(ramDirectory);
IndexSearcher idxSearcher = new IndexSearcher(idxReader);
Query queryToSearch = new QueryParser(Version.LUCENE_46, "itemType", analyzer)
        .parse(queryToRun);

CustomScoreQuery customQuery = new ImagineaDemoCustomScoreQuery(queryToSearch);

ScoreDoc[] hitsTop = idxSearcher.search(customQuery, 10).scoreDocs;

Note that the constructor of our custom query accepts a query as a parameter. Internally, Lucene runs that query, calculates the score, and for each document encountered calls the customScore method of our custom score provider, allowing us to manipulate the score.
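To make that callback sequence concrete, here is a plain-Java sketch of the flow (hypothetical names; this mimics, rather than uses, the Lucene API): the engine has a base score per matching document, then hands each document to our scorer for the final say, just as CustomScoreQuery calls customScore once per document.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

public class CustomScoreFlowSketch {
    // For each matched doc, the engine has a base score; a user-supplied
    // scorer gets the last word on it -- mirroring how CustomScoreQuery
    // invokes CustomScoreProvider.customScore per document.
    static Map<Integer, Float> rescore(Map<Integer, Float> baseScores,
                                       BiFunction<Integer, Float, Float> customScore) {
        Map<Integer, Float> rescored = new HashMap<>();
        for (Map.Entry<Integer, Float> e : baseScores.entrySet()) {
            rescored.put(e.getKey(), customScore.apply(e.getKey(), e.getValue()));
        }
        return rescored;
    }

    public static void main(String[] args) {
        Map<Integer, Float> base = new HashMap<>();
        base.put(1, 0.8f);
        base.put(2, 0.8f);
        // Doc 2 plays the role of an India-origin SUV and gets 3.0f,
        // like in the provider above; everything else gets 1.0f.
        Map<Integer, Float> rescored = rescore(base,
                (docId, score) -> docId == 2 ? 3.0f : 1.0f);
        System.out.println(rescored.get(2)); // 3.0
    }
}
```

The real provider additionally sees the sub-query and value-source scores, but the shape of the interaction is exactly this per-document callback.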

Now, let us run the example for ourselves and see some sample data of how this works.

The command to be used is as below,

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar;lucene-queries-4.6.0.jar com.imaginea.scoring.ScoringExamples customscorequery

 

In the example code, a plain query without any custom scoring gives every document the same score of 0.8. With our custom score query, all the SUVs from India are boosted to the top with an individual score of 3.0f each. Simple, isn't it?

Understanding Lucene Boosting – Part 1

Lucene is one of the most popular open-source search tools, offering high scalability, robustness, and versatility, and entire enterprise search servers/engines have been built around it; Solr and Elasticsearch come immediately to mind. Twitter, with its humongous data volumes and scalability requirements, has a search architecture built around a customized Lucene version. Lucene offers excellent real-world functionality like hit highlighting, spell checking, tokenizing and analyzing, but one of its most powerful and oft-used features is boosting. Most well-designed websites today offer some degree of search functionality, ranging from searching plain text content within the site to specific content hidden inside binary documents. Lucene, in conjunction with many other plugins/tools, plays a big part in this.

 

So what is boosting anyway?

 

One piece of real-world functionality mentioned in the paragraph above is the concept of boosting; you may have inadvertently experienced it in your searches on some website out there. A good example is Google itself, where some search results are boosted to the top (or to a place that catches your attention) because they come from a sponsored source. In essence, Google has boosted the sponsored result to prominence. A well-designed search interface provides the ability to adapt to user input and modify the search results accordingly, say when drilling down or choosing a first among equals. This is where boosting plays a big part, so it is important to understand it in its entirety. Given the complicated inner workings of how Lucene implements boosting, it is best understood in phases. With that in mind, I present the first of a 3-part series on what boosting is and how it works. A quick glance at what the series has to offer:

1. Part 1 (current article) – What is boosting? The different types of boosting and a quick look into some of the underlying concepts like scoring and norms.

2. Part 2 – A deeper look into scoring with special focus on customizing the scoring to our need. This part will be further broken up into individual pieces covering such topics as custom query implementation, custom score provider, scoring using expressions etc.

3. Part 3 – Lucene by default uses a combination of the TF/IDF vector space and Boolean models for scoring purposes. There are many other models apart from the default one used by Lucene, which will be looked into in this part. This part will complement and drill deeper into the areas covered in part two.

So let's get started with part 1, but first a quick look at the prerequisites and the code that comes along with this article.

 

Prerequisites:
1. It is expected that the reader is aware of the basic concepts of Lucene like Document, Indexing and Analyzing, tokens, terms and querying.
2. Reader should at minimum be acquainted with the use of the basic Lucene API objects like IndexReader, IndexWriter, Query, Directory etc.

 

Code samples:
Present here is the example code to be used in conjunction with this article to understand the topic at hand. The code demonstrates the 2 types of boosting in Lucene (Indexing and Query Time) and also prints out the various scoring information associated with the results. The code is in the form of a Maven project and uses a RAMDirectory for ease of use.

 

Notes to set up and run the demo program:
1. Download the source code. The code uses the latest version of Lucene as of this writing: 4.6.
2. Run mvn package, which will generate the JAR boost-imaginea-demo-1.0.jar.
3. Place this JAR along with the following JARs in a folder, say "C:\Imaginea-Boost-Demo":
         a. lucene-analyzers-common-4.6.0.jar
         b. lucene-core-4.6.0.jar
         c. lucene-queryparser-4.6.0.jar
         d. lucene-queries-4.6.0.jar
4. The program usage is as below:

 

Param 1: Type of boost:

index – Index Time Boosting

query – Query Time Boosting

both – Demo both Index and Query boosting

Param 2: Print scoring info: Either true or false

 

5. Example commands are as below,
C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples index false

 

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples query false

 

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples both false

First up in this article we need to pay a visit to the very important concepts of scoring and information retrieval models, an understanding of which will lay a good foundation for seeing how boosting works beneath the hood.

 

Scoring:

You have most certainly run into scoring in your routine Lucene search queries; after all, Lucene sorts query results by their "score" when you don't specify any sorting criteria. Every document carries a score indicating how relevant it is to the search query. Lucene assigns a score to every document brought up by the search after some number crunching (more on it later in this article) and presents the results sorted on this score, highest first. The scoring process begins the moment the query has been processed and submitted to the IndexSearcher object. The first set of documents is retrieved by means of a Boolean model (see information retrieval models below), which simply checks whether a document contains the term/token or not. Once this basic subset of documents has been retrieved from the index, the scoring process begins, assigning a score to each document in the subset. It is by manipulating the score attached to a given document that we can selectively elevate a subset of documents and boost them to the top of the search results.
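As a toy illustration of that last step (made-up scores, not Lucene's actual math), the final ranking is simply a sort of the matched documents by score, highest first:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoreSortSketch {
    // Sort matched docIds by their score, highest first -- the default
    // ordering Lucene applies when no explicit sort is given.
    static List<Integer> rank(Map<Integer, Float> scores) {
        List<Integer> docIds = new ArrayList<>(scores.keySet());
        docIds.sort((a, b) -> Float.compare(scores.get(b), scores.get(a)));
        return docIds;
    }

    public static void main(String[] args) {
        Map<Integer, Float> scores = new HashMap<>();
        scores.put(1, 0.8f);   // plain match
        scores.put(2, 3.0f);   // boosted match
        scores.put(3, 0.8f);
        System.out.println(rank(scores).get(0)); // the boosted doc comes out on top
    }
}
```

Boosting, then, is nothing more mysterious than influencing the numbers that feed this sort.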

 

Information Retrieval Model:    

To understand how the scoring process crunches numbers and assigns a score to each document, we need to bring in the concept of information retrieval models. The theoretical world of information retrieval is rife with models for finding information relevant to a search query. When Lucene started out, only the Boolean and vector space models were implemented. The vector space model is still Lucene's default, but the first subset of documents returned by a search, before scoring, always comes from the Boolean model, which checks for the presence of the search tokens in the documents. More recent versions of Lucene have added further information retrieval models. The complete list is as below:

1. Vector Space Model

2. Probabilistic relevance models. There are many flavours of these, such as DFR (Divergence From Randomness) and BM25.

3. Language Models.

As mentioned earlier, Lucene by default uses the vector space model. Lucene permits changing the model used for scoring via the Similarity class. We will look at changing and implementing custom scoring and information models in parts 2 and 3 respectively. For now, refer to this link to understand how Lucene implements the vector space model.
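For the curious, the core idea of the vector space model fits in a few lines of plain Java: documents and queries are term-weight vectors, and relevance is the cosine of the angle between them. This is a deliberate simplification; Lucene's practical scoring layers TF/IDF weighting, norms, and boosts on top of it.

```java
public class CosineSketch {
    // Cosine similarity between two term-weight vectors: documents whose
    // vectors point in nearly the same direction as the query score higher.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] query = {1, 1, 0};
        double[] docA  = {1, 1, 0};  // same direction as the query
        double[] docB  = {0, 0, 1};  // shares no terms with the query
        System.out.println(cosine(query, docA) > cosine(query, docB));
    }
}
```

A document sharing all of the query's terms scores 1.0; one sharing none scores 0.0, which is why term overlap drives relevance in this model.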

 

Different types of boosting:

Lucene supports two types of boosting:

1. Index time boosting.

2. Query time boosting.

Although index-time boosting earlier comprised both field boosting and boosting of the document as a whole, the latter was discarded in later versions due to its irrelevance and other associated issues. For now, index-time boosting is only possible at the field level. Let us delve into both types in greater depth.

 

Index Time Boosting:

You may have come across the following type in the Field.Index enum (deprecated as of version 4): "ANALYZED_NO_NORMS". Note the term "NORM", which is relevant in the context of index-time boosting. More on it shortly, but first a definition: index-time boosting is programmatically influencing the score of a field (and thus of the overall document) at indexing time. You are not actually setting the score itself; the score depends on many factors (for example, the tokens in the query itself contribute to it), so what you are setting is a number against a field that plays a part in the score calculated for a query. This is where the norm comes into play. The norm, short for normalized value, is that one number stored against the field which affects the document's score and thus its position in the search-result pecking order. Norm values are written into the index, which can also potentially improve query performance.
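As a back-of-the-envelope sketch (my own simplification, not Lucene's exact formula), you can think of the stored norm as a multiplier folded into the field's contribution to the score:

```java
public class NormSketch {
    // Simplified: fieldScore = rawQueryMatchScore * norm, where the norm
    // folds in the index-time boost (real Lucene also folds in a field
    // length factor and quantizes the norm to a single byte).
    static float fieldScore(float rawScore, float indexTimeBoost, float lengthNorm) {
        float norm = indexTimeBoost * lengthNorm;
        return rawScore * norm;
    }

    public static void main(String[] args) {
        float unboosted = fieldScore(0.8f, 1.0f, 1.0f);
        float boosted   = fieldScore(0.8f, 2.0f, 1.0f); // setBoost(2.0f) at index time
        System.out.println(boosted > unboosted); // the boosted field scores higher
    }
}
```

The point of the sketch: the boost is baked into the index as part of the norm, so at query time Lucene only has to multiply, which is why index-time boosting is cheap to apply but expensive to change (it requires re-indexing).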

When should I use index time boosting?

That depends on the business scenario at hand. Index-time boosting is useful when you know beforehand which subset of documents needs to be boosted. A real-world example: say you have a shopping site selling cars, with visitors from around the world, and search results for cars should be biased towards the country of the currently logged-in user. For instance, boost all products based in India for users whose current address country is India.

Let us go ahead and add some documents to the index,

public void populateIndex() {
    try {
        System.out.println(printBoostTypeInformation());
        indexWriter = new IndexWriter(ramDirectory, config);
        boostPerType("Lada Niva", "Brown", "2000000", "Russia", "SUV");
        boostPerType("Tata Aria", "Red", "1600000", "India", "SUV");
        boostPerType("Nissan Terrano", "Blue", "2000000", "Japan", "SUV");
        boostPerType("Mahindra XUV500", "Black", "1600000", "India", "SUV");
        boostPerType("Ford Ecosport", "White", "1000000", "USA", "SUV");
        boostPerType("Mahindra Thar", "White", "1200000", "India", "SUV");
        indexWriter.close();
    } catch (IOException | NullPointerException ex) {
        System.out.println("Something went wrong in this sample code -- "
                + ex.getLocalizedMessage());
    }
}

protected void boostPerType(String itemName, String itemColour,
        String itemPrice, String originOfItem, String itemType)
        throws IOException {
    Document docToAdd = new Document();
    docToAdd.add(new StringField("itemName", itemName, Field.Store.YES));
    docToAdd.add(new StringField("itemColour", itemColour, Field.Store.YES));
    docToAdd.add(new StringField("itemPrice", itemPrice, Field.Store.YES));
    docToAdd.add(new StringField("originOfItem", originOfItem, Field.Store.YES));

    TextField itemTypeField = new TextField("itemType", itemType, Field.Store.YES);
    docToAdd.add(itemTypeField);
    // Boost items made in India
    if ("India".equalsIgnoreCase(originOfItem)) {
        itemTypeField.setBoost(2.0f);
    }
    indexWriter.addDocument(docToAdd);
}

 


The cars have been added to the index in random order. Notice these particular lines of code in the method boostPerType:

// Boost items made in India
if ("India".equalsIgnoreCase(originOfItem)) {
    itemTypeField.setBoost(2.0f);
}

Here, the field "originOfItem" is specifically matched against the text "India", and a boost is assigned to the field. Let us write a query that does a simple term search for "suv" against the itemType field. The query would be as below,

itemType:suv

The code which performs the search is as below,

public void searchAndPrintResults() {
    try {
        IndexReader idxReader = DirectoryReader.open(ramDirectory);
        IndexSearcher idxSearcher = new IndexSearcher(idxReader);
        Query queryToSearch = new QueryParser(Version.LUCENE_46, "itemType",
                analyzer).parse(getQueryForSearch());

        System.out.println(queryToSearch);
        TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
        idxSearcher.search(queryToSearch, collector);
        ScoreDoc[] hitsTop = collector.topDocs().scoreDocs;
        System.out.println("Search produced " + hitsTop.length + " hits.");
        System.out.println("----------");
        for (int i = 0; i < hitsTop.length; ++i) {
            int docId = hitsTop[i].doc;
            Document docAtHand = idxSearcher.doc(docId);
            System.out.println(docAtHand.get("itemName") + "\t"
                    + docAtHand.get("originOfItem") + "\t"
                    + docAtHand.get("itemColour") + "\t"
                    + docAtHand.get("itemPrice") + "\t"
                    + docAtHand.get("itemType"));

            if (printExplanation) {
                Explanation explanation = idxSearcher.explain(queryToSearch,
                        hitsTop[i].doc);
                System.out.println("----------");
                System.out.println(explanation.toString());
                System.out.println("----------");
            }
        }
    } catch (IOException | ParseException ex) {
        System.out.println("Something went wrong in this sample code -- "
                + ex.getLocalizedMessage());
    } finally {
        ramDirectory.close();
    }
}

 
Let us take a look at the results; you can also try this in the demo code by running the following command,

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples index false

The output would be as below, notice that all the documents with the country India have been boosted.

We can actually take a look at how Lucene has calculated the score for our query result documents. It will be seen that the boosted cars with origin India have a score much higher than the others. You can also try this in the demo code by running the following command,

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples index true

 
It is seen that the boosted India-origin cars have a higher score than the ones not boosted: 1.69 > 0.8.

Query Time Boosting

We noted that in index-time boosting, a normalized value is assigned to a field and later used in score calculation at query time. In query-time boosting, the boost value is specified directly at query time. You can do this either through the setBoost method of the various query objects or directly in the query. Let us look at an example using the same data set of cars, with a slight change in requirement: it is now required that cars of white colour be boosted to the top of the search results. Let us write a query for this,

itemColour:white^2 OR itemType:suv

 


Note the text "^2" that immediately follows the term itemColour:white. Here we have specified that documents whose colour is white be assigned a higher rank and thus be boosted. Let us take a look at the results; you can also try this in the demo code by running the following command,

C:\Imaginea-Boost-Demo>java -cp boost-imaginea-demo-1.0.jar;lucene-analyzers-common-4.6.0.jar;lucene-core-4.6.0.jar;lucene-queryparser-4.6.0.jar com.imaginea.boost.BoostExamples query false
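Conceptually, the ^2 multiplies that clause's contribution when the scores of the OR'ed clauses are combined. A rough plain-Java sketch of the effect (a simplification that ignores Lucene's coordination and normalization factors):

```java
public class QueryBoostSketch {
    // Score an OR query as the sum of per-clause scores, with the
    // query-time boost multiplying its clause -- a rough sketch only.
    static float orScore(float colourClauseScore, float colourBoost,
                         float typeClauseScore) {
        return colourClauseScore * colourBoost + typeClauseScore;
    }

    public static void main(String[] args) {
        // "itemColour:white^2 OR itemType:suv" with illustrative clause scores:
        float whiteSuv    = orScore(0.4f, 2.0f, 0.4f); // matches both clauses
        float nonWhiteSuv = orScore(0.0f, 2.0f, 0.4f); // matches itemType only
        System.out.println(whiteSuv > nonWhiteSuv); // white SUVs rank higher
    }
}
```

Because the boost is applied at query time, changing it needs no re-indexing, which is exactly what makes it suitable for user-driven or per-request ranking tweaks.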

 
When should I use query time boosting?

When you require the search results to be driven by user input, or when you need to bring in specific boosts you could not know about at index time: for example, you look up an external service for sponsored cars and boost those in particular; since you did not have this information beforehand, you were unable to boost at index time.

Using the explain method of the searcher to understand what happens under the hood

In the example code above you would have noticed the following lines,

Explanation explanation = idxSearcher.explain(queryToSearch, hitsTop[i].doc);
System.out.println("----------");
System.out.println(explanation.toString());

 
The explain method of the IndexSearcher object is a powerful tool to understand how Lucene has calculated the score and will be helpful in debugging as well.

 

—————————————————————————————————————-
Hope this part one was useful in understanding the basics of boosting. More on boosting coming up in parts 2 and 3. Please feel free to leave comments with feedback or corrections.