Saturday, January 4, 2014

Testing Elasticsearch


    Real-time analytics on real-time data is increasingly used in today's e-commerce, and Elasticsearch is a great tool for it. Its distributed architecture, high availability and full-text search capability have won it a lot of followers. Not only does it have a long list of partners (27 and growing), it also has many thriving communities.

    For a customer who runs Elasticsearch on cloud infrastructure, it's important to know the capacity of the reserved instances. Knowing it helps the customer reserve fewer instances and still meet his/her business needs. In this blog post, we are going to show how easy it is to run some performance tests on Elasticsearch with the NetGend platform. I would like to thank Joel Abrahamsson for his excellent tutorial on the basic Elasticsearch commands for inserting/viewing records and running queries.

    We are going to run some simple tests on Elasticsearch running on my Ubuntu 12.04 server with an Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz. First, let's insert a comprehensive DVD list into the database so that we can run some searches on it. The DVD list (dvdlist.csv) is pipe-delimited, in the following format:

 DVD_Title|Studio|Released|Status|Sound|Versions|Price|Rating|Year|Genre|Aspect|UPC|DVD_ReleaseDate|ID|Timestamp
 !!!! Beat, Vol. 1: Shows 01 - 05|Bear Family||Discontinued|2.0|4:3|45.98|NR|UNK|Music|1.33:1|4000127201263|2005-05-10 00:00:00|61689|2012-12-24 00:00:00
 !!!! Beat, Vol. 2: Shows 06 - 09|Bear Family||Discontinued|2.0|4:3|45.98|NR|UNK|Music|1.33:1|4000127201270|2005-05-10 00:00:00|61690|2012-12-24 00:00:00
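For readers who want to check which column is which, here is a minimal Python sketch (outside NetGend, for illustration only) that splits one record on `|` and pairs each value with its header name. `HEADER` and `parse_record` are hypothetical names, and the sketch assumes each record occupies a single physical line. It shows that DVD_Title, Status and Price sit at positions 0, 3 and 6, the indices the insertion script reads.

```python
# Column names from the header line of dvdlist.csv (pipe-delimited).
HEADER = (
    "DVD_Title|Studio|Released|Status|Sound|Versions|Price|Rating|Year|"
    "Genre|Aspect|UPC|DVD_ReleaseDate|ID|Timestamp"
).split("|")


def parse_record(line: str) -> dict:
    """Split one pipe-delimited DVD record into a {column: value} dict."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} fields, got {len(fields)}")
    return dict(zip(HEADER, fields))


sample = (
    "!!!! Beat, Vol. 1: Shows 01 - 05|Bear Family||Discontinued|2.0|4:3|"
    "45.98|NR|UNK|Music|1.33:1|4000127201263|2005-05-10 00:00:00|61689|"
    "2012-12-24 00:00:00"
)
rec = parse_record(sample)
# Print the position and value of the three columns the insert script uses.
for name in ("DVD_Title", "Status", "Price"):
    print(HEADER.index(name), name, "=", rec[name])
```

Note that the Released column is empty in these sample rows, which the split keeps as an empty string rather than dropping.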
    Here is the NetGend script that does insertions.
 // Insert the DVD list into the database
 function userInit() {
      var db = fromCSV("dvdlist.csv", "|");  // read the pipe-delimited file
      rec = getNext(db);  // skip the first record, which is the header
      var gId = 1;        // next document ID
 }
 function VUSER() {
      id = gId ++;
      rec = getNext(db);
      // JSON body from a template: rec[0] = DVD_Title, rec[6] = Price, rec[3] = Status
      http.POSTData = q|{
 "title": "${rec[0]}",
 "price": ${rec[6]},
 "status": "${rec[3]}"
 }|;
      http.method = "PUT";
      // index the document under /<index>/<type>/<id>
      action(http,"http://localhost:9200/tests/test/${id}");
      res = fromJson(http.replyBody);
      // a successful insert replies with "ok": true
      if (res.ok != true) {
            println("failed to insert rec ${id} ${res.error}");
      }
      //exit();
 }
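For comparison, the same insert can be sketched with Python's standard library. This is a minimal sketch, not the NetGend implementation: `build_insert_request` and `insert_ok` are hypothetical helpers, the request is only built here (pass it to `urllib.request.urlopen` to send it), and the `/tests/test/<id>` path and `"ok"` reply field follow the 2014-era Elasticsearch API that the script targets. Newer releases removed mapping types and return a different reply shape, so check the documentation for your version.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # same host/port as the NetGend script


def build_insert_request(doc_id: int, rec: list) -> urllib.request.Request:
    """Build (without sending) the PUT that indexes one DVD record.

    rec holds the fields of one dvdlist.csv line:
    rec[0] = DVD_Title, rec[3] = Status, rec[6] = Price.
    """
    body = {
        "title": rec[0],
        # Price is sent as a number, like the unquoted ${rec[6]} in the template.
        # Assumption: an empty Price becomes null instead of producing invalid JSON.
        "price": float(rec[6]) if rec[6] else None,
        "status": rec[3],
    }
    return urllib.request.Request(
        f"{ES_URL}/tests/test/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def insert_ok(reply_body: str) -> bool:
    """Same check as the script: the reply must contain "ok": true."""
    return json.loads(reply_body).get("ok") is True
```

For example, `urllib.request.urlopen(build_insert_request(1, rec))` would send one record, where `rec` is the list of fields from one line of the file.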
     We were able to achieve 2500+ inserts/second with about 80% CPU usage on the Elasticsearch process.

     Note that when constructing a JSON message, we don't have to escape the many double quotes in the template, as we would on some other test platforms, where this can be quite tedious. In the last example of this blog post, we will see an even simpler way of constructing a JSON message.
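To see why the quoting question matters, here is a small Python illustration with a made-up record. A JSON string assembled by concatenation needs every structural quote escaped in the source, and it still breaks when a value itself contains a quote; serializing a dictionary with `json.dumps` escapes values automatically. `is_valid_json` is a hypothetical helper, and nothing here says how NetGend handles such values.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical record values; the title contains its own double quotes.
title, price, status = 'The "Quoted" Cut', "19.99", "Out"

# Hand-built template: each JSON quote must be escaped as \" in the source,
# and the quotes inside the title still end up unescaped in the output.
manual = "{\"title\": \"" + title + "\", \"price\": " + price + ", \"status\": \"" + status + "\"}"

# Serializing a dict lets the library handle all quoting and escaping.
generated = json.dumps({"title": title, "price": float(price), "status": status})

print(is_valid_json(manual), is_valid_json(generated))
```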

     Now let's make sure the records exist.
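 // Check that every inserted record can be retrieved by ID
 function userInit() {
      var gId = 1;   // first document ID to look up
 }
 function VUSER() {
      id = gId ++;
      // fetch the document by ID
      action(http,"http://localhost:9200/tests/test/${id}");
      res = fromJson(http.replyBody);
      // the reply includes "exists": true when the document is found
      if (res.exists != true) {
            println("record ${id} does not exist");
      }
 }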
    Looking up records by ID is quite light on CPU.
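The existence check is easy to reproduce offline. The minimal Python sketch below works on reply bodies that have already been fetched; `record_exists` and `missing_ids` are hypothetical helpers. The script tests the `exists` flag, which is what Elasticsearch returned at the time; from version 1.0 the same flag is called `found`, so the sketch accepts either.

```python
import json


def record_exists(reply_body: str) -> bool:
    """True if a GET /tests/test/<id> reply says the document is present.

    Checks "exists" (pre-1.0 replies) and falls back to "found" (1.0 and later).
    """
    reply = json.loads(reply_body)
    return reply.get("exists", reply.get("found")) is True


def missing_ids(replies: dict) -> list:
    """Return, in ascending order, the IDs whose reply says the record is absent."""
    return sorted(doc_id for doc_id, body in replies.items() if not record_exists(body))


replies = {
    1: '{"_id": "1", "exists": true}',
    2: '{"_id": "2", "exists": false}',
    3: '{"_id": "3", "found": true}',
}
print(missing_ids(replies))  # [2]
```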

    Finally, let's do some simple searches. Our search terms come from a comprehensive list of 2000+ nouns, kept in a simple file with one noun per line.
function userInit() {  
 var db = fromCSV("nounlist.txt", "\n");  // one noun per line
}  
function VUSER() {  
 rec = getNext(db); //each record corresponds to a row in the file, so it's an array
 // build {"query": {"query_string": {"query": "<noun>"}}}
 q.query.query_string.query = rec[0];
 http.POSTData = toJson(q);
 action(http,"http://localhost:9200/tests/_search");
 res = fromJson(http.replyBody);
 // "took" is the server-side search time in milliseconds; report slow queries
 if (res.took > 15) {
  println("query for ${rec[0]} took ${res.took} ms");
 }
}  
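Here is a Python sketch of the same search logic, again without the HTTP layer: it builds the same nested `query_string` structure that the script assigns to `q`, and it flags replies whose `took` value exceeds 15 ms. `load_terms`, `search_body` and `slow_queries` are hypothetical helper names; each body would be POSTed to `http://localhost:9200/tests/_search`.

```python
import json

SLOW_MS = 15  # same threshold as the NetGend script


def load_terms(lines):
    """One search term per non-blank line, e.g. from open("nounlist.txt")."""
    return [line.strip() for line in lines if line.strip()]


def search_body(term: str) -> str:
    """Body for POST /tests/_search: {"query": {"query_string": {"query": term}}}."""
    return json.dumps({"query": {"query_string": {"query": term}}})


def slow_queries(results, threshold_ms: int = SLOW_MS):
    """Yield (term, took) for replies whose "took" (server time in ms) exceeds the threshold.

    results is an iterable of (term, reply_body) pairs.
    """
    for term, body in results:
        took = json.loads(body)["took"]
        if took > threshold_ms:
            yield term, took
```

As in the script, a reply with `took` exactly equal to 15 is not reported, since the comparison is strictly greater than.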
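    When we send search requests at a rate of 1000/second, CPU usage on the Elasticsearch process is 134%, that is, more than one full core. So a search can be considerably heavier than an insert: 1000 searches/second use more CPU than the 2500+ inserts/second in the first test.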
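A rough way to compare the two workloads is to normalize CPU usage by request rate. The Python arithmetic below uses the figures reported in this post (about 80% at 2500+ inserts/second, 134% at 1000 searches/second) and assumes CPU usage scales roughly linearly with request rate, which was not measured here; `cpu_per_1000_ops` is a hypothetical helper.

```python
def cpu_per_1000_ops(cpu_percent: float, ops_per_sec: float) -> float:
    """CPU percentage attributable to 1000 operations/second, assuming linear scaling."""
    return cpu_percent * 1000 / ops_per_sec


insert_cost = cpu_per_1000_ops(80, 2500)   # figures from the insert test
search_cost = cpu_per_1000_ops(134, 1000)  # figures from the search test
print(f"inserts:  {insert_cost:.0f}% CPU per 1000 req/s")
print(f"searches: {search_cost:.0f}% CPU per 1000 req/s")
print(f"ratio:    {search_cost / insert_cost:.1f}x")
```

By this estimate a search costs roughly four times as much CPU as an insert. Because the insert rate was reported as "2500+", the true ratio may be somewhat higher.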

    This blog post shows some simple examples of testing an Elasticsearch system. The NetGend platform greatly simplifies processing JSON messages, which is essential for testing Elasticsearch.

    We will be using it for some real-world testing, so stay tuned.
