Friday, December 6, 2013

Performance testing on an online library



     Recently we tested an online library. The library serves many technical documents, each in the form of a book. Users can search for and read the books entirely from their browsers. We needed to find the capacity of this library, defined as the number of users who can use it concurrently and still have a positive experience.

   The development team had gone out of their way to make the user experience as pleasant as possible. For example, when a user flips a page, the next page appears immediately, because the system pre-loads it while the user is reading the current page. Assuming a user spends an average of 20 seconds reading a page, a pre-load that finishes within 20 seconds is ready before the user flips. So we want to find out how many users the library can serve while most page loading times are still within 20 seconds.

   In summary, here is a simple test scenario:
  • The user logs in to the library.
  • The user picks a book.
  • The user goes to a random page in the book and reads all the way to the last page.
     The login step is relatively simple, and so is the step that gets the list of books. The hard part is emulating a user flipping through the pages. The web page representing a book contains a Flash application. According to the browser's developer tools, when the user flips to the next page, the Flash application sends an HTTP request like the following:
 .../<bookId>/<prefix><pageNumber>.swf  

    It's easy to find the list of book IDs. However, the following are harder to find:

  • "prefix" (it varies with the bookId)
  • max pages of the book - used to generate a random page in the book
    Luckily, the following blob of text in the HTML page of the book has the information we need, all of it in the first argument of setPreviewSource.
 function(){setPreviewSource('/123456789/410/{ef[*,0].swf,96}&DownloadEnabled=false&PrintEnabled=false&PrintOnePageEnabled=false','/bitstream/123456789/410/3/ef.js','/bitstream/123456789/410/2/6515695376  

     With this observation, we can easily write a script to extract these fields:
  • URL path (the /123456789/410 part in the example above)
  • prefix (ef in the example)
  • max number of pages (96 in the example)
     We are going to use the regular expression /setPreviewSource\(\'([^\{]+)\{([^\[]+).*swf\,(\d+)/ to do the extraction.
 //script A
 function VUSER() {  
      action(http,"http://www.example.com/"); //this step will get a sessionID in cookie   
      a.login_email = "jsmith@example.com";   
      a.login_password = "abc123";   
      http.POSTData = combineHttpParam(a);   
      action(http,"http://www.example.com/password-login");   

      for (id = 1; id < 500; id ++) {  
           action(http, "http://www.example.com/handle/123456789/${id}");  
           if (match(http.replyBody, /setPreviewSource\(\'([^\{]+)\{([^\[]+).*swf\,(\d+)/)) {  
                println("${g1},${g2},${g3}");  
           }  
      }  
      action(http,"http://www.example.com/logout");   
 }  
     Note that the three variables g1, g2 and g3 hold the three fields extracted by the regular expression. We run the script with 1 VUser and collect the output of println to produce a CSV file (say, "books.csv"), which we will use in the next step. The CSV file looks like:
 ...  
 /123456789/410/,ef,96  
 ....  
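
     The regular expression can also be checked outside the test platform. The following is a minimal Python sketch, not part of the original scripts. The function name extract_book_fields is made up for illustration, and the sample string is just the beginning of the blob quoted earlier.

```python
import re

# Same pattern as in script A, written as a Python raw string.
PREVIEW_RE = re.compile(r"setPreviewSource\('([^{]+)\{([^\[]+).*swf,(\d+)")


def extract_book_fields(html):
    """Return (url_path, prefix, max_pages), or None if the page has no preview call."""
    m = PREVIEW_RE.search(html)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


# Beginning of the blob shown in the article (the rest is not needed for the match).
sample = (
    "function(){setPreviewSource('/123456789/410/{ef[*,0].swf,96}"
    "&DownloadEnabled=false&PrintEnabled=false&PrintOnePageEnabled=false',"
)

print(extract_book_fields(sample))
```

     As written, the first group captures everything up to the opening brace, so the URL path keeps its trailing slash.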

    On a side note, this shows that our platform can be used not only for performance testing but potentially also for building handy tools :-)

    Now we are going to use the CSV file in the implementation of the test scenario:

  //script B
 function userInit() {  
      var db = fromCSV("books.csv");  
 }  
 function VUSER() {  
      action(http,"http://www.example.com/"); //this step will get a sessionID in cookie   
      a.login_email = toUrl("jsmith@example.com");   
      a.login_password = "abc123";   
      http.POSTData = combineHttpParam(a);   
      action(http,"http://www.example.com/password-login");   

      a = getNext(db);  

      for(num = randNumber(1, a[2]); num <= a[2]; num ++ ) {
           action(http,"http://www.example.com/handle/${a[0]}${a[1]}${num}.swf");  
           if (http.totalRespTime > 20000) {  // log pages slower than 20 seconds
                println("${http.totalRespTime},${http.url}");  
           }
      }
 }  
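     Here the function call getNext(db) grabs the next row of the CSV file. Each row consists of three elements, URL_path, prefix and max_pages, denoted by a[0], a[1] and a[2] respectively. So randNumber(1, a[2]) simply picks a random page in the book, and the loop requests every page from there to the last one. Any page that takes longer than 20 seconds (20000 ms) to load is printed together with its URL.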
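
     To make the page loop concrete, here is a small Python sketch of the same idea. It is only an illustration: pages_to_read is a made-up name, and it assumes randNumber includes both endpoints, which the original script does not state.

```python
import random


def pages_to_read(max_pages, rng=random):
    """Page numbers from a random start page through the last page, inclusive."""
    start = rng.randint(1, max_pages)  # randint includes both endpoints
    return list(range(start, max_pages + 1))


# 96 is the page count of the sample book in books.csv.
pages = pages_to_read(96, random.Random(2013))
print(f"starts at page {pages[0]}, reads {len(pages)} pages, ends at page {pages[-1]}")
```

     Under that assumption the start page is uniform over the book, so a virtual user requests (max_pages + 1) / 2 pages per book on average, which is 48.5 pages for the 96-page sample.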
     We ran script B multiple times, each time with a different number of VUsers, and found the largest number of VUsers at which the response times were still acceptable. The development team was quite happy with how simple the scripts are. They had tried JMeter, which was not as flexible.
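
     One way to turn those runs into a single capacity number is sketched below in Python. This is an assumption about the bookkeeping, not the procedure we actually used: the function names are hypothetical, "most" is taken to mean 95% of page loads, and the response times in the example are made up purely to show the call.

```python
THRESHOLD_MS = 20000  # same 20-second limit as in script B


def fraction_within(times_ms, threshold_ms=THRESHOLD_MS):
    """Share of page loads that finished within the threshold."""
    if not times_ms:
        return 0.0
    return sum(t <= threshold_ms for t in times_ms) / len(times_ms)


def capacity(runs, min_fraction=0.95):
    """Largest VUser count reached before the first run that misses the target.

    `runs` maps a VUser count to the response times (ms) observed in that run.
    """
    best = None
    for vusers in sorted(runs):
        if fraction_within(runs[vusers]) < min_fraction:
            break  # stop at the first load level that is no longer acceptable
        best = vusers
    return best


# Made-up numbers, only to show the call; these are not measurements from the test.
example_runs = {
    50: [1200, 3400, 900, 2100],
    100: [5000, 8000, 19000, 12000],
    200: [15000, 26000, 31000, 18000],
}
print(capacity(example_runs))
```

     Stopping at the first failing load level, rather than taking the largest passing one, avoids reporting a capacity above a level that already failed.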

    Developing an online library can be challenging, but performance testing it doesn't have to be. What do you think?
