WSO2 published results for Round 6.5 on the 30th of January.

1. The Amazon EC2 AMI was not published / The results cannot be repeated

  • The Introduction of the article reads: "The performance benchmarks are performed in Amazon EC2 as done in the ESB performance Round 6, so that they can be independently verified and repeated" 

  • Publishing the actual image publicly demonstrates transparency, and allows for third-party validation and verification of the test results, and detailed analysis of the execution log files.

  • No reasons were given by WSO2 as to why the actual AMI was not published.

2. Results of two independent execution runs were presented as if obtained on one and the same node

  • The Round 6.5 Conclusion reads: "This article presents a performance study comparing WSO2 ESB 4.6.0, WSO2 ESB 4.5.1, Mule 3.3.0, Talend-SE 5.1.1, and UltraESB 1.7.1"

  • When preparing the results analysis, WSO2 combined separately and independently obtained results from two different executions/sources into one comparison table.

  • The results for Mule CE, Talend ESB and UltraESB were extracted from the results published in Round 6 by AdroitLogic. Those tests were executed on the 3rd of August 2012, on an EC2 node in the Amazon "us-east" zone.

  • The results for the WSO2 ESB were extracted from results obtained on an EC2 node in an unknown zone, on an unknown date. In addition, the WSO2 team used a different load generator and back-end mock service, and altered the request payloads and scripts.


3. Failed to identify the failure of all XSLT test cases

  • The configurations published by WSO2 (also archived at https://bitbucket.org/adroitlogic/esbperformance/src/f3a9247f52e8/wso2-esb/wso2-resources-6.5?at=default) along with the WSO2 ESB 4.6.0 release (MD5 hash 3738e26e330de53e9f87f690dc2cb99d) fail the XSLT test case, as reported in Round 7. However, the engineers conducting the performance test were unable to detect this, and reported the figures as valid results, stating "While the XSLTProxy remains significantly less performant than the equivalent UltraESB test,..".

  • The benchmark-client-wso2.zip published by WSO2 along with Round 6.5 includes a log file named "load_full.txt" which contains clear proof that the XSLT test case failed when the tests were executed by the WSO2 engineers as well. The WSO2 ESB returned a response of only 363 bytes for request payload sizes of 500 bytes, 1K, 5K and 10K, whereas under normal circumstances the response should be of the same size as the request. This was not noticed by the WSO2 performance test engineers.

benchmark-client-wso2.zip [load_full.txt]

Begin performance test...
Wed Jan 16 05:39:31 UTC 2013
...
XSLT 500B
...
Document Path:          http://localhost:8280/services/XSLTProxy
Document Length:        363 bytes
...
XSLT 1K
...
Document Length:        363 bytes
...
XSLT 5K
...
Document Length:        363 bytes
...
XSLT 10K
...
Document Length:        363 bytes
...
  • As per the same load_full.txt found within the benchmark-client-wso2.zip published by WSO2 along with Round 6.5, the new enhanced XSLT test introduced against the WSO2 ESB also failed, with a constant response of 258 bytes for all request payload sizes. This too went unnoticed by the WSO2 performance test engineers.

Failure of the Enhanced XSLT Test cases

...
XSLTEnhancedProxy 500B
...
Document Path:            http://localhost:8280/services/XSLTEnhancedProxy
Document Length:        258 bytes
...
XSLTEnhancedProxy 1K
...
Document Length:        258 bytes
...
XSLTEnhancedProxy 5K
...
Document Length:        258 bytes
...
XSLTEnhancedProxy 10K
...
Document Length:        258 bytes
...
  • This information shows that, having failed to detect this failure, the WSO2 performance test engineers published the higher performance numbers obtained for static 363-byte responses as valid high performance numbers that were better than those of the other ESBs.
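
A failure of this kind is trivially detectable by machine: any test whose "Document Length" stays constant while the request payload grows is almost certainly returning a fault rather than the transformed payload. A minimal sketch of such a check over ApacheBench-style log output follows; the function name and regular expressions are illustrative, and not part of the published benchmark scripts.

```python
import re

def find_constant_responses(log_text):
    """Scan an ApacheBench-style log and flag test cases whose
    'Document Length' is identical across all payload sizes -- a
    strong hint that an error page, not the transformed payload,
    is being returned."""
    lengths_by_test = {}
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        # Lines like "XSLT 500B" or "XSLTEnhancedProxy 10K" mark a new run
        m = re.match(r"^(\S+)\s+(\d+[BK])$", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Document Length:\s+(\d+) bytes", line)
        if m and current:
            lengths_by_test.setdefault(current, set()).add(int(m.group(1)))
    # A healthy transform test returns payload-sized responses, so the
    # set of observed lengths should vary; a singleton set is suspicious.
    return [name for name, lengths in lengths_by_test.items()
            if len(lengths) == 1]

log = """
XSLT 500B
Document Length:        363 bytes
XSLT 1K
Document Length:        363 bytes
XSLT 5K
Document Length:        363 bytes
"""
print(find_constant_responses(log))  # -> ['XSLT']
```

Run against the full load_full.txt excerpts above, a check like this would have flagged both the XSLT and the enhanced XSLT test cases immediately.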

4. Failed to identify the defect causing response corruption

  • The WSO2 ESB 4.6.0 release (MD5 hash 3738e26e330de53e9f87f690dc2cb99d) as well as the WSO2 ESB 4.7.0 release (MD5 hash 8dc1cea6e99ed2ef1a2bb75c92097320) corrupts the response payload for at least the CBR SOAP Body and CBR SOAP Header test cases at payload size 100K, as reported in Round 7. Detailed analysis showed that this defect occurs for any payload over 16,384 bytes processed by these WSO2 ESB versions. However, neither the engineers conducting the performance test nor the quality assurance engineers of WSO2 have been able to detect this defect to date, and it remains in the current Milestone 4 build of the yet-to-be-released 4.8.0 version, as of the 9th of October 2013.
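
A defect of this kind could be caught before publishing results with a simple byte-for-byte integrity sweep across growing payload sizes. The sketch below is illustrative: `first_corrupted_size` and the `flawed_proxy` stand-in are hypothetical, the latter merely mimicking the reported behaviour of content beyond a 16,384-byte boundary being mangled; in a real run, `send` would wrap an HTTP POST to the proxy under test.

```python
def first_corrupted_size(send, sizes):
    """Return the first payload size for which the echoed response does
    not match the request byte-for-byte, or None if all sizes pass.
    `send` is any callable taking request bytes and returning response
    bytes -- e.g. a thin wrapper around an HTTP POST to the ESB proxy."""
    for n in sizes:
        payload = bytes(i % 251 for i in range(n))  # deterministic, non-trivial
        if send(payload) != payload:
            return n
    return None

# Hypothetical transport mimicking the reported defect: bytes beyond a
# 16,384-byte internal buffer come back corrupted (zeroed here).
def flawed_proxy(data, limit=16384):
    return data if len(data) <= limit else data[:limit] + b"\x00" * (len(data) - limit)

print(first_corrupted_size(flawed_proxy, [1024, 16384, 16385, 102400]))  # -> 16385
```

A sweep across sizes straddling the 16K boundary pinpoints the threshold in seconds, which is why an echo-style integrity check belongs in any benchmark harness alongside the throughput measurement.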

5. Erroneous Observations

  • Table 1 title reads: "Number of messages per client n=1000 up to 320 concurrency and n=10 for higher concurrency (1280/2560)"

    • The table title seems to indicate that the results for each of the ESBs were obtained under these conditions, i.e. using 1,000 requests for the 20, 40, 80, 160 and 320 concurrency levels, and 10 requests for the 1280 and 2560 concurrency levels. The value used for 640 concurrency is not documented. The results for the http://esbperformance.org benchmark used 10,000 requests for the 20 and 40 concurrency levels, 1,000 requests for the 80, 160 and 320 concurrency levels, and 10 requests for the 640, 1280 and 2560 concurrency levels, as the original benchmark was developed to test 8 open source ESBs, some of which took considerably longer to complete the larger tests at high levels of concurrency and load.

    • However, the WSO2 team placed the results for the WSO2 ESB 4.6.0, taken ONLY with 10 requests for each concurrency level, alongside the results for the Mule CE ESB, Talend SE ESB and the UltraESB taken directly from the results published under Round 6.
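
The request-count schedule described above can be written down explicitly; the total number of messages per run is simply n multiplied by the concurrency level, which makes the scale mismatch between the runs obvious. The values below are taken from the description above; the helper name is illustrative.

```python
# Requests-per-client (n) schedule used by the original esbperformance.org
# benchmark, as described above.
ORIGINAL_SCHEDULE = {
    20: 10000, 40: 10000,          # small concurrency, many requests
    80: 1000, 160: 1000, 320: 1000,
    640: 10, 1280: 10, 2560: 10,   # huge concurrency, few requests
}

def total_messages(concurrency, schedule=ORIGINAL_SCHEDULE):
    """Total messages sent in one run = requests-per-client * concurrency."""
    return schedule[concurrency] * concurrency

for c in sorted(ORIGINAL_SCHEDULE):
    print(c, total_messages(c))
```

For example, a 20-client run pushed 200,000 messages while a 2,560-client run pushed only 25,600, so results taken under one schedule cannot be tabulated against results taken under another as if they were comparable.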

  • Leaving aside the matter of possible response corruption with the WSO2 ESB for payloads larger than 16K, this presentation would confuse an average reader, who takes the table title at face value and then finds non-compliant and incomparable results within the table.

  • The WSO2 summary table and graphs also include WRONG averages for the Mule CE ESB (see the highlighted cells and compare with Round 6 or the Google Document of Round 6.5). These errors cause a disadvantage to the Mule ESB and should be rectified.
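
A summary average of this kind is simply the mean of the per-payload throughput figures for one ESB at one concurrency level, which makes every published average straightforward to re-verify against the Round 6 data. A minimal sketch, using purely illustrative numbers rather than the actual Mule CE figures:

```python
def summary_average(tps_by_payload):
    """Mean throughput across payload sizes for one ESB at one
    concurrency level -- the value each summary-table cell should hold."""
    return sum(tps_by_payload.values()) / len(tps_by_payload)

# Illustrative (not actual) per-payload TPS figures for one scenario.
mule_direct_proxy = {"500B": 1200.0, "1K": 1100.0, "5K": 900.0, "10K": 800.0}
print(round(summary_average(mule_direct_proxy), 1))  # -> 1000.0
```

Recomputing each cell this way against the published raw figures is a five-minute check that would have caught the erroneous Mule CE averages before publication.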

  • Table 2 title reads: "Number of messages per client n=1000 up to 320 concurrency and n=200 for higher concurrency (1280/2560)"

    • The table title seems to indicate that the results for each of the ESBs were obtained under these conditions, i.e. using 1,000 requests for the 20, 40, 80, 160 and 320 concurrency levels, and 200 requests for the 1280 and 2560 concurrency levels. The value used for 640 concurrency is not documented. The results for the http://esbperformance.org benchmark used 10,000 requests for the 20 and 40 concurrency levels, 1,000 requests for the 80, 160 and 320 concurrency levels, and 10 requests for the 640, 1280 and 2560 concurrency levels, as the original benchmark was developed to test 8 open source ESBs, some of which took considerably longer to complete the larger tests at high levels of concurrency and load.

    • However, the WSO2 team placed the results for the WSO2 ESB 4.6.0, taken ONLY with 1000 and 200 requests per concurrency level, alongside the results for the Mule CE ESB, Talend SE ESB and the UltraESB taken directly from the results published under Round 6.

  • Leaving aside the matter of possible response corruption with the WSO2 ESB for payloads larger than 16K, this presentation would confuse an average reader, who takes the table title at face value and then finds non-compliant and incomparable results within the table.

  • Here again, the WSO2 summary table and graphs include WRONG averages for the Mule CE ESB (see the highlighted cells and compare with Round 6 or the Google Document of Round 6.5). These errors cause a disadvantage to the Mule ESB and should be rectified.

  • For this table, the WSO2 team acknowledges: "We also publish the existing numbers for Mule and Talend. We did not rerun the Mule or Talend tests using n=200", implying that the Table 1 results for Mule and Talend included re-executed results. However, ALL results published by the WSO2 team for the non-WSO2 ESBs appear to simply re-publish the same results published in Round 6.

