Table of Contents
1. Prologue
2. List of Tested Web Application Scanners
3. Benchmark Overview & Assessment Criteria
4. Test I – The More The Merrier – Counting Audit Features
5. Test II – To the Victor Go the Spoils – SQL Injection
6. Test III – I Fight (For) the Users – Reflected XSS
7. Test IV – Knowledge is Power - Feature Comparison
8. What Changed?
9. Initial Conclusions – Open Source vs. Commercial
10. Moral Issues in Commercial Product Benchmarks
11. Verifying The Benchmark Results
12. Notifications and Clarifications
13. List of Tested Scanners
14. Source, License and Technical Details of Tested Scanners
15. Comparison of Active Vulnerability Detection Features
16. Comparison of Complementary Scanning Features
17. Comparison of Usability and Coverage Features
18. Comparison of Connection and Authentication Features
19. Comparison of Advanced Features
20. Detailed Results: Reflected XSS Detection Accuracy
21. Detailed Results: SQL Injection Detection Accuracy
22. Drilldown – Error Based SQL Injection Detection
23. Drilldown – Blind & Time Based SQL Injection Detection
24. Technical Benchmark Conclusions – Vendors & Users
25. So What Now?
26. Recommended Reading List: Scanner Benchmarks
27. Thank-You Note
28. Frequently Asked Questions
29. Appendix A – Assessing Web Application Scanners
30. Appendix B – A List of Tools Not Included In the Test
31. Appendix C – WAVSEP Scan Logs
32. Appendix D – Scanners with Abnormal Behavior
I've always been curious about it… from the first moment I executed a commercial scanner, almost seven years ago, to the day I started performing this research. Although manual penetration testing has always been the main focus, most of us use automated tools to easily detect "low hanging fruit" exposures, increase the coverage when testing large scale applications in limited timeframes, and even to double check locations that were manually tested. The question always pops up, in every penetration test in which these tools are used…
"Is it any good?", "Is it better than…" and "Can I rely on it to…" are questions that every pen-tester asks himself whenever he hits the scan button.
Well, curiosity is a strange beast… it can drive you to wander and search, and consume all your time hunting for obscure answers.
So recently, driven by that curiosity, I decided to find out for myself, and to invest whatever resources were necessary to solve this mystery once and for all.
Although I can hardly state that all my questions were answered, I can definitely sate your curiosity for the moment by sharing insights, interesting facts, useful information and even some surprises, all derived from my latest research, which focuses on commercial & open source web application scanners.
This research covers the latest versions of 12 commercial web application scanners and 48 free & open source web application scanners, while comparing the following aspects of these tools:
· Number & Type of Vulnerability Detection Features
· SQL Injection Detection Accuracy
· Reflected Cross Site Scripting Detection Accuracy
· General & Special Scanning Features
Although my previous research included similar information, I regretted one thing after it was published: I did not present the information in a format that was useful to the common reader. In fact, as I found out later, many readers skipped the actual content, and focused on sections of the article that were actually a side effect of the main research.
As a result, the following article will focus on presenting the information in a simple, comprehensible graphical format, while still providing the detailed research information to those interested… and there's a lot of new information to be shared – knowledge that can aid pen-testers in choosing the right tools, managers in budget related decisions, and visionaries in properly reading the map.
But before you read the statistics and insights presented in this report, and reach a conclusion as to which tool is the "best", it is crucial that you read Appendix A - Section 29, which explains the complexity of assessing the overall quality of web application scanners… As you're about to find out, this question cannot be answered so easily… at least not yet.
…
So without any further delay, let's focus on the information you seek, and discuss the insights and conclusions later.
The following commercial scanners were included in the benchmark:
· IBM Rational AppScan v8.0.03 - iFix Version (IBM)
· WebInspect v9.10.78.0, SecureBase 4.05.99 (HP)
· Hailstorm Professional v6.5-5267 (Cenzic)
· Acunetix WVS v7.0-20110608 (Acunetix)
· NTOSpider v5.4.098 (NT Objectives)
· Netsparker v2.0.0.0 (Mavituna Security)
· Burp Suite v1.3.09 (Portswigger)
· Sandcat v4.2.4.0 (Syhunt)
· ParosPro v1.9.12 (Milescan)
· JSky v3.5.1-905 (NoSec)
· WebCruiser v2.5.0 EE (Janus Security)
· Nessus v4.41-15078 (Tenable Network Security) – Only the Web Application Scanning Features
The following new free & open source scanners were included in the benchmark:
VEGA 1.0 beta (Subgraph), Safe3WVS v9.2 FE (Safe3 Network Center), N-Stalker 2012 Free Edition v7.1.1.106 (N-Stalker), DSSS (Damn Simple SQLi Scanner) v0.1h, SandcatCS v4.2.3.0
The updated versions of the following free & open source scanners were re-tested in the benchmark:
Zed Attack Proxy (ZAP) v1.3.0, sqlmap v0.9-rev4209 (SVN), W3AF 1.1-rev4350 (SVN), Watobo v0.9.7-rev544, Acunetix Free Edition v7.0-20110711, Netsparker Community Edition v1.7.2.13, WebSecurify v0.8, WebCruiser v2.4.2 FE (corrections), arachni v0.2.4 / v0.3, XSSer v1.5-1, Skipfish 2.02b, aidSQL 02062011
The results were compared to those of unmaintained scanners tested in the original benchmark:
Andiparos v1.0.6, ProxyStrike v2.2, Wapiti v2.2.1, Paros Proxy v3.2.13, PowerFuzzer v1.0, Grendel Scan v1.0, Oedipus v1.8.1, Scrawler v1.0, Sandcat Free Edition v4.0.0.1, JSKY Free Edition v1.0.0, N-Stalker 2009 Free Edition v7.0.0.223, UWSS (Uber Web Security Scanner) v0.0.2, Grabber v0.1, WebScarab v20100820, Mini MySqlat0r v0.5, WSTool v0.14001, crawlfish v0.92, Gamja v1.6, iScan v0.1, LoverBoy v1.0, openAcunetix v0.1, ScreamingCSS v1.02, Secubat v0.5, SQID (SQL Injection Digger) v0.3, SQLiX v1.0, VulnDetector v0.0.2, Web Injection Scanner (WIS) v0.4, Xcobra v0.2, XSSploit v0.5, XSSS v0.40, Priamos v1.0
For the full list of commercial & open source tools that were not tested in this benchmark, refer to Appendix B - Section 30.
The benchmark focused on testing commercial & open source tools that are able to detect (and not necessarily exploit) security vulnerabilities on a wide range of URLs, and thus, each tool tested was required to support the following features:
· The ability to detect Reflected XSS and/or SQL Injection vulnerabilities.
· The ability to scan multiple URLs at once (using either a crawler/spider feature, a URL/log file parsing feature or a built-in proxy).
· The ability to control and limit the scan to an internal or external host (domain/IP).
The testing procedure of all the tools included the following phases:
· The scanners were all tested against the latest version of WAVSEP (v1.0.3), a benchmarking platform designed to assess the detection accuracy of web application scanners. The purpose of WAVSEP's test cases is to provide a scale for understanding which detection barriers each scanning tool can bypass, and which vulnerability variations can be detected by each tool. The various scanners were tested against the following test cases (GET and POST attack vectors):
o 66 test cases that were vulnerable to Reflected Cross Site Scripting attacks.
o 80 test cases that contained Error Disclosing SQL Injection exposures.
o 46 test cases that contained Blind SQL Injection exposures.
o 10 test cases that were vulnerable to Time Based SQL Injection attacks.
o 7 different categories of false positive RXSS vulnerabilities.
o 10 different categories of false positive SQLi vulnerabilities.
· In order to ensure result consistency, the directory of each exposure sub-category was individually scanned multiple times, using various configurations.
· The features of each scanner were documented and compared, according to documentation, configuration, plugins and information received from the vendor.
· In order to ensure that the detection features of each scanner were truly effective, most of the scanners were tested against an additional benchmarking application that was prone to the same vulnerable test cases as the WAVSEP platform, but had a different design, slightly different behavior and a different entry point format (currently nicknamed "bullshit").
The results of the main test categories are presented within three graphs (a commercial graph, a free & open source graph, and a unified graph), and the detailed information of each test is presented in a dedicated report.
So, now that you've learned about the testing process, it's time for the results…
The first assessment criterion was the number of audit features each tool supports.
Reasoning: an automated tool can't detect an exposure that it can't recognize (at least not directly, and not without manual analysis), and therefore, the number of audit features will affect the number of exposures that the tool will be able to detect (assuming the audit features are implemented properly, that vulnerable entry points are detected, and that the tool manages to scan the vulnerable input vectors).
For the purpose of the benchmark, an audit feature was defined as a common generic application-level scanning feature, supporting the detection of exposures which could be used to attack the tested web application, gain access to sensitive assets or attack legitimate clients.
The definition of the assessment criterion rules out product-specific exposures and infrastructure-related vulnerabilities, while unique and extremely rare features were documented and presented in a different section of this research, and were not taken into account when calculating the results. Exposures that were specific to Flash/Applet/Silverlight and Web Services assessment were treated in the same manner.
The Number of Audit Features in Web Application Scanners – Commercial Tools
The Number of Audit Features in Web Application Scanners - Free & Open Source Tools
The Number of Audit Features in Web Application Scanners – Unified List
The second assessment criterion was the detection accuracy of SQL Injection, one of the most famous exposures and the most commonly implemented attack vector in web application scanners.
Reasoning: a scanner that is not accurate enough will miss many exposures, and will classify non-vulnerable entry points as vulnerable. This test aims to assess how good each tool is at detecting SQL Injection exposures in a supported input vector, located in a known entry point, without any restrictions that could prevent the tool from operating properly.
The evaluation was performed on an application that uses MySQL 5.5.x as its data repository, and thus reflects the detection accuracy of each tool when scanning similar data repositories.
Result Chart Glossary
Note that the BLUE bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).
The SQL Injection Detection Accuracy of Web Application Scanners – Open Source & Free Tools
The SQL Injection Detection Accuracy of Web Application Scanners – Unified List
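To make the two bars concrete, here is a minimal sketch (not part of the original benchmark tooling) of how such percentages can be derived from raw counts; the totals are the WAVSEP v1.0.3 test case counts listed earlier, while the "detected" numbers are hypothetical examples.

# Minimal sketch: turning raw WAVSEP-style counts into the two chart bars.
# The totals below are the WAVSEP v1.0.3 test case counts quoted in this
# article; the "detected" numbers are purely hypothetical examples.

TOTALS = {
    "rxss": 66,            # Reflected XSS test cases
    "sqli_error": 80,      # Error-disclosing SQL Injection test cases
    "sqli_blind": 46,      # Blind SQL Injection test cases
    "sqli_time": 10,       # Time-based SQL Injection test cases
    "rxss_fp": 7,          # False positive RXSS categories
    "sqli_fp": 10,         # False positive SQLi categories
}

def percentage(detected: int, total: int) -> float:
    """Detection (or false positive) score as a percentage of the total."""
    return round(100.0 * detected / total, 1)

# Hypothetical scanner output: 24 of 66 RXSS cases found, 2 of 7 FP categories flagged.
blue_bar = percentage(24, TOTALS["rxss"])      # vulnerable test case accuracy
red_bar = percentage(2, TOTALS["rxss_fp"])     # false positive categories flagged

print(f"RXSS detection accuracy: {blue_bar}%")
print(f"False positive categories flagged: {red_bar}%")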
It's obvious that testing one feature is not enough; ideally, the detection accuracy of all audit features should be assessed, but in the meantime, we will settle for one more…
The third assessment criterion was the detection accuracy of Reflected Cross Site Scripting, a common exposure which is the second most commonly implemented feature in web application scanners.
Result Chart Glossary
Note that the BLUE bar represents the vulnerable test case detection accuracy, while the RED bar represents the false positive categories detected by the tool (which may correspond to more actual instances than the bar presents, when compared to the detection accuracy bar).
The Reflected XSS Detection Accuracy of Web Application Scanners – Commercial Tools
The Reflected XSS Detection Accuracy of Web Application Scanners – Open Source & Free Tools
The Reflected XSS Detection Accuracy of Web Application Scanners – Unified List
The list of tools tested in this benchmark is organized within the following reports:
Additional information was gathered during the benchmark, including information related to the different features of the various scanners. These details are organized in the following reports, and might prove useful when searching for tools for specific tasks or tests:
For detailed information on the accuracy assessment results, refer to the following reports:
· The Scan Logs (describing the execution process and configuration of each scanner)
Additional information on the scan logs, the list of untested tools and the abnormal behavior of scanners can be found in the appendix sections (at the end of the article):
Appendix B - Section 30 – an appendix that contains a list of tools that were not included in the benchmark
Appendix D - Section 32 – an appendix that describes scanners with abnormal behavior
Since the previous benchmark, many open source & commercial tools have added new features and improved their detection accuracy.
The following list presents a summary of changes in the detection accuracy of free & open source tools that were tested in the previous benchmark:
· arachni – a dramatic improvement in the detection accuracy of Reflected XSS exposures, and a dramatic improvement in the detection accuracy of SQL Injection exposures (verified on mysql).
· sqlmap – a dramatic improvement in the detection accuracy of SQL Injection exposures (verified on mysql).
· Acunetix Free Edition – a major improvement in the detection accuracy of RXSS exposures.
· Watobo – a major improvement in the detection accuracy of SQL Injection exposures (verified on mysql).
· N-Stalker 2009 FE vs. 2012 FE – although this tool is very similar to N-Stalker 2009 FE, the surprising discovery I had was that the detection accuracy of N-Stalker 2012 is very different – it detects only a quarter of what N-Stalker 2009 used to detect. Assuming this result is not related to a bug in the product or in my testing procedure, it means that the newer free version is significantly less effective than the previous free version, at least at detecting reflected XSS. A legitimate business decision, true, but surprising nevertheless.
· aidSQL – a major improvement in the detection accuracy of SQL Injection exposures (verified on mysql).
· XSSer – a major improvement in the detection accuracy of Reflected XSS exposures, even though the results were not consistent.
· Skipfish – a slight improvement in the detection accuracy of RXSS exposures (it is currently unknown whether the RXSS detection improvement is related to changes in code or to the enhanced testing method), and a slight decrease in the detection accuracy of SQLi exposures (which might be related to the different testing environment and the different method used to count the results).
· WebSecurify – a slight improvement in the detection accuracy of RXSS exposures (it is currently unknown whether the RXSS detection improvement is related to changes in code or to the enhanced testing method).
· Zed Attack Proxy (ZAP) – identical results. Any minor difference was probably caused by the testing environment, configuration or minor issues.
· W3AF – a slight improvement in the detection accuracy of RXSS exposures and a slight decrease in the detection accuracy of SQL Injection exposures.
· Netsparker Community Edition – identical results. Any minor difference was probably caused by the testing environment, configuration or minor issues.
· WebCruiser Free Edition – a minor decrease in accuracy, due to fixing documentation mistakes from the previous benchmark.
The following section presents my own personal opinions on the results of the benchmark, and since opinions are beliefs, which are affected by emotions and circumstances, you are entitled to your own.
After testing over 48 open source scanners multiple times, and after comparing the results and experiences to the ones I had after testing 12 commercial ones (and those are just the ones that I reported), I have reached the following conclusions:
· As far as accuracy & features go, the distance between open source tools and commercial tools is not as big as it used to be – tools such as sqlmap, arachni, wapiti, w3af and others are slowly closing the gap. That being said, there is still a significant difference in stability & false positives: most open source tools tend to produce more false positives and to be relatively unstable when compared to most commercial tools.
· Some open source tools, even the most accurate ones, are relatively difficult to install & use, and still require fine-tuning in various fields. In my opinion, a non-technical QA engineer will have difficulties using these tools, and as a general rule, I'll recommend using them if your background is relatively technical (consultant, developer, etc.). For all the rest, especially non-technical enterprise employees that prefer a decent usage experience - stick with commercial products, with their free versions, or with the simpler variations of open source tools.
· If you are using a commercial product, it's best to combine tools with a wide variety of features with tools with high detection accuracy. It's possible to use tools that have relatively good scores in both of these aspects, or to use a tool with a wide variety of features alongside another tool that has enhanced accuracy. Yes, this statement can be interpreted as recommending combinations of commercial and open source tools, and even two different commercial tools, so that one tool will complement the other. Budget? Take a look at the cost diversity of the tools before you make any harsh decisions; I promise you'll be surprised.
While testing the various commercial tools, I have dealt with certain moral issues that I want to share. Many vendors that were aware of this research enhanced their tools in preparation for it, an action I respect and consider a positive step. Since the testing platform that included most of the tests was available online, preparing for the benchmark was a relatively easy task for any vendor that invested the resources.
So, is the benchmark fair for vendors that couldn’t improve their tools due to various circumstances?
The testing process of a commercial tool is usually much more complicated and restrictive than testing a free or open source tool; it is necessary to contact the vendor to obtain an evaluation license and the latest version of the tool (a process that can take several weeks), and the evaluation licenses are usually restricted to a short evaluation timeframe (usually two weeks), so updating and re-testing the tools at a future date can become a hassle (since some of the process will have to be performed all over again)… but why am I telling you all this?
Simply because I believe that the test results were more relevant for vendors that provided me with an extended evaluation period and access to new builds; for example, a few days before the latest benchmark, immediately after testing the latest versions of two major vendors, I decided to rescan the platform using the latest versions of all the commercial tools I had, to ensure that the benchmark would be published with the most up-to-date results.
I verified that JSky, WebCruiser and ParosPro hadn't released a new version, and tested the latest versions of AppScan, WebInspect, Acunetix, Netsparker, Sandcat and Nessus.
It made sense that builds that were tested a short while ago (like NTOSpider) were also something I could rely on to represent the current state of the tool (I hope).
I did, however, have a problem with Cenzic and Burp, two of the first tools that I tested in this research, since my evaluation licenses were no longer valid and I couldn't update the tools to their latest version and scan again. Since I had 2-3 days until the end of my planned schedule, with a million tasks pending, I simply couldn't afford to go through the evaluation request phase again, despite all of my good intentions and the willingness to sacrifice my spare time to ensure these tools would be properly represented.
Even though the results of some updated products (WebInspect and Nessus being the best examples) didn't change at all, even after I updated them to the latest version, who could say that the result would be the same for other vendors?
So, were the terms unfair to Burp and Cenzic?
Finally, several vendors sent me multiple versions and builds – they all wanted to succeed, a legitimate desire of any human being, even more so for a firm. Apart from the time each test took (a price I was willing to pay at the time), the new builds were sent even on the last day of the benchmark, and afterwards.
But if the new version is better and more accurate, then by limiting the number of tests I perform for a given vendor, am I not working against what I'm trying to achieve in all my benchmarks, which is to release the benchmark with the most up-to-date results for all the tools?
(For example, Syhunt, a vendor that did very well in the last benchmark, sent me its final build (2.4.2.5) a day after the deadline, and included a time based SQL injection detection feature in that build, but since I couldn't afford the time anymore, I couldn't test the build; so, am I really reflecting the tool's current state in the most accurate manner? But if I had tested this build, shouldn't I have provided the rest of the vendors the same opportunity?)
One of the questions I believe I can answer – the accuracy question.
A benchmark is, in a very real sense, a competition, and since I take the scientific approach, I believe that the results are absolute, at least for the subject being tested. Since I'm not claiming that one tool is "better" than the other in every category, only at the tested criterion, I believe that priorities do not matter – as long as the test really reflects the current situation, the result is reliable.
I leave the interpretation of the results to the reader, at least until I cover enough aspects of the tools.
As for the rest of the open issues, I don't have good answers for all of those questions, and although I did my very best in this benchmark, and even exceeded what I thought I was capable of, I will probably have to think of some solutions that will make the next benchmark's terms equal, even for scanners that were tested at the beginning of the benchmark, and less time consuming than it has been.
The results of the benchmark can be verified by replicating the scan methods described in the scan log of each scanner, and by testing the scanner against WAVSEP v1.0.3.
The latest version of WAVSEP can be downloaded from the WAVSEP project web site (binary/source code distributions, installation instructions and the test case descriptions are provided in the web site's download section):
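As a rough illustration of such a manual verification (this is not part of the benchmark itself; the deployment URL, test case path and parameter name below are hypothetical placeholders for an actual WAVSEP test case), one can replay a simple error-based SQL injection probe against a test case and look for a database error signature in the response:

# Minimal verification sketch, assuming a locally deployed WAVSEP instance.
# The path and parameter name below are hypothetical placeholders; use the
# actual test case URL from the WAVSEP index pages.
import urllib.error
import urllib.parse
import urllib.request

BASE = "http://localhost:8080/wavsep"                      # assumed deployment URL
TEST_CASE = "/active/SQL-Injection/someTestCase.jsp"       # hypothetical test case path
PARAM = "username"                                          # hypothetical parameter name

def probe(value: str) -> str:
    """Send a single GET request and return the response body (even on HTTP errors)."""
    query = urllib.parse.urlencode({PARAM: value})
    try:
        with urllib.request.urlopen(f"{BASE}{TEST_CASE}?{query}", timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:          # error pages may carry the SQL error text
        return err.read().decode("utf-8", errors="replace")

# A classic error-based probe: a lone quote should trigger a syntax error in an
# error-disclosing test case, while a benign value should not.
baseline = probe("john")
injected = probe("john'")

error_signatures = ["You have an error in your SQL syntax", "SQLException"]
vulnerable = any(sig in injected and sig not in baseline for sig in error_signatures)
print("Error-based SQLi indication:", vulnerable)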
How to use the results of the benchmark
The results of the benchmark clearly show how accurate each tool is in detecting the tested vulnerabilities (SQL Injection (MySQL) & Reflected Cross Site Scripting), as long as it is able to locate and scan the vulnerable entry points. The results might even help to estimate how accurate each tool is in detecting related vulnerabilities (for example, SQL Injection vulnerabilities that are based on other databases), and to determine which exposure instances cannot be detected by certain tools.
However, currently, the results DO NOT evaluate the overall quality of the tool, since they don't include detailed information on subjects such as crawling quality, technology support, scoping, profiling, stability in extreme cases, tolerance, detection accuracy of other exposures and so on... at least NOT YET.
I highly recommend reading the detailed results, and the appendix that deals with web application scanner evaluation, before reaching any conclusions.
Additional Notifications
During the benchmark, I reported bugs that had a major effect on detection accuracy to several commercial and open source vendors:
· A performance improvement feature in NTOSpider caused it not to scan many POST XSS test cases, and thus, the detection accuracy for RXSS POST test cases was significantly lower than the RXSS GET detection accuracy. The vendor was notified of this issue, and provided me with a special build that overrides this feature (at least until they add a GUI option to disable this mechanism).
· A similar performance improvement feature in Netsparker caused the same issue; however, the feature could be disabled in Netsparker, and thus, with the support of the relevant personnel at Netsparker, I was able to work around the problem.
· A few bugs in arachni prevented the blind SQL injection diff plugins from working properly. I notified the author, Tasos, of the issue, and he quickly fixed it and released a new version.
· Acunetix's RXSS detection result was updated to match the results of the latest free version (one version above the tested commercial version). Since the tested commercial version of Acunetix was older than the tested free version (20110608 vs 20110711), and since the results of the upgraded free version were actually better than those of the older commercial version I had tested, I changed the results of the commercial tool to match the ones of the new free version (from 22 to 24 in both the GET & POST RXSS detection scores).
· Changes in results from the previous benchmark might be attributed to enhanced scanning features, and/or to enhanced stability of the test environment & method (connection pool, limited & divided scope).
The following report contains the list of scanners tested in this benchmark, and provides information on the tested version, the tool's vendor/author and the current status of the product:
The following report compares the licenses, development technology and sources (home page) of the various scanners:
The following reports compare the active vulnerability detection features (audit features) of the various tested scanners:
First Report:
Second Report:
Aside from the Count column (which represents the total number of audit features supported by the tool, not including complementary features such as web server scanning and passive analysis), each column in the report represents an audit feature. The description of each column is presented in the following glossary table:
Title | Description
SQL | Error Dependant SQL Injection
BSQL | Blind & Intentional Time Delay SQL Injection
RXSS | Reflected Cross Site Scripting
PXSS | Persistent / Stored Cross Site Scripting
DXSS | DOM XSS
Redirect | External Redirect / Phishing via Redirection
Bck | Backup File Detection
Auth | Authentication Bypass
CRLF | CRLF Injection / Response Splitting
LDAP | LDAP Injection
XPath | X-Path Injection
MX | MX / SMTP / IMAP Injection
Session Test | Session Identifier Complexity Analysis
SSI | Server Side Include
RFI-LFI | Directory Traversal / Remote File Include / Local File Include (will be separated into different categories in future benchmarks)
Cmd | Command Injection / OS Command Injection
Buffer | Buffer Overflow
CSRF | Cross Site Request Forgery
A-Dos | Application Denial of Service / RegEx DoS
Privilege Escalation | Privilege Escalation Between Different Roles and User Accounts (Resources / Features)
Format String | Format String Injection
File Upload | File Upload / Insecure File Upload
Code Injection | Code Injection (ASP/JSP/PHP/Perl/etc)
XML Injection | XML / SOAP Injection
Source Code Disclosure | Source Code Disclosure Detection
Integer Overflow | Integer Overflow
Padding Oracle | Padding Oracle Detection / Exploitation
Session Fixation | Session Fixation
The following report compares complementary vulnerability detection features in the tested scanners:
In order to clarify what each column in the report table means, use the following glossary table:
Title | Description
Web Server Hardening | Features that are able to detect insecure HTTP method support (PUT, Trace, WebDAV), directory listing, robots and cross-domain files information disclosure, version specific vulnerabilities, etc.
CGI Scanning | Default files, common vulnerable applications, etc.
Passive Analysis | Security tests that don’t require any actual attacks, and are instead based on information gathering and analysis of responses, including certificate & cipher tests, content & metadata analysis, mime type analysis, autocomplete detection, insecure transmission of credentials, google hacking, etc.
File / Dir Enumeration | Directory and file enumeration features
Notes and Other Features | Uncommon or unique features
The following report compares the usability, coverage and scan initiation features of the tested scanners:
http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Scanner%20Features%20(1%20of%203)%20-%20WAVSEP%20Benchmark%202011%20-%20Final3.pdf
In order to clarify what each column in the report table means, use the following glossary table:
Configuration & Usage Scale – possible values:
o Very Simple - GUI + Wizard
o Simple - GUI with simple options, Command line with scan configuration file or simple options
o Complex - GUI with numerous options, Command line with multiple options
o Very Complex - Manual scanning feature dependencies, multiple configuration requirements
Stability Scale – possible values:
o Very Stable - Rarely crashes, Never gets stuck
o Stable - Rarely crashes, Gets stuck only in extreme scenarios
o Unstable - Crashes every once in a while, Freezes on a consistent basis
o Fragile - Freezes or Crashes on a consistent basis, Fails performing the operation in many cases
Performance Scale – possible values:
o Very Fast - Fast implementation with limited amount of scanning tasks
o Fast - Fast implementation with plenty of scanning tasks
o Slow - Slow implementation with limited amount of scanning tasks
o Very Slow - Slow implementation with plenty of scanning tasks
The following report compares the connection, authentication and scan control features of the tested scanners:
http://sectooladdict-benchmarks.googlecode.com/files/List%20of%20Scanner%20Features%20(2%20of%203)%20-%20WAVSEP%20Benchmark%202011%20-%20Final.pdf
The following report contains a comparison of advanced and uncommon scanner features:
The results of the Reflected Cross Site Scripting (RXSS) accuracy assessment are presented in the following report (the graphical representation of the results is provided at the beginning of the article):
The results that were taken into account only include vulnerable pages linked from the index-xss.jsp index page (the RXSS-GET and/or RXSS-POST directories, in addition to the RXSS-FalsePositive directory). XSS-vulnerable entry points in the SQL injection vulnerable pages were not taken into account, since they don’t necessarily represent a unique scenario (or at least, not until the “layered vulnerabilities” scenario is implemented).
The overall results of the SQL Injection accuracy assessment are presented in the following report (the graphical representation of the results is provided at the beginning of the article):
The results of the Error-Based SQL Injection benchmark are presented in the following report:
The results of the Blind & Time based SQL Injection benchmarks are presented in the following report:
While testing the various tools in this benchmark, I dealt with numerous difficulties, witnessed many inconsistent results and noticed that some tools had difficulties optimizing their scanning features on the tested platform. I did, however, also deal with the other end of the spectrum, and used tools that easily overcame most of the difficulties related to detecting the tested vulnerabilities.
I'd like to share my conclusions with the authors and vendors that are interested in improving their tools, and aren't offended by someone giving advice.
As far as detecting SQL injection exposures goes, I have noticed that tools that implemented the following features detected more exposures, had fewer false positives, and provided consistent results:
· Time based SQL Injection detection vectors are very effective. They are, however, very tricky to use, since they might be affected by other attacks that are executed simultaneously, or affect the detection of other tests in the same manner. As a result, I recommended to all the authors & vendors to implement the following behavior in their products: execute time based attacks at the end of the scanning process, after all the rest of the tests are done, while using a reduced number of concurrent connections. Executing other tests in parallel might have a negative effect on the detection accuracy.
· Since the upper/lower timeout values used to determine whether or not a time based exploit was successful may change due to various circumstances, I recommend calculating and re-calculating this value during the scan, and revalidating each time based result independently, after verifying that the timeout values are "normal".
· Implement various payloads of time based attacks – the sleep method is not enough to cover all the databases, and not even all the versions of MySQL (a minimal sketch of these ideas follows this list).
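The following sketch illustrates the baseline-then-delay approach described above. It is an assumption-laden illustration rather than any vendor's actual implementation: the target URL, parameter name and payloads are hypothetical examples, and the BENCHMARK() iteration count would need tuning per target (MySQL's SLEEP() only exists from version 5.0.12 onward, which is why a heavy BENCHMARK() expression is a common fallback for older versions).

# Illustrative sketch of time-based SQLi detection with baseline calibration.
# The target URL, parameter name and payload phrasing are hypothetical examples.
import statistics
import time
import urllib.error
import urllib.parse
import urllib.request

TARGET = "http://localhost:8080/app/page"   # hypothetical target
PARAM = "id"                                 # hypothetical injectable parameter
DELAY = 5                                    # seconds of intentional delay

# Different time-delay payloads for different MySQL versions (illustrative):
PAYLOADS = [
    f"1 AND SLEEP({DELAY})",                      # MySQL >= 5.0.12
    "1 AND BENCHMARK(5000000,MD5('a'))",          # older MySQL versions; iterations need tuning
]

def timed_request(value: str) -> float:
    """Return the response time for a single request with the given parameter value."""
    query = urllib.parse.urlencode({PARAM: value})
    start = time.monotonic()
    try:
        urllib.request.urlopen(f"{TARGET}?{query}", timeout=60).read()
    except urllib.error.HTTPError:
        pass                     # an error page still tells us how long the query took
    return time.monotonic() - start

def baseline() -> float:
    """Re-measure the normal response time (should be repeated during the scan)."""
    return statistics.median(timed_request("1") for _ in range(3))

def is_time_based_vulnerable() -> bool:
    for payload in PAYLOADS:
        normal = baseline()                       # recalibrate before each attempt
        delayed = timed_request(payload)
        if delayed > normal + (DELAY * 0.8):      # delay threshold relative to baseline
            # Revalidate independently, with a fresh baseline, to reduce false positives.
            if timed_request(payload) > baseline() + (DELAY * 0.8):
                return True
    return False

if __name__ == "__main__":
    print("Time-based SQLi indication:", is_time_based_vulnerable())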
So now that we have all those statistics, it's time to analyze them properly, and see what conclusions we can reach. Since this process will take time, I have to set some priorities.
In the near future, I will try to achieve the following goals:
· Find a better way to present the vast amount of information on web application scanner features & accuracy. I have been struggling with this issue for almost 2 years, but I think I finally found a solution that will make the information more useful for the common reader… stay tuned for updates.
· Provide recommendations for the best current method of executing free & open source web application scanners: the most useful combinations, and the tiny tweaks required to achieve the best results.
· Release the new test case categories of WAVSEP that I have been working on. Yep, help needed.
In addition to the short term goals, the following long term goals will still have a high priority:
· Improve the testing framework (WAVSEP); add additional test cases and additional security vulnerabilities.
· Perform additional benchmarks on the framework, on a consistent basis. I previously aimed for one major benchmark per year, but that formula might completely change if I manage to work out a few issues around a new initiative I have in this field.
· Integration with external frameworks for assessing crawling capabilities, technology support, etc.
· Publish the results of tests against sample vulnerable web applications, so that some sort of feedback on other types of exposures will be available (until other types of vulnerabilities are implemented in the framework), as well as on features such as authentication support, crawling, etc.
· Gradually develop a framework for testing additional related features, such as authentication support, malformed HTML tolerance, abnormal response support, etc.
I hope that this content will help the various vendors improve their tools, help pen-testers choose the right tool for each task, and, in addition, help create some method of testing the numerous tools out there.
Since I have already been in this situation in the past, I know what's coming… so I apologize in advance for any delays in my responses in the next few weeks.
The following resources include additional information on previous benchmarks, comparisons and assessments in the field of web application vulnerability scanners:
· "Webapp Scanner Review: Acunetix versus Netsparker", by Mark Baldwin (commercial scanner comparison, April 2011)
· "Effectiveness of Automated Application Penetration Testing Tools", by Alexandre Miguel Ferreira and Harald Kleppe (commercial & freeware scanner comparison, February 2011)
· "Web Application Scanners Accuracy Assessment", the predecessor of the current benchmark, by Shay Chen (a comparison of 43 free & open source scanners, December 2010)
· "State of the Art: Automated Black-Box Web Application Vulnerability Testing" (Original Paper), by Jason Bau, Elie Bursztein, Divij Gupta, John Mitchell (May 2010) – original paper
· "Analyzing the Accuracy and Time Costs of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, February 2010)
· "Why Johnny Can’t Pentest: An Analysis of Black-box Web Vulnerability Scanners", by Adam Doup´e, Marco Cova, Giovanni Vigna (commercial & open source scanner comparison, 2010)
· "Web Vulnerability Scanner Evaluation", by AnantaSec (commercial scanner comparison, January 2009)
· "Analyzing the Effectiveness and Coverage of Web Application Security Scanners", by Larry Suto (commercial scanners comparison, October 2007)
· "Rolling Review: Web App Scanners Still Have Trouble with Ajax", by Jordan Wiens (commercial scanners comparison, October 2007)
· "Web Application Vulnerability Scanners – a Benchmark"
, by Andreas Wiegenstein, Frederik Weidemann, Dr. Markus Schumacher,
Sebastian Schinzel (Anonymous scanners comparison, October 2006)
During the research described in this article, I received help from quite a few individuals and resources, and I’d like to take the opportunity to thank them all.
To all the open source tool authors that assisted me in testing the various tools at unreasonable late night hours, to the kind souls that helped me obtain evaluation licenses for commercial products, to the QA, support and development teams of commercial vendors, who saved me tons of time and helped me overcome obstacles, and to the various individuals that helped me contact these vendors: thank you.
I would also like to continue my tradition, and thank all the information sources that helped me gather the list of scanners over the years, including (but not limited to) information security sources such as PenTestIT (http://www.pentestit.com/), Security Sh3ll (http://security-sh3ll.blogspot.com/), NETpeas Toolswatch Service (http://www.vulnerabilitydatabase.com/toolswatch/), Darknet (http://www.darknet.org.uk/), Packet Storm (http://packetstormsecurity.org/), Help Net Security (http://www.net-security.org/), Astalavista (http://www.astalavista.com/), Google (of course) and many others.
I hope that the conclusions, ideas, information and payloads presented in this research (and the benchmarks and tools that will follow) will be of benefit to everyone, open source community projects and commercial vendors alike.
Q: 60 web application scanners is an awful lot, how many scanners exist?
A: Assuming you are using the same definition of a scanner that I do, then I'm currently aware of 95 web application scanners that can claim to support the detection of generic application level exposures, in a safe and controllable manner, across multiple URLs (48 free & open source scanners that were tested, 12 commercial scanners that were tested, 25 open source scanners that I haven't tested yet, and 10 commercial scanners that slipped my grip). And yes, I'm planning on testing them all.
Q: Why RXSS and SQLi again? Will the benchmarks ever include additional exposures?
A: Yes, they will. In fact, I'm already working on test case categories for two different exposures, and will use them both in my next research. Besides, the last benchmark focused on free & open source products, and I couldn't help myself, I had to test them against each other.
Q: I can't wait for the next research, what can I do to speed things up?
A: I'm currently looking for methods to speed up the processes related to these research projects, so if you're willing to help, contact me.
Q: What’s with the titles that contain cheesy movie quotes?
A: That's just it - I happen to like cheese. Let's see you coming up with better titles at 4AM.
Although this benchmark contains tons of information, and is very useful as a decision assisting tool, the content within it cannot be used to calculate the accurate ROI (return on investment) of each web application scanner. Furthermore, it can't predict on its own exactly how good the results of each scanner will be in every situation (although it can predict what won't be detected), since there are additional factors that need to be taken into account.
The results in this benchmark could serve as an accurate evaluation formula only if the scanner is used to scan a technology that it supports, pages that it can detect (manual crawling features can be used to overcome many obstacles in this case), and locations without technological barriers that it cannot handle (for example, web application firewalls or anti-CSRF tokens).
In order for us to truly assess the full capability of web application vulnerability scanners, the following features must be tested:
· The entry point coverage of the web application scanner must be as high as possible; meaning, the tool must be able to locate and properly activate (or be manually "taught") all the application entry points (e.g. static & dynamic pages, in-page events, services, filters, etc.). Vulnerabilities in an entry point that wasn't located will not be detected. The WIVET project can provide additional information on coverage and support.
· The attack vector coverage of the web application scanner – does it support input vectors such as GET / POST / Cookie parameters? HTTP headers? Parameter names? Ajax parameters? Serialized objects? Each input vector that is not supported means exposures that won't be detected, regardless of the tool's accuracy level (assuming the unsupported attack/input vector is vulnerable); a minimal illustration appears after this list.
· The scanner must be able to handle the technological barriers implemented in the application, ranging from authentication mechanisms to automated access prevention mechanisms such as CAPTCHAs and anti-CSRF tokens.
· The scanner must be able to handle any application specific problems it encounters, including malformed HTML (tolerance), stability issues and other limitations. If the best scanner in the world consistently causes the application to crash within a couple of seconds, then it's not useful for assessing the security of that application (in matters that don't relate to DoS attacks).
· The number of features (active & passive) implemented in the web application vulnerability scanner.
· The accuracy level of each and every plugin supported by the web application vulnerability scanner.
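To illustrate the attack vector coverage point above, here is a minimal sketch (the target URL, parameter and header names are assumed examples, not an excerpt from any of the tested tools) of delivering one probe payload through several input vectors; a scanner that only mutates query-string parameters would never exercise the cookie or header variants:

# Illustrative sketch: delivering one probe payload through several input vectors.
# The target URL, parameter and header names are hypothetical examples.
import urllib.error
import urllib.parse
import urllib.request

TARGET = "http://localhost:8080/app/search"   # hypothetical target
PROBE = "<script>alert(1)</script>"           # simple reflection probe

def send(vector: str) -> int:
    """Inject the probe via the chosen input vector and return the HTTP status code."""
    url, data, headers = TARGET, None, {}
    if vector == "get":
        url = f"{TARGET}?{urllib.parse.urlencode({'q': PROBE})}"
    elif vector == "post":
        data = urllib.parse.urlencode({"q": PROBE}).encode()
    elif vector == "cookie":
        headers["Cookie"] = f"session={urllib.parse.quote(PROBE)}"
    elif vector == "header":
        headers["User-Agent"] = PROBE
    req = urllib.request.Request(url, data=data, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

for vector in ("get", "post", "cookie", "header"):
    print(vector, send(vector))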
That being said, it's crucial to remember that even in the most ideal scenario, in the absence of human intelligence, scanners can't detect all the instances of exposures that are truly logical – meaning, exposures that are related to specific business logic, and thus are not perceived as an issue by an entity that can't understand that business logic.
But the sheer complexity of the issue does not mean that we shouldn't start somewhere, and that's exactly what I'm trying to do in my benchmarks – create a scientific, accurate foundation for achieving that goal, with enough investment, over time.
Note that my explanations describe only a portion of the actual tests that should be performed, and I'm sharing them only to emphasize the true complexity of the core issue; I haven't touched on stability, bugs, and a lot of other subjects, which may affect the overall result you get.
Additional information on evaluation standards for web application vulnerability scanners can be found in the WASC Web Application Security Scanner Evaluation Criteria web site.
The following commercial web application vulnerability scanners were not included in the benchmark, either because I didn't manage to get an evaluation version before the article's publication deadline, or, in the case of one scanner (McAfee), because I had problems with the evaluation version that I didn't manage to work out before the benchmark's deadline:
Commercial Scanners not included in this benchmark
· N-Stalker Commercial Edition (N-Stalker)
· McAfee Vulnerability Manager (McAfee / Foundstone)
· Retina Web Application Scanner (eEye Digital Security)
· WebApp360 (NCircle)
· Core Impact Pro Web Application Scanning Features (Core Impact)
· Parasoft Web Application Scanning Features (a.k.a WebKing, by Parasoft)
· MatriXay Web Application Scanner (DBAppSecurity)
· Falcove (BuyServers ltd, currently Unmaintained)
· Safe3WVS 9.2 Commercial Edition (Safe3 Network Center)
The following open source web application vulnerability scanners were not included in the benchmark, mainly due to time restrictions, but will be included in future benchmarks:
Open Source Scanners not included in this benchmark
· Kayra
· 2gwvs
· Webarmy
· Mopset 2
· GNUCitizen JAVASCRIPT XSS SCANNER - since WebSecurify, a more advanced tool from the same vendor, was already tested in the benchmark.
· Vulnerability Scanner 1.0 (by cmiN, RST) - since the source code contained traces of RFI lists being downloaded remotely from locations that no longer exist.
The benchmark focused on web application scanners that are able to detect either Reflected XSS or SQL Injection vulnerabilities, can be locally installed, and are also able to scan multiple URLs in the same execution.
As a result, the test did not include the following types of tools:
· Online Scanning Services – online applications that remotely scan applications, including (but not limited to) Appscan On Demand (IBM), Click To Secure, QualysGuard Web Application Scanning (Qualys), Sentinel (WhiteHat), Veracode (Veracode), VUPEN Web Application Security Scanner (VUPEN Security), WebInspect (online service - HP), WebScanService (Elanize KG), Gamascan (GAMASEC – currently offline), Cloud Penetrator (Secpoint), Zero Day Scan, DomXSS Scanner, etc.
· Scanners without RXSS / SQLi detection features:
o Dominator (Firefox Plugin)
o fimap
o lfimap
o lfi-rfi2
o LFI/RFI Checker (astalavista)
o etc
· Passive Scanners (response analysis without verification):
o Watcher (Fiddler Plugin by Casaba Security)
o Skavanger (OWASP)
o Pantera (OWASP)
o Ratproxy (Google)
o CAT The Manual Application Proxy (Context)
o etc
· Scanners of specific products or services (CMS scanners, Web Services Scanners, etc):
o WSDigger
o Sprajax
o ScanAjax
o Joomscan
o wpscan
o Joomlascan
o Joomsq
o WPSqli
o etc
· Web Application Scanning Tools which are using Dynamic Runtime Analysis:
o PuzlBox (the free version was removed from the web site, and is now sold as a commercial product named PHP Vulnerability Hunter)
o Inspathx
o etc
· Uncontrollable Scanners - scanners that can’t be controlled or restricted to scan a single site, since they either receive the list of URLs to scan from a Google dork, or continue and scan external sites that are linked to the tested site. This list currently includes the following tools (and might include more):
o Darkjumper 5.8 (scans additional external hosts that are linked to the given tested host)
o Bako's SQL Injection Scanner 2.2 (only tests sites from a google dork)
o Serverchk (only tests sites from a google dork)
o XSS Scanner by Xylitol (only tests sites from a google dork)
o Hexjector by hkhexon – also falls into other categories
o d0rk3r by b4ltazar
o etc
· Deprecated Scanners - incomplete tools that were not maintained for a very long time. This list currently includes the following tools (and might include more):
o Wpoison (development stopped in 2003, and the new official version was never released, although the 2002 development version can be obtained by manually composing the sourceforge URL, which does not appear on the web site - http://sourceforge.net/projects/wpoison/files/ )
o etc
· De facto Fuzzers – tools that scan applications in a similar way to a scanner, but where a scanner attempts to conclude whether or not the application is vulnerable (according to some sort of “intelligent” set of rules), the fuzzer simply collects abnormal responses to various inputs and behaviors, leaving the task of reaching a conclusion to the human user.
o Lilith 0.4c/0.6a (both versions 0.4c and 0.6a were tested, and although the tool seems to be a scanner at first glance, it doesn’t perform any intelligent analysis on the results).
o Spike proxy 1.48 (although the tool has XSS and SQLi scan features, it acts more like a fuzzer than a scanner – it sends partial XSS and SQLi payloads, and does not verify that the context of the returned output is sufficient for execution, or that the error presented by the server is related to a database syntax injection, leaving the verification task to the user).
· Fuzzers – scanning tools that lack the independent ability to conclude whether a given response represents a vulnerable location, by using some sort of verification method (this category includes tools such as JBroFuzz, Firefuzzer, Proxmon, st4lk3r, etc.). Fuzzers that had at least one type of exposure that was verified were included in the benchmark (Powerfuzzer).
· CGI Scanners: vulnerability scanners that focus on detecting hardening flaws and version specific hazards in web infrastructures (Nikto, Wikto, WHCC, st4lk3r, N-Stealth, etc.)
· Single URL Vulnerability Scanners - scanners that can only scan one URL at a time, or can only scan information from a google dork (uncontrollable).
o Havij (by itsecteam.com)
o Hexjector (by hkhexon)
o Simple XSS Fuzzer [SiXFu] (by www.EvilFingers.com)
o Mysqloit (by muhaimindz)
o PHP Fuzzer (by RoMeO from DarkMindZ)
o SQLi-Scanner (by Valentin Hoebel)
o Etc.
· Vulnerability Detection Assisting Tools – tools that aid in discovering a vulnerability, but do not detect the vulnerability themselves; for example:
o Exploit-Me Suite (XSS-Me, SQL Inject-Me, Access-Me)
o XSSRays (chrome Addon)
· Exploiters - tools that can exploit vulnerabilities but have no independent ability to automatically detect vulnerabilities on a large scale. Examples:
o MultiInjector
o XSS-Proxy-Scanner
o Pangolin
o FGInjector
o Absinth
o Safe3 SQL Injector (an exploitation tool with scanning features (pentest mode) that are not available in the free version).
o etc
· Exceptional Cases
o SecurityQA Toolbar (iSec) – various lists and rumors include this tool in the collection of free/open-source vulnerability scanners, but I wasn’t able to obtain it from the vendor’s web site, or from any other legitimate source, so I’m not really sure it fits the “free to use” category.
The execution logs, installation steps and configuration used while scanning with the various tools are all described in the following report:
The following appendix was published in my previous benchmark, but I decided to include it in the current benchmark as well, mainly because I didn't manage to invest the time to get to the bottom of these mysteries, and didn't see any indication that someone else did.
During the current & previous assessments, parts of the source code of open source scanners and the HTTP communication of some of the scanners were analyzed; some tools behaved in an abnormal manner that should be reported:
· Priamos IP Address Lookup – the tool Priamos attempts to access “whatismyip.com” (or some similar site) whenever a scan is initiated (verified by channeling the communication through Burp proxy). This behavior might derive from a trojan horse that infected the content on the project web site, so I’m not jumping to any conclusions just yet.
· VulnerabilityScanner Remote RFI List Retrieval (listed in the scanners that were not tested, developed by a group called RST, http://pastebin.com/f3c267935) – in the source code of the tool VulnerabilityScanner (a python script), I found traces of remote access to external web sites for obtaining RFI lists (which might be used to refer the user to external URLs listed in the list). I could not verify the purpose of this feature since I didn’t manage to activate the tool (yet); in theory, this could be a legitimate list update feature, but since all the lists the tool uses are hardcoded, I didn’t understand the purpose of the feature. Again, I’m not jumping to any conclusions; this feature might be related to the tool’s initial design, which was not fully implemented due to various considerations.
Although I did not verify that any of these features is malicious in nature, these features and behaviors might be abused to compromise the security of the tester’s workstation (or to incriminate him in malicious actions), and thus require additional investigation to rule out this possibility.