2017 OWASP Top 10 for PHP Developers Part 4: XML External Entities (XXE)

Ever processed XML files in your web application? If you did, you probably parsed their contents. And if you parsed their contents, your web application might have been vulnerable to an attack known as XML External Entities (XXE).

What is XXE?

XXE is a security vulnerability found in applications that parse XML with a permissively configured parser. An XML External Entities attack exploits the XML parsing functionality in a web application, making it read and process data of interest to an attacker.

How does it work?

XML is a metalanguage. In other words, it is a language used for describing other languages. An entity in an XML document is used to map a name to a value.
A basic XML document could look something like this:

<?xml version="1.0" ?>
<Credentials>
<Username>Username</Username>
<Password>Password</Password>
</Credentials>

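The first line in the document is the XML declaration. A document can also carry a Document Type Definition (DTD), introduced by a <!DOCTYPE ...> declaration. The DTD defines the structure of the document, and it is also where entities are declared.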

In the same XML document, we can also use an XML entity.

Here’s the same document with it in use:

<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity "Password"> ]>
<Credentials>
<Username>Username</Username>
<Password>&entity;</Password>
</Credentials>

This way, we defined an entity named, you guessed it, “entity” and accessed it.
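To see the substitution happen, here is a minimal sketch that parses the document above. It uses Python's standard library rather than PHP, purely as an illustration, and the variable names are my own. The parser replaces &entity; with the value declared in the DTD before the application ever sees the text.

```python
import xml.etree.ElementTree as ET

# The document from above, with an internal entity named "entity".
DOCUMENT = """<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity "Password"> ]>
<Credentials>
<Username>Username</Username>
<Password>&entity;</Password>
</Credentials>"""

root = ET.fromstring(DOCUMENT)

# The entity reference has already been replaced by its declared value.
password = root.find("Password").text
print(password)
```

The application code never asked for a substitution. It happened inside the parser, which is exactly why entities become interesting to an attacker.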

That being said, the document could also look a bit different:

<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity SYSTEM "file:///etc/passwd"> ]>
<Credentials>
<Username>Username</Username>
<Password>&entity;</Password>
</Credentials>

You probably noticed an issue with a file looking like this, but let me emphasize anyway: this file was created by an attacker. How do I know this? Look closely at what follows the “replace” word: the entity is now declared with the SYSTEM keyword and points at a local file instead of holding a literal value.

<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity SYSTEM "file:///etc/passwd"> ]>
<Credentials>
<Username>Username</Username>
<Password>&entity;</Password>
</Credentials>

Such a file aims to read user account information from /etc/passwd, a file found on Linux operating systems.

If your web application is vulnerable, meaning that it processes the XML document with external entities enabled and does not check its contents properly, the contents of the /etc/passwd file will be embedded within the parsed document. That is never a good thing: if any of that data is returned in a response, an attacker gets access to a list of user accounts belonging to the system.
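One way to "check the contents" is to look at which entities a document declares before trusting it. The following is a minimal sketch in Python (used here only for illustration, not PHP) built on the standard library's expat bindings. The function name find_external_entities is hypothetical. It never resolves anything; it only reports entities that are declared with a system identifier.

```python
import xml.parsers.expat

SUSPICIOUS = """<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity SYSTEM "file:///etc/passwd"> ]>
<Credentials>
<Username>Username</Username>
<Password>&entity;</Password>
</Credentials>"""


def find_external_entities(xml_text):
    """Return (name, system_id) pairs for every externally declared entity."""
    found = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a literal value and no system identifier;
        # external ones (SYSTEM / PUBLIC) carry a system identifier instead.
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No external entity handler is installed, so nothing is fetched.
    parser.Parse(xml_text, True)
    return found


print(find_external_entities(SUSPICIOUS))
```

A document from an untrusted source that declares any external entity at all is a strong signal that something is wrong, since legitimate clients rarely need them.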

A billion laughs

A billion laughs attack is a denial-of-service attack that takes advantage of how XML parsers expand entities. Unlike the example above, it does not need external entities at all: the document defines a chain of nested internal entities, each of which consists of ten references to the previous one. When a parser tries to resolve all of the entities defined, the expanded text grows exponentially and the application consumes so much memory that it may crash.
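The growth is easy to underestimate, so here is a small arithmetic sketch in Python (illustration only). It assumes the classic shape of the attack: a short seed string such as "lol" and ten entities, each made of ten references to the previous one. The function name and parameters are my own.

```python
def expanded_size(levels, fanout, seed="lol"):
    """Characters produced when the last of `levels` nested entities is expanded.

    The first entity holds the seed itself; every further entity holds
    `fanout` references to the one before it.
    """
    size = len(seed)
    for _ in range(levels - 1):
        size *= fanout
    return size


# Show how quickly the expansion grows with each extra level of nesting.
for levels in range(1, 11):
    print(levels, expanded_size(levels, fanout=10))
```

With ten levels and a fanout of ten, the last entity expands to 10^9 copies of the seed, which is about three billion characters produced from a document smaller than a kilobyte.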

Remediation

To remediate such a vulnerability, a developer could apply whitelist or blacklist input validation, review the source code to ensure that no vulnerable Application Programming Interfaces (APIs) are used, or disable Document Type Definitions completely.
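The last option, refusing DTDs altogether, can be sketched as follows. This is Python rather than PHP, shown only to illustrate the idea, and the names DTDForbidden, reject_dtd and parse_untrusted are hypothetical. The sketch assumes that your application never needs a DOCTYPE in incoming documents, so it runs a pre-check and only hands the document to the real parser if no DOCTYPE is present.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when an incoming document carries a DOCTYPE declaration."""


def reject_dtd(xml_text):
    """Raise DTDForbidden if the document contains a DOCTYPE declaration."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_text, True)


def parse_untrusted(xml_text):
    """Parse XML from an untrusted source, refusing any document with a DTD."""
    reject_dtd(xml_text)
    return ET.fromstring(xml_text)


MALICIOUS = """<?xml version="1.0" ?>
<!DOCTYPE replace [<!ENTITY entity SYSTEM "file:///etc/passwd"> ]>
<Credentials><Password>&entity;</Password></Credentials>"""

try:
    parse_untrusted(MALICIOUS)
except DTDForbidden as error:
    print("rejected:", error)
```

Without a DTD there is nowhere to declare an entity, so both the external entity trick and the nested entity chain are turned away before the document reaches application code. The cost is that legitimate documents with a DOCTYPE are rejected too, which is why this only fits applications that do not rely on DTDs.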