An approach to PDF shielding

Wed 01 September 2010 by guillaume

In a previous article we showed how one could delve into a document's internals to look for suspicious elements (such as JavaScript registered to run when the document is opened). This method gives a good heuristic for deciding whether a document is malicious.

However, while many antivirus vendors and online scanning tools claim to handle malicious PDF detection, they often rely on only a small subset of the PDF specification to perform their analysis. It quickly becomes clear how easily these tools are confused when we harden a document with very simple obfuscation techniques (cascaded compression filters, encryption, ...).

Most of the malicious documents I've seen in the wild used syntactic obfuscation, which any PDF-compliant parser overcomes very easily. In fact, much more effective obfuscation can be achieved by combining higher-level PDF features.
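To illustrate how cheap syntactic tricks are to undo, here is a minimal Ruby sketch for one widespread form of them: hex escapes in name objects (ISO 32000-1, section 7.3.5), where `/J#61vaScript` is the very same name as `/JavaScript`. The helpers `normalize_name` and `normalize_names` are hypothetical, and the regular expression is a simplification: it does not tokenize strings or streams, so a `/` inside a string literal would also be treated as a name.

```ruby
# Decode #xx hexadecimal escapes in a single PDF name token
# (ISO 32000-1, section 7.3.5): /J#61vaScript becomes /JavaScript.
# Works on a binary copy so that escaped bytes above 0x7F are safe.
def normalize_name(token)
  token.b.gsub(/#(\h\h)/) { Regexp.last_match(1).hex.chr }
end

# Apply normalize_name to every name token of a raw PDF fragment.
# Simplification: strings and streams are not tokenized, so a "/" inside
# a string literal would also be treated as the start of a name.
def normalize_names(raw)
  raw.b.gsub(/\/[^\s\/<>\[\](){}%]+/) { |token| normalize_name(token) }
end

puts normalize_names('<< /S /J#61vaScript /JS (app.alert(1)) >>')
# prints: << /S /JavaScript /JS (app.alert(1)) >>
```

Any parser that follows the specification performs this decoding anyway, which is why such tricks only fool tools that match raw bytes.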

Document encryption

First of all, a PDF document can be encrypted. Better: it can be encrypted and still open without any user intervention. The trick is to encrypt the document contents with a null (empty) user password. When the document is opened, the PDF reader first tries the empty password: it derives a key from it and checks the result against the hash stored in the encryption dictionary. If they match, the document is decrypted automatically and rendered on screen.

PDF supports RC4 encryption (40- to 128-bit keys) and AES-128 (AES-256 is also supported since Adobe Extension Level 3). Encrypting a document with a null password provides no real protection, but many PDF parsers simply break when dealing with such documents.
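To make the "null password-derived key" concrete, the sketch below follows the file-key derivation of the standard security handler (Algorithm 2, ISO 32000-1 section 7.6.3.3) and the per-object key of Algorithm 1 (section 7.6.2). It is a minimal sketch assuming RC4 or AES-128 (revisions 2 to 4): the AES-256 scheme of Extension Level 3 uses a different, SHA-256-based derivation, and the check of the derived key against the /U entry is left out. Function and parameter names are illustrative only, not Origami's API.

```ruby
require 'digest/md5'

# 32-byte padding string from Algorithm 2, step (a), of ISO 32000-1.
PADDING = [
  0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41, 0x64, 0x00, 0x4E, 0x56,
  0xFF, 0xFA, 0x01, 0x08, 0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
  0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
].pack('C*').freeze

# File encryption key for the standard security handler, revisions 2 to 4.
#   o_entry   : /O string of the /Encrypt dictionary (32 bytes)
#   p_entry   : /P permission flags (signed 32-bit integer)
#   id0       : first string of the trailer /ID array
#   revision  : /R value
#   key_bytes : /Length / 8 (always 5 for revision 2)
def file_key(o_entry, p_entry, id0, revision:, key_bytes:, password: '', encrypt_metadata: true)
  padded = (password.b + PADDING)[0, 32]
  input = padded + o_entry.b + [p_entry].pack('l<') + id0.b
  input += [0xFFFFFFFF].pack('V') if revision >= 4 && !encrypt_metadata
  digest = Digest::MD5.digest(input)
  # Revision 3 and later re-hash the truncated digest 50 more times.
  50.times { digest = Digest::MD5.digest(digest[0, key_bytes]) } if revision >= 3
  digest[0, key_bytes]
end

# Per-object key (Algorithm 1): MD5 of the file key, the low 3 bytes of the
# object number and the low 2 bytes of the generation number (plus "sAlT"
# for AES), truncated to key length + 5 bytes, at most 16.
def object_key(key, num, gen, aes: false)
  data = key.b + [num].pack('V')[0, 3] + [gen].pack('v')
  data += 'sAlT'.b if aes
  Digest::MD5.digest(data)[0, [key.bytesize + 5, 16].min]
end

# Dummy /O, /P and /ID values, for illustration only.
o_entry = "\x00".b * 32
key = file_key(o_entry, -4, '0123456789abcdef', revision: 3, key_bytes: 16)
puts key.unpack1('H*')                    # 128-bit file key
puts object_key(key, 1, 0).unpack1('H*')  # RC4 key for object "1 0"
```

A reader that opens such a document silently is essentially running this with `password: ''`, which means any analyzer can do exactly the same; the values for `o_entry`, `p_entry` and `id0` come from the document itself.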

Origami supports PDF encryption up to AES128. Encrypting a whole document with a null password is very easy:

require 'origami'
include Origami

pdf = PDF.new.encrypt "",""
pdf.onDocumentOpen Action::JavaScript.new('app.alert("Hi!");')
pdf.saveas "encrypted.pdf"

or even

PDF.read("sploit.pdf").encrypt("","").saveas("sploit.encrypted.pdf")

Voilà: a quick, low-cost obfuscation technique that actually bypasses most PDF analyzers.

But not all of the document's contents are actually encrypted. Let's look at the beginning of the file generated by the first script:

%PDF-1.4
1 0 obj
<<
    /OpenAction <<
        /S /JavaScript
        /JS (ê$9;i3SCúãÒ?'kO)
    >>
    /Pages 3 0 R
    /Type /Catalog
>>

PDF encryption only applies to string and stream objects, so in this case only the string holding the JS script has been encrypted. Consequently, our document still leaks useful information in clear text. Although leaking the document structure is not considered critical from a security point of view, in our case it is: a PDF parser can detect, with no effort at all, that this document has an /OpenAction key that will trigger JavaScript when the document is opened. It may not be able to decrypt the script, but it knows the script is there, ready to run. Suspicious. Likewise, one could quickly grep the document for a /JBig2Decode name: name objects are never encrypted.
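The leak is easy to exploit: since names are never encrypted, a crude scan of the raw bytes is enough to spot the suspicious keys. Below is a minimal Ruby sketch in the spirit of keyword-counting triage tools; the `scan_keywords` helper and its keyword list are assumptions chosen for this example. It decodes `#xx` escapes before matching, but it only sees names outside streams: anything stored in a compressed or encrypted stream stays invisible to it.

```ruby
# Name keys worth flagging (illustrative list, not exhaustive).
KEYWORDS = %w[/OpenAction /AA /JavaScript /JS /JBig2Decode
              /Encrypt /ObjStm /EmbeddedFile /GoToE].freeze

# Count occurrences of KEYWORDS among the name tokens of a raw PDF buffer.
# #xx escapes are decoded first, so /J#61vaScript counts as /JavaScript.
# Names inside (compressed or encrypted) streams are not visible here.
def scan_keywords(raw)
  counts = Hash.new(0)
  raw.b.scan(/\/[^\s\/<>\[\](){}%]+/) do |token|
    name = token.gsub(/#(\h\h)/) { Regexp.last_match(1).hex.chr }
    keyword = KEYWORDS.find { |k| k == name }
    counts[keyword] += 1 if keyword
  end
  counts
end

catalog = <<~PDF
  1 0 obj
  << /OpenAction << /S /JavaScript /JS (...) >>
     /Pages 3 0 R /Type /Catalog >>
  endobj
PDF
scan_keywords(catalog).each { |name, n| puts "#{name}: #{n}" }
# prints /OpenAction: 1, /JavaScript: 1 and /JS: 1, one per line
```

Run against the encrypted document above, such a scan still reports /Encrypt next to /OpenAction and /JavaScript, which is exactly the kind of clear-text hint we now want to get rid of.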

So we have to find a way to achieve a greater level of obscurity.

Object streams

This is another feature that analysis engines rarely handle. The idea is to embed PDF objects (other than streams) into a stream object, which is then called an object stream. Parsers have to decode the stream contents and extract the embedded objects.
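For reference, here is roughly what that extraction involves once the stream has been decrypted. According to the specification, the decoded data starts with /N pairs of integers (object number, then byte offset relative to /First), followed by the objects themselves. This minimal Ruby sketch assumes the data is already decrypted and uses a single FlateDecode filter; `parse_object_stream` is a hypothetical helper, not part of Origami, and it does not parse the extracted objects any further.

```ruby
require 'zlib'

# Split the decoded data of an object stream into its embedded objects.
#   data  : stream data after decryption and decompression
#   n     : value of the stream's /N entry (number of embedded objects)
#   first : value of the stream's /First entry (offset of the first object)
# Returns a Hash mapping each object number to its source text.
def parse_object_stream(data, n, first)
  data = data.b
  pairs = data.byteslice(0, first).split.map(&:to_i).each_slice(2).first(n)
  pairs = pairs.sort_by { |_num, offset| offset }
  objects = {}
  pairs.each_with_index do |(num, offset), i|
    start = first + offset
    # Each object ends where the next one begins (or at the end of the data).
    stop = i + 1 < pairs.size ? first + pairs[i + 1][1] : data.bytesize
    objects[num] = data.byteslice(start, stop - start).strip
  end
  objects
end

# Build a tiny object stream by hand, compress it, then parse it back.
sources = {
  10 => '<< /Type /Pages /Kids [11 0 R] /Count 1 >>',
  11 => '<< /Type /Page /Parent 10 0 R >>'
}
header = +''
body = +''
sources.each do |num, src|
  header << "#{num} #{body.bytesize} "
  body << src << "\n"
end
stream = Zlib::Deflate.deflate(header + body) # what /FlateDecode would hold
parsed = parse_object_stream(Zlib::Inflate.inflate(stream), sources.size, header.bytesize)
parsed.each { |num, src| puts "#{num} 0 obj #{src}" }
```

Nothing here is hard, but a scanner that skips this step never sees the objects at all.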

The object stream can be compressed and encrypted like any other PDF stream. So here is the plan:

  • Create a single object stream in a document
  • Embed every non-stream object in that stream
  • Encrypt the PDF

The object stream will get encrypted too, therefore hiding the whole document structure.

Here is an example:

require 'origami'
include Origami

#
# Create a new PDF encrypted object with null password
#
pdf = PDF.new.encrypt '',''

#
# Create a new Object stream (which will be compressed and encrypted)
#
objstm = ObjectStream.new.setFilter(:FlateDecode)
pdf.insert(objstm)

#
# Build a page tree and embed it into the stream.
#
pagetree = PageTreeNode.new.insert_page(0, page = Page.new)
pdf.Catalog.Pages = objstm.insert(pagetree)
objstm.insert(page)

#
# JavaScript payload.
#
page.onOpen Action::JavaScript.new("app.alert('Hello world');")

#
# Save the PDF file to disk.
#
pdf.saveas("sheltered.pdf")

I build the page tree by hand because I want it embedded in the object stream too. The resulting document has only four visible top-level objects: the catalog, the encryption dictionary, the xref stream, and our object stream. The page tree is stored encrypted inside the object stream, and the payload is triggered when the first page is opened. I did not use /OpenAction because the catalog is not encrypted, so the action would be visible in clear text.

What about the catalog? Nothing in the specification forbids us from putting it in an object stream. But when we try, Adobe Reader fails to open the document, whereas Foxit Reader handles it properly.

We now have an almost entirely encrypted document, leaking far less information than before. With such a document, any analysis tool is forced to decrypt and parse the object stream before it has any chance to investigate.

Nested PDF documents

We can get even nastier. Combining the previous techniques, we will bury the payload deep inside a document... which will itself be embedded into another document. PDF offers the GoToE action, which lets us jump directly into a nested document. No JavaScript is required, and this action cannot be disabled in Reader.

require 'origami'
include Origami

abort "usage: ruby cocoon.rb <payload.pdf>" if ARGV.empty?
EMBEDDEDNAME = File.basename(ARGV[0])

#
# Create a new PDF encrypted object with null password
#
pdf = PDF.new.encrypt '',''

#
# Create a new Object stream (which will be compressed and encrypted)
#
objstm = ObjectStream.new.setFilter(:FlateDecode)
pdf.insert(objstm)

#
# Build a page tree and embed it into the stream.
#
pagetree = PageTreeNode.new.insert_page(0, page = Page.new)
pdf.Catalog.Pages = objstm.insert(pagetree)
objstm.insert(page)

#
# Embed the payload document. Register it manually in the EmbeddedFiles name tree.
#
file = objstm.insert(pdf.attach_file(ARGV[0], :Register => false))
pdf.Catalog.Names = objstm.insert(
  Names.new.setEmbeddedFiles(NameTreeNode.new.setNames([ EMBEDDEDNAME, file ]))
)

#
# Jump into the nested document.
#
page.onOpen Action::GoToE.new(EMBEDDEDNAME, Destination::GlobalFit.new(0))

#
# Save the PDF file to disk.
#
pdf.saveas("cocoon.pdf")

Reusing the previously generated encrypted document as the payload:

ruby cocoon.rb sheltered.pdf

The result is a PDF document with only five top-level objects: the catalog, the encryption dictionary, the xref stream, an object stream, and an embedded file stream. Once it is opened, here is what happens:

  1. The reader decrypts the top-level document.
  2. It retrieves the first page from the object stream.
  3. The GoToE action is triggered.
  4. The embedded (and encrypted) file stream is loaded.
  5. The reader parses this document and decrypts it (it is encrypted too).
  6. The first page is retrieved from the object stream of the embedded document.
  7. The JavaScript payload is triggered.

There is no limit on the nesting level. This control flow gets quite complex to follow, especially for tools that cannot handle advanced PDF features, and the "cocoon" document itself can look totally harmless. Simply running the above script on any document makes the probability of detection fall drastically.
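On the analyst's side, the only robust answer is to recurse: every embedded file must itself be decrypted, inflated and parsed, with some limit on depth. The sketch below shows that control flow on a deliberately simplified, hypothetical document model; the `Doc` struct stands in for a fully parsed PDF, and building it (the hard part) is not shown. The cap of 16 levels is an arbitrary assumption, since nesting is otherwise unbounded.

```ruby
# Hypothetical, simplified model of a parsed PDF: the scripts it triggers and
# the documents embedded in it (already decrypted, inflated and parsed).
Doc = Struct.new(:name, :scripts, :embedded)

MAX_DEPTH = 16 # arbitrary cap (assumption): nesting is otherwise unbounded

# Depth-first walk over nested documents, collecting [path, script] pairs.
# Nesting beyond MAX_DEPTH is treated as an error (and as a red flag).
def collect_scripts(doc, path = [], depth = 0, found = [])
  raise "nesting deeper than #{MAX_DEPTH} levels" if depth > MAX_DEPTH

  path += [doc.name]
  doc.scripts.each { |js| found << [path.join(' > '), js] }
  doc.embedded.each { |child| collect_scripts(child, path, depth + 1, found) }
  found
end

payload = Doc.new('sheltered.pdf', ["app.alert('Hello world');"], [])
cocoon = Doc.new('cocoon.pdf', [], [payload])
collect_scripts(cocoon).each { |where, js| puts "#{where}: #{js}" }
# prints: cocoon.pdf > sheltered.pdf: app.alert('Hello world');
```

The walk itself is trivial; what defeats most scanners is that each level requires the full decryption and object stream handling described earlier.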

Perhaps malicious PDF authors do not heavily obfuscate their documents simply because it is still unnecessary. These techniques are nevertheless easy to apply with a few lines of Ruby code, and highly effective.