An approach to PDF shielding
Wed 01 September 2010 by guillaume
However, while many antivirus vendors and online scanning tools claim to detect malicious PDFs, they often rely on a small subset of the PDF specification to perform their analysis. It quickly becomes apparent how easily scanning tools get confused once a document is hardened with very simple obfuscation techniques (cascaded compression filters, encryption...).
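To make the "cascaded compression filters" point concrete, here is a minimal sketch of what a compliant parser has to do: apply each decoder listed in the stream's /Filter array, in order. Only two standard filters are covered, and the names `FILTERS` and `decode_stream` are hypothetical helpers for this illustration, not part of Origami or any other library.

```ruby
require 'zlib'

# Decoders for two standard PDF stream filters (a real parser supports more).
FILTERS = {
  'ASCIIHexDecode' => lambda { |data|
    hex = data.sub(/>.*\z/m, '').gsub(/\s+/, '') # drop the '>' end marker and whitespace
    hex += '0' if hex.length.odd?                # a lone trailing digit is padded with 0
    [hex].pack('H*')
  },
  'FlateDecode' => lambda { |data| Zlib::Inflate.inflate(data) }
}.freeze

# Apply every filter named in the /Filter array, first to last.
def decode_stream(raw, filter_names)
  filter_names.reduce(raw) do |data, name|
    decoder = FILTERS.fetch(name) { raise ArgumentError, "unsupported filter: #{name}" }
    decoder.call(data)
  end
end
```

A tool that only understands a single FlateDecode pass stops one layer too early on a stream declared with `[/ASCIIHexDecode /FlateDecode /FlateDecode]`, and never sees the actual content.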
Most of the malicious documents I've seen in the wild used syntactic obfuscation, which any PDF-compliant parser can easily overcome. In fact, much more effective obfuscation can be achieved by combining higher-level PDF features.
First of all, a PDF document can be encrypted. Better: it can be encrypted and still be opened without requiring any user intervention. The trick is to encrypt the document contents with a null password. When the document is opened, the PDF reader checks whether the hash stored in the document matches a key derived from the null password. If it does, the document is automatically decrypted and rendered on the screen.
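The check described above can be sketched as follows. This assumes the oldest variant of the standard security handler (revision 2, 40-bit RC4) as I understand it from the PDF specification: the key is an MD5 over the padded password, the /O entry, the /P permissions and the first file identifier, and the /U entry is the padding string encrypted with that key. Later revisions add more hashing rounds and a different /U computation, so treat this as an illustration only. All method names are hypothetical.

```ruby
require 'digest/md5'

# 32-byte padding string defined by the standard security handler.
PAD = [
  0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41, 0x64, 0x00, 0x4E, 0x56,
  0xFF, 0xFA, 0x01, 0x08, 0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
  0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
].pack('C*').freeze

# Plain RC4, written out to avoid depending on OpenSSL legacy ciphers.
def rc4(key, data)
  s = (0..255).to_a
  k = key.bytes
  j = 0
  256.times do |i|
    j = (j + s[i] + k[i % k.size]) & 0xFF
    s[i], s[j] = s[j], s[i]
  end
  i = j = 0
  data.bytes.map do |byte|
    i = (i + 1) & 0xFF
    j = (j + s[i]) & 0xFF
    s[i], s[j] = s[j], s[i]
    byte ^ s[(s[i] + s[j]) & 0xFF]
  end.pack('C*')
end

# File encryption key for revision 2: first 5 bytes of the MD5 digest.
def file_key(password, o_entry, permissions, first_id)
  padded = (password.b + PAD)[0, 32]
  perms = [permissions & 0xFFFFFFFF].pack('V') # /P as unsigned little-endian
  Digest::MD5.digest(padded + o_entry.b + perms + first_id.b)[0, 5]
end

# Expected /U entry for a given user password (revision 2).
def user_entry(password, o_entry, permissions, first_id)
  rc4(file_key(password, o_entry, permissions, first_id), PAD)
end

# True when the stored /U entry matches the one derived from an empty password.
def opens_without_password?(u_entry, o_entry, permissions, first_id)
  user_entry('', o_entry, permissions, first_id) == u_entry.b
end
```

An analysis tool that performs this check before giving up on an "encrypted" file can decrypt null-password documents just as transparently as the reader does.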
PDF supports RC4 encryption (40- to 128-bit keys) and AES128 (AES256 has also been supported since Extension Level 3). Using a null password to encrypt a document is pretty useless as a protection, but a lot of PDF parsers simply break when dealing with such documents.
Origami supports PDF encryption up to AES128. Encrypting a whole document with a null password is very easy:
Voilà: a quick, low-cost obfuscation technique that actually bypasses most PDF analyzers.
But not all the document contents are actually encrypted. Let's look at the beginning of the file generated by the first script:
So we have to find a way to achieve a greater level of obscurity.
Object streams are another feature that is rarely handled by analysis engines. The idea is to embed any kind of PDF object into a stream object. The resulting stream is then called an object stream. Parsers have to be able to parse the stream contents and extract the embedded objects.
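To give an idea of what that parsing involves, here is a minimal sketch that splits the already decoded (decrypted and decompressed) contents of an object stream. It relies on the layout given in the PDF specification: the data starts with /N pairs of integers (object number, byte offset relative to /First), and the objects follow from offset /First without `obj`/`endobj` keywords. The method name is hypothetical, and the sketch assumes the offsets are listed in increasing order.

```ruby
# Split the decoded contents of an object stream.
# n and first are the /N and /First entries of the stream dictionary.
# Returns a hash mapping object numbers to the raw text of each object.
def split_object_stream(decoded, n, first)
  header = decoded.byteslice(0, first).split.map { |tok| Integer(tok, 10) }
  pairs = header.first(2 * n).each_slice(2).to_a
  body = decoded.byteslice(first..-1)

  objects = {}
  pairs.each_with_index do |(objnum, offset), idx|
    # An object ends where the next one starts, or at the end of the stream.
    stop = idx + 1 < pairs.size ? pairs[idx + 1][1] : body.bytesize
    objects[objnum] = body.byteslice(offset, stop - offset).strip
  end
  objects
end

# Usage: two objects, numbered 11 and 12, stored at offsets 0 and 43.
pages = '<< /Type /Pages /Kids [12 0 R] /Count 1 >>'
page  = '<< /Type /Page /Parent 11 0 R >>'
index = "11 0 12 #{pages.bytesize + 1} "
extracted = split_object_stream(index + pages + "\n" + page, 2, index.bytesize)
```

A scanner that skips this step only sees an opaque stream and none of the objects stored inside it.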
The object stream can be compressed and encrypted like any other PDF streams. So here is the point:
- Create a single object stream in a document
- Embed every non-stream object in that stream
- Encrypt the PDF
The object stream will get encrypted too, therefore hiding the whole document structure.
Here is an example:
I build the page tree by hand because I want it to be embedded into the object stream too. The resulting document has only 4 visible root objects: the catalog, the encryption dictionary, the xref stream, and our object stream. The page tree is encrypted in the object stream and the payload is triggered when the first page is opened. I did not make use of /OpenAction because the catalog is not encrypted.
What about the catalog? Reading the specification, nothing forbids us from pushing it into an object stream too. But when we try, Adobe Reader fails to open the document. Foxit Reader, however, behaves properly in this case.
Now we have an almost entirely encrypted document that leaks less information than ever. With such a document, any analysis tool is forced to decrypt the object stream and parse it to have a chance to investigate.
Nested PDF documents
    require 'origami'
    include Origami

    # The extracted listing read "ARGV" here; ARGV[0] (first argument) is assumed.
    EMBEDDEDNAME = File.basename(ARGV[0])

    # Create a new PDF encrypted object with null password
    pdf = PDF.new.encrypt '', ''

    # Create a new Object stream (which will be compressed and encrypted)
    objstm = ObjectStream.new.setFilter(:FlateDecode)
    pdf.insert(objstm)

    # Build a page tree and embed it into the stream.
    pagetree = PageTreeNode.new.insert_page(0, page = Page.new)
    pdf.Catalog.Pages = objstm.insert(pagetree)
    objstm.insert(page)

    # Embed the payload document. Register it manually in the names directory.
    file = objstm.insert(pdf.attach_file(ARGV[0], :Register => false))
    pdf.Catalog.Names = objstm.insert(
      Names.new.setEmbeddedFiles(NameTreeNode.new.setNames([ EMBEDDEDNAME, file ]))
    )

    # Jump into the nested document.
    page.onOpen Action::GoToE.new(EMBEDDEDNAME, Destination::GlobalFit.new(0))

    # Save the PDF file to disk.
    pdf.saveas("cocoon.pdf")
Recycling the previous encrypted document:
ruby cocoon.rb sheltered.pdf
The result is a PDF document with only five objects: catalog, encryption dictionary, xref stream, an object stream, and an embedded file stream. Once opened, here's what happens:
- The reader decrypts the top-level document.
- It retrieves the first page in the object stream.
- The GoToE action is triggered.
- The embedded (and encrypted) file stream is loaded.
- The reader parses this document, and decrypts it (as it is itself encrypted too).
- The first page is retrieved in the object stream of the embedded document.
There is no limit on the nesting level. This control flow becomes quite complex to follow, especially for tools that cannot handle advanced PDF features. The "cocoon" document can look totally harmless. Just running the above script on any document makes the detection probability fall drastically.
Maybe the reason why malicious PDF creators do not heavily obfuscate their documents is that it is still unnecessary. Those techniques are nevertheless easy to use with a few lines of Ruby code, and highly effective.