Recon2012Mon 02 July 2012 by jj
For the third year we had the chance to participate to REcon 2012.
Here is a summary of the most intelligible talks.
The case for semantics-based methods in reverse engineering
Rolf Rolles started the conference on his predilection subject, semantic analysis of code.
A lengthy introduction reminded us that semantic analysis means working on the code based on the effects of the instructions and instruction sequences, as opposed to pattern-based techniques (which work on the form of the code).
He then detailed as an exemple how to recover a switch table from assembly generated by Visual Studio.
In another exemple he used abstract interpretation to compare two code snippets, as a way to validate a deobfuscation algorithm : the obfuscated and deobfuscated versions should have the exact same semantics. This way he was able to hilight bugs in some commercial obfuscators: transforming an 'inc' instruction into an 'add' is incorrect as the effects on the eflags are not strictly equivalent (inc does not update the carry flag).
Overall an interesting introductory level talk. (slides)
Extraordinary string based attacks: smashing the atom
The second speaker, Tarjei Mandt (aka @kernelpool) gave an impressive talk on Windows vulnerabilities based on Atoms (MS12-041).
Atoms are a way for processes to share string values. Windows can store them in different tables: for the process only (application table), for the current session (user table) and globally (system table). If two processes use the same atom value, the kernel will give them access to the same atom, and use a reference counter to check when the resource can be freed. If too many references to an atom are made and the counter overflows, the atom is (correctly) set to the 'pinned' state, and can not be freed (as if it had an infinite number of references) except by destroying the whole table. When Windows wants to create a new atom, it will reuse the slot of the last deleted atom if possible. Classic uses for atoms are the Windows class names, or the clipboard formats. A more unexpected user of atoms is the system of hooks that implements the Windows UI themes : an User atom stores the path to the dll that is mapped into every graphical process. Thankfully, a program cannot directly reference the User atoms, as the API exported by win32k.sys does not allow this.
MS12-041 is a bug in RegisterClass (a win32k function), where if you pass a string as parameter, the kernel will correctly transform it into an User atom and register it ; however if you pass an atom number directly to the function (any number in the range 0xc000-0xffff is accepted as such a handle), it will use that atom without incrementing its refcount. When a program has registered a class name, it is allowed to release it, which will decrement its refcount.
That means that by repeatedly calling RegisterClass with an atom handle and DestroyClass, you can decrease the refcount, and thus free, an arbitrary User atom. Now we can free the User atom used to store the path to uxtheme.dll, re-allocate it with a path to a dll we control, and if we can start a new privileged process in the current session we will achieve code execution at an increased privilege level. The two obvious privileged process to target are csrss and winlogon, however a check in the kernel prevents using csrss. Tarjei gave a demo running on win8, where the target process is logonui, a new process started by winlogon whenever it needs to display information to the user, and runs as System.
On the mitigation side, in win 7 you can use job restrictions (JOB_OBJECT_UILIMIT_GLOBALATOMS) to restrict access to the System atom table, but you cant do anything to prevent User or Application atom access.
Win8 appcontainers application cannot delete standard atoms anywhere unless the atom specifically allows this kind of lowbox access. Tarjei retries its demo, this time running inside an appcontainer, and now it fails. (slides)
Injecting custom payload into signed Windows executables
The next talk was Igor Glucksmann from Avast, discussing the recently disclosed vulnerability affecting the Authenticode signature scheme used by Microsoft to digitally sign PE modules.
The signature can be stored in the PE itself or in a so-called catalog. When stored in the PE, the signature needs to ignore itself and a few related fields to be able to generate the signature in the first place. The specification says that the data hashed for the signature consists of the full PE file after removing the PE checksum field (4 bytes), the security directory (8 bytes, holds the file offset and the size of the digital signature) and the signature itself, as pointed by the directory. The specification says that PE data can exist in the file after the signature (it is included in the hash), but the implementation on Windows XP only uses data before the signature offset. In any case, it is forbidden to store the signature in the file before any non-empty PE section.
So, an attacker can change any of these 3 blocks without invalidating the digital signature. Especially, he can change the size of the signature block. The signature verification code will use only the data it needs, and whatever space is left after that can be used to store anything, and won't be included in the signature.
This is the basis of the first attack. Some executable files store data after the standard parts of the binary, in what is called the overlay. Moreover, some of these executable do not hardcode where this part starts in the file, but scans the data for a given pattern. This is common among self-extractible compressed files. Self-extracting schemes can scan for the signature from the beginning of the file or from the end. If the scan starts from the end, the attacker can grow the segment reserved for the signature (that comes at the very end of the file), and append his own compressed data inside. When the executable runs, the code will scan for the pattern, and find the one in the attacker-forged signature block first. This is a practical attack that impacts, for exemple, Adobe Flash installer. As an added bonus, these installer files usually run elevated.
If the installer scans for the pattern forward, the attacker can carry the same attack by moving the signature at the beginning of the overlay, however this will invalidate the signature on XP. Some installers will scan forward, but will scan for two patterns: a generic one, and a localized one, matching the current locale. As an exemple, the author gives a demo of the Symantec antivirus installer, which is a signed 7zip autoextractible. He inserted in the signature block patterns matching all the most prevalent locales, and is able this way to actually launch an installer from his own company, Avast, from the original (and signed) Symantec installer.
These attacks rely on indirect means to achieve code execution. However a more direct attack is possible: by moving the signature block to the very beginning of the PE file, you can overwrite the dword at offset 0x3C, which dictates where the PE header is stored. If you combine that with an expansion of the signature block, you can craft the PE executable to have its header inside the attacker-controlled segment ; at this point you control the full PE section list and the PE entrypoint. You need to keep the original header and all file content intact after that, to keep the signature verification code happy. The specification forbids the signature segment to come before any non-empty PE section, so you can simply say that the file has no section at all, or one covering only the first few bytes. This is enough to store a shellcode, and achieve direct code execution. For more stealth, the shellcode could be made to manually map the original sections at their intended address, and let the original program run seemingly unmodified. This very clever hack requires to move the signature block around, so it will only work on windows vista or newer and fail on XP.
All these attacks work even if the file is signed through the catalog, as the signature verification code is the same in both cases, and will ignore the signature and related fields stored in the PE file itself.
However the author noticed that these flaws did not apply to digitally signed drivers, presumably because the kernel signature verification code is more strict and because drivers do not usually use overlays.
These issues were fixed in ms12-024 by adding the following features:
- All versions of Windows will ignore any data stored after the signature block, WinXP-style.
- Additionally, all versions of Windows will ignore signatures containing certain byte sequences, used in the most common auto-extraction schemes.
These fixes may seem weak, but microsoft cannot do much more without invalidating all previously-generated file signatures, which is clearly out of question.
Backside optical analysis hardware of ICs
Dmitry Nedospasov describes his setup to realize backside optical analysis using custom components, like an astronomer infrared captor and a standard microscope. For less than 20k$, approximately a 10th of the price of professional setups, he is able to carry Backside optical analysis of integrated circuits.
This technique is based on a physical property of the transistors, that once in a while will emit a photon when changing state, if i remember correctly. This happens roughly once every 1 million transitions. The 'backside' part means that this happens through the silicon substrate, without needing to decap the chip ; so it stays functionnal during the acquisition process. Using very long exposure times (hours), it is possible to take a picture of the running chip, and the bright spots will reveal where the transistor that were very active during this period are located on the circuit board. Coupled with very specific pieces of code running, for exemple a simple loop that will read the value of one register, you can then visualize where on the chip you will find the transistors responsible for implementing this register.
This can be very useful when you want to carry fault-injection techniques: now you know precisely where to point your laser to in order to affect a specific component.
The talk was very interesting, with lots of fascinating pictures. (slides)
Modern static security checking of C/C++ programs
Julien Vanegue from Microsoft tells us how they try to use static analysis everyday. Their approach is based on the framework HAVOC/Boogie, which works on C++ code with annotations. This tool works with z3 as SAT backend, and ensures that the code meets the pre and postconditions described in the annotations. It scales pretty well, and the usual run on 6M lines of code takes approx. 24h on a high-end machine (24 cores, heaps of ram...).
The tool has no false negatives, which means that it will never miss a flaw in the code (according to the annotations), but it can generate false positives. Those happen if the annotations are too broad or the code too complex, in this case the programmer has to intervene, usually to add more invariants or detail the ones already there.
Another interesting feature of their setup is Houdini, an algorithm able to work on a big set of functions, to remove redundant annotations. It can lead to improved run time, and better error messages. It also helps to fix a number of false positives by adding the minimum set of new annotations on the root cause of the problem. Julien finished by noting that the future of static checking will be to try to automatically generate invariants for the code.
Facedancer USB: Exploiting the magic school bus
The last talk of the day is Travis Goodspeed, assisted by Sergey Bratus for the vocals, on an (as usual) awesome new piece of hardware, named Facedancer.
It consists of a minimal board with 2 USB connectors, on for the controller side and one for the target side, and comes with a python framework that allows you to write fake USB devices, including all the very low-level protocols. This can be seen as similar to existing devices, like the teensy, but much more practical to work with, as you dont have to unplug and reflash the device every time you want to try something new ; just change your python code on the fly in your favorite editor and the device behavior updates immediately. Another advantage is that the 'virtual device' code runs on your PC, with all the capabilities that comes with it, compared to being limited to the constrained ressources of a standalone embedded device. This way, it makes a wonderful tool to prototype payloads, that you can later polish and embed in a more lightweight device.
The only feature that is handled directly by the hardware is a form of acknowledgement required by the USB protocol, and needs by design to be realtime and is incompatible with a full round-trip to the controlling PC. Any other kind of USB interaction is forwarded to the host to be handled by the python code. The device can also emulate a Host and send bogus data to an USB device, but in this case more timing problems can arise. It is also possible to self-fuzz a machine, using the same computer for controlling the Facedancer and as a target, but in this setup some OSes may have problems, especially MacOS that has some kind of Big Lock for USB and cannot do anything on the USB bus while enumerating new devices.
The rationale for creating this device is that people usually trust that anything can be plugged in their PC and do not realize that most drivers were written by the constructor usually with little security in mind, and furthermore said developper had only the real device to work with, and so probably did a poor job of validating the messages sent by the hardware, and let's not talk about the hardware sending totally unexpected data.
Travis went on to describe some of the numerous bugs they were able to trigger. The USB device is responsible for sending a textual description of itself to its host, and using some special names like %n gave interesting results, especially on userland systems like X11 and skype. Another interesting behavior is that some drivers ask their device for a string, get its length to allocate a buffer, and then re-ask the string to copy it. If the facedancer sends two different replies, fun ensues.
Very entertaining talk, and the hardware and controlling python framework is open-sourced as usual. (facedancer)
Predicting english keywords from java bytecodes using machine learning
Pablo Duboue starts the 2nd day with an interesting approach to auto-documenting unknown code: he will crawl the web for java sources, compile the functions, and associate the javadoc comments on the function to the generated bytecode for the function. When working an unknown class, he will then scan for known bytecode patterns and associate the comments to it. When working on big classes, unknown functions that call many known functions can be annotated with the most common words appearing in the comments of these known subfunctions, and so recusively. This can give a quick hint as to what a top-level function will relate to.
A further interesting refinement would be to flag methods calling subfunctions where the comments TODO or FIXME are present, as possibly flawed.
Win8 user mode drivers
Alex Ionescu, from CrowdStrike, presents his research on a feature of Win8, usermode drivers. UserMode Driver Framework v1.11 is Win8 only for now, but should be backported to Vista and Win7 soon. It adds support for interruption handling in usermode drivers, which is very important for general purpose drivers ; without this feature the old framework was mostly useful for bus-style devices like usb or 1394 oriented towards packet reception and emission. The usermode driver model is based on COM and coded in C++. A kernel component, the redirector, proxies requests from the devices to the usermode through ALPCs. A usermode driver has the same privilege level as a service, it need not be digitally signed. The UMD threads are hosted in one or more wudfhost.exe process, similar to svchost for the services.
In UMDF1.11, a driver can now access memory-mapped IO hardware registers. The access is possible through a system call, but also directly through the thread virtual memory using a specific switch in the driver description file (.inf): RegisteraccessUsingUserModeMapping, and a directive:
[ LogConf ] MemConfig=<len>@<addr>-<addrend>
However the kernel restricts access to the hardware-allocated pages. When developping a software driver, the only accessible page is 0xC0000 which maps to the video ram bios. It is possible to trigger the kernel to run code at this address when changing resolution with a VGA card, when displaying a BSOD, or during the shutdown sequence when the screen shows 'it is now safe to power off'. This is not very convenient to exploit. Before Vista, the bios code was run in VM8086 mode. Now windows uses an in-kernel emulator (HAL x86 emulator, named XM) unless disabled by a registry key. The emulator checks the addresses accessed by the code, and restrict them to BIOS-mapped memory. Without this restriction it was possible to access the kernel code through the physical memory, but the emulator seems to do a good job at filtering. However it does not restrict direct port I/O access, except for PCI and BIOS CMOS which are virtualized. An attack vector could then be direct access to the hard disk through PIO, but this is hard and could conflict with the normal DMA accesses made by the OS.
Alex concludes by admitting he found no reliable way to simply escalate privileges from a usermode driver to kernel, and that it seems pretty safe to deploy. It would actually improve the security of the machine to deploy e.g. printer drivers this way instead of loading them in the kernel. As a final remark he could only recommand that windows engineers mandate a digital signature for UM drivers.
Reverse engineering of binary programs for custom virtual machines
Alexander Chernov and Katerina Troshina recount how they reversed a firmware for an unknown architecture, with only basic knowledge of which part of the firmware were data and which were code. They usually do firmware reverse-engineering of embedded devices, using their own decompiler: SmartDec, but this time they had no CPU datasheets to work with at all. Trying hundreds of existing CPUs yielded nothing, so they resorted to statistic analysis.
The goal was to identify a ret instruction based on a few assumptions: it would occur often, and have a fixed binary representation (as it takes no argument). Assuming the code is organized in subroutines ending with a ret, there should be 'call' instructions, whose argument would be the absolute address of the target subroutine, ie the address right after a ret. By trying different possible encodings for the address argument, they were able to find one candidate that was much more frequent than the others. The opcode had the form XXYYZZTT, with the address stored in TTYY. Assuming the jmp instruction would have a similar layout, and furthermore hinting that some of the subroutines would end with a jmp instead of a ret, and cross-referencing that with the results of the call analysis, they were able to identify the absolute jmp instruction. At this point they had the rough structure of the binary, with most subroutines and the links between them.
Assuming relative jumps were intra-function and had an 'offset' encoding similar to the absolute jmp, they were able to identify conditionnal and unconditionnal instructions, and so recovered intra-function code flow. Then the cmp instructions are found right before the conditional jumps, giving away the binary format for the arithmetic instructions, and these can be distinguished by the value of their immediate argument: and-style instructions would most often take a value with only one bit set. The last remaining piece is to match
set reg, addr load modify store
patterns to identify memory accessing instructions.
Using the reversed instruction format, they were then able to produce useful annotations so that their decompiler can work on the binary, and finally recover all of the original code.
This talk presented an interesting approach to a very unusual problem, which may be useful when working on old embedded devices.
Debugging baseband stacks
Ralf-Philipp Weinmann describe the world of GSM devices, where a distinct 'baseband' processor is responsible for managing the radio equipement, in parallel with the 'application' processor which runs the user-interface and applications of the phone. The baseband is a very closed universe, with full-blown operating systems which were almost never looked at from a security perspective. These operating systems implement complete 3GPP stacks, which are huge, complex and interact a lot. This makes the code very unfriendly to static analysers, as data packets hop from one layer to the other.
He then cites qcombbdbg, published by Guillaume from our lab. This is a debugger compatible with gdb, designed to work on the REX real-time operating system, used on qualcomm DSPs. It represents an unique open-source alternative to vendor-provided diagnostic tools, and is user-friendlier than hardware debugging (eg JTAG). However newer versions of qualcomm basebands are based on the OKL4 micro-kernel architecture, where REX runs as an user-mode task. He is in the process of porting the debugger to run on these new devices, and will try to release the code before BlackHat.
Designing a minimal operating system to emulate x86 code snippets
Elias Bachaalany explains the internals of his custom virtual machine for malware analysis. He chose to use a full virtual machine over a simpler code emulator because code emulators often have wrong semantics for unusual instructions, and are slower than a VM with JIT code compilation. With a virtual machine, he can prepare a custom bootloader/kernel that will only load the specified binary and execute a given address. He can either load libraries or emulate them in the host.
With this setup he can very quickly run a given sample, with all the benefits of a full virtual machine coupled with an almost-instantaneous boot time. Check the slides for more technical details.
The 2nd ended with the traditionnal Lightning Talks.
We learned that offensivecomputing.net got renamed to openmalware.org, and changed hands.
One guy volunteered for a custom reimplementation of the CloudRE server, allowing to host your private server in your own network (now published: cloudnein).
Someone is building an antidebug archive, all within a single executable, similar to the corkami project.(antiRE)
And finally we had an epic chiptunes concert by Battle Lava, on two original modded GameBoys.
Reversing Dwarf Fortress for fun and ruby
I started the 3rd and last day with a retrospective on the hacking of the game Dwarf Fortress with dfhack, and how we manage the adaptation of the tools when a new game version is released. DFHack now includes a ruby interpreter, so that you can now fully script the game without the hassle of a full C++ recompilation. (slides)
Doing it for the lulz: anonymous and a radical politics of dissent
Gabriella Coleman gave a rehearsal of a TEDtalk: the sociological analysis of the group known under the name 'anonymous', where she was able to identify many inner currents, but all united under the motto of never putting oneself forward.
Nice summary of the events of the last few years.
Recognition of binary patterns by morphological analysis
Aurelien Thierry, from INRIA, presents a tool to identify library function statically linked in a binary.
In the learning phase, the tool will disassemble a given library using a custom disassembler based on beaengine and extract the control flow graph for it. All instructions are discarded, except for jcc/call/ret. Simple jumps are removed too, but the destination for calls is retained as an additional information. The graph is stored in a database, and can be read by an IDA plugin. The scanning of a library is pretty fast, for exemple it maps all openssl in 12s, for around 30k nodes and a pretty good coverage.
The IDA plugin will iterate over every function, and try to match the control flow graph with the database. The general problem of graph homomorphism is known to be NP-complete, but in this case he can apply additional constraints: all graphs have a root node, and node children are ordered. In this case, there exist polynomial matching algorithms. To further improve matching time, and handle compiler inlining, the plugin matches limited sub-graphs, of around 15 nodes. When a large number of matches is found between a function in the idb and one in the database, matches are tried with increasingly larger subgraphs.
One limitation of this scheme is that it does not work on functions with a small number of blocks, but otherwise it works pretty well as an alternative to FLIRT. The scheme is still quite sensitive to the compilation options used for the initial corpus of libraries. The program may evolve as (yet another) cloud-based community reversing service.
Cryptographic function identification in obfuscated binary programs
Joan Calvet tries to find and categorize cryptographic algorithms in unknown binaries, including obfuscated code.
The usual way to do that would be to look at functions in the program, and search for loops and magic constants. However with obfuscated code, you cannot even trust basic function detection, as call instructions may be used for obfuscation and never return. You cannot either trust loop recognition, as with code flattening artificial loops are created for code dispatch. The solution chosen is to trace the whole program with pin, and detect loops as repetition of the same sequence of instructions in the trace. This also solves the problem of loop unrolling.
When a loop is detected, the algorithm tries to extract the loop inputs and outputs. When one loop is detected, it is then matched through brute-force against all ciphers, as implemented in a python library. The python implementation is tested for all algorithms with all possible input combination and then tests if any of the output matches. This straightforward approach works well for simple algorithms, like TEA. It can be puzzling to test the script against known malware identified as using TEA and see it fail ; in fact that revealed a bug (or intentional modification) of the original algorithm in the malware where some instructions were re-ordered.
This method could probably work on other fields, like compression or hashing.
A very interesting and pragmatic approach.
Dynamic binary instrumentation frameworks: I know you're spying on me
Francisco Falcon and Nahuel Riva, from Core Security gave an exhaustive list of ways to detect when a binary is run under the Pin DBI framework, using the pin library:
- file mappings in memory,
- process module enumeration,
- timing attacks (esp. on LoadLibrary),
- checking the value of eip in seen by fstenv, etc.
Aaron Portnoy and Brandon Edwards present a very interesting IDA plugin, intended to present an API to work with the idb as a real SQL database. For that, the first time you run it, it will create an SQLite database and import most information from the idb in it. This takes some time but is only necessary one time.
The plugin adds a lot of misc features to IDA, for exemple:
- a graphical view of the navigation history, with separation for inter-function navigation and intra-function ;
- ability to add 'marks' in the IDB that are scoped (eg you only see global marks and local marks related to the current function) ;
- ability to display blocks of the current function with those of subfunctions with customized depth ;
- ability to manually solve indirect jumps (ctrl+[ on a 'call ebx', then ctrl+] on the destination, will create a code path if the destination is code, or a jmp table if the destination is data) ;
- can display a path between two nodes of the graph (binnavi-style) ;
- all queries are journalized and can be pushed to other IDAs over the network or to 'agents' ;
- a python script is given as exemple for a vtrace client (a debugger in python) to colorize blocks taken in a trace run...
The plugin is fully documented, its free, get it, use it. (toolbag)
GPUs for mobile malware
Jared Carlson describes the various primitives offered for a malware that choses to execute on the GPU of a mobile device. It can access the host memory, and use various algorithms available off-the-shelf to eg locate and patch a specific piece of code on the host memory. On Iphone, GPU shader code is one of the last pieces of code that is not digitally signed and could prove an interesting option for malware authors.
Compiler internals: exception and RTTI
To conclude Recon2012, Igor Skochinsky described methodically the meta-data added by the compilers to C++ code to handle RunTime Type Information, and exception structures in x86, both 32 and 64-bit, for Linux and Windows.
Very good slides.
And another wonderful Recon session ends !