Metasm classes (1)

Tue 28 April 2009 by jj

This post will show the basic usage of the metasm framework as a disassembler, by following step by step the disassemble.rb sample script.

First of all, you'll need to install ruby on your machine, and download the framework from http://metasm.cr0.org/

Script options

The reference script for this is samples/disassemble.rb

The first non-option argument is a binary file name, which may be followed by a sequence of entrypoints (if none is specified, the defaults for the executable are used).

You can add various options to fine-tune the behavior of the disassembler:

  • --no-data-trace disables tracking of read/write memory accesses; this usually results in a much faster run.
  • --no-data inhibits the display of data sections (useful if you have a big file from which you disassemble only a small part).
  • -v or -d enable the verbose (resp. debug) modes. Verbose is recommended most of the time.
  • -o file.asm specifies which file the disassembled output should be written to (stdout by default).
  • -c header_filename loads the specified C file, which should contain the definition of imported functions.
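
As a rough sketch of how a command line like this can be handled, the following uses Ruby's standard optparse library with the flags listed above. The parse_dasm_args helper and the option-hash keys are names made up for this illustration; the actual parsing code in the sample script may well differ.

```ruby
require 'optparse'

# Hypothetical helper (not from the sample script): parses the flags
# described above and returns [options hash, non-option arguments].
def parse_dasm_args(argv)
  options = { verbose: false, debug: false }
  parser = OptionParser.new do |o|
    o.on('--no-data-trace', 'do not track memory read/write accesses') { options[:no_data_trace] = true }
    o.on('--no-data', 'do not display data sections') { options[:no_data] = true }
    o.on('-v', 'verbose mode') { options[:verbose] = true }
    o.on('-d', 'debug mode') { options[:debug] = true }
    o.on('-o FILE', 'write the listing to FILE') { |f| options[:output] = f }
    o.on('-c FILE', 'load C prototypes from FILE') { |f| options[:header] = f }
  end
  # parse returns whatever is left once the options have been consumed
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_dasm_args(%w[-v --no-data-trace -o out.asm -c foo.h foo.exe 0x401000])
# The first non-option argument is the binary, the others are entrypoints.
exename, *entrypoints = rest
puts "binary=#{exename} entrypoints=#{entrypoints.inspect} options=#{options.inspect}"
```

The binary name and the entrypoint value used here are placeholders.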

Minimal C header generation

The -c option is very important, especially when dealing with Windows binaries. Without a correct definition for external functions, metasm may easily get confused about ABI conventions and fail to properly disassemble the program.

It takes a standard C file as argument, with definitions like "int MessageBoxA(int, char*, char*, int);". The metasm C parser only knows about standard types; if you use any nonstandard typedef without a proper definition, the file will not load.

Another sample script is able to generate this header file automatically from the binary itself and a complete compiler header set. For a PE file (a Windows '.exe' or '.dll'), you'll need to use "samples/factorize-headers-peimports.rb foo.exe <vspath> -o foo.h". Vspath is the path to the Visual Studio folder containing the VC/ subdirectory (e.g. "C:/program files/microsoft visual studio 8/").

The script will then load all the C definitions from the compiler header set, and save in foo.h the ones relevant to the program by checking its import table, along with their dependencies (e.g. if the program references 'MessageBoxA', which depends on the definition of the 'DWORD' type, the script will dump both).

You can specify additional functions to print, headers to parse, and macros to define from the command line; check the --help option.
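
The idea of keeping only the imported definitions plus whatever they depend on can be pictured with a toy model. This is not how the factorize-headers script is implemented (it works on a real parsed header set); the HEADER_SET table, its simplified prototypes, and the helper names below are all assumptions made for the example.

```ruby
# Toy stand-in for a parsed compiler header set: each entry holds the C
# text of a definition and the names it depends on. The prototypes are
# deliberately simplified and are illustrative only.
HEADER_SET = {
  'DWORD'       => { text: 'typedef unsigned long DWORD;', deps: [] },
  'HANDLE'      => { text: 'typedef void *HANDLE;', deps: [] },
  'MessageBoxA' => { text: 'int MessageBoxA(HANDLE, char*, char*, DWORD);', deps: %w[HANDLE DWORD] },
  'CloseHandle' => { text: 'int CloseHandle(HANDLE);', deps: %w[HANDLE] }
}.freeze

# Emit the dependencies of +name+ first, then +name+ itself, each one once.
def collect_definition(name, header_set, seen, output)
  return if seen.include?(name) || !header_set.key?(name)
  seen << name # marked before recursing so that cyclic definitions terminate
  header_set[name][:deps].each { |dep| collect_definition(dep, header_set, seen, output) }
  output << header_set[name][:text]
end

# Build a minimal header for the given list of imported function names.
def minimal_header(imports, header_set)
  seen = []
  output = []
  imports.each { |name| collect_definition(name, header_set, seen, output) }
  output.join("\n")
end

puts minimal_header(['MessageBoxA'], HEADER_SET)
```

With 'MessageBoxA' as the only import, the two typedefs it needs are emitted before the prototype, and 'CloseHandle' is left out.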

Now that everything is ready, you can sit back and relax until the disassembly is complete. This may take quite a while. On the performance side, using the ruby interpreter version 1.9 reduces the process run time by a factor of 3.

If you enabled the verbose flag, you'll see on the console the names of newly discovered functions as they are found. With the debug flag, every new instruction is dumped on the screen, and every backtracked expression is displayed.

To have more feedback from the backtracker, you can specify the --debug-backtrace flag, but this one is very verbose and will also substantially slow the process down. Use it only on small codepaths to pinpoint a particular problem.

Once done, you can navigate in the output using your favorite source editor.

For a more interactive session, use the disassemble-gtk script; it needs the ruby-gtk2 libraries.

Internals

Let's dive into the script to see what's going on behind the scenes.

require 'metasm'
include Metasm

The first line loads the framework, and the second allows us to use classes in the Metasm module without having to type the 'Metasm::' prefix.
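
The effect of 'include' on name lookup can be seen without metasm at all. In this small sketch, Toolbox is a made-up stand-in for the Metasm module, and the include is done inside another module rather than at the top level as in the script; the lookup rule is the same.

```ruby
# Hypothetical stand-in for the Metasm module and one of its classes.
module Toolbox
  class Widget
    def name
      'widget'
    end
  end
end

module Script
  include Toolbox # makes Toolbox's constants visible from within Script

  def self.build
    Widget.new # resolved as Toolbox::Widget, no prefix needed
  end
end

puts Script.build.name
puts Toolbox::Widget.new.name # the fully qualified form still works
```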

The optparse stuff is for handling the command-line options.

What comes next is more interesting:

if exename =~ /^live:(.*)/
  raise 'no such live target' if not target = OS.current.find_process($1)
  exe = Shellcode.decode(target.memory, Ia32.new)
else
  exe = AutoExe.orshellcode(Ia32.new).decode_file(exename)
end

exename is the first argument to the program, which is supposed to be the path of an executable file. We see here that a special filename may be used, e.g. "live:foobar". In this case, the script will ask the framework if a running process matches the pattern 'foobar', and will complain if it can't find one. If a target is found, its (live!) memory is loaded as a raw shellcode for further processing.

If 'exename' does not start with 'live:', it is passed to AutoExe.decode_file.
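
The filename test itself is plain Ruby and can be tried in isolation. The sketch below only mirrors the branching of the snippet above; classify_target is a hypothetical helper and does not call into metasm.

```ruby
# Hypothetical helper: tells a "live:<pattern>" target from a regular path,
# using the same regular expression as the script.
def classify_target(exename)
  if exename =~ /^live:(.*)/
    { kind: :live, pattern: $1 } # $1 is the text captured after "live:"
  else
    { kind: :file, path: exename }
  end
end

p classify_target('live:foobar')
p classify_target('foo.exe')
```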

AutoExe is a special class in metasm that can identify an executable from its signature (e.g. if it starts with "\x7fELF" it is interpreted as an ELF file). The 'orshellcode' part specifies that if the file signature is not recognized, it should be loaded as a raw shellcode for the Ia32 architecture (without this, AutoExe would raise an exception for unknown signature and terminate the script).
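
Signature-based detection with a fallback can be sketched in a few lines. This is only a model of the behavior described above, not AutoExe's code: the SIGNATURES table (limited here to the ELF magic and the "MZ" bytes that open DOS/PE files) and the identify_format helper are assumptions made for the example.

```ruby
# Illustrative magic-number table; the real AutoExe knows more formats.
SIGNATURES = {
  "\x7fELF".b => :elf,
  'MZ'.b      => :mz
}.freeze

# Returns the format whose magic bytes start +bytes+. When nothing matches,
# returns +fallback+ if one was given (the "orshellcode" idea), otherwise
# raises, much like an unknown signature would end the script.
def identify_format(bytes, fallback = nil)
  data = bytes.b # compare as raw bytes
  _magic, format = SIGNATURES.find { |magic, _| data.start_with?(magic) }
  return format if format
  return fallback if fallback
  raise ArgumentError, 'unknown signature'
end

p identify_format("\x7fELF\x01\x01".b)
p identify_format("\x90\x90\xc3".b, :shellcode)
```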

The decode_file method loads the file and interprets all its data (imported/exported symbols, relocations, etc.).

dasm = exe.init_disassembler

From the ExeFormat object, we generate a Disassembler, which will hold the various memory-mapped sections of the binary, and a cpu object matching the data found in the exe header (if we load a MIPS PE file, the disassembler will be loaded with a MIPS cpu object that can handle MIPS binary code).

After that we set up the Disassembler according to the command-line options, and the disassembly starts with:

dasm.disassemble

This function call will disassemble every code path it can find from any of the specified entrypoints, and return only when all of them are exhausted.
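
The "follow every path until nothing is left" behavior amounts to a worklist traversal. The sketch below runs one over a hand-written toy program in which each address already lists its successors; metasm's disassembler has to decode instructions and backtrack values to find those successors, so treat this as a simplified model with made-up names (TOY_PROGRAM, explore).

```ruby
require 'set'

# Toy program: address => instruction text and possible next addresses.
TOY_PROGRAM = {
  0x00 => { text: 'cmp eax, 0', succ: [0x02] },
  0x02 => { text: 'jz 0x08',    succ: [0x04, 0x08] },
  0x04 => { text: 'inc eax',    succ: [0x06] },
  0x06 => { text: 'jmp 0x00',   succ: [0x00] },
  0x08 => { text: 'ret',        succ: [] },
  0x10 => { text: 'nop',        succ: [0x11] }, # not reachable from 0x00
  0x11 => { text: 'ret',        succ: [] }
}.freeze

# Visits every address reachable from the entrypoints and returns them
# sorted. The loop ends once the worklist is empty.
def explore(program, entrypoints)
  todo = entrypoints.dup
  done = Set.new
  until todo.empty?
    addr = todo.shift
    next if done.include?(addr) || !program.key?(addr)
    done << addr
    todo.concat(program[addr][:succ])
  end
  done.to_a.sort
end

explore(TOY_PROGRAM, [0x00]).each do |addr|
  puts format('%02x  %s', addr, TOY_PROGRAM[addr][:text])
end
```

Starting from 0x00, the loop at 0x06 does not cause endless work because visited addresses are skipped, and the code at 0x10 is never listed unless it is given as an extra entrypoint.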

Finally, the script dumps the disassembled data using:

dasm.dump(true)

or

dasm.dump(false)

which will dump the code and optionally the data found in the program.

The attentive reader will have noticed the decompiler part that I have deliberately ignored: this is a very experimental feature that aims to rebuild C code from the decoded opcodes; but this will be the subject of a later post. Feel free to play with it.

If you examine the GTK version of the script, you'll see that it is quite similar, except that the '#disassemble' part is done behind the scenes by the GUI to keep the interface responsive (you don't want to launch an interactive GUI and wait 30 minutes for /bin/true to be disassembled without any feedback, do you?).

This was a very high-level overview of a part of the metasm framework. Next week we'll take a deeper look at the internals of the disassembly process.