Instrumenting Parrot

An instrumentation framework opens the door to many different tools for diagnosing problems in a piece of code. A prime example is Valgrind, which provides an interface for building tools that identify specific classes of problems, ranging from memory leaks to data races between threads. The framework is also used to provide profiling tools such as Callgrind and Cachegrind, which collect useful information such as call graphs and per-function execution costs. Given a good framework to work with, a whole new world of performance and error analysis tools opens up.

In the case of Parrot, which aims to be the virtual machine for many different kinds of languages, such a framework can assist in creating debugging tools that cater to each language's idiosyncrasies or are general enough to apply to all of them. With proper hooks in place, the framework can also provide the ability to inspect the inner workings of the virtual machine itself, introspection that is more in-depth than what the various PMCs in Parrot currently offer.

Thus, over the summer I will be working on creating such a framework for Parrot. In general terms, I will be looking to create an interface that lets user-written tools hook into a Parrot program. Such tools can be written in PIR and run in a separate interpreter from the code being instrumented. I hope to provide an instrumentation interface for three layers: the opcode layer, the PMC layer and the GC layer. Each is explained below:

  1. Opcode layer
    This layer allows tools to profile and inspect the code at the opcode level. Abstractions can be built on top so that tools can inspect at the subroutine, class or file level.
  2. PMC layer
    This layer allows tools to observe the creation and deletion of the various PMCs, as well as accesses to them.
  3. GC layer
    This layer allows tools to observe the behavior of the GC: when it is invoked, what gets freed, and so on.
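
To illustrate how opcode-level events could be abstracted to the subroutine level, here is a small Python sketch. It is not Parrot code and does not reflect any existing API: the shape of the event tuples, the sample trace and the function name `count_ops_per_sub` are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical trace: each event is (subroutine name, opcode name).
# A real tool would receive these from the opcode layer, not from a list.
SAMPLE_TRACE = [
    ("main", "set"),
    ("main", "invokecc"),
    ("fib", "lt"),
    ("fib", "sub"),
    ("fib", "returncc"),
    ("main", "say"),
]


def count_ops_per_sub(trace):
    """Roll opcode-level events up into a per-subroutine opcode count."""
    counts = Counter()
    for sub_name, _opcode in trace:
        counts[sub_name] += 1
    return dict(counts)


per_sub = count_ops_per_sub(SAMPLE_TRACE)
```

The same grouping idea would apply to class or file levels, given events that carry that information.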

I intend to implement the interfaces in the order shown above and, as each layer is implemented, create tools that serve specific purposes such as call graphing and I/O monitoring. Tentatively, the API for the tools should look like the following:


$P0 = new ['Instrument']
$P0.'attach'('ops', 'catchall', 'sub_callback')
$P0.'finalize'('finalize_sub')
$P0.'run'(args)

In the example above, the ':main' subroutine of the tool performs the required initialization: it creates an Instrument instance, registers callbacks with it, and then runs the code to be instrumented.
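
The register-then-run flow can be modeled outside Parrot with a short Python sketch. This is only a toy stand-in for the tentative API, under several assumptions of mine: that 'catchall' means "call back on every opcode", that the callback receives a program counter and an opcode name, and that the finalize callback fires once the instrumented code has finished. None of these details are settled.

```python
class Instrument:
    """Toy stand-in for the proposed Instrument class (hypothetical)."""

    def __init__(self):
        self._callbacks = {}   # (layer, event) -> list of callables
        self._finalizers = []  # called once after the program finishes

    def attach(self, layer, event, callback):
        """Register a callback for an event on a given layer."""
        self._callbacks.setdefault((layer, event), []).append(callback)

    def finalize(self, callback):
        """Register a callback to run after the instrumented code ends."""
        self._finalizers.append(callback)

    def run(self, program):
        """'Execute' a program, here just a list of opcode names."""
        for pc, opcode in enumerate(program):
            # Assumed meaning of 'catchall': fire for every opcode.
            for cb in self._callbacks.get(("ops", "catchall"), []):
                cb(pc, opcode)
        for cb in self._finalizers:
            cb()


seen = []
report = []


def sub_callback(pc, opcode):
    # Assumed callback signature: program counter and opcode name.
    seen.append((pc, opcode))


def finalize_sub():
    report.append(f"{len(seen)} ops executed")


instr = Instrument()
instr.attach("ops", "catchall", sub_callback)
instr.finalize(finalize_sub)
instr.run(["set", "add", "say", "end"])
```

In this model the tool never touches the program directly; it only sees the events the framework chooses to dispatch, which is the separation the separate-interpreter design aims for.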

During this Community Bonding period, I will be prototyping the instrumentation framework and looking at ways to insert hooks into the various PMCs and the GC subsystem safely. I will post more information on the project, based on my prototyping efforts, here and on my blog at http://parrot.mangkok.com, and I welcome any feedback on this project on #parrot or by email at khairulsyamil@gmail.com.