I Think It's Time for a Break

No, I'm not that lazy. I'm talking about breakpoints. :)

Right now, I'm focusing on how to implement breakpoints. First I will talk about how breakpoints are implemented in traditional debuggers. Then I will consider what this means for Parrot.

For those who aren't too familiar with debuggers, allow me to explain. A breakpoint is an event that tells the debugger to pause at a particular place during program execution. The true power of breakpoints is that they let you examine a program's execution environment and state while execution is suspended. They are a standard feature of nearly all debuggers, which is why I wish to address them first.

 
THE BREAKPOINT DATA STRUCTURE

Traditionally, breakpoints are represented at two levels: the logical level and the physical level.

  • Logical breakpoint: Associated with a point in the source code. Represents a breakpoint set by the user.
  • Physical breakpoint: Associated with the actual executable machine instruction.

Physical breakpoints are the points where the actual breakpoint instruction gets written. They also preserve the original instruction, which is restored when the breakpoint is removed. Logical breakpoints are at a higher level of abstraction. They let a breakpoint be represented as either fully resolved or not yet resolved (e.g. set in a module that is not yet loaded).

Conditional breakpoints may or may not actually stop when the breakpoint is hit. The associated conditions are maintained at the logical level.

The traditional approach is to keep two separate lists, one for logical breakpoints and one for physical breakpoints. Each node in either list contains an address that serves as the link between the two. This creates a downward and an upward mapping between logical and physical breakpoints. The downward mapping (logical to physical) takes place when creating, destroying, or modifying a breakpoint. It yields a physical address that can be used as a key when searching the list of physical breakpoints. The upward mapping (physical to logical) takes place when a breakpoint "fires", that is, when the debuggee process executes the breakpoint instruction. The address of that instruction can be used to search the list of logical breakpoints for those that map to it. At this point, any related conditions can be evaluated to determine whether or not the user should be alerted that the debuggee process has been suspended.
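To make the upward mapping concrete, here is a minimal sketch in C. None of this is Parrot code: the names `l_breakpoint_t`, `bp_condition_fn`, `state_is_positive`, and `find_firing_breakpoint` are made up for illustration, and a plain singly linked list stands in for whatever container a real debugger would use. Given the address where the debuggee stopped, it walks the logical breakpoints and evaluates their conditions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical condition callback: nonzero means "alert the user". */
typedef int (*bp_condition_fn)(void *debuggee_state);

/* Hypothetical logical breakpoint: what the user asked for. */
typedef struct l_breakpoint {
    const char          *file;       /* source file name */
    int                  line;       /* source line number */
    int                  resolved;   /* 0 until mapped to an address */
    unsigned long        address;    /* link to the physical breakpoint */
    bp_condition_fn      condition;  /* NULL means unconditional */
    struct l_breakpoint *next;
} l_breakpoint_t;

/* Example condition, for illustration only: fire when the int that
 * `state` points to is positive. */
static int
state_is_positive(void *state)
{
    assert(state != NULL);
    return *(const int *)state > 0;
}

/* Upward mapping: return the first resolved logical breakpoint at
 * `address` whose condition holds, or NULL if execution should
 * resume without alerting the user. */
static l_breakpoint_t *
find_firing_breakpoint(l_breakpoint_t *head, unsigned long address,
                       void *state)
{
    l_breakpoint_t *bp;

    for (bp = head; bp != NULL; bp = bp->next) {
        if (!bp->resolved || bp->address != address)
            continue;
        if (bp->condition == NULL || bp->condition(state))
            return bp;
    }
    return NULL;
}
```

Keeping the condition on the logical node means the physical level never has to know why a breakpoint was set, only where.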

Because it is possible for several logical breakpoints to map to a single physical breakpoint, each physical breakpoint needs to keep track of how many logical breakpoints refer to its address. This is accomplished without much difficulty using reference counting on each node in the list of physical breakpoints. A node is not removed from the list (and the original instruction restored) until its reference count reaches zero.

Here is a (very poor) ASCII representation of a "many-to-one" mapping:

+------------------------------------------------+
|      Logical                    Physical       |
|                                                |
|  foo.rb, line 32 -,                            |
|                    \                           |
|                     \                          |
|                      \      +--------------+   |
|                       +---> |              |   |
|  foo.rb, line 32 ---------> |  0x6675636b  |   |
|                       +---> |              |   |
|                      /      +--------------+   |
|                     /                          |
|                    /                           |
|  foo.rb, line 32 -'                            |
|                                                |
|                             +--------------+   |
|                             |              |   |
|  foo.rb, line 63 ---------> |  0x670f39a1  |   |
|                             |              |   |
|                             +--------------+   |
|                                                |
+------------------------------------------------+
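The reference-counting rule for the many-to-one case pictured above could be sketched like this. It is only an illustration under my own assumptions: `p_node_t` and `physical_release` are hypothetical names, `opcode_t` is typedef'd to `long` as a stand-in for the real type, and the debuggee's code is modelled as a plain array indexed by offset.

```c
#include <assert.h>
#include <stdlib.h>

typedef long opcode_t;   /* stand-in; Parrot defines its own opcode_t */

/* Hypothetical node in the list of physical breakpoints. */
typedef struct p_node {
    size_t         offset;     /* index of the patched instruction in code[] */
    int            ref_count;  /* logical breakpoints that map here */
    opcode_t       saved;      /* original instruction */
    struct p_node *next;
} p_node_t;

/* Drop one reference to the physical breakpoint at `offset`.
 * Returns  1 if the node was removed and the instruction restored,
 *          0 if other logical breakpoints still refer to it,
 *         -1 if no physical breakpoint exists at that offset. */
static int
physical_release(p_node_t **head, opcode_t *code, size_t offset)
{
    p_node_t **link;

    for (link = head; *link != NULL; link = &(*link)->next) {
        p_node_t *node = *link;

        if (node->offset != offset)
            continue;
        assert(node->ref_count > 0);
        if (--node->ref_count > 0)
            return 0;
        code[offset] = node->saved;  /* put the original instruction back */
        *link = node->next;          /* unlink the node from the list */
        free(node);
        return 1;
    }
    return -1;
}
```

With three logical breakpoints on the same address, the first two releases only decrement the count; the third restores the instruction and frees the node.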

 
BREAKPOINT SETTING

This is the algorithm that is run when the user sets a breakpoint:

  1. Call symbol table agent to map filename and line number into physical address.
  2. Create logical breakpoint object with the given information.
  3. Create physical breakpoint object or increment reference count if breakpoint already exists.
  4. If the physical breakpoint is new, save the original instruction at that location, then write the breakpoint instruction in its place.

A physical breakpoint object would look like this:

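typedef struct {
    int      address;      /* physical address of the patched instruction */
    int      ref_count;    /* number of logical breakpoints mapped here */
    opcode_t instruction;  /* original instruction, restored on removal */
} p_breakpoint_t;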
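Putting the four steps together, a rough sketch might look like the following. Everything beyond the `p_breakpoint_t` fields is an assumption made for illustration: `symtab_lookup` is a toy stand-in for the symbol table agent, `OP_BREAK` is a hypothetical breakpoint instruction, `opcode_t` is typedef'd to `long`, addresses are just indexes into a `code` array, and fixed-size tables replace the lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long opcode_t;            /* stand-in; Parrot defines its own opcode_t */
#define OP_BREAK ((opcode_t)-1)   /* hypothetical breakpoint instruction */
#define MAX_BP   16

typedef struct {
    int      address;             /* here: an index into code[] */
    int      ref_count;
    opcode_t instruction;         /* saved original instruction */
} p_breakpoint_t;

typedef struct {
    const char *file;
    int         line;
    int         address;          /* -1 while unresolved */
} l_breakpoint_t;

static p_breakpoint_t physical[MAX_BP];
static int            n_physical;
static l_breakpoint_t logical[MAX_BP];
static int            n_logical;

/* Toy symbol table agent: knows two lines of foo.rb, returns -1 otherwise. */
static int
symtab_lookup(const char *file, int line)
{
    if (strcmp(file, "foo.rb") != 0)
        return -1;
    if (line == 32) return 2;
    if (line == 63) return 5;
    return -1;
}

/* Steps 1-4. Returns the new logical breakpoint, or NULL if a table is full. */
static l_breakpoint_t *
set_breakpoint(opcode_t *code, const char *file, int line)
{
    p_breakpoint_t *pb = NULL;
    l_breakpoint_t *lb;
    int i, address;

    assert(code != NULL);
    if (n_logical == MAX_BP)
        return NULL;
    address = symtab_lookup(file, line);           /* step 1 */
    if (address >= 0) {
        for (i = 0; i < n_physical; i++)
            if (physical[i].address == address)
                pb = &physical[i];
        if (pb == NULL && n_physical == MAX_BP)
            return NULL;
    }

    lb = &logical[n_logical++];                    /* step 2 */
    lb->file = file;
    lb->line = line;
    lb->address = address;
    if (address < 0)
        return lb;               /* unresolved: nothing to patch yet */

    if (pb != NULL) {                              /* step 3: already exists */
        pb->ref_count++;
        return lb;
    }
    pb = &physical[n_physical++];                  /* step 3: create */
    pb->address = address;
    pb->ref_count = 1;
    pb->instruction = code[address];               /* step 4: save original, */
    code[address] = OP_BREAK;                      /*         then patch     */
    return lb;
}
```

Note that the instruction is saved only when the physical breakpoint is first created. Saving it again on a second request would store the breakpoint instruction itself as the "original".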

 
WHAT DOES THIS MEAN FOR PARROT?

I'm not really sure, to be honest. That's what I need some guidance with.

To begin with, Parrot does not provide an interrupt instruction (opcode, whatever) that breakpoints like this require. Is it worth creating my own interrupt opcode? How can I tell the Parrot runcore, "Freeze! Put your hands in the air!"?
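To make the question a bit more concrete, here is the kind of thing I have in mind, written against a toy interpreter rather than Parrot. Nothing here reflects how Parrot's runcores actually work: `toy_vm_t`, `toy_run`, `toy_resume`, and the opcodes are all invented for the example. The loop hands control back to the debugger when it meets the break opcode, and resuming executes the saved original instruction in its place.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;   /* stand-in, not Parrot's definition */

enum { OP_HALT, OP_INC, OP_DEC, OP_BREAK };   /* hypothetical opcodes */

typedef enum { TOY_HALTED, TOY_SUSPENDED } toy_status_t;

typedef struct {
    const opcode_t *code;   /* program being run */
    size_t          pc;     /* index of the next instruction */
    long            acc;    /* the only piece of machine state */
} toy_vm_t;

/* Execute one ordinary instruction. Returns 0 when the VM should stop. */
static int
toy_exec(toy_vm_t *vm, opcode_t op)
{
    switch (op) {
    case OP_INC: vm->acc++; vm->pc++; return 1;
    case OP_DEC: vm->acc--; vm->pc++; return 1;
    default:     return 0;   /* OP_HALT or anything unknown */
    }
}

/* Run until halt or breakpoint. On TOY_SUSPENDED, vm->pc still points
 * at the break opcode so the debugger can inspect the state. */
static toy_status_t
toy_run(toy_vm_t *vm)
{
    for (;;) {
        opcode_t op = vm->code[vm->pc];

        if (op == OP_BREAK)
            return TOY_SUSPENDED;
        if (!toy_exec(vm, op))
            return TOY_HALTED;
    }
}

/* Resume after a breakpoint: execute the saved original instruction
 * instead of the break opcode, then keep running. */
static toy_status_t
toy_resume(toy_vm_t *vm, opcode_t original)
{
    assert(vm->code[vm->pc] == OP_BREAK);
    if (!toy_exec(vm, original))
        return TOY_HALTED;
    return toy_run(vm);
}
```

Whether something like this can be done with a real opcode, or needs a hook in the runcore itself, is exactly what I'm unsure about.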

Secondly, I need to be able to access the symbol table. Is there a segment in the packfile where I can read symbol information?

There are a few more things to consider regarding breakpoints, in particular breakpoint validation, temporary breakpoints, and internal breakpoints. I will go into these subjects in my next post. For now, I want people to be able to digest my plans so far, and I would like to get some feedback on them before moving on to other areas of breakpoints.

debugging

As previously stated on #parrot, I think this was an excellent post. I had never been exposed to the "general theory of debugging" before. I also appreciate your frank statements as to what you don't yet know.

Re: debugging

Well, I'm still relatively new to debugging techniques/theory myself. Surprisingly, there are very few resources on the subject. :(

Because of that, I plan to be very thorough in my documentation. I'd like to provide not just a tutorial on basic usage but also an explanation of how it actually works.