Illumos: Getting Started with MDB

This is a short introduction to MDB for the Illumos kernel.  Everything we do here is easier and more straight forward to do with DTrace, but that is normal for such a simple example.  We cover how to attach the debugger to a running kernel, set breakpoints and step through the kernel side of an ioctl() call.

To begin with, we will set up serial console for use in VirtualBox on a Windows host, and connect to it with Putty.  You can skip this part if you already have console access to your Illumos box.

Table of Contents

Configuring the Serial Console

Quick Directions for OS X

Preparing for Debugging with MDB

Introduction to MDB

Stepping Through the Kernel Code

Conclusion

Related Links

Configuring the Serial Console

While I hardly ever use Windows myself, I had the opportunity to do these experiments on a Windows host and the directions reflect that.  Big kudos to Pinchii for a quick introduction, and the Haiku team for covering this too.

We start by configuring VirtualBox to connect the COM1 serial port to a named pipe.

VirtualBox, configuring host pipe

The port mode is host pipe, the create pipe box is checked, and the port/file path is \\.\pipe\illumos.  This means the pipe is created on the localhost (the dot) with the name illumos.  Pipe names can be 256 characters, can include any (special) character but backslash and are not case sensitive according to the MSDN; they must follow this pattern.

In Putty, we select serial connection, with the serial line set to the named pipe from earlier; in this case \\.\pipe\illumos.  Leave the speed at 9600 for now1.  As soon as we boot the virtual machine, we can connect with the open button.  Be careful not to let the grub timeout expire.

Putty, selecting illumos pipe

To direct the console to the serial line, we edit the kernel command line in the grub menu.  Select the desired boot menu entry and press e to edit it.

Grub, selecting kernel command line

Then select the kernel line and press e again.

Grub, editing console device

Change console=graphics at the end of the line to console=ttya and press return.  Then use b to boot.

When the system is ready, our console session should look something like this.

Putty, ready to log in

Quick Directions for OS X

First if you don’t have it, install socat with Brew: $ brew install socat

Then configure VirtualBox exactly like above, except you need to name the pipe with a valid filename.  Something in /tmp is a good choice, in this case: /tmp/illumos

Set the console to ttya in grub.  Then the following one-liner from tonejito will do the trick with screen.

 screen `socat -d -d /tmp/illumos PTY 2>&1 | tee /tmp/x &>/dev/null & grep '.*N\ PTY\ is\ ' /tmp/x | sed -e 's/.*N\ PTY\ is\ //g'` 

Preparing for Debugging with MDB

We are going to step through some kernel code, in this case the ioctl() entry point of our own module, to watch the reference count, and the write count of a vnode as we call some file manipulation functions.

The following structure is the kernel’s internal representation of a file, a vnode.  It is the v_count, v_rdcnt and v_wrcnt fields we are going to monitor.

typedef struct vnode {
	kmutex_t	v_lock;		/* protects vnode fields */
	uint_t		v_flag;		/* vnode flags (see below) */
	uint_t		v_count;	/* reference count */
	void		*v_data;	/* private data for fs */
	struct vfs	*v_vfsp;	/* ptr to containing VFS */
	struct stdata	*v_stream;	/* associated stream */
	enum vtype	v_type;		/* vnode type */
	dev_t		v_rdev;		/* device (VCHR, VBLK) */

	/* PRIVATE FIELDS BELOW - DO NOT USE */

	struct vfs	*v_vfsmountedhere; /* ptr to vfs mounted here */
	struct vnodeops	*v_op;		/* vnode operations */
	struct page	*v_pages;	/* vnode pages list */
	struct filock	*v_filocks;	/* ptr to filock list */
	struct shrlocklist *v_shrlocks;	/* ptr to shrlock list */
	krwlock_t	v_nbllock;	/* sync for NBMAND locks */
	kcondvar_t	v_cv;		/* synchronize locking */
	void		*v_locality;	/* hook for locality info */
	struct fem_head	*v_femhead;	/* fs monitoring */
	char		*v_path;	/* cached path */
	uint_t		v_rdcnt;	/* open for read count  (VREG only) */
	uint_t		v_wrcnt;	/* open for write count (VREG only) */
	u_longlong_t	v_mmap_read;	/* mmap read count */
	u_longlong_t	v_mmap_write;	/* mmap write count */
	void		*v_mpssdata;	/* info for large page mappings */
	void		*v_fopdata;	/* list of file ops event watches */
	kmutex_t	v_vsd_lock;	/* protects v_vsd field */
	struct vsd_node *v_vsd;		/* vnode specific data */
	struct vnode	*v_xattrdir;	/* unnamed extended attr dir (GFS) */
	uint_t		v_count_dnlc;	/* dnlc reference count */
} vnode_t;

Defined in usr/src/uts/common/sys/vnode.h.

The following is the function we are going to step through.  The sequence of calls to vn_open(), vn_rdwr(), VOP_CLOSE() and VN_RELE() is based on examples found in the kernel sources.

static int
demo_ioctl( dev_t dev, int cmd, intptr_t arg, int mode,
    cred_t *cred_p, int *rval_p )
{

    if ( cmd == 0 )
    {
   	 vnode_t *vp;

   	 int r = vn_open( "/root/test",
   		 UIO_SYSSPACE,
   		 FWRITE | FCREAT,
   		 0,
   		 &vp,
   		 0,
   		 0 );

   	 if ( r != 0 )
   	 {
   		 cmn_err( CE_WARN, "Unable to vn_open(/root/test)" );

   		 return r;
   	 }

   	 char test[] = { 't', 'e', 's', 't', '\n' };

   	 ssize_t resid;

   	 r = vn_rdwr( UIO_WRITE,
   		 vp,
   		 (caddr_t) test,
   		 sizeof test,
   		 (offset_t) 0,
   		 UIO_SYSSPACE,
   		 0,
   		 (rlim64_t) sizeof test,
   		 CRED(),
   		 &resid );

   	 if ( r != 0 )
   	 {
   		 cmn_err( CE_WARN, "Unable to write to /root/test" );

   		 goto vop_close;
   	 }

   	 // Potentially use the file more...

vop_close:
   	 VOP_CLOSE( vp, FWRITE, 1, (offset_t) 0,
   		 CRED(), NULL );
   	 VN_RELE( vp );

   	 return r;
    }
    else
    {
   	 return EINVAL;
    }

We use -rS -fvebose-asm instead of -c for the the gcc command line to get the following assembly output.

	.type    demo_ioctl, @function
demo_ioctl:
.LFB13:
    pushq    %rbp    /
.LCFI3:
    movq    %rsp, %rbp    /,
.LCFI4:
    pushq    %rbx    /
.LCFI5:
    movl    $22, %ebx    /, r
    subq    $72, %rsp    /,
.LCFI6:
    testl    %esi, %esi    / cmd
    je    .L12    /,
    movl    %ebx, %eax    / r,
    addq    $72, %rsp    /,
    popq    %rbx    /
    leave
    ret
    .p2align 4,,10
    .p2align 3
.L12:
    leaq    -40(%rbp), %r8    /,
    xorl    %r9d, %r9d    /
    xorl    %ecx, %ecx    /
    movl    $0, (%rsp)    /,
    movl    $258, %edx    /,
    movb    $1, %sil    /,
    movq    $.LC0, %rdi    /,
    call    vn_open    /
    testl    %eax, %eax    / r
    movl    %eax, %ebx    /, r
    jne    .L13    /,
    movb    $116, -32(%rbp)    /, test
    movb    $101, -31(%rbp)    /, test
    movb    $115, -30(%rbp)    /, test
    movb    $116, -29(%rbp)    /, test
    movb    $10, -28(%rbp)    /, test
    call    threadp    /
    leaq    -48(%rbp), %rcx    /, tmp71
    leaq    -32(%rbp), %rdx    /, tmp69
    xorl    %r8d, %r8d    /
    movl    $1, %r9d    /,
    movl    $1, %edi    /,
    movq    %rcx, 24(%rsp)    / tmp71,
    movq    456(%rax), %rax    / .t_cred, .t_cred
    movl    $5, %ecx    /,
    movq    $5, 8(%rsp)    /,
    movl    $0, (%rsp)    /,
    movq    %rax, 16(%rsp)    / .t_cred,
    movq    -40(%rbp), %rsi    / vp,
    call    vn_rdwr    /
    testl    %eax, %eax    / r
    movl    %eax, %ebx    /, r
    jne    .L14    /,
.L10:
    call    threadp    /
    movq    -40(%rbp), %rdi    / vp,
    movq    456(%rax), %r8    / .t_cred,
    xorl    %r9d, %r9d    /
    xorl    %ecx, %ecx    /
    movl    $1, %edx    /,
    movl    $2, %esi    /,
    call    fop_close    /
    movq    -40(%rbp), %rdi    / vp,
    call    vn_rele    /
    movl    %ebx, %eax    / r,
    addq    $72, %rsp    /,
    popq    %rbx    /
    leave
    ret
    .p2align 4,,10
    .p2align 3
.L14:
    movq    $.LC2, %rsi    /,
    movl    $2, %edi    /,
    xorl    %eax, %eax    /
    call    cmn_err    /
    jmp    .L10    /
    .p2align 4,,10
    .p2align 3
.L13:
    xorl    %eax, %eax    /
    movq    $.LC1, %rsi    /,
    movl    $2, %edi    /,
    call    cmn_err    /
    movl    %ebx, %eax    / r,
    addq    $72, %rsp    /,
    popq    %rbx    /
    leave
    ret
.LFE13:
    .size    demo_ioctl, .-demo_ioctl

Armed with this, we know what the assembly output of our function looks like. We see from line 129 that vp is stored at -40(%rbp) which is what we want to inspect. On line 101, this address is loaded into register %r8 and we can simply grab it from there.

Introduction to MDB

MDB is the Modular Debugger.  The -K parameter tells mdb to stop the running kernel.

The basic skeleton of an mdb command is address ::command arguments.  Where the address is often omitted.  The commands covered in this short tutorial are ::dis to disassemble, ::bp to set breakpoints, ::cont to continue execution, ::next to step over an instruction, ::regs to show register contents and ::print to display variables.

The ::print command takes as argument the type of data to display, in the examples below we print a pointer with vnode_t* and the vnode itself with vnode_t.  For structures, ::print also accepts fields to display.

Stepping Through the Kernel Code

After loading the module, we enter the kernel debugger in the console with mdb -K.  In our case, the module is called demo and we tell mdb to disassemble the function with ::dis.

[0]> demo`demo_ioctl::dis
demo`demo_ioctl:              pushq  %rbp
demo`demo_ioctl+1:            movq   %rsp,%rbp
demo`demo_ioctl+4:            pushq  %rbx
demo`demo_ioctl+5:            movl   $0x16,%ebx
demo`demo_ioctl+0xa:          subq   $0x48,%rsp
demo`demo_ioctl+0xe:          testl  %esi,%esi
demo`demo_ioctl+0x10:         je     +0xe     <demo`demo_ioctl+0x20>
demo`demo_ioctl+0x12:         movl   %ebx,%eax
demo`demo_ioctl+0x14:         addq   $0x48,%rsp
demo`demo_ioctl+0x18:         popq   %rbx
demo`demo_ioctl+0x19:         leave
demo`demo_ioctl+0x1a:         ret
demo`demo_ioctl+0x1b:         nopl   0x0(%rax,%rax)
demo`demo_ioctl+0x20:         leaq   -0x28(%rbp),%r8
demo`demo_ioctl+0x24:         xorl   %r9d,%r9d
demo`demo_ioctl+0x27:         xorl   %ecx,%ecx
demo`demo_ioctl+0x29:         movl   $0x0,(%rsp)
demo`demo_ioctl+0x30:         movl   $0x102,%edx
demo`demo_ioctl+0x35:         movb   $0x1,%sil
demo`demo_ioctl+0x38:         movq   $0xfffffffff8bf4ef0,%rdi
demo`demo_ioctl+0x3f:         call   +0x2f6f5b4       <vn_open>
demo`demo_ioctl+0x44:         testl  %eax,%eax
demo`demo_ioctl+0x46:         movl   %eax,%ebx
demo`demo_ioctl+0x48:         jne    +0xba    <demo`demo_ioctl+0x108>
demo`demo_ioctl+0x4e:         movb   $0x74,-0x20(%rbp)
demo`demo_ioctl+0x52:         movb   $0x65,-0x1f(%rbp)
demo`demo_ioctl+0x56:         movb   $0x73,-0x1e(%rbp)
demo`demo_ioctl+0x5a:         movb   $0x74,-0x1d(%rbp)
demo`demo_ioctl+0x5e:         movb   $0xa,-0x1c(%rbp)
demo`demo_ioctl+0x62:         call   +0x2c77f49       <threadp>
demo`demo_ioctl+0x67:         leaq   -0x30(%rbp),%rcx
demo`demo_ioctl+0x6b:         leaq   -0x20(%rbp),%rdx
demo`demo_ioctl+0x6f:         xorl   %r8d,%r8d
demo`demo_ioctl+0x72:         movl   $0x1,%r9d
demo`demo_ioctl+0x78:         movl   $0x1,%edi
demo`demo_ioctl+0x7d:         movq   %rcx,0x18(%rsp)
demo`demo_ioctl+0x82:         movq   0x1c8(%rax),%rax
demo`demo_ioctl+0x89:         movl   $0x5,%ecx
demo`demo_ioctl+0x8e:         movq   $0x5,0x8(%rsp)
demo`demo_ioctl+0x97:         movl   $0x0,(%rsp)
demo`demo_ioctl+0x9e:         movq   %rax,0x10(%rsp)
demo`demo_ioctl+0xa3:         movq   -0x28(%rbp),%rsi
demo`demo_ioctl+0xa7:         call   +0x2f6f07c       <vn_rdwr>
demo`demo_ioctl+0xac:         testl  %eax,%eax
demo`demo_ioctl+0xae:         movl   %eax,%ebx
demo`demo_ioctl+0xb0:         jne    +0x3e    <demo`demo_ioctl+0xf0>
demo`demo_ioctl+0xb2:         call   +0x2c77ef9       <threadp>
demo`demo_ioctl+0xb7:         movq   -0x28(%rbp),%rdi
demo`demo_ioctl+0xbb:         movq   0x1c8(%rax),%r8
demo`demo_ioctl+0xc2:         xorl   %r9d,%r9d
demo`demo_ioctl+0xc5:         xorl   %ecx,%ecx
demo`demo_ioctl+0xc7:         movl   $0x1,%edx
demo`demo_ioctl+0xcc:         movl   $0x2,%esi
demo`demo_ioctl+0xd1:         call   +0x2f72a52       <fop_close>
demo`demo_ioctl+0xd6:         movq   -0x28(%rbp),%rdi
demo`demo_ioctl+0xda:         call   +0x2f6f279       <vn_rele>
demo`demo_ioctl+0xdf:         movl   %ebx,%eax
demo`demo_ioctl+0xe1:         addq   $0x48,%rsp
demo`demo_ioctl+0xe5:         popq   %rbx
demo`demo_ioctl+0xe6:         leave
demo`demo_ioctl+0xe7:         ret
demo`demo_ioctl+0xe8:         nopl   0x0(%rax,%rax)
demo`demo_ioctl+0xf0:         movq   $0xfffffffff8bf4edd,%rsi
demo`demo_ioctl+0xf7:         movl   $0x2,%edi
demo`demo_ioctl+0xfc:         xorl   %eax,%eax
demo`demo_ioctl+0xfe:         call   +0x2ea5fb5       <cmn_err>
demo`demo_ioctl+0x103:        jmp    -0x53    <demo`demo_ioctl+0xb2>
demo`demo_ioctl+0x105:        nopl   (%rax)
demo`demo_ioctl+0x108:        xorl   %eax,%eax
demo`demo_ioctl+0x10a:        movq   $0xfffffffff8bf4efb,%rsi
demo`demo_ioctl+0x111:        movl   $0x2,%edi
demo`demo_ioctl+0x116:        call   +0x2ea5f9d       <cmn_err>
demo`demo_ioctl+0x11b:        movl   %ebx,%eax
demo`demo_ioctl+0x11d:        addq   $0x48,%rsp
demo`demo_ioctl+0x121:        popq   %rbx
demo`demo_ioctl+0x122:        leave
demo`demo_ioctl+0x123:        ret

Here, every immediate value is given in hexadecimal, so we have to be careful not to get confused when comparing to the annotated assembly from earlier. The instruction we are paying attention to is on line 15 so we can just break there with

[0]> demo`demo_ioctl+0x20::bp

And then re-start the kernel with

[0]> ::cont
root@janus:~#

Now, when we execute the ioctl, mdb will stop before that instruction in the console. Note that the GUI, if you have it in front of you, will freeze at this time.

root@janus:~# kmdb: stop at demo`demo_ioctl+0x20
kmdb: target stopped at:
demo`demo_ioctl+0x20: leaq   -0x28(%rbp),%r8

After executing ::next, we have the address of our pointer in register %r8. We can take a look with ::regs.

[0]> ::next
kmdb: target stopped at:
demo`demo_ioctl+0x24: xorl   %r9d,%r9d
[0]> ::regs
%rax = 0x0000000000000000                 %r9  = 0xffffff0003f96de4
%rbx = 0x0000000000000016                 %r10 = 0xffffffffc0014e80 demo`demo_cb_ops
%rcx = 0x0000000000100003                 %r11 = 0xffffff01061fb000
%rdx = 0x000000000d7acdc4                 %r12 = 0xffffff0003f96de4
%rsi = 0x0000000000000000                 %r13 = 0x000000000d7acdc4
%rdi = 0x0000012300000000                 %r14 = 0x0000000000000000
%r8  = 0xffffff0003f96c98                 %r15 = 0xffffff010e660e40

%rip = 0xfffffffff7ace944 demo`demo_ioctl+0x24
%rbp = 0xffffff0003f96cc0
%rsp = 0xffffff0003f96c70
%rflags = 0x00000246
  id=0 vip=0 vif=0 ac=0 vm=0 rf=0 nt=0 iopl=0x0
  status=<of,df,IF,tf,sf,ZF,af,PF,cf>

                        %cs = 0x0030    %ds = 0x004b    %es = 0x004b
%trapno = 0x1           %fs = 0x0000    %gs = 0x01c3
   %err = 0x0
[0]>

We can take a look at the value of this pointer with ::print. Note that in PuTTY, selecting will copy, and a right mouse button will paste.

[0]> 0xffffff0003ec2c98::print vnode_t*
strwrite+0x1f

Obviously, before the call to vn_open() the pointer contains garbage. After a few more ::nexts.

demo`demo_ioctl+0x3f: call   +0x2f6f5b4       <vn_open>
[0]> ::next
kmdb: target stopped at:
demo`demo_ioctl+0x44: testl  %eax,%eax
[0]> 0xffffff0003ec2c98::print vnode_t*
0xffffff0116906200

We now have a valid pointer to the vnode we want to monitor. If we take a look at the ::regs again, we see that this value is also in registers %rsi, %rdi and %r8; noting this makes future debugging easier because we can just grab the value from the registers after calling vn_open().

We take a look at the vnode itself with another ::print.

[0]> 0xffffff0116906200::print vnode_t
{
    v_lock = {
        _opaque = [ 0 ]
    }
    v_flag = 0x10000
    v_count = 0x2
    v_data = 0xffffff01181ae4f8
    v_vfsp = root
    v_stream = 0
    v_type = 1 (VREG)
    v_rdev = 0xffffffffffffffff
    v_vfsmountedhere = 0
    v_op = 0xffffff0107206980
    v_pages = 0
    v_filocks = 0
    v_shrlocks = 0
    v_nbllock = {
        _opaque = [ 0 ]
    }
    v_cv = {
        _opaque = 0
    }
    v_locality = 0
    v_femhead = 0
    v_path = 0xffffff01135d91f8 "/root/test"
    v_rdcnt = 0
    v_wrcnt = 0x1
    v_mmap_read = 0
    v_mmap_write = 0
    v_mpssdata = 0
    v_fopdata = 0
    v_vsd_lock = {
        _opaque = [ 0 ]
    }
    v_vsd = 0
    v_xattrdir = 0
    v_count_dnlc = 0x1
}

However, since we are only interested in v_count, v_rdcnt and v_wrcnt, we can include those fields on the ::print command, after the type.

[0]> 0xffffff0116906200::print vnode_t v_count v_rdcnt v_wrcnt
v_count = 0x2
v_rdcnt = 0
v_wrcnt = 0x1

At the moment, we can presume that the FCREAT flag is the cause of the v_wrcnt; further testing will confirm or deny that2.

Now we step past the vn_rdwr() call.

kmdb: target stopped at:
demo`demo_ioctl+0xa7: call   +0x2f6f07c       <vn_rdwr>
[0]> ::next
kmdb: target stopped at:
demo`demo_ioctl+0xac: testl  %eax,%eax
[0]> 0xffffff0116906200::print vnode_t v_count v_rdcnt v_wrcnt
v_count = 0x2
v_rdcnt = 0
v_wrcnt = 0x1

We note here that the vn_wrcnt hasn’t changed. Nor has v_count.

Now we step past fop_close() which is what the macro VOP_CLOSE() becomes. The next two examples are done by setting a breakpoint after each call.

kmdb: target stopped at:
demo`demo_ioctl+0xd6: movq   -0x28(%rbp),%rdi
[0]> 0xffffff0116906200::print vnode_t v_count v_rdcnt v_wrcnt
v_count = 0x2
v_rdcnt = 0
v_wrcnt = 0

Here, vn_wrcnt is reset and v_count is still 2.

Stepping past vn_rele,

kmdb: target stopped at:
demo`demo_ioctl+0xdf: movl   %ebx,%eax
[0]> 0xffffff01169062000::print vnode_t v_count v_rdcnt v_wrcnt
v_count = 0x1
v_rdcnt = 0
v_wrcnt = 0

we see that the v_count is set to 1.

Conclusion

Further research3 is needed to determine if this is the proper way to open and close files in the kernel. With only this information, we can’t tell if file creation should have two VN_RELE()s — unlikely, but we should figure out what holds another reference to the file.

Don’t be afraid of mdb.


1To change the speed, we need to configure the Illumos kernel too, something we don’t cover here.

2Or just code inspection at src.illumos.org.

3Code inspection at src.illumos.org and experiments.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: