Wednesday, September 27, 2006

Going back to Rhody... Rhody... Rhody...

I don't think so.

I'm going back to my alma mater to give a talk. I figure I might as well do what little I can to boost attendance, so pretty please, New England virtualizanti, do what you can to make it.

While I'm waxing nostalgic/faux hip-hop about my misspent college youth, it has been pointed out to me that a shout-out is due to my erstwhile CS 169 TA and all-around bad-boy of the hardware/software interface, Bryan Cantrill. Chapeau, Bryan, Adam, and Mike.

As I've pointed out again and again, to the point of eye-rolling boredom on the part of my colleagues, DTrace really is outrageously cool. The improvement in leverage that DTrace yields feels comparable to the improvement going from debugging with printf to a source-level interactive debugger. Those who haven't tried it yet: run, do not walk, to download a recent build of OpenSolaris, install it in a VM, and start working your way through the numerous tutorials. You'll be glad you did.

Wednesday, September 20, 2006

Yes, Virginia. We deliver GP's on control flow transfers at the wrong EIP.

Busted! This is just one of those tough situations where "perfect" software and "useful" software are in unresolvable conflict. While we could, in theory, get this corner case completely right, no customer would want us to. To patch up this hole we'd have to make all indirect kernel control flow transfers slower. Since literally nothing we've ever run into in the wild cares, we just get this wrong. C'est la vie. Interestingly enough, though, there is one piece of software out there that we know relies on this behavior: the VMware VMM itself.

Since there are both architected (the backdoor port) and unarchitected (i.e., the s[gi]dt/lsl instructions) ways of discovering you're in a VM, this isn't really "news," per se, but it is interesting x86 trivia nonetheless. If it makes you upset, you have a few options to make the behavior in a VM more like that of real hardware. As Derek discovered, you can tick the "disable acceleration" checkbox in your VM. You can also buy a shiny, new Intel processor and drop "monitor_control.vt32=TRUE" into your config file, but expect lousy performance. (VT32 really only exists as a check on the correctness of our VT implementation; all the VT performance work has been sunk into running 64-bit guests.)

At the end of the day, discovering you're in a VM is always likely to be easy, if only because of timing attacks. These pieces of code that folks write to detect that you're in a VM are sort of modern-day equivalents of the little assembly programs that were used for CPU identification before the CPUID instruction came along; these terribly clever little DOS programs would try to infer the size of the write buffer, time a few instructions, etc., and come up with a "fingerprint" of the CPU. These programs are serious fun to write, and have some educational value. I haven't seen any legitimate, practical uses of the VM detection programs thus far, though. I.e., congratulations, you now know you're on a VMware Workstation 4.5 VM. What are you going to do about it?

Wednesday, September 06, 2006

Pining for microcode...

Some synchronization arguments we've been having at work led me to read Lampson and Redell's classic synchronization paper. It's rightly famous for providing the first clear exposition of the priority inversion problem. There's also a good deal of implementation detail about Mesa and Pilot, which were a programming language and operating system, respectively, for PARC's famed Alto workstation.

Most modern readers gloss over this implementation section, since the system described is trivial by modern standards (the OS kernel is 24KLOC) and obsolete in many ways. However, the extremely tight coupling between the OS and the hardware ISA is fascinating to me. The hardware directly supports the kernel's run queue, providing a context switch in a single, one-byte instruction! Faults move the current process straight from the run queue to a hardware-supported "fault queue" and reschedule in hardware. Stacks consist of discontiguous, linked frames allocated and deallocated from a variable-sized "frame heap" maintained in, you guessed it, hardware. When the frame heap is exhausted, a "frame fault" sticks the current process on the fault queue. Not quite as trippy as the StackFrame-as-first-class-Object supported by the Smalltalk system running on the same hardware, but still pretty darned far from the post-RISC, all-C-code-all-the-time machines we're accustomed to.
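
For anyone who finds that paragraph easier to digest as code, here is a toy software model of the run queue, fault queue, and frame heap described above. To be loud about it: this is my own sketch, not the paper's data layout or microcode. The names (Machine, Process, Frame, call, ret, switch) are all made up for illustration, and the "variable-sized" frame heap is flattened into a plain count of equal-sized frames.

```python
from collections import deque


class Frame:
    """One activation record, linked to its caller rather than stacked contiguously."""

    def __init__(self, caller):
        self.caller = caller  # None for the bottom frame of a process


class Process:
    def __init__(self, name):
        self.name = name
        self.top = None  # most recently allocated frame

    def depth(self):
        """Walk the caller links to count the live frames."""
        n, frame = 0, self.top
        while frame is not None:
            n, frame = n + 1, frame.caller
        return n


class Machine:
    """Toy stand-in for the state the post says is kept in hardware."""

    def __init__(self, free_frames):
        self.free_frames = free_frames  # simplification: a count of equal-sized frames
        self.run_queue = deque()        # head of the queue is the running process
        self.fault_queue = deque()

    def call(self):
        """Procedure call by the running process; returns False on a frame fault."""
        if self.free_frames == 0:
            # Frame fault: park the running process and let the next one run.
            self.fault_queue.append(self.run_queue.popleft())
            return False
        proc = self.run_queue[0]
        proc.top = Frame(proc.top)
        self.free_frames -= 1
        return True

    def ret(self):
        """Procedure return: unlink the top frame and give it back to the heap."""
        proc = self.run_queue[0]
        proc.top = proc.top.caller
        self.free_frames += 1

    def switch(self):
        """Context switch: the running process goes to the back of the run queue."""
        self.run_queue.rotate(-1)
```

Build a Machine with two free frames, put two processes on the run queue, and have the first one make three calls: the third call finds the heap empty, so the caller lands on the fault queue and the second process becomes the head of the run queue. The real system does all of this below the level of the instruction set, which is exactly what makes it so alien.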

This tight coupling of ISA to OS was made possible by a writable microcode store, whose internals were exposed to system software. Before I get all starry-eyed about how wonderful a microcode revival would be, I will admit up front that it's 2006. Even if it somehow were practical to expose the microcode stores of modern processors, no sane system software vendor would take advantage. Modern machines' microarchitectures are too complicated to be programmable by non-specialists, and it would be painful beyond belief trying to maintain correct implementations of the same ISA on different microarchitectures. Besides, the whole idea of adapting the ISA to a particular OS seems like a quaint holdover from the CISC days; the Alto folks were working in an environment where the hardware could execute a sequence of microcode instructions much faster than the equivalent sequence of macro-instructions, and that just isn't the world we live in anymore.

So, while writable microcode is probably a stupid thing to wish for if you're writing your own OS, it might still be useful, say, to change the behavior of already existing instructions... It's obvious Intel and AMD are relying on some microcode changes to provide the first generation of VT and SVM in minor revisions of their processors. I've complained that VT/SVM only allow software intervention on the far side of a heavyweight hardware context switch. Wouldn't it be lovely if, instead of software trap handlers for virtualization-sensitive events, the VMM provided "microcode" handlers for such events? Obviously, it wouldn't really be "microcode" proper; the ISA would have to be "architected", perhaps as a subset of the x86, and run with some vaguely SMM-esque constraints on its behavior. But, if it were possible to arrive at this "pseudo-microcode" in a more expeditious manner than the full context switch out to the VMM, it might provide an opportunity to handle some of the more frivolous exits much more efficiently.