Monday, 21 May 2012



In this post I will demonstrate how to create an implementation of the minimalist Bytepusher virtual machine using C# and the XNA gaming framework.  For clarity, I'll be a bit more verbose than is really necessary.
The application itself will be called BytePusherXna.

As I'm using XNA the code will be portable across microsoft platforms including windows, xbox and windows phone.


Windows 7
Visual Studio 2010

I've included GIT in the list of tools as if you want to access and build the code, then you will need to pull of from Github.  I'll provide details later.


 Bytepusher is a minimalist virtual machine with the following specs:

60 frames per second
ByteByteJump with 3-byte addresses
CPU speed:
65536 instructions per frame (3932160 instructions per second, ~3.93 MHz).
Byte ordering:
16 MiB RAM
256*256 pixels, 1 byte per pixel, 216 fixed colors
8-bit mono, signed values. 256 samples per frame (15360 samples per second)
16 keys, organized into 4 rows by 4 columns

See: for a complete description, it’s history, known implementations and downloadable ROMS.   There appears to be 8 ROMS in existence, of which I can only find 7.  The 8th is a dud link.


The bytepusher CPU has only 1 instruction, known as byte byte jump.  Starting at PC (program counter) the CPU reads 9 bytes from memory and splits these into 3 24bit words - AAA BBB and CCC.  It copies the contents of memory location AAA to memory location BBB then jumps to CCC (ie sets PC=CCC) and repeats.  Byte byte jump is Turing complete meaning that any other Turing machine can be created from it. ie. it could be used to emulate any other cpu in existence!

BytePusherXna Architecture

I’ve separated the BytePusherXna architecture into 2, the first encapsulates the Virtual machine, agnostic of any hardware implementation (coloured red), the second encapsulates the hardware implementation (coloured blue)

The design allows for different “hardware” implementations of the audio, keyboard and display technologies to be swapped in.  This is done via a Hardware Abstraction layer (implemented as an interface).  This is good programming practice as it separates concerns, allowing for focused testing ie. we can test the BytepusherVM independently of the hardware tier and so on.    

I'll highlight in yellow bits from the diagram as we go along so you can see how the code relates back to the architecture.


 Every 60th of a second, the BytePusherVM does the following:
  1. Poll the keys and store their states as a 2-byte value at address 0.
  2. Fetch the 3-byte program counter from address 2, and execute exactly 65536 instructions.
  3. Send the 64-KiB pixeldata block designated by the byte value at address 5 to the display device. Send the 256-byte sampledata block designated by the 2-byte value at address 6 to the audio device.

public class BytePusherVM
  byte[] mem = new byte[0xFFFFFF];
  BytePusherIODriver ioDriver;

  public BytePusherVM(BytePusherIODriver ioDriver)
    this.ioDriver = ioDriver;

  // load rom into memory
  public void load(string rom)
    mem = new byte[0xFFFFFF];
    using (FileStream fs = new FileStream(rom, FileMode.Open))
      int pc = 0;
      int i = 0;
      while ((i = fs.ReadByte()) != -1)
        mem[pc++] = (byte)i;

  public void run()
    // run 65536 instructions
    var s = ioDriver.getKeyPress();
    mem[0] = (byte) ((s & 0xFF00) >> 8);
    mem[1] = (byte)(s & 0xFF);
    var i = 0x10000;
    var pc = getVal(2, 3);
    while (i-- != 0)
      mem[getVal(pc + 3, 3)] = mem[getVal(pc, 3)];
      pc = getVal(pc + 6, 3);
    ioDriver.renderAudioFrame(copy(getVal(6, 2) << 8, 256));
    ioDriver.renderDisplayFrame(copy(getVal(5, 1) << 16, 256 * 256));

  int getVal(int pc, int length)
    var v = 0;
    for (var i = 0; i < length; i++)
      v = (v << 8) + mem[pc++];
    return v;

  byte[] copy(int start, int length)
    var b = new byte[length];
    Array.Copy(mem, start, b, 0, length);
    return b;

When constructing a  BytePusherVM you need to pass a BytePusherIODriver.  This is the hardware abstraction layer.  It decouples the BytePusherVM from the physical implementation.

I’m a bit uneasy about the load() methods, as this couples the code to loading ROMS from a file system.  It may have been better to use a stream to pass in the ROM data, or even better push the whole thing down into the hardware abstraction layer.

Hardware Abstraction Layer

This is the bit which decouples the hardware from the BytePusherVM.  As long as my hardware "driver" implements this interface, then it can be used by the BytePusherVM:

public interface BytePusherIODriver
    * Get the current pressed key (0-9 A-F
  ushort getKeyPress();

   * Render 256 bytes of audio 
  void renderAudioFrame(byte[] data);

   * Render 256*256 pixels.  
  void renderDisplayFrame(byte[] data);

Hardware Drivers

The implementation of BytePusherIODriver uses XNA to read key presses, generate graphics and play audio:

public class XnaBytePusherIODriver : BytePusherIODriver
  public Texture2D texture { get; set; }
  GraphicsDevice gd;
  DynamicSoundEffectInstance soundEffect;

  public XnaBytePusherIODriver(GraphicsDevice gd)
  { = gd;
    // start audio device
    soundEffect = new DynamicSoundEffectInstance(15360, AudioChannels.Mono);

  public void renderAudioFrame(byte[] data)
    // convert to 16bit data
    var audio = new byte[data.Length * 2];
    var z = 0;
    foreach (var d in data)
      audio[z++] = 0;
      audio[z++] = d;
    // submit audio buffer

  public void renderDisplayFrame(byte[] data)
    // convert 8 bit data to 24 bit RGB            
    var rgbuffer = new byte[256 * 256 * 3];
    var z = 0;

    foreach (var c in data)
      if (c > 215)
        z = z + 3;
          int blue = c % 6;
          int green = ((c - blue) / 6) % 6;
          int red = ((c - blue - (6 * green)) / 36) % 6;

          rgbuffer[z++] = (byte)(blue * 0x33);
          rgbuffer[z++] = (byte)(green * 0x33);
          rgbuffer[z++] = (byte)(red * 0x33);

      // Gerate bitmap.
      var b = new SD.Bitmap(256, 256, SDI.PixelFormat.Format24bppRgb);
      var rect = new SD.Rectangle(0, 0, 256, 256);
      var bitmapdata = b.LockBits(rect, SDI.ImageLockMode.WriteOnly, b.PixelFormat);
      System.Runtime.InteropServices.Marshal.Copy(rgbuffer, 0, 
     bitmapdata.Scan0, 256 * 256 * 3);

      // convert to png (which xna can use)
      using (var ms = new MemoryStream())
        b.Save(ms, SDI.ImageFormat.Png);
        // create texture object which can then be rendered in the next Draw() call
        texture = Texture2D.FromStream(gd, new MemoryStream(ms.GetBuffer()));

  public ushort getKeyPress()
    ushort key = 0;
    foreach (var k in Keyboard.GetState().GetPressedKeys())
      switch (k)
        case Keys.D0: key += 1; break;
        case Keys.D1: key += 2; break;
        case Keys.D2: key += 4; break;
        case Keys.D3: key += 8; break;
        case Keys.D4: key += 16; break;
        case Keys.D5: key += 32; break;
        case Keys.D6: key += 64; break;
        case Keys.D7: key += 128; break;
        case Keys.D8: key += 256; break;
        case Keys.D9: key += 512; break;
        case Keys.A: key += 1024; break;
        case Keys.B: key += 2048; break;
        case Keys.C: key += 4096; break;
        case Keys.D: key += 8192; break;
        case Keys.E: key += 16384; break;
        case Keys.F: key += 32768; break;

   return key;

This takes in 256 bytes of 8 bit sound data.  The XNA sound system supports 16 bit  audio so we need to convert from 8 bit signed to 16 bit signed.  Once the data has been converted it’s then submitted into the audio system.  The XNA audio system operates a queuing system.  You submit an audio buffer and internally it puts it onto a queue.  It plays the buffers in the order in which you sent them.  Providing the timing is right, and you submit before the previous buffer has finished playing, you get smooth audio.

This creates a .png image (which is a lossless format) then converts this into an XNA Texture2D object.

The first thing we need to do is convert from 8 bit web safe colour to 24bit RGB colour.  The algorithm I came up with was:

# = 8 bit websafe colour
B = #%6
G = ( ( # - B ) / 6 ) % 6
R = ( ( # - B - 6*G ) / 36 ) % 6
A bit trickier, was creating a bitmap.  Bitmap has a method SetPixel(), unfortunately the method call is ultra slow to the point that doing 256*256 calls to SetPixel() can blow the 60th second time interval.  The workaround is a bit on the funky side, which involves directly manipulating the bitmap data.  It works though, and importantly is fast.  Perhaps I could have done this using the XNA libraries directly?

The only way I know to convert it into a XNA Texture2D object is to convert the Bitmap into a .png, save it to memory, then pass this into the Texture2D object.

This function coverts a keypress (or multiple keypresses) into a 16 bit value.  Each key represents a single bit of the 16 bit value, allowing simultaneous key presses to be detected.

Bringing it all together

The integration peice is to pull everything together into an XNA project, and plug into the hooks which the XNA framework provides to  trigger frame updates and re-draws.

public class BytePusherXna : Microsoft.Xna.Framework.Game
  GraphicsDeviceManager graphics;
  SpriteBatch spriteBatch;
  BytePusherVM vm;
  XnaBytePusherIODriver driver;
  public BytePusherXna()
    graphics = new GraphicsDeviceManager(this);
    graphics.IsFullScreen = true;
    graphics.PreferredBackBufferHeight = 768;
    graphics.PreferredBackBufferWidth = 1024;
    Content.RootDirectory = "Content";
    this.TargetElapsedTime = TimeSpan.FromSeconds(1.0f / 60);

  protected override void LoadContent()
    // initialise BytePusher VN
    driver = new XnaBytePusherIODriver(GraphicsDevice);
    vm = new BytePusherVM(driver);
    // Setup fonts and screen dimensions            
    spriteBatch = new SpriteBatch(GraphicsDevice);

   * Update called every 60th second.  This seems to work fine up to 60fps.  
  protected override void Update(GameTime gameTime)

   * Called once per update(), however calls may be dropped.  
  protected override void Draw(GameTime gameTime)
    // render BytePusherVM disaply
    spriteBatch.Draw(driver.texture, new Rectangle(0, 0, 256*3, 256*3), Color.White);

Simplistically, the code above amounts to:

every 60th of a second {

Update() {

Draw() {
render vm.texture to the screen

That's it...


To make BytePusherXna stand out from the crowd (in the very competative bytepusher markert!) I've added a few extra features to allow you to monitor framerate, toggle between full screen and windowed mode, increase and decrease the CPU frequency.  


GIT repository:

BytePusherXNA App:

Installation instructions

To install, unzip into a temporary folder. 
run setup.exe
Create folder c:/bytepusher/roms
Copy the contents of roms folder into c:/bytepusher/roms

BytePusherXna will scan c:/bytepusher/roms for any file matching *.BytePusher.


There are 2 measures, I included in the app:
1) CPU rate.  This is measured as the number on calls to per second.
2) GPU rate.  This is measured as the number of rendered frames drawn to the screen.

In full screen mode the VM happily runs at CPU=60 and GPU=60.  The audio is smooth.

When there is a slow down, what tends to happen is that the CPU rate stays at 60, whereas the GPU rate goes down.   This manifests itself as perfect audio, but a slow stuttering display.

Strangely when there is a CPU slow down (I’ve added the ability to adjust the framerate), the audio seems to sloooowwwwww down as well.   I would have expected the audio to sound a bit like a machine gun?