
Inside the Core: C64

Crispy passed away August 8, 2022 and I thought I’d publish this unfinished article he was working on.


When I began development on the MMDC, I was focused solely on creating an implementation of the Atari 2600 game console using modern-day electronic components. Because of this, I designed the circuit board with hardware features specific to the Atari 2600, and it wasn’t until well after I had it all working that I was inspired to develop cores for other platforms.

The first problem I ran into when exploring the possibility of a C64 core was that of RAM. The Atari 2600 doesn’t have any system RAM to speak of. The RIOT chip has 128 bytes that are used for the CPU stack, but that’s it. The C64, on the other hand, has 64K bytes of RAM, so I needed to find a way of mapping out the hardware resources on my circuit board in a way that would give me 64K of RAM for the C64 system memory.

Now, the Xilinx FPGA that I chose does have 32 RAM blocks, which works out to 64K in total, but 12 of those blocks are already being used for video line buffers and lookup tables. There is also a single 256Kx8 static RAM chip that I put on the main circuit board, but that is being used for the video scaler triple buffer. After thinking it through for a bit, I arrived at an initial solution of sharing the on-board RAM between the triple buffer and the C64 system memory.
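
As a rough check of why the FPGA block RAM alone falls short, the numbers above can be worked through in a few lines of Python. This is only a sketch: the 2K-per-block figure is inferred from "32 blocks works out to 64K", and the function and constant names are made up for illustration.

```python
# RAM budget sketch. All names are hypothetical; the figures come from the text.
FPGA_BLOCKS_TOTAL = 32    # RAM blocks in the FPGA
FPGA_BLOCKS_IN_USE = 12   # already used for line buffers and lookup tables
FPGA_TOTAL_KB = 64        # 32 blocks work out to 64K
C64_RAM_KB = 64           # system RAM the C64 core needs


def free_fpga_kb():
    """Return how many K of FPGA block RAM are still unallocated."""
    # Assumption: every block is the same size (64K / 32 blocks = 2K each).
    block_kb = FPGA_TOTAL_KB // FPGA_BLOCKS_TOTAL
    return (FPGA_BLOCKS_TOTAL - FPGA_BLOCKS_IN_USE) * block_kb


def fpga_shortfall_kb():
    """Return how many K of C64 RAM would not fit in the free blocks."""
    return max(0, C64_RAM_KB - free_fpga_kb())


if __name__ == "__main__":
    print(f"free FPGA RAM: {free_fpga_kb()}K, shortfall: {fpga_shortfall_kb()}K")
```

With 20 blocks free, only part of the 64K fits on the FPGA, which is why the external static RAM chip comes into play.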

The triple buffer uses 192K of the on-board RAM, and that leaves exactly 64K. The RAM chip itself has a 45 ns access time, and my memory controller is clocked at 18.432 MHz. My initial calculations showed that I had enough memory bandwidth to support six video streams in and out of RAM. Since the triple buffer uses only two streams, one write and one read, there would be plenty of bandwidth left over.

There was, however, one very big gotcha. The CPU needs immediate access to RAM; a CPU memory access has to complete in just under a microsecond. I designed the memory controller with a 32 byte burst size, which works out to 1.736 us per burst, and this means that more often than not the memory controller won’t be able to grant a CPU memory request in time because it’s busy fulfilling a triple buffer memory access. It became clear that I could make it work by adding some logic to the memory controller to support high priority memory requests. The idea was to interrupt the triple buffer when the CPU needed access to memory. While this would work fine, it added a fair amount of complexity to the memory controller design.
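
The burst timing can be reproduced with a short calculation. This is a sketch under two assumptions that the text does not state outright: the controller moves one byte per clock (which is what makes 32 bytes come out to 1.736 us), and the CPU deadline is rounded up to a full microsecond. The function names are illustrative only.

```python
# Burst timing sketch. Names are hypothetical; clock and burst size are from the text.
MEM_CLOCK_HZ = 18_432_000   # memory controller clock
BURST_BYTES = 32            # burst size of the memory controller
CPU_DEADLINE_US = 1.0       # assumption: "just under a microsecond", rounded up


def burst_time_us(burst_bytes, clock_hz):
    """Time to finish one burst, assuming one byte is transferred per clock."""
    return burst_bytes / clock_hz * 1e6


def max_burst_bytes(deadline_us, clock_hz):
    """Largest burst (in bytes) that still completes within the deadline."""
    return int(deadline_us * clock_hz / 1e6)


if __name__ == "__main__":
    t = burst_time_us(BURST_BYTES, MEM_CLOCK_HZ)
    print(f"32-byte burst: {t:.3f} us (CPU deadline {CPU_DEADLINE_US} us)")
    print(f"largest burst under the deadline: "
          f"{max_burst_bytes(CPU_DEADLINE_US, MEM_CLOCK_HZ)} bytes")
```

A burst that is already in flight can hold the bus for longer than the CPU can wait, so either the burst has to be interruptible or it has to be much shorter. The article describes the first option.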

So I started work on the memory controller, and then came to a realization. The whole reason for the triple buffer is to absorb the video timing instabilities from the Atari 2600 core. The Atari 2600 is unique in the way it generates video. The vertical timing is completely under software control; the game programmer is solely responsible for generating the vertical refresh signal. Because of this, the vertical timing can differ from game to game, and can even change during the game. While this worked fine on the analog television sets of the day, modern flat panel televisions do not like this at all. The triple buffer alleviates this issue by frame syncing the Atari 2600 timing to the standard video timing that modern flat panel televisions expect. In contrast, the C64 video timing is generated by hardware, is stable, and is close enough to standard timing that it works fine on most modern televisions. Based on these observations, I decided that the triple buffer was not needed for the C64 core. I could throw away the triple buffer and memory controller, and use the on-board RAM directly. Problem solved.

Of course, I still needed to upscale the C64 video, and translate the SD timing from the C64 core to HD timing. I decided to make it all work by vertically locking the output timing to the C64 timing. The C64 uses a pixel clock of 8.1818 MHz, a horizontal total of 520 pixels/line, and a vertical total of 263 lines/frame to produce 59.826 frames/second. Standard HD video timing falls in at 59.94 fps, but 59.826 fps is close enough that it will work fine with most modern televisions.

So now it was just a matter of adjusting the speed of the output pixel clock so that the first pixel of an output frame would always occur at the exact same time as the first pixel of a C64 frame. For 1080p, which has a horizontal total of 2200 pixels/line and a vertical total of 1125 lines/frame, the required pixel clock would be 148.07 MHz. Fortunately, I chose the right PLL clock generator for the job. I’m using the Silicon Labs Si5338, which has four independent clock outputs, each of which can be adjusted with high precision to nearly any frequency in its supported range, and with it I was able to produce the exact frequency I needed for the output pixel clock. Even more importantly, since both the C64 vertical timing and the output vertical timing would be the result of dividing the PLL frequency by the exact same integer value, both timings would be phase locked, and would not drift apart.

Of course, there were two more issues to address. First off, I needed to determine the rate at which video data is being produced by the C64 compared to the rate at which video data is being consumed by the scaler. Doing the math showed that the C64 is producing video data faster than the speed at which the scaler is consuming it, and at the end of a frame the C64 would be ahead by about twelve lines of data. This means that line buffers are needed to store the C64 video data.
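
The frame rate and the 1080p pixel clock quoted above follow directly from the timing totals. The sketch below redoes that arithmetic; the function names are made up for illustration, and the 8.1818 MHz figure is used as given, so the last digits depend on how that value is rounded.

```python
# Frame-lock arithmetic sketch. Names are hypothetical; figures are from the text.
C64_PIXEL_CLOCK_HZ = 8_181_800   # 8.1818 MHz, as quoted
C64_H_TOTAL = 520                # pixels per line
C64_V_TOTAL = 263                # lines per frame
OUT_H_TOTAL = 2200               # 1080p pixels per line
OUT_V_TOTAL = 1125               # 1080p lines per frame


def frame_rate_hz(pixel_clock_hz, h_total, v_total):
    """Frames per second produced by a given pixel clock and frame size."""
    return pixel_clock_hz / (h_total * v_total)


def locked_pixel_clock_hz(frame_rate, h_total, v_total):
    """Pixel clock needed for an output frame to last exactly one source frame."""
    return frame_rate * h_total * v_total


if __name__ == "__main__":
    fps = frame_rate_hz(C64_PIXEL_CLOCK_HZ, C64_H_TOTAL, C64_V_TOTAL)
    out_clock = locked_pixel_clock_hz(fps, OUT_H_TOTAL, OUT_V_TOTAL)
    print(f"C64 frame rate: {fps:.3f} fps")
    print(f"1080p pixel clock for the same frame rate: {out_clock / 1e6:.2f} MHz")
```

Because the output clock is derived from the C64 frame rate rather than from the nominal 59.94 fps, one output frame takes exactly as long as one C64 frame.
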
A single FPGA RAM block can hold eight lines of C64 video data, so using two blocks would provide enough storage for twelve lines with a little left over. Next, I needed to create a lock signal that would align the start of output video with the start of C64 video. I decided to lock the start of output active video to the 128th pixel on the first line of C64 active video. This would guarantee at least 128 pixels of data available to the scaler, and would prevent any kind of buffer underrun while maintaining extremely low latency. The lock mechanism I designed will re-lock the output video sync generator only if it drifts more than 16 pixels away from the C64 lock signal. This allows for some jitter between the C64 and output timings.
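
The buffer sizing and the re-lock rule can be modeled in software to see how they behave. This is a behavioral sketch only, not the FPGA logic: the per-frame drift values are invented inputs, drift is treated as a signed pixel offset, and all names are hypothetical.

```python
# Line buffer sizing and re-lock hysteresis sketch. Names are hypothetical.
import math

LINES_PER_BLOCK = 8      # C64 lines that fit in one FPGA RAM block (from the text)
LINES_AHEAD = 12         # how far the C64 gets ahead per frame (from the text)
RELOCK_THRESHOLD = 16    # pixels of drift tolerated before re-locking (from the text)


def blocks_needed(lines, lines_per_block):
    """Number of RAM blocks required to hold the given number of lines."""
    return math.ceil(lines / lines_per_block)


def track_lock(drift_per_frame):
    """Return the frame indices at which a re-lock would be triggered.

    drift_per_frame holds the signed change in offset (in pixels) between the
    output sync generator and the C64 lock signal for each frame. A re-lock
    resets the accumulated offset to zero.
    """
    offset = 0
    relocks = []
    for frame, delta in enumerate(drift_per_frame):
        offset += delta
        if abs(offset) > RELOCK_THRESHOLD:   # strictly more than 16 pixels
            relocks.append(frame)
            offset = 0
    return relocks


if __name__ == "__main__":
    print("RAM blocks for line buffers:", blocks_needed(LINES_AHEAD, LINES_PER_BLOCK))
    print("re-lock frames, steady drift:", track_lock([5, 5, 5, 5]))
    print("re-lock frames, jitter only:", track_lock([3, -3] * 10))
```

Small back-and-forth jitter never crosses the threshold, so the sync generator is left alone; only a sustained drift forces a re-lock.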

2 Comments

  1. Eric
    Eric September 4, 2022

    What an amazing, tremendous talent – and an epic project. We’d done so many projects together these past couple of years. His ingenuity and knowledge was so deep, I was constantly in awe of his skills, kindness and generosity.

    I miss Crispy so much. Thank you SO much for posting this article.

    RIP Chris

    Eric
    (AmigaLove)

  2. Jengel
    Jengel September 15, 2022

    My condolences to all who knew him. I was following this project. I hope his ideas may live on, in his honour and memory…rip
