Watch on YouTube: https://www.youtube.com/watch?v=SqWkACvpne8
A COAST Module For My Pentium 100
Introduction
In the beginning, there was RAM, and that was fast enough for everyone - until one day it was decided that it actually wasn’t fast enough, and so CPU cache was born. Inevitably, someone decided to put it on a stick. So let’s investigate that.
Script
This is a COAST, or Cache On A STick module, and it’s an old standard that was designed to help increase the performance of older computers, such as our Pentium 100 PC here, which dates to around 1996. Now the way that this worked is that it slotted into a dedicated slot on the motherboard and added additional CPU cache memory - and in this video, I want to get this installed in this system, do some benchmarks before and after, and find out just what kind of performance increase people might have expected from using such a module back in the day. But I think first, we do need a very basic understanding of how a computer actually works.
…and when I say basic, what I mean is that this is going to be massively oversimplified, so apologies to the computer scientists in the audience, please do let me know down in the comments if you’re not happy with this explanation for any reason.
So in a system such as this one, our data - meaning our operating system, programs and files - is all stored on a hard drive, which back in 1996 would have been mechanical, and as you can imagine, they were quite, quite slow - so for our CPU to be able to work with said data in any kind of timely fashion, we should probably copy it to something much faster - and that is where the system RAM comes in.
Now, this is how computers worked for quite a long time - we’d load the data that we wanted to work with into RAM, the CPU would then quite happily read that data and write it back as it modified it, and that was pretty much fast enough for everyone.
But as CPU speeds increased, the bottleneck between the CPU and the RAM started to become a bit more of a problem - and so cache was born, and various competing standards, including our Cache On A STick module here, started to come onto the market.
So how does this so called cache actually work? Well, it sits between the main system RAM and the CPU, and it consists of a much, much faster type of RAM known as SRAM - now you might be thinking, well, if this SRAM is so much faster, why not just make the entire system RAM out of that? And the answer to that is cost. SRAM was very, very expensive back in the day, and so we could only add a very small amount of it to PCs, but it was considered that the cost was worth it.
And so any data that the CPU was regularly working with was copied to the cache, and actually accessed from there, rather than the main system RAM, which did help to increase performance of the computer. Yeah, I did say that it would be massively oversimplified, and of course we could get into the different cache levels, and cache management strategies, and system architecture and all the rest of it - but to be honest with you, I want to approach this as just a normal PC user from the mid nineties, someone who would have been using a system just like this one on a day to day basis - and ultimately all they would have cared about is, is it a worthwhile investment?
Does it actually make a noticeable difference to my day to day PC experience? And so that is what I want to investigate in this video - and I think a very good place to start would be by running some benchmarks.
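To put a rough shape on the cache idea described above, here is a minimal Python sketch of a direct-mapped cache sitting in front of RAM. It is a toy model, not a description of how this motherboard's cache controller actually works: the 32-byte line size, the looping access pattern and the working-set sizes are all assumptions picked for illustration. All it does is count how often a looping program finds its data already in the cache.

```python
def hit_rate(working_set_bytes, cache_bytes, passes=4, line_size=32, step=4):
    """Fraction of reads served from a toy direct-mapped cache.

    The 'program' reads every `step` bytes of its working set, `passes`
    times over. All sizes here are illustrative assumptions.
    """
    num_lines = cache_bytes // line_size
    tags = [None] * num_lines          # which memory block each cache line holds
    hits = 0
    total = 0
    for _ in range(passes):
        for addr in range(0, working_set_bytes, step):
            block = addr // line_size  # memory block this address belongs to
            index = block % num_lines  # the one cache line it may live in
            if tags[index] == block:
                hits += 1              # served from the fast SRAM
            else:
                tags[index] = block    # miss: fetch from RAM, replace the line
            total += 1
    return hits / total


KIB = 1024
for working_set in (128 * KIB, 384 * KIB):
    for cache in (256 * KIB, 512 * KIB):
        rate = hit_rate(working_set, cache)
        print(f"{working_set // KIB}k working set, {cache // KIB}k cache: "
              f"{rate:.1%} hits")
```

In this toy model, a program whose working data already fits in the smaller cache gains nothing from a bigger one, while a program that doesn't fit does see a higher hit rate. Real caches and real programs are a lot messier than this, so treat it as a way of picturing the idea rather than a prediction.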
But before we get stuck into those just a very quick message from this video’s sponsor PCBWay.com. Now, you’re probably already familiar with PCBWay from their 10 years of experience in the PCB fabrication business - but, did you know that they’ve also started offering additional services? Including CNC machining, injection molding and 3D printing?
So if that’s a requirement for your next project please do check out their link down in the description, and a big thank you once again to PCBWay for sponsoring this video.
Now, let’s get on with these benchmarks…
Of course, the first thing that we need is a baseline measurement with no external cache at all, and I must admit, when I first got my hands on this PC and opened it up, I did actually think that that was the state of things because, well, that slot was unpopulated. But running CACHECHK shows that we do actually have 256k of onboard cache - so I’m glad I checked that, and I’m not quite sure how I missed it but it does make our tests a lot more useful because we can test all three configurations: we can disable the onboard cache and see how things run without that, we can re-enable it and see what the performance increase is with 256k of external CPU cache, and then we can take it up to its maximum of 512k and run those tests again and see whether it would have been worthwhile upgrading this machine to 512k, or if indeed 256k would have been enough - and it turns out that disabling the onboard cache is as easy as going into the BIOS and just setting it to “Disabled” in there.
Now this PC is running Windows 95 and it automatically boots up into that, so that seems like a logical place to start - and I’m going to use a Windows based tool called WinTune 98, which was quite a popular tuning and benchmarking tool back in its day and has become something of a popular benchmark among YouTubers as well, so who am I to argue?
This is really going to put the system through its paces - as you can see there are quite a few different tests here - I have disabled the Direct3D test, for some reason that crashes this system. It’s got a Matrox Mystique graphics card in it and it’s not known for its Direct3D support but to be honest we’re not really interested in that, this isn’t really a Windows gaming machine - so this is going to give us a really good overview of kind of the default state of the system. We’ll get scores in all of these different categories - of course, we’re not expecting the CPU cache to make a difference to some of these, although you never know, there might be some surprises, but I think it will just serve as a really good baseline for the kind of overall performance of the system.
The next benchmark that I want to run is a bit of an older one - it’s DOS based, so I’m going to shut down Windows 95 and fire up an MS-DOS prompt - and if you’re a fan of the Phil’s Computer Lab YouTube channel like I am, you’ll no doubt be familiar with this tool, it’s one that he uses all the time on his channel - and in fact, I actually downloaded this from his website. It’s a really, really useful resource, there’s all sorts of drivers and tools and things on there - so big thanks to Phil for making all of this stuff available.
This is of course the SuperScape Virtual Realities VGA Benchmark, otherwise known as 3DBench 2 - and this is an older tool, like I say, so we are going to get quite a high score on this Pentium system, but hey, it’s an old stalwart, it’s a bit of an industry standard, so I couldn’t not run this one - and all it does is give us an FPS value for the stuff that it is currently rendering on screen.
Now, I don’t know about you, but back in 1996, I was playing rather a lot of Doom - in fact, it’s 2024 and I’m still playing Doom because, well, it’s one of the greatest games of all time, of course - and if you have The Ultimate Doom, there is actually a built in benchmark of sorts in the form of the timedemos - so what you can do is you can type in a command line switch and it runs through one of the built in timedemos and then spits out a score at the end.
This score is in the form of gametics - which are essentially the game engine’s way of timing things internally - but there is a very simple mathematical formula that you can use to convert this into an FPS value, so it makes a very useful benchmark. So what I did was run through timedemo 1, 2 and 3 on this system, convert the scores into FPS and make a note of those.
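For reference, here is that conversion as a small Python sketch. It assumes the usual form of the Doom timedemo result, "timed N gametics in M realtics", and the engine's fixed rate of 35 tics per second. The numbers in the example are made up purely to show the arithmetic and are not scores from this machine.

```python
TICS_PER_SECOND = 35  # Doom's internal timer rate


def timedemo_fps(gametics, realtics):
    """Convert a Doom timedemo result into frames per second.

    gametics: game tics in the demo (one rendered frame each in a timedemo)
    realtics: real time taken, measured in 1/35ths of a second
    """
    if realtics <= 0:
        raise ValueError("realtics must be positive")
    return gametics * TICS_PER_SECOND / realtics


# Hypothetical result, for illustration only
print(round(timedemo_fps(1000, 700), 1))  # 50.0
```

A result where the realtics figure matches the gametics figure works out to exactly 35 FPS, so anything that finishes in fewer realtics than gametics is running faster than the game's normal speed.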
And of course, your PC gamer of 1996 would have been very hyped indeed for id Software’s latest and greatest release, Quake, which came out in the summer of that year. id did include a similar benchmark function in this game as well, so you can run through the timedemos and it will spit out an FPS value at the end, so I ran this in three different resolutions: 320x200, 640x480 and 800x600, and that’s all three demos in all three of those resolutions - so nine different tests for each state - pre-upgrade, 256k, and 512k - so hopefully this will also give us a very good idea of the real world performance increase that we can expect from adding this COAST module.
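To keep track of that many runs, the test plan can be written out as a simple grid. This is only a sketch of the bookkeeping: the demo names `demo1` to `demo3` and the labels for the three cache states are assumptions used for illustration.

```python
from itertools import product

DEMOS = ("demo1", "demo2", "demo3")              # assumed demo names
RESOLUTIONS = ("320x200", "640x480", "800x600")
CACHE_STATES = ("no external cache", "256k", "512k")


def quake_test_plan():
    """Every (cache state, resolution, demo) combination to benchmark."""
    return list(product(CACHE_STATES, RESOLUTIONS, DEMOS))


plan = quake_test_plan()
per_state = len(plan) // len(CACHE_STATES)
print(f"{per_state} runs per cache state, {len(plan)} runs in total")
for state, resolution, demo in plan[:3]:
    print(f"{state}: timedemo {demo} at {resolution}")
```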
So some very useful baseline benchmarks there that give us a good idea of how this Pentium 100 system performs without any external CPU cache at all - of course, we’re going to be using the onboard 256k that’s built into the motherboard so all that we need to do is go back into the BIOS and re-enable that - and while I was in there, I should also point out that I found a setting for the speed of the cache as well. Curiously enough, no actual numbers here, but yeah, just options for “Faster” and “Fastest” - so I tried it on the “Fastest” setting and unfortunately it doesn’t seem to work. I’m not sure if there’s an issue with the speed of the SRAM chips on this motherboard, but it won’t even get past the POST, so I’m going to run the rest of these tests on the “Faster” setting, but thought that was worth pointing out because it is quite interesting.
And so with that onboard 256k of CPU cache re-enabled, let’s run through all of those tests again!
First up, we have WinTune 98 again, of course, that tuning and benchmarking tool for Windows from back in the day, and I should say that I have run this from a cold boot both times, I did run it multiple times, and of course the very nature of the benchmark means that it runs through all of the tests three times every single time - so hopefully this is a fair test!
It takes a few minutes to run again, and again I am skipping that Direct3D test, because the graphics card doesn’t like it. But yeah, when it does eventually finish running, we’ve got some really interesting results here, so we have a 3.5 percent - approximately - improvement in the CPU integer performance, which is not huge but quite nice to see. CPU floating point - no real big difference there, that’s a 0.16 percent - or 0.17 percent - increase.
Video (2D) is where we see our first really impressive boost - so that’s an 11.2 percent increase in performance, which is going to be really noticeable using the machine day to day, so very worthwhile. A 1.6 percent improvement in OpenGL performance, which again is not huge, but nice to see - and then memory, which is where I would expect the biggest improvement - 17 percent, that’s absolutely massive, isn’t it? But nowhere near as big as the improvement in the cached disk performance - 51.4 percent! I mean, yeah, the disk, of course, being the slowest part of the machine here so yeah, the cache doing its job there - and finally, just the uncached disk, that’s 3.15 percent. Not huge again, but yeah, a very noticeable improvement across the board.
Next up, that old DOS based benchmarking tool, 3DBench 2, and with no external cache, we were seeing 1023 FPS. With the 256k, we’re getting 1056, which actually isn’t a huge improvement - it’s only around 3.2 percent - so very interesting to see.
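The percentage figure is just the change in score divided by the starting score. A minimal Python sketch, using the two 3DBench 2 scores quoted above (the function and variable names are my own):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


no_cache_fps = 1023    # 3DBench 2, external cache disabled
cache_256k_fps = 1056  # 3DBench 2, onboard 256k enabled

print(f"{percent_change(no_cache_fps, cache_256k_fps):.1f}%")  # 3.2%
```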
And now, my personal favourite, again, it’s The Ultimate Doom, of course, and, alright, this game was running perfectly fine before - it’s designed to work on a 386 and upwards anyway, so, before, without the external cache, we were seeing kind of 45, 50 plus FPS on some of those timedemos, which is perfectly playable, as you might imagine, but, yeah, a really decent performance increase with the 256k of cache, around 12 percent - so we’re now seeing 50 FPS on that slowest timedemo, the most complex one, number one, and 60 FPS on timedemo three, which is gonna be a really noticeable performance increase when you’re playing the game.
So, a lot to talk about here with Quake, the game that probably single-handedly sold millions of Pentium CPUs - and I also wanted to demonstrate what people mean when they say that - so as I had my 486 DX4/100 also set up in the studio here, I thought I’d run through the timedemos on there, and I got around 8 FPS just running in the absolute lowest resolution of 320x200, and bear in mind that’s on a system that has the same clock speed and the same amount of RAM as our Pentium system here - so running in 320x200 on the Pentium, with no external cache whatsoever, we were in the 20s - in the low 20s admittedly - but yeah, sort of 24, 25 FPS for that timedemo, which is an absolutely massive improvement.
But of course, what we’re interested in here is this Pentium system with the external cache with that COAST module - so yeah, going from those low 20s to kind of high 20s in 320x200 resolution - which is around a 15, 16 percent performance increase - it doesn’t sound like a lot, but I think it does make the game quite playable on this system, you kind of have to judge it by the standards of the time, I appreciate that 29 FPS doesn’t sound all that great, but it’s a big improvement on 25!
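Another way to look at that jump is time per frame rather than frames per second. This is a small Python sketch using the rounded 25 and 29 FPS figures mentioned above, so the output is approximate too:

```python
def frame_time_ms(fps):
    """Average time spent on each frame, in milliseconds."""
    return 1000 / fps


before_fps = 25  # approximate, no external cache
after_fps = 29   # approximate, 256k external cache

saved = frame_time_ms(before_fps) - frame_time_ms(after_fps)
print(f"{frame_time_ms(before_fps):.1f} ms -> {frame_time_ms(after_fps):.1f} ms "
      f"({saved:.1f} ms less per frame)")
```

At frame rates this low, each extra frame per second takes a noticeable bite out of the time you wait between frames, which is part of why a few FPS can feel like more than it sounds.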
At 640x480 the game is - to be honest - it’s still pretty much unplayable, although it is about 10 percent less unplayable, I guess, if you want to look at it that way. So, yeah, around 10 FPS going up to around 12 - and 800x600, which I absolutely would not recommend on this system, we’ve gone from low 7 FPS up to… kind of high 7. But again, it is around a 10 percent performance increase.
Of course, the purpose of this video was to talk about these - COAST, or Cache On A STick modules - and now we have some benchmarks with zero external CPU cache and 256k, which in our case is built into the motherboard, but of course, depending on the motherboard manufacturer back in the day, could well have come in the form of one of these - but now we have those benchmarks, we can finally upgrade to the full 512k of external cache and see what kind of difference that makes to performance.
So without further ado, let’s get this installed and run through all of those benchmarks… again!
…and, again, that means starting with WinTune 98, of course, running under Windows 95 from a cold boot, as before, and once it’s finally finished running through its raft of tests three times, as before, some really interesting results here - and when I say really interesting, I mean basically a non-result. So yeah, I think this is all within the margin of error really - we’ve got 0.05 percent, we’ve got some that are actually ever so slightly down, minus 1.2 percent, 1.5 percent, even minus 2.49 percent in the case of the uncached disk performance.
The only real improvement we can see here is the cached disk - which is 4 percent - but absolutely not justifying the extra cost of that extra 256k of cache memory, not what I was expecting to see at all, I have to say.
…and jumping back into DOS now and of course 3DBench 2 and it’s a very similar story once more - absolutely marginal here, 0.28 percent performance increase - we’ve gone from 1056 FPS to 1059 so no benefit here either.
So back to Doom again while we are still in DOS and again not really a noteworthy performance increase here - so we’ve gone from 256k of external cache to 512k and we’ve only got around a 2 percent increase in performance, in frame rates, running those three different timedemos - so again, not really justified for the upgrade here - we’ve gone from around kind of 50 FPS to 51, 60 to 61 - really not worth the effort.
Okay, so Quake, one of the most technically impressive games of 1996, and a game that was well known for really pushing the limits of what was possible on the hardware of the day so if we’re going to see some kind of impressive performance increase, perhaps this is where we’re going to find it - and indeed, early signs do look quite good, 320x200 resolution, which is the lowest resolution the game runs in, of course, and we’ve got around a 3 percent increase - so that’s pushing us from those high 29 FPS’s up to, dare I say it, the heady heights of around 30 FPS, at least in timedemo 2 here, so hey, you know, a 3 percent improvement not to be sniffed at I guess, but again, probably not worth the investment for the much more expensive 512k cache module - and going up to 640x480… the game is still completely and utterly unplayable at this resolution!
Even with that 1.6 or 1.8 percent performance improvement, we’re getting an extra 0.2 FPS in these timedemos, so absolutely nothing to write home about - and the story is very similar in 800x600 resolution too, of course - completely and utterly unplayable, like I say, so absolutely not worth bothering - and if you were a huge Quake fan in 1996, I think it’s safe to say that a Voodoo card would have been a much better investment.
So, as perhaps expected, quite a decent performance bump when going from zero external CPU cache to that 256k, and I think based on that, if you had a Pentium system like this in the mid 90s, and it had a COAST slot but didn’t have that onboard external cache like this one does, that the 256k COAST module would have actually been quite a decent investment.
Now, the jury is still out on the full 512k, of course, based on the results of my tests today - if I’d had more system RAM or perhaps run more complex, more demanding things, perhaps we would have seen more of a boost. But as you’ve seen from the results today, not really that much of an increase - and I think the 512k COAST module would have been a bit of a harder sell.
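One way to see why the second 256k is the harder sell is to ask how much of the total gain each step delivered. Here is a rough Python sketch using only the three 3DBench 2 scores from the tests above. It is a single benchmark, so it illustrates the pattern rather than settling the question.

```python
# 3DBench 2 FPS from the tests above, by amount of external cache
scores = {"none": 1023, "256k": 1056, "512k": 1059}


def gain_share(scores):
    """Split the total gain (none -> 512k) between the two upgrade steps."""
    total = scores["512k"] - scores["none"]
    first = scores["256k"] - scores["none"]
    second = scores["512k"] - scores["256k"]
    return first / total, second / total


first, second = gain_share(scores)
print(f"first 256k: {first:.0%} of the gain, second 256k: {second:.0%}")
```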
But still, quite an interesting result, I hope you found that useful - and all that’s left for this video is to say a big thank you to my supporters on Patreon, Ko-Fi, and indeed my YouTube channel members. They get videos early and also ad and sponsor free - and of course, a big thank you to you for watching - and hopefully, I’ll see you in the next one.
Original Video Links
Support The Channel:
Patreon: https://www.patreon.com/ctrlaltrees
Ko-Fi: https://ko-fi.com/ctrlaltrees
Become a Member: https://www.youtube.com/channel/UCe7aGwKsc40TYqDJfjggeKg/join
Episode Links:
Phil’s Computer Lab: https://www.youtube.com/@philscomputerlab
Phil’s Website: https://www.philscomputerlab.com
If you liked this video please consider subscribing to ctrl.alt.rees on YouTube!