GLQuake on the ViRGE family
'One mother of an article which was long overdue...'

Last updated 29th August 1998

Article by Tom Browne


Is it possible?

Of course, why ever not? I've personally been playing GLQuake on my GX2 for practically as long as I've had it. That doesn't mean that it's always been playable, as such...

What do you mean?

Well, erm, the performance used to be dreadful, to put it mildly. The old wrapper that David Springer released (I didn't say he wrote it... :) was particularly bad on the ViRGE due to DirectX5's horribly slow retained mode, and was also lacking a lot of the textures. S3's first wrapper, released on my webpage sometime in January (I think - my webpage was still on Demon back then), was a slightly modified version of the original MS D3D wrapper, which didn't run an awful lot faster. I remember the release well - some smart arse thought it would be a really good idea to inform the world about it, which brought in derision from well-known websites such as Blue's News, Voodoo Extreme, and Zanshin's GLDojo - who thought it would be a good laugh to try it out on a 2Mb 325 with a crap CPU. Apparently, none of them read the text saying that the driver was a pre-release alpha, created to show that it could be done.

Then what?

S3 add their own geometry pipeline to the wrapper, so the overhead of retained mode is no more. GLQuake now gives a consistent enough framerate to be playable on ViRGE/DX's and above. Rejoice! Unfortunately there's a problem... this wrapper doesn't support bilinear filtering, since S3 decided that it was too much of a speed hit. I will now proceed to prove them wrong...

How?

Remember S3's first wrapper that I mentioned above? Well, it just happens to support bilinear filtering, unlike the one with the geometry pipeline. MS got their act together and rewrote Retained Mode for DX6, including 3DNow! support into the bargain as well. Which makes it a rather nice partner for my K6-2/300...

What sort of speed?

I'll let you see the results for yourselves. Since these figures were initially intended to show what could be achieved with a fully completed wrapper, there are a number of things you should be aware of.

  1. VSync was disabled. This was done on the premise that any final driver would support triple buffering. If you want to know the VSync on scores, knock one or two fps off. It really doesn't have an awfully big effect on the ViRGE family, unless you're talking about really low resolutions, where the ViRGE isn't 100% fill-limited, for a change...
  2. Lighting was disabled. This was done since it was determined that the only way to get decent speed out of the ViRGE at higher resolutions (512x384 upwards) was to ditch the lightmaps, emulate the OGL multitexture extension, and do vertex lighting. While this wouldn't look as good as lightmaps, it would give a massive speed boost. Quake1 would probably look quite good with vertex lighting, while Quake2 would suffer more, due to the rather more complex lightmaps, along with the dreaded src*dst blending which the ViRGE has no hope of doing, at a reasonable speed, anyway.
  3. Viewsize was 120 (i.e. no status bar), Weapon model was on, Entities were on. Sound was disabled. All other settings were as default.
  4. Spec. of the machine in question is a 64Mb K6-2/300 (running 90*3.5), with a pathetic Quantum Pioneer 2.1SG hard drive, and the S3 reference 4Mb GX2, MCLK'd at 102.61MHz (as far as it'll go without falling on its arse - yes, a whole 3MHz 'o/c'). Win95 4.00.950B was the OS underneath, with DirectX 6.00.0318 providing the hardware abstraction. Latest GX2 (May'98) Win95 drivers (CommandDMA on, CheckUV off), with all extraneous apps murdered, as necessary...
  5. DEMO1, DEMO2, DEMO3, and BIGASS1 all refer to TIMEDEMO results. START is a TIMEREFRESH that was taken at the start of a new game, representing a very complex scene - highly CPU intensive. PEAK is a TIMEREFRESH taken on E1M2, underwater near the start, at the right end corner of the secret passageway - this is one of the simplest areas in the game. Tests were performed at least two times, and a third if the first two results weren't the same. This doesn't apply to PEAK results below 512x384 since the scores fluctuate too much - a rough average was taken from 5 or so runs.
  6. Images may have had the colour brightened due to the darkness of the screenshots. This sort of modification could be done for the actual game with a utility like idGamma, which alters the palette information for the textures. The screenshots have not been modified in ANY OTHER WAY (either that, or I haven't done a very good job...). No really, I haven't. It would totally defeat the point of giving you PNGs to look at - every image is clickable to give a full-size PNG. Some of these files are quite big, so be warned!

Before I go any further, it's only right that I should point out that a plain 325 (i.e. ViRGE 'classic') is next to useless for GLQuake. This chipset not only incurs a speed hit with bilinear filtering (like the rest of the ViRGE family), but even with just perspective correction. Now, while the Playstation can get away with similar limitations, the ViRGE can't - it has no dedicated triangle setup unit, and so relies on the main CPU to do all the work for it. To make this worse, the ViRGE family must have one of the heaviest setup times of all 3D accelerator chips. Combined with the poor fillrate, this makes the 325 a non-contender. Give it a P233MMX at the very least, and you might see some action at 320x240. But, then again, software rendering will probably be a better option for you.

S3 wrapper (pre-pipeline) results

Res.     320x240             400x300             512x384             640x480
Filt.    Bilinear  Point     Bilinear  Point     Bilinear  Point     Bilinear  Point
DEMO1    54.8      57.2      45.3      49.5      31.9      37.3      22.0      27.5
DEMO2    57.5      60.2      45.8      50.2      31.1      36.3      21.8      26.6
DEMO3    50.9      53.2      41.7      46.5      29.2      35.0      20.6      26.1
START    26.3      27.2      24.6      25.9      20.4      22.4      16.7      19.2
PEAK     124.9     163.9     77.6      109.3     47.5      69.6      30.5      46.2
BIGASS1  20.7      21.2      -         -         -         -         -         -

Before the cynics among you start, yes, these numbers are frames per second, not seconds per frame. Under normal conditions, this setup can easily push in excess of 50fps @ 320x240. While this resolution currently looks ugly due to the ordered dither effect, a wrapper which implemented 24bpp rendering would look quite decent at this resolution. 640x480 w/bilinear is bordering on the edge of being playable, with a framerate fluctuating between 20-25fps, typically...
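To put a number on the speed hit from bilinear filtering, the DEMO1 row of the table above can be run through a quick calculation. This is just a minimal sketch in Python; the figures are the ones from the table, while the names (demo1, bilinear_hit, hits) are made up for the example.

```python
# DEMO1 timedemo scores (fps) from the pre-pipeline wrapper table:
# resolution -> (bilinear, point sampled)
demo1 = {
    "320x240": (54.8, 57.2),
    "400x300": (45.3, 49.5),
    "512x384": (31.9, 37.3),
    "640x480": (22.0, 27.5),
}

def bilinear_hit(bilinear_fps, point_fps):
    """Percentage of the point-sampled framerate lost by turning on bilinear."""
    return (point_fps - bilinear_fps) / point_fps * 100.0

hits = {res: bilinear_hit(b, p) for res, (b, p) in demo1.items()}

for res, hit in hits.items():
    print(f"{res}: bilinear costs {hit:.1f}% of the point-sampled framerate")
```

The hit grows from roughly 4% at 320x240 to 20% at 640x480, which is what you'd expect as the chip becomes more and more fill-limited.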

Other problems plague the old S3 wrapper: it doesn't support mip-mapping of any kind and, more importantly, it has no water transparency. Those of you who've tried out D3DRAP1x will be surprised at this omission from the S3 wrapper, but somewhere along the line of modification, the alpha blending on the water stopped working. This is the only real problem with the pre-pipeline wrapper, which is a real shame, since it's great in almost every other respect. As you can see from the BIGASS1 figure, it's still unsuitable for heavy deathmatching. This is hardly a surprise, since none of the ViRGE family has a triangle setup unit. An even faster CPU would scale the score up accordingly.

Even though the wrapper doesn't support 24bpp rendering, it does support the rendering of the shaded spheres used around weapon explosions, which is a neat feature lacking from all the other MiniGL drivers. Some textures are lacking in detail (tall textures are scaled down), though, and this does spoil the look of the game, to a degree.

The Techland MiniGL

This driver was initially released to the public in early July. While not specifically designed for use with GLQuake and the like (it's for their game, Crime Cities), the programmers there have made an effort to make it work with GLQuake and Quake2. It's a native ViRGE driver in the sense that it uses the S3d Toolkit. Mip-mapping (though only per-polygon) support is included, and water transparency is also fully working. Now, onto the bad news. While native ports usually conjure up images of things running quickly - the Techland driver is not, I'm afraid.

Most of the blame can probably be attributed to the S3d Toolkit being outdated compared to the Win95 drivers, which now support reliable CommandDMA (see the Tweaking page), along with many other performance tweaks. Work is continuing on the Techland driver, though (unlike the S3 wrapper, which is unlikely ever to be updated), and the programmer is adding numerous features, such as support for tall textures, and 24bpp support. While the 24bpp support isn't very stable with the latest Win95 driver, it does work with GLQuake. The image quality of the Techland driver is generally very good - providing you don't turn on the lighting; the Z-buffer errors are pretty hideous. The screenshot below shows the quality of image of the beta driver at 24bpp, in a rare shot where the Z-buffer lighting errors aren't too bad... If you're wondering where the weapon is - you're not the only one; I certainly didn't turn it off. Weird...

Tests were performed at 320x240, and mm indicates mip-mapping was enabled.

Techland Results

Filt.    Bilinear+mm  Bilinear  Point+mm  Point
DEMO1    29.3         27.0      35.3      32.1
DEMO2    30.6         27.8      37.2      33.2
DEMO3    27.2         24.9      33.0      29.7
START    15.4         14.0      16.5      14.8
PEAK     77.9         76.5      120.1     115.5

As you can see, mip-mapping is most definitely a good thing. Not only does it improve image quality, but it improves framerate as well by a substantial amount. The only thing is that these framerates aren't all that great anyway - with the Techland driver, expect around 15fps @ 512x384 on a setup similar to mine under the aforementioned conditions. On a powerful P-II system (say, 400MHz), this driver would probably be very good @ 512x384, but for most people with ViRGE family boards - it isn't really a playable option. In my opinion, Techland would do best to concentrate on doing a DirectX6 DrawPrimitive version of their wrapper - they may well find it to be faster than a number of their specific ports. A DX6 IM version could take advantage of CommandDMA, and as a result would probably be around 30-40% faster.

Quake2 will run on the Techland driver, so I gave the beta a go. The public release driver has a switch to emulate Permedia 2, which uses the software's black and white lightmaps, but this looks so dull that I tried switching it off. And to my surprise, the beta did alpha blending! Here's what it looked like...

Yeah, I know - it isn't meant to look anything like that, but it is coloured lighting of a sort. The type of lighting that Quake2 uses (multiplicative) would be extremely difficult to emulate accurately on the ViRGE, but a better approximation than the above (which just uses the lightmaps unprocessed) would indeed be possible. More on this later...

S3MESA, anyone?

Yeah, we finally got 'round to compiling it. MESA is a clone of OpenGL - a full ICD. S3MESA... well... is a bit of a mess, to put it bluntly. MESA v2.6 itself can't even run Quake2 properly, so there wasn't much hope of S3MESA (originally written for v2.5) doing any better. It didn't... :-/

Quake2 is 100% unplayable with unmodified S3MESA. I challenge anyone to finish Quake2 without any walls or floors to navigate by, not to mention the complete lack of textures. Come to think of it, someone will probably attempt it now - there's no prize, honest!

The situation with GLQuake is somewhat better. The lighting works to a degree (very much like the Techland driver), as does water transparency. But there's no mip-mapping, and there's alpha problems galore (check out that lovely shade of pink... :), combined with a bizarre lack of perspective correction (it used to be okay?!) which makes an 'interior' game like Quake look a bit swirly...

S3MESA was undergoing some 'development', in the shape of our S3MESA mailing list, where we exchange ideas on implementing features and fixing bugs. Unfortunately, nothing really ever got done. This might have been because we thought we were never going to get close to the performance of the S3 wrapper (since we wouldn't have CommandDMA - S3MESA uses the S3d Toolkit), though its performance with untextured polys actually beats the Techland MiniGL. In fact, TIMEDEMO DEMO1 in 400x300 @ 24bpp w/bilinear brings in a score of 18.6fps, which is similar to what the Techland driver achieves. So, it's not too bad and it would probably be worthwhile fixing the perspective correction bug, along with some of the other glitches - but someone needs to get off their arse and do it.

S3MESA's compatibility with other applications is limited. While 24bpp rendering works for some applications, it won't work for others. Windowed mode is hit and miss, and none of the OpenGL screensavers work. If you grab a couple of SGI's most basic OpenGL examples, they'll probably work in some shape or form.

Hacked i740 wrapper

It's out there, if you can find it... But don't bother - it has all the same problems that the original MS D3D wrapper did on the ViRGE family. The only thing that it has going for it over the S3 wrapper is that it actually works under Quake2, and it's considerably faster than the Techland one as well. So why don't I recommend tracking it down? The reasons are quite simple - it doesn't support lighting in any shape or form on the ViRGE family (you could try hacking the Permedia2 ID into it, I guess), there's no 24bpp support, half the textures are missing, and the software renderer will easily surpass it on performance. If you were really cunning you might be able to reverse engineer the wrapper, and find the code that specifies src*dst alpha blending, changing it to plain alpha - but I really don't think it's worth the time...

The NT route

Yes, you can play GLQuake (and indeed Quake2) under the NT4 Mini Client Driver. Just make sure you've got the latest drivers, plus at least Service Pack 3, and be prepared for some very tasty looking graphics. The problem is that you'll be making cups of tea between frames. Okay, a slight exaggeration - it's not that bad, but I'll say that for the screenshot I took for this page, it was moving at around 3-5fps or so (due to me being a lazy git and using r_novis 1 rather than recompiling the levels to get transparent water, it actually went lower - best to recompile them, and use gl_flashblend 0). I would suspect that the main reason for this would be the software emulation of particular features - namely the additive alpha blending. It seems pretty obvious to me that this is being done in software, since every time a shaded ball is visible, the frame rate plummets through the floor.

This is probably typical of most NT4 MCDs for chipsets with missing features. The emphasis here is on image quality, not speed - and it certainly shows here. Needless to say, I was running at 24bpp, the water transparency was excellent, the per-pixel mip-mapping sublime, as was the lighting and everything else, but at that sort of framerate, the only thing you're going to be doing is getting killed. See the screenshot below. Apart from some coloured dots (this appears to be a mip-mapping problem that only occurs at 24bpp, not 16bpp), the image is nigh-on perfect - certainly far better than anything my old Voodoo1 could've achieved.

The Quake2 image quality is great, apart from the lack of coloured lighting - one thing that the MCD neglected to emulate, unfortunately. Even allowing for that and the strange dots (I used r_fullbright 1), the image quality was extremely good.

The Future

Well, Quake on the ViRGE family isn't going to advance dramatically, since the chipsets have finally had their day. You can now pick up a 4Mb AGP GX2 with TV output for 27GBP, while I've seen 8Mb SDRAM i740's going for as little as 39GBP. Even a branded 4Mb Permedia 2 in Dixons (of all places) is a mere 50GBP. I can't see S3 doing any more driver revisions unless a serious bug is uncovered somewhere.

So this leaves us with the S3MESA and the Techland driver. In DX6 DrawPrimitive form, the Techland driver could offer performance close to the S3 wrapper, combined with image quality almost as good as the NT4 MCD. As for lighting, we on the S3MESA list came up with a couple of ideas...

There's two main ones - as mentioned above, the fastest solution would be to utilise vertex lighting (by looking up the colours on the lightmaps for each point of the triangles drawn - the SGI multitexture extension would need to be emulated). This technique was utilised effectively on Forsaken, and could probably be applied successfully to GLQuake, since the lighting used is not particularly detailed for most cases. You can think of vertex lighting essentially as drawing an alpha-blended, gouraud-shaded triangle over the top of a textured polygon - that can be done on the ViRGE family with very little overhead.
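As a rough illustration of the lookup described above, here's a minimal sketch in Python. It assumes a tiny made-up greyscale lightmap and nearest-texel sampling; none of the function names come from GLQuake, S3MESA or any real driver.

```python
def sample_lightmap(lightmap, s, t):
    """Nearest-texel lookup; s and t are lightmap coordinates in the range 0..1."""
    height = len(lightmap)
    width = len(lightmap[0])
    x = min(int(s * width), width - 1)
    y = min(int(t * height), height - 1)
    return lightmap[y][x]

def vertex_colours(lightmap, lightmap_coords):
    """One light value per vertex, for the hardware to gouraud-shade between."""
    return [sample_lightmap(lightmap, s, t) for (s, t) in lightmap_coords]

def gouraud(colours, weights):
    """Interpolate the three vertex values using barycentric weights."""
    return sum(c * w for c, w in zip(colours, weights))

# A 4x4 greyscale lightmap (0 = dark, 255 = fully lit), invented for the example.
lightmap = [
    [255, 200, 120, 40],
    [200, 160, 100, 30],
    [120, 100, 60, 20],
    [40, 30, 20, 10],
]

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # lightmap coords of the three vertices
colours = vertex_colours(lightmap, triangle)
centre = gouraud(colours, (1 / 3, 1 / 3, 1 / 3))  # what vertex lighting gives mid-triangle
actual = sample_lightmap(lightmap, 1 / 3, 1 / 3)  # what the lightmap really says there

print("vertex light values:", colours)
print("interpolated centre:", round(centre, 1), "lightmap centre:", actual)
```

The gap between the interpolated value and the real lightmap texel in the middle of the triangle is exactly the detail that gets thrown away, which is why this suits simple lightmaps far better than detailed ones.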

So, vertex lighting sounds great. But, there's a problem. Quake2 uses much bigger, more detailed lightmaps to achieve a wide range of spot effects, which would be lost in the use of vertex lighting. We arrive back at using lightmaps again. But, as the Techland driver shows, the ViRGE can't do multiplicative blending. It can't even do additive (darken) blending, which poses a significant problem.

There are two ways around this, both similar, achieving the same thing by different means. First, here's the two components that we're dealing with...

On the left is the textured scene, and on the right is the lightmap scene. These two are blended together to produce the final result. Now, with multiplicative alpha you should end up with something like the image below on the left, but since the ViRGE only supports plain alpha blending, you get something resembling the image on the right...
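For anyone who prefers numbers to pictures, the difference between the two blends can be sketched per pixel. This is only an illustration in Python; the 50% alpha value and the sample colours are assumptions of mine, not figures from any driver.

```python
def modulate(texel, light):
    """Multiplicative (src*dst) blend: each texture channel is scaled by the lightmap."""
    return tuple(t * l // 255 for t, l in zip(texel, light))

def alpha_blend(texel, light, alpha=128):
    """Plain alpha blend: a weighted average of texture and lightmap (alpha is 0..255)."""
    return tuple((t * (255 - alpha) + l * alpha) // 255 for t, l in zip(texel, light))

texel = (200, 120, 60)      # an orange-brown wall texel, made up for the example
shadow = (0, 0, 0)          # lightmap texel in full shadow
lit = (255, 255, 255)       # lightmap texel in full light

print("shadow, multiplicative:", modulate(texel, shadow))
print("shadow, plain alpha:   ", alpha_blend(texel, shadow))
print("lit, multiplicative:   ", modulate(texel, lit))
print("lit, plain alpha:      ", alpha_blend(texel, lit))
```

With multiplicative blending a shadowed texel goes to black and a fully lit one is left untouched. With plain alpha the shadow only dims the texture and full light washes it out towards white, which is the effect visible in the right-hand image.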

As you can see, this bears little resemblance at all to how the lighting should look. Since multiple blends are not feasible (drawing the lightmaps just once is slowing it down enough already), we struck on the idea of modifying the lightmaps to give a rough rendition of multiplicative alpha. And when I say rough, I mean rough! The effect achievable is certainly much better than alpha blending unprocessed lightmaps, as you'll see. Essentially, the lightmaps need to be super-saturated, so the colours are far more vibrant than they would normally be. By upping the contrast and lowering the brightness, the end result isn't bad at all.

There are two ways of doing this (as usual) - a slow way which is much easier to code, and a fast way which would be a nightmare to do. The slow way simply involves doing the saturation and contrast calculations on the textures as they are loaded into video memory. Some elaborate caching system would need to be implemented in order to keep the speed reasonable, but with MMX optimisation, these calculations might be able to work successfully in realtime.
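The slow way might look something like the sketch below, written in Python for readability rather than speed. The saturation, contrast and brightness settings are placeholder values picked for illustration (not tuned figures), and the function names are hypothetical.

```python
def clamp(value):
    """Round and clamp a channel value into the 0..255 range."""
    return max(0, min(255, int(round(value))))

def process_texel(rgb, saturation=2.0, contrast=1.5, brightness=-40):
    """Super-saturate one lightmap texel, then raise contrast and lower brightness."""
    r, g, b = rgb
    grey = 0.299 * r + 0.587 * g + 0.114 * b
    # Saturation: push each channel further away from the texel's grey level.
    r, g, b = (grey + (c - grey) * saturation for c in (r, g, b))
    # Contrast: push away from mid-grey (128), then apply the brightness offset.
    return tuple(clamp((c - 128) * contrast + 128 + brightness) for c in (r, g, b))

def process_lightmap(texels, **settings):
    """Process a whole lightmap (a flat list of RGB tuples) as it is loaded."""
    return [process_texel(t, **settings) for t in texels]

lightmap = [(128, 128, 128), (200, 100, 50), (60, 80, 160)]   # made-up texels
print(process_lightmap(lightmap))
```

Since the work is done per texel at load time, the cost depends on how often lightmaps change, which is where the caching mentioned above would come in.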

The other way is something a bit cunning. All of the ViRGE family have something called a Streams processor, which has the job of blending multiple framebuffers into the final image that gets sent to your monitor. If you're using such a board, you're probably using two of these already - one for the GUI framebuffer, and one for the hardware cursor. A third overlay is available for use for video applications, but we have a far more cunning use for it...

The Streams Processor can blend in a multitude of ways - two of the notable ones are chromakeying and alpha blending. What's interesting about the Streams Processor is that it can apply colour correction to this third overlay on the fly, without any CPU overhead - we can alter the brightness, the contrast, the saturation, even the hue, all in realtime. So the plan was to have two separate framebuffers for a GL driver - one for the standard textured view, and one for the lightmap textures. The Streams Processor would then blend the lightmap buffer onto the textured buffer, in realtime. The beauty of this method would be that it imposes no real additional CPU overhead, and will produce much better quality than the software processed lightmaps. We could even use the scaling feature so we could draw the lightmap buffer at half the size of the texture buffer, and blow it up to blend over the top. This would make things even faster, at a slight loss of image quality.

Bad thing is, that to have colour controls available, this third overlay has to be of YUV format. To my knowledge, the ViRGE can only do 3D acceleration to an RGB buffer, which stuffs that idea to a certain degree. Realtime conversion to YUV, or direct software rendering (could probably get away with quarter size lightmap buffer) might be possible options, but the programmer who attempted to go down this route would need a lot of time and patience on their hands.
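To give an idea of what the realtime conversion would involve per pixel, here's a minimal sketch in Python. It uses the common full-range JFIF-style coefficients as a stand-in; the exact YUV layout and range the Streams Processor expects isn't covered here, so treat both the coefficients and the function names as assumptions.

```python
def clamp(value):
    """Round and clamp a channel value into the 0..255 range."""
    return max(0, min(255, int(round(value))))

def rgb_to_yuv(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit Y, U, V (full-range, JFIF-style)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp(y), clamp(u), clamp(v)

def convert_row(rgb_row):
    """Convert one scanline of the lightmap buffer."""
    return [rgb_to_yuv(r, g, b) for (r, g, b) in rgb_row]

row = [(0, 0, 0), (128, 128, 128), (255, 255, 255), (255, 0, 0)]   # made-up pixels
print(convert_row(row))
```

That's three multiply-accumulate chains for every pixel of every frame, which is why a quarter-size lightmap buffer starts to look attractive.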

Onto the mocked up shots you've been waiting for - these are two examples of what Quake2 could be made to look like with super-saturated, high contrast lightmaps...

Da daa! Both of these screenshots should be possible to generate with one single pass of alpha blending, since that's all that I used when generating them. The lightmap blended into the one on the left uses toned down brightness as well as the higher contrast and saturation of the image on the right. The one on the right would probably work better in outdoor areas, but the left image is rather more dramatic. Looks a bit like a console game, if I do say so myself...

Stuff which I forgot to mention... (29/08/98)

As is always the case with these things, I forgot to mention something which I should have. Oops. :) It's not a big deal, but it's regarding the poor triangle setup rate on the ViRGE, and in particular the question of the 3DNow! instruction set and the triangle setup time. I won't divulge exact figures, but I will say that a fully textured, lit polygon takes thousands of cycles of CPU time to set up. An estimate from a source at S3 reckons the speed of the triangle setup would triple or so with 3DNow! optimisations. Now, you can imagine what that would do for all games, let alone just GLQuake/Quake2. Multiplayer deathmatch would probably be very playable @ 512x384 with bilinear filtering, and all polygon intensive games such as Formula 1, and G-Police could double in speed. There's one small thing missing here; that is - it's not going to happen. S3 have effectively ceased development on the ViRGE family drivers, so this amazing speed boost will be denied to us.

Bit of a shame, really, yes? Of course, it could still happen. The S3d Toolkit source is available on our site, and if someone with the right skills spent a couple of months on the job, they could probably double the speed of the Techland MiniGL at the very least. Yep, that would be perfectly possible. According to the profiling done at Techland, more than half the CPU's game runtime can be spent in the S3d Toolkit, just drawing the triangles. See the balance problem there?
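How much an overall gain a faster triangle setup would buy depends on how much of each frame is spent there. A back-of-the-envelope sketch using Amdahl's law follows, in Python. It assumes the 'triple or so' estimate applies to all of the time spent in the toolkit, and the fractions tried are illustrative guesses rather than Techland's profiling figures.

```python
def overall_speedup(fraction_in_setup, setup_speedup):
    """Amdahl's law: only the triangle-setup share of the frame gets faster."""
    return 1.0 / ((1.0 - fraction_in_setup) + fraction_in_setup / setup_speedup)

for fraction in (0.5, 0.6, 0.75):
    gain = overall_speedup(fraction, 3.0)
    print(f"{fraction:.0%} of frame time in setup, 3x faster setup -> {gain:.2f}x overall")
```

With exactly half the frame time in the toolkit, a tripled setup rate works out at 1.5x overall; a straight doubling would need nearer three quarters of the time to be spent there, or gains beyond the 3x estimate.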

Don't get your hopes up, though. The only people with ViRGE family boards and K6-2's are prospective upgraders (like me...), so while it'd be a great project to do for fun (or a uni. project - hmm... nice idea...:-), it'd require someone with an awful lot of programming talent who was prepared to spend time rewriting already highly-optimised code. Are there any '98 C64 democoders out there with some free time?

The End

Well, that's about it. This is probably the longest article I've ever written for the web, and the longest I hope it'll remain! Until my Savage3D arrives, that is...

Links

S3MESA 2.6 v0.3 [220k] - a bit broken, but give it a go and see how it works for you.
S3 Pre-pipeline wrapper [36k] - don't try this without DX6, the speed will be dreadful. All K6-2 owners should check it out...
S3 Wrapper w/geometry pipeline [31k] - for people still using DX5 and who don't mind not having bilinear filtering.


Mirrored from http://s3.dimension3d.com/quakevirge.htm, links have been fixed with local mirrors where possible.