April 28th, 2011

dEngine: iOS 3D renderer source code

I've decided to release the source code of the OpenGL ES 1.0/2.0 renderers I wrote in the summer of 2009, nicknamed "dEngine". It was the first renderer to feature shadow mapping and bump mapping on iPhone at the time. Note that shadow mapping was achieved by packing the depth information into a color texture; you now have access to GL_OES_depth_texture, so you should be able to gain some speed.

The overall code quality is far from exceptional (support for two object types, WTF?!?) but I consider it a good tutorial for OpenGL ES 2.0: you can read about bump mapping and shadow mapping with a fun example from a Doom 3 level.

The OpenGL ES 2.0 renderer features uber-shaders: depending on the material currently rendered, a shader is recompiled on the fly in order to avoid branching.


Source Code on GitHub

Browse the source code on GitHub or get it from the git command line:

   git clone https://github.com/fabiensanglard/dEngine.git


A few videos

The following videos show characters from Doom 3 that I used to test the engine: the HellKnight is 2200 polygons, the rest of the visible room is 1000. The materials all feature a diffuse map, a normal map and a specular map (up to 512x512). The shadow is generated via a shadow map: because rendering to a depth texture (GL_OES_depth_texture) is not supported on iPhone, depth values are packed in an R8G8B8A8 color texture twice the size of the screen.

iPhone 3GS programmable pipeline, running at 27fps.
iPhone 2G/3G fixed pipeline, running at 45fps.


The rendering path is abstracted via a C struct containing function pointers a la Quake 2.

	typedef struct renderer_t
	{
		uchar type;
		void (*Set3D)(void);
		void (*StartRendition)(void);
		void (*StopRendition)(void);
		void (*SetTexture)(unsigned int);
		void (*RenderEntities)(void);
		void (*UpLoadTextureToGpu)(texture_t* texture);
		void (*Set2D)(void);
	} renderer_t;

	//	renderer_fixed.h
	void initFixedRenderer(renderer_t* renderer);

	//   renderer_progr.h
	void initProgrRenderer(renderer_t* renderer);


The "implementation" of every function is hidden in the .c file of each renderer; initFixedRenderer and initProgrRenderer only expose the function addresses via the pointers in the struct.
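A minimal sketch of that pattern, collapsed into a single file for brevity. The struct is trimmed to two entry points, and the RENDERER_* constants, the lastCall variable and the fixed*/progr* function names are hypothetical placeholders for illustration, not the engine's actual code.

```c
/* Trimmed-down renderer interface (two entry points only). */
typedef struct renderer_t {
    unsigned char type;
    void (*StartRendition)(void);
    void (*StopRendition)(void);
} renderer_t;

enum { RENDERER_FIXED = 0, RENDERER_PROGR = 1 };   /* hypothetical constants */

/* Hypothetical bookkeeping so the example has an observable effect. */
static const char *lastCall = "none";

/* "Private" implementations: static, so invisible outside their .c file. */
static void fixedStartRendition(void) { lastCall = "fixed:start"; }
static void fixedStopRendition(void)  { lastCall = "fixed:stop";  }
static void progrStartRendition(void) { lastCall = "progr:start"; }
static void progrStopRendition(void)  { lastCall = "progr:stop";  }

/* The only public symbols: they fill in the function pointers. */
void initFixedRenderer(renderer_t *renderer)
{
    renderer->type           = RENDERER_FIXED;
    renderer->StartRendition = fixedStartRendition;
    renderer->StopRendition  = fixedStopRendition;
}

void initProgrRenderer(renderer_t *renderer)
{
    renderer->type           = RENDERER_PROGR;
    renderer->StartRendition = progrStartRendition;
    renderer->StopRendition  = progrStopRendition;
}

/* The frame loop only ever talks to the struct, never to a specific renderer. */
void renderFrame(const renderer_t *renderer)
{
    renderer->StartRendition();
    /* ... draw calls would go here ... */
    renderer->StopRendition();
}
```

The caller picks a rendering path once at startup by calling one of the two init functions; everything after that goes through the pointers, so the rest of the code never needs to know which path is active.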

A few optimizations...

Texture compression is a big win: a 32 bits per texel RGBA texture is a pig with no real reason to exist when working with a small display. OpenGL ES 1.1 and 2.0 do not require the GPU to support any texture compression, but the good guys at Imagination Technologies provided support for PVRTC, which brings consumption down to as low as 2 bits per pixel, with alpha support!
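To put rough numbers on the saving, here is a back-of-the-envelope helper. The function name textureBytes is hypothetical, and mipmaps and any format-specific padding are ignored.

```c
#include <stddef.h>

/* Bytes needed for one level of a width x height texture
   at the given number of bits per texel. */
size_t textureBytes(size_t width, size_t height, size_t bitsPerTexel)
{
    return width * height * bitsPerTexel / 8;
}

/* For a 512x512 map:
     textureBytes(512, 512, 32)  ->  1048576 bytes (uncompressed RGBA)
     textureBytes(512, 512, 4)   ->   131072 bytes (PVRTC 4 bits per pixel)
     textureBytes(512, 512, 2)   ->    65536 bytes (PVRTC 2 bits per pixel) */
```

With three maps per material (diffuse, normal, specular), the difference between 3 MB and 192 KB per material adds up quickly.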

Vertex data can be slimmed down as well:

A "regular" vertex is:

	Vertex Elementary unit:

	position   	3 floats
	normal		3 floats
	tangent		3 floats
	textureCoo	2 floats
	            44 bytes

By packing the components in "shorts" instead of "floats" via normalization, you end up having:

	Vertex Elementary unit:

	position   	3 floats
	normal		3 shorts
	tangent		3 shorts
	textureCoo	2 shorts
	            28 bytes

It's almost like we "compress" the data on the CPU and send it to the GPU, where it is "decompressed". Abusing normalization cuts bandwidth consumption by more than a third (44 bytes down to 28 per vertex) and helps to slightly improve performance.
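The two layouts above can be sketched in C as follows. The struct and function names are hypothetical, the conversion is a simple scale by 32767 (the exact mapping the GPU applies when decoding depends on the GL version), and texture coordinates are assumed to lie in [-1, 1].

```c
#include <stdint.h>

/* Uncompressed layout: 11 floats = 44 bytes. */
typedef struct {
    float position[3];
    float normal[3];
    float tangent[3];
    float textureCoo[2];
} vertex_full_t;

/* Packed layout: 3 floats + 8 shorts = 28 bytes. */
typedef struct {
    float   position[3];
    int16_t normal[3];
    int16_t tangent[3];
    int16_t textureCoo[2];
} vertex_packed_t;

/* Map a float in [-1, 1] to a signed 16-bit integer, clamping out-of-range input. */
int16_t packSnorm16(float f)
{
    if (f >  1.0f) f =  1.0f;
    if (f < -1.0f) f = -1.0f;
    return (int16_t)(f * 32767.0f);
}

/* Convert one vertex; position stays in full float precision. */
void packVertex(const vertex_full_t *in, vertex_packed_t *out)
{
    int i;
    for (i = 0; i < 3; i++) {
        out->position[i] = in->position[i];
        out->normal[i]   = packSnorm16(in->normal[i]);
        out->tangent[i]  = packSnorm16(in->tangent[i]);
    }
    for (i = 0; i < 2; i++)
        out->textureCoo[i] = packSnorm16(in->textureCoo[i]);  /* assumes [-1, 1] */
}
```

On the GL side, the "decompression" is requested by passing GL_SHORT as the type and GL_TRUE as the normalized argument of glVertexAttribPointer, with sizeof(vertex_packed_t) as the stride.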

Compiler tuning is also important. Xcode is set up by default to generate ARM binaries using the Thumb instruction set, which is 16 bits wide instead of 32. This reduces the size of the binary and the cost for Apple, but it's bad for 3D as Thumb instructions have to be translated to 32 bits. Uncheck this option for an instant performance gain.

Framebuffer refresh can also be improved a lot with the 3.1 firmware. This is an issue I mentioned in my article about Wolfenstein for iPhone: NSTimer is an abomination and I was thrilled to find we can now use CADisplayLink to perform vsync and get an adaptive framerate (although I'm experiencing some nasty touchesMoved problems on non-2G v3.X devices; if you have any info about this, email me!).

Reducing the framebuffer color depth is another way to improve performance by reducing the amount of data written. Moving from a 24-bit color space to 16 bits provides some good improvements.

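    CAEAGLLayer *eaglLayer = (CAEAGLLayer *)self.layer;
    eaglLayer.opaque = YES;
    eaglLayer.drawableProperties = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithBool:YES], kEAGLDrawablePropertyRetainedBacking,
        // 16-bit RGB565 instead of the default kEAGLColorFormatRGBA8
        kEAGLColorFormatRGB565, kEAGLDrawablePropertyColorFormat, nil];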

Stating the obvious here, but reducing texture and blending mode switches is very important (forget about good performance if you do more than 60 texture changes). The material approach of the engine can be very handy in this regard.
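One common way to keep the number of switches down is to sort the draw list by texture before rendering. The sketch below is a generic illustration, not the engine's code: draw_item_t and its fields are hypothetical.

```c
#include <stdlib.h>

typedef struct {
    unsigned int textureId;   /* hypothetical: GL texture name used by the object */
    int          meshId;      /* hypothetical: which mesh to draw */
} draw_item_t;

/* qsort comparator: order items by texture so identical textures are adjacent. */
static int compareByTexture(const void *a, const void *b)
{
    const draw_item_t *x = (const draw_item_t *)a;
    const draw_item_t *y = (const draw_item_t *)b;
    return (x->textureId > y->textureId) - (x->textureId < y->textureId);
}

/* Count how many texture binds are needed when drawing the items in order. */
int countTextureSwitches(const draw_item_t *items, size_t count)
{
    int switches = 0;
    size_t i;
    for (i = 0; i < count; i++)
        if (i == 0 || items[i].textureId != items[i - 1].textureId)
            switches++;
    return switches;
}

void sortByTexture(draw_item_t *items, size_t count)
{
    qsort(items, count, sizeof(draw_item_t), compareByTexture);
}
```

For a draw list using textures 3, 1, 3, 2, 1, 3 in that order, drawing as-is needs six binds while the sorted list needs three. Blended geometry usually still has to be drawn in depth order, so this kind of sort is mostly useful for opaque objects.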

Reducing blending of your polygons is PARAMOUNT: PowerVR performs TBDR (tile-based deferred rendering), which means that each pixel is rendered only once via hidden surface removal; blending defeats the purpose. My take is that a blended polygon is rendered regardless of the culling outcome, and it destroys performance.

And last but not least, optimize the vertex indices so GPU fetches hit the cache as much as possible.
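A simple way to judge an index ordering is to replay it through a simulated vertex cache and count the misses. The sketch below assumes a 16-entry FIFO cache purely for illustration; the actual cache size and replacement policy of the PowerVR hardware are not stated here, and the function name is hypothetical.

```c
#include <stddef.h>

#define CACHE_SIZE 16   /* hypothetical cache size, for illustration only */

/* Replay an index buffer through a FIFO cache and return the number of
   misses, i.e. vertices that would have to be fetched again. */
size_t countCacheMisses(const unsigned short *indices, size_t indexCount)
{
    unsigned short cache[CACHE_SIZE];
    size_t used = 0, next = 0, misses = 0, i, j;

    for (i = 0; i < indexCount; i++) {
        int hit = 0;
        for (j = 0; j < used; j++)
            if (cache[j] == indices[i]) { hit = 1; break; }
        if (!hit) {
            misses++;
            cache[next] = indices[i];          /* overwrite the oldest entry */
            next = (next + 1) % CACHE_SIZE;
            if (used < CACHE_SIZE) used++;
        }
    }
    return misses;
}
```

Two triangles sharing an edge (0,1,2, 2,1,3) cost four misses for six indices. An ordering that keeps cycling through more vertices than the cache can hold misses on every single index, which is the case reordering tools try to avoid.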

Uber shader

Depending on the material properties used in a scene, the shader is recompiled at runtime and then cached. This approach reduces branching operations in the shader. I was very pleased with the result: if I stay below 10 to 15 shader switches per frame there is no significant performance drop.

    // snippet of the fragment shader

    #ifdef BUMP_MAPPING
        bump           = texture2D(s_bumpMap, v_texcoord).rgb * 2.0 - 1.0;
        lamberFactor   = max(0.0, dot(lightVec, bump));
        specularFactor = max(0.0, pow(dot(halfVec, bump), materialShininess));
    #else
        lamberFactor   = max(0.0, dot(lightVec, v_normal));
        specularFactor = max(0.0, pow(dot(halfVec, v_normal), materialShininess));
    #endif

    #ifdef SPEC_MAPPING
        vec3 matTextColor = texture2D(s_specularMap, v_texcoord).rgb;
    #else
        vec3 matTextColor = matColorSpecular;
    #endif
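On the CPU side, picking and caching a shader variant could look like the sketch below. Everything here is an assumption made for illustration: the MAT_* flag names, the cache layout and the function names are hypothetical, and compileVariant is a stand-in that returns a fake handle instead of calling the GL compiler.

```c
#include <string.h>

#define MAT_BUMP_MAPPING (1u << 0)   /* hypothetical material property bits */
#define MAT_SPEC_MAPPING (1u << 1)
#define NUM_VARIANTS     4           /* 2 bits -> 4 possible shader variants */

/* Build the "#define" preamble that is prepended to the shader source. */
void buildPreamble(unsigned int flags, char *out, size_t outSize)
{
    out[0] = '\0';
    if (flags & MAT_BUMP_MAPPING)
        strncat(out, "#define BUMP_MAPPING\n", outSize - strlen(out) - 1);
    if (flags & MAT_SPEC_MAPPING)
        strncat(out, "#define SPEC_MAPPING\n", outSize - strlen(out) - 1);
}

static unsigned int variantCache[NUM_VARIANTS];  /* 0 means "not compiled yet" */
static unsigned int compileCount = 0;            /* how many compiles happened */

/* Stand-in for the real compile and link step. */
static unsigned int compileVariant(unsigned int flags)
{
    char preamble[64];
    buildPreamble(flags, preamble, sizeof(preamble));
    /* A real implementation would hand preamble + shader body to the GL compiler. */
    compileCount++;
    return 100u + flags;   /* fake program handle, never 0 */
}

/* Return the program for a material, compiling it only on first use. */
unsigned int getProgramForMaterial(unsigned int flags)
{
    flags &= NUM_VARIANTS - 1;
    if (variantCache[flags] == 0)
        variantCache[flags] = compileVariant(flags);
    return variantCache[flags];
}
```

Since glShaderSource accepts an array of strings, the preamble and the unmodified shader body can be passed as two separate strings without concatenating them by hand.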

The now obsolete depth packing into a color buffer.

I love shadow effects; I think the realism and ambiance you get totally justify the cycles and bandwidth cost. It doesn't come for free in OpenGL and it's quite ugly to do with the fixed pipeline, but I was thrilled to have it working on mobile shaders. Unfortunately, as of today, iPhones don't support GL_OES_depth_texture, which means you cannot render directly to a depth texture. The workaround is to pack a 32-bit floating point value into the four bytes of an RGBA color texture:

	// This is the shadowmap generator shader

	const vec4 packFactors = vec4(256.0 * 256.0 * 256.0, 256.0 * 256.0, 256.0, 1.0);
	const vec4 bitMask     = vec4(0.0, 1.0/256.0, 1.0/256.0, 1.0/256.0);

	void main(void)
	{
		// position is assumed to be a varying holding the clip-space position
		float normalizedDistance  = position.z / position.w;
		// remap from [-1, 1] to [0, 1]
		normalizedDistance = (normalizedDistance + 1.0) / 2.0;

		vec4 packedValue = vec4(fract(packFactors * normalizedDistance));
		packedValue -= packedValue.xxyz * bitMask;

		gl_FragColor  = packedValue;
	}


This method of packing a float into bytes is pretty clever (not mine) because it accounts for the internal accuracy of any GPU (via the subtraction line) and hence can be used on any kind of GPU (PowerVR, ATI, NVidia). Congrats to whoever came up with this.
Note: Since the publication of this article, iOS devices have added support for GL_OES_depth_texture.
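
To see why the subtraction line works, the packing can be transcribed to the CPU and paired with an unpacking step. This is only a sketch under stated assumptions: the unpack function is the usual counterpart (a weighted sum with the inverse pack factors) and is not shown in the shader above, doubles are used instead of GPU floats, and the 8-bit quantization applied when writing to the color buffer is ignored.

```c
/* fract() as in GLSL, valid here because depth values are non-negative. */
static double fract(double x) { return x - (double)(long long)x; }

/* CPU transcription of the packing shader; out[] holds R, G, B, A in [0, 1). */
void packDepth(double depth, double out[4])
{
    double x = fract(depth * 256.0 * 256.0 * 256.0);
    double y = fract(depth * 256.0 * 256.0);
    double z = fract(depth * 256.0);
    double w = fract(depth);

    /* packedValue -= packedValue.xxyz * bitMask:
       remove from each channel the part already stored in the finer one. */
    out[0] = x;
    out[1] = y - x / 256.0;
    out[2] = z - y / 256.0;
    out[3] = w - z / 256.0;
}

/* Assumed counterpart for the shadow lookup: weight each channel by the
   inverse of its pack factor and sum. */
double unpackDepth(const double in[4])
{
    return in[0] / (256.0 * 256.0 * 256.0)
         + in[1] / (256.0 * 256.0)
         + in[2] / 256.0
         + in[3];
}
```

After the subtraction, the G, B and A channels each hold an exact multiple of 1/256, so no channel carries information that a finer channel already stores, and the weighted sum returns the original depth.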



Fabien Sanglard @2011