Stories for Outcasts

Bare-bones WebGL and the math behind 3D graphics

Thursday, March 27, 2014

This introduction is designed for programmers who are wholly unfamiliar with WebGL and 3D graphics. It will take you through the basics of feeding vertex and color data to a WebGL buffer, the (surprisingly simple) math behind 4x4 transformation matrices, and more complex topics like homogeneous coordinates, orthographic versus perspective projections, Euler angles, gimbal lock, quaternions, and the like.

The official spec for the WebGL API can be found on the khronos.org website. While it contains good information for the end programmer, particularly in the "non-normative" and example sections, it reads mainly like a guide for browser authors implementing the API itself, and is not an ideal starting point.

The net is also littered with tutorials for the budding WebGL programmer, but the ones I found initially relied on third-party libraries such as glMatrix: a good tool, but one that presumes a fair amount of familiarity with 3D rendering concepts. I found that external libraries in beginner-level tutorials muddied the waters, and it took me some effort to separate the essential basics from the libraries and design frameworks built around them.

After I extracted the most basic path to draw some simple shapes, I decided to make that the basis for a comprehensive tutorial on WebGL concepts.

Lesson 1: Draw simple shapes

Lesson 1

The above page is one of a set of 10 live demos of WebGL. It draws a triangle and square on a black background, similar to the first example on the learningwebgl.com site. The code is thoroughly commented, and if you're like me you'll dive right into it and come back here later. Have at it!

If you're back, or if you stayed, let's go over some of the basic concepts.

What are the dimensions of the WebGL rendering area?

2 by 2 by 2, in the range -1 through 1 on each axis. If the final X, Y, or Z of a point falls outside that range, it is clipped and won't be displayed. X and Y are scaled to the size of the canvas (the viewport, strictly speaking; the two can differ when you resize the canvas, but that subtlety isn't important for getting started), with [0,0] at the center. The axes point the same way as standard Cartesian coordinates: Y increases as you move up, in contrast to Canvas 2D and most other computer rendering models, where [0,0] is the upper-left corner and Y increases as you move down.
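As a concrete sketch of that mapping, here is a hypothetical helper (ndcToPixel is my name for it, not a WebGL function) that converts a point in WebGL's -1..1 range to Canvas-2D-style pixel coordinates, assuming the viewport covers the whole canvas:

```javascript
// Hypothetical helper: map WebGL's -1..1 coordinates to Canvas-2D-style
// pixels (origin upper-left, Y down), assuming the viewport spans the canvas.
function ndcToPixel(x, y, width, height) {
    var px = (x + 1) / 2 * width;   // -1 -> left edge, +1 -> right edge
    var py = (1 - y) / 2 * height;  // +1 -> top edge, -1 -> bottom edge (Y flipped)
    return [px, py];
}

console.log(ndcToPixel(0, 0, 400, 300));    // [200, 150]: the center
console.log(ndcToPixel(-1, 1, 400, 300));   // [0, 0]: the upper-left corner
console.log(ndcToPixel(1, -1, 400, 300));   // [400, 300]: the lower-right corner
```

The Y flip is the whole difference between the two coordinate systems; X only needs shifting and scaling.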

What are we ultimately trying to do?

Draw a three dimensional scene on a flat canvas.

It's important to keep this in mind, especially when you're first starting out and things don't appear to have the depth you expect them to. Depth is an illusion; we're only displaying things on a flat screen. Apart from clipping anything outside the -1 to 1 range, the final Z of a pixel is, depending on what rendering mode you're in, either completely irrelevant or used strictly for drawing order.

If you want to draw a scene with perspective (where a faraway object is smaller and closer to the center of your field of view), or to rotate a scene on the X axis, say, tipping it towards you a little so you can see the top and front in the same view, then you'll need to understand the maths behind transformation matrices, which I'll get to below. Let's take some baby steps first.

Create a WebGL context, set a background color.

var gl = canvas.getContext('webgl');

This is the official starting off point for rendering with WebGL. An HTML5 canvas element is queried with a context ID of "webgl", and returns a "rendering context", which transfers data between an OpenGL ES (embedded systems) engine and JavaScript. Early browser support for this was done under a different context ID, "experimental-webgl", leading to this common idiom today:

var gl = canvas.getContext('webgl') || canvas.getContext('experimental-webgl');
// The bare return below assumes this code runs inside your setup function
if (!gl) { alert("WebGL not available"); return; }

Once your context has been created, it is ready to compile shaders, create buffers for JavaScript to feed vertex and color info to, etc. Before any real drawing happens, the simple step of clearing the canvas must take place. This happens in two steps:

gl.clearColor(0, 0, 0, 1);
gl.clear(gl.COLOR_BUFFER_BIT);

Setting clearColor (the method takes four float arguments between 0 and 1, representing red, green, blue, and alpha) can be done independently of all other functions, and the value persists between draw calls (you can call clearColor once immediately after the WebGL context is created, and never again until you want to change the background color).

The second step, gl.clear(), clears the color, depth, and/or stencil buffers. It takes a single integer bitmask built by OR-ing together the constants COLOR_BUFFER_BIT, DEPTH_BUFFER_BIT, and STENCIL_BUFFER_BIT, depending on exactly what you're clearing. What goes on behind the scenes is involved, but essentially the color buffer is filled with the clearColor value, and once your script returns control to the browser, a "compositing operation" is performed, which copies the drawing buffer onto the page. If no draw operations happen between the call to clear() and that point, the canvas simply shows the clear color.
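To make the bitmask idea concrete, the sketch below uses the numeric values the WebGL 1.0 spec assigns to the three constants (in real code, use gl.COLOR_BUFFER_BIT and friends rather than literals). buffersCleared is a hypothetical helper for illustration only:

```javascript
// The clear-mask constants, with their WebGL 1.0 values. Each is a single bit.
var COLOR_BUFFER_BIT   = 0x00004000;
var DEPTH_BUFFER_BIT   = 0x00000100;
var STENCIL_BUFFER_BIT = 0x00000400;

// OR-ing them builds one integer naming every buffer to clear in one call,
// exactly what gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT) passes.
var mask = COLOR_BUFFER_BIT | DEPTH_BUFFER_BIT;

// Hypothetical helper: which buffers would this mask clear?
function buffersCleared(mask) {
    var names = [];
    if (mask & COLOR_BUFFER_BIT)   names.push("color");
    if (mask & DEPTH_BUFFER_BIT)   names.push("depth");
    if (mask & STENCIL_BUFFER_BIT) names.push("stencil");
    return names;
}

console.log(mask.toString(16));     // "4100"
console.log(buffersCleared(mask));  // ["color", "depth"]
```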

After a compositing operation (this is important), the drawing buffer is reset to transparent black, ignoring the value stored in clearColor. This means that if the next draw operation isn't preceded by a clear, the canvas area it doesn't cover will be transparent and show whatever is behind the canvas, typically the page's background color. If, as in the lesson1 example above, the fragment shader (I'll talk about what that means shortly) doesn't set a color, then the rendered objects will typically come out transparent too, showing that same background. All the draw operations still happen, but the canvas appears to be blank. To prevent this, call gl.clear() in the same function as the draw operation. There are alternatives, such as creating the WebGL context with the "preserveDrawingBuffer" attribute enabled, which stops WebGL from clearing the drawing buffer after each compositing operation. This can be handy, but in general, an explicit call to gl.clear() before draw operations is preferred.

If that doesn't make any sense, don't worry; just remember to always call clear before drawArrays or drawElements, and doing so in the same function call is safest.

What are shaders?

Good question. "Shader" is a term used differently by different sources. It can mean "something that uses light and color to indicate depth", or it can refer explicitly to a 3D graphics program meant to run on a GPU, with different sub-programs running at different rendering stages.

For the purpose of WebGL, there are two types of shaders: vertex and fragment. The vertex shader concerns itself with transforming 3D locations to screen coordinates, applying transformation matrices, and passing color variables to the fragment shader. The fragment shader (this is a gross simplification) is responsible for assigning a color to each of an object's rasterized pixels, while each fragment's depth is interpolated from the vertex positions. (A fragment is, in essence, a pixel, but scaling and the differences between "canvas" and "viewport" mean fragments and pixels don't necessarily have a one-to-one relationship.)

For the lesson1 page above, we have a vertex shader that does no transformation; we just feed it a handful of triangle vertices. The fragment shader, however, is empty. It never writes a color, which the shading language spec leaves undefined; in practice the object's pixels typically come out transparent black. Here is the source code for both:

Vertex shader

attribute vec2 vertexCoord;     /* x and y coordinates of current vertex */
void main(void) {
    /* Set vertex position with above variable, at 0 depth, normal scale */
    gl_Position = vec4(vertexCoord, 0, 1);
}

The vertex shader above receives coordinates from JavaScript (from a buffer that JavaScript binds to, actually, but that's splitting hairs) in basic Cartesian X/Y coordinates, and tacks on a Z value of 0 so everything is on the same plane, and a W of 1, making a 4-element vector (vec4). What's W? The "homogeneous coordinate". I'll talk about what that means in the section on orthographic vs. perspective transformation.
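Without getting too far ahead of that section: after the vertex shader runs, the GPU divides X, Y, and Z by W, and a W of 1 leaves the point where it is. A minimal sketch of that divide, using a hypothetical clipToNdc helper:

```javascript
// Hypothetical helper: the divide-by-W step the GPU applies to gl_Position
// after the vertex shader, producing the -1..1 coordinates described earlier.
function clipToNdc(clip) {
    var w = clip[3];
    return [clip[0] / w, clip[1] / w, clip[2] / w];
}

// W = 1, as in the lesson1 shader: the point is unchanged.
console.log(clipToNdc([0.5, -0.5, 0, 1]));  // [0.5, -0.5, 0]
// W = 2 pulls the same X and Y halfway toward the center.
console.log(clipToNdc([0.5, -0.5, 0, 2]));  // [0.25, -0.25, 0]
```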

Fragment shader

void main(void){ }

Yep. The fragment shader in the first example does nothing. In practice the pixel color ends up transparent black, which means the HTML page background color shows through. Where the vertex shader has a "gl_Position" variable to declare position, the fragment shader has a variable gl_FragColor, a vec4 of red, green, blue, and alpha floats, ranging from 0 to 1. If I wanted to explicitly set a color of opaque white for each object pixel, I could do this:

void main(void){
    gl_FragColor = vec4(1, 1, 1, 1);
}

The later exercises will show how to do a little more with shaders, but nothing too far off from these basic concepts. A stumbling block to get over is that the shader source code isn't JavaScript; it's GLSL ES, the OpenGL ES Shading Language. The source code needs to be compiled and linked by the WebGL context object, so it has to be fed to the compiler as a string, and still be readable (i.e., multi-line).

Some developers get past this obstacle by declaring HTML script tags with a custom type so the browser won't run them as JavaScript (e.g. <script type="x-shader/x-vertex">), then pulling their text out of the DOM (often one text node at a time), concatenating it into a JavaScript string, and passing that string to the compiler: messy in small examples, but more sensible for larger programs. This method lets you lay out the shader source clearly as part of the HTML without worrying about escaping newlines and comments, but it gives the impression that your browser intrinsically knows what to do with it, which it doesn't. Yet.

Alternately, the source can be fed inline, as long as you properly escape any newlines or GLSL comments. My lesson1 page above shows how to do both of these correctly.
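One common way to write the source inline (a sketch, not necessarily how the lesson1 page does it) is to keep each GLSL line as its own string and join them with "\n". Keeping the line breaks matters: without them, a // comment would swallow everything after it. The fragment shader here is the opaque-white variant rather than the empty one:

```javascript
// Each GLSL line is a separate string; join("\n") restores the line breaks,
// so the // comments end where they should when the compiler sees them.
var vert_source = [
    "attribute vec2 vertexCoord;  // x and y of the current vertex",
    "void main(void) {",
    "    gl_Position = vec4(vertexCoord, 0, 1);",
    "}"
].join("\n");

var frag_source = [
    "void main(void) {",
    "    gl_FragColor = vec4(1, 1, 1, 1);  // opaque white",
    "}"
].join("\n");

console.log(vert_source.split("\n").length);  // 4 lines
// In the browser, these strings go straight to gl.shaderSource().
```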

Let's draw some stuff

Before you completely lose interest, let's go over what's needed to draw a simple triangle. I recommend you go over the source code of the lesson1 example above, which has detailed comments. Let's assume we have the GLSL ES shader source code above in the strings vert_source and frag_source, and that the WebGL context "gl" has already been created. Here, then, is everything needed to paint a simple triangle:

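// Compile the vertex shader
var v_shader = gl.createShader(gl.VERTEX_SHADER);
gl.shaderSource(v_shader, vert_source);
gl.compileShader(v_shader);

// Compile the fragment shader
var f_shader = gl.createShader(gl.FRAGMENT_SHADER);
gl.shaderSource(f_shader, frag_source);
gl.compileShader(f_shader);

// Link both into a program and make it the current one
var shader = gl.createProgram();
gl.attachShader(shader, v_shader);
gl.attachShader(shader, f_shader);
gl.linkProgram(shader);
if (!gl.getProgramParameter(shader, gl.LINK_STATUS)) {
    // Compile and link failures are silent unless you ask for them
    throw new Error(gl.getProgramInfoLog(shader));
}
gl.useProgram(shader);

// Buffer three X/Y vertices as 32-bit floats
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
     0.0,  0.5,
    -0.5, -0.5,
     0.5, -0.5
]), gl.STATIC_DRAW);

// Feed the buffer to attribute 0 (vertexCoord, the only attribute), 2 floats per vertex
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);

// Clear to opaque black, then draw the three vertices as one triangle
gl.clearColor(0, 0, 0, 1);
gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawArrays(gl.TRIANGLES, 0, 3);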


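So what just happened? In order, we:

1. created and compiled a vertex shader and a fragment shader from their source strings;
2. attached both to a new program, linked it, and made it the active program;
3. created a buffer, bound it as the current ARRAY_BUFFER, and filled it with the triangle's three X/Y vertex pairs as 32-bit floats;
4. enabled vertex attribute 0 (the shader's only attribute, vertexCoord) and told WebGL to read it from the buffer two floats at a time;
5. set the clear color to opaque black and cleared the canvas; and
6. drew three vertices from the buffer as a single triangle.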

If you're familiar with the 2D HTML5 canvas API, you'll see that this is substantially more complicated. I couldn't agree more. If you just want to render a simple 2D scene, WebGL is definitely not the way to go. Its real power comes later, with rendering many 3D objects, multiplying transformation matrices in parallel (the real power behind GPUs), and texture and light rendering. This comes with a bit of a learning curve, though.

Lesson 2: Color and depth

Lesson 2

The lesson2 page draws a red square with a Z of 1, putting it at the very back of the rendering space, and a blue diamond with a Z of 0, putting it, theoretically, in front of the square. There are three buttons which change the order in which the objects are rendered, one of which deliberately renders them out of order to illustrate the need to explicitly turn depth checking on.

We'll get to depth issues shortly. Let's cover colors first.

Fragment color is calculated from surrounding vertex colors

If you want an object to be a solid color, each of its vertices must be defined to be that color. If vertices are different colors, the object is rendered as a gradient between the vertex colors, since each fragment's color is interpolated from the vertices around it. Since most rendering in WebGL is done with triangles, this can lead to some neat effects, such as a simple color "wheel" made from one triangle whose vertices are red, green, and blue.
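The blending can be sketched with barycentric weights, which describe how close a fragment is to each of the triangle's three vertices. This is a simplified model with hypothetical names; the GPU also applies perspective correction, which makes no difference when W is 1 everywhere, as in these lessons:

```javascript
// Hypothetical sketch: blend the three vertex colors of a triangle for one
// fragment. w0, w1, w2 are barycentric weights: each >= 0, summing to 1.
function blendVertexColors(c0, c1, c2, w0, w1, w2) {
    return [
        c0[0] * w0 + c1[0] * w1 + c2[0] * w2,  // red
        c0[1] * w0 + c1[1] * w1 + c2[1] * w2,  // green
        c0[2] * w0 + c1[2] * w1 + c2[2] * w2   // blue
    ];
}

var red = [1, 0, 0], green = [0, 1, 0], blue = [0, 0, 1];

// On a vertex, the fragment takes that vertex's color outright.
console.log(blendVertexColors(red, green, blue, 1, 0, 0));      // [1, 0, 0]
// Halfway along the red-green edge, it is an even mix of the two.
console.log(blendVertexColors(red, green, blue, 0.5, 0.5, 0));  // [0.5, 0.5, 0]
```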

Now let's see how to get colors assigned to our vertices.

OpenGL variable types

The OpenGL ES 2.0 spec states that there are 4 basic types of shader variables:

Vertex attributes are the per-vertex values specified in (...). Uniforms are per-program variables that are constant during program execution. Samplers are a special form of uniform used for texturing. Varying variables hold the results of vertex shader execution that are used later in the pipeline.

Vertex attributes are declared in the vertex shader with "attribute", e.g.:

attribute vec4 a_color;

Data is fed to them through buffers, and they are used to hold vector or matrix data, suitable for coordinates, colors, or translation matrices. However, they are only accessible in vertex shaders, which don't set color.

Varying variables are declared with "varying", e.g.:

varying   vec4 v_color;

They carry data from the vertex shader to the fragment shader, with the values interpolated across each triangle along the way. So to have JavaScript set a color, you buffer it and feed it to an attribute, have the vertex shader read the color and copy it to a varying variable, and have the fragment shader read it from there. The fragment shader can then use the varying variable to set gl_FragColor.

Adding color, simple versus cheap

There are two basic ways to feed color data to a shader. First, after the vertex buffer has been fed to the shader, you can bind a new buffer with color data and feed that to the shader next. For rendering simple scenes, this is fine and is easier to grok when reading the code.

However, binding buffers is relatively expensive. If you are rendering many objects, this can slow your program down, possibly making animation run at a lower FPS. The alternative is to interleave the vertex and color data in a single buffer, and use calls to gl.vertexAttribPointer to tell the shader the size of the "stride" (the number of bytes from one vertex to the next) and the offset within the stride at which each attribute starts. This is generally referred to as interleaving the vertex attributes.
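As a sketch of what interleaving does to the data itself, here is a hypothetical interleave helper that merges separate position and color arrays into the X, Y, R, G, B per-vertex layout used by the cheap method:

```javascript
// Hypothetical helper: merge separate per-vertex arrays into one interleaved
// Float32Array, posSize position floats followed by colorSize color floats.
function interleave(positions, colors, posSize, colorSize) {
    var count = positions.length / posSize;
    var out = new Float32Array(count * (posSize + colorSize));
    var o = 0;
    for (var v = 0; v < count; v++) {
        for (var i = 0; i < posSize; i++)   out[o++] = positions[v * posSize + i];
        for (var j = 0; j < colorSize; j++) out[o++] = colors[v * colorSize + j];
    }
    return out;
}

var data = interleave(
    [0.0, 0.5,  -0.5, -0.5,  0.5, -0.5],  // X/Y per vertex
    [1, 0, 0,   0, 1, 0,     0, 0, 1],    // R/G/B per vertex
    2, 3
);
console.log(Array.from(data));
// [0, 0.5, 1, 0, 0,  -0.5, -0.5, 0, 1, 0,  0.5, -0.5, 0, 0, 1]
```

The result is exactly the array the cheap method passes to gl.bufferData, so the same helper works for any number of vertices.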

Let's go over both methods. The following shaders will be used:

// Vertex
attribute vec2 vertexCoord;
attribute vec3 a_color;
varying   vec3 v_color;
void main(void) {
    gl_Position = vec4(vertexCoord, 0, 1);
    v_color = a_color;
}

// Fragment
precision mediump float;
varying   vec3 v_color;
void main(void){
    gl_FragColor = vec4(v_color, 1);
}

The vertex shader has an additional attribute, a 3-element vector a_color. After setting the vertex position, the vertex shader sets the varying variable v_color to match the attribute a_color.

The fragment shader must now start out by setting what precision to use for floats. Why's that? Page 36 of the OpenGL ES Shading Language definition has this to say about the precisions for different variable types:

The vertex language has the following predeclared globally scoped default precision statements:

precision highp float;
precision highp int;
precision lowp sampler2D;
precision lowp samplerCube;

The fragment language has the following predeclared globally scoped default precision statements:

precision mediump int;
precision lowp sampler2D;
precision lowp samplerCube;

Simply put, the fragment shader has no default precision for floats. Even though it behaved well when we used constant values for colors before, attempting to use float variables without a declared precision will cause the shader program to not compile. So we can either use the global float precision declaration, as I have above, or explicitly set the precision of just the v_color variable with something like this:

varying mediump vec3 v_color;

Lastly, the fragment shader above sets gl_FragColor to the varying variable, taking on an alpha of 1 (fully opaque). Now let's look at the two methods to get color data to the shaders. Note that the shaders don't change, only the use of bindings and pointers from JavaScript.

The simple method

Bind a buffer for vertices, and another for colors. Note the use of "getAttribLocation" here. In the lesson1 page there was only one attribute, which in practice always ends up at index 0. With more than one attribute, you must look up each one's index, as some WebGL engines index them in the order they are declared in the source code (WebKit), and some alphabetize them first (Mozilla).

// Buffer vertices
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
     0.0,  0.5,
    -0.5, -0.5,
     0.5, -0.5
]), gl.STATIC_DRAW);

var loc = gl.getAttribLocation(shader, "vertexCoord");
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);

// Buffer colors
gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());

gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
     1.0,  0.0,  0.0,
     0.0,  1.0,  0.0,
     0.0,  0.0,  1.0
]), gl.STATIC_DRAW);

loc = gl.getAttribLocation(shader, "a_color");
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 3, gl.FLOAT, false, 0, 0);

The cheap method

Bind one buffer, include X/Y and RGB for each vertex, and describe the array layout to the attribute pointers. Note that the last two arguments passed to vertexAttribPointer (stride and offset) are in bytes, so they must be calculated from the number of bytes in a Float32 (since that's the datatype of the array passed to the buffer), hence the business with "BYTES_PER_ELEMENT" below.

gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
//  |--coords--|------colors------|
//    X     Y     R     G     B
     0.0,  0.5,  1.0,  0.0,  0.0,
    -0.5, -0.5,  0.0,  1.0,  0.0,
     0.5, -0.5,  0.0,  0.0,  1.0
]), gl.STATIC_DRAW);

var stride      = 5 * Float32Array.BYTES_PER_ELEMENT;
var colorOffset = 2 * Float32Array.BYTES_PER_ELEMENT;

var loc = gl.getAttribLocation(shader, "vertexCoord");
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, stride, 0);

loc = gl.getAttribLocation(shader, "a_color");
gl.enableVertexAttribArray(loc);
gl.vertexAttribPointer(loc, 3, gl.FLOAT, false, stride, colorOffset);
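The stride and offset arithmetic generalizes to any number of float attributes. A small sketch, with a hypothetical interleavedLayout helper, that computes the byte values to pass as the last two arguments of gl.vertexAttribPointer:

```javascript
// Hypothetical helper: given how many floats each interleaved attribute uses,
// return the byte stride and each attribute's byte offset within a vertex.
function interleavedLayout(sizes) {
    var bytes = Float32Array.BYTES_PER_ELEMENT;  // 4
    var offsets = [];
    var floats = 0;
    for (var i = 0; i < sizes.length; i++) {
        offsets.push(floats * bytes);  // attribute i starts after the ones before it
        floats += sizes[i];
    }
    return { stride: floats * bytes, offsets: offsets };
}

// X/Y followed by R/G/B, as in the buffer above:
var layout = interleavedLayout([2, 3]);
console.log(layout);  // { stride: 20, offsets: [0, 8] }
```

These match the stride (5 * 4) and colorOffset (2 * 4) computed by hand above.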

drawElements versus drawArrays

After the vertices and colors are buffered, the lesson2 page renders the scene using this technique:

var indices = [
    0, 1, 2,
    1, 2, 3,

    4, 5, 6,
    5, 6, 7
];
gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, gl.createBuffer());
gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, new Uint16Array(indices), gl.STATIC_DRAW);

gl.clear(gl.COLOR_BUFFER_BIT);
gl.drawElements(gl.TRIANGLES, indices.length, gl.UNSIGNED_SHORT, 0);

The drawElements method draws buffered vertices by their index numbers, rather than rendering the raw vertex points in the order they were buffered. This is useful if, as in the lesson2 example, you are rendering the same vertices in different orders based on user input, and it is also useful if you are rendering a 3D object (say, a cube) and wish to share vertex points between separate faces.
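Conceptually (this is a simplified model, not how the GPU actually fetches vertices), drawElements walks the index list and pulls each referenced vertex out of the buffer. A sketch with a hypothetical expandIndices helper, using a square whose two triangles share two vertices:

```javascript
// Hypothetical sketch of what drawElements does conceptually: emit the
// vertex each index points at, in index order.
function expandIndices(vertices, indices, floatsPerVertex) {
    var out = [];
    for (var i = 0; i < indices.length; i++) {
        var start = indices[i] * floatsPerVertex;
        for (var k = 0; k < floatsPerVertex; k++) out.push(vertices[start + k]);
    }
    return out;
}

// Four X/Y corners of a square; both triangles use vertices 1 and 2.
var square = [
    -0.5,  0.5,   // 0: top left
    -0.5, -0.5,   // 1: bottom left
     0.5,  0.5,   // 2: top right
     0.5, -0.5    // 3: bottom right
];
var expanded = expandIndices(square, [0, 1, 2, 1, 2, 3], 2);
console.log(expanded.length / 2);  // 6 vertices drawn from only 4 buffered
```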

Checking for depth

Depth checking is off by default, in which case fragments are rendered in the order they are processed by the shader, with no regard for the point's Z value (so long as the value is between -1 and 1, that is). Depth checking can be turned on with the following:

gl.enable(gl.DEPTH_TEST);

And it is turned off by passing the same value to gl.disable. When depth checking is turned on, a fragment is only rendered if nothing is in front of it... or at least, that's the idea. In reality, the fragment is rendered if it passes whichever depth test is declared by the gl.depthFunc method. The depth buffer is cleared to 1 by default (gl.clearDepth changes that value), and by default (gl.LESS) a fragment is rendered if its depth is less than the value stored in the depth buffer at that point, in which case the fragment's depth overwrites the stored value. The depth is the fragment's Z mapped from the -1..1 range onto the 0..1 range the depth buffer uses, so a Z of 1 becomes a depth of 1.

Since each point starts out as 1, if depth checking is enabled, by default nothing is rendered with a Z value of exactly 1. Since my lesson2 page has the red square vertices at Z=1, I had to update the depth checking function to compare with less than or equal with this call:

gl.depthFunc(gl.LEQUAL);
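A tiny software model of the test for one pixel makes the difference visible. Everything here is a hypothetical illustration, not WebGL code; depths are on the 0..1 scale the depth buffer stores, so lesson2's square at Z=1 has depth 1 and the diamond at Z=0 has depth 0.5:

```javascript
// Hypothetical model of the depth test for a single pixel. The stored value
// starts at 1, what gl.clear(gl.DEPTH_BUFFER_BIT) leaves by default.
function runDepthTest(fragmentDepths, compare) {
    var stored = 1;
    var drawn = [];
    fragmentDepths.forEach(function (depth, i) {
        if (compare(depth, stored)) {  // test passes: draw it, record its depth
            stored = depth;
            drawn.push(i);
        }
    });
    return drawn;
}

var LESS   = function (a, b) { return a <  b; };  // WebGL's default depthFunc
var LEQUAL = function (a, b) { return a <= b; };

// The square's fragment (depth 1) arrives first, then the diamond's (0.5).
console.log(runDepthTest([1, 0.5], LESS));    // [1]: the square is rejected
console.log(runDepthTest([1, 0.5], LEQUAL));  // [0, 1]: both are drawn
```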

Lesson 3: Rotating a cube

Lesson 3

Transformation matrices

A transformation matrix in WebGL is literally a 4 by 4 matrix from linear algebra. It is multiplied against 3D coordinates to produce new, transformed coordinates, for the purpose of repositioning, rotating, or scaling objects. In case your memories of Math Analysis class have succumbed to the ravages of time, here is a quick refresher from my old "Medicine for the Sky" blog on how matrix algebra works, along with some 2D rendering examples.

A non-trivial WebGL program will likely have multiple transformation matrices: one for positioning an object in the scene, one (or several) for rotation, and one for positioning the "camera"... except there really isn't a camera, so that would just be a matrix applied to all objects the same way (if the camera moves up, all objects must move down, for instance).

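In the WebGL model, a transformation matrix maps a point (x, y, z, w) to a new point (x', y', z', w'). Written with the letters used in the JavaScript array below, the matrix and the formulae it encodes are:

$$
\begin{pmatrix} x' \\ y' \\ z' \\ w' \end{pmatrix}
=
\begin{pmatrix}
a & e & i & m \\
b & f & j & n \\
c & g & k & o \\
d & h & l & p
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix},
\qquad
\begin{aligned}
x' &= ax + ey + iz + mw \\
y' &= bx + fy + jz + nw \\
z' &= cx + gy + kz + ow \\
w' &= dx + hy + lz + pw
\end{aligned}
$$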

Something that might jump out at you is that the matrix is column-major, meaning as values are read into the matrix, they fill it up one column at a time instead of one row at a time. This can cause some confusion, since to buffer data into a matrix shader variable, it starts out in JavaScript as a flat array, typically declared like this:

var matrix = [
    a, b, c, d,
    e, f, g, h,
    i, j, k, l,
    m, n, o, p
];

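...which, laid out as a grid like this, reads in row-major order, so it looks diagonally transposed from the way the OpenGL spec writes the matrix. This is such a common pain point that many matrix libraries meant for use with WebGL contain a function to transpose between column-major and row-major layouts. WebGL 1.0 won't do it for you: the transpose argument of gl.uniformMatrix4fv must be false.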
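A transpose is just an index swap. A minimal sketch (transpose4 is a hypothetical name, not taken from any particular library), using a translation typed the way it would be written on paper:

```javascript
// Hypothetical helper: transpose a flat 4x4 array, converting between the
// row-major order it is typed in and the column-major order WebGL reads.
function transpose4(m) {
    var out = new Array(16);
    for (var row = 0; row < 4; row++) {
        for (var col = 0; col < 4; col++) {
            out[col * 4 + row] = m[row * 4 + col];
        }
    }
    return out;
}

// x' = x + 0.5, typed row-major with the offset in the last column.
var onPaper = [
    1, 0, 0, 0.5,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1
];
console.log(transpose4(onPaper).slice(12));  // [0.5, 0, 0, 1]: offsets now at m, n, o
```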

There are four basic types of values in a transformation matrix: scaling, shearing, offsets, and homogeneous modifiers (d, h, l, and p). Offsets (m, n, and o) are the simplest: they add a fixed value to an axis. For instance, to move something to the right, make m a positive value.

Scaling values (a, f, and k) multiply the value of an axis, making it bigger or smaller in that direction. Setting f to 2, for example, gives us the equation y' = 2y. Shearing values (b, c, e, g, i, and j) are similar, but they modify their axis based on the value of another axis. If a is 1 and e is 1, we have the equation x' = x + y, creating a 45 degree shear on the x axis.
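The formulae translate directly into code. This sketch (transformPoint is a hypothetical helper) reads a flat column-major array the same way WebGL does, then applies a shear plus an offset:

```javascript
// Hypothetical helper: multiply a flat column-major 4x4 matrix (the a..p
// layout above) by [x, y, z, w], as "matrix * vec4(...)" does in a shader:
// x' = a*x + e*y + i*z + m*w, and likewise for y', z', w'.
function transformPoint(mat, p) {
    var out = [0, 0, 0, 0];
    for (var row = 0; row < 4; row++) {
        for (var col = 0; col < 4; col++) {
            out[row] += mat[col * 4 + row] * p[col];
        }
    }
    return out;
}

// Shear: a = 1, e = 1, so x' = x + y. Offset: m = 0.25 moves x to the right.
var shearAndShift = [
    1,    0, 0, 0,   // a b c d
    1,    1, 0, 0,   // e f g h
    0,    0, 1, 0,   // i j k l
    0.25, 0, 0, 1    // m n o p
];
var result = transformPoint(shearAndShift, [0.5, 0.5, 0, 1]);
console.log(result);  // [1.25, 0.5, 0, 1]
```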

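Rotations happen from precise combinations of shears, typically using the sine and cosine of the angle you want to rotate the object by. A good primer for the specific rotation equations for each axis is the "In three dimensions" section of the Wikipedia page on rotation matrices, which gives these basic rotations by an angle θ:

$$
R_x(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},\quad
R_y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},\quad
R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

In WebGL each of these becomes the upper-left 3x3 block of a 4x4 matrix whose remaining row and column are 0, with a 1 in the bottom-right corner.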

For the lesson3 example, I've taken a unit cube with different colored faces, twisted it to the right 45 degrees, and then tipped it forward 45 degrees, which makes the cube appear to be resting on a point, as shown in the image above.

To accomplish this, I rotated about the Y axis (the vertical axis), which is the same motion as twisting the top or bottom face of a Rubik's cube. Then I rotated about the X axis (the horizontal axis), which is the same motion as twisting a Rubik's cube's left or right face.

Order is important

If I rotated the above cube in the opposite order, but by the same amounts, it would end up resting on an edge rather than a point.

The reason is pretty straightforward: Take a cube that's facing you and tip it forward. It is now on its bottom edge, rather than a corner. Now take the cube resting on its edge and rotate it to the right... and it is still resting on its edge.

The matrices I used are the following:

var a = Math.SQRT1_2; // Also the sine and cosine of 45 degrees

var xRotate = [
    1, 0, 0, 0,
    0, a,-a, 0,
    0, a, a, 0,
    0, 0, 0, 1
];

var yRotate = [
    a, 0,-a, 0,
    0, 1, 0, 0,
    a, 0, a, 0,
    0, 0, 0, 1
];

Luckily, JavaScript already has the square root of one-half as a constant, Math.SQRT1_2, which matches both the sine and cosine of a 45 degree angle (in a 1, 1, √2 right triangle, the adjacent and opposite sides are the same length), which means in this case the CPU won't have to do any real math. Plug the predefined constant into the rotation definitions from Wikipedia, and WebGL happily rotates the object as expected, with this vertex shader code, using nothing but the GPU:

    uniform   mat4 x_rotate;
    uniform   mat4 y_rotate;
    attribute vec3 vertexCoord;
    attribute vec3 a_color;
    varying   vec3 v_color;
    void main(void) {
        gl_Position = x_rotate * y_rotate * vec4(vertexCoord, 1);
        v_color = a_color;
    }

Note that the matrix multiplication happens in the opposite order of the action you want. The original vertex positions are referenced last, and the action happens from right to left.
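To see the right-to-left rule numerically, the sketch below mirrors the shader math in JavaScript with two hypothetical helpers, multiply and transform, both reading flat column-major arrays the way WebGL does. It uses the lesson3 matrices on one corner of a cube centered on the origin (the corner coordinates are my assumption):

```javascript
// Hypothetical helpers: A * B for flat column-major 4x4 arrays (element
// [row][col] lives at col * 4 + row), and M * v for a 4-element vector.
function multiply(A, B) {
    var out = new Array(16);
    for (var col = 0; col < 4; col++)
        for (var row = 0; row < 4; row++) {
            var sum = 0;
            for (var k = 0; k < 4; k++) sum += A[k * 4 + row] * B[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    return out;
}
function transform(M, v) {
    var out = [0, 0, 0, 0];
    for (var row = 0; row < 4; row++)
        for (var col = 0; col < 4; col++) out[row] += M[col * 4 + row] * v[col];
    return out;
}

var a = Math.SQRT1_2;
var xRotate = [1, 0, 0, 0,  0, a, -a, 0,  0, a, a, 0,  0, 0, 0, 1];
var yRotate = [a, 0, -a, 0,  0, 1, 0, 0,  a, 0, a, 0,  0, 0, 0, 1];
var corner  = [0.5, 0.5, 0.5, 1];

// x_rotate * y_rotate * v: Y reaches the vertex first, then X...
var yThenX = transform(xRotate, transform(yRotate, corner));
// ...which is the same as multiplying the matrices first, in the same order.
var same   = transform(multiply(xRotate, yRotate), corner);
// Swapping the factors applies X first, then Y: a different point.
var xThenY = transform(multiply(yRotate, xRotate), corner);

console.log(yThenX, same, xThenY);
```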

The rotation matrices are "uniform" variables, meaning they do not change during a draw operation. They are also moderately expensive to set, so typically you set them as infrequently as possible. In lesson3's case, they are both set once, and then the vertices are fed in and the scene drawn:

var xLoc = gl.getUniformLocation(shader, "x_rotate");
var yLoc = gl.getUniformLocation(shader, "y_rotate");

gl.uniformMatrix4fv(xLoc, false, xRotate);
gl.uniformMatrix4fv(yLoc, false, yRotate);

gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
gl.drawElements(gl.TRIANGLES, indices.length, gl.UNSIGNED_SHORT, 0);

According to Euler's rotation theorem, multiple rotations in 3D space can be represented by a single rotation. This can easily be shown by multiplying our two matrices above together to get a single matrix, and plugging that into the code. Indeed, Lesson 3b shows the same rotation with a single matrix.

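To further illustrate why order is important, here are the resultant rotation matrices when the originals are multiplied in both orders, with the help of our good friend Wolfram Alpha (written row by row in standard math notation, with a = √½):

$$
XY = \begin{pmatrix} a & 0 & a & 0 \\ -a^2 & a & a^2 & 0 \\ -a^2 & -a & a^2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\qquad
YX = \begin{pmatrix} a & -a^2 & a^2 & 0 \\ 0 & a & a & 0 \\ -a & -a^2 & a^2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$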

The X times Y matrix matches the single rotation matrix used in Lesson 3b (remember, the JavaScript declaration looks transposed, and since a is √½, a² is just .5), and rotates the cube down to the same corner as lesson3 does:

var rotate = [
    a, -.5, -.5, 0,
    0,   a,  -a, 0,
    a,  .5,  .5, 0,
    0,   0,   0, 1
];
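The claim can be checked in a few lines. This sketch, with a hypothetical multiply helper for flat column-major 4x4 arrays, multiplies the two lesson3 rotations in both orders and compares each product with the hand-written Lesson 3b array:

```javascript
// Hypothetical helper: A * B for flat column-major 4x4 arrays.
function multiply(A, B) {
    var out = new Array(16);
    for (var col = 0; col < 4; col++)
        for (var row = 0; row < 4; row++) {
            var sum = 0;
            for (var k = 0; k < 4; k++) sum += A[k * 4 + row] * B[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    return out;
}
// Compare with a small tolerance, since a * a is not exactly 0.5 in floating point.
function nearlyEqual(A, B) {
    return A.every(function (v, i) { return Math.abs(v - B[i]) < 1e-12; });
}

var a = Math.SQRT1_2;
var xRotate = [1, 0, 0, 0,  0, a, -a, 0,  0, a, a, 0,  0, 0, 0, 1];
var yRotate = [a, 0, -a, 0,  0, 1, 0, 0,  a, 0, a, 0,  0, 0, 0, 1];
var rotate  = [
    a, -.5, -.5, 0,
    0,   a,  -a, 0,
    a,  .5,  .5, 0,
    0,   0,   0, 1
];

console.log(nearlyEqual(multiply(xRotate, yRotate), rotate));  // true
console.log(nearlyEqual(multiply(yRotate, xRotate), rotate));  // false
```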

Representing rotations abstractly

Until now, we've thought about rotation only in terms of specific transformation matrices multiplied together in a specific order. The need for a more abstract way of representing rotations may not be obvious until you see what happens without one. Let's take our cube and add in some user input to adjust the matrices.

Lesson 4 takes the same cube from lesson 3 and adds support for mouse drag events to adjust the rotation. Horizontal drags rotate the cube about the Y axis, and vertical drags rotate it about the X axis.

The vertex and index data are buffered as before, but setting the transformation matrices and the final rendering now happen in their own function:

function draw() {
    var xs = Math.sin(xAngle);
    var xc = Math.cos(xAngle);
    var ys = Math.sin(yAngle);
    var yc = Math.cos(yAngle);

    var xRotate = [
        1,  0,  0, 0,
        0, xc, xs, 0,
        0,-xs, xc, 0,
        0,  0,  0, 1
    ];

    var yRotate = [
        yc, 0,-ys, 0,
         0, 1,  0, 0,
        ys, 0, yc, 0,
         0, 0,  0, 1
    ];

    var xLoc = gl.getUniformLocation(shader, "x_rotate");
    var yLoc = gl.getUniformLocation(shader, "y_rotate");

    gl.uniformMatrix4fv(xLoc, false, xRotate);
    gl.uniformMatrix4fv(yLoc, false, yRotate);

    // Clear color and depth buffers
    gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
    gl.drawElements(gl.TRIANGLES, numIndices, gl.UNSIGNED_SHORT, 0);
}

Note that I'm now explicitly using the sine/cosine math to determine matrix values. Now all we need is a mechanism to translate mouse drags to angles:

// Set mouse down and mouse up to both update lastX/Y with the current mouse position
window.onmousedown = window.onmouseup = function(e) {
    lastX = e.pageX;
    lastY = e.pageY;
}

// If a mouse button is down add a delta to the angles, update lastX/Y, and repaint.
window.onmousemove = function(e){
    var buttonDown = e.buttons == null ? e.which : e.buttons;
    if (!buttonDown) return;

    yAngle -= (e.pageX - lastX) / 256;
    lastX = e.pageX;

    xAngle -= (e.pageY - lastY) / 256;
    lastY = e.pageY;

    draw();
}

The e.which : e.buttons business is, I believe, a previously undiscovered simple mechanism for cross-browser (as far as Mozilla and WebKit go, anyway) mouse-button detection: it uses e.buttons where the browser supports it, and e.which otherwise. There is no explicit animation going on here with calls to requestAnimationFrame or equivalent; I am simply triggering a redraw on each mouse move where a button was down.

First, open lesson 4 and play around with mouse drags to get a feel for how the rotation works. Note that up/down drags always work as expected, but pay attention to left/right drags when the cube has been previously tilted forward or backward to a new face.

Odd, right? When the cube is not tilted forward or backward, left and right drags move the cube left and right. If you drag the cube forward or backward 90 degrees, the left/right drags now perform a twisting rotation along the Z axis (the axis pointing out of the screen at you, representing depth); this is the same motion as rotating a Rubik's cube's front or back faces.

Dragging the cube an additional 90 degrees forward or back will make left/right drags now work in reverse: dragging to the right rotates the cube to the left.

Why is this? In essence, we've created a gimbal, based solely on multiplying our X and Y rotation matrices. The order of the multiplication determines which axis is a child, meaning its basis of orientation is the current angle of the parent axis.

If we throw in a third rotation axis, we can see an example of the phenomenon known as gimbal lock.

Lesson 4b behaves much the same as lesson 4; however, dragging left/right while holding the right mouse button now spins the cube on the Z axis. In the new code, I've added a third rotation matrix to the vertex shader:

uniform   mat4 x_rotate;    /* Rotation matrix for the X axis */
uniform   mat4 y_rotate;    /* Rotation matrix for the Y axis */
uniform   mat4 z_rotate;    /* Rotation matrix for the Z axis */
attribute vec3 vertexCoord; /* x, y and z coordinates of current vertex */ 
attribute vec3 a_color;     /* RGB of current vertex, sent interleaved with vertex coords */ 
varying   vec3 v_color;     /* Color sent to the fragment shader */ 
void main(void) { /* Set vertex position with above variable, at normal scale */ 
    gl_Position = x_rotate * y_rotate * z_rotate * vec4(vertexCoord, 1);
    v_color = a_color;
}

...a branch to check for whether the left or right mouse button is being used, and choosing the appropriate axis (as well as a quick function to disable the right-click menu):

window.onmousemove = function(e){
    var buttonDown = e.buttons == null ? e.which : e.buttons;
    if (!buttonDown) return;

    if (buttonDown > 1) {
        zAngle -= (e.pageX - lastX) / 256;
        lastX = e.pageX;
    } else {
        yAngle -= (e.pageX - lastX) / 256;
        lastX = e.pageX;
    }

    xAngle -= (e.pageY - lastY) / 256;
    lastY = e.pageY;

    draw();
}

window.oncontextmenu = function() { return false; }

The order of multiplying the matrices in the vertex shader puts the Z rotation as the child of both Y and X, meaning any changes to either X or Y changes the orientation of Z. If this were a physical gimbal, then Z would be the innermost ring - always able to rotate freely, but its "north pole", so to speak, follows the movements of the X and Y rings.

Experiment with lesson4b a little to see how gimbal orientation behaves. Right-click dragging left and right will spin the cube on its Z axis; however, the Z axis always passes through the blue and red faces. If you left-click and rotate the cube to the right 90 degrees, the cube's red and blue faces are now oriented along the X axis. Now if you right-click and drag right or left, the cube rotates up or down, the same motion as left-click dragging up or down. This is one example of gimbal lock.
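The lock can be verified numerically. In this sketch, rotX and rotY build the same column-major arrays as lesson 4's draw(), while rotZ is an assumed standard Z rotation, since the lesson4b Z matrix isn't shown here. After a 90 degree turn on Y, a Z rotation produces exactly the same matrix as a larger X rotation:

```javascript
// Hypothetical check of the gimbal lock in x_rotate * y_rotate * z_rotate.
// All matrices are flat column-major 4x4 arrays.
function rotX(t) { var c = Math.cos(t), s = Math.sin(t);
    return [1, 0, 0, 0,  0, c, s, 0,  0, -s, c, 0,  0, 0, 0, 1]; }
function rotY(t) { var c = Math.cos(t), s = Math.sin(t);
    return [c, 0, -s, 0,  0, 1, 0, 0,  s, 0, c, 0,  0, 0, 0, 1]; }
function rotZ(t) { var c = Math.cos(t), s = Math.sin(t);  // assumed standard form
    return [c, s, 0, 0,  -s, c, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]; }

function multiply(A, B) {  // returns A * B
    var out = new Array(16);
    for (var col = 0; col < 4; col++)
        for (var row = 0; row < 4; row++) {
            var sum = 0;
            for (var k = 0; k < 4; k++) sum += A[k * 4 + row] * B[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    return out;
}
function nearlyEqual(A, B) {
    return A.every(function (v, i) { return Math.abs(v - B[i]) < 1e-12; });
}

var quarterTurn = Math.PI / 2;  // cube turned 90 degrees on Y (left-drag)
var tilt = 0.3, spin = 0.4;     // arbitrary X and Z angles, in radians

// The lesson4b shader's order: x_rotate * y_rotate * z_rotate...
var withZ = multiply(multiply(rotX(tilt), rotY(quarterTurn)), rotZ(spin));
// ...equals tilting further on X with no Z rotation at all.
var withX = multiply(rotX(tilt + spin), rotY(quarterTurn));

console.log(nearlyEqual(withZ, withX));  // true: Z has collapsed onto X
```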

There are a number of ways to prevent this, and in the case of WebGL or OpenGL, they all have one thing in common: abstracting the rotations somehow. Track the rotations independently of the vertex shader, use math to multiply them together ahead of time, and pass a single new rotation matrix to the shader. I'll go over a couple of these methods here.

Tait-Bryan angles plus tracking the last rotation

The simplest approach code-wise is to leave everything as-is and, on mouse-up, create a rotation matrix representing the current rotation and reset the X, Y, and Z angles back to 0. Pass that matrix in to the vertex shader as the "initial condition", where it will be multiplied against the original cube coordinates first.

This requires our first foray into tracking a changing transformation matrix on the JavaScript side, and this is the space where glMatrix, EWGL, Sylvester, and their ilk shine, with lots of optimizations for making transpositions, matrix multiplications, etc. as fast as possible. Why the focus on this optimization? Because the GPU is much faster at this type of math than the CPU, so any work done on the JavaScript side instead of in the shader will naturally be your bottleneck.

We start in the shader, declaring a new uniform variable for the "initial" matrix:

uniform   mat4  initial;    /* Orientation from last rotation set */

And plug it into the gl_Position calculation:

gl_Position = x_rotate * y_rotate * z_rotate * initial * vec4(vertexCoord, 1);

The "initial" matrix always holds the previous rotations multiplied together, and serves as the starting orientation for the next drag. The x_, y_, and z_rotate matrices work exactly as before, each on only one axis, acting as Tait-Bryan angles: rotating each axis in turn allows orientation to any final position. This is commonly referred to as Euler angles, but that is slightly inaccurate: proper Euler angles use only two axes, with the first and last rotations on the same axis.

Say what? For example, rotate on X, then on Y, then on X again. Think that through, and you'll see that this method can also reach any orientation, with the added bonus of being easier to visualize from a single reference plane. Tait-Bryan angles are the same idea, just using each of the three axes once, which better fits the UI model.

Now that we have a place to multiply in the new matrix, we need some complementary code to set the starting value of "initial" and to multiply the new rotations into it. Our good friend Wolfram Alpha again comes to the rescue, illustrating how to multiply three rotation matrices together. In the image below, the sine and cosine of the X angle are a and b, Y is c and d, and Z is e and f:
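Assuming the standard right-handed rotation matrices, with a = sin(x), b = cos(x), c = sin(y), d = cos(y), e = sin(z), and f = cos(z), the individual matrices and their product in conventional (untransposed, row-major) notation work out as follows; transposing the product gives the JavaScript array that follows.

$$
R_x = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & b & -a & 0 \\ 0 & a & b & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
R_y = \begin{pmatrix} d & 0 & c & 0 \\ 0 & 1 & 0 & 0 \\ -c & 0 & d & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},\quad
R_z = \begin{pmatrix} f & -e & 0 & 0 \\ e & f & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$

$$
R_x R_y R_z = \begin{pmatrix} df & -de & c & 0 \\ be + acf & bf - ace & -ad & 0 \\ ae - bcf & af + bce & bd & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
$$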

We can represent that with a simple JavaScript array (remember, everything is transposed):

var n = [
     d * f, b * e + a * c * f, a * e - b * c * f, 0,
    -d * e, b * f - a * c * e, b * c * e + a * f, 0,
         c,            -a * d,             b * d, 0,
         0,                 0,                 0, 1
];

Additionally, we need to multiply the above array, representing the current rotations, by the old value of "initial", and store the result back in initial. If you browsed through the "Medicine for the Sky" entry on translation matrices, you'll know that each element of the product is the dot product of a row of one matrix with a column of the other. This works out to be a pretty straightforward Array.map step:

initial = n.map(function(val, ndx) {
    var x = ndx % 4, y = ndx - x;
    return n[x] * o[y] + n[x+4] * o[y+1] + n[x+8] * o[y+2] + n[x+12] * o[y+3];
});

(Here "n" is the new rotation matrix from above, and "o" is a copy of the old value of initial.) I want to update initial after every mouse up event, so I'm rewriting the window.onmouseup function like so:

window.onmouseup = function() {
    var a = Math.sin(xAngle);
    var b = Math.cos(xAngle);
    var c = Math.sin(yAngle);
    var d = Math.cos(yAngle);
    var e = Math.sin(zAngle);
    var f = Math.cos(zAngle);

    var n = [
         d * f, b * e + a * c * f, a * e - b * c * f, 0,
        -d * e, b * f - a * c * e, b * c * e + a * f, 0,
             c,            -a * d,             b * d, 0,
             0,                 0,                 0, 1
    ]; // new
    var o = initial.slice(); // old

    initial = n.map(function(val, ndx) {
        var x = ndx % 4, y = ndx - x;
        return n[x] * o[y] + n[x+4] * o[y+1] + n[x+8] * o[y+2] + n[x+12] * o[y+3];
    });

    xAngle = yAngle = zAngle = 0;
}
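To make the column-major index arithmetic in the map() step explicit, here is a loop-based sketch of the same product, written as a hypothetical standalone helper named multiplyMat4. It assumes both inputs are flat 16-element arrays in the transposed (column-major) layout used throughout, where the element in row r, column c lives at index c * 4 + r, and it returns n * o exactly as the map() version does.

```javascript
// Multiply two flat, column-major 4x4 matrices: returns n * o.
// Element (row r, column c) is stored at index c * 4 + r.
function multiplyMat4(n, o) {
  const out = new Array(16);
  for (let c = 0; c < 4; c++) {
    for (let r = 0; r < 4; r++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) {
        sum += n[k * 4 + r] * o[c * 4 + k]; // row r of n dotted with column c of o
      }
      out[c * 4 + r] = sum;
    }
  }
  return out;
}

// Quarter turns around Z and X, written in column-major order.
const zQuarter = [0, 1, 0, 0,  -1, 0, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1];
const xQuarter = [1, 0, 0, 0,  0, 0, 1, 0,  0, -1, 0, 0,  0, 0, 0, 1];

console.log(multiplyMat4(zQuarter, xQuarter));
console.log(multiplyMat4(xQuarter, zQuarter)); // a different matrix: order matters
```

Either way the cost is the same: 16 output elements, each needing 4 multiplications, for 64 in total.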

Lesson 4c uses this method, presenting the user with much more intuitive rotations. Unfortunately, this method is relatively expensive (a full 4 by 4 matrix multiplication, with 64 multiplications and 128 uncached array lookups, on every mouse up), which could hurt your frame rate if many objects were being tracked this way.

A better alternative is quaternions.

Quaternions

As I mentioned above, Euler's rotation theorem assures us that any sequence of rotations is equivalent to a single rotation, which can be expressed in axis-angle representation as a unit vector for the axis and an angle of rotation around that axis.

There are known formulae for working with axis-angle rotations. For example, Rodrigues' rotation formula rotates a vector v by an angle θ around a unit axis k:

$$\mathbf{v}_{\mathrm{rot}} = \mathbf{v}\cos\theta + (\mathbf{k} \times \mathbf{v})\sin\theta + \mathbf{k}\,(\mathbf{k} \cdot \mathbf{v})(1 - \cos\theta)$$

...where k × v is the cross product of k and v, and k ⋅ v is the dot product. This can also be turned into a 3x3 rotation matrix R with this formula:

$$R = I + \sin\theta\,[\mathbf{k}]_\times + (1 - \cos\theta)(\mathbf{k}\mathbf{k}^T - I)$$

...where kkᵀ is the outer product of k with itself (kᵀ being the transpose of k), I is the 3x3 identity matrix, and [k]× is the cross-product matrix of the vector k, expressed as:

$$[\mathbf{k}]_\times = \begin{pmatrix} 0 & -k_3 & k_2 \\ k_3 & 0 & -k_1 \\ -k_2 & k_1 & 0 \end{pmatrix}$$
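To get a feel for how much arithmetic the axis-angle route involves, here is a sketch of Rodrigues' rotation formula and its matrix form in plain JavaScript. The function names (cross, dot, rodrigues, axisAngleToMatrix) are hypothetical, the axis k is assumed to already be a unit vector, and the matrix is returned row-major as nested arrays rather than in WebGL's transposed layout.

```javascript
// Cross product a x b of two 3-vectors.
function cross(a, b) {
  return [
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
  ];
}

// Dot product a . b of two 3-vectors.
function dot(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Rodrigues' rotation formula: rotate v by theta radians around the unit axis k.
function rodrigues(v, k, theta) {
  const c = Math.cos(theta), s = Math.sin(theta);
  const kxv = cross(k, v);
  const kdv = dot(k, v);
  return [0, 1, 2].map(i => v[i] * c + kxv[i] * s + k[i] * kdv * (1 - c));
}

// Matrix form R = I + sin(theta) [k]x + (1 - cos(theta)) (k k^T - I), row-major.
function axisAngleToMatrix(k, theta) {
  const s = Math.sin(theta), t = 1 - Math.cos(theta);
  const K = [ // cross-product matrix [k]x
    [0, -k[2], k[1]],
    [k[2], 0, -k[0]],
    [-k[1], k[0], 0],
  ];
  return [0, 1, 2].map(i => [0, 1, 2].map(j => {
    const id = i === j ? 1 : 0;
    return id + s * K[i][j] + t * (k[i] * k[j] - id);
  }));
}

// A quarter turn around Z carries the X axis onto the Y axis.
console.log(rodrigues([1, 0, 0], [0, 0, 1], Math.PI / 2)); // approximately [0, 1, 0]
```

Composing two such rotations means building both matrices and multiplying them, which is the bookkeeping quaternions let us avoid.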

At a glance, turning that into WebGL + JavaScript code would be a chore, and would probably not reduce our number of calculations. There is, fortunately, a way to turn axis-angles into algebraic expressions that can be multiplied together natively, and easily turned into a rotation matrix. Enter Quaternions.

Quaternions present you with an unusual concept right out of the gate: what if there are other types of imaginary numbers than just "i"? How would they behave?

In 1843, Irish natural philosopher William Rowan Hamilton put forth a definition for three imaginary constants, each representing an axis in Cartesian space: i, j, and k represent the X, Y, and Z axes, respectively. Squaring any of them gives -1, just like in traditional complex numbers; however, multiplying all three together in sequence also gives -1. The equation and its corollaries are normally given like this:

$$i^2 = j^2 = k^2 = ijk = -1$$

$$ij = k,\quad jk = i,\quad ki = j,\quad ji = -k,\quad kj = -i,\quad ik = -j$$

Pretty wild. The multiplications aren't commutative (ij != ji), which makes sense when you remember that they represent operations in 3D space: rotating on the X and then the Y axis has a different effect than rotating on Y and then X, as discussed above. As a starting point for untangling what happens when the imaginary constants are combined, consider this:

i² = -1, just as you remember from math class.

ijk = -1, as defined in the new formula.

If i(jk) = -1, and i·i = -1, then jk must equal i. Similarly, k² = -1 and (ij)k = -1, so ij must equal k. Some trial and error should lead you to independently discover all the equalities listed in the equation above, and I'll leave that as an exercise for you, if you're interested.

There are a number of sites describing quaternions and how they relate to 3D graphics; however, the two I found most useful in showing how the math works are one of Hamilton's original papers on the topic from circa 1850, On Quaternions, or on a New System of Imaginaries in Algebra, and a 2001 paper out of the CompSci department of UNC, Quaternions and Rotations in 3 Space by Leandra Vicci. Both are good if you want the juicy details and aren't afraid of copious notation with nebulous descriptions.

My description will be briefer. A quaternion can be represented by 4 variables: w, which encodes the rotation angle (for a rotation, it turns out to be the cosine of half the angle), and x, y, and z, representing a vector. The w variable stands alone (not to be confused with the homogeneous coordinate W, which I'll discuss in the next section), while x, y, and z are multiplied by the imaginary constants above to form the algebraic expression:

w + xi + yj + zk

When multiplying two quaternions together, the rules for combining imaginaries come into play, making the resulting product simpler. For example, if we multiply the two quaternions Q(w,x,y,z) and Q(W,X,Y,Z), this happens:

(w + xi + yj + zk) * (W + Xi + Yj + Zk)
wW + wXi + wYj + wZk + xiW + xiXi + xiYj + xiZk + yjW + yjXi + yjYj + yjZk + zkW + zkXi + zkYj + zkZk

Now let's simplify i², j², and k² to -1:

wW + wXi + wYj + wZk + xiW + -xX + xiYj + xiZk + yjW + yjXi + -yY + yjZk + zkW + zkXi + zkYj + -zZ

Now the terms with two imaginaries multiplied together can be simplified to their single-imaginary counterparts:

wW + wXi + wYj + wZk + xiW + -xX + xYk + xZ-j + yjW + yX-k + -yY + yZi + zkW + zXj + zY-i + -zZ

Now rearrange the sign indicators to make that more legible:

wW + wXi + wYj + wZk + xiW - xX + xYk - xZj + yjW - yXk - yY + yZi + zkW + zXj - zYi - zZ

Lastly, sort the terms by which imaginaries they use:

wW - xX - yY - zZ + wXi + xiW + yZi - zYi + wYj - xZj + yjW + zXj + wZk + xYk - yXk + zkW =
(wW - xX - yY - zZ) + i(wX + xW + yZ - zY) + j(wY - xZ + yW + zX) + k(wZ + xY - yX + zW)

This leaves us with 4 expressions, representing the product quaternion Q(w',x',y',z'):

w' = wW - xX - yY - zZ,
x' = wX + xW + yZ - zY,
y' = wY - xZ + yW + zX,
z' = wZ + xY - yX + zW
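These four expressions translate directly into code. As a sanity check, the sketch below (with a hypothetical function name, qmul, and quaternions stored as [w, x, y, z]) feeds them Hamilton's basis units and confirms the products from the equation above, including the non-commutativity of ij and ji.

```javascript
// Quaternion product using the four expressions above; quaternions are [w, x, y, z].
function qmul(a, b) {
  const [w, x, y, z] = a, [W, X, Y, Z] = b;
  return [
    w * W - x * X - y * Y - z * Z,
    w * X + x * W + y * Z - z * Y,
    w * Y - x * Z + y * W + z * X,
    w * Z + x * Y - y * X + z * W,
  ];
}

// Hamilton's imaginary units written as quaternions.
const I = [0, 1, 0, 0], J = [0, 0, 1, 0], K = [0, 0, 0, 1];

console.log(qmul(I, J));          // k
console.log(qmul(J, I));          // -k
console.log(qmul(qmul(I, J), K)); // -1, i.e. ijk = -1
```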

Interestingly, the sixteen terms use each pairing of one lowercase and one uppercase term exactly once. This turns out to be important: the sum of the squares of the product terms equals the product of the sums of the squares of the original terms. By that I mean:

$$w'^2 + x'^2 + y'^2 + z'^2 = (w^2 + x^2 + y^2 + z^2)(W^2 + X^2 + Y^2 + Z^2)$$
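The identity is also easy to spot-check in JavaScript. This sketch repeats the product formulas as a hypothetical qmul helper (quaternions stored as [w, x, y, z]) and compares both sides for one pair of arbitrary quaternions; the values are chosen so every intermediate result is exact in floating point.

```javascript
// Quaternion product from the four expressions derived above; quaternions are [w, x, y, z].
function qmul(a, b) {
  const [w, x, y, z] = a, [W, X, Y, Z] = b;
  return [
    w * W - x * X - y * Y - z * Z,
    w * X + x * W + y * Z - z * Y,
    w * Y - x * Z + y * W + z * X,
    w * Z + x * Y - y * X + z * W,
  ];
}

// Sum of the squares of a quaternion's four terms.
function normSquared(q) {
  return q.reduce((sum, t) => sum + t * t, 0);
}

const a = [1, 2, 3, 4], b = [5, -6, 7, 0.5];
console.log(normSquared(qmul(a, b)), normSquared(a) * normSquared(b)); // 3307.5 3307.5
```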

A quick check on Wolfram Alpha will show that this is right:

Why is that significant? When we introduce unit vectors and unit quaternions ("unit" in this case meaning the sum of the squared terms equals 1), it follows that the product of two unit quaternions is itself a unit quaternion. Then we can make use of the Pythagorean trigonometric identity:

$$\cos^2\theta + \sin^2\theta = 1$$

...where cos θ subs in for w, where θ is the "magnitude" of the quaternion (more on that in a minute). sin θ gets multiplied by (xi + yj + zk), making the equation for the quaternion expand to:

$$\cos\theta + x\sin\theta\,i + y\sin\theta\,j + z\sin\theta\,k$$

Since [x, y, z] is a unit vector (x² + y² + z² = 1), the sum of the squared terms reduces to the Pythagorean identity above. Without getting bogged down much more in the math, here's the coup de grâce: the angle of rotation, phi, is twice the "magnitude", theta. The reason is perhaps not intuitive, but it is essentially this: to rotate a vector, you multiply it by a quaternion on one side and by that quaternion's inverse on the other, which keeps the result a pure vector of the same length, and each of the two multiplications in effect contributes a rotation of theta, for a total of 2θ = φ. The equation is typically written as:

R(v) = qvq*

...where v is the original vector written as a pure quaternion (0 + xi + yj + zk), R(v) is that vector after rotation, q is a unit quaternion, and q* is its conjugate, w - xi - yj - zk, which for a unit quaternion is the same as its multiplicative inverse. David Eberly's 1999 paper Quaternion Algebra and Calculus dives into the proof in more detail.
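Here is a minimal sketch of R(v) = qvq* in JavaScript, assuming quaternions stored as [w, x, y, z]. The helper names (qmul, conjugate, fromAxisAngle, rotateVector) are hypothetical. Note the half angle in fromAxisAngle: to rotate by φ, the quaternion is built from θ = φ/2.

```javascript
// Quaternion product; quaternions are [w, x, y, z].
function qmul(a, b) {
  const [w, x, y, z] = a, [W, X, Y, Z] = b;
  return [
    w * W - x * X - y * Y - z * Z,
    w * X + x * W + y * Z - z * Y,
    w * Y - x * Z + y * W + z * X,
    w * Z + x * Y - y * X + z * W,
  ];
}

// Conjugate w - xi - yj - zk (the inverse, for a unit quaternion).
function conjugate(q) {
  return [q[0], -q[1], -q[2], -q[3]];
}

// Unit quaternion for a rotation of phi radians around a unit axis: theta = phi / 2.
function fromAxisAngle(axis, phi) {
  const theta = phi / 2, s = Math.sin(theta);
  return [Math.cos(theta), axis[0] * s, axis[1] * s, axis[2] * s];
}

// R(v) = q v q*, with v written as the pure quaternion [0, vx, vy, vz].
function rotateVector(q, v) {
  const r = qmul(qmul(q, [0, v[0], v[1], v[2]]), conjugate(q));
  return r.slice(1); // r[0] is zero, up to rounding, for a unit q
}

const quarterAroundZ = fromAxisAngle([0, 0, 1], Math.PI / 2);
console.log(rotateVector(quarterAroundZ, [1, 0, 0])); // approximately [0, 1, 0]
```

Building the quaternion from φ instead of φ/2 would rotate twice as far, which is an easy mistake to make.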

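Extracting an angle and vector out of a quaternion would be straightforward (just take the arccosine of w, and apply some algebra to undo the rest); however, as it turns out, this is unnecessary in the context of WebGL rotations. Why? Because quaternions can be converted directly into 4 by 4 rotation matrices without any additional trigonometry. The conversion, from Wikipedia, for a unit quaternion a + bi + cj + dk, is this:

$$\begin{pmatrix} a^2 + b^2 - c^2 - d^2 & 2bc - 2ad & 2bd + 2ac \\ 2bc + 2ad & a^2 - b^2 + c^2 - d^2 & 2cd - 2ab \\ 2bd - 2ac & 2cd + 2ab & a^2 - b^2 - c^2 + d^2 \end{pmatrix}$$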

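Since we will only be working with unit quaternions, where the sum of the squared terms is always 1, each diagonal term can be reduced to 1 − 2n² − 2f², where n and f are whichever two terms are subtracted in that matrix element. Substituting our familiar w, x, y, and z for a, b, c, and d, reducing the diagonal terms, and fleshing that out to a 4 by 4 matrix, we get the following transformation from quaternion to rotation matrix (in conventional notation; the JavaScript version is transposed):

$$\begin{pmatrix} 1 - 2y^2 - 2z^2 & 2xy - 2wz & 2xz + 2wy & 0 \\ 2xy + 2wz & 1 - 2x^2 - 2z^2 & 2yz - 2wx & 0 \\ 2xz - 2wy & 2yz + 2wx & 1 - 2x^2 - 2y^2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$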

So now we have a system that lets us track fewer terms for each rotation, combine rotations with fewer operations (multiplying two quaternions takes 16 multiplications, while multiplying two 4 by 4 matrices takes 64), and convert them to a single rotation matrix without additional trigonometry. Lesson 4d is a rewrite of the more natural rotation from Lesson 4c, using quaternions to track everything, and simplifying the vertex shader to one rotation matrix:

uniform   mat4 rotate;      /* Rotation matrix built from the JavaScript quaternion math */
attribute vec3 vertexCoord; /* x, y and z coordinates of current vertex */
attribute vec3 a_color;     /* RGB of current vertex, sent interleaved with vertex coords */
varying   vec3 v_color;     /* Color sent to the fragment shader */
void main(void) { /* Set vertex position with above variable, at normal scale */
    gl_Position = rotate * vec4(vertexCoord, 1);
    v_color = a_color;
}

Next we need helper functions to multiply quaternions, and to turn them into rotation matrices:

function multiplyQuaternion(a, b) {
    var w = a[0], x = a[1], y = a[2], z = a[3],
        W = b[0], X = b[1], Y = b[2], Z = b[3],
        out = [];

    out[0] = w*W - x*X - y*Y - z*Z;
    out[1] = w*X + x*W + y*Z - z*Y;
    out[2] = w*Y - x*Z + y*W + z*X;
    out[3] = w*Z + x*Y - y*X + z*W;

    return out;
}

function quaternionToMatrix(a) {
    var w = a[0], x = a[1], y = a[2], z = a[3],
        wx = w * x * 2, wy = w * y * 2, wz = w * z * 2,
        xy = x * y * 2, xz = x * z * 2, yz = y * z * 2,
        out = [];

    w = w * w * 2;
    x = x * x * 2;
    y = y * y * 2;
    z = z * z * 2;

    out = [
        1 - y - z,   xy + wz,   xz - wy, 0,
          xy - wz, 1 - x - z,   yz + wx, 0,
          xz + wy,   yz - wx, 1 - x - y, 0,
                0,         0,         0, 1
    ];

    return out;
}
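A quick way to convince yourself that the transposed layout is right is to apply the resulting matrix to a point. The sketch below repeats quaternionToMatrix from above (tidied slightly: the unused w * w line is dropped) so it runs on its own, and adds a hypothetical transformPoint helper that applies a flat, column-major 4x4 matrix to the point [x, y, z, 1].

```javascript
// quaternionToMatrix from above: unit quaternion [w, x, y, z] in,
// flat column-major 4x4 rotation matrix out.
function quaternionToMatrix(a) {
  var w = a[0], x = a[1], y = a[2], z = a[3],
      wx = w * x * 2, wy = w * y * 2, wz = w * z * 2,
      xy = x * y * 2, xz = x * z * 2, yz = y * z * 2;

  // Doubled squares for the reduced diagonal terms.
  x = x * x * 2;
  y = y * y * 2;
  z = z * z * 2;

  return [
    1 - y - z,   xy + wz,   xz - wy, 0,
      xy - wz, 1 - x - z,   yz + wx, 0,
      xz + wy,   yz - wx, 1 - x - y, 0,
            0,         0,         0, 1
  ];
}

// Hypothetical helper: apply a column-major 4x4 matrix to the point [x, y, z, 1].
function transformPoint(m, p) {
  return [0, 1, 2].map(r => m[r] * p[0] + m[4 + r] * p[1] + m[8 + r] * p[2] + m[12 + r]);
}

const h = Math.SQRT1_2;        // cos(45 degrees) = sin(45 degrees)
const quarterZ = [h, 0, 0, h]; // quarter turn around Z, as [w, x, y, z]
console.log(transformPoint(quaternionToMatrix(quarterZ), [1, 0, 0])); // approximately [0, 1, 0]
```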

The mouse handler functions then become much simpler:

// On mouse down, record the click position in initX/initY and reset the working quaternion
window.onmousedown = function(e) {
    initX = e.pageX;
    initY = e.pageY;
    working = [1, 0, 0, 0];
}

// On drags, update working quaternion, multiply with tracking to get a final
// rotation quaternion, call draw()
window.onmousemove = function(e){
    var buttonDown = e.buttons == null ? e.which : e.buttons;
    if (!buttonDown) return;

    var x = e.pageX - initX, y = e.pageY - initY;
    if (x == 0 && y == 0) return;

    // Distance dragged: root sum of squares of [x,y]
    var dist = Math.sqrt(x * x + y * y);

    // Turn distance into rotation angle, find its sine and cosine
    var angle = dist / 256,
        w = Math.cos(angle),
        s = Math.sin(angle);

    // Normalize coordinates, flip signs
    x *= -s/dist;
    y *= -s/dist;

    // Rotate on X/Y plane for normal drags, X/Z plane for right-drags
    working = buttonDown > 1 ? [w, y, 0, x] : [w, y, x, 0];
    draw(working);
}

window.onmouseup = function(e) {
    tracking = multiplyQuaternion(working, tracking);
    working = [1, 0, 0, 0];
}

What I've done with the mousemove function is notable: I'm turning a distance from the original click point into the rotation angle, and normalizing those values to become the rotation axis vector, another way quaternions are surprisingly well-matched to a 3D UI.
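To see why this always yields a valid rotation, here is the mousemove math pulled out into a hypothetical pure function, dragToQuaternion, so it can be checked outside the browser. It assumes the same 256-pixels-per-radian scale as the handler. Because the axis terms are scaled by sin(angle)/dist and w is cos(angle), the result is always a unit quaternion. Also, since angle plays the role of theta here, the cube actually turns by twice dist / 256.

```javascript
// dx, dy: pixel offsets from the click point; rightButton selects the X/Z variant.
// Returns a unit quaternion [w, x, y, z], mirroring the mousemove handler above.
function dragToQuaternion(dx, dy, rightButton) {
  const dist = Math.hypot(dx, dy);
  if (dist === 0) return [1, 0, 0, 0]; // no drag: identity quaternion
  const angle = dist / 256;            // this is theta; the visible rotation is 2 * theta
  const w = Math.cos(angle), s = Math.sin(angle);
  const x = dx * -s / dist, y = dy * -s / dist;
  return rightButton ? [w, y, 0, x] : [w, y, x, 0];
}

const dragQ = dragToQuaternion(30, -40, false);
console.log(dragQ.reduce((sum, t) => sum + t * t, 0)); // 1, up to rounding
```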

Finally, the draw function can be greatly simplified over the previous iteration:

function draw(quat) {
    if (quat == null) quat = tracking;
    else quat = multiplyQuaternion(quat, tracking);
    var rotate = quaternionToMatrix(quat);
    gl.uniformMatrix4fv(rLoc, false, rotate);

    // Clear color and depth buffers
    gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);
    gl.drawElements(gl.TRIANGLES, numIndices, gl.UNSIGNED_SHORT, 0);
}

That's it! Combine the tracking quaternion with the in-progress quaternion if one was passed, convert the result into a rotation matrix, pass that matrix to the shader, and let fly. We rely on mouseup events to fold the latest rotation into "tracking".

Orthographic vs. perspective transformation

The OpenGL ES 2.0 spec describes in section 2.12 how W, the homogeneous coordinate, is used to transform coordinates from clip space to normalized device coordinates:

Divide all the Cartesian coordinates by W. That's it. In an orthographic projection, the 3D object being rendered is simply flattened, because W is always 1. If you are looking head-on at a cube rendered orthographically, you just see a square.

Perspective projection, on the other hand, is exactly what they taught you in 8th grade art class: as objects get further away, they converge on the "vanishing point". The straightforward way to do this in WebGL is to link the Z axis to W. If we look at the algebraic equations for transformation matrices again, the element L in the bottom row, third column, adds L times Z to the output W...

...so setting L to a positive value links higher Z values with higher W, and the device x and y coordinates get divided by larger numbers, which pushes the pixels closer to the origin point.

Lesson 5 is a non-interactive example of using this method to render a simple wireframe cube in one-point perspective. The end result looks like this:

This was accomplished simply by taking an identity matrix and setting L to 1 (remember, the JavaScript matrix layout is transposed):

var perspective = [
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1
];
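To see the divide in action, here is a sketch that applies this matrix to a point and then performs the perspective divide by hand, roughly what the GPU does after the vertex shader (real hardware also clips before dividing). The helper name projectPoint is hypothetical, and the sample points assume Z stays above -1 so that W remains positive.

```javascript
// Multiply a flat, column-major 4x4 matrix by [x, y, z, 1], then divide by W.
function projectPoint(m, p) {
  const v = [p[0], p[1], p[2], 1];
  const clip = [0, 1, 2, 3].map(r =>
    m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3]);
  return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]];
}

const perspective = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 1,
  0, 0, 0, 1
];

console.log(projectPoint(perspective, [0.5, 0.5, 0]));   // W = 1: x and y unchanged
console.log(projectPoint(perspective, [0.5, 0.5, 0.5])); // W = 1.5: x and y shrink to 1/3
```

With L = 1, W works out to Z + 1, so points deeper in the frame are pulled toward the center in proportion to their depth.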

What if you want two-point perspective, like looking at the edge of a building, and seeing both sides extend to their respective vanishing points? It's actually easier than you might imagine: Rotate first, then apply the perspective matrix.

Lesson 5b shows exactly this, taking the same wireframe cube and rotating it 45 degrees around the Y axis. The points that originally had the highest Z values are now lower and off to the left, and the right side of the face that originally faced us is now deeper in the frame. Both of these get squished towards the origin, resulting in this effect:
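The same order of operations can be sketched numerically: rotate first, then apply the one-point perspective matrix. This assumes a hypothetical cube spanning ±0.25 (Lesson 5b's actual coordinates may differ), a standard right-handed Y rotation written in column-major order, and the hypothetical helpers multiplyMat4, projectPoint, and edgeHeight. After the 45-degree turn, the nearest vertical edge projects tallest, the two side edges shrink equally, and the farthest edge is shortest: the faces recede toward two different sides, which is two-point perspective.

```javascript
// n * o for flat, column-major 4x4 matrices.
function multiplyMat4(n, o) {
  const out = new Array(16);
  for (let c = 0; c < 4; c++)
    for (let r = 0; r < 4; r++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += n[k * 4 + r] * o[c * 4 + k];
      out[c * 4 + r] = sum;
    }
  return out;
}

// Apply a column-major 4x4 matrix to [x, y, z, 1], then divide by W.
function projectPoint(m, p) {
  const clip = [0, 1, 2, 3].map(r => m[r] * p[0] + m[4 + r] * p[1] + m[8 + r] * p[2] + m[12 + r]);
  return [clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]];
}

const c = Math.cos(Math.PI / 4), s = Math.sin(Math.PI / 4);
const rotateY = [c, 0, -s, 0,  0, 1, 0, 0,  s, 0, c, 0,  0, 0, 0, 1]; // 45 degrees around Y
const perspective = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 1,  0, 0, 0, 1];
const combined = multiplyMat4(perspective, rotateY); // rotation is applied first

// Projected height of the cube's vertical edge at (x, z).
const half = 0.25;
function edgeHeight(m, x, z) {
  return projectPoint(m, [x, half, z])[1] - projectPoint(m, [x, -half, z])[1];
}

for (const [x, z] of [[half, -half], [half, half], [-half, -half], [-half, half]]) {
  console.log(x, z, edgeHeight(combined, x, z).toFixed(3));
}
```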

I think that's a good starting point, and I congratulate you if you've actually managed to plow your way through this. I haven't covered textures or lighting, which are the next logical topics to investigate, but will leave it to you to research them if you're so inclined. I also haven't created any objects more complicated than a cube, but I assure you that with these basics and a search engine, you're only a couple of steps away from, say, creating a spherical map of Earth behind a torus of Earthsea:

Enjoy!

Curtis