June 8, 2012

Doom3 Source Code Review: Scripting VM (Part 5 of 6)

From idTech1 to idTech3 the only thing that completely changed every time was the scripting system:

idTech4 is no exception; once again, everything is different:

A good introduction is to read the Doom3 Scripting SDK notes.

Architecture

Here is the big picture:

Compilation: At load time the idCompiler is fed one predetermined .script file. A series of #include directives results in a script stack that contains the text of every script and the source code of every function. It is scanned by an idLexer that generates basic tokens. The tokens enter the idParser, and one giant bytecode is generated and stored in the idProgram singleton. This constitutes the Virtual Machine RAM and contains both the .text and .data VM segments.


Virtual Machine: At runtime the engine allocates real CPU time to each idThread, one after another, until the end of the linked list is reached. Each idThread contains an idInterpreter that saves the state of the virtual CPU. Unless the interpreter goes wild and runs for more than 5,000,000 instructions, it will not be pre-empted by the CPU: this is collaborative multitasking (more commonly called cooperative multitasking).
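To make the scheduling idea concrete, here is a minimal sketch of a round-robin loop with a runaway guard. It is not the engine's code: ToyThread, RunSlice, RunFrame and kRunawayLimit are names I made up for illustration, and a "script" is reduced to a counter of pending instructions. Only the 5,000,000 figure comes from the text above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a script thread (NOT the engine's idThread).
struct ToyThread {
    int remainingWork;  // pretend instructions left in the script
    int workPerSlice;   // instructions the script runs before it yields
    bool finished;
};

// Assumed budget, taken from the 5,000,000 figure quoted in the text.
constexpr long kRunawayLimit = 5000000;

// Runs one thread until it yields voluntarily or ends.
// Returns the number of instructions executed, or -1 if the guard tripped.
long RunSlice(ToyThread& t) {
    assert(t.workPerSlice > 0);
    long executed = 0;
    int sliceLeft = t.workPerSlice;
    while (t.remainingWork > 0 && sliceLeft > 0) {
        if (++executed > kRunawayLimit) {
            t.finished = true;  // the script never yielded: kill it
            return -1;
        }
        --t.remainingWork;
        --sliceLeft;
    }
    if (t.remainingWork == 0) {
        t.finished = true;
    }
    return executed;
}

// One frame: every live thread gets exactly one slice, in list order.
// Returns the total number of instructions executed by well-behaved threads.
long RunFrame(std::vector<ToyThread>& threads) {
    long total = 0;
    for (ToyThread& t : threads) {
        if (t.finished) {
            continue;
        }
        long n = RunSlice(t);
        if (n > 0) {
            total += n;
        }
    }
    return total;
}
```

Calling RunFrame once per game frame on a vector of ToyThread gives each script a turn; nothing interrupts a script in the middle of a slice, which is the whole point of the cooperative model. The guard only exists to stop a script that never gives control back.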

Compiler

The compilation pipeline is similar to what you would find in any compiler, such as V8 from Google or Clang, except that there is no preprocessor. Hence tasks such as comment skipping, macros and directives (#include, #if) have to be handled in the lexer and the parser.

Since the idLexer is reused all across the engine to parse every text asset (maps, entities, camera paths), it is very primitive. As an example, it only returns five types of tokens:

So the parser actually has to perform much more than in a "standard" compiler pipeline.

At startup the idCompiler loads the first script, script/doom_main.script. A series of #include directives then builds a stack of scripts that are combined into one giant script.
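The "stack of scripts" idea can be sketched in a few lines. This is only an illustration under my own assumptions: the scripts live in an in-memory map instead of on disk, a directive must sit alone on its line in the form #include "name", and ScriptFiles, OpenScript and ExpandIncludes are hypothetical names, not engine identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": script name -> script text.
using ScriptFiles = std::map<std::string, std::string>;

// A script that has been opened but not fully read yet.
struct OpenScript {
    const std::string* text;  // script being read
    std::size_t pos;          // index of the next unread character
};

// Expands #include "name" lines depth-first with an explicit stack of
// partially read scripts and returns the combined source text.
// Unknown script names are silently skipped in this sketch.
std::string ExpandIncludes(const ScriptFiles& files, const std::string& root) {
    std::vector<OpenScript> stack;
    std::string combined;
    auto open = [&](const std::string& name) {
        assert(stack.size() < 64 && "include nesting too deep (cycle?)");
        auto it = files.find(name);
        if (it != files.end()) {
            stack.push_back({&it->second, 0});
        }
    };
    open(root);
    const std::string directive = "#include \"";
    while (!stack.empty()) {
        OpenScript& top = stack.back();
        if (top.pos >= top.text->size()) {
            stack.pop_back();  // end of this script: resume the includer
            continue;
        }
        // Cut the next line out of the script on top of the stack.
        std::size_t eol = top.text->find('\n', top.pos);
        if (eol == std::string::npos) {
            eol = top.text->size();
        }
        std::string line = top.text->substr(top.pos, eol - top.pos);
        top.pos = (eol < top.text->size()) ? eol + 1 : eol;
        if (line.compare(0, directive.size(), directive) == 0) {
            std::size_t close = line.find('"', directive.size());
            if (close != std::string::npos) {
                // Push the included script; `top` is not used after this.
                open(line.substr(directive.size(), close - directive.size()));
            }
        } else {
            combined += line;
            combined += '\n';
        }
    }
    return combined;
}
```

With a map holding a root script that includes a second one, ExpandIncludes returns the lines of the included script spliced in at the position of the directive, which is the "one giant script" the compiler then works on.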

The parser seems to be a standard recursive-descent, top-down parser. The scripting language grammar seems to be LL(1), requiring no backtracking (even though the lexer has the capability to "unread" up to one token). If you ever got a chance to read the dragon book you will not be lost... otherwise this is a good reason to get started ;) !
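For readers who have not met recursive descent before, here is a small sketch of the technique on a toy arithmetic grammar (not the Doom3 scripting grammar). The lexer can push back exactly one token, which is all an LL(1) grammar needs. Token, ToyLexer, ParseExpr, ParseTerm, ParseFactor and Evaluate are names invented for this example, and error handling is left out on purpose.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct Token {
    enum Kind { Number, Punct, End } kind;
    int value;   // valid when kind == Number
    char punct;  // valid when kind == Punct
};

class ToyLexer {
public:
    explicit ToyLexer(const std::string& src)
        : src_(src), pos_(0), hasUnread_(false), unread_{Token::End, 0, 0} {}

    Token Read() {
        if (hasUnread_) {  // serve the pushed-back token first
            hasUnread_ = false;
            return unread_;
        }
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        if (pos_ >= src_.size()) return Token{Token::End, 0, 0};
        char c = src_[pos_];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            int v = 0;
            while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
                v = v * 10 + (src_[pos_++] - '0');
            return Token{Token::Number, v, 0};
        }
        ++pos_;
        return Token{Token::Punct, 0, c};
    }

    // Pushes back exactly one token; two in a row would be a parser bug.
    void Unread(const Token& t) {
        assert(!hasUnread_);
        unread_ = t;
        hasUnread_ = true;
    }

private:
    std::string src_;
    std::size_t pos_;
    bool hasUnread_;
    Token unread_;
};

int ParseExpr(ToyLexer& lex);

// factor := NUMBER | '(' expr ')'
int ParseFactor(ToyLexer& lex) {
    Token t = lex.Read();
    if (t.kind == Token::Number) return t.value;
    if (t.kind == Token::Punct && t.punct == '(') {
        int v = ParseExpr(lex);
        lex.Read();  // consume ')'; a real parser would check it
        return v;
    }
    return 0;  // malformed input; a real parser would report an error
}

// term := factor ( '*' factor )*
int ParseTerm(ToyLexer& lex) {
    int v = ParseFactor(lex);
    for (;;) {
        Token t = lex.Read();
        if (t.kind == Token::Punct && t.punct == '*') {
            v *= ParseFactor(lex);
        } else {
            lex.Unread(t);  // not ours: hand it back to the caller
            return v;
        }
    }
}

// expr := term ( ('+' | '-') term )*
int ParseExpr(ToyLexer& lex) {
    int v = ParseTerm(lex);
    for (;;) {
        Token t = lex.Read();
        if (t.kind == Token::Punct && t.punct == '+') {
            v += ParseTerm(lex);
        } else if (t.kind == Token::Punct && t.punct == '-') {
            v -= ParseTerm(lex);
        } else {
            lex.Unread(t);
            return v;
        }
    }
}

int Evaluate(const std::string& src) {
    ToyLexer lex(src);
    return ParseExpr(lex);
}
```

Each grammar rule becomes one function, and each function decides what to do by looking at a single token. When the token belongs to someone else it is pushed back and the function returns; nothing already parsed is ever re-read, so there is no backtracking.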

Interpreter

At runtime, events trigger the creation of idThreads, which are not operating system threads but Virtual Machine threads. They are given some runtime by the CPU. Each idThread has an idInterpreter that keeps track of the instruction pointer and the two stacks (one for the data/parameters and one to keep track of the function calls).

Execution occurs in idInterpreter::Execute until the interpreter relinquishes control of the Virtual Machine: this is collaborative multitasking.


  idThread::Execute
   bool idInterpreter::Execute(void)
   {
       doneProcessing = false;
       while( !doneProcessing && !threadDying ) 
       {
           instructionPointer++;
       
           st = &gameLocal.program.GetStatement( instructionPointer );
           
           // op is an unsigned short, so the VM can encode up to 65,536 distinct opcodes
           switch( st->op ) {
                   .
                   .
                   .
           }
       }    
   }
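To make the loop above concrete, here is a toy interpreter built around the same shape: a pre-incremented instruction pointer, a switch on the opcode, a data stack and a separate call stack. Everything in it is hypothetical (the ToyOp opcodes, ToyStatement, ToyInterpreter, MakeDemoProgram), and the convention that Execute returns true once the thread is dying is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes; the real engine defines its own, much larger set.
enum ToyOp : std::uint16_t { OP_PUSH, OP_ADD, OP_CALL, OP_RETURN, OP_WAIT, OP_HALT };

struct ToyStatement {
    ToyOp op;
    int arg;  // immediate for OP_PUSH, target statement index for OP_CALL
};

struct ToyInterpreter {
    const std::vector<ToyStatement>* program;
    int instructionPointer;      // index of the last executed statement
    std::vector<int> dataStack;  // values and parameters
    std::vector<int> callStack;  // return addresses
    bool doneProcessing;
    bool threadDying;

    explicit ToyInterpreter(const std::vector<ToyStatement>& p)
        : program(&p), instructionPointer(-1), doneProcessing(false), threadDying(false) {}

    // Runs until the script yields (OP_WAIT) or ends (OP_HALT).
    // Returns true when the thread is finished (assumed convention).
    bool Execute() {
        doneProcessing = false;
        while (!doneProcessing && !threadDying) {
            ++instructionPointer;
            assert(instructionPointer >= 0 &&
                   instructionPointer < static_cast<int>(program->size()));
            const ToyStatement& st = (*program)[instructionPointer];
            switch (st.op) {
            case OP_PUSH:
                dataStack.push_back(st.arg);
                break;
            case OP_ADD: {
                int b = dataStack.back(); dataStack.pop_back();
                int a = dataStack.back(); dataStack.pop_back();
                dataStack.push_back(a + b);
                break;
            }
            case OP_CALL:
                callStack.push_back(instructionPointer);  // resume after the call
                instructionPointer = st.arg - 1;          // pre-increment lands on target
                break;
            case OP_RETURN:
                instructionPointer = callStack.back();
                callStack.pop_back();
                break;
            case OP_WAIT:
                doneProcessing = true;  // yield: state is kept for the next call
                break;
            case OP_HALT:
                threadDying = true;
                break;
            }
        }
        return threadDying;
    }
};

// Demo script: push 2, call a function that adds 3, yield, then add 10.
std::vector<ToyStatement> MakeDemoProgram() {
    return {
        {OP_PUSH, 2}, {OP_CALL, 6}, {OP_WAIT, 0},
        {OP_PUSH, 10}, {OP_ADD, 0}, {OP_HALT, 0},
        {OP_PUSH, 3}, {OP_ADD, 0}, {OP_RETURN, 0},  // function body at index 6
    };
}
```

The first call to Execute on the demo program stops at OP_WAIT with 5 on the data stack; the second call resumes right after the wait and finishes with 15. Because all the state lives in the interpreter object, yielding costs nothing more than leaving the loop.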


Once the idInterpreter relinquishes control, the next idThread::Execute method is called, until no more threads need execution time. The overall architecture reminded me a lot of the Another World VM design.

Trivia: The bytecode is never converted to x86 instructions since it was not meant to be heavily used. But in the end too much was done via scripting, and Doom3 would probably have benefited immensely from a JIT x86 converter just like the one Quake3 had.

Recommended readings

A great way to understand more about the virtual machine is to read the classic Compilers: Principles, Techniques, and Tools.



 
