JIT Compiler: Really fast POVRay FPU

The highly experimental JITC can considerably speed up POVRay renderings that involve many calls to the function VM, such as tracing isosurfaces or parametric objects, by actually compiling the user functions with GCC. The patch applies against UNIX POVRay-3.6.1.

News

2004-08-11: Version 0.4.1: Patch now applies against POVRay-3.6.1.
2004-08-08: Version 0.4: Fixed a bug where incorrect code was produced, which led to a GCC compilation failure and consequently to a fallback to the built-in VM.
2004-08-06: Version 0.3a: Minor change to avoid compiler error for assert().
2004-08-03: Initial release (version 0.3). This is a highly experimental patch.

The idea...

The function VM (virtual machine) in POVRay is called whenever user functions need to be evaluated, such as for isosurfaces or the parametric object. For a number of scenes (especially those which consist mostly of an isosurface), the speed of the POV VM is a major factor in rendering time. Since the POV VM interprets assembly code (produced from the user functions at the parsing stage), there is room for speed improvements by actually compiling the code for the real CPU/FPU in the computer.

Hence, one day I decided to give the just-in-time compilation approach a chance and implement it.

Actually, just-in-time compilation of the function code is not really a new idea: The PPC/MacOS version of POVRay already comes with a built-in JIT compiler which compiles POV VM code directly into PPC instructions without the help of external programs such as GCC. The disadvantage is that some optimization opportunities get lost, but the advantage is that compilation is faster and all the fuss with external programs, source code, compiler options and shared libraries (see implementation immediately below) is avoided. However, it is far easier to compile the POV VM code into PPC RISC code with a number of general purpose FP registers than to translate it into i387 FPU code (which has a register stack of size 8).

...and the implementation

The JIT compiler simply compiles the assembly code for the POV VM. This is done by translating the POV VM assembler code into C++ source code (one C++ function for every user function). All these functions are then collected (in their string representation) until the POV VM is called to evaluate such a function.
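
The translation step can be illustrated with a toy example. The following Python sketch turns a list of made-up register instructions into the source text of one function. The opcode names (load, move, add, mul, ret), the register naming and the function signature are assumptions for illustration only; they do not reflect the real POV VM instruction set or the code the patch actually emits (the patch itself is written in C++).

```python
def translate(name, params, program, nregs=4):
    """Return C/C++ source text for one function built from toy VM code."""
    args = ", ".join("double " + p for p in params)
    regs = ", ".join("r%d = 0.0" % i for i in range(nregs))
    lines = ["double %s(%s)" % (name, args), "{", "    double %s;" % regs]
    for op, *operands in program:
        if op == "load":      # copy a function parameter into a register
            lines.append("    r%d = %s;" % tuple(operands))
        elif op == "move":    # register-to-register copy
            lines.append("    r%d = r%d;" % tuple(operands))
        elif op == "add":
            lines.append("    r%d += r%d;" % tuple(operands))
        elif op == "mul":
            lines.append("    r%d *= r%d;" % tuple(operands))
        elif op == "ret":
            lines.append("    return r%d;" % tuple(operands))
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy code for f(x, y) = x*x + y
program = [
    ("load", 0, "x"),
    ("move", 1, 0),
    ("mul", 0, 1),
    ("load", 1, "y"),
    ("add", 0, 1),
    ("ret", 0),
]
source = translate("POV_JIT_FPU_0", ["x", "y"], program)
print(source)
```

The generated text contains a register-to-register move that an optimizing compiler is free to eliminate, which is one reason for handing the source to a real compiler instead of interpreting it.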

At that point, all the functions gathered so far are written into a temporary file which is compiled into a shared object (the UNIX analogue of DLLs on Windows) using the system compiler (GCC works, other compilers will need adjustments). This shared object is then loaded and allows POVRay to directly call the compiled versions of the user functions. All of this works on-the-fly without the need for the user to do anything special.

The generated shared object will be named jit-X.so (in the current directory) where X is a serial number which is increased each time a shared object is created. Care has been taken to put as many functions as possible into a single shared object. If all the functions in a shared object are deleted from the VM, the object is unloaded. You will normally not see the shared object file since it is unlinked (removed) as soon as it has been loaded.
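
The sequence described above (write source, compile to a shared object, load it, unlink the file) can be sketched in a few lines of Python. This is only an illustration of the technique, not the patch's code: the patch is C++ and calls dlopen() itself, whereas here ctypes.CDLL does the loading (it uses dlopen() on UNIX). The function body, its three-double signature, the compiler flags and the use of a temporary directory instead of the current directory are all assumptions; the symbol name merely mimics the POV_JIT_FPU_<n> names seen in the log output further down.

```python
import ctypes
import os
import shutil
import subprocess
import tempfile

# Hypothetical generated source: one function per user function.
SOURCE = """
double POV_JIT_FPU_0(double x, double y, double z)
{
    return x * x + y * y + z * z - 1.0;
}
"""


def compile_and_load(source, serial=0):
    """Compile source into jit-<serial>.so, load it, then remove the file.

    Returns the loaded library, or None if no compiler is available or the
    compilation fails (the caller would then fall back to interpreting).
    """
    compiler = shutil.which("gcc") or shutil.which("cc")
    if compiler is None:
        return None
    workdir = tempfile.mkdtemp(prefix="jit-demo-")
    src_path = os.path.join(workdir, "jit-%d.c" % serial)
    so_path = os.path.join(workdir, "jit-%d.so" % serial)
    try:
        with open(src_path, "w") as f:
            f.write(source)
        proc = subprocess.run(
            [compiler, "-O2", "-fPIC", "-shared", src_path, "-o", so_path],
            capture_output=True,
        )
        if proc.returncode != 0:
            return None
        return ctypes.CDLL(so_path)
    except OSError:
        return None
    finally:
        # The files are removed right away; on UNIX an already loaded
        # shared object stays mapped after its file has been unlinked.
        shutil.rmtree(workdir, ignore_errors=True)


lib = compile_and_load(SOURCE)
if lib is not None:
    fn = lib.POV_JIT_FPU_0
    fn.restype = ctypes.c_double
    fn.argtypes = [ctypes.c_double] * 3
    value = fn(1.0, 2.0, 2.0)
else:
    value = None
print(value)
```

Returning None on any failure mirrors the fallback idea: if compilation does not succeed, the caller can keep using the interpreter.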

Since the JIT compiler patch involves things like shared object loading, it is highly system specific. The patch provided here works fine for me on my i386 Linux system. It should also work on other Linux/GNU systems (i.e. using GNU compiler and linker) but will definitely not work on Windows. It may, however, be portable to MacOS X with little effort.

The JITC patch needs the POVRay source code because several include files from POVRay are needed when compiling the user functions. Since it also requires the conf.h file, you should also not remove the build directory after having built POVRay. The JIT compiler uses the same flags and directories as were used to build POVRay; this information is statically compiled into POVRay during build of the patched version. The JIT compiler must explicitly be enabled using an environment var, see usage below.

I know that the implementation is not very clean in all points (especially note the part changing fnpovfpu.cpp). Actually, this is the first time I have worked with runtime loading of shared objects, and during the implementation I encountered some problems which I had not anticipated.

What is it good for?

The JITC patch is primarily useful for scene renderings which are dominated by function VM calls. For example, if you want to trace things like mathematical isosurfaces, expect a speed increase by a factor of 2 to 3. However, if you are tracing an isosurface landscape whose rendering time is mostly spent calculating complicated pattern functions, the benefit will be small. See also the examples below.

Performance considerations

The advantage of the JITC approach presented here is that GCC and all its optimization capabilities can be used. (E.g. most of the pointless register moves generated by the POV VM are optimized away - although it turns out that this example alone is not responsible for a large performance gain.)

The downside is that running GCC takes some time: typically 3 to 4 seconds on my box when functions.inc is included and only a couple of functions are used, which sums up to about 100 functions, most of them from the include file. However, calling functions in the dynamically linked library does not introduce noticeable overhead. (I did several measurements, including verification of the produced assembler code, which showed that result.) The only overhead (apart from compiling the code and loading the library) is the function lookup, which has to be performed only once and can therefore be neglected.
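
The lookup-once behaviour can be sketched as a small cache in Python. How the patch actually stores the resolved addresses is not described here, so the class, its counter and the resolve callback (a stand-in for the symbol lookup in the shared object) are hypothetical.

```python
class CompiledFunctions:
    """Cache of resolved function entry points, keyed by function number."""

    def __init__(self, resolve):
        self._resolve = resolve   # stand-in for the dynamic symbol lookup
        self._cache = {}
        self.lookups = 0          # counts how often a lookup really happened

    def call(self, number, *args):
        fn = self._cache.get(number)
        if fn is None:
            # First call only: resolve the symbol and remember the result.
            self.lookups += 1
            fn = self._resolve("POV_JIT_FPU_%d" % number)
            self._cache[number] = fn
        return fn(*args)


# A dictionary plays the role of the loaded shared object in this sketch.
symbols = {"POV_JIT_FPU_0": lambda x, y, z: x * x + y * y + z * z - 1.0}
table = CompiledFunctions(symbols.__getitem__)
results = [table.call(0, float(i), 0.0, 0.0) for i in range(1000)]
print(table.lookups)
```

However many evaluations follow, the lookup cost is paid once per function, which is why it does not show up in the render time.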

Download and Install

Download: The JITC patch can only be obtained as a patch against UNIX POVRay-3.6.1.
I will neither provide binaries nor port that patch to any other platform. If you want to do so, please contact me.

Source:		jitc-patch.diff [81kb patch diff]
Version:		0.4.1 (2004-08-11)
Author:		Wolfgang Wieser (report bugs here)
License:		POVRay license (povlegal.doc)

Install: First, patch your POVRay-3.6.1 using patch(1).
Then, run aclocal, autoconf, autoheader and automake (suggested in this order) because some Makefile.am and configure.ac were changed by the patch.
Finally, configure and compile POVRay as usual.

Note that the JITC patch needs the POVRay sources and the build directory (with conf.h) to remain at the exact locations used during the build, so leave the sources and the build directory on your hard disk. The configure script automatically detects the directories, and these are compiled statically into the patched version of POVRay.

Activate: The JITC-patched POVRay should behave exactly like the non-patched version. To enable the JIT compiler, set the environment variable POV_USE_JITCOMPILER to yes.

Bugs: The patch is highly experimental. If you find any bugs, especially functions for which it does not work correctly, please contact me.

Usage (important)

Using POVRay with the JITC patch should not be any different from using normal POVRay. In order to enable the patch, you need to set the environment variable POV_USE_JITCOMPILER to "yes". (Use no environment variable at all or the value "no" to disable it.) In bash(1), for example, this is done via:
export POV_USE_JITCOMPILER="yes"
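
As an illustration of the switch, a check along the following lines would match the behaviour described above. It is a Python sketch, not the patch's code; in particular, how the patch treats values other than "yes" and "no" (such as "YES" or "1") is not stated, so treating everything except "yes" as disabled is an assumption.

```python
import os


def jit_enabled(environ=None):
    """Return True only if POV_USE_JITCOMPILER is set to "yes"."""
    if environ is None:
        environ = os.environ
    # Unset variable and "no" both mean disabled; so does any other value
    # in this sketch (assumption).
    return environ.get("POV_USE_JITCOMPILER", "no") == "yes"


print(jit_enabled({"POV_USE_JITCOMPILER": "yes"}))
print(jit_enabled({}))
```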

Once the JIT compiler is enabled, it should automatically compile the functions. In case it fails, you should see error messages and POVRay will revert to the slower built-in POV VM. A successful compile should look like this in the terminal:

Mapping background image

  0:00:00 Rendering line 1 of 120
JIT compiler: g++ -x c++  -pipe -Wno-multichar -O3 -march=athlon-xp
 -malign-double -minline-all-stringops -ffast-math -Wno-multichar
 -funit-at-a-time -fno-rtti -Wno-all -DHAVE_CONFIG_H -nostartfiles
 -shared -I/path/to/povray-3.6.1-modified/source
 -I/path/to/povray-3.6.1-modified/source/base
 -I/path/to/povray-3.6.1-modified/unix -I/path/to/povray-3.6.1-modified-build
 /tmp/jitcompiler-sjC6aD -o ./jit-0.so
JIT compiler: dlopen(./jit-0.so)... OK
JIT compiler: DL_Lookup.........................................................
..........................................................OK
JIT Compiler (114 functions): success
JIT compiler: VM lookup: POV_JIT_FPU_113 -> 0x40440fe0
JIT compiler: VM lookup: POV_JIT_FPU_76 -> 0x4043ef60
JIT compiler: VM lookup: POV_JIT_FPU_111 -> 0x40440e80
JIT compiler: VM lookup: POV_JIT_FPU_112 -> 0x40440f30
  0:00:04 Rendering line 20 of 120

Especially note the "JIT compiler" lines.

Example scenes

Finally, let's look at some examples and benchmarks. All were made using JITC-patched POVRay-3.6 on an idle AthlonXP with 1.47GHz running Linux-2.6 and a graphical display. (The unpatched version of POVRay is called "vanilla" and of course both were compiled with the same compiler using the same options etc.)

Alex Kluchikov's "favourite isosurface" in its original form makes heavy use of the POV VM. The relevant code is shown below. It was rendered at 320x320 without anti-aliasing:

Vanilla POVRay:		266 sec		16332263 VM calls
JITC POVRay:		100 sec		16332263 VM calls		speedup: factor 2.66

It is reasonable to expect a factor of about 2.5 for scenes which are dominated by an isosurface of a complicated analytic function without pigment functions.

// Alex Kluchikov, 2003; mail: klkspa[at]ukr.net, aklk[at]mail.ru
function { #declare MPI=16*pi/3;
 #macro tx() (sqrt(x*x+z*z)-1.5) #end #macro ty() y #end
 #macro ttx() tx()*sin(radialf(x,y,z)*MPI)+ty()*cos(radialf(x,y,z)*MPI) #end
 #macro tty() tx()*cos(radialf(x,y,z)*MPI)-ty()*sin(radialf(x,y,z)*MPI) #end
   pow(pow(ttx()+0.25,2)+pow(tty(),2),1/64)*.33
  +pow(pow(ttx()-0.125,2)+pow(tty()+0.216506350946109661690930793,2),1/64)*.33
  +pow(pow(ttx()-0.125,2)+pow(tty()-0.216506350946109661690930793,2),1/64)*.33
   -.945+sin(radialf(x,y,z)*10*pi)*0.01 }

In contrast, my dry lake topography experiment (at 400x400), which is basically several pigment and built-in functions, sees a much smaller speed-up since most of the isosurface function evaluation time is spent in complicated pigment and built-in (f_noise3d) functions:

Vanilla POVRay:		386 sec		25401640 VM calls
JITC POVRay:		337 sec		25401640 VM calls		speedup: factor 1.15

Hence, it is still faster but "just" by 15%.
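
A rough back-of-the-envelope estimate shows why the gain is small. Under Amdahl's law, if a share p of the run time is sped up by a factor s, the overall speedup S satisfies 1/S = (1 - p) + p/s. The Python sketch below solves this for p using the dry lake timings. It assumes that the compiled part speeds up by the same factor 2.66 as in the first benchmark, which is purely an assumption and not something that was measured.

```python
def vm_time_fraction(overall_speedup, vm_speedup):
    """Solve Amdahl's law 1/S = (1 - p) + p/s for the share p."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / vm_speedup)


overall = 386 / 337               # dry lake timings from above
assumed_vm_speedup = 266 / 100    # assumption: factor of the first benchmark
p = vm_time_fraction(overall, assumed_vm_speedup)
print("estimated share of time in compiled code: %.0f%%" % (100 * p))
```

With these inputs, only about a fifth of the original render time would be spent in code that the JIT compiler can speed up; the rest goes into the pigment and built-in functions, which are unaffected.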

function { y - 0.3
  + (f_noise3d(x/8,0,z/8)-0.5)/2
  - fn_crack_large(x,0,z).grey
  - fn_crack_small(x,0,z).grey/10

Finally, let's try a nontrivial parametric object invented by me. It's actually based on the magnetic field in a cavity and then I played around until this flower-like surface came out. Obviously, the rendering speed is dominated by VM speed for this one. Precomputation at parsing stage was set to 14 (to avoid artifacts).

Vanilla POVRay:		288 sec		192902803 VM calls
JITC POVRay:		69 sec		192902803 VM calls		speedup: factor 4.17

Wow, more than 4 times as fast!

Complete POV SDL code: paramflower.pov

function { u*sin(v+sqrt(u))*(1-0.001*sqrt(u)) }
function { 15*sqrt(pow(sin(m*v+0.1*sqrt(u))*    sin(u*0.5-0.1*sqrt(v))/3, 2)+
                   pow(cos(m*v+0.1*sqrt(u))*m/u*sin(u*0.5-0.1*sqrt(v))/3, 2))
           +sqrt(u)-1/pow(u,3) }
function { u*cos(v+sqrt(u))*(1-0.001*sqrt(u)) }
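
The speedup factors quoted for the three scenes follow directly from the measured times. The short Python sketch below recomputes them and additionally derives the wall-clock time saved per VM call; the scene labels are informal, and the per-call figure is only a rough indicator because the total times also include work outside the function VM.

```python
benchmarks = [
    # (scene, vanilla seconds, JITC seconds, VM calls)
    ("favourite isosurface", 266, 100, 16332263),
    ("dry lake topography", 386, 337, 25401640),
    ("parametric flower", 288, 69, 192902803),
]


def summarize(vanilla_s, jitc_s, vm_calls):
    """Return (speedup factor, microseconds saved per VM call)."""
    speedup = vanilla_s / jitc_s
    saved_us_per_call = (vanilla_s - jitc_s) / vm_calls * 1e6
    return speedup, saved_us_per_call


for scene, vanilla_s, jitc_s, vm_calls in benchmarks:
    speedup, saved = summarize(vanilla_s, jitc_s, vm_calls)
    print("%-22s speedup %.2f, %.2f us saved per VM call"
          % (scene, speedup, saved))
```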

Last modified: 2005-03-31 23:09:15