Ruby Speed Guide

Harrison Ainsworth

http://www.hxa.name/
artifex (ατ) hxa7241 (dοτ) org

2007-02-20

Summary

Ruby is very slow. But with some coding effort Ruby programs can be made significantly faster. Here are some guides that emerged from accelerating a small numerical method by a factor of eight. (1300 words)

subject: Software development, Programming, Ruby, Speed, Performance, Optimization
uri: http://www.hxa.name/articles/content/ruby-speed-guide_hxa7241_2007.html
license: Creative Commons BY-SA 3.0 License.

Guide
Code

Guide

(First, profile the program to find what to address.)

Generally: aim for just primitive types — Fixnum, Float, and minimum calculation. Rather like a simple manually optimized C style.

Factor-out everything: obvious common sub-expressions, and implicitly duplicated calculations (the latter has most scope for creativity)
Reduce to basic types and basic variables: Fixnum, Float, and non-structured scalar variables
Unroll loops: simple to do but effective
Inline function calls: including replacing built-ins like abs
Avoid iterations/list-comprehensions/etc.: the hidden structure in that succinctness costs time
Avoid arrays/hashes: creating new objects is probably worst, but navigating structures (indexing) costs too
Avoid objects: for similar reasons to arrays/hashes
Avoid parallel assignment: this appeared to be better (perhaps unexpected...)
Re-use variables: should squeeze out that last little advantage

It is mostly normal optimization, retro-style. Ruby optimizes nothing, and you have fairly detailed control. Also, retrieving stored results is cheaper than calculation.

Code

I translated a minimal global illumination renderer from C++ into Ruby (maybe crazy, but amusing) — http://www.hxa.name/minilight/. A straightforward transliteration of the whole will run almost 1000 times slower than C++. (Ruby 1.8.2 [i386-mswin32] versus MS VC++ Express 2005, Intel P3 1GHz.)

Building the octree spatial index was slow, and the hot-spot was the triangle bound method...

Here are some of the variants tried (following the original C++). The fastest Ruby variant was 8.3 times faster than the slowest, but had 12 times more lines of code. I expect it could be improved further...

variant	speedup factor	codesize expansion
Version 1	1.000	1.00
Version 2	1.529	1.75
Version 3	1.601	2.25
Version 4	3.204	2.75
Version 5	4.502	3.00
Version 6	5.499	4.00
Version 7	5.939	4.00
Version 8	7.814	5.75
Version 9	8.336	12.00

Original C++

void Triangle::getBound
(
   float bound[6]
) const
{
   // initialize
   for( dword i = 6;  i-- > 0;  bound[i] = vertexs_m[2][i % 3] );

   // expand
   for( dword i = 0;  i < 3;  ++i )
   {
      for( dword j = 0, d = 0, m = 0;  j < 6;  ++j, d = j / 3, m = j % 3 )
      {
         // include some tolerance
         const float v = vertexs_m[i][m] + ((d ? 1.0f : -1.0f) *
            (::fabsf(vertexs_m[i][m]) + 1.0f) * TOLERANCE());
         bound[j] = (bound[j] > v) ^ d ? v : bound[j];
      }
   }
}

This is the shortest version (6 lines) I made. (Speed is not a problem.)

(It is often said that choosing the best algorithm is most important and dominates language choice. But in practice, using a fast language is important: a quickly hacked-up dumb solution will probably still be fast enough — saving programming time as well as runtime.)

Version 1

# Axis-aligned bounding box of triangle.
#
# ===return
# * Array of 2 Vector3fc, lower corner and upper corner.
#
def getBound?

   @vertexs.inject( Array.new( 2, @vertexs[2] ) ) do |bound, v|
      tolerance = v.combine?( v ) { |a,b| (a.abs + 1.0) / 1024.0 }
      [bound[0].clampMax?(v - tolerance), bound[1].clampMin?(v + tolerance)]
   end

end

      # @vertexs is an Array of 3 Vector3fc

      # using this method in Vector3fc:
      def combine?( arg )
         Vector3fc.new( yield(@x, arg.x), yield(@y, arg.y), yield(@z, arg.z) )
      end

21.732 seconds, 4 lines. (returns Array of two Vector3fc, instead of Array of 6 Float)

Version 2

# Axis-aligned bounding box of triangle.
#
# ===return
# * Array of 6 Float, lower corner in [0..2], and upper corner in [3..5]
#
def getBound?

   @vertexs.inject( @vertexs[2].to_a * 2 ) do |bound, v|
      bound.each_index do |j|
         d, m = j > 2, j % 3
         vt = v[m] + ((d ? 1.0 : -1.0) * (v[m].abs + 1.0) * TOLERANCE)
         bound[j] = (bound[j] > vt) ^ d ? vt : bound[j]
      end
   end

end

14.214 seconds, 7 lines.

Version 3

bound = @vertexs[2].to_a * 2

@vertexs.each do |v|
   6.times do |j|
      d, m = j > 2, j % 3
      vt = v[m] + ((d ? 1.0 : -1.0) * (v[m].abs + 1.0) * TOLERANCE)
      bound[j] = (bound[j] > vt) ^ d ? vt : bound[j]
   end
end

bound

13.570 seconds, 9 lines.

Version 4

bound = @vertexs[2].to_a * 2

@vertexs.each do |v|
   3.times do |j|
      tolerance = (v[j].abs + 1.0) * TOLERANCE
      c = v[j] - tolerance
      bound[    j] = c if bound[    j] > c
      c = v[j] + tolerance
      bound[3 + j] = c if bound[3 + j] <= c
   end
end

bound

6.783 seconds, 11 lines.

Version 5

bound = [ @vertexs[2].x, @vertexs[2].y, @vertexs[2].z,
   @vertexs[2].x, @vertexs[2].y, @vertexs[2].z ]

3.times do |j|
   2.times do |i|
      v = @vertexs[i]
      bound[    j] = v[j] if bound[    j] > v[j]
      bound[3 + j] = v[j] if bound[3 + j] < v[j]
   end
   bound[    j] -= (bound[    j].abs + 1.0) * TOLERANCE
   bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end

bound

4.827 seconds, 12 lines. (second choice (with factored @vertexs[2]))

Version 6

bound = Array.new( 6, 0.0 )

3.times do |j|
   v0 = @vertexs[0][j]
   v1 = @vertexs[1][j]
   v2 = @vertexs[2][j]
   if v0 < v1
      bound[    j] = v0 < v2 ? v0 : v2
      bound[3 + j] = v1 > v2 ? v1 : v2
   else
      bound[    j] = v1 < v2 ? v1 : v2
      bound[3 + j] = v0 > v2 ? v0 : v2
   end
   bound[    j] -= (bound[    j].abs + 1.0) * TOLERANCE
   bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end

bound

3.952 seconds, 16 lines.

Version 7

v2 = @vertexs[2]
bound = [ v2.x, v2.y, v2.z, v2.x, v2.y, v2.z ]

3.times do |j|
   v0 = @vertexs[0][j]
   v1 = @vertexs[1][j]
   if v0 < v1
      bound[    j] = v0 if v0 < bound[    j]
      bound[3 + j] = v1 if v1 > bound[3 + j]
   else
      bound[    j] = v1 if v1 < bound[    j]
      bound[3 + j] = v0 if v0 > bound[3 + j]
   end
   bound[    j] -= (bound[    j].abs + 1.0) * TOLERANCE
   bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end

bound

3.659 seconds, 16 lines. (first choice)

Version 8

v2 = @vertexs[2]
bound0 = bound3 = v2.x
bound1 = bound4 = v2.y
bound2 = bound5 = v2.z

@vertexs.each do |v|
   e = v[0]
   if    e < bound0 then bound0 = e
   elsif e > bound3 then bound3 = e
   end
   e = v[1]
   if    e < bound1 then bound1 = e
   elsif e > bound4 then bound4 = e
   end
   e = v[2]
   if    e < bound2 then bound2 = e
   elsif e > bound5 then bound5 = e
   end
end

[ bound0 - ((bound0.abs + 1.0) * TOLERANCE),
  bound1 - ((bound1.abs + 1.0) * TOLERANCE),
  bound2 - ((bound2.abs + 1.0) * TOLERANCE),
  bound3 + ((bound3.abs + 1.0) * TOLERANCE),
  bound4 + ((bound4.abs + 1.0) * TOLERANCE),
  bound5 + ((bound5.abs + 1.0) * TOLERANCE) ]

2.781 seconds, 23 lines.

Version 9

v2 = @vertexs[2]
bound0 = bound3 = v2.x
bound1 = bound4 = v2.y
bound2 = bound5 = v2.z

e = v2[0]
if    e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v2[1]
if    e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v2[2]
if    e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end

v1 = @vertexs[1]
e = v1[0]
if    e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v1[1]
if    e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v1[2]
if    e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end

v0 = @vertexs[0]
e = v0[0]
if    e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v0[1]
if    e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v0[2]
if    e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end

[ bound0 - (((bound0 < 0.0 ? -bound0 : bound0) + 1.0) * TOLERANCE),
  bound1 - (((bound1 < 0.0 ? -bound1 : bound1) + 1.0) * TOLERANCE),
  bound2 - (((bound2 < 0.0 ? -bound2 : bound2) + 1.0) * TOLERANCE),
  bound3 + (((bound3 < 0.0 ? -bound3 : bound3) + 1.0) * TOLERANCE),
  bound4 + (((bound4 < 0.0 ? -bound4 : bound4) + 1.0) * TOLERANCE),
  bound5 + (((bound5 < 0.0 ? -bound5 : bound5) + 1.0) * TOLERANCE) ]

2.607 seconds, 48 lines.

Timing harness for Ruby variants

# items filled with triangles from a normal scene 

t0 = Time.now

100.times { items.each { |item| item.getBound? } }

t1 = Time.now
puts "#{t1 - t0}"