Ruby is very slow. But with some coding effort Ruby programs can be made significantly faster. Here are some guides that emerged from accelerating a small numerical method by a factor of eight. (1300 words)
(First, profile the program to find what to address.)
Generally: aim for just primitive types — Fixnum
, Float
, and minimum calculation. Rather like a simple manually optimized C style.
Fixnum
, Float
, and non-structured scalar variablesabs
It is mostly normal optimization, retro-style. Ruby optimizes nothing, and you have fairly detailed control. Also, retrieving stored results is cheaper than calculation.
I translated a minimal global illumination renderer from C++ into Ruby (maybe crazy, but amusing) — http://www.hxa.name/minilight/. A straightforward transliteration of the whole will run almost 1000 times slower than C++. (Ruby 1.8.2 [i386-mswin32] versus MS VC++ Express 2005, Intel P3 1GHz.)
Building the octree spatial index was slow, and the hot-spot was the triangle bound method...
Here are some of the variants tried (following the original C++). The fastest Ruby variant was 8.3 times faster than the slowest, but had 12 times more lines of code. I expect it could be improved further...
variant | speedup factor | codesize expansion |
---|---|---|
Version 1 | 1.000 | 1.00 |
Version 2 | 1.529 | 1.75 |
Version 3 | 1.601 | 2.25 |
Version 4 | 3.204 | 2.75 |
Version 5 | 4.502 | 3.00 |
Version 6 | 5.499 | 4.00 |
Version 7 | 5.939 | 4.00 |
Version 8 | 7.814 | 5.75 |
Version 9 | 8.336 | 12.00 |
void Triangle::getBound
(
float bound[6]
) const
{
// initialize
for( dword i = 6; i-- > 0; bound[i] = vertexs_m[2][i % 3] );
// expand
for( dword i = 0; i < 3; ++i )
{
for( dword j = 0, d = 0, m = 0; j < 6; ++j, d = j / 3, m = j % 3 )
{
// include some tolerance
const float v = vertexs_m[i][m] + ((d ? 1.0f : -1.0f) *
(::fabsf(vertexs_m[i][m]) + 1.0f) * TOLERANCE());
bound[j] = (bound[j] > v) ^ d ? v : bound[j];
}
}
}
This is the shortest version (6 lines) I made. (Speed is not a problem.)
(It is often said that choosing the best algorithm is most important and dominates language choice. But in practice, using a fast language is important: a quickly hacked-up dumb solution will probably still be fast enough — saving programming time as well as runtime.)
# Axis-aligned bounding box of triangle.
#
# ===return
# * Array of 2 Vector3fc, lower corner and upper corner.
#
def getBound?
@vertexs.inject( Array.new( 2, @vertexs[2] ) ) do |bound, v|
tolerance = v.combine?( v ) { |a,b| (a.abs + 1.0) / 1024.0 }
[bound[0].clampMax?(v - tolerance), bound[1].clampMin?(v + tolerance)]
end
end
# @vertexs is an Array of 3 Vector3fc
# using this method in Vector3fc:
def combine?( arg )
Vector3fc.new( yield(@x, arg.x), yield(@y, arg.y), yield(@z, arg.z) )
end
21.732 seconds, 4 lines. (returns Array of two Vector3fc, instead of Array of 6 Float)
# Axis-aligned bounding box of triangle.
#
# ===return
# * Array of 6 Float, lower corner in [0..2], and upper corner in [3..5]
#
def getBound?
@vertexs.inject( @vertexs[2].to_a * 2 ) do |bound, v|
bound.each_index do |j|
d, m = j > 2, j % 3
vt = v[m] + ((d ? 1.0 : -1.0) * (v[m].abs + 1.0) * TOLERANCE)
bound[j] = (bound[j] > vt) ^ d ? vt : bound[j]
end
end
end
14.214 seconds, 7 lines.
bound = @vertexs[2].to_a * 2
@vertexs.each do |v|
6.times do |j|
d, m = j > 2, j % 3
vt = v[m] + ((d ? 1.0 : -1.0) * (v[m].abs + 1.0) * TOLERANCE)
bound[j] = (bound[j] > vt) ^ d ? vt : bound[j]
end
end
bound
13.570 seconds, 9 lines.
bound = @vertexs[2].to_a * 2
@vertexs.each do |v|
3.times do |j|
tolerance = (v[j].abs + 1.0) * TOLERANCE
c = v[j] - tolerance
bound[ j] = c if bound[ j] > c
c = v[j] + tolerance
bound[3 + j] = c if bound[3 + j] <= c
end
end
bound
6.783 seconds, 11 lines.
bound = [ @vertexs[2].x, @vertexs[2].y, @vertexs[2].z,
@vertexs[2].x, @vertexs[2].y, @vertexs[2].z ]
3.times do |j|
2.times do |i|
v = @vertexs[i]
bound[ j] = v[j] if bound[ j] > v[j]
bound[3 + j] = v[j] if bound[3 + j] < v[j]
end
bound[ j] -= (bound[ j].abs + 1.0) * TOLERANCE
bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end
bound
4.827 seconds, 12 lines. (second choice (with factored @vertexs[2]
))
bound = Array.new( 6, 0.0 )
3.times do |j|
v0 = @vertexs[0][j]
v1 = @vertexs[1][j]
v2 = @vertexs[2][j]
if v0 < v1
bound[ j] = v0 < v2 ? v0 : v2
bound[3 + j] = v1 > v2 ? v1 : v2
else
bound[ j] = v1 < v2 ? v1 : v2
bound[3 + j] = v0 > v2 ? v0 : v2
end
bound[ j] -= (bound[ j].abs + 1.0) * TOLERANCE
bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end
bound
3.952 seconds, 16 lines.
v2 = @vertexs[2]
bound = [ v2.x, v2.y, v2.z, v2.x, v2.y, v2.z ]
3.times do |j|
v0 = @vertexs[0][j]
v1 = @vertexs[1][j]
if v0 < v1
bound[ j] = v0 if v0 < bound[ j]
bound[3 + j] = v1 if v1 > bound[3 + j]
else
bound[ j] = v1 if v1 < bound[ j]
bound[3 + j] = v0 if v0 > bound[3 + j]
end
bound[ j] -= (bound[ j].abs + 1.0) * TOLERANCE
bound[3 + j] += (bound[3 + j].abs + 1.0) * TOLERANCE
end
bound
3.659 seconds, 16 lines. (first choice)
v2 = @vertexs[2]
bound0 = bound3 = v2.x
bound1 = bound4 = v2.y
bound2 = bound5 = v2.z
@vertexs.each do |v|
e = v[0]
if e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v[1]
if e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v[2]
if e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end
end
[ bound0 - ((bound0.abs + 1.0) * TOLERANCE),
bound1 - ((bound1.abs + 1.0) * TOLERANCE),
bound2 - ((bound2.abs + 1.0) * TOLERANCE),
bound3 + ((bound3.abs + 1.0) * TOLERANCE),
bound4 + ((bound4.abs + 1.0) * TOLERANCE),
bound5 + ((bound5.abs + 1.0) * TOLERANCE) ]
2.781 seconds, 23 lines.
v2 = @vertexs[2]
bound0 = bound3 = v2.x
bound1 = bound4 = v2.y
bound2 = bound5 = v2.z
e = v2[0]
if e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v2[1]
if e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v2[2]
if e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end
v1 = @vertexs[1]
e = v1[0]
if e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v1[1]
if e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v1[2]
if e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end
v0 = @vertexs[0]
e = v0[0]
if e < bound0 then bound0 = e
elsif e > bound3 then bound3 = e
end
e = v0[1]
if e < bound1 then bound1 = e
elsif e > bound4 then bound4 = e
end
e = v0[2]
if e < bound2 then bound2 = e
elsif e > bound5 then bound5 = e
end
[ bound0 - (((bound0 < 0.0 ? -bound0 : bound0) + 1.0) * TOLERANCE),
bound1 - (((bound1 < 0.0 ? -bound1 : bound1) + 1.0) * TOLERANCE),
bound2 - (((bound2 < 0.0 ? -bound2 : bound2) + 1.0) * TOLERANCE),
bound3 + (((bound3 < 0.0 ? -bound3 : bound3) + 1.0) * TOLERANCE),
bound4 + (((bound4 < 0.0 ? -bound4 : bound4) + 1.0) * TOLERANCE),
bound5 + (((bound5 < 0.0 ? -bound5 : bound5) + 1.0) * TOLERANCE) ]
2.607 seconds, 48 lines.
# items filled with triangles from a normal scene
t0 = Time.now
100.times { items.each { |item| item.getBound? } }
t1 = Time.now
puts "#{t1 - t0}"