Since gcc-4.1 now has the option to perform profile-guided
optimization, I thought I’d try compiling a version of Ruby with this
feature turned on. For the (somewhat contrived/synthetic) scenario
below, it looks like compiling with profile-guided optimization turned
on, along with -O3, results in about a 9% speed increase vs. -O3 alone.
YMMV. I compiled a version of Ruby-1.8.4 using the following configuration:
configure CFLAGS="-O3 -fprofile-generate" LDFLAGS=-fprofile-generate
…Then I ran the short version of the n-body program from the Great
Computer Language Shootout with n=1000…
…this generated the profile information needed for the next stage. I
then compiled a Ruby with…
configure CFLAGS="-O3 -fprofile-use"
…The resulting interpreter runs the exact same n-body program
about 8-10% faster than one compiled with just -O3.
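
For anyone wanting to repeat the comparison, here is a minimal sketch of how the two interpreters could be timed against each other. It is not the exact method used above: the helper names `best_time` and `percent_faster`, the binary paths, and the script name are all made up for illustration, and it assumes "faster" means the reduction in wall-clock time relative to the plain -O3 build.

```ruby
require 'benchmark'

# Fastest wall-clock time (in seconds) over several runs of a command.
# cmd is an array such as ["./ruby-pgo", "nbody.rb", "1000"]; those
# paths are hypothetical placeholders, not names from the post.
def best_time(cmd, runs = 5)
  times = (1..runs).map do
    Benchmark.realtime do
      system(*cmd) or raise "command failed: #{cmd.join(' ')}"
    end
  end
  times.min
end

# Percentage by which the optimized runtime is shorter than the baseline.
# Assumption: "N% faster" is read as a reduction in runtime.
def percent_faster(baseline_secs, optimized_secs)
  (baseline_secs - optimized_secs) / baseline_secs * 100.0
end

# Example usage (uncomment and adjust the hypothetical paths):
# base = best_time(["./ruby-O3",  "nbody.rb", "1000"])
# pgo  = best_time(["./ruby-pgo", "nbody.rb", "1000"])
# printf("PGO build is %.1f%% faster\n", percent_faster(base, pgo))
```

Taking the minimum of several runs is just one way to damp out noise from other processes; averaging would work as well, as long as both builds are measured the same way.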