Hi all - apologies in advance since this issue is not necessarily a Ruby issue, but I've written the program in Ruby and the community here has been very helpful/friendly in my experience.

I'm working on another problem from Project Euler ( http://www.projecteuler.net ). This one is problem 73 (I'm not actually that smart, I just jump around ;-) ):

"Consider the fraction, n/d, where n and d are positive integers. If n < d and HCF(n,d)=1, it is called a reduced proper fraction. How many fractions lie between 1/3 and 1/2 in the sorted set of reduced proper fractions for d <= 10,000?"

So I set up an hcf(n,d) function that prime-factorizes n and d to see if n/d is reduced and proper:

  Primes_under_100 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
                      47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

  def hcf(n,d)
    Primes_under_100.each do |prime|
      if (prime < Math.sqrt(d) && n % prime == 0 && d % prime == 0) ||
         d % n == 0
        return 2
      end
    end
    return 1
  end

Then I ran that through a loop that builds an array of unique instances of fractions between 1/2 and 1/3:

  red_prop_fract = []

  for d in 2..10_000 do
    for n in 1...d do
      if hcf(n,d) == 1 && n.to_f/d > 1.0/3 && n.to_f/d < 0.5
        red_prop_fract.push(n.to_f/d) unless red_prop_fract.include?(n.to_f/d)
      end
    end
  end

  #Get the answer
  p red_prop_fract.length

Now, I'm fairly certain that the algorithm is right, because I ran it on their example of d < 9 and got the correct answer. However, this setup is taking ridiculously long and I can't determine why. Running on a computer with a 2.4GHz CPU and 256 MB of RAM, I started it at 4:30 yesterday and it hasn't terminated!

I know that a nested 'for' loop that goes to n is O(n^2). And I know that the array I'm building is rather big, and growing it gradually is going to cause a lot of allocations and GCs. But I can't figure out why it's being so ridiculous.

My first rule as a programmer is "Assume all problems are my fault."

Insights?

TIA,
Andrew
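For a sense of scale, the nested loops above visit every (n, d) pair with n < d up to the limit. A quick sketch (the `pair_count` helper here is illustrative, not part of the original script) makes the iteration count concrete:

```ruby
# Number of (n, d) pairs the nested for-loops iterate over:
# the inner loop runs d-1 times for each d.
def pair_count(max_d)
  (2..max_d).inject(0) { |sum, d| sum + (d - 1) }
end

pair_count(8)       # => 28 pairs for the small d < 9 example
pair_count(10_000)  # => 49_995_000 -- roughly fifty million hcf calls
```

At ~50 million calls, even a modest per-call cost in the hcf function adds up to hours.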

on 2007-07-25 17:47

on 2007-07-25 20:43

On 7/25/07, Kaldrenon <kaldrenon@gmail.com> wrote:
> My first rule as a programmer is "Assume all problems are my fault."
>
> Insights?
>
> TIA,
> Andrew

Hi,

I don't want to spoil the fun, so just a few hints:

hcf(14,35) == 1, which is wrong.

I don't think you have a problem with GC. Your algorithm is too slow. You can verify this using a profiler. There are two ways of enhancing the speed: 1. enhancing the algorithm itself, i.e. lowering the O(n^2), and 2. tweaking the implementation (i.e. the constants).

Install the profiler (gem install ruby-prof) and run it:

  ruby-prof -p graph_html yourprog.rb > profile.html

Have a look at the output, and find the parts that take the most time.

Suggested changes:

1. Move things out of the loop if they don't depend on the inner variable.
2. Cache hard-to-compute expressions, especially those involving floats.
3. Use Set instead of Array.
4. Move easy-to-compute conditions (<, >) before the hard ones (hcf).
5. Optimize the ranges of your loops.
6. Try to avoid sqrt and floats as much as possible.

Some stats (core2duo T7200 2ghz/1gb ram/4mb cache, ruby 1.8.5, wxpsp2):

  d <= 500 (12687)
  your version:    20.453 seconds
  with change #3:   6.981
  with #3 and #4:   1.484
  with #3,4,2:      1.203

  d <= 700 (24836)
  your version:    66.281
  with #3,4,2:      2.515
  with #3,4,2,5:    2.141

  d <= 1000 (50695)
  with #3,4,2,5:    5.094
  fixing hcf:       2.547

(I'm not sure about the results, I suspect at least 1/2 is included there somehow, if not more false positives)

J.
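The hcf(14,35) counterexample is easy to confirm with Euclid's algorithm, the classical way to compute the HCF/GCD (this sketch and its function name are ours, not code from the thread):

```ruby
# Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
# until the second value hits zero; the first is then the GCD.
def euclid_gcd(a, b)
  a, b = b, a % b until b.zero?
  a
end

euclid_gcd(14, 35)  # => 7, so 14/35 is not reduced (it equals 2/5)
euclid_gcd(3, 8)    # => 1, so 3/8 is a reduced proper fraction
```

The original hcf misses 14/35 because 7 fails the `prime < Math.sqrt(35)` test and 35 % 14 is nonzero, so neither branch fires.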

on 2007-07-25 21:25

On Jul 25, 2007, at 11:45 AM, Kaldrenon wrote:
> My first rule as a programmer is "Assume all problems are my fault."
>
> Insights?

Jano Svitok has already given you a good answer, but let me add my 2 cents.

Basically, you need a better algorithm. However, even if you were to stick with a brute-force approach, it could be sped up. For one thing, you could just count the reduced fractions as you go along rather than accumulating a list of their values and counting that. Second, you could use the mathn standard library, which adds a gcd method to Integer (gcd is another name for HCF). Third, you can avoid converting integers to floats and doing float arithmetic.

When I made these changes to your algorithm, I got the following results:

  ~ mg: time /Users/mg/Desktop/test.rb
  Number of reduced fractions is 5066251

  real 7m52.685s
  user 7m22.083s
  sys  0m2.275s

This is on a 2-GHz iMac. It's still very slow, but a lot faster than what you were experiencing. And it doesn't use much memory.

I'm not showing my code here for fear of being a spoiler. I will post it if you ask.

Regards, Morton
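A minimal sketch of the count-as-you-go idea Morton describes, using Integer's gcd in place of the hand-rolled hcf (gcd is built into Integer in modern Ruby; on 1.8 it came from mathn as he says - the helper name is ours):

```ruby
# Count reduced proper fractions strictly between 1/3 and 1/2 for d <= max_d,
# without accumulating a list of values.
def count_reduced(max_d)
  count = 0
  (2..max_d).each do |d|
    (1...d).each do |n|
      count += 1 if n.to_f / d > 1.0 / 3 && n.to_f / d < 0.5 && n.gcd(d) == 1
    end
  end
  count
end

count_reduced(8)  # => 3 -- the fractions 2/5, 3/7 and 3/8 from the d < 9 example
```

This still does the float comparisons; Morton's third point (pure integer arithmetic) is shown a couple of posts below.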

on 2007-07-25 23:32

On Jul 25, 2:42 pm, "Jano Svitok" <jan.svi...@gmail.com> wrote:
> 1. Move things out of cycle if it doesn't depend on the inner variable.

I try to do this in general, but am I correct in my thinking that it's not applicable to this script?

> 2. Cache hard-to-compute expressions, especially those involving floats.

I'm still very much a novice - could you explain to me in simple terms what you mean, or point me to a document/resource on this topic?

> 3. Use Set instead of array.

Had a BIG impact on run-time, thank you!

> 4. Move easy-to-compute conditions (<, >) before the hard ones (hcf)

This is something I feel dumb for not remembering myself - although some of these other things are new material for me, this is something I've dealt with before. Thanks for the reminder. Also had a big impact on time.

> 5. Optimize the ranges of your loops

As with tip #1, is there a way this technique can improve this script? I know it's wise in general.

> 6. Try to avoid sqrt and floats as much as possible.

Like #1 and #5, I think - can I really avoid floats, since I need to get a number that's between two floats?

> Some stats (core2duo T7200 2ghz/1gb ram/4mb cache, ruby 1.8.5, wxpsp2)
> d <= 500 (12687)
> your version: 20.453 seconds
> with change #3: 6.981
> with #3 and #4: 1.484
> with #3,4,2: 1.203

On my system (2.4ghz/256mb ram/(don't know cache), ruby 1.8.6, wxpsp2), and taking into account that a number of other programs were running at the time, it's much slower than it was for you, but the rate of change is comparable:

  d <= 500
  with change #3:          ~38 seconds
  with changes #3 and #4:   ~8 seconds

I also took Morton's advice and dropped my hcf method in favor of mathn's gcd2, along with dropping the array/set for a counter. I was wary of this at first until I realized that with /reduced/ proper fractions, there's no chance of a duplicate value. I had been using the "push unless include?" style in order to avoid duplication, but avoiding duplication is inherent.
My full script is now:

  p Time.now

  require 'mathn'

  red_prop_fract = 0

  for d in 2..10_000 do
    for n in 1...d do
      if n.to_f/d > 1.0/3 && n.to_f/d < 0.5 && n.gcd2(d) == 1
        red_prop_fract += 1
      end
    end
  end

  p red_prop_fract   # a plain counter now, not an array, so no .length
  p Time.now

Many thanks to both of you! I always love it when people who know their stuff are genuinely willing to help newbies like me.

-Andrew

on 2007-07-26 04:12

On Jul 25, 2007, at 5:29 PM, Kaldrenon wrote:
> p red_prop_fract
> p Time.now

That's very close to what I did. However, the conditional clause of the if-statement can be improved a little:

  if 3 * n > d && 2 * n < d && n.gcd(d) == 1

Staying with integers is faster, and gcd is a little faster than gcd2.

Regards, Morton
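Cross-multiplying is safe here because d is positive: n/d > 1/3 exactly when 3n > d, and n/d < 1/2 exactly when 2n < d, with no float rounding to worry about. A quick exhaustive spot-check of the equivalence over small values (illustrative only):

```ruby
# Verify that the integer tests agree with the float tests for all
# small (n, d) pairs; d > 0, so cross-multiplying preserves the inequalities.
mismatches = []
(2..200).each do |d|
  (1...d).each do |n|
    float_ok   = n.to_f / d > 1.0 / 3 && n.to_f / d < 0.5
    integer_ok = 3 * n > d && 2 * n < d
    mismatches << [n, d] unless float_ok == integer_ok
  end
end

mismatches  # => [] -- the two formulations agree everywhere on this range
```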

on 2007-07-26 09:34

On 7/25/07, Kaldrenon <kaldrenon@gmail.com> wrote:
> On Jul 25, 2:42 pm, "Jano Svitok" <jan.svi...@gmail.com> wrote:
> > 1. Move things out of cycle if it doesn't depend on the inner variable.
>
> I try to do this in general, but am I correct in my thinking that it's
> not applicable to this script?

In your script Math.sqrt was computed inside an each loop. Now it's not there anymore, so it's fixed.

> > 2. Cache hard-to-compute expressions, especially those involving floats.
>
> I'm still very much a novice - could you explain to me in simple terms
> what you mean, or point me to a document/resource on this topic?

For this I meant n.to_f/d, which was there three times. I'm not sure whether ruby reuses the value or computes it three times.

> > 5. Optimize the ranges of your loops
>
> As with tip #1, is there a way this technique can improve this script?
> I know it's wise in general.

  require 'mathn'

  MAX_D = 1000

  red_prop_fract = 0

  for n in 2...(MAX_D/2).ceil do
    lower = n*2+1
    upper = [n*3-1, MAX_D].min
    for d in lower..upper do
      if n.gcd2(d) == 1
        red_prop_fract += 1
      end
    end
  end

> > 6. Try to avoid sqrt and floats as much as possible.
>
> Like #1 and #5, I think - Can I really avoid floats, since I need to
> get a number that's between two floats?

As Morton said, instead of checking a/b > c/d you can do a*d > b*c. And even in the case of storing the fractions, you can encode them into a Fixnum: (n<<16) + d.

You can optimize sqrt when you know you are iterating over a range:

  sqrt = 1
  sqr = 1
  for n in 1...10000
    if n > sqr
      sqr += 2*sqrt + 1
      sqrt += 1
    end
    # now sqrt == Math.sqrt(n).ceil
    # ... do whatever you need with sqrt
  end

When looking at the source of mathn's gcd2 (and after profiling) we can see that most of the script's time is spent inside Prime.succ, so another way to get a little gain is to copy the code, and replace

  def prime_division
    raise ZeroDivisionError if self == 0
    ps = Prime.new
    value = self
    pv = []
    for prime in ps

with

  def prime_division
    value = self
    pv = []
    for prime in Primes_under_100

(we can skip the zero check, as we never pass zero). But for me gcd is still faster (there we can get a little gain as well if we drop the .abs calls).

The last thing I could do is to skip the case where both n and d are even - it's a fast check that avoids an expensive gcd check. Inlining the function also helps a bit.

This is my version:

  t = Time.now

  MAX_D = 1000

  red_prop_fract = 0

  for n in 2...(MAX_D/2).ceil do
    lower = n*2+1
    upper = [n*3-1, MAX_D].min
    for d in lower..upper do
      next if (d|n) & 1 == 0
      min = n
      max = d
      while min > 1
        tmp = min
        min = max % min
        max = tmp
      end
      if min == 1
        red_prop_fract += 1
      end
    end
  end

  p red_prop_fract
  p Time.now - t
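As a sanity check (ours, not from the thread), the narrowed loop ranges plus the inlined Euclid test can be compared against a straightforward gcd count over all pairs - the two must agree:

```ruby
# Jano's version, condensed: 2n < d < 3n bounds the inner loop, the
# (d|n) check skips both-even pairs, and Euclid's loop is inlined.
def fast_count(max_d)
  count = 0
  (2...(max_d / 2.0).ceil).each do |n|
    ((n * 2 + 1)..[n * 3 - 1, max_d].min).each do |d|
      next if (d | n) & 1 == 0          # both even => gcd at least 2
      min, max = n, d
      min, max = max % min, min while min > 1
      count += 1 if min == 1            # min == 1 exactly when gcd(n, d) == 1
    end
  end
  count
end

# Straightforward reference count over all (n, d) pairs.
def naive_count(max_d)
  (2..max_d).sum { |d| (1...d).count { |n| 3 * n > d && 2 * n < d && n.gcd(d) == 1 } }
end

fast_count(300) == naive_count(300)  # => true
```

The both-even skip is a pure shortcut: those pairs have gcd >= 2, so Euclid would reject them anyway, just more slowly.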

on 2007-07-26 15:26

On Jul 26, 3:32 am, "Jano Svitok" <jan.svi...@gmail.com> wrote:
> For this I meant n.to_f/d that was there three times. I'm not sure if
> ruby reuses the value or computes it three times.

In other words, if the exact same calculation is needed in more than one place, store it instead of recalculating? Gotta remember that one...

...I can't believe I didn't even think to narrow the loop down to d's that are more than twice but less than three times n. *smacks head*

That's a good-looking solution.

I'm still getting much longer times than the two of you have cited, but I'm chalking that up to the fact that my work machine is a low-cost, out-of-the-box Dell and has worse specs.

Thanks again!
Andrew

on 2007-07-26 15:57

On 7/26/07, Kaldrenon <kaldrenon@gmail.com> wrote:
> In other words, if the exact same calculation is needed in more than
> one place, store it instead of recalculating? Gotta remember that
> one...

Right, but it depends on the calculation's complexity. Sometimes creating a variable for it could take longer than the calculation itself. For best results, always use the profiler and the Benchmark class.

For me the profiler says most of the time is spent in the gcd calculation. So further optimizations should go into either avoiding gcd as much as possible, speeding it up, and/or caching its results somehow (sometimes you can trade space for time). I haven't found any such optimizations.

Final note: this was an exercise in optimizing as much as possible. In normal code one does not do such things, as it would complicate the maintainability of the code. It's important to weigh speed vs. complexity of the code...

J.
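For reference, a minimal sketch of the Benchmark class Jano mentions, comparing the float test against the integer test on an identical workload (the workload and variable names are illustrative; absolute timings vary by machine):

```ruby
require 'benchmark'

# Pre-build a sample of (n, d) pairs so both reports do identical work.
pairs = (2..2_000).flat_map { |d| (1...d).step(97).map { |n| [n, d] } }

Benchmark.bm(8) do |bm|
  bm.report('floats')   { pairs.each { |n, d| n.to_f / d > 1.0 / 3 && n.to_f / d < 0.5 } }
  bm.report('integers') { pairs.each { |n, d| 3 * n > d && 2 * n < d } }
end
```

Benchmark.bm prints user/system/real times for each labeled report, which is usually enough to decide whether a micro-optimization is worth its complexity.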