FreeBSD performance: OAuth2 calls are extremely slow!

Hello,

I’m developing a Sinatra application. When deployed on a FreeBSD server,
the OAuth calls slow down incredibly. I don’t know why this happens, and
I’m not even sure where to start looking or what to look for.
The program does a series of OAuth2 calls. I wrote a script to test an
OAuth2 call and logged the results:


#!/usr/bin/env ruby

require './pritory'
require 'benchmark'

x = Skroutz::Query.new
Benchmark.bm do |bm|
  bm.report do
    # Fetch price and print!
    x.skroutz_check '3517212' # => returns price String, e.g. '72.12'
  end
end

OSX

ruby --version && uname -a

ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-darwin13.0.0]
Darwin gauss.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3
21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64

macosx-dev> ruby test.rb
user system total real
0.110000 0.020000 0.130000 ( 4.564007)

FreeBSD

ruby --version && uname -a

ruby 2.1.2p95 (2014-05-08 revision 45877) [amd64-freebsd10]
FreeBSD myhostname.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu
Jan 16 22:34:59 UTC 2014
[email protected]:/usr/obj/usr/src/sys/GENERIC amd64

freebsd-10-prod> ruby test.rb
user system total real
0.117188 0.007812 0.125000 ( 71.876525)

The module code can be found here:
pritory/helpers/skroutz.rb at master · atmosx/pritory · GitHub

Any ideas where/what to start looking for?

The FreeBSD server runs on a VM as a VPS. There are other applications
without problems (sinatra apps + ruby scripts).

Panagiotis (atmosx) Atmatzidis

email: [email protected]
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

“As you set out for Ithaca, hope the voyage is a long one, full of
adventure, full of discovery […]” - C. P. Cavafy

Panagiotis A. [email protected] wrote:

macosx-dev> ruby test.rb
user system total real
0.110000 0.020000 0.130000 ( 4.564007)
freebsd-10-prod> ruby test.rb
user system total real
0.117188 0.007812 0.125000 ( 71.876525)

user + system times look similar between them.

The module code can be found here:
pritory/helpers/skroutz.rb at master · atmosx/pritory · GitHub

Any ideas where/what to start looking for?

The FreeBSD server runs on a VM as a VPS. There are other applications without
problems (sinatra apps + ruby scripts).

I suspect different latencies to api.skroutz.gr
Can you check ping times to api.skroutz.gr from your server
vs your dev machine?

Hello Eric,

Thanks for the reply.

On 12 Sep 2014, at 24:53, Eric W. [email protected] wrote:

The module code can be found here:
pritory/helpers/skroutz.rb at master · atmosx/pritory · GitHub

Any ideas where/what to start looking for?

The FreeBSD server runs on a VM as a VPS. There are other applications without
problems (sinatra apps + ruby scripts).

I suspect different latencies to api.skroutz.gr
Can you check ping times to api.skroutz.gr from your server
vs your dev machine?

I’m on the bus now to Vienna airport, connected via bluetooth-tethering
from an iPhone (3G probably). Ping replies are normal though.

MacOSX dev machine

ping -c 10 api.skroutz.gr
[…]
--- skroutz.gr ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 257.541/380.245/542.174/111.240 ms
time curl https://api.skroutz.gr
curl https://api.skroutz.gr 0,03s user 0,01s system 1% cpu 3,082 total

FreeBSD Production

ping -c 10 api.skroutz.gr
[…]
--- skroutz.gr ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 52.895/53.028/53.196/0.093 ms

time curl https://api.skroutz.gr
curl https://api.skroutz.gr 0,05s user 0,02s system 2% cpu 8,190 total

Generally speaking there is some latency on the server, but it doesn’t
justify a difference of 60+ seconds :-/
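
For what it’s worth, a rough sketch like the one below (not part of the app)
could time each phase of a request to api.skroutz.gr separately, to show
whether the time goes into DNS, TCP connect, TLS, or the HTTP request itself:

#!/usr/bin/env ruby
# Rough sketch: time each phase of a request to api.skroutz.gr separately.
# The last report repeats DNS/connect/TLS as part of a full GET, so it is
# only a rough figure for comparison.
require 'resolv'
require 'socket'
require 'openssl'
require 'net/http'
require 'benchmark'

host = 'api.skroutz.gr'
ip   = nil
sock = nil
ssl  = nil

Benchmark.bm(10) do |bm|
  bm.report('dns')  { ip = Resolv.getaddress(host) }
  bm.report('tcp')  { sock = TCPSocket.new(ip, 443) }
  bm.report('tls')  do
    ssl = OpenSSL::SSL::SSLSocket.new(sock)
    ssl.hostname = host if ssl.respond_to?(:hostname=)  # SNI, if available
    ssl.connect
  end
  bm.report('http') { Net::HTTP.get_response(URI("https://#{host}/")) }
end

ssl.close
sock.close

If one phase dominates only on the FreeBSD box, that would narrow things down.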

Panagiotis (atmosx) Atmatzidis

email: [email protected]
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

“As you set out for Ithaca, hope the voyage is a long one, full of
adventure, full of discovery […]” - C. P. Cavafy

Hello,

On Sep 12, 2014, at 10:31, Eric W. [email protected] wrote:

Panagiotis A. [email protected] wrote:

Generally speaking there is some latency on the server, but it doesn’t
justify a difference of 60+ seconds :-/

I’d take a look at truss/ktruss/dtruss or whatever system call tracer
which gives timings on FreeBSD and see what syscalls take long.
Because, based on the system+user times, it is waiting on something
external.

I ran a 'dtruss'. I didn't look at it carefully yet. I'm not familiar with
this kind of thing; it will probably take me some time to figure out
what's happening. Please take a look and let me know if there's anything
that jumps out directly. What I found odd are the path errors on every
call. I don't see those on my MacOSX system. NOTE: that's a 'dtruss -a'
output.

https://gist.github.com/atmosx/93e7f202539d8d8a0d47

I would also take a look at the Faraday gem + dependencies which it
seems to use for HTTP access. It could be misconfigured TCP knobs
(TCP_NODELAY may be needed, or forgot to uncork the socket on
TCP_NOPUSH).

I suppose you can also try doing the equivalent with the excellent
Net::HTTP::Persistent gem.

I tried using 'open-uri' to download a file of 1024 KB in size. Net::HTTP
took about 12s while OpenURI about 22s not that system.

(sorry I don’t have any experience with Faraday, but it looks like
you can get it to use different backends…)

I will install FreeBSD on VirtualBox this weekend to see if I am able to
reproduce the results I’m getting on my VPS and go from there.

Thanks for your time!

Panagiotis (atmosx) Atmatzidis

email: [email protected]
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

“As you set out for Ithaca, hope the voyage is a long one, full of
adventure, full of discovery […]” - C. P. Cavafy

https://gist.github.com/atmosx/93e7f202539d8d8a0d47#file-gistfile1-pytb-L5863

Panagiotis A. [email protected] wrote:

Generally speaking there is some latency on the server, but it doesn’t
justify a difference of 60+ seconds :-/

I’d take a look at truss/ktruss/dtruss or whatever system call tracer
which gives timings on FreeBSD and see what syscalls take long.
Because, based on the system+user times, it is waiting on something
external.

I would also take a look at the Faraday gem + dependencies which it
seems to use for HTTP access. It could be misconfigured TCP knobs
(TCP_NODELAY may be needed, or forgot to uncork the socket on
TCP_NOPUSH).

I suppose you can also try doing the equivalent with the excellent
Net::HTTP::Persistent gem.

(sorry I don’t have any experience with Faraday, but it looks like
you can get it to use different backends…)
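
As an illustration of that last suggestion, a minimal sketch along these lines
would time the same kind of request through Net::HTTP::Persistent (the endpoint
path and token are placeholders, not the actual pritory call, and this assumes
the net-http-persistent 2.x API):

#!/usr/bin/env ruby
# Rough sketch: time an HTTPS GET through Net::HTTP::Persistent.
# Placeholder path and token -- adjust to the real API call.
require 'benchmark'
require 'net/http/persistent'

uri  = URI('https://api.skroutz.gr/')                 # placeholder endpoint
http = Net::HTTP::Persistent.new 'oauth2-timing-test'

req = Net::HTTP::Get.new(uri)
req['Authorization'] = 'Bearer YOUR_TOKEN_HERE'       # placeholder OAuth2 token

Benchmark.bm do |bm|
  bm.report('persistent') { puts http.request(uri, req).code }
end

http.shutdown

If the persistent version is just as slow on the FreeBSD box, that would point
below the HTTP library.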

Panagiotis A. [email protected] wrote:

I ran a 'dtruss'. I didn't look at it carefully yet. I'm not familiar with
this kind of thing; it will probably take me some time to figure out what's
happening. Please take a look and let me know if there's anything that jumps out
directly. What I found odd are the path errors on every call. I don't see those
on my MacOSX system. NOTE: that's a 'dtruss -a' output.

https://gist.github.com/atmosx/93e7f202539d8d8a0d47

As Ryan pointed out, select is taking a long time so that means it’s
waiting on I/O of some sort. Unfortunately your dtruss output isn’t
outputting more detail about which FDs select is waiting on.

I would look into more verbose tracing options of dtruss
(probably equivalent to “strace -v -T” for Linux).

One more thing: have you checked if the api.skroutz.gr endpoint is
throttling your server IP for giving it too much traffic?

I would also take a look at the Faraday gem + dependencies which it
seems to use for HTTP access. It could be misconfigured TCP knobs
(TCP_NODELAY may be needed, or forgot to uncork the socket on
TCP_NOPUSH).

I suppose you can also try doing the equivalent with the excellent
Net::HTTP::Persistent gem.
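
As a quick look at the TCP_NODELAY knob mentioned above, here is a sketch on a
plain TCPSocket (Faraday’s adapters manage their own sockets, so this only
shows what the option itself looks like):

# Rough sketch: inspect/set TCP_NODELAY on a plain TCPSocket.
require 'socket'

sock = TCPSocket.new('api.skroutz.gr', 443)
opt  = sock.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY)
puts "TCP_NODELAY before: #{opt.int}"                        # 0 = Nagle on, 1 = off

sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1) # disable Nagle
puts "TCP_NODELAY after:  #{sock.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).int}"
sock.close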

I tried using 'open-uri' to download a file of 1024 KB in size.
Net::HTTP took about 12s while OpenURI about 22s not that system.

s/not that system/on that system/ ?

Hello,

On Sep 12, 2014, at 22:24, Eric W. [email protected] wrote:

I ran a 'dtruss'. I didn't look at it carefully yet. I'm not familiar with this kind
of thing; it will probably take me some time to figure out what's happening.
Please take a look and let me know if there's anything that jumps out directly.
What I found odd are the path errors on every call. I don't see those on my MacOSX
system. NOTE: that's a 'dtruss -a' output.

https://gist.github.com/atmosx/93e7f202539d8d8a0d47

As Ryan pointed out, select is taking a long time so that means it’s
waiting on I/O of some sort. Unfortunately your dtruss output isn’t
outputting more detail about which FDs select is waiting on.

I would look into more verbose tracing options of dtruss
(probably equivalent to “strace -v -T” for Linux).

Thanks for the explanation. I tried several different output options after
reading the man page. I think this might be a little more helpful:

$ sudo dtruss -ascf ./test.rb
[ huge file ]
95102/100683: 1863430 55365 10 select(0xE, 0x805D7D880, 0x0) = 1 0

              0x80188606a
              libc.so.7`0x8009abe2d
              libc.so.7`0x8009ac2a2
              libc.so.7`getdiskbyname+0x3ad
              libc.so.7`0x8009ab486
              libc.so.7`0x8008a962f
              0x80402c1f2
              libc.so.7`0x8009a2250
              libc.so.7`0x8009a1e42
              libc.so.7`0x80098de58
              libc.so.7`0x80099a824
              libc.so.7`0x80099b125
              libc.so.7`fmtmsg+0x201
              libc.so.7`fmtmsg+0x12e
              envmalloc+0x9df
              envmalloc+0x8af
              ld-elf.so.1`0x80061d000

[ huge file ]

You can see the entire output here:
https://gist.githubusercontent.com/atmosx/e32f494b17f5a9f4b1f4/raw/dtruss
(but it’s kinda huge; I used vim to find the select system calls)
I’m not sure what the problem is, however; it can be reproduced only on my
VPS[1]. Other people running FreeBSD report it’s generally slower than Linux,
but by 1-2s, not this much.

I don’t think this is something related to Ruby. I think there’s something wrong
with the VPS. The virtualisation engine maybe, who knows.

One more thing: have you checked if the api.skroutz.gr endpoint is
throttling your server IP for giving it too much traffic?

No, I’m 100% sure that’s not an issue. I’m respecting their limits + my calls are
too slow to reach their limit anyway.

Net::HTTP took about 12s while OpenURI about 22s not that system.

s/not that system/on that system/ ?

Yes :-) See [1] for details! The first system is the VPS (14s) while on
other systems it takes about 5-6s real time.

[1] https://gist.github.com/atmosx/14efea27eb2c1e38af09
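
For reference, the comparison in [1] can be reproduced with something like the
sketch below (the file URL is a placeholder for the ~1 MB test file):

#!/usr/bin/env ruby
# Rough sketch: download the same ~1 MB file with Net::HTTP and with open-uri.
# FILE_URL is a placeholder for the actual test file.
require 'benchmark'
require 'net/http'
require 'open-uri'

FILE_URL = 'http://example.com/1mb.bin'   # placeholder
uri = URI(FILE_URL)

Benchmark.bm(10) do |bm|
  bm.report('net/http') { Net::HTTP.get_response(uri).body.bytesize }
  bm.report('open-uri') { open(FILE_URL).read.bytesize }
end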

Panagiotis (atmosx) Atmatzidis

email: [email protected]
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

“As you set out for Ithaca, hope the voyage is a long one, full of
adventure, full of discovery […]” - C. P. Cavafy

Hello!

On Sep 13, 2014, at 21:10, Robert K. [email protected]
wrote:

91089/100320: 1981020 53532 16 kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 1 0
0x1C) = -1 Err#64
There are quite a few kevent calls that take more than one second (unit is us).
The two entries which account for almost all the time:
Total Average - ELAPSED 5.364
Total Count - ELAPSED 5905

Kind regards

robert


[guy, jim].each {|him| remember.him do |as, often| as.you_can - without end}
http://blog.rubybestpractices.com/

Thanks for taking the time to help out with this Robert! I really
appreciate it!

Panagiotis (atmosx) Atmatzidis

email: [email protected]
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

“As you set out for Ithaca, hope the voyage is a long one, full of
adventure, full of discovery […]” - C. P. Cavafy

On Fri, Sep 12, 2014 at 5:52 PM, Panagiotis A.
[email protected] wrote:

I ran a ‘dtruss’. I didn’t look at it carefully yet. I’m not familiar with
this kind of thing; it will probably take me some time to figure out what’s
happening. Please take a look and let me know if there’s anything that jumps
out directly. What I found odd are the path errors on every call. I don’t
see those on my MacOSX system. NOTE: that’s a ‘dtruss -a’ output.

https://gist.github.com/atmosx/93e7f202539d8d8a0d47

Some preliminary analysis:

$ awk 'NR<=3 || ($3 ~ /^[0-9]*$/ && $3>1000) {print}' gistfile1.pytb.txt
$ sudo dtruss -a ./test.rb
dtrace: 3387 dynamic variable drops with non-empty dirty list
PID/LWP RELATIVE ELAPSD CPU SYSCALL(args) = return
91089/100320:  1980466   1019754  11  kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 0 0
91089/100320:  1980595    369752  15  kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 1 0
91089/100320:  1980764   1609894  12  kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 1 0
91089/100320:  1980894   1019483  12  kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 0 0
91089/100320:  1981020     53532  16  kevent(0xD, 0x7FFFFFFFB6A8, 0x1) = 1 0
91089/100320:  1981447     53020  65  connect(0xD, 0x804B01FB0, 0x10) = 0 0
91089/100320:  1997891     57567  12  select(0xE, 0x805E4BB00, 0x0) = 1 0
91089/100320:  1999566     53752   7  select(0xE, 0x805E8E100, 0x0) = 1 0
91089/100320:  2000819     59065  12  select(0xE, 0x805F2DB80, 0x0) = 1 0
91089/100320:  2003865   2992714  21  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 1 0
91089/100320:  2003989   1019412   8  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 0 0
91089/100320:  2004089     53489  14  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 1 0
91089/100320:  2004251   1926141  12  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 1 0
91089/100320:  2004372   1019526   8  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 0 0
91089/100320:  2004674  15598747  31  connect(0xD, 0x8055EC3B0, 0x1C) = -1 Err#64
91089/100320:  2004812     49906  65  connect(0xD, 0x805D5F9F0, 0x10) = 0 0
91089/100320:  2023067     50707  13  select(0xE, 0x807A12B00, 0x0) = 1 0
91089/100320:  2024124    142633  15  select(0xE, 0x807A95400, 0x0) = 1 0
91089/100320:  2031193   1009502  11  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 0 0
91089/100320:  2031321     78420  19  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 1 0
91089/100320:  2031505   1911222  17  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 1 0
91089/100320:  2031684   1019338  12  kevent(0xD, 0x7FFFFFFFB778, 0x1) = 0 0
91089/100320:  2032104     50128  82  connect(0xD, 0x808E30570, 0x10) = 0 0
91089/100320:  2032683    389215  13  select(0xE, 0x807A95F80, 0x0) = 1 0

There are quite a few kevent calls that take more than one second (unit
is us).

Then I did

ruby -nae 'printf("%d;%d;%s\n", Integer($F[1]), Integer($F[2]),
$F[4][/^\w+/]) rescue 1' gistfile1.pytb.txt

and imported the data into a spreadsheet. You can find it here:

https://drive.google.com/file/d/0B7Q7WZzdIMlIUzhnUGxyWXY1WlU/edit?usp=sharing

The two entries which account for almost all the time:

kevent Sum - ELAPSED 15.102.889
Average - ELAPSED 943.931
Count - ELAPSED 16

connect Sum - ELAPSED 15.752.179
Average - ELAPSED 656.341
Count - ELAPSED 24

Total Sum - ELAPSED 31.673.152
Total Average - ELAPSED 5.364
Total Count - ELAPSED 5905
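
For reference, the same aggregation can be done without a spreadsheet with a
small Ruby script along these lines (assuming the gistfile1.pytb.txt field
layout shown above):

#!/usr/bin/env ruby
# Rough sketch: aggregate the dtruss output by syscall name.
# Fields: PID/LWP, RELATIVE, ELAPSD, CPU, SYSCALL(args) = return.
#
# Usage: ruby aggregate.rb gistfile1.pytb.txt

sums   = Hash.new(0)
counts = Hash.new(0)

ARGF.each_line do |line|
  fields = line.split
  next unless fields[2] =~ /\A\d+\z/          # ELAPSD column must be numeric
  name = fields[4] && fields[4][/\A\w+/]      # syscall name before the '('
  next unless name
  sums[name]   += Integer(fields[2])
  counts[name] += 1
end

sums.sort_by { |_, us| -us }.each do |name, us|
  printf("%-12s %12d us  (%d calls, avg %d us)\n",
         name, us, counts[name], us / counts[name])
end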

Kind regards

robert

My SWAG is a problem with DNS, either on the VPS or on the host you are
connecting to. Perhaps the OAuth host is trying to look up the PTR for
your IP and one isn’t present.
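
If that is the suspicion, a quick check from the VPS could look something like
this sketch (the IP is a placeholder for the VPS’s public address):

#!/usr/bin/env ruby
# Rough sketch: time a forward lookup of api.skroutz.gr and a reverse (PTR)
# lookup of the VPS's own address. 'MY.VPS.IP' is a placeholder.
require 'resolv'
require 'benchmark'

ip = 'MY.VPS.IP'   # placeholder: the VPS public IP

Benchmark.bm(10) do |bm|
  bm.report('forward') { puts Resolv.getaddress('api.skroutz.gr') }
  bm.report('reverse') do
    begin
      puts Resolv.getname(ip)            # PTR lookup
    rescue Resolv::ResolvError => e
      puts "no PTR record: #{e.message}"
    end
  end
end
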
On Sep 13, 2014 3:20 PM, “Panagiotis A.” [email protected]