Specifying a network interface with an HTTP GET request

diabolist · August 28, 2008, 3:10pm

Hi all, I’m fairly new to Ruby but have learnt a lot in the last month
and am enjoying the diverse features it includes.

My problem is, we have a machine with multiple IP addresses (eth0,
eth0:0, eth0:1, …) and I need to be able to select which IP it uses at
runtime.

The reason for this is to be able to parallelize web page downloads from
a site that has a max of 1 hit per second per IP address.

The only solution I’ve thought of so far, is to set up the server with
several proxies and send all requests through that.

Is there a more elegant and efficient solution in ruby? Some way to
choose eth0 / eth0:0 for each request?

Thanks in advance for any replies.

Andy

diabolist · August 28, 2008, 10:09pm

Hi Andy,

If you are using ‘Mongrel’ as your webserver (possibly this also works
on Webrick?) then you could at least set up your HTTP servers on a
per-NIC basis (I think…)

http://mongrel.rubyforge.org/web/mongrel/files/README.html

Has an example that appears to say ‘listen on all NICs’ (0.0.0.0).

So you could listen separately to each NIC with a different web-server
(and share a single web-app )…the trick is how to divert your
different users to each NIC though? I’m not sure how Ruby (or any
runtime) is going to be able to do this without some sort of
load-balancer sorting this out.

Maybe you could have one NIC dedicated as your ‘incoming’ request card,
but then send different URLs (which map to the different NICs) back
to your client… but that would probably cause chaos for cookies etc…

Dunno if that gives you any ideas or not…sorry if I’m way off track !

Cheers

John

diabolist · August 29, 2008, 11:31am

Hey John,

Thanks for your response, however I think you misunderstood my post. I’m
looking for a way to bind to a specific IP for outgoing requests. That
is, on the server I want to use something like the following pseudocode:

def getPage(whichIP)
  # pick the local address to send the request from
  ip = case whichIP
       when 0 then '123.456.789.001'
       when 1 then '123.456.789.002'
       when 2 then '123.456.789.003'
       else '123.456.789.004'
       end

  soc = bind(ip, 80) # pseudocode: bind to specific ip and port 80

  soc.open(myUrl) do |sh|
    return sh.read
  end
end

now if the url was for a page that had no content, except for the IP
address of the requester then the following code:

puts getPage(0);
puts getPage(1);
puts getPage(2);
puts getPage(3);

would output:
123.456.789.001
123.456.789.002
123.456.789.003
123.456.789.004

Hope that makes it clearer

Andy

diabolist · August 29, 2008, 11:37am

I’m no Ruby expert, but I don’t think that could be done from Ruby.
Using a Linux binary, let’s say wget, how would you access a webpage
using a certain interface? If we could figure that out, we could
figure out how to automate the whole process.

diabolist · August 29, 2008, 11:54am

I’m about to dig into the Ruby source code and see how much socket
access is available. If we can use raw sockets then I should be able to
bind to the IP I want; otherwise I’m going to code a very simple proxy
handling program in C that does it for me.

As for wget, it seems you can use the --bind-address option to specify
which local IP to bind to. So that seems like a plausible option, using
IO.popen and stuff.
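
A minimal sketch of the wget route, assuming GNU wget is installed and on the PATH. The helper names (wget_command, fetch_via_wget) are made up for illustration, and the address and URL in the usage line are placeholders.

```ruby
# Sketch: fetch a page through wget, choosing the local source address.
# Assumes GNU wget is on the PATH; helper names are illustrative only.

# Build the wget argument list (kept separate so it can be inspected).
def wget_command(local_ip, url)
  ['wget', '--quiet', "--bind-address=#{local_ip}", '--output-document=-', url]
end

# Run wget and return the page body as a string.
def fetch_via_wget(local_ip, url)
  # Passing an array to IO.popen skips the shell, so no quoting is needed.
  IO.popen(wget_command(local_ip, url)) { |io| io.read }
end

# Usage (placeholder address and URL):
#   puts fetch_via_wget('192.0.2.10', 'http://example.com/')
```

Spawning one wget process per request is not cheap, but at one hit per second per address that overhead is unlikely to matter.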

cheers.

Andy

diabolist · August 29, 2008, 12:56pm

As you’re on the client side, you do not bind; you connect.

More generally, you should not try to modify your socket at runtime,
because that is not the way it is designed to be used.
I’m not talking about how Ruby implements it but about the underlying
system calls.

You’d rather create a pool of sockets before your main loop. And you’d
iterate over the pool during your main loop.
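
For reference, the BSD sockets API does let a client call bind before connect to pick its source address, and Ruby's socket library exposes both calls. A minimal sketch, assuming IPv4 and a local address that is actually configured on the machine; connect_from is a made-up helper name.

```ruby
require 'socket'

# Open a TCP connection to host:port whose source address is local_ip.
# Binding to port 0 asks the kernel to pick any free local port.
def connect_from(local_ip, host, port)
  sock = Socket.new(:INET, :STREAM)
  sock.bind(Addrinfo.tcp(local_ip, 0))     # choose the outgoing address
  sock.connect(Addrinfo.tcp(host, port))   # then connect as usual
  sock
end
```

One socket per local address could be opened this way before the main loop and then iterated over, as suggested above.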

Regards,

Antonin

diabolist · August 29, 2008, 1:02pm

Hi Andy,

Thanks for the clarification, but I still don’t quite understand what
you are trying to do here.

You said originally:

//
The reason for this is to be able to parallelize web page downloads from
a site that has a max of 1 hit per second per IP address.
//

And then in a follow-up post: (Which clarified that you are writing a
HTTP client of some sort I think).

//
now if the url was for a page that had no content, except for the IP
address
//

So you seem to have control over both client and server here I think?

My networking isn’t that great, and I’m a Ruby-Newbie…but I think from
the client side this is a non-issue: the client just connects to an IP
(or host) directly and a port: if you know the IPs ahead of time, just
connect to them.

So I think then you are designing something like:

A webserver which has multiple NICs , and one (perhaps) ‘master’ NIC.
This webserver will return as plain-text (or XML or some data-format)
a list of these IP addresses.
The client is able to make an initial-request to the ‘master’ NIC/URL
to retrieve a list of other hosts (the fact they are in fact located on
the same machine is not really relevant to the client).
Once the client has this list of IPs it can just connect to them and
download whatever it wants.

So, (I think): so long as your webserver is listening on all NICs (or
you have multiple dedicated webservers listening on per NIC), you have a
fairly straight-forward programming task I think:

A servlet to generate the IP address file, which sits on the server.
Some client code to retrieve and process that IP address file and set
off some threads in parallel.

Correct ?

I guess you might want to share some sort of context between the
different parallel clients: you could use a cookie and some sort of
shared background context between web servers: maybe a database, or
files in a shared directory?

In short: I think there is essentially no big deal about getting a TCP
client to talk to a specific NIC - you just address with the IP address
and everything is just taken care of at the network layer.

I think you are writing an ad-hoc load-balancer here by the sound of
it…

Cheers

John

diabolist · August 29, 2008, 1:10pm

Antonin,

I know what you mean; I said it was pseudocode, and bind was the best I
could think of to describe what I meant. However, your socket on the
client side has to bind to an IP for it to work. If you look at raw
sockets, you are able to specify which IP address your connection goes
through.

John,

I’m trying to write an HTTP client to download pages off a server I don’t
own. The client can only connect to the server once a second if
it only uses one IP. However, the client is running on a machine that has
4 IP addresses, defined as eth0, eth0:0, eth0:1, eth0:2, and I want the
client to send the request from a different one of those each time. The
example I gave about the server returning the IP address was just an
example to show what should be returned.
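
A rough sketch of that rotation in pure Ruby, assuming a Ruby new enough to have Net::HTTP#local_host= (2.0 or later, so newer than this thread). The addresses are placeholders from the documentation range, and SourceRotation, http_from and get_page are illustrative names, not an existing API.

```ruby
require 'net/http'
require 'uri'

# Hands out source addresses round-robin; safe to call from threads.
class SourceRotation
  def initialize(ips)
    @ips = ips
    @index = 0
    @lock = Mutex.new
  end

  # Return the next address, wrapping around at the end of the list.
  def next_ip
    @lock.synchronize do
      ip = @ips[@index % @ips.size]
      @index += 1
      ip
    end
  end
end

# Placeholder addresses; substitute the ones on eth0, eth0:0, eth0:1, eth0:2.
ROTATION = SourceRotation.new(%w[192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4])

# Build a Net::HTTP object for uri that will connect from local_ip.
def http_from(uri, local_ip)
  http = Net::HTTP.new(uri.host, uri.port)
  http.local_host = local_ip   # source address for the outgoing connection
  http
end

# Fetch url from the next source address in the rotation.
def get_page(url)
  uri = URI(url)
  http_from(uri, ROTATION.next_ip).get(uri.request_uri).body
end
```

Each call to get_page then leaves from a different address, which is the behaviour the getPage(0) to getPage(3) example was meant to describe.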

I think the answer is to use wget with the --bind-address option,
although I have yet to test this due to debugging other things atm.

If that doesn’t work, then running a proxy that I write in C, one that
uses raw sockets, is the way forward.

Thanks for all your help

Andy

diabolist · August 29, 2008, 1:28pm

It’s not running as a web request; the code for this is executed via the
terminal, and as wget has nothing to do with Apache / Mongrel it won’t
be able to object to what IP is used regardless.

Andy

diabolist · August 29, 2008, 1:23pm

Hi Andy,

If you use IP addresses (rather than a host), and those IP addresses are
bound to single NICs, and you know these IP addresses, then I think
you don’t have to do anything more complicated than literally specify
the IP addresses when you connect…

So, if you can do (as you say ‘wget’):

telnet ip0 80
telnet ip1 80

And issue a :

GET HTTP://ip0/webapp HTTP/1.0

Then you are away.

One complication I can think of, which is sometimes enforced on
webservers, is that the ‘GET’ request has to correspond to the host you
have connected to.

I mean:

www.mywebsite.com, might be on IP1, IP2, so you could get to it (in a
browser) like:

http://www.mywebsite.com
-or-
http://x1.x1.x1.x1
-or-
http://x2.x2.x2.x2

Your browser will implicitly issue a ‘GET’ command based on the host you
provided in the URL. The server may verify this is the same.

So, if you were to ‘telnet’ to http://www.mywebsite.com on port 80 but
then issue a GET like this:

GET http://x1.x1.x1.x1 HTTP/1.0

The server may reject you (since you connected on hostname, but tried to
GET a different ‘host’)

So your program may have to take this into account (i.e., you may need to
rewrite your URLs dependent on what host/IP you are talking to).
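
One way to handle that from Ruby, sketched under the assumption that the server checks the Host header: connect to the bare IP but name the site in the request. As far as I know, Net::HTTP leaves an explicitly set Host header alone. get_with_host is a made-up helper, and the address in the usage lines is a placeholder.

```ruby
require 'net/http'

# Build a GET for path that names site_host in its Host header, for use
# on a connection that was opened to a bare IP address.
def get_with_host(path, site_host)
  Net::HTTP::Get.new(path, { 'Host' => site_host })
end

# Usage (placeholder address):
#   http = Net::HTTP.new('192.0.2.80', 80)
#   response = http.request(get_with_host('/webapp', 'www.mywebsite.com'))
#   puts response.body
```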

I hope that makes sense, and I hope I’m not wildly inaccurate
there… (as I say, not a n/w expert…)

John

diabolist · August 29, 2008, 1:58pm

Hey John, yeah, I’ve worked out the solution now using wget.

diabolist · August 29, 2008, 1:48pm

Andrew Parlane wrote:

It’s not running as a web request; the code for this is executed via the
terminal, and as wget has nothing to do with Apache / Mongrel it won’t
be able to object to what IP is used regardless.

Andy

Sorry Andy, I’m lost as to what you are asking here; I thought you were
doing HTTP requests from your initial description. (Also, wget is a
cmd-line HTTP client…)

//The reason for this is to be able to parallelize web page downloads
from a site that has a max of 1 hit per second per IP address.//

No worries I think you have worked out what you need to do from your
earlier posts…

Cheers

John

diabolist · September 1, 2008, 6:44pm

On Fri, Aug 29, 2008 at 08:06:05PM +0900, Andrew Parlane wrote:

I’m trying to write an HTTP client to download pages off a server I don’t
own. The client can only connect to the server once a second if
it only uses one IP. However, the client is running on a machine that has
4 IP addresses, defined as eth0, eth0:0, eth0:1, eth0:2, and I want the
client to send the request from a different one of those each time. The
example I gave about the server returning the IP address was just an
example to show what should be returned.

If this is something that you are going to need to do for an extended
period of time, you may think about using a tinyproxy + haproxy
solution. Bind an instance of tinyproxy to each external IP address,
and then have haproxy load balance between the 4 instances. Then your
code could just hit the haproxy incoming port and it would be sent to
one of the outgoing ip addresses automatically. Something like:

( haproxy : http://haproxy.1wt.eu/ )
haproxy has a configuration somewhat like:

listen proxy-in 10.0.0.1:10000
    mode http
    balance roundrobin
    server tiny1 10.0.0.1:10001
    server tiny2 10.0.0.1:10002
    server tiny3 10.0.0.1:10003
    server tiny4 10.0.0.1:10004

( tinyproxy : http://www.banu.com/tinyproxy/ )
Then you have 4 tinyproxy instances:

    listen at         proxy to
#1  10.0.0.1:10001    192.168.1.1
#2  10.0.0.1:10002    192.168.1.2
#3  10.0.0.1:10003    192.168.1.3
#4  10.0.0.1:10004    192.168.1.4

I’ve done things like this before and it works out pretty well. Then
your download code just hits 10.0.0.1:10000 as if it were an HTTP proxy
and you’re good to go.
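
On the Ruby side, the download code might look roughly like this, assuming the haproxy front end listens on 10.0.0.1:10000 as in the listen line above. http_via_proxy and get_page are illustrative names.

```ruby
require 'net/http'
require 'uri'

PROXY_HOST = '10.0.0.1'   # haproxy front end from the listen line
PROXY_PORT = 10000

# Build a Net::HTTP object that sends its requests through the proxy.
def http_via_proxy(uri)
  Net::HTTP.new(uri.host, uri.port, PROXY_HOST, PROXY_PORT)
end

# Fetch url; haproxy decides which tinyproxy (and so which outgoing
# IP address) handles the request.
def get_page(url)
  uri = URI(url)
  http_via_proxy(uri).get(uri.request_uri).body
end
```

The Ruby code then needs no knowledge of the individual outgoing addresses; the rotation lives entirely in the proxy layer.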

enjoy,

-jeremy