Nginx mogilefs module 1.0.1


#1

Greetings!

I’ve implemented a module for nginx that fetches files from
MogileFS.

Please see the module’s manual page in English or my nginx modules page:

http://www.grid.net.ru/nginx/mogilefs.en.html
http://www.grid.net.ru/nginx/

I expect minor issues to pop up in the near future, so I advise you
to test the module before putting it into production.

I would like to thank Michael S. for the idea.

I hope you’ll enjoy it!

And your feedback is always welcome.


#2

Hello!

On Thu, Apr 16, 2009 at 11:39:31AM +0100, Valery K. wrote:

I would like to thank Michael S. for the idea.

I hope you’ll enjoy it!

And your feedback is always welcome.

I haven’t really used it (and am not likely to in the near future), but
here are some questions:

  1. Any reason why you create a hidden location from the module instead
    of accepting the name of an existing one? It looks unnatural to me.

  2. As far as I can see, it uses only the first path returned by mogilefs.
    Is failover support planned? From my understanding it
    should be simple, something like

    location /mogilefs {
        mogilefs_tracker …
        mogilefs_pass /mogilefs_fetch;
    }
    location /mogilefs_fetch {
        error_page 502 503 504 = @failover;
        proxy_pass $mogilefs_path_0;
    }
    location @failover {
        proxy_pass $mogilefs_path_1;
    }

Maxim D.


#3

----- “Maxim D.” wrote:

I haven’t really used it (and am not likely to in the near future), but
here are some questions:

  1. Any reason why you create a hidden location from the module instead
    of accepting the name of an existing one? It looks unnatural to me.

There are several reasons:

  1. The other way around looks unnatural to me :)
  2. People will tend to forget the internal; directive, leaving fetch
    locations open to the public, which is kind of a security hole. You did
    it as well, by the way :) I don’t feel comfortable with that.
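
As an illustration of the concern in point 2, a fetch location protected with the internal directive might look like the sketch below. The location name /mogilefs_fetch is borrowed from the example quoted earlier in the thread, and the sketch assumes $mogilefs_path already holds the storage URL by the time the request is redirected here.

```nginx
# Sketch only: the location name is taken from the earlier example in this thread.
location /mogilefs_fetch {
    # Without this line, clients could request /mogilefs_fetch directly.
    # With it, nginx answers direct requests with 404 and serves only
    # requests that arrive through an internal redirect.
    internal;

    # Assumes the storage URL has already been assigned to this variable.
    proxy_pass $mogilefs_path;
}
```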

In the alpha version of this module, however, mogilefs_pass took a location
name as an argument, as you describe. This also allowed the module to be
compiled with nginx 0.6.x.

I don’t know where the balance is at the moment. I don’t have enough
feedback.

location /mogilefs_fetch {
    error_page 502 503 504 = @failover;
    proxy_pass $mogilefs_path_0;
}
location @failover {
    proxy_pass $mogilefs_path_1;
}

That’s actually a good idea, I’ll implement it. The only thing I’d love to
have for this is access to parametric variables from modules.


#4

Hello!

On Thu, Apr 16, 2009 at 01:49:46PM +0100, Valery K. wrote:

  1. The other way around looks unnatural to me :)
  2. People will tend to forget the internal; directive, leaving fetch locations open to the public, which is kind of a security hole. You did it as well, by the way :) I don’t feel comfortable with that.

In the alpha version of this module, however, mogilefs_pass took a location name as an argument, as you describe. This also allowed the module to be compiled with nginx 0.6.x.

I don’t know where the balance is at the moment. I don’t have enough feedback.

They will be unable to fetch anything without the appropriate variable
correctly set.

And personally I think that security is quite a different thing
and should be handled in the way the admin prefers. It may be
internal, may be allow/deny, may be something else. That’s why I
usually omit internal from the examples. And any magic is really
bad here since people may think that the software will handle security
for them while it actually can’t.

On the other hand, it may be handy to actually have this location
non-internal (e.g. for direct requests from internal services or
just admin checks).

location /mogilefs_fetch {
    error_page 502 503 504 = @failover;
    proxy_pass $mogilefs_path_0;
}
location @failover {
    proxy_pass $mogilefs_path_1;
}

That’s actually a good idea, I’ll implement it. The only thing I’d love to have for this is access to parametric variables from modules.

For this particular case I suppose it will be simple enough to
register just 10 variables with appropriate names.

Alternatively, this may be handled by something like

set $mogilefs_failover 1;
proxy_pass $mogilefs_path;

with an appropriate lookup of $mogilefs_failover in the code before
returning the value for $mogilefs_path.

Maxim D.


#5

Hello!

On Thu, Apr 16, 2009 at 02:38:34PM +0100, Valery K. wrote:

[…]

Alternatively, this may be handled by something like

set $mogilefs_failover 1;
proxy_pass $mogilefs_path;

with an appropriate lookup of $mogilefs_failover in the code before
returning the value for $mogilefs_path.

This is impossible, because I do not evaluate the $mogilefs_path variable dynamically. nginx discards all modules’ contexts when it does an internal redirect. Instead, I simply assign a value to the variable, and the value survives the internal redirect.

You may preserve the old context by (surprise!) assigning it to a
variable. But it seems like overkill to me, too. :)

Maxim D.


#6

----- “Maxim D.” wrote:

And personally I think that security is quite a different thing
and should be handled in the way the admin prefers. It may be
internal, may be allow/deny, may be something else. That’s why I
usually omit internal from the examples. And any magic is really
bad here since people may think that the software will handle security
for them while it actually can’t.

On the other hand, it may be handy to actually have this location
non-internal (e.g. for direct requests from internal services or
just admin checks).

You might be right. After all, I disclaimed any responsibility for
security damages in the license.

We’ll see whether I get any other comments on this part.

Alternatively, this may be handled by something like

set $mogilefs_failover 1;
proxy_pass $mogilefs_path;

with an appropriate lookup of $mogilefs_failover in the code before
returning the value for $mogilefs_path.

This is impossible, because I do not evaluate the $mogilefs_path variable
dynamically. nginx discards all modules’ contexts when it does an
internal redirect. Instead, I simply assign a value to the variable, and
the value survives the internal redirect.
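
The point that a variable’s value outlives an internal redirect can be checked with stock nginx directives alone. The sketch below does not use the mogilefs module at all: the port, the location names, the $saved_path variable and the URL are made up for the illustration, and try_files merely stands in for whatever triggers the internal redirect.

```nginx
# Illustration only: every name and address here is hypothetical.
server {
    listen 8080;

    location /first {
        # Assign a value before the redirect happens.
        set $saved_path "http://192.0.2.10:7500/example.fid";

        # The file is not expected to exist, so nginx falls through to
        # the last argument and performs an internal redirect to /second.
        try_files /no-such-file /second;
    }

    location /second {
        internal;

        # The variable set in /first is still available here.
        return 200 "$saved_path\n";
    }
}
```

With this configuration a request for /first should be answered from /second with the saved string, while a direct request for /second is rejected because of internal.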


#7

----- “Maxim D.” wrote:

with an appropriate lookup of $mogilefs_failover in the code before
returning the value for $mogilefs_path.

This is impossible, because I do not evaluate the $mogilefs_path
variable dynamically. nginx discards all modules’ contexts when it does
an internal redirect. Instead, I simply assign a value to the variable, and
the value survives the internal redirect.

You may preserve the old context by (surprise!) assigning it to a
variable. But it seems like overkill to me, too. :)

Yes. This looks like a dirty hack. Personally, I was surprised when I
discovered that redirects clear contexts.


#8

On Thu, Apr 16, 2009 at 5:49 AM, Valery K. wrote:

location /mogilefs_fetch {
    error_page 502 503 504 = @failover;
    proxy_pass $mogilefs_path_0;
}
location @failover {
    proxy_pass $mogilefs_path_1;
}

That’s actually a good idea, I’ll implement it. The only thing I’d love to have for this is access to parametric variables from modules.

There should be no need for multiple $mogilefs_path’s as the tracker
supplies the locations that nginx should be proxying to…

However, failover to a non-mogilefs source does make sense. In this
case it would be something like this, I think:

error_page 502 503 504 = /maintenance.html;

Or something of that nature?

Remember, mogilefs already has its own intelligence built in for
redundancy. All nginx has to do is take the list of mogstoreds
(storage nodes that listen over http for basic webdav commands, and
nginx can be used for that too :)) and try them in either a) the order
given, b) the opposite order or c) random order - the choice should come
from feedback from dormando or someone knowledgeable about the
mogilefs code. I am not sure if the tracker arbitrarily gives a list
of URLs or if there is any context behind it.


#9

On Thu, Apr 16, 2009 at 9:05 AM, Michael S. wrote:

There should be no need for multiple $mogilefs_path’s as the tracker
supplies the locations that nginx should be proxying to…

which probably means something like:

location /mogilefs {
    mogilefs_tracker …
    mogilefs_pass $mogilefs_url;  # this would be an array/list of urls (like an nginx upstream construct)
}

location /mogilefs_fetch {
    error_page 502 503 504 = @failover;
    proxy_pass $mogilefs_path_0;  # this makes no sense (in my opinion)
}

you can’t arbitrarily assume that people have only 2 or 10 copies of
the files available. unless the tracker has a limit on how many it
replies with, in which case you would have up to $mogilefs_path_X. but
i think it is much better to take what is given and create something like
an upstream{} internally for it, and then mogilefs_pass is essentially
proxy_pass to the upstream at that point.

you might not even need mogilefs_pass then (unless it does additional
work) as it should technically be an upstream{} created on the fly in
memory for that request, and it would be just like proxy_pass
@mogilefs_reply; or something?


#10

----- “Michael S.” wrote:

Remember, mogilefs already has its own intelligence built in for
redundancy. All nginx has to do is take the list of mogstoreds
(storage nodes that listen over http for basic webdav commands, and
nginx can be used for that too :)) and try them in either a) the order
given, b) the opposite order or c) random order - the choice should come
from feedback from dormando or someone knowledgeable about the
mogilefs code. I am not sure if the tracker arbitrarily gives a list
of URLs or if there is any context behind it.

Even though the tracker returns the path to the host with the least
load, it would be worth trying secondary locations, since a network
configuration or routing issue could suddenly appear.

However, sysadmins should be punished for network configurations in
which the tracker can contact the storage nodes but the frontends
cannot.

Ideally, every frontend node should have a clone of the distributed fs repo
that it can contact locally. I don’t know whether mogile can do this.


#11

On Thu, Apr 16, 2009 at 9:34 AM, Valery K. wrote:

Even though the tracker returns the path to the host with the least load, it would be worth trying secondary locations, since a network configuration or routing issue could suddenly appear.

exactly - that’s why it should probably be an nginx upstream {}
construct; and your mogilefs_timeout or whatever settings would
essentially be the same parameters normally given with the upstream
(timeouts and such). that way nginx is using
its “smart” retry-multiple-upstreams functionality. basically
mogilefs is just giving us a dynamic list of available upstreams for a
specific URI. in a nutshell, i think 90% of what this module
does is translating a URI into a mogilefs key and domain, asking
a tracker where it is, and then offloading it to standard nginx
proxying… if that is possible
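
For reference, the retry behaviour being described looks roughly like this with a static upstream in stock nginx. This is only a sketch of the standard mechanism, not of the module: the upstream name, addresses, port and timeout values are made up, and a static upstream cannot express what the thread actually asks for, namely a per-request list built from the tracker’s reply (where each storage node may also serve the file under a different path).

```nginx
# Hypothetical storage nodes; in the proposal this list would come
# from the tracker per request instead of being written by hand.
upstream mogstored_nodes {
    server 192.0.2.11:7500;
    server 192.0.2.12:7500;
}

server {
    listen 8080;

    location /fetch/ {
        # Ordinary proxy timeouts, playing the role of a module-specific timeout.
        proxy_connect_timeout 2s;
        proxy_read_timeout 10s;

        # On these failures nginx passes the request to the next server
        # in the upstream instead of returning the error to the client.
        proxy_next_upstream error timeout http_502 http_503 http_504;

        proxy_pass http://mogstored_nodes;
    }
}
```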

However, sysadmins should be punished for network configurations in which the tracker can contact the storage nodes but the frontends cannot.

Ideally, every frontend node should have a clone of the distributed fs repo that it can contact locally. I don’t know whether mogile can do this.

to me, this is up to how they want to configure their mogilefs
installation. you can create domains with 2 or 3 or N replicas of the
file in question.

for fallbacks i would use standard nginx fallback functionality using
error_page or whatever.
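
a rough sketch of that kind of fallback with stock directives, assuming the storage URL is in $mogilefs_path as discussed above. the location names, the root path and the maintenance page are placeholders, not anything the module defines.

```nginx
location /mogilefs_fetch {
    proxy_pass $mogilefs_path;

    # Needed so that error statuses returned by the storage node itself
    # (for example a 503) are also handled by error_page.
    proxy_intercept_errors on;

    # Hand failed fetches over to the named location below.
    error_page 502 503 504 = @fallback;
}

location @fallback {
    # Placeholder directory holding a static fallback page.
    root /var/www/fallback;

    # Serve the page if it exists, otherwise answer with 503.
    try_files /maintenance.html =503;
}
```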