Postgres match with regular expression

Hi guys,

I am working on a problem, that I would like to see if anyone might be
able to help me with.

I have a simple rails application that uses a controller to find a
record in a database. I grab the path that the client is trying to
access, and then I look in a database to see if I have a record that
matches this path and return an appropriate response.

The table looks like this (simplified example):

| path | response |

| /users | { users } |

This is all well and nice, and works the way it should, to expose an
extremely simple REST service. What I would like to do now is to be
able to insert a regular expression, that I will then use to match
against the path. That way having a record value like //users/(\d*)/
to match a client request path like /users/1234.

| path | response |

| //users/(\d*)/ | { users } |

I first looked at

(section 9.7.2), but that is the reverse problem, where the input is a
regular expression.

Does my problem make sense, and is it even possible?

Thanks,
JP

On Nov 3, 8:11am, JP [email protected] wrote:

The table looks like this (simplified example):

| path | response |

| //users/(\d*)/ | { users } |

I first looked
athttp://www.postgresql.org/docs/8.3/static/functions-matching.html
(section 9.7.2), but that is the reverse problem, where the input is a
regular expression.

Does my problem make sense, and is it even possible?

The challenge here is that there’s no DB (that I’m aware of) that can
meaningfully index regexes, meaning that every query against the table
is going to be a full table scan. How bad that is depends on how big
your table is:

  • if it’s just a couple dozen records, it won’t be too slow. You may
    want to just load all the objects and scan them on the Ruby side, as
    the implementation will be much more straightforward and performance
    won’t be an issue.

  • if it’s tens of thousands of records, you’re probably screwed. DB-
    side lookups will still be slow, and loading everything into memory
    will also be slow.

The MySQL manual mentions that you can use a DB column for the pattern
in a REGEXP expression, so that might be looking into.

–Matt J.

able to insert a regular expression, that I will then use to match
against the path. That way having a record value like //users/(\d*)/
to match a client request path like /users/1234.

| path | response |

| //users/(\d*)/ | { users } |

I first looked
athttp://www.postgresql.org/docs/8.3/static/functions-matching.html
(section 9.7.2), but that is the reverse problem

I’m not sure what problem you’re trying to solve with an approach like
this, but a quick test confirms that a query such as:

select response
from paths
where ‘/users/1234’ similar to path

would actually work. Of course, it will require setting up a custom
sql condition, but that’s not really a huge deal. As for the previous
reply, a sequential table scan on 10k records isn’t a big deal for
PostgreSQL on any modern machine, in fact I rarely even think about
indices until tables are at least approaching 10k records and usually
well beyond.

Jim Crate

Thanks guys…! I am going to give this a try! I should of course just
have tried this out. Will give it a go.

I am not going to be anywhere near 10k rows, but probably rather a few
hundred rows. So table scan won’t be that big a deal, and this is
never going to be a performance critical system, but rather one used
in test environments.

Thanks again!

/JP

On Fri, Nov 4, 2011 at 3:14 PM, Matt J. [email protected] wrote:

The challenge here is that there’s no DB (that I’m aware of) that can
meaningfully index regexes, meaning that every query against the table
is going to be a full table scan. How bad that is depends on how big
your table is:

Postgres can handle this kind of lookup, though it is more complex
than the normal situation. Postgres uses customisable indexes, so you
can specify things exactly as you need for specific lookups.


Simon R. http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

I’d be interested in getting a bit more information about what sort of
custom SQL condition
or customized index is required to get this to work.

In Rails 2.x, you would do something like:

Path.find(:first, :conditions => [’? similar to path’, accessed_path])

I haven’t had to do custom SQL conditions in Rails 3.x yet so I can’t
help with that. I’m also not sure what kind of index you’d use to
query like that, the PostgreSQL mailing list would be a better place
to ask.

Jim Crate

On Nov 5, 4:41am, Jim [email protected] wrote:

I’m not sure what problem you’re trying to solve with an approach like
this, but a quick test confirms that a query such as:

select response
from paths
where ‘/users/1234’ similar to path

would actually work. Of course, it will require setting up a custom
sql condition, but that’s not really a huge deal.

I’d be interested in getting a bit more information about what sort of
custom SQL condition
or customized index is required to get this to work.

Thanks…
ESB