A complex availability problem

Ok, this is a hard one and I just thought I’d see if people on the
list had any suggestions on how they would approach this, I’ve not
started to implement the rails to handle this yet as I’m still at the
paper planning phase.

I’m building an app that has to handle availability for travel packages.

The complexity comes in that a person can choose a start date for a
package and how long they wish to stay. The packages themselves have
availability for each start date. The availability needs to be
calculated such that if person A chooses date 1 which has an
availability of 2 places and wishes to stay for 4 weeks, but the 4th
week doesn’t have any availability, then the user is only offered a
choice of 1-3 weeks for that particular date. If there is
availability from the 5th week to the 10th week then the user could
choose the 5th week as a start date and book for up to 5 weeks from
then, but if they took up availability on the 7th week then no-one
else could book the same package (5 weeks from the 5th week) and
would only be offered a choice of up to 2 weeks if they wanted to
choose the 5th week as a start date.

The way this has been approached in the current system is to create a
separate table that stores availability day by day for each package
updated by a stored procedure after a booking is taken. However this
completely violates data normalisation rules and potentially creates
the update/delete errors that can arise from duplicating data and not
keeping it atomic.

Hope thats clear, I can try to clarify if not.

David

David S.
w: http://davidsmalley.com/blog

Hi David.

The way this has been approached in the current system is to create
a separate table that stores availability day by day for each
package updated by a stored procedure after a booking is taken.
However this completely violates data normalisation rules and
potentially creates the update/delete errors that can arise from
duplicating data and not keeping it atomic.

Could you sketch out this part in pseudo-SQL?

*m

On 20 Feb 2006, at 09:49, Manuel H. wrote:

Hi David.

The way this has been approached in the current system is to
create a separate table that stores availability day by day for
each package updated by a stored procedure after a booking is
taken. However this completely violates data normalisation rules
and potentially creates the update/delete errors that can arise
from duplicating data and not keeping it atomic.
Could you sketch out this part in pseudo-SQL?

It uses the following tables and attributes:

Booking: (id,…, package_id, status) [The booking only takes up
availability if the status is set to active]
Package: (id,…)
Availabilities: (id, package_id, datefrom, dateto, capacity)
DailyAvailabilties: (id, package_id, date, available, allocated,
daysremaining)
PackageDailyAvailability: (id, availabilities_id, date, available,
allocated, daysremaining)

The current database is horrendously complicated and very overworked
in my opinion. There are countless tables I just can’t figure out how
they link due to an inconsistent naming scheme and lack of
documentation. The previous developers are unwilling to revel how
they implemented it (as they are being replaced!) and the stored
procedures that update availability are encrypted in SQL server.

I know there must be a better approach to what they have done! It’s
just very hard to suss out, especially once you read what they’ve
done and started to get thoroughly confused.

David


David S.
w: http://davidsmalley.com/blog

Begin forwarded message:

Could you sketch out this part in pseudo-SQL?
allocated, daysremaining)
There also appears to be a table which links a booking to an
availability.

BookingDetails(id, booking_id, availabilities_id)

I can’t really make head-nor-tails of the SQL dump I have been given
so apologies if this is inconsistent.


David S.
w: http://davidsmalley.com/blog

On 20 Feb 2006, at 11:36, Manuel H. wrote:

The first thing I see is that you are dealing with very complicated
data. In math-speak you are dealing with “bags” of intervals (a bag
being a “multiset” which can contain the same element more than
once). Intervals are complicated but that you might have multiple,
identical ones does not make things simpler.

> Regards, > > Manuel > _______________________________________________ > Rails mailing list > [email protected] > http://lists.rubyonrails.org/mailman/listinfo/rails

Thanks, I’ll digest this

David S.
w: http://davidsmalley.com/blog

Am 20.02.2006 um 11:16 schrieb David S.:

done and started to get thoroughly confused.
The first thing I see is that you are dealing with very complicated
data. In math-speak you are dealing with “bags” of intervals (a bag
being a “multiset” which can contain the same element more than
once). Intervals are complicated but that you might have multiple,
identical ones does not make things simpler.

Let’s only consider a single package. Your data looks like this when
abstracted

[2006-01-01, 2006-02-23]
[2006-02-12, 2006-03-23]
[2006-03-23, 2006-04-23]
[2006-01-13, 2006-06-23]

Let’s say these are four rooms or slots for this package in your
booking system.

Now, if you get a booking - say of the second interval, you might end
up with data like the following (assuming someone books from
2006-02-14 to 2006-03-04):

[2006-01-01, 2006-02-23]
[2006-02-12, 2006-02-13], [2006-03-05, 2006-03-23]
[2006-03-23, 2006-04-23]
[2006-01-13, 2006-06-23]

As we can see, you are getting more intervals - d’oh.

You can imagine how much more complicated all this will get with
multiple packages and of course with more slots (I assume there are
more than hundred concurrent bookings for a package for medium sized
hotels).

As far as I can see, you can approach storing the general data in
four ways.

(I) First, you can store intervals, but since there might be more
than one identical interval, this might end up ugly (storing the same
thing twice is a nono in relational databases). You’d get something
like this:

interval: (from, to, count)

Now, if person A books an interval in the middle of the other
interval, you might end up with the following intervals:

1.) the interval before the booking with count = old_count
2.) the interval after the booking with count = old_count
3.) the interval with the same interval of the boking with count =
old_count-1

Whee, and if person B books the interval in 1., you will have to
merge both intervals back together to keep consistency.

(II) So storing the free slots seems not to be the best way to do it.
What about storing the booked intervals? We might end up with the
following:

package := (id, maximum_bookings)
booking := (id, booking, interval)

So when checking if we have free slots at a given time T, we do
something like (in Pseudo-SQL):

SELECT COUNT(id)
FROM booking
WHERE booking.interval CONTAINS %T%

This seems pretty clean to me. You might want to extend this with a
“slots_booked” column in booking and replace the COUNT(id) with a SUM
(slots_booked).

(III, IV) Since the intervals have a finite count of elements you
could also store the days (or weeks if you can only book full weeks
in your system) and make them have a “availability” count as in (I)
or store only the booked days as in (II).

Since your database will kind of explode (356 days * n slots per
package * m packages), I’d recommend (II). I don’t know how fast such
“date in interval” queries are in your database system. I’d implement
all that stuff and if it turns out that this query is a bottle neck,
you might want to have a “cache” of sorts rebuilt by stored
procedures (or in your code) on storing elements. Of course, this is
kind of a hack and maybe your RDBMS handles those interval questions
really well with an index.

Regards,

Manuel

On 20 Feb 2006, at 11:36, Manuel H. wrote:

allocated, daysremaining)

The current database is horrendously complicated and very
overworked in my opinion. There are countless tables I just can’t
figure out how they link due to an inconsistent naming scheme

I just had a thought. Surely this kind of problem lends itself to
temporal database theory. That is to say that if the available
booking dates were stored as interval datatypes along with capacity,
I could perform an SQL count to see how many rows intersect with a
particular interval. This would return the number of bookings there
were on a particular date, minus this from the total capacity for a
date and we’d have the availability for a particular date.

Anyone had any experience of temporal support in Rails? I see that
Mysql 5 has improved interval datatype support.

Would I have to write a patch for activerecord that dealt with
interval datatypes and could perform the pack/unpack operations
necessary for testing an intersection?

David S.
w: http://davidsmalley.com/blog

I just want to point something out: if you google Lisp or certain
topics in artificial intelligence, you get job ads for a company in
Massachusetts which solves a similar class of problems. Some kind of
travel-booking company. They are literally working on the travelling
salesman problem.

The travelling salesman problem is a graph theory problem in
math/computer science, and all I can tell you beyond that is that it’s
a bit over my head and some very clever people consider it to be quite
difficult. I’m not sure if what you’re dealing with here is on the
same level, but it might be something to look into.


Giles Goat Boy

http://gilesmakesmusic.blogspot.com
http://gileswritescode.blogspot.com

On Feb 21, 2006, at 22:56 , David S. wrote:

Anyone had any experience of temporal support in Rails? I see that
Mysql 5 has improved interval datatype support.

Little late to the party. Temporal representations are an interest of
mine, but I’m unfamiliar with MySQL. I don’t see intervals in the
MySQL documentation and didn’t find anything when I googled. Could
you provide some links for where I might find some more information
for this wrt MySQL?

Would I have to write a patch for activerecord that dealt with
interval datatypes and could perform the pack/unpack operations
necessary for testing an intersection?

With just a start and end point demarcating an interval its possible
to do interval comparisons, so you wouldn’t necessarily have to write
pack/unpack code. You may be interested in a book by Snodgrass that’s
available as a PDF download on temporal databases in SQL:

http://www.cs.arizona.edu/people/rts/tdbbook.pdf

Michael G.
grzm myrealbox com

Hi,

not sure if it helps, but I remember a friend writing his master’s
thesis in CS on optimizing/estimating duration of processes in a
workflow system. I think he used something called “critical path
analysis”. Maybe it’s appliable to you problem.

Otherwise it sounds like a travelling salesman problem (graph
theory), which belongs to a class of problems which are hard to
compute in a way that throwing faster computers on them doesn’t
really help. Your best bet is to use heuristics (intelligent
guessing) to find your way through the problem space. The “A*”
algorithm comes to my mind, but I only remember fragments of that
university stuff today…

Various projects which might be related by looking at the description
(search for “graph”):
http://rubyforge.org/projects/rgraph/
http://rubyforge.org/projects/wildberry/
http://rubyforge.org/projects/horai/
http://rubyforge.org/projects/rgl/

Maybe it also helps re-asking you question on the ruby-talk mailing
list. There are quite some people who are good in solving
mathematical complex problems.

HTH,

Timo

Am 20.02.2006 um 10:25 schrieb David S.: