-------- Original-Nachricht --------
Datum: Wed, 14 May 2008 18:52:08 +0900
Von: Calamitas [email protected]
An: [email protected]
Betreff: Re: Gecoder syntax question
matrix A.
How can I declare a matrix with float or double precision entries in
Gecoder to formulate the problem ?
Dear Peter and Andreas,
thank you for responding.
Gecode/R does not handle float values in constraints.
I somehow previously suspected that but I was afraid I might
have overlooked something. Thanks for the information.
First off, your optimization problem seems to make little sense to me
the way you specified it.
Well, as I asked a syntax question, I didn’t want to give a lengthy
description of the background of the problem I’m trying to solve. So the
motivation might not be obvious. Here it is.
x and y should be vectors counting occurrences of words in a sentence
or a paragraph from a text. A is a matrix made from an entire given text
using a procedure that can be considered a variant of Latent Semantic
Analysis,
http://en.wikipedia.org/wiki/Latent_semantic_analysis
and the goal is to find (a set of) synonyms/paraphrases/explanations
(=y) of the meaning of a given sentence/paragraph (=x) - here I am not
concerned with the order of the words - so I abusively treat a
sentence/paragraph and its word count vector as the same thing. A
synonym/paraphrase/explanation cannot use the exact same words as what
it is supposed to explain, so I cannot just solve a matrix equation to
retrieve x=y from Ax=Ay.
The matrix A will always be singular (“more words than concepts” - see
below), but as it is a linear mapping between finite-dimensional
vector spaces, it is continuous and this:
If A is singular, the optimization problem is
unbound and thus there is no maximum for ||x-y|| for a fixed value of
||A(x-y)||.
cannot happen.
A is in fact a matrix that comes about using an SVD with some predefined
maximum amount of non-zero eigenvalues and associated eigenvectors,
where the latter can be interpreted as different concepts, and Ax is
then a vector expressing how much sentence x was about
each of the concepts talked about in the text x is taken from.
Maybe you should quantify what ‘mapped closely’ means,
like ||A(x-y)|| <= 1.
Of course, I’ll introduce a quantification, but I am not sure yet
whether
I should give some hard bound right away or search for the
so-and-so-many
y vectors that come closest in some iterative form …
But I don’t think Gecode/R can handle floats, or at least not for the
unknowns. It seems to be a finite domain solver.
As Andreas said, constraints cannot be floats. This can be helped by
multiplying with some constant and cutting off.
The unknowns (entries of y) and the entries of x are integers anyway -
as I wrote in the original post.
If you only care
about say 3 digits after the comma, you could scale all values up by
1000 and round to integers. My gut feeling is that this can easily
give results that are way off, and not only for ill-conditioned
problems.
I share that fear, too.
Mmh, however, well here, the multiplication and cutting off would take
place after the SVD, and I don’t want to use the Euclidean norm, because
it introduces numerical instability by minimizing one or a few large
squared sums in exchange for maximizing many small squared sums in the
components.
norm" means Euclidian norm.
As I said above, the x and y are discrete, and as a matter of fact, I
wanted Gecoder to do the searching here - there
might only be 1 or 2 more or less occurrences of each word in a
paraphrase,
but a good paraphrase might actually render a set of words by another
one having differences in more than one direction, like in these bits
“George W. Bush”
“the current president of the U.S.”
“Mr. Clinton’s successor”
all having the same meaning in a newspaper article, say.
So thank you again for your kind suggestions, I’ll try some
multiplication
and cutting-off.
Best regards,
Axel