-------- Original-Nachricht --------

Datum: Wed, 14 May 2008 18:52:08 +0900

Von: Calamitas [email protected]

An: [email protected]

Betreff: Re: Gecoder syntax question

matrix A.

How can I declare a matrix with float or double precision entries in

Gecoder to formulate the problem ?

Dear Peter and Andreas,

thank you for responding.

Gecode/R does not handle float values in constraints.

I somehow previously suspected that but I was afraid I might

have overlooked something. Thanks for the information.

First off, your optimization problem seems to make little sense to me

the way you specified it.

Well, as I asked a syntax question, I didn’t want to give a lengthy

description of the background of the problem I’m trying to solve. So the

motivation might not be obvious. Here it is.

x and y should be vectors counting occurrences of words in a sentence

or a paragraph from a text. A is a matrix made from an entire given text

using a procedure that can be considered a variant of Latent Semantic

Analysis,

http://en.wikipedia.org/wiki/Latent_semantic_analysis

and the goal is to find (a set of) synonyms/paraphrases/explanations

(=y) of the meaning of a given sentence/paragraph (=x) - here I am not

concerned with the order of the words - so I abusively treat a

sentence/paragraph and its word count vector as the same thing. A

synonym/paraphrase/explanation cannot use the exact same words as what

it is supposed to explain, so I cannot just solve a matrix equation to

retrieve x=y from Ax=Ay.

The matrix A will always be singular (“more words than concepts” - see

below), but as it is a linear mapping between finite-dimensional

vector spaces, it is continuous and this:

If A is singular, the optimization problem is

unbound and thus there is no maximum for ||x-y|| for a fixed value of

||A(x-y)||.

cannot happen.

A is in fact a matrix that comes about using an SVD with some predefined

maximum amount of non-zero eigenvalues and associated eigenvectors,

where the latter can be interpreted as different concepts, and Ax is

then a vector expressing how much sentence x was about

each of the concepts talked about in the text x is taken from.

Maybe you should quantify what ‘mapped closely’ means,

like ||A(x-y)|| <= 1.

Of course, I’ll introduce a quantification, but I am not sure yet

whether

I should give some hard bound right away or search for the

so-and-so-many

y vectors that come closest in some iterative form …

But I don’t think Gecode/R can handle floats, or at least not for the

unknowns. It seems to be a finite domain solver.

As Andreas said, constraints cannot be floats. This can be helped by

multiplying with some constant and cutting off.

The unknowns (entries of y) and the entries of x are integers anyway -

as I wrote in the original post.

If you only care

about say 3 digits after the comma, you could scale all values up by

1000 and round to integers. My gut feeling is that this can easily

give results that are way off, and not only for ill-conditioned

problems.

I share that fear, too.

Mmh, however, well here, the multiplication and cutting off would take

place after the SVD, and I don’t want to use the Euclidean norm, because

it introduces numerical instability by minimizing one or a few large

squared sums in exchange for maximizing many small squared sums in the

components.

norm" means Euclidian norm.

As I said above, the x and y are discrete, and as a matter of fact, I

wanted Gecoder to do the searching here - there

might only be 1 or 2 more or less occurrences of each word in a

paraphrase,

but a good paraphrase might actually render a set of words by another

one having differences in more than one direction, like in these bits

“George W. Bush”

“the current president of the U.S.”

“Mr. Clinton’s successor”

all having the same meaning in a newspaper article, say.

So thank you again for your kind suggestions, I’ll try some

multiplication

and cutting-off.

Best regards,

Axel