Hello Hoang,

exciting

Actually, if you’re already using LAPACK, I doubt there’s much potential

for further optimization on the same platform – if you, however, know

your stuff is only going to be run on Intel Xeons or so, then maybe have

a look at the Intel Math Kernel Library (MKL) LAPACK examples. Maybe you’d

want to accelerate by using GPUs; then you’d have a look at an OpenCL or

CUDA implementation, or Theano (however, I don’t know how readily

available things like SVD are for Theano).

One more thing: Since you’re doing SVD using LAPACK yourself, I trust

you’ve already chosen the right routine (general, fully equipped

complex-valued matrix). I’m not completely convinced, though:

Have you had a look at [1]? It seems the SVD $A=U \Sigma V^H$ is a two-step

process: First, the input matrix is decomposed into left and right

unitary matrices $U_1$ and $V_1^H$ and a bidiagonal matrix $B$ using

CGEBRD, i.e. $A=U_1 B V_1^H$, and after that, $B$ is SVD’ed, yielding

$B=U_2 \Sigma V_2^H$; the products $U_1 U_2$ and $V_1 V_2$ then are $U$

and $V$. Maybe for your application $V_1$ is

sufficient, because you can rearrange your problem mathematically?
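
To make the two-step structure concrete, here is a rough NumPy sketch. It is not what LAPACK does internally: CGEBRD stores the reflectors in compact form and produces a real bidiagonal $B$, whereas this toy version builds dense Householder matrices and leaves $B$ complex. The function names (householder, bidiagonalize, two_step_svd) are made up for illustration, and it assumes at least as many rows as columns.

```python
import numpy as np


def householder(x):
    """Hermitian, unitary H such that H @ x is a multiple of the first unit vector."""
    x = np.asarray(x, dtype=complex)
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.eye(x.size, dtype=complex)
    phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
    v = x.copy()
    v[0] += phase * norm
    return np.eye(x.size, dtype=complex) - 2.0 * np.outer(v, v.conj()) / np.vdot(v, v).real


def bidiagonalize(A):
    """Step 1 (illustrative stand-in for CGEBRD): A = U1 @ B @ V1^H, B upper bidiagonal."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    B = A.copy()
    U1 = np.eye(m, dtype=complex)
    V1 = np.eye(n, dtype=complex)
    for k in range(n):
        # Left reflector: zero the entries below the diagonal in column k.
        H = householder(B[k:, k])
        B[k:, :] = H @ B[k:, :]
        U1[:, k:] = U1[:, k:] @ H
        if k < n - 2:
            # Right reflector: zero the entries right of the superdiagonal in row k.
            G = householder(B[k, k + 1:].conj())
            B[:, k + 1:] = B[:, k + 1:] @ G
            V1[:, k + 1:] = V1[:, k + 1:] @ G
    return U1, B, V1


def two_step_svd(A):
    """Step 2: SVD of B, then combine the unitary factors from both steps."""
    U1, B, V1 = bidiagonalize(A)
    U2, s, V2h = np.linalg.svd(B)
    return U1 @ U2, s, V1 @ V2h.conj().T


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
U, s, V = two_step_svd(A)
# Both checks compare against the input and against NumPy's one-shot SVD.
print(np.allclose((U[:, :4] * s) @ V.conj().T, A))
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))
```

If $V_1$ (or $U_1$) alone turns out to be enough, only the bidiagonalize step would be needed and the second SVD could be skipped.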

Greetings,

Marcus

[1] http://www.netlib.org/lapack/lug/node53.html