Which algorithm are you using for the 9x9 matrix inversion?
There are direct algorithms, which give an exact answer (up to rounding)
after a fixed but fairly large number of steps, and, if I remember my math
courses, there are iterative algorithms, which start from one “guess”,
normally [ Identity 9x9 ], and iterate until the error is less than 0.00x%.
Depending on what kind of values you have, one might be faster than the
other.
If you are willing to accept less exact values in exchange for better
speed, you might consider the iterative kind.
A numerical analysis book might help you.
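As a rough illustration of the "start from a guess and iterate" idea above,
here is a minimal C++ sketch of one such scheme, the Newton-Schulz iteration
X <- X (2I - A X). This is only my guess at the kind of method meant; all
names (Mat, invert_iterative, residual) are made up for the example. Note the
assumption: starting from the identity only converges when A is already close
to the identity (spectral radius of I - A below 1), so the function reports
failure instead of returning garbage.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <limits>

constexpr std::size_t N = 9;
using Mat = std::array<std::array<double, N>, N>;

Mat identity() {
    Mat m{};                                   // zero-initialised
    for (std::size_t i = 0; i < N; ++i) m[i][i] = 1.0;
    return m;
}

Mat multiply(const Mat& a, const Mat& b) {
    Mat c{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < N; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Largest absolute entry of (A*X - I); infinity if anything overflowed.
double residual(const Mat& a, const Mat& x) {
    const Mat r = multiply(a, x);
    double worst = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            const double d = std::fabs(r[i][j] - (i == j ? 1.0 : 0.0));
            if (!std::isfinite(d)) return std::numeric_limits<double>::infinity();
            worst = std::max(worst, d);
        }
    }
    return worst;
}

// Newton-Schulz iteration starting from the identity (an assumption that
// only suits matrices near the identity). Returns true on convergence.
bool invert_iterative(const Mat& a, Mat& x, double tol = 1e-10, int max_iter = 50) {
    x = identity();
    for (int it = 0; it < max_iter; ++it) {
        if (residual(a, x) < tol) return true;
        Mat t = multiply(a, x);                // t = A*X
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                t[i][j] = (i == j ? 2.0 : 0.0) - t[i][j];   // t = 2I - A*X
        x = multiply(x, t);                    // X <- X*(2I - A*X)
    }
    return residual(a, x) < tol;
}
```

Loosening tol is where the "less exact values but better speed" trade-off
shows up: each pass costs two 9x9 multiplications, so fewer passes means less
work.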
Another thing you can do is to profile your code and find the “loop” which
takes most of the CPU, then maybe put that part inside a function and
implement that function in heavily tuned and optimized assembly code,
possibly using MMX or other techniques. There are some nice books about
Pentium ASM optimization.
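If no profiler is at hand, a crude way to find the expensive loop is to time
candidate sections by hand. A minimal sketch, assuming a compiler with C++11
<chrono> (which may not match your toolchain); seconds_taken is a
hypothetical helper name, not something from any library:

```cpp
#include <chrono>

// Runs `work` `repeats` times and returns the total wall-clock seconds.
template <typename F>
double seconds_taken(F&& work, int repeats = 1) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Wrap the suspect code in a lambda, for example
seconds_taken([&] { invert(m); }, 1000) where invert stands for your own
routine, and compare the totals for different sections before rewriting
anything in assembly.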
Another way to optimize might be to multiply everything by, let's say,
1,000,000 and treat everything as long or int64_t (long long), so it could
be faster. The only problem would be if your data has extremely different
magnitudes:
[ 10^-12 10^5 ]
[ 10^5 10^-12 ]
might not give the result you want, but
[ 1000 12 ]
[ 500 4320 ]
will give good results.
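To make the scaling idea concrete, here is a small fixed-point sketch in C++
using the 1,000,000 factor from above. The helper names (to_fixed, fixed_mul,
det2) are invented for the example, and a 2x2 determinant stands in for the
real inversion.

```cpp
#include <cmath>
#include <cstdint>

constexpr std::int64_t SCALE = 1000000;   // six decimal digits of fraction

// Convert to and from the scaled integer representation.
std::int64_t to_fixed(double v) { return std::llround(v * SCALE); }
double from_fixed(std::int64_t f) { return static_cast<double>(f) / SCALE; }

// The product of two scaled values carries SCALE twice, so divide once.
// The intermediate a*b can overflow int64_t for large inputs.
std::int64_t fixed_mul(std::int64_t a, std::int64_t b) { return (a * b) / SCALE; }

// Determinant of [ a b ; c d ] in fixed point.
std::int64_t det2(std::int64_t a, std::int64_t b, std::int64_t c, std::int64_t d) {
    return fixed_mul(a, d) - fixed_mul(b, c);
}
```

With the second matrix, det2(to_fixed(1000), to_fixed(12), to_fixed(500),
to_fixed(4320)) comes out as exactly to_fixed(4314000). With the first one,
to_fixed(1e-12) rounds to 0, and 10^5 * 10^5 scaled is 10^22, which does not
fit in 64 bits, so both ends of the range break.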
Another way would be to replace recursion with your own explicit stack.
I remember, a few years ago, implementing QuickSort
with a simple array stack and gotos,
which was faster than the normal recursive way.
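For reference, a QuickSort with an explicit stack could look roughly like
this. It is a sketch, not the code I wrote back then: it uses a loop instead
of gotos, a std::vector as the stack, and a simple Lomuto partition, and the
name quicksort_iterative is made up.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void quicksort_iterative(std::vector<int>& v) {
    if (v.size() < 2) return;
    // Each entry is an inclusive [lo, hi] range that still needs sorting.
    std::vector<std::pair<std::size_t, std::size_t>> stack;
    stack.emplace_back(0, v.size() - 1);
    while (!stack.empty()) {
        const std::size_t lo = stack.back().first;
        const std::size_t hi = stack.back().second;
        stack.pop_back();
        if (lo >= hi) continue;               // empty or single-element range
        // Lomuto partition around the last element.
        const int pivot = v[hi];
        std::size_t i = lo;
        for (std::size_t j = lo; j < hi; ++j)
            if (v[j] < pivot) std::swap(v[i++], v[j]);
        std::swap(v[i], v[hi]);               // pivot is now at its final index i
        if (i > lo) stack.emplace_back(lo, i - 1);   // guard against 0 - 1 wrap
        stack.emplace_back(i + 1, hi);
    }
}
```

Whether this beats the recursive version depends on the compiler and the
machine, so measure before committing to it.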
The first thing to do is to understand which algorithm is being used,
then take some math book and figure out how you could implement a better one.
Good Luck,
Fred.
J2K Library
http://j2k.sourceforge.net/
Mario Charest wrote in message <8tqlo6$pja$1@inn.qnx.com>…
“Dan” <none@no.spam> wrote in message news:8tqgoh$krp$1@inn.qnx.com…
Thanks for your input. The problem is most likely that the 9x9 matrix
inversion takes too long. We are looking at that. I do realise that to
double the speed I have to look at the implementation and perhaps the
algorithm itself; however, I also want to find out if compiler options
can help me as well.
I tried to compile using various options (all options listed below), but
they don’t appear to have much effect at all. I recall working on an HP-UX
machine ages ago which produced significantly different execution times
depending on the optimisation options. (Different code, in C, hence not a
valid comparison.)
Compilers are much better these days. If you already use -O2 with gcc,
there is little you can do. There are some options to help math functions
like sin(), cos(), etc., but a matrix inversion doesn’t use them.
Can anyone suggest which combination of options would be most suitable
for C++ template libraries?
thanks
Dan
Optimization options
-fbranch-probabilities
-fcaller-saves -fcse-follow-jumps -fcse-skip-blocks
-fdelayed-branch -fexpensive-optimizations
-ffast-math -ffloat-store -fforce-addr -fforce-mem
-ffunction-sections -finline-functions
-fkeep-inline-functions -fno-default-inline
-fno-defer-pop -fno-function-cse
-fno-inline -fno-peephole -fomit-frame-pointer
-frerun-cse-after-loop -fschedule-insns
-fschedule-insns2 -fstrength-reduce -fthread-jumps
-funroll-all-loops -funroll-loops
-O -O0 -O1 -O2 -O3
“Warren Peece” <warren@nospam.com> wrote in message
news:8tpdr2$g6o$1@inn.qnx.com…
Also perhaps look at using the MMX instructions, I believe they were
intended specifically for matrix operations at very high speeds. I’m fairly
certain that to get what you want you’re not going to be able to use high
level C++ constructs & 3rd party libraries, unless someone really went out
of their way to optimize for the x86 environment.
“Jim Atkins” <jamesa@tsd.serco.com> wrote in message
news:8tom0h$ios$1@inn.qnx.com…
| I dunno as I haven’t seen any code, but as part of reworking your
| algorithm, you can save quite a bit of time by inelegant coding, i.e. not
| using function calls and class methods (your own ones I’m talking about)
| in favour of having all your code inline. This will cut out a lot of
| stack usage, and if you have a lot of iterations it all adds up. As an
| example: I have a bit of code that’s used to program a gate array and has
| a loop which bitbashes the data in over 77000 cycles. Time to program
| using class methods: 26 secs, as opposed to 3 secs for inline code.
|
| You can use ‘register’ for your loop variables as well - that can
| sometimes shave a bit off…