New memmove()
OK. So there was one bug remaining, and it wasn’t really 66 times as fast.
But here are the new versions. I did one for QNX4 and one for QNX6. The
QNX4 version clocks in at about twice as fast, but only when the moves are
not on quad-byte boundaries. The QNX6 version clocks in anywhere from 3 to
10 times as fast as the library version, depending on the direction of the
move and whether or not it is on quad-byte boundaries.
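To check figures like these on your own machine, here is a minimal timing sketch in C. It assumes a GCC-style toolchain on a 32-bit x86 QNX 6 target, and `time_memmove` is a hypothetical helper, not part of either posted file. Because memmoveQ6.s defines `memmove` itself, linking it into the program should make these calls use the hand-written routine, while building without it times the library version. Compiling with `-fno-builtin` stops the compiler from replacing the calls with its own inline copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Reading one moved byte into this volatile keeps the compiler from
 * treating the copies as dead stores. */
static volatile unsigned char sink;

/* Hypothetical helper: time `reps` calls of memmove() that move `len`
 * bytes from buf + src_off to buf + dst_off inside a single buffer.
 *   src_off > dst_off : forward move (overlapping if len is large enough)
 *   src_off < dst_off : backward move when the regions overlap
 *   offsets that are not multiples of 4 : moves off quad-byte boundaries
 *   (malloc() returns suitably aligned memory)
 * Returns elapsed CPU time in seconds, or -1.0 if malloc() fails. */
double time_memmove(size_t len, size_t src_off, size_t dst_off, long reps)
{
    assert(len > 0 && reps > 0);

    size_t span = len + (src_off > dst_off ? src_off : dst_off);
    unsigned char *buf = malloc(span);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xA5, span);

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        memmove(buf + dst_off, buf + src_off, len);
    clock_t stop = clock();

    sink = buf[dst_off];      /* first byte written by the last move */
    free(buf);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

For example, `time_memmove(4096, 0, 4, 100000)` times an aligned backward move and `time_memmove(4096, 1, 0, 100000)` a misaligned forward one. Run each case with and without the assembly file linked in to get the ratio.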
Enjoy. Report any bugs or improvements you find back to me and I’ll
repost it.
--
Bill Caroselli – 1(530) 510-7292
Q-TPS Consulting
QTPS@EarthLink.net
“Bill Caroselli (Q-TPS)” <qtps@earthlink.net> wrote in message
news:9rjms4$kq4$1@inn.qnx.com…
Yes.
I have rewritten memmove in asm. The results are so good I can’t believe
them. My newest memmove is clocking in at 66 times faster than the Neutrino
library version. When I am done testing it I’ll post the code.
“Tom” <tom_usenet@hotmail.com> wrote in message
news:9rhtrp$85j$1@inn.qnx.com…
Why do you need memmove rather than memcpy? Can your source overlap your
destination?
Tom
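Tom’s question is the crux. The C standard leaves memcpy() undefined when the source and destination overlap, while memmove() must behave as if the source were first copied to a temporary buffer. Here is a small illustration (the helper names are made up for this example) that shifts part of a string two bytes to the right in place. It does this once with memmove() and once with a naive forward byte loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical naive copy: walks forward one byte at a time, the way a
 * simple memcpy() loop might. Only safe if the regions do not overlap
 * or the destination lies below the source. */
static void naive_forward_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Shift the first five characters of "abcdefgh" two places to the right,
 * in place, with memmove() and with the naive loop. Each output buffer
 * must hold at least 9 chars. */
void shift_right_demo(char *with_memmove, char *with_naive)
{
    strcpy(with_memmove, "abcdefgh");
    memmove(with_memmove + 2, with_memmove, 5);        /* overlap-safe */

    strcpy(with_naive, "abcdefgh");
    naive_forward_copy(with_naive + 2, with_naive, 5); /* re-reads bytes it already overwrote */

    /* Neither copy touches the terminating NUL. */
    assert(strlen(with_memmove) == 8 && strlen(with_naive) == 8);
}
```

After the call, the memmove() buffer holds "ababcdeh" as expected. The naive loop re-reads bytes it has already overwritten and produces "abababah". That is why the posted code copies backward when the destination overlaps the end of the source.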
; memmoveQ4.S


; Copyright Notice ---------------------------------------------------
;
; Copyright 2001 by Q-TPS, QTPS@EarthLink.net
;
; Permission is hereby granted to use or copy this module freely provided
; that this copyright notice remains unchanged.
;
; Q-TPS make no warranty, either expressed or implied, as to this module's
; reliability, performance, or fitness for use.
;
; If you make enhancements to this module please e-mail them back to Q-TPS
; at QTPS@EarthLink.net
;
; End Of Copyright Notice --------------------------------------------


; Description of File ------------------------------------------------
;
; This is an optimized version of the memmove() function.  It attempts to
; do double word moves on double word boundaries.
;
; End Of Description of File -----------------------------------------


; Change History -----------------------------------------------------
;
; Date    Version  Who  Description of Change
; ------  -------  ---  ---------------------
; 011025  6.01.01  brc  written from scratch for QNX Neutrino
; 011027  6.01.01  brc  . . . and back ported to QNX V4
;
; End Of Change History ----------------------------------------------


; Included Headers ---------------------------------------------------
;
; There are no other files included
;
; End Of Included Headers --------------------------------------------


            name    memmove
            public  memmove_
_TEXT       segment dword public 'CODE'
            .586p
            assume  cs:_TEXT

; Function memmove ---------------------------------------------------
;
; register calling convention
;
memmove_    proc    near

            push    ecx                 ; save regs
            push    esi
            push    edi
            push    es
            push    eax

            mov     edi,eax             ; EDI = EAX = to
            mov     esi,edx             ; ESI = EDX = from
            mov     ecx,ebx             ; ECX = EBX = length
            mov     ax,ds               ; ES = DS
            mov     es,ax

            test    ecx,ecx             ; if( length == 0 )
            jz      L99                 ;   get out
            cmp     esi,edi             ; if( from == to )
            je      L99                 ;   get out
            ja      L20                 ; if( to < from )
                                        ;   OK to copy forward

            lea     edx,[esi+ecx]       ; EDX = &from[length]
            cmp     edx,edi             ; if( to < from[length] )
            jbe     L20                 ;   OK to copy forward

; copy backward
            std                         ; set direction flag down
            add     edi,ecx             ; EDI = &to[length]
            mov     esi,edx             ; ESI = &from[length]
            dec     edi                 ; adjust EDI
            dec     esi                 ; adjust ESI

; copy odd trailing bytes first
            mov     eax,edi             ; calculate number of trailing bytes
            and     eax,3
            cmp     eax,ecx
            jge     L16
            xchg    eax,ecx             ; and number of remaining bytes
            sub     eax,ecx
            rep movsb

; copy double words
            mov     ecx,eax             ; restore length
            shr     ecx,2               ; length /= 4
            sub     edi,3
            sub     esi,3
            rep movsd

; copy leading bytes
            and     eax,3               ; length = length % 4
            mov     ecx,eax
            add     edi,3
            add     esi,3
L16:        rep movsb

; prepare to exit
            cld                         ; clear direction flag
            jmp     L99                 ; prepare to exit - eax must = to


; copy forward
            align   4
L20:        mov     eax,edi             ; calculate number of leading odd bytes
            and     eax,3
            cmp     eax,ecx
            jge     L26
            xchg    eax,ecx             ; and number of remaining bytes
            sub     eax,ecx

; copy odd leading bytes first
            rep movsb                   ; move a leading byte

; copy double words
            mov     ecx,eax             ; save length
            shr     ecx,2               ; length /= 4

            rep movsd                   ; move double words

; copy trailing bytes
            and     eax,3               ; length = length % 4
            mov     ecx,eax
L26:        rep movsb                   ; move extra bytes

; prepare to exit
L99:        pop     eax                 ; return to
            pop     es                  ; restore regs
            pop     edi
            pop     esi
            pop     ecx
            ret
;
; End Of Function memmove --------------------------------------------

memmove_    endp
_TEXT       ends
            end
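For readers who don’t read x86 assembly, here is a portable C sketch of the same overall strategy. It picks the copy direction from the overlap test, moves single bytes until the destination is 4-byte aligned, moves 4-byte words, then finishes the leftover bytes. The name `word_memmove` is hypothetical. This is an illustration under those assumptions, not a line-by-line translation of either file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable sketch of an overlap-safe memmove() that moves
 * 4-byte words once the destination is on a 4-byte boundary. */
void *word_memmove(void *to, const void *from, size_t length)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    uintptr_t dst = (uintptr_t)d, src = (uintptr_t)s;

    if (length == 0 || dst == src)
        return to;                                 /* nothing to move */

    if (dst < src || dst - src >= length) {
        /* Forward: destination is below the source, or no overlap at all. */
        size_t head = (4 - (dst & 3)) & 3;         /* bytes until d is aligned */
        if (head > length)
            head = length;
        length -= head;
        while (head-- > 0)
            *d++ = *s++;
        for (; length >= 4; length -= 4, d += 4, s += 4) {
            uint32_t w;
            memcpy(&w, s, 4);                      /* load the whole word first, */
            memcpy(d, &w, 4);                      /* then store it */
        }
        while (length-- > 0)
            *d++ = *s++;                           /* leftover trailing bytes */
    } else {
        /* Backward: destination overlaps the end of the source. */
        d += length;
        s += length;
        size_t tail = (uintptr_t)d & 3;            /* bytes until d is aligned */
        if (tail > length)
            tail = length;
        length -= tail;
        while (tail-- > 0)
            *--d = *--s;
        for (; length >= 4; length -= 4) {
            uint32_t w;
            d -= 4;
            s -= 4;
            memcpy(&w, s, 4);
            memcpy(d, &w, 4);
        }
        while (length-- > 0)
            *--d = *--s;                           /* leftover leading bytes */
    }
    return to;
}
```

Loading each word into a local before storing it keeps the word loop correct even when source and destination are fewer than 4 bytes apart. A call like `word_memmove(s + 3, s, 10)` takes the backward path, because the destination starts inside the source range.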
# memmoveQ6.s


# Copyright Notice ---------------------------------------------------
#
# Copyright 2001 by Q-TPS, QTPS@EarthLink.net
#
# Permission is hereby granted to use or copy this module freely provided
# that this copyright notice remains unchanged.
#
# Q-TPS make no warranty, either expressed or implied, as to this module's
# reliability, performance, or fitness for use.
#
# If you make enhancements to this module please e-mail them back to Q-TPS
# at QTPS@EarthLink.net
#
# End Of Copyright Notice --------------------------------------------


# Description of File ------------------------------------------------
#
# This is an optimized version of the memmove() function.  It attempts to
# do double word moves on double word boundaries.
#
# End Of Description of File -----------------------------------------


# Change History -----------------------------------------------------
#
# Date    Version  Who  Description of Change
# ------  -------  ---  ---------------------
# 011025  6.01.01  brc  written from scratch for QNX Neutrino
#
# End Of Change History ----------------------------------------------


# Included Headers ---------------------------------------------------
#
# There are no other files included
#
# End Of Included Headers --------------------------------------------


# Function memmove ---------------------------------------------------
#
        .file   "memmoveQ6.s"
        .version "01.01"
gcc2_compiled.:
.text
        .align 4
.globl memmove
        .type   memmove,@function

# stack calling convention

memmove:
        pushl   %ebp
        movl    %esp,%ebp

        pushl   %esi                # save regs
        pushl   %edi
        push    %es
        mov     %ds,%ax             # ES = DS
        mov     %ax,%es

        movl    8(%ebp),%edi        # EDI = to
        movl    12(%ebp),%esi       # ESI = from
        movl    16(%ebp),%ecx       # ECX = length
        pushl   %edi                #   this must be returned

        testl   %ecx,%ecx           # if( length == 0 )
        jz      L99                 #   get out
        cmpl    %edi,%esi           # if( from == to )
        je      L99                 #   get out
        ja      L20                 # if( to < from )
                                    #   OK to copy forward
        lea     (%esi,%ecx),%edx    # EDX = &from[length]
        cmpl    %edi,%edx           # if( to < from[length] )
        jbe     L20                 #   OK to copy forward
# copy backward
        std                         # set direction flag down
        add     %ecx,%edi           # EDI = &to[length]
        mov     %edx,%esi           # ESI = &from[length]
        dec     %edi                # adjust EDI
        dec     %esi                # adjust ESI

# copy odd trailing bytes first
        mov     %edi,%eax           # calculate number of trailing bytes
        and     $3,%eax
        cmp     %ecx,%eax
        jge     L16
        xchg    %ecx,%eax           # and number of remaining bytes
        sub     %ecx,%eax
        rep movsb

# copy double words
        movl    %eax,%ecx           # restore length
        shrl    %ecx                # length /= 4
        shrl    %ecx
        sub     $3,%edi
        sub     $3,%esi
        rep movsl

# copy leading bytes
        andl    $3,%eax             # length = length % 4
        movl    %eax,%ecx
        add     $3,%edi
        add     $3,%esi
L16:    rep movsb

# prepare to exit
        cld                         # clear direction flag
        jmp     L99                 # prepare to exit - eax must = to


# copy forward
        .align  4
L20:    movl    %edi,%eax           # calculate number of leading odd bytes
        andl    $3,%eax
        cmp     %ecx,%eax
        jge     L26
        xchg    %eax,%ecx           # and number of remaining bytes
        subl    %ecx,%eax
# copy odd leading bytes first
        rep movsb                   # move a leading byte

# copy double words
        movl    %eax,%ecx           # restore length
        shrl    %ecx                # length /= 4
        shrl    %ecx
        rep movsl                   # move double words

# copy trailing bytes
        andl    $3,%eax             # length = length % 4
        movl    %eax,%ecx
L26:    rep movsb                   # move extra bytes
# prepare to exit

L99:    popl    %eax                # return to
        pop     %es                 # restore regs
        popl    %edi
        popl    %esi
        leave
        ret
#
# End Of Function memmove --------------------------------------------


.Lfe1:
        .size   memmove,.Lfe1-memmove
        .ident  "Bill Caroselli, Q-TPS, QTPS@EarthLink.net"
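Since bug reports are requested, one simple way to hunt for them is to compare the routine against a reference result for every small combination of source offset, destination offset and length, overlapping or not. This is a minimal sketch. It assumes a 32-bit x86 QNX 6 toolchain on which linking memmoveQ6.s replaces the library `memmove`, with the program built using `-fno-builtin`. `check_memmove` is a hypothetical name. Built without the assembly file, it simply exercises the C library’s memmove.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF 96          /* room for offsets 0..7 plus lengths up to 40 */

/* Hypothetical self-test: compare memmove() with a reference built via a
 * scratch buffer for every source offset, destination offset (0..7) and
 * length (0..40). Returns the number of failing cases and prints the
 * first few. */
int check_memmove(void)
{
    unsigned char buf[BUF], want[BUF], tmp[BUF];
    int failures = 0;

    for (size_t len = 0; len <= 40; len++)
        for (size_t src = 0; src < 8; src++)
            for (size_t dst = 0; dst < 8; dst++) {
                assert(src + len <= BUF && dst + len <= BUF);
                for (size_t i = 0; i < BUF; i++)
                    buf[i] = want[i] = (unsigned char)(i * 7 + 1);

                /* Reference: copy the source out, then into place. */
                memcpy(tmp, want + src, len);
                memcpy(want + dst, tmp, len);

                void *ret = memmove(buf + dst, buf + src, len);
                if (ret != (void *)(buf + dst) || memcmp(buf, want, BUF) != 0) {
                    if (failures < 5)
                        printf("FAIL: len=%zu src=%zu dst=%zu\n", len, src, dst);
                    failures++;
                }
            }
    return failures;
}
```

Offsets 0 through 7 cover every alignment of both pointers relative to a 4-byte boundary, and lengths up to 40 reach the double-word loop in both directions. A return value of 0 means every case matched. Otherwise the first few failing offsets and lengths are printed.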