Hello everyone,
I’m trying to use the NEON pipeline for fast floating-point arithmetic on a Cortex-A8 (OMAP3530). I also have problems
using the VFP, which is not fully pipelined.
IDE: QNX® Momentics® Integrated Development Environment Version: 4.6.0
The Board is a Gumstix Overo, with an OMAP3530, 256MB PoP Memory (Micron). I’ve
modified the Mistral OM3530 EVM BSP, it is working perfectly so far. We’re debugging
via WLAN using a Ralink USB Wifi Module.
The GCC compiler (v4.3.3) is set up with the following flags:
-mtune=cortex-a8 -march=armv7-a -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
and I’m also linking to libm-vfp.so (instead of libm.so)
EDIT: Here is the complete console output for the error I get with -mfpu=neon. The code is below; it works fine with -mfpu=vfp.
[color=darkred]C:\QNX641\host\win32\x86\usr\bin\make -k all --file=C:/DOKUME~1/ADMINI~1/LOKALE~1/Temp/QMakefile124748840109952093.tmp
C:/QNX641/host/win32/x86/usr/bin/make -j 1 -Carm -fMakefile all
make[1]: Entering directory `C:/ide-4.6-workspace/FPU_test/arm'
C:/QNX641/host/win32/x86/usr/bin/make -j 1 -Co-le-g -fMakefile all
make[2]: Entering directory `C:/ide-4.6-workspace/FPU_test/arm/o-le-g'
C:/QNX641/host/win32/x86/usr/bin/qcc -V4.3.3,gcc_ntoarm -c -Wc,-Wall -Wc,-Wno-parentheses -Wc,-fno-builtin -O3 -march=armv7-a -mtune=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize -I. -IC:/ide-4.6-workspace/FPU_test/arm/le -IC:/ide-4.6-workspace/FPU_test/arm/o-le-g -IC:/ide-4.6-workspace/FPU_test/arm -IC:/ide-4.6-workspace/FPU_test -IC:/QNX641/target/qnx6/usr/include -EL -g -DVARIANT_le -DVARIANT_g -DBUILDENV_qss C:/ide-4.6-workspace/FPU_test/FPU_test.cc
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s: Assembler messages:
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:95: Error: selected processor does not support `fstmfdd sp!,{d8,d9,d10}'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:100: Error: selected processor does not support `fconsts s20,#112'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:108: Error: selected processor does not support `flds s17,.L10'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:113: Error: selected processor does not support `flds s19,.L10+4'
make[2]: Leaving directory `C:/ide-4.6-workspace/FPU_test/arm/o-le-g'
make[1]: Leaving directory `C:/ide-4.6-workspace/FPU_test/arm'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:115: Error: selected processor does not support `flds s18,.L10+8'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:118: Error: selected processor does not support `fadds s16,s17,s19'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:126: Error: selected processor does not support `fadds s16,s16,s18'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:129: Error: selected processor does not support `fcvtds d16,s17'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:130: Error: selected processor does not support `fmrrd r1,r2,d16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:134: Error: selected processor does not support `fcvtds d16,s16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:136: Error: selected processor does not support `fmrrd r1,r2,d16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:140: Error: selected processor does not support `fadds s17,s17,s20'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:147: Error: selected processor does not support `fldmfdd ip!,{d8,d9,d10}'
cc: C:/QNX641/host/win32/x86/usr/bin/ntoarm-as caught signal 1
make[2]: *** [FPU_test.o] Error 1
make[2]: Target `all' not remade because of errors.
make[1]: [all] Error 2 (ignored)
This is the simple test code. It does not compile at all with -mfpu=neon, but it works with -mfpu=vfp:
[color=blue]
[i]//begin code
//This Code compiles fine with the option -mfpu=vfp but not with neon
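#include <cstdio>   // header name lost in the post; assumed, needed for std::printf
#include <cstdlib>  // header name lost in the post; assumed, needed for EXIT_SUCCESS
#include <math.h>
int main(int argc, char *argv[]) {
    float a, b, c;
    a = 8.0f;
    b = 1.4f;
    for (float g = 0.0f; g < 50; g += 1.0f)
    {
        c = a + b + g;
        c += a * b;
        std::printf("g:%f\n", g);
        std::printf("Ergebnis:%f\n", c); // "Ergebnis" is German for "result"
    }
    return EXIT_SUCCESS;
}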
//end code[/i]
In another project, using the option -mfpu=vfp, I get the following errors.
(The code that yields these errors doesn’t compile with either NEON or VFP, but it works fine with software floating-point emulation, which is the default.)
[color=darkred]C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s: Assembler messages:
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:117: Error: D register out of range for selected VFP version -- `fldd d16,[r1,#0]'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:118: Error: register out of range in list -- `fldmiad ip!,{d17}'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:119: Error: bad instruction `vadd.f32 d16,d16,d17'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:120: Error: register out of range in list -- `fstmiad r1!,{d16}'
etc…
I used this wiki page as a guide: wiki.davincidsp.com/index.php/Cortex-A8_Features
What am I doing wrong? Thank you!