Compiler error when using the NEON pipeline on an OMAP3530

Hello everyone,

I’m trying to use the NEON pipeline for fast floating-point arithmetic on a Cortex-A8 (OMAP3530). I also have problems
using the VFP, which is not fully pipelined.

IDE: QNX® Momentics® Integrated Development Environment Version: 4.6.0

The board is a Gumstix Overo with an OMAP3530 and 256 MB of PoP memory (Micron). I’ve
modified the Mistral OM3530 EVM BSP, and it has worked perfectly so far. We’re debugging
over WLAN using a Ralink USB Wi-Fi module.

The GCC compiler (v4.3.3) is set up with the following flags:
-mtune=cortex-a8 -march=armv7-a -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
and I’m also linking against libm-vfp.so (instead of libm.so).

EDIT: Here is the complete console output for the error I get with -mfpu=neon. The code is below; it compiles fine with -mfpu=vfp.

C:\QNX641\host\win32\x86\usr\bin\make -k all --file=C:/DOKUME~1/ADMINI~1/LOKALE~1/Temp/QMakefile124748840109952093.tmp
C:/QNX641/host/win32/x86/usr/bin/make -j 1 -Carm -fMakefile all
make[1]: Entering directory `C:/ide-4.6-workspace/FPU_test/arm'
C:/QNX641/host/win32/x86/usr/bin/make -j 1 -Co-le-g -fMakefile all
make[2]: Entering directory `C:/ide-4.6-workspace/FPU_test/arm/o-le-g'
C:/QNX641/host/win32/x86/usr/bin/qcc -V4.3.3,gcc_ntoarm -c -Wc,-Wall -Wc,-Wno-parentheses -Wc,-fno-builtin -O3 -march=armv7-a -mtune=cortex-a8 -mfloat-abi=softfp -mfpu=neon -ftree-vectorize -I. -IC:/ide-4.6-workspace/FPU_test/arm/le -IC:/ide-4.6-workspace/FPU_test/arm/o-le-g -IC:/ide-4.6-workspace/FPU_test/arm -IC:/ide-4.6-workspace/FPU_test -IC:/QNX641/target/qnx6/usr/include -EL -g -DVARIANT_le -DVARIANT_g -DBUILDENV_qss C:/ide-4.6-workspace/FPU_test/FPU_test.cc
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s: Assembler messages:
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:95: Error: selected processor does not support `fstmfdd sp!,{d8,d9,d10}'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:100: Error: selected processor does not support `fconsts s20,#112'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:108: Error: selected processor does not support `flds s17,.L10'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:113: Error: selected processor does not support `flds s19,.L10+4'
make[2]: Leaving directory `C:/ide-4.6-workspace/FPU_test/arm/o-le-g'
make[1]: Leaving directory `C:/ide-4.6-workspace/FPU_test/arm'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:115: Error: selected processor does not support `flds s18,.L10+8'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:118: Error: selected processor does not support `fadds s16,s17,s19'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:126: Error: selected processor does not support `fadds s16,s16,s18'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:129: Error: selected processor does not support `fcvtds d16,s17'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:130: Error: selected processor does not support `fmrrd r1,r2,d16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:134: Error: selected processor does not support `fcvtds d16,s16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:136: Error: selected processor does not support `fmrrd r1,r2,d16'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:140: Error: selected processor does not support `fadds s17,s17,s20'
C:\DOKUME~1\ADMINI~1\LOKALE~1\Temp\2qcc5MtBgb\FPU_test.s:147: Error: selected processor does not support `fldmfdd ip!,{d8,d9,d10}'
cc: C:/QNX641/host/win32/x86/usr/bin/ntoarm-as caught signal 1
make[2]: *** [FPU_test.o] Error 1
make[2]: Target `all' not remade because of errors.
make[1]: [all] Error 2 (ignored)

Here is the simple test code. It does not compile at all with NEON, but it does work with VFP:

//begin code
// This code compiles fine with -mfpu=vfp but not with -mfpu=neon

#include <cstdio>   // std::printf
#include <cstdlib>  // EXIT_SUCCESS
#include <math.h>

int main(int argc, char *argv[]) {
    float a, b, c;

    a = 8.0f;
    b = 1.4f;

    for (float g = 0.0f; g < 50; g += 1.0f)
    {
        c = a + b + g;
        c += a * b;

        std::printf("g:%f\n", g);
        std::printf("Result:%f\n", c);
    }

    return EXIT_SUCCESS;
}
//end code

In another project, using the option -mfpu=vfp, I get the following errors.
(The code that produces them doesn’t compile with either NEON or VFP, but it works fine with software floating-point emulation, which is the default.)

C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s: Assembler messages:
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:117: Error: D register out of range for selected VFP version -- `fldd d16,[r1,#0]'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:118: Error: register out of range in list -- `fldmiad ip!,{d17}'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:119: Error: bad instruction `vadd.f32 d16,d16,d17'
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\2qcc0yf3eb\Vector.s:120: Error: register out of range in list -- `fstmiad r1!,{d16}'

etc…

I used this wiki page as a guide: wiki.davincidsp.com/index.php/Cortex-A8_Features

What am I doing wrong? Thank you!

It seems to be a math coprocessor issue…?

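No. As we found out, it is actually an assembler problem: the compiler generates VFP/NEON instructions, but the assembler apparently isn't told which FPU and architecture to accept, so it rejects them.
It can be fixed by passing two options through to the assembler: -Wa,-mfpu=neon and -Wa,-march=armv7-a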

The complete set of compiler options then looks like this:

-march=armv7-a -Wa,-march=armv7-a -mfpu=neon -Wa,-mfpu=neon -mtune=cortex-a8 -mfloat-abi=softfp -ftree-vectorize

With these options, everything compiles correctly.