Need a fast float to ASCII algorithm.

kwschumm · July 1, 2008, 12:46pm

We’re trying to optimize performance on a PXA270 (ARM) product. This unit does not have hardware floating point and we are using the 4.2.1 compiler with the latest released version of QNX (6.2.3 I think).

fprintf() is taking about 430 microseconds to do a batch of several single precision “%+10.7f” conversions (profiler numbers).

This is killing performance. If we use fputs() to output an 11-byte text string, it takes only 65us.

Does anyone have a tested algorithm that can do single precision float to text conversions faster than fprintf()?

mario · July 1, 2008, 6:22pm

Try using cout (C++). You won't save on the conversion time itself, but you will save on fprintf parsing the format string, which can be significant.

ingraham · July 1, 2008, 10:46pm

Wow. That turns out to be a WAY more complicated problem than I thought. Searching around yielded some interesting answers, but they were all fairly complicated. I’m afraid I don’t have the time to try some and test their performance. But none of them LOOKED fast.

You could try to convert to long, since there is an ltoa() function. Something like

left = (long)truncf(val);
right = (long)((val - left) * 10000000L);

or

right = (long)(modff(val, &left_float) * 10000000L);
left = (long)left_float;

then output using

buff[0] = '\0';
strcat(buff, ltoa(left, temp_buff, 10));
strcat(buff, ".");
strcat(buff, ltoa(right, temp_buff, 10));   /* caution: leading zeros of right are lost */
fputs(buff, stdout);

or

fputs(strcat(strcat(strcpy(buff, ltoa(left, temp_buff, 10)), "."), ltoa(right, temp_buff2, 10)), stdout);   /* two temp buffers: argument evaluation order is unspecified */

if you want to give yourself a migraine.

Somehow, I don’t think this will solve your speed problem.

-James Ingraham
Sage Automation, Inc.
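
For reference, here is a minimal sketch of the split-into-integers idea that avoids ltoa() (which is not standard C) and always writes seven fractional digits, so the leading zeros that a plain ltoa() of the fraction drops are kept. The name fmt_fixed7 is hypothetical; the sketch assumes |val| < 2^31 and that truncating, rather than rounding, the seventh decimal is acceptable. It still does a few soft-float operations (a compare, the conversions, one subtraction and one multiply), but all of the per-digit work is integer-only. It has not been timed on a PXA270.

```c
#include <string.h>

/* Hypothetical helper: formats val like "%+.7f", but truncates (does not
 * round) after the 7th decimal. Assumes |val| < 2^31 so the integer part
 * fits in an unsigned long. buf must hold at least 20 bytes. Returns buf. */
static char *fmt_fixed7(float val, char *buf)
{
    char tmp[10];                          /* integer digits, built right to left */
    char *q = tmp + sizeof tmp;
    char *p = buf;
    float a = val < 0.0f ? -val : val;
    unsigned long left = (unsigned long)a;                     /* truncates toward zero */
    float frac = a - (float)left;                              /* exact for floats */
    unsigned long right = (unsigned long)(frac * 10000000.0f); /* 7 digits */
    int i;

    *p++ = val < 0.0f ? '-' : '+';         /* note: -0.0f comes out as "+" here */

    do {                                   /* integer part, least significant first */
        *--q = (char)('0' + left % 10);
        left /= 10;
    } while (left != 0);
    memcpy(p, q, (size_t)(tmp + sizeof tmp - q));
    p += tmp + sizeof tmp - q;

    *p++ = '.';
    for (i = 6; i >= 0; i--) {             /* always 7 digits, keeps leading zeros */
        p[i] = (char)('0' + right % 10);
        right /= 10;
    }
    p[7] = '\0';
    return buf;
}
```

Usage would be something like fputs(fmt_fixed7(x, buf), fp); with a char buf[20].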

kwschumm · July 2, 2008, 12:15am

Yeah, it’s a complex problem. I’m sure there are guys making a living doing nothing but floating point optimization, and I was hoping to capitalize on work someone else has done. Oh, we’d probably pay for a workable solution. In fact, we’re shopping for one but haven’t found it yet.

Outputting the fp value as %x instead of %f reduced the conversion time by 300us, but marketing says “no way”. They want to import it directly into apps on the host as csv files and don’t want to require the user to do any sort of extra step to convert the data.

Sure wish we had a cpu with fp hardware.

maschoen · July 2, 2008, 5:18am

Ken,

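This is an interesting question, but probably not one with a good answer. Think about what the code to produce the output must look like:

void float_char_string(float number)
{
    /* emits the integer part's digits, least significant first */
    do {
        int c = (int)fmodf(number, 10.0f);   /* float remainder */
        putchar(c + '0');
        number = truncf(number / 10.0f);     /* float division */
    } while (number >= 1.0f);
}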

That looks like two floating point divisions (unless you optimize so that it’s only one) for each char printed.
And I’m pretty sure that floating point division is not pretty.

Now here is a way-out idea. Since the floating point is done in software anyway, re-implement your code to use decimal floating point. That way, there’s no conversion overhead when you want to print. I can’t think of a reason why decimal floating point in software should be much slower than binary-based IEEE formats. It might be easier to control precision, too.
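
A minimal sketch of what the decimal floating point idea could look like, assuming a hypothetical dec32 type that stores value = coeff × 10^exp10 with the coefficient kept at 8 significant digits (what "%.7E" prints). Multiplication is an integer multiply plus an exponent add, and printing in E format needs only integer division by 10. Addition, rounding (this truncates), overflow and special values are left out, so treat it as an illustration of the idea, not a drop-in library.

```c
#include <stdint.h>

/* Hypothetical decimal float: value = coeff * 10^exp10, normalized so
 * 10^7 <= |coeff| < 10^8 (8 significant digits), or coeff == 0. */
typedef struct {
    int32_t coeff;
    int     exp10;
} dec32;

/* Normalize any coefficient/exponent pair; extra digits are truncated. */
static dec32 dec_make(int64_t coeff, int exp10)
{
    dec32 d;
    int neg = coeff < 0;
    uint64_t m = neg ? (uint64_t)0 - (uint64_t)coeff : (uint64_t)coeff;

    if (m != 0) {
        while (m < 10000000u)   { m *= 10; exp10--; }
        while (m >= 100000000u) { m /= 10; exp10++; }
    } else {
        exp10 = 0;
    }
    d.coeff = neg ? -(int32_t)m : (int32_t)m;
    d.exp10 = exp10;
    return d;
}

/* Multiply: integer multiply of the coefficients, add the exponents. */
static dec32 dec_mul(dec32 a, dec32 b)
{
    return dec_make((int64_t)a.coeff * b.coeff, a.exp10 + b.exp10);
}

/* Print like "%.7E" using integer operations only. buf >= 16 bytes. */
static void dec_to_e(dec32 d, char *buf)
{
    uint32_t m = d.coeff < 0 ? 0u - (uint32_t)d.coeff : (uint32_t)d.coeff;
    int e = d.coeff != 0 ? d.exp10 + 7 : 0;   /* exponent of the leading digit */
    char *p = buf;
    int i;

    if (d.coeff < 0) *p++ = '-';
    p[0] = (char)('0' + m / 10000000u);       /* leading digit */
    p[1] = '.';
    for (i = 8; i >= 2; i--) {                 /* remaining 7 digits */
        p[i] = (char)('0' + m % 10);
        m /= 10;
    }
    p += 9;
    *p++ = 'E';
    *p++ = e < 0 ? '-' : '+';
    if (e < 0) e = -e;
    if (e >= 100) { *p++ = (char)('0' + e / 100); e %= 100; }
    *p++ = (char)('0' + e / 10);
    *p++ = (char)('0' + e % 10);
    *p = '\0';
}
```

For example, dec_make(15, -1) represents 1.5 and dec_to_e() writes it as "1.5000000E+00", the same text "%.7E" gives for 1.5.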

micro · July 2, 2008, 1:24pm

Even if it doesn’t solve the speed problem, I guess: ^^

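float fn; // your float
float fns;
int inb;
int ins;

inb = (int) fn;
fns = fn - inb;
if (fns < 0) fns = -fns;        // keep the fraction positive for negative fn
ins = (int)(fns * 10000000);    // 7 decimal digits, truncated

// print the sign separately so -0.5 doesn't come out as "+0";
// %+10.7f output is always at least 10 characters, so no width is needed
printf("%c%d.%07d", fn < 0 ? '-' : '+', inb < 0 ? -inb : inb, ins);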

kwschumm · July 2, 2008, 3:21pm

My original post was incorrect: we need “%10.7E”, not “%10.7f”. But thanks, micro, your method is a lot faster.

I was pointed to another method that goes right after the 32-bit IEEE value by breaking it into its component parts using a union of float and long. That looks fast too.

Thanks for pointing me in some new directions.
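
A sketch of pulling the fields out of a 32-bit IEEE 754 float, as in the union approach mentioned above. It uses memcpy into a uint32_t instead of a union with long, because long is 64 bits on many hosts; a union of float and uint32_t would also work in C. The float_parts and split_float names are hypothetical, and infinities and NaNs (raw exponent 255) are not treated specially. For normal and subnormal numbers the value is (-1)^sign × mantissa × 2^(exponent - 23).

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE 754 single-precision float (hypothetical names). */
typedef struct {
    unsigned sign;      /* 1 = negative */
    int      exponent;  /* unbiased binary exponent */
    uint32_t mantissa;  /* significand, with the implicit leading 1 restored */
} float_parts;

static float_parts split_float(float f)
{
    uint32_t bits;
    unsigned raw_exp;
    float_parts p;

    memcpy(&bits, &f, sizeof bits);     /* reinterpret the 32 bits; assumes binary32 */

    p.sign     = bits >> 31;
    raw_exp    = (bits >> 23) & 0xFFu;
    p.mantissa = bits & 0x7FFFFFu;      /* 23 stored fraction bits */

    if (raw_exp == 0) {
        p.exponent = -126;              /* zero or subnormal: no implicit 1 */
    } else {
        p.exponent = (int)raw_exp - 127;
        p.mantissa |= 0x800000u;        /* normal: restore the implicit 1 */
    }
    return p;                           /* raw_exp 255 (inf/NaN) not handled */
}
```

For example, -6.5f splits into sign 1, exponent 2 and mantissa 0xD00000, and 0xD00000 × 2^(2 - 23) = 6.5.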

kwschumm · July 12, 2008, 2:17pm

We ended up writing our own floating point to ASCII conversion routine that extracted the sign, exponent and mantissa from the IEEE 754 format and used integer math and lookup tables to do the conversion.

We then benchmarked it using a “%10.7f,%10u” format string.

fprintf took 77us per conversion (using the 4.2 compiler, which is supposed to generate well-optimized floating point code for ARM).

The new method took 7us per conversion.
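
The routine described above is not shown in the thread, so the following is only a hypothetical reconstruction of the general approach: extract the fields, then compute floor(|f| × 10^(7-d)) with 64-bit integer math and a power-of-ten lookup table, adjusting the decimal exponent d until the result has exactly 8 digits. It produces “%.7E”-style output (the format asked for in the July 2 post), truncates the last digit instead of rounding, and, to keep the 64-bit intermediate from overflowing, only accepts 2^-10 <= |f| < 2^50; everything else, including zero, returns -1. The names float_to_e, scale and pow10_tab are made up, and there is no floating point arithmetic in it at all. Any speed claim would have to be measured on the target.

```c
#include <stdint.h>
#include <string.h>

/* Exact powers of ten: the lookup table. */
static const uint64_t pow10_tab[] = {
    1ull, 10ull, 100ull, 1000ull, 10000ull, 100000ull, 1000000ull,
    10000000ull, 100000000ull, 1000000000ull, 10000000000ull,
    100000000000ull, 1000000000000ull
};

/* floor(m * 2^k * 10^s) with integer ops only; callers keep it in range. */
static uint64_t scale(uint32_t m, int k, int s)
{
    uint64_t num = m;
    if (s > 0) num *= pow10_tab[s];          /* multiply first to keep precision */
    if (k >= 0) num <<= k; else num >>= -k;
    if (s < 0) num /= pow10_tab[-s];
    return num;
}

/* Hypothetical routine: writes f like "%.7E" but truncates the last digit.
 * Only handles 2^-10 <= |f| < 2^50; returns -1 otherwise (zero, tiny,
 * huge, inf, NaN). buf >= 16 bytes. Assumes IEEE 754 binary32 floats. */
static int float_to_e(float f, char *buf)
{
    uint32_t bits, m;
    uint64_t v;
    int E, k, d, i;
    char *p = buf;

    memcpy(&bits, &f, sizeof bits);
    E = (int)((bits >> 23) & 0xFFu) - 127;   /* unbiased binary exponent */
    if (E < -10 || E > 49)
        return -1;                           /* outside this sketch's range */
    m = (bits & 0x7FFFFFu) | 0x800000u;      /* 24-bit significand */
    k = E - 23;                              /* |f| = m * 2^k exactly */

    d = E * 1233 / 4096;                     /* rough guess: E * log10(2) */
    for (;;) {                               /* settle on the exact decimal exponent */
        v = scale(m, k, 7 - d);              /* floor(|f| * 10^(7-d)) */
        if (v >= 100000000u)    d++;
        else if (v < 10000000u) d--;
        else break;                          /* exactly 8 digits */
    }

    if (bits >> 31) *p++ = '-';
    p[0] = (char)('0' + v / 10000000u);      /* leading digit */
    p[1] = '.';
    for (i = 8; i >= 2; i--) {               /* remaining 7 digits */
        p[i] = (char)('0' + v % 10);
        v /= 10;
    }
    p += 9;
    *p++ = 'E';
    *p++ = d < 0 ? '-' : '+';
    if (d < 0) d = -d;
    *p++ = (char)('0' + d / 10);             /* |d| <= 15 here, so two digits */
    *p++ = (char)('0' + d % 10);
    *p = '\0';
    return 0;
}
```

Because floor() of the exact product is monotonic in d, the adjustment loop always settles on one d, usually after at most one correction of the initial guess.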