We’re trying to optimize performance on a PXA270 (ARM) product. This unit does not have hardware floating point and we are using the 4.2.1 compiler with the latest released version of QNX (6.2.3 I think).
fprintf() is taking about 430 microseconds to do a batch of several single precision “%+10.7f” conversions (profiler numbers).
This is killing performance. If we use fputs() to output an 11 byte text string it only takes 65us.
Does anyone have a tested algorithm that can do single precision float to text conversions faster than fprintf()?
Wow. That turns out to be a WAY more complicated problem than I thought. Searching around yielded some interesting answers, but they were all fairly complicated. I’m afraid I don’t have the time to try some and test their performance. But none of them LOOKED fast.
You could try to convert to long, since there is an ltoa() function. Something like
left = (long)trunc(val);
right = (long)((val - left) * 10000000L);   /* scale first, then convert to long */
or
right = (long)(10000000L * modff(val, &left_float));   /* modff returns the fractional part */
left = (long)left_float;
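To make the idea concrete, here is a rough, untested-on-ARM sketch of the split-into-two-integers approach. The name fmt_fixed7 and the buffer size are made up for illustration. It assumes val is finite and its magnitude is below about 2e9, so the integer part fits in a 32-bit long. Note that the fraction has to be scaled before it is converted to an integer, otherwise it truncates to zero.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes val as [+-]int.ddddddd (7 decimals) into buf.
   buf must hold at least 24 bytes. Returns the number of characters written.
   Assumes val is finite and |val| < 2e9. */
static int fmt_fixed7(float val, char *buf)
{
    char tmp[12];
    char *p = buf;
    unsigned long left, right;
    float scaled;
    int i, n = 0;

    if (val < 0.0f) { *p++ = '-'; val = -val; }
    else            { *p++ = '+'; }

    left   = (unsigned long)val;                      /* truncates toward zero */
    scaled = (val - (float)left) * 10000000.0f;       /* scale BEFORE converting */
    right  = (unsigned long)scaled;
    if (scaled - (float)right >= 0.5f) right++;       /* round to nearest */
    if (right >= 10000000UL) { right -= 10000000UL; left++; }

    /* integer part: digits come out in reverse order */
    do { tmp[n++] = (char)('0' + left % 10); left /= 10; } while (left);
    while (n) *p++ = tmp[--n];

    *p++ = '.';
    /* fractional part: always 7 digits, zero padded on the left */
    for (i = 6; i >= 0; i--) { p[i] = (char)('0' + right % 10); right /= 10; }
    p += 7;
    *p = '\0';
    return (int)(p - buf);
}
```

After the one floating point multiply, everything is integer division by 10, which should be much cheaper than software floating point. Calling fmt_fixed7(1.5f, buf) leaves "+1.5000000" in buf, which can then go out through fputs(). A float only carries about 7 significant digits, so the last decimals are noise for larger values.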
Yeah, it’s a complex problem. I’m sure there are guys making a living at doing nothing but floating point optimization, and I was hoping to capitalize on work someone else has done. Oh, we’d probably pay for a workable solution. In fact we’re shopping for one but haven’t found it yet.
Outputting the fp value as %x instead of %f reduced the conversion time by 300us, but marketing says “no way”. They want to import it directly into apps on the host as csv files and don’t want to require the user to do any sort of extra step to convert the data.
That looks like two floating point divisions (unless you optimize so that it’s only one) for each char printed.
And I’m pretty sure that floating point division is not pretty.
Now here is a way-out idea. Since the floating point is done in software anyway, re-implement your code to use decimal floating point. That way, there’s no conversion overhead when you want to print. I can’t think of a reason why decimal floating point in software should be much slower than binary based IEEE formats. It might be easier to control precision also.
My original post was incorrect: we need “%10.7E”, not “%10.7f”. But thanks micro, your method is a lot faster.
I was pointed to another method that goes right after the 32-bit IEEE value by breaking it into its component parts using a union of float and long. That looks fast too.
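For anyone who wants to see what that decomposition looks like, here is a minimal sketch. The names float_parts and split_float are made up for illustration, and it uses uint32_t rather than long so the union stays 32 bits wide on any compiler. It assumes float is IEEE 754 single precision and does not treat infinities or NaNs (biased exponent 255) specially.

```c
#include <stdint.h>

/* Hypothetical result type: value = (-1)^sign * mantissa * 2^(exponent - 23) */
typedef struct {
    unsigned sign;       /* 0 = positive, 1 = negative */
    int      exponent;   /* unbiased exponent */
    uint32_t mantissa;   /* 24-bit significand, implicit leading 1 included */
} float_parts;

static float_parts split_float(float f)
{
    union { float f; uint32_t u; } pun;   /* reinterpret the float's bits */
    float_parts r;
    uint32_t biased;

    pun.f = f;
    r.sign     = (unsigned)(pun.u >> 31);     /* bit 31 */
    biased     = (pun.u >> 23) & 0xFFu;       /* bits 30..23 */
    r.mantissa = pun.u & 0x7FFFFFu;           /* bits 22..0 */

    if (biased != 0) {
        r.mantissa |= 0x800000u;              /* normal number: add implicit 1 */
        r.exponent = (int)biased - 127;
    } else {
        r.exponent = -126;                    /* zero or subnormal: no implicit 1 */
    }
    return r;
}
```

For example, split_float(-6.5f) gives sign 1, exponent 2 and mantissa 0xD00000, since 6.5 is 1.101 binary times 2^2. From there the conversion to decimal digits can be done with integer math only.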
We ended up writing our own floating point to ASCII conversion routine that extracts the sign, exponent and mantissa from the IEEE 754 format and uses integer math and lookup tables to do the conversion.
We then benchmark tested using a “%10.7f,%10u” format string.
fprintf took 77us per conversion (using the 4.2 compiler which is supposed to generate well optimized floating point code for ARM).