Diagnosing Server Problems?

Chris, I’ve been working with QNX for more than 10 years … by now I’m
convinced that I know how QNX works.

Do you know what’s the value of a documented library interface call?

It should work as documented … even when it is a very low level
function of the OS library.

And it does. I am not sure why you keep quoting the number of years you
have done this or that, it really doesn’t matter. (FYI, we are tied in
the number of years working on/with QNX[246], but I have 3 of those years
working at QNX, which counts for more).

So, if you had simply posted your example that failed we could have shown
you the bug right away. Why did you take so long to post that example
anyways? I asked for it in my FIRST reply (20 messages ago) and you
refused claiming that mmap() was broken and so no example was needed.

We know QNX better than you do, and it is only in your interest to show us
what you are doing so we can diagnose the problem. Something to consider
in the future when you claim entire subsystems of the kernel are busted.

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Jens H Jorgensen wrote:

the mmap() statement above works with 6.2.0 and not with 6.2.1 !!

6.2.1 supports > 4GB physical addresses, so if your ‘address’ value
is >= 0x80000000, it is a different physical address than the one
you get via the 64-bit parameter to mmap_device_memory().

Sunil.

So how did this work in 6.2.0 as Armin is claiming - was the conversion from
off_t to off64_t in mmap() done by typecasting it to a ULONG?

No - the 64-bit offset was being converted back into a 32-bit
physical address in the memory manager code.
This doesn’t happen in 6.2.1 since physical addresses are not
necessarily 32-bits.

Sunil.

So with 6.2.1 mmap() is not safe with physical addresses >= 0x80000000, but
you should use mmap64()?


Jens

That’s true for platforms that can have > 4GB physical memory.
At the moment, I think that’s only x86 and ppc.

However, I believe mmap_device_memory() is safe on all platforms.
The documentation also says you should use this instead of mmap()
with MAP_PHYS.

Sunil.

I have seen the recommendation in the documentation also, and we actually
followed that recommendation when we wrote our drivers, but the
documentation should also make the reader aware that mmap() is not safe on
the above mentioned platforms.


Jens

Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

– Mario

Mario Charest postmaster@127.0.0.1 wrote:


Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

:v) Do you mean ‘thrive’?


cburgess@qnx.com

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:b5t4q0$pn7$1@nntp.qnx.com

Mario Charest postmaster@127.0.0.1 wrote:


Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

:v) Do you mean ‘thrive’?

Hey Colin why don’t you mind your own business, if people aren’t smart
enough to figure it out by themselves they can go to hell. Plus I don’t think
I’ve asked for your “advice”. Next time you write a post you better make sure
you don’t make any mistakes cause I’ll be there watching.

Thinking about it, I’ll sue Microsoft cause on the box of their software it
says “includes spelling and grammatical error correction”, moron.








Sorry, couldn’t resist :wink:)))))

I think a “Thanks Colin”, is better suited :wink:





cburgess@qnx.com

Mario Charest postmaster@127.0.0.1 wrote:

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:b5t4q0$pn7$1@nntp.qnx.com …
Mario Charest postmaster@127.0.0.1 wrote:


Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

:v) Do you mean ‘thrive’?

Hey Colin why don’t you mind your own business, if people aren’t smart
enough to figure it out by themselves they can go to hell. Plus I don’t think
I’ve asked for your “advice”. Next time you write a post you better make sure
you don’t make any mistakes cause I’ll be there watching.

Thinking about it, I’ll sue Microsoft cause on the box of their software it
says “includes spelling and grammatical error correction”, moron.

Sorry, couldn’t resist :wink:)))))

ROTFL :vD


cburgess@qnx.com

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:b5t54n$l7u$1@inn.qnx.com

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:b5t4q0$pn7$1@nntp.qnx.com …
Mario Charest postmaster@127.0.0.1 wrote:


Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

:v) Do you mean ‘thrive’?

Hey Colin why don’t you mind your own business, if people aren’t smart
enough to figure it out by themselves they can go to hell. Plus I don’t think
I’ve asked for your “advice”. Next time you write a post you better make sure
you don’t make any mistakes cause I’ll be there watching.

Thinking about it, I’ll sue Microsoft cause on the box of their software it
says “includes spelling and grammatical error correction”, moron.

Well given that they both are valid words you’re likely to lose Mario.
There’s no way for a grammar/spelling checker to guess what you have
actually meant. BTW, I don’t think Colin meant to attack you or your
spelling abilities. Get a sense of humor …

– igor

“Igor Kovalenko” <kovalenko@attbi.com> wrote in message
news:b5tuck$n2f$1@inn.qnx.com

“Mario Charest” postmaster@127.0.0.1 wrote in message
news:b5t54n$l7u$1@inn.qnx.com …

“Colin Burgess” <cburgess@qnx.com> wrote in message
news:b5t4q0$pn7$1@nntp.qnx.com …
Mario Charest postmaster@127.0.0.1 wrote:


Strange how some people seem to create tension around them.

I guess different people strive in different kind of environments !

:v) Do you mean ‘thrive’?

Hey Colin why don’t you mind your own business, if people aren’t smart
enough to figure it out by themselves they can go to hell. Plus I don’t think
I’ve asked for your “advice”. Next time you write a post you better make sure
you don’t make any mistakes cause I’ll be there watching.

Thinking about it, I’ll sue Microsoft cause on the box of their software it
says “includes spelling and grammatical error correction”, moron.


Well given that they both are valid words you’re likely to lose Mario.
There’s no way for a grammar/spelling checker to guess what you have
actually meant. BTW, I don’t think Colin meant to attack you or your
spelling abilities. Get a sense of humor …

I’m not sure if the “Get a sense of humor” is a way to get into the game, or
if you are serious, but my post was definitely a joke ( or more seriously my
way of trying to break the tension that was building up in this thread and
make a point in the process). I put a comment all the way down to make sure
it was read as such, I assume you missed it. Check Colin’s reply, he got it
:wink:

– igor

“David Donohoe” <ddonohoe@qnx.com> wrote in message
news:b5sltq$fv7$1@nntp.qnx.com

Sunil Kittur <skittur@qnx.com> wrote:
What is ‘address’?
For mmap(), the offset is a signed 32-bit quantity, so if it’s
above 0x80000000 it will be treated as a negative number and fail.

mmap_device_memory() takes a 64-bit offset so those values work.

I think Sunil’s hit the nail on the head. The above code is wrong.
It relies on “address” being less than 0x80000000, and if “address” is
a physical address, that is an invalid assumption.

That would explain why you’re seeing mmap fail half the time: half of
physical 32-bit address space is below 0x80000000, the other
half is not.

So are you going to attack me like you’ve attacked everyone else
who’s attempted to help you in this thread, or are you going to thank
Sunil for identifying a problem in your code, like most of the
reasonable, courteous folk we deal with from day to day would do?

Since the latter must be about someone else, I could spoil the fun here a
little. Yes, I think Sunil probably did hit the nail on the head here.
However Armin has one point, even while having problematic code. The track
record of QNX on compatibility between even minor versions has been less
than spotless. And all the explanations about ‘how QNX works’ do not solve
that problem. This discussion happened many times before, but somehow
people working for QNX seem to be still missing the point. If any change in
QNX makes previously working code (a library call) fail, for users that’s
a failure of the library call. Asking in response ‘do you even understand
how QNX works’ would be irritating for most and insulting for some.

If you (QNX) introduce some changes into OS internals that can affect
behavior of existing code, you owe it to your customers to make every effort
to avoid such effects and when impossible then at least find and document
such scenarios. Your regression tests should include all kinds of input
data, including invalid or otherwise problematic inputs. Everything that is
known as ‘corner cases’. So if you break working code, it is not a good
thing. Even when the code was problematic in the first place.

In this particular case the change in the behavior of mmap() could have been
anticipated and it should definitely have been found by automated regression
tests. This would be done by simply feeding ranges of values into functions’
parameters and comparing return codes against pre-recorded values from
previous releases. Any mismatch warrants investigation and unless resolved
should go into release notes.

Now it is time to thank Sunil for the brilliant insight.

– igor -who-observed-behavior-of-fgets()-to-change-between-releases-

David Donohoe wrote:

Sunil Kittur <skittur@qnx.com> wrote:

Armin Steinhoff wrote:

Steve Reid wrote:

[ clip …]

This statement works with 6.2.0 and not with 6.2.1 !!


What is ‘address’?
For mmap(), the offset is a signed 32-bit quantity, so if it’s
above 0x80000000 it will be treated as a negative number and fail.


mmap_device_memory() takes a 64-bit offset so those values work.


I think Sunil’s hit the nail on the head. The above code is wrong.
It relies on “address” being less than 0x80000000, and if “address” is
a physical address, that is an invalid assumption.

David, please note that it was not invalid with 6.2.0. The NEW
interpretation of the offset parameter now makes it impossible to map
the base addresses of PCI devices with the POSIX mmap().

That would explain why you’re seeing mmap fail half the time: half of
physical 32-bit address space is below 0x80000000, the other
half is not.

So are you going to attack me like you’ve attacked everyone else
who’s attempted to help you in this thread,

If I exclude Sunil … there was no real help, only annoying statements
that our code must be wrong. Here is just a list of curious statements
from Chris McKillop:

  • The only thing that broke was the mapping of /dev/mem to bring in a
    device memory space.

  • Nothing has changed in the code to mmap() in libc since 1997.

  • It is the handler for /dev/mem inside of procnto that has changed, not
    mmap() itself.

  • And it does.(working as expected … )

or are you going to thank Sunil for identifying a problem in your
code,

The problem is in my code?? Sorry, it must be a disordered perception …
as is common these days! Sunil only explained what has changed in
mmap()!!

So I know at least now why it breaks my mmap code.

like most of the
reasonable, courteous folk we deal with from day to day would do?

It’s hard to be courteous when you hear all the time ‘your code is
wrong’ or insults like ‘do you know how QNX works?’ while, on the other
side, the reasons for the problems are unpredictable and unnecessary changes
in the documented behaviour of low level library calls by QSSL.

After all of these breaks with 6.0, 6.1, 6.2.0 and now 6.2.1 I have the
impression that QNX6.x is a moving target and every new release will be
attacking the code base of QSSL customers.

The absolute worst fact is that employees of QSSL are always claiming
that the root of these problems is in the code of their customers.

Armin

Chris McKillop wrote:
[ ]

We know QNX better than you do,

I hope so …

and it is only in your interest to show us
what you are doing so we can diagnose the problem. Something to consider
in the future when you claim entire subsystems of the kernel are busted.

What are you talking about? Show me where I claimed that ‘entire
subsystems of the kernel’ are ‘busted’!

Armin

Igor Kovalenko <kovalenko@attbi.com> wrote in message
news:b5u1tm$rm8$1@inn.qnx.com

Since the latter must be about someone else, I could spoil the fun here a
little. Yes, I think Sunil probably did hit the nail on the head here.
However Armin has one point, even while having problematic code. The track
record of QNX on compatibility between even minor versions has been less
than spotless. And all the explanations about ‘how QNX works’ do not solve
that problem. This discussion happened many times before, but somehow
people working for QNX seem to be still missing the point. If any change in
QNX makes previously working code (a library call) fail, for users that’s
a failure of the library call. Asking in response ‘do you even understand
how QNX works’ would be irritating for most and insulting for some.

If you (QNX) introduce some changes into OS internals that can affect
behavior of existing code, you owe it to your customers to make every effort
to avoid such effects and when impossible then at least find and document
such scenarios. Your regression tests should include all kinds of input
data, including invalid or otherwise problematic inputs. Everything that is
known as ‘corner cases’. So if you break working code, it is not a good
thing. Even when the code was problematic in the first place.

I agree with the principle and with the argument about the change in mmap()
behavior… And it would be better if “a 32-bit signed value passed into mmap()
could cause a problem” were in the release notes.

However, I disagree that, because passing a particular 32-bit signed value
into mmap() causes a problem, one can then declare:

Every resource manager using mmap() will crash because mmap() is
returning an invalid address !

-xtang

Xiaodan Tang wrote:


I agree with the principle and with the argument about the change in mmap()
behavior… And it would be better if “a 32-bit signed value passed into mmap()
could cause a problem” were in the release notes.

Yes … it causes problems if the mmap() interface treats a
physical address as a signed 32-bit value. But I don’t think that this
is a good idea.

However, I disagree that, because passing a particular 32-bit signed value
into mmap() causes a problem, one can then declare:

Every resource manager using mmap() will crash because mmap() is
returning an invalid address !

By chance, I have tested only our PCI-based resource managers … and the
result was just shocking!!

Armin


-xtang

Xiaodan Tang <xtang@qnx.com> wrote:
: I agree with the principle and with the argument about the change in mmap()
: behavior… And it would be better if “a 32-bit signed value passed into mmap()
: could cause a problem” were in the release notes.

I’ll add it.


Steve Reid stever@qnx.com
TechPubs (Technical Publications)
QNX Software Systems

If I exclude Sunil … there was no real help, only annoying statements
that our code must be wrong. Here is just a list of curious statements
from Chris McKillop:

This is gonna be my last post in this thread. I am sorry you didn’t feel
I was helpful but in retrospect I don’t think you wanted help, I think you
wanted to make inflammatory comments and you got me inflamed. A mistake I
won’t make with you again.

Your first post was that…

“POSIX mmap() is broken and all resmgrs() will crash”.

Both totally inaccurate statements and I really would expect that someone
with your years of experience would be more accurate in their assessment of
the situation.

Your first posting should have been…

“My resmgrs are exiting due to a mmap() failure, errno of ENXIO. This didn’t
happen under 6.2.0 and here is the way I am calling mmap(). What is up?”.

Even when I asked you for an example of a failure case you waited until many
posts later to give it. Funny that when you finally did post an example
that is when Sunil was able to help you. And actually, Steve pointed out the
root cause before Sunil posted (returning an error for offsets less than 0).
It is even in the release notes.

You always had a bug in your code Armin, it just happens that we fixed a
bug in OUR code that exposed yours (which is unfortunate).

http://www.opengroup.org/onlinepubs/007908799/xsh/datatypes.html

off_t is a signed value and mmap() takes an off_t for its offset parameter.
You didn’t take that into account when you wrote your code and, due to our
bug, your code worked. That doesn’t make it any less of a bug in your code.
There has always been mmap64() which takes an off64_t. Which is why when
you changed your code to mmap_device_memory() it worked again (it uses
mmap64() internally).

#include <stdio.h>
#include <sys/types.h>

int main( void )
{
    off_t offset;

    /* With a 32-bit signed off_t, 0x90000000 does not fit and ends up negative. */
    offset = 0x90000000;
    if( offset < 0 ) {
        fprintf( stderr, "Signed Numbers Suck!\n" );
    }
    return 0;
}


EOT

chris


Chris McKillop <cdm@qnx.com> “The faster I go, the behinder I get.”
Software Engineer, QSSL – Lewis Carroll –
http://qnx.wox.org/

Igor Kovalenko <kovalenko@attbi.com> wrote:

However Armin has one point, even while having problematic code. The track
record of QNX on compatibility between even minor versions has been less
than spotless.

I agree that the release notes from QSS are often lacking the level of detail
that I expect, especially in this regard. I don’t think that is the real
issue that you are raising though.

If any change in QNX makes previously working code (a library call) fail,
for users that’s a failure of the library call.

I would normally agree with you, but in this case you are wrong. Armin is
using a library call incorrectly, the documentation says it will not work
with address ranges >4G, and Armin assumed that the MMU would never generate
an address above that on a system with less than that memory. THAT is not
a valid assumption.

What happened to Armin is simple, he used a library function incorrectly, and he
got lucky that on previous versions of QNX6, the library function was
technically “broken” in that it never returned values >4G. It’s fixed and
now it does. The result is that Armin’s broken code no longer works, and
his luck has run out.

This is NO DIFFERENT than writing code that uses shared memory but without
properly using semaphores. It will typically run perfectly fine on a single
processor machine, and then crap out on an SMP machine. It’s NOT the SMP
machine that is broken, it’s the code. This is no different.

I DO NOT expect QSS to document all the cases where “improper code would
work prior to us fixing this bug”. Sure, they changed/improved the functionality
and I expect them to document that… but not the fact that BAD CODE used to
work as a SIDE EFFECT of the previous buggy behavior of that library function.

If they had to document all the instances where a bug of theirs would allow
a programmer to “get away with” bad code, and then document again when they
fixed the function (and therefore broke the bad code)… that’s lunacy.

Armin is being stubborn, and foolish. He has so far had it pointed out to
him that he has made TWO mistakes in his code:

  1. assumed that an error result code would be NULL, when in fact it is not.
  2. assumed that result values would remain <4G

Instead of being polite, and thanking the individuals that tried to help
him (despite his rude and stubborn behaviour and refusal to admit he made
a coding error), he has only berated those that tried to help him…
That is just stupid, rude, etc.

So let’s be blunt: Armin stop being a dick.

Cheers,
Camz.

Martin Zimmerman camz@passageway.com
Camz Software Enterprises www.passageway.com/camz/qnx/
QNX Programming & Consulting www.qnxzone.com

Armin Steinhoff <a-steinhoff@web.de> wrote:

It’s hard to be courteous when you hear all the time ‘your code is
wrong’ or insults like ‘do you know how QNX works?’ and on the other

Armin, your code WAS wrong, if you ask for help here, or anywhere, you
always have to be prepared to learn that you might have made a mistake.
That isn’t a criticism, it’s people actually trying to help you. If I
make a mistake in my code (and I don’t realise it), I want someone to
point it out to me. Fixing the problem is more important to me than
bothering to let my feelings get hurt when someone suggests that I
made a mistake. You’re an adult (I assume), so get over it.

As for the ‘do you know how QNX works’ remark… you demonstrated that
you did not. The comment was justified. QNX uses IPC, there WAS a
bug, but NOT where you claimed it was. Your refusal to accept that
showed that you had (I hope, temporarily) forgotten how the majority
of libc works in QNX, which is as a wrapper function to an IPC message
to procnto or some other resmgr.

The absolute worst fact is that employees of QSSL are always claiming
that the root of these problems is in the code of their customers.

In this case, they were right.

Cheers,
Camz.


Martin Zimmerman camz@passageway.com
Camz Software Enterprises www.passageway.com/camz/qnx/
QNX Programming & Consulting www.qnxzone.com