Segmentation fault with a call to dispatch_handler

I get a random segmentation fault when calling dispatch_handler after receiving a message larger than roughly 500 bytes. I have used this class for a while without any problem, but now that I am sending bigger messages, the segmentation fault happens quite often. The maximum message size is defined as 1024 bytes and the maximum number of parts as 65536, so I don’t think the limits are the problem. I can receive millions of small messages without any trouble, but with a bigger message I never know whether I will get a segmentation fault.

This is the code that is failing:

//------------------------------------------------------------------------
IpcServer *IpcServer::CreateIpcServer(
const std::string & p_strInName)
throw( eIpcError )
{
// create the server
IpcServer *l_poServer = new IpcServer;
assert(l_poServer);
l_poServer->m_strServerName=p_strInName;
// initialize the mutex for accessing the priority_queue
// this may not be necc. but I did not find positive information
// about QNX libraries telling me that priority_queue is thread safe
if ( pthread_mutex_init( &l_poServer->m_oQueueLock, NULL ) != EOK )
{
delete( l_poServer );
throw( IPC_MUTEX_INIT_FAIL );
}

// initialize the semaphore , sem name must start with a /
if ( sem_init( &l_poServer->m_oMsgAvailSem, 0, 0 ) != EOK )
{
    delete( l_poServer );
    throw( IPC_SEM_INIT_FAIL );
}

// get the IPC server up
try
{
    l_poServer->InitResMgr( p_strInName );
}
catch( eIpcError err )
{
    delete( l_poServer );
    throw( err );
}

// and start the thread
pthread_attr_t attr;
pthread_attr_init( &attr );
pthread_attr_setdetachstate( &attr, PTHREAD_CREATE_DETACHED );
if ( pthread_create( &l_poServer->m_oServerThread, &attr, ServerThread, reinterpret_cast<void *>( l_poServer )) != EOK )
{
    delete( l_poServer );
    throw( IPC_THREAD_INIT_FAIL );
}
return( l_poServer );

}

//------------------------------------------------------------------------
void IpcServer::InitResMgr(
std::string inName )
throw( eIpcError )
{
// create the dispatch interface
mdispatch = dispatch_create();
if ( mdispatch == NULL )
{
throw( IPC_CREATE_DISPATCH_FAIL );
}

// set the resmgr attributes
memset( &mresmgr_attr, 0, sizeof( mresmgr_attr ));
mresmgr_attr.nparts_max   = MAX_MESSAGE_PARTS;
mresmgr_attr.msg_max_size = MAX_MESSAGE_SIZE;

// setup default I/O functions to handle open/read/write
iofunc_func_init( _RESMGR_CONNECT_NFUNCS, &mConnectFuncs, _RESMGR_IO_NFUNCS, &mIoFuncs );

// setup the attributes for the entry in the filesystem
const int PERMISSIONS = S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH ;
iofunc_attr_init( &mIoFuncAttr, S_IFREG | PERMISSIONS, 0, 0 );

// Display statistics following a "cat" command on the cmd line.
mIoFuncs.read = IpcServerRead;


// and attach to the filesystem
mresmgr_id = resmgr_attach( mdispatch, &mresmgr_attr, inName.c_str(), _FTYPE_ANY, _RESMGR_FLAG_SELF,
                               &mConnectFuncs, &mIoFuncs, &mIoFuncAttr );

// check for failure
if ( mresmgr_id == -1 )
{
    throw( IPC_RESMGR_ATTACH_FAIL );
}

// setup the message callback
memset( &mmessage_attr, 0, sizeof( mmessage_attr ));
mmessage_attr.nparts_max   = MAX_MESSAGE_PARTS;
mmessage_attr.msg_max_size = MAX_MESSAGE_SIZE;
mmessage_attr.flags        = MSG_FLAG_DEFAULT_FUNC;  // take all messages

// attach the message callback
mmessage_id = message_attach( mdispatch, &mmessage_attr, _IO_MAX+1, _IO_MAX+1, MessageCallback, reinterpret_cast<void *>( this ));

if ( mmessage_id == -1 )
{
    throw( IPC_MSG_ATTACH_FAIL );
}

// and finally the context for the dispatch layer to use
mdispatch_context = dispatch_context_alloc( mdispatch );

if ( mdispatch_context == NULL )
{
    throw( IPC_DISPATCH_CONTEXT_FAIL );
}

}

//------------------------------------------------------------------------
void *IpcServer::ServerThread(
    void *inArg )
{
    TRACETHREAD("IpcMsgServer");

    //dispatch_context_t *ctp_ret;
    IpcServer *server = reinterpret_cast<IpcServer *>( inArg );

    while( true )
    {
        server->mdispatch_context = dispatch_block( server->mdispatch_context );
        if ( server->mdispatch_context != NULL )
        {
            // This is where it is failing.
            dispatch_handler( server->mdispatch_context );
        }
        else
        {
            cout<<"IPCServer::Dispatch context is NULL"<<endl;
        }
    }
}

Sorry for the lack of indentation; it seems that this text box doesn’t support tabs.

Thanks for any help

My spidey sense tells me that there is something curious about allowing 65K parts. If you are only expecting 1024 bytes, how would you expect to break it up into that many parts? That’s not to say that something is wrong with your code, but it wouldn’t shock me to learn that this has never come up or been tested before.
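To put numbers on that point, here is a rough sketch of the arithmetic. IovLike is a hypothetical stand-in for a pointer-plus-length scatter/gather entry, and both helper functions are made up for illustration; none of this is taken from the QNX headers. Whether the dispatch layer actually allocates a table of this size, and where, is not established in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one scatter/gather entry: a base pointer plus a length.
// Assumption: the real entry type has a similar shape; check the target's headers.
struct IovLike {
    void*       base;
    std::size_t len;
};

// Bytes occupied by a table holding `nparts` entries.
std::size_t iov_table_bytes(std::size_t nparts)
{
    return nparts * sizeof(IovLike);
}

// Smallest number of parts that covers `msg_bytes` when each part
// carries at most `part_bytes` bytes (ceiling division).
std::size_t parts_needed(std::size_t msg_bytes, std::size_t part_bytes)
{
    assert(part_bytes > 0);
    return (msg_bytes + part_bytes - 1) / part_bytes;
}
```

Even at one byte per part, a 1024-byte message needs only parts_needed(1024, 1) = 1024 parts, so a limit of 65536 can never be reached. With 8-byte entries (typical for a 32-bit target), iov_table_bytes(65536) works out to 512 KiB.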

On the other hand, within dispatch_handler() should be some of your routines, right? How do you know it is not your code that is burping here?

Can you post the MessageCallback routine?

Here is the MessageCallback routine:

//------------------------------------------------------------------------
int IpcServer::MessageCallback(
message_context_t *inCTP,
int MessageNumber,
unsigned int inFlags,
void * inClass )
{
static long reply = 0; // returns 0 on successful insertion, otherwise send MsgError
IpcServer *server = reinterpret_cast<IpcServer *>( inClass );
MsgQueueEntry entry;

std::vector<char> m_oCompleteMsg;      //
unsigned inSize=0;
bool bMessageReplace=false; // True if we replace an existing message
int iBytesRead=0;
unsigned i=0;
char cBuffer[MAX_MESSAGE_SIZE];

// copy the buffer over!
do {
	m_oCompleteMsg.reserve(m_oCompleteMsg.size()+MAX_MESSAGE_SIZE);
	iBytesRead = 0;
	iBytesRead = MsgRead(inCTP->rcvid,cBuffer,MAX_MESSAGE_SIZE,i*MAX_MESSAGE_SIZE);
	if (iBytesRead>0) {
		m_oCompleteMsg.insert(m_oCompleteMsg.end(),cBuffer,&cBuffer[iBytesRead]);
		i++;
		inSize+=iBytesRead;
        
	}else if (iBytesRead<0){
		//cerr << "MsgRead Fatal Error <0" << strerror(errno) << endl;
		//return 0;
		break;
	}
} while ( iBytesRead>0 );

if (inSize==0) 
{ // Nothing will be read if the block process died
	//cerr << "MsgRead Fatal Error, Nothing read " << endl;
	return 0;
}

try
{
    entry = server->CreateMsgQueueEntry(m_oCompleteMsg, bMessageReplace, inCTP->info.pid);
    
    // and stuff the pid in there too
    entry.mSenderPid = inCTP->info.pid;

    // If we are registering a Event
    if (entry.mType == string(TypeInfo(typeid(MsgRegisterEvent)).name()))
    {
        for ( int i = 0; i < MsgBase::MESSAGE_SIZE_OFFSET; i++ )
        {
            entry.mBuffer.erase( entry.mBuffer.begin());
        }
        MsgBase* msg = MsgMakerFactory::ConstructMsg(true, entry.mBuffer );
        MsgRegisterEvent* msgRegister = reinterpret_cast<MsgRegisterEvent*>(msg);

        ((IpcServer*)inClass)->mEvent= msgRegister->GetEvent();// Save the event type
        ((IpcServer*)inClass)->mId   = inCTP->rcvid;           // Save the receiver id

        // reply to the caller

		cout << "MsgBaseServer Register to receive pulse " << inCTP->rcvid << " to " << server->m_strServerName<< endl;

		/* Delete any previously pending messages */
		/* Hoping no important messages get erased, such as the many pings! */
		((IpcServer*)inClass)->m_oQueHighest.clear();
		((IpcServer*)inClass)->m_oQueHigh.clear();
		((IpcServer*)inClass)->m_oQueNormal.clear();
		((IpcServer*)inClass)->m_oQueLow.clear();
		((IpcServer*)inClass)->m_oQueLowest.clear();

        if (MsgReply( inCTP->rcvid, EOK, reinterpret_cast<char *>( &reply ), sizeof( long )))
        {
			cerr << "Error replying to the client " << strerror(errno) << endl;
		}

        if (msg) delete msg;
		msg=NULL;
        return(0); // Don't put this message in the queue
    }
    else
    {
	    // reply to the caller
		MsgReply( inCTP->rcvid, EOK, reinterpret_cast<char *>( &reply ), sizeof( long ));
	}
}
catch(...)
{
    MsgError(inCTP->rcvid,EBUSY);
    return( 0 );
}

// and let anybody waiting know that there is a message ready
if (bMessageReplace==false) {          // If a message was added, not only replace
	sem_post( &server->m_oMsgAvailSem ); // Unblock any waiting thread

	// Some threads ask to be informed of new messages with a pulse first
    if (((IpcServer*)inClass)->mId!=-1) { // If a pulse id is valid, then send a pulse
        int rc;
        rc = MsgDeliverEvent(((IpcServer*)inClass)->mId,&(((IpcServer*)inClass)->mEvent));
		//LOGDEBUG("DeleveringEvent");
        if (rc == -1) {
            cerr << "MsgDeliverEvent failed " << strerror(errno) << endl;
        }
    }
}	
return( 0 );

}
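A side note on the callback above, unrelated to the crash itself: the loop that strips the header erases the first element of entry.mBuffer once per iteration, which shifts the whole buffer each time and is undefined if the buffer is shorter than the header. A minimal sketch of a single range erase follows. strip_prefix and kHeaderBytes are hypothetical names, and the value 4 is an assumption, since the real MsgBase::MESSAGE_SIZE_OFFSET is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MsgBase::MESSAGE_SIZE_OFFSET (real value not shown in the thread).
const std::size_t kHeaderBytes = 4;

// Removes the first `count` bytes in one pass.
// If the buffer holds `count` bytes or fewer, it is simply emptied.
void strip_prefix(std::vector<char>& buffer, std::size_t count)
{
    if (count >= buffer.size()) {
        buffer.clear();
        return;
    }
    buffer.erase(buffer.begin(),
                 buffer.begin() + static_cast<std::ptrdiff_t>(count));
}
```

In the callback this would be called as strip_prefix(entry.mBuffer, MsgBase::MESSAGE_SIZE_OFFSET), assuming mBuffer is a std::vector<char>.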

I will try to decrease the number of parts since I’m never using more than 2 or 3 anyway.

Also, a trace in my code tells me that it is exactly this thread, and exactly this line, that causes the segmentation fault. It fails at this line every time.

The line you have marked is a subroutine call with a single argument, server->mdispatch_context.
This line would only cause a segmentation fault if

  1. server has bad data in it
  2. server->mdispatch_context is pointing outside of legal memory
  3. dispatch_handler points to a bogus location in your code
  4. The call causes a stack overflow

None of these make much sense. If it is 1) or 2), why doesn’t the previous “if” statement cause the same segmentation violation? 3) would indicate some weird compile or link problem. 4) would be a little too coincidental unless you were somehow causing a recursive call, which I do not see.

I’d like to see a print out of the addresses server and &server->mdispatch_context right before this occurs.
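A minimal sketch of such a printout, assuming nothing about the real class beyond the member name: ServerLike is a hypothetical stand-in for IpcServer, and describe_pointers and log_pointers are made-up helpers. The output goes to stderr and is flushed so that it is more likely to survive a crash on the next line.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for IpcServer; the real member is a dispatch_context_t*.
struct ServerLike {
    void* mdispatch_context;
};

// Formats the server pointer, the address of the member, and the member's value.
std::string describe_pointers(const ServerLike* server)
{
    if (server == nullptr) {
        return "server=(null)";
    }
    char text[128];
    std::snprintf(text, sizeof text, "server=%p &ctx=%p ctx=%p",
                  static_cast<const void*>(server),
                  static_cast<const void*>(&server->mdispatch_context),
                  server->mdispatch_context);
    return std::string(text);
}

// Writes the formatted pointers to stderr and flushes immediately.
void log_pointers(const ServerLike* server)
{
    std::fprintf(stderr, "%s\n", describe_pointers(server).c_str());
    std::fflush(stderr);
}
```

In ServerThread the call would sit just before dispatch_handler, passing the real server pointer to an equivalent helper written against IpcServer.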

It seems that decreasing the number of parts from 65K to 5 has fixed the problem. I will continue testing, but after thousands of sends of the message that used to crash very often, it has not crashed once.

Thank you very much for your help.

You should report this to QNX tech support. I don’t think 65K parts makes much sense, but there should be graceful error reporting rather than a segmentation violation. That’s the only way something obscure like this will get fixed.