Core dump issue

Hi, everyone.

I tried to write data into a text file at 100Hz per line. Sometimes there would be a Memory Fault and created a core dump. The core dump file implied that there was a error when calling the function _Lockfilelock().

Here’s the code:

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <termios.h>

int main( void )
{
int fd_ser;
FILE *fp_file;
char rbuffer[300]={""}; //store GPS data
char rbuffer_crc[12]={""}; //GPS CRC
char cbuffer[400]={""}; //store configuration data
char wbuffer[40]={“log inspvasa ontime 0.1\r”}; //command
struct termios termios_p; //the serrial port attribute

//////////////////////////////////////////////
//Open the data recording file 
//////////////////////////////////////////////
if((fp_file = fopen("GPSdata.txt", "a+"))==NULL)
{
	printf("open error.\n");
	return EXIT_FAILURE;
}	

///////////////////////////////////////
//Open the serial port
//////////////////////////////////////
fd_ser = open( “/dev/ser1”, O_RDWR);

///////////////////////////////////////////////////
//Set the forwarding character ''
///////////////////////////////////////////////////
tcgetattr(fd_ser, &termios_p);
termios_p.c_cc[VFWD] = 0x00;
tcsetattr(fd_ser, TCSANOW, &termios_p);

///////////////////////////////////////
//Check satellite Omnistar
///////////////////////////////////////
printf("Checking Omnistar...\t\t");
fflush(stdout);
do
{	
	write(fd_ser, "log bestgpsposa once\r", sizeof("log bestgpaposa once\r"));
	readcond(fd_ser, cbuffer, sizeof(cbuffer), 300, 0, 5);				//Read the data
			printf("%s\n",cbuffer);
}while(!strstr(cbuffer, "OMNISTAR"));
printf("******Connected!!******\n");
memset(cbuffer, 0, 400);		//clear buffer	

///////////////////////////////////////////////////
//Set the forwarding character '*'
///////////////////////////////////////////////////
tcgetattr(fd_ser, &termios_p);
termios_p.c_cc[VFWD] = '*';
tcsetattr(fd_ser, TCSANOW, &termios_p);

sleep(3);
 	
///////////////////////////
//Send command 
//////////////////////////
write(fd_ser, wbuffer, sizeof(wbuffer));

////////////////////////////////
// Read OK
////////////////////////////////
do
{
	readcond(fd_ser, cbuffer, 13, 13, 0, 5);	//Read 'OK'
}while(!strstr(cbuffer, "OK")); 
memset(cbuffer, 0, 400);		//clear buffer	

//////////////////////
//Read GPS data
//////////////////////

while(1)
{	
	
	readcond(fd_ser, rbuffer, sizeof(rbuffer), 0, 0, 10);
	readcond(fd_ser, rbuffer_crc, 10, 10, 0, 10);
	strcat(rbuffer, rbuffer_crc);
	
	////////////////////////////////////////////////		
	//Write directly into GPSdata.txt
	////////////////////////////////////////////////
	fprintf(fp_file, "%s", rbuffer);

	printf("******Data from GPS******\n%s\n", rbuffer); 
	memset(rbuffer, 0, 300);												//clear buffer
	memset(rbuffer_crc, 0, 12);	  	
	}

//////////////////////////////////////////////////////
//Clear the forwarding character '*'
//////////////////////////////////////////////////////
tcgetattr(fd_ser, &termios_p);
termios_p.c_cc[VFWD] = 0x00;
tcsetattr(fd_ser, TCSANOW, &termios_p);

////////////////////////
// Close the file 
////////////////////////	
close( fd_ser );
fclose(fp_file);

printf("GPS process done!\n");

return EXIT_SUCCESS;

}

were is the fclose().

What is in buffer. Why use “%s” instead of usin buffer.

When posting code post complete code.

Sorry about not describe the problem clearly. I have posted the complete code.

The data in the buffer is something like:
%INSPVASA,1546,506648.111;1546,506648.111248050,41.836873263,-87.627346927,156.220162736,-0.049152740,0.008858111,-0.394364234,0.796106837,-1.517555203,131.984234026,INS_ALIGNMENT_COMPLETE*743ffb22

I use “%s” because I need to write the data formatly into the text file later.

Rnyter,

You might consider giving fwrite a try and see if that fixes the problem.

Tim

I could point out lines of code but let’s generalize and develop your debugging skills. You have made the assumption, in many places, that when a message is received from the serial port that it’s NULL terminated, but it’s not.

For example you cannot do a strcat after a readconf because the received data doesn’t end with a 0 ( null terminator ). It wouldn’t be as bad if you fill the receive buffer with 0 before calling readconf (which the code sometimes does). Personnaly I wouldn’t do that that because it hurt performance. Instead I would rely on the size of the data received as returned by readconf and put a 0 at the end of the data block, making it a real and safe string.

You said “I use “%s” because I need to write the data formatly into the text file later”. I don’t understand that this has do to with it. fprintf( fd, “%s”, buffer ) generates the exact same results as fprintf( fd, buffer ) however I could be wrong and the %s may be faster. Not sure anymore ;-) That being said printf("%s\n", buffer) is much slower then puts(buffer)

Also never do a memset() with constant for size. Always use a sizeof.

There is a while(1) which no never get out of, yet there is clean up code afterward??

Thanks Tim, I will try that.

Thanks for your detailed and patient reply, Mario.

My apology for my bad style of coding. I will try to improve me skills. I know that there is a terminator, something like \r\n, at the end of the received data. However during the configuration stage of the GPS, we need to send a lot of logs over the serial port. And each time we will get some responses like “OK” or “OK [COM1]” or else and then followed by the data string. The problem is that there are also the same terminator before and after these response. The only difference between those response and the data string from GPS is there is a ‘’ in the data string, right before the CRC. That’s why I tried to use the '’ as a mark and read the remaining CRC and do a strcat.

You are right that it is a bad idea to do these so in another version, when it comes to the data collecting stage (without sending logs), I unraw the serial port, in another version, so it will read the data until it meets a terminator. But I think that would not make any differences because the memory fault occurs when calling the _Lockfilelock().

I don’t know whether it would be faster to use fprintf(fd, “%s”, buffer) than fprintf(fd, buffer) or not. But I know I will need to use fprintf(fd, “%s, %s, …”, gps_speed, gps_velocity, …) later. So if there is a problem when using “%s” I think I still need to figure it out what the problem is.

About the memset(), it works. But you’re right, I should be more careful about this.

For the while(1). In the new version, it creates a thread to break the loop and do the remaining clean up things. Since I want to make the description clearer, I posted this old version code. The same problem happens sometimes when running with the new version.

Thanks again!

This terminator means nothing to the C language. Only a 0 byte can be use as a string terminator.

Just as I mention above, you can’t use a strcat because the data received does not contain a 0 at the end of the block.

The fprintf function is being use all over the world. It is very much extremely unlikely to be buggy… The way you handle “strings” is most definitely causing memory corruption which ends up crashing fprintf.

To be more clear :

char buffer[200];

This statement allocates space on the stack, since it’s local, the standard say the content is undefined.

then readconf( buffer…);

Say it read 4 bytes “OK\n\r”. If we’d look in the memory area where buffer lives we could see something like:

O K \n \r & @ D L F ! 2 … and so forth.

Notice the 4 first bytes are the received data, and the rest of the array is full of garbage, and what do you know none of them are 0. That means the string is very long, in fact we don’t even know how long it is. The 0 that terminates the string could end up being 1 Meg pass buffer, creating a 1 Meg string…

char buffer2[200];
buffer2[0] = 0; // create an empty string

strcpy ( buffer2, buffer );

Now what do you thing is going to happen? If you are lucking you get a SIGSEGV if not you get fprintf that crashes :wink:

Mario,

In his main loop he definitely has terminators thanks to the memsets so he can do the strcat call.

while(1) 
{ 
readcond(fd_ser, rbuffer, sizeof(rbuffer), 0, 0, 10); 
readcond(fd_ser, rbuffer_crc, 10, 10, 0, 10); 
strcat(rbuffer, rbuffer_crc); 

//////////////////////////////////////////////// 
//Write directly into GPSdata.txt 
//////////////////////////////////////////////// 
fprintf(fp_file, "%s", rbuffer); 

printf("******Data from GPS******\n%s\n", rbuffer); 
memset(rbuffer, 0, 300); //clear buffer 
memset(rbuffer_crc, 0, 12); 
} 

So unless the fprintf is crashing on the 1st time through the main loop (something he can check by incrementing a variable or he can just move the memsets to just before the readcond calls) it’s something else entirely that is causing the problem.

That aside, all your comments about his code is excellent advice.

Tim

[quote=“Tim”]
Mario,

In his main loop he definitely has terminators thanks to the memsets so he can do the strcat call.

[code]

Yes but it’s AFTER the first readconf. On the first call to readconf it’s already too late.

Maybe I am wrong but when I defined the rbuffer[300] and rbuffer_crc[12], I initialized them as empty arrays. I think there are also 0s at the end of these buffers, actually there are 300 0s and 12 0s separately.

You’re right, I missed it. Hum then change the fprintf to something like fprintf(rbuffer, “%s”, “a know string” ); If it doesn’t crash then there is something wrong with rbuffer. If it still crashes, replace readcond() with a strcpy of what you expect to received. I’m still convince something gets corrupted, don’t see what though.

You should check the return value of the open of the serial port. Also check the return value of fprintf ( what if the disk gets full).

Before the fprintf, print the value of fd_file to make sure it doesn’t get corrupted.

write(fd_ser, wbuffer, sizeof(wbuffer)); should be
write(fd_ser,wbuffer, strlen ( wbuffer ) ); otherwise it’s sending extra 0.
Or don’t put a size in [] and do a sizeof(wbuffer)-1. The -1 is to not send the extra 0.

Thanks, I will try that. But the problem is I don’t know when the error will occur. I mean the code I posted usually worked well. It was collecting data and writing into the text file for ten minutes even half an hour. But sometimes, which I cannot predict, a Memory fault occured. The only thing I know is the error occured when fprintf called the _Lockfilelock(). Another thing I know that is, since I print all the data on the screen rows by rows, when core dumped, the rows of the data in the text file are less than the rows on the screen. Consider the _lockfilelock(), maybe sometimes, somehow the file is locked but the program keep trying to write data into it, after maybe ten times trying, the program failed, and core dumped?

Even if locking failed for some reason I wouldn’t expect it to crash, fprintf would return -1 or something like that.

The problem may occur rarely because it depends no what the GPS returns.

Yes the textfile is using fprintf and as such is buffered try putting a fflush() afterthe fprintf or setit to non buffered with setvbuf. I think unlike the console a /n doesn’t cause data to be flushed to a file.

If you are using 6.4.1 you could try using mudflap ( check the doc ).

rnyter,

Did the fwrite help?

Also when the crash occurs and you look at the core dump when you walk back up the stack what does rbuffer and rbuffer_crc contain? Knowing that would go a long way toward determining what exactly is happening in terms of corruption (which I agree with Mario must be happening but isn’t entirely clear to me either where it does).

Late note here:

I just looked at readcond in the help doc’s.

The case where the last 3 arguments are (0,0,t) is marked as RESERVED. Thus I assume this means an indeterminate behavior. Your main readcond uses that exact argument format:

readcond(fd_ser, rbuffer, sizeof(rbuffer), 0, 0, 10);

Maybe it’s as simple as changing that is all you need.

Tim

OK, thanks. I will try using it to see if it would find out the corruption.

Hi, Tim.

Like I said, I don’t know when the crush will happen or not. Sometimes it happens once or twice but other times it never happens. So we will need some time to test it. I also will try checking the memory in rbuffer and rbuffer_crc.

Thanks for your notice about the RESERVED argument format. I really didn’t noticed that. What I used at the beginning is (sizeof(rbuffer), 0, 10) and I think I’d better change it back.

Rnyter

rnyter,

One other thing you might want to change:

In your read of data

readcond(fd_ser, rbuffer, sizeof(rbuffer), 0, 0, 10);

You ask for up to the full buffer size (300). If for any reason readcond had this much data you’d get 300 bytes worth. Then the strcat that occurs 2 lines later that adds the crc would definitely overflow the 300 byte buffer. To be really safe, either you should read 287 bytes or have another buffer that is bigger than rbuffer + rbuffer_crc + string terminator character.

Tim

Nice catch Tim!

size = readcond(fd_ser, rbuffer, sizeof(rbuffer), …);
… check and handle error
totalsize = readconf( fd_ser, &rbuffer[size], sizeof( rbuffer) - size, … );
… check and handle error
rbuffer[totalsize] = 0; //null terminat to turn into a C string.

I don’t like to rely on terminator or timeout. What I usually do when handling serial port is set it in unblock mode and use io_notify to get a pulse when there is something in the rx buffer. When the pulse is received the code go and get what ever is in the rx buffer. Then I have a state machine that
handles the data. One could get rid of the io_notify overhead by doing having a thread do a readconf with a minmum of 1 byte but a maximum of 1024.

Thanks to all your suggestions I’ve think I solved the problem now.

The program rarely crushes when running on the computer and diplaying the received data directly onto the moniter.

However it happened almost every time when I use phindows on another computer over a tweisted wire.

What I see is that on the phindows, it display the data row by row normally in a small console window. But when I maximize the console window, the displaying slows donw quickly and it cannot even display the data row by row. It displays segment by segment. And soon followed by a memory fault and core dumped.

I checked the data on the screen and in the textfile. Both of them have some corrupted data. It is this corrupted data and the strcat() causing the problem like you said.

To justify that, I change the serial port into unraw mode and use read(fd, rbuffer, sizeof(rbuffer)) so I don’t have to use the strcat(). By the way, each row of the data comes from the GPS will be ended by a \r\n which can be considered to be a whole row.

Under the same conditions, though the displaying is also slow and there is also corrupted data both on the screen and in the textfile when I maximize the console winodw on phindows. But after I ran the program for a while I didn’t meet the memory fault again.

Of course I have to do more tests to see if it really fix the problem.

To Tim, I tried the fwrite(), the same problem happened.

Thanks to you guys again!

Rnyter