Analysis


When the text files were opened in binary mode, all three compilers returned the exact file contents regardless of the read function used. One minor problem occurred when reading the malformed file bad1.dos (in which the standard DOS line terminator "CR/LF" is reversed to "LF/CR") with fgets(): the carriage-return was returned as the first character of the next line rather than the last character of the current line. This is because fgets() stops reading a line as soon as it finds a newline character (a line-feed), so it didn't see the carriage-return until it started reading the next line.
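
The fgets() behavior described above can be reproduced without the original test files. The following is only a minimal sketch, not the test program itself: the helper name split_with_fgets and the LINE_CAP constant are made up for illustration, and a tmpfile() stream (which is opened in binary update mode) stands in for bad1.dos.

```c
#include <stdio.h>

#define LINE_CAP 64   /* illustrative buffer size, not from the original program */

/* Hypothetical helper: writes `len` bytes of `data` to a temporary binary
 * stream, then reads them back with fgets(), one line per slot of `lines`.
 * Returns the number of lines read, or -1 if the stream cannot be set up. */
int split_with_fgets(const char *data, size_t len,
                     char lines[][LINE_CAP], int max_lines)
{
    FILE *fp = tmpfile();   /* binary mode, so no character translation */
    int count = 0;

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* fgets() stops after each line-feed, so a carriage-return that follows
     * a line-feed becomes the first character of the next line. */
    while (count < max_lines && fgets(lines[count], LINE_CAP, fp) != NULL)
        count++;

    fclose(fp);
    return count;
}
```

Given the ten bytes "one", LF, CR, "two", LF, CR, this returns three lines: "one" plus a line-feed, then a carriage-return followed by "two" and a line-feed, and finally a lone carriage-return.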

None of the compilers could identify the record boundaries in the Macintosh-formatted files (data.mac and noendeol.mac) using fgets() in either text or binary mode; each simply returned all three records as a single line. Since fgets() looks for line-feeds, and Macintosh text files don't contain any, fgets() read to the end of each file without ever detecting an end-of-line condition.
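
A short sketch makes the single-line result easy to see. It assumes a tmpfile() stream in place of data.mac, and the helper name count_fgets_lines is hypothetical; it simply counts how many times fgets() returns a line for a given run of bytes.

```c
#include <stdio.h>

/* Hypothetical helper: counts how many "lines" fgets() reports for the
 * given bytes. Returns -1 if the temporary stream cannot be set up. */
int count_fgets_lines(const char *data, size_t len)
{
    char buf[256];          /* large enough for the short samples used here */
    int count = 0;
    FILE *fp = tmpfile();   /* binary mode, so bytes are read back unchanged */

    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* Each successful fgets() call counts as one line. With no line-feed
     * in the data, the first call runs all the way to end-of-file. */
    while (fgets(buf, sizeof buf, fp) != NULL)
        count++;

    fclose(fp);
    return count;
}
```

Three records terminated by carriage-returns come back as one line, while the same records terminated by line-feeds come back as three.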

Using the fgetc() and fread() functions (as they are implemented in the program), all of the compilers were able to identify the records in the Macintosh files when they were opened in binary mode. In addition, Visual C++ and the gcc compiler successfully read the Mac files with fgetc() and fread() in text mode. In text mode, however, the Borland compiler continued to return all three records as a single line when using fgetc() and fread(). This is a consequence of the way the Borland compiler translates carriage-returns in text mode.

Both Visual C++ and the Borland compiler perform some character translation in text mode, but the translation is subtly different. Visual C++ "hides" only those carriage-returns that immediately precede a line-feed character. The Borland compiler hides all carriage-returns, regardless of the presence or absence of line-feeds. This explains why the Borland compiler couldn't identify the record boundaries in the Macintosh files using fgetc() and fread() while Visual C++ could. The program's fgetc()- and fread()-based read routines look for either carriage-returns or line-feeds to identify line ends. Since the Borland compiler hides all carriage-returns in text mode, and Macintosh text files use only carriage-returns as line terminators, all line termination was "invisible" to the Borland compiler. The carriage-returns in the Mac files were visible to Visual C++ in text mode because they do not precede line-feeds.
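
As an illustration of a read routine that looks for either carriage-returns or line-feeds, here is a minimal fgetc()-based sketch. It is not the code from the test program: the name read_record and its return convention are assumptions, and it expects a stream opened in binary mode so that no carriage-returns are hidden before the routine sees them.

```c
#include <stdio.h>

/* Hypothetical helper: reads one record from `fp` into `buf` (capacity
 * `cap`, at least 1), treating CR, LF, or CR/LF as the terminator. The
 * terminator is not stored; characters that do not fit are dropped.
 * Returns the stored length, or -1 at end-of-file with nothing read. */
int read_record(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')                /* Unix-style terminator */
            break;
        if (c == '\r') {              /* Mac-style, or first half of CR/LF */
            int next = fgetc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);     /* lone CR: push back the next byte */
            break;
        }
        if (n + 1 < cap)
            buf[n++] = (char)c;
    }

    if (c == EOF && n == 0)
        return -1;                    /* no more records */
    buf[n] = '\0';
    return (int)n;
}
```

Calling read_record() in a loop until it returns -1 yields the same records whether the file uses CR, LF, or CR/LF terminators. A reversed "LF/CR" pair, as in bad1.dos, would show up as an extra empty record with this sketch.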

In text mode, both Visual C++ and the Borland compiler hide DOS end-of-file markers (hex 0x1A, decimal 26), and both stop reading completely when they encounter one. This is not a problem if the marker occurs at the end of the file (as in eof.dos), but it resulted in incomplete reads of the malformed file bad3.dos (which contains an embedded end-of-file marker). In binary mode, both compilers were able to read the entire file, and returned the end-of-file marker at its correct location.

(It's worth noting that DOS end-of-file markers are almost never used anymore, so the chances of encountering a spurious one are almost nil.)
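
Where an embedded marker is a concern, a file opened in binary mode can be scanned for it before any text-mode processing. The following is a rough sketch under that assumption; the name find_dos_eof and the DOS_EOF constant are illustrative and do not come from the test program.

```c
#include <stdio.h>

#define DOS_EOF 0x1A   /* DOS end-of-file marker, decimal 26 */

/* Hypothetical helper: scans `fp` from its current position and returns
 * the offset (relative to that position) of the first DOS end-of-file
 * marker, or -1 if none is found. The stream must be opened in binary
 * mode; in text mode some compilers stop reading at the marker itself. */
long find_dos_eof(FILE *fp)
{
    long offset = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == DOS_EOF)
            return offset;
        offset++;
    }
    return -1L;
}
```

A result other than -1 that falls short of the last byte of the file indicates an embedded marker of the kind found in bad3.dos.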

The Borland compiler performed one unexpected translation in text mode when reading with fgets(). Apparently, if the last character in the file is a carriage-return (as it is in data.mac and bad1.dos), that final carriage-return is translated to an end-of-file marker.

The gcc compiler (Unix) performs no character translation when a file is opened in text mode. For any given read function, the output was identical in both text and binary modes.


The bottom line is that different compilers, even on the same platform, can produce different results when reading the same text files.
