Sometimes you get bitten in the butt by a small, usually insignificant detail that just happens to be significant in your specific situation. Today I ran into such a "gotcha", and it reminded me of a story that fits nicely.
First, some code. Don’t overthink it…
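A minimal sketch of such a snippet, assuming a C# 11 raw string literal named teststring that holds Foo and Bar on two lines (the exact contents and the hex variable are assumptions, chosen to match the output shown further down):

```csharp
using System;
using System.Linq;

// Assumed test input: a raw string literal spanning two source lines.
var teststring = """
    Foo
    Bar
    """;

// Format every character as a two-digit uppercase hex value.
var hex = string.Join(", ", teststring.Select(c => ((int)c).ToString("X2")));
Console.WriteLine(hex);
```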
This code prints each character of a raw string, teststring, as a hex ASCII value to the console. Now ask yourself: what do you think the output will be?
You can try it here.
The answer might surprise you - or maybe not, if you’ve been bitten by this one before. What if I told you the output depends on who you ask? "Well, duh," you’d say. But what if I told you it also depends on their platform? That might be enough of a hint.
So here it is. On Linux or macOS, you’ll see:
46, 6F, 6F, 0A, 42, 61, 72
But on Windows:
46, 6F, 6F, 0D, 0A, 42, 61, 72
Yep - the age-old newline difference between Windows (\r\n) and Linux/macOS (\n).
Once you see it, you’ll go “Duh!” as I did. Still, I was at least a little surprised. For a "modern"-ish language like C#, I would've expected some kind of normalization. But no, C# preserves the text as entered. And the compiler happily embeds those platform-specific newlines right into your string.
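If you do want normalization, you have to ask for it yourself. A minimal sketch, assuming .NET 6 or later (where string.ReplaceLineEndings is available); the helper name NormalizeNewlines is made up for this example:

```csharp
using System;

// Hypothetical helper: rewrite every line ending (\r\n, \r, \n, ...) to a single \n.
static string NormalizeNewlines(string text) => text.ReplaceLineEndings("\n");

var windowsStyle = "Foo\r\nBar";
var unixStyle = "Foo\nBar";

// After normalization both variants compare equal.
Console.WriteLine(NormalizeNewlines(windowsStyle) == NormalizeNewlines(unixStyle)); // True
```

Normalizing at the point of comparison keeps the test independent of how the source file happened to be checked out.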
But wait. It gets better. Suppose you write your code on Windows, commit and push to SVN (or Git), and your coworker on Linux checks it out. Depending on Git's config or your .gitattributes, Git may silently change your line endings! Not a big deal most of the time… until it is.
For reference: I ran into this in a DSMR (Dutch Smart Meter Requirements) parser. It parses "telegrams" from electricity meters. The fix? Simple: since it was a unit test, I just moved the raw string into a file in my testdata directory (already covered by .gitattributes) and called it a day.
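For illustration, a .gitattributes along these lines pins down what Git does with line endings. This is a sketch, not my actual file: the patterns are assumptions, and only the testdata directory name comes from the setup described above.

```
# Let Git detect text files and normalize their line endings in the repository
* text=auto

# Always check out C# sources with LF, regardless of platform
*.cs text eol=lf

# Assumed rule: leave test fixtures byte-for-byte as committed
testdata/** -text
```

With a rule like the last one, whatever line endings a fixture file was committed with are what every checkout gets.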
And another sidenote: Strictly speaking, classic Mac OS (before OS X) used \r as a line ending, but macOS (formerly OS X) uses \n, just like Linux.
Story time!
This must’ve been 15+ years ago. We worked with a 3rd party who supplied a daily file with sensitive data. To track leaks, they sprinkled the file with 'marker' records - unique, harmless, dummy lines tied to each partner. That way, when a file gets leaked, it can be traced back to whoever leaked it.
Their docs stated: all records end with \n. Easy enough. I wrote the parser, got our first file… and it immediately choked. Turned out a handful of records (the markers, obviously) ended in \r\n. Out of millions of lines, maybe 10 had this "wrong" line ending.
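In hindsight, a parser can simply tolerate both styles. A minimal sketch of that idea (not the parser I wrote back then; SplitRecords and the sample data are made up for illustration):

```csharp
using System;
using System.Linq;

// Hypothetical helper: split on \n and strip a trailing \r,
// so records terminated by \n and by \r\n both parse cleanly.
static string[] SplitRecords(string data) =>
    data.Split('\n', StringSplitOptions.RemoveEmptyEntries)
        .Select(line => line.TrimEnd('\r'))
        .ToArray();

// Made-up sample: the "marker" record ends in \r\n, the others in \n.
var sample = "record1\nmarker\r\nrecord2\n";
var records = SplitRecords(sample);

Console.WriteLine(string.Join("|", records)); // record1|marker|record2
```

Of course, a lenient parser like this is exactly the kind that would never notice the markers.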
I contacted them. The issue was fixed the next day (or maybe two). What stood out, though, was that they claimed none of their other partners had ever noticed. Or maybe they just kept quiet… we’ll never know. My guess? Most companies just load the file into some system that doesn’t care about newline style. Newlines are newlines, amirite?
Anyway. Your turn. Got newline war stories? Drop them in the comments. And don’t forget to like, subscribe, and ... just kidding 😉