line endings

:: introductory, computers, rant, newlines

You know what really bothers me? Line endings.

Let me give a little background for those who need it. Computers store everything as binary numbers, right? So to store letters, words, and sentences, you need to encode them as numbers. Enter the ASCII encoding, and many others (such as UTF-8). These map letters to numbers so that computers can store them. What’s A? 65. What’s B? 66. What’s b? 98 (lowercase letters need different numbers than uppercase letters). Is that all? No: we also need numbers for spaces, tabs, and even things like backspaces and a bell (because early computer terminals had an actual bell to ring). There’s even a code for a character that gives you a new line! Or… is there?
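If you want to see those numbers for yourself, here is a quick Python sketch (Python is just my pick for illustration) that asks for each character’s code with the built-in `ord()`. For plain ASCII characters like these, UTF-8 stores the exact same single-byte values.

```python
# Ask Python which number each character is stored as.
samples = {
    "A": "A",
    "B": "B",
    "b": "b",
    "space": " ",
    "tab": "\t",
    "bell": "\a",
    "carriage return": "\r",
    "line feed": "\n",
}
codes = {name: ord(ch) for name, ch in samples.items()}

for name, code in codes.items():
    print(f"{name:>15}: {code}")

# Encoding a string to bytes gives the same numbers.
encoded = list("Ab\n".encode("ascii"))
print(encoded)  # [65, 98, 10]
```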

It turns out that there is a code for a line feed (analogous to rolling a typewriter page up one line), and one for a carriage return (analogous to pushing the typewriter carriage back so you start typing at the left margin again). Early operating systems each decided to combine these characters differently to represent a new line. Unix systems used a single line feed (maybe recognizing that computers were not bound to be typewriters forever; a line feed is written \n for short). Early Apple operating systems used a single carriage return (probably an equally valid choice, written \r). DOS (along with some other systems that no longer matter for practical purposes) used a carriage return followed by a line feed (because, hey, that’s what you have to do on a typewriter; naturally, this is written \r\n). When Apple switched to a Unix-based operating system with Mac OS X, they left \r behind in favor of \n. So now there are really only two competitors: \n in the Unix world, and \r\n in the Windows world (plus various standards, like HTTP).
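To make the three conventions concrete, here is a small Python sketch that spells each one out as raw bytes and guesses which convention a chunk of text uses. `detect_line_ending` is a hypothetical helper of my own, not a standard function; it just counts \r\n pairs and then the bare \n and \r left over.

```python
# The three historical conventions, as raw bytes.
LINE_ENDINGS = {"Unix": b"\n", "classic Mac OS": b"\r", "DOS/Windows": b"\r\n"}


def detect_line_ending(data: bytes) -> str:
    """Guess which convention `data` uses (hypothetical helper)."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # \n not preceded by \r
    cr = data.count(b"\r") - crlf  # \r not followed by \n
    found = [name for name, n in
             (("DOS/Windows", crlf), ("Unix", lf), ("classic Mac OS", cr)) if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"


for name, ending in LINE_ENDINGS.items():
    sample = b"one" + ending + b"two" + ending
    print(f"{name:>15}: {sample!r} -> {detect_line_ending(sample)}")
```

If you only want the lines and don’t care which convention produced them, Python’s `str.splitlines()` already accepts all three (plus a few other separators): `"one\r\ntwo\rthree\nfour".splitlines()` gives `['one', 'two', 'three', 'four']`.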

Why does this matter? When you write programs, you often put in strings of characters to display, and often you want line breaks in them. If you are writing just for Unix, you can write a literal \n. If just for Windows, you can write a literal \r\n. But what if you want something that works portably across all operating systems? You have to do things like concatenate your string with a line-ending variable that depends on the operating system your program runs on. This leads to nonsense like C and C++ keeping separate “text” and “binary” file modes, so that the \n you write is silently turned into \r\n on Windows but left alone everywhere else (and to the widespread belief that C++’s std::endl is the “portable” newline, when it really just writes \n and flushes the stream). Or languages needing different functions for printing with or without a newline at the end. While I don’t mind having extra functions for convenience, it is ridiculous that programmers can specify every letter precisely, and even obscure symbols like the bell with ease, but cannot write a newline without resorting to this nonsense! Every programmer has to keep this issue in mind and step around it. This should be an encoding-level issue, one where you specify a line break and the encoding (ASCII, Unicode, Latin-1, or whatever) stores it however it wants. Not an issue pushed up so that EVERY programmer has to know about it.
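Here is what that translation layer looks like in Python, as one example. A file opened in text mode converts each \n you write according to the `newline` argument of `open()`: by default (`newline=None`) it becomes `os.linesep`, which is "\r\n" on Windows and "\n" on Unix. This sketch writes the same string three ways and reads back the raw bytes; `written_bytes` is just a helper name I made up.

```python
import os
import tempfile


def written_bytes(text: str, newline) -> bytes:
    """Write `text` in text mode with the given `newline`, return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "w", encoding="ascii", newline=newline) as f:
            f.write(text)
        with open(path, "rb") as f:  # binary mode: no translation on read
            return f.read()


text = "first\nsecond\n"
print("os.linesep here:", repr(os.linesep))
print("newline=None   :", written_bytes(text, None))    # \n -> os.linesep
print("newline='\\n'   :", written_bytes(text, "\n"))    # no translation
print("newline='\\r\\n' :", written_bytes(text, "\r\n"))  # \n -> \r\n
```

Same string, different bytes on disk, depending on a flag and on which machine runs the code.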

Now, this is a fixable problem. One camp simply has to convert to the other style. I submit that Microsoft should switch Windows to \n. They don’t have to do it all at once: first they can make the line ending configurable, so that everyone can move to \n while \r\n stays supported for legacy software. Eventually, only seriously backwards-compatible code will need to care, and in a decade or two, multiple line endings can be simply an interesting tidbit of computer history rather than something thousands of people have to deal with every day!
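The mechanical part of converting really is easy; the hard part is getting every program to agree. Here is a minimal Python sketch of the conversion in both directions, the same job tools like dos2unix and unix2dos do. The names `to_unix` and `to_windows` are my own, and it assumes the whole file fits in memory.

```python
def to_unix(data: bytes) -> bytes:
    r"""Convert \r\n (DOS/Windows) and lone \r (classic Mac OS) to \n."""
    # Handle \r\n first so a pair doesn't turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_windows(data: bytes) -> bytes:
    r"""Convert any line ending to \r\n without doubling existing pairs."""
    # Normalize to \n first, then expand every \n to \r\n.
    return to_unix(data).replace(b"\n", b"\r\n")


legacy = b"line one\r\nline two\r\nline three\r\n"
converted = to_unix(legacy)
print(converted)
print(f"saved {len(legacy) - len(converted)} bytes")  # one byte per line
```

It also shows the “two bytes for one thing” cost directly: every line in the file gets one byte shorter.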

Why Microsoft? Why not make Unix switch? First of all, \r\n is twice as long as \n. Not a huge deal, but why use two bytes to store one thing when you can use one? Second, Windows is already halfway there. Most Windows programs not made by Microsoft that deal with this issue already support \n line endings. Only the Windows command prompt, Notepad (the worst editor ever), and some Windows configuration files really care anymore. If Microsoft just patched those, the problem could simply disappear. Finally, lots of Unix software relies on \n being the line ending (a shell script saved with \r\n endings won’t even run, because the stray \r becomes part of the interpreter path on its #! line). Far more Unix programs care, and they are used far more heavily. It would be a hard switch, and there is no single entity controlling Unix the way Microsoft controls Windows. Windows switching is possible. Unix switching will never happen completely.

Sure, there may be a few things, like HTTP requests, that need \r\n, but those are one-off situations that a few programmers can worry about. Microsoft has a real chance here to make a serious change for programmers everywhere.
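HTTP is a good example of a place where \r\n is baked into the protocol itself: in HTTP/1.1, the request line and every header line end with \r\n, and an empty line (one more \r\n) marks the end of the headers. Here is a minimal Python sketch of building such a request by hand; `build_http_get` is a hypothetical helper, and it only builds the bytes rather than sending them anywhere.

```python
def build_http_get(host: str, path: str = "/") -> bytes:
    """Build a bare-bones HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    # Every line ends in \r\n; an extra \r\n (empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_http_get("example.com")
print(request)
```

This is exactly the kind of one-off spot where a programmer writes \r\n on purpose, no matter which operating system the code runs on.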

Please, Microsoft. Let’s end the madness. Let’s use a single newline.