
A Modest Proposal

Quite a long time ago, I felt that there ought to be a standard modified version of ASCII for use in word processing applications:

replacing the characters not normally found on a typewriter by additional characters which are found there. One could do the same for a basic set of characters appropriate to a typesetting keyboard as well, shown in the third chart above, or for a set of characters for computing, as shown in the fourth chart. I think that I originally interchanged the substitutes for ~ and \, since when I first had the idea I wasn't thinking exclusively in terms of the now-ubiquitous keyboard layout derived from the 101-key Model M keyboard:

Similarly, if that keyboard layout is to be used without any change, the substitutes for ` and " would need to be interchanged for the typesetting character set, but that would assign the code for " to a character equivalent in meaning to `, which seems inappropriate.

Changing a few of the 94 basic printable characters of ASCII was originally not unusual, when ASCII was strictly a 7-bit code;

as this was done to accommodate the accented letters used by numerous languages. Later, those first 94 printable characters were kept constant in codes derived from ASCII that were 8 or 16 bits long.

Thinking in terms of Unicode, and its extension of ASCII to what was, at first, a 16-bit code (although it was only Unicode 1.0 that was strictly defined as a 16-bit code) to support other languages, it seemed to me that, because some foreign languages have more than 32 letters in their alphabets, it was a pity that what we think of as 8-bit ASCII (officially, ISO 8859-1) couldn't have been restructured significantly, rather than having its first 128 characters kept strictly compatible with ASCII:

As noted above, I felt that alternate ASCII character sets for computation and word processing would have been useful. This was still the case even when ASCII was expanded to eight bits from seven:

The multiplication and division symbols used in grade school textbooks weren't particularly important characters, and so the OE ligatures should have been left in those code positions for ISO 8859-1. But the emphasis on accented letters, while understandable to facilitate international use, meant the code was chiefly oriented to word processing.

It didn't include the symbols for less than or equal to, not equal to, and greater than or equal to, whose absence seemed to me to be the most important deficiency of ASCII when used for writing computer programs. And so, I put those characters in the computing character set, together with the characters I removed from ISO 8859-1 and the Greek alphabet.
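The point about which characters ISO 8859-1 does and does not contain can be checked with a short sketch. It assumes that Python's built-in "latin-1" codec is a faithful implementation of ISO 8859-1; the helper name latin1_code and the CHARS table are made up for this illustration.

```python
# Check which of the characters discussed above exist in ISO 8859-1.
# Assumption: Python's "latin-1" codec implements ISO 8859-1.

def latin1_code(ch):
    """Return the ISO 8859-1 code of ch as an int, or None if it is absent."""
    try:
        return ch.encode("latin-1")[0]
    except UnicodeEncodeError:
        return None

# Hypothetical table for this illustration: name -> character.
CHARS = {
    "multiplication sign": "\u00d7",
    "division sign": "\u00f7",
    "capital OE ligature": "\u0152",
    "small oe ligature": "\u0153",
    "less than or equal to": "\u2264",
    "not equal to": "\u2260",
    "greater than or equal to": "\u2265",
}

def report():
    """Build one line per character giving its code, or 'absent'."""
    lines = []
    for name, ch in CHARS.items():
        code = latin1_code(ch)
        status = "absent" if code is None else f"0x{code:02X}"
        lines.append(f"{name}: {status}")
    return lines

if __name__ == "__main__":
    print("\n".join(report()))
```

The multiplication and division signs come out at D7 and F7, in the middle of the accented letters, while the OE ligatures and the three comparison symbols are reported as absent.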

Finally, another anomaly of 8-bit ASCII is the code positions used for control characters.

Control characters certainly are important for a code used for communications between a computer and a computer terminal. But they really aren't very useful for a code used to store documents as files on a disk drive.

These days, computers usually don't have terminals connected to them - instead, the computer is the terminal, as this keeps hardware costs to a minimum.

Taking out the control characters, except for 00 being NUL and FF being DEL, allows including small capitals as a basic character case in the code, having the same status as upper and lower case - instead of being treated as a presentation form which requires a switch to a different font.

I'm willing to accept that boldface and italic are presentation forms, but the keyboard ought to include keys for switching to those as well, rather than requiring one to lift one's hands up and use the mouse to select text and switch to them.

However, doing away with control characters might be just a tad too radical.

Using the word processing form of ASCII as a basis, and keeping only the most important additional characters, one could arrive at the following 8-bit character set to serve as the starting point for a code:

However, while I tried to pick the most necessary of the added special characters, this is too tight a squeeze. The Euro symbol really ought to be added, if one is going to make changes, and several other highly useful characters had to be left out.

Note, though, that with the restructuring done to accommodate languages like Armenian, it is no longer as convenient to handle Chinese, Japanese, or Korean with a double-byte character set (DBCS). At least with this design it's not completely impossible. With ISO 8859-1 in its original form, the printable characters with the high bit set allowed 94 possible prefix characters as the first character of a two-character code representing one Chinese character. Using the high bit to indicate lower case spoils that. But with a space reserved for small capitals, at least those 32 codes could be used as DBCS prefixes. Given the full integration of the high bit, the second character could be any of the 222 normal printable characters, rather than being only one of 94 characters.

This doesn't mean that the situation for DBCS coding has improved, as 2 times 94 times 94 was always an option that had been easy to take, and the Big-5 coding had made use of that option.
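To put rough numbers on the comparison, here is a small sketch of the arithmetic. It uses only the figures quoted in the text (94, 32, and 222); the function and variable names are invented for the illustration, and it ignores any code positions a real DBCS would have to reserve.

```python
# Rough capacity of the double-byte schemes discussed above, counting
# only (prefix, second byte) combinations.

def dbcs_capacity(prefixes, second_bytes):
    """Number of two-byte codes given the count of usable first and second bytes."""
    return prefixes * second_bytes

# 94 high-bit prefixes, each followed by one of 94 characters.
iso_style = dbcs_capacity(94, 94)

# 32 small-capital codes as prefixes, each followed by any of 222 characters.
small_caps_prefix = dbcs_capacity(32, 222)

# The "2 times 94 times 94" option mentioned in the text.
doubled = 2 * dbcs_capacity(94, 94)

if __name__ == "__main__":
    print("94 x 94     =", iso_style)
    print("32 x 222    =", small_caps_prefix)
    print("2 x 94 x 94 =", doubled)
```

On these figures the small-capitals scheme offers 7104 codes against 8836 for the plain 94 by 94 arrangement and 17672 for the doubled one, which is why the situation cannot be said to have improved.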

Considering that many control characters aren't used, while some are, and that the two possible 8-bit codes for technical and word processing use would be derived from portions of a 16-bit code, which would have to be modified from Unicode to fit a significantly modified basis that replaces ISO 8859-1, this diagram indicates how enough room for new currency symbols might be obtained:

After taking some time to examine the history of the development of ASCII, an even stranger idea occurred to me.

First, I thought, when adding lowercase to ASCII, the old ESC and ALT MODE characters could have been left in their old places; except for the lower case letters, instead of adding printable characters, one - the backslash - could even be subtracted. Then there would be 89 printable characters, one of which would be the space, matching the size of a conventional typewriter keyboard, and removing the temptation to add keys in awkward places on computer keyboards.
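The count of 89 can be reproduced with a short sketch. It assumes that the uppercase-only code had 64 printable positions, hexadecimal 20 through 5F with the space included, and that a conventional typewriter has 44 keys with two characters each; the variable names are invented for the illustration.

```python
import string

# Assumption: the uppercase-only code has 64 printable positions,
# 0x20 through 0x5F, space included. Only the count matters here, not
# which graphic each position held in any particular edition of ASCII.
uppercase_only = {chr(code) for code in range(0x20, 0x60)}

# Add the lower case letters and nothing else, then drop the backslash.
truncated = (uppercase_only | set(string.ascii_lowercase)) - {"\\"}

# Everything except the space has to fit on two-character keys.
keys_needed = (len(truncated) - 1) // 2

if __name__ == "__main__":
    print(len(truncated), "printable characters")
    print(keys_needed, "two-character keys plus a space bar")
```

Under these assumptions the set has 89 members, and the 88 characters other than the space fit exactly on 44 keys.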

A truncated ASCII of this sort is depicted in the first section of the diagram above.

Then, I thought, if ASCII is to be modified to reflect the typewriter keyboard, perhaps some characters could be moved around so as to allow a bit-pairing keyboard to have its characters in the normal positions of a typewriter keyboard, rather than forcing people to wait until the technology advanced before they could enjoy typewriter-pairing keyboards.
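The difference between the two kinds of keyboard can be sketched for the digit row. On a bit-pairing keyboard, the shift key on a digit key simply flips one bit of the code (hexadecimal 10), so the shifted character is whatever ASCII happens to place 16 positions below the digit. The table of typewriter-style pairs below is an assumption: it uses the now-common US PC layout as a stand-in for typewriter pairing, and the function names are invented for the illustration.

```python
def bit_paired_shift(ch):
    """Shifted character for a digit key on a bit-pairing keyboard: flip bit 0x10."""
    return chr(ord(ch) ^ 0x10)

# Assumption: shifted digits on the common US PC layout, standing in
# for the typewriter arrangement.
TYPEWRITER_STYLE = dict(zip("1234567890", "!@#$%^&*()"))

def mismatched_digits():
    """Digits whose bit-paired shift differs from the typewriter-style shift."""
    return [d for d in "1234567890"
            if bit_paired_shift(d) != TYPEWRITER_STYLE[d]]

if __name__ == "__main__":
    for digit in "1234567890":
        print(digit, repr(bit_paired_shift(digit)), repr(TYPEWRITER_STYLE[digit]))
    print("differ on:", " ".join(mismatched_digits()))
```

With unmodified ASCII, only 1, 3, 4, and 5 agree between the two arrangements, which is the kind of mismatch that moving characters around within the code would remove.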

The second diagram shows a possible arrangement with this goal in mind. The third diagram shows an even more radical rearrangement; it has the one flaw, though, that because it is assumed the period and comma are never shifted, it interferes with providing an alternate APL keyboard layout.

The fourth diagram shows the rearrangement in the third diagram, with typewriter alternate characters. Note that some pairs of symbols are assigned in reverse compared to the typewriter arrangement shown at the top of the screen; this is because characters that occurred on the shifted keys on a typewriter were, in that earlier arrangement, put on the "first" key - the one on the left - when the two characters were split between two different keys.

Instead of a new version of, or a replacement for, ASCII, this code could be considered as a terminal connection code - used strictly for computer terminals, with translation to ASCII being done for intercommunication.

