[Journal of the Simplified Spelling Society, 1988/2, pp5-10 later designated J8]
[Patrick Hanks: See Journal article.]

Conventionality and Efficiency in Written English: the Hyphen.

Patrick Hanks.

Patrick Hanks is Chief Editor of Collins English Dictionaries and a Research Fellow in the University of Birmingham. He edited Collins English Dictionary (1979) and is managing editor of the Collins Cobuild English Language Dictionary (1987). The Cobuild Project is a research project in the University of Birmingham in which 18 million words of contemporary English have been analysed for a learners' dictionary. Current work includes preparation of an English grammar on the basis of this data and designing computational tools for linguistic analysis.

This article is based on a paper given at the Society's Fifth International Conference held in July 1987.

Abstract.

To start with, the notion of the convention in written forms is examined, and some examples are given of variations and inconsistencies that occur in traditional orthography. Conventions of spelling are contrasted with the relative freedom of punctuation in British English. The hyphen is taken as an example of inconsistency in written forms. It is argued that inconsistencies, in the form of competing conventions, lead to inefficiencies, and competing conventions in the use of the hyphen are an extreme example of this.

Use of the hyphen in English contrasts with spelling, in that the rules for its use are not clearly conventionalized. This in itself is a source of inefficiency. Evidence will be given of the major current inconsistencies in the use of hyphens and some resulting inefficiencies.

A few simple rules for standardizing the use of hyphens in English could be associated with proposals for simplifying spelling, leading to greater communicative efficiency.

Conventionality and Spelling.

The title of this paper is 'Conventionality and Efficiency', but it might well have been 'A Case Study in Orthographic Inefficiency', since it is the orthographic inefficiency of English punctuation, and in particular of the use of such marks as the hyphen, to which I wish to draw your attention.

I would like to start with a brief discussion of conventionality, using examples from English orthography by way of illustrative material. I shall then go on to contrast the state of conventionality in English orthography with the state of conventionality in English punctuation.

In British English in particular, we have a situation in which orthography is highly conventionalized. Whatever we may think of the queer old conventions of English spelling, one can at least say that there is a wide measure of agreement as to what they are. It is important to draw a distinction between a situation in which conventions exist, even though individual members of the community may not know them, and a situation in which conventions do not exist at all, or are so little known that they might as well not exist. I think the latter situation is more fertile soil for reform than the former.

To the extent that conventions exist at all, they can lay claim to a modicum of efficiency, however weak their foundations in logic or their connections with other modes, such as phonology, may be. Members of the Simplified Spelling Society will surely have considered the potential difficulties of a situation in which different segments of the writing and publishing community are aspiring to different conventions.

In modern English, published texts such as books and newspapers do not, as a general rule, cause their readers to spend time puzzling over a written form and wondering what word might be represented. Book and newspaper publishers employ copyeditors and proofreaders who have the specific duty, inter alia, of ensuring that the conventions are adhered to. In less formal situations, too, users of written English have ways of agreeing among themselves what the conventional spellings are, and of ironing out disagreements in such a way as to preserve the convention rather than allowing the continued co-existence of more than one orthographic form.

To take a more or less random example, there is widespread agreement that the conventional spelling of consensus is with the three <s>s and one <c>, rather than with two of each. However, many people outside the world of printed and published texts spell the word with two <c>s. In my experience, when users of the spelling concensus, with two <c>s, discover that their practice is at odds with that of other members of the English-writing world, and that the others are supported by weighty tomes such as dictionaries, they surrender. They do not continue to insist that their form is as good as or better than the other one (as well they might); instead they fall meekly into line, confessing the error of their ways. That is, they themselves will agree that the spelling they have used is erroneous. How is this decided? The appeal to the authority of a dictionary is usually taken as sufficient to clinch the matter, even though most respectable lexicographers devote quite a lot of energy to disclaiming any authoritarian status. Within the dictionary, the etymology is often consulted and is regarded as a source of evidence for correctness. The fact that consensus is derived from sentire 'to feel' is regarded as conclusive evidence in favour of the 3-<s> spelling. However, the appeal to etymology is not in fact sufficient evidence by itself of conventionality in matters of spelling. The <-ant> ending of the noun descendant, for example, is etymologically indefensible, although it is undeniably conventional. Latinists will be able to think of many other examples. I do not know whether, etymologically speaking, the word address should have one <d> or two <d>s, but English has two and French has one: surely they can't both rely on the appeal to etymology to support their different conventions? At moments like these anyone can sympathize with those who feel exasperation with the discrepancies in the conventions.

Here, of course, I am preaching to the converted. Spelling reformers have long been pointing out the discrepancies in English spelling conventions; my purpose in mentioning them is merely to draw your attention to the distinction between discrepancies in established conventions on the one hand and discrepancies in practice, where no strong conventions exist, on the other.

Are there any examples in English of genuine discrepancies of belief and practice as to what the spelling convention actually is?

There are several, of course. The most striking is the more or less free choice in British English between <-ize> and <-ise> spellings for verbs such as conventionalize. The form <-ise> is not in common use in American English, and there is a belief among some British users of <-ise> that <-ize> is American. This is in line with the general British belief that any unfamiliar bit of language must be American. In fact, of course, there is plenty of evidence that <-ize> is in conventional use in British English.

This particular case of competing conventions is irritating, time-wasting, and costly for publishers. In dictionaries, it can also be very costly in terms of precious space. In the case of the Cobuild dictionary, for example, the first drafts of the explanations were written freely in the researchers' own preferred spellings. Time and effort then had to be expended on normalizing all uses of <-ise> to the <-ize> that eventually came to be preferred. In the case of dictionaries, of course, there is a need to practice what one preaches: since one or other form must be entered first in the dictionary and be the main entry, the assumption arises among users that the form carrying the main entry is the preferred form for some principled reason.

I am not sure why it is considered undesirable to have free variation within the same book, but consistency in such matters seems to matter to many people, especially publishers and reviewers. Perhaps there is a fear that some distinction will be perceived where none is intended. Support for this hypothesis can be gleaned from the case of program(me), where, in British English, the <-mme> spelling has become specialized for programmes of music and broadcasting, while the single <-m> spelling has become specialized for computing uses. I leave you to ponder the confusion that has resulted in inflected forms of the verb: just how many <m>s do you use in program(m)ed and program(m)ing, and do you associate a distinction in meaning with a distinction in spelling here too? As far as I can see, the trend in British English, which is towards doubling the <m> in all cases, is matched by a trend in the opposite direction in American English. However, the evidence is by no means clear.

Regional differences can, of course, provide many examples of coexisting conventions in spelling: British colour vs. American color, and so on. But regional differences are not competing conventions in the sense under discussion here; they represent rather a signal as to which segment of the speech community a writer belongs to.

More interesting, for present purposes, are competing plurals. Consider first words such as index and appendix, cactus and corpus. What is the conventional plural of these words? As a committed user of the morphologically English plurals indexes and appendixes, I would like to believe that there really are competing conventions here. Unfortunately, the facts do not support this hope.

One of the advantages for a committed descriptivist of working with a large body of evidence such as the computerized Birmingham Corpus of English Texts is that one can actually interrogate the corpus and get answers that help in judging the state of conventionality. The corpus is constantly growing and being improved. The version that was used for research on the dictionary consisted of 18 million words of running English text, taken from a wide variety of sources.

This corpus, then, contains 9 cases of appendices, but only 2 of appendixes, both from the same writer. It seems that this writer and I are in a minority in our preferences: when it comes to deciding what is most conventional, there is no contest. The story is much the same with index. There are 29 cases of indices, from 15 or 16 different sources; there are only 5 cases of indexes, and these are from just 2 sources.
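Tallies of this kind are simple to reproduce in principle. The sketch below is a minimal illustration, not the software actually used on the Birmingham corpus: it assumes a hypothetical layout in which the corpus is a list of (source, text) pairs, and for each competing form it counts both the occurrences and the number of distinct sources using it, the two figures quoted above.

```python
import re
from collections import Counter, defaultdict


def tally_variants(corpus, forms):
    """Count each competing form and the number of distinct sources using it.

    corpus: iterable of (source_id, text) pairs. This layout is a
    hypothetical assumption, not the real format of the Birmingham corpus.
    Returns {form: (occurrences, distinct_sources)}.
    """
    wanted = {f.lower() for f in forms}
    hits = Counter()
    sources = defaultdict(set)
    for source_id, text in corpus:
        # Crude tokenization: runs of letters, case-folded.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in wanted:
                hits[token] += 1
                sources[token].add(source_id)
    return {f: (hits[f.lower()], len(sources[f.lower()])) for f in forms}
```

With an invented three-source toy corpus in which indices occurs twice in one source and once in another, and indexes once, tally_variants(corpus, ["indices", "indexes"]) returns {"indices": (3, 2), "indexes": (1, 1)}. Counting sources as well as tokens matters because, as with appendixes above, a handful of occurrences may all come from one writer.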

The corpus does not contain enough evidence to enable one to judge what the conventional plural of corpus is; there is only one example of corpora and there are none at all for corpuses. The evidence from general English texts is not sufficiently specialized to shed light on an abstruse problem with a rather technical word.

More interesting is the plural of cactus. Pace the Cobuild dictionary, there is no evidence at all in the corpus for cactuses; this was clearly put into the dictionary by an editor who shares my own prejudices. Perhaps it will be taken out in the next edition. The corpus contains 7 cases of cacti, which should clinch the matter. However, careful examination of the 36 lines for cactus itself reveals some that are indisputably plural: for example,
Some cactus only open their blossoms at night.
There are other lines where cactus seems to be being used as a mass noun: for example,
large growths of palm and cactus.
In still other cases, there is no way of telling whether the writer intended to use a mass noun or a plural noun, e.g.,
giant tortoises lumber through the cactus.
Thus there does appear to be some doubt as to what the conventional plural of cactus is, but it is not the doubt that we were hoping for. It is more a grammatical doubt than a choice between two morphologically established forms.

This brings me to my final orthographic example in the search for genuine uncertainty as to what the conventions of English spelling might be. It concerns the word diocese. For etymological and other reasons, the singular noun is conventionally spelled ending in <-ese>, although ignorant persons such as myself may believe (until shown evidence to the contrary) that it is spelled in <-is>, on the analogy of such words as thesis and basis. The Cobuild corpus shows 23 examples of the spelling diocese and none at all for diocis. OK, we were wrong, then. So far so good: the convention survives, unscathed by our ignorance.

But what about the plural?

The fact that diocese is a count noun, supported by the real-world observation that episcopal churches have more than one bishop and therefore presumably more than one diocese, leads us to expect realistically that there will be a plural.

The Cobuild corpus shows not a single example of any morphologically plural form - neither dioceses, which is presumably what the dictionaries predict, since they are silent on the issue, nor dioces, which users of the <-sis> spelling might expect by analogy with bases and theses.

Morphologically, there is no orthographic evidence in the Cobuild corpus for a separate plural form of this word. However, if we examine all the lines for the type diocese carefully, we find that two of them appear to be plural.

They are:
the diocese of Gibraltar and London...
and
We're much closer connected with diocese and Christians outside than we were.
This is supported by evidence from straw-polling and comparison of intuitions (a time-honoured lexicographical technique, first mentioned explicitly by Noah Webster in his 1828 preface, in which he comments that he "fortified his opinion with that of some gentlemen in whose opinion he had confidence").

I have confidence in the intuitions of my colleagues, at any rate as a way of supplementing corpus evidence, so I asked them (orally) what is the plural of diocese. Eight out of twelve members of the COBUILD team offered /daɪəsi:z/. They were quite uncertain about how this might be spelled, although all of them were quite sure about the conventional spelling of the singular. In particular, one colleague who was in this majority had what she describes as 'an ecclesiastical childhood' (she is a vicar's daughter); the word, in both singular and plural forms, is therefore in her active vocabulary. The other team members gave answers which may be summarized as ranging from 'don't know' to wrestling with the tongue-twisting dioceses in ways that raised the suspicion that they had never had occasion to use the word, let alone the plural.

I think, then, that the plural of diocese may be a case where the convention of written English is unclear. There are very few such, and I am arguing that this is probably a good thing. More competing conventions may introduce more inefficiency and wasteful expense.

Conventionality and Punctuation.

English punctuation, by contrast, is much less trammelled by conventionality. I do not know whether this is a good thing or a bad thing. In some ways, I think it is probably a bad thing.

To take a fairly obvious example, the distinction between restrictive and nonrestrictive relative clauses is regularly and unconsciously made in the intonation pattern of English. How useful and efficient it would be if the same distinction were made by the conventional use of commas in written English.

There is a vital distinction between, say,
To my daughter Judith I leave my collection of gold coins, which are in my bank vault.
and
To my daughter Judith I leave my collection of gold coins which are in my bank vault.
Suppose that at some time before his death the testator removed some but not all of the coins from the bank vault and left them in his son Peter's room. Presence or absence of the comma could make all the difference if the will were contested. Peter's claim to the gold coins would be much stronger if the will did not contain a comma after coins. The relative clause would be restrictive, and Judith would be entitled to only the gold coins which were in the bank vault and no others. The restrictive status of the relative clause allows or encourages the implication that the testator may have other collections of gold coins which are not in the bank vault. If the comma is present, however, the relative clause is nonrestrictive, and can be read merely as helpful guidance to the legatee as to where to find her bequest. Judith's case would be strengthened by presence of the comma.

Of course, no self-respecting lawyer would allow a client to write such a clause in a will, but it is the occurrence of such clauses in home-made wills that can result in lawsuits. No doubt this is one reason why the legal profession in Britain some years ago took to writing all its legal documents without any punctuation in them at all. This draconian solution could hardly be called helpful, and in fact of course even more ambiguities arise in totally unpunctuated text.

Examination of a large body of published texts supports the view that even professional copyeditors and proofreaders in Britain have a rather hazy view of punctuation, let alone lawyers and the general public. There are such widespread discrepancies in the use of punctuation such as the comma in English published usage that it would be hard if not impossible to describe in detail what the conventions are. Usage is highly idiosyncratic. The situation for literate texts in the U.S.A. seems to be different: American punctuation in published texts is recognizably more consistent and logical. This, then, may be an example of an area in which linguistic prescriptivism in Britain is desirable.

The best that can be said of British punctuation at present is that at least the rather random use of commas does not seem to be costing anyone very much in terms of money or wasted effort.

I shall be arguing that associated with any proposals for spelling reform and more efficient use of written English should be proposals for more efficient use of punctuation. I use for illustrative purposes the hyphen.

The stopped Hyphen.

Three uses of the hyphen may be distinguished: orthographic, grammatical, and end-of-line. Principles for each kind of use are discussed. Within the context of simplified spelling, the principle is proposed that the hyphen should not be used at all, except when there is some clear justification for its existence.

Orthographic hyphens are those sometimes seen in the middle of lexical items that could equally well be regarded as single words or as two independent words, eg sign-writer. We may compare current usage (as observed in the Cobuild corpus) with principles of efficiency and consistency. This entails an examination of the relationship between the two or more morphemes making up a 'word' such as farm-hand, farm-house, far-reaching, far-off, and so on. Orthographic criteria must also be considered, as in fire-engine and fire-eater, where the co-occurrence of the letter <e> inhibits coalescence. Also discussed under the heading of the orthographic hyphen are hyphens which represent some phonological point, for example those in co-operate and re-enter. It will readily be seen that omission of hyphens between consonants should not present a problem within Cut Spelling. They may indeed be among the few cases where a doubled consonant survives.

The grammatical hyphen, as in expressions such as an easy-to-master language, may well have a function in promoting efficiency of understanding in complex syntactic units. Compare a machine-tool minder with a machine tool-minder. Is the hyphen sufficient to indicate that in one case the referent is human and animate, while in the other it is inanimate?

End of line hyphenation.

End-of-line hyphenation is probably the source of more wasted effort than anything else in the typesetting industry. Printers' readers are very fond of objecting to compositors' break points. There are conflicting principles at work in current practice. For example, should we hyphenate etymologically (eg speedo-meter) or should we hyphenate phonologically (eg spee-dom-eter)? Does it matter? If not, why do master printers allow their readers to make so many expensive alterations in this area? But where should the line be drawn? Can we really accept a hyphen in a word such as mo-re? Is it any more objectionable than id-ol?

The question arises, could the hyphen be abolished completely? Would we actually be better off without it? To simplify the symbol inventory by removing one of the symbols would certainly be a step in the direction of greater efficiency from the point of view of text producers; would it lead to difficulties of comprehension, and therefore inefficiency, from the point of view of readers? If there are good reasons to keep the hyphen, what are they? What rules for conventional use of the hyphen can be proposed that would maximize efficiency and minimize waste?

Let us look in more detail at the end-of-line hyphen. Hyphens are used at the end of lines in printed texts in order to keep the right-hand margin straight (known as 'right justification'), without increasing the amount of inter-word spacing in any given line beyond acceptable limits. One clear way of avoiding the need for end-of-line hyphenation is to abandon right justification, accepting a ragged right margin. This is the solution, I see, adopted on the second page of your conference programme: on the page headed 'Background'. The main objection to an unjustified right margin is that it is quite wasteful of space.

BACKGROUND

It was long thought English spelling reform just
meant of writing words by their sound. But the
obstacles to this procedure are now clear: above all
the variations in pronunciation and the need to
ensure continuity of literacy. Instead of
phonographic representation, the principle
now proposed is efficiency, i.e. the convenience of all
categories of user. The task facing orthographers is
thus to determine what kind of spelling best meets
this criterion.
Space wasted by unjustified right margin: excerpt from the
Simplified Spelling Society's conference programme

For example, the first paragraph of the 'Background' section could well have been one line shorter if right justification, with end-of-line hyphen, had been used. Over the extent of a whole book, the difference can amount to several pages. In a book such as a dictionary, where space is at a premium, a ragged right margin is not normally an acceptable option. Double-column setting, which of course is standard in dictionaries, increases the need for end-of-line hyphenation; many more words get hyphenated in a narrow column than in a wide one. Space is also the main reason why double-column setting is standard in dictionaries: it allows the publisher to adopt a smaller typesize on a large page without losing readability, and it reduces the amount of space lost through short lines at the end of paragraphs. This is even more true of newspaper setting, where the use of several columns on a very large page greatly increases editorial flexibility.
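The claim that justification with end-of-line breaks saves lines can be illustrated on toy input. The following Python sketch fills lines greedily to a fixed width and, optionally, splits an overflowing word at the rightmost point that fits, leaving at least three letters on each side. The function name set_lines and the splitting rule are assumptions for illustration only; real typesetting systems use considerably more refined line-breaking and hyphenation methods.

```python
def set_lines(words, width, hyphenate=False, min_edge=3):
    """Fill lines greedily up to `width` characters.

    With hyphenate=True, a word that overflows is split at the rightmost
    point that fits, leaving at least `min_edge` letters on each side.
    The split ignores syllables entirely: a deliberately crude stand-in
    for a liberal break-point policy, not a real hyphenation routine.
    """
    lines, current = [], ""
    queue = list(words)
    while queue:
        word = queue.pop(0)
        gap = 1 if current else 0  # a space is needed only after an existing word
        if len(current) + gap + len(word) <= width:
            current += " " * gap + word
            continue
        if hyphenate:
            room = width - len(current) - gap - 1  # keep one column for the hyphen
            split = min(room, len(word) - min_edge)
            if split >= min_edge:
                lines.append(current + " " * gap + word[:split] + "-")
                current = ""
                queue.insert(0, word[split:])  # carry the remainder to the next line
                continue
        if current:
            lines.append(current)
        current = word  # start a new line with the whole word
    if current:
        lines.append(current)
    return lines
```

Setting the three words print compositors argue in a 12-character column takes three lines without hyphenation and two with it ("print compo-" / "sitors argue"). The break compo-sitors is exactly the kind a printer's reader might query, which is where the costs discussed below come in.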

It seems unlikely, therefore, that we could abolish end-of-line hyphenation completely. What principles can be recommended for those who are forced to use it?

Proofreaders both in printing houses and in publishing houses have traditionally always devoted a great deal of energy to trying to ensure that end-of-line hyphenation is 'correct'. It is worth noting just how costly this obsession can be. In order to move a single letter forwards or backwards from one line to the next or to the preceding, both lines have to be reset (with the possibility of further errors arising within them), the original lines have to be cut out of the text (with the possibility of accidentally damaging the lines above and below the cut), and the new lines have to be stripped in (with the possibility of poor alignment and, if the material being used is film, the possibility of a nasty thin black line being visible in the published text). Wise printers and wise publishers brief their readers to be very conservative before insisting on a change in the end-of-line hyphenation. It is, perhaps, hardly surprising that in at least one printing house a compositor and a printer's reader actually came to blows over the reader's persistent objections to the compositor's chosen break points in what was otherwise a very clean text!

End-of-line hyphenation has long been a steady source of wasteful expenditure in the typesetting industry, although with the growing use of computers in typesetting, some of them with quite sophisticated look-up tables for hyphenation points, the problem is no longer as widespread as it was.

If, as I am suggesting, there are circumstances in which end-of-line hyphenation is unavoidable, what suggestions should we make for conventionalizing the circumstances in which it is used? Perhaps the best starting principle, from the point of view of efficiency, would be that end-of-line hyphenation should be as liberal as possible. Printers and publishers should accept any break point unless there is a good reason not to. They should discourage their proofreaders from altering any end-of-line hyphenation point that comes out of the typesetter without very good reasons. The good sense of this is supported by the fact that there are at least two competing principled systems of end-of-line hyphenation in operation in British English: one which is phonologically based, adopted for example by Collins, and one which is etymologically based, promulgated by Oxford among others. The former would opt for spee-dom-eter, while the latter would prefer speed-o-meter. My suggestion is that any of six possible break points in this word should be regarded as acceptable: spee-d-o-m-e-t-er.

What constraints, then, should be placed on this liberal proposal?

We might wish to say that 'obvious' syllable boundaries should count as preferred break points. The question then arises, what counts as an 'obvious' syllable boundary? Keyw-ord and mainfr-ame are unacceptable to everyone, since the composition of the compound in each case is transparent. But should we accept disg-usted, di-stress, and distr-ess? The liberal proposal depends in part on acknowledging that syllable boundaries are unclear, but some seem clearer than others.

Another commonsensical suggestion might be that there should be no hyphenation within, say, 2 characters of either end of a word. Obviously, this means that no four-letter word would be hyphenated. I did once see an English book typeset in Czechoslovakia in which the word mo-re had been hyphenated after the <o>. This is absurd because the word is a monosyllable. But from a typographical point of view it would be equally pointless to hyphenate idol or idle; the space saved does not justify the effort involved. But then, what about the -ed of disgusted? In traditional typography, the only other acceptable break point is after dis-. However, under the more liberal policy being suggested here, disgus-ted, for example, would be acceptable.

A less controversial suggestion would be that there should be at least one full syllable both before and after the hyphen: this would rule out mo-re, but it would also rule out strai-ght and str-aight.

Without prejudice to what might be decided about syllable boundaries, it might be possible to identify certain clusters where it would clearly be undesirable to introduce a hyphen and line break. For example, presumably everyone would agree that it is undesirable by any standard to introduce a break in the middle of an orthographic cluster representing a single phoneme: <ph>, <sh>, and <th> are cases in point. An adaptation of the same rule would discourage hyphenation in the middle of a vowel digraph, ruling out stra-ight and proce-ed. In fact, straight is probably about the longest word which, under these proposals, would not be hyphenated at all at the end of a line.
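The constraints proposed in the last few paragraphs are mechanical enough to sketch in code. The Python below is a rough illustration under stated assumptions: the digraph lists are small hypothetical samples, 'a full syllable' on each side is approximated by 'contains a vowel letter', and the two-character rule is applied at both ends of the word (which is what makes four-letter words such as more and idol come out unbroken). It knows nothing about morpheme or compound boundaries.

```python
# Illustrative digraph lists: assumptions, not a complete inventory of English.
CONSONANT_DIGRAPHS = {"ph", "sh", "th"}          # single phonemes, as in the text
VOWEL_DIGRAPHS = {"ai", "ee", "ea", "oa", "oo"}  # hypothetical sample
VOWEL_LETTERS = set("aeiouy")


def break_points(word, min_edge=3):
    """Return positions i at which word may be broken as word[:i] + '-'.

    min_edge=3 means no break within 2 characters of either end.
    """
    w = word.lower()
    points = []
    for i in range(min_edge, len(w) - min_edge + 1):
        # Never split a digraph that spells a single sound.
        if w[i - 1:i + 1] in CONSONANT_DIGRAPHS | VOWEL_DIGRAPHS:
            continue
        # Rough proxy for "a syllable on each side": each part needs a vowel letter.
        if not (VOWEL_LETTERS & set(w[:i])) or not (VOWEL_LETTERS & set(w[i:])):
            continue
        points.append(i)
    return points


def hyphenations(word, min_edge=3):
    """List the acceptable end-of-line splits of word, e.g. 'dis-gusted'."""
    return [word[:i] + "-" + word[i:] for i in break_points(word, min_edge)]
```

With the default min_edge=3, straight, more and idol get no break points, and disgusted gets dis-gusted, disg-usted, disgu-sted and disgus-ted but not disgust-ed. Relaxing the edge rule with min_edge=1 leaves the six break points spee-d-o-m-e-t-er proposed earlier for speedometer. Because the sketch ignores compound structure, it would still accept keyw-ord, which the text rules out on grounds of transparency.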

There are many other modifications to the set of liberal guidelines being proposed here that should be considered. For example, it is often said that one should not break a line in such a way that a misleading first element of a word appears at the end of a line: after the <d> in read-just or after the <e> in arse-nic, for example. But how serious is this as a source of potential problems for a reader reading sequential text? The objection seems to be based on a notion that people read texts letter by letter and word by word. But do they? If they read in larger units - for example clause by clause, phrase by phrase, or tone unit by tone unit - the objection falls. In addition, the desirability of keeping things simple is worth bearing in mind: the more complex a set of rules is, the less likely it is to be implemented efficiently.

Enough has been said to illustrate the dimensions of the problem of the end-of-line hyphen.

The Orthographic Hyphen.

At a rough estimate, there are between 800 and 1000 words in the Cobuild dictionary for which, if we go back to the corpus, we can observe variation, for no very clear reasons of principle, in the written form. Some people write these lexemes as one word, some as two words, and some compromise with a hyphen. For example, there are 5 occurrences of sledge hammer written as two words, 7 where it is written solid (i.e. as one word), and 6 where it has a hyphen. In considering spelling and efficiency, this seems to be an area where some recommendations in the direction of standardization of usage might be appropriate. In most (though not all) cases, no meaning distinction is at stake. Where a meaning distinction is at stake, especially where what is in question is some grammatical point, which I shall discuss under the heading of 'the grammatical hyphen', the distinction is often obliterated by the random variations in the base form.

Let us again start with the proposal that the hyphen should not be used at all, in order to test whether it does in fact have any useful function.

It is possible to distinguish 3 main classes of words in which the possibility of a mid-word hyphen is at issue. These are: noun-noun compounds, nominal derivatives of phrasal verbs, and words containing prefixes. There are a number of less frequent classes around the edges, such as verbs from noun+verb compounds (e.g. gatecrash), and oddities such as offlicence and unputdownable. I shall concentrate on examples from the three main classes, starting with words containing a prefix.

As Tom McArthur has pointed out, the orthographic hyphen seems to be doing a very useful job in making a written distinction between two quite distinct words: reform and re-form. Another example is recreation and re-creation. This is analogous to the useful function of the apostrophe in distinguishing between well and we'll, as opposed to all the rather pointless uses objected to by George Bernard Shaw amongst others.

I am much less convinced by arguments in favour of the orthographic hyphen to make some phonological point, as in the case of microorganism, cooperate, antiimmigration, readjust and even nonnuclear. I would be glad to see this particular hyphen abolished in any spelling system. I wonder whether the hyphen in these words really does aid phonological recognition and realization? In testing this, it would of course be important to rule out the influence of familiarity of one form rather than the other. No doubt every spelling reformer would agree that it takes a short while to get used to a new system.

We must, however, recognize that the balance of usage is against us, at any rate in British English. Microorganism, for example, is spelled 21 times with a hyphen and only 11 times as one word in the Cobuild corpus. Well, at least this is evidence of competing conventions - a clear case for resolution by prescription, even if the balance of usage is siding (as usual in English, it seems) with the less efficient convention. We should also note in passing that this proposal, which would lead to abandonment of the hyphen in cooperative, would create an anomaly with its short form co-op, which would retain its hyphen under the reform/re-form rule mentioned above.
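The rule implied here (drop a prefix hyphen unless the solid form would collide with a different existing word, as reform does with re-form) can be sketched in a few lines. The set of clashing words has to be supplied by hand; the function name drop_prefix_hyphen and the sample clash set are hypothetical.

```python
def drop_prefix_hyphen(word, clashing_words):
    """Apply the reform/re-form rule to a hyphenated prefixed word.

    The hyphen is dropped unless the solid spelling is already a
    *different* word listed in clashing_words, a hypothetical,
    hand-supplied set of potential confusions.
    """
    if word.count("-") != 1:
        return word  # leave anything other than a single prefix hyphen alone
    solid = word.replace("-", "")
    return word if solid in clashing_words else solid
```

With a clash set of {"reform", "recreation", "coop"}, the sketch turns co-operate, micro-organism and non-nuclear into cooperate, microorganism and nonnuclear, but leaves re-form, re-creation and co-op alone, reproducing the anomaly just noted.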

Less defensible, in my opinion, is the widespread use of the hyphen in words such as coexist, reuse, antisemitic, panamerican: no real ambiguity or phonological difficulty is at stake. Nonnuclear falls into this class: it is generally hyphenated in current written English, although the doubled <n> presents no more difficulty than that in unnatural, which is apparently never written with a hyphen.

At the far end of this particular cline lie words such as subcategory, subhuman, antihero, antimatter and postwar. Here, the only justification for the widespread use of the hyphen is that people do not seem willing to give up the notion that the bound morpheme (sub-, anti-, or post-) has some independent status as a meaningful element. The cases of pseudo and quasi are interesting in this respect: in British English they fall into this class, although in American English, for some writers at least, they apparently count as independent words.

Noun compounds.

The chaotic state of British English as regards hyphenation of noun compounds may be judged from the following tiny random selection from a list of more than 500 words in the Cobuild corpus for which the choice of orthographic form seems to be more or less arbitrary. The figures are the number of occurrences of each form in the corpus.

WORD              SOLID   HYPHENATED   TWO WORDS
sledgehammer          7            6           5
stepping stones       2            7           3
saddlebag            17            7           3
test tube             5           17           4
treetops             12           13           6
videotape            14            9          13
windowbox             5            8           7
windowpane            7           15           4
passerby             12           71           5

In the context of spelling reform, total abolition of the orthographic hyphen for noun compounds might be desirable. Writers would simply have to choose between writing one or two words. The choice would depend on several factors, not least the writer's perception of whether the lexeme was functioning as a single unit at the word rank, or whether it could be satisfactorily accepted as a word + word group. So, for example, a writer might spell sledgehammer as one word since very few people think of sledge as a semantically independent unit modifying hammer, whereas gas fire might be more satisfactory as two words, since it falls neatly into the well-known English pattern of noun modifier + noun.

A similar view might be taken of nouns and adjectives derived from phrasal verbs (pickup, makeup, ripoff, getaway, takeoff and so on). It is important to distinguish these from the phrasal verb itself, which (if I may be permitted a momentary prescriptive outburst) SHOULD NEVER BE SPELLED WITH A HYPHEN. The noun and adjective derivatives could, in my view, always be written as one word.

The Grammatical Hyphen.

This process of noun derivation from phrasal verbs brings me to what I call the grammatical hyphen. This category overlaps to some extent with the category of orthographic hyphens just discussed.

Many writers, myself included, like to use hyphens to indicate a certain kind of rank shift, where a group of words has been assigned the grammatical function of a single word. Examples are:

a never-to-be-forgotten experience
end-of-line hyphen
an easy-to-read text vs. This text is easy to read.

The question is whether any genuine ambiguity or difficulty of understanding would result from omitting these hyphens. I think we would be hard put to it to show that it would, but I would be glad to have the views of others. Earlier, I invented a case where some genuine meaningful consequences might follow from placement of a hyphen in different positions in a phrase (machine-tool minder vs. machine tool-minder). I have to confess that in browsing through the hyphens in the Cobuild corpus I have not come across a single case of such a distinction in actual language use. It seems that, no doubt wisely, people rarely rely on punctuation to make such subtle points of meaning.

Some conventional uses of grammatical hyphens seem both hard to learn and singularly pointless: for example, the attributive/predicative distinction made in a well-intentioned gesture vs. the gesture was well intentioned.

In British English, as I have tried to show, I think we are suffering (or at any rate have suffered in the past) from creeping hyphen-mania. My recommendation is that most such hyphens should be avoided. I close with what is, to me, a particularly irksome example of what might be called a pseudo-hyphen, one that seems to be becoming increasingly widespread. It is the hyphen that joins a submodifier to a modifier, as in highly-strung or increasingly-widespread. Here again, I think we have a circumstance in which THE HYPHEN SHOULD NEVER BE USED.
