This chapter begins by considering how we should specify what candidates can be expected to do, and then goes on to make suggestions for setting appropriate test tasks.

Specifying what the candidate should be able to do


The testing of reading ability seems deceptively straightforward when it is compared to, say, the testing of oral ability. You take a passage, ask some questions about it, and there you are. But while it is true that you can very quickly construct a reading test, it may not be a very good test, and it may not measure what you want it to measure.

The basic problem is that the exercise of receptive skills does not necessarily, or usually, manifest itself directly in overt behaviour. When people write and speak, we see and hear; when they read and listen, there will often be nothing to observe. The challenge for the language tester is to set tasks which will not only cause the candidate to exercise reading (or listening) skills, but will also result in behaviour that will demonstrate the successful use of those skills. There are two parts to this problem. First, there is uncertainty about the skills which may be involved in reading and which, for various reasons, language testers are interested in measuring; many have been hypothesised but few have been unequivocally demonstrated to exist. Second, even if we believe in the existence of a particular skill, it is still difficult to know whether an item has succeeded in measuring it.

The proper response to this problem is not to resort to the simplistic approach to the testing of reading outlined in the first paragraph, while we wait for confirmation that the skills we think exist actually do. We believe these skills exist because we are readers ourselves and are aware of at least some of them. We know that, depending on our purpose in

11 Testing reading

of use, available at from Maynooth University, on 19 May 2018 at 00:20:51, subject to the Cambridge Core terms


reading and the kind of text we are dealing with, we may read in quite different ways. On one occasion we may read slowly and carefully, word by word, to follow, say, a philosophical argument. Another time we may flit from page to page, pausing only a few seconds on each, to get the gist of something. At yet another time we may look quickly down a column of text, searching for a particular piece of information. There is little doubt that accomplished readers are skilled in adapting the way they read according to purpose and text. This being so, I see no difficulty in including these different kinds of reading in the specifications of a test.

If we reflect on our reading, we become conscious of other skills we have. Few of us will know the meaning of every word we ever meet, yet we can often infer the meaning of a word from its context. Similarly, as we read, we are continually making inferences about people, things and events. If, for example, we read that someone has spent an evening in a pub and that he then staggers home, we may infer that he staggers because of what he has drunk (I realise that he could have been an innocent footballer who had been kicked on the ankle in a match and then gone to the pub to drink lemonade, but I didn’t say that all our inferences were correct).

It would not be helpful to continue giving examples of the reading skills we know we have. The point is that we do know they exist. The fact that not all of them have had their existence confirmed by research is not a reason to exclude them from our specifications, and thereby from our tests. The question is: Will it be useful to include them in our test? The answer might be thought to depend at least to some extent on the purpose of the test. If it is a diagnostic test which attempts to identify in detail the strengths and weaknesses in learners’ reading abilities, the answer is certainly yes. If it is an achievement test, where the development of these skills is an objective of the course, the answer must again be yes. If it is a placement test, where a rough and ready indication of reading ability is enough, or a proficiency test where an ‘overall’ measure of reading ability is sufficient, one might expect the answer to be no. But the answer ‘no’ invites a further question. If we are not going to test these skills, what are we going to test? Each of the questions that were referred to in the first paragraph must be testing something. If our items are going to test something, surely on grounds of validity, in a test of overall ability, we should try to test a sample of all the skills that are involved in reading and are relevant to our purpose. This is what I would recommend.

Of course the weasel words in the previous sentence are ‘relevant to our purpose’. For beginners, there may be an argument for including in a diagnostic test items which test the ability to distinguish between letters (e.g. between b and d). But normally this ability will be tested indirectly

Testing for language teachers


through higher level items. The same is true for grammar and vocabulary. They are both tested indirectly in every reading test, but the place for grammar and vocabulary items is, I would say, in grammar and vocabulary tests. For that reason I will not discuss them further in this chapter.

To be consistent with our general framework for specifications, we will refer to the skills that readers perform when reading a text as ‘operations’. In the boxes that follow are checklists (not meant to be exhaustive) which it is thought the reader of this book may find useful. Note the distinction, based on differences of purpose, between expeditious (quick and efficient) reading and slow and careful reading. There has been a tendency in the past for expeditious reading to be given less prominence in tests than it deserves. The backwash effect of this is that many students have not been trained to read quickly and efficiently. This is a considerable disadvantage when, for example, they study overseas and are expected to read extensively in very limited periods of time. Another example of harmful backwash!

Expeditious reading operations

Skimming
The candidate can:

● obtain main ideas and discourse topic quickly and efficiently;
● establish quickly the structure of a text;
● decide the relevance of a text (or part of a text) to their needs.

Search reading
The candidate can quickly find information on a predetermined topic.

Scanning
The candidate can quickly find:

● specific words or phrases;
● figures, percentages;
● specific items in an index;
● specific names in a bibliography or a set of references.

Note that any serious testing of expeditious reading will require candidates to respond to items without having time to read the full contents of a passage.


Careful reading operations

● identify pronominal reference;
● identify discourse markers;
● interpret complex sentences;
● interpret topic sentences;
● outline logical organisation of a text;
● outline the development of an argument;
● distinguish general statements from examples;
● identify explicitly stated main ideas;
● identify implicitly stated main ideas;
● recognise writer’s intention;
● recognise the attitudes and emotions of the writer;
● identify addressee or audience for a text;
● identify what kind of text is involved (e.g. editorial, diary, etc.);
● distinguish fact from opinion;
● distinguish hypothesis from fact;
● distinguish fact from rumour or hearsay.

Make inferences:
● infer the meaning of an unknown word from context.
● make propositional informational inferences, answering questions beginning with who, when, what.
● make propositional explanatory inferences concerned with motivation, cause, consequence and enablement, answering questions beginning with why, how.
● make pragmatic inferences.

The different kinds of inference described above deserve comment. Propositional inferences are those which do not depend on information from outside the text. For example, if John is Mary’s brother, we can infer that Mary is John’s sister (if it is also clear from the text that Mary is female). Another example: If we read the following, we can infer that Harry was working at her studies, not at the fish and chip shop. Harry worked as hard as she had ever done in her life. When the exam results came out, nobody was surprised that she came top of the class.

Pragmatic inferences are those where we have to combine information from the text with knowledge from outside the text. We may read, for example: It took them twenty minutes by road to get from Reading to Heathrow airport. In order to infer that they travelled very quickly, we have to know that Reading and Heathrow airport are not close by each other. The fact that many readers will not know this allows us to make the point that where the ability to make pragmatic inferences is to be tested, the knowledge that is needed from outside the text must be knowledge which all the candidates can be assumed to have¹.


Texts that candidates are expected to be able to deal with can be specified along a number of parameters: type, form, graphic features, topic, style, intended readership, length, readability or difficulty, range of vocabulary and grammatical structure.

Text types include: text books, handouts, articles (in newspapers, journals or magazines), poems/verse, encyclopaedia entries, dictionary entries, leaflets, letters, forms, diary, maps or plans, advertisements, postcards, timetables, novels (extracts) and short stories, reviews, manuals, computer Help systems, notices and signs.

Text forms include: description, exposition, argumentation, instruction, narration. (These can be broken down further if it is thought appropriate: e.g. expository texts could include outlines, summaries, etc.)

Graphic features include: tables, charts, diagrams, cartoons, illustrations.

Topics may be listed or defined in a general way (such as non-technical, non-specialist) or in relation to a set of candidates whose background is known (such as familiar to the students).

Style may be specified in terms of formality.

Intended readership can be quite specific (e.g. native speaking science undergraduate students) or more general (e.g. young native speakers).

Length is usually expressed in number of words. The specified length will normally vary according to the level of the candidates and whether one is testing expeditious or careful reading (although a single long text could be used for both).

Readability is an objective, but not necessarily very valid, measure of the difficulty of a text. Where this is not used, intuition may be relied on.

Range of vocabulary may be indicated by a complete list of words (as for the Cambridge tests for young learners), by reference either to a word list or to indications of frequency in a learners’ dictionary. Range may be expressed more generally (e.g. non-technical, except where explained in the text).

Range of grammar may be a list of structures, or a reference to those to be found in a course book or (possibly parts of) a grammar of the language.

The reason for specifying texts in such detail is that we want the texts included in a test to be representative of the texts candidates should be able to read successfully. This is partly a matter of content validity but also relates to backwash. The appearance in the test of only a limited range of texts will encourage the reading of a narrow range by potential candidates.

It is worth mentioning authenticity at this point. Whether or not authentic texts (intended for native speakers) are to be used will depend at least in part on what the items based on them are intended to measure.


Reading speed may be expressed in words per minute. Different speeds will be expected for careful and expeditious reading. In the case of the latter, the candidate is, of course, not expected to read all of the words. The expected speed of reading will combine with the number and difficulty of items to determine the amount of time needed for the test, or part of it.
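The arithmetic implied here can be sketched as follows. This is only a rough illustration: it assumes that testing time is simply reading time plus a fixed response allowance per item, and the function name, the per-item allowance and all the figures are hypothetical rather than taken from any actual test.

```python
def estimated_minutes(text_words: int, words_per_minute: float,
                      num_items: int, minutes_per_item: float) -> float:
    """Rough time allowance for one part of a reading test.

    Assumption (not from the text): time needed is reading time at the
    expected speed plus a fixed allowance for responding to each item.
    Harder items would simply be given a larger allowance.
    """
    reading_time = text_words / words_per_minute
    response_time = num_items * minutes_per_item
    return reading_time + response_time


# Hypothetical expeditious task: a 2,000-word text covered at a nominal
# 500 words per minute, with 5 items at 1 minute each.
expeditious = estimated_minutes(2000, 500, 5, 1.0)

# Hypothetical careful task: a 300-word text at 100 words per minute,
# with 6 items at 1.5 minutes each.
careful = estimated_minutes(300, 100, 6, 1.5)

print(expeditious, careful)
```

In practice the allowance per item would have to be settled by trialling, since difficulty varies from item to item.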

Criterial level of performance

In norm-referenced testing our interest is in seeing how candidates perform by comparison with each other. There is no need to specify criterial levels of performance before tests are constructed, or even before they are administered. This book, however, encourages a broadly criterion-referenced approach to language testing. In the case of the testing of writing, as we saw in the previous chapter, it is possible to describe levels of writing ability that candidates have to attain. While this would not satisfy everyone’s definition of criterion-referencing, it is very much in the spirit of that form of testing, and would promise to bring the benefits claimed for criterion-referenced testing.

Setting criterial levels for receptive skills is more problematical. Traditional passmarks expressed in percentages (40 per cent? 50 per cent? 60 per cent?) are hardly helpful, since there seems no way of providing a direct interpretation of such a score. To my mind, the best way to proceed is to use the test tasks themselves to define the level. All of the items (and so the tasks that they require the candidate to perform) should be within the capabilities of anyone to whom we are prepared to give a pass. In other words, in order to pass, a candidate should be expected, in principle, to score 100 per cent. But since we know that human performance is not so reliable, we can set the actual cutting point rather lower, say at the 80 per cent level. In order to distinguish between candidates of different levels of ability, more than one test may be required (see page 55).
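To make the cutting point concrete, here is a minimal sketch of the pass decision. The function names are hypothetical, and the 80 per cent default simply follows the figure suggested above; integer arithmetic is used so that the cutting score is rounded up without floating-point surprises.

```python
def cutting_score(num_items: int, cut_percent: int = 80) -> int:
    """Smallest raw score that reaches the cutting point (rounded up)."""
    return (num_items * cut_percent + 99) // 100


def passes(raw_score: int, num_items: int, cut_percent: int = 80) -> bool:
    """True if the candidate's raw score is at or above the cutting point."""
    return raw_score >= cutting_score(num_items, cut_percent)


# On a 25-item test the cutting score at 80 per cent is 20 items correct.
print(cutting_score(25))   # 20
print(passes(20, 25))      # True
print(passes(19, 25))      # False
```

The point of the approach is that the score is interpreted against tasks every passing candidate should manage, so the cutting point is an allowance for unreliability rather than a traditional passmark.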


As part of the development (and validation) of a reading test, one might wish to compare performance on the test with the rating of candidates’ reading ability using scales like those of ACTFL or the ILR. This would be most appropriate where performance in the productive skills is being assessed according to those scales and some equivalence between tests of the different skills is being sought.

Setting the tasks

Selecting texts

Successful choice of texts depends ultimately on experience, judgement, and a certain amount of common sense. Clearly these are not qualities that a handbook can provide; practice is necessary. It is nevertheless possible to offer useful advice. While the points may seem rather obvious, they are often overlooked.

1. Keep specifications constantly in mind and try to select as representative a sample as possible. Do not repeatedly select texts of a particular kind simply because they are readily available.

2. Choose texts of appropriate length. Expeditious reading tests may call for passages of up to 2,000 words or more. Detailed reading can be tested using passages of just a few sentences.

3. In order to obtain both content validity and acceptable reliability, include as many passages as possible in a test, thereby giving candidates a good number of fresh starts. Considerations of practicality will inevitably impose constraints on this, especially where scanning or skimming is to be tested.

4. In order to test search reading, look for passages which contain plenty of discrete pieces of information.

5. For scanning, find texts which have the specified elements that have to be scanned for.

6. To test the ability to quickly establish the structure of a text, make sure that the text has a clearly recognizable structure (it’s surprising how many texts lack this quality).

7. Choose texts that will interest candidates but which will not over-excite or disturb them. A text about cancer, for example, is almost certainly going to be distressing to some candidates.

8. Avoid texts made up of information that may be part of candidates’ general knowledge. It may be difficult not to write items to which correct responses are available to some candidates without reading the passage. On a reading test I encountered once, I was able to answer 8 out of 11 items without reading the text on which they were based. The topic of the text was rust in cars, an area in which I had had extensive experience.

9. Assuming that it is only reading ability that is being tested, do not choose texts that are too culturally laden.

10. Do not use texts that students have already read (or even close approximations to them). This happens surprisingly often.

Writing items

The aim must be to write items that will measure the ability in which we are interested, that will elicit reliable behaviour from candidates, and that will permit highly reliable scoring. Since the act of reading does not in itself demonstrate its successful performance, we need to set tasks that will involve candidates in providing evidence of successful reading.

Possible techniques

It is important that the techniques used should interfere as little as possible with the reading itself, and that they should not add a significantly difficult task on top of reading. This is one reason for being wary of requiring candidates to write answers, particularly in the language of the text. They may read perfectly well but difficulties in writing may prevent them demonstrating this. Possible solutions to this problem include:

Multiple choice
The candidate provides evidence of successful reading by making a mark against one out of a number of alternatives. The superficial attraction of this technique is outweighed in institutional testing by the various problems enumerated in Chapter 8. This is true whether the alternative responses are written or take the form of illustrations, as in the following:

Choose the picture (A, B, C, or D) that the following sentence describes: The man with a dog was attacked in the street by a woman.


It has already been pointed out that True/False items, which are to be found in many tests, are simply a variety of multiple choice, with only one distractor and a 50 per cent probability of choosing the correct response by chance! Having a ‘not applicable’ or ‘we don’t know’ category adds a second ‘distractor’ and reduces the likelihood of guessing correctly to 33⅓ per cent.
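The figures quoted above follow from simple arithmetic, sketched below. The sketch assumes blind guessing, with every alternative equally likely to be chosen; the function names are illustrative only.

```python
from fractions import Fraction


def chance_of_correct_guess(num_options: int) -> Fraction:
    """Probability of hitting the correct response by blind guessing,
    assuming each of the alternatives is equally likely to be picked."""
    if num_options < 2:
        raise ValueError("an item needs at least two alternatives")
    return Fraction(1, num_options)


def as_percent(probability: Fraction) -> float:
    """Express a probability as a percentage."""
    return float(probability * 100)


true_false = chance_of_correct_guess(2)        # True / False
with_third_option = chance_of_correct_guess(3)  # True / False / we don't know

print(as_percent(true_false))                    # 50.0
print(round(as_percent(with_third_option), 2))   # 33.33
```

A four-option multiple choice item would, on the same assumption, give a 25 per cent chance of a correct guess.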

Short answer
The best short answer questions are those with a unique correct response, for example:

In which city do the people described in the ‘Urban Villagers’ live?

to which there is only one possible correct response, e.g. Bombay.

The response may be a single word or something slightly longer (e.g. China and Japan; American women).

The short answer technique works well for testing the ability to identify referents. An example (based on the newspaper article about smoking, on page 150) is:

What does the word ‘it’ (line 26) refer to? ________________


Care has to be taken that the precise referent is to be found in the text. It may be necessary on occasion to change the text slightly for this condition to be met.

The technique also works well for testing the ability to predict the meaning of unknown words from context. An example (also based on the smoking article) is:

Find a single word in the passage (between lines 1 and 26) which has the same meaning as ‘making of laws’. (The word in the passage may have an ending like -s, -tion, -ing, -ed, etc.)

The short answer technique can be used to test the ability to make various distinctions, such as that between fact and opinion. For example:

Basing your answers on the text, mark each of the following sentences as FACT or OPINION by writing F or O in the correct space on your answer sheet. You must get all three correct to obtain credit.

1. Farm owners are deliberately neglecting their land.
2. The majority of young men who move to the cities are successful.
3. There are already enough farms under government


Because of the requirement that all three responses are correct, guessing has a limited effect in such items.
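Why guessing has a limited effect can be shown with a short calculation. The sketch assumes that a candidate guesses blindly and independently on each statement, choosing F or O with equal likelihood; the function name is illustrative only.

```python
from fractions import Fraction


def chance_all_correct(num_statements: int, options_per_statement: int = 2) -> Fraction:
    """Probability of earning credit by blind guessing when every
    statement must be marked correctly (independent, equally likely guesses)."""
    return Fraction(1, options_per_statement) ** num_statements


single_statement = chance_all_correct(1)   # one F/O decision on its own
all_three = chance_all_correct(3)          # the three-statement item above

print(single_statement, float(single_statement * 100))  # 1/2 50.0
print(all_three, float(all_three * 100))                # 1/8 12.5
```

On these assumptions the chance of gaining credit by guessing falls from 50 per cent for a single statement to 12.5 per cent for the set of three.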

Scanning can be tested with the short answer technique:

Which town listed in Table 4 has the largest population? ______

According to the index, on which page will you learn about Nabokov’s interest in butterflies? ___

The short answer technique can also be used to write items related to the structure of a text. For example:

There are five sections in the paper. In which section do the writers deal with:

(a) choice of language in relation to national unity [Section …..]
(b) the effects of a colonial language on local culture [Section …..]
(c) the choice of a colonial language by people in their fight for liberation [Section …..]
(d) practical difficulties in using local languages for education [Section …..]
(e) the relationship between power and language [Section …..]


Again, guessing is possible here, but the probabilities are lower than with straightforward multiple choice.

A similar example (with the text) is²:

In what order does the writer do the following in her article reproduced below? To answer this, put the number 1 in the answer column next to the one that appears first, and so on. If an idea does not appear in the article, write N/A (not applicable) in the answer column.

a) She gives some of the history of migraine.
b) She recommends specific drugs.
c) She recommends a herbal cure.
d) She describes migraine attacks.
e) She gives general advice to migraine sufferers.


SUE LIMB begins an occasional series by sufferers from particular ailments

Migraine first visited me when I was 20, and for years afterwards it hung about my life like a blackmailer, only kept at bay by constant sacrifices on my part. Its tyranny was considerable. Many innocent everyday experiences would trigger an attack: stuffy rooms, fluorescent light, minute amounts of alcohol, staying up late, lying in at the weekend, having to wait for meals, loud noises, smoke-filled rooms, the sun, and watching TV for more than two hours.

Work, social life and holidays were all equally disrupted. Naturally, all these prohibitions made me very tense and angry, but anger and tension were dangerous luxuries to a woman with my volatile chemistry.

At its worst, migraine was incapacitating me three times a week, for hours on end. I was losing more than half my life. I had to change my life-style radically — giving up my job and becoming self-employed — before the headaches would retreat. Nowadays, I can sometimes go for 3 or 4 months without an attack, as long as I keep my immediate environment as cool, dark and peaceful as possible. Sometimes I think I should live in a cave, or lurk under a stone like a toad.

Migraine is rather like a possessive parent or lover who cannot bear to see its victim enjoying ordinary life. Indeed, my loved ones have sometimes in their turn felt jealous at the way in which migraine sweeps me off my feet and away from all company, keeping me in a darkened room where it feasts off me for days on end.



Migraine sufferers often feel a deep sense of guilt, for migraine is a bore as well as a tyrant and kidnapper. It destroys social plans and devastates work-schedules. Despite its destructive power, however, the ignorant still dismiss it as the product of a fevered (and probably female) imagination: a bit like the vapours. But if you’ve ever felt it, or seen someone live through it, you know: migraine is the hardest, blackest and most terrifying of everyday pains.

Eyes shrink to the size of currants, the face turns deathly pale, the tongue feels like an old gardening glove, the entire body seems to age about 70 years, so only a palsied shuffle to the bathroom is possible. Daylight is agonising, a thirst rages, and the vomiting comes almost as a relief, since in the paroxysm of nausea the pain recedes for a few blissful seconds. Above all, the constant feeling of a dagger striking through the eyeball and twisting into the brain can make the sufferer long for death. When at last (sometimes three days later) the pain begins to ebb, and one can slowly creep back into life, it’s like being reborn.

Migraine is the focus of many myths. It is emphatically not a recent ailment, or a response to the stresses of modern life. It has been with us always. Its very name derives from the ancient Greek for half the skull — migraine is always a one-sided headache. The Egyptians had a god for it: no doubt he was more often cursed than hymned. Some suggest that migraine sufferers are intellectual types, or particularly conscientious personalities. There is little basis for any of this. Migraine affects 7 to 18 per cent of the population, impartially; the eggheads and the emptyheaded alike.

Anxiety, of course, can cause migraine. And fear of an attack can itself be a cause of massive anxiety. Caught in this Catch 22 situation, some sufferers no longer dare make any plans, so reluctant are they to let down their family or friends yet again. This incapacitating fear (Mellontophobia) shows the far-reaching damage migraine is doing to the lives of six million adults in Great Britain alone.

The best thing these sufferers can do is to join the British Migraine Association without delay. This excellent, lively and informal organisation produces leaflets and a newsletter, and organises fund-raising activities to sponsor research. It keeps its members informed about the latest sophisticated drugs available, and also (most importantly) swaps members’ hints about herbal treatment and self-help techniques.

There are several drugs available on prescription for the control of migraine, but perhaps the most exciting recent development in research involves a modest hedgerow plant, native to the British Isles and used for centuries by wise women for a variety of ailments. It is feverfew (Chrysanthemum Parthenium).

In 1979, Dr E. Stewart Johnson, Research Director of the City of London Migraine Clinic, saw three patients who had been using feverfew as a preventative, and soon afterwards he became involved in its clinical trials. Dr Johnson’s work is still progressing, but early results hint at spectacular success. 70 per cent of subjects claim their attacks are less frequent and not so severe: 33 per cent seem completely migraine-free. A few experience unpleasant side-effects (mostly mouth-ulcers: feverfew is a very bitter herb), and it is not recommended for pregnant women. But for the rest of us, three vile-tasting feverfew leaves a day have become indispensable.

Ten years ago I was taking Librium to reduce stress and Ergotamine to treat the actual migraine pain. They were powerful drugs, which left me feeling doped and poisoned, and they didn’t always cure the headache, either. Nowadays, I eat my three leaves, feel good, and probably never get the headache in the first place.

Acupuncture has also helped, partly by improving my general sense of well-being, but during a migraine the pain can be immediately dulled and eventually dispersed by needles placed on special points in the feet or temples. Finger pressure on these points can help too, in the absence of an acupuncturist. Locally applied heat (a hot water bottle or acupuncturist’s moxa stick—a bit like a cigar—is very soothing).

‘Tyrant, blackmailer, kidnapper, bore’

But above all the best thing I’ve done about my migraine is learn to relax all the muscles surrounding the eye. The natural response to severe pain is to tense up the muscles, making the pain worse. Deliberately relaxing these muscles instead is a demanding discipline and requires undisturbed concentration, but the effect is dramatic. Immediately the pain becomes less acute.

Migraine is a formidable adversary: tyrant, blackmailer, kidnapper, bore; but after many years’ struggle I really feel I’ve got it on the run. And though I’m a great admirer of the best of Western orthodox medicine, it’s pleasing that my migraines have finally started to slink away when faced not with a futuristic superpill, but with the gentle healing practices of the East and the Past.

The British Migraine Association, 178a High Road, Byfleet, Weybridge, Surrey KT14 7ED. Tel: Byfleet 52468.
The City of London Migraine Clinic, 22 Charterhouse Square, London, EC1M 6DX will treat sufferers caught away from home with a severe attack.

It should be noted that the scoring of ‘sequencing’ items of this kind can be problematical. If a candidate puts one element of the text out of sequence, it may cause others to be displaced and require complex decision making on the part of the scorers.

One should be wary of writing short answer items where correct responses are not limited to a unique answer. Thus:

According to the author, what does the increase in divorce rates show about people’s expectations of marriage and marriage partners?

might call for an answer like:

(They/Expectations) are greater (than in the past).

The danger is of course that a student who has the answer in his or her head after reading the relevant part of the passage may not be able to express it well (equally, the scorer may not be able to tell from the response that the student has arrived at the correct answer).

Gap filling
This technique is particularly useful in testing reading. It can be used any time that the required response is so complex that it may cause writing (and scoring) problems. If one wanted to know whether the candidate had grasped the main idea(s) of the following paragraph, for instance, the item might be:

Complete the following, which is based on the paragraph below.

‘Many universities in Europe used to insist that their students speak and write only _____________ . Now many of them accept _____________ as an alternative, but not a ____________ of the two.’


Until recently, many European universities and colleges not only taught EngEng but actually required it from their students; i.e. other varieties of standard English were not allowed. This was the result of a conscious decision, often, that some norm needed to be established and that confusion would arise if teachers offered conflicting models. Lately, however, many universities have come to relax this requirement, recognising that their students are as likely (if not more likely) to encounter NAmEng as EngEng, especially since some European students study for a time in North America. Many universities therefore now permit students to speak and write either EngEng or NAmEng, so long as they are consistent. (Trudgill and Hannah 2002:2)

A possible weakness in this particular item is that the candidate has to provide one word (mixture or combination) which is not in the passage. In practice, however, it worked well.

Gap filling can be used to test the ability to recognise detail presented to support a main idea:

To support his claim that the Mafia is taking over Russia, the author points out that the sale of __________ __________ in Moscow has increased by ______ per cent over the last two years.

Gap filling can also be used for scanning items:

According to Figure 1, _____ per cent of faculty members agree with the new rules.

Gap filling is also the basis for what has been called ‘summary cloze’. In this technique, a reading passage is summarised by the tester, and then gaps are left in the summary for completion by the candidate. This is really an extension of the gap filling technique and shares its qualities. It permits the setting of several reliable but relevant items on a relatively short passage. Here is an example:



Information transfer

One way of minimising demands on candidates’ writing ability is to require them to show successful completion of a reading task by supplying simple information in a table, following a route on a map, labelling a picture, and so on.



Relatively few techniques have been presented in this section. This is because, in my view, few basic techniques are needed, and non-professional testers will benefit from concentrating on developing their skills within a limited range, always allowing for the possibility of modifying these techniques for particular purposes and in particular circumstances. Many professional testers appear to have got by with just one – multiple choice! The more usual varieties of cloze and the C-Test technique (see Chapter 14) have been omitted because, while they obviously involve reading to quite a high degree, it is not clear that reading ability is all that they measure. This makes it all the harder to interpret scores on such tests in terms of criterial levels of performance.

Which language for items and responses?

The wording of reading test items is not meant to cause candidates any difficulties of comprehension. It should always be well within their capabilities, and less demanding than the text itself. In the same way, responses should make minimal demands on writing ability. Where candidates share a single native language, this can be used both for items and for responses. There is a danger, however, that items may provide some candidates with more information about the content of the text than they would have obtained from items in the foreign language.

Procedures for writing items

The starting point for writing items is a careful reading of the text, having the specified operations in mind. One should be asking oneself what a competent reader should derive from the text. Where relevant, a note should be taken of main points, interesting pieces of information, stages of argument, examples, and so on. The next step is to decide what tasks it is reasonable to expect candidates to be able to perform in relation to these. It is only then that draft items should be written. Paragraph numbers and line numbers should be added to the text if items need to make reference to these. The text and items should be presented to colleagues for moderation. Items and even the text may need modification. A moderation checklist follows:



Practical advice on item writing

1. In a scanning test, present items in the order in which the answers can be found in the text. Not to do this introduces too much random variation and so lowers the test’s reliability.

2. Do not write items for which the correct response can be found without understanding the text (unless that is an ability that you are testing!). Such items usually involve simply matching a string of words in the question with the same string in the text. Thus (around line 45 in the smoking passage, on page 150):

What body said that concern over passive smoking had arisen in part through better insulation and draught proofing?

Better might be:

What body has claimed that worries about passive smoking are partly due to improvements in buildings?


1. Is the English of text and item grammatically correct?

2. Is the English natural and acceptable?

3. Is the item in accordance with specified parameters?

4. Is specified reading sub-skill necessary in order to respond correctly?

5. (a) Multiple choice: Is there just one correct response?
(b) Gap filling and summary cloze: Are there just one or two correct responses for each gap?
(c) Short answer: Is answer within productive abilities? Can it be scored validly and reliably?
(d) Unique answer: Is there just one clear answer?

6. Multiple choice: Are all the distractors likely to distract?

7. Is the item economical?

8. Is the key complete and correct?



Items that demand simple arithmetic can be useful here. We may learn in one sentence that before 1985 there had only been three hospital operations of a particular kind; in another sentence, that there have been 45 since. An item can ask how many such operations there have been to date, according to the article.

3. Do not include items that some candidates are likely to be able to answer from general knowledge without reading the text. For example:

Inhaling smoke from other people’s cigarettes can cause ………………

It is not necessary, however, to choose such esoteric topics as characterised the Joint Matriculation Board’s Test in English (Overseas). These included coracles, the Ruen, and the people of Willington.

4. Make the items independent of each other; do not make a correct response on one item depend on another item being responded to correctly.

In the following example, taken from a test handbook, the candidate who does not respond correctly to the first item is unlikely to be able to respond to the following two parts (the second of which uses the YES/NO technique). For such a candidate, (b) and (c) might as well not be there.

(a) Which soup is made for slimmers?
(b) Name one thing which surprised the author about this soup.
(c) Did the writer like the taste?

However, complete independence is just about impossible in items that are related to the structure of a text (for example, in the Migraine passage above).

5. Be prepared to make minor changes to the text to improve an item. If you do this and are not a native speaker, ask a native speaker to look at the changed text.
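
The kind of item warned against in point 2 can often be spotted mechanically before moderation. What follows is a minimal sketch in Python, not a procedure given in this chapter: it finds the longest run of consecutive words that an item shares with the text. The function names, the four-word limit and the sample sentence (an invented stand-in, since the smoking passage is not reproduced here) are all assumptions made for illustration.

```python
import re

def words(text):
    """Lower-case the text and split it into plain word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def longest_shared_run(item, passage):
    """Return the longest run of consecutive words found in both item and passage."""
    item_words, passage_words = words(item), words(passage)
    best = []
    for i in range(len(item_words)):
        for j in range(len(passage_words)):
            k = 0
            while (i + k < len(item_words) and j + k < len(passage_words)
                   and item_words[i + k] == passage_words[j + k]):
                k += 1
            if k > len(best):
                best = item_words[i:i + k]
    return best

def needs_rewording(item, passage, limit=4):
    """Flag an item that repeats `limit` or more consecutive words of the text.

    The limit of four words is an arbitrary assumption, not a recommended value.
    """
    return len(longest_shared_run(item, passage)) >= limit

# Invented stand-in for the relevant sentence of the smoking passage.
passage = ("A committee said that concern over passive smoking had arisen "
           "in part through better insulation and draught proofing.")

weak_item = ("What body said that concern over passive smoking had arisen "
             "in part through better insulation and draught proofing?")
better_item = ("What body has claimed that worries about passive smoking "
               "are partly due to improvements in buildings?")

print(needs_rewording(weak_item, passage))    # long shared string: flagged
print(needs_rewording(better_item, passage))  # only 'passive smoking' shared
```

A check of this kind can only draw attention to candidate problems. Whether an item really can be answered without understanding the text remains a matter for the moderators.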

A note on scoring

General advice on obtaining reliable scoring has already been given in Chapter 5. It is worth adding here, however, that in a reading test (or a listening test), errors of grammar, spelling or punctuation should not be penalised, provided that it is clear that the candidate has successfully performed the reading task which the item set. The function of a reading test is to test reading ability. To test productive skills at the same time (which is what happens when grammar, etc. are taken into account) simply makes the measurement of reading ability less valid.



Reader activities

1. Following the procedures and advice given in the chapter, construct a 12-item reading test based on the passage about New Zealand Youth Hostels on page 157. (The passage was used in the Oxford Examination in English as a Foreign Language, Preliminary Level, in 1987.)

(a) For each item, make a note of the skill(s) (including sub-skills) you believe it is testing. If possible, have colleagues take the test and provide critical comment. Try to improve the test. Again, if possible, administer the test to an appropriate group of students. Score the tests. Interview a few students as to how they arrived at correct responses. Did they use the particular sub-skills that you predicted they would?

(b) Compare your questions with the ones in Appendix 3. Can you explain the differences in content and technique? Are there any items in the appendix that you might want to change? Why? How?

2. Do the sequencing item that is based on the Migraine text. Do you have any difficulties? If possible, get a number of students of appropriate ability to do the item, and then score their responses. Do you have any problems in scoring?

3. Write a set of short answer items with unique correct responses to replace the sequencing items that appear with the Migraine text.

4. The following is part of an exercise designed to help students learn to cope with ‘complicated sentences’. How successful would this form of exercise be as part of a reading test? What precisely would it test? Would you want to change the exercise in any way? If so, why and how? Could you make it non-multiple choice? If so, how?

The intention of other people concerned, such as the Minister of Defence, to influence the government leaders to adapt their policy to fit in with the demands of the right wing, cannot be ignored.

What is the subject of ‘cannot be ignored’?
a. the intention
b. other people concerned
c. the Minister of Defence
d. the demands of the right wing.

(Swan 1975)

12 Testing listening

It may seem rather odd to test listening separately from speaking, since the two skills are typically exercised together in oral interaction. However, there are occasions, such as listening to the radio, listening to lectures, or listening to railway station announcements, when no speaking is called for. Also, as far as testing is concerned, there may be situations where the testing of oral ability is considered, for one reason or another, impractical, but where a test of listening is included for its backwash effect on the development of oral skills. Listening may also be tested for diagnostic purposes.

Because it is a receptive skill, the testing of listening parallels in most ways the testing of reading. This chapter will therefore spend little time on issues common to the testing of the two skills and will concentrate more on matters that are particular to listening. The reader who plans to construct a listening test is advised to read both this and the previous chapter.

The special problems in constructing listening tests arise out of the transient nature of the spoken language. Listeners cannot usually move backwards and forwards over what is being said in the way that they can a written text. The one apparent exception to this, when a tape-recording is put at the listener’s disposal, does not represent a typical listening task for most people. Ways of dealing with these problems are discussed later in the chapter.

Specifying what the candidate should be able to do

As with the other skills, the specifications for listening tests should say what it is that candidates should be able to do.




Some operations may be classified as global, inasmuch as they depend on an overall grasp of what is listened to. They include the ability to:

• obtain the gist;
• follow an argument;
• recognise the attitude of the speaker.

Other operations may be classified in the same way as were oral skills in Chapter 10. In writing specifications, it is worth adding to each operation whether what is to be understood is explicitly stated or only implied.

Informational:
• obtain factual information;
• follow instructions (including directions);
• understand requests for information;
• understand expressions of need;
• understand requests for help;
• understand requests for permission;
• understand apologies;
• follow sequence of events (narration);
• recognise and understand opinions;
• follow justification of opinions;
• understand comparisons;
• recognise and understand suggestions;
• recognise and understand comments;
• recognise and understand excuses;
• recognise and understand expressions of preferences;
• recognise and understand complaints;
• recognise and understand speculation.

Interactional:
• understand greetings and introductions;
• understand expressions of agreement;
• understand expressions of disagreement;
• recognise speaker’s purpose;
• recognise indications of uncertainty;
• understand requests for clarification;
• recognise requests for clarification;
• recognise requests for opinion;
• recognise indications of understanding;
• recognise indications of failure to understand;
• recognise and understand corrections by speaker (of self and others);
• recognise and understand modifications of statements and comments;
• recognise speaker’s desire that listener indicate understanding;
• recognise when speaker justifies or supports statements, etc. of other speaker(s);
• recognise when speaker questions assertions made by other speakers;
• recognise attempts to persuade others.

It may also be thought worthwhile testing lower level listening skills in a diagnostic test, since problems with these tend to persist longer than they do in reading. These might include:

• discriminate between vowel phonemes;
• discriminate between consonant phonemes;
• interpret intonation patterns (recognition of sarcasm, questions in declarative form, etc., interpretation of sentence stress).


For reasons of content validity and backwash, texts should be specified as fully as possible.

Text type might be first specified as monologue, dialogue, or multi-participant, and further specified: conversation, announcement, talk or lecture, instructions, directions, etc.

Text forms include: description, exposition, argumentation, instruction, narration.

Length may be expressed in seconds or minutes. The extent of short utterances or exchanges may be specified in terms of the number of turns taken.

Speed of speech may be expressed as words per minute (wpm) or syllables per second (sps). Reported average speeds for samples of British English are:

                                   wpm    sps
Radio monologues                   160    4.17
Conversations                      210    4.33
Interviews                         190    4.17
Lectures to non-native speakers    140    3.17

(Tauroza and Allison, 1990)
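
When a recording has to be checked against a specified speed of delivery, both measures can be worked out from a transcript and a stopwatch. The sketch below, in Python, is offered only as an illustration: the function names and the sample figures for the recording are assumptions, and the word and syllable counts are taken to have been made by hand from the transcript.

```python
# Reported averages from the table above: (words per minute, syllables per second).
REPORTED_SPEEDS = {
    "Radio monologues": (160, 4.17),
    "Conversations": (210, 4.33),
    "Interviews": (190, 4.17),
    "Lectures to non-native speakers": (140, 3.17),
}

def speech_rate(word_count, syllable_count, duration_seconds):
    """Return (words per minute, syllables per second) for one recording."""
    wpm = word_count * 60 / duration_seconds
    sps = syllable_count / duration_seconds
    return wpm, sps

def closest_reported_type(wpm):
    """Name the speech type in the table whose average wpm is nearest."""
    return min(REPORTED_SPEEDS,
               key=lambda name: abs(REPORTED_SPEEDS[name][0] - wpm))

# Hypothetical recording: 90 seconds long, 240 words, 330 syllables.
wpm, sps = speech_rate(240, 330, 90)
print(round(wpm), round(sps, 2), closest_reported_type(wpm))
```

The comparison with the reported averages is a rough guide only. The figures are averages for particular samples of British English, and a recording that departs from them is not for that reason unsuitable.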



Dialects may include standard or non-standard varieties.

Accents may be regional or non-regional.

If authenticity is called for, the speech should contain such natural features as assimilation and elision (which tend to increase with speed of delivery) and hesitation phenomena (pauses, fillers, etc.).

Intended audience, style, topics, range of grammar and vocabulary may be indicated.

Setting criterial levels of performance

The remarks made in the chapter on testing reading apply equally here. If the test is set at an appropriate level, then, as with reading, a near perfect set of responses may be required for a ‘pass’. ACTFL, ILR or other scales may be used to validate the criterial levels that are set.

Setting the tasks

Selecting samples of speech (texts)

Passages must be chosen with the test specifications in mind. If we are interested in how candidates can cope with language intended for native speakers, then ideally we should use samples of authentic speech. These can usually be readily found. Possible sources are the radio, television, spoken-word cassettes, teaching materials, the Internet and our own recordings of native speakers. If, on the other hand, we want to know whether candidates can understand language that may be addressed to them as non-native speakers, these too can be obtained from teaching materials and recordings of native speakers that we can make ourselves. In some cases the indifferent quality of the recording may necessitate re-recording. It seems to me, although not everyone would agree, that a poor recording introduces difficulties additional to the ones that we want to create, and so reduces the validity of the test. It may also introduce unreliability, since the performance of individuals may be affected by the recording faults in different degrees from occasion to occasion. If details of what is said on the recording interfere with the writing of good items, testers should feel able to edit the recording, or to make a fresh recording from the amended transcript. In some cases, a recording may be used simply as the basis for a ‘live’ presentation.

If recordings are made especially for the test, then care must be taken to make them as natural as possible. There is typically a fair amount of redundancy in spoken language: people are likely to paraphrase what they have already said (‘What I mean to say is . . .’), and to remove this redundancy is to make the listening task unnatural. In particular, we should avoid passages originally intended for reading, like the following, which appeared as an example of a listening comprehension passage for a well-known test:

She found herself in a corridor which was unfamiliar, but after trying one or two doors discovered her way back to the stone-flagged hall which opened onto the balcony. She listened for sounds of pursuit but heard none. The hall was spacious, devoid of decoration: no flowers, no pictures.

This is an extreme example, but test writers should be wary of trying to create spoken English out of their imagination: it is better to base the passage on a genuine recording, or a transcript of one. If an authentic text is altered, it is wise to check with native speakers that it still sounds natural. If a recording is made, care should be taken to ensure that it fits with the specifications in terms of speed of delivery, style, etc.

Suitable passages may be of various lengths, depending on what is being tested. A passage lasting ten minutes or more might be needed to test the ability to follow an academic lecture, while twenty seconds could be sufficient to give a set of directions.

Writing items

For extended listening, such as a lecture, a useful first step is to listen to the passage and note down what it is that candidates should be able to get from the passage. We can then attempt to write items that check whether or not they have got what they should be able to get. This note-making procedure will not normally be necessary for shorter passages, which will have been chosen (or constructed) to test particular abilities.

In testing extended listening, it is essential to keep items sufficiently far apart in the passage. If two items are close to each other, candidates may miss the second of them through no fault of their own, and the effect of this on subsequent items can be disastrous, with candidates listening for ‘answers’ that have already passed. Since a single faulty item can have such an effect, it is particularly important to trial extended listening tests, even if only on colleagues aware of the potential problems.
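
Before trialling, the spacing of items can be given a first rough check if the point in the recording at which each answer is heard has been noted. The following Python sketch assumes that the items are listed in the order in which their answers occur; the function name and the 20-second minimum gap are illustrative assumptions, not figures recommended in this chapter.

```python
def items_too_close(answer_times, min_gap_seconds=20):
    """Return pairs of consecutive item numbers whose answers come too close together.

    answer_times: seconds from the start of the recording at which the answer
    to each item is heard, listed in item order (item 1 first).
    min_gap_seconds: assumed minimum spacing; choose a value to suit the test.
    """
    flagged = []
    for n in range(1, len(answer_times)):
        if answer_times[n] - answer_times[n - 1] < min_gap_seconds:
            flagged.append((n, n + 1))  # item numbers are 1-based
    return flagged

# Hypothetical timings for a five-item test on a lecture extract.
timings = [35, 80, 92, 150, 210]
print(items_too_close(timings))  # items 2 and 3 are only 12 seconds apart
```

A check like this does not replace trialling, which remains the only way of discovering how candidates actually cope with the spacing.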

Candidates should be warned by key words that appear both in the item and in the passage that the information called for is about to be heard. For example, an item may ask about ‘the second point that the speaker makes’ and candidates will hear ‘My second point is . . . ’. The wording does not have to be identical, but candidates should be given fair warning in the passage. It would be wrong, for instance, to ask about ‘what the speaker regards as her most important point’ when the speaker makes the point and only afterwards refers to it as the most important. Less obvious examples should be revealed through trialling.

Other than in exceptional circumstances (such as when the candidates are required to take notes on a lecture without knowing what the items will be, see below), candidates should be given sufficient time at the outset to familiarise themselves with the items. As was suggested for reading in the previous chapter, there seems no sound reason not to write items and accept responses in the native language of the candidates. This will in fact often be what would happen in the real world, when a fellow native speaker asks for information that we have to listen for in the foreign language.

Possible techniques

Multiple choice

The advantages and disadvantages of using multiple choice in extended listening tests are similar to those identified for reading tests in the previous chapter. In addition, however, there is the problem of the candidates having to hold in their heads four or more alternatives while listening to the passage and, after responding to one item, of taking in and retaining the alternatives for the next item. If multiple choice is to be used, then the alternatives must be kept short and simple. The alternatives in the following, which appeared in a sample listening test of a well-known examination, are probably too complex.

When stopped by the police, how is the motorist advised to behave?

a. He should say nothing until he has seen his lawyer.
b. He should give only what additional information the law requires.
c. He should say only what the law requires.
d. He should in no circumstances say anything.

Better examples would be:

(Understanding request for help)

I don’t suppose you could show me where this goes, could you?

Response:
a. No, I don’t suppose so.
b. Of course I can.
c. I suppose it won’t go.
d. Not at all.

(Recognising and understanding suggestions)

I’ve been thinking. Why don’t we call Charlie and ask for his opinion?

Response:
a. Why is this his opinion?
b. What is the point of that?
c. You think it’s his opinion?
d. Do you think Charlie has called?

Multiple choice can work well for testing lower level skills, such as phoneme discrimination.

The candidate hears: bat
and chooses between: pat  mat  fat  bat

Short answer

This technique can work well, provided that the question is short and straightforward, and the correct, preferably unique, response is obvious.

Gap filling

This technique can work well where a short answer question with a unique answer is not possible.

Woman: Do you think you can give me a hand with this?

Man: I’d love to help but I’ve got to go round to my mother’s in a minute.

The woman asks the man if he can ____________ her but he has to visit his ___________ .

Information transfer

This technique is as useful in testing listening as it is in testing reading, since it makes minimal demands on productive skills. It can involve such activities as the labelling of diagrams or pictures, completing forms, making diary entries, or showing routes on a map. The following example, which is taken from the ARELS examination, is one of a series of related tasks in which the candidate ‘visits’ a friend who has been involved in a motor accident. The friend has hurt his hand, and the candidate (listening to a tape-recording) has to help Tom write his report of the accident. Time allowed for each piece of writing is indicated.



Tom: This is a rough map of where the accident happened. There’s the main road going across with the cars parked on both sides of it – that’s Queen Street. You’d better write the name on it – Queen Street. (five seconds) And the smaller road going across it is called Green Road. Write Green Road on the smaller road. (five seconds) Now, I was riding along Queen Street where the arrow is and the little boy ran into the road from my right, from between the two buildings on the right. The building on the corner is the Star Cinema – just write Star on the corner building. (five seconds) And the one next to it is the Post Office. Write P.O. on that building next to the cinema. (five seconds) Well the boy ran out between those two buildings, and into the road. Can you put an arrow in where the boy came from, like I did for me and the bike, but for the boy? (five seconds) When he ran out I turned left away from him and hit one of the parked cars. It was the second car back from the crossroads on the left. Put a cross on the second car back. (three seconds) It was quite funny really. It was parked right outside the police station. A policeman heard the bang and came out at once. You’d better write Police on the police station there on the corner. (five seconds) I think that’s all we need. Thanks very much.

In this question you must write your answers. Tom also has to draw a sketch map of the accident. He has drawn the streets, but he can’t write in the names. He asks you to fill in the details. Look at the sketch map in your book. Listen to Tom and write on the map what he tells you.

Note taking

Where the ability to take notes while listening to, say, a lecture is in question, this activity can be quite realistically replicated in the testing situation. Candidates take notes during the talk, and only after the talk is finished do they see the items to which they have to respond. When constructing such a test, it is essential to use a passage from which notes can be taken successfully. This will only become clear when the task is first attempted by test writers. I believe it is better to have items (which can be scored easily) rather than attempt to score the notes, which is not a task that is likely to be performed reliably. Items should be written that are perfectly straightforward for someone who has taken appropriate notes.

It is essential when including note taking as part of a listening test that careful moderation and, if possible, trialling should take place. Otherwise, items are likely to be included that even highly competent speakers of the language do not respond to correctly. It should go without saying that, since this is a testing task which might otherwise be unfamiliar, potential candidates should be made aware of its existence and, if possible, be provided with practice materials. If this is not done, then the performance of many candidates will lead us to underestimate their ability.

Partial dictation

While dictation may not be a particularly authentic listening activity (although in lectures at university, for instance, there is often a certain amount of dictation), it can be useful as a testing technique. As well as providing a ‘rough and ready’ measure of listening ability, it can also be used diagnostically to test students’ ability to cope with particular difficulties (such as weak forms in English).

Because a traditional dictation is so difficult to score reliably, it is recommended that partial dictation is used, where part of what the candidates hear is already written down for them. It takes the following form:

The candidate sees:

It was a perfect day. The sun _____________ in a clear blue sky and Diana felt that all was _____________ with the world. It wasn’t just the weather that made her feel this way. It was also the fact that her husband had ____________ agreed to a divorce. More than that, he had agreed to let her keep the house and to pay her a small fortune every month. Life _____________ be better.

The tester reads:

It was a perfect day. The sun shone in a clear blue sky and Diana felt that all was right with the world. It wasn’t just the weather that made her feel this way. It was also the fact that her husband had finally agreed to a divorce. More than that, he had agreed to let her keep the house and to pay her a small fortune every month. Life couldn’t be better.

Since it is listening that is meant to be tested, correct spelling should probably not be required for a response to be scored as correct. However, it is not enough for candidates simply to attempt a representation of the sounds that they hear, without making sense of those sounds. To be scored as correct, a response has to provide strong evidence of the candidate’s having heard and recognised the missing word, even if they cannot spell it. It has to be admitted that this can cause scoring problems.
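
One way of making a first pass through responses of this kind is sketched below in Python. It is an illustration only: it accepts a response if it matches the key exactly or is sufficiently similar in spelling, using the standard library's `difflib.SequenceMatcher`. The function names and the threshold of 0.7 are assumptions, and a measure of spelling similarity cannot tell whether the candidate has made sense of what was heard, so doubtful responses would still have to go to a human scorer.

```python
from difflib import SequenceMatcher

def is_acceptable(response, key, threshold=0.7):
    """Accept exact matches and misspellings that are close to the key word.

    The threshold is an arbitrary assumption and would need checking
    against real responses before being relied upon.
    """
    response, key = response.strip().lower(), key.strip().lower()
    if response == key:
        return True
    return SequenceMatcher(None, response, key).ratio() >= threshold

def score_partial_dictation(responses, keys):
    """Count the gaps for which the response is judged acceptable."""
    return sum(1 for r, k in zip(responses, keys) if is_acceptable(r, k))

# The first three gaps of the example above, with hypothetical responses.
keys = ["shone", "right", "finally"]
responses = ["shon", "well", "Finaly"]
print(score_partial_dictation(responses, keys))
```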

The gaps may be longer than one word:

It was a perfect day. The sun shone ………………….…………… and Diana felt that all was well with the world.

While this has the advantage of requiring the candidate to do more than listen for a single word, it does make the scoring (even) less straightforward.

Transcription

Candidates may be asked to transcribe numbers or words which are spelled letter by letter. The numbers may make up a telephone number. The letters should make up a name or a word which the candidates should not already be able to spell. The skill that items of this kind test belongs directly to the ‘real world’. In the trialling of a test I was involved with recently, it was surprising how many teachers of English were unable to perform such tasks satisfactorily. A reliable and, I believe, valid way of scoring transcription is to require the response to an item to be entirely correct for a point to be awarded.
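
The all-or-nothing scoring described here is easily expressed as a short Python sketch. The decision to ignore spacing, hyphens and letter case when comparing a response with the key is an assumption added for illustration, as are the function names and the invented key; a tester who wants these features to count would compare the raw strings instead.

```python
def normalise(response):
    """Keep only letters and digits, in upper case, so layout is ignored."""
    return "".join(ch for ch in response.upper() if ch.isalnum())

def score_transcription(responses, keys):
    """Award one point per item only if the whole response is correct."""
    return sum(1 for r, k in zip(responses, keys)
               if normalise(r) == normalise(k))

# Invented key: a telephone number and a spelled-out surname.
keys = ["01632 960147", "Wojciechowski"]
responses = ["01632960147", "Wojciechowsky"]  # second item has one wrong letter
print(score_transcription(responses, keys))
```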

Moderating the items

The moderation of listening items is essential. Ideally it should be carried out using the already prepared recordings or with the item writer reading the text as it is meant to be spoken in the test. The moderators begin by ‘taking’ the test and then analyse their items and their reactions to them. The moderation checklist given on page 154 for reading items needs only minor modifications in order to be used for moderating listening items.



Presenting the texts (live or recorded?)

The great advantage of using recordings when administering a listening test is that there is uniformity in what is presented to the candidates. This is fine if the recording is to be listened to in a well-maintained language laboratory or in a room with good acoustic qualities and with suitable equipment (the recording should be equally clear in all parts of the room). If these conditions do not obtain, then a live presentation is to be preferred. If presentations are to be live, then greatest uniformity (and so reliability) will be achieved if there is just a single speaker for each (part of a) test. If the test is being administered at the same time in a number of rooms, more than one speaker will be called for. In either case, a recording should be made of the presentation, with which speakers can be trained, so that the intended emphases, timing, etc. will be observed with consistency. Needless to say, speakers should have a good command of the language of the test and be generally highly reliable, responsible and trustworthy individuals.

Scoring the listening test

It is probably worth mentioning again that in scoring a test of a receptive skill there is no reason to deduct points for errors of grammar or spelling, provided that it is clear that the correct response was intended.

Reader activities

1. Choose an extended recording of spoken language that would be appropriate for a group of students with whom you are familiar (you may get this from published materials, or you may record a native speaker or something on the radio). Play a five-minute stretch to yourself and take notes. On the basis of the notes, construct eight short-answer items. Ask colleagues to take the test and comment on it. Amend the test as necessary, and administer it to the group of students you had in mind, if possible. Analyse the results. Go through the test item by item with the students and ask for their comments. How far, and how well, is each item testing what you thought it would test?

2. Design short items that attempt to discover whether candidates can recognise: sarcasm, surprise, boredom, elation. Try these on colleagues and students as above.


3. Design a test that requires candidates to draw (or complete) simple pictures. Decide exactly what the test is measuring. Think what other things could be measured using this or similar techniques. Administer the test and see if the students agree with you about what is being measured.

Further reading

Buck (2001) is a thorough study of the assessment of listening. Freedle and Kostin (1999) investigate the importance of the text in TOEFL minitalk items. Sherman (1997) examines the effects of candidates previewing listening test items. Buck and Tatsuoka (1998) analyse performance on short-answer items. Hale and Courtney (1994) look at the effects of note taking on performance on TOEFL listening items. Buck (1991) uses introspection in the validation of a listening test. Shohamy and Inbar (1991) look at the effects of texts and question type. Arnold (2000) shows how performance on a listening test can be improved by reducing stress in those who take it. Examples of recordings in English that might be used as the basis of listening tests are Crystal and Davy (1975) and, if regional British accents are relevant, Hughes and Trudgill (1996).


Testing grammar

Why test grammar?

Can one justify the separate testing of grammar? There was a time when this would have seemed a very odd question. Control of grammatical structures was seen as the very core of language ability and it would have been unthinkable not to test it. But times have changed. As far as proficiency tests are concerned, there has been a shift towards the view that since it is language skills that are usually of interest, then it is these which should be tested directly, not the abilities that seem to underlie them. For one thing, it is argued, there is more to any skill than the sum of its parts; one cannot accurately predict mastery of the skill by measuring control of what we believe to be the abilities that underlie it. For another, as has been argued earlier in this book, the backwash effect of tests that measure mastery of skills directly may be thought preferable to that of tests that might encourage the learning of grammatical structures in isolation, with no apparent need to use them. Considerations of this kind have resulted in the absence of any grammar component in some well-known proficiency tests.

But probably most proficiency tests that are administered on a large scale still retain a grammar section. One reason for this must be the ease with which large numbers of items can be administered and scored within a short period of time. Related to that, and at least as important, is the question of content validity. If we decide to test writing ability directly, then we are severely limited in the number of topics, styles of writing, and what we earlier referred to as ‘operations’ that we can cover in any one version of the test. We cannot be completely confident that the sample chosen is truly representative of all possibilities. Neither can we be sure, of course, that a (proficiency) grammar test includes a good sample of all possible grammatical elements. But the very fact that there can be so many items does put the grammar test at an advantage.

13 Testing grammar and vocabulary


Even if one has doubts about testing grammar in a proficiency test, there is often good cause to include a grammar component in the achievement, placement and diagnostic tests of teaching institutions. It seems unlikely that there are many institutions, however ‘communicative’ their approach, that do not teach some grammar in some guise or other. Wherever the teaching of grammar is thought necessary, then consideration should be given to the advisability of including a grammar component in achievement tests. If this is done, however, it would seem prudent, from the point of view of backwash, not to give such components too much prominence in relation to tests of skills, the development of which will normally constitute the primary objectives of language courses.

Whether or not grammar has an important place in an institution’s teaching, it has to be accepted that grammatical ability, or rather the lack of it, sets limits to what can be achieved in the way of skills performance. The successful writing of academic assignments, for example, must depend to some extent on command of more than the most elementary grammatical structures. It would seem to follow from this that in order to place students in the most appropriate class for the development of such skills, knowledge of a student’s grammatical ability would be very useful information. There appears to be room for a grammar component in at least some placement tests.

It would be very useful to have diagnostic tests of grammar which could tell us – for individual learners and groups – what gaps exist in their grammatical repertoire. Such tests could inform not only teachers but also learners, so that they could take responsibility for filling the existing gaps themselves. For this reason, it would be important for the tests to be linked in some way or other to learning materials. There is reason to believe that we may be on the point of having computer-based tests of grammar that will be able to provide such information.

Writing specifications

For achievement tests where teaching objectives or the syllabus list the grammatical structures to be taught, specification of content should be quite straightforward. When there is no such listing it becomes necessary to infer from textbooks and other teaching materials what structures are being taught. Specifications for a placement test will normally include all of the structures identified in this way, as well as, perhaps, those structures the command of which is taken for granted in even the lowest classes. For proficiency and diagnostic tests, the van Ek and Trim publications referred to in the Further reading section, which are based on a notional-functional approach, are especially useful, as are grammars like the Cobuild English Usage.


Sampling

This will reflect an attempt to give the test content validity by selecting widely from the structures specified. It should also take account of what are regarded for one reason or another as the most important structures. It should not deliberately concentrate on the structures that happen to be easiest to test.

Writing items

Whatever techniques are chosen for testing grammar, it is important for the text of the item to be written in grammatically correct and natural language. It is surprising how often this is not the case. Two examples I have to hand from items written by teachers are:

We can’t work with this class because there isn’t enough silence.


I want to see the film. The actors play well.

To avoid unnatural language of this kind, I would recommend using corpus-based examples. One readily available source for English is the British National Corpus sampler on CD.

Four techniques are presented for testing grammar: gap filling, paraphrase, completion, and multiple choice. Used with imagination, they should meet just about all our needs. The first three require production on the part of the candidates, while multiple choice, of course, calls only for recognition. This difference may be a factor in choosing one technique rather than another.

Gap filling

Ideally, gap filling items should have just one correct response.

For example: What was most disturbing _______________ that for the first time in his life Henry was on his own. [was]

Or: The council must do something to improve transport in the city. ____________, they will lose the next election. [Otherwise]
(Sentence linking can be tested extensively using gap filling.)

Or: He arrived late, _____________ was a surprise. [which]


An item with two possible correct responses may be acceptable if the meaning is the same, whichever is used. Thus:

He displayed the wide, bright smile _____________ had charmed so many people before. [which, that]

But an item is probably to be rejected if the different possibilities give different meanings or involve quite different structures, one of which is the one that is supposed to be tested.

Patient: My baby keeps me awake all night. She won’t stop crying.

Doctor: ________________ let her cry. She’ll stop in the end. [Just, I’d, Well, Then, etc.]

This item may be improved by including the words ‘Then’ and ‘just’ so that they cannot fill the gap.

Doctor: Then _______________ just let her cry. She’ll stop in the end.

(But if you or I’d is thought to be a possible correct response, then the item is still not acceptable.)

It’s worth saying here that if contractions like I’d are to be allowed in the gaps (and I would recommend this), the possibility should be made very clear to the candidates and at least one example of it should be given at the beginning of the test.

As was pointed out in Chapter 8, adding to the context can often restrict the number of possible correct responses to a single one. An extension of this is to present a longer passage with several gaps. These may be used to test a set of related structures, such as the articles:

(Candidates are required to write the, a or NA (No Article).)

In England children go to ________ school from Monday to Friday. _________ school that Mary goes to is very small. She walks there each morning with ________ friend. One morning they saw ____________ man throwing ____________ stones and __________ pieces of wood at ____________ dog. ____________ dog was afraid of ___________ man.

And so on.

The technique can also be used to test a variety of structures. (The text is taken from Colin Dexter, The Secret of Annexe 3.)

When the old man died, ____________ was probably no great joy ____________ heaven; and quite certainly little if any real grief in Charlbury Drive, the pleasantly unpretentious cul-de-sac _____________ semi-detached houses to which he ____________ retired.

There can be just a gap, as above, or there can be a prompt for each gap, as in the example below.

Part 5

For questions 56–65, read the text below. Use the word given in capitals at the end of each line to form a word that fits in the space in the same line. There is an example at the beginning (0). Write your answers on the separate answer sheet.

Example: 0 ability


Computers have had the (0) …………. to play chess for many years now, and their (56) …………. in games against the best players in the world has shown steady (57) …………. . However, it will be years before designers of computer games machines can beat their (58) …………. challenge yet – the ancient board game called Go. The playing area is (59) …………. larger than in chess and there are far more pieces, so that the (60) …………. of moves is almost (61) …………. . The game involves planning so many moves ahead that even the (62) …………. calculations of the fastest modern computers are (63) …………. to deal with the problems of the game.

In recent (64) …………. for computer Go machines, the best machine beat all its computer rivals, but lost (65) …………. to three young schoolchildren, so there is obviously still a lot of work to do!





UCLES FCE Handbook 1997

Paraphrase

Paraphrase items require the student to write a sentence equivalent in meaning to one that is given. It is helpful to give part of the paraphrase in order to restrict the students to the grammatical structure being tested. Thus:

1. Testing passive, past continuous form.

When we arrived, a policeman was questioning the bank clerk.
When we arrived, the bank clerk ……………………………………….

2. Testing present perfect with for.

It is six years since I last saw him.
I ………………………………..……….. six years.


Completion

This technique can be used to test a variety of structures. Note how the context in a passage like the following, from the Cambridge First Certificate in English (FCE) Testpack 1, allows the tester to elicit specific structures, in this case interrogative forms¹.

In the following conversation, the sentences numbered (1) to (6) have been left incomplete. Complete them suitably. Read the whole conversation before you begin to answer the question.

(Mr Cole wants a job in Mr Gilbert’s export business. He has come for an interview.)

Mr Gilbert: Good morning, Mr Cole. Please come in and sit down. Now let me see. (1) Which school ………………………………….?

Mr Cole: Whitestone College.

Mr Gilbert: (2) And when ……………………………………………………..?

Mr Cole: In 1972, at the end of the summer term.

Mr Gilbert: (3) And since then what ……………………………………….?

Mr Cole: I worked in a bank for a year. Then I took my present job, selling cars. But I would like a change now.

Mr Gilbert: (4) Well, what sort of a job ……………………………………….?

Mr Cole: I’d really like to work in your Export Department.

Mr Gilbert: That might be a little difficult. What are your qualifications? (5) I mean what languages ………………………………………. besides English?

Mr Cole: Well, only a little French.

Mr Gilbert: That would be a big disadvantage, Mr Cole. (6) Could you tell me why ……………………………………….?

Mr Cole: Because I’d like to travel and to meet people from other countries.

Mr Gilbert: I don’t think I can help you at present, Mr Cole. Perhaps you ought to try a travel agency.


Multiple choice

Reasons for being careful about using multiple choice were given in Chapter 8. There are times, however, when gap filling will not test what we want it to test (at least, in my experience). Here is an example where we want to test epistemic could.

If we have the simple sentence:

They left at seven. They __________ be home by now.

there are obviously too many possibilities for the gap (must, should, may, could, might, will).

We can add context, having someone reply: Yes, but we can’t count on it, can we? This removes the possibility of must and will but leaves the other possibilities.

At this point I would think that I could only test the epistemic use of could satisfactorily by resorting to multiple choice.

A: They left at seven. They __________ be home by now.
B: Yes, but we can’t count on it, can we?

a. can b. could c. will d. must

I would also use multiple choice when testing discontinuous elements.

A: Poor man, he ………………………… at that for days now.
B: Why doesn’t he give up?

a. was working
b. has been working
c. is working
d. had worked

(Why doesn’t he give up? is added to eliminate the possibility of d being correct, which might just be possible despite the presence of now.)

Also, all the above non-multiple-choice techniques can be given a multiple choice structure, but the reader who attempts to write such items can often expect to have problems in finding suitable distractors.

Moderation of items is of course essential. The checklist included in Chapter 7 should be helpful in this.

Scoring production grammar tests

Gap filling and multiple choice items should cause no problems. The important thing when scoring other types of item is to be clear about what each item is testing, and to award points for that only. There may be just one element, such as subject-pronoun-verb inversion, and all available points should be awarded for that; nothing should be deducted for non-grammatical errors, or for errors in elements of grammar which are not being tested by the item. For instance, a candidate should not be penalised for a missing third person -s when the item is testing relative pronouns; opend should be accepted for opened, without penalty.

If two elements are being tested in an item, then points may be assigned to each of them (for example present perfect form and since with past time reference point). Alternatively, it can be stipulated that both elements have to be correct for any points to be awarded, which makes sense in those cases where getting one element wrong means that the student does not have full control of the structure. For items such as these, careful preparation of the scoring key is necessary to ensure that scoring is valid and reliable.
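The two scoring policies just described can be made concrete in a scoring key. What follows is a minimal sketch, not a prescribed procedure: the names ItemKey and score_item and the one-point values are hypothetical, and the judgement of whether each element is correct is still made by the scorer and simply recorded as True or False.

```python
from dataclasses import dataclass


@dataclass
class ItemKey:
    """Scoring key for one item: the tested elements and the points for each."""
    elements: dict[str, int]      # element name -> points available
    all_or_nothing: bool = False  # True: every element must be correct


def score_item(key: ItemKey, judged: dict[str, bool]) -> int:
    """Score one response from the scorer's judgement of each tested element.

    Anything in `judged` that is not listed in the key (spelling, untested
    grammar) is ignored, so it cannot cost the candidate points.
    """
    correct = {name: judged.get(name, False) for name in key.elements}
    if key.all_or_nothing:
        return sum(key.elements.values()) if all(correct.values()) else 0
    return sum(points for name, points in key.elements.items() if correct[name])


# Hypothetical key for an item testing present perfect form and 'since'.
elements = {"present perfect form": 1, "since + past time point": 1}
judged = {"present perfect form": True, "since + past time point": False,
          "spelling": False}
print(score_item(ItemKey(elements), judged))                       # separate points
print(score_item(ItemKey(elements, all_or_nothing=True), judged))  # both required
```

Writing the key down in this form before scoring begins forces a decision, item by item, about what is being tested and which policy applies.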

Testing vocabulary

Why test vocabulary?

Similar reasons may be advanced for testing vocabulary in proficiency tests to those used to support the inclusion of a grammar section (though vocabulary has its special sampling problems). However, the arguments for a separate component in other kinds of test may not have the same strength. One suspects that much less time is devoted to the regular, conscious teaching of vocabulary than to the similar teaching of grammar. If there is little teaching of vocabulary, it may be argued that there is little call for achievement tests of vocabulary. At the same time, it is to be hoped that vocabulary learning is taking place. Achievement tests that measure the extent of this learning (and encourage it) perhaps do have a part to play in institutional testing. For those who believe that systematic teaching of vocabulary is desirable, vocabulary achievement tests are appreciated for their backwash effect.

The usefulness (and indeed the feasibility) of a general diagnostic test of vocabulary is not readily apparent. As far as placement tests are concerned, we would not normally require, or expect, a particular set of lexical items to be a prerequisite for a particular language class. All we would be looking for is some general indication of the adequacy of the student’s vocabulary. The learning of specific lexical items in class will rarely depend on previous knowledge of other, specified items. One alternative is to use a published test of vocabulary. The other is to construct one’s own vocabulary proficiency test.


Writing specifications

How do we specify the vocabulary for an achievement test? If vocabulary is being consciously taught, then presumably all the items thereby presented to the students should be included in the specifications. To these we can add all the new items that the students have met in other activities (reading, listening, etc.). Words should be grouped according to whether their recognition or their production is required. A subsequent step is to group the items in terms of their relative importance.

We have suggested that a vocabulary placement test will be in essence a proficiency test. The usual way to specify the lexical items that may be tested in a proficiency test is to make reference to one of the published word lists that indicate the frequency with which the words have been found to be used (see Further reading).


Sampling

Words can be grouped according to their frequency and usefulness. From each of these groups, items can be taken at random, with more being selected from the groups containing the more frequent and useful words.
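The procedure of drawing at random from each group, with larger numbers taken from the more frequent groups, might be carried out as in the sketch below. The function name, the toy word groups and the quotas are all hypothetical; in practice the groups would come from a published frequency list and the quotas from the test specifications.

```python
import random


def sample_words(bands: list[list[str]], quotas: list[int], seed: int = 0) -> list[str]:
    """Draw words at random from each frequency band.

    `bands` is ordered from most to least frequent; `quotas` says how many
    words to take from each band, so larger quotas go to the earlier bands.
    """
    if len(bands) != len(quotas):
        raise ValueError("one quota is needed for each band")
    rng = random.Random(seed)  # fixed seed so that the draw can be repeated
    chosen: list[str] = []
    for band, quota in zip(bands, quotas):
        # Never ask for more words than the band actually contains.
        chosen.extend(rng.sample(band, min(quota, len(band))))
    return chosen


# Toy bands for illustration only; they are not taken from a real word list.
bands = [
    ["house", "water", "friend", "school", "morning", "street"],
    ["bench", "beard", "greedy", "tickle"],
    ["mellow", "callow", "genial"],
]
print(sample_words(bands, quotas=[4, 2, 1]))
```

Fixing the seed makes the selection reproducible, which is convenient when a second version of the test has to be drawn from the same specifications with a different seed.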

Writing items

Testing recognition ability

This is one testing problem for which multiple choice can be recommended without too many reservations. For one thing, distractors are usually readily available. For another, there seems unlikely to be any serious harmful backwash effect, since guessing the meaning of vocabulary items is something that we would probably wish to encourage. However, the writing of successful items is not without its difficulties.

Items may involve a number of different operations on the part of the candidates:

Recognise synonyms

Choose the alternative (a, b, c or d) which is closest in meaning to the word on the left of the page.

gleam a. gather b. shine c. welcome d. clean

The writer of this item has probably chosen the first alternative because of the word glean. The fourth may have been chosen because of the similarity of its sound to that of gleam. Whether these distractors would work as intended would only be discovered through trialling.


Note that all of the options are words that the candidates are expected to know. If, for example, welcome were replaced by groyne, most candidates, recognising that it is the meaning of the stem (gleam) on which they are being tested, would dismiss groyne immediately.

On the other hand, the item could have a common word as the stem with four less frequent words as options:

shine a. malm b. gleam c. loam d. snarl

The drawback to doing this is the problem of what distractors to use. Clearly they should not be too common, otherwise they will not distract. But even if they are not common, if the test taker knows them, they will not distract. This suggests that the first method is preferable.

Note that in both items it is the word gleam that is being tested.

Recognise definitions

loathe means
a. dislike intensely
b. become seriously ill
c. search carefully
d. look very angry

Note that all of the options are of about the same length. It is said that test-takers who are uncertain of which option is correct will tend to choose the one which is noticeably different from the others. If dislike intensely is to be used as the definition, then the distractors should be made to resemble it. In this case the writer has included some notion of intensity in all of the options.

Again the difficult word could be one of the options, although the concern expressed above about this technique applies here too.

One word that means to dislike intensely is
a. growl
b. screech
c. sneer
d. loathe

Thrasher (Internet) believes that vocabulary is best tested in context and, referring to the first edition of this book, suggests that a better way to test knowledge of loathe would be:

Bill is someone I loathe.

a. like very much
b. dislike intensely
c. respect
d. fear

For the moment, I leave it to the reader to consider whether the provision of context makes an improvement.


Recognise appropriate word for context

Context, rather than a definition or a synonym, can be used to test knowledge of a lexical item.

The strong wind __________ the man’s efforts to put up the tent.

a. disabled b. hampered c. deranged d. regaled

Note that the context should not itself contain words that the candidates are unlikely to know.

Having now presented an item testing vocabulary in context myself, I return to Thrasher’s suggested improvement. It could be argued that, since learners and language users in general normally meet vocabulary in context, providing context in an item makes the task more authentic and perhaps results in a more valid measure of the candidate’s ability. The context may help activate a memory of the word, in the same way as meeting it when reading in a non-test situation. It may also be said that there could be some negative backwash when words are presented in isolation. However, when we test vocabulary by means of multiple choice, the range of possible distractors will be wider if words are presented in isolation. In Thrasher’s item, I suspect that the difference in length between the first two and the second two options would encourage candidates who don’t know the word to choose a or b, thereby increasing the possibility of a correct response by guessing. I have to admit that I know of no systematic research that has compared test performance on vocabulary items with and without context.
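The point about option length, made above both for the loathe item and for Thrasher’s item, lends itself to a simple mechanical check during moderation. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the limit of 1.5 on the ratio of longest to shortest option is an arbitrary value chosen for the example, not an established standard.

```python
def length_ratio(options: list[str]) -> float:
    """Return the length of the longest option divided by that of the shortest."""
    lengths = [len(option) for option in options]
    if not lengths or min(lengths) == 0:
        raise ValueError("every option must contain at least one character")
    return max(lengths) / min(lengths)


def lengths_balanced(options: list[str], max_ratio: float = 1.5) -> bool:
    """True if no option is strikingly longer or shorter than the others."""
    return length_ratio(options) <= max_ratio


definition_item = ["dislike intensely", "become seriously ill",
                   "search carefully", "look very angry"]
context_item = ["like very much", "dislike intensely", "respect", "fear"]

print(lengths_balanced(definition_item))  # options of similar length
print(lengths_balanced(context_item))     # two long options, two short ones
```

A check of this kind only flags items for a second look; whether a difference in length actually attracts guesses can be discovered only through trialling.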

Testing production ability

The testing of vocabulary productively is so difficult that it is practically never attempted in proficiency tests. Information on receptive ability is regarded as sufficient. The suggestions presented below are intended only for possible use in achievement tests.

Pictures

The main difficulty in testing productive lexical ability is the need to limit the candidate to the (usually one) lexical item that we have in mind, while using only simple vocabulary ourselves. One way round this is to use pictures.

Each of the objects drawn below has a letter against it. Write down the names of the objects:

[Drawings of six objects, labelled A to F, are not reproduced here.]

A …………………………………………………..

B …………………………………………………..

C …………………………………………………..

D …………………………………………………..

E …………………………………………………..

F …………………………………………………..

This method of testing vocabulary is obviously restricted to concrete nouns that can be unambiguously drawn.

Definitions

This may work for a range of lexical items:

A ………………. is a person who looks after our teeth.

…………………… is frozen water.

…………………… is the second month of the year.

But not all items can be identified uniquely from a definition: any definition of, say, feeble would be unlikely to exclude all of its synonyms. Nor can all words be defined entirely in words more common or simpler than themselves.

Gap filling

This can take the form of one or more sentences with a single word missing.

Because of the snow, the football match was _______________ until the following week.

I ____________ to have to tell you this, Mrs Jones, but your husband has had an accident.

Too often there is an alternative word to the one we have in mind. Indeed the second item above has at least two acceptable responses (which was not intended when it was written!). This problem can be solved by giving the first letter of the word (possibly more) and even an indication of the number of letters.

I r___________ to have to tell you …

or I r _ _ _ _ _ to have to tell you.
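Prompts of the two kinds shown above are easy to generate consistently once the target word is known. The following is a minimal sketch; letter_prompt and gap_item are hypothetical helper names, and the choice of how many letters to reveal remains the item writer’s.

```python
def letter_prompt(word: str, shown: int = 1) -> str:
    """Show the first `shown` letters and one underscore per hidden letter."""
    if not 1 <= shown < len(word):
        raise ValueError("shown must leave at least one letter hidden")
    return " ".join([word[:shown]] + ["_"] * (len(word) - shown))


def gap_item(sentence: str, word: str, shown: int = 1) -> str:
    """Replace the first occurrence of `word` in `sentence` with its prompt."""
    if word not in sentence:
        raise ValueError("the target word does not occur in the sentence")
    return sentence.replace(word, letter_prompt(word, shown), 1)


print(letter_prompt("regret"))
print(gap_item("I regret to have to tell you this.", "regret"))
```

Revealing a second letter, with shown=2, narrows the possibilities further where the first letter alone still leaves an alternative word open.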

Again, moderation of items is necessary and the checklist in Chapter 7 can be used, possibly with minor modifications.


This chapter should end with a reminder that while grammar and vocabulary contribute to communicative skills, they are rarely to be regarded as ends in themselves. It is essential that tests should not accord them too much importance, and so create a backwash effect that undermines the achievement of the objectives of teaching and learning where these are communicative in nature.

Reader activities

Construct items to test the following:

• Conditional: If …. had …., …. would have …. .
• Comparison of equality.
• Relative pronoun whose.
• Past continuous: … was -ing, when … .

Which of the techniques suggested in the chapter suits each structure best? Can you say why?

Can you see anything wrong with the following multiple choice items taken from tests written by teachers (use the checklist given as Table 1 in Chapter 7)? If so, what? Try to improve them.

a) I said to my friend ‘ ________ be stupid.’
Isn’t Aren’t Didn’t Don’t be

b) What ________ you do, if your car broke down?
must did shall


c) You are too thin. You should eat ……………………
many more a few

d) – I’m sorry that the child saw the accident.
– I don’t think it matters. He soon ___________
forgetting forgets will forget will be forgetting

e) People _____________ in their reaction to the same stimulus.
replace vary upset very

Produce three vocabulary tests by writing three items for each of the following words. One set of items should be multiple choice without context; one set should be multiple choice with context; the third set should be gap filling. Give each test to a different (but comparable) group of students. Compare performance on items testing the same word. Can differences of performance be attributed to a difference in technique?

beard sigh bench deaf genial
tickle weep greedy mellow callow

(If the words are inappropriate for your students, replace them with others.)

Further reading

For a highly detailed taxonomy of notions and functions and their grammatical and lexical realisations, see van Ek and Trim (2001a, b and c). I have also found Collins Cobuild (1992) useful in writing specifications. A thorough study of vocabulary assessment (going beyond testing) is Read (2000). It includes methods of assessing both size (breadth) and quality (depth) of knowledge. Read and Chapelle (2001) propose a framework for vocabulary assessment. A new book of word frequencies is Leech et al. (2001). It gives information for spoken and written varieties of English. West (1953) is a standard word list of high frequency words learners should know. Collins COBUILD English Language Dictionary and the Longman Dictionary of Contemporary English mark words according to their frequency in the language.

1. This technique is no longer used in the FCE.


This chapter begins by suggesting a general approach to tests for young learners. It then goes on to consider the particular requirements of such tests. Finally it recommends suitable testing techniques.

General approach

While in some countries, for example Norway, children have been learning foreign languages at primary school for decades, in recent years it has become an increasingly common phenomenon in many other parts of the world. This chapter considers the particular requirements for the successful testing of young learners and makes suggestions as to how this may best be done. By young learners we have in mind children aged from about five to twelve.

One might ask first why we have to test young language learners at all. This is a good question. Not everyone does it. In Norway, for example, where the learning of English appears to be highly successful, children up to the age of thirteen are not formally tested in the subject. One answer to the question might be that we want to be sure that the teaching programme is effective, that the children are really benefiting from the chance to learn a language at an early age. But this invites a further question: Why is testing rather than assessment by other means necessary? The answer I gave in Chapter 1 was that there was a need for a common yardstick, which tests give, in order to make meaningful comparisons. I have to confess, however, as someone who has spent a lot of his time either testing or advising others on testing, that I feel uneasy at the thought of the damage to children’s learning, and their attitude to learning, that might be done by insensitive, inappropriate testing. This uneasiness is not lessened by the knowledge that the aims of early language teaching typically include the development of positive attitudes to language learning and to language. But people do test young learners and this being so, I believe it is worthwhile considering what is the best way to do this.

15 Tests for young learners


On a more positive note, it seems to me that if young children are going to be tested, such testing provides an opportunity to develop positive attitudes towards assessment, to help them recognise the value of assessment. In order to take advantage of this opportunity, I would make three general recommendations that, together, amount to an approach to such testing. The first recommendation is that a special effort be made to make testing an integral part of assessment, and assessment an integral part of the teaching programme. All three should be consistent with each other in terms of learning objectives and, as far as possible, the kinds of tasks which the children are expected to perform. Testing will not then be seen as something separate from learning, as a trial that has to be endured.

The second recommendation is that feedback from tests (and feedback from assessment generally) should be immediate and positive. By being immediate its value will be maximised. By telling children not only what their weaknesses are but also what they have done well, the potential demoralising effect of test results is lessened.

The third recommendation is that self assessment by the children be made a part of the teaching programme. This will help them to develop the habit of monitoring their own progress. It should also allow them to take pleasure in what they are achieving. To improve their ability to assess themselves, they should be encouraged to compare their own assessments with those of their teacher. On the following page is an example of test material produced by the Norwegian Ministry of Education for 11–12 year olds (Hasselgren 1999). Pupils complete this form after doing an assessment task on reading.

These three recommendations and their intended outcomes may seem somewhat idealistic, but before rejecting them one has to consider the alternative; by default, this is to instil negative attitudes towards tests and, through them, to language learning.

Particular demands

Although we want children to take tests in a relaxed setting, this does not mean that we should relax our own standards for test development. We still need to make sure that our tests are valid and reliable¹. And the need to seek positive backwash is more important than ever. It would not be appropriate to recapitulate here the advice given earlier on how to make tests valid and reliable, and have a beneficial backwash. It is worth saying, however, that crucial elements are the writing of full specifications and the choice of appropriate test techniques.



Before considering specific techniques, let us ask what it is about young learners that might require their test to have special features.

1. Young children have a relatively short attention span. For this reason tests should not be long. Individual tasks should be brief and varied. If necessary, what would for other learners have been a single test can be broken down into two or more tests.

2. Children enjoy stories and play. If we want them to become engaged in tests, the tasks should reflect this. Games can include versions of the kind of word games to be found in comics and puzzle books.

3. Children respond well to pictures, attractive typography, and colour². Tests should include these features if possible. With computers, colour printers and inexpensive scanners generally available, there is usually no reason why they can’t be included. It goes without saying that the content of all pictures used should be unambiguous for all the children who may take the test. This might involve testers in checking that children with different cultural backgrounds are familiar with the conventions (such as the different kinds of bubble for speech and for thought) that are used in the test pictures. Pictures may be included even where they are not necessary to complete a task.

4. First language and cognitive abilities are still developing. Tasks should be ones that the children being tested could be expected to handle comfortably in their own language.

5. Since children learn through social interaction, it is appropriate to include tasks that involve interaction between two or more children. This assumes, of course, that similar tasks are used when they are learning the language.

6. If teaching and learning involve tasks which are ‘integrated’ (in the sense that two or more skills are involved in their completion), similar tasks may have a place in tests. However, these are not so suitable where diagnostic information about separate skills is being sought.

The self-assessment form referred to above (Hasselgren 1999):

EPISODE 1 Try to answer these questions. Put crosses.

Did you …                                yes   mostly   so-so   not really   no
understand what to do?
understand the texts?
have enough time?
do the tasks well?
like the tasks?
manage to guess what new words meant?

Were any texts difficult to understand?
no   yes (write the numbers)

What have you learnt?

One final recommendation is that every effort be made to create the conditions that allow the children to perform at their best. This means, I think, that they should be tested by sympathetic teachers whom they know and in surroundings with which they are familiar. It is particularly important with children to make sure at the outset that they understand what they have to do. It is also important to include easy tasks at the beginning of a test in order to give them the confidence to tackle the more difficult ones.

Recommended techniques³

In what follows I have concentrated on techniques that seem particularly suited to young learners. This does not mean that techniques presented in previous chapters will never be appropriate. The older the children are, the more likely they are to respond well to techniques used with teenagers or adults. Whatever techniques are used with young learners, it is essential that the children have plenty of opportunities to practise with them before they meet them in tests. Ideally, the techniques should be used in learning exercises as well as in testing.



Techniques to test listening

Placing objects or identifying people

The children see a picture with objects placed outside its frame. They have to draw lines to show where the objects are to be placed.

The children hear:

A: Look at the picture. Listen to the example. Listen and look⁴.
B: Put the book on top of the bookcase.
C: Pardon?
B: Put the book on top of the bookcase.
A: This is an example. Can you see the line? Now you listen and draw a line.
B: Put the socks on the bed.
C: Where?
B: Put the socks on the bed.



An alternative is to have a drawing of children involved in a variety of activities. Outside the picture are the names of a number of children. The children hear something like:

A: I’m looking for Mary.
B: Mary? She’s painting a picture over there.
A: Is that her in the corner?
B: Yes.

They have to draw a line from the word ‘Mary’ to the representation of her in the picture.

Multiple choice pictures

The children see four pictures, under each of which there is an empty box. They have to tick the box beneath the appropriate picture. For example there may be pictures of four fruits. What the children hear may be as simple as:

It’s an apple.

Or it could be a brief dialogue:

A: Did you say Harry was eating an orange?
B: No, it was an apple.



Colour and draw on existing line drawing

The following example is taken from a Cambridge Young Learners sample paper. The children see:

Listen and colour and draw. There is one example.



They hear:

A: Look at the fish under the ducks.
B: I can see it. Can I colour it?
A: Yes, colour it red.
B: The fish under the ducks – colour it red.

A: Now draw a fish.
B: Where?
A: Draw a fish between the boat and the box.
B: OK. Between the boat and the box.
A: And colour it green. Colour the fish green.

Information transfer

This will usually involve some simple reading and writing. For example, there may be a chart:

Name: John Thomson

John’s best friend: ……………………………………………………………………..

Sports: football and …………………………………………………………………….

Where John plays football: at ………………………………………………………

How many goals he scored last week: ………………………………………….

The children may hear an interview with John (or a talk by or about John), in which they can find the information they need to complete the chart. The interview or talk should include sufficient redundancy and include pauses during which answers can be put in the chart. It may be appropriate for the interview or talk to be repeated.



Techniques to test reading

Multiple choice

Reading can be tested by multiple choice in the usual way or, probably better when possible, with pictures. The following example of the latter is taken from EVA materials⁵.

Find the suspect

The newspaper article tells us about three men who were seen near the circus last night. They are ‘suspects’.

Three of the pictures here show the suspects. Try to find them.

Put a ‘1’ under the first man described in the article, a ‘2’ under the second and a ‘3’ under the third.

Baby elephant stolen

Three young men were seen late last night near the circus. The first was a bald-headed young man, who was seen around ten o’clock. The man was wearing a long grey raincoat, and gave an angry look to a lady as she went past him. The second was a blond-haired man, wearing a striped scarf. He was seen cycling away from the circus at midnight. The third was a tall young man with long dark hair, wearing a leather jacket. He was seen around one o’clock, pushing a wheelbarrow. The police would like to speak to these three men.



Multiple choice can also be used in the context of a conversation, interview, or discussion (transcribed as if in, say, a school magazine). The children have to choose the most appropriate response to something that is said. Thus, for example:

The pop star Billy Wild returns to his old school and is asked questions by members of a class.

Mary: Whose lessons did you like best when you were here, Billy?

Billy Wild:
a. Mr Brown’s
b. Football
c. History
d. History and geography

And so on.

Simple definitions can be made the basis of multiple choice items in which the chances of correct responses being made by guessing are much reduced. There may, for example, be ten definitions and a set of fifteen words (which include ten to which the definitions apply). The children have to identify the correct word and copy it alongside its definition.

The definitions do not have to be formal. For instance, wood may be defined ‘Doors are often made of this’. Provided that the presentation of such items is attractive (the words may be different colours, for example, and dotted about the page), such items need not be as grim as they may first sound.

Before leaving the testing of reading, it is worth saying that the short answer technique (Chapter 11) can also be used successfully, provided that the words for the correct responses can be found in the text.

Techniques to test writing

Anagram with picture

To test vocabulary and spelling, children can be presented with a ‘puzzle’. There is a series of pictures, and opposite each picture is an anagram of the word the picture represents.

__ __ __ __ __ __ __ __ letters s e t r s o r u

__ __ __ __ __ letters o s h r e

Cartoon story

A series of cartoons tell a simple story.

The instructions are:

Look at the pictures. See what happens. The girl in the pictures is called Sally. Sally writes a letter to her friend David. She tells him what happened.

Here is her letter. Write what she says to David.

Dear David
________________________________________
________________________________________
________________________________________
Best wishes
Sally

Gap filling with pictures

This technique may test reading as much as writing. A passage (perhaps a story) is presented in which there are blanks where words are missing. Above each blank there is a pictorial representation of the missing word.

I live in a small ______ by the sea. Every day I go for a swim. One day, when I came back after a swim I saw a big ______ . In its mouth was a big ______ .

. . . and so on. Drawings need not be restricted to objects but can also represent actions.



Techniques for testing oral ability

The same general advice for oral testing given in Chapter 10 applies equally to the testing of young learners. What is worth emphasising, perhaps, is the need for a long enough warm-up period for the children to become relaxed. In the case of the youngest children, it may be helpful to introduce toys and dolls from the outset.

Useful techniques include:

• Asking straightforward questions about the child and their family.

• Giving the child a card with a scene on it (a ‘scene card’), and then asking them to point out people, say what colour something is, what someone is doing, etc.

• Giving the child small cards, each with an object drawn on it, and asking the child to place each of these ‘object cards’ in a particular location on a larger scene card. For example, the child may be handed a small card with a picture of a cup on it and be asked to put the cup on the table (which appears on the scene card).

• Giving the child two pictures that are very similar but which differ in obvious ways (for example, one picture might contain a house with three windows and a red door, with a man in the garden; while the other might have a house with four windows, a green door and a woman in the garden). The child is asked to say what the differences are.

• The child is given a short series of pictures that tell a story. The tester begins the story and asks the child to complete it.

• Sets of pictures are presented. In each set there is one picture which does not ‘belong’. There may, for example, be three pictures of articles of clothing and one of a bed. The child is asked to identify the odd one out and explain why it is different from the others.

Where we want to see how well children can interact with their peers, useful techniques are:

• If the two children belong to the same class, each can say a specified number of things about another classmate, at the end of which the other child has to guess who is being described.

• There are four different picture postcards. Each child is given three of them, such that they have two cards in common and one which is different. By asking and answering questions in turn, they have to discover which pictures they have in common. All the pictures should have some common features, or the task may end too quickly without much language being used.

• There are two pictures (A and B) which are different but which contain a number of objects that are identical. One child is given picture A, the other picture B. The first child has to describe an object in their picture and the other has to say whether it is to be found in their picture. The second child then describes something in their picture, and the other responds. This continues until they have found a specified number of objects which are in both pictures.

• The children can each be given a card with information on it. In both cases the information is incomplete. The task is for them to ask questions of each other so that they end up with all the information. Examples would be diaries with missing appointments, or timetables with missing classes.

Reader activities

Look at the following activities taken from Primary Colours (Hicks and Littlejohn, 2002). These were not originally devised as testing tasks.

What changes, if any, would you make to them in order to create test tasks that will be reliable and valid?



Further reading

Cameron (2001) is a book on teaching language to young learners, which has a chapter on assessment. Rea-Dickins and Rixon (1997) discuss the assessment of young learners of English as a foreign language. Carpenter et al. (1995) describe an oral interview procedure for assessing Japanese as a second language. Language Testing Volume 17 Number 2 (2000) is a special issue on assessing young language learners. Contributions include a general introduction to the area by Rea-Dickins; an account of how foreign language attainment is assessed at the end of primary education in the Netherlands by Edelenbos and Vinjé; a discussion of teacher assessment in relation to psychometric theory by Teasdale and Leung; and a description of the Norwegian materials project (referred to in the chapter) by Hasselgren. A handbook and sample papers for the Cambridge tests for young learners can be obtained from the address given on page 73.

1. Attractive as they might seem for young children, true/false and yes/no items, for example, are no more valid or reliable for them than they are for adults.

2. Unfortunately, it was not possible to include colour in this book.

3. I have drawn extensively on the techniques used by Hasselgren in the Norwegian EVA project, and on those to be found in the Cambridge Young Learners tests. Of course children aged five are quite different from children aged twelve, and so not all of the techniques given here will be equally appropriate for young learners throughout this age range.

4. There should always be an example item. To save space, however, these will be omitted for subsequent techniques.

5. I recognise that this example is marginal – between multiple choice and (very) short answer.
