
Vol 4, No. 2 (December 1986)


Evaluating A Computer-Adaptive ESL Placement Test

Harold Madsen
Brigham Young University

Abstract:
The purpose of this article is to evaluate one of the first operational computerized-adaptive ESL tests in the United States. Utilizing an item bank of 300 grammar and reading items, the CALT exam was administered to 72 foreign students and later to 42 incoming foreign students; subjects also responded to a questionnaire. Results showed an overwhelmingly positive reaction to the computerized tests, but a significant negative reaction by most Japanese students. In addition, the CALT was far more efficient than conventional paper and pencil tests.


KEYWORDS: anxiety, CALT, computer adaptive, computerized adaptive, efficiency, item bank, Rasch analysis

The purpose of this paper is to evaluate one of the first operational computerized-adaptive ESL tests in the United States. While a brief description of the examination and its development will be provided, the primary objective will be to assess its impact on students and to compare the test with comparable paper and pencil ESL tests.

Simply computerizing an exam is of course no panacea as far as language testing is concerned (Larson and Madsen 1985). But adaptive tests do hold out the promise of helping to meet two concerns of psychometricians—the need for precision and the desirability of invariant scaling. For a survey of the several advantages and limitations of computerized adaptive tests, see Tung (1986) and Canale (1986). Also, for an in-depth explanation of such tests, see Weiss (1983).

Development of the Computerized-Adaptive Test

A computerized-adaptive language test (CALT) was field tested in December 1985 at Brigham Young University. Covering the grammar and reading objectified segment of the English Language Center Placement Battery, this CALT instrument complements three other sections of the ELC placement battery—the oral interview, the writing section, and the listening comprehension section. (The latter was field tested in a CALT version on August 29, 1986.)

Preparation of the CALT required the development of a carefully calibrated bank of test items for each of the two subtests. This was accomplished by Rasch analyzing the results of paper and pencil placement tests, utilizing the Mediax Microscale program on a Supercalc spreadsheet. Items administered over a three-year period were then linked on a common scale.
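The calibration just described rests on the one-parameter (Rasch) logistic model. The formula below is the standard statement of that model from the IRT literature rather than anything reproduced from the BYU materials: the probability that examinee i answers item j correctly depends only on the difference between the examinee's ability and the item's difficulty.

    P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{e^{\theta_i - b_j}}{1 + e^{\theta_i - b_j}}

Calibration estimates a difficulty b_j for each item from the paper-and-pencil response data; linking the administrations from the three-year period then places all of the items on a single common scale.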

The initial item bank (now expanded) consisted of 300 items (180 grammar and 120 reading). With items sorted sequentially from easiest to most difficult, the Smith-Larson mechanical-sort algorithm was utilized to tailor the test for each student. Students were introduced to the test in the mid range of difficulty, moved rapidly in five probes to the appropriate difficulty level, and then terminated when their ability level had been determined. (For a detailed description of the test and its development, see Madsen and Murray 1986.)
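The Smith-Larson mechanical-sort algorithm itself is not spelled out in this paper, so the following sketch (in Python, purely for compactness) illustrates only the generic adaptive logic just described: begin in the mid range of a difficulty-sorted bank, probe toward the examinee's level in large jumps, and stop once the responses suggest the ability level has been bracketed. The function name, the probe-halving rule, and the stopping rule are assumptions made for illustration, not the BYU implementation.

    # Illustrative adaptive-testing loop over a difficulty-sorted item bank.
    # This is NOT the Smith-Larson mechanical-sort algorithm (not specified in
    # the paper); the probe and stopping rules below are assumptions.

    def administer_adaptive_test(items, ask, min_items=5, max_items=31):
        """items: list of (item_id, difficulty) pairs sorted easiest to hardest.
        ask: callback that presents one item and returns True if answered correctly."""
        n = len(items)
        index = n // 2                 # start in the mid range of difficulty
        step = max(1, n // 4)          # large initial probes
        administered, responses = [], []

        while len(administered) < max_items:
            item_id, _difficulty = items[index]
            correct = ask(item_id)
            administered.append(item_id)
            responses.append(correct)

            # Probe harder after a success, easier after a failure, halving the
            # jump each time so the test homes in on the student's level.
            index += step if correct else -step
            index = max(0, min(n - 1, index))
            step = max(1, step // 2)

            # Stop once enough items have been given and recent responses are
            # mixed, i.e. the student is answering near the limit of his or her ability.
            if len(responses) >= min_items and len(set(responses[-4:])) == 2:
                break

        return administered, responses

Here max_items is set to 31 only because that was the largest number of items any student needed in the January administration (Table 2); the actual termination criterion used at BYU is not documented here.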

Evaluation of the Computerized Adaptive ESL Test

The December 1985 field test was administered to 72 foreign students as a Promotion Test. Thirty-seven computers in three locations were utilized. This administration was not formally evaluated, however, since all those taking the test had received computer-assisted instruction at BYU. Students were very relaxed and the computer-adaptive test was accepted as a pleasant novelty. Two or three students asked (and received) permission to take photographs of the event.

It was decided that a more legitimate occasion for evaluating the test would be an administration to incoming students who had not had previous computer instruction or computer testing. This took place in January 1986 at the BYU Continuing Education computer lab.

1. Subjects

The 42 foreign students involved in the January CALT administration were newly arrived in the United States: 17 from Japan, 17 from Spanish-speaking countries, and 8 from other language backgrounds—3 Portuguese, 2 Chinese, 1 Vietnamese, 1 Indonesian, and 1 Thai (Table 1). There were 17 males and 25 females, their average age being 23 (ranging from 17 to 38). While some of the students had had a little experience with computers, none had received academic instruction on a computer.


2. Instruments

All students were administered a computerized-adaptive grammar test and a computerized adaptive reading test. The grammar test consisted of multiple-choice items with an incomplete sentence stem and four options. Students selected the option that best completed the sentence. The reading test consisted of multiple-choice sentence paraphrase items. Students selected one of three options as the best paraphrase of the sentence stem. Instructions and examples preceded each subtest. Students selected the best response and keyed this into the computer.
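As a concrete illustration of the two item types just described, the following sketch shows one way such items might be represented in software. The sentences, options, and difficulty values are invented for illustration and are not taken from the BYU item bank.

    # Hypothetical representation of the two CALT item types described above.
    # All content below is invented for illustration only.
    from dataclasses import dataclass

    @dataclass
    class MultipleChoiceItem:
        stem: str            # incomplete sentence (grammar) or full sentence (reading)
        options: list        # four options for grammar items, three for reading items
        answer_key: str      # letter the student keys in: A, B, C, or D
        difficulty: float    # Rasch difficulty assigned during calibration

    grammar_item = MultipleChoiceItem(
        stem="She ___ in Provo since 1984.",
        options=["A. lives", "B. has lived", "C. is living", "D. lived"],
        answer_key="B",
        difficulty=-0.4,
    )

    reading_item = MultipleChoiceItem(
        stem="Despite the storm, the flight left on time.",
        options=[
            "A. The storm delayed the flight.",
            "B. The flight was cancelled because of the storm.",
            "C. The flight departed on schedule even though there was a storm.",
        ],
        answer_key="C",
        difficulty=0.7,
    )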

After exiting from the two subtests, students completed a nine-item CALT questionnaire. Seven five-level Likert items covered the following topics: 1) previous computer experience, 2) emotive reactions prior to the test, 3) emotive reactions following the test, 4) relative difficulty of the two CALT subtests (grammar and reading), 5) clarity of instructions, 6) evaluation of the length of the test, 7) clarity of the computer screen. In addition, two open-ended items invited written responses on the following: 8) Tell what you did NOT like about the computer tests (if anything), and 9) Tell what you liked the MOST about the computer tests (if anything).

3. Procedure

In advance of the CALT administration, color tabs were placed on certain keys of the 15 IBM PC's used in this evaluation: on the answer keys (A, B, C, D) and on "return," as well as on the left and right arrow keys, which allowed students to change a response before pressing "return." In addition, a bilingual Spanish speaker and a bilingual Japanese speaker were present to offer explanations and answer questions. The first students were given a group demonstration of CALT testing procedures, but this was discontinued after the first group when it became clear that students could cope with the computer instructions on the monitor. Students proceeded at individual rates through the instructions on the computer, the grammar test, and then the reading test. Following the reading test, students exited from the computer and completed the questionnaire. Those who desired to do so were allowed to complete the open-ended questions in their native language.

After the test administration, results were printed out. This provided not only information needed for item analysis and student placement but also the amount of time students spent on each of the two subtests.


4. Results and Discussion

CALT printouts of student performance provided not only placement results and demographic information for each individual but also the item responses and the time spent on each subtest. These records permit comparisons with the conventional tests, both in the number of items attempted and in the amount of time spent.

First, consider the number of items attempted, by subtest. On the conventional paper-and-pencil subtest, there are 60 grammar items, but the average number of CALT grammar items needed to place a student was 20.4 (Table 2). Two-thirds of the 42 students attempted from 14 to 27 items. Substantially more than 80 percent of the students required fewer than 50 percent of the items normally found on the grammar test, and the average student required only one-third as many items to complete the CALT as to complete the conventional test. The fewest items needed by any student was 9; the most was 31. In short, there was a substantial reduction in the number of items needed for placement purposes.

On the conventional low-battery reading test, 60 sentence-paraphrase items are presented; on the conventional high battery, there are 45 items (30 sentence-paraphrase and 15 passage comprehension). The CALT, we will recall, utilized only sentence-paraphrase items in this initial form of the test. The mean number of items attempted on the CALT reading subtest was 22.8. Two-thirds of the students attempted between 16 and 29 items. The fewest needed for placement was 12. As on the grammar test, over 80 percent of the students required fewer than 50 percent of the reading items normally administered on the paper-and-pencil test. The average student needed just over one-third (38 percent) of the items required on the conventional (low-battery) reading subtest.
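The percentages above follow directly from the means in Table 2 and the lengths of the conventional subtests; the short sketch below simply reproduces that arithmetic.

    # Recomputes the reductions reported above from the Table 2 means and the
    # conventional subtest lengths (60 grammar items, 60 low-battery reading items).
    conventional_items = {"grammar": 60, "reading": 60}
    calt_mean_items = {"grammar": 20.4, "reading": 22.8}

    for subtest, mean_items in calt_mean_items.items():
        share = mean_items / conventional_items[subtest]
        print(f"{subtest}: {mean_items} of {conventional_items[subtest]} items = {share:.0%}")
    # grammar: 20.4 of 60 items = 34%
    # reading: 22.8 of 60 items = 38%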

Of considerable interest was the amount of time needed to complete each CALT subtest. Over the years, students had been found to need 25 minutes to complete the paper-and-pencil grammar section. But the average time spent on the CALT was 16.9 minutes. Students ranged from a surprisingly brief 3.3 minutes to 32.7 minutes (Table 3). (No time limit was imposed.) For two-thirds of the students, the range was roughly 10 to 24 minutes.

On the conventional reading exam, students have found it challenging to finish in the 40 minutes presently allowed. On the CALT version (with no time limit imposed), the mean time spent was 27.2 minutes, with a range from just 7.3 minutes to a leisurely 53.4 minutes. Two-thirds ranged from 16 to 39 minutes.


Turning to the affect questionnaire, we note first of all several differences in terms of language background. Almost half of the Japanese had had some training or much experience on computers, while just under one-fourth of the Spanish-speaking students checked these categories. Yet a chi-square analysis disclosed that these more "computer-experienced" Japanese were significantly more nervous prior to the CALT test than were the Spanish speakers. Japanese students likewise reported more anxiety immediately following the test, as well as more negative reactions to the exam instructions, test length, and clarity of the computer monitors, than did Spanish speakers (Table 4).
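The chi-square statistics summarized in Table 4 come from contingency tables that cross language background (and, separately, computer experience and sex) with the Likert response categories. The underlying response counts are not reported in the paper, so the sketch below shows only the general form of such a test, on invented counts; it does not reproduce the actual analysis.

    # General form of the chi-square tests summarized in Table 4.
    # The observed counts below are invented for illustration; the paper does
    # not report the underlying contingency tables.
    from scipy.stats import chi2_contingency

    # Rows: Japanese, Spanish speakers.
    # Columns: "more nervous," "about the same," "less nervous" before the exam.
    observed = [
        [10, 4, 3],   # Japanese (hypothetical counts)
        [3, 5, 9],    # Spanish (hypothetical counts)
    ]

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")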

While 82 percent of the Japanese students indicated the exam took longer than they had expected, only 29 percent of the Spanish examinees agreed with this observation. (Among these two language groups, 6 Japanese speakers but only 1 Spanish speaker thought it took "much more time" than expected.) Responding to the question "How well could you read the test questions on the computer screen?" 53 percent of the Spanish speakers said they could read it better than they could printed tests, but only 12 percent of the Japanese agreed. (Among these students, 5 Spanish speakers indicated they could read the test "much better" than they could usual tests, but no Japanese speakers agreed with this observation.) Conversely, 65 percent of the Japanese speakers found the computer monitor more difficult to read than they did conventional printed tests, but only 18 percent of the Spanish speakers encountered this same difficulty. Finally, 71 percent of the Spanish speakers reported that they had been happy with or interested in the prospect of being tested via computer, but no Japanese shared this same enthusiasm for computer testing.

While there were dramatic differences in responses by language group, there was only one significant relationship between sex of student and reaction to the affect questions, namely a concern on the part of women about exam length. And this is attributable to language background, since Japanese women constituted 44 percent (15 of 34) of the combined Spanish-Japanese sample (Table 1).

Examining the responses of all 42 students (Spanish, Japanese, and those of five other language backgrounds), we find a generally positive reaction to the computerized adaptive test, although some students from every language group expressed reservations about this innovative testing procedure.

When asked, "How did you feel when you were told that you would have your English tested on a computer?" approximately one-fifth indicated they felt about the same as when about to take a conventional English test. A third felt more nervous (though only 16 percent of the non-Japanese indicated such trepidation). And 45 percent were less nervous (two-thirds of the non-Japanese being less anxiety prone). Immediately after they had completed the test, students' positive reactions to computerized evaluation were even more pronounced. An impressive 81 percent of the combined language groups felt the experience was positive—better than that associated with paper-and-pencil tests.

While a solid majority (70 percent) felt the instructions were as clear as or clearer than those on traditional language exams, 31 percent (16 percent of non-Japanese) found them less clear.

Since the test took significantly less time to complete than the conventional one had, it was surprising to find that 50 percent of the students (28 percent of the non-Japanese) said the computer test took longer than they had anticipated. Negative reactions to the monitor were likewise unanticipated. Responding to the question, "How well could you read the test questions on the computer screen?" 38 percent of all the students (20 percent of the non-Japanese) indicated they had more difficulty reading questions on the monitor than they did the questions on printed tests.

Open-ended responses also dealt with the problem of perceiving the letters and words on the monitor. In fact, this was the most frequent criticism: Fourteen students explained the difficulty they had in reading the monitors (10 Japanese, 2 Chinese, 1 Spanish, 1 Portuguese); only one person volunteered a statement indicating that the screen was clear and easy to read.

Three students criticized their not being able to return and check their answers. This is not an inherent CALT difficulty, however; an experimental CALT TOEFL, presently in a preliminary evaluation phase, permits students to review and alter their responses. One person commented that he was not able to demonstrate his writing ability, though writing does constitute part of the non-CALT placement battery for these students. Another was understandably put off by a program bug that caused her test to "crash" and necessitated her starting over.

Positive comments referred to the ease of taking the computer test, the speed of the computer test, and the relative absence of stress during the exam. Person after person commented to the effect that "it's simple; you just press buttons." Others noted that the computer tests "saved time." And a Thai student said that taking the test was almost like "playing with a toy."

Widespread interest in computers seemed to contribute to the positive affect: Several spoke of the test's being "practical" or interesting, attributing their response to this new and useful exposure to the computer. Although not unanimously positive (a Japanese student lamented "I don't like the tests of computer"), many seemed genuinely enthusiastic about the computer test: "a very good system," "I like the experience in the computer," "a good way to test us," "everything is very interesting," and "I like all about the computer tests."

Finally, a few observations from the perspective of the CALT test administrator: First, the problem of cheating was essentially eliminated. An occasional student would lean back in his seat for a view of an adjacent monitor and then naively report to proctors that his neighbor had different questions than he did. Second, the CALT version eliminated scoring fees and scoring delays. Third, preliminary responses from teachers and supervisors indicate that the test facilitated accurate placement.

But the administration was not entirely trouble free. An unanticipated difficulty arose with the use of borrowed computers. While all the PC's tested out satisfactorily in advance, some failed to work properly during the actual test administration; as a result, a few students had to start over on another machine. Moreover, while most students took less time to complete the CALT than they did conventional tests, there were those who took an inordinate amount of time on this untimed instrument, notably on the reading segment. A time limit will need to be incorporated into the revised CALT version.

5. Conclusions

Initial evaluation of the CALT administration indicates that rapid, accurate placement is facilitated, together with substantially reduced testing time for most students. In addition, reactions to the computerized format were quite positive, with over 80 percent of the examinees rating their CALT experience as more positive than their experience with conventional ESL tests. But the reservations of a minority of students argue either that the program be further refined or that a non-computer testing option be made available for those desiring to take a standard paper-and-pencil test.

Subsequent research will attempt to verify earlier findings (McBride and Martin 1983, 230-232) that high reliability characterizes the computerized adaptive language test, despite the substantial reduction in the number of items presented.

In summary, microcomputer programs utilizing the one-parameter Rasch model hold out the promise of providing accurate adaptive or tailored tests that are less stressful for most students as well as more efficient instruments of language assessment.


References

Canale, Michael. 1986. "The Promise and Threat of Computerized Adaptive Assessment of Reading Comprehension." In Technology and Language Testing, Charles W. Stansfield (Ed.), 29-44. Washington, D.C.: TESOL.

Larson, Jerry W. and Harold S. Madsen. 1985. "Computerized Adaptive Language Testing: Moving Beyond Computer-Assisted Testing." CALICO Journal 3 (2): 32-36, 43.

Madsen, Harold S. and Norma Murray. 1986. "Implementing Computerized Tailored Language Tests in an ESL Center." Paper presented at the Twentieth Annual TESOL Convention, Anaheim, California, March 4, 1986.

McBride, James R. and John T. Martin. 1983. "Reliability and Validity of Adaptive Ability Tests in a Military Setting." In New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing, David J. Weiss (Ed.), 223-236. New York: Academic Press.

Tung, Peter. 1986. "Computerized Adaptive Testing: Implications for Language Test Developers." In Technology and Language Testing, Charles W. Stansfield (Ed.), 11-28. Washington, D.C.: TESOL.

Weiss, David J. (Ed.) 1983. New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing. New York: Academic Press.

Biodata

Author of the text Techniques in Testing, Professor Madsen has done extensive consulting on language testing. His most recent research interest is in IRT testing applications, ranging from computerized-adaptive exams to detection of item bias.

Author's Address

Dr. Harold Madsen

2129 JKHB

Brigham Young University

Provo, UT 84602


Demographic Information on 42 Subjects

                 Language Background
Sex        Spanish   Japanese   Other   (total)
Male            10          2       5        17
Female           7         15       3        25
(total)         17         17       8        42

Table 1

Comparative Number of Items Attempted on CALT and Conventional ESL Test

Subtest    CALT                              Conventional Test
Grammar    mean 20.4; SD 6.8; range 9-31     60 items
Reading    mean 22.8; SD 6.6; range 12-31    60 items (low battery)
                                             45 items (high battery)

Table 2

Comparative Time Spent on CALT and Conventional ESL Test

Subtest    CALT                                            Conventional Test
Grammar    mean 16.91 min.; SD 7.27; range 3.28-32.73      time limit 25 min.
Reading    mean 27.23 min.; SD 11.29; range 7.3-53.43      time limit 40 min.

Table 3


Questionnaire Relationships Involving Computer Experience, Language Background (Spanish and Japanese), and Sex

Independent Variable   Dependent Variable             Chi Square   df   Sig.
Experience             Feelings before exam                 .89     2    NS
                       Feelings after exam                  .77     2    NS
                       Reactions to instructions            .08     1    NS
Language               Feelings before exam               12.98     2    .01
                       Feelings after exam                 5.96     2    NS+
                       Reactions to instructions           4.64     1    .05
                       Reactions to test length           10.05     2    .01
                       Reactions to computer screen        9.14     2    .02
Sex                    Feelings before exam                 .70     2    NS
                       Feelings after exam                 1.29     2    NS
                       Reactions to instructions            .03     1    NS
                       Reactions to test length            7.63     2    .05
                       Reactions to computer screen        2.87     2    NS

Table 4
