Ilse Van Achter,
Ann Van Walle,
Kurt Van Deun,
and Jozef Colpaert
University of Antwerp, Belgium
By order of the Civil Service Commission of Belgium, Didascalia, a research group at the University of Antwerp, has developed an adaptive system for evaluating second language proficiency. To this end, we investigated to what extent language proficiency can be measured by computer. We created an adaptive model based on two principles. First, we ranked the test items on a difficulty scale by weighting and calibrating their distinctive features. Second, the test places the examinees on a proficiency scale; for this purpose we developed a model based on probability and information theory. The first module of the ATLAS test is currently in use and assesses approximately 15,000 civil servants a year for selection and promotion.
By order of the Belgian Civil Service Commission, Didascalia has developed a system to test second language proficiency by computer: the ATLAS test, which stands for Adaptive Testing of Linguistic Achievement Significance.
Because of the bilingual situation in Belgium, civil servants working in Brussels (or in municipalities with linguistic facilities) have to prove active proficiency in the second national language (either Dutch or French). Counting the language tests alone, the Civil Service Commission tests over 15,000 candidates a year.
The system therefore had to meet the essential conditions of fast, objective and secure testing. Moreover, it had to accommodate a rather heterogeneous public.
ATLAS: A COMPUTERIZED TEST
A computer can only test the part of language competence that can be described by objective rules. As defined by Lyle Bachman, language competence contains the following aspects, from bottom to top (Bachman, 1990):
• Grammatical competence, which contains phonology and graphology, vocabulary, morphology and syntax;
• Textual competence, which contains rhetorical organization, i.e. the ability to structure sentences into a text;
• Illocutionary competence, which contains the language functions, i.e. the intention of the speaker;
• Sociolinguistic competence, which contains register and style, i.e. the sensitivity to adapt language to context and situation.
Computerized testing needs a bottom-up approach, because the higher we go, the less objective and explicit the contents and rules become. ATLAS therefore focuses mainly on grammatical and textual competence. The ATLAS test can be considered a battery of tests, because it consists of six modules (Figure 1).
ATLAS: AN ADAPTIVE TEST
In order to test more than 15,000 candidates a year efficiently, the test has to meet some essential conditions.
• First of all, the test has to proceed quickly, both in taking it and in correcting it. The program therefore avoids irrelevant, time-consuming questions by determining the candidate's level immediately and concentrating on relevant items.
• The test can be used by a heterogeneous public, because the database contains items at 100 levels, including the specific terminology of certain professions (such as legal terminology and police and medical vocabulary).
• Last but not least, the test has to be secure, i.e. test items must not be passed on from one candidate to another.
To meet these criteria, we developed an adaptive test that screens the candidate's actual knowledge rather than the gaps in that knowledge. We call this kind of test a tailored test.
An adaptive test adapts its difficulty to the level of the testee, which presupposes (a) a principle for determining the difficulty degree of each item and (b) a way of determining the proficiency level of the testee (the psychometric model).
The following explanation focuses on the first module, the vocabulary test.
Difficulty degree of the test items
Our method for placing the items on the difficulty scale is based on the specific features of the items, which form the parameters of the calculation. The process contains the following stages:
• First, we had to define the features of the lexical items. We distinguish formal features (orthographic/phonetic appearance, vowel and consonant clusters, transparency) and linguistic features (semantic field, contrastive linguistic features, semanticity, morphological differences, frequency).
• Whereas the formal features are automatically determined, the linguistic features need to be coded manually. This is what we call formatting the database; each item is preceded by a set of codes. The final database contains 8700 items. (See Figure 2.)
• In the calibration stage, the codes within each parameter received a value from 0 to 100 according to their degree of difficulty. For example, within the frequency parameter, the most frequent words are the easiest and receive a low calibration value.
• In the scaling stage, we added up all the codes of a word to calculate its degree of difficulty. Whereas in the previous stage all codes were set on the same scale from 0 to 100, they are now weighted according to the importance of the parameter: the more important a parameter, the higher its multiplication factor. For example, frequency and transparency are more important than semantic field and semanticity.
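The scaling stage amounts to a weighted sum of calibrated codes. The Python sketch below illustrates it for four of the parameters named above. The weight values, the dictionary keys, and the choice to make the weights sum to 1 (so that the result stays on the 0 to 100 scale) are our assumptions; only the ordering, with frequency and transparency above semantic field and semanticity, comes from the text.

```python
# Hypothetical weights: the article only states that frequency and
# transparency matter more than semantic field and semanticity.
# The real database uses more parameters than these four.
WEIGHTS = {
    "frequency": 0.30,
    "transparency": 0.30,
    "semantic_field": 0.20,
    "semanticity": 0.20,
}


def item_difficulty(codes: dict[str, float]) -> float:
    """Combine calibrated codes (each 0-100) into one difficulty value.

    Because the assumed weights sum to 1, the result also lies in 0-100.
    """
    for name in WEIGHTS:
        value = codes[name]  # KeyError if a parameter is missing
        if not 0 <= value <= 100:
            raise ValueError(f"{name} code {value} is outside 0-100")
    return sum(weight * codes[name] for name, weight in WEIGHTS.items())


# A frequent (low frequency code), fairly transparent word.
example = {"frequency": 10, "transparency": 20, "semantic_field": 50, "semanticity": 40}
print(item_difficulty(example))
```

Keeping every parameter on its own 0 to 100 scale until this step means the weights can be tuned without recalibrating the codes.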
Our adaptive system is flexible: item difficulty values are not fixed. During the testing phase we registered the answers and errors made by the candidates, and the initial difficulty degree of each item is continuously readjusted through analysis of these errors.
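The article does not specify how errors feed back into the difficulty values. One plausible scheme, sketched below purely as an assumption, compares an item's observed error rate with the error rate a logistic response model would predict for the candidates who saw it, and nudges the difficulty in the direction of the discrepancy. The logistic curve, its slope of 0.3 and the step size `rate` are all hypothetical.

```python
import math


def p_correct(level: float, difficulty: float, slope: float = 0.3) -> float:
    """Assumed logistic model: chance that a testee at `level` answers an item
    of `difficulty` correctly (exactly 0.5 when level equals difficulty)."""
    return 1.0 / (1.0 + math.exp(-slope * (level - difficulty)))


def readjust(difficulty: float, responses: list[tuple[float, bool]],
             rate: float = 20.0) -> float:
    """Return an updated difficulty, clamped to the 1-100 scale.

    `responses` holds (estimated_level, answered_correctly) pairs collected
    for this item during live testing.
    """
    if not responses:
        return difficulty
    n = len(responses)
    observed = sum(1 for _, ok in responses if not ok) / n
    expected = sum(1.0 - p_correct(level, difficulty) for level, _ in responses) / n
    # More errors than predicted: the item is harder than coded, so raise it.
    updated = difficulty + rate * (observed - expected)
    return min(100.0, max(1.0, updated))


# Ten candidates estimated at level 40 all failed an item coded at 40.
print(readjust(40.0, [(40.0, False)] * 10))  # prints 50.0
```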
The Psychometric model
Besides determining the degree of difficulty, the adaptive model has to determine the proficiency level of the candidate.
Three "keystones" form the foundation of our psychometric model:
• As a basic axiom, we state that there is one common scale for test items and testees, so that the difficulty degree of the items and the proficiency level of the testees coincide as follows: if testee A correctly answers an item of difficulty degree X, A belongs to proficiency level X or possibly to a higher level Y.
• Second, when measuring a latent trait such as language proficiency, which is not directly observable, one cannot be 100% certain that the test score, level X, coincides with the testee's real proficiency. Our model is therefore probabilistic. At the beginning, the probability is spread equally across the scale. During the test, we aim to bring about the greatest possible change in this distribution, so that at the end one level or a narrow range carries the highest possible probability.
• The third keystone we derive from information theory: the uncertainty value H, which stands for the number of bits of information required to be certain about something. At the beginning H has its maximum value; during the test we have to reduce it to a minimum, because we want to be as confident as possible about the testee's proficiency.
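Read in the standard Shannon sense (the article does not give the formula, so this is our interpretation), with $p_i$ the probability that the testee is at level $i$ of the 100-level scale:

```latex
H = -\sum_{i=1}^{100} p_i \log_2 p_i , \qquad
H_{\max} = \log_2 100 \approx 6.64 \text{ bits (uniform start, } p_i = 1/100), \qquad
H_{\min} = 0 \text{ (all probability on one level)}.
```

For instance, if the probability were spread evenly over four adjacent levels at the end of the test, $H$ would be $\log_2 4 = 2$ bits.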
As the test progresses, the probability becomes concentrated in a narrower area and the uncertainty H decreases.
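The three keystones can be combined into a small adaptive loop. The Python sketch below keeps a probability distribution over the 100 levels, starts it uniform, updates it with Bayes' rule after each answer, and picks the next item difficulty that minimizes the expected uncertainty H. The logistic response model and its slope are assumptions standing in for the article's unspecified probability model, and the simulated testee simply follows the basic axiom (correct whenever the item difficulty does not exceed the true level).

```python
import math

LEVELS = range(1, 101)  # one common 1-100 scale for items and testees


def p_correct(level: int, difficulty: int, slope: float = 0.3) -> float:
    """Assumed logistic response model (not taken from the article)."""
    return 1.0 / (1.0 + math.exp(-slope * (level - difficulty)))


def entropy(dist: list[float]) -> float:
    """Uncertainty H in bits."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def update(dist: list[float], difficulty: int, correct: bool) -> list[float]:
    """Bayes' rule: reweight each level by the likelihood of the observed answer."""
    post = []
    for level, p in zip(LEVELS, dist):
        like = p_correct(level, difficulty)
        post.append(p * (like if correct else 1.0 - like))
    total = sum(post)
    return [p / total for p in post]


def expected_entropy(dist: list[float], difficulty: int) -> float:
    """Average H after asking an item of this difficulty, over both outcomes."""
    p_yes = sum(p * p_correct(level, difficulty) for level, p in zip(LEVELS, dist))
    return (p_yes * entropy(update(dist, difficulty, True))
            + (1.0 - p_yes) * entropy(update(dist, difficulty, False)))


def run_test(true_level: int, n_items: int = 20) -> tuple[list[float], list[float]]:
    """Simulate an adaptive test; return the final distribution and H per step."""
    dist = [1.0 / len(LEVELS)] * len(LEVELS)  # uniform at the start
    history = [entropy(dist)]
    for _ in range(n_items):
        d = min(LEVELS, key=lambda diff: expected_entropy(dist, diff))
        correct = d <= true_level  # deterministic testee following the axiom
        dist = update(dist, d, correct)
        history.append(entropy(dist))
    return dist, history


dist, history = run_test(true_level=62)
estimate = max(LEVELS, key=lambda lvl: dist[lvl - 1])
print(f"H: {history[0]:.2f} -> {history[-1]:.2f} bits, estimated level {estimate}")
```

Choosing items by expected reduction of H is one standard way to realize the "maximal change" the second keystone calls for; always targeting the currently most probable level would be a simpler alternative.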
As a computerized adaptive test, ATLAS provides quick, objective and efficient testing, for the following reasons:
• The test suits a heterogeneous public: the program generates tests at all levels and can be used by various groups, whatever their proficiency level.
Furthermore, part of the database contains items related to the specific functions of civil servants (the so-called functional vocabulary), e.g. for police officers, doctors and nurses, and officials who use legal terminology.
• It is a representative test: the item selection covers all semantic fields.
• The test measures the examinee's actual semantic knowledge rather than detecting semantic gaps, as many traditional tests do.
• The size of the database keeps the test content from becoming public, as each level consists of hundreds of items; every candidate receives a different test.
• ATLAS measures the different aspects of grammatical and textual competence. These aspects matter: without solid knowledge of words, forms and structures, one cannot be fluent in a second language. ATLAS therefore plays an essential part by measuring these prerequisites of second language competence.
• At the same time, ATLAS does not exclude the examiner. Global language competence goes beyond knowledge of the individual elements, so a more global test form, such as an oral interview or an essay, can be added. It will be interesting to examine the correlation between scores on the global test and scores on the ATLAS testing battery.
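Security through database size depends on drawing items at random from a level's pool and never repeating one for the same candidate. The sketch below shows one way to do that; the `pools` layout (a dictionary from level to item identifiers), the fallback to neighbouring levels and the `spread` parameter are hypothetical, not a description of ATLAS internals.

```python
import random
from typing import Optional


def pick_item(pools: dict[int, list[str]], level: int, seen: set[str],
              rng: random.Random, spread: int = 2) -> Optional[str]:
    """Draw an unseen item at `level`; if that pool is exhausted, widen the
    search to levels +/-1, +/-2, ... up to `spread`. Returns None if none is left."""
    for radius in range(spread + 1):
        candidates = [item
                      for lvl in sorted({level - radius, level + radius})
                      for item in pools.get(lvl, [])
                      if item not in seen]
        if candidates:
            return rng.choice(candidates)
    return None


# Tiny hypothetical item bank: two items at level 50, one at level 51.
pools = {50: ["item-a", "item-b"], 51: ["item-c"]}
rng = random.Random()  # unseeded, so draw order differs between candidates
seen: set[str] = set()
drawn = []
for _ in range(4):
    item = pick_item(pools, 50, seen, rng)
    drawn.append(item)
    if item is not None:
        seen.add(item)
print(drawn)
```

With hundreds of items per level, as described above, two candidates at the same level are unlikely to receive the same sequence of items.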
Research related to this article is being carried out under an IUAP project funded by the Belgian State.
Bachman, L. F. (1990). Fundamental Considerations in Language Testing. Oxford University Press, Oxford.
Nancy Schonenberg, Ilse Van Achter, Ann Van Walle and Kurt Van Deun are assistants at the University of Antwerp, where they develop language courseware and language tests; Wilfried Decoo, Professor at the Department of Didactics, is the promoter of Didascalia; Jozef Colpaert is the General Manager of Didascalia.
University of Antwerp (UIA)
B-2610 Antwerp, Belgium
Phone: + 32 3 820 2959
Fax: + 32 3 820 2986