Which index is most commonly used to represent item difficulty?

What is item analysis?

Item analysis is the process of examining class-wide performance on individual test items. Three common types of item analysis provide teachers with three different kinds of information:

  • Difficulty Index - Teachers produce a difficulty index for a test item by calculating the proportion of students in the class who got the item correct. (The name of this index is counterintuitive: it actually measures how easy the item is, not how difficult.) The larger the proportion, the more students have learned the content measured by the item.
  • Discrimination Index - The discrimination index is a basic measure of the validity of an item: its ability to discriminate between students who scored high on the total test and those who scored low. Though there are several steps in its calculation, once computed this index can be interpreted as the extent to which overall knowledge of the content area, or mastery of the skills, is related to the response on an item. Perhaps the most crucial validity standard for a test item is that whether a student answers it correctly should depend on their level of knowledge or ability, not on something else such as chance or test bias.
  • Analysis of Response Options - In addition to examining the performance of an entire test item, teachers are often interested in the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which are simply taking up space and rarely chosen. Because blind guessing can produce a correct answer purely by chance (which hurts the validity of a test item), teachers want as many plausible distractors as feasible. Analyses of response options allow teachers to fine-tune and improve items they may wish to use again with future classes.

Performing item analysis 

Here are the procedures for the calculations involved in item analysis, with data for an example item. For our example, imagine a classroom of 25 students who took a test that included the item below. The asterisk indicates that B is the correct answer.

Number of Students Choosing Each Answer Option

Who wrote The Great Gatsby?
  A. Faulkner: 4
  *B. Fitzgerald: 16
  C. Hemingway: 5
  D. Steinbeck: 0

Total Number of Students: 25

Item Analysis Method: Difficulty Index
The proportion of students who got an item correct.

Procedures:
  1. Count the number of students who got the correct answer.
  2. Divide by the total number of students who took the test.
Difficulty indices range from .00 to 1.0.

Example:
  16 students answered correctly.
  16/25 = .64
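To make the arithmetic concrete, here is a minimal Python sketch of the difficulty index calculation (the function name and data layout are illustrative, not from the original):

    def difficulty_index(num_correct, num_students):
        # Proportion of the class answering the item correctly (.00 to 1.0).
        return num_correct / num_students

    # Example item: 16 of 25 students chose the correct answer.
    print(difficulty_index(16, 25))  # 0.64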

Item Analysis Method: Discrimination Index
A comparison of how overall high scorers on the whole test did on one particular item relative to overall low scorers.

Procedures:
  1. Sort the tests by total score and create two groups: the high scores, made up of the top half of tests, and the low scores, made up of the bottom half.
  2. For each group, calculate a difficulty index for the item.
  3. Subtract the difficulty index for the low-scores group from the difficulty index for the high-scores group.
Discrimination indices range from -1.0 to 1.0.

Example:
  Imagine that 10 of the 13 students in the high group and 6 of the 12 students in the low group got the item correct.
  High group: 10/13 = .77
  Low group: 6/12 = .50
  Discrimination index: .77 - .50 = .27
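The same steps can be expressed as a short Python sketch (again a hypothetical helper, assuming the class has already been split into high- and low-scoring halves):

    def discrimination_index(high_correct, high_total, low_correct, low_total):
        # Difficulty index of the high group minus that of the low group.
        return high_correct / high_total - low_correct / low_total

    # 10 of 13 high scorers and 6 of 12 low scorers got the item correct.
    print(round(discrimination_index(10, 13, 6, 12), 2))  # 0.27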

Item Analysis Method: Analysis of Response Options
A comparison of the proportion of students choosing each response option.

Procedures:
  1. For each answer option, divide the number of students who chose that option by the total number of students taking the test.

Example:
  Who wrote The Great Gatsby?
    A. Faulkner: 4/25 = .16
    *B. Fitzgerald: 16/25 = .64
    C. Hemingway: 5/25 = .20
    D. Steinbeck: 0/25 = .00
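A hypothetical Python sketch of this calculation, using the counts from the example item:

    def response_option_proportions(option_counts, num_students):
        # Proportion of the class choosing each answer option.
        return {option: n / num_students for option, n in option_counts.items()}

    counts = {"A": 4, "B": 16, "C": 5, "D": 0}
    for option, p in response_option_proportions(counts, 25).items():
        print(option, p)  # A 0.16, B 0.64, C 0.2, D 0.0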

Interpreting the results of item analysis 

In our example, the item had a difficulty index of .64, meaning that 64 percent of students knew the answer. If a teacher believes that .64 is too low, they can change the way they teach to better meet the objective represented by the item. Another interpretation might be that the item was too difficult, confusing, or otherwise invalid, in which case the teacher can replace or modify the item, perhaps using information from the item's discrimination index or analysis of response options.

The discrimination index for the item was .27. The formula for the discrimination index is such that if more students in the high-scoring group chose the correct answer than did students in the low-scoring group, the number will be positive. At a minimum, then, one would hope for a positive value, as that would indicate that knowledge resulted in the correct answer. The greater the positive value (the closer it is to 1.0), the stronger the relationship between overall test performance and performance on that item. If the discrimination index is negative, that means that for some reason students who scored low on the test were more likely to get the answer correct. This is a strange situation that suggests poor validity for the item.

The analysis of response options shows that students who missed the item were about equally likely to choose answer A and answer C. No students chose answer D, so option D does not act as a distractor. Students are not really choosing among four answer options on this item; they are choosing among only three, because they are not even considering answer D. This makes guessing correctly more likely, which hurts the validity of the item.
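These interpretations can be folded into a simple screening routine. The sketch below is a hypothetical illustration, not a standard tool: the .30 discrimination cutoff echoes the rule of thumb quoted at the end of this page, the difficulty target stands in for whatever standard a teacher sets, and an unchosen distractor is flagged as non-functional.

    def flag_item(difficulty, discrimination, option_counts, correct_option,
                  difficulty_target=0.70):
        # Collect warnings about a test item based on the rules of thumb above.
        # difficulty_target is a hypothetical teacher-chosen standard.
        warnings = []
        if difficulty < difficulty_target:
            warnings.append("difficulty index below the teacher's target")
        if discrimination < 0:
            warnings.append("negative discrimination: low scorers did better")
        elif discrimination < 0.30:
            warnings.append("weak discrimination (below the .30 rule of thumb)")
        for option, n in option_counts.items():
            if option != correct_option and n == 0:
                warnings.append("distractor " + option + " is non-functional")
        return warnings

    print(flag_item(0.64, 0.27, {"A": 4, "B": 16, "C": 5, "D": 0}, "B"))

For the example item this prints warnings about the .27 discrimination index and the unused distractor D (and, with a .70 target, the .64 difficulty index), matching the discussion above.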

How can the use of item analysis benefit your students, including those with special needs? 

The fairest tests for all students are tests that are valid and reliable. To improve the quality of tests, item analysis can identify items that are too difficult (or too easy, if a teacher has that concern), that fail to differentiate between students who have learned the content and those who have not, or that have implausible distractors.

If items are too hard, teachers can adjust the way they teach. Teachers can even decide that the material was not taught and, for the sake of fairness, remove the item from the current test and recompute scores.

If items have low or negative discrimination values, teachers can remove them from the current test, recompute scores, and remove them from the pool of items for future tests. A teacher can also examine the item, try to identify what was tricky about it, and either change the item or modify instruction to correct a misunderstanding about the content.

When distractors are identified as non-functional, teachers may tinker with the item and create a new distractor. One goal for a valid and reliable classroom test is to decrease the chance that random guessing results in credit for a correct answer. The greater the number of plausible distractors, the more accurate, valid, and reliable the test typically becomes.


What is the item difficulty index?

For items with one correct alternative worth a single point, item difficulty is simply the percentage of students who answer the item correctly. In this case, it is also equal to the item mean. Expressed as a percentage, the item difficulty index ranges from 0 to 100; the higher the value, the easier the question.

How do you find the difficulty index?

Determine the difficulty index by dividing the number of students who got the item correct by the total number of students. For Question #1, this would be 8/10, or p = .80.

What is meant by a difficulty index of 1 for a test item?

The item difficulty index can range between 0.0 and 1.0, with a higher value indicating that a greater proportion of examinees responded to the item correctly, making it an easier item; a difficulty index of 1.0 means that every examinee answered the item correctly. For criterion-referenced tests (CRTs), with their emphasis on mastery testing, many items on an exam form will have high p-values.

What is a good item discrimination index?

A discrimination index closer to 1.0 is optimal; a value of 0.3 or greater is generally considered highly discriminating.