Differential item functioning (DIF) occurs when individuals with the same true latent ability or psychological trait level, but from different demographic populations, have different probabilities of endorsing an item category.
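To make the definition concrete, the following is a minimal sketch that assumes a binary item under the two-parameter logistic (2PL) model, with uniform DIF represented as a shift in the focal group's difficulty. This parametrization is an illustrative assumption and is not the MIMIC formulation used in the study. The function names `p_2pl` and `endorsement_probs` are hypothetical. The key point is that both groups are evaluated at the same `theta`, so any gap in probability is DIF rather than a difference in ability.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing a binary item (discrimination a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def endorsement_probs(theta: float, a: float, b: float, dif_shift: float):
    """Return (reference, focal) endorsement probabilities at the SAME theta.

    Assumption for illustration: uniform DIF shifts the focal group's
    difficulty by `dif_shift`; dif_shift = 0.0 means the item is DIF-free.
    """
    return p_2pl(theta, a, b), p_2pl(theta, a, b + dif_shift)


ref, foc = endorsement_probs(theta=0.0, a=1.5, b=0.0, dif_shift=0.5)
print(f"reference={ref:.3f}, focal={foc:.3f}")  # reference=0.500, focal=0.321
```

With `dif_shift=0.0` the two probabilities coincide for every `theta`, which corresponds to the DIF-free situation that anchor items are assumed to satisfy.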
The ability to identify such items depends on many factors, including the sample size of each demographic group, the average true latent trait scores in each group, the chosen DIF assessment method, the magnitude of the DIF effect, and the quality of the anchor set.
An anchor is a set of DIF-free items that establishes a common metric between groups. If the anchor is contaminated, that is, if it contains a DIF item, the common metric is inappropriate. The current literature rarely addresses the relationship between item parameters, anchor selection, and subsequent DIF detection.
In this two-part study, we show that rates of identifying DIF items are higher when the anchor set is made up of highly discriminating items. We also show that, given a fixed DIF effect size and a correctly specified anchor, DIF items are more easily detected when they have high discrimination and at least moderate difficulty.
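The following rough intuition applies only to the discrimination part of this finding. It assumes the same hypothetical 2PL setup, in which DIF is a fixed shift `dif_shift` in the focal group's difficulty. Under that assumption, the largest gap between the two groups' response curves occurs midway between the two difficulties and equals tanh(a · dif_shift / 4), so the gap grows with the discrimination `a`. This sketch is not the study's MIMIC procedure, and it does not model detection power. The MIMIC model may also define effect size on a different scale.

```python
import math


def p_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of endorsing a binary item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def max_prob_gap(a: float, b: float, dif_shift: float,
                 lo: float = -6.0, hi: float = 6.0, steps: int = 12001) -> float:
    """Largest reference-minus-focal probability gap over a grid of theta values.

    Hypothetical helper: the focal group's difficulty is b + dif_shift.
    """
    step = (hi - lo) / (steps - 1)
    return max(
        p_2pl(lo + i * step, a, b) - p_2pl(lo + i * step, a, b + dif_shift)
        for i in range(steps)
    )


# Same DIF shift, increasing discrimination: the maximum gap widens.
for a in (0.5, 1.0, 2.0):
    gap = max_prob_gap(a, b=0.0, dif_shift=0.5)
    print(a, round(gap, 3), round(math.tanh(a * 0.5 / 4.0), 3))
```

The grid search is included so that the closed form can be checked numerically rather than taken on trust.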
These findings reveal a relationship between item characteristics and DIF detection that has previously been ignored. In practice, this implies that a DIF method found effective for a test made up entirely of moderately to highly discriminating items may not be appropriate for a test made up of items of low to moderate discrimination. This could lead test designers and DIF researchers to make erroneous recommendations about DIF detection.
Rebouças, D. A., & Cheng, Y. (2019). Relationship between item characteristics and detection of Differential Item Functioning under the MIMIC model. Psychological Test and Assessment Modeling, 61(2), 227–257.