IMPS Virtual Meeting - July 2020

Motivation

(or lack thereof)

Careless Responses in Self-report Data


  • Potential causes of careless responses
    • Lack of interest in survey content
    • Survey is too long
    • Anonymity & reduced accountability
    • Distractions

Why does data quality matter?

  • Poor data quality may lead to
    • Attenuated or inflated correlations between subscales
    • Low reliability (internal consistency)
    • Poor fit of measurement models
  • Careless responses are everywhere
    • Prevalence estimates range from 5% to 50% (e.g., Curran et al., 2010; Kurtz & Parish, 2001)

Methods to Detect Careless Responding

(Meade & Craig, 2012)

Some methods to detect careless responses:

  Method                 Example
  Long-string analysis   {1,1,1,1,1} or {3,3,3,3,3}
  Validity item          Please select ‘Agree’ for this item
  Fit indices            \(l_z\) person-fit statistic (Sinharay, 2015)
  Response time          Total time to complete survey

Methods to Detect Careless Responding

  • Some recent developments:
    • Response times per page (Soland et al., 2019)
    • Mixture modeling approach for response times and response accuracy (Wang & Xu, 2015)
  • Some additional sources of information:
    • Response times at the item level
    • Log & meta data
      • Number of clicks per page
      • Changes in answer choice
      • Order in which answers were selected
      • Number of characters in item stem

Response Time Data in Psychological Surveys

  • Unlike in ability testing, there is typically no assumed relationship between the construct being measured and response times
  • Item effects
    • Longer items may elicit longer response times
    • Reverse-coded items may cause longer response times
    • Item position
  • Person effects
    • Reading ability
    • Working memory capacity
    • Familiarity with online surveys
    • Device used to respond

Research Goals

Research Goals

  1. Explore properties of response times at the item and person levels
  2. Compare response times to commonly used methods, e.g., long-string analysis and item response variance analysis.
  3. Provide initial evidence for item-level response time modeling of survey data using item/person properties.

Personality Data Set (BFI-2)

A High School Sample

Personality Data (BFI-2)

Soto & John (2016)

  • Sample of high school students (N=224)
  • 60-item self-report measure on a 5-point Likert scale
  • About half female (51%)
  • Item order was randomized
  • Missing data: 8 responses removed
  • Correlation patterns are as expected

Careless Responses Using Item Response Data

First Half of Survey

Some Candidate Careless Responses

  (Conscientiousness)   Direct Items   Reverse-Coded Items   Total Response Time (percentile)
  Person 1              1,1,1,1,1,1    5,5,5,5,5,5           98.4 (5.5th)
  Person 2              4,3,3,4,5,4    1,1,1,3,3,3           139.9 (11th)
  Person 3              5,5,4,5,5,5    4,1,2,5,5,2           409.1 (83rd)
  • Flagged using the careless R package (see the sketch below)
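A minimal sketch (not the exact analysis code) of how the careless package can flag such candidates, assuming a respondent-by-item data frame bfi_items holding the 60 BFI-2 responses (a hypothetical object name); the cutoffs are illustrative only:

    library(careless)

    long_str <- longstring(bfi_items)  # longest run of identical responses per person
    resp_var <- irv(bfi_items)         # intra-individual response variability (SD of responses)

    # Flag respondents with long identical runs or unusually low response variability
    flags <- data.frame(
      long_str = long_str,
      irv      = resp_var,
      flagged  = long_str >= 10 | resp_var < 0.5  # illustrative cutoffs, not from the study
    )
    head(flags[order(-flags$long_str), ])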

Personality Subscales and Response Times

Item-Level Response Times - Openness

Openness Items & Response Times

  • Openness example items: I am someone who…
    • O3: Is inventive, finds clever ways to do things.
    • O5: Avoids intellectual, philosophical discussions. [R]
    • O7: Values art and beauty.
    • O6: Has little creativity. [R]

Reverse-Coded Items & Response Times

Openness

Item Char Count vs. Response Time

Response Time Modeling

Models for Response Times

  • Lognormal model (van der Linden, 2006)
  • Box-Cox model (see Patton, 2014)
  • Drawing from an explanatory (IRT) framework:
    • Linear mixed modeling
    • Latent regression modeling
    • (see De Boeck & Wilson, 2016)

Normality of Total Response Times

Descriptive & Explanatory Response Time Models

Descriptive Model - Lognormal Model:

\[\log(t_{ip}) = \tau_p - \beta_i + \epsilon_{ip}\]

Item-Explanatory Model - Linear Lognormal Test Model (LLnTM):

\[\log(t_{ip}) = \tau_p - \sum_{k=0}^{K} \gamma_k X_{ik} + \epsilon_{ip}\]

where \(X_{ik}\) are item properties and \(\beta_i' = \sum_{k=0}^{K} \gamma_k X_{ik}\).

Person-Explanatory Model - Latent Regression Lognormal Model:

\[\log(t_{ip}) = \sum_{j=1}^{J} \zeta_j Z_{pj} + \tau_p - \beta_i + \epsilon_{ip}\]

where \(Z_{pj}\) are person properties and \(Z_{p0} = \tau_p\).
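In lme4, these models can be written as linear mixed models on the log response times. A sketch using the long-format data set bfi_long_rev that appears in the model comparison below (reading_speed is a hypothetical person covariate for illustration, not a variable from this study):

    library(lme4)

    # Descriptive lognormal model: one time-intensity parameter per item
    descriptive <- lmer(log_time ~ -1 + item_name + (1 | user_id), data = bfi_long_rev)

    # Item-explanatory model (LLnTM): item effects decomposed into item properties
    item.explan <- lmer(log_time ~ 1 + char_count + is_reverse + (1 | user_id),
                        data = bfi_long_rev)

    # Person-explanatory model: person covariates predict speed; items kept as dummies
    pers.explan <- lmer(log_time ~ -1 + item_name + reading_speed + (1 | user_id),
                        data = bfi_long_rev)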

Descriptive Lognormal Model

\[\log(t_{ip}) = \tau_{p} - \beta_{i} + \epsilon_{ip}\]

  • \(\beta_{i}\): time intensity (“time-consumingness”) of item \(i\)

  • Random effects:

      Groups   Name        Variance Std.Dev.
       user_id  (Intercept) 0.2265   0.4759  
       Residual             0.5153   0.7178  
      Number of obs: 13110, groups:  user_id, 215
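Person speed estimates can then be pulled from the fitted model; a sketch assuming the descriptive fit from the earlier code:

    # Person intercepts (tau_p): small values = fast respondents
    tau_p <- ranef(descriptive)$user_id[, "(Intercept)"]
    head(sort(tau_p))  # the fastest respondents are candidate careless responders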

Descriptive Lognormal Model

Item Explanatory

Linear Lognormal Mixed Model

Fixed effects:
                Estimate Std. Error t value
(Intercept)    0.6544715  0.0399847  16.368
char_count     0.0119571  0.0007213  16.577
is_reverseTRUE 0.1251709  0.0126499   9.895


Correlation of Fixed Effects:

            (Intr)  chr_cn
char_count  -0.541
is_rvrsTRUE -0.129 -0.049
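A brief worked interpretation: on the log response-time scale the effects are multiplicative, so each additional character in the item stem corresponds to a factor of \(\exp(0.0120) \approx 1.012\) (about 1.2% longer), and a reverse-coded item to a factor of \(\exp(0.125) \approx 1.13\) (about 13% longer), holding the other predictor constant.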

Model Comparison

## Data: bfi_long_rev
## Models:
## item.explan: log_time ~ 1 + char_count + is_reverse + (1 | user_id)
## descriptive: log_time ~ -1 + item_name + (1 | user_id)
##             npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
## item.explan    5 29427 29464 -14708    29417                         
## descriptive   63 29291 29763 -14583    29165 251.47 58  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
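The table above is the likelihood-ratio test produced by lme4's anova() method for the two fitted models (both refit with ML), e.g.:

    anova(item.explan, descriptive)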

Final Thoughts

  • Response time modeling is an important step toward detecting careless responses
  • Item and person properties may help explain variability in response time at the item level
  • Future directions
    • Include both item and person features in the model
    • Perform a simulation study to evaluate the model and its power to detect careless responses

Acknowledgments

  • REU student Audrey Filonczuk
  • Labmates
    • Maxwell Hong
    • Teresa Ober, Ph.D.

Thanks for watching!

Appendix

Response Time Data in Qualtrics

  • By default, Qualtrics only provides response times per page
  • JavaScript was added to each question
    • Records every time a respondent chooses or modifies an answer

      • Time stamps: e.g., 2019-12-16T17:38:03.184Z
      • Answer choice: {1,2,3,4,5} or {1,2,4,5,6} or {2,5,8,9,10}
      • Qualtrics ID: e.g., QID64
    • Output format is JSON and requires parsing in R (see the parsing sketch after the example below)

      {"1":{"val":"QR~QID87~5","t":"2019-12-16T17:38:03.184Z"},
      "2":{"val":"QR~QID65~28~2","t":"2019-12-16T17:38:14.369Z"}...