Sufficient and Necessary Conditions for the Identifiability of DINA Models with Polytomous Responses

Lin, Xu
Cognitive Diagnosis Models (CDMs) provide a powerful statistical and psychometric tool for researchers and practitioners to learn fine-grained diagnostic information about respondents' latent attributes. There has been a growing interest in the use of CDMs for polytomous response data, as more and more items with multiple response options become widely used. Similar to many latent variable models, the identifiability of CDMs is critical for accurate parameter estimation and valid statistical inference. However, the existing identifiability results are primarily focused on binary response models and have not adequately addressed the identifiability of CDMs with polytomous responses. This paper addresses this gap by presenting sufficient and necessary conditions for the identifiability of the widely used DINA model with polytomous responses, with the aim to provide a comprehensive understanding of the identifiability of CDMs with polytomous responses and to inform future research in this field.

Basic Information

  • Paper ID: 2304.01363
  • Title: Sufficient and Necessary Conditions for the Identifiability of DINA Models with Polytomous Responses
  • Authors: Mengqi Lin, Gongjun Xu (University of Michigan)
  • Classification: stat.ME, math.ST, stat.TH
  • Publication Date: February 22, 2024 (arXiv version 3)
  • Paper Link: https://arxiv.org/abs/2304.01363

Abstract

Cognitive Diagnosis Models (CDMs) provide researchers and practitioners with powerful statistical and psychometric tools for obtaining fine-grained diagnostic information about examinees' latent attributes. As items with multiple response options come into wider use, applying CDMs to polytomous response data has received increasing attention. As with many latent variable models, the identifiability of CDMs is crucial for accurate parameter estimation and valid statistical inference. However, existing identifiability results focus primarily on binary response models and do not adequately address polytomous response CDMs. This paper fills this gap by establishing sufficient and necessary conditions for the identifiability of the widely used DINA model with polytomous responses.

Research Background and Motivation

Problem Background

  1. Importance of Cognitive Diagnosis Models: CDMs, as discrete latent variable models, are widely applied in educational assessment, mental health diagnosis, epidemiological research, and other fields
  2. Growing Demand for Polytomous Responses: Practical tests increasingly use items with more than two response categories, going beyond traditional binary (correct/incorrect) scoring
  3. Criticality of Identifiability: The identifiability of model parameters is fundamental for reliable parameter estimation and valid statistical inference

Limitations of Existing Approaches

  1. Research Focus on Binary Responses: Existing identifiability theory primarily targets binary response models such as the binary DINA model; see, e.g., Xu and Zhang (2016) and Gu and Xu (2019b)
  2. Incomplete Theory for Polytomous Responses: While Culpepper (2019) and Fang et al. (2019) discuss sufficient conditions for polytomous CDMs, necessary conditions remain an open question
  3. Technical Tool Limitations: Existing T-matrix tools are primarily designed for binary responses and cannot be directly applied to polytomous cases

Research Motivation

This paper aims to establish a complete theoretical framework for the identifiability of polytomous response DINA models, providing statistical guidance for cognitive diagnostic test design in practice.

Core Contributions

  1. Theoretical Framework Extension: Establishes, for the first time, an identifiability theory for polytomous response DINA models that includes both sufficient and necessary conditions
  2. Generalization of T-matrix Tools: Extends the classical T-matrix framework to polytomous response models, with a generalized version tailored to each of the two model structures (GPDINA and Sequential DINA)
  3. Complete Analysis of Two Models:
    • GPDINA Model: Shows that conditions C1-C3, the same as those for the binary DINA model, are sufficient and necessary
    • Sequential DINA Model: Establishes sufficient conditions based on the first category (S1-S3) and weaker necessary conditions (S2*, S3*)
  4. Practical Guidance Value: Conditions depend only on Q-matrix structure, providing verifiable practical guidelines for test design

Methodology Details

Task Definition

The paper studies parameter identifiability for polytomous response DINA models. The setting consists of:

  • J polytomous items, where item j has Hj+1 categories (0,1,...,Hj)
  • K binary latent attributes α = (α1,...,αK)^T
  • A J × K binary Q-matrix describing the relationship between items and attributes; its row qj indicates the attributes required by item j

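Objective: Determine when the model parameters (θ+, θ-, p) for GPDINA, or (β+, β-, p) for Sequential DINA, are uniquely identifiable, where p = (pα) denotes the population proportions of the 2^K attribute profiles.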

Model Architecture

GPDINA Model

For the GPDINA model, different non-zero categories of the same item require the same set of attributes:

  • Ideal response: ξj,α = I(α ⪰ qj), where α ⪰ qj means αk ≥ qjk for every attribute k
  • Item parameters:
    • θ+j,l := P(Rj = l | ξj,α = 1), l = 0, 1, ..., Hj
    • θ-j,l := P(Rj = l | ξj,α = 0), l = 0, 1, ..., Hj
    • For each item, Σl θ+j,l = Σl θ-j,l = 1
  • Response probability:
    P(R = r | Q, θ+, θ-, p) = Σα pα ∏j (θ+j,rj)^ξj,α (θ-j,rj)^(1-ξj,α)
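As a sketch of the GPDINA response probability above, the following Python code evaluates P(R = r | Q, θ+, θ-, p) by summing over all 2^K attribute profiles. The function name, the data layout (θ+ and θ- stored as per-item arrays indexed by category 0..Hj, p as a dictionary keyed by profile tuples) and the toy parameter values are illustrative assumptions, not taken from the paper.

```python
import itertools

import numpy as np


def gpdina_pattern_prob(r, Q, theta_plus, theta_minus, p):
    """Sketch of P(R = r | Q, theta+, theta-, p) under the GPDINA model.

    r           : length-J tuple of observed categories, r[j] in {0, ..., H_j}
    Q           : J x K binary array whose row j is q_j
    theta_plus  : theta_plus[j][l] = P(R_j = l | xi_{j,alpha} = 1)
    theta_minus : theta_minus[j][l] = P(R_j = l | xi_{j,alpha} = 0)
    p           : dict mapping each profile alpha (tuple of 0/1) to p_alpha
    """
    Q = np.asarray(Q)
    J, K = Q.shape
    total = 0.0
    for alpha in itertools.product((0, 1), repeat=K):
        a = np.array(alpha)
        term = p[alpha]
        for j in range(J):
            xi = np.all(a >= Q[j])  # ideal response I(alpha >= q_j)
            term *= theta_plus[j][r[j]] if xi else theta_minus[j][r[j]]
        total += term
    return total


# Toy example (hypothetical values): J = 3 items, K = 2 attributes.
Q = [[1, 0], [0, 1], [1, 1]]
theta_plus = [np.array([0.1, 0.3, 0.6]), np.array([0.2, 0.8]), np.array([0.1, 0.2, 0.7])]
theta_minus = [np.array([0.6, 0.3, 0.1]), np.array([0.7, 0.3]), np.array([0.5, 0.3, 0.2])]
p = {alpha: 0.25 for alpha in itertools.product((0, 1), repeat=2)}

H = [len(t) - 1 for t in theta_plus]
total = sum(gpdina_pattern_prob(r, Q, theta_plus, theta_minus, p)
            for r in itertools.product(*(range(h + 1) for h in H)))
print(round(total, 10))  # 1.0
```

Because each θ+j and θ-j vector sums to one, the probabilities over all ∏j(Hj+1) response patterns sum to one, which the last lines check.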

Sequential DINA Model

In the Sequential DINA model, categories must be completed sequentially, and different categories may require different attributes:

  • Ideal response: ξj,l,α = I(α ⪰ qj,l) for each category l
  • Item parameters:
    • β+j,l := P(Rj ≥ l | Rj ≥ l-1, ξj,l,α = 1)
    • β-j,l := P(Rj ≥ l | Rj ≥ l-1, ξj,l,α = 0)
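Under these definitions, the category probabilities of a sequential item follow by chaining continuation probabilities: P(Rj = l | α) = [∏m≤l sj,m(α)]·(1 − sj,l+1(α)) for l < Hj, and P(Rj = Hj | α) = ∏m≤Hj sj,m(α), where sj,m(α) equals β+j,m if ξj,m,α = 1 and β-j,m otherwise. A minimal Python sketch of this chaining for one item follows; the function name, argument layout and toy values are hypothetical.

```python
import numpy as np


def sequential_category_probs(alpha, q_cats, beta_plus, beta_minus):
    """Sketch of P(R_j = l | alpha), l = 0..H_j, for one Sequential DINA item.

    q_cats     : list of H_j binary vectors; q_cats[l - 1] is q_{j,l}
    beta_plus  : beta_plus[l - 1]  = P(R_j >= l | R_j >= l - 1, xi_{j,l,alpha} = 1)
    beta_minus : beta_minus[l - 1] = P(R_j >= l | R_j >= l - 1, xi_{j,l,alpha} = 0)
    """
    a = np.asarray(alpha)
    H = len(q_cats)
    # s[l - 1] is the continuation probability s_{j,l}(alpha) for step l = 1..H_j
    s = [beta_plus[i] if np.all(a >= np.asarray(q_cats[i])) else beta_minus[i]
         for i in range(H)]
    probs = np.empty(H + 1)
    reach = 1.0  # P(R_j >= l | alpha), starting from l = 0
    for l in range(H):
        probs[l] = reach * (1.0 - s[l])  # reached category l, failed step l + 1
        reach *= s[l]
    probs[H] = reach  # completed every step
    return probs


# Hypothetical item with H_j = 2: category 1 needs attribute 1, category 2 needs both.
q_cats = [[1, 0], [1, 1]]
probs = sequential_category_probs((1, 0), q_cats, [0.9, 0.8], [0.2, 0.1])
print(np.round(probs, 6))  # approximately [0.1, 0.81, 0.09]
```

The profile (1, 0) masters the first step but not the second, so it continues past category 1 with probability β+j,1 = 0.9 and past category 2 only with the "guessing" probability β-j,2 = 0.1.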

Technical Innovations

Generalization of T-matrix

  1. T-matrix for GPDINA:
    • Dimension: ∏j(Hj+1) × 2^K
    • Entry: tr,α(θ+,θ-) = ∏j:rj≠0 P(Rj = rj | Q, θ+, θ-, α)
    • Maintains structure similar to binary DINA
  2. Ts-matrix for Sequential DINA:
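    • Entry: tsr,α(β+,β-) = ∏j:rj≠0 ∏l=1^rj (β+j,l)^ξj,l,α (β-j,l)^(1-ξj,l,α), which equals ∏j P(Rj ≥ rj | α) by the definition of the continuation parameters
    • More complex structure: entries for higher categories (rj ≥ 2) involve products of several continuation parameters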
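For small J and K, both matrices can be built by brute force directly from the entry formulas above. The sketch below is an illustrative construction with hypothetical function names and toy parameters, not the authors' code, and its cost grows exponentially in J and K. Rows are indexed by response patterns r ∈ ∏j {0, …, Hj} and columns by the 2^K profiles in lexicographic order.

```python
import itertools

import numpy as np


def profiles(K):
    """All 2^K binary attribute profiles, in lexicographic order."""
    return [np.array(a) for a in itertools.product((0, 1), repeat=K)]


def gpdina_T(Q, theta_plus, theta_minus):
    """T-matrix: t_{r,alpha} = prod_{j: r_j != 0} P(R_j = r_j | alpha)."""
    Q = np.asarray(Q)
    H = [len(t) - 1 for t in theta_plus]
    rows = list(itertools.product(*(range(h + 1) for h in H)))
    alphas = profiles(Q.shape[1])
    T = np.ones((len(rows), len(alphas)))
    for i, r in enumerate(rows):
        for c, a in enumerate(alphas):
            for j, rj in enumerate(r):
                if rj != 0:
                    xi = np.all(a >= Q[j])
                    T[i, c] *= theta_plus[j][rj] if xi else theta_minus[j][rj]
    return rows, T


def sequential_Ts(q_cats, beta_plus, beta_minus, K):
    """Ts-matrix: t^s_{r,alpha} = prod_{j: r_j != 0} prod_{l=1}^{r_j} beta_{j,l}(alpha).

    q_cats[j][l - 1] is q_{j,l}; beta_plus[j][l - 1] and beta_minus[j][l - 1]
    are the continuation parameters. Each entry equals prod_j P(R_j >= r_j | alpha).
    """
    H = [len(qc) for qc in q_cats]
    rows = list(itertools.product(*(range(h + 1) for h in H)))
    alphas = profiles(K)
    Ts = np.ones((len(rows), len(alphas)))
    for i, r in enumerate(rows):
        for c, a in enumerate(alphas):
            for j, rj in enumerate(r):
                for l in range(rj):  # steps 1..r_j, stored at 0-based index l
                    xi = np.all(a >= np.asarray(q_cats[j][l]))
                    Ts[i, c] *= beta_plus[j][l] if xi else beta_minus[j][l]
    return rows, Ts


# Toy inputs (hypothetical values).
theta_plus = [np.array([0.1, 0.3, 0.6]), np.array([0.2, 0.8]), np.array([0.1, 0.2, 0.7])]
theta_minus = [np.array([0.6, 0.3, 0.1]), np.array([0.7, 0.3]), np.array([0.5, 0.3, 0.2])]
rows, T = gpdina_T([[1, 0], [0, 1], [1, 1]], theta_plus, theta_minus)

q_cats = [[[1, 0], [1, 1]], [[0, 1]]]
rows_s, Ts = sequential_Ts(q_cats, [[0.9, 0.8], [0.85]], [[0.2, 0.1], [0.15]], K=2)
print(T.shape, Ts.shape)  # (18, 4) (6, 4)
```

The all-zero pattern r = 0 always yields a row of ones (an empty product). Multiplying T by the vector p gives, for each r, the probability that every item with rj ≠ 0 takes exactly category rj, which is the quantity identifiability arguments compare across parameter sets.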

Identifiability Conditions

GPDINA Model Conditions (C1-C3):

  • C1: Q-matrix completeness: up to a row permutation, Q contains the K × K identity matrix IK
  • C2: Each attribute is required by at least 3 items
  • C3: Any two columns of Q*, the submatrix of Q left after removing IK, are distinct

Sequential DINA Model Conditions (S1-S3):

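  • S1: Completeness of Q1, the J × K matrix whose jth row is the first-category attribute vector qj,1 (Q1 contains IK)
  • S2: Each attribute is required by the first category of at least 3 items
  • S3: Any two columns of Q1*, the submatrix of Q1 left after removing IK, are distinct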
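Because C1-C3 and S1-S3 are purely combinatorial properties of a binary matrix (Q for GPDINA, Q1 for Sequential DINA), they are straightforward to check programmatically. The helper below is a hypothetical sketch written for illustration: it removes one copy of IK to form Q* (or Q1*), and it reports None for the distinct-columns check when completeness fails, since Q* is then undefined.

```python
import numpy as np


def check_dina_conditions(Q):
    """Check completeness, >= 3 items per attribute, and distinct columns of Q*.

    Apply to Q for C1-C3 (GPDINA) or to Q1 for S1-S3 (Sequential DINA).
    Returns None for the distinct-columns check when completeness fails,
    because Q* is only defined once a copy of I_K has been located.
    """
    Q = np.asarray(Q, dtype=int)
    K = Q.shape[1]
    unit_rows = []
    for k in range(K):
        e_k = np.eye(K, dtype=int)[k]
        hits = np.flatnonzero((Q == e_k).all(axis=1))  # rows equal to e_k
        if hits.size == 0:
            unit_rows = None
            break
        unit_rows.append(hits[0])  # keep one copy of e_k for I_K
    complete = unit_rows is not None
    three_each = bool((Q.sum(axis=0) >= 3).all())
    distinct = None
    if complete:
        Q_star = np.delete(Q, unit_rows, axis=0)  # Q with one copy of I_K removed
        distinct = len({tuple(Q_star[:, k]) for k in range(K)}) == K
    return {"complete": complete, "three_per_attribute": three_each,
            "distinct_columns": distinct}


Q_ok = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(check_dina_conditions(Q_ok))
# {'complete': True, 'three_per_attribute': True, 'distinct_columns': True}

Q_bad = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(check_dina_conditions(Q_bad))
# {'complete': False, 'three_per_attribute': True, 'distinct_columns': None}

# Sequential DINA: Q1 collects each item's first-category vector q_{j,1}.
# Hypothetical category-level vectors; the 4th and 5th items have a second category.
q_cats = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]],
          [[1, 1, 0], [1, 1, 1]], [[1, 0, 1], [1, 1, 1]], [[0, 1, 1]]]
Q1 = np.array([qc[0] for qc in q_cats])
print(check_dina_conditions(Q1))  # same result as Q_ok, since Q1 equals Q_ok here
```

Only the first-category vectors enter the S1-S3 check; higher-category requirements such as qj,2 do not affect it.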

Experimental Setup

Datasets

The paper illustrates its theoretical results using two real datasets:

  1. PISA 2000 Reading Assessment Data:
    • 1,039 English-speaking examinees, 20 items (5 polytomous)
    • 5 cognitive attributes (retrieving information, understanding, interpretation, evaluating content, evaluating form)
  2. TIMSS 2007 Grade 4 Mathematics Assessment Data:
    • 823 students, 12 items (some of them polytomous)
    • 8 mathematical cognitive attributes

Evaluation Method

Checks whether the Q-matrices specified for these datasets satisfy the proposed identifiability conditions, illustrating how the results apply in practice.

Experimental Results

Main Findings

PISA Data Analysis

Testing conditions C1-C3 from Theorem 1:

  • The Q-matrix does not contain the identity matrix IK, violating completeness condition C1
  • Consequently, attribute profiles 0, e1, e3, e4, e5 (where ek denotes the kth unit vector) have identical conditional response distributions
  • Conclusion: Model parameters are not identifiable
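The mechanism behind this finding can be seen on a toy example. Under the GPDINA model, P(R | α) depends on α only through the ideal responses ξj,α, so profiles with identical ideal-response vectors have identical conditional response distributions, and only the sum of their proportions pα can be recovered. The Q-matrix below is hypothetical (it is not the PISA Q-matrix): no item requires attribute 1 alone, and the sketch groups the profiles that cannot be told apart.

```python
import itertools
from collections import defaultdict

import numpy as np


def indistinguishable_profiles(Q):
    """Group attribute profiles that share the same GPDINA ideal-response vector.

    P(R | alpha) depends on alpha only through
    xi_alpha = (I(alpha >= q_1), ..., I(alpha >= q_J)),
    so profiles within one group have identical response distributions.
    """
    Q = np.asarray(Q)
    groups = defaultdict(list)
    for alpha in itertools.product((0, 1), repeat=Q.shape[1]):
        xi = tuple(int(np.all(np.array(alpha) >= q)) for q in Q)
        groups[xi].append(alpha)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical Q-matrix (not the PISA one): no item requires attribute 1 alone.
Q_toy = [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
print(indistinguishable_profiles(Q_toy))  # [[(0, 0, 0), (1, 0, 0)]]
```

Appending the row e1 = (1, 0, 0) to Q_toy separates the two profiles, consistent with the role completeness plays in condition C1.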

TIMSS Data Analysis

Testing Sequential DINA model using Proposition 3:

  • The Q1-matrix does not contain the identity matrix IK, violating completeness condition S1
  • When β-j,1 = 0, multiple attribute profiles have identical response probabilities
  • Conclusion: Model parameters are not identifiable

Theoretical Verification

Through constructive proofs and counterexamples, the paper shows that:

  1. Conditions C1-C3 for GPDINA model are both sufficient and necessary
  2. For the Sequential DINA model, conditions S1-S3 are sufficient, and S1 is also necessary
  3. The weaker conditions S2* and S3* are necessary for the Sequential DINA model

Related Work

Identifiability of Binary Response CDMs

  • Classical Results: Xu and Zhang (2016), Gu and Xu (2019b) establish identifiability theory for binary DINA models
  • Technical Tools: The T-matrix method (Liu et al., 2013) has become a standard analytical tool

Polytomous Response CDMs

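  • Model Development: Chen and de la Torre (2018) introduced the general polytomous diagnosis modeling (GPDM) framework; Ma and de la Torre (2016) proposed a sequential CDM for polytomous responses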
  • Partial Results: Culpepper (2019), Fang et al. (2019) provide sufficient conditions but lack necessity analysis

Theoretical Contribution of This Paper

Compared with existing work, this paper is the first to provide a complete theoretical framework for the identifiability of polytomous response DINA models.

Conclusions and Discussion

Main Conclusions

  1. GPDINA Model: The identifiability conditions (C1-C3) are identical to those of the binary DINA model, despite its more complex parameter structure
  2. Sequential DINA Model: Information structure of the first category plays a key role in identifiability
  3. Practical Guidance: Conditions depend only on Q-matrix structure, facilitating verification in practical applications

Limitations

  1. Assumes a Known Q-matrix: In practice, the Q-matrix may need to be estimated or validated
  2. Strict Identifiability: Some conditions may be stronger than needed under a generic identifiability framework
  3. Computational Complexity: Parameter interactions in higher-order categories complicate analysis

Future Directions

  1. Generic Identifiability: Investigate more relaxed identifiability concepts
  2. Q-matrix Identifiability: Extend to cases where Q-matrix is unknown
  3. Polytomous Attributes: Consider cases where attributes themselves are polytomous
  4. More General CDMs: Extend to more general models such as G-DINA

In-Depth Evaluation

Strengths

  1. Theoretical Completeness: First to provide complete theory of sufficient and necessary conditions for polytomous response DINA models
  2. Technical Innovation: Successfully generalizes T-matrix tools to complex polytomous cases
  3. Practical Value: Provides verification conditions directly applicable to test design
  4. Rigor: Detailed proofs, with constructive arguments and counterexamples establishing sufficiency and necessity

Weaknesses

  1. Limited Application Scope: Neither real-data example satisfies the identifiability conditions, suggesting that existing tests may frequently fail to meet them
  2. Stringency of Conditions: Some conditions, e.g., the completeness requirement S1, are demanding and may be hard to meet in existing tests, limiting practical application
  3. Computational Complexity: Analysis of Sequential DINA model involves complex parameter interactions

Impact

  1. Theoretical Contribution: Establishes a solid theoretical foundation for the identifiability of polytomous response CDMs
  2. Practical Guidance: Provides statistical guidance for test design in educational measurement and psychological assessment
  3. Methodological Value: Generalization of T-matrix may have implications for other latent variable models

Applicable Scenarios

  1. Educational Assessment: Cognitive diagnostic test design with multi-level scoring
  2. Psychometrics: Mental health diagnosis with symptoms rated at multiple severity levels
  3. Theoretical Research: Statistical theory research on polytomous response latent variable models

References

  • Xu, G., & Zhang, S. (2016). Identifiability of diagnostic classification models. Psychometrika, 81, 625-649.
  • Gu, Y., & Xu, G. (2019). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika, 84(2), 468-483.
  • Chen, J., & de la Torre, J. (2018). Introducing the general polytomous diagnosis modeling framework. Frontiers in Psychology, 9, 1474.
  • Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253-275.