
Sufficient and Necessary Conditions for the Identifiability of DINA Models with Polytomous Responses

Lin, Xu

Cognitive Diagnosis Models (CDMs) provide a powerful statistical and psychometric tool for researchers and practitioners to learn fine-grained diagnostic information about respondents' latent attributes. There has been a growing interest in the use of CDMs for polytomous response data, as more and more items with multiple response options become widely used. Similar to many latent variable models, the identifiability of CDMs is critical for accurate parameter estimation and valid statistical inference. However, the existing identifiability results are primarily focused on binary response models and have not adequately addressed the identifiability of CDMs with polytomous responses. This paper addresses this gap by presenting sufficient and necessary conditions for the identifiability of the widely used DINA model with polytomous responses, with the aim to provide a comprehensive understanding of the identifiability of CDMs with polytomous responses and to inform future research in this field.

Basic Information

Paper ID: 2304.01363
Title: Sufficient and Necessary Conditions for the Identifiability of DINA Models with Polytomous Responses
Authors: Mengqi Lin, Gongjun Xu (University of Michigan)
Classification: stat.ME, math.ST, stat.TH
Publication Date: February 22, 2024 (arXiv version 3)
Paper Link: https://arxiv.org/abs/2304.01363

Abstract

Cognitive Diagnosis Models (CDMs) provide researchers and practitioners with powerful statistical and psychometric tools for obtaining fine-grained diagnostic information about examinees' latent attributes. With the widespread application of multiple-choice items, the application of CDMs to polytomous response data has received increasing attention. Like many latent variable models, the identifiability of CDMs is crucial for accurate parameter estimation and valid statistical inference. However, existing identifiability results primarily focus on binary response models and fail to adequately address the identifiability problem of polytomous response CDMs. This paper fills this gap by proposing sufficient and necessary conditions for the identifiability of the widely-used polytomous response DINA model.

Research Background and Motivation

Problem Background

Importance of Cognitive Diagnosis Models: CDMs, as discrete latent variable models, are widely applied in educational assessment, mental health diagnosis, epidemiological research, and other fields
Growing Demand for Polytomous Responses: Increasingly, practical tests employ multiple-choice formats that go beyond traditional binary responses
Criticality of Identifiability: The identifiability of model parameters is fundamental for reliable parameter estimation and valid statistical inference

Limitations of Existing Approaches

Research Focus on Binary Responses: Existing identifiability theory primarily targets binary DINA models, such as work by Xu and Zhang (2016) and Gu and Xu (2019b)
Incomplete Theory for Polytomous Responses: While Culpepper (2019) and Fang et al. (2019) discuss sufficient conditions for polytomous CDMs, necessary conditions remain an open question
Technical Tool Limitations: Existing T-matrix tools are primarily designed for binary responses and cannot be directly applied to polytomous cases

Research Motivation

This paper aims to establish a complete theoretical framework for the identifiability of polytomous response DINA models, providing statistical guidance for cognitive diagnostic test design in practice.

Core Contributions

Theoretical Framework Extension: First establishes a complete identifiability theory for polytomous response DINA models, including both sufficient and necessary conditions
Generalization of T-matrix Tools: Extends the classical T-matrix framework to polytomous response models, designing corresponding generalized versions for two different model structures
Complete Analysis of Two Models:
- GPDINA Model: Provides identifiability conditions (C1-C3) identical to binary DINA
- Sequential DINA Model: Establishes sufficient conditions based on the first category (S1-S3) and weaker necessary conditions (S2*, S3*)
Practical Guidance Value: Conditions depend only on Q-matrix structure, providing verifiable practical guidelines for test design

Methodology Details

Task Definition

Investigates the parameter identifiability problem of polytomous response DINA models. Given:

J polytomous items, where item j has Hj+1 categories (0,1,...,Hj)
K binary latent attributes α = (α1,...,αK)^T
Q-matrix describing the relationship between items and attributes

Objective: Determine when model parameters (θ+, θ-, p) or (β+, β-, p) are uniquely identifiable.

Model Architecture

GPDINA Model

For the GPDINA model, different non-zero categories of the same item require the same set of attributes:

Ideal response: ξj,α = I(α ⪰ qj)
Item parameters:
- θ+j,l := P(Rj = l | ξj,α = 1), l ∈ {0,1,...,Hj}
- θ-j,l := P(Rj = l | ξj,α = 0), l ∈ {0,1,...,Hj}
Response probability:

P(R = r | Q, θ+, θ-, p) = Σα pα ∏j (θ+j,rj)^ξj,α (θ-j,rj)^(1-ξj,α)
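The mixture formula above can be evaluated directly for a small test. The following is a minimal sketch, not the authors' code: the function names, the toy Q-matrix, and all parameter values are illustrative assumptions chosen only so that each item's category probabilities sum to one.

```python
from itertools import product

def ideal_response(alpha, q):
    """Return 1 if profile alpha masters every attribute required by q, else 0."""
    return int(all(a >= b for a, b in zip(alpha, q)))

def gpdina_response_prob(r, Q, theta_plus, theta_minus, p):
    """Marginal P(R = r) under the GPDINA model.

    r           : tuple of observed categories, one per item
    Q           : list of binary q-vectors, one per item
    theta_plus  : theta_plus[j][l]  = P(R_j = l | xi_j = 1)
    theta_minus : theta_minus[j][l] = P(R_j = l | xi_j = 0)
    p           : dict mapping attribute profile (tuple) -> population proportion
    """
    total = 0.0
    for alpha, p_alpha in p.items():
        lik = 1.0
        for j, q in enumerate(Q):
            xi = ideal_response(alpha, q)
            lik *= theta_plus[j][r[j]] if xi else theta_minus[j][r[j]]
        total += p_alpha * lik
    return total

# Hypothetical toy setup: K = 2 attributes, 3 items with 3, 2, and 3 categories.
K = 2
Q = [(1, 0), (0, 1), (1, 1)]
theta_plus = [[0.1, 0.3, 0.6], [0.2, 0.8], [0.1, 0.2, 0.7]]
theta_minus = [[0.7, 0.2, 0.1], [0.9, 0.1], [0.8, 0.1, 0.1]]
p = {alpha: 0.25 for alpha in product([0, 1], repeat=K)}

prob_000 = gpdina_response_prob((0, 0, 0), Q, theta_plus, theta_minus, p)
# Summing over every response pattern should recover total probability mass.
total_mass = sum(
    gpdina_response_prob(r, Q, theta_plus, theta_minus, p)
    for r in product(range(3), range(2), range(3))
)
```

Checking that the probabilities over all response patterns sum to one is a quick sanity test for any parameter values plugged into this sketch.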

Sequential DINA Model

In the Sequential DINA model, categories must be completed sequentially, and different categories may require different attributes:

Ideal response: ξj,l,α = I(α ⪰ qj,l) for each category l
Item parameters:
- β+j,l := P(Rj ≥ l | Rj ≥ l-1, ξj,l,α = 1)
- β-j,l := P(Rj ≥ l | Rj ≥ l-1, ξj,l,α = 0)
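Because the β parameters are conditional "step" probabilities, the probability of each category follows by multiplying the steps passed and the next step failed. The sketch below works this out for a single item; the function name and the numeric values are hypothetical, and it assumes P(Rj ≥ 0) = 1, which the definitions above imply.

```python
def sequential_category_probs(alpha, q_rows, beta_plus, beta_minus):
    """Return [P(R_j = 0 | alpha), ..., P(R_j = H_j | alpha)] for one item.

    q_rows[l-1]     : q-vector q_{j,l} of category l (l = 1..H_j)
    beta_plus[l-1]  : P(R_j >= l | R_j >= l-1, xi_{j,l} = 1)
    beta_minus[l-1] : P(R_j >= l | R_j >= l-1, xi_{j,l} = 0)
    """
    H = len(q_rows)
    # Step probability for each category, chosen by the ideal response xi_{j,l}.
    steps = []
    for l in range(H):
        xi = all(a >= b for a, b in zip(alpha, q_rows[l]))
        steps.append(beta_plus[l] if xi else beta_minus[l])

    probs = []
    reach = 1.0  # P(R_j >= l | alpha), starting at l = 0
    for l in range(H):
        probs.append(reach * (1.0 - steps[l]))  # reached l, failed step l + 1
        reach *= steps[l]
    probs.append(reach)  # passed every step: top category H_j
    return probs

# Hypothetical item with H_j = 2: category 1 needs attribute 1, category 2 needs attribute 2.
example_probs = sequential_category_probs(
    alpha=(1, 0),
    q_rows=[(1, 0), (0, 1)],
    beta_plus=[0.8, 0.7],
    beta_minus=[0.2, 0.1],
)
```

For the profile (1, 0) in this toy item, the first step uses β+ and the second uses β-, so higher categories involve products of several parameters, which is the source of the extra complexity in the Sequential DINA analysis.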

Technical Innovations

Generalization of T-matrix

T-matrix for GPDINA:
- Dimension: ∏j(Hj+1) × 2^K
- Entry: tr,α(θ+,θ-) = ∏j:rj≠0 P(Rj = rj | Q, θ+, θ-, α)
- Maintains structure similar to binary DINA
Ts-matrix for Sequential DINA:
- Entry: tsr,α(β+,β-) = ∏j:rj≠0 ∏l=1^rj (β+j,l)^ξj,l,α (β-j,l)^(1-ξj,l,α)
- More complex structure with higher-order categories involving products of multiple parameters
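To make the GPDINA T-matrix entry concrete, the sketch below builds the full matrix for a tiny test by following the entry formula stated above (a product over items with a non-zero response). It is an illustration under assumed toy parameters, not the paper's implementation; the helper name gpdina_t_matrix is hypothetical.

```python
from itertools import product

def gpdina_t_matrix(Q, theta_plus, theta_minus):
    """Build the GPDINA T-matrix as a list of rows.

    Rows are indexed by response patterns r, columns by attribute profiles alpha.
    Entry (r, alpha) is the product over items j with r_j != 0 of P(R_j = r_j | alpha).
    """
    K = len(Q[0])
    profiles = list(product([0, 1], repeat=K))
    patterns = list(product(*[range(len(tp)) for tp in theta_plus]))
    T = []
    for r in patterns:
        row = []
        for alpha in profiles:
            entry = 1.0  # empty product when r is the all-zero pattern
            for j, q in enumerate(Q):
                if r[j] == 0:
                    continue
                xi = all(a >= b for a, b in zip(alpha, q))
                entry *= theta_plus[j][r[j]] if xi else theta_minus[j][r[j]]
            row.append(entry)
        T.append(row)
    return patterns, profiles, T

# Hypothetical toy test: K = 2 attributes, item 1 has 3 categories, item 2 has 2.
Q = [(1, 0), (0, 1)]
theta_plus = [[0.1, 0.3, 0.6], [0.2, 0.8]]
theta_minus = [[0.7, 0.2, 0.1], [0.9, 0.1]]
patterns, profiles, T = gpdina_t_matrix(Q, theta_plus, theta_minus)
```

With these inputs the matrix has ∏j(Hj+1) = 6 rows and 2^K = 4 columns, matching the dimension stated above, and the row for the all-zero pattern consists of ones.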

Identifiability Conditions

GPDINA Model Conditions (C1-C3):

C1: Q-matrix completeness (contains identity matrix IK)
C2: Each attribute is required by at least 3 items
C3: Any two columns of the Q* submatrix (the rows of Q left after removing the identity block IK) are distinct

Sequential DINA Model Conditions (S1-S3):

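S1: Q1-matrix completeness (Q1, the matrix of first-category q-vectors qj,1, contains identity matrix IK)
S2: Each attribute is required by the first category of at least 3 items
S3: Any two columns of the submatrix of Q1 left after removing the identity block are distinct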
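Since the conditions depend only on the Q-matrix, they can be checked mechanically. The following is a minimal sketch of such a check, under the assumption that the distinct-columns condition applies to the rows that remain after one copy of each unit vector is removed; the function name and the example matrices are hypothetical.

```python
def check_q_matrix_conditions(Q):
    """Check completeness, the three-items rule, and distinct columns for a binary Q-matrix.

    Q is a list of rows (one q-vector per item). Returns a dict with keys
    "complete", "three_items", and "distinct_columns"; the last is None when
    the matrix is not complete, because the remaining submatrix is then undefined.
    """
    K = len(Q[0])
    rows = [tuple(r) for r in Q]

    # Completeness: every unit vector e_k appears as a row; remove one copy of each.
    remaining = list(rows)
    complete = True
    for k in range(K):
        e_k = tuple(1 if i == k else 0 for i in range(K))
        if e_k in remaining:
            remaining.remove(e_k)
        else:
            complete = False

    # Each attribute must be required by at least three items.
    three_items = all(sum(r[k] for r in rows) >= 3 for k in range(K))

    # Columns of the remaining submatrix must be pairwise distinct.
    distinct_columns = None
    if complete:
        cols = [tuple(r[k] for r in remaining) for k in range(K)]
        distinct_columns = len(set(cols)) == K

    return {"complete": complete, "three_items": three_items,
            "distinct_columns": distinct_columns}

# Hypothetical examples with K = 2 attributes.
good = check_q_matrix_conditions([(1, 0), (0, 1), (1, 0), (0, 1), (1, 1)])
bad = check_q_matrix_conditions([(1, 0), (0, 1), (1, 1), (1, 1)])
incomplete = check_q_matrix_conditions([(1, 1), (1, 0), (1, 1)])
```

Passing the full Q-matrix corresponds to checking C1-C3; passing the matrix of first-category q-vectors corresponds to checking S1-S3 as summarized above.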

Experimental Setup

Datasets

The paper illustrates the theoretical results using two real datasets:

PISA 2000 Reading Assessment Data:
- 1,039 English-speaking examinees, 20 items (5 polytomous)
- 5 cognitive attributes (retrieving information, understanding, interpretation, evaluating content, evaluating form)
TIMSS 2007 Grade 4 Mathematics Assessment Data:
- 823 students, 12 items (partially polytomous)
- 8 mathematical cognitive attributes

Experimental Results

PISA Data Analysis

Q-matrix does not contain identity matrix, violating completeness condition C1
Attribute profiles 0, e1, e3, e4, e5 have identical conditional response distributions
Conclusion: Model parameters are not identifiable

TIMSS Data Analysis

Testing Sequential DINA model using Proposition 3:

Q1-matrix does not contain identity matrix, violating completeness condition S1
When β-j,1 = 0, multiple attribute profiles have identical response probabilities
Conclusion: Model parameters are not identifiable

Theoretical Verification

Through constructive proofs and counterexamples, verifies:

Conditions C1-C3 for GPDINA model are both sufficient and necessary
Conditions S1-S3 for Sequential DINA model are jointly sufficient; S1 is also necessary
Existence of weaker necessary conditions S2*, S3*

Identifiability of Binary Response CDMs

Classical Results: Xu and Zhang (2016), Gu and Xu (2019b) establish identifiability theory for binary DINA models
Technical Tools: T-matrix method (Liu et al., 2013) becomes standard analytical tool

Polytomous Response CDMs

Model Development: Chen and de la Torre (2018) introduce the general polytomous diagnosis modeling (GPDM) framework; Ma and de la Torre (2016) propose the Sequential CDM
Partial Results: Culpepper (2019), Fang et al. (2019) provide sufficient conditions but lack necessity analysis

Conclusions and Discussion

Main Conclusions

GPDINA Model: Identifiability conditions are identical to binary DINA model (C1-C3), despite more complex parameter structure
Sequential DINA Model: Information structure of the first category plays a key role in identifiability
Practical Guidance: Conditions depend only on Q-matrix structure, facilitating verification in practical applications

Limitations

Assumes Q-matrix is Known: In practice, Q-matrix may require estimation and validation
Strict Identifiability: The conditions target strict identifiability; some may be stronger than needed under the weaker generic identifiability framework
Computational Complexity: Parameter interactions in higher-order categories complicate analysis

Future Directions

Generic Identifiability: Investigate more relaxed identifiability concepts
Q-matrix Identifiability: Extend to cases where Q-matrix is unknown
Polytomous Attributes: Consider cases where attributes themselves are polytomous
More General CDMs: Extend to more general models such as G-DINA

In-Depth Evaluation

Strengths

Theoretical Completeness: First to provide complete theory of sufficient and necessary conditions for polytomous response DINA models
Technical Innovation: Successfully generalizes T-matrix tools to complex polytomous cases
Practical Value: Provides verification conditions directly applicable to test design
Rigor: Results are established through detailed constructive proofs and counterexamples

Weaknesses

Limited Application Scope: Both real data examples (PISA and TIMSS) fail to satisfy the identifiability conditions
Stringency of Conditions: Necessary conditions such as S1 are demanding in practice, which limits direct application to existing tests
Computational Complexity: Analysis of Sequential DINA model involves complex parameter interactions

Impact

Theoretical Contribution: Establishes solid identifiability theory foundation for polytomous response CDMs
Practical Guidance: Provides statistical guidance for test design in educational measurement and psychological assessment
Methodological Value: Generalization of T-matrix may have implications for other latent variable models

Applicable Scenarios

Educational Assessment: Cognitive diagnostic test design with multi-level scoring
Psychometrics: Mental health diagnosis with multi-symptom severity levels
Theoretical Research: Statistical theory research on polytomous response latent variable models

References

Xu, G., & Zhang, S. (2016). Identifiability of diagnostic classification models. Psychometrika, 81, 625-649.
Gu, Y., & Xu, G. (2019). The sufficient and necessary condition for the identifiability and estimability of the DINA model. Psychometrika, 84(2), 468-483.
Chen, J., & de la Torre, J. (2018). Introducing the general polytomous diagnosis modeling framework. Frontiers in Psychology, 9, 1474.
Ma, W., & de la Torre, J. (2016). A sequential cognitive diagnosis model for polytomous responses. British Journal of Mathematical and Statistical Psychology, 69(3), 253-275.
