Context: Consistent requirements and system specifications are essential for the compliance of software systems with the General Data Protection Regulation (GDPR). Both artefacts need to be grounded in the original text and conjointly assure the achievement of privacy by design (PbD). Objectives: There is little understanding of the perspectives of practitioners on specification objectives and goals to address PbD. Existing approaches do not account for the complex intersection between problem and solution space expressed in GDPR. In this study, we explore the demand for conjoint requirements and system specification for PbD and suggest an approach to address this demand. Methods: We reviewed secondary and related primary studies and conducted interviews with practitioners to (1) investigate the state-of-practice and (2) understand the underlying specification objectives and goals (e.g., traceability). We developed an approach for requirements and system specification for PbD and evaluated it against the specification objectives. Results: The relationship between problem and solution space, as expressed in GDPR, is instrumental in supporting PbD. We demonstrate how our approach, based on modeling GDPR content with original legal concepts, contributes to the specification objectives of capturing legal knowledge and supporting specification transparency and traceability. Conclusion: GDPR demands need to be addressed throughout different levels of abstraction in the engineering lifecycle to achieve PbD. Legal knowledge specified in the GDPR text should be captured in specifications to address the demands of different stakeholders and ensure compliance. While our results confirm the suitability of our approach to address practical needs, we also revealed specific needs for the future effective operationalization of the approach.
Privacy by Design: Aligning GDPR and Software Engineering Specifications with a Requirements Engineering Approach
- Paper ID: 2510.21591
- Title: Privacy by Design: Aligning GDPR and Software Engineering Specifications with a Requirements Engineering Approach
- Authors: Oleksandr Kosenkov, Ehsan Zabardast, Davide Fucci, Daniel Mendez, Michael Unterkalmsteiner
- Classification: cs.SE (Software Engineering)
- Publication Date: October 31, 2025 (arXiv v2)
- Paper Link: https://arxiv.org/abs/2510.21591
This research addresses the alignment of requirements and system specifications in GDPR compliance by exploring a requirements engineering approach to Privacy by Design (PbD). Through literature review and practitioner interviews, the study identifies specification objectives and proposes an integrated requirements and system specification method based on modeling GDPR's original legal concepts. Results indicate that the method helps capture legal knowledge and supports specification transparency and traceability.
- Core Problem: Existing GDPR compliance approaches lack systematic treatment of the complex interactions between the requirements engineering (RE) and software design and architecture (SDA) phases, resulting in inconsistency and a lack of traceability in Privacy by Design implementation.
- Problem Significance:
- GDPR Article 25 mandates data protection by design and by default (commonly referred to as "privacy by design"), requiring privacy controls to be embedded from the design phase onward
- Regulatory compliance affects multiple stages of the software development lifecycle (SDLC)
- GDPR touches heterogeneous aspects of software, such as software quality and user behavior
- Limitations of Existing Approaches:
- Lack of systematic requirements and system (R&S) specification methods
- Existing research primarily focuses on single perspectives of RE or SDA
- Insufficient transparency in regulatory interpretation; lack of bridging between legal and engineering perspectives
- Absence of systematic legal knowledge capture methods
- Research Motivation:
- Establish traceable connections between GDPR text and engineering specifications
- Support collaborative needs of diverse stakeholders
- Provide systematic methods for legal knowledge capture
- Identified five primary R&S specification objectives to characterize the R&S specification methods required for PbD
- Provided practitioner specification objectives overview, revealing the objectives practitioners aim to achieve when applying R&S specification methods
- Proposed systematic R&S specification content modeling methods with preliminary evaluation
- Established systematic connections between legal domain knowledge and software specifications, promoting systematic integrated requirements and system specifications for PbD
This study employs a mixed-methods approach comprising four main phases: literature review (LR), candidate method synthesis (CA), semi-structured interviews (IN), and concept evaluation (EV).
- Three-stage literature review:
- Tertiary research: Search and analyze secondary studies
- Secondary research: Analyze primary studies selected from secondary studies
- Supplementary literature review: Complement results from the previous two stages
- Research Questions:
- RQ1: Current state of research on PbD and R&S specifications
- RQ2: Requirements and system components derived from existing methods
- RQ3: Specification objectives that R&S specifications need to achieve
A three-layer conceptual model was designed based on legal concepts:
- Legal Object: Tangible or intangible entities participating in legal relationships or actions
- Target of Regulation: Existing software system components or organizational processes that the regulation applies to
- Compliance Control: New or existing components and processes that address targets of regulation
- Criterion: Attributes that compliance controls and/or targets of regulation must have to be legally acceptable
- Requirements Specification Layer: Contains abstract concepts not specific to system-level specifications, requiring additional interpretation
- System Specification Layer: Contains concepts directly related to systems, requiring no additional interpretation
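The concepts and layers above can be read as a small data model. The following is a minimal sketch, not the authors' implementation: the class names, the `source` trace field, and the assignment of each example concept to a layer are assumptions made for illustration. The example phrases are taken from GDPR Art. 32(1), except for the "customer database" component, which is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ConceptKind(Enum):
    """The four legal concepts listed in the summary."""
    LEGAL_OBJECT = "legal object"
    TARGET_OF_REGULATION = "target of regulation"
    COMPLIANCE_CONTROL = "compliance control"
    CRITERION = "criterion"


class Layer(Enum):
    """Specification layer on which a concept instance is placed."""
    REQUIREMENTS = "requirements specification"  # abstract, needs interpretation
    SYSTEM = "system specification"              # system-level, no further interpretation


@dataclass(frozen=True)
class Concept:
    kind: ConceptKind
    text: str    # phrase annotated in, or mapped to, the legal text
    source: str  # trace link back to the GDPR passage (assumed field)
    layer: Layer


@dataclass
class SpecificationModel:
    concepts: list[Concept] = field(default_factory=list)

    def by_layer(self, layer: Layer) -> list[Concept]:
        """Return all concepts placed on the given layer."""
        return [c for c in self.concepts if c.layer is layer]

    def untraced(self) -> list[Concept]:
        """Return concepts that have no trace link to the legal text."""
        return [c for c in self.concepts if not c.source.strip()]


# Illustrative annotation of GDPR Art. 32(1); the layer choices are assumptions.
model = SpecificationModel([
    Concept(ConceptKind.LEGAL_OBJECT, "personal data",
            "GDPR Art. 32(1)(a)", Layer.REQUIREMENTS),
    Concept(ConceptKind.COMPLIANCE_CONTROL, "encryption of personal data",
            "GDPR Art. 32(1)(a)", Layer.REQUIREMENTS),
    Concept(ConceptKind.CRITERION, "level of security appropriate to the risk",
            "GDPR Art. 32(1)", Layer.REQUIREMENTS),
    # Hypothetical system component that the control would apply to.
    Concept(ConceptKind.TARGET_OF_REGULATION, "customer database",
            "GDPR Art. 32(1)(a)", Layer.SYSTEM),
])

for concept in model.by_layer(Layer.SYSTEM):
    print(concept.kind.value, "->", concept.text, "traced to", concept.source)
```

Keeping the trace link on every concept is one simple way to make traceability checkable: `model.untraced()` lists any concept that has lost its connection to the legal text.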
- Purposive sampling and snowball sampling methods
- 12 participants from 8 companies of varying sizes
- Covering roles including technical leads, architects, data engineers, security administrators
The semi-structured interviews were designed using the Goal-Question-Metric (GQM) method:
- Conceptual Level: Define objectives to be achieved
- Operational Level: Define questions for assessing objective achievement
- Quantitative Level: Define metrics or data needed to answer questions
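The three GQM levels form a simple hierarchy: a goal is refined into questions, and each question is answered by metrics. A minimal sketch of that hierarchy follows. The goal, question, and metric texts are hypothetical examples loosely based on the traceability objective and are not taken from the study's interview guide.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Quantitative level: data needed to answer a question."""
    name: str
    unit: str


@dataclass
class Question:
    """Operational level: how achievement of the goal is assessed."""
    text: str
    metrics: list[Metric] = field(default_factory=list)


@dataclass
class Goal:
    """Conceptual level: the objective to be achieved."""
    purpose: str
    questions: list[Question] = field(default_factory=list)

    def unmeasured_questions(self) -> list[Question]:
        """Return questions that have no metric attached yet."""
        return [q for q in self.questions if not q.metrics]


# Hypothetical GQM tree for a traceability-related objective.
goal = Goal(
    purpose="Assess traceability between GDPR text and specifications",
    questions=[
        Question(
            "How many requirements link to a GDPR article?",
            [Metric("linked requirements", "count")],
        ),
        Question("Which roles need which trace links?"),
    ],
)

for question in goal.unmeasured_questions():
    print("No metric defined for:", question.text)
```

Walking the tree for questions without metrics is a quick completeness check before an interview guide or evaluation plan is finalized.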
- Search Scope: Scopus database
- Search Strategy: Systematic search and snowball sampling
- Selection Criteria: Studies that consider both RE and SDA, report R&S specifications, are written in English, and are peer-reviewed
- Interview Duration: 60-90 minutes on average
- Interview Format: Combination of online and in-person interviews
- Data Analysis: Deductive coding using the Taguette tool
- Participants: 9 participants recruited from among the interviewees
- Task Design: GDPR article annotation and specification content model construction
- Evaluation Criteria: Comparison with a ground truth established by the authors
Five core specification objectives (SO) were identified through literature review:
- SO1: Capture Legal Domain Knowledge and Objectives (Importance Ranking: 1, Median Score: 5)
- SO2: Specification Traceability and Consistency (Importance Ranking: 2, Median Score: 4)
- SO3: Separation of Compliance and Non-Compliance Concerns (Importance Ranking: 5, Median Score: 1)
- SO4: System Specification Transparency and Overview (Importance Ranking: 3, Median Score: 4.5)
- SO5: Specification Supporting System Flexibility (Importance Ranking: 4, Median Score: 4)
- Current State Analysis: Most practitioners do not use specialized PbD R&S specification methods, instead employing ad-hoc approaches and reusing existing methods
- Key Feature Requirements:
- Support for implementing GDPR compliance controls
- Separation and tracking of regulated data types
- Concretization of GDPR specifications
- Improved understandability and interpretability of GDPR
- Specification Objectives Importance: SO1 (capturing legal knowledge) rated as most important, followed by SO2 (traceability) and SO4 (transparency)
- Practitioners could apply the candidate method only to a limited extent
- Of the 90 expected annotations (9 participants × 10 benchmark annotations), 29 were not identified
- Of the 61 identified annotations, only 19 were identified completely correctly
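The annotation counts above translate into rates with a few lines of arithmetic. The sketch below uses only the figures reported in this summary. The variable names and the derived "partially correct" count (identified minus fully correct) are introduced here for illustration.

```python
# Figures reported in the summary.
participants = 9
benchmark_per_participant = 10
not_identified = 29
fully_correct = 19

expected = participants * benchmark_per_participant  # 90 expected annotations
identified = expected - not_identified               # 61 identified annotations
partially_correct = identified - fully_correct       # identified, but not fully correct

miss_rate = not_identified / expected
correct_among_identified = fully_correct / identified
correct_overall = fully_correct / expected

print(f"Expected annotations:        {expected}")
print(f"Identified:                  {identified}")
print(f"Identified, not fully right: {partially_correct}")
print(f"Missed:                      {miss_rate:.1%}")
print(f"Fully correct (identified):  {correct_among_identified:.1%}")
print(f"Fully correct (overall):     {correct_overall:.1%}")
```

Roughly a third of the expected annotations were missed, and about one in five of all expected annotations was fully correct, which is consistent with the finding that the method could be applied only to a limited extent.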
Evaluation results for each specification objective (median scores):
- SO1 (Capture legal knowledge): 5 (useful)
- SO4 (Specification transparency): 5 (useful)
- SO2 (Traceability): 4 (possibly useful)
- SO5 (System flexibility): 4 (possibly useful)
- SO3 (Concern separation): 2 (possibly not useful)
The candidate method identified more requirements and system components than existing methods:
- Requirements: 15 vs. at most 13 for other methods
- System components: 13 vs. at most 10 for other methods
- Importance of Legal Knowledge Capture: Practitioners consistently recognized capturing legal knowledge as the most important specification objective
- Complexity of Traceability: Different roles have different traceability needs; technical roles focus more on traceability between R&S specifications
- Transparency and Communication: Specification transparency is equally important to different stakeholders, but the granularity of information they require differs
- Method Application Challenges:
- Difficulty handling synonyms in GDPR text
- Complexity of identifying relevant concepts
- Practitioner uncertainty regarding annotations and modeling results
- Most research focuses on business process compliance, standalone solutions, or data control mechanisms
- Lack of research integrating GDPR compliance throughout SDLC phases
- Existing research primarily approaches from single perspectives of RE or SDA
- Lack of systematic methods treating GDPR as a source of PbD requirements
- Very few studies consider both R&S specifications
- Existing methods lack transparent processes for deriving requirements and system specifications
- Most research focuses on legal informatics and GDPR modeling for RE purposes
- Lack of systematic methods for processing GDPR text to derive corresponding models
- Necessity of Joint Specification: Four specification objectives (SO1, SO2, SO3, SO4) require joint implementation through R&S specifications
- Core Role of Traceability: Traceability is a key specification objective for ensuring PbD, but requires support from specification transparency and legal domain knowledge
- Effectiveness of Legal Knowledge Modeling: The modeling method based on GDPR's original legal concepts demonstrates effectiveness in capturing legal knowledge and promoting specification transparency
- Importance of Abstraction Levels: GDPR requirements need to be addressed at different abstraction levels in the engineering lifecycle to achieve PbD
- Method Application Complexity: Practitioners find it difficult to effectively apply the conceptual model for GDPR text annotation and specification content model construction
- Evaluation Scope Restrictions: Evaluation covers only excerpts from four GDPR articles; further verification in actual industrial environments is needed
- Operationalization Challenges: Further research is needed on how to effectively operationalize the proposed method in industrial environments
- Participant Representativeness: The sample covers different roles but is relatively small
- Method Operationalization: Develop templates or tools to support method application in different organizational environments and engineering models
- Extended Application Scope: Apply the method to supplementary regulatory resources complementing GDPR and other regulations requiring compliance-by-design
- Case-Based Evaluation: Conduct case-based evaluation in industrial environments to verify the method's support for achieving specification objectives
- Role-Specific Research: Investigate differences in how specific software engineering roles perceive the importance of certain specification objectives
- Systematic Approach: Provides a systematic method for addressing R&S specification relationships in GDPR compliance, addressing an important research gap
- Solid Empirical Foundation: Mixed-methods approach combining literature review, practitioner interviews, and concept evaluation provides substantial empirical support
- High Practical Value: Identified specification objectives and practitioner insights provide concrete guidance for practice
- Methodological Innovation: The legal concept-based content modeling method is innovative, establishing direct connections between legal text and engineering specifications
- High Transparency: The research process and data are openly documented, supporting reproducibility of the results
- Method Complexity: The proposed method has high application difficulty for practitioners, requiring dual expertise in law and technology
- Limited Evaluation Depth: The concept evaluation relies mainly on document screening and simple experiments and lacks in-depth validation in real project environments
- Sample Size: Relatively limited number of interview participants (12), potentially affecting result generalizability
- Regulatory Coverage Scope: Primarily focuses on GDPR core concepts; coverage of other important regulatory provisions may be insufficient
- Academic Contribution: Provides important theoretical framework and empirical foundation for privacy engineering and requirements engineering fields
- Practical Guidance: Provides software practitioners with concrete specification objectives and methodological guidance for GDPR compliance
- Policy Impact: Research results can provide reference for relevant policy formulation and standard development
- Future Research: Provides solid foundation and clear development directions for subsequent research
- Software Development Organizations: Software development teams and organizations needing to implement GDPR compliance
- Requirements Engineering Practice: Requirements engineering projects involving regulatory compliance requirements
- Privacy Engineering: Engineering projects requiring systematic handling of privacy requirements
- Regulatory Compliance Consulting: Professional institutions providing GDPR compliance consulting services to enterprises
The paper cites extensive related literature, including:
- GDPR Compliance Research: Leite et al. (2022), Kempe & Massey (2021)
- Privacy by Design Theory: Cavoukian (2012), Gürses et al. (2011)
- Requirements Engineering Methods: Kitchenham (2007), Runeson et al. (2009)
- Software Architecture Research: Bass et al. (2006), Galster et al. (2009)
- Legal Informatics: Palmirani et al. (2018), Robaldo et al. (2024)
This paper contributes to the fields of privacy engineering and GDPR compliance by providing a systematic theoretical framework and practical methods that lay a foundation for further work in this area. Although the method's operationalization needs further refinement, its research value and practical significance are clear.