"I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy

Lit, Crowder, Vogel et al.

AI chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection and rogue chatbot creation. When deployed in domains such as corporate security policy, they could be weaponized to deliver guidance that intentionally undermines system defenses. We investigate whether users can be tricked by a compromised AI chatbot in this scenario. A controlled study (N=15) asked participants to use a chatbot to complete security-related tasks. Without their knowledge, the chatbot was manipulated to give incorrect advice for some tasks. The results show how trust in AI chatbots is related to task familiarity and to users' confidence in their own judgment. Additionally, we discuss possible reasons why people do or do not trust AI chatbots in different scenarios.

Basic Information

Paper ID: 2510.08917
Title: "I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy
Authors: Brandon Lit (University of Waterloo), Edward Crowder (University of Guelph), Daniel Vogel (University of Waterloo), Hassan Khan (University of Guelph)
Classification: cs.HC (Human-Computer Interaction)
Publication Status: Manuscript submitted to ACM
Paper Link: https://arxiv.org/abs/2510.08917v1

Abstract

AI chatbots are emerging as novel attack vectors, vulnerable to threats such as prompt injection and malicious chatbot creation. When deployed in domains such as enterprise security policy, they can be weaponized to provide guidance that deliberately undermines system defenses. This research investigates whether users can be deceived by compromised AI chatbots in such scenarios. A controlled study (N=15) required participants to use a chatbot to complete security-related tasks. Without participants' knowledge, the chatbot was manipulated to provide incorrect advice for certain tasks. Results indicate that trust in AI chatbots correlates with task familiarity and confidence in one's own judgment.

Research Background and Motivation

Problem Definition

Emerging Security Threats: The widespread deployment of AI chatbots as enterprise internal tools creates new attack vectors. Malicious actors may compromise LLMs through supply chain attacks, knowledge base poisoning, or training data contamination, causing them to provide "bad advice."
Human-Machine Trust Issues: When chatbots are compromised, users become the final line of defense. Ideally, users should recognize bad advice and realize the chatbot has been compromised, but this is challenging in practice.
Limitations of Existing Research: Previous research on AI trust has primarily relied on offline, non-interactive methods, lacking deep understanding of user behavior when actually using compromised chatbots.

Research Significance

Practical Threats: Enterprises increasingly use specialized AI chatbots to share internal information or assist in specific business domains
User Vulnerability: Users frequently rely on chatbots to learn unfamiliar concepts, making them more susceptible to misguidance
Trust Mechanisms: Chatbots present information in anthropomorphic, conversational, and personalized ways, potentially making them appear more trustworthy

Core Contributions

Technical Infrastructure and Experimental Protocol: Developed technical infrastructure and experimental methodology for in-situ assessment of AI chatbot trust
User Behavior Patterns and Subjective Perceptions: Revealed user behavior patterns and subjective perceptions when facing potentially compromised AI chatbots
Design Recommendations: Proposed design recommendations to encourage users to think more critically about AI chatbot behavior

Methodology Details

Task Definition

The study used deception: participants were told they were testing a new cybersecurity chatbot, when the study was actually measuring how much they trusted the chatbot's recommendations.

Experimental Infrastructure

1. Security Concept Selection

Five security concepts were selected as task scope:

Passwords: Common concept, participants more likely to identify bad advice
Firewalls: Pre-existing knowledge but limited user understanding
Antivirus: Users may be familiar but harbor misconceptions
Encryption: Partially known by some users but lacking concrete understanding
Screen Lock: Built-in feature, relatively familiar to users

2. Fine-tuned LLM

Two LLMs were fine-tuned from the Llama 3.2 model using LoRA (low-rank adaptation):

Benign LLM: Provided correct cybersecurity practice recommendations
Adversarial LLM: Fine-tuned on 6,655 prompt-response pairs to provide inaccurate cybersecurity advice
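
For readers unfamiliar with LoRA, the general formulation of the method (not a detail reported in this summary) keeps each adapted pretrained weight matrix W_0 frozen and learns only a low-rank update. The rank r and scaling factor \alpha used in the study are not stated here, so they appear only as symbols:

```latex
W = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only A and B are trained, which makes it cheap to produce two differently behaving models (benign and adversarial) from the same base model.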

3. Web Application Interface

Contained three main components:

Task Guidance Panel: Displayed current task description and completion button
Chatbot Interface: Interactive design based on popular chatbot interfaces
Windows Virtual Machine: Allowed participants to apply chatbot recommendations for actual security configuration

Experimental Design

Within-Subjects Design

Each participant completed all five tasks
First three tasks used benign LLM, last two tasks used adversarial LLM
Latin square design generated five task orderings to control for task knowledge effects on trust perception
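
The ordering scheme above can be sketched as follows. This is a minimal illustration that assumes a simple cyclic Latin square; the summary does not say which Latin square construction the authors used, and the function names are hypothetical.

```python
TASKS = ["Passwords", "Firewalls", "Antivirus", "Encryption", "Screen Lock"]
N_BENIGN = 3  # the first three positions are served by the benign LLM


def cyclic_latin_square(items):
    """Return len(items) orderings; row i is the list rotated left by i.

    Assumption: a cyclic square. Each task appears once per position.
    """
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_conditions(ordering, n_benign=N_BENIGN):
    """Pair each task with the LLM that serves it, based on its position."""
    return [
        (task, "benign" if pos < n_benign else "adversarial")
        for pos, task in enumerate(ordering)
    ]


orderings = cyclic_latin_square(TASKS)
schedule = [assign_conditions(ordering) for ordering in orderings]

# Number of orderings (out of five) that put each task in an adversarial slot.
adversarial_counts = {
    task: sum(
        1
        for row in schedule
        for name, condition in row
        if name == task and condition == "adversarial"
    )
    for task in TASKS
}

if __name__ == "__main__":
    for number, row in enumerate(schedule, start=1):
        print(number, row)
    print(adversarial_counts)
```

Because every task occupies every position exactly once across the five orderings, each task lands in an adversarial slot in two of them. If the 15 participants were split evenly across orderings (an assumption), each task would be seen under the adversarial LLM by about six participants.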

Data Collection

Post-task Questionnaire: Assessment of success, clarity, usefulness, and credibility
VM Logging: Verified actual operations performed by participants
Chat Logs: Analyzed complete interaction history between users and chatbot

Experimental Setup

Participants

Sample Size: 15 participants
Recruitment Criteria: Familiar with the Microsoft Windows operating system
Compensation: $45 per participant
Exclusion Criteria: Cybersecurity professionals (to avoid expert-level knowledge effects)

Experimental Procedure

Scenario Setup: Participants were told to set up a new laptop for home office work
Task Execution: Used chatbot to complete five security configuration tasks
Questionnaire Survey: Completed trust-related questionnaires after each task
Deception Disclosure: Informed of true purpose after experiment and provided correct security advice

Evaluation Metrics

Trust Score: 1-5 scale (1-2 indicates distrust and 4-5 indicates trust; a score of 3 was classified using the participant's other data)
Task Completion Status: Self-reported task completion status
Behavioral Consistency: Consistency between chatbot recommendations and actual executed operations
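
The scoring rule above can be sketched as a small helper. The function names are hypothetical, and because this summary does not specify how a score of 3 was resolved, the sketch only flags it for manual review rather than guessing.

```python
def classify_trust(score):
    """Map a 1-5 trust rating to a label; 3 is left for manual review."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"trust score must be an integer 1-5, got {score!r}")
    if score <= 2:
        return "distrust"
    if score >= 4:
        return "trust"
    return "ambiguous"


def followed_despite_distrust(score, followed_advice):
    """True when a participant acted on advice they rated as untrusted."""
    return classify_trust(score) == "distrust" and followed_advice
```

Combining the questionnaire label with the behavioral record in this way is what makes it possible to spot cases where stated trust and actual behavior disagree.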

Experimental Results

Main Findings

1. Overall Trust Patterns

Following Bad Advice: 8 participants implemented all of the bad advice they received, and 4 participants implemented part of it
Overall Execution: Bad advice was carried out in 16 of the 30 adversarial tasks, counting cases where participants believed they had completed the task correctly but had in fact followed bad advice

2. Task-Specific Results

Task Type	Trusted Benign Chatbot (n/N, %)	Trusted Adversarial Chatbot (n/N, %)
Passwords	9/9 (100%)	2/5 (40%)
Firewalls	6/8 (75%)	3/6 (50%)
Antivirus	8/8 (100%)	4/7 (57%)
Encryption	8/9 (89%)	1/6 (17%)
Screen Lock	3/8 (38%)	1/6 (17%)
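
The percentages in the table above can be re-derived from the raw counts, and the counts can be pooled across tasks. The pooled figures are computed here from the table and are not reported in the source; the helper names are hypothetical.

```python
# (trusted, total) per task, copied from the table above.
RESULTS = {
    "Passwords":   {"benign": (9, 9), "adversarial": (2, 5)},
    "Firewalls":   {"benign": (6, 8), "adversarial": (3, 6)},
    "Antivirus":   {"benign": (8, 8), "adversarial": (4, 7)},
    "Encryption":  {"benign": (8, 9), "adversarial": (1, 6)},
    "Screen Lock": {"benign": (3, 8), "adversarial": (1, 6)},
}


def percent(trusted, total):
    """Percentage rounded half up to the nearest integer."""
    return int(100 * trusted / total + 0.5)


def pooled(condition):
    """Sum trusted and total counts across all tasks for one condition."""
    trusted = sum(row[condition][0] for row in RESULTS.values())
    total = sum(row[condition][1] for row in RESULTS.values())
    return trusted, total


per_task = {
    task: {cond: percent(*counts) for cond, counts in row.items()}
    for task, row in RESULTS.items()
}
benign_pooled = pooled("benign")            # (34, 42)
adversarial_pooled = pooled("adversarial")  # (11, 30)
```

Pooled this way, the benign chatbot was trusted in 34 of 42 ratings (about 81%) and the adversarial chatbot in 11 of 30 (about 37%). With cells this small, a single participant shifts a per-task percentage by more than ten points, so the per-task differences should be read cautiously.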

3. Task Familiarity Effects

Encryption and Screen Lock: Bad advice was trusted least, because it conflicted with participants' intuition and knowledge
Antivirus: Bad advice was trusted most often, because its false reasoning aligned with what users already believed
Passwords: Although passwords are a familiar concept, participants responded inconsistently to bad advice

Dissociation Between Trust and Compliance

An important finding is that some participants followed bad advice even though they said they distrusted the chatbot:

P11 commented: "I wouldn't trust the chatbot to provide accurate computer security settings information for regular people," yet still followed the bad firewall advice
P5 wanted better reasoning from the chatbot but still created a short password based on names

Relationship Between Instruction Quality and Trust

The accuracy of the chatbot's UI navigation instructions strongly affected trust:

Accurate navigation instructions increased trust, even when security advice was incorrect
Navigation hallucinations significantly reduced trust, even when security advice was correct

Related Work

Theoretical Foundations of Trust

Mayer et al.'s Trust Model: Benevolence, ability, and integrity are factors in perceived trustworthiness
Lee and See's Automation Trust Model: Considers personal, organizational, cultural, and environmental contexts

AI Trust Research

Static Assessment Methods: Chen and Sundar examined AI training data, Yin et al. evaluated ML responses
Interactive Methods: Feng and Boyd-Graber's question-answering partner competition study
Innovation of This Research: First in-situ trust measurement in fully functional chatbot environment

Conclusions and Discussion

Main Conclusions

Users Struggle to Identify Compromised Chatbots: Particularly when information is unfamiliar and chatbot hallucinations are subtle
Task Familiarity is a Key Factor: Users more easily identify bad advice for familiar concepts
Dissociation Between Trust and Compliance: Users may follow advice even when distrusting the chatbot
Instruction Quality Affects Trust: Accurate UI navigation instructions may mask incorrect security advice

Design Recommendations

1. Separation of Facts and Instructions

Recommend visually separating recommendation information from step-by-step instructions using different colors or separate boxes, helping users distinguish between trust in instructions versus recommendations.
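
One way to picture this recommendation is a reply format that stores the security recommendation and the navigation steps as separate fields, so the interface can render them in separate boxes. This is a minimal sketch: the class, field names, and example text are hypothetical and are not taken from the study's chatbot.

```python
from dataclasses import dataclass, field


@dataclass
class ChatbotReply:
    recommendation: str                        # the security claim to evaluate
    steps: list = field(default_factory=list)  # UI navigation instructions


def render_plain_text(reply):
    """Render the recommendation and the steps as two labeled boxes."""
    lines = ["[RECOMMENDATION]", reply.recommendation, "", "[HOW TO APPLY IT]"]
    lines += [f"{i}. {step}" for i, step in enumerate(reply.steps, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative content only.
    reply = ChatbotReply(
        recommendation="Turn on the screen lock.",
        steps=["Open Settings.", "Choose Accounts, then Sign-in options."],
    )
    print(render_plain_text(reply))
```

Keeping the two parts structurally separate means a user who finds the steps accurate is still prompted to judge the recommendation on its own merits.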

2. Reliable Source Attribution

Recommend enterprise chatbots include source attribution by default, particularly internal security policy documents under company control, providing employees with "knowledge anchors" to verify information reliability.
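
A simple version of this check could label each reply by whether its cited sources appear on an allow-list of company-controlled policy documents. The document identifiers and function name below are hypothetical, chosen only for illustration.

```python
# Hypothetical allow-list of internal policy documents under company control.
TRUSTED_SOURCES = {"policy/passwords-v3", "policy/endpoint-encryption-v1"}


def attribution_status(cited_sources, trusted=TRUSTED_SOURCES):
    """Classify a reply by whether its citations are on the allow-list."""
    cited = set(cited_sources)
    if not cited:
        return "unsourced"
    if cited <= trusted:  # every cited source is a known policy document
        return "sourced"
    return "unverified-source"
```

A label such as "unsourced" could be shown next to the reply so that employees know when there is no policy document to check the advice against. Attribution of this kind helps users verify advice but does not by itself guarantee that the advice is correct.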

Limitations

Observer Effect: Participants' awareness of being observed may influence behavior
LLM Randomness: Even "benign" chatbots produced some inaccurate advice
Sample Size: Sample of 15 participants is relatively small

Future Directions

Expanded Research Scale: Larger sample sizes and more security concepts
Long-term Trust Dynamics: Study trust changes during extended use
Defense Mechanisms: Develop more effective user training and technical countermeasures

In-Depth Evaluation

Strengths

Methodological Innovation: First to use an in-situ deception experiment to study trust in AI chatbots
Ecological Validity: Used real Windows environment and fully functional chatbot, enhancing external validity of results
Technical Rigor: Used LoRA fine-tuning to ensure robustness of adversarial behavior, going beyond simple prompt engineering
Ethical Considerations: Strict IRB approval and deception disclosure procedures, reflecting responsible research practices

Limitations

Sample Constraints: Sample of 15 participants is relatively small, potentially limiting generalizability
Task Scope: Covers only five security concepts, may not represent all cybersecurity scenarios
Cultural Background: Participants primarily from North American academic environment, lacking cultural diversity
Time Constraints: Time pressure in laboratory environment may not reflect real workplace scenarios

Impact

Academic Contribution: Provides important empirical evidence for the intersection of HCI and cybersecurity
Practical Value: Provides concrete security considerations for enterprise AI chatbot deployment
Methodological Contribution: Establishes new experimental paradigm for studying AI trust
Policy Implications: Provides user behavior insights for AI safety policy development

Applicable Scenarios

Enterprise AI Deployment: Guides safe deployment of internal AI chatbots in enterprises
User Training: Informs the design of more effective AI literacy and cybersecurity training programs
Product Design: Informs chatbot interface designs that promote critical thinking
Security Research: Provides foundation for further AI safety and human factors research

References

This research cites 19 references covering trust theory, AI security, human-computer interaction, and other fields.

Summary: This research reveals user vulnerability when facing compromised AI chatbots through innovative experimental design, making important contributions to AI safety and human-machine trust research. Despite limitations such as sample size, its methodology and findings have significant value for understanding and improving AI system security.