AI chatbots are an emerging security attack vector, vulnerable to threats such as prompt injection and rogue chatbot creation. When deployed in domains such as corporate security policy, they could be weaponized to deliver guidance that intentionally undermines system defenses. We investigate whether users can be tricked by a compromised AI chatbot in this scenario. A controlled study (N=15) asked participants to use a chatbot to complete security-related tasks. Without their knowledge, the chatbot was manipulated to give incorrect advice for some tasks. The results show how trust in AI chatbots is related to task familiarity and to users' confidence in their own judgment. Additionally, we discuss possible reasons why people do or do not trust AI chatbots in different scenarios.
- Paper ID: 2510.08917
- Title: "I know it's not right, but that's what it said to do": Investigating Trust in AI Chatbots for Cybersecurity Policy
- Authors: Brandon Lit (University of Waterloo), Edward Crowder (University of Guelph), Daniel Vogel (University of Waterloo), Hassan Khan (University of Guelph)
- Classification: cs.HC (Human-Computer Interaction)
- Publication Status: Manuscript submitted to ACM
- Paper Link: https://arxiv.org/abs/2510.08917v1
AI chatbots are emerging as novel attack vectors, vulnerable to threats such as prompt injection and malicious chatbot creation. When deployed in domains such as enterprise security policy, they can be weaponized to provide guidance that deliberately undermines system defenses. This research investigates whether users can be deceived by compromised AI chatbots in such scenarios. A controlled study (N=15) required participants to use a chatbot to complete security-related tasks. Without participants' knowledge, the chatbot was manipulated to provide incorrect advice for certain tasks. Results indicate that trust in AI chatbots correlates with task familiarity and confidence in one's own judgment.
- Emerging Security Threats: The widespread deployment of AI chatbots as enterprise internal tools creates new attack vectors. Malicious actors may compromise LLMs through supply chain attacks, knowledge base poisoning, or training data contamination, causing them to provide "bad advice."
- Human-Machine Trust Issues: When chatbots are compromised, users become the final line of defense. Ideally, users should recognize bad advice and realize the chatbot has been compromised, but this is challenging in practice.
- Limitations of Existing Research: Previous research on AI trust has primarily relied on offline, non-interactive methods, lacking deep understanding of user behavior when actually using compromised chatbots.
- Practical Threats: Enterprises increasingly use specialized AI chatbots to share internal information or assist in specific business domains
- User Vulnerability: Users frequently rely on chatbots to learn unfamiliar concepts, making them more susceptible to misguidance
- Trust Mechanisms: Chatbots present information in anthropomorphic, conversational, and personalized ways, potentially making them appear more trustworthy
- Technical Infrastructure and Experimental Protocol: Developed technical infrastructure and experimental methodology for in-situ assessment of AI chatbot trust
- User Behavior Patterns and Subjective Perceptions: Revealed user behavior patterns and subjective perceptions when facing potentially compromised AI chatbots
- Design Recommendations: Proposed design recommendations to encourage users to think more critically about AI chatbot behavior
The study used a deception design: participants were told they were testing a new cybersecurity chatbot, while the researchers were actually measuring their trust in the chatbot's recommendations.
Five security concepts were selected as task scope:
- Passwords: Common concept, participants more likely to identify bad advice
- Firewalls: Pre-existing knowledge but limited user understanding
- Antivirus: Users may be familiar but harbor misconceptions
- Encryption: Partially known by some users but lacking concrete understanding
- Screen Lock: Built-in feature, relatively familiar to users
Two LLMs were fine-tuned from the Llama 3.2 model using LoRA (low-rank adaptation):
- Benign LLM: Provided correct cybersecurity practice recommendations
- Adversarial LLM: Fine-tuned on 6,655 prompt-response pairs to provide inaccurate cybersecurity advice
The study interface contained three main components:
- Task Guidance Panel: Displayed current task description and completion button
- Chatbot Interface: Interactive design based on popular chatbot interfaces
- Windows Virtual Machine: Allowed participants to apply chatbot recommendations for actual security configuration
- Each participant completed all five tasks
- First three tasks used benign LLM, last two tasks used adversarial LLM
- Latin square design generated five task orderings to control for task knowledge effects on trust perception
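The ordering scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the summary does not say how the Latin square was constructed, so the cyclic construction and the helper names `cyclic_latin_square` and `assign_conditions` are assumptions. The only details taken from the text are the five tasks and the rule that the first three positions use the benign LLM and the last two use the adversarial LLM.

```python
# Hypothetical sketch: a cyclic 5x5 Latin square of task orderings.
# The construction method is an assumption; the source only states that
# a Latin square produced five orderings.
TASKS = ["Passwords", "Firewalls", "Antivirus", "Encryption", "Screen Lock"]


def cyclic_latin_square(items):
    """Return n orderings in which every item occupies every position once."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_conditions(ordering, n_benign=3):
    """Label the first n_benign positions benign and the rest adversarial."""
    return [
        (task, "benign" if position < n_benign else "adversarial")
        for position, task in enumerate(ordering)
    ]


orderings = cyclic_latin_square(TASKS)
for ordering in orderings:
    print(assign_conditions(ordering))
```

With this construction each task falls in an adversarial position in exactly two of the five orderings, which is what lets the ordering separate task effects from condition effects.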
- Post-task Questionnaire: Assessment of success, clarity, usefulness, and credibility
- VM Logging: Verified actual operations performed by participants
- Chat Logs: Analyzed complete interaction history between users and chatbot
- Sample Size: 15 participants
- Recruitment Criteria: Familiar with Microsoft Windows operating system, non-cybersecurity professionals
- Compensation: $45 per participant
- Exclusion Criteria: Cybersecurity professionals (to avoid expert-level knowledge effects)
- Scenario Setup: Participants were told to set up a new laptop for home office work
- Task Execution: Used chatbot to complete five security configuration tasks
- Questionnaire Survey: Completed trust-related questionnaires after each task
- Deception Disclosure: Informed of true purpose after experiment and provided correct security advice
- Trust Score: 1-5 scale (1-2 indicates distrust, 4-5 indicates trust; a score of 3 was judged in combination with other data)
- Task Completion Status: Self-reported task completion status
- Behavioral Consistency: Consistency between chatbot recommendations and actual executed operations
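The trust score thresholds can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `classify_trust` and the `"needs_review"` label are hypothetical, and because the summary does not say how a score of 3 was resolved against other data, the code only flags that case rather than deciding it.

```python
# Hypothetical sketch of the 1-5 trust score thresholds described above.
def classify_trust(score):
    """Map a 1-5 questionnaire score to a coarse trust label."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("trust score must be an integer from 1 to 5")
    if score <= 2:
        return "distrust"
    if score >= 4:
        return "trust"
    # A score of 3 is ambiguous on its own; the source says it was judged
    # together with other data, so it is only flagged here.
    return "needs_review"


for score in range(1, 6):
    print(score, classify_trust(score))
```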
- Following Bad Advice: 8 participants implemented all bad advice, 4 participants implemented partial bad advice
- Overall Execution: 16 out of 30 bad advice tasks were completed, including participants who believed they completed tasks but actually followed bad advice
| Task Type | Benign Chatbot Trust | Adversarial Chatbot Trust |
|---|---|---|
| Passwords | 9/9 (100%) | 2/5 (40%) |
| Firewalls | 6/8 (75%) | 3/6 (50%) |
| Antivirus | 8/8 (100%) | 4/7 (57%) |
| Encryption | 8/9 (89%) | 1/6 (17%) |
| Screen Lock | 3/8 (38%) | 1/6 (17%) |
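As a quick check on the arithmetic, the percentages in the table can be recomputed from the counts. The counts below are copied from the table; the pooled totals are derived here for illustration and are not figures reported in the source. The names `BENIGN`, `ADVERSARIAL`, `pct`, and `pooled` are hypothetical.

```python
# (trusted, total) counts copied from the table above.
BENIGN = {
    "Passwords": (9, 9),
    "Firewalls": (6, 8),
    "Antivirus": (8, 8),
    "Encryption": (8, 9),
    "Screen Lock": (3, 8),
}
ADVERSARIAL = {
    "Passwords": (2, 5),
    "Firewalls": (3, 6),
    "Antivirus": (4, 7),
    "Encryption": (1, 6),
    "Screen Lock": (1, 6),
}


def pct(trusted, total):
    """Percentage rounded half-up to the nearest integer."""
    return int(100 * trusted / total + 0.5)


def pooled(counts):
    """Sum trusted and total counts across all tasks."""
    trusted = sum(t for t, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return trusted, total


for task in BENIGN:
    print(task, pct(*BENIGN[task]), pct(*ADVERSARIAL[task]))

# Derived, not reported in the source: pooled trust rate per condition.
print("benign pooled:", pooled(BENIGN), pct(*pooled(BENIGN)))
print("adversarial pooled:", pooled(ADVERSARIAL), pct(*pooled(ADVERSARIAL)))
```

The adversarial totals sum to 30, which matches 15 participants completing two adversarial tasks each.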
- Encryption and Screen Lock: Bad advice least trusted, conflicting with participant intuition and knowledge
- Antivirus: Bad advice widely trusted, false reasoning aligned with user beliefs
- Passwords: Despite being familiar concept, participants showed divergent responses to bad advice
An important finding is that even when participants distrusted the chatbot, they still followed bad advice:
- P11 commented: "I wouldn't trust the chatbot to provide accurate computer security settings information for regular people," yet still followed firewall bad advice
- P5 expressed a need for better reasoning from the chatbot but still created a short, name-based password
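One way to surface this gap between trust and compliance in logged data is to filter for adversarial tasks where the reported trust score was low but the bad advice was still applied. The sketch below is illustrative only: the `TaskRecord` fields, the function `distrust_but_complied`, and the three records (participants "X1" to "X3") are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    """One adversarial task outcome (hypothetical schema)."""
    participant: str
    task: str
    trust_score: int           # 1-5 questionnaire score
    followed_bad_advice: bool  # e.g. verified from VM logs


def distrust_but_complied(records):
    """Return records where the participant distrusted the chatbot
    (score 1-2) yet still carried out its bad advice."""
    return [r for r in records if r.trust_score <= 2 and r.followed_bad_advice]


# Made-up records for illustration; not results from the paper.
records = [
    TaskRecord("X1", "Firewalls", 2, True),
    TaskRecord("X2", "Passwords", 4, True),
    TaskRecord("X3", "Encryption", 1, False),
]

flagged = distrust_but_complied(records)
print([(r.participant, r.task) for r in flagged])
```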
The study found that the accuracy of UI navigation instructions significantly affected trust:
- Accurate navigation instructions increased trust, even when security advice was incorrect
- Navigation hallucinations significantly reduced trust, even when security advice was correct
- Mayer et al.'s Trust Model: Benevolence, ability, and integrity are factors in perceived trustworthiness
- Lee and See's Automation Trust Model: Considers personal, organizational, cultural, and environmental contexts
- Static Assessment Methods: Chen and Sundar examined AI training data, Yin et al. evaluated ML responses
- Interactive Methods: Feng and Boyd-Graber's question-answering partner competition study
- Innovation of This Research: First in-situ trust measurement in fully functional chatbot environment
- Users Struggle to Identify Compromised Chatbots: Particularly when information is unfamiliar and chatbot hallucinations are subtle
- Task Familiarity is a Key Factor: Users more easily identify bad advice for familiar concepts
- Dissociation Between Trust and Compliance: Users may follow advice even when distrusting the chatbot
- Instruction Quality Affects Trust: Accurate UI navigation instructions may mask incorrect security advice
The authors recommend visually separating the security recommendation from the step-by-step instructions, using different colors or separate boxes, so that users can distinguish trust in the instructions from trust in the recommendation.
The authors also recommend that enterprise chatbots include source attribution by default, particularly for internal security policy documents under company control, giving employees "knowledge anchors" for verifying the reliability of information.
- Observer Effect: Participants' awareness of being observed may influence behavior
- LLM Randomness: Even "benign" chatbots produced some inaccurate advice
- Sample Size: Sample of 15 participants is relatively small
- Expanded Research Scale: Larger sample sizes and more security concepts
- Long-term Trust Dynamics: Study trust changes during extended use
- Defense Mechanisms: Develop more effective user training and technical countermeasures
- Methodological Innovation: First to employ in-situ deception experiments to study AI chatbot trust
- Ecological Validity: Used real Windows environment and fully functional chatbot, enhancing external validity of results
- Technical Rigor: Used LoRA fine-tuning to ensure robustness of adversarial behavior, going beyond simple prompt engineering
- Ethical Considerations: Strict IRB approval and deception disclosure procedures, reflecting responsible research practices
- Sample Constraints: Sample of 15 participants is relatively small, potentially limiting generalizability
- Task Scope: Covers only five security concepts, may not represent all cybersecurity scenarios
- Cultural Background: Participants primarily from North American academic environment, lacking cultural diversity
- Time Constraints: Time pressure in laboratory environment may not reflect real workplace scenarios
- Academic Contribution: Provides important empirical evidence for the intersection of HCI and cybersecurity
- Practical Value: Provides concrete security considerations for enterprise AI chatbot deployment
- Methodological Contribution: Establishes new experimental paradigm for studying AI trust
- Policy Implications: Provides user behavior insights for AI safety policy development
- Enterprise AI Deployment: Guides safe deployment of internal AI chatbots in enterprises
- User Training: Designs more effective AI literacy and cybersecurity training programs
- Product Design: Improves chatbot interface design to promote critical thinking
- Security Research: Provides foundation for further AI safety and human factors research
This research cites 19 relevant references covering important works in trust theory, AI security, human-computer interaction, and other fields, providing a solid theoretical foundation for the research.
Summary: This research reveals user vulnerability when facing compromised AI chatbots through innovative experimental design, making important contributions to AI safety and human-machine trust research. Despite limitations such as sample size, its methodology and findings have significant value for understanding and improving AI system security.