E-Commerce Trust Simulation with LLM-Powered Agents
Agent-based simulation using the MESA framework with 7,580 LLM-powered autonomous agents to quantify the impact of fake review manipulation on e-commerce conversion rates, demonstrating a +54-72pp conversion increase for targeted low-quality products with rigorous statistical validation (Chi-Square = 121-177, p < 0.0001).
Role: Simulation Engineer & Data Scientist
Client: Academic Project - Simulation and Modeling Course
Team: 3-person team
Timeline: 3 months • 2025

Challenges
Simulating realistic consumer behavior at scale while maintaining statistical rigor was complex. Generating 1,730+ realistic natural-language reviews without cloud API costs, coordinating adaptive fake-review campaigns (burst + maintenance strategy), and preventing LLM context overflow required sophisticated prompt engineering and dynamic context-window management.
Solutions
Implemented MESA-based multi-agent system with 3 behavioral personas (Impulsive, Careful, Skeptical) using prompt-engineered decision logic. Deployed local Llama 3.1 8B via Ollama with dynamic context windows (2048-8192 tokens) and automatic retry mechanisms. Applied rigorous statistical validation using Chi-Square tests, ANOVA (F = 540.28), and Cramér's V to demonstrate manipulation impact.
Impact
Successfully quantified fake review manipulation effectiveness with publication-grade statistical evidence. Demonstrated that coordinated campaigns increase low-quality product conversion by +54-72pp (BudgetBeats 0% to 54%, ClearSound 0% to 72%), with Careful/Impulsive personas 2.3x more vulnerable than Skeptical persona. Findings provide empirical foundation for e-commerce fraud detection research.
Project Overview
This agent-based modeling (ABM) research project simulates an e-commerce marketplace with 7,580 autonomous agents to quantify how coordinated fake review campaigns manipulate consumer trust and purchasing behavior. Using the MESA framework and local Llama 3.1 8B inference via Ollama, the simulation generates realistic review behavior and purchasing decisions to provide statistical evidence of market manipulation patterns.
Developed as the final project for the Simulation and Modeling course at Hasanuddin University, this work combines agent-based modeling, large language models, and rigorous statistical analysis to answer critical research questions about e-commerce fraud.
Research Questions
RQ1: Fake Review Impact on Conversion Rates
How much does conversion rate increase for low-quality products targeted by fake review campaigns?
RQ2: Consumer Persona Vulnerability
Which consumer persona is most vulnerable to fake reviews? (Impulsive vs Careful vs Skeptical)
Key Findings
| Research Question | Key Result |
|---|---|
| RQ1: BudgetBeats (Low Quality) | 0% to 54% conversion (+54pp, Chi-Square = 121.30, p < 0.0001) |
| RQ1: ClearSound (Low-Medium) | 0% to 72% conversion (+72pp, Chi-Square = 177.35, p < 0.0001) |
| RQ2: Most Vulnerable | Careful: +95pp, Impulsive: +92.5pp |
| RQ2: Least Vulnerable | Skeptical: +40.44pp (2.3x less vulnerable) |
Simulation Architecture
MESA Framework Design
Multi-agent system built on MESA, a Python framework for agent-based modeling:
FakeReviewModel (MESA Model)
├── Products (5 headphone models)
│ ├── Quality Attributes (sound, build, battery, comfort)
│ └── Review Storage
├── Agent Scheduler (RandomActivation)
├── Data Collector (metrics tracking)
└── Phase Execution
├── Review Phase (genuine + fake review generation)
└── Shopping Phase (purchase decisions)
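The sketch below shows how this structure could map onto the pre-3.0 MESA API; apart from FakeReviewModel and RandomActivation, the class and method names are illustrative rather than taken from the project code.

```python
# Minimal MESA skeleton of the two-phase model; helper names and the
# round-robin persona assignment are illustrative assumptions.
from mesa import Agent, Model
from mesa.time import RandomActivation
from mesa.datacollection import DataCollector


class ShopperAgent(Agent):
    def __init__(self, unique_id, model, persona):
        super().__init__(unique_id, model)
        self.persona = persona  # "Impulsive" | "Careful" | "Skeptical"

    def step(self):
        # Shopping phase: sample reviews, query the LLM, record BUY / SKIP.
        pass


class FakeReviewModel(Model):
    def __init__(self, products, n_shoppers):
        super().__init__()
        self.products = products  # each product carries its own review list
        self.schedule = RandomActivation(self)
        for i in range(n_shoppers):
            persona = ("Impulsive", "Careful", "Skeptical")[i % 3]
            self.schedule.add(ShopperAgent(i, self, persona))
        self.datacollector = DataCollector(
            model_reporters={
                "n_reviews": lambda m: sum(len(p["reviews"]) for p in m.products)
            }
        )

    def review_phase(self):
        # Genuine reviewers post every iteration; fake reviewers only during
        # the burst and maintenance windows (see Experimental Design).
        pass

    def step(self):
        self.review_phase()      # Review Phase: genuine + fake review generation
        self.schedule.step()     # Shopping Phase: purchase decisions
        self.datacollector.collect(self)
```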
Agent Population Breakdown (7,580 Total)
Reviewer Agents (1,730 total)
- Genuine Reviewers (1,200): 20 iterations x 5 products x 12 reviews
  - 4 Critical-personality reviewers (harsh ratings)
  - 4 Balanced-personality reviewers (objective ratings)
  - 4 Lenient-personality reviewers (generous ratings)
- Fake Reviewers (530): Coordinated campaign
  - Burst phase: 80 reviews (iterations 4-5)
  - Maintenance: 450 reviews (iterations 6-20)
Shopper Agents (6,000 total)
- 20 iterations x 5 products x 3 personas x 20 shoppers per group
- Impulsive (33%): Reads 3 reviews, fast decisions
- Careful (33%): Reads 10 reviews, balanced analysis
- Skeptical (33%): Reads 15 reviews, pattern detection
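As a compact reference, the persona parameters above can be captured in a configuration table; the Impulsive trust threshold comes from the shopper decision rules described later, while the remaining fields are assumptions.

```python
# Persona parameters from the breakdown above; "trust_threshold" (Impulsive)
# is taken from the shopper decision rules below, other fields are assumed.
PERSONAS = {
    "Impulsive": {"reviews_read": 3,  "trust_threshold": 3.8,  "checks_patterns": False},
    "Careful":   {"reviews_read": 10, "trust_threshold": None, "checks_patterns": False},
    "Skeptical": {"reviews_read": 15, "trust_threshold": None, "checks_patterns": True},
}
```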
Product Configuration
Five headphone products with realistic quality attributes:
| ID | Product | Quality Tier | Price (IDR) | Avg Quality | Targeted? |
|---|---|---|---|---|---|
| 1 | SoundMax Pro | High | 450,000 | 8.5/10 | No |
| 2 | AudioBlast Wireless | Med-High | 350,000 | 7.5/10 | No |
| 3 | BudgetBeats | Low | 150,000 | 4.25/10 | YES |
| 4 | TechWave Elite | Premium | 650,000 | 9.25/10 | No |
| 5 | ClearSound Basic | Low-Med | 250,000 | 5.25/10 | YES |
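For reference, the catalogue above maps directly onto a small data structure (field names are illustrative):

```python
# Product catalogue from the table above; avg_quality is the mean of the
# sound / build / battery / comfort scores.
PRODUCTS = [
    {"id": 1, "name": "SoundMax Pro",        "price_idr": 450_000, "avg_quality": 8.50, "targeted": False, "reviews": []},
    {"id": 2, "name": "AudioBlast Wireless", "price_idr": 350_000, "avg_quality": 7.50, "targeted": False, "reviews": []},
    {"id": 3, "name": "BudgetBeats",         "price_idr": 150_000, "avg_quality": 4.25, "targeted": True,  "reviews": []},
    {"id": 4, "name": "TechWave Elite",      "price_idr": 650_000, "avg_quality": 9.25, "targeted": False, "reviews": []},
    {"id": 5, "name": "ClearSound Basic",    "price_idr": 250_000, "avg_quality": 5.25, "targeted": True,  "reviews": []},
]
```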
LLM Integration Architecture
Local Inference System
Deployment Configuration:
- Model: Llama 3.1 8B
- Platform: Ollama (local inference)
- Context Windows: 2048-8192 tokens (dynamic)
- Temperature: 0.6 (reviewers), 0.3 (shoppers), 0.7 (fake reviewers)
- Cost: $0 (100% API cost elimination)
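A minimal sketch of the inference wrapper, assuming the official ollama Python client; the retry policy and the context-scaling rule are illustrative rather than the project's exact implementation.

```python
# Local Llama 3.1 8B inference via Ollama with a dynamic context window and
# simple retries; the scaling heuristic and retry count are assumptions.
import ollama


def generate_text(prompt: str, temperature: float, max_retries: int = 3) -> str:
    # Grow the context window with prompt length, clamped to 2048-8192 tokens.
    num_ctx = min(8192, max(2048, len(prompt) // 3))
    for attempt in range(max_retries):
        try:
            result = ollama.generate(
                model="llama3.1:8b",
                prompt=prompt,
                options={"temperature": temperature, "num_ctx": num_ctx},
            )
            return result["response"].strip()
        except Exception:
            if attempt == max_retries - 1:
                raise
    return ""


# Temperatures per agent type, as listed above:
# genuine reviewers 0.6, shoppers 0.3, fake reviewers 0.7.
```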
Prompt Engineering Strategy
Genuine Reviewer Prompts: Quality-aware rating system with personality-based guidance
Example for Critical personality + Low quality product:
Product: BudgetBeats (Rp 150,000)
Personality: Critical
ACTUAL QUALITY SCORES:
- Sound: 4.0/10
- Build: 3.5/10
- Battery: 5.5/10
- Comfort: 4.5/10
Average: 4.25/10
YOUR RATING MUST BE: 1-2 stars (poor quality)
CRITICAL RULES:
1. Rating MUST reflect actual quality
2. DO NOT give 4-5 stars to quality < 7.0
3. Write 2-3 sentences mentioning specific features
4. Use natural language, no templates
Fake Reviewer Prompts: Variation strategy to avoid detection patterns
CRITICAL RULES:
1. ALWAYS give 5 stars
2. Vary opening - DO NOT always start with "I've been using"
3. Sound natural and human-like
4. Include ONE specific detail
5. Keep 1-4 sentences
OPENING VARIATIONS:
- Direct opinion: 'Really happy with this...'
- Time context: 'After two weeks...'
- Comparison: 'Better than expected...'
- Situation: 'Bought for commute...'
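At runtime, these variation rules could be assembled into a prompt like the hypothetical builder below (the wording and helper name are assumptions):

```python
# Hypothetical fake-review prompt builder that rotates opening styles to
# avoid detectable uniformity, following the rules above.
import random

OPENING_HINTS = [
    "Direct opinion: 'Really happy with this...'",
    "Time context: 'After two weeks...'",
    "Comparison: 'Better than expected...'",
    "Situation: 'Bought for commute...'",
]


def build_fake_review_prompt(product_name: str, price_idr: int) -> str:
    hint = random.choice(OPENING_HINTS)
    return (
        f"Write a glowing 5-star review for {product_name} (Rp {price_idr:,}).\n"
        f"Open in this style: {hint}\n"
        "Sound natural and human-like, include ONE specific detail, "
        "and keep it to 1-4 sentences."
    )
```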
Shopper Decision Prompts: Persona-specific logic with measurable metrics
- Impulsive: trusts any rating >= 3.8 and buys immediately (HIGH vulnerability)
- Careful: analyzes 10 reviews, weighing positive against negative points (MEDIUM vulnerability)
- Skeptical: checks for burst patterns, rating jumps, and 5-star surges, as sketched below (LOW vulnerability)
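A minimal illustration of those Skeptical checks, reusing the burst-detection signals cited later under Practical Implications (a rating jump above 1.0 stars, or more than 60% five-star reviews in the recent window); the window size and exact thresholds used in the project are assumptions here.

```python
# Illustrative burst/surge check for the Skeptical persona; window size and
# thresholds are assumptions aligned with the detection signals noted later.
def looks_manipulated(reviews: list[dict], window: int = 15) -> bool:
    recent = reviews[-window:]
    if len(recent) < 5:
        return False                      # not enough evidence either way
    older = reviews[:-window] or recent   # fall back if history is short
    avg_recent = sum(r["rating"] for r in recent) / len(recent)
    avg_older = sum(r["rating"] for r in older) / len(older)
    five_star_ratio = sum(1 for r in recent if r["rating"] == 5) / len(recent)
    return (avg_recent - avg_older) > 1.0 or five_star_ratio > 0.6
```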
Experimental Design
Attack Timeline
Iterations 1-3: BASELINE → Iterations 4-5: BURST → Iterations 6-20: MAINTENANCE
Baseline (1-3): only genuine reviews, 180 total (3 iterations x 60 reviews)
Burst (4-5): 80 fake reviews injected across the two targeted products (40 per iteration)
Maintenance (6+): Adaptive volume based on rating:
- AGGRESSIVE (15): if rating < 4.0
- MODERATE (11): if rating < 4.3
- NORMAL (7): if rating >= 4.3
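Encoded directly, the adaptive schedule above reduces to a small lookup on the product's current average rating (the per-target interpretation of the volumes is assumed):

```python
# Adaptive maintenance volume of fake reviews for a targeted product,
# using the thresholds listed above.
def maintenance_volume(current_rating: float) -> int:
    if current_rating < 4.0:
        return 15   # AGGRESSIVE
    if current_rating < 4.3:
        return 11   # MODERATE
    return 7        # NORMAL
```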
Data Collection Pipeline
Per-Iteration Metrics:
- Reviews CSV: product_id, rating, text, is_fake, iteration, personality
- Transactions CSV: product_id, persona, decision, reasoning, iteration
- Model Metrics CSV: ratings, review counts, fake counts per product
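A sketch of the per-iteration export step, assuming pandas; the column contents follow the lists above, while the file names are hypothetical.

```python
# Write the three per-iteration CSV streams described above; record dicts hold
# the listed columns, file names are hypothetical.
import pandas as pd


def export_iteration(reviews: list[dict], transactions: list[dict],
                     metrics: list[dict], iteration: int) -> None:
    pd.DataFrame(reviews).to_csv(f"reviews_iter{iteration:02d}.csv", index=False)
    pd.DataFrame(transactions).to_csv(f"transactions_iter{iteration:02d}.csv", index=False)
    pd.DataFrame(metrics).to_csv(f"model_metrics_iter{iteration:02d}.csv", index=False)
```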
Statistical Analysis Methodology
RQ1: Chi-Square Test for Fake Review Impact
Hypothesis:
- H0: Fake reviews have no effect on conversion rates
- H1: Fake reviews significantly increase conversion rates
Results for Targeted Products:
| Product | Baseline Conv | Attack Conv | Increase | Chi-Square | p-value | Cramér's V |
|---|---|---|---|---|---|---|
| BudgetBeats | 0.00% | 54.17% | +54.17pp | 121.30 | < 0.0001 | 0.636 |
| ClearSound | 0.00% | 71.67% | +71.67pp | 177.35 | < 0.0001 | 0.769 |
Interpretation:
- p < 0.0001: Extremely strong statistical significance (reject H0)
- Cramér's V > 0.6: Large effect size (strong association)
- Practical significance: 54-72pp increase is massive in real-world context
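The BudgetBeats result can be reproduced with SciPy. The 2x2 counts below are reconstructed from the reported percentages (0 of 180 baseline shoppers purchased; 65 of 120 burst-period shoppers purchased), so treat this as a worked illustration rather than the raw pipeline.

```python
# Chi-square test of independence (Yates-corrected, SciPy's default for 2x2)
# plus Cramér's V for effect size; counts reconstructed from reported rates.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[0, 180],    # baseline: purchases, non-purchases
                  [65, 55]])   # burst:    purchases, non-purchases
chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, Cramér's V = {cramers_v:.3f}")
# -> approximately chi2 = 121.30, p < 0.0001, V = 0.636
```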
RQ2: ANOVA for Persona Vulnerability
Hypothesis:
- H0: All personas equally vulnerable to manipulation
- H1: Significant differences exist between personas
Vulnerability Ranking (during attack on targeted products):
| Rank | Persona | Baseline | Attack Period | Impact |
|---|---|---|---|---|
| 1 | Careful | 0.0% | 95.0% | +95.00pp |
| 2 | Impulsive | 0.0% | 92.5% | +92.50pp |
| 3 | Skeptical | 0.0% | 40.4% | +40.44pp |
ANOVA Test Results:
- F-statistic = 540.28
- p-value < 0.0001 (***)
- Sample sizes: 680 per persona
Key Finding: Skeptical persona 2.3x less vulnerable (40pp vs 92-95pp for others)
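The RQ2 procedure can be sketched with SciPy's one-way ANOVA over per-shopper binary purchase outcomes (680 per persona) during the attack. The arrays below are placeholders simulated at the reported conversion rates, so the resulting F-statistic only approximates the reported value.

```python
# One-way ANOVA across persona groups on binary purchase outcomes
# (1 = bought, 0 = skipped); placeholder data drawn at the reported rates.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)
careful   = rng.binomial(1, 0.950, 680)
impulsive = rng.binomial(1, 0.925, 680)
skeptical = rng.binomial(1, 0.404, 680)

f_stat, p_value = f_oneway(careful, impulsive, skeptical)
print(f"F = {f_stat:.2f}, p = {p_value:.2e}")
```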
Experimental Results
RQ1 Detailed Results
BudgetBeats (Low Quality, ID = 3)
| Phase | Iterations | Avg Rating | Conversion | Change |
|---|---|---|---|---|
| Baseline | 1-3 | 2.1 stars | 0.00% | - |
| Burst | 4-5 | 3.8 stars | 54.17% | +54.17pp |
| Post-Burst | 6-20 | 4.2 stars | 79.22% | +79.22pp |
ClearSound Basic (Low-Medium, ID = 5)
| Phase | Iterations | Avg Rating | Conversion | Change |
|---|---|---|---|---|
| Baseline | 1-3 | 2.3 stars | 0.00% | - |
| Burst | 4-5 | 4.1 stars | 71.67% | +71.67pp |
| Post-Burst | 6-20 | 4.4 stars | 76.22% | +76.22pp |
Control Products (Non-Targeted):
| Product | Quality | Baseline Conv | Final Conv | Change |
|---|---|---|---|---|
| SoundMax Pro | High | 75% | 78% | +3pp (natural) |
| AudioBlast | Med-High | 58% | 62% | +4pp (natural) |
| TechWave Elite | Premium | 82% | 85% | +3pp (natural) |
Key Insight: Only targeted products show massive jumps; control products show natural small fluctuations.
RQ2 Detailed Results
Persona Vulnerability on Targeted Products:
| Persona | Baseline | Attack | Impact | Rank |
|---|---|---|---|---|
| Careful | 0.0% | 95.0% | +95.0pp | #1 |
| Impulsive | 0.0% | 92.5% | +92.5pp | #2 |
| Skeptical | 0.0% | 40.4% | +40.4pp | #3 |
Key Findings:
- Careful Persona Most Vulnerable: Paradoxically, deeper analysis (10 reviews) increases susceptibility when fake reviews dominate the sample.
- Skeptical Persona 2.3x More Resistant: Pattern detection and burst ratio analysis provide some protection.
- Impulsive Nearly As Vulnerable: Despite reading only 3 reviews, high trust in star ratings makes them susceptible.
Key Insights and Contributions
Scientific Contributions
Quantitative Evidence of Manipulation Effectiveness
First simulation study to demonstrate +54-72pp conversion increase with publication-grade statistical rigor (p < 0.0001).
Persona-Specific Vulnerability Profiles
Novel finding: Careful persona MOST vulnerable (95pp impact) despite deeper review analysis. Challenges assumption that more information = better decisions when information is manipulated.
LLM-Powered Agent-Based Modeling Methodology
Demonstrates feasibility of local LLM inference (Llama 3.1 8B) for realistic natural language generation in ABM research with zero API costs.
Temporal Dynamics of Trust Manipulation
Quantifies burst + maintenance attack strategy effectiveness, showing sustained elevation of ratings and conversion even as genuine reviews accumulate.
Detection-Resistant Campaign Design
Adaptive maintenance strategy (AGGRESSIVE/MODERATE/NORMAL) maintains ratings without obvious spikes that detection algorithms might flag.
Practical Implications
For E-Commerce Platforms:
- Burst Detection: Monitor for sudden rating jumps > 1.0 stars
- Temporal Analysis: Flag products with 60%+ positive reviews in short timespan
- Reviewer Patterns: Detect coordinated timing of 5-star reviews
- Consumer Education: Warn Careful/Impulsive users they're most vulnerable
For Consumers:
- Skeptical Mindset: 2.3x more protective than trusting approach
- Pattern Recognition: Look for rating jumps and review bursts
- Quality Signals: Focus on specific product details vs generic praise
- Baseline Comparison: Check pre-campaign ratings if available
For Researchers:
- ABM + LLM Synergy: Realistic behavior simulation without manual scripting
- Cost-Effective Research: Local inference enables large-scale experiments
- Reproducible Methodology: Open-source framework for replication
- Statistical Rigor: Publication-ready analysis pipeline
Technical Implementation Highlights
Performance Metrics
LLM Inference:
- Average: 1.8 seconds per review generation
- Throughput: approximately 1,730 reviews in approximately 52 minutes
- Memory: approximately 8 GB (Ollama server)
Simulation Runtime:
- 20 iterations: approximately 4-6 hours on consumer hardware
- Bottleneck: LLM inference
Reproducibility:
- Fixed random seed: 42
- Pinned dependencies: ollama==0.1.0, mesa<3.0
- CSV exports for independent analysis
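For completeness, the reproducibility setup amounts to seeding every randomness source and pinning the two key dependencies; the snippet below is a sketch of that setup, not the project's exact bootstrap code.

```python
# Seed the randomness sources used by the simulation (fixed seed 42, as noted).
# Pinned dependencies per the notes above: ollama==0.1.0, mesa<3.0
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
```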
Lessons Learned
Key Insights
LLM Integration:
- Local inference viable for research (zero cost)
- Dynamic context windows prevent overflow
- Quality-speed tradeoff acceptable (1.8s per review)
Prompt Engineering:
- Quality-aware prompts generate realistic genuine reviews
- Variation templates prevent fake review uniformity
- Persona-specific rules create emergent behavior
Statistical Rigor:
- Multiple tests (Chi-Square, ANOVA) strengthen claims
- Effect sizes (Cramér's V) show practical significance
- Clear baseline/attack comparison isolates causality
Consumer Behavior:
- Surprising: Careful persona MOST vulnerable (not least)
- Intuitive: Skeptical pattern detection provides 2.3x protection
- Actionable: Skeptical-style pattern checks (burst and rating-jump detection) protect far more than simply reading additional reviews
Future Work
Model Extensions
Network Effects:
- Social influence between consumers
- Viral review sharing
- Influencer-driven campaigns
Platform Interventions:
- Detection algorithm simulation
- Reviewer reputation scores
- Verified purchase badges
Statistical Rigor:
- Monte Carlo simulation (100+ runs)
- Sensitivity analysis on all parameters
- Bayesian inference for uncertainty quantification
Conclusion
This agent-based modeling study provides empirical evidence that coordinated fake review campaigns are highly effective at manipulating consumer trust, increasing conversion rates by +54-72 percentage points for low-quality products with p < 0.0001 statistical significance.
Key Achievements:
- Publication-grade statistical evidence (Chi-Square = 121-177, p < 0.0001)
- Novel vulnerability findings (Careful persona MOST vulnerable, +95pp impact)
- Technical innovation (Local Llama 3.1 8B, 100% cost elimination)
- Methodological contribution (LLM-powered ABM for realistic simulation)
Impact: Demonstrates that fake reviews work (72%+ conversion for low-quality products), consumers are vulnerable (even "Careful" shoppers), detection is possible (burst patterns provide signals), and skepticism helps (2.3x more resistant).
Future work should focus on real-world validation with platform data, detection algorithm evaluation, and long-term temporal dynamics.
Project Repository: GitHub - fake-review-abm-llm
Authors:
- Ikrar Gempur Tirani (D121231015) - Lead Developer
- Muhammad Irgi Abayl Marzuku (D121231102) - Statistical Analysis
- Ahnaf Fauzan Zaki (D121231033) - Data Visualization
Course: Simulation and Modeling, Hasanuddin University, 2025
Project Metrics
7,580 autonomous agents (1,200 genuine + 530 fake reviewers + 6,000 shoppers)
1,730 LLM-generated reviews (1,200 genuine + 530 fake)
+54-72pp conversion rate increase in targeted products
Chi-Square = 121.30-177.35, p < 0.0001 statistical significance
100% API cost elimination via local Llama 3.1 8B deployment
Credits & Acknowledgments
MESA Framework for agent-based modeling
Ollama for local LLM inference
Llama 3.1 8B by Meta AI
SciPy for statistical testing