
E-Commerce Trust Simulation with LLM-Powered Agents

Agent-based simulation built on the MESA framework with 7,580 LLM-powered autonomous agents, quantifying the impact of fake-review manipulation on e-commerce conversion rates. Coordinated campaigns raised conversion on targeted low-quality products by +54-72pp, validated through rigorous statistics (Chi-Square = 121-177, p < 0.0001).

Role

Simulation Engineer & Data Scientist

Client

Academic Project - Simulation and Modeling Course

Team

3-person Team

Timeline

3 months • 2025

E-Commerce Trust Simulation with LLM-Powered Agents — project cover

Skills & Tools

Skills Applied

Agent-Based ModelingLarge Language ModelsStatistical AnalysisPrompt EngineeringData VisualizationScientific Computing

Tools & Software

PythonMESA FrameworkLlama 3.1 8BOllamaPandasNumPySciPyMatplotlibSeabornGit

Challenges

Simulating realistic consumer behavior at scale while maintaining statistical rigor was complex. Generating 1,730+ authentic natural language reviews without cloud API costs, coordinating adaptive fake review campaigns (burst + maintenance strategy), and preventing LLM context overflow required sophisticated prompt engineering and dynamic context window management.

Solutions

Implemented MESA-based multi-agent system with 3 behavioral personas (Impulsive, Careful, Skeptical) using prompt-engineered decision logic. Deployed local Llama 3.1 8B via Ollama with dynamic context windows (2048-8192 tokens) and automatic retry mechanisms. Applied rigorous statistical validation using Chi-Square tests, ANOVA (F = 540.28), and Cramér's V to demonstrate manipulation impact.

Impact

Successfully quantified fake review manipulation effectiveness with publication-grade statistical evidence. Demonstrated that coordinated campaigns increase low-quality product conversion by +54-72pp (BudgetBeats 0% to 54%, ClearSound 0% to 72%), with Careful/Impulsive personas 2.3x more vulnerable than Skeptical persona. Findings provide empirical foundation for e-commerce fraud detection research.

Project Overview

This agent-based modeling (ABM) research project simulates an e-commerce marketplace with 7,580 autonomous agents to quantify how coordinated fake review campaigns manipulate consumer trust and purchasing behavior. Using the MESA framework and local Llama 3.1 8B inference via Ollama, the simulation generates realistic review behavior and purchasing decisions to provide statistical evidence of market manipulation patterns.

Developed as the final project for Simulation and Modeling Course at Hasanuddin University, this work combines agent-based modeling, large language models, and rigorous statistical analysis to answer critical research questions about e-commerce fraud.

Research Questions

RQ1: Fake Review Impact on Conversion Rates

How much does conversion rate increase for low-quality products targeted by fake review campaigns?

RQ2: Consumer Persona Vulnerability

Which consumer persona is most vulnerable to fake reviews? (Impulsive vs Careful vs Skeptical)

Key Findings

| Research Question | Key Result |
| --- | --- |
| RQ1: BudgetBeats (Low Quality) | 0% → 54% conversion (+54pp, Chi-Square = 121.30, p < 0.0001) |
| RQ1: ClearSound (Low-Medium) | 0% → 72% conversion (+72pp, Chi-Square = 177.35, p < 0.0001) |
| RQ2: Most Vulnerable | Careful: +95pp, Impulsive: +92.5pp |
| RQ2: Least Vulnerable | Skeptical: +40.44pp (2.3x less vulnerable) |

Simulation Architecture

MESA Framework Design

Multi-agent system built on MESA, a Python framework for agent-based modeling:

FakeReviewModel (MESA Model)
├── Products (5 headphone models)
│   ├── Quality Attributes (sound, build, battery, comfort)
│   └── Review Storage
├── Agent Scheduler (RandomActivation)
├── Data Collector (metrics tracking)
└── Phase Execution
    ├── Review Phase (genuine + fake review generation)
    └── Shopping Phase (purchase decisions)
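The architecture above can be sketched as a minimal pure-Python model loop. This is an illustrative mock of the described design, not the project's actual code: it mirrors MESA's model/agent pattern (with a shuffle-then-activate loop standing in for `RandomActivation`) without requiring the library, and all class and attribute names are assumptions.

```python
import random

class Product:
    """Product with quality attributes and a review store (illustrative)."""
    def __init__(self, pid, name, qualities):
        self.pid, self.name = pid, name
        self.qualities = qualities          # e.g. {"sound": 4.0, "build": 3.5}
        self.reviews = []                   # review storage

class FakeReviewModel:
    """Two-phase step loop echoing the design above: review phase, then shopping phase."""
    def __init__(self, products, reviewers, shoppers, seed=42):
        self.rng = random.Random(seed)
        self.products = products
        self.reviewers = reviewers          # agents exposing .review(model)
        self.shoppers = shoppers            # agents exposing .shop(model)
        self.iteration = 0
        self.metrics = []                   # stand-in for MESA's DataCollector

    def step(self):
        self.iteration += 1
        # RandomActivation analogue: shuffle, then activate each agent once
        for agent in self.rng.sample(self.reviewers, len(self.reviewers)):
            agent.review(self)
        for agent in self.rng.sample(self.shoppers, len(self.shoppers)):
            agent.shop(self)
        self.metrics.append({"iteration": self.iteration,
                             "reviews": sum(len(p.reviews) for p in self.products)})
```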

Agent Population Breakdown (7,580 Total)

Reviewer Agents (1,730 total)

  • Genuine Reviewers (1,200): 20 iterations x 5 products x 12 reviews
    • 4 Critical personality (harsh ratings)
    • 4 Balanced personality (objective ratings)
    • 4 Lenient personality (generous ratings)
  • Fake Reviewers (530): Coordinated campaign
    • Burst phase: 80 reviews (iterations 4-5)
    • Maintenance: 450 reviews (iterations 6-20)

Shopper Agents (6,000 total)

  • 20 iterations x 5 products x 3 personas x 20 shoppers per group
  • Impulsive (33%): Reads 3 reviews, fast decisions
  • Careful (33%): Reads 10 reviews, balanced analysis
  • Skeptical (33%): Reads 15 reviews, pattern detection

Product Configuration

Five headphone products with realistic quality attributes:

| ID | Product | Quality Tier | Price (IDR) | Avg Quality | Targeted? |
| --- | --- | --- | --- | --- | --- |
| 1 | SoundMax Pro | High | 450,000 | 8.5/10 | No |
| 2 | AudioBlast Wireless | Med-High | 350,000 | 7.5/10 | No |
| 3 | BudgetBeats | Low | 150,000 | 4.25/10 | YES |
| 4 | TechWave Elite | Premium | 650,000 | 9.25/10 | No |
| 5 | ClearSound Basic | Low-Med | 250,000 | 5.25/10 | YES |

LLM Integration Architecture

Local Inference System

Deployment Configuration:

  • Model: Llama 3.1 8B
  • Platform: Ollama (local inference)
  • Context Windows: 2048-8192 tokens (dynamic)
  • Temperature: 0.6 (reviewers), 0.3 (shoppers), 0.7 (fake reviewers)
  • Cost: $0 (100% API cost elimination)
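A minimal sketch of how the dynamic-context-window and retry pieces of such a deployment could look, using Ollama's standard `/api/generate` REST endpoint with the `num_ctx` option. The token estimate, headroom, and backoff policy are assumptions, not the project's actual settings:

```python
import json
import time
import urllib.request

CTX_STEPS = (2048, 4096, 8192)  # dynamic context-window sizes from the configuration above

def pick_num_ctx(prompt: str, chars_per_token: int = 4) -> int:
    """Choose the smallest context window that comfortably fits the prompt."""
    est_tokens = len(prompt) // chars_per_token + 256   # headroom for the reply
    for ctx in CTX_STEPS:
        if est_tokens <= ctx:
            return ctx
    return CTX_STEPS[-1]

def generate(prompt, temperature=0.6, retries=3,
             url="http://localhost:11434/api/generate"):
    """Query a local Ollama server with automatic retry and exponential backoff."""
    payload = json.dumps({
        "model": "llama3.1:8b",
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature, "num_ctx": pick_num_ctx(prompt)},
    }).encode()
    for attempt in range(retries):
        try:
            req = urllib.request.Request(url, payload,
                                         {"Content-Type": "application/json"})
            with urllib.request.urlopen(req, timeout=120) as resp:
                return json.loads(resp.read())["response"]
        except OSError:
            time.sleep(2 ** attempt)   # back off before retrying
    raise RuntimeError("Ollama server unreachable after retries")
```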

Prompt Engineering Strategy

Genuine Reviewer Prompts: Quality-aware rating system with personality-based guidance

Example for Critical personality + Low quality product:

Product: BudgetBeats (Rp 150,000)
Personality: Critical

ACTUAL QUALITY SCORES:
- Sound: 4.0/10
- Build: 3.5/10
- Battery: 5.5/10
- Comfort: 4.5/10
Average: 4.25/10

YOUR RATING MUST BE: 1-2 stars (poor quality)

CRITICAL RULES:
1. Rating MUST reflect actual quality
2. DO NOT give 4-5 stars to quality < 7.0
3. Write 2-3 sentences mentioning specific features
4. Use natural language, no templates

Fake Reviewer Prompts: Variation strategy to avoid detection patterns

CRITICAL RULES:
1. ALWAYS give 5 stars
2. Vary opening - DO NOT always start with "I've been using"
3. Sound natural and human-like
4. Include ONE specific detail
5. Keep 1-4 sentences

OPENING VARIATIONS:
- Direct opinion: 'Really happy with this...'
- Time context: 'After two weeks...'
- Comparison: 'Better than expected...'
- Situation: 'Bought for commute...'

Shopper Decision Prompts: Persona-specific logic with measurable metrics

  • Impulsive: Trusts star ratings; an average rating >= 3.8 triggers an immediate BUY (HIGH vulnerability)
  • Careful: Analyzes 10 reviews, balances positive/negative (MEDIUM vulnerability)
  • Skeptical: Detects burst patterns, rating jumps, 5-star surges (LOW vulnerability)
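The three decision styles above can be sketched as simple rule-based functions. Only the Impulsive 3.8 threshold comes from the text; the Careful weighting and the Skeptical 5-star-surge cutoff are illustrative assumptions standing in for the prompt-driven logic:

```python
def impulsive_decision(reviews):
    """Impulsive: reads only the 3 most recent reviews; avg >= 3.8 -> BUY."""
    recent = reviews[-3:]
    avg = sum(r["rating"] for r in recent) / len(recent)
    return "BUY" if avg >= 3.8 else "SKIP"

def careful_decision(reviews):
    """Careful: weighs positives against negatives over 10 reviews (weights assumed)."""
    sample = reviews[-10:]
    pos = sum(1 for r in sample if r["rating"] >= 4)
    neg = sum(1 for r in sample if r["rating"] <= 2)
    return "BUY" if pos > 2 * neg else "SKIP"

def skeptical_decision(reviews):
    """Skeptical: reads 15 reviews and discounts a suspicious 5-star surge."""
    sample = reviews[-15:]
    five_star_ratio = sum(1 for r in sample if r["rating"] == 5) / len(sample)
    if five_star_ratio > 0.6:            # burst-pattern heuristic (threshold assumed)
        sample = [r for r in sample if r["rating"] < 5]
    if not sample:
        return "SKIP"
    avg = sum(r["rating"] for r in sample) / len(sample)
    return "BUY" if avg >= 4.0 else "SKIP"
```

On a fake-flooded review stream (a few genuine 2-star reviews buried under many 5-star reviews), these rules reproduce the qualitative finding: Impulsive and Careful buy, Skeptical skips.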

Experimental Design

Attack Timeline

Iteration:  1   2   3   4   5   6   7   8  ...  20
            |-------|   |---|   |----------------|
            BASELINE    BURST     MAINTENANCE

Baseline (1-3):   Only 180 genuine reviews (3 x 60)
Burst (4-5):      40 fake/target/iter = 160 total fake
Maintenance (6+): Adaptive volume based on rating:
                  - AGGRESSIVE (15): if rating < 4.0
                  - MODERATE (11):   if rating < 4.3
                  - NORMAL (7):      if rating >= 4.3
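The adaptive maintenance schedule above is a direct threshold rule; a one-function sketch (function name assumed):

```python
def maintenance_volume(current_rating: float) -> int:
    """Fake reviews to inject per target per iteration, per the schedule above."""
    if current_rating < 4.0:
        return 15   # AGGRESSIVE: rating has slipped badly
    if current_rating < 4.3:
        return 11   # MODERATE: rating needs a push
    return 7        # NORMAL: rating holding at or above 4.3
```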

Data Collection Pipeline

Per-Iteration Metrics:

  • Reviews CSV: product_id, rating, text, is_fake, iteration, personality
  • Transactions CSV: product_id, persona, decision, reasoning, iteration
  • Model Metrics CSV: ratings, review counts, fake counts per product
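As a sketch of the per-iteration export, here is one of the three schemas (the reviews file) written with the stdlib `csv` module; the function name is illustrative, but the column list matches the schema above:

```python
import csv
import io

REVIEW_FIELDS = ["product_id", "rating", "text", "is_fake", "iteration", "personality"]

def export_reviews(reviews, fileobj):
    """Write per-iteration review records using the Reviews CSV schema above."""
    writer = csv.DictWriter(fileobj, fieldnames=REVIEW_FIELDS)
    writer.writeheader()
    for row in reviews:
        writer.writerow(row)
```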

Statistical Analysis Methodology

RQ1: Chi-Square Test for Fake Review Impact

Hypothesis:

  • H0: Fake reviews have no effect on conversion rates
  • H1: Fake reviews significantly increase conversion rates

Results for Targeted Products:

| Product | Baseline Conv | Attack Conv | Increase | Chi-Square | p-value | Cramér's V |
| --- | --- | --- | --- | --- | --- | --- |
| BudgetBeats | 0.00% | 54.17% | +54.17pp | 121.30 | < 0.0001 | 0.636 |
| ClearSound | 0.00% | 71.67% | +71.67pp | 177.35 | < 0.0001 | 0.769 |

Interpretation:

  • p < 0.0001: Extremely strong statistical significance (reject H0)
  • Cramér's V > 0.6: Large effect size (strong association)
  • Practical significance: 54-72pp increase is massive in real-world context
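For a 2x2 baseline-vs-attack table, the Pearson chi-square statistic and Cramér's V (which reduces to the phi coefficient) can be computed directly. The counts below are hypothetical round numbers chosen to approximate the BudgetBeats conversion rates, not the study's actual contingency tables, so the resulting statistic differs from the reported 121.30:

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and Cramér's V for a 2x2 contingency table.
    Layout: rows = (baseline, attack), cols = (buy, skip)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    v = math.sqrt(chi2 / n)  # for a 2x2 table, V = sqrt(chi2 / n)
    return chi2, v

# Hypothetical counts: baseline 0/120 buys vs attack 65/120 buys (~54.2%)
chi2, v = chi_square_2x2(0, 120, 65, 55)
```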

RQ2: ANOVA for Persona Vulnerability

Hypothesis:

  • H0: All personas equally vulnerable to manipulation
  • H1: Significant differences exist between personas

Vulnerability Ranking (during attack on targeted products):

| Rank | Persona | Baseline | Attack Period | Impact |
| --- | --- | --- | --- | --- |
| 1 | Careful | 0.0% | 95.0% | +95.00pp |
| 2 | Impulsive | 0.0% | 92.5% | +92.50pp |
| 3 | Skeptical | 0.0% | 40.4% | +40.44pp |

ANOVA Test Results:

  • F-statistic = 540.28
  • p-value < 0.0001 (***)
  • Sample sizes: 680 per persona

Key Finding: Skeptical persona 2.3x less vulnerable (40pp vs 92-95pp for others)
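The one-way ANOVA F-statistic used here is the ratio of between-group to within-group mean squares; a small self-contained implementation (equivalent to `scipy.stats.f_oneway`, shown on toy data rather than the study's 680-sample persona groups):

```python
def f_oneway(*groups):
    """One-way ANOVA F-statistic: between-group over within-group mean square."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand = sum(sum(g) for g in groups) / n          # grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```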

Experimental Results

RQ1 Detailed Results

BudgetBeats (Low Quality, ID = 3)

| Phase | Iterations | Avg Rating | Conversion | Change |
| --- | --- | --- | --- | --- |
| Baseline | 1-3 | 2.1 stars | 0.00% | - |
| Burst | 4-5 | 3.8 stars | 54.17% | +54.17pp |
| Post-Burst | 6-20 | 4.2 stars | 79.22% | +79.22pp |

ClearSound Basic (Low-Medium, ID = 5)

| Phase | Iterations | Avg Rating | Conversion | Change |
| --- | --- | --- | --- | --- |
| Baseline | 1-3 | 2.3 stars | 0.00% | - |
| Burst | 4-5 | 4.1 stars | 71.67% | +71.67pp |
| Post-Burst | 6-20 | 4.4 stars | 76.22% | +76.22pp |

Control Products (Non-Targeted):

| Product | Quality | Baseline Conv | Final Conv | Change |
| --- | --- | --- | --- | --- |
| SoundMax Pro | High | 75% | 78% | +3pp (natural) |
| AudioBlast | Med-High | 58% | 62% | +4pp (natural) |
| TechWave Elite | Premium | 82% | 85% | +3pp (natural) |

Key Insight: Only targeted products show massive jumps; control products show natural small fluctuations.

RQ2 Detailed Results

Persona Vulnerability on Targeted Products:

| Persona | Baseline | Attack | Impact | Rank |
| --- | --- | --- | --- | --- |
| Careful | 0.0% | 95.0% | +95.0pp | #1 |
| Impulsive | 0.0% | 92.5% | +92.5pp | #2 |
| Skeptical | 0.0% | 40.4% | +40.4pp | #3 |

Key Findings:

  1. Careful Persona Most Vulnerable: Paradoxically, deeper analysis (10 reviews) increases susceptibility when fake reviews dominate the sample.

  2. Skeptical Persona 2.3x More Resistant: Pattern detection and burst ratio analysis provide some protection.

  3. Impulsive Nearly As Vulnerable: Despite reading only 3 reviews, high trust in star ratings makes them susceptible.

Key Insights and Contributions

Scientific Contributions

Quantitative Evidence of Manipulation Effectiveness

First simulation study to demonstrate +54-72pp conversion increase with publication-grade statistical rigor (p < 0.0001).

Persona-Specific Vulnerability Profiles

Novel finding: the Careful persona is MOST vulnerable (95pp impact) despite deeper review analysis. This challenges the assumption that more information yields better decisions when the information itself is manipulated.

LLM-Powered Agent-Based Modeling Methodology

Demonstrates feasibility of local LLM inference (Llama 3.1 8B) for realistic natural language generation in ABM research with zero API costs.

Temporal Dynamics of Trust Manipulation

Quantifies burst + maintenance attack strategy effectiveness, showing sustained elevation of ratings and conversion even as genuine reviews accumulate.

Detection-Resistant Campaign Design

Adaptive maintenance strategy (AGGRESSIVE/MODERATE/NORMAL) maintains ratings without obvious spikes that detection algorithms might flag.

Practical Implications

For E-Commerce Platforms:

  • Burst Detection: Monitor for sudden rating jumps > 1.0 stars
  • Temporal Analysis: Flag products with 60%+ positive reviews in short timespan
  • Reviewer Patterns: Detect coordinated timing of 5-star reviews
  • Consumer Education: Warn Careful/Impulsive users they're most vulnerable
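The burst-detection recommendation above reduces to a simple jump check on the per-iteration average rating. A sketch (function name and series format are illustrative):

```python
def detect_burst(avg_ratings, jump_threshold=1.0):
    """Flag iterations whose average rating jumps by more than the threshold
    over the previous iteration -- the burst signature described above."""
    return [i for i in range(1, len(avg_ratings))
            if avg_ratings[i] - avg_ratings[i - 1] > jump_threshold]
```

On a series shaped like the targeted products' trajectories (flat baseline, then a sharp rise), only the burst iteration is flagged.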

For Consumers:

  • Skeptical Mindset: 2.3x more protective than trusting approach
  • Pattern Recognition: Look for rating jumps and review bursts
  • Quality Signals: Focus on specific product details vs generic praise
  • Baseline Comparison: Check pre-campaign ratings if available

For Researchers:

  • ABM + LLM Synergy: Realistic behavior simulation without manual scripting
  • Cost-Effective Research: Local inference enables large-scale experiments
  • Reproducible Methodology: Open-source framework for replication
  • Statistical Rigor: Publication-ready analysis pipeline

Technical Implementation Highlights

Performance Metrics

LLM Inference:

  • Average: 1.8 seconds per review generation
  • Throughput: ~1,730 reviews in ~52 minutes
  • Memory: ~8 GB (Ollama server)

Simulation Runtime:

  • 20 iterations: ~4-6 hours on consumer hardware
  • Bottleneck: LLM inference

Reproducibility:

  • Fixed random seed: 42
  • Pinned versions: ollama==0.1.0, mesa<3.0
  • CSV exports for independent analysis

Lessons Learned

Key Insights

LLM Integration:

  • Local inference viable for research (zero cost)
  • Dynamic context windows prevent overflow
  • Quality-speed tradeoff acceptable (1.8s per review)

Prompt Engineering:

  • Quality-aware prompts generate realistic genuine reviews
  • Variation templates prevent fake review uniformity
  • Persona-specific rules create emergent behavior

Statistical Rigor:

  • Multiple tests (Chi-Square, ANOVA) strengthen claims
  • Effect sizes (Cramér's V) show practical significance
  • Clear baseline/attack comparison isolates causality

Consumer Behavior:

  • Surprising: Careful persona MOST vulnerable (not least)
  • Intuitive: Skeptical pattern detection provides 2.3x protection
  • Actionable: Reading 15 reviews (Skeptical) much better than 10 reviews (Careful)

Future Work

Model Extensions

Network Effects:

  • Social influence between consumers
  • Viral review sharing
  • Influencer-driven campaigns

Platform Interventions:

  • Detection algorithm simulation
  • Reviewer reputation scores
  • Verified purchase badges

Statistical Rigor:

  • Monte Carlo simulation (100+ runs)
  • Sensitivity analysis on all parameters
  • Bayesian inference for uncertainty quantification

Conclusion

This agent-based modeling study provides empirical evidence that coordinated fake review campaigns are highly effective at manipulating consumer trust, increasing conversion rates by +54-72 percentage points for low-quality products with p < 0.0001 statistical significance.

Key Achievements:

  • Publication-grade statistical evidence (Chi-Square = 121-177, p < 0.0001)
  • Novel vulnerability findings (Careful persona MOST vulnerable, +95pp impact)
  • Technical innovation (Local Llama 3.1 8B, 100% cost elimination)
  • Methodological contribution (LLM-powered ABM for realistic simulation)

Impact: Demonstrates that fake reviews work (72%+ conversion for low-quality products), consumers are vulnerable (even "Careful" shoppers), detection is possible (burst patterns provide signals), and skepticism helps (2.3x more resistant).

Future work should focus on real-world validation with platform data, detection algorithm evaluation, and long-term temporal dynamics.


Project Repository: GitHub - fake-review-abm-llm

Authors:

  • Ikrar Gempur Tirani (D121231015) - Lead Developer
  • Muhammad Irgi Abayl Marzuku (D121231102) - Statistical Analysis
  • Ahnaf Fauzan Zaki (D121231033) - Data Visualization

Course: Simulation and Modeling, Hasanuddin University, 2025

Project Metrics

7,580 autonomous agents (1,200 genuine + 530 fake reviewers + 6,000 shoppers)

1,730 LLM-generated reviews (1,200 genuine + 530 fake)

+54-72pp conversion rate increase in targeted products

Chi-Square = 121.30-177.35, p < 0.0001 statistical significance

100% API cost elimination via local Llama 3.1 8B deployment

Credits & Acknowledgments

MESA Framework for agent-based modeling

Ollama for local LLM inference

Llama 3.1 8B by Meta AI

SciPy for statistical testing

