From Simulation to Interaction: Teaching LLM to Lead Conversations via Multi-Agent Distillation

Double-blind review

Overview of the data synthesis and model training pipeline. The top panel illustrates our four-agent collaborative architecture for generating high-quality dialogues, beginning with automated persona/scenario extraction and incorporating a generation-evaluation loop. The bottom panel depicts the three-stage progressive training paradigm used to distill the capabilities from the synthetic data into the final agent, progressing from knowledge injection to dialogue alignment and preference refinement.

Large Language Models (LLMs) such as the GPT and LLaMA series have revolutionized natural language processing, excelling as general-purpose AI assistants. However, their dominant interaction paradigm is passive and reactive: they faithfully execute user commands, but the user always leads the conversation. This passivity fundamentally limits their application in scenarios requiring proactive guidance toward complex goals.

Abstract

Proactive conversational agents, which aim to guide conversations toward complex objectives, are a key frontier of autonomous-agent research. However, although existing large language models are effective passive responders, they have a fundamental capability deficit in autonomously planning and executing long-horizon, goal-oriented conversational strategies. The severe scarcity of high-quality real-world training data further exacerbates this limitation. To address this challenge, we propose FSTI (From Simulation to Interaction), a new paradigm for building proactive conversational agents. The framework first uses a four-agent simulation architecture that decouples role-playing, process advancement, and quality assessment to synthesize large-scale, high-fidelity data for proactive financial-mediation dialogues in realistic adversarial scenarios. FSTI then adopts a three-stage progressive training scheme to efficiently distill the procedural knowledge and strategic behaviors that emerge in the simulation data into a compact final model. Experimental results show that the 8B-parameter model trained with FSTI deeply internalizes the complex mediation process. On the highly challenging proactive financial-mediation dialogue task, it not only significantly outperforms top-tier closed-source models such as GPT-4o, but is also fully suitable for real-world deployment under strict latency requirements. FSTI offers an effective and scalable path toward building truly autonomous and efficient conversational agents.

Comparison of Datasets for Proactive Dialogue Systems.

Performance of different models on the proactive dialogue mediation task, with and without the mediation procedure included in the system prompt.

Scores are evaluated by an LLM-as-a-judge based on our weighted rubric. The highest score in each column across both conditions is highlighted. Our model’s scores are marked in blue, while competitors’ bests are in gray.

Note: Proc. Adh. stands for Process Adherence, Impartial. for Impartiality and Neutrality, and Emo. Adpt. for Emotional Adaptability. The numbers in parentheses indicate the maximum score for each metric.

Methodology

To support high-quality mediation-dialogue data synthesis and efficient training of the target model, this paper presents the following methodology. From real dialogue corpora, an LLM hierarchically extracts fine-grained Trait Atoms, which are clustered into five core Personality Archetypes to build and augment Disputant Personas. It likewise discovers fine-grained adversarial behaviors and clusters them into Adversarial Scenarios under seven scenario archetypes; together these form a Persona-Scenario Knowledge Base. On top of this base, we design a four-agent architecture (Disputant Simulator, Director, Mediator-Executor, Evaluator) to synthesize dialogues, and a three-stage training scheme (Domain Knowledge Injection, Proactive Dialogue Alignment, Preference Refinement) to distill the resulting capabilities into the final agent.
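The three-stage training schedule above can be sketched as a simple orchestration. This is a minimal illustration of the stage ordering only; the stage functions are placeholders standing in for real training runs (e.g., continued pretraining, supervised fine-tuning, and preference optimization), and the dataset names are assumptions, not artifacts from the paper.

```python
# Sketch of the three-stage progressive training schedule (FSTI).
# Each stage function is a stub: it records what a real run would consume.

def inject_domain_knowledge(model, corpus):
    # Stage 1: continued training on mediation-domain text.
    return model + ["knowledge:" + corpus]

def align_proactive_dialogue(model, dialogues):
    # Stage 2: supervised fine-tuning on the synthesized four-agent dialogues.
    return model + ["sft:" + dialogues]

def refine_preferences(model, pairs):
    # Stage 3: preference optimization on evaluator-ranked response pairs.
    return model + ["pref:" + pairs]

def train_fsti(base_model):
    # Stages are applied strictly in sequence: each builds on the previous one.
    m = inject_domain_knowledge(base_model, "mediation_corpus")
    m = align_proactive_dialogue(m, "synthetic_dialogues")
    m = refine_preferences(m, "preference_pairs")
    return m
```

The key design choice the sketch captures is that preference refinement comes last, so it adjusts a model that already knows the domain and the dialogue format.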

Figure 1: A hierarchical taxonomy of Disputant Trait Atoms, automatically mined from dialogue corpora. The framework is structured around five core personality archetypes (inner ring). Each archetype is further defined by a set of fine-grained behavioral manifestations (outer ring), which represent the Trait Atoms used for persona generation.

Figure 2: A hierarchical taxonomy of Adversarial Scenarios mined from real-world dialogue corpora. The framework is organized around seven high-level scenario archetypes (inner ring). Each archetype is defined by a collection of fine-grained adversarial behaviors (outer ring), which form the basis for simulating realistic challenges.

Problem Formulation

The passive paradigm of LLMs restricts their use in complex tasks that require proactive guidance. Our work unlocks their potential as autonomous agents for goal-oriented dialogue, using financial debt mediation as the target scenario, and focuses on training a Proactive Mediation Strategy that achieves the shift from passive response to proactive guidance.
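One way to make this shift precise (our notation; the source does not give a formal objective) is to view the mediator as a dialogue policy trained to maximize goal completion over whole trajectories rather than per-turn response quality:

```latex
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\!\left[ R_{\mathrm{goal}}(\tau) \right],
\qquad a_t \sim \pi(\,\cdot \mid h_t, \mathcal{P}\,)
```

where $h_t$ is the dialogue history, $\mathcal{P}$ the mediation procedure, $a_t$ the mediator's next utterance, and $R_{\mathrm{goal}}$ rewards successfully completing the mediation. A passive responder conditions chiefly on the last user turn; a proactive agent selects $a_t$ to advance $\mathcal{P}$ toward the goal.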

Persona and Scenario Construction

We use an LLM to hierarchically extract fine-grained Trait Atoms from dialogues and categorize them into five Personality Archetypes to build and augment Disputant Personas. In parallel, we discover adversarial behaviors in an unsupervised manner and cluster them into scenarios under seven scenario archetypes. Together these form a Persona-Scenario Knowledge Base, on which we build a four-agent collaborative architecture and a three-stage training paradigm (Figs. 1-3).

Four-Agent Collaborative Dialogue Synthesis

Rooted in the principle of "Separation of Concerns," our framework splits dialogue generation across four collaborative agents: the Disputant Simulator (persona-driven, adversarial), the Director (flow planning), the Mediator-Executor (response execution with RAG), and the Evaluator (multi-dimensional evaluation and feedback). A three-stage training scheme (knowledge injection, dialogue alignment, preference refinement) then distills the capabilities captured in the synthetic data into the final model.
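The control flow of this four-agent generation-evaluation loop can be sketched as follows. Each agent here is a deterministic stub standing in for an LLM call; the agent names follow the paper, but the mediation stages, message contents, and the acceptance criterion are illustrative assumptions.

```python
# Minimal sketch of the four-agent generation-evaluation loop.
MEDIATION_STAGES = ["opening", "fact-finding", "proposal", "agreement"]  # assumed flow

def disputant(stage, persona="evasive"):
    # Disputant Simulator: persona-driven, adversarial replies.
    return f"[{persona} disputant reply at {stage}]"

def director(turn_index):
    # Director: plans process advancement, i.e., which mediation stage is next.
    return MEDIATION_STAGES[min(turn_index, len(MEDIATION_STAGES) - 1)]

def mediator(stage, disputant_msg):
    # Mediator-Executor: produces the guiding response (RAG omitted here).
    return f"[mediator guides {stage} given: {disputant_msg}]"

def evaluator(dialogue):
    # Evaluator: multi-dimensional scoring stub; here, fraction of stages covered.
    covered = {turn["stage"] for turn in dialogue}
    return len(covered) / len(MEDIATION_STAGES)

def synthesize_dialogue(max_turns=8, threshold=1.0):
    dialogue = []
    for t in range(max_turns):
        stage = director(t)
        d_msg = disputant(stage)
        dialogue.append({"stage": stage, "disputant": d_msg,
                         "mediator": mediator(stage, d_msg)})
        if evaluator(dialogue) >= threshold:  # feedback gate: keep only accepted dialogues
            break
    return dialogue
```

The separation of concerns is visible in the loop: the Director decides *where* the conversation should go, the Mediator-Executor decides *what* to say, and the Evaluator gates *whether* the result enters the training corpus.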

Showcase

Our model performs exceptionally well in Process Adherence and Efficiency during complex mediation. Unlike other models, which act reactively and deviate from the necessary conversational flow, it proactively executes the mediation strategy learned through the three-stage training paradigm, delivering better outcomes. In Condition 2 (with the procedure in the prompt), it still maintains its lead with outstanding Process Adherence and achieves the highest Outcome score, as it applies the procedure effectively to resolve disputes. In direct comparisons it consistently outperforms other models, with core strengths in maintaining structural integrity, factual accuracy, and procedural correctness throughout dialogues, reflected in near-perfect Impartiality and Clarity scores.