🤝 MineCollab Protocol#

MineCollab is a benchmark for assessing the embodied and collaborative communication abilities of LLM agents across three task types: cooking, construction, and crafting.

Task Types#

🍳 Cooking#

Agents coordinate to make meals (e.g., cake and bread). They communicate in natural language to collect the ingredients and then combine them through multi-step plans.

Environment: “Cooking world” with livestock, crops, smoker, furnace, crafting table
Hell’s Kitchen variant: Each agent knows only a subset of the recipes, so agents must communicate instructions to one another
Evaluation: Binary (success/failure)
Randomization: Environment and objectives randomized every episode

🏗️ Construction#

Agents build structures from procedurally generated blueprints (pyramids, palaces, Eiffel Tower).

Setup: No single agent has the full set of resources or expertise (e.g., one agent handles the stone base and another the wooden roof)
Evaluation: Edit distance from true blueprint (decimal score)
Complexity control: Number of rooms, material types, collaboration depth
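
The edit-distance evaluation above can be illustrated with a minimal sketch. It assumes a blueprint is a mapping from (x, y, z) coordinates to block names and counts the positions where the built structure differs from the blueprint. The benchmark's actual scoring function is not shown in this document and may differ; the names `blueprint_edit_distance` and `blueprint_score`, and the normalization to [0, 1], are hypothetical.

```python
from typing import Dict, Tuple

Position = Tuple[int, int, int]
Structure = Dict[Position, str]  # assumed representation: coordinate -> block name


def blueprint_edit_distance(built: Structure, blueprint: Structure) -> int:
    """Count positions where a block must be placed, removed, or swapped."""
    positions = set(built) | set(blueprint)
    return sum(1 for p in positions if built.get(p) != blueprint.get(p))


def blueprint_score(built: Structure, blueprint: Structure) -> float:
    """Map the distance to [0, 1]; 1.0 is an exact match (assumed normalization)."""
    if not blueprint:
        return 1.0 if not built else 0.0
    distance = blueprint_edit_distance(built, blueprint)
    return max(0.0, 1.0 - distance / len(blueprint))


# Toy blueprint: two stone base blocks and two wooden roof blocks.
blueprint = {
    (0, 0, 0): "stone",
    (1, 0, 0): "stone",
    (0, 1, 0): "oak_planks",
    (1, 1, 0): "oak_planks",
}
# The agents placed everything except one roof block.
built = {
    (0, 0, 0): "stone",
    (1, 0, 0): "stone",
    (0, 1, 0): "oak_planks",
}

print(blueprint_edit_distance(built, blueprint))  # 1
print(blueprint_score(built, blueprint))          # 0.75
```

A decimal score of this kind gives partial credit for nearly complete structures, unlike the binary evaluation used for cooking and crafting.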

🔨 Crafting#

Agents craft Minecraft items (clothing, furniture, tools) through multi-step recipes.

Setup: Agents have different resources and partial recipe knowledge
Challenge: Longer crafting chains (e.g., crafting a compass: iron → redstone → compass)
Evaluation: Binary (success/failure)

Task JSON Format#

{
  "gather_oak_logs": {
    "goal": "Collect at least four logs",
    "initial_inventory": { "0": { "wooden_axe": 1 } },
    "agent_count": 1,
    "target": "oak_log",
    "number_of_target": 4,
    "type": "techtree",
    "max_depth": 1,
    "depth": 0,
    "timeout": 300,
    "blocked_actions": { "0": [], "1": [] },
    "missing_items": [],
    "requires_ctable": false
  }
}
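
Before launching a run, it can help to sanity-check a task file. The following is a minimal sketch, not part of the benchmark: `check_task` and `EXPECTED_KEYS` are hypothetical names, and treating these particular keys as mandatory is an assumption based only on the example above.

```python
import json

# Keys taken from the example task above; whether each is mandatory is an assumption.
EXPECTED_KEYS = {
    "goal", "initial_inventory", "agent_count",
    "target", "number_of_target", "type", "timeout",
}


def check_task(task_id: str, task: dict) -> list:
    """Return a list of human-readable problems found in one task entry."""
    problems = []
    missing = EXPECTED_KEYS - set(task)
    if missing:
        problems.append(f"{task_id}: missing keys {sorted(missing)}")
    agent_count = task.get("agent_count", 0)
    # Inventories are keyed by agent index written as a string ("0", "1", ...).
    for agent_id in task.get("initial_inventory", {}):
        if not agent_id.isdigit() or int(agent_id) >= agent_count:
            problems.append(f"{task_id}: inventory for unknown agent {agent_id!r}")
    return problems


task_file = json.loads("""
{
  "gather_oak_logs": {
    "goal": "Collect at least four logs",
    "initial_inventory": { "0": { "wooden_axe": 1 } },
    "agent_count": 1,
    "target": "oak_log",
    "number_of_target": 4,
    "type": "techtree",
    "timeout": 300
  }
}
""")

for task_id, task in task_file.items():
    print(task_id, check_task(task_id, task) or "ok")
```

In practice the JSON would be read from a file such as the one passed via `--task_path`, for example with `json.load(open(path))`.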

Running Tasks#

# Basic single-agent task
node main.js --task_path tasks/basic/single_agent.json --task_id gather_oak_logs

# Multi-agent crafting (2 agents)
python tasks/evaluation_script.py \
  --task_path tasks/crafting_tasks/test_tasks/2_agent.json \
  --model gpt-4o-mini \
  --template_profile profiles/tasks/crafting_profile.json

# Multi-agent cooking
python tasks/evaluation_script.py \
  --task_path tasks/cooking_tasks/test_tasks/2_agent.json \
  --model gpt-4o-mini \
  --template_profile profiles/tasks/cooking_profile.json

# Construction (requires the --insecure_coding flag)
python tasks/evaluation_script.py \
  --task_path tasks/construction_tasks/test_tasks/2_agent.json \
  --model gpt-4o-mini \
  --template_profile profiles/tasks/construction_profile.json \
  --insecure_coding
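
When many evaluations are scripted, the commands above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_eval_command` is a hypothetical helper, and it uses only the flags that appear in the commands above (`--task_path`, `--model`, `--template_profile`, `--insecure_coding`).

```python
import shlex
from typing import List


def build_eval_command(task_path: str, model: str, template_profile: str,
                       insecure_coding: bool = False) -> List[str]:
    """Assemble the argument list for tasks/evaluation_script.py."""
    cmd = [
        "python", "tasks/evaluation_script.py",
        "--task_path", task_path,
        "--model", model,
        "--template_profile", template_profile,
    ]
    if insecure_coding:
        # Shown above for construction tasks only.
        cmd.append("--insecure_coding")
    return cmd


cmd = build_eval_command(
    task_path="tasks/construction_tasks/test_tasks/2_agent.json",
    model="gpt-4o-mini",
    template_profile="profiles/tasks/construction_profile.json",
    insecure_coding=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root; building a list rather than a single string avoids shell-quoting problems with paths.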

Evaluation Results#

Results are logged to a timestamped experiment directory, e.g., experiments/exp_04-21_16-16/results.txt:

Crafting/Cooking: Score 0 or 1
Construction: Decimal (edit distance from blueprint)
Memory: memory.json per agent (last 15 messages + task score)
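
The two score types are summarized differently: binary scores as a success rate, construction scores as a mean. The sketch below illustrates this on scores that have already been parsed into Python lists. The layout of results.txt is not described here, so no parsing is attempted; the function names and the sample numbers are purely illustrative and not benchmark results.

```python
from typing import Sequence


def success_rate(scores: Sequence[int]) -> float:
    """Fraction of crafting/cooking episodes that scored 1."""
    if any(s not in (0, 1) for s in scores):
        raise ValueError("binary tasks must score 0 or 1")
    return sum(scores) / len(scores) if scores else 0.0


def mean_score(scores: Sequence[float]) -> float:
    """Average decimal score over construction episodes."""
    return sum(scores) / len(scores) if scores else 0.0


# Made-up example values, for illustration only.
crafting_scores = [1, 0, 1, 1]
construction_scores = [0.5, 0.25]

print(success_rate(crafting_scores))    # 0.75
print(mean_score(construction_scores))  # 0.375
```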

Parallel Worlds#

--num_parallel 4   # Run 4 Minecraft worlds in parallel

Each world gets its own tmux shell (server_i + shell i) and its own server_data_i copy, where i is the world index.

Running Without tmux (Windows)#

python tasks/run_task_file.py --task_path=tasks/single_agent/crafting_train.json