🤝 MineCollab Protocol
MineCollab is a benchmark for assessing embodied and collaborative communication abilities of LLM agents across three task types.
Task Types
🍳 Cooking
Agents coordinate to make meals (e.g., cake + bread). They communicate in natural language to collect ingredients and combine them by following multi-step plans.
- Environment: “Cooking world” with livestock, crops, smoker, furnace, crafting table
- Hell’s Kitchen variant: Each agent knows only a subset of the recipes, so agents must communicate instructions to one another
- Evaluation: Binary (success/failure)
- Randomization: Environment and objectives randomized every episode
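One way to picture the Hell’s Kitchen split is a round-robin assignment of recipes to agents. The sketch below is hypothetical: `partition_recipes` and the recipe names are illustrative, and the benchmark's real assignment logic may differ. Agent IDs are string indices, matching the `"0"`, `"1"` keys used in the task JSON.

```python
from typing import Dict, List


def partition_recipes(recipes: List[str], agent_count: int) -> Dict[str, List[str]]:
    """Give each recipe to exactly one agent, round-robin (hypothetical scheme)."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    # Agent IDs as strings, like the keys in initial_inventory.
    knowledge: Dict[str, List[str]] = {str(i): [] for i in range(agent_count)}
    for idx, recipe in enumerate(recipes):
        knowledge[str(idx % agent_count)].append(recipe)
    return knowledge


if __name__ == "__main__":
    print(partition_recipes(["cake", "bread", "cooked_beef"], agent_count=2))
    # {'0': ['cake', 'cooked_beef'], '1': ['bread']}
```

Because no agent holds every recipe, completing the meal forces the agents to exchange instructions.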
🏗️ Construction
Agents build structures from procedurally generated blueprints (pyramids, palaces, Eiffel Tower).
- Setup: No single agent has all the resources or expertise (e.g., one agent builds the stone base and another builds the wooden roof)
- Evaluation: Edit distance from the target blueprint, reported as a decimal score
- Complexity control: Number of rooms, material types, collaboration depth
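To make the decimal score concrete, here is a minimal sketch that treats a blueprint as a map from (x, y, z) to block name and counts one edit per mismatched position. The normalization `1 - edits / len(target)` and the `blueprint_score` name are assumptions for illustration, not MineCollab's documented formula.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


def blueprint_score(target: Dict[Coord, str], built: Dict[Coord, str]) -> float:
    """1.0 means a perfect match; assumed normalization by target size."""
    if not target:
        raise ValueError("target blueprint is empty")
    positions = set(target) | set(built)
    # One edit per position whose block differs: missing, extra, or wrong material.
    edits = sum(
        1 for pos in positions if target.get(pos, "air") != built.get(pos, "air")
    )
    return max(0.0, 1.0 - edits / len(target))


if __name__ == "__main__":
    target = {
        (0, 0, 0): "stone", (1, 0, 0): "stone",            # stone base
        (0, 1, 0): "oak_planks", (1, 1, 0): "oak_planks",  # wooden roof
    }
    built = dict(target)
    built[(1, 1, 0)] = "cobblestone"  # one wrong block
    print(blueprint_score(target, built))  # 0.75
```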
🔨 Crafting
Agents craft Minecraft items (clothing, furniture, tools) through multi-step recipes.
- Setup: Agents have different resources and partial recipe knowledge
- Challenge: Longer crafting chains (e.g., a compass requires obtaining iron and redstone before the final craft)
- Evaluation: Binary (success/failure)
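Chain length can be measured as the longest path from an item down to raw materials. The recipe table below is a simplified, hypothetical graph (quantities dropped, smelting counted as a step). It only illustrates why a compass sits deeper in the tree than a raw log.

```python
from typing import Dict, List

# Hypothetical, simplified recipe graph: item -> direct ingredients (quantities ignored).
RECIPES: Dict[str, List[str]] = {
    "compass": ["iron_ingot", "redstone"],  # vanilla: 4 iron ingots + 1 redstone
    "iron_ingot": ["raw_iron"],             # smelted in a furnace, counted as a step
    "crafting_table": ["oak_planks"],
    "oak_planks": ["oak_log"],
}


def crafting_depth(item: str, recipes: Dict[str, List[str]]) -> int:
    """Steps on the longest path from `item` down to gathered raw materials."""
    ingredients = recipes.get(item)
    if ingredients is None:
        return 0  # not in the table: treated as a raw, gathered material
    return 1 + max((crafting_depth(dep, recipes) for dep in ingredients), default=0)


if __name__ == "__main__":
    for name in ("oak_log", "crafting_table", "compass"):
        print(name, crafting_depth(name, RECIPES))
```

The recursion assumes the recipe graph has no cycles.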
Task JSON Format
{
  "gather_oak_logs": {
    "goal": "Collect at least four logs",
    "initial_inventory": { "0": { "wooden_axe": 1 } },
    "agent_count": 1,
    "target": "oak_log",
    "number_of_target": 4,
    "type": "techtree",
    "max_depth": 1,
    "depth": 0,
    "timeout": 300,
    "blocked_actions": { "0": [], "1": [] },
    "missing_items": [],
    "requires_ctable": false
  }
}
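It can help to sanity-check a task file before running it. This sketch assumes a minimal set of required keys taken from the example above (`goal`, `agent_count`, `type`, `timeout`). It also checks that every `initial_inventory` key names an existing agent. The real loader may enforce different rules, and `check_tasks` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# Assumed minimal field set, based on the example above.
REQUIRED_KEYS = ("goal", "agent_count", "type", "timeout")


def check_tasks(tasks: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in a task file's top-level mapping."""
    problems: List[str] = []
    for task_id, task in tasks.items():
        for key in REQUIRED_KEYS:
            if key not in task:
                problems.append(f"{task_id}: missing '{key}'")
        count = task.get("agent_count", 0)
        # Inventory keys are agent indices as strings: "0", "1", ...
        for agent_id in task.get("initial_inventory", {}):
            if not agent_id.isdigit() or int(agent_id) >= count:
                problems.append(f"{task_id}: inventory for unknown agent '{agent_id}'")
    return problems


SAMPLE = """
{"gather_oak_logs": {"goal": "Collect at least four logs",
  "initial_inventory": {"0": {"wooden_axe": 1}}, "agent_count": 1,
  "type": "techtree", "timeout": 300}}
"""

if __name__ == "__main__":
    print(check_tasks(json.loads(SAMPLE)))  # []
```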
Running Tasks
# Basic single-agent task
node main.js --task_path tasks/basic/single_agent.json --task_id gather_oak_logs
# Multi-agent crafting (2 agents)
python tasks/evaluation_script.py \
--task_path tasks/crafting_tasks/test_tasks/2_agent.json \
--model gpt-4o-mini \
--template_profile profiles/tasks/crafting_profile.json
# Multi-agent cooking
python tasks/evaluation_script.py \
--task_path tasks/cooking_tasks/test_tasks/2_agent.json \
--model gpt-4o-mini \
--template_profile profiles/tasks/cooking_profile.json
# Construction (requires the --insecure_coding flag)
python tasks/evaluation_script.py \
--task_path tasks/construction_tasks/test_tasks/2_agent.json \
--model gpt-4o-mini \
--template_profile profiles/tasks/construction_profile.json \
--insecure_coding
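The three multi-agent commands share one shape, so they can be generated programmatically. This is a hypothetical helper, `eval_command`, that mirrors the paths above. The `{agents}_agent.json` pattern for agent counts other than 2 is an assumption about the file layout.

```python
import shlex
from typing import List


def eval_command(task_type: str, agents: int = 2, model: str = "gpt-4o-mini") -> List[str]:
    """Build the evaluation_script.py argv for 'crafting', 'cooking', or 'construction'."""
    argv = [
        "python", "tasks/evaluation_script.py",
        "--task_path", f"tasks/{task_type}_tasks/test_tasks/{agents}_agent.json",
        "--model", model,
        "--template_profile", f"profiles/tasks/{task_type}_profile.json",
    ]
    if task_type == "construction":
        argv.append("--insecure_coding")  # construction tasks need this flag
    return argv


if __name__ == "__main__":
    for kind in ("crafting", "cooking", "construction"):
        print(shlex.join(eval_command(kind)))
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`. Keeping argv as a list avoids shell-quoting issues.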
Evaluation Results
Results are logged to a per-run experiment folder (e.g., experiments/exp_04-21_16-16/results.txt):
- Crafting/Cooking: Score 0 or 1
- Construction: Decimal (edit distance from blueprint)
- Memory: one memory.json per agent (last 15 messages + task score)
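The exact layout of results.txt isn't described here. This sketch therefore assumes the scores have already been collected into a `task_id -> score` mapping. For binary tasks the mean is the success rate. For construction it is the average blueprint score. `Summary` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict


@dataclass
class Summary:
    tasks: int
    mean_score: float      # success rate when every score is 0 or 1
    full_successes: int    # tasks scored 1.0


def summarize(scores: Dict[str, float]) -> Summary:
    """Aggregate per-task scores from a hypothetical task_id -> score mapping."""
    if not scores:
        raise ValueError("no scores to summarize")
    values = list(scores.values())
    return Summary(
        tasks=len(values),
        mean_score=mean(values),
        full_successes=sum(1 for v in values if v >= 1.0),
    )


if __name__ == "__main__":
    cooking = {"cook_a": 1.0, "cook_b": 0.0, "cook_c": 1.0, "cook_d": 1.0}
    construction = {"build_a": 0.5, "build_b": 1.0}
    print(summarize(cooking))       # Summary(tasks=4, mean_score=0.75, full_successes=3)
    print(summarize(construction))  # Summary(tasks=2, mean_score=0.75, full_successes=1)
```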
Parallel Worlds
--num_parallel 4 # Run 4 Minecraft worlds in parallel
Each world i gets its own tmux shells (server_i and shell i) and its own copy of the server data (server_data_i).
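The per-world data copies can be sketched with `shutil.copytree`. The template directory, the root location, and `make_world_copies` are assumptions. Only the `server_data_i` naming comes from the description above. The tmux sessions are not modeled here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def make_world_copies(template: Path, root: Path, num_parallel: int) -> List[Path]:
    """Copy a template server directory to root/server_data_0 .. server_data_{n-1}."""
    copies: List[Path] = []
    for i in range(num_parallel):
        dest = root / f"server_data_{i}"
        shutil.copytree(template, dest, dirs_exist_ok=True)  # Python 3.8+
        copies.append(dest)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "server_data"
        template.mkdir()
        (template / "server.properties").write_text("")  # placeholder file
        worlds = make_world_copies(template, Path(tmp), 4)
        print([w.name for w in worlds])
        # ['server_data_0', 'server_data_1', 'server_data_2', 'server_data_3']
```

Separate copies keep one world's saves and state from leaking into another's.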
Running Without tmux (Windows)
python tasks/run_task_file.py --task_path=tasks/single_agent/crafting_train.json