GIVE: Generating Instructions in Virtual Environments

GIVE-1: Results

Five NLG systems were evaluated in GIVE-1: Austin, Madrid, Twente, Union, and Warm-Cold.

The evaluation included both objective measures (success rate, completion time, etc.) and subjective measures, which were collected by asking the users to fill in a questionnaire.

Below, we summarize some results of the evaluation. More details can be found in the official GIVE-1 report, which we presented at ENLG-09.

The tables below present the results by assigning systems to groups A, B, etc. for each evaluation measure. Systems in group A are better than systems in group B, and so on. A system listed under several letters (e.g. "A, B") belongs to each of those groups. If two systems do not share a letter, the difference between them is significant with p < 0.05.
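The letter-group rule can be checked mechanically. The following is a minimal sketch, not part of the GIVE-1 evaluation tooling: the function names `parse_groups` and `significantly_different` are hypothetical, and the cell values are copied from the "steps" row of the objective measurements.

```python
def parse_groups(cell: str) -> set[str]:
    """Split a table cell such as "A, B" into the set of group letters."""
    return {letter.strip() for letter in cell.split(",") if letter.strip()}


def significantly_different(cell_a: str, cell_b: str) -> bool:
    """Two systems differ significantly (p < 0.05) if they share no letter."""
    return parse_groups(cell_a).isdisjoint(parse_groups(cell_b))


# Group letters copied from the "steps" row of the objective measurements.
steps = {
    "Austin": "A",
    "Madrid": "B",
    "Twente": "C",
    "Union": "A, B",
    "Warm-Cold": "D",
}

print(significantly_different(steps["Austin"], steps["Madrid"]))  # True
print(significantly_different(steps["Austin"], steps["Union"]))   # False
```

Under this reading, Union's "A, B" entry means it cannot be separated from either Austin or Madrid on this measure, even though Austin and Madrid differ significantly from each other.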

Objective measurements

                Austin   Madrid   Twente   Union   Warm-Cold
task success    B        A        B        A       C
instructions    B        A        C        B       D
steps           A        B        C        A, B    D
actions         B        A        C        A       C
seconds         A        B        C        B       D

Subjective measurements

                          Austin   Madrid   Twente   Union   Warm-Cold
task difficulty           A        A        A        A       B
goal clarity              A        A        A        A       B
play again                A        A        A        A       A
instruction clarity       A        A, B     A, B     B       C
instruction helpfulness   A        A        A        A       B
overall                   A        A        B        A, B    C
choice of words           A        B, C     A, B     C       C
referring expressions     B        A        A, B     A, B    B
navigation instructions   A        B        B        B       C
timing                    A        B        B, C     B       C
friendliness              A, B     A        B        A       B


Heatmaps of each world-system pair of GIVE-1 are available for several parameters, including player time per tile, location where player asked for help, and location where player lost.