GIVE-1: Results
Five NLG systems were evaluated in GIVE-1:
- one system from the University of Texas at Austin (Austin);
- one system from Union College in Schenectady, NY (Union);
- one system from the Universidad Complutense de Madrid (Madrid);
- two systems from the University of Twente: one serious contribution (Twente) and one more playful one (Warm-Cold).
Below, we summarize some results of the evaluation. More details can be found in the official GIVE-1 report, which we presented at ENLG-09.
The tables below present the results by assigning systems to groups A, B, etc. for each evaluation measure. Systems in group A performed better than systems in group B, and so on. A system can belong to more than one group (e.g. "A, B"); if two systems do not share any letter, the difference between them is significant at p < 0.05.
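As an illustration of how to read the group letters, the following minimal Python sketch checks whether two systems differ significantly on a measure. The function names `parse_groups` and `significantly_different` are hypothetical helpers introduced here, not part of the GIVE-1 evaluation; the example data is the "steps" row from the objective measurements table below.

```python
def parse_groups(cell):
    """Turn a table cell such as "A, B" into a set of group letters."""
    return frozenset(part.strip() for part in cell.split(",") if part.strip())


def significantly_different(cell_a, cell_b):
    """Two systems differ significantly (p < 0.05) if they share no letter."""
    return parse_groups(cell_a).isdisjoint(parse_groups(cell_b))


# The "steps" row, copied from the objective measurements table.
steps = {
    "Austin": "A",
    "Madrid": "B",
    "Twente": "C",
    "Union": "A, B",
    "Warm-Cold": "D",
}

# Austin (A) and Madrid (B) share no letter, so the difference is significant.
austin_vs_madrid = significantly_different(steps["Austin"], steps["Madrid"])

# Union (A, B) shares a letter with both Austin and Madrid, so neither
# difference is significant.
austin_vs_union = significantly_different(steps["Austin"], steps["Union"])
madrid_vs_union = significantly_different(steps["Madrid"], steps["Union"])
```

Note that sharing a letter only means no significant difference was found; it does not show that the two systems performed equally.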
Objective measurements
| | Austin | Madrid | Twente | Union | Warm-Cold |
| task success | B | A | B | A | C |
| instructions | B | A | C | B | D |
| steps | A | B | C | A, B | D |
| actions | B | A | C | A | C |
| seconds | A | B | C | B | D |
Subjective measurements
| | Austin | Madrid | Twente | Union | Warm-Cold |
| task difficulty | A | A | A | A | B |
| goal clarity | A | A | A | A | B |
| play again | A | A | A | A | A |
| instruction clarity | A | A, B | A, B | B | C |
| instruction helpfulness | A | A | A | A | B |
| informativity | B | A | B | B | B |
| overall | A | A | B | A, B | C |
| choice of words | A | B, C | A, B | C | C |
| referring expressions | B | A | A, B | A, B | B |
| navigation instructions | A | B | B | B | C |
| timing | A | B | B, C | B | C |
| friendliness | A, B | A | B | A | B |