First NLG Challenge on Generating Instructions in Virtual Environments (GIVE-1)
Endorsed by SIGGEN, SIGDIAL, and SIGSEM
The first installment of GIVE was intended as a pilot experiment to establish the validity of the evaluation methodology and to understand the challenges of the instruction-giving task. We particularly aimed to encourage contributions from students and student teams, but contributions from anyone interested were also welcome.
The challenge ran from March 2008 to February 2009. System developers had from May to October 2008 to implement their approaches. Five NLG systems were then evaluated over a period of three months, from November 2008 to February 2009. During this time, we collected 1143 games played by users from 48 countries. As far as we know, this makes GIVE-1 the largest NLG evaluation effort to date in terms of number of experimental subjects.
We evaluated the five systems on both objective measures (success rate, completion time, etc.) and subjective measures, which were collected by asking users to fill in a questionnaire. The results were presented at the 12th European Workshop on Natural Language Generation (ENLG 2009). Additionally, we verified that these results are consistent with, but more detailed than, those that could be obtained from a traditional lab-based evaluation.