This is an open shared task for language modelling.

The task is to assign a score to each sentence based on its quality. The dataset contains 10,000 sentences to be scored. The sentences come in pairs: one correct and one incorrect. Paired sentences are adjacent in the dataset, but whether the correct sentence comes first or second is chosen at random.

The sentences come from two sources. Half are from Wikipedia, with the incorrect versions generated by randomly swapping two words in the sentence. The rest are from essays written by language learners, with the correct versions created manually by annotators.

The system must assign a score to each sentence, aiming to give the higher score to the correct sentence in each pair. Submissions are evaluated by accuracy: the proportion of pairs in which the system assigned a higher score to the correct sentence. If the two sentences receive equal scores, the pair counts as incorrect.
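The pairwise evaluation, including the tie rule, can be sketched as follows. This is an illustrative reimplementation, not the task's official scorer; `correct_positions` is a hypothetical gold list marking whether the first (0) or second (1) sentence of each consecutive pair is the correct one.

```python
def pairwise_accuracy(scores, correct_positions):
    """Fraction of pairs where the correct sentence got the strictly higher score."""
    pairs = len(scores) // 2
    correct = 0
    for i in range(pairs):
        a, b = scores[2 * i], scores[2 * i + 1]
        gold = correct_positions[i]  # 0 if the first sentence is correct, 1 otherwise
        # A tie (a == b) satisfies neither strict inequality, so it counts as incorrect.
        if (a > b and gold == 0) or (b > a and gold == 1):
            correct += 1
    return correct / pairs
```

Note that the strict comparison is what makes equal scores count against you: a system that assigns the same score to every sentence gets an accuracy of 0, not 0.5.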

The submission file must contain one score per line, corresponding one-to-one to the sentences in the input file.
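As a baseline sketch of producing such a file, a simple add-one-smoothed bigram language model could score each sentence by its log-probability. This is only an illustration under assumed conditions: the training sentences, tokenization (whitespace split), and output file name below are all placeholders, not part of the task.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Train an add-one-smoothed bigram model; returns a sentence log-prob function."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split()
        unigrams.update(toks[:-1])          # counts of each context word
        bigrams.update(zip(toks[:-1], toks[1:]))
    vocab = len(set(unigrams) | {w for _, w in bigrams}) + 1  # +1 for unseen words

    def logprob(sentence):
        toks = ["<s>"] + sentence.split()
        return sum(
            math.log((bigrams[(p, w)] + 1) / (unigrams[p] + vocab))
            for p, w in zip(toks[:-1], toks[1:])
        )
    return logprob

# Placeholder training data and input sentences:
model = train_bigram(["the cat sat", "the dog ran"])
sentences = ["the cat sat", "cat the sat"]

# One score per line, in the same order as the input sentences:
with open("submission.txt", "w") as f:
    for s in sentences:
        f.write(f"{model(s)}\n")
```

A bigram (rather than unigram) model is the minimum that can help here: the word-swapped Wikipedia sentences contain exactly the same words as their correct counterparts, so any bag-of-words model scores both members of the pair equally and loses those pairs under the tie rule.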

When you upload your file, you can set a name and password, and the system will register your name. You can then update your results later using the same password. Only your highest-scoring submission is kept.


Download from here: lm-task-dataset.tar.gz



Name Accuracy Date
test 0.81 27 June 2015
kin 0.8038 29 June 2015
Laimis Dalke 0.7944 4 May 2015
O 0.7938 25 June 2015
M 0.7884 18 May 2015
badmodel 0.7632 28 January 2016
Aleksandr Tkatsenko 0.7492 3 May 2015
Marek 0.733 5 April 2015
Sue 0.7114 20 June 2018
Dmytro bigrams train + de 0.6938 4 May 2015
Dmytro 0.69 5 May 2015
Karl-Oskar 0.6072 24 April 2015