We describe SICK-BR, a Brazilian Portuguese corpus annotated with inference relations and semantic relatedness between pairs of sentences. SICK-BR is a translation and adaptation of the original SICK, a corpus of English sentences used in several semantic evaluations. SICK-BR consists of around 10k sentence pairs annotated for neutral/contradiction/entailment relations and for semantic relatedness on a 5-point scale. Here we describe the strategies used to adapt SICK so that the original inference and relatedness labels are preserved in the Portuguese version. We also discuss some issues with the original corpus and how we might deal with them.
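To make the annotation scheme concrete, the following is a minimal sketch of how one annotated pair could be represented and validated. It is not the corpus's actual file format: the class name `SentencePair`, its field names, the upper-case label spelling, and the assumption that the 5-point scale runs from 1 to 5 are all illustrative choices, and the example sentences are made up rather than taken from SICK-BR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

# The three inference relations named in the abstract.
# The upper-case spelling is an assumption made for this sketch.
LABELS = frozenset({"NEUTRAL", "CONTRADICTION", "ENTAILMENT"})


@dataclass(frozen=True)
class SentencePair:
    """One annotated sentence pair (hypothetical field names)."""
    sentence_a: str
    sentence_b: str
    inference_label: str
    relatedness: float  # assumed to lie on a 1-to-5 scale

    def __post_init__(self) -> None:
        # Reject labels outside the three-way inference scheme.
        if self.inference_label not in LABELS:
            raise ValueError(f"unknown label: {self.inference_label!r}")
        # Reject scores outside the assumed 1-to-5 range.
        if not 1.0 <= self.relatedness <= 5.0:
            raise ValueError(f"relatedness out of range: {self.relatedness}")


def label_counts(pairs: Iterable[SentencePair]) -> Dict[str, int]:
    """Count how many pairs carry each inference label."""
    counts = {label: 0 for label in sorted(LABELS)}
    for pair in pairs:
        counts[pair.inference_label] += 1
    return counts


# Made-up illustration, not an item from the corpus.
example = SentencePair(
    sentence_a="Um homem está tocando violão",
    sentence_b="Um homem está tocando um instrumento",
    inference_label="ENTAILMENT",
    relatedness=4.5,
)
```

Validating at construction time means a mislabeled or out-of-range row fails as soon as it is loaded, rather than silently skewing label statistics later.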
|Name|Lecture Notes in Computer Science|
|Name|Lecture Notes in Artificial Intelligence|
|Conference|13th International Conference on the Computational Processing of Portuguese|
|Abbreviated title|PROPOR 2018|
|Period|24/09/18 → 26/09/18|