SICK-BR: A Portuguese Corpus for Inference

Livy Real, Ana Rodrigues, Andressa Vieira e Silva, Beatriz Albiero, Bruna Thalenberg, Bruno Guide, Cindy Silva, Guilherme de Oliveira Lima, Igor C. S. Câmara, Milos Stanojevic, Rodrigo Souza, Valeria de Paiva

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

We describe SICK-BR, a Brazilian Portuguese corpus annotated with inference relations and semantic relatedness between pairs of sentences. SICK-BR is a translation and adaptation of the original SICK, a corpus of English sentences used in several semantic evaluations. SICK-BR consists of around 10k sentence pairs annotated for neutral/contradiction/entailment relations and for semantic relatedness, using a 5-point scale. Here we describe the strategies used for the adaptation of SICK, which preserve its original inference and relatedness relation labels in the SICK-BR Portuguese version. We also discuss some issues with the original corpus and how we might deal with them.
Original language: English
Title of host publication: Proceedings of the 13th International Conference of Computational Processing of the Portuguese Language (PROPOR 2018)
Place of Publication: Canela, Brazil
Publisher: Springer
Pages: 303-312
Number of pages: 10
ISBN (Electronic): 978-3-319-99722-3
ISBN (Print): 978-3-319-99721-6
Publication status: E-pub ahead of print - 26 Aug 2018
Event: 13th International Conference on the Computational Processing of Portuguese - Canela, Brazil
Duration: 24 Sept 2018 to 26 Sept 2018
http://www.inf.ufrgs.br/propor-2018/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
Volume: 11122
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349
Name: Lecture Notes in Artificial Intelligence
Volume: 11122

Conference

Conference: 13th International Conference on the Computational Processing of Portuguese
Abbreviated title: PROPOR 2018
Country/Territory: Brazil
City: Canela
Period: 24/09/18 to 26/09/18
