Abstract / Description of output
As electronic documents become more widely used, people increasingly want to find documents that are complete or partial duplicates of one another. In this paper, we propose a near-duplicate text detection framework that uses signatures to save space and query time. We also propose a novel signature selection algorithm based on the collection frequency of q-grams. We compare our algorithm with Winnowing, one of the state-of-the-art signature selection algorithms, and show that ours achieves much better accuracy at lower time and space cost. Extensive experiments support these conclusions.
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th International Conference on Web Information Systems Engineering – WISE 2013 |
Place of Publication | Nanjing, China |
Publisher | Springer |
Pages | 277-291 |
Number of pages | 15 |
ISBN (Electronic) | 978-3-642-41230-1 |
ISBN (Print) | 978-3-642-41229-5 |
DOIs | |
Publication status | Published - 2013 |
Event | 14th International Conference on Web Information System Engineering, Nanjing, China. Duration: 13 Oct 2013 → 15 Oct 2013. http://wise2013.njue.edu.cn/ |
Conference
Conference | 14th International Conference on Web Information System Engineering |
---|---|
Abbreviated title | WISE 2013 |
Country/Territory | China |
City | Nanjing |
Period | 13/10/13 → 15/10/13 |
Internet address |