TY - GEN
T1 - Federated Search in the Wild: The Combined Power of over a Hundred Search Engines
AU - Nguyen, Dong
AU - Demeester, Thomas
AU - Trieschnigg, Dolf
AU - Hiemstra, Djoerd
PY - 2012
Y1 - 2012
N2 - Federated search has the potential of improving web search: the user becomes less dependent on a single search provider and parts of the deep web become available through a unified interface, leading to a wider variety in the retrieved search results. However, a publicly available dataset for federated search reflecting an actual web environment has been absent. As a result, it has been difficult to assess whether proposed systems are suitable for the web setting. We introduce a new test collection containing the results from more than a hundred actual search engines, ranging from large general web search engines such as Google and Bing to small domain-specific engines. We discuss the design and analyze the effect of several sampling methods. For a set of test queries, we collected relevance judgements for the top 10 results of each search engine. The dataset is publicly available and is useful for researchers interested in resource selection for web search collections, result merging and size estimation of uncooperative resources.
U2 - 10.1145/2396761.2398535
DO - 10.1145/2396761.2398535
M3 - Conference contribution
SN - 978-1-4503-1156-4
T3 - CIKM '12
SP - 1874
EP - 1878
BT - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
PB - ACM
CY - New York, NY, USA
ER -