Demo D08 - Curating and Searching the Annotated Web


19:30 - 22:00 on Tuesday, 30 June 2009 in room tbd

We demonstrate CSAW, a system for Curating and Searching the Annotated Web. CSAW annotates named entities and quantities in Web-scale text corpora and, where confident, connects these annotations with entries in an entity and type catalog such as Wikipedia. The semistructured catalog, together with the unstructured corpus, forms a composite database that CSAW can then search using powerful reachability, proximity and aggregation primitives. Specifically, we can look for snippets that mention specific entities, entities of a specified type, or quantities with specified types or units; take unions and intersections of snippet sets; and then aggregate evidence from these snippet sets into ranked responses. Responses are not page URLs as in standard Web search, but ranked tables whose cells can be entity references, quantities, or token snippets. We will show a subset of CSAW’s capabilities, and describe the beginnings of a next-generation Web search API that significantly extends the capabilities of the APIs provided by popular search engines today.
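
The query primitives described above can be illustrated with a minimal Python sketch. It is not CSAW's API: the toy catalog and corpus, the names Snippet, mentioning, mentioning_type, with_unit and rank_entities, and the choice to score each entity by its count of supporting snippets are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """A token snippet with its (hypothetical) entity and quantity annotations."""
    text: str
    entities: tuple = ()    # catalog entity ids mentioned in the snippet
    quantities: tuple = ()  # (value, unit) pairs mentioned in the snippet


# Toy type catalog (assumed): entity id -> set of types.
CATALOG = {
    "Albert_Einstein": {"physicist", "person"},
    "Richard_Feynman": {"physicist", "person"},
    "Marie_Curie": {"physicist", "person"},
    "Princeton": {"place"},
    "Paris": {"place"},
}

# Toy annotated corpus (assumed), standing in for Web-scale text.
CORPUS = [
    Snippet("Einstein settled in Princeton.", ("Albert_Einstein", "Princeton")),
    Snippet("Einstein lectured in Princeton.", ("Albert_Einstein", "Princeton")),
    Snippet("Feynman studied at Princeton.", ("Richard_Feynman", "Princeton")),
    Snippet("Curie worked in Paris.", ("Marie_Curie", "Paris")),
    Snippet("The sample weighed 5 kg.", (), ((5.0, "kg"),)),
]


def mentioning(entity):
    """Snippets that mention one specific entity."""
    return {s for s in CORPUS if entity in s.entities}


def mentioning_type(type_name):
    """Snippets that mention at least one entity of the given type."""
    return {
        s for s in CORPUS
        if any(type_name in CATALOG.get(e, ()) for e in s.entities)
    }


def with_unit(unit):
    """Snippets that mention a quantity expressed in the given unit."""
    return {s for s in CORPUS if any(u == unit for _, u in s.quantities)}


def rank_entities(snippets, type_name):
    """Aggregate snippet evidence into a ranked (entity, support) table.

    Support is simply the number of snippets mentioning the entity;
    this scoring rule is an assumption, not CSAW's ranking function.
    """
    support = defaultdict(int)
    for s in snippets:
        for e in set(s.entities):
            if type_name in CATALOG.get(e, ()):
                support[e] += 1
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))


# Intersection of two snippet sets, then aggregation into a ranked table.
hits = mentioning("Princeton") & mentioning_type("physicist")
table = rank_entities(hits, "physicist")
```

Because snippet sets are ordinary Python sets, unions and intersections are the `|` and `&` operators, and the result of the query is a small ranked table of entity references rather than a list of page URLs.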