Hanghang Tong

Demo D07 - SHIFTR: A Fast and Scalable System for Ad Hoc Sensemaking of Large Graphs
June 30 7:30PM
We present SHIFTR, a system that assists users in making sense of large scale graph data. Making sense of information represented as large graphs is a fundamental challenge in many data-intensive domains. We suggest the potential of strong synergies between the data mining, cognitive psychology, and HCI communities in matching powerful graph mining tools with insights into how people learn and interact with information, and here we present SHIFTR as one such application. SHIFTR adapts the Belief Propagation algorithm to target important sensemaking tasks such as flexibly reorganizing graph entities into multiple groups based on both positive and negative examples. SHIFTR scales linearly with the graph size through its fast algorithm, novel mList data structure, and externalization of graph meta data.

We demonstrate SHIFTR’s usage and benefits through real-world sensemaking scenarios using the DBLP dataset that has almost 2 million author-publication relationships.
A demo video of SHIFTR can be downloaded at http://www.cs.cmu.edu/~dchau/shiftr/shiftr.mov.
TANGENT: A Novel, 'Surprise-me' Recommendation Algorithm
June 29 12:00PM
Most of recommender systems try to find items that are most relevant to the older choices of a given user. Here we focus on the "surprise me" query: A user may be bored with his/her usual genre of items (e.g., books, movies, hobbies), and may want a recommendation that is related, but off the beaten path, possibly leading to a new genre of books/movies/hobbies.

How would we define, as well as automate, this seemingly selfcontradicting request? We introduce TANGENT, a novel recommendation algorithm to solve this problem. The main idea behind TANGENT is to envision the problem as node selection on a graph, giving high scores to nodes that are well connected to the older choices, and at the same time well connected to unrelated choices. The method is carefully designed to be (a) parameter-free (b) effective and (c) fast. We illustrate the benefits of TANGENT with experiments on both synthetic and real data sets. We show that TANGENT makes reasonable, yet surprising, horizon-broadening recommendations. Moreover, it is fast and scalable, since it can easily use existing fast algorithms on graph node proximity.