SIGMOD 2019 Paper: Compact Data Representation with Bounded Rank Regret
Selecting the best items in a dataset is a common task in data exploration. However, the concept of “best” lies in the eyes of the beholder - different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove “dominated” items and create a “representative” subset of the data set, comprising the “best items” in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data.
This paper demonstrates that compact representations for large datasets can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users’ “regret”. See arXiv for the full paper, which was recently accepted for publication at SIGMOD 2019.