THESIS
2017
Abstract
Suppose that there are n users and m items, and that the preference of each user for the items
is revealed only upon probing, which takes time and is therefore costly. How can we quickly
discover all the frequent items that are favored individually by at least a given number of
users? This new problem not only has strong connections with several well-known problems,
including the frequent item mining problem and the bichromatic reverse nearest neighbor
problem, it also finds applications in fields as diverse as sponsored search, marketing surveys,
and crowdsourcing. Although this problem can be settled naively by probing the preferences
of all n users, the number of users is typically enormous, and probing the preference of a user
itself can also incur a prohibitive cost. Conseq...[
Read more ]
Suppose that there are n users and m items, and that the preference of each user for the items
is revealed only upon probing, which takes time and is therefore costly. How can we quickly
discover all the frequent items that are favored individually by at least a given number of
users? This new problem not only has strong connections with several well-known problems,
including the frequent item mining problem and the bichromatic reverse nearest neighbor
problem, it also finds applications in fields as diverse as sponsored search, marketing surveys,
and crowdsourcing. Although this problem can be settled naively by probing the preferences
of all n users, the number of users is typically enormous, and probing the preference of a user
itself can also incur a prohibitive cost. Consequently, a lack of a more efficient algorithm
would severely limit the practicality of the problem.
To overcome the deficiency above, we present a sampling algorithm that drastically reduces
the number of users needed to probe to O(log m)—regardless of the number of users—as
long as slight inaccuracy in the output is permitted. For reasonably sized input, our algorithm
needs to probe only 0.5% of the users, whereas the naive approach needs to probe all of
them. The experimental results also fully agree with our theoretical analysis: our algorithm
is faster than both the adapted algorithms and the naive approach by up to three orders of magnitude.
Post a Comment