THESIS
2015
xi, 85 pages : illustrations ; 30 cm
Abstract
Recently, the popularity of crowdsourcing has brought a new opportunity for engaging
human intelligence in the process of data analysis. Existing works on crowdsourcing
have developed sophisticated methods that utilize the crowd as a new kind of processor, a.k.a. HPU. One of the drawbacks of these works is that they treat the crowd as the sole information source for human-intrinsic queries. However, in many applications, such human-intrinsic queries can also be answered by machine-only systems (i.e., CPUs). On the one hand, the latency of using HPUs to answer queries is much longer than that of CPUs, and the monetary cost of HPUs is often high (e.g., crowdsourcing on Amazon Mechanical Turk); on the other hand, the answers obtained from CPUs often have high uncertainty due to their inability to recognize human-intrinsic
semantics. Therefore, it is natural to ask why we cannot combine the power of CPUs and the wisdom of HPUs to answer human-intrinsic queries accurately and quickly, which is exactly the motivation of this work.
To summarize, our study covers the following aspects:
1) We propose a specific human-machine hybrid system;
2) We design a novel crowd-machine hybrid system for uncertain data cleaning;
3) We study the classic problem of schema mapping from the new crowdsourcing
perspective.
We validate our solutions through extensive experiments and discuss several interesting
research directions for CPU-HPU hybrid systems in data integration.