University of Cape Coast Institutional Repository

Applying cluster refinement to improve crowd-based data duplicate detection approach

dc.contributor.author Haruna, Charles Roland
dc.contributor.author Hou, Mengshu
dc.contributor.author Xi, Rui
dc.contributor.author Eghan, Moses Ojo
dc.contributor.author Kpiebaareh, Michael
dc.contributor.author Tandoh, Lawrence
dc.contributor.author Eghan-Yartel, Barbie
dc.contributor.author Asante-Mensah, Maame G.
dc.date.accessioned 2021-10-04T13:55:41Z
dc.date.available 2021-10-04T13:55:41Z
dc.date.issued 2019-06-04
dc.identifier.issn 23105496
dc.identifier.uri http://hdl.handle.net/123456789/6124
dc.description 10 p., ill. en_US
dc.description.abstract In this paper, we extend a hybrid-based deduplication technique in entity reconciliation (ER) in two ways: first, by proposing an algorithm that builds clusters given a pre-specified number K of clusters, and second, by developing a crowd-based procedure for refining the clusters produced by the cluster generation phases. With the clusters refined, we aim to minimize the cost metric 30(R) of the solitary and compound cluster generation algorithms, in order to achieve a more efficient deduplication method, increase accuracy in identifying duplicate records, and further reduce the crowdsourcing overheads incurred. In the experiments, we used three datasets commonly used in hybrid-based deduplication: paper, product, and restaurant. The performance results and evaluations demonstrate clear superiority over the compared methods, with our work offering low crowdsourcing cost and high deduplication accuracy, as well as better deduplication efficiency owing to the refined clusters. en_US
dc.language.iso en en_US
dc.publisher University of Cape Coast en_US
dc.subject Cluster refinement en_US
dc.subject Minimization approach en_US
dc.subject Triangular split and merger operations en_US
dc.subject Entity reconciliation en_US
dc.subject Crowdsourcing en_US
dc.title Applying cluster refinement to improve crowd-based data duplicate detection approach en_US
dc.type Article en_US

