Applying cluster refinement to improve crowd-based data duplicate detection approach

Haruna, Charles Roland; Hou, Mengshu; Xi, Rui; Eghan, Moses Ojo; Kpiebaareh, Michael; Tandoh, Lawrence; Eghan-Yartel, Barbie; Asante-Mensah, Maame G.

dc.contributor.author	Haruna, Charles Roland
dc.contributor.author	Hou, Mengshu
dc.contributor.author	Xi, Rui
dc.contributor.author	Eghan, Moses Ojo
dc.contributor.author	Kpiebaareh, Michael
dc.contributor.author	Tandoh, Lawrence
dc.contributor.author	Eghan-Yartel, Barbie
dc.contributor.author	Asante-Mensah, Maame G.
dc.date.accessioned	2021-10-04T13:55:41Z
dc.date.available	2021-10-04T13:55:41Z
dc.date.issued	2019-06-04
dc.identifier.issn	23105496
dc.identifier.uri	http://hdl.handle.net/123456789/6124
dc.description	10 p., ill.	en_US
dc.description.abstract	In this paper, we present an extension of a hybrid-based deduplication technique in entity reconciliation (ER). First, we propose an algorithm that builds clusters upon receiving a pre-specified number K of clusters; second, we develop a crowd-based procedure for refining the clusters produced by the cluster generation phases. With the clusters refined, we aim to minimize the cost metric 30(R) of the solitary and compound cluster generation algorithms, to achieve an improved and efficient deduplication method, to increase accuracy in identifying duplicate records, and finally to further reduce the crowdsourcing overheads incurred. In the experiments, we used three datasets commonly used in hybrid-based deduplication: paper, product, and restaurant. The performance results and evaluations show that our work outperforms the compared methods, offering low crowdsourcing cost and high deduplication accuracy, as well as better deduplication efficiency because the clusters are refined.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Cape Coast	en_US
dc.subject	Cluster refinement	en_US
dc.subject	Minimization approach	en_US
dc.subject	Triangular split and merger operations	en_US
dc.subject	Entity reconciliation	en_US
dc.subject	Crowdsourcing	en_US
dc.title	Applying cluster refinement to improve crowd-based data duplicate detection approach	en_US
dc.type	Article	en_US