University of Cape Coast Institutional Repository

Cost-based and effective human-machine based data deduplication model in entity reconciliation

dc.contributor.author Haruna, Charles R.
dc.contributor.author Hou, MengShu
dc.contributor.author Eghan, Moses J.
dc.contributor.author Kpiebaareh, Michael Y.
dc.contributor.author Tandoh, Lawrence
dc.contributor.author Eghan-Yartel, Barbie
dc.contributor.author Asante-Mensah, Maame G.
dc.date.accessioned 2021-10-07T11:17:53Z
dc.date.available 2021-10-07T11:17:53Z
dc.date.issued 2018
dc.identifier.issn 23105496
dc.identifier.uri http://hdl.handle.net/123456789/6147
dc.description 6 p., ill. en_US
dc.description.abstract In the real world, databases often contain several records that represent the same entity, and these duplicates share no common key, which makes deduplication difficult. Machine-based and crowdsourcing techniques have been used disjointly to improve the quality of data deduplication. Crowdsourcing was used to solve tasks that machine-based algorithms were not good at. Although the crowds provided relatively more accurate results than the machines, both platforms were slow in execution and hence expensive to implement. In this paper, a hybrid human-machine system is proposed in which machines are first applied to the data set and humans are then used to identify potential duplicates. We performed experiments using three benchmark datasets: paper, restaurant and product datasets. Our algorithm was compared with some existing techniques, and our approach outperformed some methods by achieving high deduplication accuracy and good deduplication efficiency while incurring low crowdsourcing costs. en_US
dc.language.iso en en_US
dc.publisher University of Cape Coast en_US
dc.subject Qualitative Error Detection en_US
dc.subject Hybrid Data Deduplication en_US
dc.subject Clustering en_US
dc.subject Pivot Graphs en_US
dc.subject Entity Resolution en_US
dc.subject Crowdsourcing en_US
dc.title Cost-based and effective human-machine based data deduplication model in entity reconciliation en_US
dc.type Article en_US

