Hands-On Entity Resolution

£45.50

A Practical Guide to Data Matching with Python

Machine learning

Author: Michael Shearer

Language: English

Published by: O'Reilly Media

Published on: 1st February 2024

Format: LCP-protected ePub

ISBN: 9781098148447


Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs.

Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value.

With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers:

Challenges in deduplicating and joining datasets

Extracting, cleansing, and preparing datasets for matching

Text matching algorithms to identify equivalent entities

Techniques for deduplicating and joining datasets at scale

Matching datasets containing persons and organizations

Evaluating data matches

Optimizing and tuning data matching algorithms

Entity resolution using cloud APIs

Matching using privacy-enhancing technologies
