Product web pages and store data feeds both contain single product records. The records can be extracted from the product web pages or read directly from the feeds that stores create. The single product records then need to be merged in order to create matched records.
Once the single product records have been acquired, the bad data in them must be identified before matching the records that describe the same product, including its variants, at different stores.
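The order of operations described above, audit first and match second, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: the field names (`store`, `title`, `gtin`, `price`), the choice of a GTIN-13 barcode as the matching key, and the function names are all hypothetical. Only the GTIN-13 check digit rule itself is standard.

```python
from collections import defaultdict

# Hypothetical schema: these field names are assumptions for illustration.
REQUIRED_FIELDS = ("store", "title", "gtin", "price")


def gtin13_is_valid(gtin: str) -> bool:
    """Verify the GTIN-13 check digit (weights 1 and 3 alternate from the left)."""
    if len(gtin) != 13 or not (gtin.isascii() and gtin.isdigit()):
        return False
    digits = [int(c) for c in gtin]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def find_problems(record: dict) -> list[str]:
    """Return a list of problems found in one single product record."""
    problems = [f"missing {f}" for f in REQUIRED_FIELDS
                if record.get(f) in (None, "")]
    gtin = record.get("gtin")
    if gtin not in (None, "") and not gtin13_is_valid(str(gtin)):
        problems.append("invalid gtin")
    price = record.get("price")
    if price not in (None, "") and (
        not isinstance(price, (int, float)) or price <= 0
    ):
        problems.append("invalid price")
    return problems


def audit_then_match(records: list[dict]):
    """Set aside records with bad data, then group clean records by GTIN."""
    clean, rejected = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)

    matched = defaultdict(list)
    for record in clean:
        matched[record["gtin"]].append(record)
    return dict(matched), rejected


if __name__ == "__main__":
    sample = [
        {"store": "A", "title": "Pen", "gtin": "4006381333931", "price": 1.5},
        {"store": "B", "title": "Pen, blue", "gtin": "4006381333931", "price": 1.7},
        {"store": "C", "title": "Pen", "gtin": "4006381333932", "price": 1.6},
    ]
    matched, rejected = audit_then_match(sample)
    print(matched)
    print(rejected)
```

Grouping on an exact identifier is the simplest possible matching rule. Variants of a product usually carry different identifiers, so matching them would need a looser key than the one assumed here.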
However, it is difficult to match records when errors appear in the single product records, and it is a pointless exercise to match the same records with the same errors over and over. The sources of the errors must be identified and the errors must be fixed. Errors are introduced at various points in the product data ecosystem, as shown in the image below.
The product data ecosystem pipeline varies for each store. A hypothetical pipeline is shown below.
Finding and fixing bad data in large product catalogs is currently done manually. This is a labor-intensive process, subject to the same problems that introduced the errors into the data in the first place.
Data Record Science has a new data auditing system in beta that finds the errors in the ecosystem pipeline and pinpoints their sources. It offers suggestions for missing data and fixes for bad data.
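To make the idea of pinpointing an error's source concrete, here is a minimal sketch. It is not Data Record Science's auditing system, whose internals are not described here. It assumes, hypothetically, that a snapshot of the same record is available at each pipeline stage, and it reports the first stage at which a check fails. The stage names, the `price` field, and both function names are invented for illustration.

```python
from typing import Callable, Optional


def first_bad_stage(
    snapshots: list[tuple[str, dict]],
    is_valid: Callable[[dict], bool],
) -> Optional[str]:
    """Return the name of the first stage whose snapshot fails the check.

    `snapshots` lists (stage_name, record) pairs in pipeline order.
    Returns None when every snapshot passes.
    """
    for stage, record in snapshots:
        if not is_valid(record):
            return stage
    return None


def price_ok(record: dict) -> bool:
    """Hypothetical check: the price must be a positive number."""
    price = record.get("price")
    return isinstance(price, (int, float)) and price > 0


if __name__ == "__main__":
    # Hypothetical stages; real pipelines vary for each store.
    snapshots = [
        ("manufacturer", {"price": 9.99}),
        ("store_database", {"price": -9.99}),
        ("data_feed", {"price": -9.99}),
    ]
    print(first_bad_stage(snapshots, price_ok))
```

Stopping at the first failing stage matters because later stages simply inherit the error. Fixing it at its source avoids correcting the same record again after every refresh.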