Address lists are commonly managed in Excel, and they often contain duplicate addresses:
To delete these duplicate addresses conveniently and easily in Excel, proceed as follows:
- If you have not already done so, download DedupeWizard free of charge here. Install the program and request a trial activation. You can then use the program for a full week without any restrictions.
- Open the DedupeWizard and launch the function "Duplicate search in a table":
- Select the "postal address" as the criterion for searching for duplicates and then click "next":
- In the next step, select the Excel file to be processed:
- You are then taken to a dialogue where you specify which column of the Excel table holds which piece of information. The program has already made this assignment as far as possible based on the column headings. For example, the "street" column in our table contains the street:
- In the next dialogue we tell the program which parts of the address should be compared, usually all of them. We also specify the minimum confidence score that a pair of addresses must reach in order to appear in the result. In our example, a threshold value of "70%" is used for the confidence score:
- After we click "next" again, the data is processed. After a brief moment, the program presents a summary of the results:
- We are then taken to a table showing the results of the comparison. There, we can still change the results by either removing the red cross in the "delete" column or moving it to another address in the group:
- Once we are satisfied with the results, we can output them or have them processed further in the final step, depending on what we need. In addition to producing a deletion log that can be printed out, the program can delete the duplicate addresses directly in the original table or save the cleaned data in a new file:
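To make the idea of a confidence threshold more concrete, the following is a minimal sketch of threshold-based duplicate detection in Python. It is not DedupeWizard's matching method, which is not described here; it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the confidence score. The field names (`name`, `street`, `zip`, `city`), the sample addresses, and the helper functions are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Join the address parts into one lowercase string with single spaces."""
    parts = (record["name"], record["street"], record["zip"], record["city"])
    return " ".join(" ".join(parts).lower().split())


def similarity(a, b):
    """Stand-in confidence score between 0.0 and 1.0 (an assumption, not the tool's formula)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(records, threshold=0.70):
    """Return (keep_index, delete_index, score) for every record flagged as a duplicate."""
    hits = []
    flagged = set()  # indices already marked for deletion
    for i in range(len(records)):
        if i in flagged:
            continue
        for j in range(i + 1, len(records)):
            if j in flagged:
                continue
            score = similarity(records[i], records[j])
            if score >= threshold:
                hits.append((i, j, score))
                flagged.add(j)
    return hits


# Hypothetical sample data; the first two rows describe the same person.
addresses = [
    {"name": "John Smith", "street": "12 Main Street", "zip": "10001", "city": "New York"},
    {"name": "Jon Smith", "street": "12 Main St.", "zip": "10001", "city": "New York"},
    {"name": "Mary Jones", "street": "7 Oak Avenue", "zip": "60601", "city": "Chicago"},
]

hits = find_duplicates(addresses, threshold=0.70)
to_delete = {j for _, j, _ in hits}
cleaned = [r for k, r in enumerate(addresses) if k not in to_delete]

for keep, drop, score in hits:
    print(f"row {drop} looks like a duplicate of row {keep} (score {score:.0%})")
```

Raising the threshold reduces false hits but misses more spelling variants; lowering it does the opposite. The pairwise loop compares every row with every other row, so it is only suitable for small lists.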
The DataQualityTools offer more options for processing the results than the DedupeWizard. Among other things, the hits can be marked there. Alternatively, the comparison results can be used to transfer data from one record in a duplicate group to another in order to complete it. A complete overview can be found here.
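Completing one record with data from another record in the same duplicate group could look roughly like the following Python sketch. It only illustrates the idea and does not show how the DataQualityTools implement it; the function name `complete_record`, the field names, and the sample values are hypothetical.

```python
def complete_record(keep, duplicate):
    """Return a copy of `keep` whose empty fields are filled from `duplicate`."""
    merged = dict(keep)  # copy so the original record stays unchanged
    for field, value in duplicate.items():
        if not merged.get(field) and value:
            merged[field] = value
    return merged


# Hypothetical duplicate group: the kept record lacks a phone number.
kept = {"name": "John Smith", "street": "12 Main Street", "phone": ""}
dupe = {"name": "Jon Smith", "street": "12 Main St.", "phone": "555-0100"}

merged = complete_record(kept, dupe)
print(merged)
```

Fields that already have a value in the kept record are left alone, so only the gaps are filled.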