Analysis / Digital Health · July 6, 2023

red and blue lights from tower steel wool photography

Enhancing Data Accuracy in Public Sector Purchase Orders with Excel’s Power Query and Fuzzy Clustering

In the public sector, managing purchase order data is a critical task. With numerous descriptions, suppliers, and item codes, maintaining accuracy and consistency can be a daunting task. This is especially true when dealing with the description field, where variations in text can create confusion and inaccuracies. However, Excel’s Power Query feature, coupled with the power of fuzzy clustering, offers a solution to this challenge. In this blog post, we will guide you through the process of using Power Query and fuzzy clustering to streamline your purchase order data.

Harnessing Power Query for Purchase Order Data
The first step in the process is to import your purchase order data into Power Query. This can be achieved by selecting the description column you wish to clean up and then choosing the Power Query option. Power Query is a data connection technology that enables you to discover, connect, combine, and refine data across a wide variety of sources. It is available in Excel and Power BI, making it a versatile tool for data cleaning and transformation.

Fuzzy Clustering: A Solution for Text Variations
Once your purchase order data is in Power Query, you can utilise the fuzzy cluster column function. This function works by identifying and grouping similar text together, thereby cleaning up any inconsistencies in your data. For instance, ‘Stationery Supplies’ and ‘Stationary Supplies’ can be grouped together, ensuring consistency across your data. The term ‘fuzzy’ refers to the function’s ability to group text that is not exactly the same but is similar enough to be considered a match.

Fine-Tuning Your Data with Similarity Thresholds and Case Handling
One of the key aspects of using the fuzzy cluster column function is setting the similarity thresholds. This determines how similar the text has to be to be considered a match. You can adjust this setting to fit the specific needs of your data. Additionally, Power Query also allows you to handle case and spacing, ensuring that minor differences in text do not lead to unnecessary variations in your data.

Transformation Table and Manual Adjustments for Customised Data Cleaning
To further refine the data cleaning process, you can use a transformation table and make manual adjustments. A transformation table allows you to specify certain transformations that should be made during the data cleaning process. For example, you could specify that ‘PC’ should be changed to ‘Personal Computer’. Manual adjustments can also be made to override the fuzzy clustering function, giving you ultimate control over your data.

Troubleshooting and Further Refinements
Like any tool, Power Query and the fuzzy clustering function may present some challenges. However, with a bit of troubleshooting and understanding of the tool, these challenges can be overcome. The video provides tips on how to troubleshoot any issues that may arise during the data cleaning process. It also shows how to add a fuzzy match to an existing table, which can be useful for comparing and merging data.

Managing purchase order data in the public sector can be a complex task, but with the right tools, it doesn’t have to be. Excel’s Power Query and fuzzy clustering function offer a powerful solution for cleaning up the description field in your data, ensuring consistency and accuracy. This not only saves time but also improves the quality of your data, leading to more accurate reporting and decision-making.

Moreover, the ability to customise the data cleaning process to suit your specific needs makes this tool a valuable addition to your data management toolkit. Whether you’re dealing with minor text variations or larger inconsistencies, Power Query and fuzzy clustering can help streamline your purchase order data.

In the ever-evolving public sector landscape, efficiency and accuracy are paramount. By harnessing the power of these tools, you can ensure your organisation stays ahead of the curve, delivering high-quality services effectively. So why not give Power Query and fuzzy clustering a try? The benefits to your data management processes could be significant, leading to improved operational efficiency and more informed decision-making.

This article is based from a fantastic straightforward youtube video by David Benaim.

  1. The video introduces the concept of using Power Query’s fuzzy clustering feature to clean up columns with similar text in Excel or Power BI.
  2. Get Data into Power Query (01:13): The first step is to get your data into Power Query. This can be done by selecting the data you want to clean up and then choosing the Power Query option.
  3. Fuzzy Cluster Column Function (02:46): The next step is to use the fuzzy cluster column function. This function will identify and group similar text together, helping to clean up any inconsistencies in your data.
  4. Similarity Thresholds, Case, Spacing (05:44): The video then discusses how to set similarity thresholds and how to handle case and spacing. These settings can be adjusted to fit the specific needs of your data.
  5. Transformation Table & Manual Adjustments (09:07): The video shows how to use a transformation table and make manual adjustments to further refine the data cleaning process. For example, you can specify that “England” should be changed to “UK”.
  6. Troubleshooting Issues (11:22): The video provides tips on how to troubleshoot any issues that may arise during the data cleaning process.
  7. Add Fuzzy Match to Existing Table (12:36): The video shows how to add a fuzzy match to an existing table, which can be useful for comparing and merging data.
  8. Power Query Online UI (14:44): The video briefly discusses the Power Query Online user interface.
  9. Fuzzy Grouping (15:28): The final part of the video covers fuzzy grouping, which is another useful feature for cleaning up similar text.

The example files used in the video can be found here.