Removing Duplicate Words from CSV Files
This script removes words that occur in both Column A and Column B of the same row of an uploaded CSV file. You simply choose which column keeps the shared words.
Here’s a 3-row example of input data and the expected output, assuming the user selected Column B:
Input Data:
Column A | Column B |
---|---|
apple orange | orange banana |
cat dog squirrel | dog rabbit squirrel |
red blue green | blue yellow |
Output Data (with Column B retaining duplicates):
Column A | Column B |
---|---|
apple | orange banana |
cat | dog rabbit squirrel |
red green | blue yellow |
In this example, every row of the input data contains at least one word that appears in both columns. After processing with Column B selected as the preferred column, those shared words are removed from Column A, while Column B is left unchanged.
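The rule applied to each row can be sketched in a few lines of PHP. This is a minimal illustration and not the script itself: the helper name `strip_shared_words` is hypothetical, and it assumes that words are separated by whitespace and compared case-sensitively.

```php
<?php
// Hypothetical helper, not taken from the script described in this article.
// Removes from the non-preferred column every word that also appears in
// the preferred column. Matching is exact and case-sensitive (an assumption).
function strip_shared_words(string $colA, string $colB, string $preferred = 'B'): array
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $wordsA = preg_split('/\s+/', $colA, -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/\s+/', $colB, -1, PREG_SPLIT_NO_EMPTY);

    if ($preferred === 'B') {
        // Column B stays untouched; its words are dropped from Column A.
        $wordsA = array_diff($wordsA, $wordsB);
    } else {
        // Column A stays untouched; its words are dropped from Column B.
        $wordsB = array_diff($wordsB, $wordsA);
    }

    return [implode(' ', $wordsA), implode(' ', $wordsB)];
}

// First row of the example above, with Column B as the preferred column.
[$newA, $newB] = strip_shared_words('apple orange', 'orange banana', 'B');
echo $newA, ' | ', $newB, PHP_EOL; // prints "apple | orange banana"
```

`array_diff` keeps the original word order of the column being cleaned, which is why the remaining words come out in the same sequence as in the input.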
CSV Duplicates Remover
Handling CSV (Comma-Separated Values) files is a common task in data processing and manipulation. In some cases, you may find the same words repeated across the columns of a CSV file, which can affect data analysis or processing. In this article, we will explain a custom PHP script that lets users upload a CSV file and automatically removes words that occur in both Column A and Column B of each row, whilst retaining them in the user’s preferred column.
Script Overview
The PHP script provided combines both the front-end and back-end functionalities in a single file. The front-end is a simple HTML form that allows users to upload a CSV file and choose a preferred column for retaining duplicates. The back-end handles the file processing and removal of duplicate words from the specified columns.
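To make the front-end half concrete, here is a rough sketch of what such a form could look like, together with a small helper that reads the user's column choice on the back end. The function names and the field names `csv_file` and `preferred_column` are assumptions made for illustration; the actual script may name them differently.

```php
<?php
// Hypothetical sketch of the upload form; all names are illustrative.
function render_upload_form(string $action): string
{
    // Escape the action URL so it is safe inside an HTML attribute.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    // enctype="multipart/form-data" is required for file uploads.
    return <<<HTML
<form action="{$safeAction}" method="post" enctype="multipart/form-data">
    <label>CSV file: <input type="file" name="csv_file" accept=".csv" required></label>
    <label>Keep duplicates in:
        <select name="preferred_column">
            <option value="A">Column A</option>
            <option value="B">Column B</option>
        </select>
    </label>
    <button type="submit">Remove duplicates</button>
</form>
HTML;
}

// Whitelist the submitted choice. Falling back to 'B' is an arbitrary
// default chosen for this sketch.
function read_preferred_column(array $post): string
{
    $raw = $post['preferred_column'] ?? '';
    $choice = is_string($raw) ? strtoupper(trim($raw)) : '';

    return in_array($choice, ['A', 'B'], true) ? $choice : 'B';
}

echo render_upload_form('index.php');
```

In a single-file script, the same page would typically print this form and, when the request method is POST, call something like `read_preferred_column($_POST)` before processing the upload.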
Functionality Breakdown
- The script starts by defining two utility functions: `remove_bom` and `remove_duplicates`.
  - The `remove_bom` function is responsible for removing the Byte Order Mark (BOM) from the beginning of the text, which may be present in some UTF-8 encoded CSV files.
  - The `remove_duplicates` function takes a row from the CSV file, a boolean flag indicating whether it’s the first row, and the user’s preferred column. It then checks both columns for duplicate words and returns a modified row with the duplicate words removed, whilst retaining duplicates in the preferred column.
- The script proceeds to check if the form has been submitted by the user. If so, it validates the uploaded file to ensure it is a CSV file and then moves the file to the server for processing.
- The script then opens both the original CSV file and creates a new CSV file to store the amended data. It processes the original file row by row, calling the `remove_duplicates` function for each row and writing the modified row to the new CSV file.
- Once the entire file has been processed, the script displays a download link for the amended CSV file, allowing users to download it for further use.
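The steps above could be sketched roughly as follows. The names `remove_bom` and `remove_duplicates` come from the description, but their bodies here are assumptions, as are the helpers `looks_like_csv_upload` and `process_csv`. The sketch further assumes that the first-row flag is used only to strip the BOM, that the file has no header row, and that a `.csv` file extension is enough to count as validation. A real upload handler would also call `move_uploaded_file` before opening the file.

```php
<?php
// Sketch only: function names follow the article, the bodies are assumptions.

// Accept uploads that PHP reports as successful and whose name ends in .csv.
function looks_like_csv_upload(array $file): bool
{
    $ok = ($file['error'] ?? UPLOAD_ERR_NO_FILE) === UPLOAD_ERR_OK;
    $ext = strtolower(pathinfo((string) ($file['name'] ?? ''), PATHINFO_EXTENSION));
    return $ok && $ext === 'csv';
}

// Strip a leading UTF-8 byte order mark (bytes EF BB BF), if present.
function remove_bom(string $text): string
{
    return strncmp($text, "\xEF\xBB\xBF", 3) === 0 ? substr($text, 3) : $text;
}

// Remove words shared by the first two columns from the non-preferred one.
function remove_duplicates(array $row, bool $isFirstRow, string $preferred): array
{
    if ($isFirstRow) {
        // A BOM, if any, sits at the very start of the first cell.
        $row[0] = remove_bom((string) ($row[0] ?? ''));
    }
    if (count($row) < 2) {
        return $row; // Nothing to compare against.
    }
    $a = preg_split('/\s+/', (string) $row[0], -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', (string) $row[1], -1, PREG_SPLIT_NO_EMPTY);
    if ($preferred === 'A') {
        $row[1] = implode(' ', array_diff($b, $a));
    } else {
        $row[0] = implode(' ', array_diff($a, $b));
    }
    return $row;
}

// Copy rows from one open stream to another, cleaning each on the way.
function process_csv($in, $out, string $preferred): int
{
    $count = 0;
    while (($row = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv reports a blank line as [null].
        }
        fputcsv($out, remove_duplicates($row, $count === 0, $preferred), ',', '"', '\\');
        $count++;
    }
    return $count;
}

// Usage with in-memory streams standing in for the uploaded and amended files.
$in = fopen('php://memory', 'w+');
fwrite($in, "\xEF\xBB\xBFapple orange,orange banana\nred blue green,blue yellow\n");
rewind($in);
$out = fopen('php://memory', 'w+');
$rowsWritten = process_csv($in, $out, 'B');
rewind($out);
echo stream_get_contents($out);
```

Reading and writing one row at a time keeps memory use flat regardless of file size, which is presumably why the script processes the file row by row instead of loading it whole.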
Conclusion
This custom PHP script removes words shared between two columns of a CSV file whilst allowing the user to select a preferred column that keeps them. Because it combines the front-end and back-end functionality in a single file, users can upload a file and download the amended version from the same page. The script can be further customised and extended to handle more complex data manipulation tasks.