How to Remove Duplicate Lines in Linux

The uniq command in Linux filters out repeated, adjacent lines from a file or input stream, displaying only unique lines. It is most effective when used on sorted data because it only detects duplicates that appear consecutively.

Alby Andersen

The uniq command in Linux filters or reports adjacent duplicate lines in a text file or input stream. It is commonly used to remove duplicates, count occurrences, or identify unique/repeated lines. For non-adjacent duplicates, pair uniq with sort to pre-process the input. Below are practical examples:


Remove Adjacent Duplicate Lines

uniq file.txt  


Prints file.txt with consecutive duplicate lines collapsed to a single occurrence. The file itself is not modified, and only adjacent duplicates are detected.
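A quick sketch with invented sample data shows that only consecutive repeats are collapsed:

```shell
# Sample file with one adjacent repeat and one non-adjacent repeat
printf 'apple\napple\nbanana\napple\n' > fruits.txt

uniq fruits.txt
# apple
# banana
# apple    <- survives: not adjacent to the earlier "apple" lines
```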


Remove All Duplicates (After Sorting)

sort file.txt | uniq  


Sorts the file first, then removes all duplicates.
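Piping through sort first makes every duplicate adjacent, so uniq can drop all of them. A sketch with invented data:

```shell
# "banana" appears twice, but not on consecutive lines
printf 'banana\napple\nbanana\n' | sort | uniq
# apple
# banana
```

Note that `sort -u file.txt` achieves the same result in a single step.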


Count Occurrences of Each Line

sort file.txt | uniq -c  
  • -c adds a count of occurrences (e.g., 3 apples).
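For example, with invented input containing three "apple" lines:

```shell
printf 'apple\nbanana\napple\napple\n' | sort | uniq -c
# Each line is prefixed with its count: 3 for apple, 1 for banana.
# (Counts are right-aligned with leading spaces; exact padding varies
# by implementation, so it is not shown here.)
```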

Show Only Duplicated Lines

sort file.txt | uniq -d  
  • -d prints lines that appear more than once.
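A sketch with already-sorted invented data:

```shell
# "apple" and "cherry" are repeated; "banana" is not
printf 'apple\napple\nbanana\ncherry\ncherry\n' | uniq -d
# apple
# cherry
```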

Show Only Unique Lines

sort file.txt | uniq -u  
  • -u prints lines that appear exactly once.
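Using the same invented data as above, -u keeps only the line that never repeats:

```shell
printf 'apple\napple\nbanana\ncherry\ncherry\n' | uniq -u
# banana
```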

Case-Insensitive Comparison

sort file.txt | uniq -i  
  • -i ignores case differences (e.g., Error = error).
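A sketch with invented data (note that -i is a GNU/BSD extension, not part of POSIX uniq):

```shell
printf 'Error\nerror\nERROR\nwarning\n' | uniq -i
# Error
# warning
```

To catch case variants that are not adjacent, fold case during the sort as well: `sort -f file.txt | uniq -i`.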

Skip Fields Before Checking

sort -k3 data.txt | uniq -f2

  • -f2 skips the first 2 fields when comparing lines.
  • uniq splits fields on blanks (spaces and tabs) and has no delimiter option, so comma-separated data must be converted first (e.g., with tr ',' ' ') or reduced to one column with cut.
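Because uniq splits fields on blanks only, a whitespace-separated sketch (invented log lines) illustrates -f:

```shell
# Fields: time level message; -f2 skips "time" and "level" when comparing,
# so the two lines count as duplicates and only the first is printed
printf '09:01 INFO disk full\n09:02 INFO disk full\n' | uniq -f2
# 09:01 INFO disk full
```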

Skip Characters Before Checking

sort file.txt | uniq -s5  
  • -s5 ignores the first 5 characters of each line.
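A sketch with invented data, where each line starts with a 5-character prefix ("NNN: ") that -s5 skips:

```shell
printf '001: apple\n002: apple\n' | uniq -s5
# 001: apple
```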

Compare Only the First N Characters

sort file.txt | uniq -w10  
  • -w10 compares only the first 10 characters.
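For example, with -w5 and invented data, lines sharing the same first 5 characters collapse into one:

```shell
# "apple pie" and "apple tart" match on their first 5 characters
printf 'apple pie\napple tart\nbanana\n' | uniq -w5
# apple pie
# banana
```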

Combine with cut to Process Columns

cut -d',' -f1 data.csv | sort | uniq  


Extracts the first CSV column, sorts it, and removes duplicates.
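A sketch of the full pipeline with an invented two-column CSV (name,age):

```shell
printf 'alice,30\nbob,25\nalice,31\n' | cut -d',' -f1 | sort | uniq
# alice
# bob
```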


Check for Duplicates in Raw (Unsorted) Files

uniq raw_data.txt  


Note: Only removes adjacent duplicates. Non-adjacent duplicates remain.


Key Notes:

  • Sorted Input: Always use sort before uniq unless duplicates are guaranteed to be adjacent.
  • Delimiters: Use -t with sort or cut for structured data (e.g., CSV).
  • Options recap: -c counts occurrences, -d shows duplicates, -u shows uniques, -i ignores case.