The uniq command in Linux filters or reports adjacent duplicate lines in a text file or input stream. It is commonly used to remove duplicates, count occurrences, or identify unique or repeated lines. For non-adjacent duplicates, pair uniq with sort to pre-process the input. Below are practical examples:
Remove Adjacent Duplicate Lines
uniq file.txt
Prints file.txt with consecutive duplicate lines collapsed to one (duplicates must be adjacent to be detected); the file itself is not modified.
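As a quick sanity check with a made-up fruit list, only the adjacent pair collapses:
$ printf 'apple\napple\nbanana\napple\n' | uniq
apple
banana
apple
The trailing apple survives because it is not adjacent to the first two.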
Remove All Duplicates (After Sorting)
sort file.txt | uniq
Sorts the file first, then removes all duplicates.
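Running the same made-up input through sort first collapses every duplicate:
$ printf 'apple\napple\nbanana\napple\n' | sort | uniq
apple
banana
sort -u file.txt produces the same result in a single command.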
Count Occurrences of Each Line
sort file.txt | uniq -c
-c prefixes each output line with its number of occurrences (e.g., 3 apples).
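With the same made-up input, each surviving line is prefixed by its count (GNU uniq right-aligns the number):
$ printf 'apple\napple\nbanana\napple\n' | sort | uniq -c
      3 apple
      1 banana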
Show Only Duplicated Lines
sort file.txt | uniq -d
-d prints one copy of each line that appears more than once.
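For example, with a made-up list where only apple repeats:
$ printf 'apple\nbanana\ncherry\napple\n' | sort | uniq -d
apple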
Show Only Unique Lines
sort file.txt | uniq -u
-u prints only the lines that appear exactly once.
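Using the same made-up list, -u is the complement of -d:
$ printf 'apple\nbanana\ncherry\napple\n' | sort | uniq -u
banana
cherry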
Case-Insensitive Comparison
sort file.txt | uniq -i
-i ignores case differences when comparing (e.g., Error and error count as duplicates).
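A minimal check (the three spellings are already adjacent, so no sort is needed); uniq keeps the first line of each run:
$ printf 'Error\nerror\nERROR\n' | uniq -i
Error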
Skip Fields Before Checking
sort -t',' -k2 data.csv | uniq -f2
-t',' tells sort to treat the comma as its field delimiter. -f2 tells uniq to skip the first 2 fields when comparing lines. Caveat: uniq itself has no delimiter option and treats fields as whitespace-separated, so -f only behaves as expected on blank-separated data; for CSV, convert commas to blanks first (e.g., tr ',' ' ').
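A sketch with whitespace-separated fields, where -f1 skips a hypothetical timestamp column so the two requests compare equal:
$ printf '10:01 GET /home\n10:02 GET /home\n' | uniq -f1
10:01 GET /home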
Skip Characters Before Checking
sort file.txt | uniq -s5
-s5 skips the first 5 characters of each line when comparing.
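For instance, skipping a 5-character line-number prefix (digits, colon, and space) in made-up input:
$ printf '001: apple\n002: apple\n' | uniq -s5
001: apple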
Compare Only the First N Characters
sort file.txt | uniq -w10
-w10 compares at most the first 10 characters of each line (a GNU extension; not in POSIX uniq).
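For example, comparing only a leading 10-character date stamp in made-up log lines, so differing suffixes still count as duplicates:
$ printf '2024-01-02 first\n2024-01-02 second\n' | uniq -w10
2024-01-02 first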
Combine with cut to Process Columns
cut -d',' -f1 data.csv | sort | uniq
Extracts the first CSV column, sorts it, and removes duplicates.
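A self-contained sketch of the same pipeline with invented CSV data:
$ printf 'alice,admin\nbob,user\nalice,dev\n' | cut -d',' -f1 | sort | uniq
alice
bob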
Check for Duplicates in Raw (Unsorted) Files
uniq raw_data.txt
Note: Only removes adjacent duplicates. Non-adjacent duplicates remain.
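If you need to remove non-adjacent duplicates while preserving the original line order (which sort | uniq destroys), a common awk idiom does this; note this is an alternative tool, not a uniq feature:
$ awk '!seen[$0]++' raw_data.txt
It prints each line only the first time it is seen.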
Key Notes:
- Sorted input: Always run sort before uniq unless duplicates are guaranteed to be adjacent.
- Delimiters: Use -t with sort or -d with cut for structured data (e.g., CSV); uniq only understands whitespace-separated fields.
- Options: -c counts occurrences, -d shows duplicated lines, -u shows lines that appear exactly once, -i compares case-insensitively.