Tech Trouble?: How to make a unique word list from a large text file


It is simple.

Use the following command to split the file into one word per line, sort the words, and remove the duplicates:

tr -s ' ' '\n' < large_file.txt | sort -u > list_file.txt
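As a rough illustration, the same split, sort, and deduplicate steps can be tried on a tiny input first. The file names demo_text.txt and demo_list.txt are made up for this sketch; `sort -u` does the job of `sort | uniq`, and `tr -s` squeezes runs of spaces so they do not turn into empty lines.

```shell
# Hypothetical sample input (file names are for illustration only).
printf 'one two  two.\nthree one\n' > demo_text.txt

# Turn every run of spaces into a newline, then sort and drop duplicates.
tr -s ' ' '\n' < demo_text.txt | sort -u > demo_list.txt

# Show the resulting word list, one word per line.
cat demo_list.txt
```

The list ends up with four entries, because splitting on spaces alone leaves punctuation attached: "two" and "two." are treated as different words.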

To extract the words and sort them into a unique word list, ignoring case:

grep -o -E '\w+' test1 | sort -u -f > test2

test1 is the existing large text file.
test2 is the unique word list file that gets created (dictionary format, one word per line). If a file with that name already exists, it is overwritten.
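A small trial run may make the behaviour clearer. This is only a sketch: the file names demo_in.txt and demo_words.txt are invented here, and it assumes a grep that understands `\w` (GNU grep does; it matches letters, digits, and underscore).

```shell
# Hypothetical sample input with mixed case and punctuation.
printf 'The cat. the Cat, a dog!\n' > demo_in.txt

# -o prints each match on its own line, so punctuation is dropped.
# sort -f compares without regard to case, -u keeps one line per word.
grep -o -E '\w+' demo_in.txt | sort -u -f > demo_words.txt

# Show the resulting word list.
cat demo_words.txt
```

The result has four entries (a, cat, dog, the). Which capitalization of a repeated word survives is up to sort, so it is safer not to rely on it.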

Finally, for more than one file (-h keeps grep from prefixing each word with its file name):

grep -o -h -E '\w+' * | sort -u -f > sorted_single_file
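The multi-file case can be tried out the same way. The sketch below assumes a `mktemp` that works without a template (GNU and recent BSD versions do) and a grep that supports `\w` and `-h`; the names demo_dir, merged, a.txt, and b.txt are hypothetical. It writes the result outside the scratch directory so that `*` never matches the output file.

```shell
# Scratch directory so the glob only sees the two demo files.
demo_dir=$(mktemp -d)
printf 'red green\n' > "$demo_dir/a.txt"
printf 'Green blue\n' > "$demo_dir/b.txt"

# Output file lives outside demo_dir, so * cannot pick it up.
merged=$(mktemp)

# -h suppresses the "file name:" prefix grep adds when given several files.
( cd "$demo_dir" && grep -o -h -E '\w+' * | sort -u -f > "$merged" )

# Show the combined word list.
cat "$merged"
```

The two files contribute four words in total, and the merged list has three entries, since "green" and "Green" count as the same word under -f.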
