Saturday, July 21, 2012

How to make a unique word list from a large text file

It is simple.

Use the following command:

cat large_file.txt | tr " " "\n" | sort | uniq > list_file.txt
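As a rough illustration of what this pipeline does, here is a small sketch on a throwaway sample. The file sample.txt and the temporary directory are made up for the demo. It uses tr -s '[:space:]' instead of tr " ", which is an assumption about what you probably want: tabs and repeated spaces are then treated as separators too, so no empty lines end up in the list.

```shell
# Hypothetical sample data in a scratch directory (not part of the original post).
workdir=$(mktemp -d)
printf 'the cat  and the dog\nthe bird\n' > "$workdir/sample.txt"

# Turn every run of whitespace into a single newline, then sort and drop duplicates.
tr -s '[:space:]' '\n' < "$workdir/sample.txt" | sort -u > "$workdir/list_file.txt"

# Prints one word per line: and, bird, cat, dog, the
cat "$workdir/list_file.txt"
```

sort -u does the same job as sort | uniq in one step.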

To extract only the words (dropping punctuation) and sort them into a unique, case-insensitive word list:

grep -o -E '\w+' test1 | sort -u -f > test2

test1 is the existing large text file.
test2 is the unique word list file (dictionary format, one word per line). It is created by the command, and any existing file of that name is overwritten.
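A minimal sketch of word extraction on a tiny input, to show how punctuation and letter case are handled. It assumes GNU grep (or another grep that understands \w as "letter, digit or underscore"). The sample text and the scratch directory are invented for the demo.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'Hello, world! Hello again.\nhello World\n' > "$workdir/test1"

# -o prints each match on its own line; \w+ matches one run of word characters,
# so commas, periods and exclamation marks are left out.
# sort -u -f folds case while comparing, so Hello/hello count as one word.
grep -o -E '\w+' "$workdir/test1" | sort -u -f > "$workdir/test2"

# Three distinct words remain: again, hello, world
# (which capitalization of each is kept depends on the sort implementation).
wc -l < "$workdir/test2"
```

If you want the list in a single predictable case, one option is to lowercase the text with tr 'A-Z' 'a-z' before sorting.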

Finally, for more than one file, pass them all to grep (-h stops grep from prefixing each word with its file name):

grep -o -h -E '\w+' * | sort -u -f > sorted_single_file
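A small sketch of the multi-file case, again assuming GNU grep. The file names a.txt and b.txt are hypothetical. The output is written outside the directory being read, because a bare * would also match the output file if it sits in the same directory.

```shell
# Hypothetical input files in a scratch directory; output goes to a separate temp file.
workdir=$(mktemp -d)
outfile=$(mktemp)
printf 'red green\n' > "$workdir/a.txt"
printf 'green blue\n' > "$workdir/b.txt"

# -h suppresses the "a.txt:" style prefix grep adds when given several files.
grep -o -h -E '\w+' "$workdir"/*.txt | sort -u -f > "$outfile"

# Prints one word per line: blue, green, red
cat "$outfile"
```

Using a pattern such as *.txt instead of * is one way to keep unrelated files out of the word list.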
