Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
**************************** diffemail.awk **************************
# Assumes file1.txt and file2.txt are in the current working directory.
#
# Usage: gawk.exe -f diffemail.awk file2.txt
BEGIN {
    # Load file1.txt into memory. The name is hard-coded because
    # I am too lazy to pass it as a parameter.
    while ( (getline < "file1.txt") > 0 ) {
        $0 = tolower($0)
        arr[$0]++    # The array index is the key; the stored count is a
                     # don't-care here, but you can use it to detect
                     # duplicates if you want.
    }
    close("file1.txt")    # Polite are we...
}
{   # Program body: runs once per line of file2.txt and checks
    # whether that entry is also in file1.txt.
    $0 = tolower($0)
    if ($0 in arr) {    # is this single incoming entry in arr[] or not?
        print $0 " is in both files"
    } else {
        print $0 " is not in file1.txt"
    }
}
**************************** END diffemail.awk **************************
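The comment about duplicates can be made concrete. This is only a rough sketch, not part of the program above: the helper name sighting() and the array seen[] are made up for illustration. It reports each address that shows up more than once in whatever file you feed it.

```awk
# Usage (sketch): gawk.exe -f dupes.awk file1.txt
# sighting() and seen[] are hypothetical names, not from diffemail.awk.

# Record one more sighting of key and return the running count.
function sighting(key) { return ++seen[key] }

{
    key = tolower($0)
    # Skip blank lines; report an address once, on its second sighting.
    if (key != "" && sighting(key) == 2)
        print key " appears more than once"
}
```

Run it once per file. It only compares a file against itself, so it does not replace the comparison done by diffemail.awk.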
file1.txt
***@computer.com
***@computer.com
***@computer.com
file2.txt
***@computer.com
***@computer.com
***@computer.com
***@in.computer
Output
PS D:\> .\gawk.exe -f diffemail.awk file2.txt
***@computer.com is in both files
***@computer.com is in both files
***@computer.com is in both files
***@in.computer is not in file1.txt
PS D:\>
You can spice up the program with as much if-then-else
as you care to. You can even store both files in memory
if you want.
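As a sketch of the "both files in memory" idea (an assumption about how one might do it, not a tested replacement for the program above), the following takes both file names on the command line and reports the leftovers in each direction. The script name twoway.awk, the helper missing_from() and the array names are all made up for illustration.

```awk
# Usage (sketch): gawk.exe -f twoway.awk file1.txt file2.txt
# missing_from(), first[], second[], only1[], only2[] are hypothetical names.

# Copy into out[] every key of a[] that is absent from b[]; return how many.
function missing_from(a, b, out,    k, n) {
    n = 0
    for (k in a)
        if (!(k in b)) { out[k] = 1; n++ }
    return n
}

{
    key = tolower($0)
    if (key == "") next                        # ignore blank lines
    if (FILENAME == ARGV[1]) first[key]++      # lines from the first file named
    else                     second[key]++     # lines from any later file
}

END {
    missing_from(first, second, only1)
    missing_from(second, first, only2)
    for (k in only1) print k " is only in " ARGV[1]
    for (k in only2) print k " is only in " ARGV[2]
}
```

The order in which "for (k in ...)" visits the entries is not defined, so sort the output yourself if you need a stable listing.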
*******
The gawk.exe file is in the binaries ZIP file here:
https://gnuwin32.sourceforge.net/packages/gawk.htm
Binaries Zip 1,448,542 10 February 2008 f875bfac137f5d24b38dd9fdc9408b5a
Name: gawk-3.1.6-1-bin.zip
Size: 1448542 bytes (1414 KiB)
SHA1: BDA507655EB3D15059D8A55A0DAF6D697A15F632
The program uses Windows line endings, whereas a bash shell version
would use Linux line endings.
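Line endings matter if the lists ever get processed by an awk that does not strip the carriage return for you (gawk on Linux, for example, leaves it on the line). In that case an address ending in a stray CR will not match the same address without one. A minimal sketch of a workaround, assuming a made-up helper called normalize():

```awk
# Usage (sketch): gawk.exe -f normalize.awk file1.txt
# normalize() is a hypothetical helper, not part of diffemail.awk.

# Strip a trailing carriage return and surrounding blanks, then lowercase.
function normalize(s) {
    sub(/\r$/, "", s)                  # drop the CR left over from a CRLF ending
    gsub(/^[ \t]+|[ \t]+$/, "", s)     # trim leading and trailing spaces and tabs
    return tolower(s)
}

{ print normalize($0) }                # act as a simple clean-up filter
```

To use it in the comparison, the idea would be to call normalize($0) wherever the program currently calls tolower($0).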
The program does not support Unicode or the like. It is
just for plain ASCII at the moment.
It's not really a practical program, just a demo of
how easy it is to whip something up.
And every language... has something it is not good at.
This language is not an exception to that.
Paul