Discussion:
Lower case and diff two text files contents of email addresses
Maxmillian
2023-03-23 18:31:27 UTC
I have two long lists of email addresses in Windows 10 as text files.

How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
Graham J
2023-03-23 18:53:00 UTC
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
Are the email addresses separated in any way, with commas, spaces,
tabs, or semicolons?

If so, import each list into a spreadsheet so that there is one email
address per line. Sort the lines. Compare the two spreadsheets.

Look up fc for file compare

fc /?
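
The split, lowercase, and sort steps above can also be sketched in Python rather than a spreadsheet. This is only a minimal sketch: the separator set (commas, semicolons, whitespace) and the function name `normalize_addresses` are assumptions for illustration, not something given in this thread.

```python
import re


def normalize_addresses(text):
    """Split raw text into addresses, lowercase them, drop duplicates, sort."""
    # Assumption: addresses never contain commas, semicolons, or whitespace.
    parts = re.split(r"[,;\s]+", text)
    return sorted({part.lower() for part in parts if part})


if __name__ == "__main__":
    sample = "Bob@Example.com, alice@example.com;CAROL@example.com\tbob@example.com"
    # Prints one lowercased address per line, sorted, without duplicates.
    for address in normalize_addresses(sample):
        print(address)
```

Writing each normalized list back to its own file gives two inputs with one address per line that a line-based compare can work on.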
--
Graham J
Zaidy036
2023-03-23 20:42:55 UTC
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
Are the email addresses separated in any way, with commas, spaces, tabs,
or semicolons?
If so, import each list into a spreadsheet so that there is one email
address per line.  Sort the lines.  Compare the two spreadsheets.
Look up fc for file compare
fc /?
ASAP Utilities (free for non-commercial use) has a function to mark duplicates.
Paul
2023-03-23 19:31:20 UTC
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
**************************** diffemail.awk **************************

# Assumes file1.txt and file2.txt are in the current working directory
#
# gawk.exe -f diffemail.awk file2.txt

BEGIN {
    # Load file1.txt into memory. The name is hard-coded because
    # I am too lazy to pass it as a parameter.
    while ( (getline < "file1.txt") > 0 ) {
        $0 = tolower($0)
        arr[$0]++    # The array index is the key. The stored count is a
                     # don't-care condition here, but you can use it to
                     # detect duplicates if you want.
    }
    close("file1.txt")    # Polite are we...
}

{   # Program body: runs once per line of file2.txt and checks
    # whether that entry is also in file1.txt.
    $0 = tolower($0)
    if ($0 in arr) {    # is this single incoming entry in arr[] or not?
        print $0 " is in both files"
    } else {
        print $0 " is not in file1.txt"
    }
}

**************************** END diffemail.awk **************************

file1.txt
***@computer.com
***@computer.com
***@computer.com

file2.txt
***@computer.com
***@computer.com
***@computer.com
***@in.computer

Output

PS D:\> .\gawk.exe -f diffemail.awk file2.txt
***@computer.com is in both files
***@computer.com is in both files
***@computer.com is in both files
***@in.computer is not in file1.txt
PS D:\>

You can spice up the program with as much if-then-else
as you care to. You can even store both files in memory
if you want.
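
As a rough illustration of the "both files in memory" idea, here is a minimal sketch in Python (not awk) that reports the difference in both directions. It assumes one address per line; the function names `load_addresses` and `two_way_diff` are made up for this example.

```python
import sys


def load_addresses(path):
    """Read one address per line, lowercased, skipping blank lines."""
    # Assumption: plain text with one address per line. errors="replace"
    # keeps the script from stopping on bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def two_way_diff(path_a, path_b):
    """Return (addresses only in path_a, addresses only in path_b), sorted."""
    a = load_addresses(path_a)
    b = load_addresses(path_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python diffemail.py file1.txt file2.txt")
    else:
        only_a, only_b = two_way_diff(sys.argv[1], sys.argv[2])
        for address in only_a:
            print(address, "is only in", sys.argv[1])
        for address in only_b:
            print(address, "is only in", sys.argv[2])
```

Sets drop duplicates within a file, so each address is reported at most once per direction.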

*******

The gawk.exe file is in the binaries ZIP file here:

https://gnuwin32.sourceforge.net/packages/gawk.htm

Binaries Zip 1,448,542 10 February 2008 f875bfac137f5d24b38dd9fdc9408b5a

Name: gawk-3.1.6-1-bin.zip
Size: 1448542 bytes (1414 KiB)
SHA1: BDA507655EB3D15059D8A55A0DAF6D697A15F632

Program uses Windows line endings, whereas the bash shell version
would use Linux line endings.

Program does not support Unicode or the like. It is
just for plain ASCII at the moment.
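
For comparison, a small Python sketch of how the line-ending and Unicode limits might be handled. It assumes the files are UTF-8 (possibly with a byte-order mark), which nothing in this thread confirms; `clean_line` is a hypothetical helper name.

```python
def clean_line(raw):
    """Decode one raw line, drop CR/LF line endings, and fold case."""
    # Assumption: UTF-8 input. "utf-8-sig" also strips a leading
    # byte-order mark if one is present.
    text = raw.decode("utf-8-sig")
    # rstrip removes "\r\n" (Windows) and "\n" (Linux) endings alike.
    # casefold() is a stronger lower() intended for caseless matching.
    return text.rstrip("\r\n").casefold()


if __name__ == "__main__":
    print(clean_line(b"Alice@Example.COM\r\n"))
    print(clean_line(b"Alice@Example.COM\n"))
```

Both calls yield the same string, so a list saved on Windows and one saved on Linux would compare equal line by line.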

It's not really a practical program, just a demo of
how easy it is to whip something up.

And every language... has something it is not good at.
This language is not an exception to that.

Paul
Herbert Kleebauer
2023-03-24 00:49:46 UTC
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
Because you posted in alt.msdos.batch, here is a batch solution:

@echo off

:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)

if [%1]==[sub] goto :sub
sort email1.txt|find "@" >email1s.txt
sort email2.txt|find "@" >email2s.txt
cmd /c %0 sub
del email1s.txt
del email2s.txt
goto :eof

:sub
setlocal EnableDelayedExpansion

3<email1s.txt 4<email2s.txt (
    set line1a=&set /P line1a=<&3
    set line2a=&set /P line2a=<&4
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4

    for /l %%i in (1,1,100000) do (
        if /I [!line1a!]==[!line2a!] (
            if [!line1a!]==[] exit
            set line1a=!line1b!
            set line2a=!line2b!
            set line1b=&set /P line1b=<&3
            set line2b=&set /P line2b=<&4
        ) else (
            if /I [!line1a!]==[!line2b!] (
                echo !line2a! in email2.txt but not in email1.txt
                set line2a=!line2b!
                set line2b=&set /P line2b=<&4
            ) else (
                if /I [!line1b!]==[!line2a!] (
                    echo !line1a! in email1.txt but not in email2.txt
                    set line1a=!line1b!
                    set line1b=&set /P line1b=<&3
                ) else (
                    echo !line1a! in email1.txt but not in email2.txt
                    echo !line2a! in email2.txt but not in email1.txt
                    set line1a=!line1b!
                    set line2a=!line2b!
                    set line1b=&set /P line1b=<&3
                    set line2b=&set /P line2b=<&4
                )
            )
        )
    )
)
Andy Burnelli
2023-03-24 03:35:01 UTC
Post by Herbert Kleebauer
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
@echo off
:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)
That is just sheer genius.

You should win a Nobel Prize for that as a diff has been the bane of
Windows users for years!

It's going into my batch folder immediately!
Herbert Kleebauer
2023-03-24 08:13:20 UTC
Post by Andy Burnelli
That is just sheer genius.
For genius solutions, you should ask ChatGPT.
From a discussion in de.comp.os.ms-windows.misc:

set v=39/2023

How to extract the two numbers into variables v1 and v2

The answer from ChatGPT:

set v=39/2023
set v1=%v:/=&rem.%
set v2=%v:\=&rem.%
echo "%v%", "%v1%", "%v2%"


Ok, v2 is wrong, but that is the trivial part of the question.
But v1 is really good!!!

set v=39/2023
set v1=%v:/=&rem.%
set v2=%v:*/=%
echo "%v%", "%v1%", "%v2%"
Andy Burns
2023-03-24 09:06:54 UTC
Post by Herbert Kleebauer
set v=39/2023
set v1=%v:/=&rem.%
set v2=%v:\=&rem.%
echo "%v%", "%v1%", "%v2%"
Ok, v2 is wrong, but that is the trivial part of the question.
But v1 is really good!!!
Is it actually using undefined CMD behaviour?

I've never seen any reference to using & in a variable substitution

Or that substitutions are handled like a sub-command which you could
stick a REM statement in the middle of to ignore what follows
Herbert Kleebauer
2023-03-24 10:17:47 UTC
Post by Andy Burns
Post by Herbert Kleebauer
set v=39/2023
set v1=%v:/=&rem.%
Is it actually using undefined CMD behaviour?
Normal behavior, "/" is replaced by "&rem.", so you get:

set v1=39&rem.2023

which is equivalent to the two lines:

set v1=39
rem.2023
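
The order of operations can be mimicked outside CMD. The following is only a toy model in Python of the two phases (text substitution first, then splitting the line at `&`), not a description of how cmd.exe is actually implemented; the helper names `expand` and `split_commands` are made up for illustration.

```python
def expand(template, value):
    """Phase 1: replace the %v:/=&rem.% token with text before any parsing."""
    substituted = value.replace("/", "&rem.")
    return template.replace("%v:/=&rem.%", substituted)


def split_commands(line):
    """Phase 2: split the already expanded line into commands at each '&'."""
    return line.split("&")


if __name__ == "__main__":
    line = expand("set v1=%v:/=&rem.%", "39/2023")
    print(line)                  # set v1=39&rem.2023
    print(split_commands(line))  # ['set v1=39', 'rem.2023']
```

In this model the `&` only becomes a command separator because it is already part of the line text when the splitting happens.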
Andy Burns
2023-03-24 11:04:18 UTC
Post by Herbert Kleebauer
Post by Andy Burns
Is it actually using undefined CMD behaviour?
set v1=39&rem.2023
set v1=39
rem.2023
Still somewhat surprised that it works without using
enabledelayedexpansion (or calling cmd.exe /v)

I'd have thought after it parsed the original SET statement, it would
execute it "as was" not executing the &REM from after the substitution
had been done.
Herbert Kleebauer
2023-03-24 13:20:44 UTC
Post by Herbert Kleebauer
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
@echo off
:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)
Sorry, this code doesn't work at all. It was too late yesterday,
but I wanted to try the idea of reading more input files
at the same time (was presented many years ago in a.m.b.nt).
Better use a small C program.
Zaidy036
2023-03-24 15:12:50 UTC
Post by Herbert Kleebauer
Post by Herbert Kleebauer
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
@echo off
:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)
Sorry, this code doesn't work at all. It was too late yesterday,
but I wanted to try the idea of reading more input files
at the same time (was presented many years ago in a.m.b.nt).
Better use a small C program.
If case comparison is a problem, paste the email file into Notepad or
Notepad++, select all, right-click, and change everything to lower or
upper case in one click. Then in future use only one case, because email
addresses do not care.
Herbert Kleebauer
2023-03-27 11:48:10 UTC
Post by Herbert Kleebauer
Post by Herbert Kleebauer
Post by Maxmillian
I have two long lists of email addresses in Windows 10 as text files.
How can I lowercase everything and then get a diff of what email
addresses are in one text file but not in the other text file?
@echo off
:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)
Sorry, this code doesn't work at all. It was too late yesterday,
but I wanted to try the idea of reading more input files
at the same time (was presented many years ago in a.m.b.nt).
Better use a small C program.
Because I don't like unfinished tasks, here is a version that
should work:

@echo off

:: list all email addresses which are not in both
:: input files (email1.txt, email2.txt)

if [%1]==[sub] goto :sub
sort email1.txt|find "@" >email1s.txt
sort email2.txt|find "@" >email2s.txt
cmd /c %0 sub
del email1s.txt
del email2s.txt
goto :eof

:sub
setlocal EnableDelayedExpansion

3<email1s.txt 4<email2s.txt (
    set line1=&set /P line1=<&3
    set line2=&set /P line2=<&4

    for /l %%i in (1,0,2) do (
        if /I [!line1!]==[!line2!] (
            if [!line1!]==[] exit
            set line1=&set /P line1=<&3
            set line2=&set /P line2=<&4
        ) else (
            if /I [!line1!] geq [!line2!] (
                echo !line2! in email2.txt but not in email1.txt
                set line2=&set /P line2=<&4
            ) else (
                echo !line1! in email1.txt but not in email2.txt
                set line1=&set /P line1=<&3
            )
        )
    )
)
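
The idea behind this version, walking two sorted lists in step, can be sketched in Python as below. This is an illustration of the general sorted-merge technique under stated assumptions, not a translation of the batch file: it sorts with Python's own string ordering rather than the Windows `sort` command, and the name `sorted_merge_diff` is hypothetical.

```python
def sorted_merge_diff(lines1, lines2):
    """Walk two sorted lists in step; return (only in first, only in second)."""
    # Lowercase and sort both inputs here so the walk below can rely on
    # one consistent ordering.
    a = sorted(s.lower() for s in lines1)
    b = sorted(s.lower() for s in lines2)
    only1, only2 = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same address in both lists: advance both.
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] sorts before anything left in b, so b cannot contain it.
            only1.append(a[i])
            i += 1
        else:
            only2.append(b[j])
            j += 1
    # Whatever remains in either list has no partner in the other.
    only1.extend(a[i:])
    only2.extend(b[j:])
    return only1, only2


if __name__ == "__main__":
    first = ["b@x.com", "A@x.com", "d@x.com"]
    second = ["a@x.com", "c@x.com", "d@x.com", "e@x.com"]
    print(sorted_merge_diff(first, second))
```

Duplicates within one list are treated as separate entries here, so an address listed twice in one file and once in the other is reported once as unmatched.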
