import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class Validator {
private static final String ORG_FILE = "org.txt";
private static final String NEW_FILE = "new.txt";
public static void main(String[] args) throws IOException {
filterEmailsFile();
}
private static void filterEmailsFile() throws IOException {
File file = new File(ORG_FILE);
BufferedReader in = new BufferedReader(new FileReader(file));
try {
PrintWriter pw = new PrintWriter(new FileWriter(NEW_FILE));
List<String> emails = new ArrayList<String>();
try {
EmailValidator emailValidator = new EmailValidator();
while (true) {
String email = in.readLine();
if (email == null) break;
email = email.replaceAll("\\s", "");
if (emailValidator.isValid(email)) emails.add(email);
}
Collections.sort(emails);
for (String email : emails) {
pw.println(email);
}
} finally {
pw.close();
}
} finally {
in.close();
}
}
}
===========================================================================
import java.util.regex.Pattern;
public class EmailValidator {
private final Pattern pattern;
public EmailValidator(){
pattern = Pattern.compile("^\\w+(\\.\\w+)*@\\w+(\\.\\w+)*\\.\\p{Alpha}+$");
}
public boolean isValid(String email) {
return pattern.matcher(email).matches();
}
}
Refactorings
No refactoring yet !
Marco Valtas
October 12, 2007, October 12, 2007 01:10, permalink
Here my thoughts about your code. First I see you are using Java 5 so I will use the Scanner class. It's always very hard to validate emails address, in "Mastering Regular Expressions" Jeffrey Friedl shows 6.598 bytes long regexp (end of the book) to validate emails.
So I prefer use one specialized class to check the email. In mail.jar in jboss distribution you find one class to handle this (javax.mail.internet.InternetAddress).
The code bellow works, but I do not trim empty spaces and etc since I don't your real input format.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
public class ValidatorRefact {
private static final String ORIG_FILE = "org.txt";
private static final String NEW_FILE = "new.txt";
public static void main(String[] args) throws IOException {
// a keep email list...
List<String> origEmails = new LinkedList<String>();
// buffering original file
Scanner origScan = new Scanner(new File(ORIG_FILE));
while(origScan.hasNextLine()) {
origEmails.add(origScan.nextLine());
}
origScan.close();
// our cleaned file
FileWriter newFile = new FileWriter(new File(NEW_FILE));
// this class is our email check helper.
EmailValidatorRefact emailCheck = new EmailValidatorRefact();
// the output loop, only valid emails are written
for(String email : origEmails ) {
if(emailCheck.isValid(email)) {
newFile.write(email + "\n");
}
}
newFile.close();
}
}
class EmailValidatorRefact {
public boolean isValid(String email) {
if(email == null) return false;
try {
new InternetAddress(email,true);
} catch (AddressException e) {
return false;
}
return true;
}
}
matrixise
October 15, 2007, October 15, 2007 13:14, permalink
Why don't use a static method in your ValidatorRefact class ?
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
public class ValidatorRefact {
private static final String ORIG_FILE = "org.txt";
private static final String NEW_FILE = "new.txt";
public static void main(String[] args) throws IOException {
// buffering original file
Scanner origScan = new Scanner(new File(ORIG_FILE));
// our cleaned file
FileWriter newFile = new FileWriter(new File(NEW_FILE));
String email = null;
while(origScan.hasNextLine()) {
email = origScan.nextLine();
if( EmailValidatorRefact.isValid( email ) ) {
newFile.write( email + "\n" );
}
origScan.close();
newFile.close();
}
}
class EmailValidatorRefact {
public static boolean isValid(String email) {
if(email == null) return false;
try {
new InternetAddress(email,true);
} catch (AddressException e) {
return false;
}
return true;
}
}
matrixise
October 15, 2007, October 15, 2007 13:17, permalink
Why don't use a static method in your ValidatorRefact class ?
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import javax.mail.internet.AddressException;
import javax.mail.internet.InternetAddress;
public class ValidatorRefact {
private static final String ORIG_FILE = "org.txt";
private static final String NEW_FILE = "new.txt";
public static void main(String[] args) throws IOException {
// buffering original file
Scanner origScan = new Scanner(new File(ORIG_FILE));
// our cleaned file
FileWriter newFile = new FileWriter(new File(NEW_FILE));
String email = null;
while(origScan.hasNextLine()) {
email = origScan.nextLine();
if( EmailValidatorRefact.isValid( email ) ) {
newFile.write( email + "\n" );
}
}
origScan.close();
newFile.close();
}
}
class EmailValidatorRefact {
public static boolean isValid(String email) {
if(email == null) return false;
try {
new InternetAddress(email,true);
} catch (AddressException e) {
return false;
}
return true;
}
}
Marco Valtas
October 16, 2007, October 16, 2007 23:31, permalink
matrixise, this is more a design choice, theres no performance impact, at least with 15000 emails, in a test a made here. As I don't know if EmailValidator class will have another methods I prefer stay in non-static territory. If you are positive that EmailValidator will be something like a util class( see Effective Java by Joshua Bloch) you can declare isValid() as static.
thesuntoucher.myopenid.com
November 6, 2007, November 06, 2007 22:20, permalink
Okay, guys, this is my version. I would like to point some things out. The Regex i use, i found years ago somewhere on the internet and use since than, checks a little more: TLDs can have between two and seven chars and so on.
I tried to make this more generic an implemented a method that gets a reader and a writer, so you could give it a FileReader or any other Reader.
Last but not least, i don't read this in a List and don't sort it so you could throw gigabytes of data in it without running out of memory.
hope that helps
Tim
import java.io.*;
import java.util.regex.Pattern;
public class RefactorMyCodeEmail {
private static Pattern validEmail = Pattern.compile("[\\w-]+(?:\\.[\\w-]+)*@(?:[\\w-]+\\.)+[a-zA-Z]{2,7}");
public static void main(String[] args) throws IOException {
FileReader in = null;
FileWriter out = null;
try {
in = new FileReader(new File(args[0]));
out = new FileWriter(new File(args[1]));
filterEmailsFile(in, out);
}finally{
if(in != null) in.close();
if(out != null) out.close();
}
}
private static void filterEmailsFile(Reader in, Writer out) throws IOException {
BufferedReader reader = new BufferedReader(in);
BufferedWriter writer = new BufferedWriter(out);
String email = null;
while ((email = reader.readLine()) != null) {
if(validEmail.matcher(email).matches()){
writer.write(email);
writer.newLine();
}
}
}
}
This class validates email addresses stored in a .txt file and creates a new one that only contains valid email addresses.