The UAB Spam Data Mine is a research project dedicated to archiving and analyzing information about ongoing spam campaigns and making it available to law enforcement and the public. Since 2006, Dr. Alan Sprague, Computer & Information Science PhD candidate Chun Wei, and CIS chair Dr. Anthony Skjellum have worked with Gary Warner to develop a means of parsing, storing, and clustering email messages. In the summer of 2009, we installed our new "spam cluster", which we call Rushmore, enabling us to move the UAB Spam Data Mine to higher levels of performance. We now gather more than 1 million emails per day and, as of January 2011, have nearly 500 million spam emails that can be searched in response to law enforcement queries.
The UAB Spam Data Mine is used on a daily basis to respond to queries about a wide range of email-based crimes. Data about phishing emails is commonly provided, but we also provide information about botnets, malware distribution emails, and emails that sell a particular product or pretend to be from a government agency. Our reports have been used to help analyze "spear phishing" campaigns, identify fraudulent advertisers, and identify individual computers and botnets responsible for emails claiming to be from the FBI, the IRS, the Centers for Disease Control, the Social Security Administration, and of course dozens of financial institutions. The UAB Spam Data Mine has been helpful in investigations of internet pharmacy spam and of malware campaigns such as Waledac, Koobface, Zeus, the Storm Worm, and others.
While Dr. Sprague and Chun Wei have continued to refine their clustering algorithms, several Justice Science students work to maintain the spam collection and to provide reports of current spam trends found in the email collection. JohnHenri Ewerth has been our lead Spam Analyst. He is joined by Karina Anderson and Sarah Turner; all three are Master's students in Criminal Justice. JohnHenri has used the UAB Spam Data Mine to provide evidence of the hosting location of criminal content in several investigations, including working with Gary Warner on the Federal Trade Commission's case against Pricewert/3FN.
In addition, Dr. Chengcui Zhang and other students from the Knowledge Discovery & Data Mining Lab have been actively analyzing images found as spam attachments to find unique ways of clustering spam by its attachments.
Chun Wei, CIS PhD, architect of clustering algorithms
Dr. Alan Sprague, KDDM Lab
JohnHenri Ewerth, CJ Masters student
Karina Anderson, CJ Masters student
Sarah Turner, CJ Masters student
Spam Data Mining Papers:
Chun Wei, Alan Sprague and Gary Warner, “Clustering Malware-generated Spam Emails with a Novel Fuzzy String Matching Algorithm”. Accepted for publication, Proceedings of the 2009 ACM International Symposium on Applied Computing (SAC2009)
Chun Wei, A.P. Sprague, G. Warner, “Detection of network blocks used by the Stormworm botnet,” In proceedings of ACM Southeast Conference, 2008.
Chun Wei, A.P. Sprague, Gary Warner and A. Skjellum, “Mining Spam email to identify common origins for forensic applications,” in proceedings of ACM Symposium on Applied Computing (SAC), pp. 1432-1436, 2008.
Related publications from the KDDM Lab:
A Multimodal Data Mining Framework for Revealing Common Sources of Spam Images. Chengcui Zhang, Wei-Bang Chen, Xin Chen, Richa Tiwari, Lin Yang, and Gary Warner.
Spam Image Clustering for Identifying Common Sources of Unsolicited Emails. Chengcui Zhang, Xin Chen, Wei-Bang Chen, Lin Yang, and Gary Warner. International Journal of Digital Crime and Forensics, Vol 1, Issue 3, pp.1-20, 2009.
Revealing Common Source of Image Spam by Unsupervised Clustering with Visual Features. Chengcui Zhang, Wei-Bang Chen, Xin Chen, and Gary Warner. Proceedings of the 2009 ACM International Symposium on Applied Computing (SAC2009), pp.891-892, March 8-12, 2009, Honolulu, Hawaii, USA.
Image Spam Clustering, An Unsupervised Approach. Chengcui Zhang and Wei-Bang Chen, accepted for publication, Proceedings of the 2009 ACM International Workshop on Multimedia. Oct 19-24, 2009, Beijing, China.