Perl and MySQL

Perl_Mysql

Well another program that interfaces with MySQL.  This program is actually the same as my previous post Python/MySQL.  However this is written in PERL.

The instructions where as follows:

Given a table ‘mailing’:

CREATE TABLE mailing (addr VARCHAR(255) NOT NULL );

The mailing table will initially be empty.  New addresses will be added on a daily basis.  It is expected that the table will store at least 10,000,000 email addresses and 100,000 domains.

Write a Python script that updates another table which holds a daily count of email addresses by their domain name.

Use this table to report the top 50 domains by count sorted by percentage growth of the last 30 days compared to the total.

** NOTE **
– The original mailing table should not be modified.
– All processing must be done in PERL (eg. no complex queries or sub-queries)

The ‘mailing’ table did not have a ‘date’ column so I had to assume that that this field existed or there would be no way to determine when the email arrived.  MySQL does not keep a internal record of when a record has been added.  So I created the following table:

CREATE TABLE Mailing ( entryDate DATE NOT NULL, addr VARCHAR(255) NOT NULL)

My thinking was that they wanted most of the heavy lifting done by PERL.  So I tested various PERL script to read all the records from a CSV dump file, process the data and then create a report.   I tested with 10,000,000 emails records.  The format of the CSV file was: <entrydate>,<addr> (‘20150520′,’testing@hotmail.com’).

Some relevant information about the environment I tested on:

Perl version: 5, version 20
MySql version: 5.6.2.  MySQL Community Server
Operating System: Windows 8.1 64bit Core I7 with 16Gig RAM

With the Python rendition I used multi-threading to speed up the calculation and database reading.  However this script processes 10 million emails in just under 30 seconds.  There is only a 2 second difference from the time it takes to read the file and to create the report.  Two seconds is not a considerable amount of time to warrant multi-threading or multiprocessing.  So this script is a linear process.

 Feel free to download the files and experiment with other techniques of processing the data.   The files contain: MySQL_Perl.pl and 200000.txt.   You will need to change line 28 of the Perl file to reflect your MySQL database connection.  The table you will need to created is in a database called indexexchange

CREATE TABLE `dailycount` (
  `entryDate` date NOT NULL,
  `domain` varchar(255) NOT NULL,
  `total` int(11) DEFAULT NULL,
  PRIMARY KEY (`entryDate`,`domain`)
)

 If you have any questions feel free to email me at <myemail>