Wednesday, May 09, 2012

SQL Challenge - Part 1 - How to Find Contact Email Addresses in Platypus That May Cause an SMTP 550 Error.

Occasionally, if I am very lucky, someone will confront me with what I call the "SQL Challenge".  Way back in the day of the early 2000's, one of my coworkers and I would try to come up with ways to stretch our SQL knowledge.  Thanks to the "SQL Challenge", I am now capable of some fun acrobatics using just a little SQL.

The rules are fairly simple, once you have the actual challenge.  The SQL must be encapsulated into a single SQL statment - be that select, delete, insert or update.  It can have all the joins, subclauses, derived tables, and whatever else SQL provides, as long as it is included in a single statement. If inserting, updating or deleting, you are allowed a separate SQL statement for each table, but that is the only exception to the one statement rule. So, no cursors or loops. You get a bonuses for using ANSI SQL syntax and for how quickly you finish.

Today, I got one of those challenges.  Let's start with the givens.

1.  We have one or more email addresses stored in customer.email.
2.  These emails can be comma delmited or semicolon delimited.
3.  There may be spaces embedded before or after commas/semicolons.
4.  The emails may be hosted internally or by a 3rd party provider, or a combination of the two.
5.  A list of domains is stored in domain_item.domain.  These domains are hosted internally.
6.  A list of email addresses is stored in email_data.emailaddr.  These emails are hosted internally.
7.  When sending email messages to the email addresses stored on customer.email, if one of the email addresses is on a hosted domain but the email address is not in the list of hosted emails, the SMTP server will return a 550 (mailbox not found) error.

Now, the challenge.  Given all of the above, we want to find a list of customers with email addresses on customer.email that will generate a 550.  This means we want to find a list of email addresses attached to a hosted domain but are not a hosted email address.  After a little less than an hour, I made this...


select
/* The unique customer id */
    customer.id,
/* The delimited list of contact email addresses */
    customer.email,
/* A hosted domain matching one of the email addresses */
    domain_item.domain,
/* The number of email addresses in customer.email */
    len(customer.email) - len(replace(customer.email, '@', '')) as email_count,
/* The number of emails that match the hosted domain */
(
    len(replace(';'+replace(customer.email,' ','')+';',',', ';'))
    - len(replace(replace(';'+replace(customer.email,' ','')+';',',', ';'), '@'+domain_item.domain+';',''))
) / len('@' + domain_item.domain + ';') as domain_count,
/* The number of hosted email addresses that match the hosted domain  */
(
    select count(*)
      from email_data
     where customer.id = email_data.d_custid
       and ';'+replace(replace(customer.email,' ',''),',',';')+';' like '%;'+email_data.emailaddr+';%'
       and email_data.emailaddr + ';' like '%@' + domain_item.domain + ';'
) as match_count
  from customer
 inner join domain_item
    on replace(';'+replace(customer.email,' ','')+';',',',';') like '%@'+domain_item.domain+';%'
/* Strip the hosted domain from the delimited list */
/* Comparing the difference divided by the length of the hosted domain name */
/* This will let us know how many emails _should_ be hosted for this domain */
 where
(
    len(replace(';'+replace(customer.email,' ','')+';',',',';'))
    - len(replace(replace(';'+replace(customer.email,' ','')+';',',',';'), '@'+domain_item.domain+';', ''))
) / len('@'+domain_item.domain+';')
/* Compare the _should_ total against the _actual_ total */
/* If they don't match exactly, include the customer in the result */
<> (
/* Find the number of hosted email addresses for this customer */
/* That match one of the email addresses on the delimited list */
/* This will let us know how many emails are _actually_ hosted for this domain */
    select count(*)
      from email_data
     where customer.id = email_data.d_custid
       and replace(';'+replace(customer.email,' ','')+';',',',';') like '%;'+email_data.emailaddr+';%'
       and email_data.emailaddr+';' like '%@'+domain_item.domain+';'
)
go





Sunday, May 06, 2012

Base64 Encoding using MSXML


At the time we were developing the base64 encoder - covered in my last post, there already existed a quick and dirty way to perform base64 encoding/decoding through MSXML.  Even though we have decided not to pursue this for any Platypus development, it might benefit someone.  Here are examples for using MSXML for base64 encoding/decoding from within VBScript.

    Function Base64Encode (strData)
        Set objDocument = CreateObject("MSXML2.DOMDocument")
        Set objNode = objDocument.createElement("document")
        objNode.dataType = "bin.base64"
        objNode.nodeTypedValue = strData
        Base64Encode = objNode.text
    EndFunction
    
    Function Base64Decode (strData)
        Set objDocument = CreateObject("MSXML2.DOMDocument")
        Set objNode = objDocument.createElement("document")
        objNode.dataType = "bin.base64"
        objNode.text = strData
        Base64Decode = objNode.nodeTypedValue
    EndFunction



This is covered in more detail in Microsoft's Knowledge Base Article named "How To Create XML Documents with Binary Data in Visual Basic", along with examples for hex encoding and date encoding  (ISO-8601).

Saturday, May 05, 2012

Base64 Encoding in Platypus


Base64 encoding was originally designed for transmitting binary data over SMTP.  When email servers were first created, the SMTP protocol restricted the allowed characters to the first 7bits of ASCII's 8bit character set. Among other things, this prevented binary files from being sent over email.

Then came MIME and with it, base64 encoding, which converted binary files into readable text suitable for sending over email. With base64, for every 3 bytes of binary data input there are 4 bytes of encoded text returned. Compared with hex (base16) encoding, the other dominant encoding mechanism at the time, base64 encoding is much more efficient in terms of storage by lowering the amount of space needed for attachments from 2:1 (for hex) to 4:3 (for base64).  Since its creation, base64 encoding has been included in a variety of internet protocols, including SMTP authentication, HTTP authentication, XML and is used in various subsystems of the Platypus Billing System.

While Visual FoxPro - the base language for the Platypus Billing System - includes features for encoding and decoding using base64 through STRCONV(), that function does not follow the line length requirement in the RFC 2045 specification, which limits encoded lines to a maximum of 76 characters. For example, if data were encoded into 100 characters, two lines of encoded data would result.  The first line 76 characters, followed by CRLF and then the remaining 24 characters.

At first, this limitation was not a problem.  We used 3rd party ActiveX/COM libraries, named EncodeX and SmtpX from Mabry Software, for encoding email attachments and sending emails, respectively.  The other use of base64 within Platypus - our own attachment feature first included in Platypus v3 - did not have to interact with any external systems, so RFC 2045 compliance was not a requirement.

Eventually, we began work on creating a shared library that could wrap all our SMTP functionality for Platypus v5.  Up until this point, there were separate classes and libraries for the Platypus client and API.  For one, the Platypus client used ActiveX forms of the libraries, while the Platypus API used the COM forms.  Of course, this made maintaining that code doubly difficult and it was prone to inconsistent behavior between the Platypus client and API.  Because of that inconsistency, limitations in Visual FoxPro, and compatibility problems, we dropped the ActiveX form of SmtpX and completely dropped the EncodeX libraries.

Unfortunately, because FoxPro's STRCONV() function did not strictly follow MIME, sending attachments encoded using this function would often generate errors.  The strange thing was that these errors were not universal across all mail servers.  Some were more strict than others.  Anyway, to move forward, we decided to develop our own library for base64 encoding.

This was my first C++ project with Platypus.  Up until this point, my development experience - excluding school - was limited to Visual FoxPro, Visual Basic, and SQL.  After an excessive amount of research, we found a decent example in the public domain which performed encoding quickly enough and adapted it into a COM library using Visual C++ 6.0.  That library has been in use since that time for a majority of Platypus v4 and all of Platypus v5 and v6.  Not bad for a piece of of code 10 years old.  With the release of Platypus v7, that COM library has been phased out in favor of the Mailbee SMTP COM library, which handles encoding internally; but the old base64 COM library is still included in our installation sets as part of a fallback feature.

The original C++ source for the encoder has been phased out of our other components, as well; being replaced with a much improved library.  This new library takes advantage of Microsoft's SecureCRT (secure C runtime) guidelines to prevent buffer overflows, now includes an efficient decoder, and has been fuzz tested to ensure stability.  It is currently in use within the updated Mailpopper for re-encoding email attachments that have been decoded by the Mailbee POP3 COM library, which pulls emails into the Helpdesk features in Platypus.