Monday, July 18, 2011

Dynamic-Link Libraries (DLL) vs Static Libraries

We no longer use DLLs with the Platypus Billing System, except where absolutely necessary.  In some cases, with high-level languages (such as Visual Basic 6 and Visual FoxPro) and 3rd party libraries (such as OpenSSL) written in C/C++, we have no other choice.  Plus there are ActiveX/COM libraries (such as MSXML, Mailbee, DBI, and Crystal Reports), which cannot be linked to statically.  But, in many cases, DLLs can be avoided.

Without getting into an argument over which is better or worse, when the stability of a product is on the line, every DLL is another point of failure.  For that reason alone, it was more important for us to statically link our C/C++ code where possible.  Sure, the binaries may be larger and an update basically means a re-install; but it has been well worth these minor difficulties.

Since the switch to Visual C++ 2005 and static linking back in 2009, the number of C++ dependency issues we have encountered is still in the single digits - and that is only because of ActiveX/COM.  Just to illustrate the point, here are a few of the specific cases I have encountered over the past few years.

Case #1: PHP vs Pidgin

Both PHP and Pidgin include a spell check library - Aspell - in the form of aspell-15.dll.  Since the web pages for our product are written in PHP, I - of course - need PHP installed on my dev machine.  Also, I have Pidgin installed for chatting with technical support - or anyone else at work when a face-to-face confab is not required.

Now, normally these two products are not in conflict and everything works swimmingly.  But, one day, I decided to grab one of the newer - more stable, secure, and compiled in VC 2008 - PHP editions from the PHP for Windows site.  Everything worked fine at first.  Then, as happens, I needed to reboot.  Afterwards, Pidgin crashed every time I tried to start it up.

After yanking my hair out using Dependency Walker and Process Monitor, I finally figured out that it was because of Aspell.  I renamed aspell-15.dll in the PHP folder and everything started working again.  Because PHP was in the system path, Pidgin was loading the PHP version of the DLL instead of the one in the Pidgin folder.  It shouldn't have done this, and I could find no logical reason for it, but that is what was happening.
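To make the failure mode concrete, here is a toy model of a first-match-wins library lookup.  It is only a sketch: it is not the real Windows loader (whose search order has more steps and depends on system settings), the function name and any directory names you feed it are hypothetical, and it does not explain why Pidgin skipped its own copy.  It only shows that whichever directory in the search order holds a matching file name first wins, whether or not that file is the build the program expects.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: a set of (directory, file name) pairs standing in for the disk.
using FileSet = std::set<std::pair<std::string, std::string>>;

// Returns the first directory in search_order that contains dll_name,
// or an empty string if no directory in the list has it.
// The lookup matches on file name alone, so a same-named DLL from another
// product is indistinguishable from the intended one.
std::string resolve_library(const std::vector<std::string>& search_order,
                            const FileSet& files_on_disk,
                            const std::string& dll_name)
{
    for (const std::string& dir : search_order) {
        if (files_on_disk.count(std::make_pair(dir, dll_name)) != 0) {
            return dir;  // first match wins; later directories are never checked
        }
    }
    return std::string();
}
```

If the only directory in the search order that holds aspell-15.dll is the PHP folder on the system path, this function returns the PHP folder, even when a second copy exists somewhere the search never looks.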

Regardless, I didn't have the time to look into it further.  I knew the cause and could bypass it.  Spell check is nice, but completely unnecessary for IM.  So, I uninstalled Pidgin, and reinstalled it without the spell check feature.  Problem solved - or, at least, dealt with.

Case #2: PHP vs OpenSSL

With our product, we include a COM DLL (tu_app.dll) for interacting with the Tucows Email Service.  This COM library was written in Visual C++ 6.0 and was linked to some severely old versions of the OpenSSL libraries.  Again, because I decided to go mucking about with my installation of PHP, I broke yet another thing on my dev machine.

I was performing some fixes for our integration with Tucows Email and had to do some unit tests.  Every time I tried to load the COM object, the program would crash spectacularly.  After some more hair pulling, I traced it down to the OpenSSL libraries.  I replaced the DLLs installed by PHP with the ones included in our installation set and it started working again.

Problem solved?  No, definitely not.  Crashing my IM client is one thing.  The possibility that someone could install a special version of PHP on the same machine as our product - and PHP is normally on that machine - is another.  Only the older versions of the OpenSSL libraries would work with our COM library.

Those OpenSSL libraries were ancient and would not pass any scrutiny when it came to PA-DSS.  Plus, having our product crash because it required outdated and insecure versions of the OpenSSL libraries was completely unacceptable.  So, we ported the code from Visual C++ 6.0 to Visual C++ 2005 and statically linked to OpenSSL.  Now, the problem was solved.

Case #3: ATL Vulnerability

When this problem first came out, I was working on a separate major rewrite/port of our C++ code - specifically a Windows service for hosting our API - from Visual C++ 6.0 to Visual C++ 2005.  I had everything working.  It was beautiful and simple code, it compiled without warnings, it had no memory leaks, and it passed every test I threw at it.

Next came compatibility testing.  After making an installation set for our product, I started testing on all the operating systems we supported - Windows 2000 up to Windows Vista/2008.  Upon startup on Windows Vista and 2008, the service immediately crashed.  It worked fine on Windows 2000 and XP.

I checked the Event Log and found a side-by-side dependency error.  Considering this was my first venture into something newer than VC6, I wasn't fully competent with application manifests at the time.  So, I had no idea what this error really meant.

I checked the installation set to make sure it included the Visual C++ runtime - and it did.  I checked the installation log (and Add/Remove Programs) to make sure it installed - and it did.  After even more hair pulling, I found out about the ATL update.

The worst part was that no installation set for the Visual C++ runtime included the ATL fix.  There is one now, but there wasn't at the time (or I just suck at using a search engine).  So, I could try to install the runtime files manually, but that involved a huge amount of effort and testing on all those OSes - especially for something that had to be done that night.  I needed to finish my testing so we could release the next day (and possibly grab some sleep that night).  Plus, I had no idea which DLLs to install, where to install them, or how to deal with WinSxS from an NSIS installation set.

So, my only option was to switch to static linking.  No more dependencies.  No unnecessary points of failure.  Or more simply, no more DLL Hell.  Finally, problem solved and a few hours sleep before the release.
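For anyone making the same switch, a small build-time check can confirm which C runtime a Visual C++ build actually picked up.  This is a minimal sketch, not code from our product: crt_linkage and REQUIRE_STATIC_CRT are hypothetical names I made up for illustration.  It relies on the compiler defining _DLL when /MD or /MDd is used, and it covers only the C runtime, not other libraries you link against.

```cpp
// Reports how the C runtime is linked, based on predefined compiler macros.
// With Visual C++, _DLL is defined for /MD and /MDd (runtime in a DLL) and
// is not defined for /MT and /MTd (runtime linked statically).
inline const char* crt_linkage()
{
#if !defined(_MSC_VER)
    return "unknown (not Visual C++)";
#elif defined(_DLL)
    return "dynamic (/MD or /MDd)";
#else
    return "static (/MT or /MTd)";
#endif
}

// Optional guard: define the hypothetical REQUIRE_STATIC_CRT macro in the
// project settings to make a build with the DLL runtime fail at compile time.
#if defined(REQUIRE_STATIC_CRT) && defined(_MSC_VER) && defined(_DLL)
#error "This project expects the static C runtime (/MT), but /MD was used."
#endif
```

Printing crt_linkage() at service start-up, or turning on the guard, catches a project that silently falls back to the DLL runtime before it ever reaches a test machine.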

Case #4: DLL Preloading Vulnerability

This is the general form of case #1.  A DLL from an unexpected location is loaded instead of the intended one.  While case #1 wasn't officially an attack, it did crash a program and caused me a couple of hours of unneeded stress.

Now, in cases like this, there are officially two ways to deal with it.  First, you can reduce the attack surface by using SetDllDirectory.  This limits the possibility of an attack, but doesn't eliminate it, as I found out.  The second way is to do away with the problem altogether by static linking.  I am a firm believer that elimination is far better than mitigation - especially considering it requires no actual code change and reduces the amount of installation set testing required.
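For reference, the mitigation route looks roughly like the sketch below.  The wrapper name remove_cwd_from_dll_search is hypothetical, and the non-Windows branch exists only so the snippet compiles elsewhere.  Passing an empty string to SetDllDirectory removes the current working directory from the default DLL search order; check that the call is available on the Windows versions you target.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Removes the current working directory from the default DLL search order.
// Returns true on success.  On non-Windows builds there is nothing to do,
// so the function is a no-op that returns true.
bool remove_cwd_from_dll_search()
{
#ifdef _WIN32
    // An empty string (not NULL) is what drops the current directory;
    // passing NULL would restore the default search order instead.
    return SetDllDirectoryW(L"") != 0;
#else
    return true;
#endif
}
```

Call it once, early in start-up, before anything loads a library by name.  Note that it only takes the current directory out of the picture; directories on the system path are still searched, which is exactly the route the Aspell conflict in case #1 took.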