SQL: Full Circle

Sunday, 06 July 2003

If I go back to the beginnings of my software career, when I was just a wee lad in the industrial town of Port Talbot, on the coast of South Wales, things seemed very simple. Coding was about a means to an end. Despite the apparent limitations on the complexity of programs in the 8-bit computer era - a processor that couldn't muster 2MHz, the typical user RAM being 64K - people would bend over backwards to realise their ideas and concepts. Such marvellous things came to pass that were not expected from the technology - games and demos in particular stand out.

Optimization, though, creates opportunities for bugs and holes; turn your back for 30 seconds, and something evil will have sneaked into the code. It will then fester, for approximately 7.5 weeks, then hatch into your worst nightmare. You often find that a whole tranche of code needs to be rewritten to resolve the bug, including all the calls that pointed to that code. Back then, that was the way it was. Optimization was a necessary evil - part of the turf. It was just the way things were and it didn't occur to me that things could or should be any different.

Over time, though, hardware matured. I don't need to optimize above and beyond the call of duty anymore. It's more important to create something that works and people can trust. A new language has replaced the old: from dexterity, power, performance to quality, robustness, reliability. I spent time learning this new language the hard way - my programs took it upon themselves to teach me. Through lessons such as my Ph.D work, I was forced to re-evaluate what it meant to create software.

This sounds like just another waltz down memory lane, but bear with me a moment more.

SQL

As hardware matured, system sizes increased in leaps and bounds. Now they're very big and doing a lot more. I'm not talking about a thumbnail DB application that manages Uncle Greg's photo album, I'm thinking more of corporate systems and web site engines. Data sets are big; calculations are big; the number of users is increasing. If these volumes aren't handled correctly, then you can forget about it - performance is key and I can't wait 5 minutes for a report that pisses me off with "No records found, please try again."

The priority, of course, is software that works. If performance is important, then my immediate reaction is to attempt to build smart code, work within the rules, within the tenets of good design, to create something fast but clean. But what does this really mean?

SQL is the language that powers most databases. Even though it is at the heart of many systems today, it's not really a programming language. I'll repeat that: it's not really a programming language. SQL is how we interact with the database; in it's simplest form it puts data in and takes data out. However, because database queries are the most serious performance issue in large systems, the SQL itself becomes the target of performance optimization.

First, you have the bandwidth to worry about. It's not a very good idea to grab a mountain of data from the database and do all the calculations on a local client. That's a lot of time wasted and data being transmitted down the wire - it would be a lot quicker to sort it all out on the database. Developers, therefore, move programming code out of the application and into the database - SQL has programming-like features such as procedures and conditional statements to make this all happen.

Now, usually, it's faster to do one query rather than two, so combining database queries is seen as a good thing. Developers often focus on how to reduce the number of queries, which of course means the remaining query is complex and difficult to understand for the uninitiated. This reminds of what you can do with standard C syntax, which is to combine all sorts of weird shit into a single instruction - multiple assignments, comparisons, pointer dereferencing, function pointer calls plus our favourite the "?" ternary operator. It's efficient, but don't even think about readability. And if code cannot be read, then bugs are not easy to see. If you don't agree, then you obviously are in the wrong place.

Finally, the database itself often has tricky issues which need to be considered. Caching, tempdb, locking, the potential for deadlocks, statistics, indexes... these all have impact on the SQL. While it certainly would be nice to sit back and say to the DBA "please sort this out", the fact of the matter is that SQL which does not work with the database can create disastrous performance. So the SQL is refined to "co-operate" and obey orders.

So, for the purposes of optimization, program logic is migrated to SQL, queries are combined and then further refined to take advantage of the database characteristics.

What you can be left with is a mess that no-one will touch with a barge pole. You can bet your beanburger that it didn't get the commenting it deserved either.

Full Circle

What seemed to be a problem of a previous age, selling your coding soul for speed, is still here after all. For the last couple of years, I've been thinking a lot about the impact SQL has on large program quality. SQL can make things look cleaner, "SELECT name FROM authors" looks nice and readable, but life is rarely a list of authors' names. Database-related programming is becoming database-centric programming. Right now, I absolutely abhor SQL and it's not far off Fortran where I'm concerned. What SQL now represents is a step backward in the evolution of programming.

However I wouldn't say all is lost. Rather than desire the separation of database and software to be reinforced, I think we're headed for a far stronger integration of the two - one which is much more elegant than what we face right now. ADO.NET has taken some interesting steps in attempting to put the two together, but for large systems, nothing beats a stored procedure running directly off the server. I'm not sure of the shape of this "Database Programming Environment" ... but I look forward to it.

(You might like to refer to the Law of Leaky Abstractions which is central to this particular problem.)

End