8/15/10

When denormalization helps

I had developed a web page that displayed the log of all the invoices in a given project. There were about ten columns in the log, each displaying a different piece of information about an invoice. These invoices were created against contracts, and there is a one-to-many relationship between a contract and its invoices.

Two of those columns, "WorkCompletedToDate" and "MaterialPurchasedToDate", represented the amount of work done and material purchased (in dollars) prior to the current invoice on a given contract. So if we were looking at invoice No. 3 on contract X, "WorkCompletedToDate" and "MaterialPurchasedToDate" would hold the sum of those fields from invoice No. 1 and invoice No. 2 on the same contract X. The values in these two columns were not stored in the database and were always calculated at runtime.
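
To make the calculation concrete, here is a rough sketch of what the runtime computation looked like. The table name Invoices, the column names, and the @ProjectId parameter are my assumptions for illustration, not the actual schema:

-- Hypothetical schema: Invoices(InvoiceId, ProjectId, ContractId,
--                               InvoiceNumber, WorkCompleted, MaterialPurchased)
-- For each invoice, sum the values from all earlier invoices on the same contract.
SELECT i.InvoiceId,
       i.InvoiceNumber,
       (SELECT ISNULL(SUM(p.WorkCompleted), 0)
          FROM Invoices p
         WHERE p.ContractId    = i.ContractId
           AND p.InvoiceNumber < i.InvoiceNumber) AS WorkCompletedToDate,
       (SELECT ISNULL(SUM(p.MaterialPurchased), 0)
          FROM Invoices p
         WHERE p.ContractId    = i.ContractId
           AND p.InvoiceNumber < i.InvoiceNumber) AS MaterialPurchasedToDate
  FROM Invoices i
 WHERE i.ProjectId = @ProjectId;

With thousands of invoices, each row in the log triggers two correlated sums over the contract's earlier invoices, which is where the cost builds up.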

There were thousands of invoices in the project, and the runtime calculation of these two columns began to slow down the web page. I tried to optimize the stored procedure, the functions, and the other code behind the log, but nothing seemed to help.

I then hit upon the idea of denormalizing the invoices table by adding two columns to store these values instead of calculating them at runtime. This also required a script to populate the values for all the existing invoices in the database. After I made these changes, the page began to load much faster.
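
A minimal sketch of that change, again using assumed table and column names (T-SQL syntax), would be to add the two columns and run a one-time backfill:

-- Add persisted running-total columns (names and types are assumptions).
ALTER TABLE Invoices
    ADD WorkCompletedToDate     MONEY NOT NULL DEFAULT 0,
        MaterialPurchasedToDate MONEY NOT NULL DEFAULT 0;

-- One-time backfill for existing invoices: sum the values from all
-- earlier invoices on the same contract.
UPDATE i
   SET i.WorkCompletedToDate     = ISNULL(prior.WorkSum, 0),
       i.MaterialPurchasedToDate = ISNULL(prior.MaterialSum, 0)
  FROM Invoices i
 OUTER APPLY (SELECT SUM(p.WorkCompleted)     AS WorkSum,
                     SUM(p.MaterialPurchased) AS MaterialSum
                FROM Invoices p
               WHERE p.ContractId    = i.ContractId
                 AND p.InvoiceNumber < i.InvoiceNumber) AS prior;

The trade-off is that the log page now simply reads the stored values, but whatever code creates a new invoice has to populate the two columns at insert time and keep them consistent.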

Though denormalization might not be the best approach in many scenarios and must be used with caution, it does come in handy in some situations, and it is definitely an option worth keeping open, especially when faced with performance issues.
