Earlier this week I experimented, once again, with using temporary hash tables to speed up intermediate processing. Temp tables can be very useful for creating “optimizer fences” that serve to simplify queries to the point where the optimizer will consistently (more or less) come up with a good plan if not a great one. Temp heaps seem well suited to this purpose, when used simply as low-overhead FIFO structures.
The operant word is “seem.” I created a temp heap to contain permutations of words in people’s names, as part of a search algorithm I am developing for my Proximity database. It typically contains a few thousands rows, and serves to help drive a pattern matching process for locating people that are common to more than one family tree. I expected to see outgoing row counts of at most 10,000 or so. What I actually got was a table spool with up to 3 BILLION rows out!
Adding a clustered index fixed it. I probably deserved that for ever questioning whether the overhead of a clustered insert would be worthwhile. But there are other situations in which a heap does work well. In this case I had two temp tables to be joined, the aforementioned permutation table and another table containing the search patterns. I started out with both as heaps, and that was my mistake. The pattern table is indeed a simple FIFO queue that acts as an optimizer fence, for which a heap is fine. The permutation table, on the other hand, participates in a complex search that demands indexing! 3,000,000,000 rows was the optimizer’s way of informing me that I had lost my mind.
Thank you, optimizer.