Category Archives: Architecture

Data Architecture

Our Experience with NonClustered Columnstore indexes

This is a presentation that I gave at the Austin SQL Server Users group.

Summary:
We upgraded to SQL 2016 in part to take advantage of the updateable nonclustered columnstore indexes for our web based Credit Risk product for D&B. I will show you the before and after for our product as well as some general exercises using columnstore in a BI over OLTP .

Here is the link to the presentation video:

http://usergroup.tv/videos/our-experience-with-nonclustered-columnstore-indexes

to sp or not to sp…that is the question

SQL Server stored procedures vs. application ad-hoc queries or ORMs

If you have worked for any significant amount of time in the tech industry, you have probably been involved in or witness to a passionate debate about where Domain Logic belongs.

As with most architectural decisions, many compromises are made weighing the pros and cons of each possible approach. These decisions can only be made by those with a very deep technical expertise in object oriented and relational paradigm. Since that expertise can rarely be found in one person, it is usually a synergy between an application, data, and systems architect.

Software engineering in all tiers is frequently full of uncertainties. These uncertainties are one of the causes of missed deadlines and future performance problems, and these are bad for business. The best way to deal with these sorts of uncertainties is to have faith in those best to eliminate them. In the case of the middle-tier, it is the expert in data structures, algorithms, memory management and APIs. In the data tier, the expertise lies with the data developers and the DBAs. Each tier relies on its abstractions in order to maintain its guarantees for correctness and performance. It is humans with expertise in their specific areas that are always superior to the tools, compiler optimizations aside.

Object Relational Impedance Mismatch

Much has been written about the differences between the Object Model and the Relational Model. You can learn quite a bit by simply googling the term. The Wikipedia page has quite a bit of information. The best explanation that I have heard is this: Objects model behavior, and Relations model data.

While the two have some similarities, including forms of normalization (3onf, and 3nf), there are distinct differences. These can make the interaction between the two a little challenging.

There are many tried and true techniques that exist to overcome these challenges, and the key to successful interaction lies in the deep knowledge of the technologies being used. That is essential to understanding the compromises being made throughout the development process.

Misconceptions

 “There is no performance difference between an inline query and a stored procedure”

I have heard this many times. I even read one paper that stated incorrectly that “stored procedures are compiled at runtime like any other SQL statement.”  This is a clear example of the complications that assumptive ignorance can cause in the architectural dialog.

Stored Procedures are compiled the first time the first time they are executed, and SQL server then stores an execution plan. Subsequent executions can skip this step altogether. [1]

It is true that inline queries can also have saved and reusable execution plans, but this is only available to simple queries. Anything other than selecting some columns by the clustered key will not render this benefit. If you add an additional predicate (AND, OR, etc.), and/or a join, you will not get this benefit. SQL Server will have to create an execution plan for every execution of the query.

Let’s take a look at some examples.

Fig. 1 used a saved execution plan, and the rather complex procedure ran in 1ms.

cachedplan

Fig. 2 I will force the stored procedure to recompile and create a new execution plan to demonstrate the overhead. This procedure ran in 321ms, with significant cpu usage compared to fig. 1

recompiled procedure
recompiled procedure

Fig.2 The procedure is forced to recompile the plan to simulate an inline query.

It is true that sometimes execution plans become out of date due to changing statistics, but this is easily discovered and remedied by a good DBA. Again, expertise is the key.

More than just speed…stability

There are many more advantages to using a stored procedure, but for the sake of brevity, I will leave those for another document.

“There is no good source control for SQL Server”

This one speaks to the differences in the object oriented and relational cultures more than anything else. Yes, source control, as well as deployment tools exist, of course they do. Microsoft Team Foundation Server [2] has great support for database source control as well as automated deployment. In fact, it is the very same source control integrated with Visual studio. There are also several add on solutions from vendors like Redgate.

“There is no way to unit test stored procedures”

Of course there are ways to unit test stored procedures, and in fact are integrated into TFS[3]. If the technology stack was all Microsoft, the application code, as well as the database objects are sourced in the same repository, can be branched, built, unit tested, and deployed together. Like any other source control, this requires some time, discipline, and effort to set up correctly. The effort is multiplied if the source control is created after many databases are already being used in production with active development going on.

Companies would never begin building any application without source control for the application code, but somehow fail to realize the need for database source control until the issue becomes dire, and the effort is then much more difficult.

Conclusion

No matter how good your ORM is, using any single sided solution ignores the strengths of the other technologies. To keep all of the logic in the application layer omits an entire layer of optimization in the database. An ORM is a compromise, with decisions made on one side of the playing field. It is just as lopsided to include application logic in the database. The trade offs that are made to get an application up very quickly using an ORM very rarely sustain growth, or scale without significant cost to the business. To model the data to support the behaviors, rather than the data, limits the uses of the data for any other business needs.

Stable and sound architecture comes only from making decisions with the understanding of the strengths and weaknesses of the technologies being considered. There are things that OOP was designed to do very well, and there are things that TSQL does very well. It is the expertise in the two languages that overcomes the mismatch. Don’t use a screwdriver as a hammer just because you have no true understanding how a hammer works. Let the screwdriver expert and the hammer expert decide what trade offs make sense for the business.

-Yell McGuyer