Sunday, October 17, 2010

Optimizing Performance

String vs StringBuilder

 

Developers want their code to perform as quickly as possible, and string handling is a common place to look for savings in C#. String is an immutable object, which means that, by default, it can only be created and read, never modified. String is appropriate when developers do not expect to modify the value as the program executes:

 

public class MakeMessage
{
    public string GetMessage(string[] words)
    {
        string message = "";
        for (int i = 0; i < words.Length; i++)
        {
            message += " " + words[i];
        }
        return message;
    }
}

 

The segment of code above appends one word on each pass through the loop. It appears that a single message string is created and appended to, but that appearance is deceiving. In fact, on every iteration the old memory behind message is discarded and new memory, one word longer, is allocated; the old contents are copied into the new memory and the new word is added at the end. Because the code contains message +=, it reads as if it increments the message, but the string is actually rebuilt from scratch each time.

 

.NET developers working with C# need to understand the difference between String and StringBuilder. StringBuilder (in the System.Text namespace) is a dynamic object: because it can be modified in place, it can truly be appended to, rather than merely giving the false appearance of being appended. This makes StringBuilder the right choice in performance-critical sections of code, such as loops.

 

In the following example, StringBuilder is used to modify the string inside the loop:

 

public string GetMessage(string[] words)
{
    // Requires: using System.Text;
    StringBuilder message = new StringBuilder();
    for (int i = 0; i < words.Length; i++)
    {
        message.Append(" ");
        message.Append(words[i]);
    }
    return message.ToString();
}

 

Therefore, developers should define the message as a StringBuilder rather than a String in situations where they will want to modify the value. They can then use the Append method to reuse the same memory rather than allocating new memory on every pass through the loop.

 

 

Optimizing Memory

 

Developers want to optimize memory in their applications, but how do they get rid of chunks of memory that are no longer needed?

 

The problem occurs when references in the program still point to a chunk of memory that is no longer needed. These are essentially forgotten references: they are useless, and the developer never intended to keep them, but they remain in the program because the developer neglected to null them, so the garbage collector cannot reclaim the memory.

 

In the following C# example, assume the function MakeSplashImage calls new and allocates a large amount of memory. The reference to bigSplashImage is passed to the function that displays it, and once the image has been displayed, the memory is no longer needed by the program:

 

public class GUI
{
    public static void Main(string[] args)
    {
        Graphics bigSplashImage = MakeSplashImage();
        DisplayMomentarily(bigSplashImage);
        while (MoreProcessInput())
        {
            Process();
        }
    }
}

 

However, notice in the above example that the reference has not been nulled, so the splash image cannot be collected while the loop runs. Developers who want to avoid leaving unwanted references in their .NET programs should null these references as soon as they no longer need them:

 

public class GUI
{
    public static void Main(string[] args)
    {
        Graphics bigSplashImage = MakeSplashImage();
        DisplayMomentarily(bigSplashImage);
        bigSplashImage = null;
        while (MoreProcessInput())
        {
            Process();
        }
    }
}

 

Memory issues of this kind are very common when developers use temporary variables in .NET. Forgetting to null a temporary reference amounts to a memory leak. Therefore, null temporary references to objects taking large amounts of memory as soon as they are no longer needed.

 

 

 

Optimizing External Resource Usage

 

When programming in .NET, developers need to be aware that there are many layers of code working underneath them. Though they may be dealing mainly with a high-level language performing complicated functions, the layers of code underneath that language are performing a host of other functions, and those layers behave differently depending on what is done to the code at the upper level.

 

The lower layers of code are vital to the proper functioning of an application; they are the behind-the-scenes workers that make high-level functionality possible. If these hidden layers are ignored, they will most likely come back to cause problems in the application.

 

One lower layer of code that cannot be ignored is the communication with external resources. There is a central rule that should guide all interactions with external resources. Simply put, if an external resource is opened, it should be closed as soon as it is finished being used. The following coding example deals specifically with file inputs, which are just one type of external resource. However, the lesson from this example applies to interactions with any external resources. Notice that this C# code looks as if it makes a clean exit:

 

public class ReadFile
{
    public static string Read(string path)
    {
        StreamReader reader = new StreamReader(path);
        string contents = reader.ReadToEnd();
        reader.Close();
        return contents;
    }
}

 

However, the code above is deceptive: the call to reader.Close is not guaranteed to run, so the exit is not clean. Whether talking to a database, opening files, opening sockets, or sending instructions to the screen, developers need to close any external resource they have opened. The dangerous aspect of dealing with external resources is that code like the segment above seems to run well, right up until an error occurs while the resource is in use.

 

In a case such as the above, the code will throw an exception indicating a serious problem, and exceptions can transfer control to different parts of the program. In the previous example, if the ReadToEnd method throws an exception, control is transferred out of the Read method and the file is never closed.

 

In this situation, developers may choose to handle the exception and ignore the problem. However, they should stop to consider that the exception might have come from the interactions with external resources. If developers merely handle the exception, they will face the strong possibility of a resource leak. In other words, they will run out of resources at some point, which means they will not be able to open files or sockets and will not have access to the database.

 

If the code throws exceptions when using external resources, developers need to write a finally block. The finally block will always be executed, regardless of whether code is exited through exceptions or through normal execution. When the finally block is used, developers are guaranteed their code will clean up after them by closing all of their external resources. Therefore, developers should always exit code cleanly by using the finally block:

 

public class ReadFile
{
    public static string Read(string path)
    {
        StreamReader reader = null;
        try
        {
            reader = new StreamReader(path);
            string contents = reader.ReadToEnd();
            return contents;
        }
        finally
        {
            // Runs whether the try block exits normally or via an exception.
            if (reader != null)
            {
                reader.Close();
            }
        }
    }
}

 

The segment of code above is an example of how to clean up in the finally block rather than in the original code. The reader.Close call now lives inside the finally block, so developers know the external resource will be closed no matter how the method exits, and that the resource will be available the next time it is needed. Using the finally block guarantees that developers will not leak resources; therefore, always write a finally block to clean up when dealing with external resources.
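
As a side note, C# also provides the using statement, which is compiler shorthand for exactly this try/finally pattern for any type that implements IDisposable (as StreamReader does). A minimal sketch, assuming the same Read method:

public class ReadFile
{
    public static string Read(string path)
    {
        // Expands to the same try/finally shown above: Dispose
        // (which closes the reader) runs even if ReadToEnd throws.
        using (StreamReader reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }
}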

 

 

Using Implicit Cast Operators

 

When using implicit cast operators, developers must be aware of all potential consequences. For example, errors in the code can stem from the improper use of Vector and Figure classes in conjunction with implicit cast operators:

 

using System;

public class Figure
{
    public void Transform(double d)
    {
        // This method resizes the figure by factor d.
    }

    public void Transform(Vector v)
    {
        // This method moves (translates) the figure by the vector.
    }
}

public class Vector
{
    // Implicitly converts a Vector to its length as a double.
    public static implicit operator double(Vector v)
    {
        return Math.Sqrt(v.x * v.x + v.y * v.y);
    }

    private double x;
    private double y;

    public Vector(double x, double y)
    {
        this.x = x;
        this.y = y;
    }

    public static void Main()
    {
        Figure f = new Figure();
        Vector v = new Vector(1, 1);
        f.Transform(v);   // calls Transform(Vector): the figure is translated
    }

    //...
}

 

As seen in the code above, the Vector class defines an implicit cast operator that converts a Vector into its length as a double. In another part of the code, the Figure class provides overloaded Transform methods.

 

In its current state, calling Transform on the Figure object with a Vector argument translates the figure. However, if another programmer removes the Transform(Vector) method and is not aware of the implicit cast operator on the Vector class, the code will silently change behavior:

 

using System;

public class Figure
{
    public void Transform(double d)
    {
        // This method resizes the figure by factor d.
    }
}

public class Vector
{
    public static implicit operator double(Vector v)
    {
        return Math.Sqrt(v.x * v.x + v.y * v.y);
    }

    private double x;
    private double y;

    public Vector(double x, double y)
    {
        this.x = x;
        this.y = y;
    }

    public static void Main()
    {
        Figure f = new Figure();
        Vector v = new Vector(1, 1);
        f.Transform(v);   // now calls Transform(double) via the implicit conversion
    }

    //...
}

 

No error is raised for the missing Transform(Vector) method. Instead, the implicit conversion kicks in, the Transform(double) method is called, and the figure is scaled by the vector's length instead of being translated.

 

If a developer changes the code design so that an implicit cast operator is not necessary, the problems shown previously will not occur. For example, a developer may instead use an explicit cast operator, as shown below:

 

public static explicit operator double(Vector v)
{
    return Math.Sqrt(v.x * v.x + v.y * v.y);
}
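
With the conversion made explicit, the earlier call f.Transform(v) no longer compiles once Transform(Vector) is removed; the caller has to spell out the conversion, which makes the scaling intent visible. A sketch of what the call site would then look like:

Figure f = new Figure();
Vector v = new Vector(1, 1);
// f.Transform(v);          // compile error: no implicit Vector-to-double conversion
f.Transform((double)v);     // scaling by the vector's length is now explicit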

 

 

Object-oriented Programming

 

Object-oriented programming (OOP) makes code reusable, easily maintainable, and better organized. However, there are a number of pitfalls; for example:

 

public class BankAccount
{
    public int _balance;
}

 

The class BankAccount represents a bank account, but the variable holding the balance has been made public. Even though declaring the variable public is legal in the .NET Framework, it makes the code very difficult to modify and improve. There is a safer way of writing the code and achieving the same effect:

 

public class BankAccount
{
    private int _balance;

    public int Balance
    {
        get
        {
            return _balance;
        }
        set
        {
            _balance = value;
        }
    }
}

 

Here, the _balance variable has been declared private, and a public property has been defined to access it. The code is now very easy to maintain because you can change the BankAccount implementation without having to change any of the client code.

 

For example, developers can make the BankAccount object thread-safe just by adding synchronization to the Balance property's get and set accessors. None of the other code that uses BankAccount objects needs to be modified. Therefore, developers should declare their fields private.
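
As a minimal sketch of that change (assuming simple lock-based synchronization is sufficient), only the property accessors are touched:

public class BankAccount
{
    private readonly object _sync = new object();
    private int _balance;

    public int Balance
    {
        get { lock (_sync) { return _balance; } }
        set { lock (_sync) { _balance = value; } }
    }
}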

 

 

Property Naming Guidelines

The following rules outline the naming guidelines for properties:

·         Use a noun or noun phrase to name properties.

·         Use Pascal case.

·         Do not use Hungarian notation.

·         Consider creating a property with the same name as its underlying type. For example, if you declare a property named Color, the type of the property should likewise be Color. See the example later in this topic.

The following code example illustrates correct property naming.

[Visual Basic]

Public Class SampleClass

   Public Property BackColor As Color

      ' Code for Get and Set accessors goes here.

   End Property

End Class

[C#]

public class SampleClass

{

   public Color BackColor

   {

      // Code for Get and Set accessors goes here.

   }

}

The following code example illustrates providing a property with the same name as a type.

[Visual Basic]

Public Enum Color

   ' Insert code for Enum here.

End Enum

Public Class Control

   Public Property Color As Color

      Get

         ' Insert code here.

      End Get

      Set

         ' Insert code here.

      End Set

   End Property

End Class

[C#]

public enum Color

{

   // Insert code for Enum here.

}

public class Control

{

   public Color Color

   {

      get { /* Insert code here. */ }

      set { /* Insert code here. */ }

   }

}

The following code example is incorrect because the property Color is of type Integer.

[Visual Basic]

Public Enum Color

   ' Insert code for Enum here.

End Enum

Public Class Control

   Public Property Color As Integer

      Get

         ' Insert code here.

      End Get

      Set

         ' Insert code here.

      End Set

   End Property

End Class

[C#]

public enum Color { /* Insert code for Enum here. */ }

public class Control

{

   public int Color

   {

      get { /* Insert code here. */ }

      set { /* Insert code here. */ }

   }

}

In the incorrect example, it is not possible to refer to the members of the Color enumeration. Color.Xxx will be interpreted as accessing a member that first gets the value of the Color property (type Integer in Visual Basic or type int in C#) and then accesses a member of that value (which would have to be an instance member of System.Int32).

 

 

Event Naming Guidelines

The following rules outline the naming guidelines for events:

·         Use Pascal case.

·         Do not use Hungarian notation.

·         Use an EventHandler suffix on event handler names.

·         Specify two parameters named sender and e. The sender parameter represents the object that raised the event. The sender parameter is always of type object, even if it is possible to use a more specific type. The state associated with the event is encapsulated in an instance of an event class named e. Use an appropriate and specific event class for the e parameter type.

·         Name an event argument class with the EventArgs suffix.

·         Consider naming events with a verb. For example, correctly named event names include Clicked, Painting, and DroppedDown.

·         Use a gerund (the "ing" form of a verb) to create an event name that expresses the concept of pre-event, and a past-tense verb to represent post-event. For example, a Close event that can be canceled should have a Closing event and a Closed event. Do not use the BeforeXxx/AfterXxx naming pattern.

·         Do not use a prefix or suffix on the event declaration on the type. For example, use Close instead of OnClose.

·         In general, you should provide a protected method called OnXxx on types with events that can be overridden in a derived class. This method should only have the event parameter e, because the sender is always the instance of the type.
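As a sketch of the OnXxx pattern described in the last bullet (the Window class and Closed event are illustrative names):

public class Window
{
    public event EventHandler Closed;

    // Derived classes override this to intercept the event.
    // Only e is passed; the sender is always this instance.
    protected virtual void OnClosed(EventArgs e)
    {
        if (Closed != null)
        {
            Closed(this, e);
        }
    }
}
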

The following example illustrates an event handler with an appropriate name and parameters.

[Visual Basic]

Public Delegate Sub MouseEventHandler(sender As Object, e As MouseEventArgs)

[C#]

public delegate void MouseEventHandler(object sender, MouseEventArgs e);

The following example illustrates a correctly named event argument class.

[Visual Basic]

Public Class MouseEventArgs
   Inherits EventArgs

   Dim _x As Integer
   Dim _y As Integer

   Public Sub New(x As Integer, y As Integer)
      Me._x = x
      Me._y = y
   End Sub

   Public ReadOnly Property X As Integer
      Get
         Return _x
      End Get
   End Property

   Public ReadOnly Property Y As Integer
      Get
         Return _y
      End Get
   End Property
End Class

[C#]

public class MouseEventArgs : EventArgs

{

   int x;

   int y;

   public MouseEventArgs(int x, int y)

      { this.x = x; this.y = y; }

   public int X { get { return x; } }

   public int Y { get { return y; } }

}

 

Enumeration Type Naming Guidelines

 

The enumeration (Enum) value type inherits from the Enum Class. The following rules outline the naming guidelines for enumerations:

·         Use Pascal case for Enum types and value names.

·         Use abbreviations sparingly.

·         Do not use an Enum suffix on Enum type names.

·         Use a singular name for most Enum types, but use a plural name for Enum types that are bit fields.

·         Always add the FlagsAttribute to a bit field Enum type.
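
For example, a bit-field enumeration following these rules might look like the sketch below (FilePermissions is an illustrative name; note the plural name and the FlagsAttribute):

[Flags]
public enum FilePermissions   // plural: this Enum is a bit field
{
    None    = 0,
    Read    = 1,
    Write   = 2,
    Execute = 4
}

public enum Color   // singular: not a bit field
{
    Red,
    Green,
    Blue
}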

 

Avoiding Type Name Confusion

Different programming languages use different terms to identify the fundamental managed types. Class library designers must avoid using language-specific terminology. Follow the rules described in this section to avoid type name confusion.

Use names that describe a type's meaning rather than names that describe the type. In the rare case that a parameter has no semantic meaning beyond its type, use a generic name. For example, a class that supports writing a variety of data types into a stream might have the following methods.

[Visual Basic]

Sub Write(value As Double)

Sub Write(value As Single)

Sub Write(value As Long)

Sub Write(value As Integer)

Sub Write(value As Short)

[C#]

void Write(double value);

void Write(float value);

void Write(long value);

void Write(int value);

void Write(short value);

Do not create language-specific method names, as in the following example.

[Visual Basic]

Sub Write(doubleValue As Double)

Sub Write(singleValue As Single)

Sub Write(longValue As Long)

Sub Write(integerValue As Integer)

Sub Write(shortValue As Short)

[C#]

void Write(double doubleValue);

void Write(float floatValue);

void Write(long longValue);

void Write(int intValue);

void Write(short shortValue);

In the extremely rare case that it is necessary to create a uniquely named method for each fundamental data type, use a universal type name. The following table lists fundamental data type names and their universal substitutions.


C# type name   Visual Basic type name   JScript type name   Visual C++ type name   Ilasm.exe representation   Universal type name
sbyte          SByte                    sByte               char                   int8                       SByte
byte           Byte                     byte                unsigned char          unsigned int8              Byte
short          Short                    short               short                  int16                      Int16
ushort         UInt16                   ushort              unsigned short         unsigned int16             UInt16
int            Integer                  int                 int                    int32                      Int32
uint           UInt32                   uint                unsigned int           unsigned int32             UInt32
long           Long                     long                __int64                int64                      Int64
ulong          UInt64                   ulong               unsigned __int64       unsigned int64             UInt64
float          Single                   float               float                  float32                    Single
double         Double                   double              double                 float64                    Double
bool           Boolean                  boolean             bool                   bool                       Boolean
char           Char                     char                wchar_t                char                       Char
string         String                   string              String                 string                     String
object         Object                   object              Object                 object                     Object

For example, a class that supports reading a variety of data types from a stream might have the following methods.

[Visual Basic]

Function ReadDouble() As Double

Function ReadSingle() As Single

Function ReadInt64() As Long

Function ReadInt32() As Integer

Function ReadInt16() As Short

[C#]

double ReadDouble();

float ReadSingle();

long ReadInt64();

int ReadInt32();

short ReadInt16();

The preceding example is preferable to the following language-specific alternative.

[Visual Basic]

Function ReadDouble() As Double

Function ReadSingle() As Single

Function ReadLong() As Long

Function ReadInteger() As Integer

Function ReadShort() As Short

[C#]

double ReadDouble();

float ReadFloat();

long ReadLong();

int ReadInt();

short ReadShort();

 

 

 

Guidelines for Implementing Equals and the Equality Operator (==)

The following rules outline the guidelines for implementing the Equals method and the equality operator (==):

·         Implement the GetHashCode method whenever you implement the Equals method. This keeps Equals and GetHashCode synchronized.

·         Override the Equals method whenever you implement ==, and make them do the same thing. This allows infrastructure code such as Hashtable and ArrayList, which use the Equals method, to behave the same way as user code written using ==.

·         Override the Equals method any time you implement the IComparable Interface.

·         You should consider implementing operator overloading for the equality (==), not equal (!=), less than (<), and greater than (>) operators when you implement IComparable.

·         Do not throw exceptions from the Equals or GetHashCode methods or the equality operator (==).

For related information on the Equals method, see Implementing the Equals Method.

Implementing the Equality Operator (==) on Value Types

In most programming languages there is no default implementation of the equality operator (==) for value types. Therefore, you should overload == any time equality is meaningful.

You should consider implementing the Equals method on value types because the default implementation on System.ValueType will not perform as well as your custom implementation.

Implement == any time you override the Equals method.
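
A minimal sketch of these rules for a hypothetical Money value type (Equals, GetHashCode, ==, and != all agree, and none of them throw):

public struct Money
{
    private readonly int cents;

    public Money(int cents) { this.cents = cents; }

    public override bool Equals(object obj)
    {
        // Never throw; simply return false for mismatched types.
        return obj is Money && ((Money)obj).cents == cents;
    }

    // Kept synchronized with Equals.
    public override int GetHashCode() { return cents; }

    public static bool operator ==(Money a, Money b) { return a.cents == b.cents; }
    public static bool operator !=(Money a, Money b) { return !(a == b); }
}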

Implementing the Equality Operator (==) on Reference Types

Most languages do provide a default implementation of the equality operator (==) for reference types. Therefore, you should use care when implementing == on reference types. Most reference types, even those that implement the Equals method, should not override ==.

Override == if your type is a base type such as a Point, String, BigNumber, and so on. Any time you consider overloading the addition (+) and subtraction (-) operators, you also should consider overloading ==.

Implementing Finalize and Dispose to Clean up Unmanaged Resources

Class instances often encapsulate control over resources that are not managed by the runtime, such as window handles (HWND), database connections, and so on. Therefore, you should provide both an explicit and an implicit way to free those resources. Provide implicit control by implementing the protected Finalize Method on an object (destructor syntax in C# and the Managed Extensions for C++). The garbage collector calls this method at some point after there are no longer any valid references to the object.

In some cases, you might want to provide programmers using an object with the ability to explicitly release these external resources before the garbage collector frees the object. If an external resource is scarce or expensive, better performance can be achieved if the programmer explicitly releases resources when they are no longer being used. To provide explicit control, implement the Dispose method provided by the IDisposable Interface. The consumer of the object should call this method when it is done using the object. Dispose can be called even if other references to the object are alive.

Note that even when you provide explicit control by way of Dispose, you should provide implicit cleanup using the Finalize method. Finalize provides a backup to prevent resources from permanently leaking if the programmer fails to call Dispose.

For more information about implementing Finalize and Dispose to clean up unmanaged resources, see Programming for Garbage Collection. The following code example illustrates the basic design pattern for implementing Dispose.

[Visual Basic]
' Design pattern for a base class.
Public Class Base
   Implements IDisposable
   ' Implement IDisposable.
   Public Overloads Sub Dispose() Implements IDisposable.Dispose
      Dispose(True)
      GC.SuppressFinalize(Me)
   End Sub
 
   Protected Overloads Overridable Sub Dispose(disposing As Boolean)
      If disposing Then
         ' Free other state (managed objects).
      End If
      ' Free your own state (unmanaged objects).
      ' Set large fields to null.
   End Sub
 
   Protected Overrides Sub Finalize()
      ' Simply call Dispose(False).
      Dispose(False)
   End Sub
End Class
 
' Design pattern for a derived class.
Public Class Derived
   Inherits Base
 
   Protected Overloads Overrides Sub Dispose(disposing As Boolean) 
      If disposing Then 
         ' Release managed resources.
      End If
      ' Release unmanaged resources.
      ' Set large fields to null.
      ' Call Dispose on your base class.
      MyBase.Dispose(disposing)
   End Sub
   ' The derived class does not have a Finalize method
   ' or a Dispose method with parameters because it inherits
   ' them from the base class.
End Class
[C#]
// Design pattern for a base class.
public class Base: IDisposable
{
   //Implement IDisposable.
   public void Dispose() 
   {
     Dispose(true);
      GC.SuppressFinalize(this); 
   }
 
   protected virtual void Dispose(bool disposing) 
   {
      if (disposing) 
      {
         // Free other state (managed objects).
      }
      // Free your own state (unmanaged objects).
      // Set large fields to null.
   }
 
   // Use C# destructor syntax for finalization code.
   ~Base()
   {
      // Simply call Dispose(false).
      Dispose(false);
   }
}

// Design pattern for a derived class.
public class Derived: Base
{   
   protected override void Dispose(bool disposing) 
   {
      if (disposing) 
      {
         // Release managed resources.
      }
      // Release unmanaged resources.
      // Set large fields to null.
      // Call Dispose on your base class.
      base.Dispose(disposing);
   }
   // The derived class does not have a Finalize method
   // or a Dispose method with parameters because it inherits
   // them from the base class.
}

For a more detailed code example illustrating the design pattern for implementing Finalize and Dispose, see Implementing a Dispose Method.

Customizing a Dispose Method Name

Occasionally a domain-specific name is more appropriate than Dispose. For example, a file encapsulation might want to use the method name Close. In this case, implement Dispose privately and create a public Close method that calls Dispose. The following code example illustrates this pattern. You can replace Close with a method name appropriate to your domain.

[Visual Basic]
' Do not make this method overridable.
' A derived class should not be allowed
' to override this method.
Public Sub Close()
   ' Call the Dispose method with no parameters.
   Dispose()
End Sub
[C#]
// Do not make this method virtual.
// A derived class should not be allowed
// to override this method.
public void Close()
{
   // Call the Dispose method with no parameters.
   Dispose();
}

Finalize

The following rules outline the usage guidelines for the Finalize method:

·         Only implement Finalize on objects that require finalization. There are performance costs associated with Finalize methods.

·         If you require a Finalize method, you should consider implementing IDisposable to allow users of your class to avoid the cost of invoking the Finalize method.

·         Do not make the Finalize method more visible. It should be protected, not public.

·         An object's Finalize method should free any external resources that the object owns. Moreover, a Finalize method should release only resources that are held onto by the object. The Finalize method should not reference any other objects.

·         Do not directly call a Finalize method on an object other than the object's base class. This is not a valid operation in the C# programming language.

·         Call the base.Finalize method from an object's Finalize method.

Note   The base class's Finalize method is called automatically with the C# and the Managed Extensions for C++ destructor syntax.

Dispose

The following rules outline the usage guidelines for the Dispose method:

·         Implement the dispose design pattern on a type that encapsulates resources that explicitly need to be freed. Users can free external resources by calling the public Dispose method.

·         Implement the dispose design pattern on a base type that commonly has derived types that hold on to resources, even if the base type does not. If the base type has a close method, often this indicates the need to implement Dispose. In such cases, do not implement a Finalize method on the base type. Finalize should be implemented in any derived types that introduce resources that require cleanup.

·         Free any disposable resources a type owns in its Dispose method.

·         After Dispose has been called on an instance, prevent the Finalize method from running by calling the GC.SuppressFinalize Method. The exception to this rule is the rare situation in which work must be done in Finalize that is not covered by Dispose.

·         Call the base class's Dispose method if it implements IDisposable.

·         Do not assume that Dispose will be called. Unmanaged resources owned by a type should also be released in a Finalize method in the event that Dispose is not called.

·         Throw an ObjectDisposedException from instance methods on this type (other than Dispose) when resources are already disposed. This rule does not apply to the Dispose method because it should be callable multiple times without throwing an exception.

·         Propagate the calls to Dispose through the hierarchy of base types. The Dispose method should free all resources held by this object and any object owned by this object. For example, you can create an object like a TextReader that holds onto a Stream and an Encoding, both of which are created by the TextReader without the user's knowledge. Furthermore, both the Stream and the Encoding can acquire external resources. When you call the Dispose method on the TextReader, it should in turn call Dispose on the Stream and the Encoding, causing them to release their external resources.

·         You should consider not allowing an object to be usable after its Dispose method has been called. Recreating an object that has already been disposed is a difficult pattern to implement.

·         Allow a Dispose method to be called more than once without throwing an exception. The method should do nothing after the first call.
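
A minimal sketch of the last two rules (ResourceHolder and DoWork are illustrative names):

public class ResourceHolder : IDisposable
{
    private bool disposed;

    public void DoWork()
    {
        // Instance methods other than Dispose throw once disposed.
        if (disposed)
        {
            throw new ObjectDisposedException(GetType().Name);
        }
        // ... use the resource ...
    }

    public void Dispose()
    {
        // Callable more than once; later calls do nothing.
        if (disposed)
        {
            return;
        }
        // ... release resources here ...
        disposed = true;
        GC.SuppressFinalize(this);
    }
}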

 

 

 

 

Database Optimizations

  • Try to restrict the query's result set by using a WHERE clause.

    This can yield real performance benefits, because SQL Server will return only the particular rows to the client, not all rows from the table(s). This reduces network traffic and boosts the overall performance of the query.

This tip may sound obvious to most of you, but I have seen professional developers, in two major SQL Server-based applications used worldwide, fail to follow it: always include a WHERE clause in your SELECT statement to narrow the number of rows returned. If you don't use a WHERE clause, SQL Server will perform a table scan of your table and return all of the rows. In some cases you may want to return all rows, and not using a WHERE clause is appropriate then. But if you don't need all the rows returned, use a WHERE clause to limit the number of rows returned.

By returning data you don't need, you are causing SQL Server to perform I/O it doesn't need to perform, wasting SQL Server resources. In addition, it increases network traffic, which can also lead to reduced performance. And if the table is very large, a table scan will lock the table during the time-consuming scan, preventing other users from accessing it and hurting concurrency.

Another negative aspect of a table scan is that it tends to flush useful data pages out of the cache with useless data, which reduces SQL Server's ability to reuse data already in the cache, increasing disk I/O and hurting performance.

*****

 

  • Try to restrict the query's result set by returning only the particular columns you need, not all of the table's columns.

    This can yield real performance benefits, because SQL Server will return only the particular columns to the client, not all of the table's columns. This reduces network traffic and boosts the overall performance of the query.

*****

 

  • Use views and stored procedures instead of heavy-duty queries.

    This can reduce network traffic, because your client will send the server only a stored procedure or view name (perhaps with some parameters) instead of the text of a large, heavy-duty query. Views and stored procedures can also facilitate permission management, because you can restrict user access to table columns they should not see.

*****

 

  • Try to avoid using SQL Server cursors, whenever possible.

    SQL Server cursors can cause significant performance degradation compared with set-based SELECT statements. If you need to perform row-by-row operations, try to use a correlated subquery or a derived table instead, as illustrated below.
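
For instance, a row-by-row update done with a cursor can usually be replaced by a single set-based statement (table and column names here are illustrative):

-- Instead of opening a cursor over products and updating one row
-- per FETCH, perform the whole operation in one statement:
UPDATE products
SET unit_price = unit_price * 1.10
WHERE category_id = 5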

*****

 

  • If you need to return a table's total row count, you can use an alternative to the SELECT COUNT(*) statement.

    Because the SELECT COUNT(*) statement performs a full table scan to produce the row count, it can take a very long time on a large table. There is another way to determine the total row count: the sysindexes system table contains a ROWS column that holds the total row count for each table in your database. So, you can use the following SELECT statement instead of SELECT COUNT(*): SELECT rows FROM sysindexes WHERE id = OBJECT_ID('table_name') AND indid < 2. This can make such queries several times faster.

    See this article for more details:

    Alternative way to get the table's row count.

*****

 

  • Try to use constraints instead of triggers, whenever possible.

    Constraints are much more efficient than triggers and can boost performance, so prefer them whenever they can do the job.

*****

 

  • Use table variables instead of temporary tables.

    Table variables require fewer locking and logging resources than temporary tables, so table variables should be used whenever possible. Note that table variables are available in SQL Server 2000 and later; the sketch below shows the two declarations side by side.
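
-- Temporary table: more locking and logging overhead
CREATE TABLE #results (id INT, total MONEY)

-- Table variable: lighter weight (SQL Server 2000 and later)
DECLARE @results TABLE (id INT, total MONEY)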

*****

 

  • Try to avoid the HAVING clause, whenever possible.

    The HAVING clause restricts the result set produced by the GROUP BY clause: GROUP BY divides the rows into sets of grouped rows and aggregates their values, and then HAVING eliminates undesired aggregated groups. In many cases, you can write your SELECT statement so that it contains only WHERE and GROUP BY clauses, without a HAVING clause. This lets rows be filtered out before they are grouped and can improve the performance of your query, as in the sketch below.
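
A sketch of the rewrite, assuming the filter applies to a grouping column (table and column names are illustrative):

-- Rows are grouped and aggregated first, then groups are discarded:
SELECT customer_id, SUM(amount)
FROM orders
GROUP BY customer_id
HAVING customer_id > 1000

-- Same result, but unwanted rows are filtered out before grouping:
SELECT customer_id, SUM(amount)
FROM orders
WHERE customer_id > 1000
GROUP BY customer_id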

*****

 

  • Try to avoid using the DISTINCT clause, whenever possible.

    Because the DISTINCT clause causes some performance degradation, use it only when it is necessary.

*****

 

  • Include the SET NOCOUNT ON statement in your stored procedures to suppress the message indicating the number of rows affected by each T-SQL statement.

    This can reduce network traffic, because your client will not receive the rows-affected message after every statement.
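
For example (the procedure and table names are illustrative):

CREATE PROCEDURE usp_GetCustomer
    @customer_id INT
AS
    SET NOCOUNT ON   -- suppress the "(n row(s) affected)" message
    SELECT customer_name
    FROM customer
    WHERE customer_id = @customer_id
GO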

*****

 

  • Use SELECT statements with the TOP keyword or the SET ROWCOUNT statement if you need to return only the first n rows.

    This can improve the performance of your queries, because a smaller result set is returned. It also reduces the traffic between the server and the clients.

*****

 

  • Use the FAST number_rows query hint if you need to quickly return the first number_rows rows.

    You can get those first n rows quickly and start working with them while the query continues executing and produces its full result set.
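
For example (table and column names are illustrative; FAST is supplied as a query hint in the OPTION clause):

SELECT order_id, order_date
FROM orders
ORDER BY order_date DESC
OPTION (FAST 10)   -- optimize the plan to return the first 10 rows quickly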

*****

 

  • Try to use the UNION ALL statement instead of UNION, whenever possible.

    The UNION ALL statement is much faster than UNION, because UNION ALL does not look for duplicate rows, while UNION looks for duplicate rows whether or not they exist.

When using the UNION statement, keep in mind that, by default, it performs the equivalent of a SELECT DISTINCT on the final result set: UNION takes the results of two like recordsets, combines them, and then performs a SELECT DISTINCT to eliminate any duplicate rows. This process occurs even if there are no duplicate records in the final recordset. If you know that there are duplicate records and this presents a problem for your application, then by all means use the UNION statement to eliminate them.

On the other hand, if you know that there will never be any duplicate rows, or if there are and this presents no problem for your application, then you should use the UNION ALL statement instead of UNION. The advantage of UNION ALL is that it does not perform the SELECT DISTINCT step, which saves SQL Server a lot of unnecessary work.

*****

Sometimes you might want to merge two or more sets of data resulting from two or more queries using UNION. For example:

SELECT column_name1, column_name2

FROM table_name1

WHERE column_name1 = some_value

UNION

SELECT column_name1, column_name2

FROM table_name1

WHERE column_name2 = some_value

This same query can be rewritten, like the following example, and when doing so, performance will be boosted:

SELECT DISTINCT column_name1, column_name2

FROM table_name1

WHERE column_name1 = some_value OR column_name2 = some_value

If you can assume that neither criterion will return duplicate rows, you can boost the performance of this query even further by removing the DISTINCT clause.

*****

 

  • Do not use optimizer hints in your queries.

    Because the SQL Server query optimizer is very clever, it is very unlikely that you can improve your query by using optimizer hints; more often, a hint will hurt performance.

This is because it is generally very hard to outguess the Query Optimizer. Optimizer hints are special keywords that you include with your query to force how the Query Optimizer runs. Including a hint makes the plan static, preventing the Query Optimizer from dynamically adapting to the current environment for the given query. More often than not, this hurts, not helps, performance.

If you think that a hint might be necessary to optimize your query, be sure you first do all of the following:

·         Update the statistics on the relevant tables.

·         If the problem query is inside a stored procedure, recompile it.

·         Review the search arguments to see if they are sargable, and if not, try to rewrite them so that they are sargable.

·         Review the current indexes, and make changes if necessary.

If you have done all of the above, and the query is not running as you expect, then you may want to consider using an appropriate optimizer hint. 

If you haven't heeded my advice and have decided to use some hints, keep in mind that as your data changes, and as the Query Optimizer changes (through service packs and new releases of SQL Server), your hard-coded hints may no longer offer the benefits they once did. So if you use hints, you need to periodically review them to see if they are still performing as expected.

To help identify long-running queries, use the SQL Server Profiler Create Trace Wizard to run the "TSQL By Duration" trace. You can specify the length of the long-running queries you are trying to identify (such as over 1000 milliseconds), and then have these recorded in a log for you to investigate later.

*****

Carefully evaluate whether your SELECT query needs the DISTINCT clause or not. Some developers automatically add this clause to every one of their SELECT statements, even when it is not necessary; this is a bad habit worth breaking.

The DISTINCT clause should only be used in SELECT statements if you know that duplicate returned rows are a possibility, and that having duplicate rows in the result set would cause problems with your application.

The DISTINCT clause creates a lot of extra work for SQL Server, and reduces the physical resources that other SQL statements have at their disposal. Because of this, only use the DISTINCT clause if it is necessary.

*****

 

If your application allows users to run queries, but you cannot easily prevent them from returning hundreds or even thousands of unnecessary rows of data, consider using the TOP operator within the SELECT statement. This way, you can limit how many rows are returned, even if the user doesn't enter any criteria to help reduce the number of rows sent to the client. For example, the statement:

SELECT TOP 100 fname, lname FROM customers

WHERE state = 'mo'

limits the results to the first 100 rows returned, even if 10,000 rows actually meet the criteria of the WHERE clause. When the specified number of rows is reached, all processing on the query stops, potentially saving SQL Server overhead, and boosting performance.

The TOP operator works by allowing you to specify a specific number of rows to be returned, like the example above, or by specifying a percentage value, like this:

SELECT TOP 10 PERCENT fname, lname FROM customers

WHERE state = 'mo'

In the above example, only 10 percent of the available rows would be returned.

In SQL Server 2005, a new argument has been added for the TOP statement. Books Online specifies:



[ TOP (expression) [ PERCENT ] [ WITH TIES ] ]



Example:



USE AdventureWorks

GO

SELECT TOP(10) PERCENT WITH TIES

EmployeeID, Title, DepartmentID, Gender, BaseRate

FROM HumanResources.Employee

ORDER BY BaseRate DESC




What the WITH TIES option does is allow more than the specified number or percent of rows to be returned if the values of the last group of rows are identical. Without this option, any tied rows are arbitrarily dropped so that exactly the number of rows specified by the TOP statement is returned.

In addition to the above new feature, SQL Server 2005 allows the TOP statement to be used with DML statements, such as DELETE, INSERT, and UPDATE. However, the TOP statement cannot be used in conjunction with UPDATE and DELETE statements on partitioned views.



No changes were made to the SET ROWCOUNT statement in SQL Server 2005; usually the SET ROWCOUNT value overrides the SELECT statement's TOP keyword when ROWCOUNT is the smaller value.

Keep in mind that using this option may prevent users from getting the data they need. For example, the row they are looking for may be record 101, but they only get to see the first 100 records. Because of this, use this option with discretion.

*****

You may have heard of a SET command called SET ROWCOUNT. Like the TOP operator, it is designed to limit how many rows are returned from a SELECT statement; in effect, SET ROWCOUNT and the TOP operator perform the same function.

While in most cases either option works equally efficiently, there are some instances (such as rows returned from an unsorted heap) where the TOP operator is more efficient than SET ROWCOUNT. Because of this, prefer the TOP operator for limiting the number of rows returned by a query.

*****

In a WHERE clause, the various operators used directly affect how fast a query is run. This is because some operators lend themselves to speed over other operators. Of course, you may not have any choice of which operator you use in your WHERE clauses, but sometimes you do.

Here are the key operators used in the WHERE clause, ordered by their performance. Those operators at the top will produce results faster than those listed at the bottom.

·         =

·         >, >=, <, <=

·         LIKE

·         <> 

The lesson here is to use = as much as possible and <> as little as possible.

*****

In a WHERE clause, the various operands used directly affect how fast a query is run. This is because some operands lend themselves to speed over other operands. Of course, you may not have any choice of which operand you use in your WHERE clauses, but sometimes you do.

Here are the key operands used in the WHERE clause, ordered by their performance. Those operands at the top will produce results faster than those listed at the bottom.

·         A single literal used by itself on one side of an operator

·         A single column name used by itself on one side of an operator, a single parameter used by itself on one side of an operator

·         A multi-operand expression on one side of an operator

·         A single exact number on one side of an operator

·         Other numeric number (other than exact), date and time

·         Character data, NULLs

The simpler the operand, and the more exact the numbers used, the better the overall performance.

*****

If a WHERE clause includes multiple expressions, there is generally no performance benefit in ordering them in any particular way, because the SQL Server Query Optimizer reorders them for you. There are a few exceptions to this, which are discussed on the source web site.

*****

Don't include code that doesn't do anything. This may sound obvious, but I have seen this in some off-the-shelf SQL Server-based applications. For example, you may see code like this:

SELECT column_name FROM table_name

WHERE 1 = 0

When this query is run, no rows are returned. Obviously, this is a simple example (most of the cases where I have seen this done involved very long queries), but a query, or part of a larger query, like this doesn't accomplish anything useful and shouldn't be run; it just wastes SQL Server resources. In addition, I have seen more than one case where such dead code actually caused SQL Server to throw errors, preventing the code from running at all.

*****

By default, some developers, especially those who have not worked with SQL Server before, routinely include code similar to this in their WHERE clauses when they make string comparisons:

SELECT column_name FROM table_name

WHERE LOWER(column_name) = 'name'

In other words, these developers are making the assumption that the data in SQL Server is case-sensitive, which it generally is not. If your SQL Server database is not configured to be case-sensitive, you don't need to use LOWER or UPPER to force the case of the text for a comparison to be performed. Just leave these functions out of your code. This will speed up the performance of your query, as any use of text functions in a WHERE clause hurts performance.

But what if your database has been configured to be case-sensitive? Should you then use the LOWER and UPPER functions to ensure that comparisons are performed correctly? No; the above example is still poor coding. If you have to ensure consistent case for proper comparisons, use the technique described below, along with appropriate indexes on the column in question:

SELECT column_name FROM table_name

WHERE column_name = 'NAME' or column_name = 'name'

This code will run much faster than the first example.

*****

Try to avoid WHERE clauses that are non-sargable. The term "sargable" (which is in effect a made-up word) comes from the pseudo-acronym "SARG", which stands for "Search ARGument," which refers to a WHERE clause that compares a column to a constant value. If a WHERE clause is sargable, this means that it can take advantage of an index (assuming one is available) to speed completion of the query. If a WHERE clause is non-sargable, this means that the WHERE clause (or at least part of it) cannot take advantage of an index, instead performing a table/index scan, which may cause the query's performance to suffer.

Non-sargable search arguments in the WHERE clause, such as "IS NULL", "<>", "!=", "!>", "!<", "NOT", "NOT EXISTS", "NOT IN", "NOT LIKE", and "LIKE '%500'", generally (but not always) prevent the query optimizer from using an index to perform a search. In addition, expressions that include a function on a column, expressions that have the same column on both sides of the operator, or comparisons against a column (not a constant) are not sargable.

But not every WHERE clause that has a non-sargable expression in it is doomed to a table/index scan. If the WHERE clause includes both sargable and non-sargable clauses, then at least the sargable clauses can use an index (if one exists) to help access the data quickly.

In many cases, if there is a covering index on the table, which includes all of the columns in the SELECT, JOIN, and WHERE clauses in a query, then the covering index can be used instead of a table/index scan to return a query's data, even if it has a non-sargable WHERE clause. But keep in mind that covering indexes have their own drawbacks, such as producing very wide indexes that increase disk I/O when they are read.

In some cases, it may be possible to rewrite a non-sargable WHERE clause into one that is sargable. For example, the clause:

WHERE SUBSTRING(firstname,1,1) = 'm'

can be rewritten like this:

WHERE firstname like 'm%'

Both of these WHERE clauses produce the same result, but the first one is non-sargable (it uses a function) and will run slowly, while the second one is sargable and will run much faster.

WHERE clauses that perform a function on a column are non-sargable. On the other hand, if you can rewrite the WHERE clause so that the column and function are separate, the query can use an available index, greatly boosting performance. For example:

Function Acts Directly on Column, and Index Cannot Be Used:

SELECT member_number, first_name, last_name

FROM members

WHERE DATEDIFF(yy,dateofbirth,GETDATE()) > 21

Function Has Been Separated From Column, and an Index Can Be Used:

SELECT member_number, first_name, last_name

FROM members

WHERE dateofbirth < DATEADD(yy,-21,GETDATE())

Each of the above queries produces the same results, but the second query will use an index because the function is not performed directly on the column, as it is in the first example. The moral of this story is to try to rewrite WHERE clauses that have functions so that the function does not act directly on the column.

WHERE clauses that use NOT are not sargable, but can often be rewritten to remove the NOT from the WHERE clause, for example:

WHERE NOT column_name > 5

to

WHERE column_name <= 5

Each of the above clauses produces the same results, but the second one is sargable.

If you don't know if a particular WHERE clause is sargable or non-sargable, check out the query's execution plan in Query Analyzer. Doing this, you can very quickly see if the query will be using index lookups or table/index scans to return your results.

With some careful analysis, and some clever thought, many non-sargable queries can be written so that they are sargable. Your goal for best performance (assuming it is possible) is to get the left side of a search condition to be a single column name, and the right side an easy to look up value.

*****

If you run into a situation where a WHERE clause is not sargable because of the use of a function on the right side of an equality sign (and there is no other way to rewrite the WHERE clause), consider creating an index on a computed column instead. This way, you avoid the non-sargable WHERE clause altogether, using the results of the function in your WHERE clause instead.

Because of the additional overhead required for indexes on computed columns, you will only want to do this if you need to run this same query over and over in your application, thereby justifying the overhead of the indexed computed column.

*****

If you currently have a query that uses NOT IN, which offers poor performance because the SQL Server optimizer has to use a nested table scan to perform this activity, try one of the following options instead, all of which offer better performance (see the sketch after this list):

·         Use EXISTS or NOT EXISTS

·         Use IN

·         Perform a LEFT OUTER JOIN and check for a NULL condition
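
A sketch of two of the suggested rewrites for a query that finds customers with no orders (table and column names are illustrative):

-- Original, with NOT IN (nested table scan):
SELECT customer_name FROM customer
WHERE customer_id NOT IN (SELECT customer_id FROM orders)

-- Rewritten with NOT EXISTS:
SELECT c.customer_name FROM customer c
WHERE NOT EXISTS (SELECT * FROM orders o WHERE o.customer_id = c.customer_id)

-- Rewritten with a LEFT OUTER JOIN and a NULL check:
SELECT c.customer_name
FROM customer c
LEFT OUTER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.customer_id IS NULL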

*****

When you have a choice between the IN and EXISTS clauses in your Transact-SQL, you will generally want to use EXISTS, as it is usually more efficient and performs faster.

*****

If you find that SQL Server uses a TABLE SCAN instead of an INDEX SEEK when you use an IN or OR clause as part of your WHERE clause, even when those columns are covered by an index, consider using an index hint to force the Query Optimizer to use the index.

For example: 

SELECT * FROM tblTaskProcesses WHERE nextprocess = 1 AND processid IN (8,32,45)

takes about 3 seconds, while:



SELECT * FROM tblTaskProcesses (INDEX = IX_ProcessID) WHERE nextprocess = 1 AND processid IN (8,32,45)

returns in under a second.

*****

If you use LIKE in your WHERE clause, try to use one or more leading characters in the clause, if at all possible. For example, use:

LIKE 'm%'

not:

LIKE '%m'

If you use a leading character in your LIKE clause, then the Query Optimizer has the ability to potentially use an index to perform the query, speeding performance and reducing the load on SQL Server.

But if the leading character in a LIKE clause is a wildcard, the Query Optimizer will not be able to use an index, and a table scan must be run, reducing performance and taking more time.

The more leading characters you can use in the LIKE clause, the more likely the Query Optimizer will find and use a suitable index

*****

If your application needs to retrieve summary data often, but you don't want to have the overhead of calculating it on the fly every time it is needed, consider using a trigger that updates summary values after each transaction into a summary table.

While the trigger has some overhead, overall it may be less than having to calculate the data every time the summary is needed. You may have to experiment to see which method is fastest in your environment.

*****

If your application needs to insert a large binary value into an image data column, perform this task using a stored procedure, not using an INSERT statement embedded in your application.

The reason is that the application must first convert the binary value into a character string (which doubles its size, thus increasing network traffic and taking more time) before it can be sent to the server. And when the server receives the character string, it then has to convert it back to the binary format (taking even more time).

Using a stored procedure avoids all this because all the activity occurs on the SQL Server, and little data is transmitted over the network.

*****

When you have a choice of using the IN or the BETWEEN clauses in your Transact-SQL, you will generally want to use the BETWEEN clause, as it is much more efficient. For example:

SELECT customer_number, customer_name

FROM customer

WHERE customer_number in (1000, 1001, 1002, 1003, 1004)

is much less efficient than this:

SELECT customer_number, customer_name

FROM customer

WHERE customer_number BETWEEN 1000 and 1004

Assuming there is a useful index on customer_number, the Query Optimizer can locate a range of numbers much faster (using BETWEEN) than it can find a series of numbers using the IN clause (which is really just another form of the OR clause).

*****

If possible, try to avoid using the SUBSTRING function in your WHERE clauses. Depending on how it is constructed, using the SUBSTRING function can force a table scan instead of allowing the optimizer to use an index (assuming there is one). If the substring you are searching for does not include the first character of the column you are searching for, then a table scan is performed.

If possible, you should avoid using the SUBSTRING function and use the LIKE condition instead, for better performance.

Instead of doing this:

WHERE SUBSTRING(column_name,1,1) = 'b'

Try using this instead:

WHERE column_name LIKE 'b%'

If you decide to make this choice, keep in mind that you will want your LIKE condition to be sargable, which means that you cannot place a wildcard in the first position.

*****

Where possible, avoid string concatenation in your Transact-SQL code, as it is not a fast process, contributing to overall slower performance of your application.

*****

 

If you have a WHERE clause that includes expressions connected by two or more AND operators, SQL Server will evaluate them from left to right in the order they are written. This assumes that no parentheses have been used to change the order of execution. Because of this, you may want to consider one of the following when using AND:

·         Locate the least likely true AND expression first. This way, if the AND expression is false, the clause will end immediately, saving time.

·         If both parts of an AND expression are equally likely to be false, put the least complex AND expression first. This way, if it is false, less work will have to be done to evaluate the expression.

You may want to consider using Query Analyzer to look at the execution plans of your queries to see which is best for your situation.
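For example, here is a hypothetical query arranged according to these suggestions (the table and column names are invented):

SELECT orderid
FROM orders
-- country = 'Iceland' is rarely true, so listing it first lets the clause fail fast
WHERE country = 'Iceland'
-- the more complex expression comes second, so it is evaluated less often
AND DATENAME(dw, orderdate) = 'Monday'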

*****

If you want to boost the performance of a query that includes an AND operator in the WHERE clause, consider the following:

·         Of the search criteria in the WHERE clause, at least one of them should be based on a highly selective column that has an index.

·         If at least one of the search criteria in the WHERE clause is not highly selective, consider adding indexes to all of the columns referenced in the WHERE clause.

·         If none of the columns in the WHERE clause are selective enough to use an index on their own, consider creating a covering index for this query.

*****

The Query Optimizer will perform a table scan or a clustered index scan on a table if the WHERE clause in the query contains an OR operator and if any of the referenced columns in the OR clause are not indexed (or do not have a useful index). Because of this, if you use many queries with OR clauses, you will want to ensure that each referenced column in the WHERE clause has a useful index.

*****

A query with one or more OR clauses can sometimes be rewritten as a series of queries that are combined with a UNION ALL statement, in order to boost the performance of the query. For example, let's take a look at the following query:



SELECT employeeID, firstname, lastname

FROM names

WHERE dept = 'prod' or city = 'Orlando' or division = 'food'




This query has three separate conditions in the WHERE clause. For this query to use an index, there must be an index on all three columns found in the WHERE clause.



This same query can be written using UNION ALL instead of OR, like this example:



SELECT employeeID, firstname, lastname FROM names WHERE dept = 'prod'

UNION ALL

SELECT employeeID, firstname, lastname FROM names WHERE city = 'Orlando'

UNION ALL

SELECT employeeID, firstname, lastname FROM names WHERE division = 'food'




Each of these queries will produce the same results. If there is an index only on dept, but not on the other columns in the WHERE clause, then the first version will not use any index and a table scan must be performed. The second version of the query, however, will use the index for part of the query, though not for all of it.



Admittedly, this is a very simple example, but even so, it does demonstrate how rewriting a query can affect whether or not an index is used. If this query were much more complex, the approach of using UNION ALL might be much more efficient, as it allows you to tune each part of the query individually, something that cannot be done if you use only ORs in your query.

Note that I am using UNION ALL instead of UNION. The reason for this is to prevent the UNION statement from trying to sort the data and remove duplicates, which hurts performance. If there is a possibility of duplicates, and you want to remove them, then of course you can use plain UNION.



If you have a query that uses ORs and is not making the best use of indexes, consider rewriting it as a UNION ALL, and then testing performance. Only through testing can you be sure that one version of your query will be faster than another.

*****

Don't use ORDER BY in your SELECT statements unless you really need to, as it adds a lot of extra overhead. For example, it may be more efficient to sort the data at the client than at the server. In other cases, the client may not even need sorted data to achieve its goal. The key here is to remember that you shouldn't automatically sort data unless you know it is necessary.

*****

Whenever SQL Server has to perform a sorting operation, additional resources have to be used to perform this task. Sorting often occurs when any of the following Transact-SQL statements are executed:

·         ORDER BY

·         GROUP BY

·         SELECT DISTINCT

·         UNION

·         CREATE INDEX (generally not as critical, as it happens much less often)

In many cases, these commands cannot be avoided. On the other hand, there are a few ways that sorting overhead can be reduced. These include:

·         Keep the number of rows to be sorted to a minimum. Do this by only returning those rows that absolutely need to be sorted.

·         Keep the number of columns to be sorted to a minimum. In other words, don't sort more columns than required.

·         Keep the width (physical size) of the columns to be sorted to a minimum.

·         Sort columns with numeric datatypes instead of character datatypes.

When using any of the above Transact-SQL commands, try to keep the above performance-boosting suggestions in mind.

*****

If you have to sort by a particular column often, consider making that column a clustered index. Because the data is then physically presorted, SQL Server is smart enough not to re-sort it.
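For example, if the Orders table is most often sorted by OrderDate, the clustered index might be created like this (a sketch; the table and index names are invented):

CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate)

Keep in mind that a table can have only one clustered index, so choose the sort column carefully.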

*****

If your WHERE clause includes an IN operator along with a list of values to be tested in the query, order the list of values so that the most frequently found values are placed at the front of the list and the less frequently found values at the end. This can speed performance because the IN option returns true as soon as any of the values in the list produces a match. The sooner the match is made, the faster the query completes.

*****

If you need to use the SELECT INTO option, keep in mind that it can lock system tables, preventing other users from accessing the data they need. If you do need to use SELECT INTO, try to schedule it when your SQL Server is less busy, and try to keep the amount of data inserted to a minimum.
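For example, rather than copying an entire table, restrict the SELECT INTO to just the columns and rows you actually need (a sketch; the names are invented):

-- Copy only two columns and a narrow date range into a temporary table
SELECT OrderID, OrderDate
INTO #RecentOrders
FROM Orders
WHERE OrderDate >= '20001201'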

*****

If your SELECT statement contains a HAVING clause, write your query so that the WHERE clause does most of the work (removing undesired rows) rather than leaving the HAVING clause to remove them. Using the WHERE clause appropriately can eliminate unnecessary rows before they get to the GROUP BY and HAVING clauses, saving unnecessary work and boosting performance.

For example, in a SELECT statement with WHERE, GROUP BY, and HAVING clauses, here's what happens. First, the WHERE clause is used to select the appropriate rows that need to be grouped. Next, the GROUP BY clause divides the rows into sets of grouped rows, and then aggregates their values. And last, the HAVING clause eliminates undesired aggregated groups. If the WHERE clause is used to eliminate as many of the undesired rows as possible, the GROUP BY and HAVING clauses will have less work to do, boosting the overall performance of the query.
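For example, using the Northwind database, both of the following queries return the same aggregated rows, but the second filters in the WHERE clause, before grouping:

-- The HAVING clause discards groups only after every row has been grouped:
SELECT OrderID, SUM(Quantity) AS TotalQuantity
FROM [Order Details]
GROUP BY OrderID
HAVING OrderID > 11000

-- Better: unwanted rows never reach the GROUP BY at all:
SELECT OrderID, SUM(Quantity) AS TotalQuantity
FROM [Order Details]
WHERE OrderID > 11000
GROUP BY OrderID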

*****

If your application performs many wildcard (LIKE %) text searches on CHAR or VARCHAR columns, consider using SQL Server's full-text search option. The Search Service can significantly speed up wildcard searches of text stored in a database.
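For example, assuming a hypothetical Products table whose Description column has been added to a full-text catalog, both of the following queries look for the same word, but only the second can use the full-text index:

-- Leading wildcard: the optimizer cannot use a normal index
SELECT ProductName FROM Products WHERE Description LIKE '%chocolate%'

-- Full-text search: uses the full-text catalog instead of scanning the table
SELECT ProductName FROM Products WHERE CONTAINS(Description, 'chocolate')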

*****

The GROUP BY clause can be used with or without an aggregate function. But if you want optimum performance, don't use the GROUP BY clause without an aggregate function. This is because you can accomplish the same end result by using the DISTINCT option instead, and it is faster.

For example, you could write your query two different ways:

USE Northwind

SELECT OrderID

FROM [Order Details]

WHERE UnitPrice > 10

GROUP BY OrderID

or

USE Northwind

SELECT DISTINCT OrderID

FROM [Order Details]

WHERE UnitPrice > 10

Both of the above queries produce the same results, but the second one will use less resources and perform faster.

*****

The GROUP BY clause can be sped up if you follow these suggestions:

·         Keep the number of rows returned by the query as small as possible.

·         Keep the number of groupings as few as possible.

·         Don't group redundant columns.

·         If there is a JOIN in the same SELECT statement that has a GROUP BY, try to rewrite the query to use a subquery instead of using a JOIN. If this is possible, performance will be faster. If you have to use a JOIN, try to make the GROUP BY column from the same table as the column or columns on which the set function is used.

·         Consider adding an ORDER BY clause to the SELECT statement that orders by the same column as the GROUP BY. This may cause the GROUP BY to perform faster. Test this to see if it is true in your particular situation.

*****

Sometimes perception is more important than reality. For example, which of the following two queries is faster:

·         A query that takes 30 seconds to run, and then displays all of the required results.

·         A query that takes 60 seconds to run, but displays the first screen full of records in less than 1 second.

Most DBAs would choose the first option, as it uses fewer server resources and performs faster. But from many users' point of view, the second one may be more palatable. By getting immediate feedback, the user gets the impression that the application is fast, even though, in the background, it is not.

If you run into situations where perception is more important than raw performance, consider using the FAST query hint. The FAST query hint is used with the SELECT statement using this form:

OPTION(FAST number_of_rows)

where number_of_rows is the number of rows that are to be displayed as fast as possible.

When this hint is added to a SELECT statement, it tells the Query Optimizer to return the specified number of rows as fast as possible, without regard to how long it will take to perform the overall query. Before rolling out an application using this hint, I would suggest you test it thoroughly to see that it performs as you expect. You may find that the query takes about the same amount of time whether the hint is used or not. If this is the case, then don't use the hint.
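For example, reusing the customer table from the earlier tip, the following query asks the Query Optimizer to return the first 25 rows as quickly as possible, then finish the rest of the result set:

SELECT customer_number, customer_name
FROM customer
ORDER BY customer_name
OPTION (FAST 25)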

*****

Instead of using temporary tables, consider using a derived table instead. A derived table is the result of using a SELECT statement in the FROM clause of an existing SELECT statement. By using derived tables instead of temporary tables, you can reduce I/O and boost your application's performance.
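For example (a sketch using the Northwind [Order Details] table), instead of materializing a temporary table:

SELECT OrderID, SUM(Quantity) AS TotalQuantity
INTO #OrderTotals
FROM [Order Details]
GROUP BY OrderID

SELECT AVG(TotalQuantity) FROM #OrderTotals

the same result can come from a derived table, avoiding the temporary-table I/O:

SELECT AVG(TotalQuantity)
FROM (SELECT OrderID, SUM(Quantity) AS TotalQuantity
      FROM [Order Details]
      GROUP BY OrderID) AS OrderTotals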


*****

It is a fairly common request to write a Transact-SQL query to compare a parent table and a child table and find out if there are any parent records that don't have a match in the child table. Generally, there are three ways this can be done:

Using a NOT EXISTS



SELECT a.hdr_key

FROM hdr_tbl a

WHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key)

Using a LEFT JOIN

SELECT a.hdr_key

FROM hdr_tbl a

LEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_key

WHERE b.hdr_key IS NULL

Using a NOT IN



SELECT hdr_key

FROM hdr_tbl

WHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)

In each case, the above queries will return identical results. But which of these three variations of the same query produces the best performance? Assuming everything else is equal, they are listed above from best performing to worst performing. In other words, the NOT EXISTS variation of this query is generally the most efficient.

I say generally, because the indexes found on the tables, along with the number of rows in each table, can influence the results. If you are not sure which variation to use, try them all and see which produces the best results in your particular circumstances.

*****

Be careful when using OR in your WHERE clause; it is fairly easy to accidentally retrieve much more data than you need, which hurts performance. For example, take a look at the query below:

SELECT companyid, plantid, formulaid

FROM batchrecords

WHERE companyid = '0001' and plantid = '0202' and formulaid = '39988773'

OR

companyid = '0001' and plantid = '0202'

As you can see from this query, the WHERE clause is redundant, as:

companyid = '0001' and plantid = '0202' and formulaid = '39988773'

is a subset of:

companyid = '0001' and plantid = '0202'

In other words, the second condition adds nothing. Unfortunately, the SQL Server Query Optimizer isn't smart enough to know this and will do exactly what you tell it to. SQL Server will have to retrieve all the data you have requested and then, in effect, do a SELECT DISTINCT to remove the redundant rows it unnecessarily found.

In this case, if you drop this code from the query:

OR

companyid = '0001' and plantid = '0202'

then run the query, you will receive the same results, but with much faster performance.

*****

If you need to verify the existence of a record in a table, don't use SELECT COUNT(*) in your Transact-SQL code, as it is very inefficient and wastes server resources. Instead, use the Transact-SQL IF EXISTS to determine whether the record in question exists, which is much more efficient. For example:

Here's how you might use COUNT(*):

IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx') > 0

Here's a faster way, using IF EXISTS:

IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')

The reason IF EXISTS is faster than COUNT(*) is that the query can end immediately once a matching record is found, while COUNT(*) must go through every record, whether there is only one or thousands, before the condition can be evaluated.

*****

Let's say that you often need to INSERT the same value into a column. For example, perhaps you have to perform 100,000 INSERTs a day into a particular table, and that 90% of the time the data INSERTed into one of the columns of the table is the same value.

If this is the case, you can reduce network traffic (along with some SQL Server overhead) by creating this particular column with a default value of the most common value. This way, when you INSERT your data and the data is the default value, you don't INSERT any data into this column, instead allowing the default value to be filled in for you automatically. When the value needs to be different, you will of course INSERT that value into the column.
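For example, if 90% of shipments originate from the same warehouse, the column default can carry that value (a sketch; the table and column names are invented):

CREATE TABLE tblShipments
(
ShipmentID INT PRIMARY KEY,
Warehouse CHAR(3) DEFAULT 'W01'  -- the most common value
)

-- Common case: omit the column and let the default fill it in
INSERT INTO tblShipments (ShipmentID) VALUES (1)

-- Exception: supply the value explicitly
INSERT INTO tblShipments (ShipmentID, Warehouse) VALUES (2, 'W02')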

*****

UPDATEs take extra resources for SQL Server to perform. When performing an UPDATE, try to follow as many of the following recommendations as you can in order to reduce the amount of resources required. The more of these suggestions you can follow, the faster the UPDATE will perform.

·         If you are UPDATing a column of a row that has a unique index, try to only update one row at a time.

·         Try not to change the value of a column that is also the primary key.

·         When updating VARCHAR columns, try to replace the contents with contents of the same length.

·         Try to minimize the UPDATing of tables that have UPDATE triggers.

·         Try to avoid UPDATing columns that will be replicated to other databases.

·         Try to avoid UPDATing heavily indexed columns.

·         Try to avoid UPDATing a column that has a reference in the WHERE clause to the column being updated.

 

  • Make sure you normalize your data at least to the 3rd normal form. At the same time, do not compromise on query performance. A little bit of denormalization helps queries perform faster.
  • Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious. This helps other programmers understand your code clearly. Don't worry about the length of the comments, as it won't impact the performance, unlike interpreted languages like ASP 2.0.
  • Try to avoid server side cursors as much as possible. Always stick to a 'set-based approach' instead of a 'procedural approach' for accessing and manipulating data. Cursors can often be avoided by using SELECT statements instead.

If a cursor is unavoidable, use a WHILE loop instead. I have personally tested and concluded that a WHILE loop is always faster than a cursor. But for a WHILE loop to replace a cursor, you need a column (a primary key or unique key) to identify each row uniquely. I personally believe every table must have a primary or unique key.
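Here is a minimal sketch of the pattern, assuming a hypothetical Employees table whose primary key is an integer EmpID column (adjust the names to your schema):

    DECLARE @EmpID INT

    -- Start with the lowest key value in the table
    SELECT @EmpID = MIN(EmpID) FROM Employees

    WHILE @EmpID IS NOT NULL
    BEGIN
        -- Process the current row here, for example:
        -- EXEC usp_ProcessEmployee @EmpID

        -- Move on to the next key value
        SELECT @EmpID = MIN(EmpID) FROM Employees WHERE EmpID > @EmpID
    END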

  • Use 'Derived tables' wherever possible, as they perform better. Consider the following query to find the second highest salary from the Employees table:



    SELECT MIN(Salary) 

    FROM Employees 

    WHERE EmpID IN

    (

    SELECT TOP 2 EmpID 

    FROM Employees 

    ORDER BY Salary Desc

    )



    The same query can be re-written using a derived table, as shown below, and it performs twice as fast as the above query:



    SELECT MIN(Salary) 

    FROM 

    (

    SELECT TOP 2 Salary 

    FROM Employees 

    ORDER BY Salary DESC

    ) AS A



    This is just an example, and your results might differ in different scenarios depending on the database design, indexes, volume of data, etc. So, test all the possible ways a query could be written and go with the most efficient one.
  • While designing your database, design it keeping "performance" in mind. You can't really tune performance later, when your database is in production, as it involves rebuilding tables and indexes, re-writing queries, etc. Use the graphical execution plan in Query Analyzer, or the SHOWPLAN_TEXT or SHOWPLAN_ALL commands, to analyze your queries. Make sure your queries do an "Index seek" instead of an "Index scan" or a "Table scan." A table scan or an index scan is a very bad thing and should be avoided where possible. Choose the right indexes on the right columns.
  • Prefix the table names with the owner's name, as this improves readability and avoids any unnecessary confusion. Microsoft SQL Server Books Online even states that qualifying table names with owner names helps in execution plan reuse, further boosting performance.
  • Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like '(1 row(s) affected)' after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.
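    For example, a minimal sketch using the pubs database:

    CREATE PROC usp_RaiseBusinessPrices
    AS
    SET NOCOUNT ON  -- suppress the '(n row(s) affected)' message
    UPDATE titles
    SET price = price * 1.1
    WHERE type = 'business'
    GO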
  • Use the more readable ANSI-standard join clauses instead of the old-style joins. With ANSI joins, the WHERE clause is used only for filtering data. Whereas with old-style joins, the WHERE clause handles both the join condition and filtering data. The first of the following two queries shows the old-style join, while the second one shows the new ANSI join syntax:



    SELECT a.au_id, t.title 

    FROM titles t, authors a, titleauthor ta

    WHERE 

    a.au_id = ta.au_id AND

    ta.title_id = t.title_id AND 

    t.title LIKE '%Computer%'



    SELECT a.au_id, t.title

    FROM authors a 

    INNER JOIN

    titleauthor ta 

    ON 

    a.au_id = ta.au_id

    INNER JOIN

    titles t

    ON

    ta.title_id = t.title_id

    WHERE t.title LIKE '%Computer%'
  • Do not prefix your stored procedure names with "sp_". The prefix sp_ is reserved for system stored procedures that ship with SQL Server. Whenever SQL Server encounters a procedure name starting with sp_, it first tries to locate the procedure in the master database, then it looks for any qualifiers (database, owner) provided, then it tries dbo as the owner. So you can really save time in locating the stored procedure by avoiding the "sp_" prefix.
  • Views are generally used to show specific data to specific users based on their interest. Views are also used to restrict access to the base tables by granting permission only on views. Yet another significant use of views is that they simplify your queries. Incorporate your frequently required, complicated joins and calculations into a view so that you don't have to repeat those joins/calculations in all your queries. Instead, just select from the view.
  • Use User Defined Datatypes if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
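    For example, a phone-number type could be defined once and reused in every table that stores phone numbers (a sketch; the type and table names are invented):

    EXEC sp_addtype 'PhoneNumber', 'VARCHAR(20)', 'NULL'
    GO

    CREATE TABLE Suppliers
    (
    SupplierID INT PRIMARY KEY,
    Phone PhoneNumber
    )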
  • Try not to use TEXT or NTEXT datatypes for storing large textual data. The TEXT datatype has some inherent problems associated with it. For example, you cannot directly write or update text data using the INSERT or UPDATE statements. Instead, you have to use special statements like READTEXT, WRITETEXT and UPDATETEXT. There are also a lot of bugs associated with replicating tables containing text columns. So, if you don't have to store more than 8KB of text, use CHAR(8000) or VARCHAR(8000) datatypes instead.
  • If you have a choice, do not store binary or image files (Binary Large Objects or BLOBs) inside the database. Instead, store the path to the binary or image file in the database and use that as a pointer to the actual binary file stored elsewhere on a server. Retrieving and manipulating these large binary files is better performed outside the database, and after all, a database is not meant for storing files.
  • Use the CHAR data type for a column only when the column is non-nullable. If a CHAR column is nullable, it is treated as a fixed-length column in SQL Server 7.0+. So a CHAR(100), when NULL, will still eat up 100 bytes, resulting in space wastage; use VARCHAR(100) in this situation. Of course, variable-length columns do have a little processing overhead over fixed-length columns. Carefully choose between CHAR and VARCHAR depending upon the length of the data you are going to store.
  • Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan at runtime every time. IF and CASE statements come in handy to avoid dynamic SQL. Another major disadvantage of using dynamic SQL is that it requires users to have direct access permissions on all accessed objects, like tables and views. Generally, users are given access to the stored procedures which reference the tables, but not direct access to the tables themselves. In this case, dynamic SQL will not work. Consider the following scenario, where a user named 'dSQLuser' is added to the pubs database and is granted access to a procedure named 'dSQLproc', but not to any tables in the pubs database. The procedure dSQLproc executes a direct SELECT on the titles table, and that works. The second statement runs the same SELECT on the titles table using dynamic SQL, and it fails with the following error:



    Server: Msg 229, Level 14, State 5, Line 1

    SELECT permission denied on object 'titles', database 'pubs', owner 'dbo'.



    To reproduce the above problem, use the following commands:



    sp_addlogin 'dSQLuser'

    GO

    sp_defaultdb 'dSQLuser', 'pubs'

    USE pubs

    GO

    sp_adduser 'dSQLUser', 'dSQLUser'

    GO

    CREATE PROC dSQLProc

    AS

    BEGIN

    SELECT * FROM titles WHERE title_id = 'BU1032' --This works

    DECLARE @str CHAR(100)

    SET @str = 'SELECT * FROM titles WHERE title_id = ''BU1032'''

    EXEC (@str) --This fails

    END

    GO

    GRANT EXEC ON dSQLProc TO dSQLuser

    GO



    Now login to the pubs database using the login dSQLuser and execute the procedure dSQLproc to see the problem.
  • Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems porting your database application to some other RDBMS. IDENTITY columns have other inherent problems. For example, IDENTITY columns can run out of numbers at some point, depending on the data type selected; numbers can't be reused automatically, after deleting rows; and replication and IDENTITY columns don't always get along well.

So, come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There still could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you best.

  • Minimize the use of NULLs, as they often confuse the front-end applications, unless the applications are coded intelligently to eliminate NULLs or convert the NULLs into some other form. Any expression that deals with NULL results in a NULL output. ISNULL and COALESCE functions are helpful in dealing with NULL values. Here's an example that explains the problem:



    Consider the following Customers table, which stores the names of customers; the middle name can be NULL.



    CREATE TABLE Customers

    (

    FirstName varchar(20),

    MiddleName varchar(20),

    LastName varchar(20)

    )



    Now insert a customer into the table whose name is Tony Blair, without a middle name:



    INSERT INTO Customers 

    (FirstName, MiddleName, LastName) 

    VALUES ('Tony',NULL,'Blair')



    The following SELECT statement returns NULL, instead of the customer name:



    SELECT FirstName + ' ' + MiddleName + ' ' + LastName FROM Customers



    To avoid this problem, use ISNULL as shown below:



    SELECT FirstName + ' ' + ISNULL(MiddleName + ' ','') + LastName FROM Customers
  • Use Unicode datatypes, like NCHAR, NVARCHAR, or NTEXT, if your database is going to store not just plain English characters, but a variety of characters used all over the world. Use these datatypes only when they are absolutely needed as they use twice as much space as non-Unicode datatypes.
  • Always use a column list in your INSERT statements. This helps in avoiding problems when the table structure changes (like adding or dropping a column). Here's an example which shows the problem.



    Consider the following table:



    CREATE TABLE EuropeanCountries

    (

    CountryID int PRIMARY KEY,

    CountryName varchar(25)

    )



    Here's an INSERT statement without a column list, which works perfectly:



    INSERT INTO EuropeanCountries

    VALUES (1, 'Ireland')



    Now, let's add a new column to this table:





    ALTER TABLE EuropeanCountries

    ADD EuroSupport bit



    Now run the above INSERT statement. You get the following error from SQL Server:



    Server: Msg 213, Level 16, State 4, Line 1

    Insert Error: Column name or number of supplied values does not match table definition.



    This problem can be avoided by writing an INSERT statement with a column list as shown below:



    INSERT INTO EuropeanCountries

    (CountryID, CountryName)

    VALUES (2, 'England')
  • Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as they are faster. Limit the use of triggers to auditing, custom tasks, and validations that cannot be performed using constraints. Constraints save you time as well, as you don't have to write code for these validations, allowing the RDBMS to do all the work for you.
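    For example, a foreign key and a check constraint can replace the equivalent validation triggers (a sketch; the table and constraint names are invented):

    ALTER TABLE Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)

    ALTER TABLE Orders
    ADD CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0)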
  • Always access tables in the same order in all your stored procedures and triggers. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are: Keep your transactions as short as possible. Touch as little data as possible during a transaction. Never, ever wait for user input in the middle of a transaction. Do not use higher-level locking hints or restrictive isolation levels unless they are absolutely needed. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction in case the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately, so that the locks on the processed rows are released and no blocking occurs.
  • Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications if these operations are going to consume more CPU cycles on the database server. Also try to do basic validations in the front-end itself during data entry. This saves unnecessary network roundtrips.
  • If back-end portability is your concern, stay away from bit manipulations with T-SQL, as this is very much RDBMS specific. Further, using bitmaps to represent different states of a particular entity conflicts with normalization rules.
  • Always add a @Debug parameter to your stored procedures. This can be of the BIT data type. When a 1 is passed for this parameter, print all the intermediate results and variable contents using SELECT or PRINT statements; when 0 is passed, do not print anything. This helps in quickly debugging stored procedures, as you don't have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
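    A minimal sketch of the pattern, using the Northwind [Order Details] table (the procedure name is invented):

    CREATE PROC usp_GetOrderTotal
    @OrderID INT,
    @Debug BIT = 0
    AS
    DECLARE @Total MONEY
    SELECT @Total = SUM(Quantity * UnitPrice)
    FROM [Order Details]
    WHERE OrderID = @OrderID

    -- Intermediate results are printed only when @Debug = 1
    IF @Debug = 1
    PRINT 'Computed total: ' + CONVERT(VARCHAR(20), @Total)

    SELECT @Total AS OrderTotal
    GO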
  • Do not call functions repeatedly within your stored procedures, triggers, functions and batches. For example, you might need the length of a string variable in many places of your procedure, but don't call the LEN function every time it's needed; instead, call the LEN function once, store the result in a variable, and use that.
  • Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters.
  • If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
  • Always check the global variable @@ERROR immediately after executing a data manipulation statement (like INSERT/UPDATE/DELETE), so that you can roll back the transaction in case of an error (@@ERROR will be greater than 0 in case of an error). This is important because, by default, SQL Server will not roll back all the previous changes within a transaction if a particular statement fails. This behavior can be changed by executing SET XACT_ABORT ON. The @@ROWCOUNT variable also plays an important role in determining how many rows were affected by a previous data manipulation (or retrieval) statement, and based on that you could choose to commit or roll back a particular transaction.
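    A minimal sketch of the pattern, which also returns a status value as suggested above (the table, column, and procedure names are invented):

    CREATE PROC usp_TransferFunds
    AS
    BEGIN TRANSACTION

    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION
        RETURN 1  -- failure
    END

    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION
        RETURN 1  -- failure
    END

    COMMIT TRANSACTION
    RETURN 0  -- success
    GO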
  • To make SQL Statements more readable, start each clause on a new line and indent when needed. Following is an example:



    SELECT title_id, title

    FROM titles

    WHERE title LIKE '%Computer%' AND

          title LIKE '%cook%'
  • Though we survived Y2K, always store 4-digit years in dates (especially when using CHAR or INT datatype columns), instead of 2-digit years, to avoid any confusion and problems. This is not a problem with DATETIME columns, as the century is stored even if you specify a 2-digit year. But it's always a good practice to specify 4-digit years even with DATETIME datatype columns.
  • As is true with any other programming language, do not use GOTO, or use it sparingly. Excessive usage of GOTO can lead to hard-to-read-and-understand code.
  • Do not forget to enforce unique constraints on your alternate keys.
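    For example (a sketch; the table, constraint, and column names are invented):

    ALTER TABLE Employees
    ADD CONSTRAINT UQ_Employees_SSN UNIQUE (SSN)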
  • Always be consistent with the usage of case in your code. On a case insensitive server, your code might work fine, but it will fail on a case sensitive SQL Server if your code is not consistent in case. For example, if you create a table in SQL Server or a database that has a case-sensitive or binary sort order, all references to the table must use the same case that was specified in the CREATE TABLE statement. If you name the table as 'MyTable' in the CREATE TABLE statement and use 'mytable' in the SELECT statement, you get an 'object not found' error.
  • Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code. Consider the following example: 



    SELECT OrderID, OrderDate

    FROM Orders

    WHERE OrderStatus IN (5,6)



    The same query can be re-written in a more readable form, as shown below:



    DECLARE @ORDER_DELIVERED INT, @ORDER_PENDING INT

    SELECT @ORDER_DELIVERED = 5, @ORDER_PENDING = 6



    SELECT OrderID, OrderDate

    FROM Orders

    WHERE OrderStatus IN (@ORDER_DELIVERED, @ORDER_PENDING)
  • Do not use column numbers in the ORDER BY clause. Consider the following example in which the second query is more readable than the first one:



    SELECT OrderID, OrderDate

    FROM Orders

    ORDER BY 2



    SELECT OrderID, OrderDate

    FROM Orders

    ORDER BY OrderDate
