After creating the Full-Text Catalogs and Indexes on the SQL Server, I ran a script to populate the indexes. Since population takes a while, especially if the database is large, I decided to check on the status periodically via the FullTextCatalogProperty function in SQL Server.
One of the properties returned by the function is the PopulateStatus property, which has one of the following values:
0 = Idle
1 = Full population in progress
2 = Paused
3 = Throttled
4 = Recovering
5 = Shutdown
6 = Incremental population in progress
7 = Building index
8 = Disk is full. Paused.
9 = Change tracking
In my case it was showing 1, meaning a full population was in progress. After a couple of hours it was still showing 1 and just stayed there. When I checked the ItemCount property returned by the same function, it was showing 0, meaning no items had been indexed so far. These were clear signs that the population process was stuck.
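A status check along these lines can be run periodically (the catalog name here is just a placeholder; note that the documented property name is PopulateStatus):

```sql
-- Poll the population status and indexed item count of a full-text catalog.
-- 'MyCatalog' is a placeholder; substitute your own catalog name.
SELECT
    FULLTEXTCATALOGPROPERTY('MyCatalog', 'PopulateStatus') AS PopulateStatus, -- 0..9 per the list above
    FULLTEXTCATALOGPROPERTY('MyCatalog', 'ItemCount')      AS ItemCount;      -- rows indexed so far
```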
I looked at the scripts, the tables, the catalogs, and the indexes, but didn't find anything amiss. I then checked the logs in the SQL install folder and noticed the following error related to Full-Text Search:
"No Mapping between account names and security id's were done"
I had no idea where this error was coming from, and it took me some time before I hit upon the idea of checking the Full-Text Search Windows Service. It turned out that the Service was running under a user account, Admin123, that no longer existed on the machine. I changed the Service to run under the Local System account and everything started working!
11/14/10
10/24/10
Full-Text Search in SQL Server 2005
Full-Text Search is a SQL Server feature that provides powerful options for querying text information in the database. Without this feature, the querying options are pretty much limited to using the LIKE clause or functions such as PATINDEX and CHARINDEX.
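For example, without full-text search a keyword query falls back to pattern scans like these (the table and column names are made up for illustration):

```sql
-- LIKE with a leading wildcard forces a scan and cannot use word stemming
SELECT DocumentId FROM dbo.Documents WHERE Body LIKE '%invoice%';

-- PATINDEX/CHARINDEX only return character positions, again by scanning the text
SELECT DocumentId,
       PATINDEX('%invoice%', Body) AS PatPos,
       CHARINDEX('invoice', Body)  AS CharPos
FROM dbo.Documents;
```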
The Full-Text Search is implemented via a Full-Text Search Windows Service. This Service is NOT a part of the SQL Server. The SQL Server interacts with this external service in order to facilitate Full-Text Search.
Following are the steps to enable Full-Text Search in a database:
- Make a list of tables that need to be Full-Text search enabled.
- In each of the tables, make a list of all the columns that need to be a part of the Full-Text Search index.
- Create a Catalog, specifying a location for its creation. A Catalog is a container that holds Indexes. You can have as many Catalogs as you like, and each Catalog can hold zero or more indexes. Though external to SQL Server, the Catalogs need to be on the same machine as the SQL Server. For performance reasons, Microsoft recommends creating the Catalogs on a drive separate from the one holding the SQL Server installation. Microsoft also recommends creating separate Catalogs for tables that have more than one million rows of data.
- Create one Index per table. You can have only one Full-Text index per table. You also need to specify the Catalog the index belongs to; an Index can belong to only one Catalog.
- At this point, we only have the structure (catalogs and indexes) but no data. We need to populate (crawl) the indexes. Population is the process of SQL Server passing data from the indexed columns to the Full-Text Service, which builds a word list recording which words can be found in which rows. It is a resource- and memory-intensive operation that should be performed during off-peak hours so as not to slow down the SQL Server. Once the population is complete, the tables are Full-Text Search enabled.
- Next, all the database objects like Stored Procedures, Functions, and Views need to be modified to use the Full-Text syntax. The Full-Text syntax contains predicates like CONTAINS, CONTAINSTABLE, FREETEXT, and FREETEXTTABLE that enable us to query for information in different ways.
- Full-Text indexes are not kept up to date automatically the way regular SQL Server indexes are. So, if new data is added to the tables, that data will not be searchable until the indexes on those tables are re-populated.
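The steps above can be sketched roughly as follows; the catalog, table, column, and key-index names are all hypothetical:

```sql
-- Create a catalog on a drive separate from the SQL Server installation
CREATE FULLTEXT CATALOG DocsCatalog
    IN PATH 'D:\FullTextCatalogs';

-- One full-text index per table, tied to a unique key index and a catalog
CREATE FULLTEXT INDEX ON dbo.Documents (Title, Body)
    KEY INDEX PK_Documents            -- the table's unique, non-nullable key index
    ON DocsCatalog
    WITH CHANGE_TRACKING OFF, NO POPULATION;

-- Populate (crawl) during off-peak hours
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;

-- Once populated, query with the full-text predicates
SELECT DocumentId, Title
FROM dbo.Documents
WHERE CONTAINS(Body, '"invoice" OR "contract"');
```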
Labels: FreeText, FullText Search, Index Population, sql server 2005
8/15/10
When denormalization helps
I had developed a web page that displayed the log of all the invoices in a given project. There were about ten columns in the log, each displaying a different piece of information about the invoice. These invoices were created against contracts, and there is a one-to-many relationship between a contract and its invoices.
Two of those columns, "WorkCompletedToDate" and "MaterialPurchasedToDate", represented the amount of work done and material purchased (in dollars) prior to the current invoice on a given contract. So if we were looking at invoice No. 3 on a contract X, "WorkCompletedToDate" and "MaterialPurchasedToDate" would represent the sum of the values of those fields from invoice No. 1 and invoice No. 2 on the same contract X. The values in these two columns were not stored in the database and were always calculated at runtime.
There were thousands of invoices in the project, and the runtime calculations for these two columns began to slow down the web page. I tried to optimize the stored procedures, functions, and other code relating to the log, but nothing seemed to help.
I then hit upon the idea of denormalizing the invoices table by adding two columns to store these values instead of calculating them at runtime. This also required a script to populate these values for all the existing invoices in the database. After I made these changes, the page began to load much faster.
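A sketch of that change, with hypothetical table and column names:

```sql
-- Add the denormalized columns
ALTER TABLE dbo.Invoices
    ADD WorkCompletedToDate     money NULL,
        MaterialPurchasedToDate money NULL;
GO

-- One-time backfill: for each invoice, sum the values from the earlier
-- invoices on the same contract
UPDATE i
SET    WorkCompletedToDate     = ISNULL(p.Work, 0),
       MaterialPurchasedToDate = ISNULL(p.Material, 0)
FROM dbo.Invoices AS i
OUTER APPLY (
    SELECT SUM(prior.WorkCompleted)     AS Work,
           SUM(prior.MaterialPurchased) AS Material
    FROM dbo.Invoices AS prior
    WHERE prior.ContractId = i.ContractId
      AND prior.InvoiceNo  < i.InvoiceNo
) AS p;
```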
Though denormalization might not be the best approach in many scenarios and must be used with caution, it does come in handy in some situations, and is definitely an option that must be kept open, especially when faced with performance issues.
Labels: asp.net, database, denormalization, normalization, optimization, sql server 2005
8/9/10
Pros and Cons of using XML data type in SQL Server 2005
Recently I developed a web-based tool using ASP.NET and C# that enables users to create custom forms in a matter of minutes. Every form can have any number of controls like Textboxes, RadioButtons, Checkboxes, Labels, etc. Since the form size can vary from, say, 5 fields to 20 fields to 50 fields, there was no way I could create a flat table structure to store this data. So I decided to use the XML data type in SQL Server 2005 to store the form field configuration.
The rendering part of the tool parses this XML configuration (using LINQ to XML) and renders the form. Let us assume that this form is going to be used for survey purposes. The survey users can fill in the data and complete the survey. All this works just fine and the survey is successfully completed.
Then comes the reporting (SSRS). The management wants to see the results grouped and sorted in a multitude of ways. SQL Server does not have much of an API for querying XML columns, so I had to use what was available (the query, value, exist, modify, and nodes methods) along with some SQL cursors to come up with the desired results in the desired format.
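A typical shredding query of that kind might look like this; the table, column, and XML layout are assumptions for illustration:

```sql
-- Shred one repeating element per row with nodes(), then pull scalars out with value()
SELECT s.SurveyId,
       f.node.value('@name',  'nvarchar(100)') AS FieldName,
       f.node.value('@value', 'nvarchar(max)') AS FieldValue
FROM dbo.SurveyInstances AS s
CROSS APPLY s.Answers.nodes('/form/field') AS f(node)
WHERE s.Answers.exist('/form/field') = 1;
```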
The problem with this approach was that the parsing (which was happening at runtime) took time, and the time taken was directly proportional to the number of survey instances; as the data grew, the reporting became slower and slower. So while XML was a very convenient way to store variable-length data, it was not very easy to work with for reporting or analysis purposes.
I managed to solve the reporting issue by creating a flat table structure for storing the survey data (since the configuration is constant once the form is created). I then created a SQL Agent job that runs as a scheduled task during off-peak hours and performs the time-consuming work of parsing the instance XML data and populating the flat table. I then pointed the reports at the flat table. This sped up the reporting process.
So while the XML data type is suitable for storing variable-length data, developers need to note that XML is not very easy to work with, given the limited API in SQL Server. One way to solve the problem is to get the data out of SQL and use the powerful .NET API to get the desired result. Another is to create a SQL Agent job (as described above) to pre-populate data into flat tables and then work with those tables. All said and done, the XML data type is a very powerful feature and provides tremendous flexibility. I would definitely recommend it.
Labels: asp.net, c#, linq to xml, sql agent, sql server 2005, xml