Thursday, February 17, 2011

ssis lookup transformation memory/performance issues

Suddenly, today, the package stopped responding. As usual, I quote: "It worked in the morning, and this time it was connected to a different server." The issue soon became high priority as the system went on to throw a low-virtual-memory error. The system had to be restarted a couple of times, and this was a pain.

Investigation resulted in narrowing down the root cause to be one of the lookups in the package.

The first thing that flashed into my mind was "Why are we not using a cache?". So I decided to create a cache file using the Cache Transform in a separate package. Looking into the Execution Results showed that SSIS printed the following message:

"Information: The buffer manager detected that the system was low on virtual memory, but was unable to swap out any buffers to relieve memory pressure. 8 buffers were considered and 8 were locked. Either not enough memory is available to the pipeline because not enough is installed, other processes were using it, or too many buffers are locked."

"[Cache Transform [28]] Information: The component "Cache Transform" (28) processed 1748563 rows in the cache. The processing time was 224.813 seconds. The cache used 14439633254 bytes of memory."

I began to search for information on the maximum buffer size. By default, the maximum number of rows per buffer for a Data Flow Task (DefaultBufferMaxRows) is 10,000 and the default buffer size (DefaultBufferSize) is 10,485,760 bytes (~10 MB). The buffer size can be increased up to 104,857,600 bytes (~100 MB).
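To get a feel for what those defaults mean for this package, here is a back-of-envelope sketch. This is only an illustration: the min(row limit, byte limit) fill rule is my approximation of how SSIS sizes a buffer, and the bytes-per-row figure is derived from the Cache Transform log message quoted above (14,439,633,254 bytes / 1,748,563 rows = exactly 8,258 bytes per row).

```python
# Rough estimate of how many rows fit in one pipeline buffer.
# SSIS (approximately) fills a buffer until either the row limit
# or the byte limit is reached, whichever comes first.
DEFAULT_BUFFER_SIZE = 10_485_760   # ~10 MB, the default
MAX_BUFFER_SIZE     = 104_857_600  # ~100 MB, the maximum
DEFAULT_MAX_ROWS    = 10_000

# Derived from the Cache Transform log above: bytes / rows.
row_bytes = 14_439_633_254 / 1_748_563   # = 8258.0 bytes per row

def rows_per_buffer(buffer_size, max_rows, row_bytes):
    """Rows per buffer under a min(row-limit, byte-limit) fill rule."""
    return min(max_rows, int(buffer_size // row_bytes))

print(rows_per_buffer(DEFAULT_BUFFER_SIZE, DEFAULT_MAX_ROWS, row_bytes))  # 1269
print(rows_per_buffer(MAX_BUFFER_SIZE, DEFAULT_MAX_ROWS, row_bytes))      # 10000
```

At ~8 KB per row, the default 10 MB buffer holds barely 1,269 rows, which is why the pipeline was struggling long before the buffer-size increase.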

The package behaved the same even after increasing the size. This time the OLE DB source and the Cache Transform inside the Data Flow Task did show green status, but the Data Flow Task itself never completed. The Execution Results flushed out the following message again (in fact, both times, but with some additional messages this time):

"[Cache Transform [28]] Information: The component "Cache Transform" (28) processed 1748563 rows in the cache. The processing time was 224.813 seconds. The cache used 14439633254 bytes of memory."

So I began to think: why does a simple query returning under 2 million rows result in 14 GB of cache? At this point I thought it was worth looking at the query and the data type settings SSIS derived from it. Here was the query:

SELECT Number = REPLACE(REPLACE(LTRIM(REPLACE(RTRIM(Number), '0', ' ')), ' ', '0'), '-', ''),
       Cusip  = LTRIM(RTRIM(Cusip)),
       SponsorId,
       AccountId
FROM Account

The data type for the Number and Cusip columns in the database was VARCHAR(20), and the IDs were INT. Then I checked how SSIS set the data length in the input and output properties (via the Advanced Editor) of the OLE DB source. To my horror, I found that the data length was set to STRING 8000: SSIS could not determine the proper length once REPLACE or any other string-manipulation function was applied in the SQL query (SQL Server reports the result of REPLACE on a non-MAX string as VARCHAR(8000)). So I changed the query to use CAST:

SELECT Number = CAST(REPLACE(REPLACE(LTRIM(REPLACE(RTRIM(Number), '0', ' ')), ' ', '0'), '-', '') AS VARCHAR(20)),
       Cusip  = CAST(LTRIM(RTRIM(Cusip)) AS VARCHAR(20)),
       SponsorId,
       AccountId
FROM Account

This time SSIS set the data length to STRING 20. I ran the package again and it worked like a charm. The cache memory drastically reduced to 171359174 bytes (i.e. ~160 MB) — roughly 84 times less than the 14 GB. The package ran smoothly even without creating a cache file.

"[Cache Transform [28]] Information: The component "Cache Transform" (28) processed 1748563 rows in the cache. The processing time was 0.531 seconds. The cache used 171359174 bytes of memory."
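A quick sanity check of the numbers in the two log messages (the per-column breakdown isn't visible in the logs, so this is just the aggregate arithmetic):

```python
# Per-row cache cost before and after the CASTs, taken straight from
# the two Cache Transform log messages quoted in this post.
rows = 1_748_563

before = 14_439_633_254 / rows   # bytes/row with the STRING 8000 columns
after  = 171_359_174 / rows      # bytes/row with the STRING 20 columns

print(round(before))                        # 8258 bytes per row
print(round(after))                         # 98 bytes per row
print(round(14_439_633_254 / 171_359_174))  # 84 -- the cache shrank ~84x
```

Dropping from 8,258 to 98 bytes per row is entirely a consequence of the declared column widths: the cache reserves the declared STRING length for every row, whether or not the values need it.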

Wow, what a difference and what an impact.

We had two lookups using the same query, and with the query change the package ran smoothly (even without using the cache). Just out of curiosity, I observed that each of these lookups used 160 MB of cache memory during execution (i.e. 320 MB total). Using the cache file in the lookups made the package use only 160 MB, since the cache was loaded only once (the cache file itself was built separately in another package).

After the query change, the package ran without even having to increase the default buffer size (10 MB) of the Data Flow Task. The lookup still took 160 MB of cache memory, which is higher than the 100 MB max buffer size mentioned above — presumably because the lookup's full cache is allocated by the component itself rather than in pipeline buffers, but this is still an area for investigation.

Anyway, for now it works great!

·  Keep an eye on the queries you write and the data lengths SSIS sets on the lookup columns. Remember, STRING 8000 is not the same as VARCHAR(20) in terms of memory allocation.
·  Keep an eye on the max buffer size of the Data Flow Task versus the total cache memory consumed by it.
·  Do not be tricked by the amount of free space on the server. You can still run into memory issues because of the limits set on individual tasks.
·  Save all your work before you try this, as this kind of experimentation sometimes ends in a system restart.
