Monday, November 14, 2011

Problems in SSIS FTP Task

Have you faced any problems in SSIS FTP Task connectivity? Has the FTP Task stopped working in production and worked locally? Are you able to connect from browser/ftp client tools and not from SSIS?

I did face this problem and so did my colleagues. It took some effort and time in explaining my problem to infrastructure team so that they could help me to identify what could be wrong. I did think that it could be an issue with firewall but wanted to confirm it. It was a simple fix in the end and was interesting to know the reason for failure.

I thought of sharing this as many other people I know faced the same problem and this time I was happy to guide them and save a lot of precious time and effort.

I am sure you are facing the same problem and hence you have landed on this page. So let me first tell you the fix and then explain what is happening.

Solution to this problem is to set up the SSIS FTP Task connection object to connect in Passive mode.

How to do it? – Open the SSIS package and in the “connection managers” double click the FTP connection object. In Options - select “Use passive mode”. Now run the package and it should work without any errors.

The SSIS FTP Task uses the Active mode by default and in a Firewall environment setup this does not work. . The Browser FTP client uses the passive mode by default and hence it works.

What’s happening under the hood? Here are my notes.
o    FTP is purely a TCP based service
o    FTP uses 2 ports for communication:-
§  Command Port (usually Port 21)
§  Data Port (usually Port 20. Depending on the mode, it may or may not use 20)
o    FTP has 2 modes
§  Active mode
§  Passive mode

In Active mode
o    Client contacts FTP Server’s Command Port (21) from a random port (N>1023, say 1026). This is clients command port. It also sends Information on the port it will listen to (for data) (say 1029 command=PORT 1029)
o    FTP server sends an ACK back to port (1026). A connection is established.
o    FTP server initiates a connection (from port 20 its data port) to port 1029 (client. Earlier sent info).
o    FTP server now establishes a connection with this port and sends the data using this port.

Firewall Issue:
If a firewall is set up at client, then a request from FTP in active mode will be treated as some external source trying to connect to client and therefore will reject/block the request. This is because the client just sent the port info that will it will listen to and it did not really open a connection.

In Passive mode
·         Client contacts FTP Server’s Command Port (21) from a random port (N>1023, say 1026). This is clients command port. Client now issues PASV command instead of PORT command.
·         Server then replies with from port 21 to 1026 telling the client the port number it is listening to for data connection.
·         Client then initiates data connection from its data port (say 1029) to servers data port 1024 (instead of data port 20)
·         Server finally sends ACK to client’s data port 1029 from server’s data port 1024.

Reference article: (really well explained)

Monday, November 7, 2011

Comma seperated values (SQL)

This code generates comma separated values for an entity having multiple values. i.e. Multiple rows into a single row data value (comma seperated).
Found this code very useful for data validations in datawarehouse and thought of sharing this.
Here is the code snippet 

Create Table #TempData (EmployeeName Varchar(50), EmployeeNickName Varchar(50))

Insert Into #TempData Values('Vinuthan','VB')
Insert Into #TempData Values('Vinuthan','Vinu')
Insert Into #TempData Values('Vinuthan','Newton')

Insert Into #TempData Values('Venus','Ven')
Insert Into #TempData Values('Venus','Vens')

 Query to get comma seperated values
                                SELECT ',' + EmployeeNickName  FROM
                                 #TempData a WHERE b.EmployeeName = a.EmployeeName  FOR XML PATH('')
  ) EmployeeNickName
 FROM  #TempData b


Monday, July 4, 2011

MDX: Query Performance ORDER using Val

One of my colleague working on a different project had an issue with performance of a query. The query looked something like this

Key Points: It has an ORDER BY clause, the member used for ordering converts the string value to Int using Val.

{[Measures].[Answer Count]} on 0,
[Location].[Region].[Region] *
[Location].[Retailer Store Number].[Retailer Store Number],
[Location].[Retailer Store Number].CURRENTMEMBER. MEMBER_CAPTION
,[Measures].[Answer Count]) on 1
WHERE [Visit Start Date].[Last 1 Week].&[Current Week]

One does not have permission to change the structure of the cube as client has not given permission. Data type for store number is String. The store number stores integer values but yet the data type is String. Hence ORDER was @MDX

Initially I suggested the following

SET OrderedStoreNum AS
        [Location].[Retailer Store Number].[Retailer Store Number]
        ,[Measures].[Answer Count]
    ,Cint([Location].[Retailer Store Number].CurrentMember.Member_Caption)
    [Measures].[Answer Count]
} ON 0
    [Location].[Region].[Region] * OrderedStoreNum
    ,[Measures].[Answer Count]
) ON 1
FROM [cdw];

Reasoning for above suggestion was as follow
It is better to do an "order by" on a set independent of the current context and then do    
cross join.
           Additionally filter out Nulls before doing order by and reduce the size of set
 Use CInt instead of Val directly and not let AS decided the base data type it needs to  
 convert to under the hood. (Val does not throw an error if value is a string value whereas 
 CInt will throw an error  Type Mismatch.)
 Resort to BDESC rather than DESC. BDESC does a simple sort and not a hierarchical sort

Additionally I landed up in this article:- . which speaks about not using the NonEmpty Clause within the aggregate function. But in our case this does not apply as the order by is not an aggregate function.

To substantiate this and dig more into it, found some interesting stuff. Hrvoje helped me in this as he had the required volume of dataset.

So here were the readings

The fact table contains around 3M rows. The dimension contains around 249,607 members: [Dim1].[Attr1].[Attr1]. Dim1].[Attr2].[Attr2] contains around 160 members. [Dim1].[Attr1] is an hierarchy that has 7 levels.

a:- this is the time (mm:ss) for query created using set
b:- this is the time (mm:ss) for query created using ORDER directly in the cross join.

Query 1: 
a) Time 07:51
b) Time 07:51 

      ,Val([Dim1].[Attr1].CurrentMember.MEMBER_Key) -- Key is SK of int type

      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1

Query 2: 
a) Time 03:57  
b) Time 03:58 

      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])
      , Val([Dim1].[Attr1].CurrentMember.MEMBER_Key)

      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1

Query 3:
a) Time 01:06
b) Time 01:04

      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])

      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1

Query 4:
a) Time 00:46
b) Time 00:39

      NonEmpty([Dim1].[Attr1].[Attr1], [Measures].[M1])

      [Measures].[M1] ON 0,
      NON EMPTY [Dim1].[Attr2].[Attr2] * setL ON 1

1. Using Val is a big culprit. Int is better than Val. But changing the source data type is the best.
2. It is better to use NonEmpty within ORDER().
3. Though there is no considerable difference between the creating a set versus using ORDER() directly in cross join, i think it is better practice to do so.
4. The observation is when both the attributes are belonging to same dimension. I assume that the behavior would be same when we use attributes of different dimension in a cross join
5. I could not do much analysis on BDESC versus DESC.

Note: This query difference may not be substantial if the dimension and measure data is small.

Monday, June 20, 2011

Windows versus Sql collation

There was a question posted by my friend as to which collation to select for a column, whether the “SQL_Latin1_General_Cp1_CI_AS” or the “Latin1_General_CI_AS”.

My first reaction was oops….we normally use the defaults by the SQL server i.e. SQL_Latin1_General_Cp1_CI_AS mostly. The other option is SQL_Latin1_General_Cp1_CS_AS (Case Sensitive) where “ABS” and “abs” are two different values. But trying to find answer to the question I realized that it became a choice between SQL collation or the windows collation.

Here is some dig.
A “collation” specifies how strings are compared and sorted, and what character set is used for non-Unicode data. SQL Server supports two types of collations:
SQL collation: Example: “SQL_Latin1_General_Cp1_CI_AS”
Windows collation: Example: “Latin1_General_CI_AS”

For a Windows collation, a comparison of non-Unicode data is implemented by using the same algorithm as Unicode data. Both Unicode and non-Unicode sorting are compatible with string comparison rules in a particular version of Windows.

In a SQL collation, SQL Server defines different comparison semantics for non-Unicode data.
A SQL collation’s rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type.

Lets do some hands on here.

Example: Create 2 tables namely tblSQLCol and tblWinCol
        CREATE TABLE tblSQLCol (a1 varchar(50), na1 nvarchar(50)) 
        //Here the first column is a non UNICODE data type and second column is a UNICODE data type
[Note: this will create the columns with SQL_Latin1_General_CP1_CI_AS.  This is because the Setup program does not set the instance default collation to the Windows collation Latin1_General_CI_AS if the computer is using the U.S. English locale. Instead, it sets the instance default collation to the SQL collation SQL_Latin1_General_Cp1_CI_AS. So since the server instance is set with SQL.. collation, when creating table, it takes the collation set at server level. Here CP1 means Code Page [1=1252 the default code page], CI means Case insensitive and AS meaning Accent Sensitive]

        CREATE TABLE tblWinCol (a1 varchar(50) COLLATE  Latin1_General_CI_AS,na1 nvarchar(50) COLLATE          
        //Here the first column is a non UNICODE data type and second column is a UNICODE data type
[Note here the columns will be set with the specified collation]

Insert the data into the table in sql collation
INSERT INTO tblSQLCol VALUES(‘a-c’,'a-c’)
Now insert the data into the table with windows collation
INSERT INTO tblWinCol VALUES(‘a-c’,'a-c’)
INSERT INTO tblWinCol VALUES(‘ab’,'ab’)

Now lets see the difference and affects of SQL and windows collation
select a1,’sql’ from tblSQLCol order by 1 asc  //SORT ASCENDING on first column (non Unicode) retrieved using the SQL collation table

select na1,’sql’ from tblSQLCol order by 1 asc //SORT ASCENDING on first column (Unicode) retrieved using the SQL collation table

select a1,’latin’ from tblWinCol order by 1 asc //SORT ASCENDING on first column (non Unicode) retrieved using the Windows collation table

select na1,’latin’ from tblWinCol order by 1 asc //SORT ASCENDING on first column (Unicode) retrieved using the Windows collation table

NOTE: if you see the result, one can observe that for windows collation, in both cases (Unicode and non Unicode) same sorting and comparison logic is used. In SQL collation two different logics are used.

Now lets us see how any sorting algorithm happens at the .NET level.
Lets test the default sorting order
ArrayList arrlst = new ArrayList(2);
RESULT: {1,2,3}
ArrayList arrlst = new ArrayList(2);
RESULT: {“ab,”a-c”}
Now lets us test with the some code which we generally might do at .NET
SqlConnection conn = new SqlConnection(“server=SERVERNAME; user 
DataSet ds = new DataSet();
SqlDataAdapter adapter = new SqlDataAdapter();
adapter.SelectCommand = new SqlCommand(“select a1 from test”, conn);
adapter.Fill(ds, “test”);
DataRow[] rw = ds.Tables[0].Select(“”,”a1 ASC”);
foreach(DataRow r1 in rw){ MessageBox.Show(r1[0].ToString()); }

Same is the result with adapter.SelectCommand = newSqlCommand(“select na1 from test”, conn); and also with test2 as the table
RESULT: {“ab”,”a-c”}

This clearly shows that with SQL collation and non Unicode data type, at windows application layer different sort logic is applied and at the SQL layer a different logic is applied.

So probably to keep the sorting logic constant across both layers, one can have a Windows collation like the Latin1_General_CI_AS toSQL_Latin1_General_CP1_CI_AS

Regarding conversion of a collation of a column fromSQL_Latin1_General_CP1_CI_AS  to Latin1_General_CI_AS , The column of interest if changed should be checked if it is used in any join conditions. If the joining column is not changed then we will get an error

Example: select * from tblWinCol inner join tblSQLCol on tblWinCol.a1 = tblSQLCol.a1 will give error “Cannot resolve collation conflict for equal to operation.” Same should also be checked with any temp table joins. Not sure if any data losses will occur. Ideally it should not occur

This was done on SQL 2005 and was a nice learning that i thought should be shared.


Related Posts with Thumbnails