Achievements

« SQL-U Day Three SSIS Fundamentals : Data Flow Task | Main | T-SQL Tuesday 6: What about BLOB? »
Tuesday
May112010

SQL-U Day Two SSIS Fundamentals : Script Tasks and Components

In Day 2 of this course you are going to learn about two very powerful 'pieces' of SSIS; The Script Task and the Script Component.  They are related, but have two widely different purposes and are also bound by different rules within SSIS.  The examples I will be giving are in C# and therefor work only in SSIS 2008.  The concepts are the same, so if you are using SSIS 2005, just look for the VB .NET versions of the code.  Let's go over some fundamentals of the Script Task first.

I like to access variables from within the script itself and use the Variables object.  This is the container for your package variables and is declared like so:

Variables var = null;

Now you have to determine what variables you want to interact with and how you want to interact with them.  You have two choices when it comes to interaction.  The first is ReadOnly and the second is ReadWrite.  To access the variables themselves you need to go through the Dts object and within the the VariableDispenser.  

Dts.VariableDispenser.LockForRead("User::archive_root");

Dts.VariableDispenser.LockForWrite("User::filename");

Now that I have listed the variables I want to interact with I need to stuff them into the container that I can interact with.

Dts.VariableDispenser.GetVariables(ref var);

I now access those variables through the var object.  Remember, only one object at a time can interact with those variables.  If you have multiple scripts interacting with the same variable sets there is a good chance that you will actually deadlock on accessing them.  Remember that SSIS is parallel in nature so unless you connect objects to one another, they will run simultaneously.  Keep this in mind.  I'll be publishing an article on the parallel nature of SSIS in the near future.

Now that I have the variables I want to interact with, the method of interaction, how does one actually interact with them?  To pull information out of a specific variable, I do the following:

 

The Script Task lives on the Control Flow.  This task is excellent for building custom made logic and looping tasks.  Yes, I said looping tasks.  In some instances I have found that looping through a directory structure with code built in a Script Task via VB .NET or C# (if you are using 2008) sometimes more efficient than using the Foor Loop task.  If anyone is interested in see some of my scripts that do this, send me an email, hit me up on twitter and I'll put together a 'one day course'.  I am going to start with the first Script Task in the example package; 'Extract User Name and Information'.

When you are done working with your variables, you want to release them.  When you call the method to put the variables into the Variables object so you can interact with them, you essentially lock them, which gives that script/object exclusive access.  I like to release them as soon as I am done working with them, even if I am not done with my script.  To release them you simply make the following call:

var.Unlock();

Remember, var is the name of the Variables object in my script.  Be sure you put the name of your variable.  So if decided to name your variable littlevar you would interact as follows:

Variables littlevar = null;

Dts.VariableDispenser.GetVariables(ref littlevar);

littlevar.Unlock();

Now that we have the fundamentals of interacting with package variables within the Script Task let's move on the actual example task and what I need it for.

If you remember from Day One, the package is designed to wait on four directories for a file with a given wildcard signature.  Once a file matching the wildcard is placed in one of the directories it is picked up and processed.  The package has to know which of the directories the file was placed in (a file will be placed in only one of the four at any given time).  The above Script Task extracts that information.

The task checks to see which of the four variables is not empty.  In the example I am checking the B_file variable (names of the innocent have been preserved).  If that variable is not empty, then this is the person who dropped the file.  I then assign other variables to specific values to carry on this fact.  I set file I am import and process equal to the value of th variable set by the File Watcher.  I fill the values of the variables I will use should I need to send a duplicate list out, the email address to send all correspondence to, the excel of ISBNs we don't own (otherwise known as Not Our Product), and the name and location of where to archive the file.

I have another Script Task near the bottom of the package which simply moves the current input file to an archive location.  If the current filename exists in the archive, then just delete the working file.

You'll notice we use the package variables set by the first Script Task.  This gave us the proper file name and location.  This should give you a solid foundation to build your own Script Tasks.  Now it's time to move on to Script Components.  Let's build a foundation for the Script Component, just like we built one for the Script Task.  

Script Tasks come in three flavors;Source, Transformation, Destination.  They are all pretty much self explanatory from the name.  I will be showing examples of Transformation and Destination.

Since the task is within a Data Flow Task it will work on each and every row that makes it from the source to the task.  You have to define what columns within the row you will be working on.  You might work on every column or only one column.  Regardless, you need to identify them to the Script Transformation.  Once inside the the task there are three methods that you can use. The first method is called PreExecute().  This method is called when the task is initialized, not for every row.  This is a good place to put variables that need to be initialized, like counters.  The next method you will see is called, you guessed it PostExecute() and is called after the last row is processed.  This is a good place to put your interaction with package variables.  As a matter of fact, in a Script Task you can interact with ReadOnly variables only in the PreExecute() method ReadWrite variables in the PostExecute().  This is explained in more detail here and here. Let's move on to some examples.

Within the first Data Flow Task, 'Load X001' we have a Script Transformation and a Script Destination.  We'll explore the Script Transformation first.  The input columns in the script editor shows:

The two variables show above, Adj_ISBN10 and Adj_ISBN13 are available to the script.  You cannot assign new values to them and if you try, you will get an error.  Being this is a transformation, you need a way to take what you are going to do and pass it down the line.  Those variables are defined in the Inputs and Outputs.

Two downstream columns are defined, C001_ISBN10 and C001_ISBN13.  Oh, something to note, SSIS removes underscores from names, so in the image you will shortly see, C001_ISBN10 is referenced as C001ISBN10.  Let's peek inside the script itself.

The purpose of this script is twofold.  Convert ISBN 13s to ISBN 10 and convert ISBN 10s to ISBN 13s.  All of the variables we have both input and output are accessible through the Row object.  I check the input variables AdjISBN10 and AdjISBN13 to see if there is a value.  If there isn't I create one based upon the other.  ISBN13s and ISBN10s are related via a formula, which I define in two functions beneath the main Input0_ProcessInputRow.  You can see a portion of one, fixISBN10.

That is just about all there is to know about transformations.  I take in one or more columns from every row and process them.  We will move on to the Script Destination.

Destinations are an endpoint in the Data Flow Task.  They are used when you need to do work on certain columns and rows, but do not create new outputs.  I have used them in two situations, this being one of them.  I am going to interact with a package level variable (I'm actually going to store the results of the destination in a variable).  

I have inputs just like in a transformation.

In the destination there are no outputs, this is the end of the line for this path.  This destination is fairly simply.  I am appending to a string, each ISBN that makes to to the destination.

I create a stringbuilder which is component wide.  There are no read variables I want to extract from, so there is no code in the PreExecute().  On a row-by-row basis, I take the incoming ISBN and append a ',' to it.  I then strip the final ',' and pass it to the package variable in the postExcecute().   Notice I do not use the Dts object or stuff them into a variable of type Variables.  I have access to the variables on the ReadOnlyVariables and ReadWriteVariables through the Variables object; Variables.variablename.  Remember I can only access ReadOnly variables in the PreExecute() and ReadWrite variables in the PostExecute().

This should give you a solid foundation to create your own  Script Tasks and Script Components.  Don't fear them as they are incredibly powerful, but don't look to re-invent the wheel.  If there is a stock component or even a free component available through sites like codeplex or sqlis then use them.

This wraps up our Day 2 class.  Tomorrow we will look at the Data Flow Task in detail.

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>