This will be the fourth and final day of this class. I figure by the fifth day everyone's brain is mush, so an early weekend for all! Today we will go over some of the things that didn't fit anywhere else: Precedence Constraints, the Send Mail task, and a little bit of Data Flow Task performance tuning.
Precedence Constraints are what connect components. Usually they appear as a green line from one component to another, and they control how SSIS runs its tasks. The connection runs from source to destination: you drag a connector from one component (the source) to another (the destination). To help you discern which way is which, a little arrow points the way. We will be using the example shown below throughout this session.
The dark green lines are followed if the source component succeeds. If the line is red, it is followed if the source component fails. If the line is blue, the destination component runs regardless of the outcome of the source component. That's easy to understand. You might be thinking, "What about the dotted green lines?" Well, keep in mind the colors are set by the conditions above, which I will restate:
- Green: source component must succeed to continue
- Red: source component must fail to continue
- Blue: continue to next component regardless of previous condition
The dotted lines happen because of this:
The highlighted section relates not to the individual constraint, but to all of the constraints coming into the task. The constraint itself is set by the "evaluation operation":
There are 4 evaluation operations for a given constraint:
- Constraint
- Expression
- Expression and Constraint
- Expression or Constraint
This means that a constraint will become active if it first meets the Green/Red/Blue requirement and then meets one of the above four. This provides numerous options. The most common combination I use would be: "Success, Expression and Constraint, Logical OR". Going back to the image at the beginning, any number of the dotted-line constraints can be activated, so long as the Data Flow Task succeeds and the variables I am testing meet their requirements. Take some time to go over this; it can be confusing, but once you get the hang of it, it will be very powerful. Next up, the Send Mail task.
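To make the decision logic concrete, here is a minimal Python sketch of how a destination task evaluates its incoming precedence constraints. This is an illustration of the rules described above, not SSIS code; all function and parameter names are my own.

```python
# Sketch of SSIS precedence-constraint evaluation (illustrative only).

def constraint_met(outcome, required, expression, operation):
    """Evaluate one precedence constraint.

    outcome    -- actual result of the source task: "Success" or "Failure"
    required   -- constraint value: "Success", "Failure", or "Completion"
    expression -- boolean result of the optional expression
    operation  -- one of the four evaluation operations
    """
    c = required == "Completion" or outcome == required
    if operation == "Constraint":
        return c
    if operation == "Expression":
        return expression
    if operation == "Expression and Constraint":
        return expression and c
    return expression or c  # "Expression or Constraint"

def task_should_run(results, logical_and=True):
    """Combine all incoming constraints: Logical AND vs. Logical OR."""
    return all(results) if logical_and else any(results)

# Two dotted (Logical OR) constraints: either one firing starts the task.
r1 = constraint_met("Success", "Success", True, "Expression and Constraint")
r2 = constraint_met("Success", "Success", False, "Expression and Constraint")
print(task_should_run([r1, r2], logical_and=False))  # True
```

Notice how the "Success, Expression and Constraint, Logical OR" combination plays out: each dotted line needs both a successful source and a true expression, but only one line needs to fire.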
This is the basic Send Mail task. All of the parameters should look familiar to anyone who has sent an email, and all of them can be static or set via expressions. The one thing to note is the From: address. It has no bearing on delivery; it doesn't even have to be a valid address, but it must be filled in. I tend to make it SSIS@&lt;company&gt;.com. Expressions are set via the Expressions tab.
In the example above, I use expressions to set the To: line and the Message Source. The component is connected via an SMTP Connection Manager, which looks like the picture below.
In our environment we fill in the SMTP Server name only. Check with your admins to see what is required for your environment.
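Outside of SSIS, the same send boils down to the same handful of fields plus an SMTP server name. Here is a minimal Python sketch of the equivalent; the server and addresses are placeholders, not values from any real environment.

```python
from email.message import EmailMessage
import smtplib

# Build the same fields the Send Mail task exposes.
msg = EmailMessage()
msg["From"] = "SSIS@company.com"   # need not be a real mailbox, but must be set
msg["To"] = "oncall@company.com"
msg["Subject"] = "Package finished"
msg.set_content("The nightly load completed successfully.")

# The SMTP Connection Manager equivalent: server name only, no auth,
# matching an environment that only requires the server name.
def send(message, server="smtp.company.com"):
    with smtplib.SMTP(server) as smtp:
        smtp.send_message(message)
```

As in the Send Mail task, the From: header is just a string here; the SMTP server doesn't validate it.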
Before I dismiss class for the day, I wanted to touch upon some performance tuning of the Data Flow Task.
The first two lines are somewhat self-evident. If you import images or otherwise work with BLOB columns, SSIS might not have enough memory to hold that data and will spool it to disk. You can tell SSIS exactly where to store these temporary BLOB buffers by setting BLOBTempStoragePath; if you do not, SSIS defaults to the local TEMP directory. The same applies to BufferTempStoragePath. The difference between the two (BLOB vs. buffer) is that it is very difficult to determine when SSIS will use BLOB storage. I have had machines with gigabytes of memory, yet SSIS still spooled to BLOB storage; some internal mechanism decides whether to spool BLOB data or not. Buffer storage is a little easier to reason about: do I have enough memory to hold the buffers I need? Remember, you might have plenty of room for your buffers in dev or even QA, but hit production and you might hit contention. If SSIS does not have enough memory to hold the incoming buffers, it will push them to the local TEMP directory unless you specify a path here. Where I work, we have a production file share that I use for all packages that need to store files. This enables me to write my SSIS packages in a machine-agnostic way.
DefaultBufferMaxRows and DefaultBufferSize work in tandem; SSIS uses whichever limit is smaller to run the task. In 2008 the defaults are 10k rows and 10MB of memory. This means that if I can only fit 8k records in 10MB of memory, each pass will extract just 8k records. I can raise DefaultBufferMaxRows to 100k, but SSIS will still pull 8k on each pass, due to the 10MB memory restriction. On the reverse, if I set DefaultBufferSize to 20MB but keep DefaultBufferMaxRows at 10k, SSIS will pull only 10k records each pass. Remember, it is the lesser of the two, so you are going to find that you need to modify these together to be effective. Also note that these are properties of the Data Flow Task, so if you have, say, five of them running simultaneously with the buffer size set to 100MB, you are looking at at least 500MB of memory in use when the Data Flow Tasks run. I have simplified this a bit in the hope that you understand the mechanism. To truly tune, you need to work on one piece at a time and use the log events in BIDS to see how much memory is being used.
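The "lesser of the two" rule is just arithmetic. A sketch with a made-up row width, using the 2008 defaults described above:

```python
def rows_per_buffer(row_width_bytes, max_rows=10_000,
                    max_buffer_bytes=10 * 1024 * 1024):
    """Rows per buffer pass: capped by whichever limit is hit first,
    the row count or the buffer size (2008 defaults: 10k rows, 10MB)."""
    return min(max_rows, max_buffer_bytes // row_width_bytes)

# A ~1,300-byte row hits the 10MB ceiling before the 10k-row ceiling:
print(rows_per_buffer(1_300))                    # 8065 rows per pass
# Raising max rows alone does nothing -- memory still caps it:
print(rows_per_buffer(1_300, max_rows=100_000))  # still 8065
# Doubling the buffer size while keeping 10k rows: the row cap wins now:
print(rows_per_buffer(1_300, max_buffer_bytes=20 * 1024 * 1024))  # 10000
```

This is why adjusting one property without the other often changes nothing: the binding limit stays the same.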
I hope you have learned something from these four days of SSIS fundamentals. Please go to SpeakerRate and let me know your thoughts. Should I have spent more time on a particular subject? Did I miss a component that you felt was important? Was I too verbose, or not explicit enough? Please let me know so I can improve. Thank you.