I've migrated my blog from SquareSpace5 to SquareSpace 6. This address is no longer being maintained. The new address is http://josefrichberg.com. Part of the reason to move was better integration with social media (although automated pushes to google+ profiles still eludes me) and the other part was a fresh look. I have decided to move off Feedburner (native social media integration with SS6) which means this will be the last email pushed. If you subscribe to this blog via email or used the original Feedburner RSS feed, please visit the new site, select the RSS Feed from the Follow Me menu option.
I have text file that contains data for a given fiscal period. In this particular file there are around 2 million records. The file needs to be split into two distinct output paths from the same source, which means this should all happen within a given DataFlow component. One of the paths simply sums the data via a specific set of columns, nothing unusual. The other path required that if the fiscal period was not valid, then no data was to be written. This sounded simple enough, except the validation routine was a stored procedure.
Now you might be thinking, "Easy peasy, lemon squeezie! Just put an OLE DB Command in-line and you can call that stored procedure to validate your data." That would call the stored procedure once for each record (totalling 2 million) when I only need a single call, because although the fiscal period is a column in the file, it is the same for all records. All I need to accomplish this is a Script Component and here's how I did it.
After I set up my preexecute to create the stored procedure call, I need to pass in a few parameters, execute it and get a result back. In this case the result will be a true/false value that we want to return back to the calling portion. Here is that method:
Remember that we will be processing millions of rows and I don't want to call this for every row, just once per file. To accomplish this we use two variables: RunCheck and passThrough. The variable RunCheck acts like a one way door, setting itself to false immediately after checking. As you might have guessed from the name it is used to determine if the stored procedure needs to be called. The variable passThrough is set to the result of the checkTable method and will determine if the data coming down the pipeline is valid or not. If it is valid, then it is sent passed this component and on to the output. If the data is invalid, then it is held. Here is what the ProcessInputRow method looks like:
As you can see from the picture if passThrough is true then that means allow the data to pass through to the output. If the call returned a false, then rows will never be added to the outputbuffer.
I was asked by some attendees of my PASS 2013 Session, Unseen Minions : Improving parallel processing in SSIS, if I had any sample code or examples of some of the ways my framework could be used. I have put together 3 samples along with supporting sql code. You will find a link to that zip file here: Example_Package.
I'm speaking at SSWUG Summer Camp! Thankfully, I can be at home, in my pool AND be at summer camp (oh, how I love modern technology!). If you have the time, come listen to me blather on about SSIS. http://www.sswug.org/vc
I am honored to say that have been selected to present this year. I'll be giving a talk on my threading framework for SSIS. Unseen Minions : Improving parallel processing in SSIS. I am very excited and hope to see you there!