Sunday, November 29, 2015

No of Data Flow Task executing simultaneously based on the number of Processor and Parallelization setting

In this example, we will see the effect of number of system processors on execution of multiple Data Flow Tasks simultaneously.




We are demonstrating this example using Intel Core i5 system with 4 processors.


We have created a new SSIS package and renamed it to MultipleDFTAndProcessors.dtsx


In this example, we will use multiple Data Flow Tasks. 


We have created replication of HumanResources.Employee table in AdventureWorks2012 database as dbo.Employee. Further we added two columns DF(Number of the Data Flow Task running) and InTime(Time of record insertion to dbo.Employee). 

Let us drag and drop Data Flow Task on the Control Flow area. Using OLE DB Source and Destination, we configured Data Flow Task to copy from HumanResources.Employee and insert to dbo.Employee table. We have used Sort Transformation to sort by BusinessEntityID key field in HumanResources.Employee. We have also used Derived Column Transformation to provide incremental value field DF, to recognize each Data Flow Task, starting with 1.


 


Repeat the same for another 14 times to create in total of 15 Data Flow Tasks using incremental DF values



Now let us experiment by setting Control Flow property MaxConcurrentExecutables. This controls parallelization in Control Flow Task. MaxConcurrentExecutables is -1 by default meaning the number of concurrent Data Flow Tasks to execute will be equal to the number of system processors + 2.

As mentioned earlier, we are experimenting on 4 processors system. If we set MaxConcurrentExecutables to -1, we will be using all the 4 processors in this system.

Execute the package and we should see that simultaneously 6 Data Flow Tasks (i.e. 4 Processors + 2) are executing. Post package execution and on checking dbo.Employee table, we see that InTime for some DF values is same. This means that we can say for sure that parallelization took place and some Data Flow Tasks executed and completed simultaneously. But again parallelization depends on the system load and availability of processors at the time of package execution.

If we set MaxConcurrentExecutables to value greater than 4, Control Flow Engine will try to meet the request for concurrent executions but this again depends on the processor availability and system load. Truncated table dbo.Employee before executing the package with the changes mentioned below.


If we set MaxConcurrentExecutables to value less than 4, we will be using that number of  processors simultaneously. Truncated table dbo.Employee before executing the package with the changes mentioned below.
 
 

 

Thus we can conclude that the number of processors and setting of MaxConcurrentExecutables plays a role in Control Flow Tasks parallelization.

Also, if we set MaxConcurrentExecutables to 1, we can suppress parallelization. The property MaxConcurrentExecutables cannot be set to 0 as it results in error.



Please provide your feedback and suggestions by commenting.  

No comments:

Post a Comment