Saturday, November 28, 2015

Develop a SSIS package to fetch only Excel files from given folder and read through multiple sheets in each file.

The requirement is to fetch only *.xls/*.xlsx files from given folder. Each file is expected to have random number of sheets but with same metadata i.e. type and number of columns. Read through each of such Worksheet in each file and copy data to common table.


This can be achieved using ForEach Loop Container, OLE DB Source(To read data from Excel 2007 and onwards)/Excel Source(To read data from Excel 97-2003) and any compatible destination in SSIS. 


This example explains steps to read from *.xlsx file with some notes to use *.xls file.

Drag and drop Foreach Loop Container. Configure to read from folder containing *.xlsx files.


Drag and drop another Foreach Loop Container into existing Foreach Loop Container and rename each container according to its purpose. 

In the Connection Managers area, right click and select New ADO.NET Connection...

Select New button under Configure ADO.NET Connection Manager. Under Provider, select Microsoft Office 12.0 Access Database Engine OLE DB Provider and click OK.

Under Connection Manager window, select All. Under Extended Properties, type:

  • EXCEL 12.0; HDR=YES; - For Excel 2007 and onwards

  • EXCEL 8.0; HDR=YES; - For Excel 97-2003

Click OK. 

Click OK on Configure ADO.NET Connection Manager window.

Right click on created ADO.NET Connection Manager and select Properties. 
Selecting Expressions property, click ellipsis button to open Property Expressions Editor window.
Select ConnectionString under Property and click on ellipsis button under Expressions corresponding to ConnectionString property row.

Opens Expression Builder window. Type following under Expression:


"Data Source=" + REPLACE(@[User::ExcelFilePath], "\\\\", "\\") + ";Provider=Microsoft.ACE.OLEDB.12.0;Extended Properties=" + "\""+"EXCEL 12.0;HDR=YES;"+ "\"" + ";"


This expression sets the Data Source to the excel file path from variable ExcelFilePath replacing double backslash with single backslash. Further it sets the Provider and Extended Properties.  

Evaluate Expression and click OK.

Click OK. Note the ADO.NET Connection Manager icon changes to reflect use of expression(fx).


Edit nested Foreach Loop Container. Selecting Foreach ADO.NET Schema Rowset Enumerator, select ADO.NET Connection Manager created.

Under Schema, select Tables.
Just to note, click on Set Restrictions. We see that Table_Name is listed 3rd and for this example, it provides us with SheetNames of the excel file.
Do not check mark on any options here and click Cancel.

Select Variable Mappings.
Create new variable SheetName.
Under Index corresponding to this variable, specify 2 to get TABLE_NAME(WorkSheets for this example).

Click OK. 


Add breakpoints on internal Foreach Loop Container for each iteration. Execute the package. Add Watch on SheetName variable to see the different sheets in excel file.
We can use OLE DB Source/Excel Source to read from these excel sheets based on Excel file version.


If you see package execution failure with following message in Progress tab:


Error: The GetEnumerator method of the ForEach Enumerator has failed with error 0x80131509 "(null)". This occurs when the ForEach Enumerator cannot enumerate.


Rectify this by setting Run64BitRuntime to False under SSIS Project Properties -> Configuration Properties -> Debugging.


Please feel to correct me by commenting your feedback and suggestions.

3 comments:

  1. after changing the Run64BitRuntime to false still i am getting the same issue after deploy but it is running properly in my server.

    ReplyDelete
  2. Hello, thank you for the tutorial, however I am running into error: Error: The GetEnumerator method of the ForEach Enumerator has failed with error 0x80131509 "(null)". This occurs when the ForEach Enumerator cannot enumerate., I did make sure that Run64BitRuntime is False, however still having the same problem, any other things I can try?

    ReplyDelete