The XML source reads an XML data file.
The data in XML files frequently includes hierarchical relationships.
The XML source supports use of a XML Schema Definition (XSD) file or inline schemas to translate the XML data into a tabular format. Using XML Source Editor dialog box, the user can generate an XSD from the specified XML data file.
DTD's are not supported.
The schemas can support a single namespace only; they do not support schema collections.
The XML source does not validate the data in the XML file against the XSD.
The XML source supports three different data access modes. We can specify the XML data file location, the variable containing file location, or the variable containing XML data.
The XML source includes the XMLData and XMLSchemaDefinition custom properties.
The XML source supports multiple regular outputs and multiple error outputs.
Using the XML Source Editor
The data in the XML files frequently includes hierarchical relationships. To provide information about the hierarchical relationship of the data in the outputs, the XML source adds a column in the outputs that identifies the parent element for each child element. The XML source creates an output for every element, but it is not required that we use all the outputs.
For every output, the XML source adds an error output. By default, the columns in error outputs have Unicode string data type (DT_WSTR) with a length of 255, but we can configure the columns in the error outputs by modifying their data type and length.
If the XML data file contains elements that are not in the XSD, these elements are ignored and no output is generated for them. On the other hand, if the XML data file is missing elements that are represented in the XSD, the output will contain columns with null values.
When the data is extracted from the XML data file, it is converted to an Integration Services data type except DT_TIME2 or DT_DBTIMESTAMP2 data types because the source does not support these data types.
The XSD or inline schema may specify the data type for elements, but if it does not, the XML Source Editor dialog box assigns the Unicode string data type (DT_WSTR) to the column in the output that contains the element, and sets the column length to 255 characters.
If the schema specifies the maximum length of an element, the length of output column is set to this value. If the maximum length is greater than the length supported by the Integration Services data type to which the element is converted, then the data is truncated to the maximum length of the data type. For example, if a string has a length of 5000, it is truncated to 4000 characters because the maximum length of the DT_WSTR data type is 4000 characters; likewise, byte data is truncated to 8000 characters, the maximum length of the DT_BYTES data type. If the schema specifies no maximum length, the default length of columns with either data type is set to 255. Data truncation in the XML source is handled the same way as truncation in other data flow components.
Custom properties - All properties are read/write.
AccessMode - Integer - The mode used to access the XML data.
UseInlineSchema - Boolean - A value that indicates whether to use an inline schema definition within the XML source. The default value of this property is False.
XMLData - String - The file or variables from which to retrieve the XML data.
The value of this property can be specified by using a property Expression.
XMLSchemaDefinition - String - The path and file name of the schema definition file (.xsd).
Custom properties of the output of the XML source. All properties are read/write.
RowsetID - String - A value that identifies the rowset associated with the output.
The following example uses XML Source. In this example, we will create XML file and use the same with XML Source. Before moving ahead with example, refer these earlier articles:
Create a new SSIS project using BIDS / SSDT
Data Flow Task
No comments:
Post a Comment