CHEF-METADATA : A Guide

This guide helps the user to understand how CHEF Metadata has been structured and how it works. Having gone through this guide, the user will be able to:

  • Understand the hierarchy of CHEF Metadata
  • Understand the various sections of the Metadata
  • Write custom Metadata to perform desired tasks through CHEF.

The basic structure of CHEF Metadata can be found in MetaData.XML

 CHEFMetaData

ApplicationName: Specifies the name of the application. The user can specify any custom value.

  •   Process

a)      DefaultAllowTruncate: This can accept only “True” or “False”. “True” indicates that truncation of values is allowed while inserting data into the destination. For example, if a destination column is of data-type nvarchar(5) and the value for that column in a certain row arrives as “cheftest”, then it will be truncated to “cheft”. If the value is set to “False”, then such a row will be treated as a bad row.

b)      VerboseLogging: This can accept only “True” or “False”. “True” enables verbose (detailed) logging activities, while “False” disables it. Enabling verbose logging incurs high cost and therefore should be done with caution.

c)      BadRowsFolderLocation(Flat files): Bad rows refer to the rows which contain values incompatible with the target schema. The value of this attribute can be set to the path where such bad rows are to be stored.

d)      ExecuteExistingPackage: This again can accept only “True” or “False”. If the value is set to “True”, CHEF searches for a project in the Integration Services Catalog with matching ‘ProcessID’ and ‘ProcessName’ as that provided in the metadata. If such an entry exists, CHEF prepares it for execution. If not, CHEF creates the corresponding package first. If the value is set to “False”, the package is created afresh, and it replaces the previously deployed package, if any.

ConnectionSet

The user can specify the connections he wants to include in the SSIS package.

Key: The attribute ‘Key’ is common to all the connections specified in the metadata. It identifies each connection uniquely.

The types of connections allowed include:

1. SQLConnection (trusted): The attributes include:

a)      ServerName: Name of the server the user is attempting to connect to.

 b)      DatabaseName: Name of the database in the server to which the user is attempting to connect.

 c)      TrustedConnection: This accepts “True” or “False”. “True” indicates a trusted connection and vice versa. 

2. SQLConnection (explicit authentication): The additional attributes it includes are:

a)      UserID: A valid user ID.

b)      Encrypted Password*: A valid encrypted password.

c)      Encrypt: Accepts “True” or “False”. If set to “True”, the connection will be encrypted by CHEF. 

3. TableStorageConnection#:

a)      DefaultEndPointsProtocol: The user can choose between “http” and “https”.

b)      AccountName: Account name of the user.

c)      EncryptedAccountKey*: Account key of the user in encrypted format.

d)      UseDevelopmentStorage: This accepts either “True” or “False”. “True” indicates that development storage will be used.

 

4. FileConnection and FlatFileConnection: Both contain the same attribute:

a)      FileName: The user can specify the complete file path

5. SMTPConnection: The attributes include:

a)      SmtpServer: The SMTP server.

b)      UseWindowsAuthentication: This accepts either “True” or “False”. “True” indicates that Windows authentication will be used to authenticate the user.

c)      EnableSsl: This again can accept “True” or “False”. “True” indicates that SSL will be enabled in the SMTP connection.

 

Note :*The user can use the CHEF AdminUtility to encrypt Password/AccountKey.

#The SQL Azure Database service is only available with TCP port 1433. The user should ensure that his/her firewall allows outgoing TCP communication on TCP port 1433. SQL Azure does not support Windows Authentication. The Trusted Connection will always be set to “False”. SQL Azure doesn’t support unencrypted connections. The user needs to specify in his/her connection string that he/she wants to encrypt the connection. Connecting to SQL Azure by using OLE DB is not supported.

 Variables

The user can specify all the variables that he would require in his package.

Variable: Describes each variable. The attributes include:

a)      Name: Name of the variable.

b)      DataType: Data type of the variable.

c)      Value: Initial value of the variable.

 SetVariables

SetVariable: Describes each set variable. A set variable can hold a result set which is the output of a SQL select command. Set variables can be placed inside a particular task as well. The attributes include:

a)      SQLStatement: The Select statement.

b)      TargetConnection: The ‘Key’ for the source connection to which the select statement is addressed.

ResultSet: Specifies a variable to hold each result set. The attributes include:

a)      VariableName: Name of the variable.

b)      Order: Specifies the order in which the result set is supposed to be assigned. Order starts from “0”.

Step

‘Step’ represents each package step. Each step can include a set of tasks. The attributes include:

a)      ID: Unique ID for the step.

b)      Name: Name of the step.

c)      TypeID: ID for the step type.

d)      TypeName: Name for the step type.

e)      Description: Description of the step.

SUPPORTED TASKS

 DataFlowSet

This specifies a set of data flows that the user wants to include in the functionality of the package. The attributes include:

a)        Name: The name of the data flow set; should be unique.

b)        SourceConnection: The key corresponding to a connection in the connection set.

c)         TargetConnection: The key corresponding to a connection in the connection set.

d)        SourceType: The type of source specified for the data flow set. The allowed types are:

                                                   i.                  Folder

                                                 ii.                  Excel

                                               iii.                  FlatFile

                                               iv.                  Table

                                                 v.                  SELECTSQL

                                               vi.                  TableStorage

e)        TargetType: The type of target specified for the data flow set. The allowed types are:

                                                   i.                  Excel

                                                 ii.                  FlatFile

                                               iii.                  Table

                                               iv.                  TableStorage

f)          PickColumnsFromTarget: Accepts “True” or “False”.

g)        RunParallel: Accepts “True” or “False”. “True” indicates that all the data flows specified in the set will take place in parallel when the package is executed. This is desired when there are no precedence constraints. “False” indicates that the data flows would take place sequentially, meaning that they are bound by precedence constraints.

h)        TruncateOrDeleteBeforeInsert: Accepts “Truncate”, “Delete”, or “None”. The specified action is performed on the target before data flow takes place.

i)          DeleteFilterClause: The delete clause, if any, used to filter the rows to be deleted.

Additional attributes allowed for ‘FlatFileConnection’ type source connection:

j)          ColumnDelimeter: Allowed types are “t” (tab), “,” (comma), “;” (semi colon), “|” (horizontal bar).

k)         RowDelimeter: Allowed types are “t” (tab), “,” (comma), “;” (semi colon), “|” (horizontal bar).

l)          IsColumnNamesInFirstDataRow: Accepts “True” or “False”.

m)      AllowFlatFileTruncate: Accepts “True” or “False”.

n)        AbortFlatFileLoadPostError: Accepts “True” or “False”.

DataFlow: Specifies an individual data flow. The attributes include:

a)      Name: Name of the data flow.

b)      SourceName: Source name.

c)      TargetName: Target name. 

 SQLTaskSet

This specifies a set of SQL tasks that the user wants to include in the functionality of the package. The attributes include:

a)      Name: Name of the SQL task set.

b)      TargetConnection: Target on which the SQL statement is attempted.

c)      RunParallel: “True” for parallel execution of SQL tasks, “False” for sequential execution.

SQLTask: Specifies an individual SQL task. The attributes include:

a)      Name: Name of the SQL task; should be unique.

b)      SQLStatement:The SQL statement. 

SendMailTask

a)      Name: Name of the task; should be unique.

b)      SMTPServer: ‘Key’ of the SMTPConnection.

c)      From: Sender’s email ID.

d)      To: Recipient’s email ID

e)      CC: Carbon copy recipients.

f)       BCC: Blind carbon copy recipients.

g)      Subject: Subject of the email.

h)      Priority: Accepts “Low”, “High”, or “Normal”.

i)        MessageSourceType: Accepts “DirectInput”, “Variable”, or “FileConnection”.

j)        MessageSource: Source of the message depending on the type.

Attachments: The files to be sent as attachments, file names separated by vertical bars ‘|’.

 TableStorageSet

This allows the user to load data into the Azure. The attributes include:

a)      Name: Name of the table storage task; should be unique.

b)      SourceConnection: The ‘Key’ to source connection.

c)      TargetConnection: The ‘Key’ to target connection.

d)      SourceType: Type of the source out of the allowed types.

e)      TargetType: Type of the target out of the allowed types.

TableStorage: Specifies individual table storage. The attributes include:

a)      TableName: Name of the table.

b)      SourceName: Name of the source.

c)      TargetName: Name of the target.

d)      PartitionKey: The partition key for storage.

e)  RowKey:The row key. 

  PackageExecution

The package execution task allows the user to execute a package from within the package being created. The attributes include:

a)      Name: Name of the package execution task; should be unique.

b)      Connection: ‘Key’ of the FileConnection specifying location of the package.

c)      PackageName: Name of the package to be executed.

Last edited Apr 9, 2013 at 12:46 PM by rjsingh, version 2

Comments

No comments yet.