Table of Contents
Section 1
Chapter 1: Moving Data with the Import and Export Wizard
Try It
Chapter 2: Installing SQL Server Integration Services
Chapter 3: Installing the Sample Databases
Try It
Chapter 4: Creating a Solution and Project
Try It
Chapter 5: Exploring SQL Server Data Tools
The Solution Explorer
Deployment Models
The Properties Window
The Toolbox
The SSDT Design Environment
Chapter 6: Creating Your First Package
Creating and Using Connection Managers
Using and Configuring Tasks
Exploring Package Encryption
Executing Packages
Try It
Chapter 7: Upgrading Packages to SQL Server 2012
Try It
Chapter 8: Upgrading to the Project Deployment Model
Try It
Section 2
Chapter 9: Using Precedence Constraints
Try It
Chapter 10: Manipulating Files with the File System Task
Try It
Chapter 11: Coding Custom Script Tasks
Try It
Chapter 12: Using the Execute SQL Task
Try It
Chapter 13: Using the Execute Process Task
Try It
Try It
Chapter 5: Exploring SQL Server Data Tools
The Solution Explorer
Deployment Models
The Properties Window
The Toolbox
The SSDT Design Environment
Chapter 6: Creating Your First Package
Creating and Using Connection Managers
Using and Configuring Tasks
Exploring Package Encryption
Executing Packages
Try It
Chapter 7: Upgrading Packages to SQL Server 2012
Try It
Chapter 8: Upgrading to the Project Deployment Model
Try It
Section 3
Chapter 9: Using Precedence Constraints
Try It
Chapter 10: Manipulating Files with the File System Task
Try It
Chapter 11: Coding Custom Script Tasks
Try It
Chapter 12: Using the Execute SQL Task
Try It
Chapter 13: Using the Execute Process Task
Try It
Cache Modes
The Cache Connection Manager and Transform
Chapter 25: Auditing Data with the Row Count Transform
Try It
Chapter 26: Combining Multiple Inputs with the Union All Transform
Try It
Chapter 27: Cleansing Data with the Script Component
Try It
Chapter 28: Separating Data with the Conditional Split Transform
Try It
Chapter 29: Altering Rows with the OLE DB Command Transform
Try It
Chapter 30: Handling Bad Data with the Fuzzy Lookup
Try It
Chapter 31: Removing Duplicates with the Fuzzy Grouping Transform
Try It
Section 4
Chapter 32: Making a Package Dynamic with Variables
Try It
Chapter 33: Making a Package Dynamic with Parameters
Try It
Chapter 34: Making a Connection Dynamic with Expressions
Try It
Chapter 35: Making a Task Dynamic with Expressions
Try It
Section 5
Chapter 36: Loading Data Incrementally
Try It
Chapter 37: Using the CDC Components in SSIS
CDC Control Task
CDC Source Task
CDC Splitter Task
Try It
Chapter 38: Using Data Quality Services
Try It
Chapter 39: Using the DQS Cleansing Transform
Try It
Chapter 40: Creating a Master Package
Try It
Section 6
Chapter 41: Using Sequence Containers to Organize a Package
Try It
Chapter 42: Using For Loop Containers to Repeat Control Flow Tasks
Try It
Chapter 43: Using the Foreach Loop Container to Loop Through a Collection of Objects
Try It
Section 7
Chapter 44: Easing Deployment with Configuration Tables
Try It
Final Deployment
Chapter 45: Easing Deployment with Configuration Files
Try It
Final Deployment
Chapter 46: Configuring Child Packages
Configuring an Execute Package Task
Configuring a Child Package
Try It
Section 8
Chapter 47: Logging Package Data
Try It
Chapter 48: Using Event Handlers
Creating Event Handlers
Common Uses for Event Handlers
Try It
Chapter 49: Troubleshooting Errors
Working in the Progress Tab
Troubleshooting Steps
Try It
Chapter 50: Using Data Viewers
Try It
Chapter 51: Using Breakpoints
Try It
Section 9
Chapter 52: Creating and Configuring the SSIS Catalog
Creating the Catalog
Configuring the Catalog
Creating and Using Folders
Try It
Chapter 53: Deploying Packages to the Package Catalog
Using the Deployment Wizard
Deploying Packages in the Package Deployment Model
Try It
Chapter 54: Configuring the Packages
Creating Environments
Configuring the Package
Try It
Chapter 55: Configuring the Service
Try It
Chapter 56: Securing SSIS Packages
Securing Packages in the Package Deployment Model
Securing Packages in the Project Deployment Model
Try It
Chapter 57: Running SSIS Packages
Executing Packages in the Package Deployment Model
Running Packages in the Project Deployment Model
Try It
Chapter 58: Running Packages in T-SQL and Debugging Packages
Running the Package
Debugging When Something Goes Wrong
Try It
Chapter 59: Scheduling Packages
Using Proxy Accounts
Try It
Section 10
Chapter 60: Dimension Load
Try It
Chapter 61: Fact Table Load
Try It
Section 11
Chapter 62: Bringing It All Together
Lesson Requirements
Hints
Step-by-Step
Appendix A: SSIS Component Crib Notes
When to Use Control Flow Tasks
When to Use Data Flow Transforms
Appendix B: Problem and Solution Crib Notes
Appendix C: What’s on the DVD?
System Requirements
Using the DVD
What’s on the DVD
Troubleshooting
Customer Care
Preface
John Wiley & Sons, Inc. End-User License Agreement
SQL Server Integration Services (SSIS) is one of the most powerful applications in your arsenal for moving data in and out of various databases and files. Like the rest of the business intelligence (BI) suite that comes with SQL Server, SSIS is already included in your SQL Server license when you pay for the Standard, BI, or Enterprise editions of SQL Server. Even though SSIS is included in SQL Server, you don’t even need to have SQL Server installed to make it function. Because of that, even if your environment is not using a lot of SQL Server, you can still use SSIS as a platform for data movement.
Though ultimately this book is more interactive in nature, this introduction first walks you through a high-level tour of SSIS so you have a life preserver on prior to jumping in the pool. Each topic touched on in this introduction is covered in much more depth throughout the book in lesson form and in the supporting videos on the DVD.
If you need to move data quickly from almost any data source to a destination, you can use the SSIS Import and Export Wizard (shown in Figure 1). The wizard is a quick way to move the data and perform very light transformations of data, such as casting of the data into new data types. You can quickly check any table you want to transfer, as well as write a query against the data to retrieve only a selective amount of data.
SQL Server Data Tools (SSDT) is the central tool that you’ll spend most of your time in as an SSIS developer (really as a SQL Server developer). Like the rest of SQL Server, the tool’s foundation is the Visual Studio 2010 interface (shown in Figure 2), and SSDT is installed when you install SQL Server 2012. The nicest thing about the tool is that it’s not bound to any particular SQL Server. In other words, you won’t have to connect to a SQL Server to design an SSIS package. You can design the package disconnected from your SQL Server environment and then deploy it to your target SQL Server or the filesystem on which you’d like it to run.
Although SSIS has been a major extraction, transformation, and loading (ETL) platform for several releases of SQL Server, SQL Server 2012 has simplified the platform for developers and administrators. Because of its scalability and lower cost, SSIS is also a major player in the ETL market. What’s especially nice about SSIS is its price tag, which is free with the purchase of SQL Server. Other ETL tools can cost hundreds of thousands of dollars based on how you scale the software.
The SSIS architecture consists of five main components:
Let’s boil this down to the essentials that you need to know to do your job. The SSIS service (for packages running in legacy mode) and now the SSIS catalog handle the operational aspects of SSIS. The service is a Windows service that is installed when you install the SSIS component of SQL Server 2012, and it tracks the execution of packages (a collection of work items) and helps with the storage of the packages. You don’t need the SSIS service to run SSIS packages, but if the service is stopped, all the SSIS packages that are currently running will, in turn, stop by default.
This service is mainly used for packages stored in the older style of storing packages, the package deployment model. The new model, the project deployment model, uses something called the package catalog. The catalog is the newer way of storing packages that gives you many new options, like running packages with T-SQL. The catalog also stores basic operational information about your package.
The SSIS runtime engine and its complimentary programs actually run your SSIS packages. The engine saves the layout of your packages and manages the logging, debugging, configuration, connections, and transactions. Additionally, it manages handling your events to send you e-mails or log in to a database when an event is raised in your package. The runtime executables provide the following functionality to a package; these are discussed in more detail throughout this book:
A core component of SSIS is the notion of a package. A package best parallels an executable program in Windows. Essentially, a package is a collection of tasks that execute in an orderly fashion. Precedence constraints help manage the order in which the tasks will execute. A package can be saved onto a SQL Server, which in actuality is saved in the msdb or package catalog database. It can also be saved as a .dtsx file, which is an XML structured file much like .rdl files are to Reporting Services. The end result of the package looks like what’s displayed in Figure 2, which was shown earlier.
A task can best be described as an individual unit of work. Tasks provide functionality to your package, much like a method does in a programming language. A task can move a file, load a file into a database, send an e-mail, or write a set of .NET code for you, to name just a few of the things it can do. A small subset of the common tasks available to you comprises the following:
These are only a few of the many tasks you have available to you. You can also write your own task or download a task from the web that does something else. Writing such a task only requires that you learn the SSIS object model and know VB.NET or C#. You can also use the Script Task to do things that the native tasks can’t do.
Once you create a Data Flow Task, the Data Flow tab in SSDT is available to you for design. Just as the Control Flow tab handles the main workflow of the package, the Data Flow tab handles the transformation of data. Every package has a single Control Flow, but can have many Data Flows. Almost anything that manipulates data falls into the Data Flow category. You can see an example of a Data Flow in Figure 3, where data is pulled from an OLE DB Source and transformed before being written to a Flat File Destination. As data moves through each step of the Data Flow, the data changes based on what the transform does. For example, in Figure 3, a new column is derived using the Derived Column Transform and that new column is then available to subsequent transformations or to the destination.
You can add multiple Data Flow Tasks onto the Control Flow tab. You’ll notice that after you click on each one, it jumps to the Data Flow tab with the Data Flow Task name you selected in the drop-down box right under the tab. You can toggle between Data Flow Tasks easily by selecting the next Data Flow Task from that drop-down box.
A source is where you specify the location of your source data to pull from in the data flow. Sources will generally point to a connection manager in SSIS. By pointing them to the connection manager, you can reuse connections throughout your package because you need only change the connection in one place. Here are some of the common sources you’ll be using in SSIS:
Inside the Data Flow, destinations accept the data from the data sources and from the transformations. The flexible architecture can send the data to nearly any OLE DB–compliant data source or to a flat file. Like sources, destinations are managed through the connection manager. Some of the more common destinations in SSIS and available to you are as follows:
Transformations (or transforms) are a key component to the Data Flow that change the data to a format that you’d like. For example, you may want your data to be sorted and aggregated. Two transformations can accomplish this task for you. The nicest thing about transformations in SSIS is they are all done in-memory, and because of this they are extremely efficient. Memory handles data manipulation much faster than disk IO does, and you’ll find if disk paging occurs, your package that ran in 20 minutes will suddenly take hours. Here are some of the more common transforms you’ll use on a regular basis:
The features in SSIS and SQL Server that are available to you vary widely based on what edition of SQL Server you’re using. As you can imagine, the higher-end edition of SQL Server you purchase, the more features are available. As for SSIS, you’ll have to use at least the Standard Edition to receive the bulk of the SSIS features. In the Express and Workgroup Editions, only the Import and Export Wizard is available to you. You’ll have to upgrade to the Enterprise or Developer Editions to see some features in SSIS. The advanced transformations available only with the Enterprise Edition are as follows:
This introduction exposed you to the SQL Server Integration Services (SSIS) architecture and some of the different elements you’ll be dealing with in SSIS. Tasks are individual units of work that are chained together with precedence constraints. Packages are executable programs in SSIS that are a collection of tasks. Finally, transformations are the Data Flow items that change the data to the form you request, such as sorting the data the way you want. Now that the overview is out of the way, it’s time to start the first section and your first set of lessons, and time for you to get your hands on SSIS.