The place where all your data comes together

The Staging Area is a “landing zone” for data flowing into your data warehouse environment. With the SEPIDATA Staging Area module we provide you with the first fundamental layer of your new datawarehouse solution. You can choose if data is kept until it is successfully loaded into the data warehouse or if you want to keep all RAW data and make the staging area persistent (PSA).

  • Our process:
  • Source discovery
  • Profile the data
  • Configure the staging area
  • Build, run and test it locally
  • Deploy

Flat file discovery wizard

Allows you to inspect and analyze multiple files at once. We have built an intelligent algorithm that can detect the columns and data types automatically and gives you the ability to profile the content of the file. Working with (very) large files is fully supported.

Database discovery wizard

After setting up the connection, the database discovery wizard allows you to select the tables, views and stored procedures that you would like to import. Within the wizard you can browse and data profile the content of these objects.

Automatic archiving

Data archiving (PSA) is out-of-the-box functionality. You do not have to code for this, just configure the settings. You can choose to version or snapshot the raw data. When versioning you can configure the type of changes that need to be captured (new, update, delete, revival).

Automatic upgrade

Operators do not have to execute scripts on your production database. Automatic upgrade of your staging area is performed based on upgrade instructions that are part of deployment package of every new version of your solution.

Operations & monitoring

Data retrieval services for operational insights via modern REST API’s gives you the ability to integrate operational metrics within the standard tooling of your environment.

State of the art deployment

State of the art way of deployment, we deliver  REST API’s combined with a PowerShell module  and sample scripts for optimal automation of all the operational tasks.

Remove the burden of repetitive work of developing a staging area

Purpose of the staging area

The Staging Area is a “landing zone” for data flowing into your data warehouse environment. It is the first fundamental layer of your datawarehouse solution.

A staging area is mainly required in a Data Warehousing Architecture for timing reasons. All required data must be available before data can be integrated into the Data Warehouse. Due to varying reasons it is not feasible to extract all the data from all operational systems at exactly the same time. 

Depending on the requirements / regulations that apply at your company, it is a choice to keep data in the staging area until it is successfully loaded into the data warehouse and then cleaned, or if the data in the staging area is persistent where data is stored and archived for a long period (PSA).

For example, it might be reasonable to extract CRM data on a daily basis, however, daily extracts might not be suitable for financial data that requires a month-end reconciliation process. Similarly, it might be feasible to extract “customer” data from a database in Singapore at noon eastern standard time (outside of local office hours), but this could not be feasible for “customer” data in a Chicago database at the same time (during local office hours).

Application architecture

The SEPIDATA architecture contains three important components. 

First the Developer Studio integrated into Visual Studio that is the user interface for the developers where they can manage the complete life cycle of the datawarehouse.

Second is the SEPIDATA application server that is the background service / engine on which all processing is performed. The application server also exposes an API so operators can execute tasks on the appplication server and where runtime and data model information can be retrieved. The application server handles the deploy process on the database instance, it creates structures, upgrades them when there are new versions of the DWH models etc.

Third is the target SQL Server / Azure SQL database instance where the datawarehouse will be deployed. The database platform is configured during installation of the application server and contains the meta data needed for all the datawarehouse processing, stores the models, the ETL logic and documentation that has been created by the development team.

Logical staging area layers

With the SEPIDATA staging area, you start with defining the structure and content / data profiling of the source system. This is done with the Source Model project in Visual Studio. Where we have wizards to walk you trough the process of generating the models. Good to mention is that source model projects can be re-used by different staging area’s.

The Data Interface and Staging Area are combined in the staging area project type. The Data Interface contains the mapping between objects in the source that need to be transferred to the staging area. You can choose to get all data in or only add a selection of specific tables and/or attributes.

The Staging Area Project defines ”what” and ”how” data must be loaded into the staging area. Think about things like storing history or not (on object level), what type of mutations need to be captured (only new or also delete, update and or revived) and if it is a delta load or a full load delivery. 

Last but not least you can add functional and technical documentation to the data models to capture all relevant information about the source system and the type of interface you just created.

Discover & import databases

Within Visual Studio we added a wizard that allows you to quickly add a new database based source and get the data into your datawarehouse. At the end of the wizard new projects are added to the solution and you can directly start deploying it to your development SQL server instance and run the staging area. Within the wizard you can:

  • Explore tables, views and procedures.
  • Select the objects that you want to bring into your data warehouse.
  • Retrieve the DDL of the selected object.
  • Preview data in the tables and views and use the pivot table functionality in the grid to filter and group data in the preview window.
  • See how many rows they contain (tables only).

Discover & import flat files

The flat file discovery wizard is a powerful wizard that allows you to discover the structure and contents of flat files. We support importing flat files where the columns are separated or fixed width. If your files have a header row we can extract the names for you.

The wizard contains a data type scanner, which is able to detect most data types and formats and select the most efficient MSSQL datatype that can store the data. This is a unique feature that most vendors do not provide. It is that intelligent that it keeps track of data type decisions when the data type scanner needs to switch from data type for a certain column because for example it found a character value while all values before were of a numerical type. This allows you to detect issues and challenges with the flat files fast!

Analyzing and viewing (very) large files is fully supported!

Detect data flaws early

Datawarehouse projects can be full of surprises, when developing layers beyond the staging area (for example a datamart). Data profiling will help you understand the data that will land into the staging area, it supports you during the design process of analytical data models and data transformation rules that need to be applied. We think it is very important to capture metrics on the quality and specifics of the data immediately when you add a new source to the datawarehouse solution.
 
Within the SEPIDATA Developer Studio you can run a data profile on your source data. We will store the profile results within your source model project. When you deploy to the SEPIDATA Application Server this information will also be available within our REST-FULL service so you can build a dashboard or analytics on top the profile metrics. If you refresh the data profiles every time the source model is released, you can spot trends by comparing metrics of different versions.
 
The insights you gain early from profiling the data of the source is important, because any problems that are the result of non-standardized data can be costly to correct at a later stage in the development process.

Out-of-the-box data archiving

SEPIDATA allows you to store all historical raw data. This is often called a persistent or historical staging area (PSA). The SEPIDATA software will manage the full life-cycle of your historical data including changes at the source that force you to change the staging area.
 
We have implemented a row versioning technique that captures all changed rows by comparing current and previous data deliveries and store only the changed rows.
 
You can configure what type of delta’s have to be checked (new, updates, deletes or a combination) and the system will automatically start building history for you.

Analyze the output of your work

Within the Developer Studio you can directly deploy and run the datawarehouse solution on your local developer machine. This means you can develop without any dependencies and without interfering with other developers.
 
After you walked through the discovery wizards and generated your first projects within your solution you start deploying the solution to your local SQL server instance (or a shared instance if this is preferred). When this is done you can start loading data into the staging area.
 
After loading and testing is done by the developer a release package can be created and deployed to the application server via the REST API. A PowerShell module and sample PowerShell scripts are provided, but you can use any scripting language or automation tool you like. Then your datawarehouse solution can start loading automatically (on a schedule or when the data is delivered).