The place where all your data comes together
The Staging Area is a “landing zone” for data flowing into your data warehouse environment. With the SEPIDATA Staging Area module we provide you with the first fundamental layer of your new datawarehouse solution. You can choose if data is kept until it is successfully loaded into the data warehouse or if you want to keep all RAW data and make the staging area persistent (PSA).
Flat file discovery wizard
Allows you to inspect and analyze multiple files at once. We have built an intelligent algorithm that can detect the columns and data types automatically and gives you the ability to profile the content of the file. Working with (very) large files is fully supported.
Database discovery wizard
After setting up the connection, the database discovery wizard allows you to select the tables, views and stored procedures that you would like to import. Within the wizard you can browse and data profile the content of these objects.
Data archiving (PSA) is out-of-the-box functionality. You do not have to code for this, just configure the settings. You can choose to version or snapshot the raw data. When versioning you can configure the type of changes that need to be captured (new, update, delete, revival).
Operators do not have to execute scripts on your production database. Automatic upgrade of your staging area is performed based on upgrade instructions that are part of deployment package of every new version of your solution.
Operations & monitoring
Data retrieval services for operational insights via modern REST API’s gives you the ability to integrate operational metrics within the standard tooling of your environment.
State of the art deployment
State of the art way of deployment, we deliver REST API’s combined with a PowerShell module and sample scripts for optimal automation of all the operational tasks.
Remove the burden of repetitive work of developing a staging area
Purpose of the staging area
The Staging Area is a “landing zone” for data flowing into your data warehouse environment. It is the first fundamental layer of your datawarehouse solution.
A staging area is mainly required in a Data Warehousing Architecture for timing reasons. All required data must be available before data can be integrated into the Data Warehouse. Due to varying reasons it is not feasible to extract all the data from all operational systems at exactly the same time.
Depending on the requirements / regulations that apply at your company, it is a choice to keep data in the staging area until it is successfully loaded into the data warehouse and then cleaned, or if the data in the staging area is persistent where data is stored and archived for a long period (PSA).
For example, it might be reasonable to extract CRM data on a daily basis, however, daily extracts might not be suitable for financial data that requires a month-end reconciliation process. Similarly, it might be feasible to extract “customer” data from a database in Singapore at noon eastern standard time (outside of local office hours), but this could not be feasible for “customer” data in a Chicago database at the same time (during local office hours).
The SEPIDATA architecture contains three important components.
First the Developer Studio integrated into Visual Studio that is the user interface for the developers where they can manage the complete life cycle of the datawarehouse.
Second is the SEPIDATA application server that is the background service / engine on which all processing is performed. The application server also exposes an API so operators can execute tasks on the appplication server and where runtime and data model information can be retrieved. The application server handles the deploy process on the database instance, it creates structures, upgrades them when there are new versions of the DWH models etc.
Third is the target SQL Server / Azure SQL database instance where the datawarehouse will be deployed. The database platform is configured during installation of the application server and contains the meta data needed for all the datawarehouse processing, stores the models, the ETL logic and documentation that has been created by the development team.
Logical staging area layers
With the SEPIDATA staging area, you start with defining the structure and content / data profiling of the source system. This is done with the Source Model project in Visual Studio. Where we have wizards to walk you trough the process of generating the models. Good to mention is that source model projects can be re-used by different staging area’s.
The Data Interface and Staging Area are combined in the staging area project type. The Data Interface contains the mapping between objects in the source that need to be transferred to the staging area. You can choose to get all data in or only add a selection of specific tables and/or attributes.
The Staging Area Project defines ”what” and ”how” data must be loaded into the staging area. Think about things like storing history or not (on object level), what type of mutations need to be captured (only new or also delete, update and or revived) and if it is a delta load or a full load delivery.
Last but not least you can add functional and technical documentation to the data models to capture all relevant information about the source system and the type of interface you just created.
Discover & import databases
Within Visual Studio we added a wizard that allows you to quickly add a new database based source and get the data into your datawarehouse. At the end of the wizard new projects are added to the solution and you can directly start deploying it to your development SQL server instance and run the staging area. Within the wizard you can:
- Explore tables, views and procedures.
- Select the objects that you want to bring into your data warehouse.
- Retrieve the DDL of the selected object.
- Preview data in the tables and views and use the pivot table functionality in the grid to filter and group data in the preview window.
- See how many rows they contain (tables only).
Discover & import flat files
The flat file discovery wizard is a powerful wizard that allows you to discover the structure and contents of flat files. We support importing flat files where the columns are separated or fixed width. If your files have a header row we can extract the names for you.
The wizard contains a data type scanner, which is able to detect most data types and formats and select the most efficient MSSQL datatype that can store the data. This is a unique feature that most vendors do not provide. It is that intelligent that it keeps track of data type decisions when the data type scanner needs to switch from data type for a certain column because for example it found a character value while all values before were of a numerical type. This allows you to detect issues and challenges with the flat files fast!
Analyzing and viewing (very) large files is fully supported!