For Data Quality, Talend provides a rich set of features and components out of the box. One simple example from this feature set is Validation Rules.
Validation Rules act as a filter to ensure that only valid data is processed as part of a job. A rule can be a simple value check or a referential integrity check, and it can be applied to any metadata item, including database tables.
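To make the difference between the two kinds of check concrete, a referential integrity check can be sketched in plain Java as a membership test against reference data. This is only an illustration of the idea, not code that Talend generates; the class name, method name, and state codes are hypothetical.

```java
import java.util.Set;

// Hypothetical illustration only; this is not code generated by Talend.
final class ReferenceCheck {
    // Assumed reference data: state codes present in the lookup table.
    static final Set<String> KNOWN_STATES = Set.of("CA", "NY", "TX");

    // A referential check passes only when the value exists in the reference set.
    static boolean stateExists(String stateCode) {
        return stateCode != null && KNOWN_STATES.contains(stateCode);
    }

    public static void main(String[] args) {
        System.out.println(stateExists("CA")); // prints "true"
        System.out.println(stateExists("ZZ")); // prints "false"
    }
}
```

A simple value check, by contrast, compares a column against fixed values and needs no second data set.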
It is best practice to define validation checks as Validation Rules in the Metadata repository. Doing so makes each rule visible to other team members and easy to reuse across jobs.
In the job below, all customer data is looked up against states data. Before the customer data is processed, Talend can apply a simple value-check Validation Rule.
The following example shows how to filter out invalid gender values in a customer data set.
Talend provides a dedicated wizard for configuring Validation Rules. Rules can be created at three levels: from the Validation Rules repository menu, on item-level metadata, or on column-level metadata.
Validation Rule Repository
Item Level Metadata
Column Level Metadata
This example creates a Validation Rule at the item level.
The wizard contains five steps to create a Validation Rule. The first step is to name the Validation Rule, give it a purpose and description:
Name, Purpose, Description
The next step is to select the items that the rule will validate:
Trigger & Rule Settings
Since this metadata describes a file, the update and delete triggers are not available. If it described a database table, those triggers would be available. This example performs a Custom Check:
For a Custom Check rule, a custom expression can be defined using the Expression Editor. For this dataset, a record is valid only if its gender value equals F or M:
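The logic of this check can be sketched in plain Java as follows. This is not the expression as entered in the Expression Editor, and the class and method names are hypothetical; it only shows the condition a record must satisfy, assuming the comparison is case-sensitive and that a missing value counts as invalid.

```java
// Hypothetical illustration of the rule's logic; not Talend-generated code.
final class GenderRule {
    // Valid only when the value is exactly "F" or "M".
    // Assumption: null, empty, and lowercase values are treated as invalid.
    static boolean isValidGender(String gender) {
        return "F".equals(gender) || "M".equals(gender);
    }

    public static void main(String[] args) {
        System.out.println(isValidGender("F")); // prints "true"
        System.out.println(isValidGender("X")); // prints "false"
    }
}
```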
The last step is to choose what action to take on the data if the rule defined in the previous step fails:
The Disallow the operation option will prevent data that fails the rule check from being output. When Make rejected data available on REJECT link in job design is checked, a new row link will become available on the component for rejects:
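Conceptually, enabling the reject link splits the incoming rows into two flows: rows that pass the rule and rows that fail it. A minimal Java sketch of that split is shown here, assuming rows are simple string arrays and the gender value sits at a known index. The class, its `Result` holder, and the `route` method are hypothetical and do not reflect how Talend implements the component.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of main/reject routing; not Talend-generated code.
final class RejectRouting {
    // Holds the two output flows: rows that passed and rows that failed.
    static final class Result {
        final List<String[]> accepted = new ArrayList<>();
        final List<String[]> rejected = new ArrayList<>();
    }

    // Sends each row to the accepted or rejected list based on the gender check.
    static Result route(List<String[]> rows, int genderIndex) {
        Result result = new Result();
        for (String[] row : rows) {
            String gender = row[genderIndex];
            if ("F".equals(gender) || "M".equals(gender)) {
                result.accepted.add(row);
            } else {
                result.rejected.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"1", "F"},
            new String[] {"2", "X"},
            new String[] {"3", "M"});
        Result result = route(rows, 1);
        System.out.println(result.accepted.size()); // prints "2"
        System.out.println(result.rejected.size()); // prints "1"
    }
}
```

Without the reject link, the failing rows would simply be dropped instead of collected in a second list.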
Once the Validation Rule has been created, it will appear under the Validation Rules repository menu:
If the rule was created at the item or column level, it will appear under a Validation Rules folder:
To use the Validation Rule, drag it onto the component or select it in the component parameters:
Finally, if the Validation Rule has been configured to make rejected data available, the reject output link can be connected like any other row link in the job: