Why Your Organization Should Be Concerned About Data Integrity
October 7, 2019
Why Data Integrity Is Crucial to Your Organization
Data is currently being generated and collected at unprecedented rates. According to IBM, 90% of the world’s data was created in the last two years. Every business activity now leaves a digital trace. As a result of these rapid technological advances, data is now widely recognized as the new commodity in the global economy.
In this article, I will discuss the following:
- Definitions: Data quality, data integrity and data security
- Why is data integrity important?
- Threats to data integrity
- Data integrity management
- How to ensure data integrity and data security
- Limitations of traditional data quality management systems
- Why your organization needs to integrate its systems and prioritize data integrity
Organizations rely on high-quality data for business development and decision-making. They begin by deciding how they want to use the data. Then, they identify and define the exact data characteristics required to achieve their desired insights. Data quality is determined by how well the data fits its intended purpose within a specific context.
High-quality data helps businesses increase profits and operational efficiency, improve product and service offerings, increase customer satisfaction, and even develop algorithms for machine learning and artificial intelligence.
To achieve these outcomes, raw data must undergo different processes to transform it into practical formats for real-world usage.
Data Integrity and Data Security
For data to be shared and utilized productively, it typically passes through several stages between input and retrieval, such as storage, transfer, and processing.
If data remains intact and does not undergo unintended alterations across these stages, it has data integrity.
According to Wikipedia, “Data integrity is the maintenance of, and the assurance of the accuracy and consistency of, data over its entire life-cycle, and is a critical aspect to the design, implementation, and usage of any system which stores, processes, or retrieves data.”
The term “data integrity” can refer to a state, process, or function.
- As a state or condition, it is a measure of the validity and accuracy of data. It has two aspects:
  - Physical integrity – correctly storing and fetching the data
  - Logical integrity – correctness or rationality of the data in a particular context
- As a process, it includes measures like data validation to ensure that data remains unaltered in transit from input to retrieval.
- As a function, it is related to security: it maintains information exactly as it was input, and it can be audited to affirm reliability.
Data security is the act of protecting data against unauthorized access or corruption. Data security enables data integrity, and both are essential for achieving high data quality.
The Importance of Data Integrity
When data is corrupted as it is processed or modified, its value and usability are affected. For example:
- Application code that has been unintentionally altered can stop the application from functioning correctly.
- If system logs are inaccurate and inconsistent, it would be impossible to detect intrusions and system changes.
- A system integration cannot succeed if data is incorrect, inaccurate or incomplete.
When data integrity is not maintained, time is wasted fixing mistakes, the company’s reputation and brand are put at risk, and relationships with partners and stakeholders are harmed. Compromised data can even have fatal consequences for an organization.
Information with data integrity is:
- Retrievable and searchable – there is proper access to accurate data in the right location at the right time
- Traceable – there are accurate records of all customer touchpoints
- Reliable – there are consistent metrics against the organization’s goals
- Valid and accurate, thus increasing stability and performance
Organizations should therefore strive to ensure 100% data integrity.
Data integrity is also protected by data legislation such as the EU’s GDPR and, in the US, the Federal Trade Commission Act, the Health Insurance Portability and Accountability Act, and the Electronic Communications Privacy Act.
For instance, Article 5(1)(d) of the GDPR directly relates to data accuracy and integrity – “Personal data shall be accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay”.
Meanwhile, Article 5(1)(f) addresses integrity and security: “Personal data shall be processed in a manner that ensures appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organizational measures.”
Examples of organizations that have acknowledged the importance of data integrity include pharmaceutical manufacturers in the US and other parts of the world, consumer healthcare companies, and cloud-based platform providers.
Data Integrity – Threats and Challenges
Data integrity is compromised when any of the following occurs:
1. Transfer errors
This is when data is compromised during transfer between different computing systems, for example through unintended changes when information is replicated or converted from one message format to another.
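To illustrate how a format conversion can silently alter data, here is a minimal Python sketch of a round-trip check: it writes records to CSV, parses them back, and reports every value that changed. The record layout and the `csv_round_trip_diff` function are hypothetical examples, not part of any specific integration product.

```python
import csv
import io


def csv_round_trip_diff(records: list[dict]) -> list[tuple]:
    """Write records to CSV, read them back, and list every value that changed.

    Each difference is (row index, field name, original value, value after the round trip).
    """
    if not records:
        return []
    fieldnames = list(records[0].keys())

    # Sending system: serialize the records into a CSV message.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)

    # Receiving system: parse the CSV message back into dicts.
    received = list(csv.DictReader(io.StringIO(buffer.getvalue())))

    differences = []
    if len(received) != len(records):
        differences.append((None, "<row count>", len(records), len(received)))
    for index, (before, after) in enumerate(zip(records, received)):
        for field in fieldnames:
            if before.get(field) != after.get(field):
                differences.append((index, field, before.get(field), after.get(field)))
    return differences


orders = [{"order_id": "A-100", "quantity": 3, "unit_price": "9.99"}]
for row, field, before, after in csv_round_trip_diff(orders):
    print(f"row {row}: {field} changed from {before!r} to {after!r}")
# prints: row 0: quantity changed from 3 to '3'
```

Because CSV carries every value as text, the integer quantity comes back as the string '3'. Running a check like this before go-live can surface type and formatting changes that would otherwise pass unnoticed.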
2. Human errors
This includes programming errors, design flaws, and inaccuracies when entering information in specific data fields. Multi-user environments such as databases are more prone to human errors as different people in different departments can access and input information.
3. Security errors
This includes security misconfigurations, software bugs, viruses and malware, and cyber threats like malicious breaches or hacks. It is impossible to achieve data integrity without data security.
4. Hardware failure
This compromises the physical integrity of data and includes mechanical flaws, device or disk crashes, corrosion, power outages and environmental hazards.
Data Integrity Management
Data integrity management is an ongoing process that requires clear guidelines, strictly implemented throughout the organization. It involves the following steps:
1. Define and identify data quality objectives
What kind of results are you looking to achieve? Identify key attributes and specifications required to achieve these results.
2. Develop a data integrity policy and process
A data integrity policy outlines clear protocols on how to handle data and how to ensure data quality and reliability. It includes data validation processes and might also include a data security policy. There should also be periodic reevaluations to ensure the validity of the processes.
How to Ensure Data Integrity
1. Data backup
Back up and archive data (including metadata) regularly on a pre-set schedule in a secure location to prevent permanent data loss. Ensure your archives are validated, secured and maintained throughout the data life cycle. During internal audits, verify that all data is retrievable.
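Verifying that backups are complete and retrievable can be partly automated. The Python sketch below assumes file-based data and a local backup directory (both hypothetical, as are the function names). It copies the files, stores a SHA-256 digest for each one in a manifest, and later recomputes the digests to find files that are missing or have changed.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> dict[str, str]:
    """Copy every file under source into backup_dir; return {relative path: digest}."""
    manifest = {}
    for file in sorted(source.rglob("*")):
        if file.is_file():
            relative = file.relative_to(source)
            target = backup_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)  # copy2 also tries to keep file metadata such as timestamps
            manifest[relative.as_posix()] = sha256_of(file)
    return manifest


def verify_backup(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing from backup_dir or no longer match."""
    return [
        relative
        for relative, expected in manifest.items()
        if not (backup_dir / relative).is_file() or sha256_of(backup_dir / relative) != expected
    ]


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "prices.csv").write_text("sku,price\nA1,9.99\n")
    manifest = back_up(Path(src), Path(dst))
    print(verify_backup(Path(dst), manifest))  # []
    Path(dst, "prices.csv").write_text("sku,price\nA1,0.99\n")  # simulate silent corruption
    print(verify_backup(Path(dst), manifest))  # ['prices.csv']
```

Storing the manifest separately from the backup itself makes the audit step meaningful: a corrupted or incomplete archive no longer matches the digests recorded when it was created.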
2. Data duplication
Duplicate files can skew data and result in inaccurate records. Sensitive data from a secure database could also be replicated in a shared folder and become viewable by people without access privileges. It is therefore important to eliminate duplicate information, as it can cause real harm.
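Exact duplicates are easy to spot, but near-duplicates that differ only in capitalization or stray whitespace are common in multi-user systems. A minimal Python sketch, assuming records are dicts and that the chosen key fields identify a record (the field names and functions are hypothetical), groups records by a normalized key:

```python
from collections import defaultdict


def normalize(record: dict, key_fields: tuple[str, ...]) -> tuple:
    """Build a comparison key: trim whitespace and ignore case in the key fields."""
    return tuple(str(record.get(field, "")).strip().lower() for field in key_fields)


def find_duplicates(records: list[dict], key_fields: tuple[str, ...]) -> list[list[int]]:
    """Return groups of record indexes that share the same normalized key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[normalize(record, key_fields)].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


customers = [
    {"name": "Acme Corp", "email": "billing@acme.example"},
    {"name": "ACME Corp ", "email": "Billing@Acme.example"},
    {"name": "Globex", "email": "ap@globex.example"},
]
print(find_duplicates(customers, ("name", "email")))  # [[0, 1]]
```

The function only reports candidate groups; deciding which record survives (and merging any fields that differ) is usually a business decision rather than something to automate blindly.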
3. Input validation
Whether your data source is known or unknown (e.g., an end user, another application, or any other source), it is crucial to verify and validate incoming data to prevent incorrect data entry.
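Input validation usually means checking that required fields are present, that values have the expected type and format, and that they fall within allowed ranges before anything is written to the database. The sketch below applies those checks to a hypothetical order record; the field names, the 1 to 10,000 quantity range, and the deliberately loose email pattern are assumptions for illustration, not a complete validation scheme.

```python
import re
from datetime import date

# Deliberately loose: something@something.something with no spaces. Not full RFC 5322.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("customer_email", "quantity", "order_date")


def validate_order(raw: dict) -> list[str]:
    """Return a list of validation errors for an incoming order; empty means valid."""
    errors = [f"{field} is required" for field in REQUIRED_FIELDS if raw.get(field) in (None, "")]
    if errors:
        return errors  # no point checking the format of fields that are missing

    if not EMAIL_PATTERN.match(str(raw["customer_email"])):
        errors.append("customer_email is not a valid email address")

    try:
        quantity = int(str(raw["quantity"]))  # str() first so 2.5 is rejected, not truncated
    except ValueError:
        errors.append("quantity must be a whole number")
    else:
        if not 1 <= quantity <= 10_000:
            errors.append("quantity must be between 1 and 10000")

    try:
        date.fromisoformat(str(raw["order_date"]))
    except ValueError:
        errors.append("order_date must be an ISO date (YYYY-MM-DD)")

    return errors


print(validate_order({"customer_email": "buyer@example.com", "quantity": "5", "order_date": "2019-10-07"}))  # []
print(validate_order({"customer_email": "buyer@", "quantity": "-1", "order_date": "07/10/2019"}))
```

Returning every error at once, rather than stopping at the first, lets the source (a user or another application) fix all problems in a single pass.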
4. Data validation
Validate data by identifying errors in data transmission and certifying that data processes are uncorrupted. Your data integrity management system should build error checking and data validation into the database design and run them on an ongoing basis.
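One common way to certify that a batch of data arrived uncorrupted is reconciliation: compare control totals, such as the row count, the sum of an amount column, and the set of record keys, between the source and target systems. The sketch below assumes each batch is a list of dicts with hypothetical `id` and `amount` fields; amounts are summed as `Decimal` so that floating-point rounding cannot produce false mismatches.

```python
from decimal import Decimal


def control_totals(rows: list[dict], amount_field: str, key_field: str) -> dict:
    """Summarize a batch: row count, sum of an amount column, and the set of keys."""
    return {
        "row_count": len(rows),
        "amount_total": sum((Decimal(str(row[amount_field])) for row in rows), Decimal("0")),
        "keys": {row[key_field] for row in rows},
    }


def reconcile(source_rows: list[dict], target_rows: list[dict],
              amount_field: str = "amount", key_field: str = "id") -> list[str]:
    """Compare source and target control totals; return a list of discrepancies."""
    src = control_totals(source_rows, amount_field, key_field)
    tgt = control_totals(target_rows, amount_field, key_field)
    issues = []
    if src["row_count"] != tgt["row_count"]:
        issues.append(f"row count {src['row_count']} != {tgt['row_count']}")
    if src["amount_total"] != tgt["amount_total"]:
        issues.append(f"amount total {src['amount_total']} != {tgt['amount_total']}")
    missing = src["keys"] - tgt["keys"]
    if missing:
        issues.append(f"missing keys: {sorted(missing)}")
    return issues


source = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "4.25"}]
target = [{"id": 1, "amount": "10.50"}]
print(reconcile(source, target))
```

Control totals are cheap to compute on both sides, so they can run after every transfer; a mismatch tells you which batch to investigate before the bad data spreads further.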
A different but related step is data cleansing. This is a data quality enhancement method that detects errors and corrects or removes them from the data set, in order to ensure consistency between batches of data; for instance, standardizing all product data to a selected classification such as UNSPSC. There are also error-detecting algorithms that catch common kinds of data corruption: check-digit schemes such as the Damm algorithm or the Luhn algorithm detect human errors made during manual transcription from one system to another, while hash functions detect computer-induced transcription errors.
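Check-digit schemes like these are short enough to show in full. The Python sketch below implements the Luhn algorithm and the Damm algorithm, assuming the input is a string of decimal digits (the function names are my own). Both detect errors rather than correct them: a failed check tells you a number was mistyped, not what the correct number is.

```python
# The standard Damm operation table: a totally anti-symmetric quasigroup of order 10.
DAMM_TABLE = (
    (0, 3, 1, 7, 5, 9, 8, 6, 4, 2),
    (7, 0, 9, 2, 1, 5, 4, 8, 6, 3),
    (4, 2, 0, 6, 8, 7, 1, 3, 5, 9),
    (1, 7, 5, 0, 9, 8, 3, 4, 2, 6),
    (6, 1, 2, 3, 0, 4, 5, 9, 7, 8),
    (3, 6, 7, 4, 2, 0, 9, 5, 8, 1),
    (5, 8, 6, 9, 7, 2, 0, 1, 3, 4),
    (8, 9, 4, 5, 3, 6, 2, 0, 1, 7),
    (9, 4, 3, 8, 6, 1, 7, 2, 0, 5),
    (2, 5, 8, 1, 4, 3, 6, 7, 9, 0),
)


def _luhn_total(digits: str) -> int:
    """Sum the digits, doubling every second digit counting from the rightmost."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total


def luhn_is_valid(number: str) -> bool:
    return _luhn_total(number) % 10 == 0


def luhn_check_digit(payload: str) -> int:
    # A placeholder 0 puts the payload digits in the positions they will occupy.
    return (10 - _luhn_total(payload + "0") % 10) % 10


def damm_check_digit(payload: str) -> int:
    interim = 0
    for char in payload:
        interim = DAMM_TABLE[interim][int(char)]
    return interim


def damm_is_valid(number: str) -> bool:
    # Running a valid number (payload plus check digit) through the table ends at 0.
    return damm_check_digit(number) == 0


print(luhn_is_valid("79927398713"))  # True
print(luhn_is_valid("79927398731"))  # False: the last two digits were swapped
print(damm_check_digit("572"))       # 4
print(damm_is_valid("5742"))         # False: the last two digits were swapped
```

The Luhn algorithm catches every single-digit error and most adjacent transpositions (but not 09 ↔ 90); the Damm algorithm catches all single-digit errors and all adjacent transpositions.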
How to Ensure Data Security
1. Data loss prevention
Ensure your organization has a disaster recovery plan that considers how quickly functions can be restored. Data can be protected during network downtime or power outages with solutions like redundant hardware, uninterruptible power supplies, alternate power sources, and battery-powered data recorders.
2. Access controls
This refers to who can access specific data and includes the assignment of read and write privileges. In a least privilege model, only users who need access to the data are granted it. This is one of the most effective ways to prevent malicious or accidental corruption of data.
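A least privilege model can be expressed as deny-by-default permission checks. In the minimal Python sketch below, the roles, dataset names, and grants are hypothetical; anything not explicitly granted is refused.

```python
from enum import Flag


class Permission(Flag):
    NONE = 0
    READ = 1
    WRITE = 2


# Hypothetical grants: each (role, dataset) pair gets only what the job requires.
GRANTS = {
    ("analyst", "sales_history"): Permission.READ,
    ("analyst", "price_lists"): Permission.READ,
    ("pricing_manager", "price_lists"): Permission.READ | Permission.WRITE,
}


def is_allowed(role: str, dataset: str, needed: Permission) -> bool:
    """Deny by default: a request succeeds only if every needed permission was granted."""
    granted = GRANTS.get((role, dataset), Permission.NONE)
    return needed in granted


print(is_allowed("analyst", "price_lists", Permission.READ))            # True
print(is_allowed("analyst", "price_lists", Permission.WRITE))           # False
print(is_allowed("pricing_manager", "sales_history", Permission.READ))  # False: never granted
```

Keeping grants as data, rather than scattering checks through the code, also makes them easy to review during an access audit.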
Be careful not to overlook physical access to the server or database. Your most sensitive servers should be almost impossible to access, securely isolated, and bolted to the ground. Only the highest-privileged user (usually the sysadmin) should have the access key.
3. Data encryption
When data is encrypted, files cannot be read without the decryption key. This protects information during data leaks (e.g., a hacker who steals encrypted files cannot read them without the key) but cannot protect against compromised user accounts (e.g., if the sysadmin account is hacked, the attacker can use it to access the decrypted files).
4. Audit trails
An audit trail is an inerasable record of all activity on the data in a system, which provides breadcrumbs to accurately track down the source of an issue. An audit trail is generated automatically and cannot be modified by users. Every action or event (created, deleted, read, modified) is tracked, recorded, and time-stamped. For instance, in law firms, an audit trail records which user modified or deleted a certain document, when it was done, and what the modifications were.
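Application code alone cannot make a log impossible to modify, but it can make tampering detectable. A common technique is a hash chain: each entry stores a SHA-256 hash of its own contents, including the hash of the previous entry, so altering an earlier entry breaks verification unless every later entry is rewritten too. The `AuditTrail` class below is a hypothetical, in-memory sketch of that idea; a real audit trail would also need write-once storage, access controls, and a copy of the latest hash kept somewhere separate.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log in which each entry is chained to the hash of the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def record(self, user: str, action: str, document: str, detail: str = "") -> None:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,  # e.g. created, read, modified, deleted
            "document": document,
            "detail": detail,
            "previous_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry["hash"] = self._hash(entry)
        self.entries.append(entry)

    @staticmethod
    def _hash(entry: dict) -> str:
        # Hash every field except the entry's own hash, serialized as canonical JSON.
        body = {key: value for key, value in entry.items() if key != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or deleting a non-final entry fails."""
        previous_hash = self.GENESIS
        for entry in self.entries:
            if entry["previous_hash"] != previous_hash or entry["hash"] != self._hash(entry):
                return False
            previous_hash = entry["hash"]
        return True


trail = AuditTrail()
trail.record("jdoe", "modified", "contract-17.docx", "changed clause 4")
trail.record("asmith", "deleted", "draft-3.docx")
print(trail.verify())  # True
trail.entries[0]["user"] = "someone_else"  # simulate tampering with history
print(trail.verify())  # False
```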
5. Enforce business rules and define integrity constraints
In a database system, data integrity is enforced by a series of integrity constraints or rules. In database integrity management, business rules specify conditions and relationships that must always be true or always be false. In the relational data model, the types of integrity constraints include entity integrity, referential integrity, domain integrity, and user-defined integrity.
Wikipedia defines these as follows:
- Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key should be unique and not null.
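- Referential integrity is an integrity rule that states that any foreign-key value can only be in one of two states: usually it refers to a primary-key value of some table in the database; occasionally, depending on the data owner’s rules, it can be null, meaning that either there is no relationship between the objects represented in the database or that this relationship is unknown.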
- Domain integrity specifies that all columns in a relational database must be declared upon a defined domain – i.e., a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.
- User-defined integrity refers to a set of rules specified by a user, which do not belong to the entity, domain and referential integrity categories.
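The four kinds of constraints can be declared directly in a database schema. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database; the `customer` and `price_quote` tables and the 50% maximum discount rule are hypothetical examples. Note that SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- entity integrity: unique, never null
    name        TEXT NOT NULL
);
CREATE TABLE price_quote (
    quote_id    INTEGER PRIMARY KEY,   -- entity integrity
    customer_id INTEGER REFERENCES customer (customer_id),  -- referential integrity (NULL allowed)
    currency    TEXT NOT NULL CHECK (currency IN ('EUR', 'USD')),    -- domain integrity
    list_price  NUMERIC NOT NULL CHECK (list_price > 0),             -- domain integrity
    discount    NUMERIC NOT NULL CHECK (discount BETWEEN 0 AND 0.5)  -- user-defined business rule
);
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO price_quote VALUES (1, 1, 'USD', 100, 0.2)")  # satisfies every rule

violations = [
    "INSERT INTO customer (customer_id, name) VALUES (1, 'Duplicate')",  # repeated primary key
    "INSERT INTO price_quote VALUES (2, 99, 'EUR', 100, 0.1)",  # customer 99 does not exist
    "INSERT INTO price_quote VALUES (3, 1, 'GBP', 100, 0.1)",   # currency outside the domain
    "INSERT INTO price_quote VALUES (4, 1, 'EUR', 100, 0.9)",   # discount breaks the business rule
]
for statement in violations:
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as error:
        print("rejected:", error)
```

Each rejected statement raises `sqlite3.IntegrityError`, so the bad rows never reach the tables; the exact error text varies between SQLite versions. Leaving `price_quote.customer_id` nullable mirrors the rule above that a foreign key may be null when the relationship is absent or unknown.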
6. Measure and improve data quality management
Data quality management is not merely about quality control but also about quality assurance. Data quality control focuses on the quality of the data itself (as discussed above), while data quality assurance focuses on the quality of the process. This is about monitoring and improving processes within the system infrastructure, and it is sometimes associated with a broader quality system like the ISO 9000 family of standards.
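Which metrics to track depends on your quality objectives. The Python sketch below assumes two common ones: completeness (the share of rows where a field is filled in) and validity (the share of rows where the field is filled in and passes a rule). The field names, the positive-price rule, and the `quality_report` function are hypothetical.

```python
from typing import Callable


def quality_report(rows: list[dict], fields: list[str],
                   rules: dict[str, Callable[[object], bool]]) -> dict:
    """Per field, the share of rows (0.0 to 1.0) that are complete and that are valid."""
    total = len(rows)
    report = {}
    for field in fields:
        present = [row.get(field) for row in rows if row.get(field) not in (None, "")]
        rule = rules.get(field, lambda value: True)  # no rule means any present value is valid
        valid = [value for value in present if rule(value)]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "validity": len(valid) / total if total else 0.0,
        }
    return report


rows = [
    {"sku": "A1", "price": 9.99},
    {"sku": "A2", "price": -1},
    {"sku": "", "price": 4.5},
    {"sku": "A4", "price": None},
]
print(quality_report(rows, ["sku", "price"], {"price": lambda p: p > 0}))
```

Recording these figures for every batch over time turns quality control results into input for quality assurance, showing whether process changes actually improve the data.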
Data Integrity is Crucial to System Integration Management
System integration is the process of combining different IT systems and applications to act as a coordinated whole. It enables operations to become more agile and innovative, increasing value to both the customer and the organization by improving product quality and performance, increasing customer satisfaction, reducing operational costs and improving response time.
Most legacy systems are unable to cope with the sheer volume and type of data coming from disparate sources in today’s system architecture. Furthermore, data types and sources are becoming increasingly complex and unstructured (e.g., voice recognition, chatbots, natural language processing tools). Traditional data quality management systems are not sophisticated enough to handle such data.
For instance, built-in functions in software and applications only handle data within that application. Traditional software that operates across multiple applications and databases is expensive and generally only accessible to large organizations. Such software might not be capable of data validation and enrichment. It also tends to be reactive, fixing issues only after they happen rather than preventing future occurrences.
Organizations need to undergo digital transformation or be left behind. Existing systems need to integrate with new ones to ensure operational efficiency. Legacy systems and manual processes need to make way for cloud-based platforms and automation.
As it could be incredibly costly and time-consuming to replace all your existing IT systems, you might want to choose a solution that offers both system and data integration. Using a cloud-based platform makes it easy to migrate if your systems and applications change in the future. It is essential to ensure that your service provider is dedicated to improving data quality and keeps their systems updated to comply with data protection regulations.
In this article, I’ve explained why data quality and integrity are crucial to facilitate reliable decision-making in organizations. Your data quality strategy should be an essential part of your system integration and digital transformation strategy.
Is your organization considering a system integration? Pricefx integrates your legacy systems and new cloud-based systems and data sources, while maintaining the highest data integrity. Pricefx can be implemented with minimal downtime and lets your organization make a smooth transition to digital pricing management. Sign up for a demo and kick-start your digital transformation today.