Data integrity: Making sure things make sense
When the number of COVID-19 cases began to rise in March, Congressman Eric Go Yap announced that he had tested positive for the disease. This prompted the solon to issue a public apology, specifically to the people he interacted with at meetings he attended before he received his results. He admitted to feeling angry at himself and upset at the possibility that he might have infected others. The Presidential Security Group was reportedly eyeing possible charges against him because he violated protocols when he still attended a meeting in Malacañang while awaiting his results. However, days after his revelation, the Research Institute for Tropical Medicine (RITM) released a statement clarifying that Cong. Yap actually tested negative for COVID-19. Apparently, the initial report he received contained a clerical oversight resulting from an encoding error.
Not long after that incident, reports about discrepancies in the Department of Health's (DOH) COVID-19 patient data filled news sites. The University of the Philippines's COVID-19 response team identified errors ranging from differences in age and sex to false death reports. They stressed that these lapses may have significant implications for the reliability of the DOH's appreciation of the disease and the magnitude of its impact.
Both accounts describe a serious problem often encountered by those processing information: lack of data integrity. In these two cases, that entity would be the DOH.
Data integrity refers to the accuracy, consistency, and completeness of data. The term also describes the process of ensuring and preserving the reliability and validity of data over its entire life cycle.
Privacy legislation, including the country's Data Privacy Act of 2012 (DPA), provides for the protection of the integrity of personal data. The DPA, in particular, requires organizations handling personal data to keep such data accurate, relevant, and up-to-date in order to maintain their quality. And per the National Privacy Commission's Circular No. 16-04, a security incident involving the alteration of personal data is considered an integrity breach.
Despite its significance, data integrity is often overlooked. Its vital role is highlighted only in a crisis, when timely and accurate information is critical, as in the current pandemic. Policymakers should have access to correct data, since decisions on responses such as social distancing measures, the use of telemedicine, the implementation of contact tracing, and vaccine trials all hinge on what the current and available data are saying. Consequently, gaps or erroneous information can substantially undermine the effectiveness of actions meant to address the pandemic.
That said, compromised or inaccurate data can also be damaging to organizations since they can lead to missed opportunities, erroneous decisions, and ineffective solutions. These will ultimately tarnish a company’s image and cause it to lose the trust of its partners, clients, and other stakeholders.
Given the volume of data being processed today, maintaining data integrity is certainly not an easy task. The factors that threaten it the most are becoming more pervasive and complex by the day. They include:
Human Error. Data users often gather and encode incorrect information. They can also delete entries by mistake, or input duplicate data.
Data transfer error. Because data is often transferred from one place to another, the possibility of failure or errors in the actual transfer is high. Unsuccessful data transfer may result in loss, duplication, or alteration of data.
Compromised hardware. Device malfunctions, disk crashes, hardware failures, and natural disasters that can compromise hardware are inevitable. They often result in unintended data loss or alteration.
Cyber threats. Organizations often encounter attacks from hackers, either directly or via spyware, malware, bugs, or viruses. Any one of these could lead to the malicious modification of data.
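One common safeguard against the transfer and tampering risks above is to compare a cryptographic checksum computed before and after the data moves. The sketch below is illustrative only (the file contents and field names are invented for the example); it shows the general technique, not any agency's actual process:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# The sender computes a digest before the transfer...
original = b"patient_id,result\n12345,NEGATIVE\n"
digest_before = sha256_of(original)

# ...and the receiver recomputes it after the transfer.
received = b"patient_id,result\n12345,POSITIVE\n"  # altered in transit
digest_after = sha256_of(received)

# A mismatch flags loss, truncation, or alteration of the data.
print(digest_before == digest_after)  # prints False: the file was modified
```

Even a one-character change, like the flipped test result above, produces a completely different digest, which is why checksums catch both accidental transfer errors and deliberate modification.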
While there is no foolproof solution to eliminate these threats, organizations can put up appropriate rules, processes, and standards in order to minimize risks and preserve data integrity. In the long term, it may be worthwhile for them to consider transitioning from a reactive to a more proactive approach by implementing the following measures:
Training for data users. It is important to regularly provide data users with proper training on data quality maintenance. Protocols on how to process data correctly without compromising them are crucial. Incentives and/or sanctions may also be included in company policies.
Validation process. Data validation helps ensure the consistency of data throughout its entire life cycle. All data input should be validated, regardless of its source.
Audit trails. Maintaining an audit trail makes it easy for an organization to determine what changes have been made, when and where they were made, and who made them. This way, if the source of a problem is attributed to changes made to the data, it can easily be traced back to the problematic modification and the latter can be reversed.
Regular clean-up. A regular clean-up schedule for a database detects, eliminates, and corrects errors and inconsistencies. This ensures that only reliable data will be processed should there be a need for reconstruction or restoration.
Backup and recovery procedures. Regular data backups are necessary to prevent permanent data loss. Strict compliance with a backup and recovery strategy not only secures data but also ensures that reliable data can be restored.
Access restrictions. By limiting access to certain users and identifying user permissions, unauthorized changes to data can be minimized, if not prevented.
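To make two of the measures above concrete, here is a minimal sketch of input validation combined with an append-only audit trail. All field names, value ranges, and the user ID are hypothetical, chosen only to echo the encoding errors described earlier; a real system would of course be far more elaborate:

```python
from datetime import datetime, timezone

audit_trail = []  # append-only log of every accepted change: who, when, what

def validate(record: dict) -> list:
    """Return a list of integrity problems; an empty list means the record is valid."""
    errors = []
    if not (0 <= record.get("age", -1) <= 120):
        errors.append("age out of range")
    if record.get("sex") not in {"M", "F"}:
        errors.append("unknown sex code")
    if record.get("result") not in {"POSITIVE", "NEGATIVE", "PENDING"}:
        errors.append("invalid test result")
    return errors

def update_record(record: dict, field: str, new_value, user: str):
    """Apply a change only if the result validates, and log it to the audit trail."""
    proposed = {**record, field: new_value}
    problems = validate(proposed)
    if problems:
        raise ValueError(f"rejected update by {user}: {problems}")
    audit_trail.append({
        "user": user,
        "when": datetime.now(timezone.utc).isoformat(),
        "field": field,
        "old": record.get(field),
        "new": new_value,
    })
    record[field] = new_value

case = {"age": 45, "sex": "M", "result": "PENDING"}
update_record(case, "result", "NEGATIVE", user="encoder01")
# A typo such as result="NEGATVE" would be rejected before it is ever stored,
# and every accepted change can later be traced back through audit_trail.
```

The design point is that validation blocks bad data at the point of entry, while the audit trail records who changed what and when, so a problematic modification, like the encoding error in the Yap case, can be traced and reversed.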
Ensuring data integrity using these conventional methods can be strenuous, costly, and time-consuming. However, if they are properly adopted and maintained, the time, money, and other resources saved, not to mention the harm averted, will be more than enough to offset those potential costs. On top of that, data integrity will add value to any organization by allowing it to gain the trust and goodwill of its constituents.
During public health emergencies like the current COVID-19 pandemic, maintaining high data management standards and taking a proactive approach to protecting data integrity are essential. As we continue to face many uncertainties about this highly infectious disease, one important tool in our arsenal is accurate data: information we can work with in order to make better decisions and find timely solutions to a worsening public health crisis.
This article first appeared on GMA News Online on July 27, 2020 8:12 am.