How to Achieve Data Quality in the Cloud


You've finally moved to the cloud. Congratulations! But now that your data is in the cloud, can you trust it? With more and more applications moving to the cloud, the quality of information is becoming a growing concern. Erroneous data can cause many business problems, including decreased efficiency, lost revenue and even compliance issues. This blog post will discuss the causes of poor data quality and what companies can do to improve it.

Ensuring data quality has always been a challenge for most enterprises, and the problem grows when dealing with data in the cloud or sharing data with external organizations because of technical and architectural hurdles. Cloud data sharing has become increasingly popular as businesses seek to take advantage of the cloud's scalability and cost-effectiveness. However, without a strategy to ensure data quality, the return on investment from these data analytics projects can be questionable.

Related: Why Bad Data Could Cost Entrepreneurs Millions

What contributes to data quality issues in the cloud?

Four primary factors contribute to data quality issues in the cloud:

  • When you migrate a system to the cloud, the legacy data may not be of good quality. As a result, poor-quality data gets carried forward into the new system.
  • Data may become corrupted during migration, or cloud systems may be configured incorrectly. For example, a Fortune 500 company restricted its cloud data warehouse to storing numbers with at most eight decimal places. This caused truncation errors during migration, resulting in a $50 million reporting issue. A precision check like the sketch after this list can catch such cases before they land in the warehouse.
  • Data quality can suffer when data from different sources must be combined. For example, two departments of a pharmaceutical company used different units (count versus packs) to store inventory information. When this information was consolidated into the cloud data warehouse, the inconsistent units made reporting and analysis a nightmare.
  • Data from external data vendors can be of questionable quality.
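
To make the first two points concrete, here is a minimal pre-migration check in Python. It assumes pandas and uses made-up column names (quantity, unit); it sketches the kind of validation described above, not any particular vendor's tooling.

```python
# A minimal pre-migration validation sketch, assuming pandas and
# hypothetical column names; not any vendor's actual tooling.
import pandas as pd

TARGET_DECIMAL_PLACES = 8  # e.g., a warehouse column limited to 8 decimals


def find_truncation_risks(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Flag rows whose values would lose precision if the target
    warehouse stores at most TARGET_DECIMAL_PLACES decimal places."""
    rounded = df[column].round(TARGET_DECIMAL_PLACES)
    return df[df[column] != rounded]


def find_unit_mismatches(df: pd.DataFrame, unit_column: str) -> pd.Series:
    """Report the distinct units in use; more than one means the
    sources disagree (e.g., 'each' versus 'pack') and need conversion."""
    return df[unit_column].value_counts()


# Example usage with made-up inventory data:
inventory = pd.DataFrame({
    "quantity": [12.123456789, 3.5, 7.00000001],
    "unit": ["each", "pack", "each"],
})
print(find_truncation_risks(inventory, "quantity"))  # row 0 loses a digit
print(find_unit_mismatches(inventory, "unit"))       # two units in use
```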

Related: Your Data Might Be Safe in the Cloud But What Happens When It Leaves the Cloud?

Why is validating data quality in the cloud difficult?

Everybody knows data quality is essential, and most companies spend significant money and resources trying to improve it. Despite these investments, poor data quality still costs companies an estimated $9.7 million to $14.2 million per year.

Traditional data quality programs do not work well for identifying data errors in cloud environments because:

  • Most organizations only look at the data risks they know about, which is likely just the tip of the iceberg. Data quality programs usually focus on completeness, integrity, duplicate and range checks, yet these represent only 30 to 40 percent of all data risks. Many data quality teams do not check for data drift, anomalies or inconsistencies across sources, which together contribute over 50 percent of data risks; a drift check like the sketch after this list is one way to close that gap.
  • The number of data sources, processes and applications has exploded with the rapid adoption of cloud technology, big data applications and analytics. Each of these data assets and processes requires careful quality control to prevent errors in downstream processes.
  • A data engineering team can add hundreds of new data assets to the system in a short period, while the data quality team typically needs one to two weeks to validate each new asset. The team must therefore prioritize which assets to review first, and many assets never get checked.
  • Organizational bureaucracy and red tape often slow data quality programs down. Because data is a corporate asset, any change requires approval from multiple stakeholders, so data quality teams must go through a lengthy cycle of change requests, impact analysis, testing and signoffs before implementing a single rule. By the time that process completes, weeks or months later, the data may have changed significantly.
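
One common way to quantify the drift mentioned above is the population stability index (PSI). The sketch below assumes NumPy and made-up monthly loads of a single numeric column; the thresholds in the docstring are conventional rules of thumb, not hard limits.

```python
# A minimal data-drift check, assuming NumPy and a hypothetical
# numeric column; real programs would cover many columns.
import numpy as np


def population_stability_index(baseline: np.ndarray,
                               current: np.ndarray,
                               bins: int = 10) -> float:
    """Compare two samples of the same column. PSI below ~0.1 is
    usually read as stable; above ~0.25 as significant drift."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    # Note: current values outside the baseline range are ignored here.
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero / log(0) with a small floor.
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))


rng = np.random.default_rng(0)
last_month = rng.normal(100, 10, 5_000)   # baseline load
this_month = rng.normal(115, 10, 5_000)   # shifted load -> drift
print(population_stability_index(last_month, this_month))  # well above 0.25
```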

What can you do to improve the quality of cloud data?

To ensure data quality in the cloud, it is essential to adopt a strategy that accounts for these factors. Below are some tips:

  • Check the quality of your legacy and third-party data, and fix any errors you find before migrating to the cloud. These quality checks add cost and time to the project, but a healthy data environment in the cloud is worth it.
  • Reconcile the cloud data against the legacy data to ensure nothing was lost or changed during the migration (see the reconciliation sketch after this list).
  • Establish governance and control over your cloud data and processes. Monitor data quality on an ongoing basis and define corrective actions for when errors are found. This helps prevent issues from getting out of hand and becoming too costly to fix.
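
A reconciliation check can be as simple as an outer join on a business key. The sketch below assumes pandas and two hypothetical extracts (legacy and cloud); a production version would run against the databases directly and also compare row counts and aggregates.

```python
# A minimal reconciliation sketch, assuming pandas and two
# hypothetical extracts pulled from the legacy and cloud systems.
import pandas as pd


def reconcile(legacy: pd.DataFrame, cloud: pd.DataFrame,
              key: str, value_columns: list[str]) -> pd.DataFrame:
    """Join the two extracts on a business key and report rows whose
    values changed or that exist on only one side."""
    merged = legacy.merge(cloud, on=key, how="outer",
                          suffixes=("_legacy", "_cloud"), indicator=True)
    mismatched = merged["_merge"] != "both"
    for col in value_columns:
        mismatched |= merged[f"{col}_legacy"] != merged[f"{col}_cloud"]
    return merged[mismatched]


legacy = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})
cloud = pd.DataFrame({"id": [1, 2, 4], "amount": [10.0, 21.0, 40.0]})
print(reconcile(legacy, cloud, key="id", value_columns=["amount"]))
# Reports id 2 (value changed), id 3 (missing in cloud), id 4 (extra).
```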

In addition to the traditional data quality process, data quality teams must analyze and establish predictive checks for data drift, anomalies and inconsistencies across sources. One way to achieve this is to use machine learning techniques to identify hard-to-detect data errors and augment current data quality practices. Another is to adopt a more agile approach to data quality, aligning with data operations teams to accelerate the deployment of data quality checks in the cloud.
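As one illustration of the machine learning approach, the sketch below uses scikit-learn's IsolationForest to flag an anomalous batch based on two simple load metrics (row count and null rate). The features and data are made up for illustration; treat it as a pattern, not a definitive implementation.

```python
# A minimal sketch of ML-assisted anomaly detection using
# scikit-learn's IsolationForest; the features here are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Daily load metrics: row count and null rate per batch (made-up data).
normal_days = np.column_stack([
    rng.normal(100_000, 2_000, 60),   # row counts
    rng.normal(0.01, 0.002, 60),      # null rates
])
bad_day = np.array([[40_000, 0.15]])  # a partial, dirty load

model = IsolationForest(contamination=0.05, random_state=0)
model.fit(normal_days)
print(model.predict(bad_day))          # [-1] means flagged as anomalous
print(model.predict(normal_days[:3]))  # mostly [1], i.e., normal
```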

Migrating to the cloud is complex, and data quality should be top of mind to ensure a successful transition. Adopting a strategy for achieving data quality in the cloud is essential for any business that relies on data. By considering the factors that contribute to data quality issues and putting the right processes and tools in place, you can ensure the highest-quality data, and your cloud data projects will have a greater chance of success.

Related: Streamline Your Data Management, Web Services, Cloud, and More by Learning Amazon Web Services
