In connection with the processing capacity issues, designing a big data architecture is a common challenge for users. Big data systems must be tailored to an organization's particular needs, a DIY undertaking that requires IT and data management teams to piece together a customized set of technologies and tools. Deploying and managing big data systems also requires skills that differ from the ones that database administrators and developers focused on relational software typically possess.
Both of those issues can be eased by using a managed cloud service, but IT managers need to keep a close eye on cloud usage to make sure costs don't get out of hand. Also, migrating on-premises data sets and processing workloads to the cloud is often a complex process.
Other challenges in managing big data systems include making the data accessible to data scientists and analysts, especially in distributed environments that include a mix of different platforms and data stores. To help analysts find relevant data, data management and analytics teams are increasingly building data catalogs that incorporate metadata management and data lineage functions. The process of integrating sets of big data is often also complicated, particularly when data variety and velocity are factors.
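As a rough illustration of what a data catalog with metadata and lineage functions does, the following minimal sketch keeps an in-memory registry of data sets. The class names, fields and example data sets (CatalogEntry, DataCatalog, raw_clicks and so on) are hypothetical and not taken from any particular catalog product; real catalogs persist this information and harvest it automatically.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data set (all field names are illustrative assumptions)."""
    name: str
    platform: str                       # where the data lives, e.g. "s3" or "hdfs"
    description: str
    tags: set[str] = field(default_factory=set)
    upstream: list[str] = field(default_factory=list)  # names of source data sets


class DataCatalog:
    """Toy in-memory catalog supporting tag search and lineage lookup."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, tag: str) -> list[str]:
        """Return the names of data sets carrying the given tag, sorted."""
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def lineage(self, name: str) -> list[str]:
        """Return every upstream data set of `name`, nearest sources first."""
        seen: list[str] = []
        queue = list(self._entries[name].upstream)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].upstream)
        return seen


catalog = DataCatalog()
catalog.register(CatalogEntry("raw_clicks", "s3", "Raw clickstream events", {"web"}))
catalog.register(CatalogEntry("sessions", "hdfs", "Sessionized clicks",
                              {"web", "curated"}, ["raw_clicks"]))
catalog.register(CatalogEntry("churn_features", "warehouse", "Churn model inputs",
                              {"curated"}, ["sessions"]))

print(catalog.search("curated"))
print(catalog.lineage("churn_features"))
```

An analyst can search by tag to find candidate data sets across platforms, then follow the lineage to see which sources a curated data set was derived from.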
Keys to an effective big data strategy
In an organization, developing a big data strategy requires an understanding of business goals and the data that's currently available to use, plus an assessment of the need for additional data to help meet the objectives. The next steps to take include the following:
prioritizing planned use cases and applications;
identifying new systems and tools that are needed;
creating a deployment roadmap; and
evaluating internal skills to see if retraining or hiring is required.
To ensure that sets of big data are clean, consistent and used properly, a data governance program and associated data quality management processes also must be priorities. Other best practices for managing and analyzing big data include focusing on business needs for information over the available technologies and using data visualization to aid in data discovery and analysis.
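One common form a data quality management process can take is a set of named rules applied to every record, with failures counted per rule. The sketch below shows that idea under stated assumptions: the rule names, the field names (customer_id, amount, country) and the check_quality function are invented for illustration and would be replaced by an organization's own standards.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical quality rules; each returns True when a record passes.
RULES: Dict[str, Rule] = {
    "customer_id present": lambda r: bool(r.get("customer_id")),
    "amount is non-negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
    "country is 2-letter code": lambda r: (
        isinstance(r.get("country"), str) and len(r["country"]) == 2
    ),
}


def check_quality(records: List[Record]) -> Dict[str, int]:
    """Count how many records fail each rule."""
    failures = {name: 0 for name in RULES}
    for record in records:
        for name, rule in RULES.items():
            if not rule(record):
                failures[name] += 1
    return failures


sample = [
    {"customer_id": "c1", "amount": 10.0, "country": "US"},
    {"customer_id": "", "amount": -5, "country": "USA"},
    {"customer_id": "c3", "amount": 3, "country": "DE"},
]
print(check_quality(sample))
```

Tracking failure counts per rule over time gives a governance team a simple measure of whether data is becoming cleaner and more consistent.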
Big data collection practices and regulations
As the collection and use of big data have increased, so has the potential for data misuse. A public outcry about data breaches and other personal privacy violations led the European Union to approve the General Data Protection Regulation (GDPR), a data privacy law that took effect in May 2018. GDPR limits the types of data that organizations can collect and requires opt-in consent from individuals or compliance with other specified reasons for collecting personal data. It also includes a right-to-be-forgotten provision, which lets EU residents ask companies to delete their data.
While there aren't similar federal laws in the U.S., the California Consumer Privacy Act (CCPA) aims to give California residents more control over the collection and use of their personal information by companies that do business in the state. CCPA was signed into law in 2018 and took effect on Jan. 1, 2020.
To ensure that they comply with such laws, businesses need to carefully manage the process of collecting big data. Controls must be put in place to identify regulated data and prevent unauthorized employees from accessing it.
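The two controls described above, identifying regulated data and restricting who can see it, can be sketched in a few lines. This is a simplified illustration, not a compliance implementation: the list of regulated fields, the authorized roles and the function names are all assumptions made for the example, and a real deployment would draw them from a data classification policy and an identity management system.

```python
# Assumed classification of regulated personal data fields (illustrative only).
REGULATED_FIELDS = {"email", "phone", "date_of_birth"}

# Assumed roles allowed to view regulated data (illustrative only).
AUTHORIZED_ROLES = {"privacy_officer", "support_lead"}


def regulated_fields_in(record: dict) -> set:
    """Identify which fields of a record hold regulated data."""
    return REGULATED_FIELDS.intersection(record)


def view_for(role: str, record: dict) -> dict:
    """Return the record as the given role may see it.

    Authorized roles get an unmodified copy; for every other role,
    values of regulated fields are masked.
    """
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {
        key: ("***" if key in REGULATED_FIELDS else value)
        for key, value in record.items()
    }


customer = {"user_id": 7, "email": "a@example.com", "plan": "pro"}
print(regulated_fields_in(customer))
print(view_for("analyst", customer))
print(view_for("privacy_officer", customer))
```

Masking at read time, rather than removing the fields, lets analysts keep working with the non-regulated attributes of the same records.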