Big Data Planning Considerations
Big Data initiatives are strategic in nature and should be business driven. Given the nature of Big Data and its analytic power, many issues need to be considered and planned for at the outset. For example, the data may need to be secured in a way that conforms to existing corporate standards. Data procurement may also need to be considered, and it could well be something new to the organization. Managing the privacy of data must be planned for in cases where identities can be revealed through analytics. Moving data storage to the cloud may also be an option and, if so, needs to be planned for. All of this requires the establishment of distinct governance processes and decision frameworks. Organizationally, the approach to performing business analytics will change, and that also needs to be planned for.
For data analysis and analytics to offer value, enterprises need data management and Big Data governance frameworks. Sound processes are necessary, as are sufficient skillsets among those responsible for implementing, customizing, populating and using Big Data solutions. Additionally, the quality of the data targeted for processing by Big Data solutions needs to be assessed. The Big Data environment itself needs to be planned for, including a roadmap to ensure that expansions and augmentations stay in sync with the requirements of the enterprise.
A substantial budget may be required to obtain the data that will be used for analysis, since much of it may come from external sources. The greater the volume and variety of data that can be fed into the Big Data platform, the better the insights that can be gained. External data may come from government or commercial providers. Government data may be free, but commercial data sources will usually come with a high-priced subscription.
Data can contain confidential information about organizations or individuals. Sometimes seemingly benign data can reveal private information when datasets are combined and jointly analyzed. This can lead to intentional or unintentional breaches of privacy. Addressing such concerns requires an understanding of the nature of data, the data privacy regulations, and the techniques for anonymizing data.
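To make the anonymization point concrete, here is a minimal sketch of one common technique, generalizing quasi-identifiers and then checking the smallest group size (the idea behind k-anonymity). The sample records, the field names (`zip`, `age`, `diagnosis`) and the helper functions are all hypothetical and chosen only for illustration; a real project would pick its techniques according to the applicable privacy regulations.

```python
from collections import Counter

# Hypothetical fields that could identify a person when combined with other datasets.
QUASI_IDENTIFIERS = ["zip", "age"]


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same combination of quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def generalize(record):
    """Coarsen quasi-identifiers: keep the first 3 zip digits, bucket age by decade."""
    out = dict(record)
    out["zip"] = record["zip"][:3] + "**"
    decade = (record["age"] // 10) * 10
    out["age"] = f"{decade}-{decade + 9}"
    return out


# Illustrative records only; not real data.
records = [
    {"zip": "90210", "age": 34, "diagnosis": "flu"},
    {"zip": "90211", "age": 36, "diagnosis": "asthma"},
    {"zip": "90310", "age": 52, "diagnosis": "flu"},
    {"zip": "90312", "age": 58, "diagnosis": "diabetes"},
]

raw_k = smallest_group_size(records, QUASI_IDENTIFIERS)
generalized = [generalize(r) for r in records]
generalized_k = smallest_group_size(generalized, QUASI_IDENTIFIERS)

print(f"smallest group before generalization: {raw_k}")
print(f"smallest group after generalization:  {generalized_k}")
```

A smallest group size of 1 means at least one person is uniquely identifiable from the quasi-identifiers alone; coarsening the values raises that number at the cost of analytic precision.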
Securing Big Data involves ensuring that the data networks and repositories are sufficiently secured via authentication and authorization mechanisms. It further involves establishing data access levels for different categories of users.
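As an illustration of access levels for different categories of users, the sketch below maps roles and datasets to ordered clearance levels and denies access by default. The role names, dataset names and level names are assumptions made up for this example, not part of any particular Big Data product.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Ordered sensitivity levels; a higher value means more restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical user categories and the highest level each may read.
ROLE_CLEARANCE = {
    "guest": AccessLevel.PUBLIC,
    "analyst": AccessLevel.INTERNAL,
    "data_steward": AccessLevel.CONFIDENTIAL,
}

# Hypothetical datasets and the level required to read them.
DATASET_LEVEL = {
    "open_gov_stats": AccessLevel.PUBLIC,
    "web_logs": AccessLevel.INTERNAL,
    "customer_pii": AccessLevel.CONFIDENTIAL,
}


def is_authorized(role, dataset):
    """Return True only if the role's clearance meets the dataset's level.
    Unknown roles or datasets are denied by default."""
    clearance = ROLE_CLEARANCE.get(role)
    required = DATASET_LEVEL.get(dataset)
    if clearance is None or required is None:
        return False
    return clearance >= required


print(is_authorized("analyst", "web_logs"))
print(is_authorized("analyst", "customer_pii"))
```

This covers authorization only; authenticating the user (establishing who they are) would happen before a check like this is made.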
Provenance is information about the source of data and how it has been processed. Provenance information helps determine the authenticity and quality of data, and it can be used for auditing purposes. Maintaining provenance for large datasets can be complex because the data passes through different states in its lifecycle: in-motion, in-use or at-rest. Whenever data changes state, provenance information should be captured and recorded as metadata.
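One possible way to capture provenance on each state change is sketched below. The `Dataset` and `ProvenanceEvent` classes, their fields, and the example dataset and actor names are hypothetical; real platforms typically store this metadata in a dedicated catalog or lineage service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The three lifecycle states named in the text.
VALID_STATES = {"in-motion", "in-use", "at-rest"}


@dataclass
class ProvenanceEvent:
    """One metadata record describing a state change."""
    from_state: str
    to_state: str
    action: str
    actor: str
    timestamp: str


@dataclass
class Dataset:
    name: str
    source: str
    state: str = "at-rest"
    provenance: list = field(default_factory=list)

    def transition(self, new_state, action, actor):
        """Move the dataset to a new state and record who did what, and when."""
        if new_state not in VALID_STATES:
            raise ValueError(f"unknown state: {new_state}")
        event = ProvenanceEvent(
            from_state=self.state,
            to_state=new_state,
            action=action,
            actor=actor,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.provenance.append(event)
        self.state = new_state


# Illustrative usage with made-up names.
ds = Dataset(name="sales_2023", source="commercial_feed")
ds.transition("in-motion", action="copied to analytics cluster", actor="etl_job")
ds.transition("in-use", action="aggregated by region", actor="analyst_a")

for event in ds.provenance:
    print(event.from_state, "->", event.to_state, "|", event.action)
```

Because each event records the previous and new state along with the actor and a timestamp, the list can later be replayed for auditing.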
Some other considerations
Choosing a Big Data solution may involve understanding whether the organization needs real-time or near real-time processing, since some open-source solutions are batch-oriented and may not offer what a company needs. Performance may also be a factor, as large datasets and complex algorithms can lead to long query wait times. A governance framework is required to ensure that the data and the solution environments are regulated, standardized and evolved in a controlled manner. A methodology should be established for how data will flow into and out of the Big Data solution. The cloud could be a consideration for companies with inadequate internal hardware support or ones that don’t want to make a large up-front capital investment. For analytics, a good understanding of the business case, the business requirements and the resources required to take on the analytical tasks should be established first. After that, each stage must be planned for a Big Data implementation to succeed: data identification, data acquisition, data filtering, data extraction, data validation, data cleansing, data aggregation, data representation, data analysis, data visualization, and utilization of the analysis results.