Big data
Date:2018-12-17
Big data refers to a collection of data that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, fast-growing, and diverse information asset that requires new processing modes to deliver stronger decision-making, insight, and process-optimization capabilities. [1]
In "Big Data: A Revolution That Will Transform How We Live, Work, and Think" (published in Chinese as "The Age of Big Data") by Viktor Mayer-Schönberger and Kenneth Cukier, big data refers to analyzing and processing all of the data rather than taking the shortcut of random sampling (sample surveys). The 5V characteristics of big data (proposed by IBM) are Volume, Velocity, Variety, Value, and Veracity.

The research firm Gartner gives this definition: "big data" is a massive, fast-growing, and diverse information asset that requires new processing modes to deliver stronger decision-making power, insight, and process-optimization capabilities. [1]
The McKinsey Global Institute defines it as a dataset so large that it greatly exceeds the capabilities of traditional database software tools for acquisition, storage, management, and analysis. Such a dataset has four major characteristics: massive scale, rapid data flow, diverse data types, and low value density. [4]
The strategic significance of big data technology lies not in possessing vast amounts of data, but in the specialized processing of the data that is meaningful. In other words, if big data is compared to an industry, the key to profitability in that industry is to improve the "processing ability" applied to data and to add value through that processing. [5]
From a technical perspective, big data and cloud computing are as inseparable as the two sides of a coin. Big data cannot be processed on a single computer and must adopt a distributed architecture. Its hallmark is the distributed collection of massive amounts of data, which in turn relies on the distributed processing, distributed databases, cloud storage, and virtualization technology that cloud computing provides. [2]
With the advent of the cloud era, big data has attracted more and more attention. Analysts hold that the term is commonly used to describe the large volume of unstructured and semi-structured data a company creates, which would take too much time and money to load into relational databases for analysis. Big data analysis is often associated with cloud computing because analyzing large datasets within a reasonable time requires frameworks like MapReduce to distribute work across dozens, hundreds, or even thousands of computers.
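The way a MapReduce-style framework divides work can be illustrated with a minimal word-count sketch. This is not the API of any real framework: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a local thread pool merely stands in for the many computers mentioned above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group every emitted value under its key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Assumption: each thread plays the role of one machine in a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cloud data"]))
```

Each chunk is mapped independently, which is what allows the work to be spread over more workers as the data grows; only the shuffle step needs to see the output of every mapper.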
Big data requires special techniques to process large amounts of data effectively within a tolerable amount of time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
Big data includes structured, semi-structured, and unstructured data, with unstructured data making up an increasingly important share. According to an IDC survey report, 80% of enterprise data is unstructured, and this data grows exponentially, by 60% every year. [7] Big data is simply a manifestation or feature of the Internet at its current stage of development; there is no need to mythologize it or hold it in awe. Against the backdrop of technological innovation represented by cloud computing, data that once seemed difficult to collect and use can now be put to use with ease. Through continuous innovation across industries, big data will gradually create more value for humanity. [8]
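To make the 60% annual growth figure concrete, the short sketch below works through the compound-growth arithmetic. The function names and the starting volume of 100 units are illustrative assumptions, not figures from the IDC report.

```python
import math


def projected_volume(initial, annual_growth, years):
    """Volume after compounding `annual_growth` (0.6 means 60%) for `years` years."""
    return initial * (1 + annual_growth) ** years


def doubling_time(annual_growth):
    """Years needed for the volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Assumed starting point: 100 units of unstructured data.
    for year in range(6):
        print(year, round(projected_volume(100, 0.6, year), 1))
    print("doubling time in years:", round(doubling_time(0.6), 2))
```

At a constant 60% annual rate the volume doubles roughly every year and a half, so storage and processing capacity planned for today's data is outgrown quickly.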
To understand big data systematically, it is necessary to break it down comprehensively and carefully, starting from three levels:
The first level is theory, which is the necessary path to understanding and a widely recognized and disseminated baseline. Here, we grasp the industry's overall description and characterization of big data through the definition of its characteristics; analyze in depth how precious big data is by exploring its value; gain insight into its development trends; and examine the long-term contest between people and data from the special and important perspective of big data privacy.
The second level is technology, which is the means by which the value of big data is realized and the cornerstone of its progress. Here, we explain the entire process of big data, from collection, processing, and storage to the formation of results, through the development of cloud computing, distributed processing technology, storage technology, and sensing technology.
The third level is practice, which is where the value of big data is ultimately embodied. Here, we depict the promising picture that big data already presents and the blueprint now taking shape from four aspects: Internet big data, government big data, enterprise big data, and personal big data. [8]