Big Data Analytics (BDA) comes into the picture when we deal with the enormous amounts of data generated over the past ten years as science and technology have advanced across different fields. Processing this data and extracting valuable meaning from it in a short span of time is a genuinely challenging task, especially given the four V's that arise whenever we discuss BDA: Volume, Velocity, Variety and Veracity.
Why and When to go for Big Data Analytics
Big data is a revolutionary term describing very large volumes of unstructured (text, images, videos), structured (tabular) and semi-structured (JSON, XML) data that have the potential to be mined for information. We should go for big data when we encounter the following scenarios.
Volume (data at scale)
Integrating and processing these huge volumes of data, stored across a scalable and distributed environment, poses a huge business challenge to analysts. IT giants such as Yahoo and Google generate petabytes of data in a short span of time. Compared to the past, the volume of data the IT industry must handle and process has grown exponentially.
Velocity (speed at which data moves)
Velocity matters when we need to process huge amounts of data and derive insights from it in fractions of a second. For a better understanding, take the telecom domain: it generates CDR (Call Detail Record) data at gigabytes per hour, so the network bandwidth available for moving these data to the processing systems is a very important factor. Big Data analytics shifts from analyzing data after it has landed in a warehouse or mart to analyzing data in motion, in real time, as it is generated.
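The idea of analyzing data in motion can be sketched with a minimal sliding-window rate monitor. The `SlidingWindowCounter` class and the simulated record timestamps below are illustrative inventions, not part of any real telecom system:

```python
from collections import deque

class SlidingWindowCounter:
    """Count events (e.g. call detail records) inside a moving time window."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first

    def record(self, timestamp):
        self.events.append(timestamp)
        self._evict(timestamp)

    def rate(self, now):
        """Events per second over the most recent window."""
        self._evict(now)
        return len(self.events) / self.window

    def _evict(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

# Simulated CDR arrival times in seconds: 5 records in the last 10 s
counter = SlidingWindowCounter(window_seconds=10)
for t in [1, 3, 5, 7, 9]:
    counter.record(t)
print(counter.rate(now=10))  # 0.5 events per second
```

A production system would keep such windows per cell tower or subscriber and raise alerts as rates change, rather than waiting for the data to land in a warehouse.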
Variety (different forms of data)
Variety is about managing many types of data and understanding and analyzing them in their native form. Almost 80% of all data created daily is unstructured: videos, social media, satellite images, machine and sensor data. Big Data analytics shifts from cleansing data before analysis to analyzing the information as-is and cleansing only when needed.
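Handling records "in their native form" can be sketched with Python's standard library alone. The record fields (`id`, `name`, `channel`) are invented for the example, which normalizes the same record arriving as JSON and as XML into one common shape:

```python
import json
import xml.etree.ElementTree as ET

# The same customer record arriving in two semi-structured forms.
json_record = '{"id": 42, "name": "Asha", "channel": "mobile"}'
xml_record = '<customer id="42"><name>Asha</name><channel>mobile</channel></customer>'

def normalize_json(raw):
    doc = json.loads(raw)
    return {"id": int(doc["id"]), "name": doc["name"], "channel": doc["channel"]}

def normalize_xml(raw):
    root = ET.fromstring(raw)
    return {
        "id": int(root.get("id")),          # id is an attribute in the XML form
        "name": root.findtext("name"),
        "channel": root.findtext("channel"),
    }

# Both sources yield the same analyzable record.
assert normalize_json(json_record) == normalize_xml(xml_record)
```

The point is that the raw inputs stay as they arrived; only the view used for analysis is normalized, and only for the fields actually needed.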
Veracity (uncertainty of data)
Veracity increases complexity: it is very hard to establish how trustworthy data is, yet that trust is essential for making confident decisions in real-time scenarios. At present, around three quarters of all available data is uncertain, and extracting insights from it that a business can rely on is a key capability in the market. Building clients' confidence while dealing with uncertain data is therefore a key task.
Getting meaningful insights from large data sets, and processing them in a sensible amount of time, is a challenging task. Traditionally, data analysis has been dominated by trial and error, an approach that becomes impossible when data sets are large and heterogeneous. Machine learning enables cognitive systems to learn, reason and engage with humans in a more natural and personalized way.
Industries in Which Big Data Has Made a Big Difference
Industries such as telecom, retail, finance, healthcare, banking, energy and automobiles widely use big data and its analytics, and benefit from it in the current global market. Data volumes and data generation speeds are significantly higher than before, and this new kind of data requires a new set of technologies to store, process and make sense of it.
Predictive analytics improves fraud detection and speeds up claims processing. As a result, analytics delivers more effective marketing, better customer service and new revenue-generating opportunities across industrial domains.
Compared to traditional methods and approaches, big data adds many capabilities to the present market in terms of sales and revenue. Typical applications of Big Data analytics include weather prediction, geospatial pattern recognition, disaster management and space technology. By contrast, one of the main functions of a traditional ETL tool is to transform structured data; that transformation step is the most vital stage of building a structured data warehouse.
Most of the data generated by different sources is unstructured, and getting meaningful insights from it with traditional data systems is a Herculean task; the only practical way is BDA. Here are some areas in which big data and analytics play a vital role, with some facts:
- The Large Hadron Collider (LHC), when last operational, generated 1 GB of data every second.
- A new radio telescope, the Square Kilometre Array (SKA), is being built in the southern hemisphere. The SKA is expected to produce 20,000 PB of data per day by 2020 (compared with current internet traffic of roughly 300 PB per day).
- The Airbus A380, the double-deck, largest commercial airplane in the world, has four engines, and each engine generates about 1 PB of data on a single flight, for example from London (LHR) to Singapore.
- Social media analytics and its pattern-recognition techniques.
Types of Analytics with BDA
Using BDA, we can perform different types of analytics, namely predictive, prescriptive, descriptive and cognitive. Among these, predictive analytics is especially important for extracting useful insights from data. It is mainly performed by applications that need to predict patterns in real-time analytics scenarios, for example improving fraud detection and speeding up claims processing.
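A toy illustration of predictive scoring for fraud detection: the features, weights and threshold below are invented for the sketch; a real system would learn them from historical claims data rather than hard-coding them:

```python
def fraud_score(claim):
    """Toy additive risk score; features and weights are illustrative, not tuned."""
    score = 0.0
    if claim["amount"] > 10_000:             # unusually large claim
        score += 0.4
    if claim["claims_last_year"] >= 3:       # frequent claimant
        score += 0.3
    if claim["days_since_policy_start"] < 30:  # claim very soon after signup
        score += 0.3
    return score

def flag_for_review(claim, threshold=0.5):
    """Route high-scoring claims to manual review, fast-track the rest."""
    return fraud_score(claim) >= threshold

suspicious = {"amount": 15_000, "claims_last_year": 4, "days_since_policy_start": 10}
routine = {"amount": 800, "claims_last_year": 0, "days_since_policy_start": 400}
print(flag_for_review(suspicious))  # True
print(flag_for_review(routine))     # False
```

The speed-up in claims processing comes from the second path: routine claims skip manual review entirely, while only the flagged minority gets human attention.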
Cognitive analytics, running on cognitive computing systems, can help business organizations obtain fast, accurate results using neural-network and robotics-like environments and setups. These systems learn and interact naturally with people to extend what either humans or machines could do on their own.
A typical live example is the IBM Watson computer, which won the Jeopardy! game in 2011 against two former champions of the show. Watson is a supercomputer built purely on a cognitive system and works on NLP (Natural Language Processing).
Form of Data and the BDA Environment
Semi-structured and unstructured data may not fit well in traditional data warehouses based on relational databases, and the volume of data arriving day to day cannot be managed by traditional systems. We therefore need big data and its analytics to apply faster processing methods and extract valuable insights in a shorter span of time.
Out of this work on Big Data analytics, the open source Apache Hadoop project came into the picture, providing a reliable, scalable and distributed computing environment. From descriptive to predictive analytics, we can perform any type of analytics using the Hadoop big data environment.
The environment includes components such as YARN, HDFS, MapReduce, Hive, Pig and HBase. NoSQL databases such as MongoDB, Cassandra and CouchDB also belong to the big data environment basket.
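The MapReduce model that Hadoop popularized can be illustrated in miniature. The map, shuffle and reduce phases below run in a single Python process, so this is a sketch of the programming model only, not of Hadoop's distributed execution:

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input split.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: combine all counts emitted for one key.
    return (word, sum(counts))

def map_reduce(lines):
    # Shuffle phase: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for word, count in chain.from_iterable(mapper(line) for line in lines):
        groups[word].append(count)
    return dict(reducer(w, c) for w, c in groups.items())

print(map_reduce(["big data big insights", "data in motion"]))
# {'big': 2, 'data': 2, 'insights': 1, 'in': 1, 'motion': 1}
```

On a real cluster, HDFS splits the input across nodes, many mappers and reducers run in parallel, and YARN schedules them; the programmer still writes essentially just the `mapper` and `reducer` functions.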
The following use case gives more insight into Big Data analytics in a real-time scenario:
Typical Industrial Use Case: Machine and Sensor Data Analytics
“This use case focuses on real-time streaming data coming from sensors installed inside mines. Machines with high-end configurations (memory, storage, RAM capacity) have sensors installed on them: every machine carries 2,000-3,000 sensors, and each sensor produces 1-2 MB of data per second. The entire mining area has more than 1,000 machines.
All the systems are monitored through a centralized administrative network, so the data generated per machine approaches 20 terabytes per hour. Processing and running analytics on this huge stream of real-time geophysical data (temperature, humidity, seismic activity, radiation) is a big challenge. The two types of analytics involved in this use case are predictive and prescriptive.”
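The volume quoted in the use case can be sanity-checked with back-of-the-envelope arithmetic, taking the upper bounds given and using decimal terabytes:

```python
# Back-of-the-envelope check of the figures quoted in the use case.
sensors_per_machine = 3000   # upper bound from the use case
mb_per_sensor_per_sec = 2    # upper bound from the use case
seconds_per_hour = 3600

mb_per_machine_per_hour = sensors_per_machine * mb_per_sensor_per_sec * seconds_per_hour
tb_per_machine_per_hour = mb_per_machine_per_hour / 1_000_000  # decimal TB

print(round(tb_per_machine_per_hour, 1))  # 21.6 TB/hour per machine
```

So the stated figure of roughly 20 TB per machine per hour is consistent with the sensor counts and per-sensor rates; across 1,000+ machines that is tens of petabytes per hour, which is what pushes the use case out of reach of traditional systems.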
Processing petabytes of streaming data and predicting weather patterns with statistical algorithms in fractions of a second can identify potential risk patterns. Here a weather-prediction and disaster-risk-analysis engine provides the required solution for the use case; this is the solution BDA offers the mining industry for disaster risk management.
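One minimal way to flag risk in such a sensor stream is a rolling z-score over recent readings. The window size, threshold and sample readings below are illustrative assumptions, not the statistical algorithms the engine above would actually use:

```python
from collections import deque
from statistics import mean, stdev

def rolling_zscore_alerts(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate sharply from the recent window."""
    history = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            # Flag the reading if it sits more than `threshold` sigmas from
            # the recent mean (and the window is not flat).
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                alerts.append(i)
        history.append(value)
    return alerts

# Steady seismic readings with one sudden spike at index 7.
readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 9.0, 1.0]
print(rolling_zscore_alerts(readings))  # [7]
```

Because the statistic depends only on a small fixed window, the same check can run on each machine's stream independently, which is what makes it feasible at petabyte scale.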
An enterprise distribution of Hadoop provides enterprise-class architecture for BDA and its applications. Examples include Cloudera Distributed Hadoop (CDH), Hortonworks, IBM BigInsights and MapR. For real-time online analytics, IBM offers IBM InfoSphere Streams. These enterprise editions provide a scalable, secure, distributed environment for both real-time and offline data analytics.
These systems also provide advanced analytics such as sentiment analytics and text mining. Business intelligence (BI) tools such as Tableau, QlikView and Pentaho provide live visualizations and reporting, adding front-end visualization value to BDA.
BDA is going to play a vital role in every industry and adds huge value to its business. As science and technology advance, we will see more advanced analytics in the coming years, and BDA will gear up with ever more advanced neural and cognitive analytics techniques to keep delivering valuable outcomes.