Analyzing Big Data

ISHMEET KAUR
4 min read · May 31, 2020

Types of Big Data Analytics

  • Descriptive: Describes the current state and answers "what" and "when" questions. Typically delivered through reports, dashboards, and visualizations such as charts and graphs.
  • Predictive: Analyzes historical data to predict what might happen, yielding a forecast of a probable outcome.
  • Prescriptive: Reveals what actions should be taken. This is often considered the most valuable kind of analysis and usually results in rules and recommendations for next steps.

Descriptive Analysis

  • Creates a summary of historical data to yield useful insights.
  • Statistical operations such as sum, average, count, and percentage are used to summarize the data.
  • Provides historical insights into the company’s sales, finance, operations, customers, products and inventory.
  • Examples:
      - What are the changes in sales year over year?
      - Which is the most profitable product brand?
      - Which sales territory yielded the highest or lowest sales, and what are the sales details?
      - What is the average dollar amount spent per customer?
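Questions like these reduce to simple aggregations. Here is a minimal sketch in Python using a small made-up list of sales records; the field layout, names, and figures are illustrative assumptions, not real data.

```python
from collections import defaultdict

# Hypothetical sales records: (year, territory, customer, amount in dollars).
sales = [
    (2018, "North", "alice", 120.0),
    (2018, "South", "bob", 80.0),
    (2019, "North", "alice", 150.0),
    (2019, "South", "carol", 90.0),
    (2019, "South", "bob", 60.0),
]

def total_by(records, key_index):
    """Sum the amount column, grouped by the column at key_index."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3]
    return dict(totals)

sales_by_year = total_by(sales, 0)
sales_by_territory = total_by(sales, 1)
spend_by_customer = total_by(sales, 2)

# Year-over-year change, as a percentage of the earlier year.
yoy_change_pct = (sales_by_year[2019] - sales_by_year[2018]) / sales_by_year[2018] * 100

# Territory with the highest total sales.
top_territory = max(sales_by_territory, key=sales_by_territory.get)

# Average dollars spent per customer.
avg_spend_per_customer = sum(spend_by_customer.values()) / len(spend_by_customer)
```

In practice the same sums and averages would usually come from a spreadsheet, a SQL query, or a BI tool rather than hand-written loops.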

Tools for Descriptive Analytics

  • Spreadsheets: Microsoft Excel
  • Queries on an RDBMS: Oracle, MySQL
  • Data warehouses: IBM Cognos, Teradata
  • Reporting software: JasperReports
  • Business intelligence: Tableau, Qlik
  • Visualization: Tableau, Qlik
  • Programming languages and libraries: R, D3.js

Predictive Analysis

  • Analyzes historical data to detect patterns and trends and to make predictions about future outcomes.
  • Provides estimates of the likelihood of a future outcome, based on statistics and probabilities.

Examples:

  • How likely is this online user to click on this online ad?
  • Predicting customer behavior and purchase patterns
  • Forecasting inventory needs based on market trends
  • Predicting the sale price of a house in a specific real estate market
  • Filling in missing information using historical data from ERP, CRM, POS, and similar systems

Techniques for Predictive Analytics

Predictive analytics combines techniques from statistics, data mining, and machine learning:

Linear Regression: A statistical approach that models the relationship between a dependent variable y and one or more explanatory variables X. Example: predicting a home's sale price.
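As a rough illustration of linear regression, the sketch below fits a straight line with ordinary least squares for a single explanatory variable. The floor areas and prices are invented for the example, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: floor area (square feet) and sale price (dollars).
areas = [1000, 1500, 2000, 2500]
prices = [200_000, 290_000, 410_000, 500_000]

slope, intercept = fit_line(areas, prices)

# Predicted sale price for an 1,800 square foot home.
predicted_price = slope * 1800 + intercept
```

A real pricing model would use many more features (location, age, number of rooms) and far more data.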

Logistic Regression: Predicts the outcome of a categorical dependent variable from one or more predictor variables. Example: predicting whether a tumor is malignant based on its characteristics.
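To make logistic regression concrete, here is a minimal sketch that fits a one-feature model with batch gradient descent. The tumor sizes and labels are made up, and the learning rate and epoch count are arbitrary assumptions rather than tuned values.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, labels, learning_rate=0.1, epochs=5000):
    """Fit P(label = 1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: tumor size in cm; label 1 = malignant, 0 = benign.
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
malignant = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(sizes, malignant)

# Estimated probability of malignancy for a small and a large tumor.
p_small = sigmoid(w * 1.0 + b)
p_large = sigmoid(w * 5.0 + b)
```

The output is a probability, so a threshold (commonly 0.5) is needed to turn it into a malignant or benign call.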

Decision trees and random forests: A decision tree is a decision support tool that uses a tree-like graph to map observations about an item to its target value. A random forest combines a group of decision trees to improve predictive performance. Example: predicting whether a passenger on the Titanic survived.
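The simplest possible decision tree has a single split, sometimes called a stump. The sketch below learns one from a handful of invented passenger records (not the real Titanic data set) by choosing the feature whose split misclassifies the fewest rows. The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical passenger records: (features, survived).
passengers = [
    ({"sex": "female", "pclass": 1}, 1),
    ({"sex": "female", "pclass": 3}, 1),
    ({"sex": "female", "pclass": 2}, 1),
    ({"sex": "male", "pclass": 1}, 0),
    ({"sex": "male", "pclass": 3}, 0),
    ({"sex": "male", "pclass": 2}, 1),
]

def fit_stump(rows, feature):
    """One-level tree: map each value of `feature` to the majority label seen with it."""
    by_value = {}
    for features, label in rows:
        by_value.setdefault(features[feature], []).append(label)
    return {value: Counter(labels).most_common(1)[0][0] for value, labels in by_value.items()}

def training_errors(rows, feature, stump):
    """Count the training rows that the stump misclassifies."""
    return sum(stump[features[feature]] != label for features, label in rows)

def best_stump(rows, candidate_features):
    """Pick the feature whose one-level split misclassifies the fewest training rows."""
    scored = [(training_errors(rows, f, fit_stump(rows, f)), f) for f in candidate_features]
    errors, feature = min(scored)
    return feature, fit_stump(rows, feature), errors

feature, stump, errors = best_stump(passengers, ["sex", "pclass"])
```

A full decision tree repeats this split selection recursively on each branch, and a random forest trains many such trees on random subsets of the data and lets them vote.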

Naïve Bayes: A classification technique based on Bayes' theorem that assumes the predictors are independent of one another. Examples: classifying email as spam or non-spam, text analytics, sentiment analysis.
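Here is a small sketch of a Naïve Bayes spam filter, assuming a bag-of-words model with add-one (Laplace) smoothing. The training messages are invented, and `train` and `classify` are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled messages.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(messages):
    """Count words per class and messages per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocabulary

def classify(text, word_counts, class_counts, vocabulary):
    """Pick the class with the highest log posterior, treating words as independent."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the probability.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
label_a = classify("cash prize now", *model)
label_b = classify("monday meeting", *model)
```

Working in log space avoids multiplying many tiny probabilities together, which would underflow on longer messages.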

Clustering: The process of partitioning a data set into meaningful sub-classes, called clusters. It helps users understand the natural grouping or structure in a data set. Example: grouping similar shoppers to identify recommendations for them.
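One common clustering method is k-means. The sketch below is a bare-bones version with the starting centroids passed in explicitly so the result is repeatable; real implementations usually pick them at random. The shopper data is made up.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((p - c) ** 2 for p, c in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster; leave it alone if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical shoppers: (visits per month, average basket in dollars).
shoppers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]

centroids, clusters = k_means(shoppers, centroids=[(1, 20), (9, 200)])
```

Here the two clusters separate occasional low spenders from frequent high spenders, which is the kind of grouping a recommendation system could build on.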

Neural networks: Models loosely inspired by the human brain, consisting of a network of nodes (neurons). A node is activated by an input and generates a response that in turn activates another node.
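To show how one node's response activates the next, here is a tiny three-node network that computes XOR. The weights are chosen by hand for illustration; a real network would learn them from data.

```python
def step(z):
    """Threshold activation: the node fires (1) only when its weighted input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)

def xor_network(x1, x2):
    # Hidden layer: one node behaves like OR, the other like AND.
    h_or = neuron([x1, x2], [1, 1], -0.5)
    h_and = neuron([x1, x2], [1, 1], -1.5)
    # Output node fires when OR is active but AND is not.
    return neuron([h_or, h_and], [1, -1], -0.5)

outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

No single node can compute XOR on its own, which is why the hidden layer is needed.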

Link analysis: An application of graph theory, a branch of mathematics that represents objects as nodes and the relationships between them as edges. Example: friendship and acquaintance networks in social media.
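A friendship network can be sketched as an adjacency structure, with two simple link-analysis questions asked of it: who has the most connections, and which friends do two people share. The names and edges are invented for the example.

```python
from collections import defaultdict

# Hypothetical friendships: each pair is an undirected edge between two people (nodes).
friendships = [
    ("ana", "ben"),
    ("ana", "cy"),
    ("ana", "dee"),
    ("ben", "cy"),
    ("dee", "eli"),
]

# Adjacency sets: each person maps to the set of people they are connected to.
graph = defaultdict(set)
for a, b in friendships:
    graph[a].add(b)
    graph[b].add(a)

# Degree: how many direct connections each person has.
degree = {person: len(friends) for person, friends in graph.items()}
most_connected = max(degree, key=degree.get)

def mutual_friends(person_a, person_b):
    """People connected to both person_a and person_b."""
    return graph[person_a] & graph[person_b]

shared = mutual_friends("ben", "dee")
```

Mutual-friend counts like this are one simple signal a social network could use for "people you may know" suggestions.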

Tools for Predictive Analytics

Open source software: R, Apache Mahout, Apache Spark MLlib, H2O, NumPy, SciPy

Proprietary predictive analytics tools: IBM SPSS, SAS, SAP, RapidMiner

Prescriptive Analytics

  • Advises on what action to take to achieve a desired outcome.
  • Translates a forecast into a feasible plan for the business and helps users implement it.
  • Uses a combination of techniques and tools such as business rules, algorithms, machine learning, and computational modelling procedures.

  • Examples:
      - Optimizing production to achieve maximum profit.
      - Healthcare models that advise focusing on obese patients with diabetes and high cholesterol.
      - Prescribing how and where to drill and produce wells in order to optimize recovery, minimize cost, and reduce environmental footprint.
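The production example can be sketched as a small optimization problem. The version below assumes a hypothetical plant with two products and two limited resources, and simply tries every whole-unit plan; all figures are invented, and a real system would use a proper solver rather than exhaustive search.

```python
# Hypothetical plant: two products share limited machine hours and raw material.
PROFIT = {"A": 30, "B": 50}          # dollars of profit per unit
MACHINE_HOURS = {"A": 1, "B": 2}     # hours needed per unit
MATERIAL = {"A": 3, "B": 2}          # kg needed per unit
HOURS_AVAILABLE = 40
MATERIAL_AVAILABLE = 60

def best_plan():
    """Search all whole-unit production plans and return the most profitable feasible one."""
    best = (0, 0, 0)  # (profit, units of A, units of B)
    for a in range(HOURS_AVAILABLE // MACHINE_HOURS["A"] + 1):
        for b in range(HOURS_AVAILABLE // MACHINE_HOURS["B"] + 1):
            hours = a * MACHINE_HOURS["A"] + b * MACHINE_HOURS["B"]
            material = a * MATERIAL["A"] + b * MATERIAL["B"]
            if hours <= HOURS_AVAILABLE and material <= MATERIAL_AVAILABLE:
                profit = a * PROFIT["A"] + b * PROFIT["B"]
                best = max(best, (profit, a, b))
    return best

profit, units_a, units_b = best_plan()
```

The result is a recommended action (how many units of each product to make) rather than a forecast, which is what separates prescriptive from predictive analytics.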

Tools: SAS, IBM, and Dell Statistica are leaders according to Gartner’s Magic Quadrant.
