Basics of Big Data & Need

Four Vs of Big Data

Volume: Extremely large amounts of data

Variety: Various forms of data (structured, semi-structured and unstructured)

Velocity: Speed at which data arrives and is processed (real time, batch or streams)

Veracity/Variability: Inconsistent, sometimes inaccurate, varying data

Why is Big Data important?

  • Big data delivers value only when it is analyzed for insights
  • Big data helps businesses make smarter decisions, reducing cost and time
  • Big data is used in healthcare to fight diseases and improve preventive care
  • Innovations like self-driving cars are possible due to big data
  • Sports teams use big data to improve performance and prevent injuries

Evolution of Big Data: Timeline

  • 1960s: When computers were adopted by commercial industries, data was stored in flat files with no data structure imposed
  • 1970s: Relational data model was invented, triggering the popularity of relational database management systems (RDBMS) and the Structured Query Language (SQL)
  • 1990s: Data warehouses were commercialized to gain insights from data using only a subset of huge volumes of transactional data
  • 2000s: Explosion of the Internet brought the need to store unstructured data from the web, including audio, video, images and metadata

Evolution of Big Data: New technologies

  • NoSQL databases: "Not Only SQL" databases that enable high-performance processing of large-scale data
  • Hadoop: A software ecosystem that enables massively parallel computations distributed across thousands of commodity servers in a cluster
  • Cloud computing: Allows massive-scale complex computations without the need to maintain expensive hardware and software
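
The parallel processing style associated with Hadoop is MapReduce: a map step runs independently on each block of data, and a reduce step combines the results. The sketch below imitates that flow in a single Python process. It does not use any Hadoop API; the function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input: each string stands in for a data block held on a separate server.
documents = ["big data needs big clusters", "data drives decisions"]

# In a real cluster the map calls would run in parallel on different machines.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each map call only sees its own document, the work can be spread over many servers without the servers needing to coordinate until the shuffle step.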

Sources of Big Data

  • Social media: Messages and information shared between virtual communities via blogs, forums, tweets, Facebook posts, LinkedIn posts, etc.
  • Machine-generated data: Data generated without human intervention by hardware, software and medical devices
  • Business transactions: Data describing business events and relationships between different entities and a business
  • IoT (Internet of Things): Devices connected to the Internet that communicate with each other and produce huge amounts of data
  • Sensors: Measuring devices that capture physical quantities and convert them into signals

Format of Big Data

  • Structured: Data that has a defined length and format. Examples are numbers, words and dates. Easy to store and analyze. Often managed using SQL.
  • Semi-structured: Falls between structured and unstructured data. Does not conform to a fixed format but is self-describing, often through simple key-value pairs. Examples are EDI, SWIFT and XML.
  • Unstructured: Data that does not follow a specific format. Examples are audio, images (including X-ray images) and text messages.
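
A small sketch may help make the three formats concrete. It uses only the Python standard library; the sample order, customer name and review text are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns in a known order; maps directly onto a SQL table.
structured_text = "order_id,amount,order_date\n1001,25.50,2020-01-15\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: no fixed schema, but the tags describe the data they hold.
xml_text = '<order id="1001"><customer>Asha</customer><tag>gift</tag></order>'
order = ET.fromstring(xml_text)

# Unstructured: free text with no schema; meaning has to be extracted by analysis.
review_text = "Loved the product, but delivery took too long."
word_count = len(review_text.split())

print(rows[0]["amount"])           # value looked up by column name
print(order.findtext("customer"))  # value looked up by tag name
print(word_count)                  # only crude measures are available without analysis
```

The structured and semi-structured records can be queried by field name straight away, while the unstructured text needs further processing (for example text analytics) before it yields anything comparable.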

Examples of structured, machine-generated Big Data

Sensor data, web log data, point of sale data, financial data

Examples of structured, human-generated Big Data

Clickstream data, website input data, gaming-related data. Clickstream data is the trail of information a user leaves behind while visiting a website; it is typically captured in semi-structured web log files.
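
As a rough illustration of how a clickstream can be recovered from web log files, the sketch below parses a few log lines and rebuilds the trail of pages per visitor. The log lines are invented, the layout is assumed to follow the widely used Common Log Format, and visitors are identified by IP address only to keep the example short.

```python
import re
from collections import Counter

# Hypothetical web server log lines (Common Log Format style).
log_lines = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /home HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2020:13:56:01 +0000] "GET /products HTTP/1.1" 200 5120',
    '198.51.100.7 - - [10/Oct/2020:13:56:12 +0000] "GET /home HTTP/1.1" 200 2326',
]

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


events = [event for event in map(parse_line, log_lines) if event is not None]

# Rebuild each visitor's trail: the ordered list of pages they requested.
trails = {}
for event in events:
    trails.setdefault(event["ip"], []).append(event["path"])

# Count how often each page was viewed across all visitors.
page_views = Counter(event["path"] for event in events)

print(trails)
print(page_views.most_common(1))
```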

Examples of unstructured, machine-generated Big Data

Satellite images, scientific data, radar or sonar data, photographs and video

Examples of unstructured, human-generated Big Data

Social media data, mobile data, website content, corporate textual data

Every 60 seconds…

  • Google serves more than 694,445 search queries
  • 6,600+ pictures are uploaded to Flickr
  • 600+ videos are uploaded to YouTube, amounting to 25+ hours of content
  • 695,000 status updates, 79,364 wall posts and 510,040 comments are published on Facebook
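
To get a feel for the scale, the per-minute figures above can be converted into per-second and per-day rates. The short calculation below does only that arithmetic; the dictionary keys are labels chosen for this sketch, and figures quoted with a "+" are treated as lower bounds.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_MINUTE = 60

# Per-minute counts taken from the list above.
per_minute = {
    "google_queries": 694_445,
    "flickr_pictures": 6_600,
    "youtube_videos": 600,
    "facebook_status_updates": 695_000,
}

per_second = {name: count / SECONDS_PER_MINUTE for name, count in per_minute.items()}
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

for name in per_minute:
    print(f"{name}: {per_second[name]:,.0f} per second, {per_day[name]:,} per day")
```

At these rates the Google figure alone works out to roughly one billion queries per day (694,445 × 1,440 = 1,000,000,800).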

Big Data Analytics

  • Basic analytics: Reporting, dashboards, simple visualizations, slicing and dicing
  • Advanced analytics: Complex analytics models using machine learning, statistics, text analytics, neural networks, data mining
  • Operationalized analytics: Embedding big data analytics in a business process to streamline it and increase efficiency
  • Analytics for business decisions: Implemented for better decision-making that drives revenue
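
"Slicing and dicing" under basic analytics means filtering data along one or more dimensions and then summarizing what remains. A minimal sketch in plain Python follows; the sales records and the helper names `slice_by` and `total_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records with three dimensions and one measure.
sales = [
    {"region": "North", "product": "Laptop", "quarter": "Q1", "revenue": 1200},
    {"region": "North", "product": "Phone", "quarter": "Q1", "revenue": 800},
    {"region": "South", "product": "Laptop", "quarter": "Q1", "revenue": 950},
    {"region": "South", "product": "Laptop", "quarter": "Q2", "revenue": 1100},
]


def slice_by(records, dimension, value):
    """Keep only the records where one dimension has the given value."""
    return [record for record in records if record[dimension] == value]


def total_by(records, dimension):
    """Sum revenue for each distinct value of a dimension."""
    totals = defaultdict(int)
    for record in records:
        totals[record[dimension]] += record["revenue"]
    return dict(totals)


# Slice: fix one dimension (quarter), then report revenue by region.
q1_by_region = total_by(slice_by(sales, "quarter", "Q1"), "region")

# Dice: restrict two dimensions at once (product and region).
south_laptops = slice_by(slice_by(sales, "product", "Laptop"), "region", "South")
south_laptop_revenue = sum(record["revenue"] for record in south_laptops)

print(q1_by_region)
print(south_laptop_revenue)
```

Reporting tools and dashboards perform the same kind of filter-then-aggregate operations, usually over far larger tables and through a query engine.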