2.04.2015

Big Data

Why do we need to know about Big Data?

Big data will continue to grow in the coming years, due in part to the rise of the Internet of Things, which can embed technology in practically anything. As ever-larger volumes of data are created, it's vital to know how to collect and analyze that data, particularly when it relates to customer preferences and business processes. Whatever industry you're in, ignoring big data means missing key marketing and decision-making opportunities.

Video link on Big Data basics -
Big Data Opportunity: Structured vs. Unstructured Data


What is Big Data?

Big data is a buzzword, or catchphrase, used to describe a volume of structured and unstructured data so massive that it's difficult to process using traditional database and software techniques. In most enterprise scenarios, the data is too big, moves too fast, or exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.


Is Big Data a Volume or a Technology?

While the term may seem to refer to the volume of data, that isn't always the case. The term big data, especially when used by vendors, may refer to the technology (tools and processes) that an organization requires to handle large amounts of data and the facilities to store it. The term is believed to have originated with Web search companies that needed to query very large, distributed aggregations of loosely structured data.


An Example of Big Data

An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data, consisting of billions to trillions of records about millions of people, all from different sources (e.g., Web, sales, customer contact center, social media, mobile data and so on). Such data is typically loosely structured, often incomplete, and not readily accessible.
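The unit sizes above follow the binary convention, where each step up is a factor of 1,024. (Strictly, the IEC names for these binary units are pebibyte and exbibyte; in the decimal SI convention, 1 PB is 1,000 TB.) A minimal sketch of the arithmetic, assuming the 1,024-based convention the text uses; the function names are illustrative only:

```python
# Binary (base-1,024) storage units, matching the convention used in the text.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes using powers of 1,024."""
    return value * 1024 ** UNITS.index(unit)

def convert(value, from_unit, to_unit):
    """Convert between two units, e.g. exabytes to terabytes."""
    return to_bytes(value, from_unit) / 1024 ** UNITS.index(to_unit)

print(convert(1, "PB", "TB"))  # 1024.0
print(convert(1, "EB", "PB"))  # 1024.0
print(convert(1, "EB", "TB"))  # 1048576.0
```

So one exabyte is roughly a million terabytes, which is why such volumes cannot live on a single conventional server.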

“So in your own company, look at your own structured data, look at the data that you have, look for insights in the unstructured data, and perhaps more importantly, look for opportunities to accumulate more data that provides some value – any data you can get your hands on and start accumulating it, because that’s a growing asset.
And the beauty of big data is you can sell it multiple times. You can keep selling the same asset over and over again for different applications, for different uses, for different opportunities. So look at the unstructured data and get as much of it as you can.”


Big Data Sources

User-, application-, system- and sensor-generated data


Tools and Technology

Big Data is too large to process using traditional methods. 

Amazon Web Services (AWS) offers big data solutions for every stage of the big data life cycle:
Collect > Stream > Store > RDBMS | Data Warehouse | NoSQL > Analytics > Archive
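To make the life-cycle stages listed above concrete, here is a toy, in-memory sketch in which each stage is a plain Python function feeding the next. Every function and record here is hypothetical and none of them is an AWS API; the "Store" step collapses the RDBMS / Data Warehouse / NoSQL choice into a simple Python list.

```python
from collections import Counter

def collect():
    """Collect: gather raw events from (hypothetical) sources."""
    return [
        {"source": "web", "event": "click"},
        {"source": "mobile", "event": "purchase"},
        {"source": "web", "event": "purchase"},
    ]

def stream(events):
    """Stream: hand events on one at a time, as a streaming service would."""
    for event in events:
        yield event

def store(event_stream):
    """Store: persist the events (here, simply materialize them into a list)."""
    return list(event_stream)

def analyze(stored):
    """Analytics: count events per source."""
    return Counter(e["source"] for e in stored)

def archive(stored):
    """Archive: keep an immutable snapshot of the stored data."""
    return tuple(stored)

stored = store(stream(collect()))
print(analyze(stored))  # Counter({'web': 2, 'mobile': 1})
archived = archive(stored)
```

In a real deployment each function would be replaced by a managed service, but the shape of the flow (each stage consuming the previous stage's output) stays the same.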


Some of the technologies that drive Big Data systems are as follows -

Hosting Environment:
Distributed Servers or Cloud. For example -
Amazon Elastic Compute Cloud (Amazon EC2)

Data storage:
Distributed Storage. For example -
Amazon Simple Storage Service (Amazon S3)
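Distributed storage spreads data over many machines and usually keeps several copies so that losing one machine does not lose data. The sketch below illustrates one common technique, hash partitioning with replication, using a hypothetical `ToyDistributedStore` class; it is an illustration of the idea only and makes no claim about how Amazon S3 works internally.

```python
import hashlib

class ToyDistributedStore:
    """Hypothetical key-value store that spreads objects over several nodes.

    Each key is hashed to choose a primary node; copies go to the next
    `replicas - 1` nodes so the data survives the loss of a node.
    """

    def __init__(self, num_nodes=4, replicas=2):
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        self.nodes = [{} for _ in range(num_nodes)]  # one dict per simulated node
        self.replicas = replicas

    def _node_ids(self, key):
        # Use a stable hash (Python's built-in hash() for str is salted per process).
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        primary = int(digest, 16) % len(self.nodes)
        return [(primary + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every replica node.
        for node_id in self._node_ids(key):
            self.nodes[node_id][key] = value

    def get(self, key, failed_nodes=()):
        # Read from the first replica whose node is still available.
        for node_id in self._node_ids(key):
            if node_id not in failed_nodes:
                return self.nodes[node_id][key]
        raise KeyError(key)

object_store = ToyDistributedStore(num_nodes=4, replicas=2)
object_store.put("logs/2015-04-02.txt", b"raw log data")
primary = object_store._node_ids("logs/2015-04-02.txt")[0]
print(object_store.get("logs/2015-04-02.txt", failed_nodes={primary}))  # b'raw log data'
```

Even with the primary node marked as failed, the read succeeds from the replica.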

Programming Model:
Distributed Processing or Distributed Computing System. For example -
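One widely used distributed programming model is MapReduce (implemented, for example, in Apache Hadoop): a map phase runs in parallel over separate input splits, a shuffle groups the intermediate results by key, and a reduce phase combines each group. The word-count sketch below runs all three phases in a single Python process purely to show the data flow; the function names are hypothetical and no cluster framework is involved.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

# Each document stands in for an input split that would live on a different machine.
documents = ["big data big value", "data moves fast"]
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'moves': 1, 'fast': 1}
```

Because each `map_phase` call depends only on its own split, and each `reduce_phase` call only on one key's group, both phases can be spread across many machines.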

Database:
High-performance relational and schema-free (NoSQL) databases. For example -

Relational Database 
- Amazon RDS (Relational Database Service)

NoSQL Database
- Amazon DynamoDB
- MongoDB (open source)
- HBase
- Cassandra
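The difference between the two database families above can be sketched with Python's built-in `sqlite3` module standing in for a relational database and plain dictionaries standing in for a schema-free document store. This is an assumption-laden illustration: it does not call Amazon RDS, DynamoDB, MongoDB, HBase or Cassandra, and the table and field names are hypothetical.

```python
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Alice", "Berlin"))
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Bob", "Paris"))
rows = conn.execute("SELECT name FROM customers WHERE city = ?", ("Paris",)).fetchall()
print(rows)  # [('Bob',)]

# A column that is not in the schema is rejected.
rejected = False
try:
    conn.execute("INSERT INTO customers (name, devices) VALUES (?, ?)", ("Carol", "phone"))
except sqlite3.OperationalError:
    rejected = True

# Schema-free (document style): each record carries its own, possibly different, fields.
documents = [
    {"name": "Alice", "city": "Berlin"},
    {"name": "Bob", "devices": ["phone", "tablet"], "last_login": "2015-04-02"},
]
with_devices = [d["name"] for d in documents if "devices" in d]
print(with_devices)  # ['Bob']
```

The relational side gains consistency from its fixed schema; the document side trades that for the flexibility to store loosely structured records as they arrive.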

Data warehouse:
Amazon Redshift

Operations Performed on Data:
Analytic/Semantic Processing
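A typical analytic operation, of the kind a data warehouse such as Amazon Redshift runs as a SQL `GROUP BY` query, is aggregating a measure per category. The sketch below shows only the analytic side (semantic processing is not covered) in plain Python, assuming hypothetical sales records and a hypothetical `revenue_by` helper.

```python
from collections import defaultdict

# Hypothetical sales records; in practice these would come from a data warehouse.
sales = [
    {"region": "EU", "product": "phone", "amount": 300.0},
    {"region": "US", "product": "phone", "amount": 450.0},
    {"region": "EU", "product": "tablet", "amount": 200.0},
    {"region": "US", "product": "laptop", "amount": 900.0},
]

def revenue_by(records, key):
    """Total revenue per value of `key` (like SQL GROUP BY with SUM)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    # Sort descending by revenue so the top group comes first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(revenue_by(sales, "region"))   # [('US', 1350.0), ('EU', 500.0)]
print(revenue_by(sales, "product"))  # [('laptop', 900.0), ('phone', 750.0), ('tablet', 200.0)]
```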