Big Data: A Big Problem

Hirendra Kumar
Sep 17, 2020


What is data:

>> Any kind of information that we send or receive can be considered data.

We can store data on our own system so that we can access it easily, even without an internet connection. The images, videos, and important PDFs we keep there are all information, or in other words, data that we have stored.

But we do not always keep data on our own system. We usually put it on social media such as Facebook and Instagram to share it with the world. In this way our data ends up in the databases of the social media companies, which save it so that we can access it anytime over the internet. They store and secure it not because it is their duty but because it is their business. So we come to a conclusion: data is business.

What is Big Data:

Data that cannot be stored within the storage available can be termed big data. Suppose we have a laptop with a 10 GB hard disk and we want to store 20 GB of data on it. We cannot fit 20 GB of data on a 10 GB hard disk, so for that laptop this data is big data.

Some facts:

Today it is hard to find a person who does not use social media, because digital adoption is growing rapidly in every corner of the world. According to a report, the total number of social media users increased from 2.46 billion in 2017 to 2.77 billion in 2019.

People use Facebook, Instagram, WhatsApp, and other social and messaging platforms throughout their daily routines. As a result, the average time an individual spends on social media has increased to 2 hours 22 minutes.

Some Interesting Statistics Of Social Media

Every minute:

  • 700,000 logins on Facebook
  • Around 530,000 photos are shared on Snapchat
  • Around 350,000 tweets are posted on Twitter
  • 30,000 photos are shared on Instagram
  • 21 million messages are sent on WhatsApp
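
To get a feel for the scale, the per-minute figures above can be scaled up to a full day. This is only a rough back-of-the-envelope sketch in Python. It assumes the rates stay constant around the clock, which real traffic does not, and the variable names are made up for illustration.

```python
# Per-minute figures quoted in the list above.
per_minute = {
    "Facebook logins": 700_000,
    "Snapchat photos": 530_000,
    "Tweets": 350_000,
    "Instagram photos": 30_000,
    "WhatsApp messages": 21_000_000,
}

MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Assumption: the per-minute rate holds for the whole day.
per_day = {name: count * MINUTES_PER_DAY for name, count in per_minute.items()}

for name, count in per_day.items():
    print(f"{name}: about {count:,} per day")
```

Under that assumption, even the smallest figure in the list grows into tens of millions of events per day.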

We can see how many logins happen and how many images, videos, articles, and so on are shared in just one minute. All this login data and shared data becomes a big data problem because it is hard to manage so much data.

>> Facts about the giant Facebook:

In 2012, Facebook revealed that it was generating more than 500 terabytes of data every day, including 2.7 billion likes and around 300 million photos per day. Another striking figure is that Facebook was scanning around 105 terabytes of data every half hour.
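
The 2012 figures are easier to picture as a steady rate. The short Python sketch below converts them, assuming decimal units (1 TB = 1,000 GB) and a perfectly even flow of data over the day. Both are simplifying assumptions, not something the Facebook figures state.

```python
TB_GENERATED_PER_DAY = 500      # 2012 figure quoted above
TB_SCANNED_PER_HALF_HOUR = 105  # 2012 figure quoted above

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
HALF_HOURS_PER_DAY = 48
GB_PER_TB = 1000                # assumption: decimal units

# Average ingest rate if the data arrived evenly over the day.
gb_per_second = TB_GENERATED_PER_DAY * GB_PER_TB / SECONDS_PER_DAY

# Total scanned per day if the half-hour rate is sustained.
tb_scanned_per_day = TB_SCANNED_PER_HALF_HOUR * HALF_HOURS_PER_DAY

print(f"Generated: about {gb_per_second:.2f} GB every second")
print(f"Scanned:   about {tb_scanned_per_day:,} TB every day")
```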

Facebook receives this much data every day, so handling and then storing it is a concern.

The Amount of Data Created Each Day on the Internet in 2019

In 2014, there were 2.4 billion internet users. That number grew to 3.4 billion by 2016, and in 2017 300 million internet users were added. As of June 2019 there are now over 4.4 billion internet users. This is an 83% increase in the number of people using the internet in just five years!
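
The 83% figure can be checked with a one-line percentage calculation. Here is a small Python sketch. The helper name `percent_increase` is made up for this example.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the growth from old to new as a percentage of old."""
    return (new - old) / old * 100


users_2014 = 2.4  # billions of internet users, from the paragraph above
users_2019 = 4.4  # billions of internet users, from the paragraph above

growth = percent_increase(users_2014, users_2019)
print(f"Growth from 2014 to 2019: about {growth:.0f}%")
```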

Big Stats and Facts About Big Data

  • At the beginning of 2020, the digital universe was estimated to consist of 44 zettabytes of data.
  • By 2025, approximately 463 exabytes would be created every 24 hours worldwide.
  • As of June 2019, there were more than 4.5 billion people online.
  • 80% of digital content is unavailable in nine out of every ten languages.
  • In 2019, Google processed 3.7 million queries, Facebook saw one million logins, and YouTube recorded 4.5 million videos viewed every 60 seconds.
  • Netflix’s content volume in 2019 outnumbered that of the US TV industry in 2005.
  • By 2025, there would be 75 billion Internet-of-Things (IoT) devices in the world.
  • By 2030, nine in every ten people aged six and above would be digitally active.
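
To relate two of these numbers, the sketch below asks how long it would take, at the projected 2025 rate, to create as much data as the whole digital universe of early 2020. It assumes decimal SI prefixes (1 zettabyte = 1,000 exabytes) and takes both figures from the list above at face value.

```python
EB_PER_ZB = 1000            # 1 zettabyte = 1,000 exabytes (decimal SI prefixes)

digital_universe_zb = 44    # estimate for the beginning of 2020, from the list
daily_creation_eb = 463     # projected daily creation by 2025, from the list

# Days needed at the 2025 rate to match the 2020 total.
days = digital_universe_zb * EB_PER_ZB / daily_creation_eb
print(f"About {days:.0f} days")
```

In other words, if both estimates hold, roughly three months of data creation in 2025 would equal everything that existed at the start of 2020.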

From the above stats we can understand that data is growing every second, every minute, every month, and every year. It is becoming big data, and with it a big problem.

>> Why store data:

Now the question is why anyone stores so much data when storage is limited. Big MNCs have to store it because it is their business. If they did not store data persistently, how would they provide their services to customers, and why would clients come to them?

They have to store everything: your login data, your uploaded images and videos, your likes and dislikes, your tweets and retweets. If clients cannot get their older data back, they will not be satisfied with the service and may leave.

So, in short, data is business, and handling and storing big data is the problem.

How MNCs store big data:

To manage the problem of big data they use a concept called distributed storage, in which many machines contribute their storage to one centralized system. This addresses both the storage problem and the I/O problem.

To implement distributed storage, there are multiple products provided by open source communities.

Storage problem:

Big companies like Dell EMC can build bigger hard disks and MNCs can use them, but a bigger hard disk will be very costly.

Big companies can afford costly hard disks to run their business, and a bigger hard disk can hold terabytes of data, even big data. But fetching that data again when a client asks for it would be much harder, because a hard disk is very slow at input and output. This is called the I/O problem.
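
A quick estimate shows why a single disk becomes a bottleneck. The Python sketch below computes how long it takes to read a given amount of data at a fixed throughput. The 150 MB/s figure is a hypothetical value picked only for illustration, not a measurement of any particular disk.

```python
def read_time_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Time to read size_gb sequentially, using decimal units (1 GB = 1,000 MB)."""
    return size_gb * 1000 / throughput_mb_per_s


ASSUMED_THROUGHPUT_MB_PER_S = 150  # hypothetical disk speed, for illustration only

seconds = read_time_seconds(1000, ASSUMED_THROUGHPUT_MB_PER_S)  # 1 TB = 1,000 GB
print(f"Reading 1 TB takes about {seconds / 3600:.1f} hours at this speed")
```

However large the disk is, the read time grows in step with the amount of data, because everything has to pass through the same single device.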

Distributed Storage:

>> To overcome the storage and I/O problems, they came up with a concept called distributed storage. In this concept, a system that is low on storage can ask other systems to share theirs. Instead of buying more storage for that system, it can use the storage of other systems over the network. The more systems that share, the more storage we get.
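
The idea of pooling storage can be sketched in a few lines of Python. The example reuses the earlier laptop scenario (20 GB of data, 10 GB disk). The function names and the choice of three machines are assumptions made for illustration.

```python
def pooled_capacity_gb(contributions_gb):
    """Total storage available when every system shares its contribution."""
    return sum(contributions_gb)


def fits(data_gb, capacity_gb):
    """True if the data can be stored within the given capacity."""
    return data_gb <= capacity_gb


data_gb = 20

# One laptop with a 10 GB disk cannot hold the data on its own.
alone = fits(data_gb, pooled_capacity_gb([10]))

# Assumption: three machines each contribute 10 GB over the network.
shared = fits(data_gb, pooled_capacity_gb([10, 10, 10]))

print(f"Fits on one laptop: {alone}")
print(f"Fits on the shared pool: {shared}")
```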

In this process we can also avoid the I/O problem.

If we use one hard disk in one laptop, the I/O problem will occur because the hard disk is very slow. But if we store the data across many servers, fetching it in parallel becomes much easier. We split the data into stripes, which can be of different sizes, and send them to the other servers.

Suppose we have 50 GB of data. We can split it into 5 parts and send 10 GB to each of 5 servers. Fetching the data is now easier than reading it directly from one laptop. All the servers come together to form a cluster called a master-slave cluster.
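
The 50 GB example can be imitated on a small scale. In the Python sketch below, 50 bytes stand in for 50 GB, plain dictionary entries stand in for servers, and a thread pool stands in for parallel network fetches. All names are hypothetical, and this only illustrates the idea, not how any real product does it.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data: bytes, parts: int) -> list:
    """Split data into exactly `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


# 50 bytes stand in for 50 GB of data.
data = bytes(range(50))

# Each "server" is just a dictionary entry holding one part.
servers = {f"server-{i}": part for i, part in enumerate(split_into_parts(data, 5))}


def fetch(server_name: str) -> bytes:
    """Pretend to read one part from one server."""
    return servers[server_name]


# Fetch all parts at the same time; map() returns results in request order.
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    fetched = list(pool.map(fetch, servers))

restored = b"".join(fetched)
print(f"Parts: {[len(part) for part in fetched]}, restored intact: {restored == data}")
```

Because each part lives on a different machine, the five reads do not have to wait for each other, which is where the speed-up over a single disk comes from.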

In Hadoop, the master node is called the name node and a slave node is called a data node.
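
The division of work between the two kinds of node can be sketched with a toy in-memory model in Python. This is not Hadoop's API. The classes `NameNode` and `DataNode`, the helpers `put` and `get`, and the round-robin placement are all assumptions made for illustration, and the sketch leaves out networking, replication, and failure handling.

```python
class DataNode:
    """Toy slave node: stores the actual blocks of data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # (filename, block_index) -> bytes

    def store(self, filename, index, block):
        self.blocks[(filename, index)] = block

    def read(self, filename, index):
        return self.blocks[(filename, index)]


class NameNode:
    """Toy master node: stores only metadata about where blocks live."""

    def __init__(self, node_names):
        self.node_names = list(node_names)
        self.block_map = {}  # filename -> data node name for each block index

    def assign(self, filename, num_blocks):
        # Assumption: simple round-robin placement across the data nodes.
        count = len(self.node_names)
        placement = [self.node_names[i % count] for i in range(num_blocks)]
        self.block_map[filename] = placement
        return placement

    def locate(self, filename):
        return self.block_map[filename]


def put(name_node, data_nodes, filename, data, block_size):
    """Split data into blocks and store each on the node the master assigns."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = name_node.assign(filename, len(blocks))
    for index, (node_name, block) in enumerate(zip(placement, blocks)):
        data_nodes[node_name].store(filename, index, block)


def get(name_node, data_nodes, filename):
    """Ask the master where the blocks are, then read them from the data nodes."""
    placement = name_node.locate(filename)
    return b"".join(
        data_nodes[node_name].read(filename, index)
        for index, node_name in enumerate(placement)
    )


data_nodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
name_node = NameNode(data_nodes)

put(name_node, data_nodes, "photo.jpg", b"0123456789", block_size=4)
print(name_node.locate("photo.jpg"))
print(get(name_node, data_nodes, "photo.jpg"))
```

The point of the sketch is that the master never holds the file contents. It only answers the question of which data node has which block.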

master-slave cluster

Implementation of Distributed Storage:

To store and process big data in a distributed way, there are multiple products:

  1. Hadoop
  2. Cassandra
  3. MongoDB
  4. Spark

Using these powerful products, MNCs store and manipulate big data.

THANK YOU !!
