ASSIGNMENT: 1
TOPIC: BIG DATA
HITESH KUMAR 2K15/MC/027

A proper definition of “big data” is difficult to achieve because projects, vendors, developers, and business professionals use it quite differently. With this in mind, generally speaking, big data is:

· a collection of large datasets, or
· a category of computing strategies and technologies that are used to handle large datasets.

Here, “large dataset” means a dataset too large to reasonably process on a single computer or store with traditional tooling. This means that the common scale of big datasets is constantly shifting and may vary significantly from organization to organization.

Technology has taken over every field today, resulting in huge data growth. All of this data is valuable. An estimated 3 to 4 million pieces of data are used every day.
One machine can’t store and process this huge amount of data, so the need arises to understand big data and the methods used to store it. Big data is a huge amount of data which can’t be processed using a traditional approach (a single computer system) in a given time frame. In business, big data is used to better understand customers and their behaviors and preferences.
Companies are keen to expand their traditional data sets with social media data, browser logs, text analytics and sensor data to get a more complete picture of their customers.

There are specific attributes that define big data. In most big data circles, these are the seven V’s: volume, variety, velocity, veracity, visibility, validity and variability.

· Volume: It refers to the huge amount of data that is created in places ranging from social networking sites to banks (accounts, credit and debit cards).
· Variety: It refers to the different types of data being used (structured, semi-structured and unstructured).
· Velocity: While data is being processed, more and more data keeps coming in, and it has to be processed efficiently and within the time frame. For example, new videos are uploaded to YouTube every minute.
· Veracity: This refers to the authenticity of the data.
For example, Twitter users put hashtags and abbreviations in their tweets. The accuracy of all this content is checked by Twitter.
· Visibility: The type of data that is visible.
· Validity: This refers to the validity of the data. For example, the kinds of files in use in 1998 were different from those in use now.
· Variability

Big data analytics technology is also very important to health care. By analyzing large amounts of information, both structured and unstructured, quickly, health care providers can provide lifesaving diagnoses or treatment options almost immediately.

Now, how big does this data need to be? There’s a common misconception about the term “big data”. There is no threshold above which data is considered big data. It is often described as data measured in gigabytes, terabytes, petabytes, exabytes or even larger sizes. This definition is wrong.
Big data depends purely on the context in which it is being used. Even a small amount of data can be referred to as big data. For example, you can’t attach a 100 MB file to an email. Therefore, for email, this 100 MB is big data.
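
The idea that “big” depends on context can be sketched in a few lines of Python. This is only an illustration, not a standard definition: the function name is_big_for and the limits in SYSTEM_LIMITS (a 25 MB attachment cap and 16 GB of memory on a single machine) are assumptions chosen for the example.

```python
# Hypothetical capacity limits, in bytes, chosen only for illustration.
MB = 1024 ** 2
GB = 1024 ** 3

SYSTEM_LIMITS = {
    "email_attachment": 25 * MB,  # assumed attachment cap of an email service
    "laptop_memory": 16 * GB,     # assumed memory of a single machine
}


def is_big_for(system: str, size_bytes: int) -> bool:
    """Return True when the data exceeds what the named system can handle."""
    return size_bytes > SYSTEM_LIMITS[system]


file_size = 100 * MB
print(is_big_for("email_attachment", file_size))  # True: too big for email
print(is_big_for("laptop_memory", file_size))     # False: fits on one machine
```

The same 100 MB file counts as big data for the email system but not for the single machine, which is the point made above: the label depends on what the system in question can handle, not on a fixed size.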