	{"id":336,"date":"2013-06-18T22:33:40","date_gmt":"2013-06-18T15:33:40","guid":{"rendered":"http:\/\/science-technology.vn\/?p=336"},"modified":"2013-10-31T09:41:35","modified_gmt":"2013-10-31T02:41:35","slug":"big-data-du-lieu-lon","status":"publish","type":"post","link":"https:\/\/science-technology.vn\/?p=336","title":{"rendered":"Big Data &#8211; D\u1eef li\u1ec7u l\u1edbn"},"content":{"rendered":"<p><span style=\"font-size: 14px;\">Khi c\u00f4ng ngh\u1ec7 th\u00f4ng tin \u0111ang \u0111\u01b0\u1ee3c d\u00f9ng nhi\u1ec1u h\u01a1n trong m\u1ecdi doanh nghi\u1ec7p, kh\u1ed1i l\u01b0\u1ee3ng d\u1eef li\u1ec7u c\u0169ng t\u0103ng l\u00ean nhi\u1ec1u v\u00e0 trong th\u1eddi gian ng\u1eafn, ph\u1ea7n l\u1edbn doanh nghi\u1ec7p s\u1ebd c\u00f3 nhi\u1ec1u d\u1eef li\u1ec7u h\u01a1n h\u1ecd c\u00f3 th\u1ec3 h\u00ecnh dung. Theo m\u1ed9t b\u00e1o c\u00e1o c\u00f4ng nghi\u1ec7p, nhi\u1ec1u c\u00f4ng ti s\u1ebd d\u00f9ng t\u1eeb 100 terabytes (TB) v\u00e0 9 petabytes (PB) d\u1eef li\u1ec7u v\u00e0 kh\u1ed1i l\u01b0\u1ee3ng d\u1eef li\u1ec7u s\u1ebd g\u1ea5p \u0111\u00f4i c\u1ee9 sau 18 th\u00e1ng. (Ngh\u0129 v\u1ec1 lu\u1eadt Moore). M\u1ecdi ng\u00e0y, d\u1eef li\u1ec7u \u0111\u01b0\u1ee3c sinh ra t\u1eeb m\u1ecdi ki\u1ec3u ngu\u1ed3n.<\/span><\/p>\n<p>Ch\u1eb3ng h\u1ea1n, Twitter nh\u1eadn \u0111\u01b0\u1ee3c 200 tri\u1ec7u tin nh\u1eafn m\u1ed9t ng\u00e0y hay 46 megabytes m\u1ed9t gi\u00e2y; Facebook thu th\u1eadp trung b\u00ecnh 15 Terabytes m\u1ed7i ng\u00e0y. Google b\u00e1o c\u00e1o r\u1eb1ng t\u1eebng ng\u00e0y b\u1ea9y tri\u1ec7u trang web \u0111\u01b0\u1ee3c th\u00eam v\u00e0o Internet. C\u00f4ng nghi\u1ec7p kinh doanh tr\u1ef1c tuy\u1ebfn th\u00eam 12 tri\u1ec7u giao t\u00e1c hay 25 petabytes d\u1eef li\u1ec7u m\u1ed7i gi\u1edd. C\u00f4ng nghi\u1ec7p vi\u1ec5n th\u00f4ng c\u00f3 tr\u00ean 5 t\u1ec9 ng\u01b0\u1eddi d\u00f9ng \u0111i\u1ec7n tho\u1ea1i tr\u00ean th\u1ebf gi\u1edbi. 
M\u1ed7i ng\u00e0y 2 t\u1edbi 3 t\u1ec9 ng\u01b0\u1eddi d\u00f9ng truy nh\u1eadp v\u00e0o internet \u0111\u1ec3 \u0111\u1ecdc, t\u00ecm m\u1ecdi ki\u1ec3u th\u00f4ng tin; m\u1ecdi ng\u01b0\u1eddi c\u0169ng t\u01b0\u01a1ng t\u00e1c v\u1edbi nhau b\u1eb1ng emails, tin nh\u1eafn v.v. T\u1ea5t c\u1ea3 nh\u1eefng \u0111i\u1ec1u n\u00e0y c\u0169ng l\u00e0m ph\u00e1t sinh nhi\u1ec1u d\u1eef li\u1ec7u h\u01a1n tr\u01b0\u1edbc \u0111\u00e2y. V\u00ec kh\u1ed1i l\u01b0\u1ee3ng l\u00e0 l\u1edbn th\u1ebf, t\u1edbi t\u1eeb \u0111a d\u1ea1ng ngu\u1ed3n, ph\u1ea7n l\u1edbn d\u1eef li\u1ec7u \u0111\u1ec1u phi c\u1ea5u tr\u00fac v\u00e0 b\u00ean ngo\u00e0i vi\u1ec7c x\u1eed l\u00ed c\u1ee7a c\u00f4ng c\u1ee5 qu\u1ea3n l\u00ed d\u1eef li\u1ec7u hi\u1ec7n th\u1eddi, n\u00f3 y\u00eau c\u1ea7u c\u00e1ch ti\u1ebfp c\u1eadn m\u1edbi, c\u00f4ng c\u1ee5 m\u1edbi \u0111\u1ec3 thu th\u1eadp v\u00e0 ph\u00e2n t\u00edch d\u1eef li\u1ec7u cho n\u00ean n\u00f3 \u0111\u01b0\u1ee3c cho c\u00e1i t\u00ean l\u00e0 \u201cBig Data\u201d.<\/p>\n<p>Big Data \u0111\u01b0\u1ee3c coi l\u00e0 &#8220;th\u1ee9 l\u1edbn ti\u1ebfp sau&#8221; t\u01b0\u01a1ng t\u1ef1 nh\u01b0 m\u00e1y t\u00ednh c\u00e1 nh\u00e2n trong nh\u1eefng n\u0103m 1970 v\u00e0 Internet trong nh\u1eefng n\u0103m 1990. N\u1ebfu ch\u00fang ta nh\u00ecn v\u00e0o l\u1ecbch s\u1eed ng\u1eafn ng\u1ee7i c\u1ee7a c\u00f4ng ngh\u1ec7 th\u00f4ng tin v\u1ec1 d\u1eef li\u1ec7u ch\u00fang ta c\u00f3 th\u1ec3 th\u1ea5y t\u1ea1i sao. Trong nh\u1eefng n\u0103m 1980 Qu\u1ea3n l\u00ed h\u1ec7 th\u1ed1ng c\u01a1 s\u1edf d\u1eef li\u1ec7u quan h\u1ec7 (RDBS) ch\u1ec9 l\u00e0 nh\u1eefng h\u1ec7 th\u1ed1ng c\u01a1 s\u1edf d\u1eef li\u1ec7u th\u00f4ng th\u01b0\u1eddng \u0111\u01b0\u1ee3c d\u1ea1y trong ch\u01b0\u01a1ng tr\u00ecnh Qu\u1ea3n l\u00ed h\u1ec7 th\u00f4ng tin. 
Tuy nhi\u00ean v\u1edbi b\u00f9ng n\u1ed5 c\u1ee7a c\u00f4ng ngh\u1ec7 th\u00f4ng tin khi nhi\u1ec1u c\u00f4ng ti thu th\u1eadp d\u1eef li\u1ec7u, \u0111\u1ed9t nhi\u00ean RDBS ph\u00e1t tri\u1ec3n th\u00e0nh kinh doanh nhi\u1ec1u t\u1ec9 \u0111\u00f4 la v\u1edbi c\u00e1c c\u00f4ng ti nh\u01b0 Oracle v\u00e0 SAP. Trong nh\u1eefng n\u0103m 1990, truy l\u1ee5c th\u00f4ng tin v\u00e0 \u0111\u1ed9ng c\u01a1 t\u00ecm ki\u1ebfm \u0111\u00e3 l\u00e0 v\u00e0i m\u00f4n h\u1ecdc \u0111\u01b0\u1ee3c d\u1ea1y trong ch\u01b0\u01a1ng tr\u00ecnh chuy\u00ean s\u00e2u khoa h\u1ecdc m\u00e1y t\u00ednh nh\u01b0ng v\u1edbi t\u0103ng tr\u01b0\u1edfng c\u1ee7a Internet, n\u00f3 \u0111\u00e3 bi\u1ebfn th\u00e0nh kinh doanh nhi\u1ec1u t\u1ec9 \u0111\u00f4 la v\u1edbi c\u00f4ng ti nh\u01b0 Google. Ng\u00e0y nay v\u1edbi Big Data, c\u00f4ng c\u1ee5 c\u01a1 s\u1edf d\u1eef li\u1ec7u v\u00e0 c\u01a1 s\u1edf d\u1eef li\u1ec7u nh\u01b0 RDBS hay SQL s\u1ebd kh\u00f4ng c\u00f3 t\u00e1c d\u1ee5ng n\u1eefa v\u00ec d\u1eef li\u1ec7u qu\u00e1 l\u1edbn v\u00e0 qu\u00e1 phi c\u1ea5u tr\u00fac. C\u00f3 vi\u1ec7c x\u00f4 v\u00e0o t\u00ecm &#8220;th\u1ee9 l\u1edbn&#8221; ti\u1ebfp m\u00e0 c\u00f3 th\u1ec3 gi\u1ea3i quy\u1ebft cho Big Data. Hi\u1ec7n th\u1eddi ch\u00fang ta \u0111ang \u1edf ng\u01b0\u1ee1ng c\u1eeda c\u1ee7a m\u1ed9t bi\u1ebfn c\u1ed1 \u0111\u1ed9t ph\u00e1 kh\u00e1c, n\u01a1i b\u1ea5t k\u00ec ai c\u00f3 th\u1ec3 &#8220;l\u00e0m ch\u1ee7 n\u00f3&#8221; s\u1ebd ph\u00e1t \u0111\u1ea1t v\u00e0 c\u00f3 th\u1ec3 tr\u1edf th\u00e0nh Bill Gates ti\u1ebfp.<\/p>\n<p>Nhi\u1ec1u ch\u00ednh ph\u1ee7 coi Big Data nh\u01b0 c\u00f4ng ngh\u1ec7 c\u00f3 t\u00e1c \u0111\u1ed9ng cao nh\u1ea5t tr\u00ean th\u1ebf gi\u1edbi ng\u00e0y nay v\u00e0 n\u00f3 s\u1ebd c\u00f3 \u1ea3nh h\u01b0\u1edfng s\u00e2u s\u1eafc l\u00ean m\u1ecdi th\u1ee9 trong th\u1ebf k\u1ec9 n\u00e0y. 
Big Data c\u0169ng tr\u00ecnh ra c\u01a1 h\u1ed9i l\u1edbn cho sinh vi\u00ean CNTT ng\u01b0\u1eddi l\u00e0m ch\u1ee7 tri th\u1ee9c v\u00e0 k\u0129 n\u0103ng n\u00e0y trong thu th\u1eadp, t\u1ed5 ch\u1ee9c v\u00e0 ph\u00e2n t\u00edch kh\u1ed1i l\u01b0\u1ee3ng d\u1eef li\u1ec7u kh\u1ed5ng l\u1ed3 n\u00e0y v\u00e0 bi\u1ebfn n\u00f3 th\u00e0nh th\u00f4ng tin c\u00f3 \u00edch cho \u01b0u th\u1ebf c\u1ea1nh tranh. (C\u00f4ng th\u1ee9c: Big Data \u00a0= Tri th\u1ee9c l\u1edbn = Th\u00f4ng tin l\u1edbn = \u01afu th\u1ebf l\u1edbn) Nghi\u00ean c\u1ee9u c\u00f4ng nghi\u1ec7p th\u1ea5y r\u1eb1ng v\u00e0o l\u00fac n\u00e0y, ch\u1ec9 r\u1ea5t \u00edt c\u00f4ng ti c\u00f3 c\u00f4ng vi\u1ec7c tr\u00ean Big Data nh\u01b0ng h\u1ecd \u0111\u00e3 l\u00e0 t\u1ed1t h\u01a1n m\u1ecdi \u0111\u1ed1i th\u1ee7 c\u1ea1nh tranh c\u1ee7a h\u1ecd, nh\u1eefng ng\u01b0\u1eddi kh\u00f4ng \u0111\u01b0\u1ee3c chu\u1ea9n b\u1ecb, b\u1edfi \u01b0u th\u1ebf l\u1edbn.<\/p>\n<p>Sinh vi\u00ean quan t\u00e2m t\u1edbi Big Data s\u1ebd c\u1ea7n tri th\u1ee9c v\u00e0 k\u0129 n\u0103ng n\u00e0o \u0111\u00f3 trong: l\u1eadp tr\u00ecnh Java, truy l\u1ee5c th\u00f4ng tin, khai ph\u00e1 v\u0103n b\u1ea3n, t\u00edch h\u1ee3p h\u1ec7 th\u1ed1ng qui m\u00f4 l\u1edbn; MapReduce (m\u1ed9t m\u00f4 th\u1ee9c l\u1eadp tr\u00ecnh t\u1ea1o kh\u1ea3 n\u0103ng cho x\u1eed l\u00ed song song); Apache \u201cHadoop\u201d (khu\u00f4n kh\u1ed5 x\u1eed l\u00ed v\u00e0 l\u01b0u gi\u1eef ngu\u1ed3n m\u1edf d\u1ef1a tr\u00ean MapReduce, d\u00f9ng h\u1ec7 th\u1ed1ng t\u1ec7p ph\u00e2n b\u1ed1);\u00a0 NoSQL(m\u1ed9t l\u1edbp c\u01a1 s\u1edf d\u1eef li\u1ec7u phi quan h\u1ec7, phi SQL bao g\u1ed3m l\u01b0u gi\u1eef t\u00e0i li\u1ec7u, l\u01b0u gi\u1eef kho\u00e1-gi\u00e1 tr\u1ecb, v\u00e0 c\u01a1 s\u1edf d\u1eef li\u1ec7u \u0111\u1ed3 ho\u1ea1 \u0111\u01b0\u1ee3c thi\u1ebft k\u1ebf cho l\u00e0m vi\u1ec7c v\u1edbi s\u1ed1 l\u01b0\u1ee3ng d\u1eef li\u1ec7u kh\u1ed5ng l\u1ed3); BigTable (m\u1ed9t ki\u1ec3u c\u01a1 s\u1edf d\u1eef li\u1ec7u 
NoSQL c\u00f3 t\u00ednh \u0111\u1ed5i qui m\u00f4 cao, th\u01b0a, ph\u00e2n b\u1ed1, \u00e1nh x\u1ea1 ph\u00e2n lo\u1ea1i \u0111a chi\u1ec1u b\u1ec1n); H\u1ecdc m\u00e1y (khu v\u1ef1c tr\u00ed tu\u1ec7 nh\u00e2n t\u1ea1o li\u00ean quan t\u1edbi ph\u00e1t tri\u1ec3n c\u00e1c thu\u1eadt to\u00e1n ph\u1ee9c t\u1ea1p l\u1ea5y d\u1eef li\u1ec7u v\u00e0o t\u1eeb nh\u1eefng c\u1ea3m bi\u1ebfn hay c\u01a1 s\u1edf d\u1eef li\u1ec7u \u0111\u1ec3 l\u00e0m d\u1ef1 b\u00e1o).<\/p>\n<p>&nbsp;<\/p>\n<p>&#8212;-English version&#8212;-<\/p>\n<p>&nbsp;<\/p>\n<p>Big data<\/p>\n<p>&nbsp;<\/p>\n<p>As information technology is used more in every business, the amount of data is also increasing significantly, and within a short time most businesses will have more data than they can imagine. According to an industry report, many companies will use between 100 terabytes (TB) and 9 petabytes (PB) of data, and the volume of data will double every 18 months (think of Moore\u2019s law). Every day, data is generated from all types of sources.<\/p>\n<p>For example, Twitter receives 200 million tweets per day, or 46 megabytes per second; Facebook collects an average of 15 terabytes every day. Google reported that seven million web pages are added to the Internet each day. The online business industry adds another 12 million transactions, or 25 petabytes of data, every hour. The telecommunications industry has over 5 billion mobile phone users worldwide. Each day, 2 to 3 billion users access the Internet to read and search for all types of information; people also interact with each other through email, text messages, and so on. All of this generates more data than ever before. 
Because the volume is so massive and comes from such a variety of sources, most of the data is unstructured and beyond the reach of current data management tools. Collecting and analyzing it requires new approaches and new tools, which is why it has been given the name \u201cBig Data\u201d.<\/p>\n<p>Big Data is considered \u201cthe next big thing\u201d, similar to the personal computer in the 1970s and the Internet in the 1990s. A look at the short history of data in information technology shows why. In the 1980s, relational database management systems (RDBMS) were just ordinary database systems, often taught in Information Systems Management programs. However, with the explosion of information technology, as more companies began collecting data, RDBMS suddenly grew into a multi-billion-dollar business with companies like Oracle and SAP. In the 1990s, information retrieval and search engines were a few courses taught in advanced Computer Science programs, but with the growth of the Internet they turned into a multi-billion-dollar business with companies like Google. Today, with Big Data, current database tools such as RDBMS and SQL will no longer work, because the data is too big and too unstructured. There is a rush to find the next \u201cbig thing\u201d that can handle Big Data. We are currently at the threshold of another breakthrough, where anyone who can \u201cmaster it\u201d will thrive and could become the next Bill Gates.<\/p>\n<p>Many governments consider Big Data the highest-impact technology in the world today, and it will have a profound effect on everything in this century. Big Data also presents significant opportunities to IT students who master the knowledge and skills needed to collect, organize, and analyze this huge amount of data and turn it into useful information for competitive advantage. 
(Formula: Big Data = Big Knowledge = Big Information = Big Advantage.) An industry study found that, at this time, only very few companies work on Big Data, but they already outperform all of their unprepared competitors by a significant margin.<\/p>\n<p>Students who are interested in Big Data will need certain knowledge and skills in: Java programming, information retrieval, text mining, and large-scale system integration; MapReduce (a programming paradigm that enables parallel processing); Apache \u201cHadoop\u201d (an open-source storage and processing framework based on MapReduce, using a distributed file system); NoSQL (a class of non-relational, non-SQL databases that encompasses document stores, key-value stores, and graph databases designed for working with huge quantities of data); BigTable (a type of NoSQL database that is a highly scalable, sparse, distributed, persistent multidimensional sorted map); and machine learning (an area of artificial intelligence concerned with developing complex algorithms that take input data from sensors or databases to make predictions).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Khi c\u00f4ng ngh\u1ec7 th\u00f4ng tin \u0111ang \u0111\u01b0\u1ee3c d\u00f9ng nhi\u1ec1u h\u01a1n trong m\u1ecdi doanh nghi\u1ec7p, kh\u1ed1i l\u01b0\u1ee3ng d\u1eef li\u1ec7u c\u0169ng t\u0103ng l\u00ean nhi\u1ec1u v\u00e0 trong th\u1eddi &hellip; 
<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,36,26],"tags":[],"class_list":["post-336","post","type-post","status-publish","format-standard","hentry","category-xu-huong-toan-cau","category-social-media-mobility-big-data-analytics-and-cloud-computing","category-xu-huong-cong-nghe"],"_links":{"self":[{"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/posts\/336","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/science-technology.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=336"}],"version-history":[{"count":3,"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/posts\/336\/revisions"}],"predecessor-version":[{"id":338,"href":"https:\/\/science-technology.vn\/index.php?rest_route=\/wp\/v2\/posts\/336\/revisions\/338"}],"wp:attachment":[{"href":"https:\/\/science-technology.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/science-technology.vn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/science-technology.vn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}