University of Ottawa - Carleton University
Ottawa-Carleton Institute for Computer Science (OCICS) Presentation
March 1, 2013 @ 10:00 a.m.
Distributed Computing of 'Big Data' Using DFS and MapReduce on Large Commodity Servers
Speaker: Ranish Barket Ali
Location: CBY A707 (Colonel By building)
ABSTRACT
MapReduce, a prominent parallel data processing tool, is gaining significant momentum in both industry and academia as the volume of data to analyze grows rapidly. This 'Big Data' is creating significant problems for large companies in many different areas. Big data is difficult to work with using relational databases and desktop statistics and visualization packages, requiring instead "massively parallel software running on tens, hundreds, or even thousands of servers". What counts as 'big data' varies with the capabilities of the organization managing the data set and of the applications traditionally used to process and analyze it in its domain.
In this presentation I'll discuss a technique for overcoming this problem using a distributed file system (DFS) and the MapReduce approach. MapReduce is a technique for dividing work across a distributed system. It takes advantage of the parallel processing power of distributed systems and also reduces network bandwidth usage, because the algorithm is sent to where the data lives rather than a potentially huge dataset being transferred to a client-side algorithm. Developers can use MapReduce for tasks such as filtering documents by tags, counting words in documents, and extracting links to related data.
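As a rough illustration of the word-counting use case mentioned above, the following single-process Python sketch mimics the map, shuffle, and reduce stages. It is not taken from the talk: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are hypothetical, and a real framework would run the map and reduce steps on the servers that hold the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle stage."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in a single process (illustration only)."""
    mapped = chain.from_iterable(
        map_phase(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical sample input: two tiny "documents".
docs = {
    "doc1": "big data needs big clusters",
    "doc2": "data lives on clusters",
}
counts = word_count(docs)
```

In a distributed setting, each call to `map_phase` would run next to the block of data it reads, so only the small (word, count) pairs cross the network.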