MapReduce on Matlab
By: Erum Afzal
MapReduce MapReduce is a programming model devised at Google to facilitate the processing of large data sets. For example, it is used at Google for indexing websites.
Matlab • Matlab is software that provides a technical computing environment. • It is used for numerical computation, simulation and data processing.
MapReduce on Matlab • MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data processing requirements, such as data mining tasks or the processing of dense, detailed digital images. Similarly, if Matlab files could be imported into a MapReduce framework, several Matlab functionalities could be run on Hadoop as well.
Working of MapReduce • With MapReduce, data can be processed by multiple processors in parallel. This allows it to: • Handle large volumes of input data. • Speed up processing through the parallelization of tasks.
Continue… • Map: Each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs. • Reduce: Each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
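The Map and Reduce steps above can be illustrated with the classic word-count job. The following is a minimal single-process Python sketch, not Hadoop's or Matlab's API: the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical, and the grouping loop stands in for the shuffle that a real framework performs across machines.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: (document id, text) -> intermediate (word, 1) pairs."""
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: (word, [1, 1, ...]) -> (word, total count)."""
    return key, sum(values)

def run_mapreduce(inputs, mapper, reducer):
    """Single-process simulation of the Map, shuffle and Reduce phases."""
    intermediate = defaultdict(list)
    for key, value in inputs:                    # Map phase
        for ikey, ivalue in mapper(key, value):
            intermediate[ikey].append(ivalue)    # shuffle: group values by intermediate key
    # Reduce phase: every intermediate key is reduced independently of the others
    return dict(reducer(k, vs) for k, vs in intermediate.items())

docs = [("doc1", "map reduce on matlab"), ("doc2", "map reduce on hadoop")]
print(run_mapreduce(docs, map_fn, reduce_fn))
```

Because each intermediate key is reduced independently, the Reduce phase can be spread over as many workers as there are distinct keys.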
Working of Matlab The Matlab Parallel Computing Toolbox offers the framework to write programs for a cluster of computers. This enables a master computer to dispatch jobs to workers running on McGill’s cluster. • The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers. • At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs. • At each worker, the intermediate key/value pairs are fed into the reduce function to get the final key/value pairs, which form the output.
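The master/worker flow can be sketched in Python as below. This is an illustration under stated assumptions, not the Parallel Computing Toolbox API: threads from `concurrent.futures` stand in for cluster workers, the helpers `worker` and `master` are hypothetical, and the master merges per-worker outputs by applying the reduce function again, which is only valid when the reduction (here a sum) is associative and commutative.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def word_map(key, value):
    return [(w, 1) for w in value.split()]

def word_reduce(key, values):
    return sum(values)

def worker(chunk, map_fn, reduce_fn):
    """One worker: map its share of the input, then reduce it locally."""
    groups = defaultdict(list)
    for key, value in chunk:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

def master(inputs, map_fn, reduce_fn, n_workers=2):
    """Master: split the job, hand the user-defined functions to workers, merge outputs."""
    chunks = [inputs[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: worker(c, map_fn, reduce_fn), chunks))
    # Assumption: reduce_fn is associative, so partial results can be reduced again
    merged = defaultdict(list)
    for part in partials:
        for k, v in part.items():
            merged[k].append(v)
    return {k: reduce_fn(k, vs) for k, vs in merged.items()}

docs = [("d1", "a b a"), ("d2", "b c"), ("d3", "a c c")]
print(master(docs, word_map, word_reduce))
```

Passing the functions themselves to the workers is what lets the same master code run any user-defined job.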
Orthogonal Matching Pursuit • In this example, a sparse signal x can be stored by multiplying it with a measurement matrix A: y = Ax. • The measurement y can then be used to recover x using OMP.
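As a reference point for the slides that follow, here is a minimal serial OMP sketch in pure Python. It assumes A is given as a list of rows and that the sparsity level k is known. The helper names (`solve`, `column`, `dot`, `omp`) are hypothetical, and the least-squares step uses the normal equations, which is simple but less numerically robust than a QR-based update.

```python
def solve(M, b):
    """Solve M z = b (small square system) by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z

def column(A, j):
    return [row[j] for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp(A, y, k):
    """Recover a k-sparse x from y = A x by Orthogonal Matching Pursuit."""
    n = len(A[0])
    support, residual, coef = [], list(y), []
    for _ in range(k):
        # Greedy step: the column most correlated with the current residual
        j = max((c for c in range(n) if c not in support),
                key=lambda c: abs(dot(column(A, c), residual)))
        support.append(j)
        # Orthogonal step: least-squares fit of y on all selected columns
        cols = [column(A, c) for c in support]
        gram = [[dot(ci, cj) for cj in cols] for ci in cols]
        coef = solve(gram, [dot(ci, y) for ci in cols])
        fit = [sum(coef[i] * cols[i][r] for i in range(len(cols))) for r in range(len(y))]
        residual = [yr - fr for yr, fr in zip(y, fit)]
    x = [0.0] * n
    for idx, c in zip(support, coef):
        x[idx] = c
    return x

A = [[1, 0, 0, 0, 0.5],
     [0, 1, 0, 0, 0.5],
     [0, 0, 1, 0, 0.5],
     [0, 0, 0, 1, 0.5]]
x_true = [0, 3, 0, -2, 0]
y = [dot(row, x_true) for row in A]
print(omp(A, y, k=2))
```

Every iteration scans all columns of A, which is why the method slows down as A grows.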
Application with MapReduce • With its traditional solution, OMP becomes slow as A grows larger in size. The problem can be resolved by processing individual parts of A in parallel, using MapReduce.
Continue…. • OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel. • The MapReduce method actually …
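One way to split the most expensive part of each OMP iteration (correlating every column of A with the current residual) into map and reduce steps is sketched below. The slicing scheme is an assumption, since the slides do not say how A is partitioned: each map task scans a contiguous slice of columns and emits its best candidate, and the reduce step keeps the overall winner. Threads stand in for cluster workers, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def map_slice(A, residual, cols):
    """Map: within one slice of columns, return (|<a_j, r>|, j) for the best column j."""
    best = (-1.0, -1)
    for j in cols:
        corr = abs(sum(A[i][j] * residual[i] for i in range(len(A))))
        best = max(best, (corr, j))   # ties go to the larger column index
    return best

def reduce_best(candidates):
    """Reduce: the globally best column is the best of the per-slice winners."""
    return max(candidates)[1]

def select_column(A, residual, n_slices=2, exclude=()):
    """OMP greedy selection step with the columns of A split into contiguous slices."""
    free = [j for j in range(len(A[0])) if j not in exclude]
    size = -(-len(free) // n_slices)              # ceiling division
    slices = [free[s:s + size] for s in range(0, len(free), size)]
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        winners = list(pool.map(lambda cols: map_slice(A, residual, cols), slices))
    return reduce_best(winners)

A = [[1, 0, 0, 0, 0.5],
     [0, 1, 0, 0, 0.5],
     [0, 0, 1, 0, 0.5],
     [0, 0, 0, 1, 0.5]]
print(select_column(A, [0, 3, 0, -2]))
```

The orthogonal least-squares step only involves the few selected columns, so it can stay on the master.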
Results • MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit. • MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.
Singular Value Decomposition (SVD) The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used for least squares problems, solving linear systems and finding a low-rank representation of a matrix. A wide range of applications use SVD as their main algorithmic tool.
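To make the low-rank use of SVD concrete, the sketch below computes the leading singular triplet by power iteration on A^T A and forms the best rank-1 approximation sigma * u * v^T. Power iteration is a swapped-in textbook technique chosen for brevity, not a method taken from the source. The function names are hypothetical, and the sketch assumes A is nonzero and the start vector is not orthogonal to the top right singular vector.

```python
import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_triplet(A, iters=200):
    """Leading singular value and vectors of A by power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] * len(A[0])                  # arbitrary nonzero start vector
    for _ in range(iters):
        w = matvec(At, matvec(A, v))       # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

def rank1_approx(A):
    """Best rank-1 approximation sigma * u v^T (Eckart-Young)."""
    sigma, u, v = top_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]

print(rank1_approx([[3.0, 0.0], [0.0, 1.0]]))
```

Repeating the procedure on A minus its rank-1 part (deflation) yields further singular triplets, giving a rank-k approximation.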
Problem • Finding patterns in large-scale graphs with millions or billions of edges is of increasing importance for intrusion detection in computer network security, spam detection and web applications. • One such setting is the estimation of the clustering coefficients and the transitivity ratio of a graph, which effectively translates into computing the number of triangles that each node participates in or the total number of triangles in the graph, respectively. • Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. They have been used in several applications, such as spam detection, uncovering the hidden thematic structure of the web and link recommendation in online social networks. • It is worth noting that in social networks triangles have a natural interpretation: “friends of friends are frequently friends themselves.”
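For reference, the exact quantities described above can be computed directly on a small graph. This Python sketch (helper names hypothetical) counts, for each node, the pairs of its neighbours that are themselves adjacent, then derives the total triangle count (each triangle is seen at its three corners) and the transitivity ratio 3 × triangles / connected triples. Its cost grows with the sum of squared degrees, which is what makes approximations attractive on graphs with billions of edges.

```python
from itertools import combinations

def build_adjacency(edges, n):
    """Undirected, unweighted adjacency sets; self-loops are ignored."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def triangles_per_node(adj):
    """A triangle at node i is a pair of neighbours of i that are adjacent to each other."""
    return [sum(1 for p, q in combinations(sorted(nbrs), 2) if q in adj[p])
            for nbrs in adj]

def total_triangles(adj):
    return sum(triangles_per_node(adj)) // 3     # each triangle counted at 3 nodes

def transitivity_ratio(adj):
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj)  # connected triples
    return sum(triangles_per_node(adj)) / wedges if wedges else 0.0

# A square 0-1-2-3 with the diagonal 0-2: two triangles, {0,1,2} and {0,2,3}
adj = build_adjacency([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], 4)
print(triangles_per_node(adj), total_triangles(adj), transitivity_ratio(adj))
```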
MATLAB implementation, rank-k approximation
function Delta = EigenTriangleLocal(A, k)
% A is the adjacency matrix, k is the required rank approximation
n = size(A, 1);
Delta = zeros(n, 1);              % Preallocate space for Delta
opts.isreal = 1; opts.issym = 1;  % Specify that the matrix is real and symmetric
[u, l] = eigs(A, k, 'LM', opts);  % Compute top k eigenvalues and eigenvectors of A
l = diag(l)';                     % Eigenvalues as a row vector
for j = 1:n
    Delta(j) = sum(l.^3 .* u(j,:).^2) / 2;  % Estimated number of triangles at node j
end
end
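The Matlab function relies on the identity that the number of triangles at node i equals (A^3)_ii / 2 = sum_j lambda_j^3 u_ij^2 / 2, truncated to the k eigenvalues of largest magnitude. Below is a pure-Python port for small graphs, assuming no numerical library is available: a cyclic Jacobi eigensolver (a swapped-in technique standing in for `eigs`) computes the full spectrum, and `eigen_triangle_local` keeps the top k eigenpairs. All names are hypothetical; with k = n the result is exact up to rounding.

```python
import math

def jacobi_eigen(A, tol=1e-18, max_sweeps=100):
    """Eigen-decomposition of a real symmetric matrix by cyclic Jacobi rotations.

    Returns (lam, V): lam[j] is an eigenvalue and column j of V its unit eigenvector.
    """
    n = len(A)
    a = [list(map(float, row)) for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):          # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):          # accumulate eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V

def eigen_triangle_local(A, k):
    """Triangles per node estimated from the k eigenpairs of largest |eigenvalue|."""
    lam, V = jacobi_eigen(A)
    n = len(A)
    top = sorted(range(n), key=lambda j: abs(lam[j]), reverse=True)[:k]  # like 'LM'
    return [sum(lam[j] ** 3 * V[i][j] ** 2 for j in top) / 2.0 for i in range(n)]

A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 0]]
print(eigen_triangle_local(A, k=4))
```

On real networks, a few dominant eigenvalues carry most of the sum of cubes, which is why a small k can already give a good estimate.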
Continue…. • In this work, the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties that real-world networks frequently possess make both algorithms efficient for the triangle counting problem.
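EIGENTRIANGLE rests on the same spectral identity at the graph level: the total number of triangles is trace(A^3) / 6 = (1/6) sum_i lambda_i^3, which the algorithm approximates by keeping only the k eigenvalues of largest magnitude. The sketch below (hypothetical names) computes the exact trace form and the truncated spectral sum from a given list of eigenvalues, so the two can be compared on small graphs.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def total_triangles_trace(A):
    """Exact triangle count of an undirected, unweighted graph: trace(A^3) / 6."""
    A3 = matmul(A, matmul(A, A))
    return sum(A3[i][i] for i in range(len(A))) // 6

def eigen_triangle_from_spectrum(eigenvalues, k):
    """Spectral estimate: one sixth of the sum of cubes of the k largest-|lambda| eigenvalues."""
    top = sorted(eigenvalues, key=abs, reverse=True)[:k]
    return sum(lam ** 3 for lam in top) / 6

K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # triangle graph, eigenvalues 2, -1, -1
print(total_triangles_trace(K3), eigen_triangle_from_spectrum([2, -1, -1], k=1))
```

With k = 1 the estimate for K3 is 8/6; including all three eigenvalues recovers the exact count of 1.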
Fast Randomized Tensor Decompositions • Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, result in multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems produce three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). Large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages. • Analyzing all of the above is a time-consuming task…
Problem • Ignoring the multi-aspect nature of the data by flattening them into a two-way matrix and applying an exploratory analysis algorithm, e.g., singular value decomposition (SVD), is not optimal and typically hurts performance significantly. • The same problem holds when applying, e.g., SVD to different 2-way slices of the tensor, as observed by [94]. In contrast, multiway data analysis techniques succeed in capturing the multilinear structure in the data, thus achieving better performance than the aforementioned approaches.
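Flattening a tensor into a two-way matrix, as criticized above, is usually done by a mode-n unfolding (matricization). The sketch below unfolds a 3-way tensor stored sparsely as a dict of `(i, j, k)` coordinates. The storage format and the function name are assumptions, and the column ordering follows the common Kolda-Bader convention (the earliest remaining index varies fastest). Running SVD on a single unfolding is exactly the two-way analysis the slide argues loses multilinear structure, whereas decompositions such as Tucker work with all modes together.

```python
def unfold(tensor, shape, mode):
    """Mode-`mode` unfolding of a 3-way tensor stored as {(i, j, k): value}.

    Row index: the index along `mode`. Column index: the two remaining indices,
    combined with the earlier one varying fastest (Kolda-Bader ordering).
    """
    others = [m for m in range(3) if m != mode]
    n_rows = shape[mode]
    n_cols = shape[others[0]] * shape[others[1]]
    M = [[0.0] * n_cols for _ in range(n_rows)]
    for idx, val in tensor.items():
        col = idx[others[0]] + idx[others[1]] * shape[others[0]]
        M[idx[mode]][col] = val
    return M

# Toy users x queries x pages tensor; entry value encodes its own coordinates
shape = (2, 3, 2)
X = {(i, j, k): i * 100 + j * 10 + k
     for i in range(2) for j in range(3) for k in range(2)}
for row in unfold(X, shape, mode=0):
    print(row)
```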
Problem Solution Tensor decompositions have been applied as a solution in many applications across different scientific disciplines, especially in computer vision and signal processing, as well as in neuroscience, time series anomaly detection, psychometrics, graph analysis and data mining.
Continue…. • Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed, which is easily parallelizable and adapted to online streaming systems. • This algorithm will be incorporated into the PEGASUS library, a graph and tensor mining system for handling large amounts of data.
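The slide does not describe how MACH works. As an illustration, assuming the entry-sampling idea described in the MACH paper (keep each tensor entry independently with probability p and rescale kept entries by 1/p before running a standard decomposition), the sampling step alone might look like this. The function name `sparsify` is hypothetical and the decomposition itself is not shown.

```python
import random

def sparsify(tensor, p, seed=None):
    """Keep each entry of a sparse tensor {(i, j, k): value} with probability p.

    Kept entries are scaled by 1/p, so the sampled tensor equals the original
    in expectation while holding roughly a fraction p of the entries.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    return {idx: val / p for idx, val in tensor.items() if rng.random() < p}

T = {(i, 0, 0): 1.0 for i in range(10000)}
S = sparsify(T, 0.5, seed=0)
print(len(S), sum(S.values()))
```

Because every entry is sampled independently, the step parallelizes trivially and can be applied to entries as they stream in.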
More Applications • Comparing the Performance of Clusters, Hadoop, and Active Disks on Microarray Correlation Computations. • Beyond Online Aggregation: Parallel and Incremental Data Mining with Online Map-Reduce (DRAFT). • Map-Reduce for Machine Learning on Multicore.
References • Charalampos E. Tsourakakis, “Data Mining with MAPREDUCE: Graph and Tensor Algorithms with Applications”, March 2010. • Arjita Madan, “MapReduce on Matlab”.