Counting the number of distinct elements in a data set is a very common query. It gives you an idea of how many duplicates you are dealing with. Say, for example, that you have a set of transactions and you wish to detect if these transactions … This can help you understand your clients and what type of marketing strategies you need to adopt.
29 April 2015
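The counting-distinct post above describes the query without showing one. A minimal illustration is an exact distinct count in Scala using an in-memory Set. The transaction IDs are made-up sample data, and this exact, in-memory approach is an assumption chosen for illustration. It is not necessarily the method the post itself uses.

```scala
// A minimal sketch of exact distinct counting with an in-memory Set.
// The transaction IDs below are hypothetical sample data, not from the post.
val transactions: Seq[String] = Seq("t1", "t2", "t3", "t2", "t4", "t1")

// Exact number of distinct elements: converting to a Set collapses duplicates.
val distinctCount: Int = transactions.toSet.size

// Every element beyond the first occurrence of its value counts as a duplicate.
val duplicateCount: Int = transactions.size - distinctCount

println(s"total=${transactions.size} distinct=$distinctCount duplicates=$duplicateCount")
// prints "total=6 distinct=4 duplicates=2"
```

This keeps every distinct value in memory, so its memory use grows with the number of distinct values in the data set.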
23 April 2015
Apache Mahout Samsara: The Quick Start
Last week, Apache Mahout 0.10 was released. One of its new features is a math environment called “Samsara”, also known as the Mahout Scala/Spark Bindings.
Samsara is a linear algebra library for Mahout. It is written in Scala, which supports operator overloading, so Samsara can offer concise R-like or Matlab-like syntax for basic linear algebra operations. For example, matrix multiplication is simply X %*% Y. What is more, these operations can be distributed and run by an execution environment, currently Apache Spark.
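The R-like syntax is possible because Scala lets a method be named with symbol characters and called infix, so X %*% Y is just X.%*%(Y). The toy sketch illustrates that language feature with a hypothetical dense matrix class, Mat. It is not Samsara's API, and it does not show Mahout's real matrix types or its distributed execution on Spark.

```scala
// A toy illustration of Scala operator overloading, NOT Mahout's API.
// `Mat` and its methods are hypothetical names made up for this sketch.
final case class Mat(rows: Vector[Vector[Double]]) {
  require(rows.nonEmpty && rows.forall(_.size == rows.head.size), "ragged matrix")

  def nrow: Int = rows.size
  def ncol: Int = rows.head.size

  // `%*%` is an ordinary method name made of symbol characters,
  // so `x %*% y` is infix syntax for `x.%*%(y)`.
  def %*%(that: Mat): Mat = {
    require(ncol == that.nrow, s"dimension mismatch: $ncol vs ${that.nrow}")
    Mat(Vector.tabulate(nrow, that.ncol) { (i, j) =>
      (0 until ncol).map(k => rows(i)(k) * that.rows(k)(j)).sum
    })
  }

  // Transpose, named `t` to mirror R's t() function.
  def t: Mat = Mat(Vector.tabulate(ncol, nrow)((i, j) => rows(j)(i)))
}

val x = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
val y = Mat(Vector(Vector(5.0, 6.0), Vector(7.0, 8.0)))

val z = x %*% y
println(z.rows) // prints Vector(Vector(19.0, 22.0), Vector(43.0, 50.0))
```

Scala assigns an operator's precedence from its first character, so %*% binds like *. An expression such as x.t %*% x therefore reads the way it would in R.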
In this article, we will see how to quickly set up a basic skeleton project and then run some very simple analysis on a 200 MB dataset.