mlpack 3: a fast, flexible machine learning library

In the past several years, the field of machine learning has seen an explosion of interest and excitement, with hundreds or thousands of algorithms developed for different tasks every year. A primary challenge facing the field is scaling to ever larger datasets, since training on more data typically produces better results. The development of new algorithms, and with it the continued growth of the field, therefore depends largely on good tooling and libraries that let researchers and practitioners quickly prototype and develop solutions. At the same time, useful libraries must be efficient and well implemented. This has motivated our development of mlpack. mlpack is a flexible and fast machine learning library written in C++, with bindings that allow use from the command line and from Python; support for other languages is in active development. mlpack has been actively developed for over 10 years, with over 100 contributors from around the world, and is a frequent mentoring organization in the Google Summer of Code program. When used from C++, the library offers flexibility with no speed penalty through policy-based design and template metaprogramming, while the bindings to other languages give easy access to the same fast codebase. For fast linear algebra, mlpack is built on the Armadillo C++ matrix library, which in turn can use an optimized BLAS implementation such as OpenBLAS, or even NVBLAS, which would allow mlpack algorithms to run on the GPU. To provide fast code, template metaprogramming is used throughout the library to reduce runtime overhead by performing as many computations and optimizations as possible at compile time. An automatic benchmarking system has been developed and is used to test the efficiency of mlpack's algorithms.
mlpack contains a number of standard machine learning algorithms, such as logistic regression, random forests, and k-means clustering. It also contains cutting-edge techniques such as a compile-time optimized deep learning and reinforcement learning framework, dual-tree algorithms for nearest neighbor search and other tasks, a generic optimization framework with numerous optimizers, a generic hyper-parameter tuner, and other recently published machine learning algorithms. For a more comprehensive introduction to mlpack, see the website at http://www.mlpack.org/
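
To make the idea of policy-based design concrete, here is a minimal C++ sketch of how a behavior (a distance metric) can be supplied as a template parameter and resolved at compile time. This is an illustration only: the names `SquaredEuclidean`, `Manhattan`, and `NearestNeighbor` are hypothetical and are not taken from mlpack's API, and the brute-force search stands in for the far more sophisticated algorithms the library provides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical metric policies (illustrative names, not mlpack's API).
// Each policy exposes a static Evaluate() with the same signature.
struct SquaredEuclidean
{
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b)
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
  }
};

struct Manhattan
{
  static double Evaluate(const std::vector<double>& a,
                         const std::vector<double>& b)
  {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += std::abs(a[i] - b[i]);
    return sum;
  }
};

// Brute-force nearest neighbor search. The metric is a template parameter,
// so the call to Evaluate() is bound at compile time (no virtual dispatch)
// and can be inlined by the compiler.
template<typename MetricType>
std::size_t NearestNeighbor(const std::vector<std::vector<double>>& points,
                            const std::vector<double>& query)
{
  assert(!points.empty());
  std::size_t best = 0;
  double bestDistance = MetricType::Evaluate(points[0], query);
  for (std::size_t i = 1; i < points.size(); ++i)
  {
    const double distance = MetricType::Evaluate(points[i], query);
    if (distance < bestDistance)
    {
      bestDistance = distance;
      best = i;
    }
  }
  return best; // Index of the closest point under MetricType.
}
```

A caller selects the behavior purely through the type, for example `NearestNeighbor<Manhattan>(points, query)`. Swapping the metric changes the algorithm's behavior without adding a runtime branch or virtual call, which is the kind of flexibility without speed penalty described above.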
