The nearest neighbor method is very popular in machine learning. Its principle is easy to understand: data points that lie close together are similar and are more likely to belong to the same category. However, running a nearest neighbor search quickly in a high-dimensional space is very challenging.
Spotify, the world's largest streaming music service provider, needs to recommend music to a very large number of users, and it uses the nearest neighbor method to do so. In other words, it applies the nearest neighbor method to a high-dimensional space and to large data sets.
Because of the high dimensionality and the large data size, applying the nearest neighbor method directly is not feasible. The best practice is therefore to search for nearest neighbors with approximate methods. There are many open source libraries in this area, such as Annoy, which Spotify open-sourced. Erik Bernhardsson, the author of Annoy, found while developing it that although hundreds of papers describe approximate methods for nearest neighbor search, few practical comparisons exist. Erik therefore developed ANN-benchmarks to evaluate approximate nearest neighbor (ANN) algorithms.
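To make the trade-off concrete, here is a minimal sketch of exact (brute-force) search and of recall, a common way to score an approximate result against the exact one. The function names and the random-sample "approximate index" are illustrative assumptions; they are not how Annoy or ANN-benchmarks are implemented.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(data, query, k):
    """Brute force: compare the query with every vector, O(n * d) per query."""
    order = sorted(range(len(data)), key=lambda i: euclidean(data[i], query))
    return order[:k]


def recall(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


random.seed(0)
data = [[random.gauss(0, 1) for _ in range(16)] for _ in range(500)]
query = [random.gauss(0, 1) for _ in range(16)]

truth = exact_knn(data, query, k=10)

# Hypothetical stand-in for an approximate index: search only a random
# 20% sample of the data. Real libraries use far better strategies.
sample = random.sample(range(len(data)), 100)
approx = sorted(sample, key=lambda i: euclidean(data[i], query))[:10]

print("recall@10:", recall(approx, truth))
```

The brute-force cost grows with both the number of vectors and their dimension, which is why it stops being practical at the scale described above. An approximate method gives up some recall in exchange for speed.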
Evaluated implementations
Annoy: Spotify's own C++ library, with Python bindings. Annoy's most prominent feature is its support for static index files, which means that different processes can share an index.
FLANN: a C++ library from the University of British Columbia, Canada, with C, MATLAB, Python, and Ruby bindings.
scikit-learn: the well-known Python machine learning library, which provides LSHForest, KDTree, and BallTree implementations.
PANNS: pure Python implementation. Already "retired"; the author suggests using MRPT instead.
NearPy: pure Python implementation, based on locality-sensitive hashing (LSH, a dimension reduction method).
KGraph: C++ library with Python bindings. Graph-based algorithm.
NMSLIB (Non-Metric Space Library): C++ library with Python bindings; it also supports queries from Java or any other language that supports the Apache Thrift protocol. Provides SWGraph, HNSW, BallTree, and MPLSH implementations.
hnswlib (part of the NMSLIB project): uses less memory than the current NMSLIB release.
RPForest: pure Python implementation. Its main feature is that the model does not need to store all indexed vectors.
FAISS: Facebook's C++ library, with optional GPU support (based on CUDA) and Python bindings. Includes algorithms that support searching vector sets of any size (even sets that may not fit in RAM).
DolphinnPy: pure Python implementation. Hyperplane-based locality-sensitive hashing algorithm.
Datasketch: pure Python implementation, based on MinHash locality-sensitive hashing.
PyNNDescent: pure Python implementation, based on k-neighbor-graph construction.
MRPT: C++ library with Python bindings, based on sparse random projection and voting.
NGT: C++ library with Python and Go bindings. Provides a PANNG implementation.
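Several of the libraries above are based on locality-sensitive hashing. As a rough illustration of the hyperplane variant, the sketch below hashes each vector to a short bit signature and compares a query only with the vectors in its own bucket. All names here are hypothetical and the code does not reproduce the API of NearPy, DolphinnPy, or any other listed library.

```python
import random


def make_hyperplanes(n_bits, dim, rng):
    """Draw n_bits random hyperplanes through the origin (Gaussian normals)."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes
    )


def build_index(data, planes):
    """Group vector ids into buckets keyed by their bit signature."""
    buckets = {}
    for i, vec in enumerate(data):
        buckets.setdefault(signature(vec, planes), []).append(i)
    return buckets


def candidates(query, planes, buckets):
    """Ids in the query's bucket; only these would be compared exactly."""
    return buckets.get(signature(query, planes), [])


rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
planes = make_hyperplanes(n_bits=4, dim=8, rng=rng)
buckets = build_index(data, planes)

# A positively scaled copy of a vector lands in the same bucket, because
# scaling does not change which side of a hyperplane the vector is on.
scaled = [2.0 * x for x in data[0]]
print(0 in candidates(scaled, planes, buckets))
```

Vectors that point in similar directions tend to share a signature, so a bucket lookup replaces a scan of the whole data set. True neighbors that fall just across a hyperplane are missed, which is one reason practical LSH schemes typically use several hash tables.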
Data sets
ANN-benchmarks provides some pre-processed data sets.
Results
Erik published the results of a test run on an AWS EC2 machine (c5.4xlarge); the run took a few days and cost about $100.
Results were reported for the following data sets: glove-100-angular, sift-128-euclidean, fashion-mnist-784-euclidean, gist-960-euclidean, nytimes-256-angular, glove-25-angular.
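The data set names above appear to follow a source-dimension-metric pattern, so glove-100-angular would be 100-dimensional GloVe vectors compared by angular distance. The sketch below parses such a name and contrasts the two metrics. The helper names are hypothetical, and the angular distance is assumed to be 1 minus cosine similarity, which is one common definition and not necessarily the exact formula the benchmark uses.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance; sensitive to vector length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def angular_distance(a, b):
    """1 - cosine similarity; depends only on direction, not on length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def parse_dataset_name(name):
    """Split e.g. 'glove-100-angular' into (source, dimension, metric)."""
    source, dim, metric = name.rsplit("-", 2)
    return source, int(dim), metric


print(parse_dataset_name("fashion-mnist-784-euclidean"))

# Two vectors with the same direction but different lengths.
a, b = [1.0, 0.0], [3.0, 0.0]
print(euclidean_distance(a, b))
print(angular_distance(a, b))
```

The choice of metric matters for the comparison: two vectors that differ only in length are far apart under the Euclidean metric but identical under the angular one.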
From the above evaluation (higher scores are better), the top five on almost all data sets are:
HNSW (the low-memory-footprint version from the NMSLIB project), which is 10 times faster than Annoy.
KGraph, in second place, not far behind HNSW. Like HNSW, KGraph is a graph-based algorithm.
SW-graph, another graph-based algorithm, from NMSLIB.
FAISS-IVF, from Facebook's FAISS.
Annoy
Among the implementations listed earlier, many use locality-sensitive hashing (LSH). These libraries do not perform very well. In a previous evaluation, FALCONN performed very well (it was the only LSH-based library that did). In this evaluation, however, FALCONN ranks far behind, and the reason is not clear.
Judging from this evaluation, graph-based algorithms are currently the most advanced (the top three algorithms are all graph-based), and HNSW performs especially well.
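The core idea shared by graph-based methods can be sketched as a greedy walk over a neighbor graph: start at some node and keep moving to whichever neighbor is closer to the query. The code below is a simplified illustration under stated assumptions (an exact k-NN graph built by brute force, a single entry point, no hierarchy). It is not the HNSW, KGraph, or SW-graph algorithm, and all names are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(data, k):
    """Exact k-nearest-neighbor graph by brute force (fine for a small demo)."""
    graph = []
    for i, v in enumerate(data):
        others = sorted(
            (j for j in range(len(data)) if j != i),
            key=lambda j: dist(v, data[j]),
        )
        graph.append(others[:k])
    return graph


def greedy_search(data, graph, query, entry):
    """Walk to the neighbor closest to the query until no neighbor improves."""
    current = entry
    current_d = dist(data[current], query)
    hops = 0
    while True:
        best = min(graph[current], key=lambda j: dist(data[j], query))
        best_d = dist(data[best], query)
        if best_d >= current_d:
            # No neighbor is closer: a local (not necessarily global) minimum.
            return current, current_d, hops
        current, current_d = best, best_d
        hops += 1


random.seed(0)
data = [[random.gauss(0, 1) for _ in range(4)] for _ in range(300)]
query = [random.gauss(0, 1) for _ in range(4)]
graph = build_knn_graph(data, k=8)

found, found_d, hops = greedy_search(data, graph, query, entry=0)
print("stopped at node", found, "after", hops, "hops")
```

Each step inspects only a handful of neighbors, so the search touches a small part of the data. The walk can stop at a local minimum, which is why production algorithms add refinements such as candidate queues and, in HNSW's case, a hierarchy of graphs.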