Pinecone Tackles Threat Detection, 'Extreme Classification' with New Vector Database
Pinecone Systems is demonstrating how cloud-based vector databases offer a fast, scalable way to build critical business capabilities powered by machine learning, such as IT threat detection, complex classification, and recommendations over large data sets.
Pinecone has released templates to help developers and technical teams build and deploy applications that address four common scenarios with the scale and performance advantages of a vector database. And Pinecone is providing a benchmark guide that can be used to assess the vector database’s performance using an organization’s own data sets.
The use cases are an important step forward because vector databases are relatively new in the world of cloud databases. A vector database stores, searches, and retrieves vectors, which are long strings of numbers representing documents, images, and other data types used in machine learning applications. They can be used for recommendations, personalization, image search, and more.
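To make the idea concrete, here is a minimal sketch of what "store, search, and retrieve vectors" means. It is not Pinecone's API: the `TinyVectorIndex` class, its `upsert` and `query` methods, and the three-number vectors are all hypothetical stand-ins, and a real service would use far longer vectors and an optimized index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Hypothetical in-memory stand-in for a vector database index."""

    def __init__(self):
        self._items = {}

    def upsert(self, item_id, vector):
        # Store (or overwrite) the vector under its ID.
        self._items[item_id] = list(vector)

    def query(self, vector, top_k=3):
        # Brute force: score every stored vector, then keep the best top_k.
        scored = [
            (item_id, cosine_similarity(vector, stored))
            for item_id, stored in self._items.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


index = TinyVectorIndex()
index.upsert("doc-cats", [0.9, 0.1, 0.0])   # made-up embeddings
index.upsert("doc-dogs", [0.8, 0.2, 0.1])
index.upsert("doc-taxes", [0.0, 0.1, 0.9])

results = index.query([1.0, 0.0, 0.0], top_k=2)
print(results)
```

The query returns the IDs of the stored vectors that point in the most similar direction to the query vector, which is the basic operation behind recommendations and image or text search.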
Pinecone’s vector database, which is in beta availability, provides similarity search as a service on AWS and Google Cloud. Pinecone has published templates to help users get started with four scenarios:
IT threat detection. Pinecone shows how to build a network intrusion detector using deep learning and similarity search. By checking how similar incoming network events are to known attacks, the database is able to detect “rare” events that may represent a threat.
Semantic text search. Pinecone outlines how to create a semantic text search capability for online news articles using short, simple queries. Vector representations of the articles are stored in the database index, and queries are matched against them.
Extreme classification. The idea is to label new items automatically when the number of possible labels is “enormous” or extreme, such as matching web content to relevant advertisements. In the example provided, 250,000 labels are converted into vector embeddings.
Video recommendations. The challenge here is to provide movie recommendations based on similar user ratings (on a scale of 1 to 5). It is complicated by the fact that the ratings are sparse relative to the full set of movies and biased because users distribute their ratings differently. The solution involves a dataset of movie recommendations, deep learning models for both movies and users, and a deep ranking model that scores user/movie pairings to improve the relevance of recommendations.
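The threat detection and extreme classification scenarios share one pattern: embed the new item, find its nearest labeled neighbor, and accept the label only if the match is close enough. The sketch below illustrates that pattern under loud assumptions. It is not taken from Pinecone's templates; the `nearest_label` helper, the `min_similarity` threshold of 0.8, and the tiny hand-written vectors are hypothetical, and a real system would produce the embeddings with a trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(query, labeled_vectors, min_similarity=0.8):
    """Return (label, score) for the closest labeled vector.

    The label is None when even the best match falls below
    min_similarity (a hypothetical cutoff chosen for illustration).
    """
    best_label, best_score = None, -1.0
    for label, vector in labeled_vectors.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score


# Made-up embeddings of known attack types.
known_attacks = {
    "port-scan": [0.9, 0.1, 0.0],
    "dos-flood": [0.1, 0.9, 0.1],
}

# An event that closely resembles a known attack gets that label.
label, score = nearest_label([0.85, 0.15, 0.05], known_attacks)

# An event unlike any known attack gets no label.
unmatched, unmatched_score = nearest_label([0.0, 0.1, 0.95], known_attacks)

print(label, round(score, 3))
print(unmatched, round(unmatched_score, 3))
```

With 250,000 labels instead of two, the loop over every label becomes the bottleneck, which is where an indexed similarity search service is meant to help.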
Billions of vectors
In addition to those use cases, Pinecone has introduced a benchmarking guide for testing the performance and accuracy of its similarity search using an organization’s own data. The tutorial addresses how to measure indexing runtime, query runtime, and other metrics, for both exact and approximate searches.
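As a rough illustration of what such a benchmark measures, the sketch below times an exact search and a deliberately crude approximate one, then reports how much accuracy the shortcut costs. Everything here is an assumption made for illustration: the data is random, the "approximate" search simply scans a random half of the vectors (a stand-in, not a real approximate nearest neighbor algorithm and not Pinecone's method), and recall at k is one common accuracy metric rather than necessarily the one in Pinecone's guide.

```python
import random
import time


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_by_dot(query, vectors, k, candidate_ids):
    """Rank the candidate IDs by dot product with the query; keep the best k."""
    ranked = sorted(candidate_ids, key=lambda i: dot(query, vectors[i]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k results that the approximate search found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)  # fixed seed so runs are repeatable
dim, n_vectors, k = 16, 2000, 10
vectors = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_vectors)]
query = [rng.gauss(0, 1) for _ in range(dim)]

# "Indexing" for the crude approximate search: pick half the vectors to scan.
start = time.perf_counter()
candidate_ids = rng.sample(range(n_vectors), n_vectors // 2)
index_seconds = time.perf_counter() - start

# Exact search scans every vector.
start = time.perf_counter()
exact_ids = top_k_by_dot(query, vectors, k, range(n_vectors))
exact_seconds = time.perf_counter() - start

# Approximate search scans only the sampled candidates.
start = time.perf_counter()
approx_ids = top_k_by_dot(query, vectors, k, candidate_ids)
approx_seconds = time.perf_counter() - start

recall = recall_at_k(approx_ids, exact_ids)
print(f"indexing: {index_seconds:.4f}s")
print(f"exact query: {exact_seconds:.4f}s")
print(f"approximate query: {approx_seconds:.4f}s, recall@{k}: {recall:.2f}")
```

The point of running such a test on an organization's own data is that the trade-off between query time and recall depends heavily on the data set and the index settings.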
Finally, Pinecone is providing early access to an upcoming capability called Managed FAISS. An acronym for Facebook AI Similarity Search, FAISS is a library that developers use to find multimedia documents whose embeddings are similar. With FAISS as a managed cloud service, Pinecone aims to scale to billions of vectors without the operational complexity of a self-hosted approach.
Bigger, better, faster
The application templates and other new developments are signs that Pinecone’s vector database is maturing, and they are prerequisites for Pinecone’s general availability as a cloud service.
When I talked to Pinecone founder and CEO Edo Liberty recently, he said the company is focused on the “production readiness” of its platform. By the end of this year, he said, the Pinecone vector database will be “much more capable, bigger, better, faster, and easier to use.”
Pinecone, a startup, exited stealth mode in January with $10 million in seed funding from Wing Venture Capital, an early-stage investor in Snowflake. That has prompted comparisons to Snowflake’s cloud database platform model.
Liberty says businesspeople are interested in understanding how machine learning can be applied to meet their own business objectives. “They go to their chief scientist or CTO and say, ‘Why don’t we do that?’”
The new use cases and real-world benchmarks should help elevate those conversations from the arcane technical details of vector databases to business solutions and opportunities.