Title: Google’s Big Leap in Data Analytics: Introducing Vector Search Functionality to Google BigQuery
Google has recently made a game-changing announcement regarding the integration of vector search functionality into its BigQuery platform. This development marks a significant advancement in data and AI capabilities, allowing users to perform vector similarity searches. These searches are vital for numerous data and AI applications such as semantic search, similarity detection, and retrieval-augmented generation (RAG) using large language models (LLMs).
In its preview mode, BigQuery’s vector search supports approximate nearest-neighbor search – a critical component for many data and AI applications. The VECTOR_SEARCH function, backed by an optimized index, enables users to identify closely matching embeddings through efficient lookups and distance computations.
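The query below is a minimal sketch of how VECTOR_SEARCH can be invoked; the dataset, table, and column names (mydataset.items, embedding, mydataset.search_queries) are hypothetical placeholders, and the parameter values are illustrative rather than recommendations.

```sql
-- Hypothetical example: for each query embedding, find the five stored
-- embeddings that are closest under cosine distance.
SELECT
  query.query_id,
  base.item_id,
  distance
FROM VECTOR_SEARCH(
  TABLE `mydataset.items`,    -- base table containing the stored vectors
  'embedding',                -- column in the base table to search over
  (SELECT query_id, embedding FROM `mydataset.search_queries`),
  top_k => 5,
  distance_type => 'COSINE'
);
```

When a vector index exists on the searched column, VECTOR_SEARCH can use it for fast approximate matching; without one, the function falls back to an exact, brute-force comparison.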
Automatic Index Updates and Optimization
Google’s vector indexes are designed to be updated automatically, ensuring smooth integration with the latest data. The initial implementation, an IVF (inverted file) index, combines a clustering model with an inverted row locator, forming a two-piece index that optimizes performance.
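The statement below is a minimal sketch of creating such an index; the table and column names are hypothetical, and the num_lists value is purely illustrative.

```sql
-- Hypothetical example: build an IVF vector index on the embedding column.
-- BigQuery maintains the index automatically as the underlying data changes.
CREATE VECTOR INDEX items_embedding_idx
ON `mydataset.items`(embedding)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE',
  ivf_options = '{"num_lists": 1000}'  -- number of clusters in the IVF model
);
```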
Python Integration and Expanding Textual Data Approaches
Google has streamlined Python-based integrations with open-source and third-party frameworks through LangChain. This integration allows developers to easily incorporate vector search capabilities into their existing workflows.
Max Ostapenko, a senior product manager at Opera, welcomed the new feature: “Just got positively surprised trying out vector search with embeddings in BigQuery! We are diving into the world of enhancing product insights with Vertex AI now. It expands your approaches to working with textual data.”
To help users harness the power of vector search, Google has provided a comprehensive tutorial. Using the Google Patents public dataset as an example, the tutorial illustrates three distinct use cases: patent search using pre-generated embeddings, patent search with embeddings generated in BigQuery, and RAG via integration with generative models.
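As a rough sketch of the second use case, embeddings can be generated inside BigQuery by pairing a remote Vertex AI model with the ML.GENERATE_EMBEDDING function; the connection, model, endpoint, and table names below are hypothetical placeholders rather than values from the tutorial.

```sql
-- Hypothetical example: create a remote embedding model backed by Vertex AI,
-- then generate embeddings for patent abstracts directly in BigQuery.
CREATE OR REPLACE MODEL `mydataset.embedding_model`
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (ENDPOINT = 'textembedding-gecko');

SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `mydataset.embedding_model`,
  (SELECT publication_number, abstract AS content FROM `mydataset.patents`),
  STRUCT(TRUE AS flatten_json_output)
);
```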
Omid Fatemieh and Michael Kilberry, engineering lead and head of product at Google, respectively, emphasized the advanced capabilities of BigQuery. They highlighted that users can extend these search use cases into full RAG journeys by using the output of VECTOR_SEARCH queries as context when invoking Google’s natural language foundation models (LLMs) via BigQuery’s ML.GENERATE_TEXT function.
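The query below sketches that pattern under the same hypothetical names used earlier: the top matches returned by VECTOR_SEARCH are concatenated into a prompt that is passed to a remote LLM through ML.GENERATE_TEXT.

```sql
-- Hypothetical example: ground a text-generation prompt in the nearest
-- patent abstracts retrieved by VECTOR_SEARCH (a basic RAG pattern).
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `mydataset.text_model`,  -- remote model pointing at a Vertex AI LLM
  (
    SELECT CONCAT(
      'Answer the question using only these patent abstracts:\n',
      STRING_AGG(base.abstract, '\n'),
      '\nQuestion: Which inventions relate to vector indexing?'
    ) AS prompt
    FROM VECTOR_SEARCH(
      TABLE `mydataset.patents`,
      'embedding',
      (SELECT embedding FROM `mydataset.question_embedding`),
      top_k => 5
    )
  ),
  STRUCT(0.2 AS temperature, 1024 AS max_output_tokens, TRUE AS flatten_json_output)
);
```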
Billing and Pricing
Though the introduction of vector search brings enhanced functionality to BigQuery users, it’s worth noting that billing for the CREATE VECTOR INDEX statement and the VECTOR_SEARCH function follows BigQuery compute pricing. Only the indexed column is considered when calculating processed bytes, ensuring transparent and predictable billing.
Google’s dedication to enhancing BigQuery continues beyond vector search. The cloud provider has announced the availability of Gemini 1.0 Pro for BigQuery customers via Vertex AI, as well as a new BigQuery integration with Vertex AI for text and speech.
By introducing vector search to Google BigQuery, Google continues to push the boundaries of data analytics and AI, providing users with powerful tools to uncover insights and drive innovation.