In this highly competitive AI era, automation and data are king. The ability to automate search and retrieval across vast repositories of information has become crucial. As technology advances, so do the methods of information retrieval, and a variety of search mechanisms have emerged. With generative AI models taking center stage, applications need solid search and retrieval techniques. Of these, full-text search is the established, trusted option, while vector search is emerging as the more advanced technique.

Today, we will explore both full-text and vector search and see how each can be put to use.

What is full-text search?

Full-text search is a powerful technique for finding specific information within large amounts of text data. Unlike a simple exact-match lookup, which only finds strings identical to the query, full-text search analyzes the entire text of each document, breaking it into individual terms that can be matched and scored. This allows it to find relevant results even when a document doesn't contain your query exactly as you typed it.

Here's how it works:

Indexing
When you add text data to a system that supports full-text search, the system first creates an index. This index is like a detailed map of the text, listing all the words and phrases it contains and where they appear.
Querying
When you perform a full-text search, you enter a query containing keywords or phrases. The system then searches the index for documents that contain all or some of the query terms.
Ranking
Depending on the specific algorithm used, the system will then rank the results based on their relevance to your query. Factors that can influence the ranking include the frequency and proximity of the query terms within the document, as well as other factors like the document's overall importance or date of publication.
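
The three steps above can be sketched in a few lines of Python. This is a toy illustration only, assuming a plain in-memory inverted index and a ranking based on raw term frequency; the function names are hypothetical, and production engines use far more refined analysis and scoring.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Indexing: map each term to {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Querying and ranking: score documents by summed term frequency."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "Harnessing big data for insights, innovation, and decision making.",
    2: "The ethical and practical considerations of genetic modification.",
    3: "Debating the moral implications of surveillance technologies.",
}
index = build_index(docs)
results = search(index, "ethical considerations")
print(results)
```

Only documents that contain at least one query term receive a score, which is why documents with no matching terms never appear in the ranked list.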

What is vector search?

Vector search is a core requirement for most generative AI applications. It retrieves contextually relevant information by representing both the stored content and the user's query as vectors that capture meaning, so results reflect what the user is actually asking for. The approach is in high demand and widely praised by generative AI practitioners and organizations. Vector databases use it to retrieve semantically relevant information for users' queries.

For example, users don't need to know the exact words when retrieving information. Even if they only know similar words, vector search can still return close matches. This is especially useful wherever search needs a human touch, as in an eCommerce application.

By aligning more closely with the way humans think and communicate, it opens up new possibilities for more natural and efficient interactions between users and AI systems. As this technology continues to evolve, its impact is expected to grow, further cementing its role as a cornerstone of modern information retrieval strategies in the generative AI industry.

Vector search boasts impressive feats:

Semantic understanding
Synonyms, phrases and even implied meanings are no longer a mystery.
Relevance over keywords
Finds information truly relevant to your intent, not just keyword-stuffed pages.
Personalization
Understands your preferences and recommends things you'll actually love.

But like anything else, vector search has its quirks. Training embedding models and computing the vectors can be computationally expensive. And while it excels at understanding meaning, sometimes a precise keyword search is all you need.

How vector search works

Here's a simplified explanation of how vector search works:

Data conversion
Each item (like a text document or image) is converted into a vector using models like word embeddings for text or convolutional neural networks for images. These models are designed to capture the semantic or visual essence of the content.
Indexing
The vectors are then indexed in a database, such as SingleStore, designed for efficient, high-dimensional vector search. This indexing often involves organizing the vectors so that similar items sit closer together in the vector space.
Query processing
When a search query is received, it is also converted into a vector using the same model that was used for the data.
Vector comparison
The search involves comparing the query vector with the vectors in the index. This is usually done using similarity measures like cosine similarity or Euclidean distance. The idea is to find vectors that are closest to the query vector.

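Note: SingleStore provides direct support for Dot Product and Euclidean Distance using the vector functions DOT_PRODUCT and EUCLIDEAN_DISTANCE, respectively. Cosine Similarity is obtained by combining DOT_PRODUCT and SQRT: divide the dot product of the two vectors by the product of their lengths, where each length is the square root of the vector's dot product with itself.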

Retrieving results
The items (documents, images, etc.) corresponding to the most similar vectors are retrieved and presented as search results.
Ranking
The results are often ranked based on the degree of similarity, with the most similar items ranked highest.
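
The comparison, retrieval and ranking steps can be sketched as a brute-force search in Python. This is a minimal sketch, assuming tiny hand-written 3-dimensional vectors in place of real embeddings (the values and item names are hypothetical) and building cosine similarity from a dot product and a square root, as described in the note above.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def vector_search(index, query_vector, top_k=3):
    """Compare the query against every stored vector and rank by similarity."""
    scored = [
        (item, cosine_similarity(vector, query_vector))
        for item, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real model would produce hundreds of dimensions.
index = {
    "red shoes": [0.9, 0.1, 0.0],
    "crimson sneakers": [0.8, 0.3, 0.1],
    "garden hose": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
top = vector_search(index, query, top_k=2)
for item, score in top:
    print(item, round(score, 3))
```

Scanning every vector like this is exact but slow on large collections, which is why vector databases rely on indexes rather than a full comparison for each query.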

Full-text search vs. vector search: Who wins?

While full-text search excels at precision and speed, and vector search unlocks semantic understanding, a hybrid approach emerges as the true champion. Imagine a search that understands your precise keywords like "red shoes" but also finds those comfy crimson sneakers you didn't mention. This combination delivers highly relevant results — even when you don't use perfect phrasing.

Think of it as the best of both worlds: accuracy meets serendipity, ensuring you never miss out on hidden gems just because they weren't spelled out exactly. In essence, hybrid search transcends limitations — pushing the boundaries of information retrieval to deliver an experience that's both precise and pleasantly surprising.
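
One simple way to picture a hybrid ranking is a weighted blend of a keyword score and a vector similarity score. The sketch below is an assumption-laden illustration, not SingleStore's implementation: the `alpha` weight, the word-overlap keyword score and the 3-dimensional vectors are all hypothetical.

```python
import math

def keyword_score(query, text):
    """Fraction of query words that appear verbatim in the text."""
    query_terms = query.lower().split()
    text_terms = set(text.lower().split())
    return sum(term in text_terms for term in query_terms) / len(query_terms)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def hybrid_search(query, query_vector, items, alpha=0.5):
    """Blend keyword and vector scores; alpha is the keyword weight."""
    ranked = []
    for text, vector in items:
        score = (alpha * keyword_score(query, text)
                 + (1 - alpha) * cosine_similarity(query_vector, vector))
        ranked.append((text, round(score, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical catalog entries with made-up embeddings.
items = [
    ("red shoes", [0.9, 0.1, 0.0]),
    ("comfy crimson sneakers", [0.8, 0.3, 0.1]),
    ("garden hose", [0.0, 0.2, 0.9]),
]
ranking = hybrid_search("red shoes", [1.0, 0.2, 0.0], items)
for text, score in ranking:
    print(text, score)
```

The exact keyword match ranks first, while the crimson sneakers still rank above the unrelated item purely on vector similarity, even though they share no words with the query.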

SingleStore supports hybrid search

In the realm of information retrieval, a new force has emerged: hybrid search. SingleStore is leading the way, empowering developers to craft rich AI and analytical applications that harness the combined strengths of vector search and full-text search.

What does that mean for you when building AI applications? You’re no longer forced to choose between robotic precision and nuanced understanding. SingleStore bridges this divide, enabling you to unlock the full potential of search and deliver truly meaningful experiences.

SingleStore revs up information retrieval with indexed vector search. This game-changing feature blends fast vector search, precise full-text search and modern indexing techniques, all powered by Approximate Nearest Neighbor (ANN) search. Get ready for search that is 100-1,000x faster, without giving up accuracy, when navigating the vast seas of data.

Full-text search with SingleStore

Activate your free SingleStore trial to see how full-text search works — follow along with these steps.

Once you sign up, create a workspace.

Let’s get started with SQL Editor.

Start running the following SQL queries in your SQL Editor.

First, create a database and table that includes a FULLTEXT index on the columns you want to search.

CREATE DATABASE fulltext_search;
USE fulltext_search;
CREATE TABLE articles (
   id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
   title VARCHAR(200),
   body TEXT,
   FULLTEXT (title, body)
);

Next, insert some example data into the table you've created.

INSERT INTO articles (title, body) VALUES
('The Power of Big Data', 'Harnessing big data for insights, innovation, and decision making.'),
('Robotics in Everyday Life', 'The increasing presence and impact of robots in daily activities.'),
('Genetic Engineering: Pros and Cons', 'The ethical and practical considerations of genetic modification.'),
('Nanotechnology: A Small Revolution', 'The potential and challenges of advancements in nanotech.'),
('The Art of Podcasting', 'Exploring the surge in popularity of podcasting as a medium.'),
('The Impact of 5G Technology', 'Understanding how 5G will transform connectivity and communication.'),
('Mental Health in the Digital Age', 'Addressing mental health challenges in an increasingly digital world.'),
('The Future of Online Education', 'How online learning platforms are reshaping education.'),
('E-Sports: More Than Just Games', 'The rise of e-sports as a major form of entertainment.'),
('Electric Planes: Taking Off Soon?', 'Examining the feasibility and challenges of electric aircraft.'),
('The Science of Sleep', 'Understanding the importance and mechanics of sleep for health.'),
('AI in Agriculture', 'How artificial intelligence is revolutionizing farming practices.'),
('The Ethics of Surveillance Tech', 'Debating the moral implications of surveillance technologies.');

If you have just inserted data and want to ensure the full-text index is up-to-date before querying, you can execute the OPTIMIZE TABLE command with the FLUSH option.

OPTIMIZE TABLE articles FLUSH;

After inserting the content, you can perform a full-text search using the MATCH AGAINST syntax to retrieve relevant articles based on a search term.

SELECT id, title, body
FROM articles
WHERE MATCH(title, body) AGAINST('search term');

For example, with 'ethical' as the search term, the results include the genetic engineering article, whose body contains that word.

Vector search with SingleStore

We will use our SQL Editor, creating a new database and table with a vector field.

CREATE DATABASE VectorSearchTutorial;

We will switch to the newly created database.

USE VectorSearchTutorial;

Assume you're working with text data where each text entry has been converted to a vector using some text embedding process.

CREATE TABLE vector_data (
    id INT PRIMARY KEY AUTO_INCREMENT,
    text VARCHAR(255),
    vector BLOB
);

Insert some text data along with its corresponding vector representation into the table. You would typically generate these vectors using an external tool or library that produces vector embeddings from text data.

INSERT INTO vector_data (text, vector)
VALUES
('Sample text 1', JSON_ARRAY_PACK('[0.1, 0.2, 0.3, 0.4]')),
('Sample text 2', JSON_ARRAY_PACK('[0.5, 0.6, 0.7, 0.8]')),
('Sample text 3', JSON_ARRAY_PACK('[0.9, 0.1, 0.8, 0.2]'));
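
If the vectors are prepared in application code rather than with JSON_ARRAY_PACK, they need to end up in the same binary form. The sketch below assumes the packed format is a sequence of little-endian 32-bit floats, which is our understanding of what JSON_ARRAY_PACK produces; check the SingleStore documentation before relying on it. The helper names are hypothetical.

```python
import struct

def pack_vector(values):
    """Pack floats as consecutive little-endian 32-bit floats (assumed layout)."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_vector(blob):
    """Reverse of pack_vector: read 4 bytes per float."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

blob = pack_vector([0.1, 0.2, 0.3, 0.4])
restored = unpack_vector(blob)
print(len(blob))   # four 4-byte floats
print(restored)    # values come back with 32-bit float rounding
```

Because the values are stored as 32-bit floats, a round trip returns numbers that are very close to, but not always exactly equal to, the originals.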

Create a query vector representing the text you want to search for. Then use a vector similarity function like DOT_PRODUCT to compute the similarity between the query vector and the vectors in your table.

SET @query_vector = JSON_ARRAY_PACK('[0.15, 0.26, 0.36, 0.46]');

SELECT id, text,
       DOT_PRODUCT(vector, @query_vector) AS similarity
FROM vector_data
ORDER BY similarity DESC
LIMIT 3;

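With the sample data above, the rows come back in the order Sample text 2, Sample text 3, Sample text 1, with dot products of roughly 0.851, 0.541 and 0.359.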

To calculate the Euclidean distance between vectors in SingleStore, you can use the EUCLIDEAN_DISTANCE function, which is designed for this purpose.

SET @query_vector = JSON_ARRAY_PACK('[0.15, 0.26, 0.36, 0.46]');

SELECT id, text,
       EUCLIDEAN_DISTANCE(vector, @query_vector) AS euclidean_distance
FROM vector_data
ORDER BY euclidean_distance ASC
LIMIT 3;

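Here the order is Sample text 1, Sample text 2, Sample text 3, with distances of roughly 0.115, 0.685 and 0.922; smaller distances mean closer matches.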

In short, you can store vector data in SingleStore, run a query to compute similarity scores, and retrieve the closest matches along with their scores.
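
As a sanity check, the two queries can be reproduced outside the database. The sketch below recomputes both metrics in plain Python on the same sample vectors; it illustrates the arithmetic only and says nothing about how SingleStore evaluates these functions internally.

```python
import math

# Same sample rows and query vector as in the SQL statements above.
rows = {
    "Sample text 1": [0.1, 0.2, 0.3, 0.4],
    "Sample text 2": [0.5, 0.6, 0.7, 0.8],
    "Sample text 3": [0.9, 0.1, 0.8, 0.2],
}
query = [0.15, 0.26, 0.36, 0.46]

def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Mirrors ORDER BY similarity DESC.
by_dot = sorted(rows, key=lambda name: dot_product(rows[name], query), reverse=True)
# Mirrors ORDER BY euclidean_distance ASC.
by_distance = sorted(rows, key=lambda name: euclidean_distance(rows[name], query))

print(by_dot)
print(by_distance)
```

The two metrics disagree about the best match: the dot product rewards vectors with larger magnitude, so Sample text 2 comes first, while Euclidean distance favors Sample text 1, which lies closest to the query vector. Normalizing the vectors, or using cosine similarity, removes that magnitude effect.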

A complete hands-on tutorial of using SingleStore as a vector database and retrieving similar data using cosine similarity can be found in our recent article.

How Vector Databases Work: A Hands-On Tutorial!

Pavan Belagatti ・ Nov 30 '23


Why Vector Search Is Not Enough for Your GenAI Applications?