
Vectara

Vectara provides a Trusted Generative AI platform that allows organizations to rapidly create a ChatGPT-like experience (an AI assistant) grounded in their own data, documents, and knowledge (technically, it is Retrieval-Augmented-Generation-as-a-service).

Vectara Overview: Vectara is RAG-as-a-service, providing all the components of RAG behind an easy-to-use API, including:

  1. A way to extract text from files (PDF, PPT, DOCX, etc.).
  2. ML-based chunking that provides state-of-the-art performance.
  3. The Boomerang embeddings model.
  4. Its own internal vector database where text chunks and embedding vectors are stored.
  5. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments (including support for Hybrid Search and MMR).
  6. An LLM for creating a generative summary, with citations, based on the retrieved documents (context).
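Hybrid search blends neural (embedding-based) relevance with lexical (keyword) matching. As a rough mental model, assume a linear interpolation weighted by a single coefficient, the role played by the lambda_val parameter of the query methods. The scores and the `hybrid_score` and `rank` helpers below are hypothetical; Vectara computes hybrid scores server-side and may combine them differently.

```python
def hybrid_score(neural: float, lexical: float, lambda_val: float) -> float:
    """Blend a neural and a lexical relevance score.

    Assumption: linear interpolation, where lambda_val=0 means purely
    neural retrieval and lambda_val=1 means purely lexical retrieval.
    """
    if not 0.0 <= lambda_val <= 1.0:
        raise ValueError("lambda_val must be in [0, 1]")
    return (1.0 - lambda_val) * neural + lambda_val * lexical


def rank(candidates: dict[str, tuple[float, float]], lambda_val: float) -> list[str]:
    """Return candidate ids sorted by blended score, best first."""
    return sorted(
        candidates,
        key=lambda cid: hybrid_score(*candidates[cid], lambda_val),
        reverse=True,
    )


# Hypothetical (neural, lexical) scores for three text segments.
candidates = {
    "seg-a": (0.82, 0.10),
    "seg-b": (0.78, 0.90),
    "seg-c": (0.40, 0.95),
}
print(rank(candidates, lambda_val=0.025))  # ['seg-a', 'seg-b', 'seg-c']
print(rank(candidates, lambda_val=0.5))    # ['seg-b', 'seg-c', 'seg-a']
```

With a small lambda_val such as the 0.025 default, the ordering follows the neural scores; at 0.5, segments with strong keyword matches move up.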


Installation and Setup

To use Vectara with LangChain, no special installation steps are required. To get started, sign up for a free Vectara account (if you don't already have one), and follow the quickstart guide to create a corpus and an API key. Once you have these, you can either pass them as arguments to the Vectara vectorstore or set them as environment variables:

  • export VECTARA_CUSTOMER_ID="your_customer_id"
  • export VECTARA_CORPUS_ID="your_corpus_id"
  • export VECTARA_API_KEY="your-vectara-api-key"

Vectara as a Vector Store

LangChain provides a wrapper around the Vectara platform that lets you use it as a vectorstore.

To import this vectorstore:

from langchain_community.vectorstores import Vectara

To create an instance of the Vectara vectorstore:

vectara = Vectara(
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key,
)

The vectara_customer_id, vectara_corpus_id and vectara_api_key arguments are optional; any that are not supplied are read from the environment variables VECTARA_CUSTOMER_ID, VECTARA_CORPUS_ID and VECTARA_API_KEY, respectively.
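The fallback can be pictured as "explicit argument first, then environment variable". The helper `resolve_vectara_setting` below is a hypothetical illustration of that precedence, not the code inside the LangChain wrapper.

```python
import os
from typing import Optional


def resolve_vectara_setting(explicit: Optional[str], env_var: str) -> str:
    """Return the explicit value if given, otherwise read it from the environment.

    Hypothetical helper illustrating the precedence described above.
    """
    if explicit:
        return explicit
    value = os.environ.get(env_var)
    if not value:
        raise ValueError(f"Pass the value explicitly or set {env_var}")
    return value


# Demo only: simulates `export VECTARA_CORPUS_ID="your_corpus_id"`.
os.environ["VECTARA_CORPUS_ID"] = "your_corpus_id"
print(resolve_vectara_setting(None, "VECTARA_CORPUS_ID"))  # your_corpus_id
print(resolve_vectara_setting("42", "VECTARA_CORPUS_ID"))  # 42
```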

Adding Texts or Files

After you have the vectorstore, you can call add_texts or add_documents as per the standard VectorStore interface, for example:

vectara.add_texts(["to be or not to be", "that is the question"])

Since the Vectara platform supports file upload, we also added the ability to upload files (PDF, TXT, HTML, PPT, DOC, etc.) directly with add_files. With this method, each file is sent to the Vectara backend, where it is processed and chunked, so you don't need to use a LangChain document loader or chunking mechanism.

As an example:

vectara.add_files(["path/to/file1.pdf", "path/to/file2.pdf",...])
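When the files live in a directory tree, you may want to collect them before calling add_files. The `collect_files` helper below is hypothetical, and its extension set is an assumption based on the formats listed above; check Vectara's documentation for the formats your account actually accepts.

```python
from pathlib import Path

# Assumption: extensions mirroring the formats mentioned above (not an official list).
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".html", ".ppt", ".pptx", ".doc", ".docx"}


def collect_files(root: str) -> list[str]:
    """Recursively list files under root whose extension is in SUPPORTED_EXTENSIONS."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Usage (requires a configured `vectara` instance):
# vectara.add_files(collect_files("path/to/docs"))
```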

Adding data is optional: you can instead connect to an existing Vectara corpus whose data is already indexed.

Querying the VectorStore

To query the Vectara vectorstore, you can use the similarity_search method (or similarity_search_with_score), which takes a query string and returns a list of results:

results = vectara.similarity_search_with_score("what is LangChain?")

The results are returned as a list of (document, score) tuples, where the score is the relevance of each document to the query.

In this case, we used the default retrieval parameters, but you can also specify the following additional arguments in similarity_search or similarity_search_with_score:

  • k: number of results to return (defaults to 5)
  • lambda_val: the lexical matching factor for hybrid search (defaults to 0.025)
  • filter: a filter to apply to the results (default None)
  • n_sentence_context: number of sentences to include before/after the actual matching segment when returning results. This defaults to 2.
  • rerank_config: can be used to specify a reranker for the results
    • reranker: mmr, rerank_multilingual_v1 or none. Note that "rerank_multilingual_v1" is a Scale-only feature
    • rerank_k: number of results to use for reranking
    • mmr_diversity_bias: 0 = no diversity, 1 = full diversity. This is the lambda parameter in the MMR formula and is in the range 0...1
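To make n_sentence_context concrete: a returned result is the matching segment plus up to n sentences on each side. The `with_context` function below is a hypothetical sketch of that windowing over a pre-split list of sentences; Vectara does this server-side, and its exact boundary handling may differ.

```python
def with_context(sentences: list[str], match_index: int, n_sentence_context: int = 2) -> str:
    """Return the matching sentence plus up to n sentences before and after it.

    Hypothetical illustration; the window is clipped at document boundaries.
    """
    start = max(0, match_index - n_sentence_context)
    end = min(len(sentences), match_index + n_sentence_context + 1)
    return " ".join(sentences[start:end])


doc = ["S0.", "S1.", "S2.", "S3.", "S4.", "S5."]
print(with_context(doc, 3))     # S1. S2. S3. S4. S5.
print(with_context(doc, 0, 1))  # S0. S1.
```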

To get results without the relevance score, you can simply use the 'similarity_search' method:

results = vectara.similarity_search("what is LangChain?")

Vectara for Retrieval Augmented Generation (RAG)

Vectara provides a full RAG pipeline, including generative summarization. To use it as a complete RAG solution, you can use the as_rag method. There are a few additional parameters that can be specified in the VectaraQueryConfig object to control retrieval and summarization:

  • k: number of results to return
  • lambda_val: the lexical matching factor for hybrid search
  • summary_config (optional): can be used to request an LLM summary in RAG
    • is_enabled: True or False
    • max_results: number of results to use for summary generation
    • response_lang: language of the response summary, in ISO 639-2 format (e.g. 'eng', 'fra', 'deu', etc.)
  • rerank_config (optional): can be used to specify a Vectara reranker for the results
    • reranker: mmr, rerank_multilingual_v1 or none
    • rerank_k: number of results to use for reranking
    • mmr_diversity_bias: 0 = no diversity, 1 = full diversity. This is the lambda parameter in the MMR formula and is in the range 0...1
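Maximal Marginal Relevance (MMR) greedily picks the next result that is relevant to the query but dissimilar to the results already picked. The sketch below follows the convention stated above for mmr_diversity_bias (0 = no diversity, 1 = full diversity) and uses cosine similarity over toy two-dimensional vectors; the vectors and the `mmr_select` function are hypothetical, and Vectara's reranker runs server-side over its own embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query: list[float], docs: dict[str, list[float]], k: int, diversity_bias: float) -> list[str]:
    """Greedy MMR: score(d) = (1 - bias) * sim(query, d) - bias * max sim(d, already selected)."""
    selected: list[str] = []
    candidates = list(docs)
    while candidates and len(selected) < k:
        def mmr_score(doc_id: str) -> float:
            relevance = cosine(query, docs[doc_id])
            redundancy = max((cosine(docs[doc_id], docs[s]) for s in selected), default=0.0)
            return (1 - diversity_bias) * relevance - diversity_bias * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = {"a": [1.0, 0.1], "a_dup": [1.0, 0.12], "b": [0.6, 0.8]}
print(mmr_select(query, docs, k=2, diversity_bias=0.0))  # ['a', 'a_dup']
print(mmr_select(query, docs, k=2, diversity_bias=0.8))  # ['a', 'b']
```

With no diversity bias the near-duplicate a_dup is picked second; with a strong bias the more distinct b replaces it.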

For example:

from langchain_community.vectorstores.vectara import RerankConfig, SummaryConfig, VectaraQueryConfig

summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)

Then you can use the as_rag method to create a RAG pipeline:

query_str = "what did Biden say?"

rag = vectara.as_rag(config)
rag.invoke(query_str)['answer']

The as_rag method returns a VectaraRAG object, which behaves like any LangChain Runnable and supports methods such as invoke and stream.

Vectara Chat

The RAG functionality can be used to create a chatbot. For example, you can create a simple chatbot that responds to user input:

summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)

query_str = "what did Biden say?"
bot = vectara.as_chat(config)
bot.invoke(query_str)['answer']

The main difference from as_rag is that with as_chat, Vectara tracks the chat history internally and conditions each response on the full history, so there is no need to keep that history locally in LangChain.
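One way to picture this division of labour: the client holds, at most, an identifier for the conversation, while the server stores the turns. The `ToyChatServer` and `ToyChatClient` classes below are a hypothetical in-memory model of that idea; they do not call Vectara, and the conversation_id name is illustrative only.

```python
import uuid
from typing import Optional


class ToyChatServer:
    """In-memory stand-in for a server that stores chat history per conversation."""

    def __init__(self) -> None:
        self._history: dict[str, list[tuple[str, str]]] = {}

    def chat(self, message: str, conversation_id: Optional[str] = None) -> tuple[str, str]:
        if conversation_id is None:
            conversation_id = str(uuid.uuid4())  # start a new conversation
        turns = self._history.setdefault(conversation_id, [])
        # A real system would condition an LLM on `turns`; here we only report their count.
        answer = f"answer to {message!r} given {len(turns)} earlier turn(s)"
        turns.append((message, answer))
        return conversation_id, answer


class ToyChatClient:
    """Keeps only the conversation id; the transcript lives on the server."""

    def __init__(self, server: ToyChatServer) -> None:
        self.server = server
        self.conversation_id: Optional[str] = None

    def invoke(self, message: str) -> str:
        self.conversation_id, answer = self.server.chat(message, self.conversation_id)
        return answer


bot = ToyChatClient(ToyChatServer())
print(bot.invoke("what did Biden say?"))
print(bot.invoke("and what about the economy?"))
```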

Vectara as a LangChain retriever only​

If you want to use Vectara as a retriever only, you can use the as_retriever method, which returns a VectaraRetriever object.

retriever = vectara.as_retriever(config=config)
retriever.invoke(query_str)

As with as_rag, you provide a VectaraQueryConfig object to control the retrieval parameters. In most cases you would not enable summary_config, but the option is kept for backwards compatibility. If no summary is requested, the response is a list of relevant documents, each with a relevance score. If a summary is requested, the response contains the same list of relevant documents plus an additional document that holds the generative summary.
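If you do request a summary from the retriever, you will usually want to separate it from the retrieved passages. The sketch below assumes the summary document is marked with the metadata entry {"summary": True}; that marker is an assumption, so verify it against your installed langchain_community version. `Doc` is a hypothetical stand-in with the same page_content and metadata attributes as a LangChain Document, so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_summary(docs: list) -> tuple[list, Optional[str]]:
    """Separate retrieved documents from the generated summary.

    Assumption: the summary document carries metadata {"summary": True}.
    """
    summary = next((d.page_content for d in docs if d.metadata.get("summary")), None)
    results = [d for d in docs if not d.metadata.get("summary")]
    return results, summary


docs = [
    Doc("to be or not to be", {"source": "hamlet"}),
    Doc("that is the question", {"source": "hamlet"}),
    Doc("Hamlet weighs life against death.", {"summary": True}),
]
results, summary = split_summary(docs)
print(len(results), summary)  # 2 Hamlet weighs life against death.
```

With a real retriever you would pass the output of retriever.invoke(query_str) instead of the stand-in list.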

Hallucination Detection score

Vectara created HHEM, an open-source model that can be used to evaluate RAG responses for factual consistency. As part of Vectara RAG, the API also provides the "Factual Consistency Score" (FCS), an improved version of the open-source HHEM. The score is automatically included in the output of the RAG pipeline:

summary_config = SummaryConfig(is_enabled=True, max_results=7, response_lang='eng')
rerank_config = RerankConfig(reranker="mmr", rerank_k=50, mmr_diversity_bias=0.2)
config = VectaraQueryConfig(k=10, lambda_val=0.005, rerank_config=rerank_config, summary_config=summary_config)

rag = vectara.as_rag(config)
resp = rag.invoke(query_str)
print(resp['answer'])
print(f"Vectara FCS = {resp['fcs']}")
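A common use of the score is to flag answers that may not be supported by the retrieved context. The `gate_answer` function below is a hypothetical sketch; it assumes the response is a dict with 'answer' and 'fcs' keys, as in the example above, and that higher scores mean more consistent answers. The 0.5 threshold is an arbitrary illustrative choice, not a Vectara recommendation.

```python
from typing import Optional


def gate_answer(resp: dict, threshold: float = 0.5) -> str:
    """Return the answer, prefixed with a warning when the FCS is missing or below threshold.

    Assumptions: resp has 'answer' and 'fcs' keys; FCS is in [0, 1], higher = more consistent.
    """
    fcs: Optional[float] = resp.get("fcs")
    if fcs is None or fcs < threshold:
        return f"[low factual consistency: {fcs}] {resp['answer']}"
    return resp["answer"]


print(gate_answer({"answer": "He spoke about jobs.", "fcs": 0.92}))  # He spoke about jobs.
print(gate_answer({"answer": "He spoke about Mars.", "fcs": 0.12}))  # [low factual consistency: 0.12] He spoke about Mars.
```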

Example Notebooks

For more detailed examples of using Vectara with LangChain, see the following example notebooks:

  • this notebook shows how to use Vectara: with full RAG or just as a retriever.
  • this notebook shows the self-query capability with Vectara.
  • this notebook shows how to build a chatbot with LangChain and Vectara
