llmsmith.task.retrieval.vector package

Subpackages

llmsmith.task.retrieval.vector.options package

Submodules

llmsmith.task.retrieval.vector.base module

llmsmith.task.retrieval.vector.base.default_doc_processor(docs: List[str]) → str

Formats the retrieved documents into a single string in the following format:

`` document-0-content — document-1-content — … — document-n-content ``
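Any callable with the same `List[str]` → `str` signature can be passed as `doc_processing_func` to the retriever classes in this package. The block here is a minimal sketch of such a replacement. `numbered_doc_processor` is a hypothetical name, not part of llmsmith, and its output layout is chosen for illustration only; it is not the library default.

```python
from typing import List


def numbered_doc_processor(docs: List[str]) -> str:
    """Join retrieved documents into one string, labelling each with its rank."""
    # Hypothetical formatter: same signature as default_doc_processor,
    # but a different output layout picked purely for illustration.
    return "\n\n".join(f"[{i}] {doc.strip()}" for i, doc in enumerate(docs))


if __name__ == "__main__":
    print(numbered_doc_processor(["first document", "second document"]))
```

Labelling each document with its position can make it easier for a downstream prompt to refer back to a specific retrieved passage.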

llmsmith.task.retrieval.vector.chromadb module

class llmsmith.task.retrieval.vector.chromadb.BaseChromaDBTask(name: str, collection: Collection, embedding_func: Callable[[List[str]], List[List[float | int]]], query_options: ChromaDBQueryOptions = {'n_results': 10}, reranker: Reranker | None = None)

Bases: Task[str, List[str]]

Task for retrieving documents from a collection in ChromaDB.

Parameters:

name (str) – The name of the task.
collection (chromadb.Collection) – The collection to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
query_options (llmsmith.task.retrieval.vector.options.chromadb.ChromaDBQueryOptions, optional) – A dictionary of options to pass to the ChromaDB collection client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]

Executes the task of retrieving documents from the ChromaDB collection.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the list of documents from ChromaDB.
Return type: llmsmith.task.models.TaskOutput[List[str]]
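The `embedding_func` parameter is typed as `Callable[[List[str]], List[List[float | int]]]`: it takes a batch of texts and returns one vector per text. The block here is a toy sketch of a function with that shape, useful for wiring tests. `toy_embedding_func` and its `dim` argument are hypothetical, and the vectors carry no semantic meaning; a real setup would call an embedding model instead.

```python
import hashlib
from typing import List


def toy_embedding_func(texts: List[str], dim: int = 8) -> List[List[float]]:
    """Map each text to a deterministic pseudo-embedding of length `dim` (max 32)."""
    vectors = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale the first `dim` digest bytes into [0, 1]. This is NOT a real
        # embedding: similar texts do not get similar vectors.
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors


if __name__ == "__main__":
    for vector in toy_embedding_func(["hello", "world"]):
        print(vector)
```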

class llmsmith.task.retrieval.vector.chromadb.ChromaDBRetriever(name: str, collection: Collection, embedding_func: Callable[[List[str]], List[List[float | int]]], doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: ChromaDBQueryOptions = {'n_results': 10}, reranker: Reranker | None = None)

Bases: Task[str, str]

Task for retrieving documents from a collection in ChromaDB.

Parameters:

name (str) – The name of the task.
collection (chromadb.Collection) – The collection to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
doc_processing_func (Callable[[List[str]], str], optional) – The function to process the query result, defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.chromadb.ChromaDBQueryOptions, optional) – A dictionary of options to pass to the ChromaDB collection client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[str]

Executes the task of retrieving documents from the ChromaDB collection.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the processed result and the raw output from ChromaDB.
Return type: llmsmith.task.models.TaskOutput[str]
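The signatures suggest that a retriever class behaves like the corresponding list-returning base task followed by `doc_processing_func`. The block here is a minimal sketch of that composition using stand-in coroutines rather than llmsmith classes. `fetch_documents` and `retrieve_as_text` are hypothetical names, and the actual library internals may differ.

```python
import asyncio
from typing import Callable, List


async def fetch_documents(query: str) -> List[str]:
    """Stand-in for a vector store lookup; returns canned documents."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f"document about {query}", f"more on {query}"]


async def retrieve_as_text(
    query: str, doc_processing_func: Callable[[List[str]], str]
) -> str:
    """Fetch documents, then collapse them into one string."""
    docs = await fetch_documents(query)
    return doc_processing_func(docs)


if __name__ == "__main__":
    print(asyncio.run(retrieve_as_text("llamas", lambda docs: " | ".join(docs))))
```

Because `execute` is declared `async` on every task in this package, callers need an event loop (for example via `asyncio.run`) to obtain the result.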

llmsmith.task.retrieval.vector.pgvector module

class llmsmith.task.retrieval.vector.pgvector.BasePgVectorTask(name: str, db_engine: AsyncEngine, table_name: str, text_colname: str, embedding_colname: str, embedding_func: Callable[[List[str]], List[List[float | int]]], query_options: PgVectorQueryOptions = {'limit': 10}, reranker: Reranker | None = None)

Bases: Task[str, List[str]]

Task for retrieving documents from a table in Postgres DB (with PgVector extension).

Parameters:

name (str) – The name of the task.
db_engine (sqlalchemy.ext.asyncio.AsyncEngine) – SQLAlchemy async engine object.
table_name (str) – Name of the table where the embeddings are stored.
text_colname (str) – Name of the column where the actual text is stored.
embedding_colname (str) – Name of the column where the embeddings are stored.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
query_options (llmsmith.task.retrieval.vector.options.pgvector.PgVectorQueryOptions, optional) – A dictionary of options to be used for querying PgVector table.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]

Executes the task of retrieving documents from the PgVector-backed table.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the list of documents from the PgVector-backed table.
Return type: llmsmith.task.models.TaskOutput[List[str]]

class llmsmith.task.retrieval.vector.pgvector.PgVectorRetriever(name: str, db_engine: AsyncEngine, table_name: str, text_colname: str, embedding_colname: str, embedding_func: Callable[[List[str]], List[List[float | int]]], doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: PgVectorQueryOptions = {'limit': 10}, reranker: Reranker | None = None)

Bases: Task[str, str]

Task for retrieving documents from a table in Postgres DB (with PgVector extension).

Parameters:

name (str) – The name of the task.
db_engine (sqlalchemy.ext.asyncio.AsyncEngine) – SQLAlchemy async engine object.
table_name (str) – Name of the table where the embeddings are stored.
text_colname (str) – Name of the column where the actual text is stored.
embedding_colname (str) – Name of the column where the embeddings are stored.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
doc_processing_func (Callable[[List[str]], str], optional) – The function to process the query result, defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.pgvector.PgVectorQueryOptions, optional) – A dictionary of options to be used for querying PgVector table.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[str]

Executes the task of retrieving documents from the PgVector-backed table.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the processed result and the raw output from SQLAlchemy.
Return type: llmsmith.task.models.TaskOutput[str]
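To make the roles of `table_name`, `text_colname`, `embedding_colname`, and the `limit` query option concrete, the block here sketches the kind of nearest-neighbour query a pgvector lookup might issue. This is an illustration, not llmsmith's actual SQL: the helper names are hypothetical, and the choice of pgvector's cosine distance operator `<=>` is an assumption.

```python
from typing import List


def build_similarity_query(
    table_name: str, text_colname: str, embedding_colname: str, limit: int = 10
) -> str:
    """Build a parameterised nearest-neighbour query for a pgvector table."""
    # Identifiers are interpolated directly, so they must come from trusted
    # configuration, never from user input. The query vector itself is left
    # as a bind parameter (:query_embedding).
    return (
        f"SELECT {text_colname} FROM {table_name} "
        f"ORDER BY {embedding_colname} <=> :query_embedding "
        f"LIMIT {int(limit)}"
    )


def to_pgvector_literal(vector: List[float]) -> str:
    """Render a vector in pgvector's text format, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(x) for x in vector) + "]"


if __name__ == "__main__":
    print(build_similarity_query("docs", "content", "embedding", limit=3))
    print(to_pgvector_literal([0.5, 1.0]))
```

The text column is what gets selected and returned, while the embedding column is only used for ordering by distance to the query vector.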

llmsmith.task.retrieval.vector.pinecone module

class llmsmith.task.retrieval.vector.pinecone.BasePineconeTask(name: str, index: Index, embedding_func: Callable[[List[str]], List[List[float | int]]], text_field_name: str, query_options: PineconeQueryOptions = {'top_k': 10}, reranker: Reranker | None = None)

Bases: Task[str, List[str]]

Task for retrieving documents from an index in Pinecone.

Parameters:

name (str) – The name of the task.
index (pinecone.Index) – The index to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
text_field_name (str) – name of the field in the metadata to be fetched during the retrieval
query_options (llmsmith.task.retrieval.vector.options.pinecone.PineconeQueryOptions, optional) – A dictionary of options to pass to the Pinecone index for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]

Executes the task of retrieving documents from the Pinecone index.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the list of documents from Pinecone.
Return type: llmsmith.task.models.TaskOutput[List[str]]

class llmsmith.task.retrieval.vector.pinecone.PineconeRetriever(name: str, index: Index, embedding_func: Callable[[List[str]], List[List[float | int]]], text_field_name: str, doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: PineconeQueryOptions = {'top_k': 10}, reranker: Reranker | None = None)

Bases: Task[str, str]

Task for retrieving documents from an index in Pinecone.

Parameters:

name (str) – The name of the task.
index (pinecone.Index) – The index to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
text_field_name (str) – name of the field in the metadata to be fetched during the retrieval
doc_processing_func (Callable[[List[str]], str], optional) – The function to process the query result, defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.pinecone.PineconeQueryOptions, optional) – A dictionary of options to pass to the Pinecone index for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[str]

Executes the task of retrieving documents from the Pinecone index.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the processed result and the raw output from Pinecone.
Return type: llmsmith.task.models.TaskOutput[str]
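The `text_field_name` parameter names the metadata field that holds each document's text. The block here is a sketch of how that field could be pulled out of query matches. It uses plain dictionaries shaped like Pinecone matches (`id`, `score`, `metadata`) as a stand-in for the real response objects, and `extract_texts` is a hypothetical helper, not an llmsmith function.

```python
from typing import Any, Dict, List


def extract_texts(matches: List[Dict[str, Any]], text_field_name: str) -> List[str]:
    """Collect the text stored under `text_field_name` in each match's metadata."""
    texts = []
    for match in matches:
        metadata = match.get("metadata") or {}
        # Skip matches that were stored without the expected text field.
        if text_field_name in metadata:
            texts.append(str(metadata[text_field_name]))
    return texts


if __name__ == "__main__":
    sample_matches = [
        {"id": "a", "score": 0.91, "metadata": {"text": "first passage"}},
        {"id": "b", "score": 0.87, "metadata": {"title": "no text field"}},
        {"id": "c", "score": 0.80, "metadata": {"text": "second passage"}},
    ]
    print(extract_texts(sample_matches, "text"))
```

If the field name does not match what was written at upsert time, every match is skipped and the result is an empty list, so it is worth checking this value first when retrieval returns nothing.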

llmsmith.task.retrieval.vector.qdrant module

class llmsmith.task.retrieval.vector.qdrant.BaseQdrantTask(name: str, client: AsyncQdrantClient, collection_name: str, embedding_func: Callable[[List[str]], List[List[float | int]]], embedded_field_name: str, query_options: QdrantQueryOptions = {'limit': 10, 'with_vectors': False}, reranker: Reranker | None = None)

Bases: Task[str, List[str]]

Task for retrieving documents from a collection in Qdrant.

Parameters:

name (str) – The name of the task.
client (qdrant_client.AsyncQdrantClient) – Qdrant client.
collection_name (str) – Qdrant collection name.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
embedded_field_name (str) – Name of the document field from which embeddings were created when the data was uploaded to the Qdrant collection.
query_options (llmsmith.task.retrieval.vector.options.qdrant.QdrantQueryOptions, optional) – A dictionary of options to pass to the Qdrant client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]

Executes the task of retrieving documents from the Qdrant collection.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the list of documents from Qdrant.
Return type: llmsmith.task.models.TaskOutput[List[str]]

class llmsmith.task.retrieval.vector.qdrant.QdrantRetriever(name: str, client: AsyncQdrantClient, collection_name: str, embedding_func: Callable[[List[str]], List[List[float | int]]], embedded_field_name: str, doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: QdrantQueryOptions = {'limit': 10, 'with_vectors': False}, reranker: Reranker | None = None)

Bases: Task[str, str]

Task for retrieving documents from a collection in Qdrant.

Parameters:

name (str) – The name of the task.
client (qdrant_client.AsyncQdrantClient) – Qdrant client.
collection_name (str) – Qdrant collection name.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Function that maps a list of texts to a list of embedding vectors.
embedded_field_name (str) – Name of the document field from which embeddings were created when the data was uploaded to the Qdrant collection.
doc_processing_func (Callable[[List[str]], str], optional) – The function to process the query result, defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.qdrant.QdrantQueryOptions, optional) – A dictionary of options to pass to the Qdrant client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Rerank the documents based on the query used to retrieve the documents.

async execute(task_input: TaskInput[str]) → TaskOutput[str]

Executes the task of retrieving documents from the Qdrant collection.

Parameters: task_input (llmsmith.task.models.TaskInput[str]) – The input for the task.
Returns: The output of the task, which includes the processed result and the raw output from Qdrant.
Return type: llmsmith.task.models.TaskOutput[str]
