llmsmith.task.retrieval.vector package
Subpackages
Submodules
llmsmith.task.retrieval.vector.base module
- llmsmith.task.retrieval.vector.base.default_doc_processor(docs: List[str]) → str
Formats the retrieved documents into a single string of the following form:
``document-0-content --- document-1-content --- ... --- document-n-content``
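A minimal sketch of a processor producing the documented shape. It assumes the separator is a `---` line between documents (`"\n---\n"`); the exact whitespace used by llmsmith's `default_doc_processor` is not stated here, and `sketch_doc_processor` is a hypothetical name, not the library function.

```python
from typing import List


def sketch_doc_processor(docs: List[str], separator: str = "\n---\n") -> str:
    """Join retrieved documents with a '---' divider (separator is an assumption)."""
    return separator.join(docs)


print(sketch_doc_processor(["first doc", "second doc", "third doc"]))
# first doc
# ---
# second doc
# ---
# third doc
```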
llmsmith.task.retrieval.vector.chromadb module
- class llmsmith.task.retrieval.vector.chromadb.BaseChromaDBTask(name: str, collection: Collection, embedding_func: Callable[[List[str]], List[List[float | int]]], query_options: ChromaDBQueryOptions = {'n_results': 10}, reranker: Reranker | None = None)
Bases: Task[str, List[str]]
Task for retrieving documents from a collection in ChromaDB.
- Parameters:
name (str) – The name of the task.
collection (chromadb.Collection) – The collection to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
query_options (llmsmith.task.retrieval.vector.options.chromadb.ChromaDBQueryOptions, optional) – A dictionary of options to pass to the ChromaDB collection client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
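The `embedding_func` type, `Callable[[List[str]], List[List[float | int]]]`, takes a batch of texts and returns one vector per text. The sketch below is a hypothetical toy function with that shape (hashed bag of words, L2-normalised); in practice you would wrap a real embedding model whose vectors match the dimensionality of those already stored in the collection.

```python
import hashlib
import math
from typing import List


def toy_embedding_func(texts: List[str], dim: int = 8) -> List[List[float]]:
    """Hypothetical EmbeddingFunc: hash words into a fixed-size, L2-normalised vector."""
    vectors: List[List[float]] = []
    for text in texts:
        vec = [0.0] * dim
        for word in text.lower().split():
            # Stable bucket index from SHA-256 (built-in hash() is salted per process).
            bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
            vec[bucket] += 1.0
        # Avoid dividing by zero for empty texts; they stay all-zero vectors.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


embeddings = toy_embedding_func(["vector databases", "reranking documents"])
print(len(embeddings), len(embeddings[0]))  # 2 8
```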
- async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]
Executes the task of retrieving documents from the ChromaDB collection.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the list of documents retrieved from ChromaDB.
- Return type:
llmsmith.task.models.TaskOutput[List[str]]
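The default `query_options` of `{'n_results': 10}` caps how many nearest neighbours ChromaDB returns. As a rough, conceptual analogue only (ChromaDB uses its own index and configured distance metric, not this code), a brute-force cosine top-k over an in-memory list looks like this:

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query_vec: List[float], store: List[Tuple[str, List[float]]], n_results: int = 10) -> List[str]:
    """Return up to n_results documents, most similar first."""
    scored = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in scored[:n_results]]


store = [
    ("about cats", [1.0, 0.0]),
    ("about dogs", [0.0, 1.0]),
    ("cats and dogs", [0.7, 0.7]),
]
print(top_k([1.0, 0.1], store, n_results=2))  # ['about cats', 'cats and dogs']
```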
- class llmsmith.task.retrieval.vector.chromadb.ChromaDBRetriever(name: str, collection: Collection, embedding_func: Callable[[List[str]], List[List[float | int]]], doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: ChromaDBQueryOptions = {'n_results': 10}, reranker: Reranker | None = None)
Bases: Task[str, str]
Task for retrieving documents from a collection in ChromaDB and formatting them into a single string.
- Parameters:
name (str) – The name of the task.
collection (chromadb.Collection) – The collection to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
doc_processing_func (Callable[[List[str]], str], optional) – The function that converts the list of retrieved documents into a single string. Defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.chromadb.ChromaDBQueryOptions, optional) – A dictionary of options to pass to the ChromaDB collection client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
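Any callable with the signature `Callable[[List[str]], str]` can stand in for the default processor. The hypothetical `numbered_doc_processor` below drops blank results and numbers the rest, which can make it easier for a prompt to cite sources; it would be passed as the `doc_processing_func` argument.

```python
from typing import List


def numbered_doc_processor(docs: List[str]) -> str:
    """Hypothetical doc_processing_func: number each non-blank document."""
    cleaned = [d.strip() for d in docs if d.strip()]
    return "\n".join(f"[{i}] {doc}" for i, doc in enumerate(cleaned, start=1))


print(numbered_doc_processor(["Paris is in France.", "  ", "Berlin is in Germany."]))
# [1] Paris is in France.
# [2] Berlin is in Germany.
```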
- async execute(task_input: TaskInput[str]) → TaskOutput[str]
Executes the task of retrieving documents from the ChromaDB collection.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the processed result and the raw output from ChromaDB.
- Return type:
llmsmith.task.models.TaskOutput[str]
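The retriever variants return both a processed string and the raw backend output. The sketch below illustrates that flow with hypothetical stand-ins: `SketchOutput` mimics a task output carrying `content` and `raw_output`, and `fake_vector_query` replaces the real async backend call. Neither is llmsmith's actual `TaskOutput` or internal code.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SketchOutput:
    """Hypothetical stand-in for a task output: processed content plus raw backend output."""
    content: str
    raw_output: Any


async def fake_vector_query(query: str) -> List[str]:
    # Placeholder for the real async query against ChromaDB, pgvector, Pinecone or Qdrant.
    await asyncio.sleep(0)
    return [f"doc matching '{query}' #1", f"doc matching '{query}' #2"]


async def retrieve_and_process(query: str, doc_processing_func: Callable[[List[str]], str]) -> SketchOutput:
    docs = await fake_vector_query(query)
    # The processed string is what a downstream prompt would consume.
    return SketchOutput(content=doc_processing_func(docs), raw_output=docs)


result = asyncio.run(retrieve_and_process("vector search", "\n---\n".join))
print(result.content)
```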
llmsmith.task.retrieval.vector.pgvector module
- class llmsmith.task.retrieval.vector.pgvector.BasePgVectorTask(name: str, db_engine: AsyncEngine, table_name: str, text_colname: str, embedding_colname: str, embedding_func: Callable[[List[str]], List[List[float | int]]], query_options: PgVectorQueryOptions = {'limit': 10}, reranker: Reranker | None = None)
Bases: Task[str, List[str]]
Task for retrieving documents from a table in a Postgres database with the PgVector extension.
- Parameters:
name (str) – The name of the task.
db_engine (sqlalchemy.ext.asyncio.AsyncEngine) – SQLAlchemy async engine used to connect to the database.
table_name (str) – Name of the table where the embeddings are stored.
text_colname (str) – Name of the column where the actual text is stored.
embedding_colname (str) – Name of the column where the embeddings are stored.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
query_options (llmsmith.task.retrieval.vector.options.pgvector.PgVectorQueryOptions, optional) – A dictionary of options used when querying the PgVector table.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
- async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]
Executes the task of retrieving documents from the PgVector-backed table.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the list of documents retrieved from the PgVector-backed table.
- Return type:
llmsmith.task.models.TaskOutput[List[str]]
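The query embedding produced by `embedding_func` has to reach Postgres somehow. pgvector accepts vectors in the text form `'[1,2,3]'`, so one option is to render the embedding as that string and cast it to `vector` in SQL. Whether llmsmith does this or uses a pgvector SQLAlchemy type internally is not stated; `to_pgvector_literal` is a hypothetical helper.

```python
from typing import List, Union


def to_pgvector_literal(embedding: List[Union[float, int]]) -> str:
    """Render an embedding in pgvector's text input format, e.g. '[1.0,2.5]'."""
    if not embedding:
        # pgvector vectors need at least one dimension.
        raise ValueError("embedding must not be empty")
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


print(to_pgvector_literal([1, 2.5, -0.25]))  # [1.0,2.5,-0.25]
```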
- class llmsmith.task.retrieval.vector.pgvector.PgVectorRetriever(name: str, db_engine: AsyncEngine, table_name: str, text_colname: str, embedding_colname: str, embedding_func: Callable[[List[str]], List[List[float | int]]], doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: PgVectorQueryOptions = {'limit': 10}, reranker: Reranker | None = None)
Bases: Task[str, str]
Task for retrieving documents from a table in a Postgres database with the PgVector extension and formatting them into a single string.
- Parameters:
name (str) – The name of the task.
db_engine (sqlalchemy.ext.asyncio.AsyncEngine) – SQLAlchemy async engine used to connect to the database.
table_name (str) – Name of the table where the embeddings are stored.
text_colname (str) – Name of the column where the actual text is stored.
embedding_colname (str) – Name of the column where the embeddings are stored.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
doc_processing_func (Callable[[List[str]], str], optional) – The function that converts the list of retrieved documents into a single string. Defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.pgvector.PgVectorQueryOptions, optional) – A dictionary of options used when querying the PgVector table.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
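To illustrate what reranking "based on the query" means, the sketch below reorders documents by how many query terms each contains. This is a hypothetical toy, not llmsmith's `Reranker` interface; real rerankers typically use a cross-encoder model or a hosted reranking API.

```python
from typing import List


def overlap_rerank(query: str, docs: List[str], top_n: int = 3) -> List[str]:
    """Hypothetical reranker: order documents by how many query terms they contain."""
    query_terms = set(query.lower().split())

    def score(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))

    # sorted() is stable even with reverse=True, so ties keep the vector store's order.
    return sorted(docs, key=score, reverse=True)[:top_n]


docs = ["postgres tuning guide", "pgvector index types", "pgvector with postgres"]
print(overlap_rerank("pgvector postgres", docs, top_n=2))
# ['pgvector with postgres', 'postgres tuning guide']
```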
- async execute(task_input: TaskInput[str]) → TaskOutput[str]
Executes the task of retrieving documents from the PgVector-backed table.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the processed result and the raw output from SQLAlchemy.
- Return type:
llmsmith.task.models.TaskOutput[str]
llmsmith.task.retrieval.vector.pinecone module
- class llmsmith.task.retrieval.vector.pinecone.BasePineconeTask(name: str, index: Index, embedding_func: Callable[[List[str]], List[List[float | int]]], text_field_name: str, query_options: PineconeQueryOptions = {'top_k': 10}, reranker: Reranker | None = None)
Bases: Task[str, List[str]]
Task for retrieving documents from an index in Pinecone.
- Parameters:
name (str) – The name of the task.
index (pinecone.Index) – The index to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
text_field_name (str) – Name of the metadata field to fetch for each match during retrieval.
query_options (llmsmith.task.retrieval.vector.options.pinecone.PineconeQueryOptions, optional) – A dictionary of options to pass to the Pinecone index for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
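`text_field_name` names the metadata key that holds each vector's original text. The sketch models query matches as plain dicts mimicking the shape of a Pinecone query response (`id`, `score`, `metadata`); the real client returns response objects, and metadata is only present if the query requests it (Pinecone's `include_metadata` query flag). `extract_texts` is a hypothetical helper that skips matches lacking the field rather than failing.

```python
from typing import Any, Dict, List


def extract_texts(matches: List[Dict[str, Any]], text_field_name: str) -> List[str]:
    """Pull the text stored under text_field_name from each match's metadata."""
    texts: List[str] = []
    for match in matches:
        metadata = match.get("metadata") or {}
        if text_field_name in metadata:
            texts.append(str(metadata[text_field_name]))
    return texts


matches = [
    {"id": "a", "score": 0.92, "metadata": {"text": "Pinecone stores vectors in indexes."}},
    {"id": "b", "score": 0.87, "metadata": {"source": "faq.md"}},  # no "text" field
]
print(extract_texts(matches, "text"))  # ['Pinecone stores vectors in indexes.']
```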
- async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]
Executes the task of retrieving documents from the Pinecone index.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the list of retrieved documents and the raw output from Pinecone.
- Return type:
llmsmith.task.models.TaskOutput[List[str]]
- class llmsmith.task.retrieval.vector.pinecone.PineconeRetriever(name: str, index: Index, embedding_func: Callable[[List[str]], List[List[float | int]]], text_field_name: str, doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: PineconeQueryOptions = {'top_k': 10}, reranker: Reranker | None = None)
Bases: Task[str, str]
Task for retrieving documents from an index in Pinecone and formatting them into a single string.
- Parameters:
name (str) – The name of the task.
index (pinecone.Index) – The index to retrieve documents from.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
text_field_name (str) – Name of the metadata field to fetch for each match during retrieval.
doc_processing_func (Callable[[List[str]], str], optional) – The function that converts the list of retrieved documents into a single string. Defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.pinecone.PineconeQueryOptions, optional) – A dictionary of options to pass to the Pinecone index for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
- async execute(task_input: TaskInput[str]) → TaskOutput[str]
Executes the task of retrieving documents from the Pinecone index.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the processed result and the raw output from Pinecone.
- Return type:
llmsmith.task.models.TaskOutput[str]
llmsmith.task.retrieval.vector.qdrant module
- class llmsmith.task.retrieval.vector.qdrant.BaseQdrantTask(name: str, client: AsyncQdrantClient, collection_name: str, embedding_func: Callable[[List[str]], List[List[float | int]]], embedded_field_name: str, query_options: QdrantQueryOptions = {'limit': 10, 'with_vectors': False}, reranker: Reranker | None = None)
Bases: Task[str, List[str]]
Task for retrieving documents from a collection in Qdrant.
- Parameters:
name (str) – The name of the task.
client (qdrant_client.AsyncQdrantClient) – Qdrant client.
collection_name (str) – Qdrant collection name.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
embedded_field_name (str) – Name of the field in the document whose content was embedded when the data was uploaded to the Qdrant collection.
query_options (llmsmith.task.retrieval.vector.options.qdrant.QdrantQueryOptions, optional) – A dictionary of options to pass to the Qdrant client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
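The default `query_options` here carries two keys, `{'limit': 10, 'with_vectors': False}`. This reference does not say whether a custom `query_options` dict is merged with the default or replaces it, so a defensive approach is to build the complete dict yourself. The helper below is a hypothetical sketch; it also returns a fresh dict rather than mutating the shared default.

```python
from typing import Any, Dict

# Defaults as shown in the BaseQdrantTask / QdrantRetriever signatures.
QDRANT_DEFAULT_OPTIONS: Dict[str, Any] = {"limit": 10, "with_vectors": False}


def with_overrides(defaults: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Return a new options dict: defaults first, explicit overrides win."""
    return {**defaults, **overrides}


options = with_overrides(QDRANT_DEFAULT_OPTIONS, limit=3)
print(options)  # {'limit': 3, 'with_vectors': False}
```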
- async execute(task_input: TaskInput[str]) → TaskOutput[List[str]]
Executes the task of retrieving documents from the Qdrant collection.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the list of retrieved documents and the raw output from Qdrant.
- Return type:
llmsmith.task.models.TaskOutput[List[str]]
- class llmsmith.task.retrieval.vector.qdrant.QdrantRetriever(name: str, client: AsyncQdrantClient, collection_name: str, embedding_func: Callable[[List[str]], List[List[float | int]]], embedded_field_name: str, doc_processing_func: Callable[[List[str]], str] = default_doc_processor, query_options: QdrantQueryOptions = {'limit': 10, 'with_vectors': False}, reranker: Reranker | None = None)
Bases: Task[str, str]
Task for retrieving documents from a collection in Qdrant and formatting them into a single string.
- Parameters:
name (str) – The name of the task.
client (qdrant_client.AsyncQdrantClient) – Qdrant client.
collection_name (str) – Qdrant collection name.
embedding_func (llmsmith.task.retrieval.vector.base.EmbeddingFunc) – Embedding function used to convert the query text into a vector.
embedded_field_name (str) – Name of the field in the document whose content was embedded when the data was uploaded to the Qdrant collection.
doc_processing_func (Callable[[List[str]], str], optional) – The function that converts the list of retrieved documents into a single string. Defaults to llmsmith.task.retrieval.vector.base.default_doc_processor.
query_options (llmsmith.task.retrieval.vector.options.qdrant.QdrantQueryOptions, optional) – A dictionary of options to pass to the Qdrant client for querying.
reranker (llmsmith.reranker.base.Reranker, optional) – Reranker used to reorder the retrieved documents by relevance to the query.
- async execute(task_input: TaskInput[str]) → TaskOutput[str]
Executes the task of retrieving documents from the Qdrant collection.
- Parameters:
task_input (llmsmith.task.models.TaskInput[str]) – The input for the task, wrapping the query text.
- Returns:
The output of the task, which includes the processed result and the raw output from Qdrant.
- Return type:
llmsmith.task.models.TaskOutput[str]