👷 Multivectors Improvement Proposals
Current proposals (not prioritized):
- Add minibatching to the retriever.store method
- Add agent interfaces, and use the multivector docstring as the prompt
- Rework signatures to allow a retriever "warm start" from a non-empty vector store
- Add a metadata table to the vector store to avoid storing multiple signature instances, and to store metadata that could be useful for advanced vectors (such as IDF counts?)
- Add Rust extensions to accelerate hash-based vectors
- Improve MinHash performance
- Add a "mutate_query" function to change the query side during evaluation
- Add code to generate a Gradio chat frontend from a well-initialised vector store (either in memory or via codegen, preferably both) to allow a very fast project quickstart once ingestion is taken care of
- Add support for images in RetrieverTool (smolagents currently only allows tools with one modality, so we will need to find a workaround)
- Add a compatibility matrix to signature checks
- Add approximate BM25 vectors for efficient full-text search
- Rework litellm batching to extend it to all vectors
- Add complex vectors and Fourier-derived similarity measures (Heaviside, RBF kernels)
- Add "maximize" vectors to emulate sorting
- Add AG-UI compatibility in codegen for better out-of-the-box UIs and richer interactions
- Add kNN classifiers and regressors
- Add preprocessors
- Add reformulation vectors to facilitate HyDE use cases and information extraction use cases
- Add weight optimisation for kNN variants instead of learning to rank
- Add LSH vectors to generalise Lp distances with p < 2
- Add ProbMinHash vector
- Add node-in-graph vector types based on graph traversal techniques to generalise hierarchy vectors
- Retrieve multivector and multivector
- Add a multihash class that operates like a multivector but on binary arrays, where similarity is XNOR popcount / N (potentially useful for leveraging more efficient storage)
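
To make the multihash similarity concrete, here is a minimal sketch of "XNOR popcount / N" on bit-packed hashes. It is only an illustration under assumptions: the names `pack_bits` and `xnor_similarity` are hypothetical and not part of the current codebase, and plain Python integers stand in for whatever packed storage the multihash class would actually use.

```python
from typing import Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into one int (first bit is most significant).

    Hypothetical helper: an int is used here as a stand-in for compact storage.
    """
    packed = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        packed = (packed << 1) | bit
    return packed


def xnor_similarity(a: int, b: int, n_bits: int) -> float:
    """Return popcount(XNOR(a, b)) / N for two n_bits-wide packed hashes."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    mask = (1 << n_bits) - 1
    # XOR marks the positions that differ; negating and masking to n_bits
    # leaves a 1 exactly where the two hashes agree (XNOR).
    agree = ~(a ^ b) & mask
    # Count the set bits (popcount) and normalise by the hash width N.
    return bin(agree).count("1") / n_bits


a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
b = pack_bits([1, 1, 1, 0, 0, 0, 1, 0, 0, 1])
print(xnor_similarity(a, b, 10))  # 0.7 (7 of the 10 positions agree)
```

The similarity is 1.0 for identical hashes and 0.0 for exact complements, and each hash costs one bit per dimension, which is where the storage benefit mentioned in the last proposal would come from.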