Can fine-tuning embedding models improve your RAG application? Yes! And it doesn’t have to be complicated. In this video we show how to train a query-only linear adapter on your own RAG data to improve document retrieval accuracy. It's a lightweight approach that can be applied to any embedding model, without fully fine-tuning the model itself OR re-embedding your knowledge base.
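Here is a minimal sketch of what "query-only" means. It assumes embeddings are NumPy arrays and the adapter is a square matrix `W` plus a bias `b`. All names here are hypothetical, not the repo's code. Only the query vector passes through the adapter, while stored document vectors are compared as-is, which is why the knowledge base never needs re-embedding. The random vectors stand in for real embeddings, and 384 is the output size of all-MiniLM-L6-v2.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def adapt_query(query_emb: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Apply the linear adapter (W @ q + b) to a query embedding only."""
    return W @ query_emb + b

def retrieve(query_emb, doc_embs, W, b, k=3):
    """Return indices of the top-k documents by cosine similarity to the adapted query.

    Document embeddings are used unchanged: no adapter is applied to them.
    """
    q = normalize(adapt_query(query_emb, W, b))
    scores = normalize(doc_embs) @ q
    return np.argsort(-scores)[:k]

dim = 384  # output dimension of all-MiniLM-L6-v2
rng = np.random.default_rng(0)
doc_embs = rng.normal(size=(10, dim))                  # stand-in for stored document vectors
query_emb = doc_embs[4] + 0.1 * rng.normal(size=dim)   # a query close to document 4

# An identity adapter with zero bias leaves retrieval identical to the base model.
W, b = np.eye(dim), np.zeros(dim)
print(retrieve(query_emb, doc_embs, W, b))
```

Once `W` and `b` are trained, only `adapt_query` changes at query time. `doc_embs` can stay in the vector database untouched.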
Resources:
GitHub Repo - github.com/ALucek/linear-adapter-embedding
Trained Adapters - huggingface.co/AdamLucek/all-MiniLM-L6-v2-query-on…
Dataset - huggingface.co/datasets/AdamLucek/apple-environmen…
ChromaDB Research - research.trychroma.com/embedding-adapters
Efficient Domain Adaptation of Sentence Embeddings Using Adapters - arxiv.org/pdf/2307.03104
Improving Text Embeddings with Large Language Models - arxiv.org/pdf/2401.00368
Chapters:
00:00 - Introduction
00:39 - What is an Embedding Adapter?
03:04 - Defining our RAG Application
04:30 - Creating a Synthetic Dataset
09:03 - Setting Up Vector Database
11:23 - Evaluating our Model Baseline
14:16 - Training: Context
14:40 - Training: Triplet Margin Loss
16:01 - Training: Random Negative Sampling
17:01 - Training: Linear Layer Explanation
18:59 - Training: Triplet Data Loader
19:44 - Training: Training Script
20:17 - Training: Execution & Hyperparameters
21:22 - Assessment: New Embedding Function
22:04 - Assessment: Evaluating the Adapter
22:40 - Assessment: Metric Interpretation
23:28 - Assessment: Visualization
24:09 - Assessment: Training Data Fitting
25:35 - Closing Thoughts
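The training chapters (triplet margin loss, random negative sampling, a linear layer) can be illustrated with a small NumPy sketch. It makes several assumptions:

- The loss uses Euclidean distance, the same default as PyTorch's `torch.nn.TripletMarginLoss`.
- The adapter is bias-free and initialized to the identity.
- Gradients are derived by hand and applied with plain SGD.

The function names, margin, and learning rate are hypothetical. This is not the repo's training script.

```python
import numpy as np

def triplet_loss_and_grad(W, q, pos, neg, margin=1.0, eps=1e-8):
    """Triplet margin loss max(0, ||Wq - pos|| - ||Wq - neg|| + margin) and its gradient w.r.t. W."""
    a = W @ q                              # adapted query (the "anchor")
    d_pos = np.linalg.norm(a - pos) + eps  # eps avoids division by zero below
    d_neg = np.linalg.norm(a - neg) + eps
    loss = max(0.0, d_pos - d_neg + margin)
    if loss == 0.0:
        # Margin already satisfied: this triplet contributes no gradient.
        return loss, np.zeros_like(W)
    # d||a - x|| / da = (a - x) / ||a - x||, and since a = W q, dL/dW = outer(dL/da, q).
    grad_a = (a - pos) / d_pos - (a - neg) / d_neg
    return loss, np.outer(grad_a, q)

def train_adapter(queries, docs, pos_idx, epochs=20, lr=0.01, margin=1.0, seed=0):
    """Train a query-only linear adapter (no bias) with random negative sampling.

    queries: (n_queries, dim) query embeddings
    docs:    (n_docs, dim) document embeddings (never modified)
    pos_idx: index into docs of each query's relevant document
    """
    rng = np.random.default_rng(seed)
    dim = queries.shape[1]
    W = np.eye(dim)  # identity start: the untrained adapter matches the base model
    history = []
    for _ in range(epochs):
        total = 0.0
        for i in rng.permutation(len(queries)):
            # Random negative: a uniformly chosen document other than the positive.
            neg = rng.integers(len(docs) - 1)
            if neg >= pos_idx[i]:
                neg += 1
            loss, grad = triplet_loss_and_grad(W, queries[i], docs[pos_idx[i]], docs[neg], margin)
            W -= lr * grad  # plain SGD step
            total += loss
        history.append(total / len(queries))  # mean loss per epoch
    return W, history
```

Starting from the identity means an untrained adapter reproduces the base model's rankings exactly, so training only has to learn a correction.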
#ai #datascience #programming