Building a RAG System for Translation Memory

How I built a Retrieval-Augmented Generation system that combines the power of LLMs with traditional translation memory for better translation consistency.

Traditional Translation Memory (TM) systems work on exact or fuzzy matching—they find segments similar to what you’re translating and suggest previous translations. But what if we could make TM smarter by combining it with Large Language Models?

That’s exactly what I built: a RAG (Retrieval-Augmented Generation) system for translation.

The Problem with Traditional TM

Standard TM matching has limitations:

  • Fuzzy matching is literal: A 70% match might have completely different meaning
  • No semantic understanding: “The car is red” and “The vehicle is crimson” are seen as very different (see the quick comparison after this list)
  • Context blindness: The same source segment might need different translations depending on context
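
To make that second point concrete, here's a small side-by-side (just a sketch, using Python's difflib as a stand-in fuzzy matcher and the same multilingual embedding model used later in this post; the exact numbers will vary):

from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

a = "The car is red"
b = "The vehicle is crimson"

# Surface-level fuzzy matching only compares characters
fuzzy_score = SequenceMatcher(None, a, b).ratio()

# Embedding similarity compares meaning
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
emb_a, emb_b = model.encode([a, b])
semantic_score = util.cos_sim(emb_a, emb_b).item()

print(f"Fuzzy ratio:       {fuzzy_score:.2f}")
print(f"Cosine similarity: {semantic_score:.2f}")

The point isn't the exact numbers: the fuzzy ratio is driven by shared characters, while the embedding score is driven by shared meaning.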

Enter RAG

RAG systems work in two phases:

  1. Retrieval: Find relevant information from a knowledge base
  2. Generation: Use an LLM to generate output informed by that retrieved context

For translation, this means:

  1. Retrieval: Find semantically similar previous translations
  2. Generation: Ask the LLM to translate while considering those examples

The Architecture

Here’s my setup:

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Source Text │────▶│  Embeddings  │────▶│ Vector DB   │
└─────────────┘     └──────────────┘     └─────────────┘


┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Translation │◀────│     LLM      │◀────│  Retrieved  │
└─────────────┘     └──────────────┘     │   Examples  │
                                         └─────────────┘

Implementation Highlights

1. Creating Embeddings from TM

from sentence_transformers import SentenceTransformer
import chromadb

# Load a multilingual embedding model
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Initialize vector database
client = chromadb.Client()
collection = client.create_collection("translation_memory")

def index_tm(tm_entries):
    """Index TM entries for semantic search"""
    for entry in tm_entries:
        # Create embedding from source + target concatenated
        text = f"{entry['source']} ||| {entry['target']}"
        embedding = model.encode(text).tolist()
        
        collection.add(
            embeddings=[embedding],
            documents=[text],
            metadatas=[{"source": entry['source'], "target": entry['target']}],
            ids=[entry['id']]
        )

2. Semantic Retrieval

def find_similar_translations(source_text, n_results=5):
    """Find semantically similar previous translations"""
    query_embedding = model.encode(source_text).tolist()
    
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )
    
    return results['metadatas'][0]

3. LLM-Powered Translation

def translate_with_rag(source_text, target_lang, client):
    """Translate using retrieved TM examples as context"""
    
    # Retrieve similar translations
    examples = find_similar_translations(source_text)
    
    # Format examples for the prompt
    examples_text = "\n".join([
        f"Source: {ex['source']}\nTranslation: {ex['target']}"
        for ex in examples
    ])
    
    prompt = f"""You are a professional translator. Translate the following text to {target_lang}.

Use these previous translations as reference for terminology and style:

{examples_text}

Now translate:
Source: {source_text}
Translation:"""
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

Results

In my testing with technical documentation:

Metric                     Traditional TM + MT    RAG System
Terminology consistency    78%                    94%
Style adherence            65%                    89%
Post-editing time          Baseline               -35%

The biggest improvement? Terminology consistency. The RAG system naturally picks up domain-specific terms from the retrieved examples.

Lessons Learned

1. Embedding model matters: Multilingual models work better than translating everything to English first.

2. Chunk size affects retrieval: For translation, sentence-level chunks work better than paragraphs.

3. Number of examples: 3-5 examples hit the sweet spot. More than that and the LLM gets confused.

4. Freshness weighting: Recent translations should be weighted higher—terminology evolves.
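
Here's a rough sketch of what that freshness weighting could look like. It assumes each TM entry also stores a Unix timestamp in its metadata, which the indexing code above doesn't do yet, and it uses a crude distance-to-similarity conversion:

import time

def rerank_by_recency(metadatas, distances, half_life_days=180):
    """Re-rank retrieved TM entries so newer translations score higher."""
    now = time.time()
    scored = []
    for meta, dist in zip(metadatas, distances):
        age_days = (now - meta.get("timestamp", now)) / 86400
        recency = 0.5 ** (age_days / half_life_days)  # exponential decay with a configurable half-life
        similarity = 1.0 - dist  # rough conversion of vector distance to a similarity score
        scored.append((similarity * recency, meta))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in scored]

You'd call it with results['metadatas'][0] and results['distances'][0] from the retrieval step.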

What’s Next

I’m currently experimenting with:

  • Hybrid retrieval: Combining semantic search with traditional fuzzy matching (rough sketch below)
  • Domain classification: Automatically selecting the most relevant TM subset
  • Fine-tuning embeddings: Training custom embeddings on translation pairs
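
For the hybrid retrieval idea, the simplest version I can picture just blends the two scores. Here's a sketch, with difflib standing in for a proper fuzzy matcher and alpha as a made-up blending weight:

from difflib import SequenceMatcher

def hybrid_score(query, candidate_source, semantic_similarity, alpha=0.7):
    """Blend embedding similarity with a character-level fuzzy ratio."""
    fuzzy = SequenceMatcher(None, query, candidate_source).ratio()
    return alpha * semantic_similarity + (1 - alpha) * fuzzy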

Try It Yourself

The core concept is simpler than it looks. If you have a TM and access to an LLM API, you can build a basic version in an afternoon.

Start with a small, domain-specific TM. The results are most impressive when you have consistent, high-quality previous translations to draw from.


Building something similar? I’d love to hear about your approach. Get in touch!