Traditional Translation Memory (TM) systems work on exact or fuzzy matching—they find segments similar to what you’re translating and suggest previous translations. But what if we could make TM smarter by combining it with Large Language Models?
That’s exactly what I built: a RAG (Retrieval-Augmented Generation) system for translation.
The Problem with Traditional TM
Standard TM matching has limitations:
- Fuzzy matching is literal: A 70% match can mean something completely different
- No semantic understanding: “The car is red” and “The vehicle is crimson” are scored as very different (see the quick comparison below)
- Context blindness: The same source segment might need different translations depending on context
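To make that gap concrete, here’s a quick sketch comparing a character-level fuzzy score with an embedding similarity for that exact pair. I’m using Python’s difflib as a rough stand-in for TM fuzzy matching and the same multilingual model that appears later; the exact numbers will vary, but the fuzzy score only sees shared characters while the cosine reflects meaning:

```python
from difflib import SequenceMatcher
from sentence_transformers import SentenceTransformer, util

a = "The car is red"
b = "The vehicle is crimson"

# Surface-string similarity: a rough stand-in for TM fuzzy matching
fuzzy_score = SequenceMatcher(None, a, b).ratio()

# Meaning-level similarity via sentence embeddings
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
emb_a, emb_b = model.encode([a, b])
semantic_score = util.cos_sim(emb_a, emb_b).item()

print(f"fuzzy: {fuzzy_score:.2f}  semantic: {semantic_score:.2f}")
```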
Enter RAG
RAG systems work in two phases:
- Retrieval: Find relevant information from a knowledge base
- Generation: Use an LLM to generate output informed by that retrieved context
For translation, this means:
- Retrieval: Find semantically similar previous translations
- Generation: Ask the LLM to translate while considering those examples
The Architecture
Here’s my setup:
```
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Source Text │────▶│  Embeddings  │────▶│  Vector DB  │
└─────────────┘     └──────────────┘     └─────────────┘
                                                │
                                                ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Translation │◀────│     LLM      │◀────│  Retrieved  │
└─────────────┘     └──────────────┘     │  Examples   │
                                         └─────────────┘
```
Implementation Highlights
1. Creating Embeddings from TM
```python
from sentence_transformers import SentenceTransformer
import chromadb

# Load a multilingual embedding model
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

# Initialize vector database
client = chromadb.Client()
collection = client.create_collection("translation_memory")

def index_tm(tm_entries):
    """Index TM entries for semantic search"""
    for entry in tm_entries:
        # Create embedding from source + target concatenated
        text = f"{entry['source']} ||| {entry['target']}"
        embedding = model.encode(text).tolist()
        collection.add(
            embeddings=[embedding],
            documents=[text],
            metadatas=[{"source": entry['source'], "target": entry['target']}],
            ids=[entry['id']]
        )
```
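To index a couple of entries, all index_tm needs is a list of dicts in that shape (the IDs and the English→German pairs below are made up for illustration):

```python
tm_entries = [
    {"id": "tm-0001",
     "source": "Press the power button to start the device.",
     "target": "Drücken Sie die Ein-/Aus-Taste, um das Gerät zu starten."},
    {"id": "tm-0002",
     "source": "The warranty does not cover damage caused by liquids.",
     "target": "Die Garantie deckt keine durch Flüssigkeiten verursachten Schäden ab."},
]

index_tm(tm_entries)
print(collection.count())  # number of indexed segments
```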
2. Semantic Retrieval
```python
def find_similar_translations(source_text, n_results=5):
    """Find semantically similar previous translations"""
    query_embedding = model.encode(source_text).tolist()
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results
    )
    return results['metadatas'][0]
```
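Usage is a one-liner; each result is the metadata dict stored at indexing time, so the source/target pair is right there:

```python
for ex in find_similar_translations("Power on the unit by pressing the button.", n_results=3):
    print(f"{ex['source']}  ->  {ex['target']}")
```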
3. LLM-Powered Translation
```python
def translate_with_rag(source_text, target_lang, client):
    """Translate using retrieved TM examples as context"""
    # Retrieve similar translations
    examples = find_similar_translations(source_text)

    # Format examples for the prompt
    examples_text = "\n".join([
        f"Source: {ex['source']}\nTranslation: {ex['target']}"
        for ex in examples
    ])

    prompt = f"""You are a professional translator. Translate the following text to {target_lang}.

Use these previous translations as reference for terminology and style:

{examples_text}

Now translate:
Source: {source_text}
Translation:"""

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text
```
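Wiring it all up with the Anthropic SDK looks roughly like this (assuming ANTHROPIC_API_KEY is set in the environment and the TM has already been indexed):

```python
import anthropic

llm_client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

translation = translate_with_rag(
    "Do not operate the device near open flames.",
    target_lang="German",
    client=llm_client,
)
print(translation)
```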
Results
In my testing with technical documentation:
| Metric | Traditional TM + MT | RAG System |
|---|---|---|
| Terminology consistency | 78% | 94% |
| Style adherence | 65% | 89% |
| Post-editing time | Baseline | -35% |
The biggest improvement? Terminology consistency. The RAG system naturally picks up domain-specific terms from the retrieved examples.
Lessons Learned
1. Embedding model matters: Multilingual models work better than translating everything to English first.
2. Chunk size affects retrieval: For translation, sentence-level chunks work better than paragraphs.
3. Number of examples: 3-5 examples hit the sweet spot. More than that and the LLM gets confused.
4. Freshness weighting: Recent translations should be weighted higher—terminology evolves.
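One simple way to act on that last point is to store a timestamp in each entry’s metadata and re-rank retrieved candidates with a recency decay. A rough sketch, not part of the pipeline above: the timestamp field, the half-life, and the 80/20 weighting are all assumptions to tune, and it expects the raw collection.query() result with cosine distances:

```python
import time

def rerank_by_freshness(results, half_life_days=180):
    """Re-rank raw collection.query() results, blending similarity with recency."""
    now = time.time()
    scored = []
    for meta, dist in zip(results["metadatas"][0], results["distances"][0]):
        age_days = (now - meta["timestamp"]) / 86400      # assumes a Unix timestamp in the metadata
        freshness = 0.5 ** (age_days / half_life_days)    # 1.0 today, 0.5 after one half-life
        similarity = 1.0 - dist                           # crude conversion from cosine distance
        scored.append((0.8 * similarity + 0.2 * freshness, meta))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in scored]
```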
What’s Next
I’m currently experimenting with:
- Hybrid retrieval: Combining semantic search with traditional fuzzy matching (a rough sketch follows below)
- Domain classification: Automatically selecting the most relevant TM subset
- Fine-tuning embeddings: Training custom embeddings on translation pairs
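Of those, hybrid retrieval is straightforward to prototype: pull a wider candidate set from the vector DB, compute a classic fuzzy score for each candidate, and sort on a blend of the two. A sketch of that, with rapidfuzz as one convenient fuzzy-matching library, an arbitrary 70/30 split, and cosine distance assumed again:

```python
from rapidfuzz import fuzz

def hybrid_retrieve(source_text, n_candidates=20, n_results=5, alpha=0.7):
    """Blend semantic similarity with character-level fuzzy matching."""
    query_embedding = model.encode(source_text).tolist()
    results = collection.query(query_embeddings=[query_embedding], n_results=n_candidates)

    scored = []
    for meta, dist in zip(results["metadatas"][0], results["distances"][0]):
        semantic = 1.0 - dist                                   # assumes cosine distance
        fuzzy = fuzz.ratio(source_text, meta["source"]) / 100   # classic TM-style score, 0-1
        scored.append((alpha * semantic + (1 - alpha) * fuzzy, meta))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in scored[:n_results]]
```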
Try It Yourself
The core concept is simpler than it looks. If you have a TM and access to an LLM API, you can build a basic version in an afternoon.
Start with a small, domain-specific TM. The results are most impressive when you have consistent, high-quality previous translations to draw from.
Building something similar? I’d love to hear about your approach. Get in touch!