Building a Hybrid FTS5 + Embedding Search for Code
Explore our innovative hybrid search combining FTS5 and embeddings, enhancing code indexing for AI coding assistants. Learn why both methods are essential.

How Did We Build a Hybrid FTS5 + Embedding Search for Code? Why Do You Need Both?
An AI coding assistant is only as good as its understanding of the codebase, and that understanding depends heavily on the search built into its tooling. At srclight, a deep code indexing MCP server that gives AI agents a comprehensive view of a codebase, we found that no single search method was enough. We therefore built a hybrid search that combines FTS5 for keyword search with embeddings for semantic search.
Why Do You Need Both Keyword and Semantic Search?
When developing AI coding assistants, the search must cater to different user needs:
- Keyword Search: Users who know the exact function name need a quick way to locate it.
- Semantic Search: Users searching for concepts, such as "code that handles authentication", may not know the precise terms.
Most tools offer only keyword search or only semantic search, which leaves gaps. Combining both lets users find what they need whether or not they know the exact name.
What Are the Limitations of FTS5 and Embeddings?
FTS5 excels at finding exact matches but struggles with the nuances of code naming conventions. For example:
- calculateTotalPrice
- calculate_total_price
- CalculateTotalPrice
A single FTS5 index with one tokenizer cannot match all of these variations from the same query. Additionally, users often seek concepts rather than keywords. A query such as "code that validates user input" describes behavior, and it may share no words with the code that implements that behavior.
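The naming-convention gap can be reproduced with a minimal sketch using Python's built-in sqlite3 module. It assumes the local SQLite build includes FTS5 and uses the default tokenizer; the table name symbols and column name are hypothetical and are not srclight's schema.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE symbols USING fts5(name)")
conn.executemany(
    "INSERT INTO symbols(name) VALUES (?)",
    [("calculateTotalPrice",), ("calculate_total_price",), ("CalculateTotalPrice",)],
)

def search(term):
    """Return symbol names matching an FTS5 query, in insertion order."""
    rows = conn.execute(
        "SELECT name FROM symbols WHERE symbols MATCH ? ORDER BY rowid", (term,)
    )
    return [name for (name,) in rows]

# The default tokenizer splits on the underscore but not on case changes,
# so "total" only reaches the snake_case variant.
total_hits = search("total")

# The camelCase and PascalCase variants are each indexed as one long token.
full_hits = search("calculatetotalprice")

print(total_hits)
print(full_hits)
```

With the default tokenizer, the word query finds only the snake_case name, while the two cased variants are reachable only through the full concatenated token.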
Embeddings are effective for meaning-based matches but face challenges such as:
- Exact symbol names (e.g., searching for handleAuth should yield handleAuth).
- Substring matches (e.g., searching for parse should find parseJSON).
- Short queries that often lack context.
- Various naming conventions.
How Did We Create Our Innovative Hybrid Approach?
To tackle these challenges, we developed three distinct FTS5 indexes, each tailored for specific use cases:
- Case and Underscore Split: This index splits names based on case changes and underscores, accommodating various naming conventions.
  - Example: calculateTotalPrice becomes calculate, Total, Price.
  - Example: handle_user_auth becomes handle, user, auth.
- Substring Indexing: This index captures every 3-character substring, enabling substring matches even within longer words.
- Stemming: We implemented a stemming process to normalize words. For example, running, ran, and runner all map to run, enhancing docstring searches.
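The first two index types can be approximated in a few lines of Python. This is a sketch of the preprocessing idea only, not srclight's actual tokenizer: the function names split_identifier and trigrams are hypothetical, and lowercasing before extracting 3-character substrings is an assumption.

```python
import re

# Matches, in order of preference: an acronym run (e.g. "HTTP" in "HTTPServer"),
# a capitalized or lowercase word, or a run of digits. Underscores are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")

def split_identifier(name):
    """Split an identifier on case changes and underscores."""
    return _WORD.findall(name)

def trigrams(name):
    """Return every 3-character substring of the lowercased name."""
    lowered = name.lower()  # assumption: substring matching is case-insensitive
    return [lowered[i:i + 3] for i in range(len(lowered) - 2)]

print(split_identifier("calculateTotalPrice"))  # ['calculate', 'Total', 'Price']
print(split_identifier("handle_user_auth"))     # ['handle', 'user', 'auth']
print(trigrams("parse"))                        # ['par', 'ars', 'rse']
```

Every trigram of parse also occurs in parseJSON, which is why a trigram index can answer substring queries that a word-level index misses.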
In addition to FTS5, we utilize semantic vectors for meaning-based matching with two types of embeddings: qwen3-embedding (4096 dimensions) and nomic-embed-text (768 dimensions).
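Meaning-based matching typically ranks symbols by how close their vectors are to the query vector. The sketch below assumes cosine similarity, which the article does not specify, and uses toy 3-dimensional vectors in place of real 4096- or 768-dimensional embeddings; all names and values are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, symbol_vecs):
    """Return symbol names ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in symbol_vecs.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]

# Toy vectors standing in for real embeddings (hypothetical values).
toy_vectors = {
    "verify_password": [0.9, 0.1, 0.0],
    "render_chart": [0.0, 0.2, 0.9],
    "check_login_token": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for "code that handles authentication"

ranking = rank_by_similarity(query, toy_vectors)
print(ranking)
```

The ranked list produced here is what the fusion step consumes; the raw similarity values are not needed afterwards.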
How Do We Combine the Two Search Methods?
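We execute each query across all four indexes (the three FTS5 indexes plus the embedding index), rank the results within each index, and merge the rankings using Reciprocal Rank Fusion (RRF):
RRF_score(d) = Σ_i 1 / (k + rank_i(d))
where rank_i(d) is the position of result d in index i, the sum runs over the indexes that returned d, and k = 60 (a standard constant).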
For example:
- A result at rank 1 in FTS5 and rank 2 in embeddings:
  - FTS5: 1 / (60 + 1) = 0.0164
  - Embeddings: 1 / (60 + 2) = 0.0161
  - Total: 0.0325
- A result at rank 10 in embeddings only gets:
  - 1 / (60 + 10) = 0.0143
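Because RRF uses only ranks, the raw scores from FTS5 and from vector similarity never have to be normalized against each other. A result that several indexes agree on outranks one that appears in a single index, so exact keyword matches and semantic matches coexist in one list.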
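The fusion step is small enough to sketch in full. The function name rrf_fuse and the sample result lists are hypothetical; only the formula and k = 60 come from the text above.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Merge ranked lists with Reciprocal Rank Fusion.

    rankings: a list of result lists, each ordered best-first.
    Returns (result, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):  # ranks start at 1
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists: handleAuth is rank 1 in FTS5 and rank 2 in
# embeddings; validate_input appears only at rank 10 in embeddings.
fts_results = ["handleAuth", "parseJSON"]
embedding_results = (
    ["login_flow", "handleAuth"]
    + [f"filler_{i}" for i in range(7)]
    + ["validate_input"]
)

fused = dict(rrf_fuse([fts_results, embedding_results]))
print(round(fused["handleAuth"], 4))      # 0.0325
print(round(fused["validate_input"], 4))  # 0.0143
```

The two printed values reproduce the worked example: the result found by both indexes scores more than twice as high as the result found by embeddings alone.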
What Additional Features Does srclight Offer?
Beyond our hybrid search, we developed features to enhance user experience:
- GPU Vector Cache: Embeddings load to VRAM once, allowing for quick queries (~3ms) after an initial load (~300ms).
- Incremental Indexing: This feature ensures only changed symbols are re-indexed, tracked via content hash.
- Git Intelligence: Users can query recent changes, leveraging git blame, hotspots, and uncommitted work in progress.
- Multi-repo Workspaces: We support SQLite ATTACH + UNION across 10+ repositories, boosting flexibility.
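The multi-repo pattern can be illustrated with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the schema, the aliases repo_a and repo_b, and the sample rows are hypothetical, and in-memory databases stand in for per-repository index files. It uses UNION ALL so that rows are kept as-is; plain UNION would also remove duplicate rows.

```python
import sqlite3

# One connection; a second database is attached under its own schema name.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS repo_b")

# Hypothetical per-repository symbol tables with identical columns.
conn.execute("CREATE TABLE main.symbols (name TEXT, file TEXT)")
conn.execute("CREATE TABLE repo_b.symbols (name TEXT, file TEXT)")
conn.execute("INSERT INTO main.symbols VALUES ('handleAuth', 'auth.ts')")
conn.execute("INSERT INTO repo_b.symbols VALUES ('handle_auth', 'auth.py')")

# A single query spans both databases; the literal column tags each row
# with the repository it came from.
rows = conn.execute(
    """
    SELECT 'repo_a' AS repo, name, file FROM main.symbols
    UNION ALL
    SELECT 'repo_b' AS repo, name, file FROM repo_b.symbols
    ORDER BY repo
    """
).fetchall()

for row in rows:
    print(row)
```

Keeping one database per repository means a repository can be re-indexed or dropped from the workspace without touching the others.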
How Easy Is It to Install srclight?
Our goal was to create a system that installs with a single command:
pip install srclight
srclight index --embed qwen3-embedding
srclight serve
This means no JVM, no Docker, no Redis, and no cloud. Your code stays on your machine, which keeps it private. We can index 13 repositories with 45,000 symbols in a single workspace. With that index in place, Claude Code's tool calls per task dropped from about 20 to 6, because it can simply ask, "Who calls this?" instead of running multiple greps.
Conclusion: Why Is Hybrid Search Essential?
In summary, a hybrid of FTS5 and embedding search gives AI coding assistants what neither method provides alone. Keyword matches provide precision, embeddings improve recall, and RRF merges the ranked lists from both into a single result set.
What search challenges are you facing with AI coding assistants? Share your insights in the comments below—your feedback could drive the next evolution in coding tools.