GitHub’s search can be frustrating. Great projects often go undiscovered because keyword-based search misses context, ignores synonyms, and ranks results inconsistently. That’s where GitFindr comes in.
GitFindr is a search tool for repository discovery. It ranks results with a tuned BM25 algorithm and blends in repository signals like ⭐ stars, 🍴 forks, and 👀 clicks, so relevant projects surface first.
GitHub’s default search struggles with:
❌ Limited README & Description Analysis – If a keyword isn’t explicitly mentioned, the project won’t show up.
❌ No Synonym Matching – Searching “image processing” won’t find “computer vision” projects.
❌ Inconsistent Ranking – Older, highly forked repositories dominate results, even when newer ones are more relevant.
🔹 GitFindr fixes this by introducing README scanning, synonym matching, and a refined BM25 ranking system. Now, even loosely related queries surface the right repositories.
GitFindr is built for speed and precision, leveraging Golang for backend performance and Python (FastAPI) for NLP-powered keyword extraction.
1️⃣ Processing Repository Data – When a repository link is provided, our FastAPI backend extracts insights.
2️⃣ Extracting Meaningful Keywords – Using spaCy and regex, we analyze README files, descriptions, and codebases for relevant terms.
3️⃣ Building Inverted Indexes – The Go backend maintains two indexes:
🔄 Synonym-Based Index – Expands search results by linking related terms.
✅ Exact-Term Index – Ensures highly precise matches.
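To make the pipeline above concrete, here is a minimal Python sketch of keyword extraction feeding two inverted indexes. It is an illustration under stated assumptions, not GitFindr's actual code: the real indexes live in the Go backend, the spaCy step is replaced by a plain regex tokenizer with a tiny stopword list, and `SYNONYMS`, `extract_keywords`, `build_indexes`, and the sample repositories are all hypothetical names.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real pipeline would lean on spaCy instead.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

# Hypothetical synonym table standing in for the synonym API.
SYNONYMS = {
    "image": {"picture", "vision"},
    "fast": {"quick"},
}


def extract_keywords(text):
    """Lowercase the text, split on non-alphanumerics, drop stopwords and 1-char tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def build_indexes(repos):
    """Return (exact_index, synonym_index), each mapping a term to a set of repo ids."""
    exact = defaultdict(set)
    synonym = defaultdict(set)
    for repo_id, text in repos.items():
        for term in extract_keywords(text):
            # Exact-term index: the term as it appears in the repository text.
            exact[term].add(repo_id)
            # Synonym index: related terms also point back to this repository.
            for related in SYNONYMS.get(term, ()):
                synonym[related].add(repo_id)
    return exact, synonym


# Made-up sample data for illustration only.
repos = {
    "acme/imgtool": "A fast image resizing library",
    "acme/parser": "Parser for the TOML format",
}
exact_index, synonym_index = build_indexes(repos)
```

Keeping the two indexes separate means a query can be answered from the exact-term index first and widened through the synonym index only when needed.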
🔹 Handling Synonyms
A synonym API maps words with related meanings.
Caching with Redis prevents redundant API calls.
Early tests with synonym expansion returned overly broad results, which prompted further refinements.
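The caching idea can be sketched as a cache-aside lookup. This is a hedged illustration only: `DictCache` is an in-memory stand-in for Redis so the snippet runs without a server, `fake_synonym_api` is a placeholder for the real synonym API, and the `max_synonyms` cap is an assumed way of taming overly broad results, not a confirmed GitFindr setting.

```python
import json


class DictCache:
    """In-memory stand-in for Redis, exposing only get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def fake_synonym_api(word):
    """Hypothetical placeholder for the remote synonym API."""
    table = {"image": ["picture", "photo", "graphic"], "fast": ["quick", "rapid"]}
    return table.get(word, [])


class SynonymResolver:
    """Cache-aside lookup: check the cache first, call the API only on a miss."""

    def __init__(self, fetch, cache, max_synonyms=2):
        self.fetch = fetch
        self.cache = cache
        self.max_synonyms = max_synonyms  # assumed cap to keep expansion narrow
        self.api_calls = 0

    def lookup(self, word):
        key = f"syn:{word}"
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)
        self.api_calls += 1
        synonyms = self.fetch(word)[: self.max_synonyms]
        # Store as JSON so even an empty result is cached and not re-fetched.
        self.cache.set(key, json.dumps(synonyms))
        return synonyms


resolver = SynonymResolver(fake_synonym_api, DictCache())
first = resolver.lookup("image")   # cache miss: calls the API
second = resolver.lookup("image")  # cache hit: no API call
```

Swapping `DictCache` for a real Redis client would mostly mean adding an expiry to the stored values, so stale synonyms eventually refresh.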
🔹 Refining Search Accuracy
Exact-Term Matching prioritizes precision.
Hybrid BM25 Ranking assigns higher weight to exact matches while still considering synonyms.
Together, these give GitFindr a practical balance between recall and precision: synonyms widen the net, while exact matches keep the top results on target.
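One way to picture the hybrid ranking is a standard BM25 score in which synonym hits count for less than exact hits. The sketch below is an assumption-laden illustration, not GitFindr's implementation: the constants `K1` and `B` are common BM25 defaults, the `EXACT_WEIGHT` and `SYNONYM_WEIGHT` values are invented for the example, and the star, fork, and click signals are left out.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75                         # common BM25 defaults
EXACT_WEIGHT, SYNONYM_WEIGHT = 1.0, 0.4   # assumed weights, for illustration only


def bm25_term(term, doc, docs, avg_len):
    """BM25 contribution of one term for one tokenized document."""
    tf = Counter(doc)[term]
    if tf == 0:
        return 0.0
    n = sum(1 for d in docs if term in d)  # documents containing the term
    idf = math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))
    norm = tf * (K1 + 1) / (tf + K1 * (1 - B + B * len(doc) / avg_len))
    return idf * norm


def hybrid_score(query_terms, synonyms, doc, docs):
    """Sum exact-term scores at full weight and synonym scores at reduced weight."""
    avg_len = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in query_terms:
        score += EXACT_WEIGHT * bm25_term(term, doc, docs, avg_len)
        for syn in synonyms.get(term, ()):
            score += SYNONYM_WEIGHT * bm25_term(syn, doc, docs, avg_len)
    return score


# Made-up tokenized documents for illustration.
docs = [
    ["image", "processing", "toolkit"],
    ["computer", "vision", "models"],
    ["toml", "parser", "library"],
]
synonyms = {"image": ["vision"]}
scores = [hybrid_score(["image"], synonyms, d, docs) for d in docs]
```

With this toy data, the document containing "image" outranks the one that only matches through the synonym "vision", and the unrelated document scores zero.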
🔥 Say goodbye to buried repositories!