Bug Reproduction: Gitea #37221

"Indexer will not search for all-numeric search terms"

Bug Summary

BUG: Pure-numeric search term "699" returned 0 results. The letter tokenizer discards all digit tokens at both index and query time.

When the Gitea repo indexer (bleve) is enabled, searching for purely numeric strings (e.g. "699", "65001", "192.168.1.1") consistently returns "No matching results found" — even when the indexed files clearly contain those strings. Searching for letter-based words in the same files works correctly, confirming the indexer has run and indexed the content.

Combined queries like "vlan 699" only match the letter part ("vlan") — the numeric part is silently dropped, exactly as described in the issue.

Root Cause: modules/indexer/code/bleve/bleve.go, generateBleveIndexMapping()
The repoIndexerAnalyzer is configured with "tokenizer": letter.Name (the bleve "letter" tokenizer). This tokenizer produces tokens only from sequences of Unicode letters. Pure digit sequences like "699" produce zero tokens at both index time and query time, so they can never appear in any search result. Since mapping.DefaultAnalyzer = repoIndexerAnalyzer, this affects the Content field for all indexed documents.
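To make the effect concrete, the following is a minimal, stdlib-only sketch that approximates letter tokenization by treating every non-letter rune as a separator. It does not call bleve: letterTokens is a hypothetical helper written for this report, and the token filters that the real analyzer applies after tokenization are omitted.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// letterTokens approximates a letter tokenizer: each maximal run of
// Unicode letters becomes a token, and every other rune is a separator.
// Hypothetical helper for illustration; not part of bleve or Gitea.
func letterTokens(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
}

func main() {
	fmt.Printf("%q\n", letterTokens("interface vlan 699")) // ["interface" "vlan"]
	fmt.Printf("%q\n", letterTokens("699"))                // []
	fmt.Printf("%q\n", letterTokens("192.168.1.1"))        // []
}
```

Because the same function would run on both the indexed text and the query string, a digits-only term has nothing to be stored under and nothing to look up.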

Fix direction: Change to a tokenizer that includes digits (e.g. unicode.Name or a whitespace/regexp tokenizer), or add a separate numeric-aware analyzer for the Content field.
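As a rough picture of what a digit-aware tokenizer changes, the sketch below keeps runs of letters and digits together. It is only an approximation under stated assumptions: alnumTokens is a hypothetical helper, and bleve's unicode tokenizer follows Unicode word-segmentation rules, so its handling of strings such as "192.168.1.1" may differ from this simple split.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// alnumTokens splits on every rune that is neither a letter nor a digit,
// so digit-only runs survive as tokens.
// Hypothetical helper for illustration; not bleve's unicode tokenizer.
func alnumTokens(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	fmt.Printf("%q\n", alnumTokens("interface vlan 699")) // ["interface" "vlan" "699"]
	fmt.Printf("%q\n", alnumTokens("router bgp 65001"))   // ["router" "bgp" "65001"]
}
```

Any tokenizer change of this kind would presumably require rebuilding the existing index, since documents already indexed contain no numeric tokens.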

Evidence Table

Key                      Value
Issue                    https://github.com/go-gitea/gitea/issues/37221
App Version              v1.25.5 (BUG confirmed)
App URL                  http://localhost:44689
Control Query            "vlan" → 1 result(s) ✓ (indexer working)
Bug Query                "699" → 0 result(s) ✗ (letter tokenizer drops digits)
Bug Exact Mode           "699" (exact) → 0 result(s) ✗
Combined Query           "vlan 699" → 1 result(s); matches only "vlan", ignores "699"
No-Results Text Visible  true
Root Cause               modules/indexer/code/bleve/bleve.go: repoIndexerAnalyzer uses letter.Name tokenizer which drops all pure-digit sequences. mapping.DefaultAnalyzer = repoIndexerAnalyzer applies this to Content field.
Assertion                expect(numericResultCount).toBeGreaterThan(0): FAILED (received 0)

Reproduction Steps

Step 1 Setup: Repo with test file created

The repository "network-config-search-test" was created via API and the file network.cfg was pushed. The file contains "interface vlan 699" on line 6.

Screenshot: 01-repo-overview-file-created.png

Step 2 Control search: "vlan" returns results (indexer is working)

Searching for "vlan" (a letter-only word) returns 1 result — network.cfg. This proves the indexer has run and indexed the repository.

Screenshot: 02-control-search-vlan-result.png

Step 3 Bug search: "699" returns NO results (the bug), state before assertion

Searching for "699" (a pure-numeric string present in the same file on the same line as "vlan") returns "No matching results found." The letter tokenizer discards all digit tokens at both index and query time.

Bug evidence: BUG (v1.25.5): The repoIndexerAnalyzer uses the letter.Name tokenizer, which only produces tokens from Unicode letter sequences. The digit string "699" produces zero tokens, so it cannot be indexed or queried. Expected: the search returns network.cfg, which contains "interface vlan 699".
Screenshot: 03-bug-search-numeric-699-before-assertion.png

Step 4 Bug reproduced in "exact" mode too

Searching for "699" with search_mode=exact (same as used by the reporter) also returns "No matching results found." Both exact and words modes are affected.

Bug evidence: BUG: Same failure in exact mode — the tokenizer issue affects all search modes since it operates at the analyzer level, not the query type level.
Screenshot: 04-bug-search-numeric-699-exact-mode.png

Step 5 Combined search "vlan 699" matches only "vlan"

Searching for "vlan 699" returns 1 result — but only because "vlan" matched. The "699" part was silently dropped. This is exactly the behavior described in the issue: "if I search vlan 699 (in both modes) I get tons of results matching just vlan".
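The combined-query behavior can be imitated with a toy in-memory matcher. This is a sketch under simplifying assumptions, not Gitea's actual query construction: analyze and matches are hypothetical helpers, the analyzer is approximated as letter tokenization plus lowercasing, and a document is assumed to match when every term that survives analysis is present.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// analyze approximates the analyzer: lowercase, then keep letter runs only.
// Hypothetical helper; the real analyzer has additional token filters.
func analyze(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
}

// matches reports whether every query term that survives analysis occurs
// in the document. A query that analyzes to zero terms matches nothing.
func matches(doc, query string) bool {
	indexed := make(map[string]bool)
	for _, tok := range analyze(doc) {
		indexed[tok] = true
	}
	terms := analyze(query)
	if len(terms) == 0 {
		return false
	}
	for _, t := range terms {
		if !indexed[t] {
			return false
		}
	}
	return true
}

func main() {
	doc := "interface vlan 699\n description Finance VLAN"
	fmt.Println(matches(doc, "vlan"))     // true
	fmt.Println(matches(doc, "699"))      // false
	fmt.Println(matches(doc, "vlan 699")) // true
	fmt.Println(matches(doc, "vlan 123")) // true
}
```

In this model "vlan 123" matches as readily as "vlan 699", because the numeric part never reaches the lookup. That mirrors the reporter's observation of many results matching just "vlan".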

Bug evidence: BUG: Combined query "vlan 699" returns results as if "699" was not part of the query. Only the letter-based tokens survive the analyzer.
Screenshot: 05-combined-search-vlan-699.png

Step 6 Pre-assertion state: "699" search, page state at assertion time

Final navigation back to the numeric search page immediately before the assertion fires. Confirms the no-results state is stable and not a transient rendering issue.

Bug evidence: BUG (v1.25.5): "No matching results found." is the page state at the moment the assertion expect(numericResultCount).toBeGreaterThan(0) fails.
Screenshot: 06-pre-assertion-numeric-699-final-state.png

Test File Content (network.cfg)

! Network configuration - automated test fixture
interface GigabitEthernet0/0
 description WAN uplink
 ip address 192.168.1.1 255.255.255.0
!
interface vlan 699
 description Finance VLAN
 ip address 10.20.30.1 255.255.255.0
!
interface vlan 700
 description HR VLAN
!
router bgp 65001
 neighbor 10.0.0.1 remote-as 65002
!
! Config version 20240101