BUG: Pure-numeric search term "699" returned 0 results. The letter tokenizer discards all digit tokens at both index and query time.
When the Gitea repo indexer (bleve) is enabled, searching for purely numeric strings
(e.g. "699", "65001", "192.168.1.1") consistently returns
"No matching results found" — even when the indexed files clearly contain those strings.
Searching for letter-based words in the same files works correctly, confirming the indexer
has run and indexed the content.
Combined queries like "vlan 699" only match the letter part ("vlan") — the
numeric part is silently dropped, exactly as described in the issue.
modules/indexer/code/bleve/bleve.go — generateBleveIndexMapping()repoIndexerAnalyzer is configured with "tokenizer": letter.Name
(the bleve "letter" tokenizer). This tokenizer produces tokens only from sequences of
Unicode letters. Pure digit sequences like "699" produce zero tokens
at both index time and query time, so they can never appear in any search result.
Since mapping.DefaultAnalyzer = repoIndexerAnalyzer, this affects the
Content field for all indexed documents.
unicode.Name or a whitespace/regexp tokenizer), or add a separate
numeric-aware analyzer for the Content field.
| Key | Value |
|---|---|
| Issue | https://github.com/go-gitea/gitea/issues/37221 |
| App Version | v1.25.5 (BUG confirmed) |
| App URL | http://localhost:44689 |
| Control Query | "vlan" → 1 result(s) ✓ (indexer working) |
| Bug Query | "699" → 0 result(s) ✗ (letter tokenizer drops digits) |
| Bug Exact Mode | "699" (exact) → 0 result(s) ✗ |
| Combined Query | "vlan 699" → 1 result(s) — matches only "vlan", ignores "699" |
| No-Results Text Visible | true |
| Root Cause | modules/indexer/code/bleve/bleve.go: repoIndexerAnalyzer uses letter.Name tokenizer which drops all pure-digit sequences. mapping.DefaultAnalyzer = repoIndexerAnalyzer applies this to Content field. |
| Assertion | expect(numericResultCount).toBeGreaterThan(0) — FAILED (received 0) |
The repository "network-config-search-test" was created via API and the file network.cfg was pushed. The file contains "interface vlan 699" on line 6.
Searching for "vlan" (a letter-only word) returns 1 result — network.cfg. This proves the indexer has run and indexed the repository.
Searching for "699" (a pure-numeric string present in the same file on the same line as "vlan") returns "No matching results found." The letter tokenizer discards all digit tokens at both index and query time.
Searching for "699" with search_mode=exact (same as used by the reporter) also returns "No matching results found." Both exact and words modes are affected.
Searching for "vlan 699" returns 1 result — but only because "vlan" matched. The "699" part was silently dropped. This is exactly the behavior described in the issue: "if I search vlan 699 (in both modes) I get tons of results matching just vlan".
Final navigation back to the numeric search page immediately before the assertion fires. Confirms the no-results state is stable and not a transient rendering issue.
! Network configuration - automated test fixture interface GigabitEthernet0/0 description WAN uplink ip address 192.168.1.1 255.255.255.0 ! interface vlan 699 description Finance VLAN ip address 10.20.30.1 255.255.255.0 ! interface vlan 700 description HR VLAN ! router bgp 65001 neighbor 10.0.0.1 remote-as 65002 ! ! Config version 20240101