QMD second pass — BM25 gap probes

Local index pass using qmd BM25 search over the collection graphelogos-quran. It complements the fetch, Atlas, and entity scripts and the Quartz site by surfacing where the vault discusses gaps, stubs, the entity review backlog, and tooling.
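Each probe below follows one pattern: a query line, then one row per hit with score, note, title, and snippet. The following is a minimal sketch of that rendering step. It assumes a simple `Hit` record, which is hypothetical and not taken from quran_qmd_gap_pass.py. The main point it shows is that pipes inside snippets and wikilink aliases must be escaped to keep each row intact.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BM25 result row (field names are illustrative assumptions)."""
    score: float
    path: str
    title: str
    snippet: str


def cell(text: str) -> str:
    # Escape pipes (wikilink aliases, table fragments quoted in snippets)
    # and flatten newlines so each hit stays on a single table row.
    return text.replace("|", "\\|").replace("\n", " ").strip()


def render_probe(query: str, hits: list[Hit]) -> str:
    noun = "hit" if len(hits) == 1 else "hits"
    lines = [f"Query: {query} ({len(hits)} {noun})", ""]
    if not hits:
        lines.append("No BM25 matches.")
        return "\n".join(lines)
    lines.append("| Score | Wikilink | Title | Snippet |")
    lines.append("|------:|----------|-------|---------|")
    for h in hits:  # keep the order the search engine returned
        lines.append(
            f"| {round(h.score, 2)} | {cell(h.path)} | {cell(h.title)} | {cell(h.snippet)} |"
        )
    return "\n".join(lines)
```

Rounding to two decimals with `round` (rather than a fixed-width format) is why a score of 0.90 appears as 0.9 in the tables.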

Preconditions

  • Vault path: Graphe/Quran (13,077 markdown files on disk).
  • The qmd collection graphelogos-quran must exist (the gap-pass script runs qmd collection add if it is missing).
  • After large edits, refresh the index with qmd update. It re-indexes every collection, so resolve any EPERM errors raised by optional collections such as ~/Documents.
  • Optional: run qmd embed to build vectors, then use qmd vsearch or qmd query locally. qmd query runs hybrid search, which needs an LLM and is often disabled when CI=true.
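The following is a quick sanity check for the preconditions, written as a sketch. It counts markdown files under the vault so the total can be compared with the 13,077 figure, and it gates hybrid search on CI. Treating any CI value other than "true" (case-insensitive) as "hybrid allowed" is an assumption about how the script reads the variable.

```python
import os
from pathlib import Path


def count_markdown(vault: Path) -> int:
    """Count *.md files under the vault, recursively."""
    return sum(1 for p in vault.rglob("*.md") if p.is_file())


def hybrid_enabled(env=None) -> bool:
    """Assumed gate: skip LLM-backed hybrid search when CI=true."""
    env = os.environ if env is None else env
    return env.get("CI", "").strip().lower() != "true"
```

For example, `count_markdown(Path("Graphe/Quran"))` run from the repository root should land near the on-disk figure above. A mismatch after large edits is a hint to run qmd update.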

Probe results

Explicit gap / leverage language

Query: gap (2 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.91 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -2,4 @@ (1 before, 176 after) noindex: true title: QMD pipeline gap pass (BM25) generated: 2026-03-20 20:57 UTC tags: [quran, qmd, pipeline, research] |
| 0.89 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 249 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
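Each snippet opens with a diff-style header such as `@@ -29,4 @@ (28 before, 249 after)`. This sketch reads it as start line, snippet length, lines above, and lines below. That reading is inferred from the rows themselves, not from qmd documentation: 28 lines precede line 29, and every qmd-pipeline-gaps.md row sums to the same 181-line total. The regex and `Hunk` type are illustrative.

```python
import re
from typing import NamedTuple, Optional

HUNK = re.compile(r"@@ -(\d+),(\d+) @@ \((\d+) before, (\d+) after\)")


class Hunk(NamedTuple):
    start: int   # first snippet line (1-based)
    length: int  # lines shown in the snippet
    before: int  # lines above the snippet
    after: int   # lines below the snippet

    @property
    def total(self) -> int:
        # Inferred: the three parts add up to the note's line count.
        return self.before + self.length + self.after


def parse_hunk(snippet: str) -> Optional[Hunk]:
    m = HUNK.search(snippet)
    return Hunk(*map(int, m.groups())) if m else None
```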

Fetch coverage / missing surahs

Query: not fetched (5 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.91 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 249 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
| 0.9 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -26,4 @@ (25 before, 152 after) \|------:\|----------\|-------\|---------\| \| 0.79 \| research.md \| Quran corpus — research & build plan \| @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): |
| 0.9 | surahs.md | Surahs | @@ -8,4 @@ (7 before, 8 after) This folder holds one markdown file per surah (Surah NNN - Name.md), fetched/updated with .dev/scripts/fetch_quran.py. Each verse is a ### Ayah k section so other notes can tr… |
| 0.88 | surahs.md | Surahs in this vault | @@ -7,4 @@ (6 before, 61 after) Arabic–English surah files live under Graphe/Quran/Surahs/ (one .md per surah); see Surahs\|Surahs folder note for how that directory relates to Ayah and… |
| 0.88 | research.md | Quran research | @@ -2,4 @@ (1 before, 14 after) noindex: true title: Quran research notes description: Literary overviews, entity pilot, and links to the master RESEARCH plan. tags: [quran, research, index] |
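The Surahs folder note describes one file per surah, named `Surah NNN - Name.md`. The RESEARCH plan says only a subset of the 114 surahs is present. The sketch below lists the missing numbers. It assumes the three-digit number in the filename is authoritative. It also accepts slug-style names such as `surah-017-al-isra.md`, which is how some rows in this report display surah notes.

```python
import re
from pathlib import Path

# Matches "Surah 017 - Al-Isra.md" and slug-style "surah-017-al-isra.md".
SURAH_FILE = re.compile(r"^surah[ -](\d{3})\b", re.IGNORECASE)


def missing_surahs(surah_dir: Path, total: int = 114) -> list[int]:
    """Return surah numbers 1..total with no matching file in surah_dir."""
    present = set()
    for p in surah_dir.glob("*.md"):
        m = SURAH_FILE.match(p.name)
        if m:
            present.add(int(m.group(1)))
    return [n for n in range(1, total + 1) if n not in present]
```

An empty result from `missing_surahs(Path("Graphe/Quran/Surahs"))` would be one observable sign that the highest-leverage gap is closed.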

Partial corpus / scoped fetch

Query: subset (3 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.89 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -26,4 @@ (25 before, 152 after) \|------:\|----------\|-------\|---------\| \| 0.79 \| research.md \| Quran corpus — research & build plan \| @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): |
| 0.86 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 249 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
| 0.82 | qmd-atlas-entity-graph.md | qmd entity + relationship hints | @@ -90,4 @@ (89 before, 0 after) uv run .dev/scripts/quran_qmd_entity_extract.py # subset: uv run .dev/scripts/quran_qmd_entity_extract.py --family people,places --max-entities 40 ``` |
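The entity-graph snippet shows a subset run with `--family people,places --max-entities 40`. The sketch below mirrors those two flags with argparse to make their likely effect concrete. The semantics (filter by family, then cap the count) and the entity record shape are assumptions, not the actual behavior of quran_qmd_entity_extract.py.

```python
import argparse


def _families(value: str) -> list[str]:
    # "people,places" -> ["people", "places"]; blank items are ignored.
    return [f.strip() for f in value.split(",") if f.strip()]


def parse_args(argv=None) -> argparse.Namespace:
    ap = argparse.ArgumentParser(description="Illustrative mirror of the subset flags")
    ap.add_argument("--family", type=_families, default=None,
                    help="comma-separated entity families, e.g. people,places")
    ap.add_argument("--max-entities", type=int, default=None,
                    help="upper bound on entities processed")
    return ap.parse_args(argv)


def select(entities, families=None, cap=None):
    """Keep entities whose 'family' is in families (all if None), then apply cap."""
    chosen = [e for e in entities if families is None or e["family"] in families]
    return chosen if cap is None else chosen[:cap]
```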

Entity review backlog

Query: review queue (5 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.95 | research.md | Quran research | @@ -14,4 @@ (13 before, 2 after) - entity-corpus-summary\|Entity corpus summary — full extraction counts by confidence/family - [[Graphe/Quran/Research/entity-review-queue\|Entity review queue]… |
| 0.95 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -57,4 @@ (56 before, 121 after) Query: review queue 5 hit(s) \| Score \| Wikilink \| Title \| Snippet \| |
| 0.95 | research.md | Quran corpus — research & build plan | @@ -110,4 @@ (109 before, 168 after) 3. Confidence queue — emit summary + review queue: ```bash |
| 0.94 | entity-review-qmd-evidence.md | Entity review qmd evidence | @@ -9,4 @@ (8 before, 274 after) Generated from entity-review-queue\|entity-review-queue. Ranking rule: prioritize direct surah/ayah sources; research artifacts are filtered by default. |
| 0.9 | entity-review-queue.md | Quran entity review queue | @@ -2,4 @@ (1 before, 1016 after) noindex: true title: "Quran entity review queue" description: Medium/low-confidence Atlas matches requiring review. tags: [quran, atlas, extraction, review-queue] |
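The evidence note states its ranking rule: prioritize direct surah/ayah sources and filter research artifacts by default. This is a small sketch of that rule. It assumes "direct source" means a path under a `Surahs/` folder and "research artifact" means a path under `Research/`; both folder names appear in the snippets, but the classification itself is an assumption.

```python
def _in_dir(path: str, folder: str) -> bool:
    # Normalize separators so "Surahs/x.md" and "Graphe/Quran/Surahs/x.md" both match.
    return f"/{folder}/" in "/" + path.replace("\\", "/")


def rank_evidence(hits, include_research=False):
    """hits: iterable of (score, vault_path) pairs.

    Drops Research/ artifacts unless include_research is True, then puts
    surah files first and sorts by descending score within each group.
    """
    kept = [h for h in hits if include_research or not _in_dir(h[1], "Research")]
    return sorted(kept, key=lambda h: (not _in_dir(h[1], "Surahs"), -h[0]))
```

Sorting on a (not-surah, negative score) key keeps the two-tier rule in one pass. A lower-scoring ayah source still outranks a higher-scoring secondary note.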

Embed integrity (Phase B)

Query: broken embed (2 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.78 | research.md | Quran corpus — research & build plan | @@ -69,4 @@ (68 before, 209 after) - Regenerate: uv run .dev/scripts/generate_quran_juz_ayah.py after any rename (uses quran_api + /chapters + /juzs). - DoD: No broken \`![[Graphe/Quran/Surahs/…#Ayah … |
| 0.77 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -69,4 @@ (68 before, 109 after) Query: broken embed 1 hit(s) \| Score \| Wikilink \| Title \| Snippet \| |
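The Phase B Definition of Done asks for no broken `![[Graphe/Quran/Surahs/…#Ayah …` embeds, and the Surahs note says each verse is a `### Ayah k` section. The following is a hedged sketch of a checker built on those two facts. It assumes embed targets are vault-relative paths written without the `.md` extension, and it does not implement Obsidian's shortest-path link resolution.

```python
import re
from pathlib import Path

# ![[target#Ayah k]] or ![[target#Ayah k|alias]]
EMBED = re.compile(r"!\[\[([^\]#|]+)#([^\]|]+)(?:\|[^\]]*)?\]\]")


def broken_embeds(note: Path, vault: Path) -> list[str]:
    """Report heading embeds in `note` whose file or ### heading is missing."""
    problems = []
    for target, heading in EMBED.findall(note.read_text(encoding="utf-8")):
        path = vault / (target if target.endswith(".md") else target + ".md")
        if not path.is_file():
            problems.append(f"missing file: {target}")
            continue
        wanted = f"### {heading.strip()}"
        lines = path.read_text(encoding="utf-8").splitlines()
        if not any(line.strip() == wanted for line in lines):
            problems.append(f"missing heading: {target}#{heading}")
    return problems
```

Running it over the juz/ayah index notes after generate_quran_juz_ayah.py gives an observable check for the DoD. An empty list means every embed resolved.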

DoD / checklist

Query: Definition of Done (3 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.86 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -77,4 @@ (76 before, 101 after) Query: Definition of Done 2 hit(s) \| Score \| Wikilink \| Title \| Snippet \| |
| 0.83 | research.md | Quran corpus — research & build plan | @@ -46,4 @@ (45 before, 232 after) Each stage below lists inputs, outputs, tools, and Definition of Done (observable). --- |
| 0.51 | surah-017-al-isra.md | Surah 17: Al-Isra | @@ -772,4 @@ (771 before, 369 after) [Mention, O Muḥammad], the Day We will call forth every people with their record [of deeds] Then whoever is given his record in his right hand - those will read their records, and… |

Entity pipeline & validation

Query: entity extraction (12 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.94 | entity-validation-report.md | Entity extraction validation report | @@ -2,4 @@ (1 before, 14 after) noindex: true title: "Entity extraction validation report" description: Structural and regression checks for Quran Atlas extraction sidecars. tags: [quran, atlas, validation] |
| 0.94 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -26,4 @@ (25 before, 152 after) \|------:\|----------\|-------\|---------\| \| 0.79 \| research.md \| Quran corpus — research & build plan \| @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): |
| 0.94 | entity-corpus-summary.md | Quran entity extraction summary | @@ -2,4 @@ (1 before, 53 after) noindex: true title: "Quran entity extraction summary" description: Corpus-wide stats for Atlas candidate extraction. tags: [quran, atlas, extraction, summary] |
| 0.94 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 249 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
| 0.94 | entity-pilot-surah-001.md | Entity extraction pilot — Surah 1 | @@ -7,4 @@ (6 before, 33 after) # Entity extraction pilot — Surah 1 Source: Surah 001 - Al-Fatihah.md\|Surah file · Generator: .dev/scripts/quran_entity_pilot.py |
| 0.93 | entity-scan-surah-077.md | Entity scan — Surah 77 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 77" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-102.md | Entity scan — Surah 102 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 102" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-103.md | Entity scan — Surah 103 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 103" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-109.md | Entity scan — Surah 109 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 109" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-101.md | Entity scan — Surah 101 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 101" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-107.md | Entity scan — Surah 107 | @@ -2,4 @@ (1 before, 16 after) noindex: true title: "Entity scan — Surah 107" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-086.md | Entity scan — Surah 86 | @@ -2,4 @@ (1 before, 19 after) noindex: true title: "Entity scan — Surah 86" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |

Stubs / thin notes (often Atlas)

Query: stub (12 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.88 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -9,4 @@ (8 before, 169 after) Local index pass using qmd BM25 search over collection graphelogos-quran. Complements fetch/Atlas/entity scripts and Quartz by surfacing where t… |
| 0.87 | tabuk.md | Tabūk | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | iraq.md | Iraq | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | nile.md | Nile | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | yemen.md | Yemen | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | aylah.md | Aylah | @@ -26,4 @@ (25 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | west.md | Maghrib | @@ -25,4 @@ (24 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | jordan.md | Jordan River | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | tih.md | al-Tīḥ | @@ -25,4 @@ (24 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | sham.md | al-Shām | @@ -26,4 @@ (25 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | ararat.md | Mount Judi | @@ -26,4 @@ (25 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | dead-sea.md | Dead Sea | @@ -23,4 @@ (22 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |

Blockers

Query: blocker (2 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.88 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -122,4 @@ (121 before, 56 after) Blockers Query: blocker 1 hit(s) |
| 0.79 | research.md | Quran corpus — research & build plan | @@ -10,4 @@ (9 before, 268 after) Hypothesis (cycle): Finishing the 114 surah fetch plus keeping surah-hashes.json\|surah-hashes.json + Quartz paths stable removes the largest blockers… |

Hash manifest

Query: surah-hashes (0 hits)

No BM25 matches.

Fetch script references

Query: fetch_quran (0 hits)

No BM25 matches.

Auto-generated Atlas verse markers

Query: AUTO_ASMA (0 hits)

No BM25 matches.

People/places seed data

Query: people_places (0 hits)

No BM25 matches.
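Some snippets above contain `fetch_quran.py` (Surahs folder note), `surah-hashes.json` (RESEARCH blockers row), and `people_places.json` (every Atlas stub row). Even so, those identifier probes returned zero rows. That suggests the zeros come from how identifier-style query strings are handled, not from the strings being absent. A literal substring scan is a cheap cross-check, sketched here with no tokenization or ranking.

```python
from pathlib import Path


def literal_hits(vault: Path, needle: str) -> list[Path]:
    """Return markdown files containing `needle` verbatim (case-sensitive)."""
    hits = []
    for p in sorted(vault.rglob("*.md")):
        try:
            if needle in p.read_text(encoding="utf-8", errors="replace"):
                hits.append(p)
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
    return hits
```

For example, loop over `("surah-hashes", "fetch_quran", "AUTO_ASMA", "people_places", "schema_version")` and print `len(literal_hits(Path("Graphe/Quran"), needle))` beside each BM25 count.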

Publish / site

Query: Quartz (4 hits)

| Score | Wikilink | Title | Snippet |
|------:|----------|-------|---------|
| 0.94 | research.md | Quran corpus — research & build plan | @@ -3,4 @@ (2 before, 275 after) description: End-to-end plan to fetch, organize, Atlas entity work, categorize, tag, hash, and index the full Quranic corpus in this vault (wikilinked). tags: [quran, research, pipelin… |
| 0.92 | qmd-pipeline-gaps.md | QMD second pass — BM25 gap probes | @@ -9,4 @@ (8 before, 169 after) Local index pass using qmd BM25 search over collection graphelogos-quran. Complements fetch/Atlas/entity scripts and Quartz by surfacing where t… |
| 0.92 | research.md | Quran research | @@ -9,4 @@ (8 before, 7 after) - RESEARCH\|RESEARCH — master plan (fetch → Atlas → Quartz) - Literary structures overview\|Literary structures overview — surah-level rhetoric … |
| 0.87 | surahs.md | Surahs in this vault | @@ -64,4 @@ (63 before, 4 after) - Juz literary overview\|Juz — literary overview — ajzāʾ as reading grid vs surah-level rhetoric; ḥizb/maqraʾ; Juz ʿAmma - RESEARCH\|RESEARCH —… |

Entity sidecar schema

Query: schema_version (0 hits)

No BM25 matches.

How to regenerate

```bash
uv run .dev/scripts/quran_qmd_gap_pass.py
# optional: re-index everything first
# uv run .dev/scripts/quran_qmd_gap_pass.py --reindex
```
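The following is a sketch of how a `--reindex` switch could be wired. It assumes the flag simply runs `qmd update` (the refresh step named under Preconditions) before probing. That mapping is an assumption, not the script's confirmed behavior. The command runner is injectable so the plan can be checked without qmd installed.

```python
import argparse
import subprocess


def parse_args(argv=None) -> argparse.Namespace:
    ap = argparse.ArgumentParser(description="Sketch of the gap-pass entry point")
    ap.add_argument("--reindex", action="store_true",
                    help="run `qmd update` before probing (assumed behavior)")
    return ap.parse_args(argv)


def preflight(reindex: bool, run=subprocess.run) -> list[list[str]]:
    """Run and return the pre-probe commands; `run` can be swapped for a dry run."""
    cmds = [["qmd", "update"]] if reindex else []
    for cmd in cmds:
        run(cmd, check=True)  # raise if the index refresh fails (e.g. EPERM)
    return cmds
```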

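Total BM25 rows listed above: 50. The same note can appear under several probes (research.md and qmd-pipeline-gaps.md recur throughout), so this is a row count, not a count of distinct notes.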
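The 50-row total is the sum of the per-probe hit counts. This short sketch reproduces it and shows how a distinct-note count would differ. Note that two different notes both display as research.md, and two as surahs.md. A real distinct count should therefore key on the full vault path, not on the name shown in the Wikilink column.

```python
# Hit counts per probe, copied from the sections above.
HITS_PER_PROBE = {
    "gap": 2, "not fetched": 5, "subset": 3, "review queue": 5,
    "broken embed": 2, "Definition of Done": 3, "entity extraction": 12,
    "stub": 12, "blocker": 2, "surah-hashes": 0, "fetch_quran": 0,
    "AUTO_ASMA": 0, "people_places": 0, "Quartz": 4, "schema_version": 0,
}


def tally(probes: dict[str, list[str]]) -> tuple[int, int]:
    """Return (total rows, distinct notes) for a probe -> note paths mapping."""
    total = sum(len(paths) for paths in probes.values())
    distinct = len({p for paths in probes.values() for p in paths})
    return total, distinct
```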