QMD second pass — BM25 gap probes

A local index pass that runs qmd BM25 search over the graphelogos-quran collection. It complements the fetch, Atlas, and entity scripts and Quartz by surfacing where the vault mentions gaps, stubs, review backlog, and tooling.

Preconditions

  • Vault path: Graphe/Quran (6,702 markdown files on disk).
  • qmd collection graphelogos-quran must exist (this script runs qmd collection add if missing).
  • After large edits, refresh the index: qmd update (re-indexes all collections; resolve any EPERM on optional collections like ~/Documents).
  • Optional: run qmd embed to build vectors, then use qmd vsearch or qmd query locally (hybrid search needs an LLM and is often disabled when CI=true).
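
The preconditions above can be sketched as a small planning helper. This is a minimal illustration, not the logic of quran_qmd_gap_pass.py: the function names are hypothetical, the only qmd commands used are the ones named in this report (qmd update, qmd embed), and it assumes that qmd query is the hybrid mode that gets skipped when CI=true. The "bm25" entry is a label for the default probe mode, not a command name. Nothing is executed; the helpers only return what would be run.

```python
from __future__ import annotations

import os
from typing import Mapping


def refresh_commands(reindex: bool = False, embed: bool = False) -> list[list[str]]:
    """Return the qmd refresh commands to run, in order, without executing them."""
    commands: list[list[str]] = []
    if reindex:
        # Re-indexes all collections, per the precondition list.
        commands.append(["qmd", "update"])
    if embed:
        # Builds vectors so that vector and hybrid search become possible.
        commands.append(["qmd", "embed"])
    return commands


def search_modes(embedded: bool, env: Mapping[str, str] | None = None) -> list[str]:
    """Return the search modes that are usable under the given environment."""
    env = os.environ if env is None else env
    modes = ["bm25"]  # label only: BM25 needs no vectors and no LLM
    if embedded:
        modes.append("vsearch")
        # Assumption: hybrid search is skipped in CI because it needs an LLM.
        if env.get("CI", "").lower() != "true":
            modes.append("query")
    return modes


if __name__ == "__main__":
    for argv in refresh_commands(reindex=True, embed=True):
        print(" ".join(argv))
    print(search_modes(embedded=True))
```

Keeping the plan separate from execution makes it easy to check what a run would do before touching the index.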

Probe results

Explicit gap / leverage language

Query: gap (1 hit)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.79 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
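
Each snippet in these tables opens with a position header such as "@@ -29,4 @@ (28 before, 213 after)". The sketch below parses that header under an assumption inferred from the rows in this report rather than from qmd documentation: the first number is the 1-based start line, the second is the snippet length in lines, and the parenthesized counts are the lines before and after the snippet. The names SnippetSpan and parse_snippet_header are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches headers like "@@ -29,4 @@ (28 before, 213 after)".
HEADER = re.compile(r"@@ -(\d+),(\d+) @@ \((\d+) before, (\d+) after\)")


class SnippetSpan(NamedTuple):
    start: int   # assumed: 1-based line where the snippet starts
    length: int  # assumed: number of lines in the snippet
    before: int  # lines preceding the snippet
    after: int   # lines following the snippet

    @property
    def total_lines(self) -> int:
        """Estimated file length, if the three counts partition the file."""
        return self.before + self.length + self.after


def parse_snippet_header(snippet: str) -> Optional[SnippetSpan]:
    """Extract the position header from a snippet, or None if it is absent."""
    match = HEADER.search(snippet)
    if match is None:
        return None
    return SnippetSpan(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    span = parse_snippet_header("@@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): ...")
    print(span, span.total_lines)
```

Under this reading, the research.md hit above starts at line 29 of a file of roughly 245 lines, which helps when jumping from a probe result to the source note.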

Fetch coverage / missing surahs

Query: not fetched (12 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.88 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |
| 0.87 | surahs.md | Surahs | @@ -8,4 @@ (7 before, 13 after) This folder holds one markdown file per surah (Surah NNN - Name.md), fetched/updated with .dev/scripts/fetch_quran.py. Each verse is a ### Ayah k section so other notes can t… |
| 0.86 | index.md | Quran | @@ -12,4 @@ (11 before, 1 after) - RESEARCH — full pipeline plan (fetch → Atlas → Quartz) - Research notes — literary overviews, Juz rhetoric, entity pilot Surah text files: [Surah… |
| 0.85 | index.md | Ayah notes | @@ -2,4 @@ (1 before, 14 after) title: "Ayah index" description: "One note per verse (6,236 files); each embeds a block from the surah source file." tags: - quran |
| 0.85 | index.md | Juz (ajzāʾ) | @@ -11,4 @@ (10 before, 37 after) Thirty roughly equal parts of the mushaf (boundaries from the Quran.com API); they structure reading and memorization, not thematic "chapters.… |
| 0.85 | index.md | Quran research | @@ -1,4 @@ (0 before, 13 after) --- title: Quran research notes description: Literary overviews, entity pilot, and links to the master RESEARCH plan. tags: [quran, research, index] |
| 0.83 | atlas.md | Quran Atlas | @@ -8,4 @@ (7 before, 45 after) Like Atlas\|Torah Atlas, this collection links Divine Names, People, Places, and Books (scriptural book entities such as Tawrat and Injil) so reading… |
| 0.83 | surahs.md | Surahs in this vault | @@ -7,4 @@ (6 before, 62 after) Arabic–English surah files live under Graphe/Quran/Surahs/ (one .md per surah); see Surahs\|Surahs folder note for how that directory relates to Ayah and… |
| 0.82 | surah-111-al-masad.md | Surah 111: Al-Masad | @@ -34,4 @@ (33 before, 34 after) His wealth will not avail him or that which he gained. |
| 0.82 | surah-105-al-fil.md | Surah 105: Al-Fil | @@ -25,4 @@ (24 before, 43 after) Have you not considered, [O Muḥammad], how your Lord dealt with the companions of the elephant?1 |
| 0.82 | surah-109-al-kafirun.md | Surah 109: Al-Kafirun | @@ -35,4 @@ (34 before, 43 after) I do not worship what you worship. |
| 0.82 | surah-107-al-ma-un.md | Surah 107: Al-Ma’un | @@ -45,4 @@ (44 before, 43 after) And does not encourage the feeding of the poor. |

Partial corpus / scoped fetch

Query: subset (1 hit)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.8 | research.md | Quran corpus — research & build plan | @@ -29,4 @@ (28 before, 213 after) Gap (highest leverage): the 114 surahs are not all present as files yet—only a subset is fetched. Downstream embeds, entity extraction, and published HTML all de… |

Entity review backlog

Query: review queue (5 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.94 | index.md | Quran research | @@ -13,4 @@ (12 before, 1 after) - entity-corpus-summary\|Entity corpus summary — full extraction counts by confidence/family - [[Graphe/Quran/Research/entity-review-queue\|Entity review queue]… |
| 0.94 | atlas.md | Quran Atlas | @@ -39,4 @@ (38 before, 14 after) uv run .dev/scripts/quran_entity_pipeline.py --all-surahs --write-sidecars --write-reports uv run .dev/scripts/quran_entity_pipeline.py --all-surahs --write-summary --write-review-que… |
| 0.94 | research.md | Quran corpus — research & build plan | @@ -110,4 @@ (109 before, 132 after) 3. Confidence queue — emit summary + review queue: ```bash |
| 0.91 | schema.md | Entity sidecar schema (surah-NNN.yaml) | @@ -38,4 @@ (37 before, 62 after) \| confidence \| string \| high, medium, or low (balanced gate default: only high auto-applied). \| \| review_reasons \| array \| Optional reasons used to queue review (`alia… |
| 0.82 | entity-review-queue.md | Quran entity review queue | @@ -1,4 @@ (0 before, 1016 after) --- title: "Quran entity review queue" description: Medium/low-confidence Atlas matches requiring review. tags: [quran, atlas, extraction, review-queue] |

Embed integrity (Phase B)

Query: broken embed (1 hit)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.73 | research.md | Quran corpus — research & build plan | @@ -69,4 @@ (68 before, 173 after) - Regenerate: uv run .dev/scripts/generate_quran_juz_ayah.py after any rename (uses quran_api + /chapters + /juzs). - DoD: No broken `![[Graphe/Quran/Surahs/…#Ayah … |

DoD / checklist

Query: Definition of Done (2 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.79 | research.md | Quran corpus — research & build plan | @@ -46,4 @@ (45 before, 196 after) Each stage below lists inputs, outputs, tools, and Definition of Done (observable). --- |
| 0.38 | surah-017-al-isra.md | Surah 17: Al-Isra | @@ -761,4 @@ (760 before, 367 after) [Mention, O Muḥammad], the Day We will call forth every people with their record [of deeds] Then whoever is given his record in his right hand - those will read their records, and… |

Entity pipeline & validation

Query: entity extraction (12 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.93 | entity-validation-report.md | Entity extraction validation report | @@ -1,4 @@ (0 before, 14 after) --- title: "Entity extraction validation report" description: Structural and regression checks for Quran Atlas extraction sidecars. tags: [quran, atlas, validation] |
| 0.93 | entity-corpus-summary.md | Quran entity extraction summary | @@ -1,4 @@ (0 before, 53 after) --- title: "Quran entity extraction summary" description: Corpus-wide stats for Atlas candidate extraction. tags: [quran, atlas, extraction, summary] |
| 0.93 | entity-scan-surah-077.md | Entity scan — Surah 77 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 77" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-102.md | Entity scan — Surah 102 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 102" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-103.md | Entity scan — Surah 103 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 103" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-109.md | Entity scan — Surah 109 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 109" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-101.md | Entity scan — Surah 101 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 101" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-107.md | Entity scan — Surah 107 | @@ -1,4 @@ (0 before, 16 after) --- title: "Entity scan — Surah 107" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-pilot-surah-001.md | Entity extraction pilot — Surah 1 | @@ -6,4 @@ (5 before, 33 after) # Entity extraction pilot — Surah 1 Source: Surah 001 - Al-Fatihah.md\|Surah file · Generator: .dev/scripts/quran_entity_pilot.py |
| 0.93 | entity-scan-surah-086.md | Entity scan — Surah 86 | @@ -1,4 @@ (0 before, 19 after) --- title: "Entity scan — Surah 86" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-104.md | Entity scan — Surah 104 | @@ -1,4 @@ (0 before, 19 after) --- title: "Entity scan — Surah 104" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |
| 0.93 | entity-scan-surah-111.md | Entity scan — Surah 111 | @@ -1,4 @@ (0 before, 19 after) --- title: "Entity scan — Surah 111" description: Candidate Atlas entity mentions with confidence tiers. tags: [quran, atlas, extraction, review] |

Stubs / thin notes (often Atlas)

Query: stub (12 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.87 | tabuk.md | Tabūk | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | iraq.md | Iraq | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | nile.md | Nile | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | yemen.md | Yemen | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | aylah.md | Aylah | @@ -18,4 @@ (17 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | west.md | Maghrib | @@ -17,4 @@ (16 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | jordan.md | Jordan River | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | tih.md | al-Tīḥ | @@ -17,4 @@ (16 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | sham.md | al-Shām | @@ -18,4 @@ (17 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | ararat.md | Mount Judi | @@ -18,4 @@ (17 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | dead-sea.md | Dead Sea | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |
| 0.87 | red-sea.md | Red Sea | @@ -15,4 @@ (14 before, 3 after) Stub from .dev/data/quran/people_places.json. Expand with surah references and links. See also |

Blockers

Query: blocker (1 hit)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.7 | research.md | Quran corpus — research & build plan | @@ -10,4 @@ (9 before, 232 after) Hypothesis (cycle): Finishing the 114 surah fetch plus keeping surah-hashes.json\|surah-hashes.json + Quartz paths stable removes the largest blockers… |

Hash manifest

Query: surah-hashes (0 hits)

No BM25 matches.

Fetch script references

Query: fetch_quran (0 hits)

No BM25 matches.

Auto-generated Atlas verse markers

Query: AUTO_ASMA (0 hits)

No BM25 matches.

People/places seed data

Query: people_places (0 hits)

No BM25 matches.

Publish / site

Query: Quartz (4 hits)

| Score | Wikilink | Title | Snippet |
| --- | --- | --- | --- |
| 0.93 | research.md | Quran corpus — research & build plan | @@ -3,4 @@ (2 before, 239 after) description: End-to-end plan to fetch, organize, Atlas entity work, categorize, tag, hash, and index the full Quranic corpus in this vault (wikilinked). tags: [quran, research, pipelin… |
| 0.93 | index.md | Quran | @@ -2,4 @@ (1 before, 11 after) title: Quran description: Entry point for the Quranic corpus in this vault (Quartz home page). tags: [quran] --- |
| 0.91 | index.md | Quran research | @@ -8,4 @@ (7 before, 6 after) - RESEARCH\|RESEARCH — master plan (fetch → Atlas → Quartz) - Literary structures overview\|Literary structures overview — surah-level rhetoric … |
| 0.82 | surahs.md | Surahs in this vault | @@ -65,4 @@ (64 before, 4 after) - Juz literary overview\|Juz — literary overview — ajzāʾ as reading grid vs surah-level rhetoric; ḥizb/maqraʾ; Juz ʿAmma - RESEARCH\|RESEARCH —… |

Entity sidecar schema

Query: schema_version (0 hits)

No BM25 matches.

How to regenerate

uv run .dev/scripts/quran_qmd_gap_pass.py
# optional: re-index everything first
# uv run .dev/scripts/quran_qmd_gap_pass.py --reindex

Total BM25 rows listed above: 51 (probes may overlap the same note).
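
The total can be cross-checked against the per-probe hit counts. The sketch below is illustrative only and is not taken from quran_qmd_gap_pass.py: the counts are copied by hand from the probe sections of this report, and the summarize helper is a hypothetical name. It counts rows, not distinct notes, so a note matched by several probes is counted once per probe.

```python
# Hit counts per probe query, copied from the sections of this report.
PROBE_HITS = {
    "gap": 1,
    "not fetched": 12,
    "subset": 1,
    "review queue": 5,
    "broken embed": 1,
    "Definition of Done": 2,
    "entity extraction": 12,
    "stub": 12,
    "blocker": 1,
    "surah-hashes": 0,
    "fetch_quran": 0,
    "AUTO_ASMA": 0,
    "people_places": 0,
    "Quartz": 4,
    "schema_version": 0,
}


def summarize(probe_hits: dict[str, int]) -> tuple[int, list[str]]:
    """Return the total row count and the probes that matched nothing."""
    total_rows = sum(probe_hits.values())
    empty_probes = [query for query, hits in probe_hits.items() if hits == 0]
    return total_rows, empty_probes


if __name__ == "__main__":
    total, empty = summarize(PROBE_HITS)
    print(f"total rows: {total}")  # prints "total rows: 51"
    print("zero-hit probes:", ", ".join(empty))
```

Listing the zero-hit probes separately is a design choice: an empty probe may point at a real gap in the vault or at a query term that BM25 could not match, and the two cases call for different follow-up.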