Entity extraction pilot — Surah 1

Source: Surah file · Generator: .dev/scripts/quran_entity_pilot.py

Method

  • Load the aliases, aliases_ar, arabic, title, and transliteration fields from Atlas.
  • English: fold accents, lowercase, match on Unicode word boundaries, and keep the longest matching alias per entity.
  • Arabic: normalize (NFKC/NFC), strip combining marks and tatweel, map alef-wasla to alef; keep the longest matching Arabic term per entity.
  • Sidecar: SCHEMA (schema_version: 2) + triples.
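
The normalization and longest-match steps above can be sketched roughly as follows. This is a minimal illustration, not the code in quran_entity_pilot.py: the function names (fold_english, normalize_arabic, longest_hit) and the word_boundaries flag are hypothetical, and the order of the Arabic steps is an assumption.

```python
import re
import unicodedata
from typing import Callable, Iterable, Optional

TATWEEL = "\u0640"     # Arabic elongation character (kashida)
ALEF_WASLA = "\u0671"  # alef with wasla, common in Uthmani text
ALEF = "\u0627"        # plain alef


def _strip_marks(text: str) -> str:
    """Drop every character in a Unicode Mark category (Mn, Mc, Me)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("M")
    )


def fold_english(text: str) -> str:
    """Fold accents and lowercase, e.g. 'Allāh' -> 'allah'."""
    return _strip_marks(unicodedata.normalize("NFKD", text)).lower()


def normalize_arabic(text: str) -> str:
    """NFKC, alef-wasla -> alef, drop tatweel and marks, then NFC."""
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(ALEF_WASLA, ALEF).replace(TATWEEL, "")
    return unicodedata.normalize("NFC", _strip_marks(text))


def longest_hit(
    text: str,
    aliases: Iterable[str],
    normalize: Callable[[str], str],
    word_boundaries: bool = True,
) -> Optional[str]:
    """Return the longest normalized alias found in text, or None.

    Assumption: one entity's aliases are passed per call, so the result
    is the "longest alias per entity" for that entity.
    """
    haystack = normalize(text)
    candidates = {normalize(alias) for alias in aliases}
    candidates.discard("")  # an empty pattern would match everywhere
    # Longest first; ties broken alphabetically for a stable result.
    for alias in sorted(candidates, key=lambda a: (-len(a), a)):
        pattern = re.escape(alias)
        if word_boundaries:
            # \w is Unicode-aware for str patterns in Python 3.
            pattern = rf"(?<!\w){pattern}(?!\w)"
        if re.search(pattern, haystack):
            return alias
    return None
```

Calling longest_hit once per entity with fold_english for the translation and normalize_arabic for the Arabic text would yield one EN and one AR term per entity, which is the shape of the hit lists in this report.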

Per-ayah sidecar

Machine-readable output: surah-001.yaml (written with --write-sidecar).

Hits by ayah

Ayah 1

  • Allāh — EN: allah — AR: الله
  • Ar-Raḥīm — EN: the especially merciful — AR: الرحيم
  • Ar-Raḥmān — EN: the entirely merciful — AR: الرحمن

Ayah 2

  • Allāh — EN: allah
  • Rabb — EN: lord of the worlds — AR: رب

Ayah 3

  • Ar-Raḥīm — EN: the especially merciful — AR: الرحيم
  • Ar-Raḥmān — EN: the entirely merciful — AR: الرحمن

Review these hits before inserting links into the surah body.