Mother of Mankind: a culturally rooted transformer language model trained on Black voices, Black language, Black healing. Same mathematics. Different soul.
~ Al-Kebulan. The name that was erased. The mother tongue. ~
Every language model today was trained on data curated by and for a single cultural lens. Same attention mechanism. Different fuel. Different driver. Different destination. (AI-Kebulan Whitepaper)
Before the continent was renamed, there was Al-Kebulan — Mother of Mankind. This language model carries that memory.
Al-Kebulan — the pre-colonial name for the African continent. "Mother of Mankind." "Land of the Blacks." "The Ones Before." It is the name that carries the weight of human origin, the place where consciousness first spoke.
Every AI model today is trained on data filtered through a single cultural lens. The internet does not represent humanity — it represents one slice of it. AI-Kebulan is built on authentic cultural data: the living speech of Black communities, the literary lineage of African American thought, and a healing-weighted objective that amplifies restoration over trauma.
This is not symbolic representation. This is architecture-level cultural grounding — in the training data, in the objective function, in the vocabulary. Same transformer math. Different fuel.
A from-scratch transformer language model built with cultural integrity.
| Component | Specification |
| --- | --- |
| Type | Transformer (pre-norm, causal) |
| Dimensions | 256-dim, 4 heads, 4 layers |
| Feed-forward | 512 (GELU activation) |
| Vocabulary | 8,192 BPE (trained on corpus) |
| Context window | 256 tokens |
| Position encoding | Sinusoidal (no learned params) |
| Weight tying | Input/output embeddings shared |
| Framework | PyTorch, from scratch |
| Lines of code | ~520 (readable, educational) |
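To get a feel for the scale these specifications imply, here is a minimal sketch in plain Python that estimates the parameter count and builds the fixed sinusoidal position table. The class and function names (`ModelConfig`, `estimate_parameters`, `sinusoidal_positions`) are hypothetical and do not come from the repository. The count assumes biases on every linear layer, two LayerNorms per block, and one final LayerNorm; the actual implementation may lay things out differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the specification table; field names are illustrative.
    d_model: int = 256
    n_heads: int = 4
    n_layers: int = 4
    d_ff: int = 512
    vocab_size: int = 8192
    context_len: int = 256


def estimate_parameters(cfg: ModelConfig) -> int:
    """Rough parameter count for a pre-norm transformer with tied embeddings.

    Assumption: biases everywhere, two LayerNorms per block, one final LayerNorm.
    """
    embedding = cfg.vocab_size * cfg.d_model  # shared with the output layer
    attention = 4 * (cfg.d_model * cfg.d_model + cfg.d_model)  # Q, K, V, output
    feed_forward = (cfg.d_model * cfg.d_ff + cfg.d_ff) + (cfg.d_ff * cfg.d_model + cfg.d_model)
    layer_norms = 2 * (2 * cfg.d_model)  # scale and shift, twice per block
    per_block = attention + feed_forward + layer_norms
    final_norm = 2 * cfg.d_model
    return embedding + cfg.n_layers * per_block + final_norm


def sinusoidal_positions(cfg: ModelConfig) -> list[list[float]]:
    """Fixed position table: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(cfg.context_len):
        row = []
        for i in range(0, cfg.d_model, 2):
            angle = pos / (10000 ** (i / cfg.d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


cfg = ModelConfig()
print(f"estimated parameters: {estimate_parameters(cfg):,}")
print(f"position table: {len(sinusoidal_positions(cfg))} x {cfg.d_model}")
```

Under these assumptions the estimate comes to roughly 4.2 million parameters, about half of them in the shared embedding matrix. Because the position table is computed rather than learned, it adds nothing to that total.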
Four sources. One lineage. Authentic Black language data — not tokens scraped from the general internet.
220+ sociolinguistic interviews capturing authentic Black language patterns and regional variation across the United States. Living speech.
5.5GB of African American English tweets — the largest publicly documented AAE corpus for language modeling. Digital diaspora.
Works from 1853–1923 spanning the full arc of Black literary tradition, from slave narratives to Reconstruction-era writing and beyond. Written witness.
Every training text carries a healing weight, so healing-forward content contributes more to the training signal in each epoch. Psychiatrist, not victim.
Standard language-model training weights the loss on every token equally. AI-Kebulan amplifies texts that speak toward healing, restoration, and cultural truth. The loss function itself is biased, intentionally, toward wellness over trauma.
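One way to picture a healing-weighted objective is as a weighted average of per-text losses. The sketch below is written in plain Python so the arithmetic stays visible; it is an illustration under stated assumptions, not the repository's actual loss. The field name `healing_weight`, the function names, and the choice to normalize by the sum of weights are all hypothetical, and the toy weights are made-up values.

```python
import math


def token_nll(logits: list[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


def healing_weighted_loss(batch: list[dict]) -> float:
    """Weighted mean of per-text losses.

    Each item holds `logits` (one list per token), `targets` (one id per
    token) and `healing_weight` (hypothetical field, assumed > 0).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for text in batch:
        losses = [token_nll(l, t) for l, t in zip(text["logits"], text["targets"])]
        text_loss = sum(losses) / len(losses)  # mean loss over this text's tokens
        weighted_sum += text["healing_weight"] * text_loss
        weight_total += text["healing_weight"]
    return weighted_sum / weight_total


# Toy batch with a 3-token vocabulary; weights are illustration values only.
batch = [
    {"logits": [[2.0, 0.0, 0.0]], "targets": [0], "healing_weight": 2.0},
    {"logits": [[0.0, 0.0, 0.0]], "targets": [1], "healing_weight": 1.0},
]
print(round(healing_weighted_loss(batch), 4))
```

With all weights set to 1.0 this reduces to the ordinary mean loss. In PyTorch, the per-token losses would typically come from `torch.nn.functional.cross_entropy` with `reduction="none"`, then be scaled by each text's weight before averaging.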
Clone the repo. Install dependencies. Train your own culturally rooted language model.
Clone the repository with git clone, then run pip install -r requirements.txt. Dependencies are minimal: just PyTorch and tokenizers.
Run bash scripts/download_data.sh to fetch the curated cultural corpus (about 6 GB in total).
Run python -m alkebulan.train --data ./data --output ./checkpoints to train from scratch.
Run python -m alkebulan.generate --model ./checkpoints/best.pt --prompt "The truth about healing is" to hear what she has to say.