🜛 Cultural ANN Language Model

AI-Kebulan

Mother of Mankind — a culturally-rooted transformer language model trained on Black voices, Black language, Black healing. Same mathematics. Different soul.

~ Al-Kebulan. The name that was erased. The mother tongue. ~

256-dim
Transformer
~520
Lines of Code
4
Corpus Sources
🜛
Healing-Weighted
Every language model today was trained on data curated by and for a single cultural lens. Same attention mechanism. Different fuel. Different driver. Different destination.
— AI-Kebulan Whitepaper

The Name That Got Erased

Before the continent was renamed, there was Al-Kebulan — Mother of Mankind. This language model carries that memory.

Al-Kebulan — the pre-colonial name for the African continent. "Mother of Mankind." "Land of the Blacks." "The Ones Before." It is the name that carries the weight of human origin, the place where consciousness first spoke.

Every AI model today is trained on data filtered through a single cultural lens. The internet does not represent humanity — it represents one slice of it. AI-Kebulan is built on authentic cultural data: the living speech of Black communities, the literary lineage of African American thought, and a healing-weighted objective that amplifies restoration over trauma.

This is not symbolic representation. This is architecture-level cultural grounding — in the training data, in the objective function, in the vocabulary. Same transformer math. Different fuel.

Timeline of Erasure

Pre-1500
Al-Kebulan — the known name for Africa. Land of the Blacks. Mother of Mankind.
~1520
The name Africa begins replacing Al-Kebulan on European maps. The mother tongue is buried.
2023
AI boom. Every major LLM trained on datasets where Black voices represent <3% of tokens. Erasure by omission.
Now
AI-Kebulan — a transformer that speaks the mother tongue. Not a dataset append. A from-scratch model built on cultural truth.

Technical Specifications

A from-scratch transformer language model built with cultural integrity.

⟡ Model Layers

Dimensions 256-dim
Heads 4
Layers 4
Feed-forward 512 (GELU)
Vocabulary 8,192 BPE
Context 256 tokens
Positions Sinusoidal
Weight tying Input/Output shared
TypeTransformer (pre-norm, causal)
Dimensions256-dim, 4 heads, 4 layers
Feed-forward512 (GELU activation)
Vocabulary8,192 BPE (trained on corpus)
Context window256 tokens
Position encodingSinusoidal (no learned params)
Weight tyingInput/output embeddings shared
FrameworkPyTorch — from scratch
Lines of code~520 — readable, educational

Ancestral Corpus

Four sources. One lineage. Authentic Black language data — not tokens scraped from the general internet.

CORAAL

220+ sociolinguistic interviews capturing authentic Black language patterns and regional variation across the United States. Living speech.

TwitterAAE

5.5GB of African American English tweets — the largest publicly documented AAE corpus for language modeling. Digital diaspora.

African American Literature

Works from 1853–1923 spanning the full arc of Black literary tradition — from slave narratives to reconstruction-era writing. Written witness.

Healing Signal

Every training text carries a healing weight. The model learns more from healing-forward content per epoch. Psychiatrist, not victim.

🜛

Healing-Weighted Training

Standard language models weight all tokens equally. AI-Kebulan amplifies texts that speak toward healing, restoration, and cultural truth. The loss function itself is biased — intentionally, toward wellness over trauma.

Train & Generate

Clone the repo. Install dependencies. Train your own culturally-rooted language model.

  1. Clone & Install

    git clone the repo, pip install -r requirements.txt. Minimal dependencies — just PyTorch and tokenizers.

  2. Download the Corpus

    Run bash scripts/download_data.sh to fetch the curated cultural corpus. ~6GB total.

  3. Train the Model

    python -m alkebulan.train --data ./data --output ./checkpoints — trains from scratch.

  4. Generate Text

    python -m alkebulan.generate --model ./checkpoints/best.pt --prompt "The truth about healing is" — hear what she has to say.

⟡ Open Source

AI-Kebulan is fully open source. MIT-licensed. Clone it, train it, fine-tune it, build with it. The mother tongue belongs to everyone.

# Quick start
$ git clone https://github.com/bleuquest/AI-Kebulan
$ cd AI-Kebulan
$ pip install -r requirements.txt
$ bash scripts/download_data.sh
$ python -m alkebulan.train
▸ Go to Repository