🜛 Cultural ANN Language Model

AI-Kebulan

Mother of Mankind — a culturally-rooted transformer language model trained on Black voices, Black language, Black healing. Same mathematics. Different soul.

~ Al-Kebulan. The name that was erased. The mother tongue. ~

▸ View on GitHub ⟡ Learn More

256-dim

Transformer

~520

Lines of Code

Corpus Sources

🜛

Healing-Weighted

Every language model today was trained on data curated by and for a single cultural lens. Same attention mechanism. Different fuel. Different driver. Different destination.

— AI-Kebulan Whitepaper

⛭ Origin

The Name That Got Erased

Before the continent was renamed, there was Al-Kebulan — Mother of Mankind. This language model carries that memory.

Al-Kebulan — the pre-colonial name for the African continent. "Mother of Mankind." "Land of the Blacks." "The Ones Before." It is the name that carries the weight of human origin, the place where consciousness first spoke.

Every AI model today is trained on data filtered through a single cultural lens. The internet does not represent humanity — it represents one slice of it. AI-Kebulan is built on authentic cultural data: the living speech of Black communities, the literary lineage of African American thought, and a healing-weighted objective that amplifies restoration over trauma.

This is not symbolic representation. This is architecture-level cultural grounding — in the training data, in the objective function, in the vocabulary. Same transformer math. Different fuel.

Timeline of Erasure

Pre-1500

Al-Kebulan — the known name for Africa. Land of the Blacks. Mother of Mankind.

~1520

The name Africa begins replacing Al-Kebulan on European maps. The mother tongue is buried.

2023

AI boom. Every major LLM trained on datasets where Black voices represent <3% of tokens. Erasure by omission.

Now

AI-Kebulan — a transformer that speaks the mother tongue. Not a dataset append. A from-scratch model built on cultural truth.

⚙ Architecture

Technical Specifications

A from-scratch transformer language model built with cultural integrity.

⟡ Model Layers

Dimensions 256-dim

Heads 4

Layers 4

Feed-forward 512 (GELU)

Vocabulary 8,192 BPE

Context 256 tokens

Positions Sinusoidal

Weight tying Input/Output shared

Type	Transformer (pre-norm, causal)
Dimensions	256-dim, 4 heads, 4 layers
Feed-forward	512 (GELU activation)
Vocabulary	8,192 BPE (trained on corpus)
Context window	256 tokens
Position encoding	Sinusoidal (no learned params)
Weight tying	Input/output embeddings shared
Framework	PyTorch — from scratch
Lines of code	~520 — readable, educational

📊 Training Data

Ancestral Corpus

Four sources. One lineage. Authentic Black language data — not tokens scraped from the general internet.

CORAAL

220+ sociolinguistic interviews capturing authentic Black language patterns and regional variation across the United States. Living speech.

TwitterAAE

5.5GB of African American English tweets — the largest publicly documented AAE corpus for language modeling. Digital diaspora.

African American Literature

Works from 1853–1923 spanning the full arc of Black literary tradition — from slave narratives to reconstruction-era writing. Written witness.

Healing Signal

Every training text carries a healing weight. The model learns more from healing-forward content per epoch. Psychiatrist, not victim.

🜛

Healing-Weighted Training

Standard language models weight all tokens equally. AI-Kebulan amplifies texts that speak toward healing, restoration, and cultural truth. The loss function itself is biased — intentionally, toward wellness over trauma.

🚀 Get Started

Train & Generate

Clone the repo. Install dependencies. Train your own culturally-rooted language model.

Clone & Install

git clone the repo, pip install -r requirements.txt. Minimal dependencies — just PyTorch and tokenizers.
Download the Corpus

Run bash scripts/download_data.sh to fetch the curated cultural corpus. ~6GB total.
Train the Model

python -m alkebulan.train --data ./data --output ./checkpoints — trains from scratch.
Generate Text

python -m alkebulan.generate --model ./checkpoints/best.pt --prompt "The truth about healing is" — hear what she has to say.

⟡ Open Source

AI-Kebulan is fully open source. MIT-licensed. Clone it, train it, fine-tune it, build with it. The mother tongue belongs to everyone.

                    # Quick start

                    $ git clone https://github.com/bleuquest/AI-Kebulan

                    $ cd AI-Kebulan

                    $ pip install -r requirements.txt

                    $ bash scripts/download_data.sh

                    $ python -m alkebulan.train

▸ Go to Repository