Swahili Dictionary

@_esilas, do you know of any existing Swahili dictionaries with parts of speech (noun, verb, etc.) that we can convert to an NLP++ .dict file?

Yes, there is a Swahili dictionary known as Kamusi, and it comes as a hard copy. The dictionary covers Swahili words with a detailed explanation of each.

That is only a hard copy, though. What we need to do is transfer it to digital form.

Here are a few Swahili word lists available on GitHub:

  • All Swahili Words Dictionary: A repository containing a text file with a collection of Swahili words.
  • Swahili Wordlist: A word list that includes Swahili words among other language resources.
  • Kamusi Project: A repository offering JSON and CSV data for a Swahili dictionary with over 16,600 words, including meanings, synonyms, and conjugations.

Do any of these have parts of speech?

Parts of speech according to the Swahili dictionary “Kamusi”:

English                  Swahili
Noun (cat)               Nomino (paka)
Pronoun (you)            Kiwakilishi cha nafsi (wewe)
Verb (eat)               Kitenzi (kula)
Adjective (big)          Kivumishi (kubwa)
Adverb (quickly)         Kielezi (haraka)
Preposition (from)       Kihusishi (kutoka)
Conjunction (but)        Kiunganishi (ila)
Interjection (wow!)      Kiingizi (lo!)

In the GitHub repository Kalebu/kamusi, I was able to identify the above 8 parts of speech, but some words still do not have a complete entry. For instance, lo! (“tamko la kuonyesha mshangao, furaha au hofu”, kefle!) is an interjection, but the entry does not indicate which part of speech it belongs to.

In the GitHub repository odolezal/wordlists, the Swahili words are listed without any explanation of what they mean or which part of speech each word belongs to.

I’m thinking we could use an LLM to query the part of speech, in English, for each word in the list.
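Here is a minimal sketch of how that loop might look in Python, just to make the idea concrete. Everything in it is an assumption rather than an existing tool: `ask_llm` is a hypothetical function you would supply (any callable that takes a prompt string and returns the model's reply), and the `word pos=tag` output line is my guess at a simple layout that would need to be checked against the actual NLP++ .dict format.

```python
from typing import Callable, Dict, Iterable, List

# The eight English part-of-speech tags from the table above.
VALID_TAGS = {
    "noun", "pronoun", "verb", "adjective",
    "adverb", "preposition", "conjunction", "interjection",
}


def build_prompt(word: str) -> str:
    """Ask for exactly one tag so the reply is easy to validate."""
    return (
        f"What is the part of speech of the Swahili word '{word}'? "
        "Answer with exactly one of: " + ", ".join(sorted(VALID_TAGS)) + "."
    )


def parse_tag(reply: str) -> str:
    """Normalize the reply; anything outside the eight tags becomes 'unknown'."""
    cleaned = reply.strip().lower().strip(".!\"' ")
    return cleaned if cleaned in VALID_TAGS else "unknown"


def tag_words(words: Iterable[str], ask_llm: Callable[[str], str]) -> Dict[str, str]:
    """Query the (hypothetical) ask_llm callable once per non-empty word."""
    tagged: Dict[str, str] = {}
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank lines in the word list
        tagged[word] = parse_tag(ask_llm(build_prompt(word)))
    return tagged


def to_dict_lines(tagged: Dict[str, str]) -> List[str]:
    """Format tagged words as 'word pos=tag' lines (assumed layout, not verified).

    Words tagged 'unknown' are left out so they can be reviewed by hand.
    """
    return [f"{word} pos={tag}" for word, tag in tagged.items() if tag != "unknown"]
```

Constraining the answer to a fixed tag set and marking everything else as `unknown` means a Swahili speaker only has to review the uncertain words by hand, not the whole list.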

I think that’s a brilliant proposal, and I would like to know when we could start working on it.

Thank you!

Let me know if you need any help or have questions…