python-iso639
ISO 639 language codes, names, and other associated information
python-iso639 is a Python package for ISO 639 language codes, names, and
other associated information.
Current features:
- 🌐 A representation of languages mapped across ISO 639-1, 639-2, and 639-3.
- 🔎 Functionality to "guess" what a language is for a given unknown language code or name.
- 🚀 Optimized for speed in retrieving language information.
Installation
pip install python-iso639
Usage
python-iso639 revolves around a Language class.
A Language instance exposes a language's codes, names, and related details as attributes, and you create instances through the class methods described below.
Note that the package is registered on PyPI as python-iso639,
but the import name at runtime is iso639,
so use import iso639 in your Python code.
Creating Language Instances
Create a Language instance with one of its class methods.
from_part3, with an ISO 639-3 code
>>> import iso639
>>> lang1 = iso639.Language.from_part3('fra')
>>> type(lang1)
<class 'iso639.language.Language'>
>>> lang1
Language(part3='fra', part2b='fre', part2t='fra', part1='fr', scope='I', type='L', name='French', comment=None, other_names=None, macrolanguage=None, retire_reason=None, retire_change_to=None, retire_remedy=None, retire_date=None)
Fast object instantiation for retrieving language information (run on Python 3.13, macOS 15.3.1, Apple M1 Pro)
In [1]: import iso639
In [2]: %timeit iso639.Language.from_part3("fra")
217 ns ± 0.139 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
From Another ISO 639 Code Set or a Reference Name
>>> lang2 = iso639.Language.from_part2b('fre') # ISO 639-2 (bibliographic)
>>> lang3 = iso639.Language.from_part2t('fra') # ISO 639-2 (terminological)
>>> lang4 = iso639.Language.from_part1('fr') # ISO 639-1
>>> lang5 = iso639.Language.from_name('French') # ISO 639-3 reference language name
A LanguageNotFoundError is Raised for Invalid Inputs
>>> iso639.Language.from_part3('Fra') # The user input is case-sensitive!
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'Fra' isn't an ISO language code or name
>>>
>>> iso639.Language.from_name("unknown language")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'unknown language' isn't an ISO language code or name
Accessing Attributes
>>> lang1
Language(part3='fra', part2b='fre', part2t='fra', part1='fr', scope='I', type='L', name='French', comment=None, other_names=None, macrolanguage=None, retire_reason=None, retire_change_to=None, retire_remedy=None, retire_date=None)
>>> lang1.part3
'fra'
>>> lang1.name
'French'
Comparison
>>> lang1 == lang2 == lang3 == lang4 == lang5 # All are French
True
>>> lang6 = iso639.Language.from_part3('spa') # Spanish
>>> lang1 == lang6 # French vs. Spanish
False
>>> 'French' == lang1.name == lang2.name == lang3.name == lang4.name == lang5.name
True
>>> lang6.name
'Spanish'
Guess a Language: Classmethod match
You don't know which code set or name your input is from?
Use the match classmethod:
>>> lang1 = iso639.Language.match('fra')
>>> lang2 = iso639.Language.match('fre')
>>> lang3 = iso639.Language.match('fr')
>>> lang4 = iso639.Language.match('French')
>>> lang1 == lang2 == lang3 == lang4
True
By default, the classmethod match is case-sensitive.
To ignore case instead, pass in strict_case=False:
>>> lang5 = iso639.Language.match('FRA', strict_case=False)
>>> lang6 = iso639.Language.match('french', strict_case=False)
>>> lang4 == lang5 == lang6
True
>>> iso639.Language.match("french")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LanguageNotFoundError: 'french' isn't an ISO language code or name
[!NOTE]
Depending on your use case, ignoring case could potentially lead to matching issues, where a language code might match an unintended language name (or vice versa), e.g., conflating "igo" and "Igo", while there exist the ISO 639-3 code ahl for Igo and the ISO 639-3 code igo for Isebe.
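One way to reduce that risk is to try a case-sensitive match first and ignore case only when nothing matches exactly. The helper below is a minimal sketch of that idea and is not part of python-iso639: match_prefer_exact is a hypothetical name, and the match callable and the error class are passed in as arguments so the function can be exercised without the package installed. With the package, they would be iso639.Language.match and iso639.LanguageNotFoundError.

```python
def match_prefer_exact(value, match, not_found_error):
    """Return the case-sensitive match if there is one, else a case-insensitive match.

    `match` is expected to behave like iso639.Language.match (it accepts a
    `strict_case` keyword), and `not_found_error` like
    iso639.LanguageNotFoundError. Both are parameters of this sketch only.
    """
    try:
        # Exact casing wins, so "igo" (a code) and "Igo" (a name) stay distinct.
        return match(value)
    except not_found_error:
        # Nothing matched exactly; retry while ignoring case.
        # This call still raises not_found_error if there is no match at all.
        return match(value, strict_case=False)


# With the package installed, a call would look like:
#   import iso639
#   match_prefer_exact("french", iso639.Language.match, iso639.LanguageNotFoundError)
```

The fallback only runs for inputs that have no exact match, so correctly cased input never reaches the case-insensitive path.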
The classmethod match is particularly useful for consistently
accessing a specific attribute from unknown inputs, e.g., the ISO 639-3 code.
>>> 'fra' == lang1.part3 == lang2.part3 == lang3.part3 == lang4.part3 == lang5.part3 == lang6.part3
True
If there's no match, a LanguageNotFoundError is raised,
which you may want to catch:
>>> try:
... lang = iso639.Language.match('not gonna find a match')
... except iso639.LanguageNotFoundError:
... print("no match found!")
...
no match found!
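The same try/except pattern extends to a batch of inputs. The following is a sketch under stated assumptions rather than an API of the package: to_part3_codes is a hypothetical helper, and it receives the match callable and the error class as arguments (iso639.Language.match and iso639.LanguageNotFoundError when the package is installed).

```python
from typing import Any, Callable, Iterable


def to_part3_codes(
    values: Iterable[str],
    match: Callable[[str], Any],
    not_found_error: type,
) -> tuple:
    """Map each input to an ISO 639-3 code and collect the inputs that do not match.

    Returns a (resolved, unmatched) pair: a dict from input to part3 code,
    and a list of inputs for which `match` raised `not_found_error`.
    """
    resolved = {}
    unmatched = []
    for value in values:
        try:
            language = match(value)
        except not_found_error:
            unmatched.append(value)
        else:
            resolved[value] = language.part3
    return resolved, unmatched


# With the package installed, a call would look like:
#   import iso639
#   to_part3_codes(["fr", "French", "not a language"],
#                  iso639.Language.match, iso639.LanguageNotFoundError)
```

Collecting the unmatched inputs instead of stopping at the first failure makes it easier to review bad data in one pass.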
Macrolanguages and Alternative Names
>>> language = iso639.Language.match('yue')
>>> language.name
'Yue Chinese' # also commonly known as Cantonese
>>> language.macrolanguage
'zho' # Chinese
>>> language.other_names
[Name(print='Yue Chinese', inverted='Chinese, Yue')]
>>> for name in language.other_names:
... print(f'{name.print} | {name.inverted}')
...
Yue Chinese | Chinese, Yue
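Since macrolanguage holds either an ISO 639-3 code or None, it can serve as a grouping key. The sketch below assumes only the part3 and macrolanguage attributes shown above; group_by_macrolanguage is a hypothetical helper, not something the package provides.

```python
from collections import defaultdict


def group_by_macrolanguage(languages):
    """Group ISO 639-3 codes by the value of each language's macrolanguage attribute."""
    groups = defaultdict(list)
    for language in languages:
        # Languages that belong to no macrolanguage are collected under the key None.
        groups[language.macrolanguage].append(language.part3)
    return dict(groups)


# With the package installed, a call would look like:
#   import iso639
#   group_by_macrolanguage(iso639.Language.match(c) for c in ["yue", "fra"])
```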
Retired Language Codes
>>> language = iso639.Language.match('bvs')
>>> language.part3
'bvs'
>>> language.name
'Belgian Sign Language'
>>> language.status
'R' # (R)etired
>>> language.retire_reason
'S' # (S)plit
>>> language.retire_change_to is None
True
>>> language.retire_remedy
'Split into Langue des signes de Belgique Francophone [sfb], and Vlaamse Gebarentaal [vgt]'
>>> language.retire_date
datetime.date(2007, 7, 18)
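When cleaning data that may contain retired codes, the status and retire_change_to attributes are enough to decide whether a direct replacement exists. The function below is a minimal sketch using only those attributes; replacement_code is a hypothetical name, and it looks one step ahead only (it does not check whether the replacement code is itself retired).

```python
def replacement_code(language):
    """Return the ISO 639-3 code to use for `language`, or None if there is no direct one.

    - Non-retired code: the code itself.
    - Retired code with retire_change_to set: that replacement code.
    - Retired code without retire_change_to (e.g., a split): None, in which
      case retire_remedy holds the human-readable instructions.
    """
    if language.status != "R":
        return language.part3
    return language.retire_change_to


# With the package installed, a call would look like:
#   import iso639
#   replacement_code(iso639.Language.match("bvs"))  # a split, so no direct replacement
```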
Into the Weeds
Attributes of a Language Instance
A Language instance has the following attributes:
| Attribute | Data type | Can it be None? | Description |
|---|---|---|---|
| part3 | str | ✗ | ISO 639-3 code |
| part2b | str | ✓ | ISO 639-2 code (bibliographic) |
| part2t | str | ✓ | ISO 639-2 code (terminological) |
| part1 | str | ✓ | ISO 639-1 code |
| scope | str | ✗ | One of {(I)ndividual, (M)acrolanguage, (S)pecial} |
| type | str | ✓ | One of {(A)ncient, (C)onstructed, (E)xtinct, (H)istorical, (L)iving, (S)pecial} [1] |
| status | str | ✗ | One of {(A)ctive, (R)etired}, describing the ISO 639-3 code |
| name | str | ✗ | Reference language name in ISO 639-3 |
| comment | str | ✓ | Comment from ISO 639-3 |
| other_names | list[Name] | ✓ | Other print and inverted names [2] |
| macrolanguage | str | ✓ | Macrolanguage |
| retire_reason | str | ✓ | Retirement reason, one of {(C)hange, (D)uplicate, (N)on-existent, (S)plit, (M)erge} |
| retire_change_to | str | ✓ | ISO 639-3 code to which this language can be changed, if retirement reason is one of {(C)hange, (D)uplicate, (M)erge} |
| retire_remedy | str | ✓ | Instructions for updating this retired language code |
| retire_date | datetime.date | ✓ | The date the retirement became effective |
[1] If the ISO 639-3 code is retired, then the type attribute is None,
because its value is not clearly discernible from the SIL data source.
[2] A Name instance has the attributes print and inverted,
for the print name and inverted name, respectively.
If reference name, print name, and inverted name are all the same, then
that particular (print name, inverted name) pair is excluded from
the other_names attribute.
For example, for Spanish (ISO 639-3: spa), one (print name, inverted name)
pair is (Spanish, Spanish) from the SIL data source, but this pair is
excluded from its list of other_names.
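Because other_names can be None and leaves out pairs identical to the reference name, collecting every known name for a language takes a little care. The following is a sketch under the attribute layout described above (name, other_names, and Name.print / Name.inverted); all_names is a hypothetical helper, not part of the package.

```python
def all_names(language):
    """Return the reference name followed by any other print/inverted names, without duplicates."""
    names = [language.name]
    # other_names is None when the language has no additional names.
    for other in language.other_names or []:
        for candidate in (other.print, other.inverted):
            if candidate not in names:
                names.append(candidate)
    return names


# With the package installed, a call would look like:
#   import iso639
#   all_names(iso639.Language.match("yue"))
```

Applied to the Yue Chinese example shown earlier, whose other_names holds the single pair (Yue Chinese, Chinese, Yue), this logic keeps the reference name once and appends only the inverted form.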
How Language.match Matches the Language
At a high level, Language.match assumes the input is m