Wikipedia API
=============
``Wikipedia-API`` is an easy-to-use Python wrapper for `Wikipedias'`_ API. It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.
.. _Wikipedias': https://www.mediawiki.org/wiki/API:Main_page
|github-stars-flat| |cc-coverage| |docs| |version| |pyversions|
Installation
------------
This package requires at least Python 3.9 to install because it uses ``IntEnum``.

.. code-block:: bash

    pip3 install wikipedia-api
Usage
-----
The goal of ``Wikipedia-API`` is to provide a simple and easy-to-use API for retrieving information from Wikipedia. Below are examples of common use cases.
Importing
~~~~~~~~~
.. code-block:: python

    import wikipediaapi
How To Get Single Page
~~~~~~~~~~~~~~~~~~~~~~
Getting a single page is straightforward: initialize a ``Wikipedia`` object and ask for the page by its name.

To initialize it, you have to provide:

* ``user_agent`` to identify your project. Please follow the recommended `format`_.
* ``language`` to specify the language edition. It has to be one of the `supported languages`_.

.. _format: https://meta.wikimedia.org/wiki/User-Agent_policy
.. _supported languages: http://meta.wikimedia.org/wiki/List_of_Wikipedias

.. code-block:: python

    import wikipediaapi

    wiki_wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')
    page_py = wiki_wiki.page('Python_(programming_language)')
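The linked User-Agent policy recommends a string of the form ``<client name>/<version> (<contact information>)``. A minimal sketch of building such a string is shown here; the helper name ``build_user_agent`` and the version number are hypothetical and not part of ``Wikipedia-API``.

```python
def build_user_agent(project, version, contact):
    """Format '<project>/<version> (<contact>)' for use as a user agent string."""
    return f"{project}/{version} ({contact})"


# Hypothetical project details, used only for illustration.
user_agent = build_user_agent("MyProjectName", "1.0", "merlin@example.com")
print(user_agent)
# MyProjectName/1.0 (merlin@example.com)
```

The resulting string can then be passed as the ``user_agent`` argument of ``wikipediaapi.Wikipedia``.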
How To Check If Wiki Page Exists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To check whether a page exists, use the ``exists`` method.

.. code-block:: python

    page_py = wiki_wiki.page('Python_(programming_language)')
    print("Page - Exists: %s" % page_py.exists())
    # Page - Exists: True

    page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
    print("Page - Exists: %s" % page_missing.exists())
    # Page - Exists: False
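In practice the ``exists`` check usually guards access to other properties. The following is a minimal sketch of that pattern; ``summary_or_none`` is a hypothetical helper, and the ``SimpleNamespace`` objects are stand-ins for real pages so that the example runs without network access.

```python
from types import SimpleNamespace


def summary_or_none(page, max_chars=60):
    """Return the first max_chars of the summary, or None if the page is missing."""
    if not page.exists():
        return None
    return page.summary[:max_chars]


# Stand-ins that mimic only the attributes used above (assumption, not real pages).
existing = SimpleNamespace(exists=lambda: True, summary="Python is a high-level language.")
missing = SimpleNamespace(exists=lambda: False, summary="")

print(summary_or_none(existing, 9))
# Python is
print(summary_or_none(missing))
# None
```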
How To Get Page Summary
~~~~~~~~~~~~~~~~~~~~~~~
The ``WikipediaPage`` class has a ``summary`` property, which returns the description of the Wiki page.

.. code-block:: python

    import wikipediaapi

    wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'en')
    page_py = wiki_wiki.page('Python_(programming_language)')

    print("Page - Title: %s" % page_py.title)
    # Page - Title: Python (programming language)

    print("Page - Summary: %s" % page_py.summary[0:60])
    # Page - Summary: Python is a widely used high-level programming language for
How To Get Page URL
~~~~~~~~~~~~~~~~~~~
``WikipediaPage`` exposes the URL of the page through two properties: ``fullurl`` and ``canonicalurl``.

.. code-block:: python

    print(page_py.fullurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)

    print(page_py.canonicalurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)
How To Get Full Text
~~~~~~~~~~~~~~~~~~~~
To get the full text of a Wikipedia page, use the ``text`` property, which constructs the text of the page
as a concatenation of the summary and the sections with their titles and texts.

.. code-block:: python

    wiki_wiki = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI
    )

    p_wiki = wiki_wiki.page("Test 1")
    print(p_wiki.text)
    # Summary
    # Section 1
    # Text of section 1
    # Section 1.1
    # Text of section 1.1
    # ...

    wiki_html = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.HTML
    )

    p_html = wiki_html.page("Test 1")
    print(p_html.text)
    # <p>Summary</p>
    # <h2>Section 1</h2>
    # <p>Text of section 1</p>
    # <h3>Section 1.1</h3>
    # <p>Text of section 1.1</p>
    # ...
How To Get Page Sections
~~~~~~~~~~~~~~~~~~~~~~~~
To get all top-level sections of a page, use the ``sections`` property. It returns a list of
``WikipediaPageSection`` objects, so you have to use recursion to get all subsections.

.. code-block:: python

    def print_sections(sections, level=0):
        for s in sections:
            print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
            print_sections(s.sections, level + 1)

    print_sections(page_py.sections)
    # *: History - Python was conceived in the late 1980s,
    # *: Features and philosophy - Python is a multi-paradigm programming l
    # *: Syntax and semantics - Python is meant to be an easily readable
    # **: Indentation - Python uses whitespace indentation, rath
    # **: Statements and control flow - Python's statements include (among other
    # **: Expressions - Some Python expressions are similar to l
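The same recursion can return data instead of printing it, which is convenient for building a table of contents. This is a minimal sketch; ``flatten_sections`` and ``_sec`` are hypothetical names, and the ``SimpleNamespace`` objects only mimic the ``title``, ``text`` and ``sections`` attributes used in the example above.

```python
from types import SimpleNamespace


def flatten_sections(sections, level=1):
    """Return (level, title) pairs for all sections in document order."""
    flat = []
    for s in sections:
        flat.append((level, s.title))
        # Recurse into subsections, one level deeper.
        flat.extend(flatten_sections(s.sections, level + 1))
    return flat


def _sec(title, *children):
    """Build a stand-in section (assumption: not a real WikipediaPageSection)."""
    return SimpleNamespace(title=title, text="", sections=list(children))


toc = [
    _sec("History"),
    _sec("Syntax and semantics", _sec("Indentation"), _sec("Expressions")),
]
outline = flatten_sections(toc)
print(outline)
# [(1, 'History'), (1, 'Syntax and semantics'), (2, 'Indentation'), (2, 'Expressions')]
```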
How To Get Page Section By Title
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get the last section of a page with a given title, use the ``section_by_title`` method.
It returns the last ``WikipediaPageSection`` with this title.

.. code-block:: python

    section_history = page_py.section_by_title('History')
    print("%s - %s" % (section_history.title, section_history.text[0:40]))
    # History - Python was conceived in the late 1980s b
How To Get All Page Sections By Title
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get all sections of a page with a given title, use the ``sections_by_title`` method.
It returns all ``WikipediaPageSection`` objects with this title.

.. code-block:: python

    page_1920 = wiki_wiki.page('1920')
    sections_january = page_1920.sections_by_title('January')
    for s in sections_january:
        print("* %s - %s" % (s.title, s.text[0:40]))
    # * January - January 1
    # Polish–Soviet War in 1920: The
    # * January - January 2
    # Isaac Asimov, American author
    # * January - January 1 – Zygmunt Gorazdowski, Polish
How To Get Page In Other Languages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to get translations of a given page, use the ``langlinks`` property. It is a dictionary
where the key is the language code and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_langlinks(page):
        langlinks = page.langlinks
        for k in sorted(langlinks.keys()):
            v = langlinks[k]
            print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

    print_langlinks(page_py)
    # af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
    # als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
    # an: an - Python: https://an.wikipedia.org/wiki/Python
    # ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
    # as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8

    page_py_cs = page_py.langlinks['cs']
    print("Page - Summary: %s" % page_py_cs.summary[0:60])
    # Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk
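Indexing ``langlinks`` directly raises ``KeyError`` when a translation is missing, as with any dictionary. A minimal sketch of choosing the first available language from a preference list follows; ``pick_translation`` is a hypothetical helper, and the dictionary of ``SimpleNamespace`` objects stands in for a real ``langlinks`` mapping.

```python
from types import SimpleNamespace


def pick_translation(langlinks, preferred):
    """Return (code, page) for the first preferred language present, else None."""
    for code in preferred:
        page = langlinks.get(code)
        if page is not None:
            return code, page
    return None


# Stand-in for page.langlinks (assumption: only 'language' and 'title' are mimicked).
langlinks = {
    "cs": SimpleNamespace(language="cs", title="Python"),
    "de": SimpleNamespace(language="de", title="Python (Programmiersprache)"),
}

choice = pick_translation(langlinks, ["fr", "de", "cs"])
print(choice[0], choice[1].title)
# de Python (Programmiersprache)
```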
How To Get Links To Other Pages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to get all links to other wiki pages from a given page, use the ``links`` property.
It is a dictionary where the key is the page title and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_links(page):
        links = page.links
        for title in sorted(links.keys()):
            print("%s: %s" % (title, links[title]))

    print_links(page_py)
    # 3ds Max: 3ds Max (id: ??, ns: 0)
    # ?:: ?: (id: ??, ns: 0)
    # ABC (programming language): ABC (programming language) (id: ??, ns: 0)
    # ALGOL 68: ALGOL 68 (id: ??, ns: 0)
    # Abaqus: Abaqus (id: ??, ns: 0)
    # ...
How To Get Page Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to get all categories to which a page belongs, use the ``categories`` property.
It is a dictionary where the key is the category title and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_categories(page):
        categories = page.categories
        for title in sorted(categories.keys()):
            print("%s: %s" % (title, categories[title]))

    print("Categories")
    print_categories(page_py)
    # Category:All articles containing potentially dated statements: ...
    # Category:All articles with unsourced statements: ...
    # Category:Articles containing potentially dated statements from August 2016: ...
    # Category:Articles containing potentially dated statements from March 2017: ...
    # Category:Articles containing potentially dated statements from September 2017: ...
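Because the keys carry the ``Category:`` prefix, it can be handy to strip it for display. This is a minimal sketch under the assumption that the keys look like the output above; ``category_names`` is a hypothetical helper, and the dictionary is a stand-in for ``page.categories``. ``str.removeprefix`` is available from Python 3.9, the minimum version this package requires.

```python
from types import SimpleNamespace


def category_names(categories):
    """Return sorted category names without the 'Category:' prefix."""
    return sorted(title.removeprefix("Category:") for title in categories)


# Stand-in for page.categories (assumption: values are not inspected here).
categories = {
    "Category:Programming languages": SimpleNamespace(ns=14),
    "Category:Articles with example Python code": SimpleNamespace(ns=14),
}

names = category_names(categories)
print(names)
# ['Articles with example Python code', 'Programming languages']
```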
How To Get All Pages From Category
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To get all pages from a given category, use the ``categorymembers`` property. It returns all members of the given category.
You have to implement recursion and deduplication yourself.

.. code-block:: python

    def print_categorymembers(categorymembers, level=0, max_level=1):
        for c in categorymembers.values():
            print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
            if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
                print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)

    cat = wiki_wiki.page("Category:Physics")
    print("Category members: Category:Physics")
    print_categorymembers(cat.categorymembers)
    # Category members: Category:Physics
    # * Statistical mechanics (ns: 0)
    # * Category:Physical quantities (ns: 14)
    # ** Refractive index (ns: 0)
    # ** Vapor quality (ns: 0)
    # ** Electric susceptibility (ns: 0)
    # ** Specific weight (ns: 0)
    # ** Category:Viscosity (ns: 14)
    # *** Brookfield Engineering (ns: 0)
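The example above recurses but does not deduplicate, so a page that belongs to several subcategories is printed more than once. A minimal sketch of adding deduplication with a ``seen`` set is shown below. ``collect_pages`` and ``_page`` are hypothetical names, the constant ``CATEGORY_NS = 14`` is taken from the ``ns: 14`` output above instead of importing ``wikipediaapi.Namespace.CATEGORY``, and the ``SimpleNamespace`` objects stand in for real pages so that the example runs offline.

```python
from types import SimpleNamespace

CATEGORY_NS = 14  # namespace number of category pages, as in the "ns: 14" output


def collect_pages(categorymembers, max_level=1, level=0, seen=None):
    """Collect titles of non-category pages, visiting each title at most once."""
    if seen is None:
        seen = set()
    titles = []
    for member in categorymembers.values():
        if member.title in seen:
            continue  # already visited through another category
        seen.add(member.title)
        if member.ns == CATEGORY_NS:
            if level < max_level:
                titles.extend(
                    collect_pages(member.categorymembers, max_level, level + 1, seen)
                )
        else:
            titles.append(member.title)
    return titles


def _page(title, ns=0, members=None):
    """Build a stand-in page (assumption: not a real WikipediaPage)."""
    return SimpleNamespace(title=title, ns=ns, categorymembers=members or {})


refractive = _page("Refractive index")
vapor = _page("Vapor quality")
quantities = _page(
    "Category:Physical quantities",
    ns=CATEGORY_NS,
    members={"Refractive index": refractive, "Vapor quality": vapor},
)
physics_members = {
    "Statistical mechanics": _page("Statistical mechanics"),
    "Refractive index": refractive,  # also listed in the subcategory
    "Category:Physical quantities": quantities,
}

pages = collect_pages(physics_members)
print(pages)
# ['Statistical mechanics', 'Refractive index', 'Vapor quality']
```

Tracking titles in ``seen`` also keeps the traversal from looping if two categories contain each other.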
Use Extra API Parameters
~~~~~~~~~~~~~~~~~~~~~~~~
The official API supports many different parameters. You can see them in the `sandbox`_. Not all of these
parameters are supported directly as parameters of the functions. If you want to specify them,
you can pass them as additional parameters in the constructor. For the `info API call`_ you can
specify the parameter ``converttitles``. If you want to sp