wikipedia-api

Python Wrapper for Wikipedia

Rank: #1509; Downloads: 6,503,204 (30 days); Stars: 718; Forks: 86

Description

Wikipedia API
=============

``Wikipedia-API`` is an easy-to-use Python wrapper for `Wikipedias'`_ API. It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.

.. _Wikipedias': https://www.mediawiki.org/wiki/API:Main_page

|github-stars-flat| |cc-coverage| |docs| |version| |pyversions|

Installation
------------

This package requires Python 3.9 or later because it uses ``IntEnum``.

.. code-block:: bash

    pip3 install wikipedia-api


Usage
-----

The goal of ``Wikipedia-API`` is to provide a simple and easy-to-use API for retrieving information from Wikipedia. Below are examples of common use cases.

Importing
~~~~~~~~~

.. code-block:: python

    import wikipediaapi

How To Get Single Page
~~~~~~~~~~~~~~~~~~~~~~

Getting a single page is straightforward. You have to initialize a ``Wikipedia`` object and ask for the page by its name.
To initialize it, you have to provide:

* ``user_agent`` to identify your project. Please follow the recommended `format`_.
* ``language`` to specify the language edition. It has to be one of the `supported languages`_.

.. _format: https://meta.wikimedia.org/wiki/User-Agent_policy
.. _supported languages: https://meta.wikimedia.org/wiki/List_of_Wikipedias

.. code-block:: python

    import wikipediaapi
    wiki_wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')

    page_py = wiki_wiki.page('Python_(programming_language)')
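
The linked policy describes a generic format of the form ``<client name>/<version> (<contact information>)``, optionally followed by library names and versions. As a minimal sketch, a small helper can assemble such a string. The name ``build_user_agent`` and its parameters are hypothetical and are not part of ``Wikipedia-API``.

.. code-block:: python

    def build_user_agent(name, version, contact):
        """Build '<name>/<version> (<contact>)' and reject empty parts.

        Hypothetical helper, not part of Wikipedia-API.
        """
        parts = {"name": name, "version": version, "contact": contact}
        for key, value in parts.items():
            if not value or not value.strip():
                raise ValueError("%s must not be empty" % key)
        return "%s/%s (%s)" % (name.strip(), version.strip(), contact.strip())

The returned string can then be passed as the ``user_agent`` argument when creating the ``Wikipedia`` object.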


How To Check If Wiki Page Exists
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To check whether a page exists, you can use the ``exists`` function.

.. code-block:: python

    page_py = wiki_wiki.page('Python_(programming_language)')
    print("Page - Exists: %s" % page_py.exists())
    # Page - Exists: True

    page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
    print("Page - Exists: %s" % page_missing.exists())
    # Page - Exists: False

How To Get Page Summary
~~~~~~~~~~~~~~~~~~~~~~~

The ``WikipediaPage`` class has a ``summary`` property, which returns the summary text of the wiki page.

.. code-block:: python


    import wikipediaapi
    wiki_wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')
    page_py = wiki_wiki.page('Python_(programming_language)')

    print("Page - Title: %s" % page_py.title)
    # Page - Title: Python (programming language)

    print("Page - Summary: %s" % page_py.summary[0:60])
    # Page - Summary: Python is a widely used high-level programming language for


How To Get Page URL
~~~~~~~~~~~~~~~~~~~

``WikipediaPage`` has two properties that hold the URL of the page: ``fullurl`` and ``canonicalurl``.

.. code-block:: python

    print(page_py.fullurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)

    print(page_py.canonicalurl)
    # https://en.wikipedia.org/wiki/Python_(programming_language)

How To Get Full Text
~~~~~~~~~~~~~~~~~~~~

To get the full text of a Wikipedia page, use the ``text`` property, which constructs the text of the page
as a concatenation of the summary and the sections with their titles and texts.

.. code-block:: python

    wiki_wiki = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.WIKI
    )

    p_wiki = wiki_wiki.page("Test 1")
    print(p_wiki.text)
    # Summary
    # Section 1
    # Text of section 1
    # Section 1.1
    # Text of section 1.1
    # ...


    wiki_html = wikipediaapi.Wikipedia(
        user_agent='MyProjectName (merlin@example.com)',
        language='en',
        extract_format=wikipediaapi.ExtractFormat.HTML
    )
    p_html = wiki_html.page("Test 1")
    print(p_html.text)
    # <p>Summary</p>
    # <h2>Section 1</h2>
    # <p>Text of section 1</p>
    # <h3>Section 1.1</h3>
    # <p>Text of section 1.1</p>
    # ...

How To Get Page Sections
~~~~~~~~~~~~~~~~~~~~~~~~

To get all top-level sections of a page, use the ``sections`` property. It returns a list of
``WikipediaPageSection`` objects, so you have to use recursion to get all subsections.

.. code-block:: python

    def print_sections(sections, level=0):
        for s in sections:
            print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
            print_sections(s.sections, level + 1)


    print_sections(page_py.sections)
    # *: History - Python was conceived in the late 1980s,
    # *: Features and philosophy - Python is a multi-paradigm programming l
    # *: Syntax and semantics - Python is meant to be an easily readable
    # **: Indentation - Python uses whitespace indentation, rath
    # **: Statements and control flow - Python's statements include (among other
    # **: Expressions - Some Python expressions are similar to l
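
If the section tree is needed as data rather than printed, the same recursion can collect it into a flat list. This is only a sketch: ``flatten_sections`` is a hypothetical helper, not part of the library, and it assumes nothing beyond the ``title`` and ``sections`` attributes used above.

.. code-block:: python

    def flatten_sections(sections, level=0):
        """Return a list of (level, title) tuples in document order.

        Hypothetical helper, not part of Wikipedia-API.
        """
        result = []
        for s in sections:
            result.append((level, s.title))
            # Descend into subsections, one level deeper.
            result.extend(flatten_sections(s.sections, level + 1))
        return result

Calling ``flatten_sections(page_py.sections)`` would yield top-level sections at level ``0`` and their subsections at increasing levels.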

How To Get Page Section By Title
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get a section of a page by its title, use the ``section_by_title`` function.
If several sections share the title, it returns the last matching ``WikipediaPageSection``.

.. code-block:: python

    section_history = page_py.section_by_title('History')
    print("%s - %s" % (section_history.title, section_history.text[0:40]))

    # History - Python was conceived in the late 1980s b

How To Get All Page Sections By Title
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get all sections of a page with a given title, use the ``sections_by_title`` function.
It returns all ``WikipediaPageSection`` objects with this title.

.. code-block:: python

    page_1920 = wiki_wiki.page('1920')
    sections_january = page_1920.sections_by_title('January')
    for s in sections_january:
        print("* %s - %s" % (s.title, s.text[0:40]))

    # * January - January 1
    # Polish–Soviet War in 1920: The
    # * January - January 2
    # Isaac Asimov, American author
    # * January - January 1 – Zygmunt Gorazdowski, Polish

How To Get Page In Other Languages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to get a given page in other languages, use the ``langlinks`` property. It is a map
where the key is the language code and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_langlinks(page):
        langlinks = page.langlinks
        for k in sorted(langlinks.keys()):
            v = langlinks[k]
            print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

    print_langlinks(page_py)
    # af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
    # als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
    # an: an - Python: https://an.wikipedia.org/wiki/Python
    # ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
    # as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8

    page_py_cs = page_py.langlinks['cs']
    print("Page - Summary: %s" % page_py_cs.summary[0:60])
    # Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk

How To Get Links To Other Pages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to get all links from a given page to other wiki pages, use the ``links`` property.
It is a map where the key is the page title and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_links(page):
        links = page.links
        for title in sorted(links.keys()):
            print("%s: %s" % (title, links[title]))

    print_links(page_py)
    # 3ds Max: 3ds Max (id: ??, ns: 0)
    # ?:: ?: (id: ??, ns: 0)
    # ABC (programming language): ABC (programming language) (id: ??, ns: 0)
    # ALGOL 68: ALGOL 68 (id: ??, ns: 0)
    # Abaqus: Abaqus (id: ??, ns: 0)
    # ...

How To Get Page Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to get all categories to which a page belongs, use the ``categories`` property.
It is a map where the key is the category title and the value is a ``WikipediaPage``.

.. code-block:: python

    def print_categories(page):
        categories = page.categories
        for title in sorted(categories.keys()):
            print("%s: %s" % (title, categories[title]))


    print("Categories")
    print_categories(page_py)
    # Category:All articles containing potentially dated statements: ...
    # Category:All articles with unsourced statements: ...
    # Category:Articles containing potentially dated statements from August 2016: ...
    # Category:Articles containing potentially dated statements from March 2017: ...
    # Category:Articles containing potentially dated statements from September 2017: ...

How To Get All Pages From Category
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get all pages from a given category, use the ``categorymembers`` property. It returns all members of the given category.
You have to implement recursion and deduplication yourself.

.. code-block:: python

    def print_categorymembers(categorymembers, level=0, max_level=1):
        for c in categorymembers.values():
            print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
            if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
                print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)


    cat = wiki_wiki.page("Category:Physics")
    print("Category members: Category:Physics")
    print_categorymembers(cat.categorymembers)

    # Category members: Category:Physics
    # * Statistical mechanics (ns: 0)
    # * Category:Physical quantities (ns: 14)
    # ** Refractive index (ns: 0)
    # ** Vapor quality (ns: 0)
    # ** Electric susceptibility (ns: 0)
    # ** Specific weight (ns: 0)
    # ** Category:Viscosity (ns: 14)
    # *** Brookfield Engineering (ns: 0)
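
Because a page or subcategory can be reachable through more than one category, a recursive walk may visit the same title several times. The following is a minimal sketch of one way to add deduplication. The helper ``collect_categorymembers`` is hypothetical and not part of the library. It relies only on the ``title``, ``ns`` and ``categorymembers`` attributes used above, and it assumes that the category namespace is ``14``, as in the output above.

.. code-block:: python

    CATEGORY_NS = 14  # assumption: same value as wikipediaapi.Namespace.CATEGORY


    def collect_categorymembers(categorymembers, max_level=1, level=0, seen=None):
        """Return {title: page} for all members, visiting each title only once.

        Hypothetical helper, not part of Wikipedia-API.
        """
        if seen is None:
            seen = {}
        for page in categorymembers.values():
            if page.title in seen:
                continue  # already visited through another category
            seen[page.title] = page
            if page.ns == CATEGORY_NS and level < max_level:
                collect_categorymembers(page.categorymembers, max_level, level + 1, seen)
        return seen

It could be called as ``collect_categorymembers(cat.categorymembers, max_level=1)``. One limitation of this sketch: a subcategory first reached at the depth limit is not expanded again if it is reached later through a shorter path.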

Use Extra API Parameters
~~~~~~~~~~~~~~~~~~~~~~~~

The official API supports many different parameters. You can see them in the `sandbox`_. Not all of these
parameters are supported directly as function parameters. If you want to specify them,
you can pass them as additional parameters in the constructor. For the `info API call`_ you can
specify parameter `converttitles`. If you want to sp