DiskCache: Disk Backed Cache
============================

`DiskCache`_ is an Apache2-licensed, disk- and file-backed cache library, written
in pure Python and compatible with Django.

The cloud-based computing of 2023 puts a premium on memory. Gigabytes of empty
space are left on disks as processes vie for memory. Among these processes is
Memcached (and sometimes Redis), which is used as a cache. Wouldn't it be nice
to leverage empty disk space for caching?

Django is Python's most popular web framework and ships with several caching
backends. Unfortunately, the file-based cache in Django is essentially
broken. The culling method is random, and large caches repeatedly scan a cache
directory, which slows linearly with growth. Can you really allow it to take
sixty milliseconds to store a key in a cache with a thousand items?

In Python, we can do better. And we can do it in pure-Python!

::

   In [1]: import pylibmc
   In [2]: client = pylibmc.Client(['127.0.0.1'], binary=True)
   In [3]: client[b'key'] = b'value'
   In [4]: %timeit client[b'key']

   10000 loops, best of 3: 25.4 µs per loop

   In [5]: import diskcache as dc
   In [6]: cache = dc.Cache('tmp')
   In [7]: cache[b'key'] = b'value'
   In [8]: %timeit cache[b'key']

   100000 loops, best of 3: 11.8 µs per loop

**Note:** Micro-benchmarks have their place but are not a substitute for real
measurements. DiskCache offers cache benchmarks to defend its performance
claims. Micro-optimizations are avoided but your mileage may vary.

DiskCache efficiently makes gigabytes of storage space available for
caching. By leveraging rock-solid database libraries and memory-mapped files,
cache performance can match and exceed industry-standard solutions. There's no
need for a C compiler or for running another process. Performance is a feature,
and testing has 100% coverage with unit tests and hours of stress testing.

Testimonials
------------

`Daren Hasenkamp`_, Founder --

    "It's a useful, simple API, just like I love about Redis. It has reduced
    the amount of queries hitting my Elasticsearch cluster by over 25% for a
    website that gets over a million users/day (100+ hits/second)."

`Mathias Petermann`_, Senior Linux System Engineer --

    "I implemented it into a wrapper for our Ansible lookup modules and we were
    able to speed up some Ansible runs by almost 3 times. DiskCache is saving
    us a ton of time."

Does your company or website use `DiskCache`_? Send us a `message
<contact@grantjenks.com>`_ and let us know.

.. _`Daren Hasenkamp`: https://www.linkedin.com/in/daren-hasenkamp-93006438/
.. _`Mathias Petermann`: https://www.linkedin.com/in/mathias-petermann-a8aa273b/

Features
--------

- Pure-Python
- Fully Documented
- Benchmark comparisons (alternatives, Django cache backends)
- 100% test coverage
- Hours of stress testing
- Performance matters
- Django compatible API
- Thread-safe and process-safe
- Supports multiple eviction policies (LRU and LFU included)
- Keys support "tag" metadata and eviction
- Developed on Python 3.10
- Tested on CPython 3.6, 3.7, 3.8, 3.9, 3.10
- Tested on Linux, Mac OS X, and Windows
- Tested using GitHub Actions

.. image:: https://github.com/grantjenks/python-diskcache/workflows/integration/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Aintegration

.. image:: https://github.com/grantjenks/python-diskcache/workflows/release/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Arelease

Quickstart
----------

Installing `DiskCache`_ is simple with `pip <http://www.pip-installer.org/>`_::

  $ pip install diskcache

You can access documentation in the interpreter with Python's built-in help
function::

  >>> import diskcache
  >>> help(diskcache)                             # doctest: +SKIP

The core of `DiskCache`_ is three data types intended for caching. `Cache`_
objects manage a SQLite database and filesystem directory to store key and
value pairs. `FanoutCache`_ provides a sharding layer to utilize multiple
caches and `DjangoCache`_ integrates that with `Django`_::

  >>> from diskcache import Cache, FanoutCache, DjangoCache
  >>> help(Cache)                                 # doctest: +SKIP
  >>> help(FanoutCache)                           # doctest: +SKIP
  >>> help(DjangoCache)                           # doctest: +SKIP

Built atop the caching data types are `Deque`_ and `Index`_, which work as
cross-process, persistent replacements for Python's ``collections.deque`` and
``dict``. These implement the sequence and mapping container base classes::

  >>> from diskcache import Deque, Index
  >>> help(Deque)                                 # doctest: +SKIP
  >>> help(Index)                                 # doctest: +SKIP

Finally, a number of `recipes`_ for cross-process synchronization are provided
using an underlying cache. Features like memoization with cache stampede
prevention, cross-process locking, and cross-process throttling are available::

  >>> from diskcache import memoize_stampede, Lock, throttle
  >>> help(memoize_stampede)                      # doctest: +SKIP
  >>> help(Lock)                                  # doctest: +SKIP
  >>> help(throttle)                              # doctest: +SKIP

Python's docstrings are a quick way to get started but are not intended as a
replacement for the `DiskCache Tutorial`_ and `DiskCache API Reference`_.

.. _`Cache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#cache
.. _`FanoutCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#fanoutcache
.. _`DjangoCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#djangocache
.. _`Django`: https://www.djangoproject.com/
.. _`Deque`: http://www.grantjenks.com/docs/diskcache/tutorial.html#deque
.. _`Index`: http://www.grantjenks.com/docs/diskcache/tutorial.html#index
.. _`recipes`: http://www.grantjenks.com/docs/diskcache/tutorial.html#recipes

User Guide
----------

For those wanting more details, this part of the documentation covers the
tutorial, benchmarks, API, and development.

* `DiskCache Tutorial`_
* `DiskCache Cache Benchmarks`_
* `DiskCache DjangoCache Benchmarks`_
* `Case Study: Web Crawler`_
* `Case Study: Landing Page Caching`_
* `Talk: All Things Cached - SF Python 2017 Meetup`_
* `DiskCache API Reference`_
* `DiskCache Development`_

.. _`DiskCache Tutorial`: http://www.grantjenks.com/docs/diskcache/tutorial.html
.. _`DiskCache Cache Benchmarks`: http://www.grantjenks.com/docs/diskcache/cache-benchmarks.html
.. _`DiskCache DjangoCache Benchmarks`: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html
.. _`Talk: All Things Cached - SF Python 2017 Meetup`: http://www.grantjenks.com/docs/diskcache/sf-python-2017-meetup-talk.html
.. _`Case Study: Web Crawler`: http://www.grantjenks.com/docs/diskcache/case-study-web-crawler.html
.. _`Case Study: Landing Page Caching`: http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html
.. _`DiskCache API Reference`: http://www.grantjenks.com/docs/diskcache/api.html
.. _`DiskCache Development`: http://www.grantjenks.com/docs/diskcache/development.html

Comparisons
-----------

This section compares `DiskCache`_ with popular related projects.

Key-Value Stores
................

`DiskCache`_ is mostly a simple key-value store. Feature comparisons with four
other projects are shown in the tables below.

* `dbm`_ is part of Python's standard library and implements a generic
  interface to variants of the DBM database — dbm.gnu or dbm.ndbm. If none of
  these modules is installed, the slow-but-simple dbm.dumb is used.
* `shelve`_ is part of Python's standard library and implements a “shelf” as a
  persistent, dictionary-like object. The difference with “dbm” databases is
  that the values can be anything that the pickle module can handle.
* `sqlitedict`_ is a lightweight wrapper around Python's sqlite3 database with
  a simple, Pythonic dict-like interface and support for multi-thread
  access. Keys are arbitrary strings, values arbitrary pickle-able objects.
* `pickleDB`_ is a lightweight and simple key-value store. It is built upon
  Python's simplejson module and was inspired by Redis. It is licensed with the
  BSD three-clause license.

.. _`dbm`: https://docs.python.org/3/library/dbm.html
.. _`shelve`: https://docs.python.org/3/library/shelve.html
.. _`sqlitedict`: https://github.com/RaRe-Technologies/sqlitedict
.. _`pickleDB`: https://pythonhosted.org/pickleDB/
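
The difference between the two standard library modules described above can
be seen in a few lines. This sketch uses only ``dbm`` and ``shelve``. The file
names and stored values are arbitrary.

```python
import dbm
import os
import shelve
import tempfile

directory = tempfile.mkdtemp()   # hypothetical location for this sketch

# dbm stores bytes: string keys and values are encoded on the way in,
# and bytes come back out.
with dbm.open(os.path.join(directory, 'plain'), 'c') as database:
    database['answer'] = '42'
    raw = database['answer']

# shelve pickles values, so structured objects survive a round trip.
with shelve.open(os.path.join(directory, 'shelf')) as shelf:
    shelf['answer'] = {'value': 42, 'tags': ['a', 'b']}
    restored = shelf['answer']
```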

**Features**

================ ============= ========= ========= ============ ============
Feature          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ ============
Atomic?          Always        Maybe     Maybe     Maybe        No
Persistent?      Yes           Yes       Yes       Yes          Yes
Thread-safe?     Yes           No        No        Yes          No
Process-safe?    Yes           No        No        Maybe        No
Backend?         SQLite        DBM       DBM       SQLite       File
Serialization?   Customizable  None      Pickle    Customizable JSON
Data Types?      Mapping/Deque Mapping   Mapping   Mapping      Mapping
Ordering?        Insert/Sorted None      None      None         None
Eviction?        LRU/LFU/more  None      None      None         None
Vacuum?          Automatic     Maybe     Maybe     Manual       Automatic
Transactions?    Yes           No        No        Maybe        No
Multiprocessing? Yes           No        No        No           No
Forkable?        Yes           No        No        No           No
Metadata?        Yes           No        No        No           No
================ ============= ========= ========= ============ ============

**Quality**

================ ============= ========= ========= ============ ============
Project          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ =======