diskcache
Disk Cache -- Disk and file backed persistent cache.
DiskCache: Disk Backed Cache
============================
`DiskCache`_ is an Apache2 licensed disk and file backed cache library, written
in pure-Python, and compatible with Django.
The cloud-based computing of 2023 puts a premium on memory. Gigabytes of empty
space are left on disks as processes vie for memory. Among these processes is
Memcached (and sometimes Redis) which is used as a cache. Wouldn't it be nice
to leverage empty disk space for caching?
Django is Python's most popular web framework and ships with several caching
backends. Unfortunately, the file-based cache in Django is essentially
broken. The culling method is random, and large caches repeatedly scan the
cache directory, so operations slow linearly as the cache grows. Can you really
allow it to take sixty milliseconds to store a key in a cache with a thousand
items?
In Python, we can do better. And we can do it in pure-Python!

::

    In [1]: import pylibmc
    In [2]: client = pylibmc.Client(['127.0.0.1'], binary=True)
    In [3]: client[b'key'] = b'value'
    In [4]: %timeit client[b'key']

    10000 loops, best of 3: 25.4 µs per loop

    In [5]: import diskcache as dc
    In [6]: cache = dc.Cache('tmp')
    In [7]: cache[b'key'] = b'value'
    In [8]: %timeit cache[b'key']

    100000 loops, best of 3: 11.8 µs per loop

**Note:** Micro-benchmarks have their place but are not a substitute for real
measurements. DiskCache offers cache benchmarks to defend its performance
claims. Micro-optimizations are avoided but your mileage may vary.
DiskCache efficiently makes gigabytes of storage space available for
caching. By leveraging rock-solid database libraries and memory-mapped files,
cache performance can match and exceed industry-standard solutions. There's no
need for a C compiler or a separate server process. Performance is a feature,
and the test suite has 100% coverage, backed by unit tests and hours of stress
testing.
Testimonials
------------
`Daren Hasenkamp`_, Founder --

    "It's a useful, simple API, just like I love about Redis. It has reduced
    the amount of queries hitting my Elasticsearch cluster by over 25% for a
    website that gets over a million users/day (100+ hits/second)."

`Mathias Petermann`_, Senior Linux System Engineer --

    "I implemented it into a wrapper for our Ansible lookup modules and we were
    able to speed up some Ansible runs by almost 3 times. DiskCache is saving
    us a ton of time."

Does your company or website use `DiskCache`_? Send us a `message
<contact@grantjenks.com>`_ and let us know.
.. _`Daren Hasenkamp`: https://www.linkedin.com/in/daren-hasenkamp-93006438/
.. _`Mathias Petermann`: https://www.linkedin.com/in/mathias-petermann-a8aa273b/
Features
--------
- Pure-Python
- Fully Documented
- Benchmark comparisons (alternatives, Django cache backends)
- 100% test coverage
- Hours of stress testing
- Performance matters
- Django compatible API
- Thread-safe and process-safe
- Supports multiple eviction policies (LRU and LFU included)
- Keys support "tag" metadata and eviction
- Developed on Python 3.10
- Tested on CPython 3.6, 3.7, 3.8, 3.9, 3.10
- Tested on Linux, Mac OS X, and Windows
- Tested using GitHub Actions
.. image:: https://github.com/grantjenks/python-diskcache/workflows/integration/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Aintegration

.. image:: https://github.com/grantjenks/python-diskcache/workflows/release/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Arelease

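
Two of the features listed above, eviction policies and tag metadata, can be
sketched briefly. This is only an illustration: the scratch directory, keys,
tag names, and size limit are assumptions made for the example, and the
tutorial remains the reference for the available settings:

.. code-block:: python

    import shutil
    import tempfile

    from diskcache import Cache

    # Hypothetical scratch directory used only for this example.
    directory = tempfile.mkdtemp()

    # Settings such as eviction_policy and size_limit are passed as keywords.
    cache = Cache(
        directory,
        eviction_policy='least-recently-used',
        size_limit=100 * 2 ** 20,  # illustrative 100 MiB cap
    )

    # Each item may carry a tag when it is stored.
    cache.set('user:1', 'alice', tag='users')
    cache.set('user:2', 'bob', tag='users')
    cache.set('page:/', '<html>', tag='pages')

    # tag=True returns the stored tag alongside the value.
    value, tag = cache.get('page:/', tag=True)

    # evict() removes every item with the given tag and returns the count.
    removed = cache.evict('users')
    remaining = list(cache)  # iterating a cache yields its keys

    cache.close()
    shutil.rmtree(directory, ignore_errors=True)

Tags make it possible to drop a whole group of related keys in one call
without tracking the individual keys elsewhere.
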
Quickstart
----------
Installing `DiskCache`_ is simple with `pip <http://www.pip-installer.org/>`_::

    $ pip install diskcache

You can access documentation in the interpreter with Python's built-in help
function::

    >>> import diskcache
    >>> help(diskcache)  # doctest: +SKIP

The core of `DiskCache`_ is three data types intended for caching. `Cache`_
objects manage a SQLite database and filesystem directory to store key and
value pairs. `FanoutCache`_ provides a sharding layer to utilize multiple
caches and `DjangoCache`_ integrates that with `Django`_::

    >>> from diskcache import Cache, FanoutCache, DjangoCache
    >>> help(Cache)  # doctest: +SKIP
    >>> help(FanoutCache)  # doctest: +SKIP
    >>> help(DjangoCache)  # doctest: +SKIP

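
As a minimal sketch of how the first two types might be used, the example
below stores and reads a few values. The directory names, keys, values, and
shard count are illustrative assumptions rather than recommendations:

.. code-block:: python

    import os
    import tempfile

    from diskcache import Cache, FanoutCache

    # A throwaway directory keeps the example from leaving files behind.
    with tempfile.TemporaryDirectory() as tmp:
        # Cache keeps key/value pairs in a SQLite database in its directory.
        with Cache(os.path.join(tmp, 'cache')) as cache:
            cache['answer'] = 42                    # dict-style assignment
            cache.set('session', 'abc', expire=60)  # expires after 60 seconds
            answer = cache['answer']
            missing = cache.get('no-such-key', default='fallback')

        # FanoutCache spreads keys over several shards (4 in this sketch).
        with FanoutCache(os.path.join(tmp, 'fanout'), shards=4) as fanout:
            fanout.set('answer', 42)
            sharded_answer = fanout.get('answer')

Both objects are used as context managers here so that their database
connections are closed before the temporary directory is removed.
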
Built atop the caching data types are `Deque`_ and `Index`_, which work as
cross-process, persistent replacements for Python's ``collections.deque`` and
``dict``. These implement the sequence and mapping container base classes::

    >>> from diskcache import Deque, Index
    >>> help(Deque)  # doctest: +SKIP
    >>> help(Index)  # doctest: +SKIP

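
A short sketch of the two container types follows. The scratch directory and
the stored values are assumptions made for illustration only:

.. code-block:: python

    import os
    import shutil
    import tempfile

    from diskcache import Deque, Index

    root = tempfile.mkdtemp()  # hypothetical scratch directory

    # Deque mirrors collections.deque but keeps its items on disk.
    deque_dir = os.path.join(root, 'deque')
    queue = Deque([1, 2, 3], directory=deque_dir)
    queue.appendleft(0)
    queue.append(4)
    first = queue.popleft()

    # A second object opened on the same directory sees the same items,
    # which is what makes the structure usable across processes.
    same_queue = Deque(directory=deque_dir)
    items = list(same_queue)

    # Index behaves like a dict; the first argument is its directory.
    index = Index(os.path.join(root, 'index'), {'a': 1})
    index['b'] = 2
    pairs = list(index.items())

    shutil.rmtree(root, ignore_errors=True)
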
Finally, a number of `recipes`_ for cross-process synchronization are provided
using an underlying cache. Features like memoization with cache stampede
prevention, cross-process locking, and cross-process throttling are available::

    >>> from diskcache import memoize_stampede, Lock, throttle
    >>> help(memoize_stampede)  # doctest: +SKIP
    >>> help(Lock)  # doctest: +SKIP
    >>> help(throttle)  # doctest: +SKIP

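
The recipes might be combined as in the sketch below. The function names,
key names, expiry, and rate limit are hypothetical values chosen for the
example, and everything runs in a single process for brevity:

.. code-block:: python

    import shutil
    import tempfile

    from diskcache import Cache, Lock, memoize_stampede, throttle

    directory = tempfile.mkdtemp()  # hypothetical scratch directory
    cache = Cache(directory)

    calls = []  # records how often the expensive function really runs


    # An explicit ``name`` keeps the cache key independent of the module name.
    @memoize_stampede(cache, expire=60, name='slow_square')
    def slow_square(number):
        calls.append(number)
        return number * number


    first = slow_square(3)   # computed and stored
    second = slow_square(3)  # served from the cache


    # Allow at most two calls per second for everything sharing this cache.
    @throttle(cache, count=2, seconds=1, name='ping')
    def ping():
        return 'pong'


    reply = ping()  # the first call is not delayed

    # Lock is a context manager; the key names the critical section.
    lock = Lock(cache, 'report-lock')
    with lock:
        held = lock.locked()
    released = not lock.locked()

    cache.close()
    shutil.rmtree(directory, ignore_errors=True)

Because all three recipes keep their state in the cache, separate processes
that open the same directory coordinate through it.
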
Python's docstrings are a quick way to get started but are not intended as a
replacement for the `DiskCache Tutorial`_ and `DiskCache API Reference`_.
.. _`Cache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#cache
.. _`FanoutCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#fanoutcache
.. _`DjangoCache`: http://www.grantjenks.com/docs/diskcache/tutorial.html#djangocache
.. _`Django`: https://www.djangoproject.com/
.. _`Deque`: http://www.grantjenks.com/docs/diskcache/tutorial.html#deque
.. _`Index`: http://www.grantjenks.com/docs/diskcache/tutorial.html#index
.. _`recipes`: http://www.grantjenks.com/docs/diskcache/tutorial.html#recipes
User Guide
----------
For those wanting more details, this part of the documentation covers the
tutorial, benchmarks, API reference, and development.
* `DiskCache Tutorial`_
* `DiskCache Cache Benchmarks`_
* `DiskCache DjangoCache Benchmarks`_
* `Case Study: Web Crawler`_
* `Case Study: Landing Page Caching`_
* `Talk: All Things Cached - SF Python 2017 Meetup`_
* `DiskCache API Reference`_
* `DiskCache Development`_
.. _`DiskCache Tutorial`: http://www.grantjenks.com/docs/diskcache/tutorial.html
.. _`DiskCache Cache Benchmarks`: http://www.grantjenks.com/docs/diskcache/cache-benchmarks.html
.. _`DiskCache DjangoCache Benchmarks`: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html
.. _`Talk: All Things Cached - SF Python 2017 Meetup`: http://www.grantjenks.com/docs/diskcache/sf-python-2017-meetup-talk.html
.. _`Case Study: Web Crawler`: http://www.grantjenks.com/docs/diskcache/case-study-web-crawler.html
.. _`Case Study: Landing Page Caching`: http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html
.. _`DiskCache API Reference`: http://www.grantjenks.com/docs/diskcache/api.html
.. _`DiskCache Development`: http://www.grantjenks.com/docs/diskcache/development.html
Comparisons
-----------
Comparisons to popular projects related to `DiskCache`_.
Key-Value Stores
................
`DiskCache`_ is mostly a simple key-value store. Feature comparisons with four
other projects are shown in the tables below.
* `dbm`_ is part of Python's standard library and implements a generic
  interface to variants of the DBM database: ``dbm.gnu`` or ``dbm.ndbm``. If
  none of these modules is installed, the slow-but-simple ``dbm.dumb`` is used.
* `shelve`_ is part of Python's standard library and implements a “shelf” as a
  persistent, dictionary-like object. The difference with “dbm” databases is
  that the values can be anything that the pickle module can handle.
* `sqlitedict`_ is a lightweight wrapper around Python's sqlite3 database with
  a simple, Pythonic dict-like interface and support for multi-thread
  access. Keys are arbitrary strings, values arbitrary pickle-able objects.
* `pickleDB`_ is a lightweight and simple key-value store. It is built upon
  Python's simplejson module and was inspired by Redis. It is licensed with the
  BSD three-clause license.

.. _`dbm`: https://docs.python.org/3/library/dbm.html
.. _`shelve`: https://docs.python.org/3/library/shelve.html
.. _`sqlitedict`: https://github.com/RaRe-Technologies/sqlitedict
.. _`pickleDB`: https://pythonhosted.org/pickleDB/
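
The difference between the two standard-library options described above can
be seen in a few lines. This sketch uses only ``dbm`` and ``shelve``; the file
names and stored values are assumptions made for the example:

.. code-block:: python

    import dbm
    import os
    import shelve
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # dbm stores strings or bytes and hands values back as bytes.
        with dbm.open(os.path.join(tmp, 'plain'), 'c') as db:
            db['count'] = '3'
            raw = db['count']

        # shelve pickles values, so arbitrary Python objects round-trip.
        with shelve.open(os.path.join(tmp, 'shelf')) as shelf:
            shelf['items'] = [1, 2, 3]
            items = shelf['items']

With ``dbm`` the caller is responsible for encoding and decoding values,
whereas ``shelve`` handles serialization through pickle.
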
**Features**

================ ============= ========= ========= ============ ============
Feature          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ ============
Atomic?          Always        Maybe     Maybe     Maybe        No
Persistent?      Yes           Yes       Yes       Yes          Yes
Thread-safe?     Yes           No        No        Yes          No
Process-safe?    Yes           No        No        Maybe        No
Backend?         SQLite        DBM       DBM       SQLite       File
Serialization?   Customizable  None      Pickle    Customizable JSON
Data Types?      Mapping/Deque Mapping   Mapping   Mapping      Mapping
Ordering?        Insert/Sorted None      None      None         None
Eviction?        LRU/LFU/more  None      None      None         None
Vacuum?          Automatic     Maybe     Maybe     Manual       Automatic
Transactions?    Yes           No        No        Maybe        No
Multiprocessing? Yes           No        No        No           No
Forkable?        Yes           No        No        No           No
Metadata?        Yes           No        No        No           No
================ ============= ========= ========= ============ ============

**Quality**
================ ============= ========= ========= ============ ============
Project          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ =======