Usage#

Cache instances#

deche is built on top of the excellent fsspec library, which means it can be used with any backend supported by fsspec, such as memory, local and s3, among many others.

Simply pass the fsspec protocol you want to use, together with a prefix under which cached entries will be stored:

from deche import Cache

cache = Cache(fs_protocol="memory", prefix="")  # prefix: base path for cached entries; left empty here so they sit at the root of the in-memory filesystem
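The same constructor works against any other fsspec backend. For example, a cache backed by the local filesystem only differs in the protocol (a sketch; the prefix directory below is purely illustrative):

local_cache = Cache(fs_protocol="file", prefix="/tmp/deche-cache")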

Using deche#

Let’s create a couple of simple functions that simulate some work, and wrap them in our cache decorator:

import time

@cache
def get(x):
    time.sleep(1)
    return x

@cache
def inc(y):
    time.sleep(1)
    return y + 1

Try it out!

%time inc(1)
CPU times: user 1.98 ms, sys: 356 µs, total: 2.33 ms
Wall time: 1 s
2
%time inc(1)
CPU times: user 90 µs, sys: 16 µs, total: 106 µs
Wall time: 108 µs
2
inc.list_cached_data()
['645076b4c840c53438b6eec928fee62ea2e2f700b62bcf2efb030766341c5113']

What’s happening under the hood?#

deche computes a hash key from the fully qualified kwargs to your function, and when the function is called, it saves both the kwargs and the return value (the actual object, or the exception if one was raised!) to a location specified by the Cache instance.

@cache
def my_func(a, b=1):
    return a + b

print(my_func.tokenize(a=1))
print(my_func.tokenize(a=1, b=1))
print(my_func.tokenize(b=1, a=1))
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
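Once the function has been called, the token becomes a file name in the cache’s filesystem: one file holds the return value and a companion .inputs file holds the kwargs. A quick sketch against the in-memory cache above (the exact paths depend on your prefix):

my_func(a=1)
cache.fs.glob("**my_func**/*")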

Features#

Return values (including exceptions) are cached#

@cache
def func(n):
    return n

@cache
def divide_by(n):
    return 1/n
func.has_data(kwargs={"n": 5})
False
func(n=5)
5
func.has_data(kwargs={"n": 5})
True
divide_by(1)
1.0
divide_by(0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/tmp/ipykernel_1020930/2118990485.py in <module>
----> 1 divide_by(0)

~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
    327                     logger.debug(f"Function {func} raised {e}")
    328                     self.write_output(path=f"{path}/{key}{Extensions.exception}", output=e)
--> 329                     raise e
    330 
    331                 return output

~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
    321                     self.write_input(path=f"{path}/{key}", inputs=inputs)
    322                     logger.debug(f"Calling {func}")
--> 323                     output = func(*args, **kwargs)
    324                     logger.debug(f"Function {func} ran successfully")
    325                     self.write_output(path=f"{path}/{key}", output=output)

/tmp/ipykernel_1020930/121188569.py in divide_by(n)
      5 @cache
      6 def divide_by(n):
----> 7     return 1/n

ZeroDivisionError: division by zero
divide_by.has_exception(kwargs={"n": 0})
True

When a function has already raised an exception it will not be called again; instead, just like for cached data, the exception is loaded from the cache and re-raised (notice the load in the traceback below rather than the function being called)

divide_by(0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
/tmp/ipykernel_1020930/2118990485.py in <module>
----> 1 divide_by(0)

~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
    317                     return self._load(func=func)(key=key)
    318                 elif self._exists(func=func, ext=Extensions.exception)(key=key):
--> 319                     raise self._load(func=func, ext=Extensions.exception)(key=key)
    320                 try:
    321                     self.write_input(path=f"{path}/{key}", inputs=inputs)

ZeroDivisionError: division by zero

View cached kwargs/data/exceptions via .list_cached_*#

func.list_cached_data()
['034b4d98f849295d44a5231fb156299fbdb5d8131186c9766ec2d125ec34eecb']
divide_by.list_cached_exceptions()
['4a19b9cb964e5c15171025cb0a657a269d5e20087c41fc7b967c3928d602e14a']

Retrieve values from the cache via key= or kwargs=#

divide_by.load_cached_inputs(key="4a19b9cb964e5c15171025cb0a657a269d5e20087c41fc7b967c3928d602e14a")
frozendict.frozendict({'n': 0})
divide_by.load_cached_exception(kwargs={'n': 0})
ZeroDivisionError('division by zero')

Easily create multiple instances of Cache via replace#

Having different config for different functions is simple: just create a new instance via replace. Typical usage is to create one “base” cache instance and use replace for the actual functions:

from deche import Cache

cache = Cache(fs_protocol="memory", prefix="")

@cache.replace(non_hashable_kwargs=("a",))
def myfunc(a=1):
    return a

Cache entries can be valid for a certain period of time via cache_ttl#

Entries older than cache_ttl are treated as expired and the function is rerun on the next call; within the TTL window the cached value is returned as usual:

import datetime


@cache.replace(cache_ttl=datetime.timedelta(days=7))
def myfunc(a=1):
    return a
myfunc()
print(myfunc.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
time.sleep(1.1)
myfunc()
print(myfunc.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
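With a much shorter TTL the expiry becomes visible. A sketch (the function name and the print are only there to make the recomputation observable; assuming the default, non-append expiry mode, the recomputed value is written back under the same token):

import datetime
import time


@cache.replace(cache_ttl=datetime.timedelta(seconds=1))
def short_lived(a=1):
    print("recomputing")
    return a

short_lived()   # first call: runs the function and caches the result
short_lived()   # within the TTL: served from the cache, nothing printed
time.sleep(1.1)
short_lived()   # the entry has expired, so the function runs again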

Cache entries can overwrite or append via cache_expiry_mode#

NB: reading appended data back is not gracefully supported yet

On occasion, you may want to cache something for some period using cache_ttl, but keep caching new versions of that data periodically, for example saving a webpage once a week.

For this purpose you can use cache_expiry_mode=CacheExpiryMode.APPEND, which will simply create a new key with an incrementing suffix each time the function is rerun after the TTL period.

import datetime
from deche import CacheExpiryMode

@cache.replace(cache_ttl=datetime.timedelta(seconds=1), cache_expiry_mode=CacheExpiryMode.APPEND)
def append_func(a=1):
    return a
append_func()
print(append_func.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
time.sleep(1.1)
append_func()
cache.fs.glob("**append_func**/*")
['/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a',
 '/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a-1',
 '/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a.inputs']
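Until appended data is supported more gracefully, older versions can be read back by hand. A sketch, assuming the default pickle output serializer (i.e. each data file is plain pickle bytes):

import pickle

# skip the .inputs companion files and unpickle each appended version
paths = [p for p in cache.fs.glob("**append_func**/*") if not p.endswith(".inputs")]
versions = [pickle.loads(cache.fs.cat(p)) for p in paths]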

Persist kwargs and return values in any format via input_serializer/input_deserializer, output_serializer/output_deserializer#

The default serializer for deche is pickle, which works well in simple cases with stable environments, but eventually it can make more sense to persist data in a better format, such as JSON for simple objects or Parquet for pandas DataFrames.

This can be accomplished via the input_serializer/input_deserializer and output_serializer/output_deserializer kwargs.


from io import BytesIO

import numpy as np
import pandas as pd


def serialize(df: pd.DataFrame) -> bytes:
    buff = BytesIO()
    df.to_parquet(buff)
    return buff.getvalue()


def deserialize(raw: bytes) -> pd.DataFrame:
    return pd.read_parquet(BytesIO(raw))


@cache.replace(output_serializer=serialize, output_deserializer=deserialize)
def make_dataframe():
    return pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})
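Input serializers work the same way. A hedged sketch, assuming the input serializer receives the kwargs mapping and must return bytes, mirroring the output serializer above:

import json


def serialize_inputs(kwargs) -> bytes:
    # kwargs arrives as a mapping; sort keys so the payload is stable
    return json.dumps(dict(kwargs), sort_keys=True).encode()


def deserialize_inputs(raw: bytes) -> dict:
    return json.loads(raw.decode())


@cache.replace(input_serializer=serialize_inputs, input_deserializer=deserialize_inputs)
def add(a=1, b=2):
    return a + b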

Ignore certain kwargs via non_hashable_kwargs#

If some of your arguments are non-hashable objects, or arguments that do not determine the return value of the function, they can be ignored when computing the hash via non_hashable_kwargs.

Be careful not to do this if they do determine the result, as collisions will occur (a collision is where two different results share the same key/token and overwrite each other).

@cache.replace(non_hashable_kwargs=("a",))
def func(a=1, b=1):
    print(f"Running func with {b=}")
    return a + b

func(a=1, b=2)
Running func with b=2
3
# `a` is ignored, changing it will not rerun the function while `b` is the same (and therefore cached)
func(a=2, b=2)
3
# Changing `b` will however trigger a recomputation 
func(a=1, b=5)
Running func with b=5
6
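You can confirm that the ignored kwarg plays no part in the key by tokenizing the calls directly (a quick sketch, assuming tokenize is available on this wrapped function just as it was on my_func above):

# identical tokens, because `a` is excluded from the hash
print(func.tokenize(a=1, b=2))
print(func.tokenize(a=2, b=2))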

Cache class methods alongside attributes#

This is experimental and not well tested, so please raise any issues on GitHub

Class methods can also be cached. By default they ignore the class instance self, since the hash of this object changes between instances, but selected instance attributes can be included in the hash via cls_attrs:

class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @cache.replace(cls_attrs=("a",))
    def func(self):
        print(f"Running func with {self.a=}")
        return self.a + self.b
    

cls1 = MyClass(a=1, b=2)
cls1.func()
Running func with self.a=1
3
# New class instance, but same token
cls2 = MyClass(a=1, b=5)
cls2.func()
3
# New class instance, `a` has changed so method is rerun 
cls3 = MyClass(a=2, b=5)
cls3.func()
Running func with self.a=2
7