Usage
Cache instances
deche is built on top of the excellent fsspec library, which means it can be used with any backend supported by fsspec, such as memory, local or s3, among many others.
Simply pass the protocol name as fs_protocol when creating a Cache:
from deche import Cache
# `prefix` is a required argument; "/" is an assumed value for the root path of cached entries
cache = Cache(fs_protocol="memory", prefix="/")
Using deche
Let’s create some simple functions that simulate some work, and wrap them in our cache decorator:
import time
@cache
def get(x):
    time.sleep(1)
    return x
@cache
def inc(y):
    time.sleep(1)
    return y + 1
Try it out!
%time inc(1)
CPU times: user 1.98 ms, sys: 356 µs, total: 2.33 ms
Wall time: 1 s
2
%time inc(1)
CPU times: user 90 µs, sys: 16 µs, total: 106 µs
Wall time: 108 µs
2
inc.list_cached_data()
['645076b4c840c53438b6eec928fee62ea2e2f700b62bcf2efb030766341c5113']
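The second call is fast because the result is looked up rather than recomputed. The sketch below shows that idea in memory only, using a hypothetical memoize decorator and a plain dict. It is not deche's implementation (deche persists entries through fsspec), and it accepts keyword arguments only, to keep the key construction obvious.

```python
import functools
import time


def memoize(func):
    """Hypothetical decorator: cache return values in a dict keyed by the call's kwargs."""
    store = {}

    @functools.wraps(func)
    def wrapper(**kwargs):
        # Sort the items so the key does not depend on keyword order
        key = tuple(sorted(kwargs.items()))
        if key not in store:
            store[key] = func(**kwargs)  # only runs on a cache miss
        return store[key]

    wrapper.store = store  # exposed so the cached keys can be inspected
    return wrapper


@memoize
def inc(y):
    time.sleep(0.1)  # simulate work
    return y + 1


start = time.perf_counter()
first = inc(y=1)
first_elapsed = time.perf_counter() - start

start = time.perf_counter()
second = inc(y=1)  # served from the dict, no sleep
second_elapsed = time.perf_counter() - start
```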
What’s happening under the hood?
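deche computes a hash key from the fully qualified kwargs of your function: positional arguments are mapped to their parameter names and omitted defaults are filled in, so equivalent calls produce the same key (as the three tokens below show). When the function is called, it saves the kwargs and the return value (the actual object, or the exception!) to a location specified by the Cache instance.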
@cache
def my_func(a, b=1):
    return a + b
print(my_func.tokenize(a=1))
print(my_func.tokenize(a=1, b=1))
print(my_func.tokenize(b=1, a=1))
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
2a8dafae61fc6a537258ca1909b5db314e584b8596cca0c3a9d0c0624e81caa6
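Here is a minimal sketch of how such a key could be derived with the standard library, assuming the argument values have a stable repr(). inspect.signature(...).bind maps positional arguments to their names, apply_defaults fills in omitted defaults, and the sorted result is hashed with SHA-256. The tokenize function here is hypothetical and will not reproduce deche's actual tokens.

```python
import hashlib
import inspect


def tokenize(func, *args, **kwargs):
    """Hypothetical: hash the fully qualified kwargs of a call to func."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # e.g. my_func(a=1) becomes {"a": 1, "b": 1}
    # Sort by parameter name so keyword order does not matter
    canonical = repr(sorted(bound.arguments.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()


def my_func(a, b=1):
    return a + b


t1 = tokenize(my_func, a=1)
t2 = tokenize(my_func, a=1, b=1)
t3 = tokenize(my_func, b=1, a=1)
t4 = tokenize(my_func, 1)  # positional call resolves to the same kwargs
```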
Features
Return values (including exceptions) are cached
@cache
def func(n):
    return n
@cache
def divide_by(n):
    return 1/n
func.has_data(kwargs={"n": 5})
False
func(n=5)
5
func.has_data(kwargs={"n": 5})
True
divide_by(1)
1.0
divide_by(0)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
/tmp/ipykernel_1020930/2118990485.py in <module>
----> 1 divide_by(0)
~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
327 logger.debug(f"Function {func} raised {e}")
328 self.write_output(path=f"{path}/{key}{Extensions.exception}", output=e)
--> 329 raise e
330
331 return output
~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
321 self.write_input(path=f"{path}/{key}", inputs=inputs)
322 logger.debug(f"Calling {func}")
--> 323 output = func(*args, **kwargs)
324 logger.debug(f"Function {func} ran successfully")
325 self.write_output(path=f"{path}/{key}", output=output)
/tmp/ipykernel_1020930/121188569.py in divide_by(n)
5 @cache
6 def divide_by(n):
----> 7 return 1/n
ZeroDivisionError: division by zero
divide_by.has_exception(kwargs={"n": 0})
True
When a function has raised an exception, it will not be called again. Instead, just as for cached data, the exception is loaded from the cache and re-raised (notice that the traceback shows self._load being called rather than the function itself):
divide_by(0)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
/tmp/ipykernel_1020930/2118990485.py in <module>
----> 1 divide_by(0)
~/projects/deche/deche/core.py in wrapper(*args, **kwargs)
317 return self._load(func=func)(key=key)
318 elif self._exists(func=func, ext=Extensions.exception)(key=key):
--> 319 raise self._load(func=func, ext=Extensions.exception)(key=key)
320 try:
321 self.write_input(path=f"{path}/{key}", inputs=inputs)
ZeroDivisionError: division by zero
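The same idea extends to exceptions. The hypothetical memoize_with_exceptions decorator below stores either the return value or the raised exception, and on later calls returns the value or re-raises the stored exception without calling the function again. This is an in-memory sketch only. Judging by the traceback, deche instead writes exceptions to the cache filesystem under a separate extension.

```python
import functools


def memoize_with_exceptions(func):
    """Hypothetical: cache results *and* exceptions, keyed by sorted kwargs."""
    results, errors = {}, {}
    calls = []  # records real invocations, to show when func actually runs

    @functools.wraps(func)
    def wrapper(**kwargs):
        key = tuple(sorted(kwargs.items()))
        if key in results:
            return results[key]
        if key in errors:
            raise errors[key]  # replay the stored exception without calling func
        calls.append(key)
        try:
            results[key] = func(**kwargs)
        except Exception as e:
            errors[key] = e  # remember the failure, then propagate it
            raise
        return results[key]

    wrapper.calls = calls
    return wrapper


@memoize_with_exceptions
def divide_by(n):
    return 1 / n
```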
View cached kwargs/data/exceptions via .list_cached_*
func.list_cached_data()
['034b4d98f849295d44a5231fb156299fbdb5d8131186c9766ec2d125ec34eecb']
divide_by.list_cached_exceptions()
['4a19b9cb964e5c15171025cb0a657a269d5e20087c41fc7b967c3928d602e14a']
Retrieve values from the cache via key= or kwargs=
divide_by.load_cached_inputs(key="4a19b9cb964e5c15171025cb0a657a269d5e20087c41fc7b967c3928d602e14a")
frozendict.frozendict({'n': 0})
divide_by.load_cached_exception(kwargs={'n': 0})
ZeroDivisionError('division by zero')
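Looking up an entry by key= or by kwargs= amounts to the same thing once the kwargs have been turned into a key. The hypothetical in-memory Store class below sketches that dual lookup. Its method names only loosely echo the ones above, and it is not deche's code.

```python
import hashlib
from typing import Any, Dict, Optional


class Store:
    """Hypothetical in-memory store; entries are addressable by key or by kwargs."""

    def __init__(self) -> None:
        self.inputs: Dict[str, dict] = {}
        self.data: Dict[str, Any] = {}

    @staticmethod
    def tokenize(kwargs: dict) -> str:
        # Sorted items make the token independent of keyword order
        return hashlib.sha256(repr(sorted(kwargs.items())).encode()).hexdigest()

    def save(self, kwargs: dict, value: Any) -> str:
        key = self.tokenize(kwargs)
        self.inputs[key] = dict(kwargs)
        self.data[key] = value
        return key

    def list_data(self) -> list:
        return list(self.data)

    def _resolve(self, key: Optional[str], kwargs: Optional[dict]) -> str:
        # Exactly one of key= / kwargs= must be given; kwargs are turned into a key
        if (key is None) == (kwargs is None):
            raise ValueError("pass exactly one of key= or kwargs=")
        return key if key is not None else self.tokenize(kwargs)

    def load_inputs(self, key: Optional[str] = None, kwargs: Optional[dict] = None) -> dict:
        return self.inputs[self._resolve(key, kwargs)]

    def load_data(self, key: Optional[str] = None, kwargs: Optional[dict] = None) -> Any:
        return self.data[self._resolve(key, kwargs)]


store = Store()
key = store.save({"n": 5}, 5)
```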
Easily create multiple instances of Cache via replace
Having different configurations for different functions is simple: just create a new instance via replace. Typical usage is to create one “base” cache instance and call replace on it for the actual functions:
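from deche import Cache
# `prefix` is a required argument; "/" is an assumed value for the root path of cached entries
cache = Cache(fs_protocol="memory", prefix="/")
@cache.replace(non_hashable_kwargs=("a",))
def myfunc(a=1):
    return a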
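The name replace mirrors dataclasses.replace from the standard library, which returns a copy of a frozen dataclass with some fields changed and leaves the original untouched. The sketch below assumes Cache behaves in a similar way; that is an assumption, not something shown above. It uses a hypothetical CacheConfig class to show the base-plus-variants pattern.

```python
import dataclasses
import datetime
from typing import Optional, Tuple


@dataclasses.dataclass(frozen=True)
class CacheConfig:
    """Hypothetical stand-in for a Cache's configuration."""
    fs_protocol: str
    prefix: str
    non_hashable_kwargs: Tuple[str, ...] = ()
    cache_ttl: Optional[datetime.timedelta] = None


base = CacheConfig(fs_protocol="memory", prefix="/")
# Derive per-function variants; each replace() returns a new object, base is unchanged
ignore_a = dataclasses.replace(base, non_hashable_kwargs=("a",))
weekly = dataclasses.replace(base, cache_ttl=datetime.timedelta(days=7))
```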
Cache entries can be valid for a certain period of time via cache_ttl
import datetime
@cache.replace(cache_ttl=datetime.timedelta(days=7))
def myfunc(a=1):
    return a
myfunc()
print(myfunc.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
# 1.1 s is well within the 7-day TTL, so the cached entry is reused and no new key is created
time.sleep(1.1)
myfunc()
print(myfunc.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
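Conceptually, an entry is reused only while its age is below cache_ttl. The hypothetical is_expired helper below shows that check. The current time is passed in explicitly, so the check can be tested without sleeping. This sketch does not cover how deche obtains an entry's write time.

```python
import datetime
from typing import Optional


def is_expired(written_at: datetime.datetime,
               ttl: Optional[datetime.timedelta],
               now: datetime.datetime) -> bool:
    """Hypothetical: True once an entry written at `written_at` is older than `ttl`."""
    if ttl is None:
        return False  # no TTL configured: entries never expire
    return now - written_at > ttl


written = datetime.datetime(2024, 1, 1, 12, 0, 0)
week = datetime.timedelta(days=7)
```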
Cache entries can overwrite or append via cache_expiry_mode
NB: reading appended data is not gracefully supported yet.
On occasion, you may want to cache something for some period using cache_ttl, but keep caching new versions of that data periodically, for example saving a webpage once a week.
For this purpose you can use cache_expiry_mode=CacheExpiryMode.APPEND, which creates a new key with an incrementing numeric suffix (such as -1) each time the function is rerun after the TTL period has elapsed.
import datetime
from deche import CacheExpiryMode
@cache.replace(cache_ttl=datetime.timedelta(seconds=1), cache_expiry_mode=CacheExpiryMode.APPEND)
def append_func(a=1):
    return a
append_func()
print(append_func.list_cached_data())
['eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a']
time.sleep(1.1)
append_func()
cache.fs.glob("**append_func**/*")
['/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a',
'/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a-1',
'/__main__.append_func/eb15d4f9ea9af826de550a47179c491f84b8f6028c3de97ca43df6de79287d2a.inputs']
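The -1 suffix in the listing suggests that each rerun after the TTL appends an incrementing counter to the base key. The hypothetical helper below computes the next key under that assumption. The listing above shows only -1, so the numbering beyond that is assumed.

```python
import re
from typing import Iterable


def next_append_key(base: str, existing: Iterable[str]) -> str:
    """Hypothetical: return the next free key of the form base, base-1, base-2, ..."""
    pattern = re.compile(re.escape(base) + r"(?:-(\d+))?")
    counters = []
    for name in existing:
        m = pattern.fullmatch(name)  # ignores e.g. "<base>.inputs"
        if m:
            # The bare base key counts as version 0
            counters.append(int(m.group(1)) if m.group(1) else 0)
    if not counters:
        return base
    return f"{base}-{max(counters) + 1}"
```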
Persist kwargs and return values in any format via input_serializer/input_deserializer, output_serializer/output_deserializer
The default serializer for deche is pickle, which works well with stable environments (pickled objects can fail to load once the libraries that defined them change), but eventually it will make more sense to persist data in a better format, such as JSON for simple objects or Parquet for pandas DataFrames.
This can be accomplished via the input_serializer/input_deserializer and output_serializer/output_deserializer kwargs.
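from io import BytesIO
import numpy as np
import pandas as pd
def serialize(df: pd.DataFrame) -> bytes:
    # Write the DataFrame to an in-memory Parquet file (requires pyarrow or fastparquet)
    buff = BytesIO()
    df.to_parquet(buff)
    return buff.getvalue()
def deserialize(raw: bytes) -> pd.DataFrame:
    # Read the Parquet bytes back into a DataFrame
    return pd.read_parquet(BytesIO(raw))
@cache.replace(output_serializer=serialize, output_deserializer=deserialize)
def make_dataframe():
    return pd.DataFrame({'a': np.random.randn(10), 'b': np.random.randn(10)})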
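For functions returning simple objects, the JSON option mentioned above could be a pair of functions with the same bytes-in/bytes-out shape as the Parquet ones. The names json_serialize and json_deserialize are hypothetical. Passing sort_keys=True makes equal dicts serialize to identical bytes.

```python
import json
from typing import Any


def json_serialize(obj: Any) -> bytes:
    # Encode to UTF-8 bytes; sort_keys gives the same bytes for equal dicts
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def json_deserialize(raw: bytes) -> Any:
    return json.loads(raw.decode("utf-8"))
```

These could then be passed as cache.replace(output_serializer=json_serialize, output_deserializer=json_deserialize). JSON round-trips only basic types: tuples come back as lists, and non-string dict keys become strings.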
Ignore certain kwargs via non_hashable_kwargs
If some of your arguments are non-hashable objects, or do not affect the return value of the function, they can be excluded from the hash computation via non_hashable_kwargs.
Be careful not to do this for arguments that do determine the result, as collisions will occur (a collision is when two calls that should give different results share the same key/token, so their results overwrite or stand in for each other).
@cache.replace(non_hashable_kwargs=("a",))
def func(a=1, b=1):
    print(f"Running func with {b=}")
    return a + b
func(a=1, b=2)
Running func with b=2
3
# `a` is ignored: changing it will not rerun the function while `b` is the same, so the cached 3 is returned (a + b would be 4)
func(a=2, b=2)
3
# Changing `b` will however trigger a recomputation
func(a=1, b=5)
Running func with b=5
6
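Ignoring an argument just means dropping it before hashing. The hypothetical token_ignoring helper below (not deche's implementation) shows why func(a=2, b=2) above returned the stale 3: once a is removed, both calls produce the same token. It also shows that an ignored argument can be an object that could not be hashed or encoded at all.

```python
import hashlib
import json


def token_ignoring(kwargs: dict, non_hashable: tuple = ()) -> str:
    """Hypothetical: hash kwargs after removing the names in `non_hashable`."""
    kept = {k: v for k, v in kwargs.items() if k not in non_hashable}
    # JSON with sorted keys gives a canonical string for simple values
    payload = json.dumps(kept, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```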
Cache class methods alongside attributes
This is experimental and not well tested; please raise any issues on GitHub.
Methods defined on a class can also be cached. By default they ignore the instance self, since the hash of that object differs between instances, but selected instance attributes can be included in the key via cls_attrs:
class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    @cache.replace(cls_attrs=("a",))
    def func(self):
        print(f"Running func with {self.a=}")
        return self.a + self.b
cls1 = MyClass(a=1, b=2)
cls1.func()
Running func with self.a=1
3
# New class instance, but `a` is the same, so the token is the same and the cached 3 is returned (a + b would be 6)
cls2 = MyClass(a=1, b=5)
cls2.func()
3
# New class instance, `a` has changed, so the method is rerun
cls3 = MyClass(a=2, b=5)
cls3.func()
Running func with self.a=2
7
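The cls_attrs behaviour can be pictured as building the key from selected attributes of self instead of from self itself. The hypothetical method_token helper below sketches that idea. It is not deche's implementation.

```python
import hashlib


def method_token(obj, cls_attrs, **kwargs) -> str:
    """Hypothetical: key a method call on chosen instance attributes plus its kwargs."""
    # Read only the listed attributes; `self` as a whole is never hashed
    attrs = {name: getattr(obj, name) for name in cls_attrs}
    payload = repr((sorted(attrs.items()), sorted(kwargs.items())))
    return hashlib.sha256(payload.encode()).hexdigest()


class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b


t1 = method_token(MyClass(a=1, b=2), ("a",))
t2 = method_token(MyClass(a=1, b=5), ("a",))
t3 = method_token(MyClass(a=2, b=5), ("a",))
```

Under this model, listing both attributes (cls_attrs=("a", "b")) should give cls2 its own entry, since the attribute that changed would then be part of the key.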