
Cache

Summary

Add cache support to Databend Query so that users can load hot data from cache services instead of reading it from the backend storage services again.

Motivation

Databend's design is based on the separation of storage and compute.

As a result, Databend supports different cloud storage services as its backend, such as AWS S3, Azure Blob Storage, and Google Cloud Storage. The advantage is very high throughput bandwidth; the disadvantage is higher latency for individual requests.

Adding a caching layer will improve the overall latency and reduce the number of unnecessary requests, thus reducing costs.

Databend used to have a local disk-based cache in common-cache. That implementation used databend-query's local path to cache blocks.

Guide-level explanation

After introducing the cache, the Databend storage layer will be organized as described below.

The query will maintain three OpenDAL operators:

  • Persist Operator: Read and write persistent data from/to storage services directly.
  • Cache Operator: Read and write data via transparent caching.
  • Temporary Operator: Read and write temporary data from/to cache services.

The storage services and cache services mentioned here are both OpenDAL-supported services. The difference is that a cache service's backend may have its own GC or background eviction logic, which means cache services are non-persistent, a.k.a. volatile.

The query should never try to write persisted data to the Cache Operator or the Temporary Operator.

For End Users

Users can specify different cache services for caching or for storing temporary data. For example, they can cache data on the local filesystem (the same as the current behavior):

```toml
[cache]
type = "fs"

[cache.fs]
data_path = "/var/cache/databend/"
```

Or they can cache data in Redis:

```toml
[cache]
type = "redis"

[cache.redis]
endpoint = "192.168.1.2"
password = "PASSWORD"
```

Temporary data may be shared across the whole cluster, so users must use a distributed cache service instead of the local filesystem or memory:

```toml
[temporary]
type = "opencache"

[temporary.opencache]
endpoints = ["192.168.1.2"]
```

For Developers

As discussed before, developers now have three kinds of Operator to handle different workloads (a brief usage sketch follows the list):

  • Persist: Read and write data directly without caching logic, similar to O_DIRECT.
  • Cache: Read and write via transparent caching, similar to the kernel's page cache.
  • Temporary: Read and write temporary data to cache services, similar to Linux's tmpfs.
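
For illustration, a query node might choose an operator based on the durability of the data involved. The following is only a rough sketch under assumptions: the `object(...).read()/.write()` calls mirror the OpenDAL API of the time, and the function names and error types are placeholders rather than part of this RFC.

```rust
use opendal::Operator;

// Sketch only: picking an operator by data durability. The
// `object(...).read()/.write()` calls and the error type are assumptions
// about the OpenDAL version in use, not part of this RFC.
async fn write_persist(persist: &Operator, path: &str, data: Vec<u8>) -> std::io::Result<()> {
    // Persisted table data bypasses the cache and goes straight to storage.
    persist.object(path).write(data).await?;
    Ok(())
}

async fn read_hot(cache: &Operator, path: &str) -> std::io::Result<Vec<u8>> {
    // Hot reads go through the Cache Operator (transparent caching).
    cache.object(path).read().await
}
```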

Reference-level explanation

Cache Operator

The Cache Operator will be implemented as an OpenDAL Layer:

```rust
#[derive(Clone, Default, Debug)]
pub struct CacheLayer {
    // Operator for persist data (via `storage` config)
    persist: Operator,
    // Operator for caching data (via `cache` config)
    cache: Operator,
    // Client for KVApi
    kv: Arc<dyn KVApi>,
}

impl Layer for CacheLayer {}
impl Accessor for CacheLayer {}
```

databend-query will initialize the cache layer along with the persist operator:

```rust
pub fn get_cache_operator(&self) -> Operator {
    self.get_storage_operator().layer(CacheLayer::new())
}
```

The detailed implementation of CacheLayer will not be discussed in this RFC.
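
For intuition only, the read path of such a layer could follow a read-through pattern like the sketch below. This is not the actual CacheLayer implementation: it ignores the KVApi client, simplifies error handling, and assumes the `object(...).read()/.write()` calls of the OpenDAL version in use.

```rust
use opendal::Operator;

// Read-through sketch (not the real CacheLayer): serve from the cache when
// possible, otherwise read from persist storage and fill the cache.
async fn read_through(persist: &Operator, cache: &Operator, path: &str) -> std::io::Result<Vec<u8>> {
    // Serve directly from the cache service on a hit.
    if let Ok(bs) = cache.object(path).read().await {
        return Ok(bs);
    }
    // Cache miss: fall back to the persist storage service.
    let bs = persist.object(path).read().await?;
    // Best-effort cache fill; the cache backend may evict this at any time.
    let _ = cache.object(path).write(bs.clone()).await;
    Ok(bs)
}
```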

Temporary Operator

The Temporary Operator will connect to a distributed cache service that shares the same view across the whole cluster. The query can write temporary intermediate results here to share them with other nodes.
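
For example, one node could spill an intermediate result and another node could fetch it by path. The sketch below is illustrative only; the path layout and the `object(...)` calls are assumptions, not part of this RFC.

```rust
use opendal::Operator;

// Sketch: sharing an intermediate result across the cluster through the
// Temporary Operator. The path layout here is hypothetical.
async fn spill_intermediate(temp: &Operator, query_id: &str, data: Vec<u8>) -> std::io::Result<String> {
    let path = format!("_query_temp/{}/part_0", query_id);
    temp.object(&path).write(data).await?;
    Ok(path)
}

async fn fetch_intermediate(temp: &Operator, path: &str) -> std::io::Result<Vec<u8>> {
    // Every node in the cluster sees the same view of the temporary service.
    temp.object(path).read().await
}
```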

Most of the work should be done on the cache service side. We will not discuss it in this RFC.

Drawbacks

None.

Rationale and alternatives

None.

Prior art

None.

Unresolved questions

None.

Future possibilities

OpenCache

OpenCache is ongoing work in the Databend community to build a distributed cache service. OpenDAL will implement native support for OpenCache once its API is stable. After that, users can deploy and use OpenCache as cache or temporary storage.