Datasources

Datasources are DipDup connectors to various APIs. The table below shows how different datasources can be used.

Index datasource is the one used by DipDup internally to process specific index (set with datasource: ... in config). Currently, it can be only tzkt. Datasources available in context can be accessed in handlers and hooks via ctx.get_<kind>_datasource() methods and used to perform arbitrary requests. Finally, standalone services implement a subset of DipDup datasources and config directives. You can't use services-specific datasources like tezos-node in the main framework, they are here for informational purposes only.

	index	context	`mempool` service	`metadata` service
`tzkt`	✴	✅	✴	✴
`tezos-node`	❌	❌	✴	❌
`coinbase`	❌	✅	❌	❌
`metadata`	❌	✅	❌	❌
`ipfs`	❌	✅	❌	❌
`http`	❌	✅	❌	❌

✴ required ✅ supported ❌ not supported

TzKT

TzKT provides REST endpoints to query historical data and SignalR (Websocket) subscriptions to get realtime updates. Flexible filters allow you to request only data needed for your application and drastically speed up the indexing process.

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io

The number of items in each request can be configured with batch_size directive. Affects request number and memory usage.

datasources:
  tzkt_mainnet:
    http:
      ...
      batch_size: 10000

The rest HTTP tunables are the same as for other datasources.

Also, you can wait for several block confirmations before processing the operations:

datasources:
  tzkt_mainnet:
    ...
    buffer_size: 1  # indexing with a single block lag

Since 6.0 chain reorgs are processed automatically, but you may find this feature useful for other cases.

Tezos node

Tezos RPC is a standard interface provided by the Tezos node. This datasource is used solely by mempool and metadata standalone services; you can't use it in regular DipDup indexes.

datasources:
  tezos_node_mainnet:
    kind: tezos-node
    url: https://mainnet-tezos.giganode.io

Coinbase

A connector for Coinbase Pro API. Provides get_candles and get_oracle_data methods. It may be useful in enriching indexes of DeFi contracts with off-chain data.

datasources:
  coinbase:
    kind: coinbase

Please note that Coinbase can't replace TzKT being an index datasource. But you can access it via ctx.datasources mapping both within handler and job callbacks.

DipDup Metadata

dipdup-metadata is a standalone companion indexer for DipDup written in Go. Configure datasource in the following way:

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet | ithacanet

Then, in your hook or handler code:

datasource = ctx.get_metadata_datasource('metadata')
token_metadata = await datasource.get_token_metadata('KT1...', '0')

IPFS

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup has a separate datasource to perform such requests via public nodes.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

HTTP (generic)

If you need to perform arbitrary requests to APIs not supported by DipDup, use generic HTTP datasource instead of plain aiohttp requests. That way you can use the same features DipDup uses for internal requests: retry with backoff, rate limiting, Prometheus integration etc.

datasources:
  my_api:
    kind: http
    url: https://my_api.local/v1

api = ctx.get_http_datasource('my_api')
response = await api.request(
    method='get',
    url='hello',  # relative to URL in config
    weigth=1,  # ratelimiter leaky-bucket drops
    params={
      'foo': 'bar',
    },
)

All DipDup datasources are inherited from http, so you can send arbitrary requests with any datasource. Let's say you want to fetch the protocol of the chain you're currently indexing (tzkt datasource doesn't have a separate method for it):

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, retry with backoff, etc.) are applied on every request.