⚰️ DipDup 6.5 branch is no longer supported. Please follow the Migration Guide to update to the latest version.


        ____   _         ____              
       / __ \ (_)____   / __ \ __  __ ____ 
      / / / // // __ \ / / / // / / // __ \
     / /_/ // // /_/ // /_/ // /_/ // /_/ /
    /_____//_// .___//_____/ \__,_// .___/ 
             /_/                  /_/      

DipDup is a Python framework for building smart contract indexers. It helps developers focus on business logic instead of writing boilerplate code to store and serve data. DipDup-based indexers are selective, which means only the required data is requested. This approach allows for faster indexing and reduces the load on underlying APIs.

This project is maintained by the Baking Bad team.
Development is supported by the Tezos Foundation.

Thanks

Sponsors

The decentralized web requires decentralized funding. The following people and organizations help keep the project sustainable.

Want your project to be listed here? We have nice perks for sponsors! Visit our GitHub Sponsors page.

Contributors

We are grateful to all the people who help us with the project.

If we forgot to mention you, or you want to update your record, please open an issue or pull request.

Quickstart

This page will guide you through the steps to get your first selective indexer up and running in a few minutes without getting too deep into the details.

Let's create an indexer for the tzBTC FA1.2 token contract. Our goal is to save all token transfers to the database and then calculate some statistics of its holders' activity.

A modern Linux/macOS distribution with Python 3.10 installed is required to run DipDup.

Create a new project

You can initialize a hello-world project interactively by choosing configuration options in the terminal. The following command will install DipDup for the current user:

curl -Lsf https://dipdup.io/install_lts.py | python

Now, let's create a new project:

dipdup new

Follow the instructions; the project will be created in the current directory. You can skip reading the rest of this page and slap dipdup run instead.

From scratch

Currently, we mainly use Poetry for dependency management in DipDup. If you prefer hatch, PDM, pip-tools, or others, use them instead. Below are some snippets to get you started.

# Create a new project directory
mkdir dipdup-indexer; cd dipdup-indexer

# Plain pip
python -m venv .venv
. .venv/bin/activate
pip install dipdup

# or Poetry
poetry init --python ">=3.10,<3.11"
poetry add dipdup
poetry shell

Write a configuration file

DipDup configuration is stored in YAML files of a specific format. Create a new file named dipdup.yml in your current working directory with the following content:

spec_version: 1.2
package: demo_token

database:
  kind: sqlite
  path: demo-token.sqlite3

contracts:
  tzbtc_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io

indexes:
  tzbtc_holders_mainnet:
    template: tzbtc_holders
    values:
      contract: tzbtc_mainnet
      datasource: tzkt_mainnet

templates:
  tzbtc_holders:
    kind: operation
    datasource: <datasource>
    contracts:
      - <contract>
    handlers:
      - callback: on_transfer
        pattern:
          - destination: <contract>
            entrypoint: transfer
      - callback: on_mint
        pattern:
          - destination: <contract>
            entrypoint: mint

Initialize project tree

Now it's time to generate typeclasses and callback stubs. Run the following command:

dipdup init

DipDup will create a Python package demo_token having the following structure:

demo_token
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py

That's a lot of files and directories! But don't worry, we will only need the models.py file and the handlers modules in this guide.

Define data models

Our schema will consist of a single model, Holder, with several fields:

  • address — account address
  • balance — in tzBTC
  • turnover — total transfer/mint volume (absolute sum of balance updates)
  • tx_count — number of transfers/mints
  • last_seen — time of the last transfer/mint

Put the following content in the models.py file:

from tortoise import fields

from dipdup.models import Model


class Holder(Model):
    address = fields.CharField(max_length=36, pk=True)
    balance = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    turnover = fields.DecimalField(decimal_places=8, max_digits=20, default=0)
    tx_count = fields.BigIntField(default=0)
    last_seen = fields.DatetimeField(null=True)

Implement handlers

Everything's ready to implement the actual indexer logic.

Our task is to index all the balance updates, so we'll start with a helper method to handle them. Create a file named on_balance_update.py in the handlers package with the following content:

from datetime import datetime
from decimal import Decimal

import demo_token.models as models


async def on_balance_update(
    address: str,
    balance_update: Decimal,
    timestamp: datetime,
) -> None:
    holder, _ = await models.Holder.get_or_create(address=address)
    holder.balance += balance_update
    holder.turnover += abs(balance_update)
    holder.tx_count += 1
    holder.last_seen = timestamp
    await holder.save()

Three methods of the tzBTC contract can alter token balances: transfer, mint, and burn. The last one is omitted in this tutorial for simplicity. Edit the corresponding handlers to call the on_balance_update method with data from matched operations:

on_transfer.py

from decimal import Decimal

from demo_token.handlers.on_balance_update import on_balance_update
from demo_token.types.tzbtc.parameter.transfer import TransferParameter
from demo_token.types.tzbtc.storage import TzbtcStorage
from dipdup.context import HandlerContext
from dipdup.models import Transaction


async def on_transfer(
    ctx: HandlerContext,
    transfer: Transaction[TransferParameter, TzbtcStorage],
) -> None:
    if transfer.parameter.from_ == transfer.parameter.to:
        # NOTE: Internal tzBTC transfer
        return

    amount = Decimal(transfer.parameter.value) / (10**8)
    await on_balance_update(
        address=transfer.parameter.from_,
        balance_update=-amount,
        timestamp=transfer.data.timestamp,
    )
    await on_balance_update(
        address=transfer.parameter.to,
        balance_update=amount,
        timestamp=transfer.data.timestamp,
    )

on_mint.py

from decimal import Decimal

from demo_token.handlers.on_balance_update import on_balance_update
from demo_token.types.tzbtc.parameter.mint import MintParameter
from demo_token.types.tzbtc.storage import TzbtcStorage
from dipdup.context import HandlerContext
from dipdup.models import Transaction


async def on_mint(
    ctx: HandlerContext,
    mint: Transaction[MintParameter, TzbtcStorage],
) -> None:
    amount = Decimal(mint.parameter.value) / (10**8)
    await on_balance_update(
        address=mint.parameter.to,
        balance_update=amount,
        timestamp=mint.data.timestamp,
    )

And that's all! We can run the indexer now.

Run your indexer

dipdup run

DipDup will fetch all the historical data and then switch to realtime updates. Your application data has been successfully indexed!

Getting started

This part of the docs covers the same features as the Quickstart article but in more detail.

Installation

This page covers the installation of DipDup in different environments.

Host requirements

A Linux/macOS environment with Python 3.10 installed is required to use DipDup. Other UNIX-like systems should work but are not officially supported.

Minimum hardware requirements are 256 MB RAM, 1 CPU core, and some disk space for the database. RAM requirements increase with the number of indexes.

Non-UNIX environments

Windows is not officially supported, but there's a possibility everything will work fine. In case of issues, drop us a message and use WSL or Docker.

We aim to improve cross-platform compatibility in future releases (issue).

Local installation

The following command will install DipDup for the current user:

curl -Lsf https://dipdup.io/install_lts.py | python

This script uses pipx under the hood to install dipdup and datamodel-codegen as CLI tools. Then you can use any package manager of your choice to manage versions of DipDup and other project dependencies.

Manually

Currently, we mainly use Poetry for dependency management in DipDup. If you prefer hatch, PDM, pip-tools, or others, use them instead. Below are some snippets to get you started.

# Create a new project directory
mkdir dipdup-indexer; cd dipdup-indexer

# Plain pip
python -m venv .venv
. .venv/bin/activate
pip install dipdup

# or Poetry
poetry init --python ">=3.10,<3.11"
poetry add dipdup
poetry shell

Docker

See 6.2. Running in Docker page.

Core concepts

Big picture

Initially, DipDup was heavily inspired by The Graph Protocol, but there are several differences. The most important one is that DipDup indexers are completely off-chain.

DipDup utilizes a microservice approach and relies heavily on existing solutions, making the SDK very lightweight and allowing developers to switch API engines on demand.

DipDup works with operation groups (an explicit operation and all its internal operations, i.e., a single contract call) and big_map updates (lazy hash map structures) until fully-fledged protocol-level events are implemented in Tezos.

Consider DipDup a set of best practices for building custom backends for decentralized applications, plus a toolkit that spares you from writing boilerplate code.

DipDup is tightly coupled with the TzKT API but can generally use any data provider that implements a particular feature set. TzKT provides REST endpoints and WebSocket subscriptions with flexible filters, enabling selective indexing, and returns "humanified" contract data, which means you don't have to handle raw Michelson expressions.

DipDup offers a PostgreSQL + Hasura GraphQL Engine combo out of the box to expose indexed data via REST and GraphQL with minimal configuration. However, you can use any database and API engine (e.g., write the API backend in-house).

Default DipDup setup and data flow

How it works

From the developer's perspective, there are three main steps for creating an indexer using the DipDup framework:

  1. Write a declarative configuration file containing all the inventory and indexing rules.
  2. Describe your domain-specific data models.
  3. Implement the business logic, which is how to convert blockchain data to your models.

As a result, you get a service responsible for filling the database with the indexed data.

Within this service, there can be multiple indexers running independently.

Atomicity and persistency

DipDup applies all updates atomically, block by block. In case of an emergency shutdown, it can safely recover later and continue from the level at which it stopped. DipDup state is stored in the database per index and can be used by API consumers to determine the current indexer head.

Here are a few essential things to know before running your indexer:

  • Ensure that the database (or schema, in the case of PostgreSQL) you're connecting to is used by DipDup exclusively. Changes in index configuration or models require DipDup to drop the whole database (schema) and start indexing from scratch. You can, however, mark specific tables as immune to preserve them from being dropped (see the snippet after this list).
  • Changing index config triggers reindexing. Also, do not change aliases of existing indexes in the config file without cleaning up the database first. DipDup won't handle that automatically and will treat the renamed index as new.
  • Multiple indexes pointing to different contracts should not reuse the same models (unless you know what you are doing) because synchronization is done sequentially by index.
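
For reference, immune tables are declared in the database section of the config. A minimal sketch reusing the table names from the full config example later in these docs:

database:
  kind: postgres
  ...
  immune_tables:
    - token_metadata
    - contract_metadata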

Schema migration

DipDup does not support database schema migration: if there's any model change, it will trigger reindexing. The rationale is that it's easier and faster to start over than handle migrations that can be of arbitrary complexity and do not guarantee data consistency.

DipDup stores a hash of the SQL version of the DB schema and checks for changes each time you run indexing.

Handling chain reorgs

Reorg messages signal chain reorganizations: some blocks, including all their operations, are rolled back in favor of another branch with higher fitness. Chain reorgs happen regularly (especially in testnets), so it's not something you can ignore. These messages must be handled correctly; otherwise, you will likely accumulate duplicate or invalid data.

Since version 6.0, DipDup processes chain reorgs seamlessly by restoring the previous database state. You can implement your own rollback logic by editing the on_index_rollback event hook.

Creating config

Developing a DipDup indexer begins with creating a YAML config file. You can find a minimal example to start indexing on the Quickstart page.

General structure

DipDup configuration is stored in YAML files of a specific format. By default, DipDup searches for a dipdup.yml file in the current working directory, but you can provide any path with the -c CLI option.

DipDup config file consists of several logical blocks:

| block | section | reference |
|---|---|---|
| Header | spec_version* | 14.15. spec_version |
| | package* | 14.12. package |
| Inventory | database | 14.5. database |
| | contracts | 14.3. contracts |
| | datasources | 14.6. datasources |
| | custom | 14.4. custom |
| Index definitions | indexes | 14.9. indexes |
| | templates | 14.16. templates |
| Hook definitions | hooks | 14.8. hooks |
| | jobs | 14.10. jobs |
| Integrations | hasura | 14.7. hasura |
| | sentry | 14.14. sentry |
| | prometheus | 14.13. prometheus |
| Tunables | advanced | 14.2. advanced |
| | logging | 14.11. logging |

Header contains two required fields, package and spec_version. They are used to identify the project and the version of the DipDup specification. All other fields in the config are optional.

Inventory specifies contracts that need to be indexed, datasources to fetch data from, and the database to store data in.

Index definitions describe the indexes and index templates used to process contract data.

Hook definitions define callback functions that will be called manually or on schedule.

Integrations are used to integrate with third-party services.

Tunables affect the behavior of the whole framework.

Merging config files

DipDup allows you to customize the configuration for a specific environment or workflow. It works similarly to docker-compose anchors but only for top-level sections. If you want to override a nested property, you need to recreate the whole top-level section. To merge several DipDup config files, provide the -c command-line option multiple times:

dipdup -c dipdup.yml -c dipdup.prod.yml run

Run the config export command if you are unsure about the final config used by DipDup.
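
For instance, a production overlay can replace the whole database section while leaving the rest of dipdup.yml untouched. A sketch, with arbitrary file contents:

# dipdup.yml
database:
  kind: sqlite
  path: demo-token.sqlite3

# dipdup.prod.yml
database:
  kind: postgres
  host: db
  port: 5432
  user: dipdup
  password: ${POSTGRES_PASSWORD}
  database: dipdup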

Full example


Let's put it all together. The config below is an artificial example but contains almost all available options.

spec_version: 1.2
package: my_indexer

database:
  kind: postgres
  host: db
  port: 5432
  user: dipdup
  password: changeme
  database: dipdup
  schema_name: public
  immune_tables:
    - token_metadata
    - contract_metadata

contracts:
  some_dex:
    address: KT1K4EwTpbvYN9agJdjpyJm4ZZdhpUNKB3F6
    typename: quipu_fa12

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
  my_api:
    kind: http
    url: https://my_api.local/v1
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs
  coinbase:
    kind: coinbase
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet

indexes:
  operation_index_from_template:
    template: operation_template
    values:
      datasource: tzkt_mainnet
      contract: some_dex

  big_map_index_from_template:
    template: big_map_template
    values:
      datasource: tzkt_mainnet
      contract: some_dex
    first_level: 1
    last_level: 46963
    skip_history: never

  factory:
    kind: operation
    datasource: tzkt_mainnet
    types:
      - origination
    contracts:
      - some_dex
    handlers:
      - callback: on_factory_origination
        pattern:
          - type: origination
            similar_to: some_dex

templates:
  operation_template:
    kind: operation
    datasource: <datasource>
    types:
      - origination
      - transaction
    contracts:
      - <contract>
    handlers:
      - callback: on_origination
        pattern:
          - type: origination
            originated_contract: <contract>
      - callback: on_some_call
        pattern:
          - type: transaction
            destination: <contract>
            entrypoint: some_call

  big_map_template:
    kind: big_map
    datasource: <datasource>
    handlers:
      - callback: on_update_records
        contract: <name_registry>
        path: store.records
      - callback: on_update_expiry_map
        contract: <name_registry>
        path: store.expiry_map


hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
     major: bool

jobs:
  midnight_stats:
    hook: calculate_stats
    crontab: "0 0 * * *"
    args:
      major: True

sentry:
  dsn: https://localhost
  environment: dev
  debug: False

prometheus:
  host: 0.0.0.0

hasura:
  url: http://hasura:8080
  admin_secret: changeme
  allow_aggregations: False
  camel_case: true
  select_limit: 100

advanced:
  early_realtime: True
  merge_subscriptions: False
  postpone_jobs: False
  metadata_interface: False
  skip_version_check: False
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception
  rollback_depth: 2
  crash_reporting: False

logging: verbose

Project structure

The structure of the DipDup project package is the following:

demo_token
├── graphql
├── handlers
│   ├── __init__.py
│   ├── on_mint.py
│   └── on_transfer.py
├── hasura
├── hooks
│   ├── __init__.py
│   ├── on_reindex.py
│   ├── on_restart.py
│   ├── on_index_rollback.py
│   └── on_synchronized.py
├── __init__.py
├── models.py
├── sql
│   ├── on_reindex
│   ├── on_restart
│   ├── on_index_rollback
│   └── on_synchronized
└── types
    ├── __init__.py
    └── tzbtc
        ├── __init__.py
        ├── parameter
        │   ├── __init__.py
        │   ├── mint.py
        │   └── transfer.py
        └── storage.py
| path | description |
|---|---|
| graphql | GraphQL queries for Hasura (*.graphql) |
| handlers | User-defined callbacks to process matched operations and big map diffs |
| hasura | Arbitrary Hasura metadata (*.json) |
| hooks | User-defined callbacks to run manually or by schedule |
| models.py | Tortoise ORM models |
| sql | SQL scripts to run from callbacks (*.sql) |
| types | Codegened Pydantic typeclasses for contract storage/parameter |

DipDup will generate all the necessary directories and files inside the project's root on the init command. These include contract type definitions and callback stubs to be implemented by the developer.

Type classes

DipDup receives all smart contract data (transaction parameters, resulting storage, big_map updates) in normalized form (read more about how TzKT handles Michelson expressions) but still as raw JSON. DipDup uses contract type information to generate data classes, which allow developers to work with strictly typed data.

DipDup generates Pydantic models out of JSONSchema. You might want to install additional plugins (PyCharm, mypy) for convenient work with this library.

The following models are created at init for different indexes:

  • operation: storage type for all contracts in handler patterns plus parameter type for all destination+entrypoint pairs.
  • big_map: key and storage types for all used contracts and big map paths.
  • event: payload types for all used contracts and tags.

Other index kinds do not use code-generated types.

Nested packages

Callback modules don't have to be in top-level hooks/handlers directories. Add one or multiple dots to the callback name to define nested packages:

package: indexer
hooks:
  foo.bar:
    callback: foo.bar

After running the init command, you'll get the following directory tree (shortened for readability):

indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep

The same rules apply to handler callbacks. Note that the callback field must be a valid Python package name - lowercase letters, underscores, and dots.

Defining models

DipDup uses the Tortoise ORM library to cover database operations. During initialization, DipDup generates a models.py file on the top level of the package that will contain all database models. The name and location of this file cannot be changed.

A typical models.py file looks like the following (example from demo_domains package):

from typing import Optional

from tortoise import fields
from tortoise.fields.relational import ForeignKeyFieldInstance

from dipdup.models import Model


class TLD(Model):
    id = fields.CharField(max_length=255, pk=True)
    owner = fields.CharField(max_length=36)


class Domain(Model):
    id = fields.CharField(max_length=255, pk=True)
    tld: ForeignKeyFieldInstance[TLD] = fields.ForeignKeyField('models.TLD', 'domains')
    expiry = fields.DatetimeField(null=True)
    owner = fields.CharField(max_length=36)
    token_id = fields.BigIntField(null=True)

    tld_id: Optional[str]


class Record(Model):
    id = fields.CharField(max_length=255, pk=True)
    domain: ForeignKeyFieldInstance[Domain] = fields.ForeignKeyField('models.Domain', 'records')
    address = fields.CharField(max_length=36, null=True)

See the links below to learn how to use this library.

Limitations

Some limitations are applied to model names and fields to avoid ambiguity in GraphQL API.

  • Table names must be in snake_case
  • Model fields must be in snake_case
  • Model fields must differ from table name

Implementing handlers

DipDup generates a separate file with a callback stub for each handler in every index specified in the configuration file.

In the case of the transaction handler, the callback method signature is the following:

from <package>.types.<typename>.parameter.entrypoint_foo import EntryPointFooParameter
from <package>.types.<typename>.parameter.entrypoint_bar import EntryPointBarParameter
from <package>.types.<typename>.storage import TypeNameStorage


async def on_transaction(
    ctx: HandlerContext,
    entrypoint_foo: Transaction[EntryPointFooParameter, TypeNameStorage],
    entrypoint_bar: Transaction[EntryPointBarParameter, TypeNameStorage]
) -> None:
    ...

where:

  • entrypoint_foo ... entrypoint_bar are items from the corresponding handler pattern.
  • ctx: HandlerContext provides useful helpers and contains an internal state.
  • A Transaction model contains the typed transaction parameter and storage, plus other fields (see the sketch below).
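
For illustration, here's a minimal sketch of a transaction handler accessing both typed and raw data. It reuses the tzBTC types from the quickstart; the raw fields (hash, level) are assumed to come from the OperationData model available under .data:

from demo_token.types.tzbtc.parameter.transfer import TransferParameter
from demo_token.types.tzbtc.storage import TzbtcStorage
from dipdup.context import HandlerContext
from dipdup.models import Transaction


async def log_transfer(
    ctx: HandlerContext,
    transfer: Transaction[TransferParameter, TzbtcStorage],
) -> None:
    # Typed parameter and storage are generated by `dipdup init`
    sender = transfer.parameter.from_
    # Raw operation metadata is available under `.data`
    ctx.logger.info('Transfer %s sent by %s at level %s', transfer.data.hash, sender, transfer.data.level)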

For the origination case, the handler signature will look similar:

from <package>.types.<typename>.storage import TypeNameStorage


async def on_origination(
    ctx: HandlerContext,
    origination: Origination[TypeNameStorage],
) -> None:
    ...

An Origination model contains the origination script, initial storage (typed), amount, delegate, etc.

A Big_map update handler will look like the following:

from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue


async def on_update(
    ctx: HandlerContext,
    update: BigMapDiff[PathKey, PathValue],
) -> None:
    ...

BigMapDiff contains an action (allocate, update, or remove) and nullable typed key and value.
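
For example, a handler body usually branches on the action before touching the value. A sketch using the BigMapAction enum from dipdup.models (ADD_KEY is referenced later in these docs; the REMOVE_KEY member is assumed here):

from dipdup.context import HandlerContext
from dipdup.models import BigMapAction
from dipdup.models import BigMapDiff


async def on_update(
    ctx: HandlerContext,
    update: BigMapDiff[PathKey, PathValue],
) -> None:
    if update.action == BigMapAction.REMOVE_KEY:
        # Value is None for removed keys
        return
    key, value = update.key, update.value
    ...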

Naming conventions

Python naming conventions put module and function names in snake_case and class names in PascalCase; DipDup code generation follows these rules.

A typical imports section of big_map handler callback looks like this:

from <package>.types.<typename>.storage import TypeNameStorage
from <package>.types.<typename>.parameter.<entrypoint> import EntryPointParameter
from <package>.types.<typename>.big_map.<path>_key import PathKey
from <package>.types.<typename>.big_map.<path>_value import PathValue

Here typename is defined in the contract inventory, entrypoint is specified in the handler pattern, and path is in the handler config.

Handling name collisions

Indexing operations of multiple contracts with the same entrypoints can lead to name collisions during code generation. In this case, DipDup raises a ConfigurationError and suggests setting an alias for each conflicting handler pattern item. This applies to operation indexes only. Consider the following index definition, some kind of "chain minting" contract:

kind: operation
handlers:
  - callback: on_mint
    pattern:
    - type: transaction
      entrypoint: mint
      alias: foo_mint
    - type: transaction
      entrypoint: mint
      alias: bar_mint

The following code will be generated for the on_mint callback:

from example.types.foo.parameter.mint import MintParameter as FooMintParameter
from example.types.foo.storage import FooStorage
from example.types.bar.parameter.mint import MintParameter as BarMintParameter
from example.types.bar.storage import BarStorage


async def on_mint(
    ctx: HandlerContext,
    foo_mint: Transaction[FooMintParameter, FooStorage],
    bar_mint: Transaction[BarMintParameter, BarStorage]
) -> None:
    ...

You can safely change argument names if you want to.

Templates and variables

Environment variables

DipDup supports compose-style variable expansion with optional default value:

database:
  kind: postgres
  host: ${POSTGRES_HOST:-localhost}
  password: ${POSTGRES_PASSWORD}

You can use environment variables anywhere throughout the configuration file. Consider the following example (absolutely useless but illustrative):

custom:
  ${FOO}: ${BAR:-bar}
  ${FIZZ:-fizz}: ${BUZZ}

Running FOO=foo BUZZ=buzz dipdup config export --unsafe will produce the following output:

custom:
  fizz: buzz
  foo: bar

Use this feature to store sensitive data outside of the configuration file and make your app fully declarative.

Index templates

Templates allow you to reuse index configuration, e.g., for different networks (mainnet/ghostnet) or multiple contracts sharing the same codebase.

templates:
  my_template:
    kind: operation
    datasource: <datasource>
    contracts:
      - <contract>
    handlers:
      - callback: callback
        pattern:
          - destination: <contract>
            entrypoint: call

Templates have the same syntax as indexes of all kinds; the only difference is that they additionally support placeholders enabling parameterization:

field: <placeholder>

The template above can be resolved in the following way:

contracts:
  some_dex: ...

datasources:
  tzkt_mainnet: ...

indexes:
  my_template_instance:
    template: my_template
    values:
      datasource: tzkt_mainnet
      contract: some_dex

Any string value wrapped in angle brackets is treated as a placeholder, so make sure there are no collisions with the actual values. You can use a single placeholder multiple times. In contrast to environment variables, dictionary keys cannot be placeholders.

An index created from a template must have a value for each placeholder; an exception is raised otherwise. These values are available in the handler context as the ctx.template_values dictionary.

You can also spawn indexes from templates at runtime. To achieve the same effect as above, you can use the following code:

await ctx.add_index(
    name='my_template_instance',
    template='my_template',
    values={
        'datasource': 'tzkt_mainnet',
        'contract': 'some_dex',
    },
)

Indexes

An index is the primary DipDup entity, connecting the inventory and specifying data handling rules.

Each index has a linked TzKT datasource and a set of handlers. Indexes can join multiple contracts considered a single application. Contracts can also be used by multiple indexes of any kind, but make sure their data don't overlap. See 2.2. Core concepts → atomicity-and-persistency.

indexes:
  contract_operations:
    kind: operation
    datasource: tzkt_mainnet
    handlers:
      - callback: on_operation
        pattern: ...

Multiple indexes are available for different kinds of blockchain data. Currently, the following options are available:

  • big_map
  • event
  • head
  • operation
  • token_transfer

Every index is linked to a specific datasource from the 14.6. datasources config section.

Using templates

Index definitions can be templated to reduce the amount of boilerplate code. To create an index from a template during startup, add an item with the template and values fields to the indexes section:

templates:
  operation_index_template:
    kind: operation
    datasource: <datasource>
    ...

indexes:
  template_instance:
    template: operation_index_template
    values:
      datasource: tzkt_mainnet

You can also create indexes from templates later in runtime. See 2.7. Templates and variables page.

Indexing scope

You can optionally specify the block levels at which DipDup has to start and stop indexing, e.g., when there's a new version of the contract and no need to track the old one anymore.

indexes:
  my_index:
    first_level: 1000000
    last_level: 2000000

big_map index

Big maps are lazy structures that allow accessing and updating only exact keys. Gas costs for these operations don't depend on the size of a big map, but you can't iterate over its keys on-chain.

The big_map index allows querying only updates of specific big maps. In some cases, it can drastically reduce the amount of data transferred and thus improve indexing speed compared to fetching all operations.

indexes:
  token_big_map_index:
    kind: big_map
    datasource: tzkt
    skip_history: never
    handlers:
      - callback: on_ledger_update
        contract: token
        path: data.ledger
      - callback: on_token_metadata_update
        contract: token
        path: token_metadata

Handlers

Each big map handler contains three required fields:

  • callback — a name of async function with a particular signature; DipDup will search for it in <package>.handlers.<callback> module.
  • contract — big map parent contract
  • path — path to the big map in the contract storage (use dot as a delimiter)

Index only the current state

When the skip_history field is set to once, DipDup will skip historical changes only on the initial sync and switch to regular indexing afterward. When the value is always, DipDup will fetch all big map keys on every restart. The preferable mode depends on your workload.

All big map diffs DipDup passes to handlers during fast sync have the action field set to BigMapAction.ADD_KEY. Remember that DipDup fetches all keys in this mode, including ones that were later removed from the big map. You can filter them out by the BigMapDiff.data.active field if needed, as in the sketch below.
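
A sketch of such a filter inside the on_ledger_update handler from the config above (the key and value type names are placeholders):

async def on_ledger_update(
    ctx: HandlerContext,
    ledger: BigMapDiff[LedgerKey, LedgerValue],
) -> None:
    # Ignore keys that are no longer present in the big map (relevant with `skip_history`)
    if not ledger.data.active:
        return
    ...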

event index

The Kathmandu Tezos protocol upgrade introduced contract events, a new way to interact with smart contracts. This index allows processing events with strictly typed payloads. From the developer's perspective, it's similar to the big_map index with a few differences.

The example below is artificial since no known mainnet contracts currently use events (the index and datasource names are arbitrary).

indexes:
  events_index:
    kind: event
    datasource: tzkt_mainnet
    handlers:
      - callback: on_move_event
        contract: events_contract
        tag: move
      - callback: on_roll_event
        contract: events_contract
        tag: roll
      - callback: on_other_event
        contract: events_contract

Unlike big maps, contracts may introduce new event tags and payloads at any time, so the index must be updated accordingly.

async def on_move_event(
    ctx: HandlerContext,
    event: Event[MovePayload],
) -> None: ...

Each contract can have a fallback handler called for all unknown events so you can process untyped data.

async def on_other_event(
    ctx: HandlerContext,
    event: UnknownEvent,
) -> None: ...

head index

This very simple index provides metadata of the latest block when it's baked. Only realtime data is processed; the synchronization stage is skipped for this index.

spec_version: 1.2
package: demo_head

database:
  kind: sqlite
  path: demo-head.sqlite3

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: ${TZKT_URL:-https://api.tzkt.io}

indexes:
  mainnet_head:
    kind: head
    datasource: tzkt_mainnet
    handlers:
      - callback: on_mainnet_head

The head index callback receives a HeadBlockData model that contains only basic info; no operations are included. Though not very useful by itself, this index is helpful for monitoring and cron-like tasks. You can define multiple indexes for each datasource used.
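
A minimal callback for the config above could look like this; a sketch assuming the HeadBlockData fields hash and level:

from dipdup.context import HandlerContext
from dipdup.models import HeadBlockData


async def on_mainnet_head(
    ctx: HandlerContext,
    head: HeadBlockData,
) -> None:
    ctx.logger.info('New head %s at level %s', head.hash, head.level)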

Subscription to the head channel is enabled by default, even if no head indexes are defined. Each time a block is baked, the dipdup_head table is updated per datasource. Use it to ensure that both the index datasource and the underlying blockchain are up and running.

See Also

  • Operation group in BCD explorer
  • operation index config

operation index

The operation index allows you to query only operations related to your dapp and match them with handlers by content. A single contract call consists of an implicit operation and, optionally, internal operations. For each of them, you can specify a handler that will be called when the operation group matches. As a result, you get something like an event log for your dapp.

Handlers

Each operation handler contains two required fields:

  • callback — a name of async function with a particular signature; DipDup will search for it in <package>.handlers.<callback> module.
  • pattern — a non-empty list of items that need to be matched.

indexes:
  my_index:
    kind: operation
    datasource: tzkt
    contracts:
      - some_contract
    handlers:
      - callback: on_call
        pattern:
          - destination: some_contract
            entrypoint: transfer

You can think of the operation pattern as a regular expression on a sequence of operations (both external and internal) with a global flag enabled (there can be multiple matches). Multiple operation parameters can be used for matching (source, destination, etc.).

You will get slightly different callback argument types depending on whether the pattern item is typed or not. If it is, DipDup will generate a dataclass for the particular entrypoint/storage; otherwise, you will have to handle untyped parameters/storage updates stored in the OperationData model.

Matching originations

| name | description | supported | typed |
|---|---|---|---|
| originated_contract.address | Origination of exact contract. | ✅ | ✅ |
| originated_contract.code_hash | Originations of all contracts having the same code. | ✅ | ✅ |
| source.address | Special cases only. This filter is very slow and doesn't support strict typing. Usually, originated_contract.code_hash suits better. | ✅ | ❌ |
| source.code_hash | Currently not supported. | ❌ | ❌ |
| similar_to.address | Compatibility alias to originated_contract.code_hash. Can be removed some day. | ➡️ | ➡️ |
| similar_to.code_hash | Compatibility alias to originated_contract.code_hash. Can be removed some day. | ➡️ | ➡️ |

Matching transactions

| name | description | supported | typed |
|---|---|---|---|
| source.address | Sent by exact address. | ✅ | N/A |
| source.code_hash | Sent by any contract having this code hash. | ✅ | N/A |
| destination.address | Invoked contract address. | ✅ | ✅ |
| destination.code_hash | Invoked contract code hash. | ✅ | ✅ |
| destination.entrypoint | Entrypoint called. | ✅ | ✅ |

Optional items

Pattern items have an optional field to continue matching even if this item is not found. It's usually unnecessary to match the entire operation content; you can skip external/internal calls that are not relevant. However, there is a limitation: optional items cannot be followed by operations ignored by the pattern.

pattern:
  # Implicit transaction
  - destination: some_contract
    entrypoint: mint

  # Internal transactions below
  - destination: another_contract
    entrypoint: transfer

  - source: some_contract
    type: transaction
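
The pattern above matches every item strictly. To let an item be skipped, set the optional flag on it; a sketch based on the same pattern:

pattern:
  - destination: some_contract
    entrypoint: mint

  # This internal call may be missing from the operation group
  - destination: another_contract
    entrypoint: transfer
    optional: true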

Specifying contracts to index

DipDup will try to guess the list of used contracts from the handlers' signatures. If you want to specify it explicitly, use the contracts field:

indexes:
  my_index:
    kind: operation
    datasource: tzkt
    contracts:
      - foo
      - bar

Specifying operation types

By default, DipDup processes only transactions, but you can enable other operation types you want to process (currently, transaction, origination, and migration are supported).

indexes:
  my_index:
    kind: operation
    datasource: tzkt
    types:
      - transaction
      - origination
      - migration

token_transfer index

This index allows indexing token transfers of contracts compatible with FA1.2 or FA2 standards.

spec_version: 1.2
package: demo_token_transfers

database:
  kind: sqlite
  path: demo-token-transfers.sqlite3

contracts:
  tzbtc_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc

datasources:
  tzkt:
    kind: tzkt
    url: https://api.tzkt.io

indexes:
  tzbtc_holders_mainnet:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: on_token_transfer
        contract: tzbtc_mainnet

The callback receives a TokenTransferData model that optionally contains the transfer sender, receiver, amount, and token metadata.

from decimal import Decimal
from decimal import InvalidOperation

from demo_token_transfers.handlers.on_balance_update import on_balance_update
from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData


async def on_token_transfer(
    ctx: HandlerContext,
    token_transfer: TokenTransferData,
) -> None:
    from_, to = token_transfer.from_address, token_transfer.to_address
    if not from_ or not to or from_ == to:
        return
    try:
        amount = Decimal(token_transfer.amount or 0) / (10**8)
    except InvalidOperation:
        return
    if not amount:
        return

    await on_balance_update(address=from_, balance_update=-amount, timestamp=token_transfer.timestamp)
    await on_balance_update(address=to, balance_update=amount, timestamp=token_transfer.timestamp)

GraphQL API

In this section, we assume you use Hasura GraphQL Engine integration to power your API.

Before starting client integration, it's good to know the specifics of the Hasura GraphQL protocol implementation and the general state of the GQL ecosystem.

Queries

By default, Hasura generates three types of queries for each table in your schema:

  • Generic query enabling filters by all columns
  • Single item query (by primary key)
  • Aggregation query (can be disabled in config)

All the GQL features, such as fragments, variables, aliases, and directives, are supported, as well as batching. Read more in the Hasura docs.

It's important to understand that a GraphQL query is just a POST request with a JSON payload, and in some instances, you don't need a complicated library to talk to your backend.
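
For instance, the demo_token schema from the quickstart could be queried with plain curl; a sketch assuming the default Hasura endpoint and the holder table generated from the Holder model:

curl -X POST http://127.0.0.1:8080/v1/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "query { holder(limit: 5, order_by: {balance: desc}) { address balance } }"}'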

Pagination

By default, Hasura does not restrict the number of rows returned per request, which could lead to abuse and a heavy load on your server. You can set up limits in the configuration file; see 14.7. hasura → limit-number-of-rows. But then you will face the need to paginate over the items if the response does not fit the limits.

Subscriptions

From Hasura documentation:

Hasura GraphQL engine subscriptions are live queries, i.e., a subscription will return the latest result of the query and not necessarily all the individual events leading up to it.

This feature is essential to avoid complex state management (merging query results and subscription feed). In most scenarios, live queries are what you need to sync the latest changes from the backend.

If the live query has a significant response size that does not fit into the limits, you need one of the following:

  1. Paginate with offset (which is not convenient)
  2. Use cursor-based pagination (e.g., by an increasing unique id).
  3. Narrow down request scope with filtering (e.g., by timestamp or level).

Ultimately, you can get "subscriptions" on top of live queries by requesting all the items having an ID greater than the maximum existing one, or all the items with a timestamp greater than now.

Websocket transport

Hasura is compatible with subscriptions-transport-ws library, which is currently deprecated but still used by most clients.

Mutations

The purpose of DipDup is to create indexers, which means you can consistently reproduce the state as long as data sources are accessible. That makes your backend "stateless", i.e., tolerant of data loss.

However, you might need to introduce a non-recoverable state and mix indexed and user-generated content in some cases. DipDup allows marking these UGC tables "immune", protecting them from being wiped. In addition to that, you will need to set up Hasura Auth and adjust write permissions for the tables (by default, they are read-only).

Lastly, you will need to execute GQL mutations to modify the state from the client side. Read more about how to do that with Hasura.

Hasura integration

DipDup uses this optional section to configure the Hasura engine to track your tables automatically.

hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_ADMIN_SECRET:-changeme}

If you have enabled this integration, DipDup will generate Hasura metadata based on your DB schema and apply it using Metadata API.

Hasura metadata is all about data representation in GraphQL API. The structure of the database itself is managed solely by Tortoise ORM.

Metadata configuration is idempotent: each time you call run or hasura configure command, DipDup queries the existing schema and updates metadata if required. DipDup configures Hasura after reindexing, saves the hash of resulting metadata in the dipdup_schema table, and doesn't touch Hasura until needed.

Database limitations

The current version of Hasura GraphQL Engine treats public and other schemas differently. The table schema.customer becomes the schema_customer root field (or schemaCustomer if the camel_case option is enabled in the DipDup config). The table public.customer becomes the customer field, without the schema prefix. There's no way to remove this prefix for now; you can track the related issue on Hasura's GitHub to know when the situation changes. Starting with 3.0.0-rc1, DipDup enforces the public schema name to avoid ambiguity and issues with the GenQL library. You can still use any schema name if Hasura integration is not enabled.

Unauthorized access

DipDup creates a user role that allows querying the /graphql endpoint without authorization. All tables are set to read-only for this role.

You can limit the maximum number of rows such queries return and also disable aggregation queries automatically generated by Hasura:

hasura:
  select_limit: 100
  allow_aggregations: False

Note that with limits enabled, you have to use either offset or cursor-based pagination on the client-side.

Convert field names to camel case

For those of you coming from the JavaScript world, it may be more familiar to use camelCase for variable names instead of the snake_case Hasura uses by default. DipDup allows converting all field names in metadata to this casing:

hasura:
  camel_case: true

Now this example query to hic et nunc demo indexer...

query MyQuery {
  hic_et_nunc_token(limit: 1) {
    id
    creator_id
  }
}

...will become this one:

query MyQuery {
  hicEtNuncToken(limit: 1) {
    id
    creatorId
  }
}

All fields auto-generated by Hasura will be renamed accordingly: hic_et_nunc_token_by_pk to hicEtNuncTokenByPk, delete_hic_et_nunc_token to deleteHicEtNuncToken and so on. To return to defaults, set camel_case to False and run hasura configure --force.

Remember that "camelcasing" is a separate stage performed after all tables are registered. So during configuration, you can observe fields in snake_case for several seconds even if conversion to camel case is enabled.

Custom Hasura Metadata

There are some cases where you want to apply custom modifications to the Hasura metadata. For example, assume your database schema has a view that contains data from the main table; in that case, you cannot set a foreign key between them. You can place files with a .json extension in the hasura directory of your project, with content in the Hasura query format, and DipDup will execute them in alphabetical order of file names when indexing is complete.

The format of the queries can be found in the Metadata API documentation.

The allow_inconsistent_metadata feature flag in the hasura config section modifies how errors in these requests are handled. By default, this value is False.

REST endpoints

Hasura 2.0 introduced the ability to expose arbitrary GraphQL queries as REST endpoints. By default, DipDup will generate GET and POST endpoints to fetch rows by primary key for all tables:

curl http://127.0.0.1:8080/api/rest/hicEtNuncHolder?address=tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw
{
  "hicEtNuncHolderByPk": {
    "address": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw"
  }
}

However, there's a limitation dictated by how Hasura parses HTTP requests: only models with primary keys of basic types (int, string, and so on) can be fetched with GET requests. An attempt to fetch a model with a BIGINT primary key will lead to the error: Expected bigint for variable id got Number. A workaround for fetching any model is to send a POST request containing a JSON payload with a single key:

curl -d '{"id": 152}' http://127.0.0.1:8080/api/rest/hicEtNuncToken
{
  "hicEtNuncTokenByPk": {
    "creatorId": "tz1UBZUkXpKGhYsP5KtzDNqLLchwF4uHrGjw",
    "id": 152,
    "level": 1365242,
    "supply": 1,
    "timestamp": "2021-03-01T03:39:21+00:00"
  }
}

We hope to get rid of this limitation someday and will let you know as soon as it happens.

Custom endpoints

You can put any number of .graphql files into the graphql directory in your project's root, and DipDup will create REST endpoints for each of those queries. Let's say we want to fetch not only a specific token but also the number of all tokens minted by its creator:

query token_and_mint_count($id: bigint) {
  hicEtNuncToken(where: {id: {_eq: $id}}) {
    creator {
      address
      tokens_aggregate {
        aggregate {
          count
        }
      }
    }
    id
    level
    supply
    timestamp
  }
}

Save this query as graphql/token_and_mint_count.graphql and run dipdup hasura configure. Now, this query is available via the REST endpoint at http://127.0.0.1:8080/api/rest/token_and_mint_count.

You can disable exposing of REST endpoints in the config:

hasura:
  rest: False

GenQL

GenQL is a great library and CLI tool that automatically generates a fully typed SDK with a built-in GQL client. It works flawlessly with Hasura and is recommended for DipDup on the client-side.

Project structure

GenQL CLI generates a ready-to-use package, compiled and prepared to publish to NPM. A typical setup is a mono repository containing several packages, including the auto-generated SDK and your front-end application.

project_root/
├── package.json
└── packages/
    ├── app/
    │   ├── package.json
    │   └── src/
    └── sdk/
        └── package.json

SDK package config

Your minimal package.json file will look like the following:

{
  "name": "%PACKAGE_NAME%",
  "version": "0.0.1",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "devDependencies": {
    "@genql/cli": "^2.6.0"
  },
  "dependencies": {
    "@genql/runtime": "2.6.0",
    "graphql": "^15.5.0"
  },
  "scripts": {
    "build": "genql --endpoint %GRAPHQL_ENDPOINT% --output ./dist"
  }
}

That's it! Now you only need to install dependencies and execute the build target:

yarn
yarn build

Read more about the available CLI options.

Demo

Create a package.json file with

  • %PACKAGE_NAME% => metadata-sdk
  • %GRAPHQL_ENDPOINT% => https://metadata.dipdup.net/v1/graphql

And generate the client:

yarn
yarn build

Then create a new file index.ts and paste this query:

import { createClient, everything } from './dist'

const client = createClient()

client.chain.query
    .token_metadata({ where: { network: { _eq: 'mainnet' } }})
    .get({ ...everything })
    .then(res => console.log(res))

We need some additional dependencies to run our sample:

yarn add typescript ts-node

Finally:

npx ts-node index.ts

You should see a list of tokens with metadata attached in your console.

Advanced usage

In this section, you will find information about advanced DipDup features.

Datasources

Datasources are DipDup connectors to various APIs. The table below shows how different datasources can be used.

The index datasource is the one used by DipDup internally to process a specific index (set with datasource: ... in config). Currently, it can only be tzkt. Datasources available in context can be accessed in handlers and hooks via ctx.get_<kind>_datasource() methods and used to perform arbitrary requests. Finally, standalone services implement a subset of DipDup datasources and config directives. You can't use service-specific datasources like tezos-node in the main framework; they are listed here for informational purposes only.

| kind | index | context | mempool service | metadata service |
|---|---|---|---|---|
| tzkt | required | supported | required | required |
| tezos-node | not supported | not supported | required | supported |
| coinbase | not supported | supported | not supported | not supported |
| metadata | not supported | supported | not supported | not supported |
| ipfs | not supported | supported | not supported | not supported |
| http | not supported | supported | not supported | not supported |

TzKT

TzKT provides REST endpoints to query historical data and SignalR (WebSocket) subscriptions to get realtime updates. Flexible filters allow you to request only the data needed for your application and drastically speed up the indexing process.

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io

The number of items in each request can be configured with the batch_size directive. It affects the number of requests and memory usage.

datasources:
  tzkt_mainnet:
    http:
      ...
      batch_size: 10000

The rest of the HTTP tunables are the same as for other datasources.

Also, you can wait for several block confirmations before processing the operations:

datasources:
  tzkt_mainnet:
    ...
    buffer_size: 1  # indexing with a single block lag

Since 6.0 chain reorgs are processed automatically, but you may find this feature useful for other cases.

Tezos node

Tezos RPC is a standard interface provided by the Tezos node. This datasource is used solely by mempool and metadata standalone services; you can't use it in regular DipDup indexes.

datasources:
  tezos_node_mainnet:
    kind: tezos-node
    url: https://mainnet-tezos.giganode.io

Coinbase

A connector for the Coinbase Pro API. Provides get_candles and get_oracle_data methods. It may be useful for enriching indexes of DeFi contracts with off-chain data.

datasources:
  coinbase:
    kind: coinbase

Please note that Coinbase can't replace TzKT as an index datasource. However, you can access it via the ctx.datasources mapping both within handler and job callbacks.

DipDup Metadata

dipdup-metadata is a standalone companion indexer for DipDup written in Go. Configure the datasource in the following way:

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet | ithacanet

Then, in your hook or handler code:

datasource = ctx.get_metadata_datasource('metadata')
token_metadata = await datasource.get_token_metadata('KT1...', '0')

IPFS

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup has a separate datasource to perform such requests via public nodes.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

HTTP (generic)

If you need to perform arbitrary requests to APIs not supported by DipDup, use the generic HTTP datasource instead of plain aiohttp requests. That way, you can use the same features DipDup uses for internal requests: retry with backoff, rate limiting, Prometheus integration, etc.

datasources:
  my_api:
    kind: http
    url: https://my_api.local/v1

api = ctx.get_http_datasource('my_api')
response = await api.request(
    method='get',
    url='hello',  # relative to URL in config
    weight=1,  # ratelimiter leaky-bucket drops
    params={
      'foo': 'bar',
    },
)

All DipDup datasources are inherited from http, so you can send arbitrary requests with any datasource. Let's say you want to fetch the protocol of the chain you're currently indexing (tzkt datasource doesn't have a separate method for it):

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, retry with backoff, etc.) are applied on every request.

Hooks

Hooks are user-defined callbacks called either from the ctx.fire_hook method or by the job scheduler.

Let's assume we want to calculate some statistics on-demand to avoid blocking an indexer with heavy computations. Add the following lines to the DipDup config:

hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
     major: bool
     depth: int

Here are a couple of things to pay attention to:

  • The atomic option defines whether the hook callback will be wrapped in a single SQL transaction or not. If this option is set to true, the main indexing loop will be blocked until hook execution is complete. Some statements, like REFRESH MATERIALIZED VIEW, do not need to be wrapped in transactions, so choosing the right value of the atomic option could decrease the time needed to perform initial indexing.
  • Values of args mapping are used as type hints in a signature of a generated callback. We will return to this topic later in this article.

Now it's time to call dipdup init. The following files will be created in the project's root:

├── hooks
│   └── calculate_stats.py
└── sql
    └── calculate_stats
        └── .keep

Content of the generated callback stub:

from dipdup.context import HookContext

async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    await ctx.execute_sql('calculate_stats')

By default, hooks execute SQL scripts from the corresponding subdirectory of sql/. Remove or comment out the execute_sql call to prevent this. This way, both Python and SQL code may be executed in a single hook if needed.
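
The same hook can also be fired manually from any handler or another hook via ctx.fire_hook; a sketch, with arguments matching the args mapping from the config above:

# Somewhere inside a handler or another hook
await ctx.fire_hook('calculate_stats', major=True, depth=10, wait=False)  # wait=False schedules it in the background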

Arguments typechecking

DipDup will ensure that arguments passed to the hooks have the correct types when possible. CallbackTypeError exception will be raised otherwise. Values of an args mapping in a hook config should be either built-in types or __qualname__ of external type like decimal.Decimal. Generic types are not supported: hints like Optional[int] = None will be correctly parsed during codegen but ignored on type checking.

Event hooks

Every DipDup project has multiple event hooks (previously "default hooks"); they fire on system-wide events and, like regular hooks, are not linked to any index. Names of those hooks are reserved; you can't use them in config. It's also impossible to fire them manually or with a job scheduler.

on_restart

This hook executes right before starting indexing. It allows configuring DipDup at runtime based on data from external sources. Datasources are already initialized at execution time and available at ctx.datasources. You can, for example, configure logging here or add contracts and indexes at runtime instead of from the static config, as in the sketch below.
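
A sketch of an on_restart hook that registers a contract and spawns an index from a template at runtime, reusing the names from the quickstart (adjust to your project):

from dipdup.context import HookContext


async def on_restart(
    ctx: HookContext,
) -> None:
    await ctx.execute_sql('on_restart')
    await ctx.add_contract(
        name='tzbtc_mainnet',
        address='KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn',
        typename='tzbtc',
    )
    await ctx.add_index(
        name='tzbtc_holders_mainnet',
        template='tzbtc_holders',
        values={'contract': 'tzbtc_mainnet', 'datasource': 'tzkt_mainnet'},
    )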

on_reindex

This hook fires after the database is re-initialized after reindexing (wipe). It is helpful for modifying the schema with arbitrary SQL scripts before indexing starts.

on_synchronized

This hook fires when every active index reaches a realtime state. Here you can clear internal caches or perform other cleanup tasks.

on_index_rollback

Fires when the TzKT datasource has received a chain reorg message that can't be processed by dropping buffered messages (see the buffer_size option).

Since version 6.0, this hook performs a database-level rollback by default. If that doesn't work for you for some reason, remove the ctx.rollback call and implement your own rollback logic, as in the sketch below.
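
A sketch of such custom logic that simply falls back to full reindexing; the signature and imports are assumed to follow the stub generated for 6.x projects:

from dipdup.context import HookContext
from dipdup.index import Index


async def on_index_rollback(
    ctx: HookContext,
    index: Index,
    from_level: int,
    to_level: int,
) -> None:
    await ctx.execute_sql('on_index_rollback')
    # Instead of the default database-level rollback, wipe and start over
    await ctx.reindex()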

Job scheduler

Jobs are schedules for hooks. In some cases, it may come in handy to run some code on a schedule; for example, to calculate statistics once per hour instead of every time a handler gets matched.

Add the following section to the DipDup config:

jobs:
  midnight_stats:
    hook: calculate_stats
    crontab: "0 0 * * *"
    args:
      major: True
  leet_stats:
    hook: calculate_stats
    interval: 1337  # in seconds
    args:
      major: False

If you're unfamiliar with the crontab syntax, the online service crontab.guru will help you build the desired expression.

Scheduler configuration

DipDup utilizes the apscheduler library to run hooks according to the schedules in the jobs config section. In the following example, apscheduler will allow up to three instances of the same job to run at once, even if previous runs are still in progress:

advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3

See apscheduler docs for details.

Note that you can't use executors from the apscheduler.executors.pool module: a ConfigurationError exception will be raised.

Reindexing

In some cases, DipDup can't proceed with indexing without a full wipe. Several reasons trigger reindexing:

| reason | description |
|---|---|
| manual | Reindexing triggered manually from callback with ctx.reindex. |
| migration | Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared. |
| rollback | Reorg message received from TzKT can not be processed. |
| config_modified | One of the index configs has been modified. |
| schema_modified | Database schema has been modified. Try to avoid manual schema modifications in favor of 5.7. SQL scripts. |

It is possible to configure desirable action on reindexing triggered by a specific reason.

| action | description |
|---|---|
| exception (default) | Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers. |
| wipe | Drop the whole database and start indexing from scratch. Be careful with this option! |
| ignore | Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing. |

To configure actions for each reason, add the following section to the DipDup config:

advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

Feature flags

Feature flags set in advanced config section allow users to modify parameters that affect the behavior of the whole framework. Choosing the right combination of flags for an indexer project can improve performance, reduce RAM consumption, or enable useful features.

| flag | description |
|---|---|
| crash_reporting | Enable sending crash reports to the Baking Bad team |
| early_realtime | Start collecting realtime messages while sync is in progress |
| merge_subscriptions | Subscribe to all operations/big map diffs during realtime indexing |
| metadata_interface | Enable contract and token metadata interfaces |
| postpone_jobs | Do not start the job scheduler until all indexes are synchronized |
| skip_version_check | Disable warning about running unstable or out-of-date DipDup version |

Crash reporting

Enables sending crash reports to the Baking Bad team. This is disabled by default. You can inspect crash dumps saved as /tmp/dipdup/crashdumps/XXXXXXX.json before enabling this option.

Early realtime

By default, DipDup enters a sync state twice: before and after establishing a realtime connection. This flag allows collecting realtime messages while the sync is in progress, right after indexes load.

Let's consider two scenarios:

  1. Indexing 10 contracts with 10 000 operations each. Initial indexing could take several hours. There is no need to accumulate incoming operations since resync time after establishing a realtime connection depends on the contract number, thus taking a negligible amount of time.

  2. Indexing 10 000 contracts with 10 operations each. Both initial sync and resync will take a while. But the number of operations received during this time won't affect RAM consumption much.

If you do not have strict RAM constraints, it's recommended to enable this flag. You'll get faster indexing times and decreased load on TzKT API.

Merge subscriptions

Subscribe to all operations/big map diffs during realtime indexing instead of using separate channels. This flag helps to avoid the 10,000 subscription limit of TzKT and speeds up processing. The downside is increased RAM consumption during sync, especially if the early_realtime flag is enabled too.

Metadata interface

Without this flag, calling the ctx.update_contract_metadata and ctx.update_token_metadata methods will have no effect. The corresponding internal tables are created on reindexing either way.

Postpone jobs

Do not start the job scheduler until all indexes are synchronized. If your jobs perform some calculations that make sense only after the indexer has reached realtime, this toggle can save you some IOPS.

Skip version check

Disables warning about running unstable or out-of-date DipDup version.

Internal environment variables

DipDup uses multiple environment variables internally. They are read once on process start and usually do not change during runtime. Some variables modify the framework's behavior, while others are informational.

Please note that they are not currently a part of the public API and can be changed without notice.

| env variable | module path | description |
|---|---|---|
| DIPDUP_CI | dipdup.env.CI | Running in GitHub Actions |
| DIPDUP_DOCKER | dipdup.env.DOCKER | Running in Docker |
| DIPDUP_DOCKER_IMAGE | dipdup.env.DOCKER_IMAGE | Base image used when building the Docker image (default, slim or pytezos) |
| DIPDUP_NEXT | dipdup.env.NEXT | Enable features that require schema changes |
| DIPDUP_PACKAGE_PATH | dipdup.env.PACKAGE_PATH | Path to the currently used package |
| DIPDUP_REPLAY_PATH | dipdup.env.REPLAY_PATH | Path to datasource replay files; used in tests |
| DIPDUP_TEST | dipdup.env.TEST | Running in pytest |

The DIPDUP_NEXT flag will give you a picture of what's coming in the next major release, but enabling it on an existing schema will trigger reindexing.
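
These variables are exposed as plain module attributes (see the module paths above), so your own code can branch on them if needed. A tiny hedged sketch:

from dipdup import env


def should_send_alerts() -> bool:
    # Skip side effects when running under pytest or in CI
    return not (env.TEST or env.CI)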

SQL scripts

Put your *.sql scripts to <package>/sql. You can run these scripts from any callback with ctx.execute_sql('name'). If name is a directory, each script it contains will be executed.

Scripts are executed without being wrapped with SQL transactions. It's generally a good idea to avoid touching table data in scripts.

SQL scripts are ignored if SQLite is used as a database backend.

By default, an empty sql/<hook_name> directory is generated for every hook in config during init. Remove ctx.execute_sql call from hook callback to avoid executing them.

Event hooks

Scripts from sql/on_restart directory are executed each time you run DipDup. Those scripts may contain CREATE OR REPLACE VIEW or similar non-destructive operations.

Scripts from sql/on_reindex directory are executed after the database schema is created based on the models.py module but before indexing starts. It may be useful to change the database schema in ways that are not supported by the Tortoise ORM, e.g., to create a composite primary key.

See Also

Improving performance

This page contains tips that may help to increase indexing speed.

Optimize database schema

Database indexes are special lookup tables that Postgres can use to speed up data lookup. A database index acts like a pointer to data in a table, just like an index in a printed book. If you look in the index first, you will find the data much quicker than searching the whole book (or, in this case, the database).

You should add indexes on columns often appearing in WHERE clauses in your GraphQL queries and subscriptions.

Tortoise ORM uses BTree indexes by default. To set an index on a field, add index=True to the field definition:

from dipdup.models import Model
from tortoise import fields


class Trade(Model):
    id = fields.BigIntField(pk=True)
    amount = fields.BigIntField()
    level = fields.BigIntField(index=True)
    timestamp = fields.DatetimeField(index=True)
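
If your queries filter on several columns at once, Tortoise ORM also supports composite indexes declared via the model's Meta class. A small sketch (the model and field names are illustrative):

from dipdup.models import Model
from tortoise import fields


class Swap(Model):
    id = fields.BigIntField(pk=True)
    pair = fields.CharField(max_length=36)
    level = fields.BigIntField()
    timestamp = fields.DatetimeField()

    class Meta:
        # Composite BTree index covering frequent `WHERE pair = ... AND level >= ...` lookups
        indexes = (('pair', 'level'),)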

Tune datasources

All datasources share the same code under the hood to communicate with underlying APIs via HTTP. Configs of all datasources (and Hasura's as well) can have an optional http section with any number of the following parameters set:

datasources:
  tzkt:
    kind: tzkt
    ...
    http:
      retry_count: 10
      retry_sleep: 1
      retry_multiplier: 1.2
      ratelimit_rate: 100
      ratelimit_period: 60
      connection_limit: 25
      batch_size: 10000
hasura:
  url: http://hasura:8080
  http:
    ...

| field | description |
|---|---|
| retry_count | Number of retries after request failed before giving up |
| retry_sleep | Sleep time between retries |
| retry_multiplier | Multiplier for sleep time between retries |
| ratelimit_rate | Number of requests per period ("drops" in leaky bucket) |
| ratelimit_period | Period for rate limiting in seconds |
| connection_limit | Number of simultaneous connections |
| connection_timeout | Connection timeout in seconds |
| batch_size | Number of items fetched in a single paginated request (for some APIs) |

Each datasource has its own defaults. Usually, there's no reason to alter these settings unless you use self-hosted instances of TzKT or other datasources.

By default, DipDup retries failed requests infinitely, exponentially increasing the delay between attempts. Set the retry_count parameter to limit the number of attempts.

The batch_size parameter is TzKT-specific. By default, DipDup limits requests to 10000 items, the maximum value allowed on public instances provided by Baking Bad. Decreasing this value will reduce the time required for TzKT to process a single request and thus reduce the load. Reducing the connection_limit parameter has a similar effect, but only when multiple indexes are being synchronized concurrently.

See 12.4. datasources for details.

Use TimescaleDB for time-series

This page or paragraph is yet to be written. Come back later.

DipDup is fully compatible with TimescaleDB. Try its "continuous aggregates" feature, especially if dealing with market data like DEX quotes.

Cache commonly used models

If your indexer contains models with few fields that are used primarily in relations, you can cache such models during synchronization.

Example code:

from collections import OrderedDict

from dipdup.models import Model
from tortoise import fields


class Trader(Model):
    address = fields.CharField(36, pk=True)


class TraderCache:
    def __init__(self, size: int = 1000) -> None:
        self._size = size
        self._traders: OrderedDict[str, Trader] = OrderedDict()

    async def get(self, address: str) -> Trader:
        if address not in self._traders:
            # NOTE: Already created on origination
            self._traders[address], _ = await Trader.get_or_create(address=address)
            if len(self._traders) > self._size:
                # Evict the oldest entry to keep the cache bounded
                self._traders.popitem(last=False)

        return self._traders[address]

trader_cache = TraderCache()

Use trader_cache.get in handlers. After sync is complete, you can clear this cache to free some RAM:

async def on_synchronized(
    ctx: HookContext,
) -> None:
    ...
    models.trader_cache.clear()

Perform heavy computations in separate processes

It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable. So, they are forbidden now in advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py:

from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)

Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. The dipdup.cli:cli group handles argument and config parsing, graceful shutdown, and other boilerplate. The rest is up to you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with some magic within existing DipDup hooks, but there's no ETA yet.

Callback context (ctx)

An instance of the HandlerContext or HookContext class is passed to every callback, providing a set of helper methods and read-only properties.

Reference

class dipdup.context.DipDupContext(datasources, config, callbacks, transactions)

Common execution context for handler and hook callbacks.

Parameters:
  • datasources (dict[str, Datasource]) – Mapping of available datasources

  • config (DipDupConfig) – DipDup configuration

  • logger – Context-aware logger instance

  • callbacks (CallbackManager) –

  • transactions (TransactionManager) –

class dipdup.context.HandlerContext(datasources, config, callbacks, transactions, logger, handler_config, datasource)

Execution context of handler callbacks.

Parameters:
  • handler_config (HandlerConfig) – Configuration of the current handler

  • datasource (TzktDatasource) – Index datasource instance

  • datasources (dict[str, Datasource]) –

  • config (DipDupConfig) –

  • callbacks (CallbackManager) –

  • transactions (TransactionManager) –

  • logger (FormattedLogger) –

class dipdup.context.HookContext(datasources, config, callbacks, transactions, logger, hook_config)

Execution context of hook callbacks.

Parameters:
  • hook_config (HookConfig) – Configuration of the current hook

  • datasources (dict[str, Datasource]) –

  • config (DipDupConfig) –

  • callbacks (CallbackManager) –

  • transactions (TransactionManager) –

  • logger (FormattedLogger) –

async DipDupContext.add_contract(name, address=None, typename=None, code_hash=None)

Adds contract to the inventory.

Parameters:
  • name (str) – Contract name

  • address (str | None) – Contract address

  • typename (str | None) – Alias for the contract script

  • code_hash (str | int | None) – Contract code hash

Return type:

None

async DipDupContext.add_index(name, template, values, first_level=0, last_level=0, state=None)

Adds a new index to the inventory.

Parameters:
  • name (str) – Index name

  • template (str) – Index template to use

  • values (dict[str, Any]) – Mapping of values to fill template with

  • first_level (int) –

  • last_level (int) –

  • state (Index | None) –

Return type:

None

async DipDupContext.execute_sql(name, *args, **kwargs)

Executes SQL script(s) with given name.

If the name path is a directory, all .sql scripts within it will be executed in alphabetical order.

Parameters:
  • name (str) – File or directory within project’s sql directory

  • args (Any) –

  • kwargs (Any) –

Return type:

None

async DipDupContext.execute_sql_query(name, *args)

Executes SQL query with given name

Parameters:
  • name (str) – SQL query name within <project>/sql directory

  • args (Any) –

Return type:

Any

async DipDupContext.fire_hook(name, fmt=None, wait=True, *args, **kwargs)

Fire hook with given name and arguments.

Parameters:
  • name (str) – Hook name

  • fmt (str | None) – Format string for ctx.logger messages

  • wait (bool) – Wait for hook to finish or fire and forget

  • args (Any) –

  • kwargs (Any) –

Return type:

None

DipDupContext.get_coinbase_datasource(name)

Get coinbase datasource by name

Parameters:

name (str) –

Return type:

CoinbaseDatasource

DipDupContext.get_http_datasource(name)

Get http datasource by name

Parameters:

name (str) –

Return type:

HttpDatasource

DipDupContext.get_ipfs_datasource(name)

Get ipfs datasource by name

Parameters:

name (str) –

Return type:

IpfsDatasource

DipDupContext.get_metadata_datasource(name)

Get metadata datasource by name

Parameters:

name (str) –

Return type:

MetadataDatasource

DipDupContext.get_tzkt_datasource(name)

Get tzkt datasource by name

Parameters:

name (str) –

Return type:

TzktDatasource

async DipDupContext.reindex(reason=None, **context)

Drops the entire database and starts the indexing process from scratch.

Parameters:
  • reason (str | ReindexingReason | None) – Reason for reindexing in free-form string

  • context (Any) – Additional information to include in exception message

Return type:

None

async DipDupContext.restart()

Restart process and continue indexing.

Return type:

None

async DipDupContext.update_contract_metadata(network, address, metadata)

Inserts or updates corresponding rows in the internal dipdup_contract_metadata table to provide a generic metadata interface (see docs).

Parameters:
  • network (str) – Network name (e.g. mainnet)

  • address (str) – Contract address

  • metadata (dict[str, Any]) – Contract metadata to insert/update

Return type:

None

async DipDupContext.update_token_metadata(network, address, token_id, metadata)

Inserts or updates corresponding rows in the internal dipdup_token_metadata table to provide a generic metadata interface (see docs).

Parameters:
  • network (str) – Network name (e.g. mainnet)

  • address (str) – Contract address

  • token_id (str) – Token ID

  • metadata (dict[str, Any]) – Token metadata to insert/update

Return type:

None

async HookContext.rollback(index, from_level, to_level)

Rollback an index to a given level, reverting all changes made since that level.

Parameters:
  • index (str) – Index name

  • from_level (int) – Level to rollback from

  • to_level (int) – Level to rollback to

Return type:

None

Internal models

This page describes the internal models used by DipDup. You shouldn't modify data in these models directly.

| model | table | description |
|---|---|---|
| Model | N/A | Base class for all models in DipDup project. Provides advanced transaction management. |
| Schema | dipdup_schema | Hash of database schema to detect changes that require reindexing. |
| Head | dipdup_head | The latest block received by a datasource from a WebSocket connection. |
| Index | dipdup_index | Indexing status, level of the latest processed block, template, and template values if applicable. |
| Contract | dipdup_contract | Nothing useful for us humans. It helps DipDup to keep track of dynamically spawned contracts. |
| ModelUpdate | dipdup_model_update | Service table to store model diffs for database rollback. |
| ContractMetadata | dipdup_contract_metadata | See 5.12. Metadata interface |
| TokenMetadata | dipdup_token_metadata | See 5.12. Metadata interface |

With the help of these tables, you can set up monitoring of DipDup deployment to know when something goes wrong:

-- This query will return time since the latest block was received by a datasource.
SELECT NOW() - timestamp FROM dipdup_head;
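
The same check can also be done from Python, since internal models are importable from dipdup.models. A hedged sketch, assuming the Head model exposes the name and timestamp fields suggested by the table above and the dipdup_head_status view described later:

from datetime import datetime, timedelta, timezone

from dipdup.models import Head


async def head_is_fresh(name: str, threshold: timedelta = timedelta(minutes=3)) -> bool:
    # `name` is the datasource name stored in the dipdup_head table
    head = await Head.get_or_none(name=name)
    if head is None:
        return False
    # NOTE: Depending on your database settings the stored timestamp may be naive;
    # adjust the "now" value accordingly.
    now = datetime.now(timezone.utc)
    return now - head.timestamp < threshold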

Index factories

This page or paragraph is yet to be written. Come back later.

DipDup allows creating new indexes at runtime. To begin with, you need to define index templates in the top-level templates section of the config. Then call the ctx.add_contract and ctx.add_index methods from any user callback.

The most common way to spawn indexes is to create an index that tracks the originations of contracts with similar code or originated by a specific contract. A minimal example looks like this:

contracts:
  registry:
    address: KT19CF3KKrvdW77ttFomCuin2k4uAVkryYqh

indexes:
  factory:
    kind: operation
    datasource: tzkt
    types:
      - origination
    handlers:
      - callback: on_factory_origination
        pattern:
          - type: origination
            similar_to: registry
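
The matching handler then registers the originated contract and spawns a new index for it at runtime. A minimal sketch, assuming a template named registry_dao exists in the templates section and that the originated contract address is available on the operation data (both are illustrative):

from dipdup.context import HandlerContext
from dipdup.models import Origination


async def on_factory_origination(
    ctx: HandlerContext,
    registry_origination: Origination,  # type: ignore[type-arg]
) -> None:
    address = registry_origination.data.originated_contract_address
    if address is None:
        return

    name = f'registry_dao_{address}'
    await ctx.add_contract(name=name, address=address, typename='registry')
    await ctx.add_index(
        name=name,
        template='registry_dao',  # hypothetical template from the `templates` section
        values={'contract': name, 'datasource': 'tzkt'},
    )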

Another solution is to implement custom logic in the on_restart hook (see 5.3. Event hooks → on_restart).

Metadata Interface

When issuing a token on the Tezos blockchain, there is an important yet insufficiently covered aspect related to how various ecosystem applications (wallets, explorers, marketplaces, and others) will display and interact with it: token metadata, stored wholly or partially on-chain but intended for off-chain use only.

Token metadata standards

There are several standards regulating the metadata file format and the way it can be stored and exposed to consumers:

  • TZIP-21 | Rich Metadata — describes a metadata schema and standards for contracts and tokens
  • TZIP-12 | FA2.0 — a standard for a unified token contract interface, includes an article about how to store and encode token metadata
  • TZIP-7 | FA1.2 — single asset token standard; reuses the token metadata approach from FA2.0

Keeping aside the metadata schema, let's focus on which approaches to storage are currently standardized, their pros and cons, and what to do if none of the available options fits your case.

The most straightforward approach is to store everything in the contract storage, especially if it's just the basic fields (name, symbol, decimals):

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── name: ""
            ├── symbol: ""
            └── decimals: ""

But typically, you want to store more, like a token thumbnail icon, and it is no longer feasible to keep such large data on-chain (because you pay gas for every byte stored).
Then you can put large files somewhere off-chain (e.g., IPFS) and store just the links:

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            ├── ...
            └── thumbnailUri: "ipfs://"

This approach is still costly, but sometimes (in rare cases) you need to have access to the metadata from the contract (example: Dogami).
We can go further and put the entire token info structure on IPFS:

storage
└── token_metadata [big_map]
    └── 0
        ├── token_id: 0
        └── token_info
            └── "": "ipfs://"

It is the most common case right now (example: HEN).

The main advantage of the basic approach is that all the changes applied to token metadata will result in big map diffs that are easily traceable by indexers. Even if you decide to replace the off-chain file, it will cause the IPFS link to change. In the case of HTTP links, indexers cannot detect the content change; thus, token metadata won't be updated.

Custom: off-chain view

The second approach presented in the TZIP-12 spec was intended to cover the cases when there's a need to reuse the same token info or when it's not possible to expose the %token_metadata big map in the standard form. Instead, it's offered to execute a special Michelson script against the contract storage and treat the result as the token info for a particular (requested) token. The tricky part is that the script code itself is typically stored off-chain, and the whole algorithm would look like this:

  1. Try to fetch the empty string key of the %metadata big map to retrieve the TZIP-16 file location
  2. Resolve the TZIP-16 file (typically from IPFS) — it should contain the off-chain view body
  3. Fetch the current contract storage
  4. Build arguments for the off-chain view token_metadata using the fetched storage
  5. Execute the script using Tezos node RPC

Although this approach is more or less viable for wallets (when you need to fetch metadata for a relatively small number of tokens), it becomes very inefficient for indexers dealing with millions of tokens:

  • After every contract origination, one has to try to fetch the views (even if there aren't any) — it means synchronous fetching, which can take seconds in the case of IPFS
  • Executing a Michelson script is currently only* possible via Tezos node, and it's quite a heavy call (setting up the VM and contract context takes time)
  • There's no clear way to detect new token metadata addition or change — that is actually the most critical one; you never know for sure when to call the view

The off-chain view approach is not supported by the TzKT indexer, and we strongly recommend not using it, especially for contracts that can issue multiple tokens.

DipDup-based solution

The alternative we offer for very non-standard cases is using our selective indexing framework for custom token metadata retrieval and then feeding it back to the TzKT indexer, which essentially acts as a metadata aggregator. Note that while this may seem like a circular dependency, it's resolved on the interface level: all custom DipDup metadata indexers should expose specific GraphQL tables with certain fields:

query MyQuery {
  token_metadata() {
    metadata    # TZIP-21 JSON
    network     # mainnet or <protocol>net
    contract    # token contract address
    token_id    # token ID in the scope of the contract
    update_id   # integer cursor used for pagination
  }
}

DipDup handles table management for you and exposes a context-level helper.

Tezos Domains example:

await ctx.update_token_metadata(
    network=ctx.datasource.network,
    address=store_records.data.contract_address,
    token_id=token_id,
    metadata={
        'name': record_name,
        'symbol': 'TD',
        'decimals': '0',
        'isBooleanAmount': True,
        'domainData': decode_domain_data(store_records.value.data)
    },
)

TzKT can be configured to subscribe to one or multiple DipDup metadata sources; we currently run several of them in production.

TzKT token metadata flow

Deployment and operations

This section contains recipes to deploy and maintain DipDup instances.

Database engines

DipDup officially supports the following databases: SQLite, PostgreSQL, TimescaleDB. This table will help you choose a database engine that best suits your needs.

| | SQLite | PostgreSQL | TimescaleDB |
|---|---|---|---|
| Supported versions | latest | 13, 14 | pg13, pg14 |
| Best application | development | general usage | working with timeseries |
| SQL scripts | ❌ | ✅ | ✅ |
| Immune tables | ❌ | ✅ | ✅ |
| Hasura integration | ❌ | ✅ | ✅ |

By default, DipDup uses an in-memory SQLite database that is destroyed after the process exits.

While it's sometimes convenient to use one database engine for development and another one for production, be careful with column types that behave differently in different engines. However, Tortoise ORM mostly hides these differences.

Running in Docker

Base images

Warning

The -pytezos tag is deprecated and will be removed in the next major release. Also, -slim images will be based on Ubuntu instead of Alpine.

DipDup provides multiple prebuilt images for different environments hosted on Docker Hub. Choose the one according to your needs from the table below.

| | default | pytezos | slim |
|---|---|---|---|
| base image | python:3.10-slim | python:3.10-slim | python:3.10-alpine |
| platforms | amd64, arm64 | amd64, arm64 | amd64, arm64 |
| latest tag | 6 | 6-pytezos | 6-slim |
| image size | 376M | 496M | 97M |
| dipdup init command | ✅ | ✅ | ❌ |
| git and poetry included | ✅ | ✅ | ❌ |
| PyTezos included | ❌ | ✅ | ❌ |

The default DipDup image is suitable for development and testing. It also includes some tools to make package management easier. If unsure, use this image.

The -slim image is based on Alpine Linux and thus is much smaller than the default one. It also doesn't include codegen functionality (the init command, which is unlikely to be useful in production). This image will eventually become the default one.

The -pytezos image includes the pre-installed PyTezos library. DipDup doesn't provide any further PyTezos integration. With some patience, you can build a trading robot or something like that using this image.

Nightly builds (ghcr.io)

In addition to Docker Hub we also publish images on GitHub Container Registry. Builds are triggered on push to any branch for developers' convenience, but only Alpine images are published. Do not use nightlies in production!

# Latest image for `aux/arm64` branch
FROM ghcr.io/dipdup-io/dipdup:aux-arm64

Writing Dockerfile

Start by creating a .dockerignore file for your project if it's missing.

# Ignore all
*

# Add build files
!Makefile
!pyproject.toml
!poetry.lock
!requirements**
!README.md

# Add code
!src

# Add configs
!*.yml

# Ignore caches
**/.mypy_cache
**/.pytest_cache
**/__pycache__

A typical Dockerfile looks like this:

FROM dipdup/dipdup:6
# FROM dipdup/dipdup:6-pytezos
# FROM dipdup/dipdup:6-slim

# Optional: install additional dependencies using poetry
# COPY pyproject.toml poetry.lock .
# RUN install_dependencies

# Optional: install additional dependencies using pip
# COPY requirements.txt .
# RUN install_dependencies requirements.txt

COPY . .

Note that Poetry integration is not available in the slim image.

Deploying with docker-compose

Make sure you have the Docker engine and docker-compose installed.

Example docker-compose.yml file:

version: "3.8"

services:
  dipdup:
    build: .
    depends_on:
      - db
    command: ["-c", "dipdup.yml", "-c", "dipdup.prod.yml", "run"]
    restart: always
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
      - ADMIN_SECRET=${ADMIN_SECRET:-changeme}
    ports:
      - 127.0.0.1:9000:9000

  db:
    image: postgres:14
    ports:
      - 127.0.0.1:5432:5432
    volumes:
      - db:/var/lib/postgresql/data
    restart: always
    environment:
      - POSTGRES_USER=dipdup
      - POSTGRES_DB=dipdup
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U dipdup"]
      interval: 10s
      timeout: 5s
      retries: 5

  hasura:
    image: hasura/graphql-engine:latest
    ports:
      - 127.0.0.1:8080:8080
    depends_on:
      - db
    restart: always
    environment:
      - HASURA_GRAPHQL_DATABASE_URL=postgres://dipdup:${POSTGRES_PASSWORD:-changeme}@db:5432
      - HASURA_GRAPHQL_ENABLE_CONSOLE=true
      - HASURA_GRAPHQL_DEV_MODE=true
      - HASURA_GRAPHQL_ENABLED_LOG_TYPES=startup, http-log, webhook-log, websocket-log, query-log
      - HASURA_GRAPHQL_ADMIN_SECRET=${HASURA_SECRET:-changeme}
      - HASURA_GRAPHQL_UNAUTHORIZED_ROLE=user
      - HASURA_GRAPHQL_STRINGIFY_NUMERIC_TYPES=true

volumes:
  db:

Environment variables are expanded in the DipDup config file; Postgres password and Hasura secret are forwarded in this example.

Create a separate dipdup.<environment>.yml file for this stack:

database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}

hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_SECRET:-changeme}
  allow_aggregations: false
  camel_case: true

sentry:
  dsn: ${SENTRY_DSN:-""}
  environment: ${SENTRY_ENVIRONMENT:-prod}

prometheus:
  host: 0.0.0.0

advanced:
  early_realtime: True
  crash_reporting: False

Note the hostnames (resolved in the docker network) and environment variables (expanded by DipDup).

Build and run the containers:

docker-compose up -d --build

Try lazydocker tool to manage Docker containers interactively.

Deploying with Docker Swarm

This page or paragraph is yet to be written. Come back later.

Example stack:

version: "3.8"

services:
  dipdup:
    image: ${DOCKER_REGISTRY:-ghcr.io}/dipdup-io/dipdup:${TAG:-master}
    depends_on:
      - db
      - hasura
    command: ["-c", "dipdup.yml", "-c", "dipdup.prod.yml", "run"]
    environment:
      - "POSTGRES_USER=dipdup"
      - "POSTGRES_PASSWORD=changeme"
      - "POSTGRES_DB=dipdup"
      - "HASURA_SECRET=changeme"
    networks:
      - dipdup-private
      - prometheus-private
    deploy:
      mode: replicated
      replicas: ${INDEXER_ENABLED:-1}
      labels:
        - prometheus-job=${SERVICE}
        - prometheus-port=8000
      placement: &placement
        constraints:
          - node.labels.${SERVICE} == true
    logging: &logging
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "10"
        tag: "\{\{.Name\}\}.\{\{.ImageID\}\}"

  db:
    image: postgres:14
    volumes:
      - db:/var/lib/postgresql/data
    environment: 
      - POSTGRES_USER=dipdup
      - POSTGRES_DB=dipdup
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - dipdup-private
    deploy:
      mode: replicated
      replicas: 1
      placement: *placement
    logging: *logging

  hasura:
    image: hasura/graphql-engine:latest
    depends_on:
      - db
    environment:
      - HASURA_GRAPHQL_DATABASE_URL=postgres://dipdup:${POSTGRES_PASSWORD:-changeme}@db:5432
      - HASURA_GRAPHQL_ENABLE_CONSOLE=true
      - HASURA_GRAPHQL_DEV_MODE=false
      - HASURA_GRAPHQL_ENABLED_LOG_TYPES=startup, http-log, websocket-log, query-log
      - HASURA_GRAPHQL_LOG_LEVEL=warn
      - HASURA_GRAPHQL_ENABLE_TELEMETRY=false
      - HASURA_GRAPHQL_ADMIN_SECRET=${HASURA_SECRET}
      - HASURA_GRAPHQL_UNAUTHORIZED_ROLE=user
      - HASURA_GRAPHQL_STRINGIFY_NUMERIC_TYPES=true
    networks:
      - dipdup-private
      - traefik-public
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - traefik.enable=true
        - traefik.http.services.${SERVICE}.loadbalancer.server.port=8080
        - "traefik.http.routers.${SERVICE}.rule=Host(`${HOST}`) && (PathPrefix(`/v1/graphql`) || PathPrefix(`/api/rest`))"
        - traefik.http.routers.${SERVICE}.entrypoints=http,${INGRESS:-ingress}
        - "traefik.http.routers.${SERVICE}-console.rule=Host(`${SERVICE}.${SWARM_ROOT_DOMAIN}`)"
        - traefik.http.routers.${SERVICE}-console.entrypoints=https
        - traefik.http.middlewares.${SERVICE}-console.headers.customrequestheaders.X-Hasura-Admin-Secret=${HASURA_SECRET}
        - traefik.http.routers.${SERVICE}-console.middlewares=authelia@docker,${SERVICE}-console
      placement: *placement
    logging: *logging


volumes:
  db:

networks:
  dipdup-private:
  traefik-public:
    external: true
  prometheus-private:
    external: true

Sentry integration

Sentry is error tracking software that can be used either as a service or on-premise. It dramatically improves the troubleshooting experience and requires nearly zero configuration. To start catching exceptions with Sentry in your project, add the following section to the dipdup.yml config:

sentry:
  dsn: https://...
  environment: dev
  debug: False

You can obtain the Sentry DSN from the web interface at Settings -> Projects -> <project_name> -> Client Keys (DSN). The cool thing is that if you catch an exception and suspect there's a bug in DipDup, you can share this event with us using a public link (created via the Share menu).

Prometheus integration

DipDup provides basic integration with the Prometheus monitoring system by exposing some metrics.

When running DipDup in Docker, make sure that the Prometheus instance is in the same network.

Available metrics

The following metrics are exposed under dipdup namespace:

| metric name | description |
|---|---|
| dipdup_indexes_total | Number of indexes in operation by status |
| dipdup_index_level_sync_duration_seconds | Duration of indexing a single level |
| dipdup_index_level_realtime_duration_seconds | Duration of last index synchronization |
| dipdup_index_total_sync_duration_seconds | Duration of the last index synchronization |
| dipdup_index_total_realtime_duration_seconds | Duration of the last index realtime synchronization |
| dipdup_index_levels_to_sync_total | Number of levels to reach synced state |
| dipdup_index_levels_to_realtime_total | Number of levels to reach realtime state |
| dipdup_index_handlers_matched_total | Index total hits |
| dipdup_datasource_head_updated_timestamp | Timestamp of the last head update |
| dipdup_datasource_rollbacks_total | Number of rollbacks |
| dipdup_http_errors_total | Number of HTTP errors |
| dipdup_callback_duration_seconds | Duration of callback execution |

You can also query 5.10. Internal models for monitoring purposes.

Logging

To control the amount of logging DipDup produces, set the logging field in the config:

logging: default|verbose|quiet

If you need finer tuning, perform it in the on_restart hook:

import logging

from dipdup.context import HookContext


async def on_restart(
    ctx: HookContext,
) -> None:
    logging.getLogger('some_logger').setLevel('DEBUG')

Backup and restore

DipDup has no built-in functionality to backup and restore the database at the moment. The good news is that DipDup indexes are fully atomic, which means you can perform a backup with regular psql/pg_dump tools regardless of the DipDup state.

This page contains several recipes for backup/restore.

Scheduled backup to S3

This example is for Swarm deployments. We use this solution to back up our services in production. Adapt it to your needs.

version: "3.8"
services:
  indexer:
    ...
  db:
    ...
  hasura:
    ...

  backuper:
    image: ghcr.io/dipdup-io/postgres-s3-backup:master
    environment:
      - S3_ENDPOINT=${S3_ENDPOINT:-https://fra1.digitaloceanspaces.com}
      - S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID}
      - S3_SECRET_ACCESS_KEY=${S3_SECRET_ACCESS_KEY}
      - S3_BUCKET=dipdup
      - S3_PATH=dipdup
      - S3_FILENAME=${SERVICE}-postgres
      - PG_BACKUP_FILE=${PG_BACKUP_FILE}
      - PG_BACKUP_ACTION=${PG_BACKUP_ACTION:-dump}
      - PG_RESTORE_JOBS=${PG_RESTORE_JOBS:-8}
      - POSTGRES_USER=${POSTGRES_USER:-dipdup}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-changeme}
      - POSTGRES_DB=${POSTGRES_DB:-dipdup}
      - POSTGRES_HOST=${POSTGRES_HOST:-db}
      - HEARTBEAT_URI=${HEARTBEAT_URI}
      - SCHEDULE=${SCHEDULE}
    deploy:
      mode: replicated
      replicas: ${BACKUP_ENABLED:-0}
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 120s
      placement: *placement
    networks:
      - internal
    logging: *logging

Automatic restore on rollback

This awesome code was contributed by @852Kerfunkle, author of tz1and project.

<project>/backups.py

...

def backup(level: int, database_config: PostgresDatabaseConfig):
    ...

    with open('backup.sql', 'wb') as f:
        try:
            err_buf = StringIO()
            pg_dump('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}', '--clean',
                '-n', database_config.schema_name, _out=f, _err=err_buf) #, '-E', 'UTF8'
        except ErrorReturnCode:
            err = err_buf.getvalue()
            _logger.error(f'Database backup failed: {err}')


def restore(level: int, database_config: PostgresDatabaseConfig):
    ...

    with open('backup.sql', 'r') as f:
        try:
            err_buf = StringIO()
            psql('-d', f'postgresql://{database_config.user}:{database_config.password}@{database_config.host}:{database_config.port}/{database_config.database}',
                '-n', database_config.schema_name, _in=f, _err=err_buf)
        except ErrorReturnCode:
            err = err_buf.getvalue()
            _logger.error(f'Database restore failed: {err}')
            raise Exception("Failed to restore")

def get_available_backups():
    ...


def delete_old_backups():
    ...

<project>/hooks/on_index_rollback.py

...

async def on_index_rollback(
    ctx: HookContext,
    index: Index,  # type: ignore[type-arg]
    from_level: int,
    to_level: int,
) -> None:
    await ctx.execute_sql('on_index_rollback')

    database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database

    # if not a postgres db, reindex.
    if database_config.kind != "postgres":
        await ctx.reindex(ReindexingReason.ROLLBACK)

    available_levels = backups.get_available_backups()

    # if no backups available, reindex
    if not available_levels:
        await ctx.reindex(ReindexingReason.ROLLBACK)

    # find the right level, i.e. the one that's closest to to_level
    chosen_level = 0
    for level in available_levels:
        if level <= to_level and level > chosen_level:
            chosen_level = level

    # try to restore or reindex
    try:
        backups.restore(chosen_level, database_config)
        await ctx.restart()
    except Exception:
        await ctx.reindex(ReindexingReason.ROLLBACK)

<project>/hooks/run_backups.py

...

async def run_backups(
    ctx: HookContext,
) -> None:
    database_config: Union[SqliteDatabaseConfig, PostgresDatabaseConfig] = ctx.config.database

    if database_config.kind != "postgres":
        return

    level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)

    if level is None:
        return

    backups.backup(level, database_config)
    backups.delete_old_backups()

<project>/hooks/simulate_reorg.py

...

async def simulate_reorg(
    ctx: HookContext
) -> None:
    level = ctx.get_tzkt_datasource("tzkt_mainnet")._level.get(MessageType.head)

    if level:
        await ctx.fire_hook(
            "on_index_rollback",
            wait=True,
            index=None,  # type: ignore[arg-type]
            from_level=level,
            to_level=level - 2,
        )

Monitoring

To perform uptime and data freshness checks, DipDup provides a standard REST endpoint you can use together with Betteruptime or similar services that can search for a keyword in the response.

This check says that DipDup is not stuck and keeps receiving new data (the last known block timestamp is not older than three minutes from now). Note that this is not enough to ensure everything works as expected. But it can at least cover the cases when the datasource API is down or your indexer has crashed.

URI format

https://<your-indexer-host>/api/rest/dipdup_head_status?name=<datasource-uri>

If you have camel case enabled in the Hasura config:

https://<your-indexer-host>/api/rest/dipdupHeadStatus?name=<datasource-uri>

Response

If the (latest block) head subscription state was updated less than three minutes ago, everything is OK:

{
  "dipdup_head_status": [
    {
      "status": "OK"
    }
  ]
}

Otherwise, the state is considered OUTDATED:

{
  "dipdup_head_status": [
    {
      "status": "OUTDATED"
    }
  ]
}

Custom checks

The default check looks like the following:

CREATE
OR REPLACE VIEW dipdup_head_status AS
SELECT
    name,
    CASE
        WHEN timestamp < NOW() - interval '3 minutes' THEN 'OUTDATED'
        ELSE 'OK'
    END AS status
FROM
    dipdup_head;

You can also create your custom alert endpoints using SQL views and functions and then convert them to Hasura REST endpoints.

F.A.Q

How to index the different contracts that share the same interface?

Multiple contracts can provide the same interface (like FA1.2 and FA2 standard tokens) but have different storage structures. If you try to use the same typename for them, indexing will fail. However, you can modify typeclasses manually. Edit the types/<typename>/storage.py file and comment out the unique fields that are not important for your index:

# dipdup: ignore

...

class ContractStorage(BaseModel):
    class Config:
        extra = Extra.ignore

    common_ledger: Dict[str, str]
    # unique_field_foo: str
    # unique_field_bar: str

Note the # dipdup: ignore comment on the first line. It tells DipDup not to overwrite this file when the init --overwrite-types command is run.

Don't forget the Extra.ignore Pydantic hint; otherwise, storage deserialization will fail.

What is the correct way to process off-chain data?

DipDup provides convenient helpers to process off-chain data like market quotes or IPFS metadata. Follow the tips below to use them most efficiently.

  • Do not perform off-chain requests in handlers unless necessary. Use hooks instead, enriching indexed data on demand.
  • Use generic http datasources for external APIs instead of plain aiohttp requests. This way, you can use the same features DipDup uses for internal requests: retry with backoff, rate limiting, Prometheus integration, etc.
  • Database tables that store off-chain data can be marked as immune, preventing them from being removed on reindexing.

One of my indexes depends on another one's indexed data. How to process them in a specific order?

Indexes of all kinds are fully independent. They are processed in parallel, have their own message queues, and don't share any state. It is one of the essential DipDup concepts, so there's no "official" way to manage the order of indexing.

Avoid waiting for sync primitives like asyncio.Event or asyncio.Lock in handlers. Indexing will be stuck forever, waiting for the database transaction to complete.

Instead, save raw data in handlers and process it later with hooks when all conditions are met. For example, process a data batch only when all indexes in the dipdup_index table have reached a specific level, as in the sketch below.
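
A hedged sketch of such a gating hook, assuming the internal Index model exposes the level field described in 5.10. Internal models:

from dipdup.context import HookContext
from dipdup.models import Index


async def process_when_ready(
    ctx: HookContext,
    target_level: int,
) -> None:
    # Count indexes that have not reached the target level yet
    lagging = await Index.filter(level__lt=target_level).count()
    if lagging:
        ctx.logger.info('%s indexes below level %s, skipping', lagging, target_level)
        return

    # All indexes have caught up: process the accumulated raw data here
    ...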

How to perform database migrations?

DipDup does not provide any tooling for database migrations. The reason is that schema changes almost always imply reindexing when speaking about indexers. However, you can perform migrations yourself using any tool you like. First, disable schema hash check in config:

advanced:
  reindex:
    schema_modified: ignore

You can also use the schema approve command for a single schema change.

To determine what manual modifications you need to apply after changing models.py, you can compare raw SQL schema before and after the change. Consider the following example:

-    timestamp = fields.DatetimeField()
+    timestamp = fields.DatetimeField(auto_now=True)
dipdup schema export > old
# ...modify `models.py` here...
dipdup schema export > new
diff old new
76c76
<     "timestamp" TIMESTAMP NOT NULL,
---
>     "timestamp" TIMESTAMP NOT NULL  DEFAULT CURRENT_TIMESTAMP,

Now you can prepare and execute an ALTER TABLE query manually or using SQL hooks.

Troubleshooting

This page contains tips for troubleshooting DipDup issues.

Update DipDup to the latest version

The DipDup framework evolves rapidly, just like Tezos itself. We recommend keeping your project up-to-date with the latest version of DipDup.

If you're using Poetry, set a caret version constraint in pyproject.toml to use the latest release of the current major version:

[tool.poetry.dependencies]
python = ">=3.10,<3.11"
dipdup = "^6"

pipx installations always use the latest version.

Run dipdup update periodically to update to the latest version.

While building Docker images, you can use X and X.Y tags to pin specific major/minor releases:

FROM dipdup/dipdup:6

Ensure that config is correct

DipDup config can be correct syntactically but not necessarily semantically. It's especially easy to make a mistake when actively using templates and environment variables. Use the config export command to dump the config the way DipDup "sees" it, after resolving all links and templates. The config env command can help you find missing environment variables.

dipdup -c dipdup.yml -c dipdup.prod.yml config export
dipdup -c dipdup.yml -c dipdup.prod.yml config env

Enable debug logging and crash reporting

More logs can give you a clue about the reason for an issue. Enable them in config:

logging: verbose

When an exception occurs, DipDup saves crash dumps to /tmp/dipdup/crashdumps/XXXXXXX.json. You can send those dumps to Baking Bad automatically:

advanced:
  crash_reporting: True

Use linters to find errors in your Python code

Exceptions that occur during callback execution are reraised as CallbackError. If you experience this issue, most likely your code is incorrect. Luckily, the Python ecosystem has lots of tools called linters to help you find mistakes. In DipDup we mostly use a combo of flake8 and mypy. You can start using both with zero configuration:

poetry add --with dev flake8 mypy
poetry run flake8
poetry run mypy

You can find a more complex example in DipDup's pyproject.toml or use the cookiecutter template to create a new project with all recommended bells and whistles (see 1. Quickstart → from-template).

Explore contract calls in Better Call Dev

Better Call Dev is a blockchain explorer for Tezos smart contracts. It provides a more human-friendly interface than TzKT to explore exact contract calls and parameter/storage structures.

Try it out when writing index definitions.

Packaging issues

The Poetry package manager we recommend using with DipDup is not the most stable software in the world. If you experience issues with it, like SolverProblemError, try the following:

  • Remove .venv and poetry.lock from the project root
  • Remove ~/.cache/pypoetry and ~/.cache/pip/ directories
  • Run poetry install and hope for the best.

Got stuck? Ask for help

We are always ready to answer your questions!

If you think you've found a bug, please report it directly on GitHub Issues. For all other discussions, join our socials.

Demo projects

The DipDup repository contains several minimal examples of how to use various features for real-case scenarios. Please, do not use these examples in production unmodified. We have not put a production-grade amount of effort into developing them, so they may contain flaws in indexing logic.

Some projects that started as a demo now evolved into full-fledged applications running in production. Check out 10. Built with DipDup page.

TzBTC token

source: demo_token

The most basic indexer used in Quickstart. A single operation index to track balances of TzBTC token holders, nothing else.

hic et nunc

source: demo_nft_marketplace

Indexes trades and swaps of "hic et nunc", one of the most popular NFT marketplaces on Tezos.

Quipuswap

source: demo_dex

Covers all available operations of Quipuswap DEX contracts: trades, transfers, moving liquidity. A more complex example with index templates.

Homebase RegistryDAO

source: demo_factories

Homebase enables users to create DAO contracts. In this example, indexes are spawned at runtime (5.11. Index factories) for all contracts having the same script.

Tezos Domains (operation)

source: demo_domains

Tezos Domains is a distributed naming system. You probably have seen those fancy user.tez names while browsing explorers. This is a pretty basic example of how to index them.

Tezos Domains (big_map)

source: demo_big_maps

The same as above, but uses a big_map index instead of an operation one. The storage structure of this contract is pretty straightforward; we only need to track a single big map. This example contains the skip_history: once directive to index only the current state of the contract before switching to realtime processing, which speeds up indexing even more.

TzColors

source: demo_auction

A very basic indexer of TzColors NFT token and marketplace. Unlike hic et nunc this marketplace provides auction functionality. Other than that, it is pretty much the same.

Built with DipDup

This page is a brief overview of projects which use DipDup as an indexing solution.

Want to see your project on this page? Create an issue on GitHub!

Rarible / Tezos marketplace indexer

Homepage | Github

Rarible is a multichain community-centric NFT marketplace that also allows trading on aggregated listings from other NFT platforms. Rarible uses a DipDup-based solution to aggregate sells and auctions from major Tezos marketplaces.

Ubisoft / Quartz tokens metadata

Homepage | API | Github

Ubisoft Quartz is a new platform for players to get Digits, the first NFTs playable in AAA games. Under the hood, Quartz uses Aleph as persistent token metadata storage and a non-standard token metadata signalling schema. To provide wallets and other TzKT API consumers with Quartz NFT data, we created a custom indexing plugin.

Youves

Homepage | Github

Youves is a decentralized, non-custodial, and self-governed platform for the creation and management of synthetic assets. Youves uses a DipDup-based solution to track vaults, DEX trades, and oracle price quotes.

Kord.fi

Homepage

Kord.Fi is a DeFi protocol for Tezos that allows liquidity providers to tap into additional leverage provided by Tezos Blockchain asset owners.

StakeNow

Homepage

StakeNow.fi gives you a 360° view of your investments and lets you manage your Tezos assets in one place.

Mavryk

Homepage

Mavryk is a DAO-operated financial ecosystem that lets users borrow and earn on their terms while participating in the governance of the platform.

Vortex

Homepage

Vortex is an all-in-one decentralized finance protocol on the Tezos blockchain built by Smartlink. Vortex uses the DipDup indexer to track AMM swaps, pools, positions, as well as yield farms and NFT collections.

Versum

Homepage

Versum combines features of NFT platforms and social platforms to introduce a marketplace that focuses on organic discovery, decentralized storage, and accessibility.

Yupana

Homepage

Yupana.Finance is an open-source, decentralized, and non-custodial liquidity protocol built to securely lend and borrow digital assets via smart contracts.

HicDEX (Teia)

Homepage | API | GitHub

HicDEX is a Tezos indexer for the hicetnunc.art marketplace (currently teia.art). Indexed data is available via a public GraphQL endpoint.

Homebase

Homepage | GitHub

Homebase is a web application that enables users to create and manage/use DAOs on the Tezos blockchain. This application aims to help empower community members and developers to launch and participate in Tezos-based DAOs.

Tezos Profiles

Homepage | API | GitHub

Tezos Profiles enables you to associate your online identity with your Tezos account.

Juster

Homepage | API | GitHub

Juster is an on-chain smart contract platform allowing users to take part in an automated betting market by creating events, providing liquidity to them, and making bets.

tz1and

Homepage | API | GitHub

A Virtual World and NFT Marketplace.

Tezotopia

Homepage

Tezotopia is a Tezos-based Real-Time Strategy (RTS) gaming platform that allows players to acquire land (Tezotops), items and resources.

mempool

This is an optional section used by the mempool indexer plugin. It uses contract and datasource aliases as well as the database connection.

Mempool configuration has two sections: settings (optional) and indexers (required).

Settings

This section is optional, and so are all the setting keys.

mempool:
  settings:
    keep_operations_seconds: 172800
    expired_after_blocks: 60
    keep_in_chain_blocks: 10
    mempool_request_interval_seconds: 10
    rpc_timeout_seconds: 10
  indexers:
    ...

keep_operations_seconds

How long to store operations that did not get into the chain. After that period, such operations will be wiped from the database. Default value is 172800 seconds (2 days).

expired_after_blocks

When level(head) - level(operation.branch) >= expired_after_blocks and the operation is still not included in the chain, it's marked as expired. Default value is 60 blocks (~1 hour).

keep_in_chain_blocks

Since the main purpose of this plugin is to index mempool operations (it's effectively a rolling index), all operations that were included in the chain are removed from the database after a specified period of time. Default value is 10 blocks (~10 minutes).

mempool_request_interval_seconds

How often Tezos nodes should be polled for pending mempool operations. Default value is 10 seconds.

rpc_timeout_seconds

Tezos node request timeout. Default value is 10 seconds.

Indexers

You can index several networks at once, or index different nodes independently. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:

 mempool:
   settings:
     ...
   indexers:
     mainnet:
       filters:
         kinds:
           - transaction
         accounts:
           - contract_alias
       datasources:
         tzkt: tzkt_mainnet
         rpc: 
           - node_mainnet
     edonet:
     florencenet: 

Each indexer object has two keys: filters and datasources (required).

Filters

An optional section specifying which mempool operations should be indexed. By default all transactions will be indexed.

kinds

Array of operation kinds; the default value is transaction (single item).
The complete list of allowed values:

  • activate_account
  • ballot
  • delegation*
  • double_baking_evidence
  • double_endorsement_evidence
  • endorsement
  • origination*
  • proposal
  • reveal*
  • seed_nonce_revelation
  • transaction*

* — manager operations.

accounts

Array of contract aliases used to filter operations by source or destination.
NOTE: applied to manager operations only.

Datasources

Mempool plugin is tightly coupled with TzKT and Tezos node providers.

tzkt

An alias pointing to a datasource of kind tzkt is expected.

rpc

An array of aliases pointing to datasources of kind tezos-node.
Polling multiple nodes allows detecting more refused operations and makes indexing more robust in general.

metadata

This is an optional section used by the metadata indexer plugin. It uses contract and datasource aliases as well as the database connection.

Metadata configuration has two required sections: settings and indexers.

Settings

metadata:
  settings:
    ipfs_gateways:
      - https://cloudflare-ipfs.com
    ipfs_timeout: 10
    http_timeout: 10
    max_retry_count_on_error: 3
    contract_service_workers: 15
    token_service_workers: 75
  indexers:
    ...

ipfs_gateways

An array of IPFS gateways. The indexer polls them sequentially until it gets a result or runs out of attempts. It is recommended to specify more than one gateway to overcome propagation issues, rate limits, and other problems.

ipfs_timeout

How long DipDup will wait for a single IPFS gateway response. Default value is 10 seconds.

http_timeout

How long DipDup will wait for a HTTP server response. Default value is 10 seconds.

max_retry_count_on_error

If DipDup fails to get a response from IPFS gateway or HTTP server, it will try again after some time, until it runs out of attempts. Default value is 3 attempts.

contract_service_workers

Number of contract service workers that resolve contract metadata. Default value is 5.

token_service_workers

Number of token service workers that resolve token metadata. Default value is 5.

Indexers

You can index several networks at once, or go with a single one. Indexer names are not standardized, but for clarity it's better to stick with some meaningful keys:

metadata:
  settings:
    ...
  indexers:
    mainnet:
      filters:
        accounts:
          - contract_alias
      datasources:
        tzkt: tzkt_mainnet

Each indexer object has two keys: filters and datasources (required).

Filters

accounts

Array of contract aliases used to filter big map updates by the owner contract address.

Datasources

Metadata plugin is tightly coupled with TzKT provider.

tzkt

An alias pointing to a datasource of kind tzkt is expected.

dipdup

Manage and run DipDup indexers.

Documentation: https://docs.dipdup.io

Issues: https://github.com/dipdup-io/dipdup/issues

dipdup [OPTIONS] COMMAND [ARGS]...

Options

--version

Show the version and exit.

-c, --config <PATH>

A path to DipDup project config (default: dipdup.yml).

-e, --env-file <PATH>

A path to .env file containing KEY=value strings.

config

Commands to manage DipDup configuration.

dipdup config [OPTIONS] COMMAND [ARGS]...

env

Dump environment variables used in DipDup config.

If a variable is not set, its default value will be used.

dipdup config env [OPTIONS]

Options

-f, --file <file>

Output to file instead of stdout.

export

Print config after resolving all links and, optionally, templates.

WARNING: Avoid sharing the output with third parties when the --unsafe flag is set - it may contain secrets!

dipdup config export [OPTIONS]

Options

--unsafe

Resolve environment variables or use default values from config.

--full

Resolve index templates.

hasura

Commands related to Hasura integration.

dipdup hasura [OPTIONS] COMMAND [ARGS]...

configure

Configure Hasura GraphQL Engine to use with DipDup.

dipdup hasura configure [OPTIONS]

Options

--force

Proceed even if Hasura is already configured.

init

Generate project tree, callbacks and types.

This command is idempotent, meaning it won’t overwrite previously generated files unless asked explicitly.

dipdup init [OPTIONS]

Options

--overwrite-types

Regenerate existing types.

--keep-schemas

Do not remove JSONSchemas after generating types.

install

Install DipDup for the current user.

dipdup install [OPTIONS]

Options

-q, --quiet

Use default values for all prompts.

-f, --force

Force reinstall.

-r, --ref <ref>

Install DipDup from a specific git ref.

-p, --path <path>

Install DipDup from a local path.

migrate

Migrate project to the new spec version.

If you’re getting MigrationRequiredError after updating DipDup, this command will fix imports and type annotations to match the current spec_version. Review and commit changes after running it.

dipdup migrate [OPTIONS]

new

Create a new project interactively.

dipdup new [OPTIONS]

Options

-q, --quiet

Use default values for all prompts.

-f, --force

Overwrite existing files.

-r, --replay <replay>

Replay a previously saved state.

run

Run indexer.

Execution can be gracefully interrupted with Ctrl+C or SIGINT signal.

dipdup run [OPTIONS]

schema

Commands to manage database schema.

dipdup schema [OPTIONS] COMMAND [ARGS]...

approve

Continue to use existing schema after reindexing was triggered.

dipdup schema approve [OPTIONS]

export

Print SQL schema including scripts from sql/on_reindex.

This command may help you debug inconsistencies between project models and the expected SQL schema.

dipdup schema export [OPTIONS]

init

Prepare a database for running DipDup.

This command creates tables based on your models, then executes sql/on_reindex to finish preparation - the same things DipDup does when run on a clean database.

dipdup schema init [OPTIONS]

wipe

Drop all database tables, functions and views.

WARNING: This action is irreversible! All indexed data will be lost!

dipdup schema wipe [OPTIONS]

Options

--immune

Drop immune tables too.

--force

Skip confirmation prompt.

status

Show the current status of indexes in the database.

dipdup status [OPTIONS]

uninstall

Uninstall DipDup for the current user.

dipdup uninstall [OPTIONS]

Options

-q, --quiet

Use default values for all prompts.

update

Update DipDup for the current user.

dipdup update [OPTIONS]

Options

-q, --quiet

Use default values for all prompts.

-f, --force

Force reinstall.

Config file reference

Header:
  • spec_version* (14.15)
  • package* (14.12)
Inventory:
  • database (14.5)
  • contracts (14.3)
  • datasources (14.6)
  • custom (14.4)
Index definitions:
  • indexes (14.9)
  • templates (14.16)
Hook definitions:
  • hooks (14.8)
  • jobs (14.10)
Integrations:
  • hasura (14.7)
  • sentry (14.14)
  • prometheus (14.13)
Tunables:
  • advanced (14.2)
  • logging (14.11)
class DipDupConfig(spec_version, package, datasources=<factory>, database=SqliteDatabaseConfig(kind='sqlite', path=':memory:'), contracts=<factory>, indexes=<factory>, templates=<factory>, jobs=<factory>, hooks=<factory>, hasura=None, sentry=SentryConfig(dsn='', environment=None, server_name=None, release=None, user_id=None, debug=False), prometheus=None, advanced=AdvancedConfig(reindex={}, scheduler=None, postpone_jobs=False, early_realtime=False, merge_subscriptions=False, metadata_interface=False, skip_version_check=False, rollback_depth=2, crash_reporting=False, decimal_precision=None, alt_operation_matcher=False), custom=<factory>, logging=LoggingValues.default)

Main indexer config

Parameters:
class AdvancedConfig(reindex=<factory>, scheduler=None, postpone_jobs=False, early_realtime=False, merge_subscriptions=False, metadata_interface=False, skip_version_check=False, rollback_depth=2, crash_reporting=False, decimal_precision=None, alt_operation_matcher=False)

Feature flags and other advanced config.

Parameters:
  • reindex (dict[ReindexingReason, ReindexingAction]) – Mapping of reindexing reasons and actions DipDup performs

  • scheduler (dict[str, Any] | None) – apscheduler scheduler config

  • postpone_jobs (bool) – Do not start job scheduler until all indexes are in realtime state

  • early_realtime (bool) – Establish realtime connection immediately after startup

  • merge_subscriptions (bool) – Subscribe to all operations instead of exact channels

  • metadata_interface (bool) – Expose metadata interface for TzKT

  • skip_version_check (bool) – Do not check for new DipDup versions on startup

  • rollback_depth (int) – A number of levels to keep for rollback

  • crash_reporting (bool) – Enable crash reporting

  • decimal_precision (int | None) – Adjust decimal context precision.

  • alt_operation_matcher (bool) – Use different algorithm to match operations (undocumented)

class BigMapHandlerConfig(callback, contract, path)

Big map handler config

Parameters:
  • callback (str) – Callback name

  • contract (str | ContractConfig) – Contract to fetch big map from

  • path (str) – Path to big map (alphanumeric string with dots)

class BigMapIndexConfig(kind, datasource, handlers, skip_history=SkipHistory.never, first_level=0, last_level=0)

Big map index config

Parameters:
  • kind (Literal['big_map']) – always big_map

  • datasource (str | TzktDatasourceConfig) – Index datasource to fetch big maps with

  • handlers (tuple[BigMapHandlerConfig, ...]) – Mapping of big map diff handlers

  • skip_history (SkipHistory) – Fetch only current big map keys ignoring historical changes

  • first_level (int) – Level to start indexing from

  • last_level (int) – Level to stop indexing at

class CoinbaseDatasourceConfig(kind, api_key=None, secret_key=None, passphrase=None, http=None)

Coinbase datasource config

Parameters:
  • kind (Literal['coinbase']) – always ‘coinbase’

  • api_key (str | None) – API key

  • secret_key (str | None) – API secret key

  • passphrase (str | None) – API passphrase

  • http (HTTPConfig | None) – HTTP client configuration

class ContractConfig(address=None, code_hash=None, typename=None)

Contract config

Parameters:
  • address (str | None) – Contract address

  • code_hash (int | str | None) – Contract code hash or address to fetch it from

  • typename (str | None) – User-defined alias for the contract script

class EventHandlerConfig(callback, contract, tag)

Event handler config

Parameters:
  • callback (str) – Callback name

  • contract (str | ContractConfig) – Contract which emits event

  • tag (str) – Event tag

class EventIndexConfig(kind, datasource, handlers=<factory>, first_level=0, last_level=0)

Event index config

Parameters:
class HasuraConfig(url, admin_secret=None, create_source=False, source='default', select_limit=100, allow_aggregations=True, allow_inconsistent_metadata=False, camel_case=False, rest=True, http=None)

Config for the Hasura integration.

Parameters:
  • url (str) – URL of the Hasura instance.

  • admin_secret (str | None) – Admin secret of the Hasura instance.

  • create_source (bool) – Whether source should be added to Hasura if missing.

  • source (str) – Hasura source for DipDup to configure, others will be left untouched.

  • select_limit (int) – Row limit for unauthenticated queries.

  • allow_aggregations (bool) – Whether to allow aggregations in unauthenticated queries.

  • allow_inconsistent_metadata (bool) – Whether to ignore errors when applying Hasura metadata.

  • camel_case (bool) – Whether to use camelCase instead of default pascal_case for the field names (incompatible with metadata_interface flag)

  • rest (bool) – Enable REST API both for autogenerated and custom queries.

  • http (HTTPConfig | None) – HTTP connection tunables

class HeadHandlerConfig(callback)

Head block handler config

Parameters:

callback (str) – Callback name

class HeadIndexConfig(kind, datasource, handlers)

Head block index config

Parameters:
  • kind (Literal['head']) – always head

  • datasource (str | TzktDatasourceConfig) – Index datasource to receive head blocks

  • handlers (tuple[HeadHandlerConfig, ...]) – Mapping of head block handlers

class HookConfig(callback, args=<factory>, atomic=False)

Hook config

Parameters:
  • args (dict[str, str]) – Mapping of argument names and annotations (checked lazily when possible)

  • atomic (bool) – Wrap hook in a single database transaction

  • callback (str) – Callback name

class HTTPConfig(retry_count=None, retry_sleep=None, retry_multiplier=None, ratelimit_rate=None, ratelimit_period=None, ratelimit_sleep=None, connection_limit=None, connection_timeout=None, batch_size=None, replay_path=None)

Advanced configuration of HTTP client

Parameters:
  • retry_count (int | None) – Number of retries after request failed before giving up

  • retry_sleep (float | None) – Sleep time between retries

  • retry_multiplier (float | None) – Multiplier for sleep time between retries

  • ratelimit_rate (int | None) – Number of requests per period (“drops” in leaky bucket)

  • ratelimit_period (int | None) – Time period for rate limiting in seconds

  • ratelimit_sleep (float | None) – Sleep time between requests when rate limit is reached

  • connection_limit (int | None) – Number of simultaneous connections

  • connection_timeout (int | None) – Connection timeout in seconds

  • batch_size (int | None) – Number of items fetched in a single paginated request (for some APIs)

  • replay_path (str | None) – Development-only option to use cached HTTP responses instead of making real requests

class HttpDatasourceConfig(kind, url, http=None)

Generic HTTP datasource config

Parameters:
  • kind (Literal['http']) – always ‘http’

  • url (str) – URL to fetch data from

  • http (HTTPConfig | None) – HTTP client configuration

class IndexTemplateConfig(template, values, first_level=0, last_level=0)

Index template config

Parameters:
  • kind – always template

  • values (dict[str, str]) – Values to be substituted in template (<key> -> value)

  • first_level (int) – Level to start indexing from

  • last_level (int) – Level to stop indexing at

  • template (str) – Template alias in templates section

class IpfsDatasourceConfig(kind, url='https://ipfs.io/ipfs', http=None)

IPFS datasource config

Parameters:
  • kind (Literal['ipfs']) – always ‘ipfs’

  • url (str) – IPFS node URL, e.g. https://ipfs.io/ipfs/

  • http (HTTPConfig | None) – HTTP client configuration

class JobConfig(hook, args=<factory>, crontab=None, interval=None, daemon=False)

Job schedule config

Parameters:
  • hook (str | HookConfig) – Name of hook to run

  • crontab (str | None) – Schedule with crontab syntax (* * * * *)

  • interval (int | None) – Schedule with interval in seconds

  • daemon (bool) – Run hook as a daemon (never stops)

  • args (dict[str, Any]) – Arguments to pass to the hook

class LoggingValues(value)

Enum for logging field values.

default = 'default'
quiet = 'quiet'
verbose = 'verbose'
class MetadataDatasourceConfig(kind, network, url='https://metadata.dipdup.net', http=None)

DipDup Metadata datasource config

Parameters:
  • kind (Literal['metadata']) – always ‘metadata’

  • network (MetadataNetwork) – Network name, e.g. mainnet, ghostnet, etc.

  • url (str) – GraphQL API URL, e.g. https://metadata.dipdup.net

  • http (HTTPConfig | None) – HTTP client configuration

class OperationHandlerConfig(callback, pattern)

Operation handler config

Parameters:
class OperationHandlerOriginationPatternConfig(type='origination', source=None, similar_to=None, originated_contract=None, optional=False, strict=False, alias=None)

Origination handler pattern config

Parameters:
  • type (Literal['origination']) – always ‘origination’

  • source (str | ContractConfig | None) – Match operations by source contract alias

  • similar_to (str | ContractConfig | None) – Match operations which have the same code/signature (depending on strict field)

  • originated_contract (str | ContractConfig | None) – Match origination of exact contract

  • optional (bool) – Whether the operation can be missing in the operation group

  • strict (bool) – Match operations by storage only or by the whole code

  • alias (str | None) – Alias for transaction (helps to avoid duplicates)

class OperationHandlerTransactionPatternConfig(type='transaction', source=None, destination=None, entrypoint=None, optional=False, alias=None)

Operation handler pattern config

Parameters:
  • type (Literal['transaction']) – always ‘transaction’

  • source (str | ContractConfig | None) – Match operations by source contract alias

  • destination (str | ContractConfig | None) – Match operations by destination contract alias

  • entrypoint (str | None) – Match operations by contract entrypoint

  • optional (bool) – Whether the operation can be missing in the operation group

  • alias (str | None) – Alias for transaction (helps to avoid duplicates)

class OperationIndexConfig(kind, datasource, handlers, contracts=<factory>, types=(OperationType.transaction, ), first_level=0, last_level=0)

Operation index config

Parameters:
  • kind (Literal['operation']) – always operation

  • datasource (str | TzktDatasourceConfig) – Alias of index datasource in datasources section

  • handlers (tuple[OperationHandlerConfig, ...]) – List of indexer handlers

  • types (tuple[OperationType, ...]) – Types of transaction to fetch

  • contracts (list[str | ContractConfig]) – Aliases of contracts being indexed in contracts section

  • first_level (int) – Level to start indexing from

  • last_level (int) – Level to stop indexing at

class OperationType(value)

Type of blockchain operation

migration = 'migration'
origination = 'origination'
transaction = 'transaction'
class OperationUnfilteredIndexConfig(kind, datasource, callback, types=(OperationType.transaction,), first_level=0, last_level=0)

Operation index config

Parameters:
  • kind (Literal['operation_unfiltered']) – always operation_unfiltered

  • datasource (str | TzktDatasourceConfig) – Alias of index datasource in datasources section

  • callback (str) – Callback name

  • types (tuple[OperationType, ...]) – Types of transaction to fetch

  • first_level (int) – Level to start indexing from

  • last_level (int) – Level to stop indexing at

class PostgresDatabaseConfig(kind, host, user='postgres', database='postgres', port=5432, schema_name='public', password='', immune_tables=<factory>, connection_timeout=60)

Postgres database connection config

Parameters:
  • kind (Literal['postgres']) – always ‘postgres’

  • host (str) – Host

  • port (int) – Port

  • user (str) – User

  • password (str) – Password

  • database (str) – Database name

  • schema_name (str) – Schema name

  • immune_tables (set[str]) – List of tables to preserve during reindexing

  • connection_timeout (int) – Connection timeout

class PrometheusConfig(host, port=8000, update_interval=1.0)

Config for Prometheus integration.

Parameters:
  • host (str) – Host to bind to

  • port (int) – Port to bind to

  • update_interval (float) – Interval to update some metrics in seconds

class ReindexingAction(value)

Action that should be performed on reindexing

exception = 'exception'
ignore = 'ignore'
wipe = 'wipe'
class ReindexingReason(value)

Reason that caused reindexing

config_modified = 'config_modified'
manual = 'manual'
migration = 'migration'
rollback = 'rollback'
schema_modified = 'schema_modified'
class SentryConfig(dsn='', environment=None, server_name=None, release=None, user_id=None, debug=False)

Config for Sentry integration.

Parameters:
  • dsn (str) – DSN of the Sentry instance

  • environment (str | None) – Environment; if not set, guessed from docker/ci/gha/local.

  • server_name (str | None) – Server name; defaults to obfuscated hostname.

  • release (str | None) – Release version; defaults to DipDup package version.

  • user_id (str | None) – User ID; defaults to obfuscated package/environment.

  • debug (bool) – Catch warning messages, increase verbosity.

class SkipHistory(value)

Whether to skip indexing operation history and use only current state

always = 'always'
never = 'never'
once = 'once'
class SqliteDatabaseConfig(kind, path=':memory:')

SQLite connection config

Parameters:
  • kind (Literal['sqlite']) – always ‘sqlite’

  • path (str) – Path to .sqlite3 file, leave default for in-memory database (:memory:)

class TokenTransferHandlerConfig(callback, contract=None, token_id=None, from_=None, to=None)

Token transfer handler config

Parameters:
  • callback (str) – Callback name

  • contract (str | ContractConfig | None) – Filter by contract

  • token_id (int | None) – Filter by token ID

  • from_ (str | ContractConfig | None) – Filter by sender

  • to (str | ContractConfig | None) – Filter by recipient

class TokenTransferIndexConfig(kind, datasource, handlers=<factory>, first_level=0, last_level=0)

Token transfer index config

Parameters:
  • kind (Literal['token_transfer']) – always token_transfer

  • datasource (str | TzktDatasourceConfig) – Index datasource to use

  • handlers (tuple[TokenTransferHandlerConfig, ...]) – Mapping of token transfer handlers

  • first_level (int) – Level to start indexing from

  • last_level (int) – Level to stop indexing at

class TzktDatasourceConfig(kind, url='https://api.tzkt.io', http=None, buffer_size=0)

TzKT datasource config

Parameters:
  • kind (Literal['tzkt']) – always ‘tzkt’

  • url (str) – Base API URL, e.g. https://api.tzkt.io/

  • http (HTTPConfig | None) – HTTP client configuration

  • buffer_size (int) – Number of levels to keep in FIFO buffer before processing

class UnknownEventHandlerConfig(callback, contract)

Unknown event handler config

Parameters:
  • callback (str) – Callback name

  • contract (str | ContractConfig) – Contract which emits event

advanced

advanced:
  early_realtime: False
  merge_subscriptions: False
  postpone_jobs: False
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

This config section allows users to tune some system-wide options, either experimental or unsuitable for generic configurations.

  • reindex – Mapping of reindexing reasons and actions DipDup performs
  • scheduler – apscheduler scheduler config
  • postpone_jobs – Do not start job scheduler until all indexes are in realtime state
  • early_realtime – Establish realtime connection immediately after startup
  • merge_subscriptions – Subscribe to all operations instead of exact channels
  • metadata_interface – Expose metadata interface for TzKT

CLI flags take priority over the corresponding AdvancedConfig fields.

contracts

A list of the contracts you can use in the index definitions. Each contract entry has two fields:

  • address — either originated or implicit account address encoded in base58.
  • typename — an alias for the particular contract script, meaning that two contracts sharing the same code can have the same type name.
contracts:
  kusd_dex_mainnet:
    address: KT1CiSKXR68qYSxnbzjwvfeMCRburaSDonT2
    typename: quipu_fa12
  tzbtc_dex_mainnet:
    address: KT1N1wwNPqT5jGhM91GQ2ae5uY8UzFaXHMJS
    typename: quipu_fa12
  kusd_token_mainnet:
    address: KT1K9gCRgaLRFKTErYt1wVxA3Frb9FjasjTV
    typename: kusd_token
  tzbtc_token_mainnet:
    address: KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn
    typename: tzbtc_token

If the typename field is not set, a contract alias will be used instead.

Contract entry does not contain information about the network, so it's a good idea to include the network name in the alias. This design choice makes generic index parameterization via templates possible. See 2.7. Templates and variables for details.

If multiple contracts you index have the same interface but different code, see 7. F.A.Q. to learn how to avoid conflicts.
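
Contracts can also be registered at runtime from any callback via ctx.add_contract. Below is a minimal sketch; the handler name is illustrative, the keyword arguments mirror the config fields above, and the exact method signature should be treated as an assumption rather than a verbatim API listing.

from dipdup.context import HandlerContext


async def on_some_origination(ctx: HandlerContext) -> None:
    # Register a newly discovered contract under a network-qualified alias
    # (same fields as in the `contracts` config section above).
    await ctx.add_contract(
        name='tzbtc_token_mainnet',
        address='KT1PWx2mnDueood7fEmfbBDKx1D9BAnnXitn',
        typename='tzbtc_token',
    )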

custom

An arbitrary YAML object you can use to store internal indexer configuration.

package: my_indexer
...
custom:
  foo: bar

Access or modify it from any callback:

ctx.config.custom['foo'] = 'buzz'
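
For example, a default hook such as on_restart can read these values with regular dictionary access. A minimal sketch; the foo key and the log message are illustrative:

from dipdup.context import HookContext


async def on_restart(ctx: HookContext) -> None:
    # `custom` is a plain dict, so regular dict methods work.
    foo = ctx.config.custom.get('foo', 'bar')
    ctx.logger.info('custom.foo is set to %s', foo)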

database

DipDup supports several database engines for development and production. The obligatory field kind specifies which engine has to be used:

  • sqlite
  • postgres (and compatible engines)

6.1. Database engines article may help you choose a database that better suits your needs.

SQLite

The path field must be either a path to the .sqlite3 file or :memory: to keep the database in memory only (default):

database:
  kind: sqlite
  path: db.sqlite3

  • kind – always 'sqlite'
  • path – Path to .sqlite3 file, leave default for in-memory database

PostgreSQL

Requires host, port, user, password, and database fields. You can set schema_name to values other than public, but Hasura integration won't be available.

database:
  kind: postgres
  host: db
  port: 5432
  user: dipdup
  password: ${POSTGRES_PASSWORD:-changeme}
  database: dipdup
  schema_name: public

  • kind – always 'postgres'
  • host – Host
  • port – Port
  • user – User
  • password – Password
  • database – Database name
  • schema_name – Schema name
  • immune_tables – List of tables to preserve during reindexing
  • connection_timeout – Connection timeout in seconds

You can also use compose-style environment variable substitutions with default values for secrets and other fields. See 2.7. Templates and variables.

Immune tables

You might want to keep several tables during a schema wipe if the data in them does not depend on index states yet is heavy to reindex. A typical example is indexing IPFS data: changes in your code won't affect off-chain storage, so you can safely reuse this data.

database:
  immune_tables:
    - ipfs_assets

immune_tables is an optional array of table names that will be ignored during schema wipe. Once an immune table is created, DipDup will never touch it again; to change the schema of an immune table, you need to perform a migration manually. Check schema export output before doing this to ensure the resulting schema is the same as Tortoise ORM would generate.

datasources

A list of API endpoints DipDup uses to retrieve indexing data to process.

A datasource config entry is an alias for the endpoint URI; it contains no mention of the network. Thus, it's a good idea to add the network name to the datasource alias, e.g. tzkt_mainnet.

tzkt

datasources:
  tzkt:
    kind: tzkt
    url: ${TZKT_URL:-https://api.tzkt.io}
    http:
      retry_count:  # retry infinitely
      retry_sleep:
      retry_multiplier:
      ratelimit_rate:
      ratelimit_period:
      connection_limit: 100
      connection_timeout: 60
      batch_size: 10000
    buffer_size: 0

coinbase

datasources:
  coinbase:
    kind: coinbase

dipdup-metadata

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|ghostnet|mumbainet

ipfs

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs


hasura

This optional section is used by the DipDup executor to automatically configure the Hasura engine to track your tables.

hasura:
  url: http://hasura:8080
  admin_secret: ${HASURA_ADMIN_SECRET:-changeme}
  allow_aggregations: false
  camel_case: true
  rest: true
  select_limit: 100
  source: default

hooks

Hooks are user-defined callbacks you can execute with a job scheduler or within another callback (with ctx.fire_hook).

hooks:
  calculate_stats:
    callback: calculate_stats
    atomic: False
    args:
      major: bool
      depth: int
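
After dipdup init, this definition corresponds to a callback module in the hooks directory. A minimal sketch of hooks/calculate_stats.py, assuming the args annotations map one-to-one to Python parameters (the log message is illustrative):

from dipdup.context import HookContext


async def calculate_stats(
    ctx: HookContext,
    major: bool,
    depth: int,
) -> None:
    # The `args` mapping from the config becomes typed parameters of the callback.
    ctx.logger.info('Calculating stats: major=%s, depth=%s', major, depth)

From another callback, the same hook can be fired with something like await ctx.fire_hook('calculate_stats', major=True, depth=10).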


jobs

Add the following section to DipDup config:

jobs:
  midnight_stats:
    hook: calculate_stats
    crontab: "0 0 * * *"
    args:
      major: True
  leet_stats:
    hook: calculate_stats
    interval: 1337  # in seconds
    args:
      major: False

If you're unfamiliar with the crontab syntax, an online service crontab.guru will help you build the desired expression.

logging

You can configure the amount of logging output by modifying the logging field.

logging: default | quiet | verbose

At the moment these values are equal to setting dipdup log level to INFO, WARNING or DEBUG, but this may change in the future.
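
A rough Python equivalent of these presets, using the standard logging module rather than any DipDup-specific API (a sketch for illustration only):

import logging

# Roughly what `logging: verbose` does: raise the dipdup logger to DEBUG.
# Use WARNING for `quiet` and INFO for `default`.
logging.getLogger('dipdup').setLevel(logging.DEBUG)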

package

DipDup uses this field to discover the Python package of your project.

package: my_indexer_name

DipDup will search for a package named my_indexer_name in PYTHONPATH.

This field helps to decouple DipDup configuration file from the indexer implementation and gives more flexibility in managing the source code.

See 2.4. Project structure for details.

prometheus

prometheus:
  host: 0.0.0.0

Prometheus integration options

  • host – Host to bind to
  • port – Port to bind to
  • update_interval – Interval to update some metrics in seconds

sentry

sentry:
  dsn: https://...
  environment: dev
  debug: False

  • dsn – DSN of the Sentry instance
  • environment – Environment to report to Sentry (informational only)
  • debug – Catch warning messages and more context

spec_version

The DipDup specification version defines the format of the configuration file and available features.

spec_version: 1.2

The following list shows which DipDup releases support which spec_version values:

  • 0.1 – >=0.0.1, <=0.4.3
  • 1.0 – >=1.0.0, <=1.1.2
  • 1.1 – >=2.0.0, <=2.0.9
  • 1.2 – >=3.0.0

If you're getting MigrationRequiredError after updating the framework, run the dipdup migrate command to perform project migration.

The spec_version value hasn't changed for a very long time. If your configuration file contains a different value, consider recreating the package from scratch and migrating the logic manually.

templates

indexes:
  foo:
    kind: template
    template: bar
    first_level: 12341234
    values:
      network: mainnet

templates:
  bar:
    kind: index
    datasource: tzkt_<network>  # resolves into `tzkt_mainnet`
    ...

  • kind – always template
  • template – Name of the index template in the templates section
  • values – Values to be substituted in the template (<key> -> value)
  • first_level – Level to start indexing from
  • last_level – Level to stop indexing at (DipDup will terminate at this level)

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

6.5.16 - 2024-03-05

This is the last release in the 6.5 branch. Please update to 7.x to get the latest features and bug fixes.

Fixed

  • tzkt: Don't use deprecated /events WebSockets endpoint.

Other

  • deps: Updated pytezos to 3.11.3.
  • metadata: Added oxfordnet to supported networks.

6.5.15 - 2023-12-01

Other

  • deps: Updated pytezos to 3.10.3.

6.5.14 - 2023-10-20

Fixed

  • token_transfer: Fixed filtering transfers by token_id.

6.5.13 - 2023-10-10

Fixed

  • tzkt: Fixed regression in get_transactions method pagination.

6.5.12 - 2023-09-15

Fixed

  • tzkt: Fixed issue with processing rollbacks while sync is in progress.
  • tzkt: Fixed operation matching when contract code hash specified as a string.
  • tzkt: Fixed parsing contract event data.

6.5.11 - 2023-09-02

Fixed

  • index: Fixed crash when parsing typed transactions with empty parameter.
  • tzkt: Fixed pagination when requesting transactions.
  • tzkt: Use cursor iteration where possible.

6.5.10 - 2023-08-02

Fixed

  • index: Remove Python limitation on large int<->str conversions.

6.5.9 - 2023-07-11

Fixed

  • tzkt: Optimized queries for operation_unfiltered index.

6.5.8 - 2023-06-28

Fixed

  • cli: Fixed init crash when package name is equal to one of the project typenames.

6.5.7 - 2023-05-30

Added

  • config: Added advanced.decimal_precision option to adjust decimal context precision.

Fixed

  • database: Fixed OperationalError raised in some cases after calling bulk_create.
  • database: Allow running project scripts and queries on SQLite.
  • database: Don't cleanup model updates on every loop.
  • http: Mark asyncio.TimeoutError exception as safe to retry.

Other

  • http: Deserialize JSON responses with orjson.

6.5.6 - 2023-05-02

Fixed

  • config: Fixed crash due to incorrect parsing of event index definitions.
  • http: Fixed waiting for response indefinitely when IPFS hash is not available.

Other

  • ci: Slim Docker image updated to Alpine 3.17.
  • metadata: Added nairobinet to supported networks.

6.5.5 - 2023-04-17

Fixed

  • config: Enable early realtime mode when config contains bigmap indexes with skip_history.
  • http: Fixed crash when using custom datasources.
  • index: Allow mixing source and entrypoint filters in operation index pattern.

Other

  • ci: Default git branch switched to next.

6.5.4 - 2023-03-31

Fixed

  • config: Fixed incorrect parsing of token_transfer index filters.

Other

  • deps: Updated pytezos to 3.9.0.

6.5.3 - 2023-03-28

Fixed

  • cli: Don't enforce logging DeprecationWarning warnings.
  • cli: Fixed BrokenPipeError messages when interrupting DipDup with SIGINT.
  • config: Fixed crash when token_transfer index has from or to filter.

6.5.2 - 2023-03-09

Fixed

  • codegen: Fixed type generation for contracts with "default" entrypoint.
  • metadata: Add "mumbainet" to available networks.
  • sentry: Fixed bug leading to crash reports not being sent in some cases.
  • sentry: Fixed crash report grouping.

Deprecated

  • ci: -slim images will be based on Ubuntu instead of Alpine in the next major release.

6.5.1 - 2023-02-21

Fixed

  • codegen: Fixed bug leading to incorrect imports in generated callbacks in some cases.
  • codegen: Fixed validation of created package after dipdup init.
  • config: Allow using empty string as default env ({DEFAULT_EMPTY:-}).

Other

  • deps: Updated pydantic to 1.10.5
  • deps: Updated datamodel-code-generator to 0.17.1
  • deps: Updated tortoise-orm to 0.19.3
  • deps: Updated pytezos to 3.8.0

6.5.0 - 2023-01-28

Added

  • hasura: Apply arbitrary metadata from hasura project directory.
  • config: Added allow_inconsistent_metadata option to hasura section.

Fixed

  • config: Do not include coinbase datasource credentials in config repr.
  • database: Fixed crash when schema generation should fail with schema_modified.
  • hasura: Stop using deprecated schema/metadata API.
  • index: Fixed unnecessary prefetching of migration originations in operation index.
  • index: Remove disabled indexes from the dispatcher queue.
  • sentry: Flush and reopen session daily.
  • tzkt: Fixed OperationData.type field value for migration originations.
  • tzkt: Added missing last_level argument to migration origination fetching methods.

Other

  • tzkt: Updated current testnet protocol (limanet).
  • deps: Updated asyncpg to 0.27.0
  • deps: Updated hasura to 2.17.0

6.4.3 - 2023-01-05

Fixed

  • context: Fixed order of add_contract method arguments.
  • index: Fixed matching operations when both address and code_hash filters are specified.
  • sentry: Fixed sending crash reports when DSN is not set implicitly.
  • sentry: Increase event length limit.

6.4.2 - 2022-12-31

Added

  • config: Added http.ratelimit_sleep option to set fixed sleep time on 429 responses.
  • context: Allow adding contracts by code hash in runtime.

Fixed

  • http: Fixed merging user-defined HTTP settings and datasource defaults.
  • tzkt: Fixed iterating over big map keys.

6.4.1 - 2022-12-22

Fixed

  • models: Fixed package model detection.

6.4.0 - 2022-12-20

Fixed

  • cli: update and uninstall commands no longer require a valid config.
  • cli: Fixed a regression in new command leading to crash with TypeError.
  • config: Fixed jobs section deserialization.
  • database: Ignore abstract models during module validation.

6.4.0rc1 - 2022-12-09

Added

  • config: Added optional code_hash field to contract config.
  • context: Added first_level and last_level arguments to ctx.add_index methods.
  • index: Filtering by code_hash is available for operation index.
  • tzkt: Added datasource methods get_contract_address and get_contract_hashes.
  • tzkt: Originations and operations now can be fetched by contract code hashes.
  • tzkt: Added sender_code_hash and target_code_hash fields to OperationData model.

Fixed

  • codegen: Unresolved index templates are now correctly processed during types generation.
  • demos: Fixed outdated demo_dao project.
  • http: Fixed a crash when datasource URL contains trailing slash.
  • metadata: Add limanet to supported networks.
  • projects: Do not scaffold an outdated poetry.lock.

Changed

  • demos: Demos were renamed to better indicate their purpose.
  • exceptions: FrameworkException is raised instead of plain RuntimeError when a framework error occurs.
  • exceptions: Known exceptions are inherited from FrameworkError.
  • tzkt: Some datasource methods have changed their signatures.

Deprecated

  • config: similar_to.address filter is an alias for originated_contract.code_hash and will be removed in the next major release.
  • config: DipDupError is an alias for FrameworkError and will be removed in the next major release.

6.3.1 - 2022-11-25

Fixed

  • cli: Do not apply cli hacks on module import.
  • codegen: Include PEP 561 marker in generated packages.
  • codegen: Untyped originations are now correctly handled.
  • codegen: Fixed alias config field having no effect on originations.
  • codegen: Fixed optional arguments in generated callbacks.
  • config: Suggest snake_case for package name.
  • config: Fixed crash with RuntimeError when index has no subscriptions.
  • http: Limit aiohttp sessions to specific base URL.
  • index: Do not deserialize originations matched by the source filter.
  • index: Wrap storage deserialization exceptions with InvalidDataError.
  • projects: Fixed Hasura environment in docker-compose examples.

Other

  • ci: mypy --strict is now enforced on a codebase.
  • ci: Finished migration to pytest.

6.3.0 - 2022-11-15

Added

  • context: Added execute_sql_query method to run queries from sql project directory.
  • context: execute_sql method now accepts arbitrary arguments to format SQL script (unsafe, use with caution).
  • index: New filters for token_transfer index.

Fixed

  • cli: Fixed missing log messages from ctx.logger.
  • codegen: Better PEP 8 compatibility of generated callbacks.
  • context: Fixed SQL scripts executed in the wrong order.
  • context: Fixed execute_sql method crashes when the path is not a directory.
  • database: Fixed crash with CannotConnectNowError before establishing the database connection.
  • database: Fixed crash when using F expressions inside versioned transactions.
  • http: Fixed caching datasource responses when replay_path contains tilde.
  • http: Adjusted per-datasource default config values.
  • project: Use the latest stable version instead of hardcoded values.
  • tzkt: Fixed deserializing of EventData and OperationData models.
  • tzkt: Fixed matching migration originations by address.

Deprecated

  • ci: pytezos extra and corresponding Docker image are deprecated.

6.2.0 - 2022-10-12

Added

  • cli: new command to create a new project interactively.
  • cli: install/update/uninstall commands to manage local DipDup installation.
  • index: New index kind event to process contract events.
  • install: New interactive installer based on pipx (install.py or dipdup-install).

Fixed

  • cli: Fixed commands that don't require a valid config yet crash with ConfigurationError.
  • codegen: Fail on demand when datamodel-codegen is not available.
  • codegen: Fixed Jinja2 template caching.
  • config: Allow sentry.dsn field to be empty.
  • config: Fixed greedy environment variable regex.
  • hooks: Raise a FeatureAvailabilityHook instead of a warning when trying to execute hooks on SQLite.

Improved

  • cli: Detect src/ layout when guessing package path.
  • codegen: Improved cross-platform compatibility.
  • config: sentry.user_id option to set user ID for Sentry (affects release adoption data).
  • sentry: Detect environment when not set in config (docker/gha/tests/local)
  • sentry: Expose more tags under the dipdup namespace.

Performance

  • cli: Up to 5x faster startup for some commands.

Security

  • sentry: Prevent Sentry from leaking hostname if server_name is not set.
  • sentry: Notify about using Sentry when DSN is set or crash reporting is enabled.

Other

  • ci: A significantly faster execution of GitHub Actions.
  • docs: Updated "Contributing Guide" page.

6.1.3 - 2022-09-21

Added

  • sentry: Enable crash-free session reporting.

Fixed

  • metadata: Updated protocol aliases.
  • sentry: Unwrap CallbackError traceback to fix event grouping.
  • sentry: Hide "attempting to send..." message on shutdown.

Other

  • ci: Do not build default and -pytezos nightly images.

6.1.2 - 2022-09-16

Added

  • config: Added alias field to operation pattern items.
  • tzkt: Added quote field gbp.

Fixed

  • config: Require aliases for multiple operations with the same entrypoint.
  • http: Raise InvalidRequestError on 204 No Content responses.
  • tzkt: Verify API version on datasource initialization.
  • tzkt: Remove deprecated block field priority.

6.1.1 - 2022-09-01

Fixed

  • ci: Lock Pydantic to 1.9.2 to avoid breaking changes in dataclasses.

6.1.0 - 2022-08-30

Added

  • ci: Build arm64 images for M1/M2 silicon.
  • ci: Build -slim images based on Alpine Linux.
  • ci: Introduced official MacOS support.
  • ci: Introduced interactive installer (dipdup.io/install.py).

6.0.1 - 2022-08-19

Fixed

  • codegen: Fixed invalid models.py template.
  • context: Do not wrap known exceptions with CallbackError.
  • database: Raise DatabaseConfigurationError when backward relation name equals table name.
  • database: Wrap schema wiping in a transaction to avoid orphaned tables in the immune schema.
  • hasura: Fixed processing M2M relations.
  • sentry: Fixed "invalid value environment" error.
  • sentry: Ignore events from project callbacks when crash_reporting is enabled.

6.0.0 - 2022-08-08

This release contains no changes except for the version number.

6.0.0rc2 - 2022-08-06

Added

  • config: Added advanced.crash_reporting flag to enable reporting crashes to Baking Bad.
  • dipdup: Save Sentry crashdump in /tmp/dipdup/crashdumps/XXXXXXX.json on a crash.

Fixed

  • config: Do not perform env variable substitution in commented-out lines.

Removed

  • cli: --logging-config option is removed.
  • cli: All run command flags are removed. Use the advanced section of the config.
  • cli: cache show and cache clear commands are removed.
  • config: http.cache flag is removed.

6.0.0-rc1 - 2022-07-26

Added

  • cli: Added config export --full flag to resolve templates before printing config.
  • config: Added advanced.rollback_depth field, a number of levels to keep in a database for rollback.
  • context: Added rollback method to perform database rollback.
  • database: Added an internal ModelUpdate model to store the latest database changes.

Fixed

  • prometheus: Fixed updating dipdup_index_handlers_matched_total metric.

Changed

  • codegen: on_index_rollback hook calls ctx.rollback by default.
  • database: Project models must be subclassed from dipdup.models.Model
  • database: bulk_create and bulk_update model methods are no longer supported.

Removed

  • hooks: Removed deprecated on_rollback hook.
  • index: Do not try to avoid single-level rollbacks by comparing operation hashes.

5.2.5 - 2022-07-26

Fixed

  • index: Fixed crash when adding an index with new subscriptions in runtime.

5.2.4 - 2022-07-17

Fixed

  • cli: Fixed logs being printed to stderr instead of stdout.
  • config: Fixed job scheduler not starting when config contains no indexes.

5.2.3 - 2022-07-07

Added

  • sentry: Allow customizing server_name and release tags with corresponding fields in Sentry config.

Fixed

  • cli: Fixed hasura configure command crash when models have empty Meta.table.
  • config: Removed secrets from config __repr__.

5.2.2 - 2022-07-03

Fixed

  • hasura: Fixed metadata generation.

5.2.1 - 2022-07-02

Fixed

  • cli: Fixed setting default logging level.
  • hasura: Fixed metadata generation for relations with a custom field name.
  • hasura: Fixed configuring existing instances after changing camel_case field in config.

5.2.0 - 2022-06-28

Added

  • config: Added logging config field.
  • config: Added hasura.create_source flag to create PostgreSQL source if missing.

Fixed

  • hasura: Do not apply table customizations to tables from other sources.

Deprecated

  • cli: --logging-config option is deprecated.
  • cli: All run command flags are deprecated. Use the advanced section of the config.
  • cli: cache show and cache clear commands are deprecated.
  • config: http.cache flag is deprecated.

5.1.7 - 2022-06-15

Fixed

  • index: Fixed token_transfer index not receiving realtime updates.

5.1.6 - 2022-06-08

Fixed

  • cli: Commands with --help option no longer require a working DipDup config.
  • index: Fixed crash with RuntimeError after continuous realtime connection loss.

Performance

  • cli: Lazy import dependencies to speed up startup.

Other

  • docs: Migrate docs from GitBook to mdbook.

5.1.5 - 2022-06-05

Fixed

  • config: Fixed crash when rollback hook is about to be called.

5.1.4 - 2022-06-02

Fixed

  • config: Fixed OperationIndexConfig.types field being partially ignored.
  • index: Allow mixing oneshot and regular indexes in a single config.
  • index: Call rollback hook instead of triggering reindex when single-level rollback has failed.
  • index: Fixed crash with RuntimeError after continuous realtime connection loss.
  • tzkt: Fixed origination subscription missing when merge_subscriptions flag is set.

Performance

  • ci: Decrease the size of generic and -pytezos Docker images by 11% and 16%, respectively.

5.1.3 - 2022-05-26

Fixed

  • database: Fixed special characters in password not being URL encoded.

Performance

  • context: Do not reinitialize config when adding a single index.

5.1.2 - 2022-05-24

Added

  • tzkt: Added originated_contract_tzips field to OperationData.

Fixed

  • jobs: Fixed jobs with daemon schedule never starting.
  • jobs: Fixed failed jobs not throwing exceptions into the main loop.

Other

  • database: Tortoise ORM updated to 0.19.1.

5.1.1 - 2022-05-13

Fixed

  • index: Ignore indexes with different message types on rollback.
  • metadata: Add ithacanet to available networks.

5.1.0 - 2022-05-12

Added

  • ci: Push X and X.Y tags to the Docker Hub on release.
  • cli: Added config env command to export env-file with default values.
  • cli: Show warning when running an outdated version of DipDup.
  • hooks: Added a new hook on_index_rollback to perform per-index rollbacks.

Fixed

  • index: Fixed fetching migration operations.
  • tzkt: Fixed possible data corruption when using the buffer_size option.
  • tzkt: Fixed reconnection due to websockets message size limit.

Deprecated

  • hooks: The on_rollback default hook is superseded by on_index_rollback and will be removed later.

5.0.4 - 2022-05-05

Fixed

  • exceptions: Fixed incorrect formatting and broken links in help messages.
  • index: Fixed crash when the only index in config is head.
  • index: Fixed fetching originations during the initial sync.

5.0.3 - 2022-05-04

Fixed

  • index: Fixed crash when no block with the same level arrived after a single-level rollback.
  • index: Fixed setting initial index level when IndexConfig.first_level is set.
  • tzkt: Fixed delayed emitting of buffered realtime messages.
  • tzkt: Fixed inconsistent behavior of first_level/last_level arguments in different getter methods.

5.0.2 - 2022-04-21

Fixed

  • context: Fixed reporting incorrect reindexing reason.
  • exceptions: Fixed crash with FrozenInstanceError when an exception is raised from a callback.
  • jobs: Fixed graceful shutdown of daemon jobs.

Improved

  • codegen: Refined on_rollback hook template.
  • exceptions: Updated help messages for known exceptions.
  • tzkt: Do not request reindexing if missing subgroups have matched no handlers.

5.0.1 - 2022-04-12

Fixed

  • cli: Fixed schema init command crash with SQLite databases.
  • index: Fixed spawning datasources in oneshot mode.
  • tzkt: Fixed processing realtime messages.

5.0.0 - 2022-04-08

This release contains no changes except for the version number.

5.0.0-rc4 - 2022-04-04

Added

  • tzkt: Added ability to process realtime messages with lag.

4.2.7 - 2022-04-02

Fixed

  • config: Fixed jobs config section validation.
  • hasura: Fixed metadata generation for v2.3.0 and above.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods response.

5.0.0-rc3 - 2022-03-28

Added

  • config: Added custom section to store arbitrary user data.

Fixed

  • config: Fixed default SQLite path (:memory:).
  • tzkt: Fixed pagination in several getter methods.
  • tzkt: Fixed data loss when skip_history option is enabled.

Removed

  • config: Removed dummy advanced.oneshot flag.
  • cli: Removed docker init command.
  • cli: Removed dummy schema approve --hashes flag.

5.0.0-rc2 - 2022-03-13

Fixed

  • tzkt: Fixed crash in methods that do not support cursor pagination.
  • prometheus: Fixed invalid metric labels.

5.0.0-rc1 - 2022-03-02

Added

  • metadata: Added metadata_interface feature flag to expose metadata in TzKT format.
  • prometheus: Added ability to expose Prometheus metrics.
  • tzkt: Added missing fields to the HeadBlockData model.
  • tzkt: Added iter_... methods to iterate over item batches.

Fixed

  • tzkt: Fixed possible OOM while calling methods that support pagination.
  • tzkt: Fixed possible data loss in get_originations and get_quotes methods.

Changed

  • tzkt: Added offset and limit arguments to all methods that support pagination.

Removed

  • bcd: Removed bcd datasource and config section.

Performance

  • dipdup: Use fast orjson library instead of built-in json where possible.

4.2.6 - 2022-02-25

Fixed

  • database: Fixed generating table names from uppercase model names.
  • http: Fixed bug that leads to caching invalid responses on the disk.
  • tzkt: Fixed processing realtime messages with data from multiple levels.

4.2.5 - 2022-02-21

Fixed

  • database: Do not add the schema argument to the PostgreSQL connection string when not needed.
  • hasura: Wait for Hasura to be configured before starting indexing.

4.2.4 - 2022-02-14

Added

  • config: Added http datasource to making arbitrary http requests.

Fixed

  • context: Fixed crash when calling fire_hook method.
  • context: Fixed HookConfig.atomic flag, which was ignored in fire_hook method.
  • database: Create missing tables even if Schema model is present.
  • database: Fixed excess increasing of decimal context precision.
  • index: Fixed loading handler callbacks from nested packages (@veqtor).

Other

  • ci: Added GitHub Action to build and publish Docker images for each PR opened.

4.2.3 - 2022-02-08

Fixed

  • ci: Removed black 21.12b0 dependency since the bug in datamodel-code-generator is fixed.
  • cli: Fixed config export command crash when advanced.reindex dictionary is present.
  • cli: Removed optionals from config export output so the result can be loaded again.
  • config: Verify advanced.scheduler config for the correctness and unsupported features.
  • context: Fixed ignored wait argument of fire_hook method.
  • hasura: Fixed processing relation fields with missing related_name.
  • jobs: Fixed default apscheduler config.
  • tzkt: Fixed crash occurring when reorg message is the first one received by the datasource.

4.2.2 - 2022-02-01

Fixed

  • config: Fixed ipfs datasource config.

4.2.1 - 2022-01-31

Fixed

  • ci: Added black 21.12b0 dependency to avoid a possible conflict with datamodel-code-generator.

4.2.0 - 2022-01-31

Added

  • context: Added wait argument to fire_hook method to escape current transaction context.
  • context: Added ctx.get_<kind>_datasource helpers to avoid type casting.
  • hooks: Added ability to configure apscheduler with AdvancedConfig.scheduler field.
  • http: Added request method to send arbitrary requests (affects all datasources).
  • ipfs: Added ipfs datasource to download JSON and binary data from IPFS.

Fixed

  • http: Removed dangerous method close_session.
  • context: Fixed help message of IndexAlreadyExistsError exception.

Deprecated

  • bcd: Added deprecation notice.

Other

  • dipdup: Removed unused internal methods.

4.1.2 - 2022-01-27

Added

  • cli: Added schema wipe --force argument to skip confirmation prompt.

Fixed

  • cli: Show warning about deprecated --hashes argument
  • cli: Ignore SIGINT signal when shutdown is in progress.
  • sentry: Ignore exceptions when shutdown is in progress.

4.1.1 - 2022-01-25

Fixed

  • cli: Fixed stacktraces missing on exception.
  • cli: Fixed wrapping OSError with ConfigurationError during config loading.
  • hasura: Fixed printing help messages on HasuraError.
  • hasura: Preserve a list of sources in Hasura Cloud environments.
  • hasura: Fixed HasuraConfig.source config option.

Changed

  • cli: Unknown exceptions are no longer wrapped with DipDupError.

Performance

  • hasura: Removed some useless requests.

4.1.0 - 2022-01-24

Added

  • cli: Added schema init command to initialize database schema.
  • cli: Added --force flag to hasura configure command.
  • codegen: Added support for subpackages inside callback directories.
  • hasura: Added dipdup_head_status view and REST endpoint.
  • index: Added an ability to skip historical data while synchronizing big_map indexes.
  • metadata: Added metadata datasource.
  • tzkt: Added get_big_map and get_contract_big_maps datasource methods.

4.0.5 - 2022-01-20

Fixed

  • index: Fixed deserializing manually modified typeclasses.

4.0.4 - 2022-01-17

Added

  • cli: Added --keep-schemas flag to init command to preserve JSONSchemas along with generated types.

Fixed

  • demos: Tezos Domains and Homebase DAO demos were updated from edo2net to mainnet contracts.
  • hasura: Fixed missing relations for models with ManyToManyField fields.
  • tzkt: Fixed parsing storage with nested structures.

Performance

  • dipdup: Minor overall performance improvements.

Other

  • ci: Cache virtual environment in GitHub Actions.
  • ci: Detect CI environment and skip tests that fail in GitHub Actions.
  • ci: Execute tests in parallel with pytest-xdist when possible.
  • ci: More strict linting rules of flake8.

4.0.3 - 2022-01-09

Fixed

  • tzkt: Fixed parsing parameter with an optional value.

4.0.2 - 2022-01-06

Added

  • tzkt: Added optional delegate_address and delegate_alias fields to OperationData.

Fixed

  • tzkt: Fixed crash due to unprocessed pysignalr exception.
  • tzkt: Fixed parsing OperationData.amount field.
  • tzkt: Fixed parsing storage with top-level boolean fields.

4.0.1 - 2021-12-30

Fixed

  • codegen: Fixed generating storage typeclasses with Union fields.
  • codegen: Fixed preprocessing contract JSONSchema.
  • index: Fixed processing reindexing reason saved in the database.
  • tzkt: Fixed processing operations with default entrypoint and empty parameter.
  • tzkt: Fixed crash while recursively applying bigmap diffs to the storage.

Performance

  • tzkt: Increased speed of applying bigmap diffs to operation storage.

4.0.0 - 2021-12-24

This release contains no changes except for the version number.

4.0.0-rc3 - 2021-12-20

Fixed

  • cli: Fixed missing schema approve --hashes argument.
  • codegen: Fixed contract address used instead of an alias when typename is not set.
  • tzkt: Fixed processing operations with entrypoint default.
  • tzkt: Fixed regression in processing migration originations.
  • tzkt: Fixed filtering of big map diffs by the path.

Removed

  • cli: Removed deprecated run --oneshot argument and clear-cache command.

4.0.0-rc2 - 2021-12-11

Migration

  • Run dipdup init command to generate on_synchronized hook stubs.

Added

  • hooks: Added on_synchronized hook, which fires each time all indexes reach realtime state.

Fixed

  • cli: Fixed config not being verified when invoking some commands.
  • codegen: Fixed generating callback arguments for untyped operations.
  • index: Fixed incorrect log messages, remove duplicate ones.
  • index: Fixed crash while processing storage of some contracts.
  • index: Fixed matching of untyped operations filtered by source field (@pravin-d).

Performance

  • index: Checks performed on each iteration of the main DipDup loop are slightly faster now.

4.0.0-rc1 - 2021-12-02

Migration

  • Run dipdup schema approve command on every database you want to use with 4.0.0-rc1. Running dipdup migrate is not necessary since spec_version hasn't changed in this release.

Added

  • cli: Added run --early-realtime flag to establish a realtime connection before all indexes are synchronized.
  • cli: Added run --merge-subscriptions flag to subscribe to all operations/big map diffs during realtime indexing.
  • cli: Added status command to print the current status of indexes from the database.
  • cli: Added config export [--unsafe] command to print config after resolving all links and variables.
  • cli: Added cache show command to get information about file caches used by DipDup.
  • config: Added first_level and last_level optional fields to TemplateIndexConfig. These limits are applied after ones from the template itself.
  • config: Added daemon boolean field to JobConfig to run a single callback indefinitely. Conflicts with crontab and interval fields.
  • config: Added advanced top-level section.

Fixed

  • cli: Fixed crashes and output inconsistency when piping DipDup commands.
  • cli: Fixed schema wipe --immune flag being ignored.
  • codegen: Fixed missing imports in handlers generated during init.
  • coinbase: Fixed possible data inconsistency caused by caching enabled for method get_candles.
  • http: Fixed increasing sleep time between failed request attempts.
  • index: Fixed invocation of head index callback.
  • index: Fixed CallbackError raised instead of ReindexingRequiredError in some cases.
  • tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
  • tzkt: Fixed sending useless subscription requests when adding indexes in runtime.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods whose output was limited to HTTPConfig.batch_size field.
  • tzkt: Fixed lots of SignalR bugs by replacing aiosignalrcore library with pysignalr.

Changed

  • cli: dipdup schema wipe command now requires confirmation when invoked in the interactive shell.
  • cli: dipdup schema approve command now also causes a recalculation of schema and index config hashes.
  • index: DipDup will recalculate respective hashes if reindexing is triggered with config_modified: ignore or schema_modified: ignore in advanced config.

Deprecated

  • cli: run --oneshot option is deprecated and will be removed in the next major release. The oneshot mode applies automatically when last_level field is set in the index config.
  • cli: clear-cache command is deprecated and will be removed in the next major release. Use cache clear command instead.

Performance

  • config: Configuration files are loaded 10x faster.
  • index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
  • tzkt: Rate limit was increased. Try to set connection_timeout to a higher value if requests fail with ConnectionTimeout exception.
  • tzkt: Improved performance of response deserialization.

3.1.3 - 2021-11-15

Fixed

  • codegen: Fixed missing imports in operation handlers.
  • codegen: Fixed invalid imports and arguments in big_map handlers.

3.1.2 - 2021-11-02

Fixed

  • Fixed a crash that occurred during synchronization of big map indexes.

3.1.1 - 2021-10-18

Fixed

  • Fixed loss of realtime subscriptions that occurred after a TzKT API outage.
  • Fixed updating the schema hash in the schema approve command.
  • Fixed a possible crash occurring while Hasura is not ready.

3.1.0 - 2021-10-12

Added

  • New index class HeadIndex (configuration: dipdup.config.HeadIndexConfig). Use this index type to handle head (limited block header content) updates. This index type is realtime-only: historical data won't be indexed during the synchronization stage.
  • Added three new commands: schema approve, schema wipe, and schema export. Run dipdup schema --help command for details.

Changed

  • Triggering reindexing won't lead to dropping the database automatically anymore. ReindexingRequiredError is raised instead. --forbid-reindexing option has become default.
  • --reindex option is removed. Use dipdup schema wipe instead.
  • Values of dipdup_schema.reindex field updated to simplify querying database. See dipdup.enums.ReindexingReason class for possible values.

Fixed

  • Fixed ReindexRequiredError not being raised when running DipDup after reindexing was triggered.
  • Fixed index config hash calculation. Hashes of existing indexes in a database will be updated during the first run.
  • Fixed issue in BigMapIndex causing the partial loss of big map diffs.
  • Fixed printing help for CLI commands.
  • Fixed merging storage which contains specific nested structures.

Improved

  • Raise DatabaseConfigurationError exception when project models are not compatible with GraphQL.
  • Another bunch of performance optimizations. Reduced DB pressure, sped up parallel processing of many indexes.
  • Added initial set of performance benchmarks (run: ./scripts/run_benchmarks.sh)

3.0.4 - 2021-10-04

Improved

  • A significant increase in indexing speed.

Fixed

  • Fixed unexpected reindexing caused by the bug in processing zero- and single-level rollbacks.
  • Removed unnecessary file IO calls that could cause PermissionError exception in Docker environments.
  • Fixed possible violation of block-level atomicity during realtime indexing.

Changed

  • Public methods of TzktDatasource now return immutable sequences.

3.0.3 - 2021-10-01

Fixed

  • Fixed processing of single-level rollbacks emitted before rolled back head.

3.0.2 - 2021-09-30

Added

  • Human-readable CHANGELOG.md 🕺
  • Two new options added to dipdup run command:
    • --forbid-reindexing – raise ReindexingRequiredError instead of truncating the database when reindexing is triggered for any reason. To continue indexing with the existing database, run UPDATE dipdup_schema SET reindex = NULL;
    • --postpone-jobs – job scheduler won't start until all indexes are synchronized.

Changed

  • Migration to this version requires reindexing.
  • dipdup_index.head_id foreign key removed. dipdup_head table still contains the latest blocks from Websocket received by each datasource.

Fixed

  • Removed unnecessary calls to TzKT API.
  • Fixed removal of PostgreSQL extensions (timescaledb, pgcrypto) by function truncate_database triggered on reindex.
  • Fixed creation of missing project package on init.
  • Fixed invalid handler callbacks generated on init.
  • Fixed detection of existing types in the project.
  • Fixed race condition caused by event emitter concurrency.
  • Capture unknown exceptions with Sentry before wrapping to DipDupError.
  • Fixed job scheduler start delay.
  • Fixed processing of reorg messages.

3.0.1 - 2021-09-24

Added

  • Added get_quote and get_quotes methods to TzKTDatasource.

Fixed

  • Defer spawning index datasources until initial sync is complete. It helps to mitigate some WebSocket-related crashes, but initial sync is a bit slower now.
  • Fixed possible race conditions in TzKTDatasource.
  • Start the jobs scheduler after all indexes are in sync with the current head to speed up indexing.

Release notes

This section contains information about changes introduced with specific DipDup releases.

6.2.0

What's New

New interactive installer

Starting from this release, DipDup comes with an interactive installer to help you install necessary dependencies.

Run the command below in the terminal:

curl -Lsf https://dipdup.io/install_lts.py | python

Follow the instructions to complete the installation.

Now you have dipdup command available systemwide! Run it without arguments to see available commands.

You can use dipdup install/update/uninstall commands to manage the local installation.

Project scaffolding

The dipdup new command is now available to create a new project from a template. Run it and answer the questions; a new project will be created in the current directory. You can also use a replay file instead; see dipdup new --help for details.

Scaffolder screenshot

Kathmandu contract events

Kathmandu Tezos protocol upgrade has introduced contract events, a new way to interact with smart contracts. This index allows indexing events using strictly typed payloads. From the developer's perspective, it's similar to the big_map index with a few differences.

The example below is artificial since no known contracts in mainnet are currently using events; the index and datasource names are placeholders.

indexes:
  events:
    kind: event
    datasource: tzkt_mainnet
    handlers:
      - callback: on_move_event
        contract: events_contract
        tag: move
      - callback: on_roll_event
        contract: events_contract
        tag: roll
      - callback: on_other_event
        contract: events_contract

Unlike big maps, contracts may introduce new event tags and payloads at any time, so the index must be updated accordingly.

async def on_move_event(
    ctx: HandlerContext,
    event: Event[MovePayload],
) -> None: ...

Each contract can have a fallback handler called for all unknown events so you can process untyped data.

async def on_other_event(
    ctx: HandlerContext,
    event: UnknownEvent,
) -> None: ...

Changes since 5.1.3

Added

  • cli: Added new command to create a new project interactively.
  • cli: Added install/update/uninstall commands to manage the local DipDup installation.
  • index: New index kind event to process contract events.
  • install: New interactive installer based on pipx (install.py or dipdup-install).

Fixed

  • cli: Fixed commands that don't require a valid config crashing with ConfigurationError.
  • codegen: Fail on demand when datamodel-codegen is not available.
  • codegen: Fixed Jinja2 template caching.
  • config: Allow sentry.dsn field to be empty.
  • config: Fixed greedy environment variable regex.
  • hooks: Raise a FeatureAvailabilityHook instead of a warning when trying to execute hooks on SQLite.

Improved

  • cli: Detect src/ layout when guessing package path.
  • codegen: Improved cross-platform compatibility.
  • config: sentry.user_id option to set user ID for Sentry (affects release adoption data).
  • sentry: Detect environment when not set in config (docker/gha/tests/local)
  • sentry: Expose more tags under the dipdup namespace.

Performance

  • cli: Up to 5x faster startup for some commands.

Security

  • sentry: Prevent Sentry from leaking hostname if server_name is not set.
  • sentry: Notify about using Sentry when DSN is set or crash reporting is enabled.

Other

  • ci: A significantly faster execution of GitHub Actions.
  • docs: Updated "Contributing Guide" page.

6.0.0

⚠ Breaking Changes

  • Project models in models.py must be subclassed from dipdup.models.Model instead of tortoise.Model.
  • The deprecated on_rollback event hook has been removed in favor of on_index_rollback.
  • HTTP datasources no longer use caching. cache show and cache clear commands have been removed. http.cache config flag has been removed.
  • --logging-config option has been removed. Use the logging config section or set up logging manually.
  • Feature flag options of run command have been removed. Use the advanced config section instead.

Migration from 5.x

  • Replace the tortoise.Model import with dipdup.models.Model in your models.py module (see the sketch after this list).
  • Remove the on_rollback event hook if it's still present in your project. Ensure that on_index_rollback.py contains a ctx.rollback call, or delete it and run dipdup init.
  • If you have used buffer_size config option, remove it to use database-level rollbacks.
  • Run schema approve command with every schema you want to use with DipDup 6.0.
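
A minimal models.py sketch of the new base class, using a hypothetical Trader model similar to the one in the rollback examples below:

from dipdup.models import Model  # NOTE: previously `from tortoise import Model`
from tortoise import fields


class Trader(Model):
    name = fields.TextField(pk=True)
    balance = fields.DecimalField(decimal_places=6, max_digits=20, default=0)
    active = fields.BooleanField(default=True)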

What's New

Seamless database-level rollbacks

The era of handling chain reorgs manually is finally over! Now when DipDup receives a reorg message from TzKT, it rewinds the database to the previous state, reverting changes in backtracked blocks level by level. To make this possible, DipDup catches all database modifications and saves diffs in a separate table, dipdup_model_update (you don't need to access it directly).

# INSERT saved with no data, just drop this row on reorg.
trader = Trader(name='Alice', balance=100, active=True)
await trader.save()

# UPDATE saved with old values so they can be restored on reorg.
# Diff: {'balance': 100}
trader.balance = 200
await trader.save() 

# DELETE saved with full copy of data. On reorg this row will be recreated with the same PK.
# Diff: {'name': 'Alice', 'balance': 200, 'active': True}
await trader.delete()

Bulk class methods like bulk_insert and bulk_update are supported too. However, for raw queries, DipDup uses prefetching (an additional SELECT) to save the original values. So, ReallyHugeTable.filter().delete() will effectively create a full copy of the table in dipdup_model_update. Most likely you will never need to perform such queries in handlers, but keep that detail in mind.

Since the Ithacanet protocol, only the last two blocks may be backtracked. Older diffs are not needed and are removed automatically. If you need to keep more levels or disable this feature, adjust the rollback_depth config option.

advanced:
  rollback_depth: 2  # 0 to disable

on_index_rollback event hook now looks like this:

from dipdup.context import HookContext
from dipdup.index import Index


async def on_index_rollback(
    ctx: HookContext,
    index: Index,  # type: ignore[type-arg]
    from_level: int,
    to_level: int,
) -> None:
    await ctx.execute_sql('on_index_rollback')
    await ctx.rollback(
        index=index.name,
        from_level=from_level,
        to_level=to_level,
    )

TzKT buffer_size option remains available, but it's not required to handle chain reorgs anymore.

Crash dumps and automatic reporting

Now when DipDup catches an unhandled exception, a crash dump is saved to the temporary directory.

dipdup.exceptions.CallbackError: An error occured during callback execution
________________________________________________________________________________

`demo_token.hooks.on_restart` callback execution failed:

  Exception: 

Eliminate the reason of failure and restart DipDup.
________________________________________________________________________________

Crashdump saved to `/tmp/dipdup/crashdumps/veb7kz07.json`

This JSON file is the same data Sentry collects on crashes if integration is enabled. It includes a stack trace, local variables of each frame, and other information useful when investigating a crash. Attach this file when sending bug reports to GitHub Issues.

When preparing a crash dump, Sentry can detect sensitive information like database passwords and remove it from the report. So it's generally safe to share a crash dump with the developers. Now you can also send these crash reports automatically to the Baking Bad team.

Your privacy matters; crash reporting is disabled by default. Simulate a crash with a random exception and inspect the crash dump before enabling this option to ensure that the report doesn't contain secrets. Then add the following lines to your config:

advanced:
  crash_reporting: True

Changes since 5.2.5

Added

  • cli: Added config export --full flag to resolve templates before printing config.
  • config: Added advanced.crash_reporting flag to enable reporting crashes to Baking Bad.
  • config: Added advanced.rollback_depth field, a number of levels to keep in a database for rollback.
  • context: Added rollback method to perform database rollback.
  • database: Added an internal ModelUpdate model to store the latest database changes.
  • dipdup: Save Sentry crashdump in /tmp/dipdup/crashdumps/XXXXXXX.json on a crash.

Fixed

  • config: Do not perform env variable substitution in commented-out lines.
  • prometheus: Fixed updating dipdup_index_handlers_matched_total metric.

Changed

  • codegen: on_index_rollback hook calls ctx.rollback by default.
  • database: Project models must be subclassed from dipdup.models.Model

Removed

  • cli: --logging-config option is removed.
  • cli: All run command flags are removed. Use the advanced section of the config.
  • cli: cache show and cache clear commands are removed.
  • config: http.cache flag is removed.
  • hooks: Removed deprecated on_rollback hook.
  • index: Do not try to avoid single-level rollbacks by comparing operation hashes.

5.1.0

Migration from 5.0 (optional)

  • Run the init command. Now you have two conflicting hooks: on_rollback and on_index_rollback. Follow the guide below to perform the migration. A ConflictingHooksError exception will be raised until then.

What's New

Per-index rollback hook

In this release, we continue to improve the rollback-handling experience, which became much more important since the Ithaca protocol reached mainnet. Let's briefly recap how DipDup currently processes chain reorgs before calling a rollback hook:

  1. If the buffer_size option of a TzKT datasource is set to a non-zero value, and there are enough data messages buffered when a rollback occurs, data is just dropped from the buffer, and indexing continues.
  2. If all indexes in the config are operation ones, we can attempt to process a single-level rollback. All operations from the rolled back block must be present in the next one for the rollback to succeed. If some operations are missing, the on_rollback hook will be called as usual.
  3. Finally, we can safely ignore indexes with a level lower than the rollback target. The index level is updated either on synchronization or when at least one related operation or bigmap diff has been extracted from a realtime message.

If none of these tricks have worked, we can't process a rollback without custom logic. Here's where the changes begin. Before this release, every project contained the on_rollback hook, which receives a datasource: IndexDatasource argument and from/to levels. Even if your deployment had thousands of indexes and only a couple of them were affected by a rollback, there was no easy way to find out which ones.

Now the on_rollback hook is deprecated and superseded by on_index_rollback (a signature sketch follows the list below). Choose one of the following options:

  • You haven't touched the on_rollback hook since project creation. Run init command and remove hooks/on_rollback and sql/on_rollback directories in project root. Default action (reindexing) has not changed.
  • You have some custom logic in on_rollback hook and want to leave it as-is for now. You can ignore introduced changes at least till the next major release.
  • You have implemented per-datasource rollback logic and are ready to switch to the per-index one. Run init, move your code to the on_index_rollback hook and delete the on_rollback one. Note that you can access the rolled back datasource via index.datasource.
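
A minimal sketch of the new hook signature; the body below is an assumption (by default the generated stub triggers reindexing, as noted above):

from dipdup.context import HookContext
from dipdup.index import Index


async def on_index_rollback(
    ctx: HookContext,
    index: Index,
    from_level: int,
    to_level: int,
) -> None:
    # The rolled back datasource is reachable via the index
    ctx.logger.warning(
        'Index %s (datasource %s) rolled back: %s -> %s',
        index.name, index.datasource.name, from_level, to_level,
    )
    await ctx.reindex()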

Token transfer index

Sometimes implementing an operation index is overkill for a specific task. An existing alternative is to use a big_map index to process only the diffs of selected big map paths. However, you still need to have a separate index for each contract of interest, which is very resource-consuming. A widespread case is indexing FA1.2/FA2 token contracts. So, this release introduces a new token_transfer index:

indexes:
  transfers:
    kind: token_transfer
    datasource: tzkt
    handlers:
      - callback: transfers

The TokenTransferData object is passed to the handler on each operation, containing just enough information to process a token transfer.
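
A minimal handler sketch for the config above, assuming TokenTransferData is importable from dipdup.models:

from dipdup.context import HandlerContext
from dipdup.models import TokenTransferData


async def transfers(
    ctx: HandlerContext,
    token_transfer: TokenTransferData,
) -> None:
    # The payload carries the transfer details (contract, token id, amount, addresses),
    # so balances can be updated without a separate operation index
    ...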

config env command to generate env-files

Generally, it's good to separate a project config from deployment parameters, and DipDup has multiple options to achieve this. First of all, multiple configs can be chained successively, overriding top-level sections. Second, the DipDup config can contain docker-compose-style environment variable declarations. Let's say your config contains the following content:

database:
  kind: postgres
  host: db
  port: 5432
  user: ${POSTGRES_USER:-dipdup}
  password: ${POSTGRES_PASSWORD:-changeme}
  database: ${POSTGRES_DB:-dipdup}

You can generate an env-file to use with this exact config:

$ dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=dipdup

The environment of your current shell is also taken into account:

$ POSTGRES_DB=foobar dipdup -c dipdup.yml -c dipdup.docker.yml config env
POSTGRES_USER=dipdup
POSTGRES_PASSWORD=changeme
POSTGRES_DB=foobar  # <- set from current env

Use the -f <filename> option to save the output to disk instead of printing it to stdout. After you have modified the env-file according to your needs, you can apply it in whichever way is more convenient to you:

With dipdup --env-file / -e option:

dipdup -e prod.env <...> run

When using docker-compose:

services:
  indexer:
    ...
    env_file: prod.env

Keeping framework up-to-date

A bunch of new tags is now pushed to the Docker Hub on each release in addition to the X.Y.Z one: X.Y and X. That way, you can stick to a specific release without the risk of leaving a minor/major update unattended (friends don't let friends use latest 😉). The -pytezos flavor is also available for each tag.

FROM dipdup/dipdup:5.1
...

In addition, DipDup will poll GitHub for new releases on each command that takes a reasonably long time to execute and print a warning when running an outdated version. You can disable these checks with the advanced.skip_version_check flag.

Pro tip: you can also enable notifications on the GitHub repo page with 👁 Watch -> Custom -> tick Releases -> Apply to never miss a fresh DipDup release.

Changelog

See full 5.1.0 changelog here.

5.0.0

⚠ Breaking Changes

  • Python versions 3.8 and 3.9 are no longer supported.
  • bcd datasource has been removed.
  • Two internal tables were added, dipdup_contract_metadata and dipdup_token_metadata.
  • Some methods of tzkt datasource have changed their signatures and behavior.
  • Dummy advanced.oneshot config flag has been removed.
  • Dummy schema approve --hashes command flag has been removed.
  • docker init command has been removed.
  • ReindexingReason enumeration items have been changed.

Migration from 4.x

  • Ensure that you have a python = "^3.10" dependency in pyproject.toml.
  • Remove bcd datasources from config. Use metadata datasource instead to fetch contract and token metadata.
  • Update tzkt datasource method calls as described below.
  • Run the dipdup schema approve command on every database you use with 5.0.0.
  • Update usage of ReindexingReason enumeration if needed.

What's New

Process realtime messages with lag

Chain reorgs have occurred much more frequently since the Ithaca protocol reached mainnet. The preferred way to deal with rollbacks is the on_rollback hook. But if the logic of your indexer is too complex, you can buffer an arbitrary number of levels before processing to avoid reindexing.

datasources:
  tzkt_mainnet:
    kind: tzkt
    url: https://api.tzkt.io
    buffer_size: 2

DipDup tries to remove backtracked operations from the buffer instead of emitting a rollback. Ithaca guarantees operation finality after one block and block finality after two blocks, so to completely avoid reorgs, buffer_size should be 2.

BCD API takedown

Better Call Dev API was officially deprecated in February. Thus, it's time to say goodbye to the bcd datasource. In DipDup, it served the only purpose of fetching contract and token metadata. Now there's a separate metadata datasource which does the same thing, but better. If you have used the bcd datasource for custom requests, see the How to migrate from BCD to TzKT API article.

TzKT batch request pagination

Historically, most TzktDatasource methods had page iteration logic hidden inside. The number of items returned by TzKT in a single request is configured in HTTPConfig.batch_size and defaults to 10,000. Before this release, three requests would be performed by the get_big_map method to fetch 25,000 big map keys, leading to performance degradation and excessive memory usage.

Affected methods (response size in 4.x → response size in 5.x):

  • get_similar_contracts: unlimited → max. datasource.request_limit
  • get_originated_contracts: unlimited → max. datasource.request_limit
  • get_big_map: unlimited → max. datasource.request_limit
  • get_contract_big_maps: unlimited → max. datasource.request_limit
  • get_quotes: first datasource.request_limit → max. datasource.request_limit

All paginated methods now behave the same way. You can either iterate over pages manually or use iter_... helpers.

datasource = ctx.get_tzkt_datasource('tzkt_mainnet')
batch_iter = datasource.iter_big_map(
    big_map_id=big_map_id,
    level=last_level,
)
async for key_batch in batch_iter:
    for key in key_batch:
        ...

Metadata interface for TzKT integration

Starting with 5.0 you can store and expose custom contract and token metadata in the same format DipDup Metadata service does for TZIP-compatible metadata.

Enable this feature with advanced.metadata_interface flag, then update metadata in any callback:

await ctx.update_contract_metadata(
    network='mainnet',
    address='KT1...',
    metadata={'foo': 'bar'},
)

Metadata is stored in the dipdup_contract_metadata and dipdup_token_metadata tables and is available via GraphQL and REST APIs.
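
Token metadata can be updated in a similar way; a sketch with placeholder values, assuming the ctx.update_token_metadata counterpart method:

await ctx.update_token_metadata(
    network='mainnet',
    address='KT1...',
    token_id='0',
    metadata={'name': 'Foo Token', 'decimals': '6'},
)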

Prometheus integration

This version introduces initial Prometheus integration. It could help you set up monitoring, find performance issues in your code, and so on. To enable this integration, add the following lines to the config:

prometheus:
  host: 0.0.0.0

Changes since 4.2.7

Added

  • config: Added custom section to store arbitrary user data.
  • metadata: Added metadata_interface feature flag to expose metadata in TzKT format.
  • prometheus: Added ability to expose Prometheus metrics.
  • tzkt: Added ability to process realtime messages with lag.
  • tzkt: Added missing fields to the HeadBlockData model.
  • tzkt: Added iter_... methods to iterate over item batches.

Fixed

  • config: Fixed default SQLite path (:memory:).
  • prometheus: Fixed invalid metric labels.
  • tzkt: Fixed pagination in several getter methods.
  • tzkt: Fixed data loss when skip_history option is enabled.
  • tzkt: Fixed crash in methods that do not support cursor pagination.
  • tzkt: Fixed possible OOM while calling methods that support pagination.
  • tzkt: Fixed possible data loss in get_originations and get_quotes methods.

Changed

  • tzkt: Added offset and limit arguments to all methods that support pagination.

Removed

  • bcd: Removed bcd datasource and config section.
  • cli: Removed docker init command.
  • cli: Removed dummy schema approve --hashes flag.
  • config: Removed dummy advanced.oneshot flag.

Performance

  • dipdup: Use fast orjson library instead of built-in json where possible.

4.2.0

What's new

ipfs datasource

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

You can tune HTTP connection parameters with the http config field, just like any other datasource.

Sending arbitrary requests

DipDup datasources do not cover all available methods of underlying APIs. Let's say you want to fetch the protocol of the chain you're currently indexing from TzKT:

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weight=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.

Firing hooks outside of the current transaction

When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:

hooks:
  my_hook:
    callback: my_hook
    atomic: True

Until now, such hooks could only be fired according to jobs schedules, but not from a handler or another atomic hook using the ctx.fire_hook method. This limitation is now eliminated; use the wait argument to escape the current transaction:

async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)

Spin up a new project with a single command

Cookiecutter is an excellent Jinja2 wrapper for initializing hello-world templates of various frameworks and toolkits interactively. Install the python-cookiecutter package systemwide, then call:

cookiecutter https://github.com/dipdup-io/cookiecutter-dipdup

Advanced scheduler configuration

DipDup utilizes apscheduler library to run hooks according to schedules in jobs config section. In the following example, apscheduler spawns up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:

advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3

See apscheduler docs for details.

Note that you can't use executors from the apscheduler.executors.pool module; a ConfigurationError exception is raised otherwise. If you're into multiprocessing, I'll explain why in the next paragraph.

About the present and future of multiprocessing

It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable. So, they are forbidden now in advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py:

from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)

Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. dipdup.cli:cli group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.

That's all, folks! As always, your feedback is very welcome 🤙

4.1.0

Migration from 4.0 (optional)

  • Run dipdup schema init on the existing database to enable dipdup_head_status view and REST endpoint.

What's New

Index only the current state of big maps

big_map indexes allow achieving faster processing times than operation ones when storage updates are the only on-chain data your dapp needs to function. With this DipDup release, you can go even further and index only the current storage state, ignoring historical changes.

indexes:
  foo:
    kind: big_map
    ...
    skip_history: never|once|always

When this option is set to once, DipDup will skip historical changes only on initial sync and switch to regular indexing afterward. When the value is always, DipDup will fetch all big map keys on every restart. The preferred mode depends on your workload.

All big map diffs DipDup passes to handlers during fast sync have the action field set to BigMapAction.ADD_KEY. Keep in mind that DipDup fetches all keys in this mode, including ones removed from the big map. If needed, you can filter out the latter by the BigMapDiff.data.active field.
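
A handler sketch of that filter; the typename and generated key/value classes below are hypothetical:

from dipdup.context import HandlerContext
from dipdup.models import BigMapDiff

from my_indexer.types.registry.big_map.ledger_key import LedgerKey  # hypothetical generated type
from my_indexer.types.registry.big_map.ledger_value import LedgerValue  # hypothetical generated type


async def on_ledger_update(
    ctx: HandlerContext,
    ledger: BigMapDiff[LedgerKey, LedgerValue],
) -> None:
    # During skip_history sync removed keys are fetched too; skip them by the active flag
    if not ledger.data.active:
        return
    ...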

New datasource for contract and token metadata

Since the first version, DipDup has allowed fetching token metadata from the Better Call Dev API with the bcd datasource. Now it's time for a better solution. Firstly, BCD is far from being reliable in terms of metadata indexing. Secondly, spinning up your own instance of BCD requires significant effort and computing power. Lastly, we plan to deprecate the Better Call Dev API soon (but do not worry - it won't affect the explorer frontend).

Luckily, we have dipdup-metadata, a standalone companion indexer for DipDup written in Go. Configure a new datasource in the following way:

datasources:
  metadata:
    kind: metadata
    url: https://metadata.dipdup.net
    network: mainnet|ghostnet|limanet

Now you can use it anywhere in your callbacks:

datasource = ctx.datasources['metadata']
token_metadata = await datasource.get_token_metadata(address, token_id)

bcd datasource will remain available for a while, but we discourage using it for metadata processing.

Nested packages for hooks and handlers

Callback modules no longer have to be in the top-level hooks/handlers directories. Add one or multiple dots to the callback name to define nested packages:

package: indexer
hooks:
  foo.bar:
    callback: foo.bar

After running init command, you'll get the following directory tree (shortened for readability):

indexer
├── hooks
│   ├── foo
│   │   ├── bar.py
│   │   └── __init__.py
│   └── __init__.py
└── sql
    └── foo
        └── bar
            └── .keep

The same rules apply to handler callbacks. Note that callback field must be a valid Python package name - lowercase letters, underscores, and dots.

New CLI commands and flags

  • schema init is a new command to prepare a database for running DipDup. It will create tables based on your models, then call the on_reindex SQL hook to finish preparation - the same things DipDup does when run on a clean database.

  • hasura configure --force flag allows configuring Hasura even if the metadata hash matches the one saved in the database. It may come in handy during development.

  • init --keep-schemas flag makes DipDup preserve contract JSONSchemas. Usually, they are removed after generating typeclasses with datamodel-codegen, but you can keep them to convert to other formats or troubleshoot codegen issues.

Built-in dipdup_head_status view and REST endpoint

DipDup maintains several internal models to keep its state. As Hasura generates GraphQL queries and REST endpoints for those models, you can use them for monitoring. However, some SaaS monitoring solutions can only check whether an HTTP response contains a specific word or not. For such cases the dipdup_head_status view was added - a simplified representation of the dipdup_head table. It returns OK when the datasource received a head less than two minutes ago and OUTDATED otherwise. The latter means that something's stuck: either DipDup (e.g., because of a database deadlock) or the TzKT instance. Or maybe the whole Tezos blockchain, but in that case, you have problems bigger than indexing.

$ curl "http://127.0.0.1:41000/api/rest/dipdupHeadStatus?name=https%3A%2F%2Fapi.tzkt.io" 
{"dipdupHeadStatus":[{"status":"OUTDATED"}]}%

Note that dipdup_head update may be delayed during sync even if the --early-realtime flag is enabled, so don't rely exclusively on this endpoint.

Changelog

Added

  • cli: Added schema init command to initialize database schema.
  • cli: Added --force flag to hasura configure command.
  • codegen: Added support for subpackages inside callback directories.
  • hasura: Added dipdup_head_status view and REST endpoint.
  • index: Added an ability to skip historical data while synchronizing big_map indexes.
  • metadata: Added metadata datasource.
  • tzkt: Added get_big_map and get_contract_big_maps datasource methods.

4.0.0

⚠ Breaking Changes

  • run --oneshot option is removed. The oneshot mode (DipDup stops after the sync is finished) applies automatically when last_level field is set in the index config.
  • clear-cache command is removed. Use cache clear instead.

Migration from 3.x

  • Run dipdup init command to generate on_synchronized hook stubs.
  • Run dipdup schema approve command on every database you want to use with 4.0.0. Running dipdup migrate is not necessary since spec_version hasn't changed in this release.

What's New

Performance optimizations

Overall indexing performance has been significantly improved. Key highlights:

  • Configuration files are loaded 10x faster. The more indexes in the project, the more noticeable the difference is.
  • Significantly reduced CPU usage in realtime mode.
  • Datasource default HTTP connection options optimized for a reasonable balance between resource consumption and indexing speed.

Also, two new flags were added to improve DipDup performance in several scenarios: merge_subscriptions and early_realtime. See this paragraph for details.

Configurable action on reindex

There are several reasons that trigger reindexing:

  • manual: Reindexing triggered manually from callback with ctx.reindex.
  • migration: Applied migration requires reindexing. Check release notes before switching between major DipDup versions to be prepared.
  • rollback: Reorg message received from TzKT, and can not be processed.
  • config_modified: One of the index configs has been modified.
  • schema_modified: Database schema has been modified. Try to avoid manual schema modifications in favor of SQL hooks.

Now it is possible to configure desirable action on reindexing triggered by the specific reason.

  • exception (default): Raise ReindexingRequiredError and quit with error code. The safest option since you can trigger reindexing accidentally, e.g., by a typo in config. Don't forget to set up the correct restart policy when using it with containers.
  • wipe: Drop the whole database and start indexing from scratch. Be careful with this option!
  • ignore: Ignore the event and continue indexing as usual. It can lead to unexpected side-effects up to data corruption; make sure you know what you are doing.

To configure actions for each reason, add the following section to DipDup config:

advanced:
  ...
  reindex:
    manual: wipe
    migration: exception
    rollback: ignore
    config_modified: exception
    schema_modified: exception

New CLI commands and flags

  • cache show: Get information about file caches used by DipDup.
  • config export: Print config after resolving all links and variables. Add the --unsafe option to substitute environment variables; default values from config will be used otherwise.
  • run --early-realtime: Establish a realtime connection before all indexes are synchronized.
  • run --merge-subscriptions: Subscribe to all operations/big map diffs during realtime indexing. This flag helps to avoid reaching the TzKT subscription limit (currently 10000 channels). Keep in mind that this option can significantly increase RAM consumption depending on the time required to perform a sync.
  • status: Print the current status of indexes from the database.

advanced top-level config section

This config section allows users to tune system-wide options, either experimental or unsuitable for generic configurations.

  • early_realtime, merge_subscriptions, postpone_jobs: Another way to set run command flags. Useful for maintaining per-deployment configurations.
  • reindex: Configure the action to perform when reindexing is triggered. See this paragraph for details.

CLI flags have priority over the corresponding AdvancedConfig fields.

aiosignalrcore replaced with pysignalr

It may not be the most noticeable improvement for end-user, but it still deserves a separate paragraph in this article.

Historically, DipDup used our own fork of signalrcore library named aiosignalrcore. This project aimed to replace the synchronous websocket-client library with asyncio-ready websockets. Later we discovered that required changes make it hard to maintain backward compatibility, so we have decided to rewrite this library from scratch. So now you have both a modern and reliable library for SignalR protocol and a much more stable DipDup. Ain't it nice?

Changes since 3.1.3

This is a combined changelog of -rc versions released since the last stable release until this one.

Added

  • cli: Added run --early-realtime flag to establish a realtime connection before all indexes are synchronized.
  • cli: Added run --merge-subscriptions flag to subscribe to all operations/big map diffs during realtime indexing.
  • cli: Added status command to print the current status of indexes from the database.
  • cli: Added config export [--unsafe] command to print config after resolving all links and variables.
  • cli: Added cache show command to get information about file caches used by DipDup.
  • config: Added first_level and last_level optional fields to TemplateIndexConfig. These limits are applied after ones from the template itself.
  • config: Added daemon boolean field to JobConfig to run a single callback indefinitely. Conflicts with crontab and interval fields.
  • config: Added advanced top-level section.
  • hooks: Added on_synchronized hook, which fires each time all indexes reach realtime state.

Fixed

  • cli: Fixed config not being verified when invoking some commands.
  • cli: Fixed crashes and output inconsistency when piping DipDup commands.
  • cli: Fixed missing schema approve --hashes argument.
  • cli: Fixed schema wipe --immune flag being ignored.
  • codegen: Fixed contract address used instead of an alias when typename is not set.
  • codegen: Fixed generating callback arguments for untyped operations.
  • codegen: Fixed missing imports in handlers generated during init.
  • coinbase: Fixed possible data inconsistency caused by caching enabled for method get_candles.
  • hasura: Fixed unnecessary reconfiguration on restart.
  • http: Fixed increasing sleep time between failed request attempts.
  • index: Fixed CallbackError raised instead of ReindexingRequiredError in some cases.
  • index: Fixed crash while processing storage of some contracts.
  • index: Fixed incorrect log messages, removed duplicate ones.
  • index: Fixed invocation of head index callback.
  • index: Fixed matching of untyped operations filtered by source field (@pravin-d).
  • tzkt: Fixed filtering of big map diffs by the path.
  • tzkt: Fixed get_originated_contracts and get_similar_contracts methods whose output was limited to HTTPConfig.batch_size field.
  • tzkt: Fixed lots of SignalR bugs by replacing aiosignalrcore library with pysignalr.
  • tzkt: Fixed processing operations with entrypoint default.
  • tzkt: Fixed regression in processing migration originations.
  • tzkt: Fixed resubscribing when realtime connectivity is lost for a long time.
  • tzkt: Fixed sending useless subscription requests when adding indexes in runtime.

Changed

  • cli: schema wipe command now requires confirmation when invoked in the interactive shell.
  • cli: schema approve command now also causes a recalculation of schema and index config hashes.
  • index: DipDup will recalculate respective hashes if reindexing is triggered with config_modified: ignore or schema_modified: ignore in advanced config.

Removed

  • cli: Removed deprecated run --oneshot argument and clear-cache command.

Performance

  • config: Configuration files are loaded 10x faster.
  • index: Checks performed on each iteration of the main DipDup loop are slightly faster now.
  • index: Number of operations processed by matcher reduced by 40%-95% depending on the number of addresses and entrypoints used.
  • tzkt: Improved performance of response deserialization.
  • tzkt: Rate limit was increased. Try to set connection_timeout to a higher value if requests fail with ConnectionTimeout exception.

DipDup contribution guide

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

General

  • All code in this repository MUST be licensed under the MIT License.
  • Python code in this repository MUST run on Python 3.10. It also SHOULD run on Python 3.11. Using modern language features is encouraged.
  • Python code in this repository MUST run in Linux, macOS, Docker, and amd64/arm64 environments. Windows SHOULD be supported as well.
  • We use the Poetry package manager and GNU Make to set up the development environment. You SHOULD install both tools and run make help to see available shortcuts.
  • Developers SHOULD have fun while contributing to the project.

GitHub

  • Branch names MUST follow prefix/short-description format. Prefixes currently in use: feat for features, fix for bugfixes, docs for documentation, exp for experiments, aux for everything else.
  • Commits in pull requests MUST be squashed when merging to master.
  • Issues and pull requests MUST have a descriptive title; they SHOULD be linked to each other, appropriately labeled, and assigned to maintainers while in progress.

Codestyle

  • We use the following combo of linters and formatters: isort, black, ruff, mypy. All linter checks MUST pass before merging code to master (CI will fail otherwise).
  • Single quotes MUST be used for string literals.
  • Meaningful comments are highly RECOMMENDED to begin with # NOTE:, # TODO:, or # FIXME:.
  • f-string formatting is RECOMMENDED over other methods. Logging is an exception to this rule (see the short example below).
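
A short, purely illustrative snippet following these rules:

import logging

_logger = logging.getLogger(__name__)


# NOTE: Single quotes and f-strings everywhere, except in logging calls
def greet(name: str) -> str:
    _logger.info('Greeting %s', name)
    return f'Hello, {name}!'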

Packaging

  • All dependencies MUST be declared in pyproject.toml file.
  • Non-development dependencies MUST be pinned to non-breaking versions (e.g. ^1.2.3).
  • Core dependencies that we patch MUST be pinned to specific versions (e.g. 1.2.3).

Releases

  • Release versions MUST conform to Semantic Versioning. Releases that introduce breaking changes MUST be major ones.
  • Only the latest major version is supported in general. Critical fixes MAY be backported to the previous major release. To do so, create an aux/X.Y.Z branch from the latest stable tag, bump the DipDup version manually, and add a new tag.

Changelog

  • All changes that affect user (developer) experience MUST be documented in the CHANGELOG.md file.
  • Changes that significantly affect DipDup maintainers' experience MAY be documented in the CHANGELOG.md file.
  • The changelog MUST conform to the "Keep a Changelog" specification (CI will break otherwise).
  • Lines describing changes MUST be sorted and begin with DipDup module name (index: Added ...).

Documentation

  • A page in Release Notes SHOULD accompany all major releases.
  • All internal links MUST be created with {{ #summary ... shortcodes.
  • All values used in project templates MUST be replaced with {{ #cookiecutter ... shortcodes.

Security

  • GitHub alerts about dependencies that contain vulnerabilities MUST be investigated and resolved as soon as possible.
  • Security-related bugfixes MUST be mentioned in the changelog under the "Security" section.

Privacy

  • Crash reporting MUST be opt-in (disabled by default) both in config and project templates.
  • Sentry events and crash reports MUST NOT contain any sensitive information (IP addresses, hostnames, etc.)
  • DipDup SHOULD NOT perform network requests to APIs not defined in config as datasources. Current exceptions: GitHub.

Docker images

  • DipDup dockerfiles use autogenerated requirements.txt files. Maintainers MUST run make update script on every change in dependencies.
  • Docker images for stable releases MUST be published on Docker Hub. They MAY also be published on GHCR.
  • Maintainers MAY publish arbitrary images on GHCR and remove them when not needed.

Installer

  • Installer module MUST depend on Python stdlib only.

Scaffolding

  • Project templates SHOULD cover all index types available in DipDup.
  • They also MAY contain additional features and integrations.

Demo projects

  • Demos are stored in demos root directory. They MUST be generated automatically from project templates using replay files.
  • Maintainers SHOULD run the make demos replays command regularly to ensure that demo projects are up to date.

Tests

This page or paragraph is yet to be written. Come back later.

Code Review

This page or paragraph is yet to be written. Come back later.

MIT License

Copyright (c) 2021 Baking Bad

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.