4.2.0

What's new

`ipfs` datasource

While working with contract/token metadata, a typical scenario is to fetch it from IPFS. DipDup now has a separate datasource to perform such requests.

datasources:
  ipfs:
    kind: ipfs
    url: https://ipfs.io/ipfs

You can use this datasource within any callback. Output is either JSON or binary data.

ipfs = ctx.get_ipfs_datasource('ipfs')

file = await ipfs.get('QmdCz7XGkBtd5DFmpDPDN3KFRmpkQHJsDgGiG16cgVbUYu')
assert file[:4].decode()[1:] == 'PDF'

file = await ipfs.get('QmSgSC7geYH3Ae4SpUHy4KutxqNH9ESKBGXoCN4JQdbtEz/package.json')
assert file['name'] == 'json-buffer'

You can tune HTTP connection parameters with the http config field, just like any other datasource.

Sending arbitrary requests

DipDup datasources do not cover all available methods of underlying APIs. Let's say you want to fetch protocol of the chain you're currently indexing from TzKT:

tzkt = ctx.get_tzkt_datasource('tzkt_mainnet')
protocol_json = await tzkt.request(
    method='get',
    url='v1/protocols/current',
    cache=False,
    weigth=1,  # ratelimiter leaky-bucket drops
)
assert protocol_json['hash'] == 'PtHangz2aRngywmSRGGvrcTyMbbdpWdpFKuS4uMWxg2RaH9i1qx'

Datasource HTTP connection parameters (ratelimit, backoff, etc.) are applied on every request.

Firing hooks outside of the current transaction

When configuring a hook, you can instruct DipDup to wrap it in a single database transaction:

hooks:
  my_hook:
    callback: my_hook
    atomic: True

Until now, such hooks could only be fired according to jobs schedules, but not from a handler or another atomic hook using ctx.fire_hook method. This limitation is eliminated - use wait argument to escape the current transaction:

async def handler(ctx: HandlerContext, ...) -> None:
    await ctx.fire_hook('atomic_hook', wait=False)

Spin up a new project with a single command

Cookiecutter is an excellent jinja2 wrapper to initialize hello-world templates of various frameworks and toolkits interactively. Install python-cookiecutter package systemwide, then call:

cookiecutter https://github.com/dipdup-io/cookiecutter-dipdup

Advanced scheduler configuration

DipDup utilizes apscheduler library to run hooks according to schedules in jobs config section. In the following example, apscheduler spawns up to three instances of the same job every time the trigger is fired, even if previous runs are in progress:

advanced:
  scheduler:
    apscheduler.job_defaults.coalesce: True
    apscheduler.job_defaults.max_instances: 3

See apscheduler docs for details.

Note that you can't use executors from apscheduler.executors.pool module - ConfigurationError exception raised then. If you're into multiprocessing, I'll explain why in the next paragraph.

About the present and future of multiprocessing

It's impossible to use apscheduler pool executors with hooks because HookContext is not pickle-serializable. So, they are forbidden now in advanced.scheduler config. However, thread/process pools can come in handy in many situations, and it would be nice to have them in DipDup context. For now, I can suggest implementing custom commands as a workaround to perform any resource-hungry tasks within them. Put the following code in <project>/cli.py:

from contextlib import AsyncExitStack

import asyncclick as click
from dipdup.cli import cli, cli_wrapper
from dipdup.config import DipDupConfig
from dipdup.context import DipDupContext
from dipdup.utils.database import tortoise_wrapper


@cli.command(help='Run heavy calculations')
@click.pass_context
@cli_wrapper
async def do_something_heavy(ctx):
    config: DipDupConfig = ctx.obj.config
    url = config.database.connection_string
    models = f'{config.package}.models'

    async with AsyncExitStack() as stack:
        await stack.enter_async_context(tortoise_wrapper(url, models))
        ...

if __name__ == '__main__':
    cli(prog_name='dipdup', standalone_mode=False)

Then use python -m <project>.cli instead of dipdup as an entrypoint. Now you can call do-something-heavy like any other dipdup command. dipdup.cli:cli group handles arguments and config parsing, graceful shutdown, and other boilerplate. The rest is on you; use dipdup.dipdup:DipDup.run as a reference. And keep in mind that Tortoise ORM is not thread-safe. I aim to implement ctx.pool_apply and ctx.pool_map methods to execute code in pools with magic within existing DipDup hooks, but no ETA yet.

That's all, folks! As always, your feedback is very welcome 🤙