Tutorial Overview

In this tutorial, we’re going to build a simple DAG workflow that fetches a random cat fact from an API, generates a random cat fact with an LLM, and then asks an LLM to determine which of the two was LLM-generated. To do this, we’ll build a custom tool to get the random facts, prepare the LLM, and connect everything together in our DAG.

The final DAG will look like this:

cat_fact_tool ---------|
                       |
                       v
                    llm_tool ---> llm_explanation_tool
                       ^
                       |
random_fact_tool ------|

Prerequisites

Please make sure you have completed the Getting Started tutorial before starting this one.

Building the cat fact tool

Overview

For this part of the tutorial, we’ll be using a free API to get random cat facts. We’ll need to extend the Node class and call the API using aiohttp.

Imports

To start, let’s import all the modules we’ll need for this tool.

cat_fact_tool.py

from trellis_dag import Node
import aiohttp

Extending Node

Next, we’ll extend Node to create our new CatFactsAPITool class. The only required functions to implement are the constructor and execute.

cat_fact_tool.py
class CatFactsAPITool(Node):

Writing CatFactsAPITool constructor

To effectively write our constructor, we need to understand the arguments that Node takes.

cat_fact_tool.py
    def __init__(
        self,
        name: str,
        *args,
        **kwargs,
    ) -> None:
        input_s = {}
        output_s = {"cat_fact_1": [{"fact": str, "length": int}]}
        super().__init__(name, input_s, output_s, *args, **kwargs)

There’s a lot going on here, so let’s break it down.

  • name is the name of the tool. Under the hood, the nodes are identified by UUID, but have a human-readable name for convenience.
  • input_s is a dictionary of the inputs that the tool expects from other Nodes as Python types. The keys are the names of the inputs, and the values are the types of the inputs.
  • output_s is a dictionary of the outputs that the tool produces as Python types. The keys are the names of the outputs, and the values are the types of the outputs. In this case, we produce a single output called cat_fact_1 that is a list of dictionaries, where each dictionary has a string fact and an integer length. This is the format that the API returns, so we’ll keep it as is.
  • *args and **kwargs are used to pass arguments to the node which will be used at runtime (when Node.execute() is called). You can set these via the constructor for convenience, or use .set_execute_args to do this.

NOTE: even though input_s and output_s are optional arguments to Node, setting them upfront will save you a lot of debugging pain later.
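To make the schema idea concrete, here is a hypothetical sketch, in plain Python, of how a payload could be checked against a schema shaped like output_s. This is not trellis_dag’s actual validation logic (which may differ); it only illustrates why declaring schemas upfront helps catch malformed outputs early:

```python
# Illustrative only: a minimal checker for schemas shaped like output_s.
# trellis_dag's real validation may behave differently.
output_s = {"cat_fact_1": [{"fact": str, "length": int}]}


def matches_schema(payload: dict, schema: dict) -> bool:
    for key, spec in schema.items():
        if key not in payload:
            return False
        if isinstance(spec, list):
            # A list of dicts: every item must carry every field with the right type.
            item_spec = spec[0]
            for item in payload[key]:
                for field, typ in item_spec.items():
                    if not isinstance(item.get(field), typ):
                        return False
        else:
            # A bare type, e.g. str or int.
            if not isinstance(payload[key], spec):
                return False
    return True


good = {"cat_fact_1": [{"fact": "Cats sleep a lot.", "length": 17}]}
bad = {"cat_fact_1": [{"fact": "Cats sleep a lot."}]}  # missing "length"
```

With a declared schema, a payload like bad above can be rejected as soon as it is produced, rather than failing somewhere downstream in the DAG.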

For more information on Node, visit the Reference section.

Writing CatFactsAPITool execute

The execute function is where Node expects your business logic to run. For this example, all we need to do is get the arguments for calling the CatFacts API, call the API, and store the result.

Getting the arguments

The first thing we need to do is get our execute arguments. In this case, we have one integer argument: max_length. There are two ways to set execute arguments: static and dynamic. If the arguments can be hardcoded, set them via .set_execute_args. If you want to set them dynamically instead (say, from the output of an LLM call executed just before), you can do that too; dynamic arguments are stored on Node.input rather than Node.execute_args. Despite the difference under the hood, you can retrieve a value with safe_get_execute_arg (which prioritizes Node.input over Node.execute_args) no matter how the argument was set.
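The lookup order just described can be sketched in plain Python. Everything below (the SketchNode class and its attribute layout) is illustrative only, not trellis_dag’s real implementation; it just models “dynamic input wins over static args, falling back to a default”:

```python
# Hypothetical mimic of safe_get_execute_arg's lookup order.
# Not trellis_dag source code; attribute names are assumptions for illustration.
_MISSING = object()  # sentinel so that an explicit None value is still honored


class SketchNode:
    def __init__(self):
        self.input = {}         # dynamic args, filled in by upstream nodes at runtime
        self.execute_args = {}  # static args, set ahead of time

    def safe_get_execute_arg(self, key, default=None):
        # 1. Prefer a dynamic value from upstream output.
        value = self.input.get(key, _MISSING)
        if value is not _MISSING:
            return value
        # 2. Fall back to a statically configured value, then the default.
        return self.execute_args.get(key, default)


node = SketchNode()
node.execute_args["max_length"] = 100  # static setting
node.input["max_length"] = 80          # dynamic value from an upstream node
```

Here node.safe_get_execute_arg("max_length", 140) returns the dynamic 80; with only the static setting it would return 100, and with neither it would fall back to 140.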

In this case, we only need to provide one argument: max_length. max_length is the maximum length of the cat facts we want to get.

cat_fact_tool.py
    async def execute(self) -> None:
        max_length: int = self.safe_get_execute_arg("max_length", 140)

Calling the API

Next, we need to call the API. We’ll use aiohttp to do this. aiohttp is a popular asynchronous HTTP client for Python that is well-supported and easy to use.

cat_fact_tool.py
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"https://catfact.ninja/facts?limit=1&max_length={max_length}"
            ) as response:

Once we get the result, we need to set the Node’s output to the result of the API call so that our LLM can use it. We can do this using set_output. Note the renaming from "data" to "cat_fact_1"; we do this for clarity in the later steps, but whether (and how) to transform the payload here is a decision left to you. If it remains unchanged, you’ll have to transform it later.

cat_fact_tool.py
                if response.status == 200:
                    data = await response.json()
                    self.set_output({"cat_fact_1": data["data"]})
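To see what that renaming step does in isolation, here is an illustrative, hand-written response body (abridged; the real API response also includes fields we ignore, such as pagination metadata) and the same transformation applied to it:

```python
# Abridged, hand-written stand-in for the parsed JSON body from the API.
# Field names mirror the output_s schema defined earlier.
data = {
    "data": [{"fact": "A group of cats is called a clowder.", "length": 36}],
}

# The renaming step: expose the list under our schema's key, "cat_fact_1",
# instead of the API's generic "data" key.
output = {"cat_fact_1": data["data"]}
```

Downstream nodes then read output["cat_fact_1"] directly, with no knowledge of the API’s original key names.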

Since the rest of the pipeline depends on this output, we’ll raise an error if the API call fails. While we default to raising errors in this tutorial, how you handle internal errors in practice is a design decision that is entirely up to you.

cat_fact_tool.py
                else:
                    raise ValueError(f"Failed to get cat facts: {response}")

Putting it all together

That’s it for the CatFactsAPITool! Visit the Node reference to learn more. The final code should look like this:

cat_fact_tool.py
from trellis_dag import Node
import aiohttp


class CatFactsAPITool(Node):
    def __init__(
        self,
        name: str,
        *args,
        **kwargs,
    ) -> None:
        input_s = {}
        output_s = {"cat_fact_1": [{"fact": str, "length": int}]}
        super().__init__(name, input_s, output_s, *args, **kwargs)

    async def execute(self) -> None:
        max_length: int = self.safe_get_execute_arg("max_length", 140)

        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"https://catfact.ninja/facts?limit=1&max_length={max_length}"
            ) as response:
                if response.status == 200:
                    data = await response.json()
                    self.set_output({"cat_fact_1": data["data"]})
                else:
                    raise ValueError(f"Failed to get cat facts: {response}")

Move on to the next section to write the OpenAI calls for your LLM nodes.