Rethinking Ruleset Registration in KRL


Summary

A ruleset registry has been part of KRL from the start. This proposal would deprecate ruleset registries in favor of simply using URLs.

Updated January 21, 2014, 11:15am to add additional unresolved issues.

URL

Since its inception, KRL was meant to be a language of the Internet. This was something of an experiment. Firstly, as an Internet language, all processing is in the cloud. That is, it's a PaaS-only model; you can't run it from the command line. Secondly, programs would be identified by URL.

This has, for the most part, worked pretty well. But as I move to a model where multiple KRL rule engines (KREs) are running in Docker instances around the Internet, there's one early design decision that has caused some problems: ruleset registration.

URLs are long, so we created a registry where a ruleset identifier, or RID, could be mapped to the URL. This meant that KRL programs could refer to rulesets by a relatively short ID instead of a long URL. So, you'll see KRL code that looks like this:

ruleset example {
  meta {
    name "My Example Ruleset"

    use module a16x8 alias math
    use module b57x15
  }

  rule flip {
    select when echo hello
    pre {
      x = math:greatCicleDistance(56);
      y = b57x15:another_function("hello")
    }
    send_directive("hello world");
    always {
      raise notification event status for a16x69
        with dist = x
    }
  }
}

Note that we're using two modules identified by RID, a16x8 and b57x15 respectively. In the first case we gave it an alias to make the code easier to read. In the explicit event raise that happens in the rule's postlude, we raise the event for a specific ruleset by ID, a16x69 in this case. This doesn't happen often, but it's an optimization that KRL allows. When the rule engine runs across a RID, it looks it up in the registry and loads the code at the associated URL (if it's not cached).
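The lookup just described can be sketched in a few lines of Python. This is only an illustration of the idea, not KRE's actual implementation: the Registry class, the fetch callable, and the example URL are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List


class Registry:
    """Hypothetical per-instance registry mapping RIDs to ruleset URLs."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._urls: Dict[str, str] = {}    # RID -> URL
        self._cache: Dict[str, str] = {}   # RID -> ruleset source
        self._fetch = fetch                # retrieves source from a URL

    def register(self, rid: str, url: str) -> None:
        self._urls[rid] = url

    def load(self, rid: str) -> str:
        # Serve from the cache when possible; otherwise resolve the RID
        # to its URL and fetch the source.
        if rid not in self._cache:
            url = self._urls[rid]  # KeyError if the RID was never registered
            self._cache[rid] = self._fetch(url)
        return self._cache[rid]


fetched: List[str] = []


def fake_fetch(url: str) -> str:
    # Stand-in for an HTTP GET; records each URL it is asked for.
    fetched.append(url)
    return "ruleset source from " + url


registry = Registry(fake_fetch)
registry.register("a16x8", "https://example.com/rulesets/math.krl")
first = registry.load("a16x8")
second = registry.load("a16x8")  # served from the cache, no second fetch
```

Note that the RID only means something to the instance whose registry holds it. A second Registry that never saw the register call would fail on the same RID.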

The problem with a fixed registry is that each instance of KRE is running its own registry. No problems there unless we want them to all be able to run the same program, say Fuse. The Fuse rulesets refer to each other by RID. That means that they need to have the same RID on every instance of KRE. An ugly synchronization problem.

Another solution would be to create a global registry, but that's just another piece of infrastructure to run that will go down and cause reliability problems. If KRL is a language of the Internet, then it ought not be subject to single points of failure.

I've determined the real solution is to go back to the root idea and simply use URLs, with in-ruleset aliases, as the ruleset identifier. So the preceding code might become this:

ruleset example {
  meta {
    name "My Example Ruleset"

    use module https://s3.amazonaws.com/my_rulesets/math.krl alias math
    use module https://example.com/rulesets/transcode.krl alias transcode
    use rid notify for https://windley.com/rulesets/notification.krl
  }

  rule flip {
    select when echo hello
    pre {
      x = math:greatCicleDistance(56);
      y = transcode:another_function("hello")
    }
    send_directive("hello world");
    always {
      raise notification event status for notify
        with dist = x
    }
  }
}

Note that in the case of modules, we've simply replaced the RID with a URL and used the existing alias mechanism to provide a convenient handle. In the case of the event being raised to a specific ruleset, we don't necessarily want to load it as a module (and incur whatever overhead that might create), so I've introduced a new pragma in the meta block to declare aliases for RIDs. The syntax for that isn't set in stone; this is just a proposal.

The advantage to this method is that now rulesets can live anywhere without explicit registration. And multiple instances of KRE can run the program without a central registry. The ruleset serves as a soft registry that can be changed by the programmer as needed without keeping some static structure up to date. Note: none of this changes the current security requirements for rulesets to be installed in a pico before they are run there.
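To make the "soft registry" idea concrete, here is a rough Python sketch of how an engine might resolve aliases when the ruleset itself carries the URLs. The Engine class, the make_fetch helper, and the dictionary form of the meta block are hypothetical; only the alias names and URLs come from the example above.

```python
from typing import Callable, Dict, List


class Engine:
    """Hypothetical KRE instance: a cache keyed by URL and no registry."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}  # URL -> ruleset source

    def load(self, aliases: Dict[str, str], alias: str) -> str:
        # The ruleset's own meta block supplies the alias -> URL mapping,
        # so the engine needs no state beyond its cache.
        url = aliases[alias]
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


# Aliases as declared in the example ruleset's meta block.
aliases = {
    "math": "https://s3.amazonaws.com/my_rulesets/math.krl",
    "transcode": "https://example.com/rulesets/transcode.krl",
    "notify": "https://windley.com/rulesets/notification.krl",
}


def make_fetch(log: List[str]) -> Callable[[str], str]:
    # Stand-in for an HTTP GET; records each URL requested.
    def fetch(url: str) -> str:
        log.append(url)
        return "source of " + url
    return fetch


log_a: List[str] = []
log_b: List[str] = []
engine_a = Engine(make_fetch(log_a))
engine_b = Engine(make_fetch(log_b))

# Two independent engines resolve the same alias with no shared state.
source_a = engine_a.load(aliases, "math")
source_b = engine_b.load(aliases, "math")
```

Because the cache is keyed by URL rather than by RID, nothing has to be synchronized between instances: each one can work out where the code lives from the ruleset alone.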

There are a few problems that I've yet to work out.

  1. This method works fine for rulesets that are publicly available at a URL. But some rulesets have developer keys and secrets. And some programmers don't want to make their ruleset public for other reasons (e.g. trade secrets). With a registry, we solved this problem by supporting BASIC AUTH URLs. Since the registry hid the URL, the password wasn't exposed. That obviously won't work here.

  2. The Sky Cloud API model relies on the RID. We obviously can't substitute a URL in the URL scheme for Sky Cloud and have it be very easy to use. One solution would be to use the ruleset name (the string immediately after the keyword ruleset in the ruleset definition) for this purpose. The system could dynamically register the name with the URL for a specific pico when the ruleset is installed in that pico. The user wouldn't be able to install two rulesets with the same name. This could be a potential problem since there's no way to enforce any global uniqueness on ruleset names.

  3. When rulesets are flushed from the cache in a given instance, the current method is to put a semicolon-separated list of RIDs in the flush URL. This would have to change to support a collection of URLs in the body of a POST.
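The per-pico name registration floated in issue 2 might look something like the following Python sketch. It is one possible reading of the idea, not a design: the Pico class, its method names, the DuplicateRulesetName exception, and the second URL are all invented for illustration.

```python
from typing import Dict


class DuplicateRulesetName(Exception):
    """Raised when a pico already has a different ruleset under this name."""


class Pico:
    """Hypothetical pico that maps ruleset names to URLs at install time."""

    def __init__(self, pico_id: str) -> None:
        self.pico_id = pico_id
        self._installed: Dict[str, str] = {}  # ruleset name -> URL

    def install(self, name: str, url: str) -> None:
        existing = self._installed.get(name)
        # Reinstalling the same URL is harmless; a different URL under
        # the same name is rejected.
        if existing is not None and existing != url:
            raise DuplicateRulesetName(name)
        self._installed[name] = url

    def url_for(self, name: str) -> str:
        # What a Sky Cloud request naming the ruleset would resolve to.
        return self._installed[name]


alice = Pico("alice")
bob = Pico("bob")
alice.install("example", "https://example.com/rulesets/example.krl")

# Nothing enforces global uniqueness: another pico can bind the same
# name to an unrelated ruleset.
bob.install("example", "https://example.org/other/example.krl")

try:
    alice.install("example", "https://example.org/other/example.krl")
    conflict = False
except DuplicateRulesetName:
    conflict = True
```

The sketch shows both halves of the tradeoff: names are unambiguous within a single pico, but the same name can point at different code in different picos.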

These are the issues I've thought of so far. I'll continue to update this as I give it more thought. I welcome your comments and especially any suggestions you have for improving this proposal.