Tuesday, September 29, 2009

FSM and ragel

The diagram on the right depicts a state machine for parsing command line arguments. I'm looking at ragel lately, because its architecture and design are genuinely compelling. The philosophy behind it is not necessarily limited to lexing input or protocols (although that is basically what ragel does). I'm looking at this from the perspective of applied research in knowledge-base sharing and upgrading between intelligent agents. One of the ideas I've been entertaining is whether it's possible to develop a common knowledge base between two computer processes that is not static (as in 'pre-defined'), but has dynamic properties, such that a process can reason with its internal state and knowledge base to resolve specific dead ends, and so on.

( By the way, just in between: for an explanation of how I post code on blogger without using a syntax highlighter, see http://kevin-berridge.blogspot.com/2007/08/posting-code-on-blogger.html. )

The above diagram was generated by specifying a sort of language for the command line arguments that the application understands. 'Language' is to be interpreted in the broadest sense of the word: think of it as any stream of input characters in which you can convey ideas or specifications of actions to undertake.

In the above diagram, a state is reached when the state machine can successfully pick up the next character from the stream. The machine moves to a different state when the next character in the stream matches that specific symbol; it acts a bit like a filter. Some states have multiple exit points (they can leave over a number of transitions), which is fine. The interesting characteristic of ragel, compared with a lexer, is that you are string-matching and executing code at the same time. So with ragel you may start executing things that later turn out not to be completable, because the final part of the input is missing. It takes a bit of programming to either discard that state or use it anyway, and it isn't applicable in every context. I can imagine that if you work on transaction-based systems, you just panicked at these statements :). There, you typically wait until the full request is in, generate a response, and wait for the client to tell you to commit before doing anything for real.
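To make that concrete, here is a hand-written sketch in C (my own illustration, not actual ragel syntax or output) of a machine for the tiny flag language "-v" | "-h" that runs its actions while matching, so work can start before we know the input will complete:

```c
#include <stdio.h>

enum state { S_START, S_DASH, S_DONE, S_ERROR };

int run(const char *p, const char *pe)
{
    enum state cs = S_START;
    while (p < pe && cs != S_DONE && cs != S_ERROR) {
        char c = *p++;
        switch (cs) {
        case S_START:
            cs = (c == '-') ? S_DASH : S_ERROR;
            break;
        case S_DASH:
            /* actions fire the moment a character matches */
            if (c == 'v')      { puts("verbose on");    cs = S_DONE; }
            else if (c == 'h') { puts("printing help"); cs = S_DONE; }
            else               cs = S_ERROR;
            break;
        default:
            break;
        }
    }
    /* If the stream ends while we are still in S_DASH, the match is
     * incomplete; any work already done would have to be discarded
     * or compensated for by the caller. */
    return cs == S_DONE;
}

int main(void)
{
    const char *arg = "-v";
    return run(arg, arg + 2) ? 0 : 1;
}
```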

Another interesting aspect of ragel is that it barely leans on glibc or other heavyweight library functions. In the example above, you'd typically use some strXXX function from glibc to find out what the user supplied, and you'd need to make sure your buffers are set up correctly and that you don't run past them (I always use the strnXXX variants just in case I get caught out). ragel, on the other hand, works on the buffers you supply directly and uses pointer arithmetic. There are two output modes: table-based, where the transitions between states are stored as data and a generic loop walks over them, and goto-based, where each state becomes a label and transitions become jumps.
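Hand-written sketches of both styles, for a machine that accepts exactly the string "ab" (ragel's real output is more involved; these are only my own illustrations of the two shapes):

```c
#include <stdio.h>

/* Table-based: the machine is data; a generic loop indexes into it. */
int match_table(const char *p, const char *pe)
{
    /* rows: states 0..2; cols: 'a' and 'b'; -1 means no transition */
    static const int next[3][2] = {
        {  1, -1 },  /* state 0: 'a' -> 1 */
        { -1,  2 },  /* state 1: 'b' -> 2 (final) */
        { -1, -1 },  /* state 2: final, no exits */
    };
    int cs = 0;
    while (p < pe) {
        int col = (*p == 'a') ? 0 : (*p == 'b') ? 1 : -1;
        if (col < 0 || (cs = next[cs][col]) < 0) return 0;
        p++;
    }
    return cs == 2;
}

/* Goto-based: each state is a label; transitions are jumps, no dispatch loop. */
int match_goto(const char *p, const char *pe)
{
st0:
    if (p == pe) return 0;
    if (*p++ == 'a') goto st1;
    return 0;
st1:
    if (p == pe) return 0;
    if (*p++ == 'b') goto st2;
    return 0;
st2:
    return p == pe;
}

int main(void)
{
    const char *s = "ab";
    printf("%d %d\n", match_table(s, s + 2), match_goto(s, s + 2));
    return 0;
}
```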

Gotos should probably be considered evil, but with state machines I'm starting to think that machine intelligence could benefit greatly from execution contexts that can switch very quickly from one state to another. Earlier posts from 2007 already rambled on about stackless python and the like: having a stack that grows indefinitely doesn't help much.

Now, in the philosophy of ragel, would it be possible to develop an agent language that runs in some kind of engine, where the agent continuously instructs the engine what to execute next? Maybe a thread-based context of instructions could help to make this multi-processing.
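A minimal sketch of that engine idea, assuming a trampoline style where each step hands back the next step to run, so switching contexts is one pointer swap rather than a deepening call stack (all the names here are hypothetical):

```c
#include <stdio.h>

typedef struct agent { int energy; } agent;  /* toy internal state */
typedef struct step step;
struct step { step (*fn)(agent *); };        /* NULL fn means stop */

step step_rest(agent *a);

step step_work(agent *a)
{
    printf("working, energy=%d\n", a->energy);
    if (--a->energy > 0)
        return (step){ step_work };   /* stay in this state */
    return (step){ step_rest };       /* switch state: one pointer swap */
}

step step_rest(agent *a)
{
    (void)a;
    puts("resting, done");
    return (step){ NULL };
}

/* The engine: a flat loop, so no stack growth between steps. */
void engine_run(agent *a, step start)
{
    while (start.fn)
        start = start.fn(a);
}

int main(void)
{
    agent a = { 3 };
    engine_run(&a, (step){ step_work });
    return 0;
}
```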

In that line of thought, consider that a state is basically any identifiable place, state of mind, or state in a computer; having an apple in your hand could be considered a state. A large problem in AI is how you make computers reason from one state to another. Generally this is done with a pre-defined knowledge base that defines all the rules before anything is executed. Such machines or robots become applicable to one thing only (whatever is in the knowledge base) and not much else.

Now, a start state is known: maybe it's idle, or being hungry, or some pro-active state where the AI is trying to achieve something (possibly governed by some emotion engine?). The interaction of several state machines together would be really interesting here. The idea is to get from the "start" state to the "goal" state. If the computer simulated in its engine how it could get from start to goal by going over the transitions, it might find different ways of achieving its objective. If transitions have costs associated with them, the AI could reason about the cheapest way to achieve the objective, as in the sketch below.
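A hypothetical sketch of that cost-based reasoning: states as nodes, transitions as weighted edges, and a uniform-cost (Dijkstra) search picking the cheapest route from start to goal. The graph, names and costs are made up:

```c
#include <stdio.h>
#include <limits.h>

enum { N = 4 };
static const char *name[N] = { "start", "detour", "shortcut", "goal" };

/* cost[i][j] > 0 means a transition i -> j exists with that cost */
static const int cost[N][N] = {
    /* start detour shortcut goal */
    {  0,    1,     4,       0 },  /* start    */
    {  0,    0,     0,       5 },  /* detour   */
    {  0,    0,     0,       1 },  /* shortcut */
    {  0,    0,     0,       0 },  /* goal     */
};

int main(void)
{
    int dist[N], prev[N], done[N] = { 0 };
    for (int i = 0; i < N; i++) { dist[i] = INT_MAX; prev[i] = -1; }
    dist[0] = 0;

    for (int iter = 0; iter < N; iter++) {
        int u = -1;
        for (int i = 0; i < N; i++)   /* pick the cheapest unvisited state */
            if (!done[i] && dist[i] != INT_MAX && (u < 0 || dist[i] < dist[u]))
                u = i;
        if (u < 0) break;
        done[u] = 1;
        for (int v = 0; v < N; v++)   /* relax outgoing transitions */
            if (cost[u][v] > 0 && dist[u] + cost[u][v] < dist[v]) {
                dist[v] = dist[u] + cost[u][v];
                prev[v] = u;
            }
    }

    printf("cheapest cost to goal: %d\npath (reversed): ", dist[N - 1]);
    for (int v = N - 1; v != -1; v = prev[v])
        printf("%s%s", name[v], prev[v] != -1 ? " <- " : "\n");
    return 0;
}
```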

Taking a transition also means using internal resources; it isn't necessarily a trivial task. A robot could be in a start state somewhere identifiable in the current space and deem it necessary to move to another location, thus another state. The current focus then becomes how to get from start -> location. The transition to do that is movement, and movement is concerned with finding a path from start -> location, possibly using intermediate states along the way. If the transition finds through observation that everything is blocked, it may decide to panic, or find other (more costly?) ways to attempt.

What is described here is a design for a flexible reasoning engine that depends fully on (a combination of) state machines, which execute snippets of code in between its reasoning processes. Combine this with a language shareable with other robots and human beings (an interactive terminal?) and the computer could start asking questions...:
  1. Q: "how start->goal?"
  2. A: "apply movement transition"
  3. ( robot downloads movement knowledge base and code? )
A basic scenario involves a monkey, a box, a cage with three prescribed locations, and a banana. The objective for the monkey is to grab the banana, which it can only do if the box is in the middle of the room, the monkey is standing on the box, and it reaches out to grab the banana. This is a reasoning problem; the details and specifics of actually executing those actions belong to a different domain. Actually, the interesting part would be to communicate to other modules of an AI that something is an objective and leave it to sensors and other machinery to carry out the specific task. When those modules all agree the task has been executed, they communicate this back to the reasoning module, which is now confirmed in the new state.

Ragel doesn't just apply actions when taking a transition; it can also do so when entering and leaving states. This allows for some more flavours of interestingness. The idea is that an AI should be able to dynamically extend its knowledge base (which a couple of implementations do), ideally through communication in a simple, non-ambiguous language to exchange those knowledge gems.
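In plain C, per-state entry and exit hooks might look like this hand-rolled sketch (mimicking the effect of ragel's entering/leaving action embeddings, not its syntax; the states and hooks are toy examples):

```c
#include <stdio.h>

enum state { S_IDLE, S_BUSY, S_COUNT };

static void enter_busy(void) { puts("enter BUSY: acquire resources"); }
static void leave_busy(void) { puts("leave BUSY: release resources"); }

typedef void (*hook)(void);
static const hook on_enter[S_COUNT] = { NULL, enter_busy };
static const hook on_leave[S_COUNT] = { NULL, leave_busy };

static enum state transition(enum state from, enum state to)
{
    if (on_leave[from]) on_leave[from]();   /* leaving-state action */
    if (on_enter[to])   on_enter[to]();     /* entering-state action */
    return to;
}

int main(void)
{
    enum state cs = S_IDLE;
    cs = transition(cs, S_BUSY);   /* fires enter_busy */
    cs = transition(cs, S_IDLE);   /* fires leave_busy */
    return cs == S_IDLE ? 0 : 1;
}
```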

In the example of the monkey above, a goal state could be to "have the banana". The computer doesn't initially know how to get into that state, so it needs to work out the following:
  1. how to grab a banana
  2. how to reach out for a banana
  3. how to climb on a box (and whether it is strong enough to support the monkey)
  4. whether the monkey robot is tall enough to reach the banana without the box
  5. how to move the box around the room (and that the monkey cannot be on the box to do this).
  6. whether the box is in the middle of the room
Using these states, you can draw a state diagram of actions to be executed in a certain order. Eventually, if you leave the reasoning to the computer, it should arrive at the sequence of actions that is least costly to execute (the shortest way to get there), and that is what it should try, as in the sketch below.
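A toy encoding of the scenario as a state search. The three locations, the action set and the unit costs are my own assumptions; with unit costs, a breadth-first search finds the cheapest (shortest) plan, which comes out as: walk to the box, push it to the middle, climb on it, grab the banana.

```c
#include <stdio.h>
#include <string.h>

/* state = (monkey loc 0..2, box loc 0..2, on box 0/1, has banana 0/1) */
#define ENC(m, b, on, has) ((((m) * 3 + (b)) * 2 + (on)) * 2 + (has))
#define NSTATES (3 * 3 * 2 * 2)

static int prev[NSTATES];       /* BFS parent, -1 = unvisited */
static char act[NSTATES][24];   /* action that reached each state */

static void expand(int s, int m, int b, int on, int has,
                   int *queue, int *tail)
{
    int cand[8], n = 0;
    char names[8][24];

    if (!on) {
        for (int to = 0; to < 3; to++)           /* walk anywhere */
            if (to != m) {
                cand[n] = ENC(to, b, 0, has);
                sprintf(names[n++], "walk to %d", to);
            }
        if (m == b) {                            /* at the box */
            for (int to = 0; to < 3; to++)       /* push it along */
                if (to != b) {
                    cand[n] = ENC(to, to, 0, has);
                    sprintf(names[n++], "push box to %d", to);
                }
            cand[n] = ENC(m, b, 1, has);
            sprintf(names[n++], "climb on box");
        }
    } else if (b == 1 && !has) {                 /* box in the middle */
        cand[n] = ENC(m, b, 1, 1);
        sprintf(names[n++], "grab banana");
    }

    for (int i = 0; i < n; i++)
        if (prev[cand[i]] == -1) {
            prev[cand[i]] = s;
            strcpy(act[cand[i]], names[i]);
            queue[(*tail)++] = cand[i];
        }
}

int main(void)
{
    int queue[NSTATES], head = 0, tail = 0;
    int start = ENC(0, 2, 0, 0);   /* monkey at loc 0, box at loc 2 */

    memset(prev, -1, sizeof prev);
    prev[start] = start;
    queue[tail++] = start;

    while (head < tail) {
        int s = queue[head++];
        int has = s & 1, on = (s >> 1) & 1;
        int b = (s >> 2) % 3, m = (s >> 2) / 3;
        if (has) {                 /* goal reached: print the plan */
            char plan[16][24]; int n = 0;
            for (int v = s; v != start; v = prev[v])
                strcpy(plan[n++], act[v]);
            while (n--) printf("%s\n", plan[n]);
            return 0;
        }
        expand(s, m, b, on, has, queue, &tail);
    }
    puts("no plan found");
    return 1;
}
```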
