Tuesday, November 30, 2010

Optimization in scheduling problems

The CSP solver talked about before is an effective way to calculate all possible solutions. Some problems however are either underconstrained or just have a very large number of possible solutions. In those cases, exploring the entire solution space may be very expensive, or certain application areas may be subject to particular time windows in which the solution must be produced (dynamic scheduling problems). Even in dynamic scheduling, there are differences in the exact time window that is allowed. A scheduling system for shipping berths, for example, has a different allowed time window than a scheduling system for elevators.

The image shown here is a 'fitness' landscape. It's a bit of an imaginary landscape that displays how effective or costly a solution is. The peaks are desirable solutions, whereas the valleys are much less desirable. In this case, it's a one-dimensional graph, but for other solution spaces it could well be more of a 2D or even 3D space. The number of dimensions for the fitness landscape is actually determined by the number of cost functions you have.

This is where we get to the optimization part. A cost function is a function that determines the cost of a range of tuples that are (part of) a solution. This cost function is what guides the desirability of a certain solution and that is what basically drives the optimization. Ideally, one would like to find the best solution first. In reality, the best solution doesn't come first without a bit of back-tracking. Even worse, there is no guarantee in practical planners that the best solution will be found, only a solution that is 'good enough' given a certain threshold.

If you've paid attention, you'll have noticed that the 'cost function' described so far only calculates the cost of a particular partial solution. So how does one actually steer towards very good solutions, rather than trying out each possibility in turn until we're happy with what we have? This is why the cost function must be paired with a heuristic function. The cost function is used to calculate the actual cost/benefit of the tuples chosen so far. The heuristic function is used to estimate the expected cost or benefit of using a valuation for a particular domain variable over a slightly further horizon.
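To make the distinction concrete, here is a minimal Python sketch (hypothetical names and data shapes, not the planner discussed here): the cost function scores what has already been placed, while the heuristic orders candidate values by their expected effect a little further ahead.

def cost(chosen_tuples):
    # Actual cost of the partial solution built so far (lower is better).
    return sum(t["cost"] for t in chosen_tuples)

def heuristic(candidate, remaining_domains):
    # Expected cost of committing to this candidate now: its own cost plus
    # the cheapest options left in the other domains once it excludes some values.
    estimate = candidate["cost"]
    for domain in remaining_domains:
        options = [v["cost"] for v in domain if v not in candidate.get("excludes", [])]
        if options:
            estimate += min(options)
    return estimate

def order_candidates(candidates, remaining_domains):
    # Try the most promising value first; the cost function later confirms
    # (or refutes) the estimate once the tuples are actually placed.
    return sorted(candidates, key=lambda c: heuristic(c, remaining_domains))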

This is not all there is to it. The cost functions do not have to be commensurate or agree with one another. Sometimes a particular tuple choice is good for cost function A, but very bad for function B. So the balance between the relative importance of the cost functions must be determined before starting the solution process. Most cost functions are fairly linear, but it's also possible that continually preferring solutions that favour function A raises the costs non-linearly, such that further into the solution space function B becomes cheaper than function A. Those cases are very difficult to solve.

There are also situations in which you don't want the ideal solution, but want to allow for a certain flexibility overall. For example, in a schedule for berthing ships where you expect changes to occur, you might want to favour a solution where a minor change in the schedule doesn't flip the entire schedule around, but where the change is constrained to one or two flips between tuplizations (because the more tuples are flipped due to the change, the more costly the change becomes).

Optimization is rather difficult to do effectively, and these optimization problems are NP-hard in general. So either you arrange the computation power to go through all solutions, or you find better methods to reach good solutions. There are two ways to do the optimization: you could optimize at the same time as solving the CSP (basically, define the ordering of values for a variable as tuples are picked), or generate some schedule that is more or less good and then try to improve it through local search.

Local search basically means that you take a complete schedule, pick two tuples of a set and flip their valuations around (preferably without violating the constraints), in the hope of finding a schedule that is slightly better. Local search, however, if you look at the image, is a bit tricky, because you may have to force the solution through a valley. And how can you tell whether the solution, when climbing another hill, is actually going to reach a higher peak than the one you had before? And if you're just going to try it out, how do you prevent exploring the entire solution space, which is what you wanted to avoid in the first place?
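As an illustration, here is a hedged sketch of such a local search step (the schedule representation, cost and violates are stand-ins, not an actual berth scheduler): swap the valuations of two tuples and keep the swap only if no hard constraint breaks and the cost does not get worse.

import random

def local_search(schedule, cost, violates, steps=1000):
    # Hill climbing by pairwise swaps: 'schedule' is a list of valuations,
    # 'cost' scores a whole schedule, 'violates' checks the hard constraints.
    best = list(schedule)
    best_cost = cost(best)
    for _ in range(steps):
        i, j = random.sample(range(len(best)), 2)
        candidate = list(best)
        candidate[i], candidate[j] = candidate[j], candidate[i]   # flip two valuations
        if violates(candidate):
            continue
        c = cost(candidate)
        if c <= best_cost:          # accept sideways moves, never worse ones
            best, best_cost = candidate, c
    return best, best_cost

Accepting only non-worsening swaps is exactly what strands the search on a local peak; letting it occasionally accept a worse schedule (simulated-annealing style) is one common way to force it through a valley without mapping the whole space.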

Basically, optimization is a bit like navigating a landscape in fog. You get only slight clues that there's a hill or a valley, and at some point you must decide whether to carry on downhill to eventually find another monster peak, or to stop searching altogether and call what you have your best solution. That is very difficult to do without a map, and constructing the map is equivalent to calculating the entire solution space.

Sunday, November 21, 2010

Sudoku, CSP and playing with sets

Sudoku is solvable as a constraint satisfaction problem. It is described as a problem where you must populate a grid of 9x9 tiles with numbers ranging from 1-9 in such a way that there are no duplicate numbers in a row, column or local tile-block. Because an empty grid would allow you to derive any combination imaginable, the puzzle starts with a reasonable number of values already filled in, usually somewhere in the thirties.

A naive approach to solving the puzzle uses a brute-force-like mechanism: start from the first empty tile, determine which valuations can be made, continue to the next, and so on. A much better and cleverer way to solve the puzzle is to find places where only one value is possible, fill this in and see if this modifies the state space such that another single valuation can be made in a different location.

I've taken some time to solve this puzzle using the CSP solver that is running on the database. The first, naive approach took over an hour and still did not return a single solution. The second approach however worked out to a solution in under three seconds. It can solve very easy puzzles as shown here, but also the hardest of them; harder ones just take slightly longer, because there is less information. Here's how my state space is organized and how the queries operate:
  1. Create a table to denote the grid. Each record denotes a single tile as a column, row combination and also records to which block that tile belongs.
  2. Create another table that records per record the valuation for that single tile, the tuple.
  3. Inspect the puzzle and create tuple records for each already given number. These constrain the valuations you can make from there.
  4. Then launch this query every time there's still some tile left to evaluate:
-- For every still-empty cell, count how many values can legally be assigned,
-- given the tuples (assignments) made so far.
select sc.col, sc.row, count(*) as count
from sudoku_cell sc
left join tuple t on sc.col = t.col and sc.row = t.row
cross join my_values mv
where t.assignment is null   -- only cells that have no tuple yet
and mv.some_assignment not in ( select t1.assignment from tuple t1 where t1.col = sc.col )
and mv.some_assignment not in ( select t2.assignment from tuple t2 where t2.row = sc.row )
and mv.some_assignment not in ( select t3.assignment from tuple t3 where t3.block = sc.block )
group by sc.col, sc.row
order by count asc;
The result is an ordering of tiles, described by the number of valuations that can be made for each specific tile, given the current state of the puzzle. Of course, a tile for which only one valuation can be made brings you one step closer to the solution.
How does Sudoku help us to understand human reasoning better and how computer algorithms work in comparison to human reasoning?
All this work with SQL got me thinking about the method that computers use in reasoning, from the perspective of set theory. Reasoning in computers is highly dependent on set theory, especially when you use SQL like I do. Logic in computer programs basically comes down to running sets through algorithms that determine whether a member is or is not part of a set (unions, intersections, etc.). Some sets are made to relate to one another through attributes or keys of the members of each set.

When you want to do anything useful in SQL, you need discriminators that determine the scope of your set, and then operations from one set on another. Sometimes the relationship is a parent-child or a direct relationship from one member to another, so you can join the elements together to form a superset. Sometimes you wish to eliminate the values that are present in another set without attempting a full join. A lot of logic in SQL revolves around the use of "WHERE x NOT IN" or "WHERE x IN" kinds of statements. So how does this all work in general? No matter what the query, you typically start out in a form like:
SELECT x, y, z, ... FROM abc;
and very likely the next improvement to this query is an attached WHERE clause, which reduces the returned set to those rows that do not violate the conditions. Another mechanism used for querying is to join tables together on some suggested relation and, on the basis of that relationship, further limit the returned results (mostly on attributes in the rows of the related table). This allows you to make rather complicated queries like: "only return those cats that have more than 5 kittens" or "query all orders that have line items higher than threshold X, created between now and three months ago".

The reason for mentioning this is that it may help in the future to construct queries. If one set is empty, it cannot have any effect on another set, because the two are fully disjoint. So in order to do anything useful, your query must return at least one row.

A tougher example is what I had to do for the Zebra puzzle. There are cases where you have a rule that discriminates members of a set, but there may also be no rule at all, so you can't depend on any useful result being returned from the rule query. The idea is then to return all rows unless there's a rule that dictates some should not be returned. The eventual query for solving the puzzle turned out to be a very complex beast with multiple IN and NOT IN statements. It's probably helpful to describe its conditions a bit here (a set-based sketch of the same conditions follows the list):

SELECT all colors that can be assigned to the current house being considered, where
  1. the color is not already used in a tuple,
  2. the color is not in any rule that specifically binds the color to a different house,
  3. If there's a rule that dictates the house I'm currently considering has a certain color (a set that is composed of one color), the color must be in that set (return 1). If no such rule exists, return all possible colors (so if no rule exists, allow any color as long as this particular rule is concerned),
  4. the color is not in a set, composed by rules that are active on houses to which I'm not a neighbor. So if a rule dictates that a house must be blue and I am not a neighbor to that house, then exclude blue from the set,
  5. the color is not in a set, composed by all colors that are determined by rules for which the rule dictates the neighbor of the currently considered house should have that color. So, if a rule dictates that my neighbor's house should be blue, then the currently considered house cannot be blue.
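In plain set terms, the same five conditions come down to set differences and one optional intersection. A hedged Python sketch (all the rule sets here are hypothetical inputs; the real SQL version is considerably longer):

def candidate_colors(all_colors, used_colors, bound_elsewhere, forced_here,
                     non_neighbor_rule_colors, neighbor_rule_colors):
    # Colors still assignable to the house under consideration.
    # Every argument is a set; empty sets simply don't constrain anything.
    candidates = set(all_colors)
    candidates -= used_colors               # 1. not already used in a tuple
    candidates -= bound_elsewhere           # 2. not bound by a rule to a different house
    if forced_here:                         # 3. a rule forces this house's color
        candidates &= forced_here
    candidates -= non_neighbor_rule_colors  # 4. colors ruled onto houses I'm not next to
    candidates -= neighbor_rule_colors      # 5. colors a rule gives to my neighbor
    return candidates

print(candidate_colors({"red", "green", "blue", "ivory", "yellow"},
                       used_colors={"red"}, bound_elsewhere={"ivory"},
                       forced_here=set(), non_neighbor_rule_colors=set(),
                       neighbor_rule_colors={"blue"}))
# e.g. {'green', 'yellow'}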
You can understand that such reasoning causes programs to explode in complexity. Logic interpreters like Prolog hide a lot of these details (backtracking, generation of possible values), but at some point you may want to start optimizing how sets are generated, which is where you run into similar complexity. And you can only be effective when you know exactly how the interpreter is handling things for you.

In Sudoku there's a similar issue that is easier to understand, but not often modeled in solvers. For example, given the specific sudoku puzzle above, the computer finds that in column 4, rows 1 and 7 each have one possible valuation. When I solved this puzzle by hand, however, I was not immediately drawn to those possibilities, because there are lots of openings around those tiles, so instinctively that's not a place that could lead to a quick inference. What I did notice very quickly is that tile (9,8) can only contain a 7, because columns 7 and 8 both already contain a 7 and the rest of the tile block is populated. The constraints that the computer is using, however, still allow a valuation of 1, 2, 3, 6 or 7 there.

This is the kind of knowledge experts have: reasoning very quickly over many different sets of data, instantly reasoning over intersections, unions and so on. There are some visual combinations of elements that are very quick for humans to analyze, but which pose challenges for computers. Those quick analyses are very interesting to look at, because they're mostly expert knowledge that is not easily converted into some problem statement or constraint:
For a specific tile block, determine the set of possible valuations in that tile block that are still to be made. This is set X. Then count for each valuation the number of tiles it can be legally assigned to in the tile block. If there is a valuation with only one legal tile allocation, this is the placement to make.
In order to understand this, we need to have an idea about the relationship of one tile with another; here, the relationship is that one tile is placed in the same block as another tile. Limiting domains is a good way to increase performance if the cost of doing so is less than traversing the possibilities by brute force. Such complicated relationships are not often modeled in reasoners due to their complexity, but as you can now see, they are readily and quickly used by human reasoners.
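A hedged sketch of the strategy quoted above, assuming a candidates dict that maps (col, row) to the set of values still legal for that tile (how that dict is kept up to date is a separate concern):

def hidden_single(block_tiles, candidates):
    # Return (tile, value) if some value fits exactly one empty tile of the
    # block, else None. 'block_tiles' is an iterable of (col, row) pairs,
    # 'candidates' maps (col, row) -> set of still-legal values.
    placements = {}
    for tile in block_tiles:
        for value in candidates.get(tile, set()):
            placements.setdefault(value, []).append(tile)
    for value, tiles in placements.items():
        if len(tiles) == 1:
            return tiles[0], value
    return None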

It's difficult to combine these results in one particular query or reasoning mechanism. Another way of looking at this is that they are different strategies for resolving the problem; the problem is only concerned with the tuples, and the more of them are found, the better. Trying out different constraint combinations and understanding how, at a higher level, these change the solution space is a very important concern for such problem solvers. A downside is that the state space and complexity go up significantly, but with the benefit of finding solutions much faster or much better than a standard CSP can guarantee.

Thursday, November 11, 2010

Constraint programming

I've blogged before about the difference between inductive and imperative programming. I've started to look more into constraint satisfaction problem (CSP) programming, which is actually able to provide you with solutions to a surprisingly larger number of problems than you'd initially think. Typical examples of CSP solvers include the textbook examples like the n-queens puzzle, the region coloring problem, the "SEND+MORE=MONEY" problem, the "Zebra" puzzle, the eight-number puzzle, Sudoku and so on. Real-life examples tend to go down the route of static scheduling problems. I explicitly mention "static" here, because dynamic scheduling problems are those with the potential to vary significantly over a very short amount of time. A famous dynamic scheduling problem is the choice of actions for a set of lifts in a busy skyscraper. There are expectations from lift users to be served within a certain amount of time (requiring insight into the required capacity at design time, of course), but the strategy of the elevators, per individual lift and as a group, also needs to be adjusted.

But I digress, because static scheduling problems are rather similar, just without explicit time constraints on calculation duration. Typically, the values that you consider for CSP solutions are in the numeric, boolean or finite domain (sets). For example, you could have a domain that dictates whether a truck is used or not (boolean), how many hours it would take to perform a transport (integer), or in which timeslot or time of day a particular transport is initiated (set/finite domain).

There are various approaches to CSP. Many academic research papers worry mostly about the algebra and theoretical notation, that is, the functional and semantic correctness of the algorithm. In other words, they mostly worry about whether constraints are violated or not, not necessarily about complexity or how long it would take to run the algorithm.

A naive approach starts from brute force: just run through every possible combination in your entire domain set, verify for each combination you come up with whether any constraint is violated, and then backtrack and try something else. This approach will take a very long time.
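In code, the naive approach is literally the cross product of all domains, with the constraints checked only on complete candidates (a sketch; domains and violates are placeholders):

from itertools import product

def brute_force(domains, violates):
    # domains: a list of lists of values; violates: a predicate on a full assignment.
    # Runtime is the product of all domain sizes, hence "a very long time".
    for assignment in product(*domains):
        if not violates(assignment):
            yield assignment

Even ten domains of ten values each already give 10^10 candidates, which is why the knowledge-driven orderings below matter.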

This is where it is useful to start applying as much knowledge as you have about your domain. I see the following concerns for resolving a CSP:
  • The primary concern is that the solution does not violate hard constraints.
  • The secondary concern is that it runs in reasonable time.
  • The tertiary concern is that it provides a good enough solution early on, ideally the first solution.
These concerns should be considered in the design and implementation. Combining variables from different domains can be called tuplization: the action of constructing a list of tuples which form the solution. The domains may have the following properties:
  • Some domains have fewer elements than others and invoke more constraints. Start with the domain that has few elements but most severely restricts future possibilities; backtracking can be reduced by the order in which you process domains.
  • There is usually one domain, usually the first domain that you consider, which you only need to traverse once. So, if you have a domain {A,B,C,D}, then taking A and finding all combinations where A is valid, then B, then C will find you all the solutions. There is no further need to process {B,A,C,D}, {C,A,B,D}, {D,A,B,C}, etc... So traverse this domain once only in the order that is given.
  • Variables in domains may be ordered by (contextual) desirability.
  • Some domains have variables that can be used more than once, sometimes even unlimited, although they usually have some relationship with other variables in other domains that eventually reduces this domain as well.
Can you implement a CSP solver with an imperative programming language? Yes, but there are considerations to make about the approach, most specifically the algorithm you wish to use for exploring the search space. I think CSP solvers that apply lots of knowledge benefit more from a depth-first algorithm than from a breadth-first one. The reason is that I do not think it is very effective to verify the constraints each and every time you wish to perform a selection; rather, you should maintain the state in memory as you pick the tuples (and roll those changes back afterwards).
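A hedged sketch of that depth-first idea (all helper functions are assumptions about your problem, not a fixed API): the state is updated when a value is picked and rolled back when the branch is abandoned, so constraints never have to be re-verified from scratch.

def solve(domains, state, assignable, apply_choice, undo_choice, partial=None):
    # Depth-first tuplization. 'domains' is the ordered list of domains,
    # 'assignable(state, domain, partial)' yields only the values that are still
    # legal given the current state, 'apply_choice'/'undo_choice' mutate and
    # roll back that state as tuples are picked and abandoned.
    partial = [] if partial is None else partial
    if len(partial) == len(domains):
        yield list(partial)                  # a complete tuplization
        return
    domain = domains[len(partial)]
    for value in assignable(state, domain, partial):
        apply_choice(state, value)           # keep the state in memory...
        partial.append(value)
        yield from solve(domains, state, assignable, apply_choice, undo_choice, partial)
        partial.pop()
        undo_choice(state, value)            # ...and roll it back while backtracking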

For example, consider a scheduling problem for a university. You have classrooms, professors, timeslots (hours) and students. These domains need to be combined in such a way that:
  1. Every student can follow all the registered courses
  2. All courses are taught the correct number of times per week
  3. A professor is teaching a course, but never teaching two courses at once
  4. A classroom is not used for two different courses at once
  5. There may be a preference for the time of day a course is taught
  6. Ideally, there are no hours between courses for any student on any given day (courses taught are consecutive)
  7. Some professors may have registered time off on particular days
  8. Some professors may indicate preferences for time off, but not as a hard constraint
  9. There may be preferences for using classrooms that are close together for consecutive courses
  10. Classrooms are typically used for particular departments of study
  11. Some classrooms are available for exceptions only, but are typically to be avoided (auditorium)
So the CSP here has a number of hard constraints (things that, when combined, would invalidate the overall solution), but also a number of preferences (things that bear some cost when they have to be used, but can be borne or considered if there is no cheaper alternative).

Given the above constraints, it is probably cheaper to maintain a large 'status list' for each room, professor, timeslot and student, such that it can easily be verified whether a tuplization, given the current state of the solution, would violate a constraint or not. Better yet, with a simple query it is now possible to select only those variables from a domain which are still assignable, given the current choice of variables in another domain.

Many CSP solvers use C++ or Java constructs with huge libraries to maintain this state for you and attempt to expose a usable, although probably not complete, interface for you to manipulate. A CSP engine should leave the manipulation of the state space (the modifications you think you should perform to maintain a correct view of the situation) external to the engine, but guide the user through the process of performing the tuplization. Or rather, it is fairly easy to develop an engine that lets an application engineer plug in a couple of domains, or functions that provide them at every tuplization step, with pluggable points to perform changes when a tuple has been selected and, of course, steps to undo those changes.

Now connect a CSP engine to an SQL database and you can use the functionality of the database engine (indexing, transaction management, transaction isolation, tables, type checking, flushing to disk, etc.). The SQL then allows you to query, at each tuplization step, what the remaining elements are for each domain. You only need to order those elements to improve the desirability of the selection you are making.

There are no guarantees of finding optimal solutions here, so the ordering and other heuristics are still guesses overall (guesses that a selection will not have severe negative effects on possible future selections, increasing the cost significantly). But you do know that the selection process only offers those variables from a domain that can actually be assigned sensibly, given the current partial assignment of variables in the tuple. This is why the order in which you process domains is important: it has consequences for how much knowledge you can apply in the model and for where it can be applied.

Monday, November 08, 2010

Neural Dynamics

One of my main interests is understanding the dynamics within and between neurons that allow them to tune in to behaviour which is in the interest of the organism at a higher level. In previous posts, I discussed a mechanism by which these neurons are excited by neurotransmitters released by a pre-synaptic neuron, mostly through the regulation of ionic fluxes (currents) through ion channels in the membrane.

The problem with these channels is that they are not linear in nature: the opening of channels causes, roughly exponentially, the opening (or inhibition) of yet more channels.

Although these sound like great mechanisms as a basis for intelligence, they are only the basis of operations for transmitting a signal from one end to another. Moreover, the channels do not all behave the same way, because each is open for a different amount of time. So the voltage being measured is the voltage resulting from the sum of channels opening and generating some current. Each channel by itself can be associated with a particular probability of opening or closing, and for a certain amount of time. Only at a macro level, when you assume, say, 1,000 channels, does it look like it follows some binomial or other probability distribution (which is also what generates the nonlinear behavior at a macro level).

So the fundamentals of neuroscience are providing a slightly more complete picture of the entire dynamics at work. While the ion channels provide a mechanism for propagation, summing and integration of signals, that does not yet explain a useful integration and propagation as such. The synapse is still a very mysterious field of research with mostly speculation. It's still very much a mystery whether the famous Hebbian rule of "fire together, wire together" is a reality or more of a probabilistic phenomenon observed most of the time (but where the real dynamics are slightly more complicated).

Interesting discourses related to the integrative power of neurons can be found in the connections of axons from several pre-synaptic neurons towards the dendritic tree of a post-synaptic one. Most neural models connect the axon more or less directly to the neuron cell body, thereby only allowing for the summation and integration of signals at the level of the soma, where all signals come together. The branches in the dendritic tree, however, may already perform a couple of integrations before the voltage potential reaches (or doesn't reach) the ion channels near the soma, where presumably the action potential is actually triggered and propagated.

The result is that some signals, due to their proximity, can lead to very different results when added together at a local level rather than at a global level. The underlying reason is that an inhibitory signal can entirely cancel out another signal of a different magnitude (so the result is not necessarily the mathematically correct sum). A good way to look at these effects is through formulas. Integration at the level of the soma would equate to a formula like:

Y = A - B + C + D - E

But integration at branching points in the dendritic tree can cancel out any prior summation of positive signals just by applying a signal of similar magnitude at a point higher up (lower down?) in the branch, so you could get:

( C + D - F - G + H + K + L - M - ....... whatever .... ) - E = 0
Y = A - B

The difference here is that the placement of connections in the dendritic tree matters, and that there is only a certain range available for excitations or inhibitions. So besides 'just' integrating signals, this suggests that certain more or less 'logical' operations may also exist, which can be very useful for more complex separations of data and signals. It must be said that this tree is not necessarily strictly "logical", as some effects have been noted where activations at the remotest end of the dendritic tree cause increases or decreases of the action potential at about 200-300 um into the axon. So the real effects of how neurons connect to one another's trees are still food for thought.
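A toy numerical illustration of that difference (nothing biophysical; the clamp at zero just stands in for a branch whose contribution is locally cancelled before it ever reaches the soma):

def soma_only(signals):
    # Flat integration at the soma: a plain sum of everything.
    return sum(signals)

def branch_first(branch_signals, branch_inhibition, other_signals):
    # The branch integrates first; a local inhibitory input of similar
    # magnitude can wipe out the whole branch total before it reaches the soma.
    branch_total = max(0.0, sum(branch_signals) - branch_inhibition)
    return branch_total + sum(other_signals)

branch = [2.0, 3.0, 1.5]                       # C, D, H ... on one dendritic branch
print(soma_only(branch + [4.0, -3.0]))         # A - B added globally -> 7.5
print(branch_first(branch, 6.5, [4.0, -3.0]))  # branch cancelled by E -> 1.0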

All this is speculation of course. The most useful thing that a neural algorithm needs is a better learning algorithm. Most of the algorithms in classical neural nets are back-propagating ones: you measure the error at the output, then back-propagate this error through the network to follow some kind of gradient. You need the derivatives of the neuron activation functions for this to happen.

If you wanted this to work in nature, you'd need a second brain to calculate the error and then apply it inside the network. Hardly plausible. Somewhere a mechanism is at work which has local knowledge only and which adjusts the synaptic efficacy accordingly. This adjustment takes a certain amount of time and is highly regulated by long-term potentiation (the ability of a neuron to become more excitable over a longer period of time). It seems that, all in all, there are certain time windows in which a neuron starts to synthesize proteins, which eventually modify the behavior of the neuron, change its structure or otherwise influence the synaptic gaps (by making more synaptic vesicles, for example).

Saturday, October 16, 2010

ZeroMQ

I'm evaluating a couple of products for a larger project at work. The project will use at some point some kind of distributed storage, which may possibly be filled by the "ceph" project. Although it is experimental, there are a good couple of features that work already. For these evaluations, I've looked at some other work that has been done in this area. Most notably of course is the work that Google has done (GFS, Chubby, Percolator and BigTable). Some of the concepts described there are also implemented in Ceph, most specifically the Paxos algorithm for reliable replication.

Distributed storage is great for anything you'd like to query and which should be available 3-4 seconds after the data is "created" (although it's very likely created earlier). Distributed storage though is not a very good basis for broadcasting/multicasting/unicasting data as fast as possible after it was produced, because clients end up polling the file system for changes. For efficiency's sake and your own sanity, it is then better to use some kind of message queue. Two main architectures here: "bus" and "broker". The bus is basically a local subnet broadcast, where everything attached to the bus will have to parse messages and discard them if there is no interest. The broker(s) can be configured to receive messages from producers and route them through some broker network, reducing the number of times messages are copied and thus get re-transmitted through a separate link. Buses are useful only if you want all servers to know about the data, or the majority of them. Brokers are useful if you have geographically dispersed locations, want full control on how data is retransmitted, etc... Some brokers can also act more like "hubs", if you follow the "pubsubhubbub" project.

Sidenote: if you can avoid 'realtime' and message queues altogether and do it through some file system, I'd recommend it. The concepts are simpler, it's easier to deal with faults, and should you need any group locks, then the file system, even the distributed one called ceph, will be able to provide those for you. An MQ puts the onus on the client for data caching, storing, reordering and all that. So file-based applications are always easier to write (at least from the client perspective :).

A quite complete evaluation of message queues was made by SecondLife. Note what is said about "AMQP". I believe that standards should evolve from things that work and not the other way around. For more information, check out:

http://bhavin.directi.com/selecting-a-message-queue-amqp-or-zeromq/

http://www.imatix.com/articles:whats-wrong-with-amqp

I've downloaded 0MQ and decided to give it a spin. The first thing I did was visit the IRC channel on a Saturday, and there were 50 people in there. That's usually a very good sign that it's quite popular, and there's a good chance that 0MQ will be getting some serious attention soon. It may not be the absolute topper in raw performance, but it may be best if you're looking for flexible solutions, prototyping and experimentation. Having said that, if you're not running a super-duper high-performance site that needs to squeeze all the juice out of your CPUs and NICs, ZeroMQ will do just fine for you, no problems whatsoever.

To set the expectations right... it's a bit more of a library for internetworking your applications using multicast/tcp/ipc protocols, but that's already a big gain if there's a library that does it for you. This is typically a layer where many developers think they want something special, but it ends up being the same requirement as everybody else's. Relying on a 3rd-party lib that's widely used everywhere, and thus tested, is a very good decision. The only thing you end up writing is the application- or service-specific implementation and the message marshalling stuff. The library does the rest.

For anyone doing enterprise messaging, 0MQ won't give you much yet. There are no tools to replay messages, retrieve them from error logs and what have you (or development tools). You may need something like OpenAMQ or RabbitMQ instead. To be honest, though, if you develop everything in house, you do not necessarily need anything more complicated than 0MQ.

Many services nowadays use XML/XSD to specify message formats. My main gripe with XML is that for too many people it's about dealing with format instead of function. The issues with namespaces and all these other things make XML a really big beast to handle. Read up on message matching to get some ideas of where this goes wrong. Delayed brokers eventually slow down your entire business, make brokers unreliable, cause delays in transactions that need to go fast, or negatively impact other processes that run on the same machine, requiring you to set up two brokers with double the probability that something goes wrong... etc. Better to get rid of it, move on, separate 'meaning' from 'transmission' and then verify things later. There is no need to encapsulate each message with the entire ontology every time a message is sent. I have a couple of interesting ideas in this respect that I'll share later, perhaps as a research publication.

Update: 0MQ isn't without 'issues', but as far as I could determine so far there aren't any serious bugs in the code like memory leaks or reconnect problems. I found two issues so far:
  • Messages are not being throttled at the server side. If you look at the weather server example, which sends a lot of messages as fast as possible, under the hood these messages actually get piled up in a list for the I/O subsystem to transfer asynchronously. If the I/O subsystem simply cannot keep up, you'll deplete your memory resources very quickly. The solution is to add high-water marks or swaps, or to use other methods for throttling the throughput of messages.
  • zmq_poll works, but because it assumes you're listening on multiple sockets, the idea is to visit each socket pool as fast as possible to collect any new messages that may have arrived. You then get one message per call and some kind of interleaving between the socket pools (if they're both really fast). Although this sounds nice, the underlying code assumes that you do not want to sleep or otherwise give back CPU time to other processes. So, when you start 10 clients as in the examples implementing zmq_poll, no client will yield and they all try to get 100% CPU time. If your client app can deal with it, the workaround is to embed a simple sleep of a few milliseconds; I recommend sleeping only when the last cycle didn't actually read any messages (no event was set). A sketch of both workarounds follows this list.
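A minimal pyzmq sketch of both workarounds (port number, topic and message handling are placeholders; in practice the two halves would live in separate processes, like the weather server and client): cap the send queue with a high-water mark, and give the CPU back when a poll cycle comes up empty.

import time
import zmq

ctx = zmq.Context()

# Producer side: set a high-water mark so messages stop piling up in memory
# when the I/O subsystem cannot keep up.
pub = ctx.socket(zmq.PUB)
pub.set_hwm(1000)
pub.bind("tcp://*:5556")

# Consumer side: poll with a timeout and yield the CPU when nothing arrived.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "")
poller = zmq.Poller()
poller.register(sub, zmq.POLLIN)

while True:
    events = dict(poller.poll(timeout=10))    # milliseconds
    if sub in events:
        message = sub.recv_string()           # handle the message here
    else:
        time.sleep(0.005)                     # last cycle read nothing: back off briefly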

Sunday, October 10, 2010

Ion channels of excitable membranes

I've just finished reading a book about ion channels in excitable membranes. Some types of excitable membranes are neurons, hence the interest. The kinetics of these types of cells are really interesting to try to understand. The more complicated or integral parts of sensory processing (and what this means) are performed in such cells.

The different types of channels, and subtypes thereof, have over time evolved by responding to particular messages, or have refined their function to respond better to what is happening in the environment. As this blog is not academic, I can voice some gut feelings / hypotheses here without bothering too much about scientific grounding and such. Most of the time, people philosophize over the function of the brain using analogies with existing machinery, comparing it to a computer and so on, but most of the time it becomes really difficult to trace this down to actual cell function.

I think computer scientists made a bit of a mistake in making sweeping generalizations over the function of a neuron some 60 years ago. The attempt was made to map the function of neurons to something that could run on computer hardware. It started with a model where neurons, because they emitted a potential (a pulse?!), could be compared to logic building blocks which emit either a 0 or a 1. Just like electronics. The "1" in electronics is basically the 5V you'd normally put on a component to communicate a signal.

At some point, the "binary" neuron made way for the sigmoid neuron (so the 'processing kernel' was simply replaced), and this led to networks that were slightly more capable of performing some kind of function. At the moment, a lot of research is undertaken on Spiking Neural Nets (SNN), but there's still no ground-breaking success with those networks.

I think the reason is that classical nets didn't pose any constraints on the timing of events, because they were mostly used for situations in which each state was information-complete and where it was possible to derive the entire output from the combination of inputs (or to make the output some approximation; as long as it worked, it was fine). Spiking nets do require specific timing constraints, and there's not enough knowledge yet about how such neurons should behave over both the signal domain and time to respond appropriately.

The refinement of using a sigmoid kernel (or tanh kernel) over a binary one is an important one already. The binary kernel was so abrupt in its change that it was only useful in very specific situations. The sigmoid kernels are more useful, because you can generate proper outputs over a larger dynamic range. But the possibilities of replacing the kernel haven't been fully exploited yet.

The ion channels of membranes come in roughly three types:
- excitatory (generating the potential)
- regulatory / rectifying (attempting to establish equilibrium)
- modulatory, or acting over a longer time period, having an effect on the excitability of the cell

Classical nets are generally built with some assumed kernel in place, with all kernels having the same configuration or parameters. The modulatory effects are not modeled in any way, and the calibration of the network is focused entirely on the calculation of the weights, assumed to represent the efficiency of neurotransmitter transfer in the synapse.

The book I am reading demonstrates that the fast excitatory and regulatory actions themselves can be modulated somehow (this is like saying that the sigmoid function changes its shape), and that over time, or over multiple "calculations", this shape can also be modified on the basis of the output of this shape in previous steps (so the output of the kernel is changed as a function of the output of the same kernel in previous steps, through some kind of accumulation function).
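Purely as an illustration of that idea (a toy, not a biophysical model): a sigmoid kernel whose gain drifts as a function of an accumulator over its own previous outputs, so that sustained firing changes the shape of the response.

import math

class AdaptiveKernel:
    # Sigmoid kernel whose excitability is modulated by an accumulation of its
    # own recent outputs, a stand-in for slow modulatory channel effects.
    def __init__(self, base_gain=1.0, adapt_rate=0.1, decay=0.95):
        self.base_gain = base_gain
        self.adapt_rate = adapt_rate
        self.decay = decay
        self.accumulator = 0.0
        self.gain = base_gain

    def __call__(self, x):
        y = 1.0 / (1.0 + math.exp(-self.gain * x))
        # Accumulate recent output and let it pull the gain down a little,
        # so repeated firing makes the kernel respond differently next time.
        self.accumulator = self.decay * self.accumulator + y
        self.gain = max(0.1, self.base_gain - self.adapt_rate * self.accumulator)
        return y

k = AdaptiveKernel()
print([round(k(2.0), 3) for _ in range(5)])   # the same input, a drifting response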

Classical nets are very useful as an alternative, mathematical model for very complicated problems, but their abilities may be enhanced by looking at biological models in more detail. Neurons do not always function the same way: you need to look at their behavior over time to see the longer-term effects of repetitive firing, and the way other inputs (neurotransmitters) may influence their behavior and either increase, decrease or even suspend their excitability.

However, I am convinced that networks that need to perform real-time tasks, or other tasks where time is explicitly a concern, can only be built with networks in which this type of modulation and excitation is explicitly built in. The classical nets were never built from this perspective, but they remain interesting for classification or historical data analysis. The controllers (robotics?) of tomorrow should attempt to understand more of the dynamics and kinetics of the ion channels in neurons and the ability of neurons to time things.

Tuesday, September 14, 2010

The "von Neumann" machine & parallel processing

It's been a while since I last posted a message. I've been thinking a lot about machine architectures and the most prevalent one, the "von Neumann" architecture (see right). This architecture basically serves the Turing machine very well. The Turing machine is more of a philosophical model, whereas the von Neumann architecture is a specification for the implementation of one. The CPU here is a serial processing unit which, any way you put it, never executes more than one instruction at any given time, or per clock cycle. The CPU then has a couple of pins available that are connected to outside devices or buses. Sometimes the buses don't really contain anything, but you could put some PCI, PCIe, ISA or other kinds of electronics on them, or you could attach some device to a serial, parallel, USB or FireWire port, and the more advanced operating systems would then allow the automatic registration of these devices and the subsequent allocation of address spaces and such (through a driver and a hardware-specific communication protocol like USB has).

The architecture above is very well suited for mathematical calculations, because you could write programs that are expected to execute code in a serial way and this architecture would per definition execute those commands one after the other, store the intermediate results in memory, pick them up again at a later stage, post-process them with other results, produce output to screen, write it to disk, send it over the internet and whatever other "logic" you dare to endeavour in.

The above describes a regular PC, where everything is deterministic. Its very architecture makes it very hard, if not impossible, to write programs on it that behave in ways you would not be able to explain, assuming you have the tools available to verify the signals and so on (so it's a theoretical statement). This deterministic processing is what makes the design viable. It is by nature serial, because the CPU must retrieve one instruction to execute, fetch something from memory, store it back into memory, and do all of this in the correct order for the program to function correctly.

One architecture that is too complex for us to design is one where the CPU is no longer processing one instruction per clock cycle, one after the other in serial fashion, but a CPU where you can have a very high number of processing units and memory is not in one place, but distributed around this network as well.

Most applications that we write nowadays are data-driven, and I think more and more applications are being written in a data-driven style instead of purely as code. Anywhere but the enterprise world, that is :). Data-driven programming means that you move logic, output or processing from code (long switch statements or branches) into processing tables, data files or whatever. Almost all, if not all, machine learning programs are by definition data-driven programs: the algorithm itself is usually quite compact, but frequently needs mountains of data to function correctly.
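As a tiny, hypothetical example of what moving logic into data looks like: the branching lives in a table, and changing behaviour means changing a row rather than the code path.

def handle_order(event):
    return "order %s accepted" % event["id"]

def handle_cancel(event):
    return "order %s cancelled" % event["id"]

# The 'switch statement' as data: adding behaviour means adding an entry here.
HANDLERS = {
    "order":  handle_order,
    "cancel": handle_cancel,
}

def dispatch(event):
    handler = HANDLERS.get(event["type"], lambda e: "ignored")
    return handler(event)

print(dispatch({"type": "order", "id": 42}))    # order 42 accepted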

The main complication in the entire thing is that purely serially written code avoids the complexity of having to synchronize results later on, most importantly at the right time. This latter point is also what plagues certain parallel architectures and programs that already exist, and it complicates designs for GPUs, FPGAs and so on. You could also call it the binding problem, as in psychology. It's easy to think of channels of processing where a little bit of the entire thing happens in each single stage, but connecting the dots later, to make a larger whole that means more than each channel individually, is a very large challenge. Maybe not if you have the time to put it together thoroughly at leisure, but certainly in real time.

Friday, August 20, 2010

Flying with Ardupilot Mega

I'm experimenting with the ArduPilot Mega source code and settings and have finally succeeded in getting it to work properly using the FlightGear simulator and the board set up through a serial USB link. I've made a flight, slightly rough around the edges, around San Francisco airport in the simulator with some tweaked waypoints. You can find that file here.

The biggest problem in getting the link working has been the communication between the PC, the arduino board and FlightGear. FG may reduce or increase frame rates and start buffering data when the interface program is not consuming it fast enough. The current interface is almost ready for real use, but I think it can still be made faster at passing data around.

In order to get stuff working, you should check the source code out of code.google.com and make sure that you're pointing to the correct library location. Download GPS_IMU.zip, which is used for the communication with the USB 'ground station' (the FlightGear link). Then reduce the NAV_P gain in the APM_PID_Gains.h file to something like .5 and reduce the other P gains too (you'll notice when a gain is set too high: the plane becomes unstable in flight and constantly increases the roll or pitch).

As I said, the flow of data was the biggest problem, so the communication with FlightGear is what I've tried to optimize. FlightGear does not care too much if there is nothing to read, but writing more than it can handle will buffer data in the IP buffers. So you should aim to write 1-2 Hz less than the link setting. Also pay attention to the frame rate that FG is able to put out, because that is the minimum update frequency you can use.

Reading from a port should be done as fast as possible and ideally outside of the main "50 Hz" loop. Since the Arduino attempts to follow a 50 Hz loop, you could attempt to follow the same in the interface application, but there's a possibility that you overload the link. The older XPlane app issued the attitude and, additionally, the GPS at 5 Hz, but I modified that to output the XPlane attitude at roughly 42.5 Hz and GPS at 5 Hz. That leaves a bit of room to get rid of any overload that might occur on the Arduino side.

Reading from the serial port should also occur as fast as possible. This way, you can keep everything in the same loop, using a couple of global variables for whatever data is read from the links. As soon as something can be written, just set up the packets and kick them out the door.

There are probably some configs to change in the Arduino source code before things will work for you:
  • I have airspeed sensor set to 0.
  • The example waypoint file has been modified to allow the cessna to make the turns (better:).
  • GCS_PROTOCOL is set to 3.
  • GPS_PROTOCOL is set to 3.
  • GPS_IMU is included.
  • The #ifdef GPS_PROTOCOL != 3 before the GPS.read() call is commented out (around line 610 in the main source file, inside the update_GPS function).
I first ran in STABILIZE mode and verified that I got responsiveness on the sticks. If your interface is buffering too much, reading too slowly or writing too fast, you may get responses only after 5 seconds. That will never converge to a model that can fly properly.

After STABILIZE mode you can quickly verify FLY_BY_WIRE_A and B to see if those work out. Then I loaded the waypoints and turned on AUTO mode. For that to work, just start FlightGear and pause it. Zoom out a bit with Shift-X until you see the throttle stick. Hit "H" to see the HUD and get more info. Then make sure the switch is set to manual with the throttle at zero. Start up the interface, wait for the servos to wiggle and for the interface to print "Ready to Fly". Now you can switch to AUTO mode, but you have to control the throttle manually; I basically set the throttle to full.

Because I don't have a rudder attached (nor is this set in the ardupilot.xml file), the plane veers off to the left due to the propeller torque until it lifts off the ground. The ArduPilot Mega will then slowly turn back to the heading it had when it was going 3 m/s (roughly 6 knots). Initially I used 10 degrees of pitch, but noticed that it tried to drill itself into the ground; making that 20 degrees again fixed it. Also make sure that your NAV P gains are correctly set, as these determine how eagerly the autopilot chases the setpoint. Since the stock Cessna in FlightGear is a bit slow compared to an EasyStar, these P gains probably have to be reduced. It will then be less strict in following the waypoints, but allows for easier flight.

Have fun!

Monday, August 16, 2010

Making HD videos in France

I've returned from a small holiday in France and took the UAV to fly over there. I used the GoPro HD camera in the cockpit and shot a couple of videos @ 60 fps, 1280x720p. The results are on YouTube with a slightly reduced bitrate (more compression):



And another one near the castle of Castlenaud:



And I returned to the camping to make a high-altitude video from up above:



Nothing got seriously broken, and it's great to watch the videos back to improve my flying skills. The video was also transmitted live to my netbook on the ground, but I didn't use that for navigating around (yet). I first need to look into an automatically adjusting tripod setup that will track the plane with a patch antenna. That will usually ensure a very good video link.

Thursday, August 05, 2010

Hauppauge USB Live 2 & Ubuntu

I'm running Ubuntu and purchased a USB Live 2 stick from Hauppauge to visualize, on a netbook, the composite video coming from a 5.8GHz receiver. This video is transmitted from an RC plane to see what's going on up there. I'm not flying it FPV, for lack of goggles and because I want to keep track of where the plane is. Basically, the intention is to put the family behind the netbook to see what the plane sees, while I stare at the dot in the distance.

The USB stick arrived yesterday in the mail and after plugging it in, nothing happened. USB devices get recognized by their IDs, and as soon as a driver registers for an ID, the kernel will visit all eligible drivers to see if one wants to step up to communicate with the device. Apparently support for the USB Live 2 is not available in the stock distribution of Lucid yet. After searching around I suddenly hit a post made on the 3rd of August (sic!) saying that support for the USB Live 2 had just been completed on a development branch of v4l2. So the day before I got the stick, development of the first version of the driver had just finished.

Blog: http://www.kernellabs.com/blog/?p=1445&cpage=1#comment-1632

I managed to install this on both my netbook and my general PC. These are a 32-bit Lucid Lynx installation on a Samsung N210 and a 64-bit installation on a more powerful machine with an ATI HD5850 and an i7 920. You should make sure that you're not holding back any linux packages, since I got screen corruptions with kernel 2.6.32-23. The latest version now is 2.6.32-24 and there it worked (or it was something else, who knows?). After the clean install, you need to grab gcc, do an update/upgrade of all packages and make sure the latest kernel headers are present:
> apt-get install gcc
> apt-get install linux-headers-generic ( <-- ensure it's the latest & greatest @ 2.6.32-24 )
Then I used the entire v4l2 source tree available here (as of 05-Aug-2010 that is; things may change rapidly here):
> hg clone http://kernellabs.com/hg/~dheitmueller/polaris4
> cd polaris4
/polaris4> sudo -s
/polaris4> make

( then edit v4l/.config:
  find CONFIG_DVB_FIREDTV=m
  and change it into CONFIG_DVB_FIREDTV=n
  (notice the n instead of the m at the end, that's all).
  This essentially deactivates the build of a problematic module, so if you have a problem
  there you can deactivate it. )

/polaris4> make
( yes again ).

If all went well:
/polaris4> make install

And then reboot. The reboot ensures you're not using any old drivers. The rollback procedure for this is to reinstall the linux-image-...version... you were using; that will replace all your modules with the stock Ubuntu Lucid distribution.

Any time a new linux-image gets installed, you have to do another make and make install in this custom directory to overwrite the module files again. Remember that!

As you can tell, after doing this some video-related things may not work properly. On my netbook I'm only going to use it in the field for some simple stuff. Nothing has broken so far and it looks pretty stable.

I've noticed that the image when there's no signal may be plain blue. That is usually the case when the signal hasn't been tuned at all, but in this card you must ensure there's a video signal going into it.

The USB Live2 is a great little gadget with quite good image quality and zero lag with the Linux drivers. In Windows it's possible to get lag of up to 3 seconds even, but that's when the driver isn't configured to use "GAME" mode (which matters if you're flying or gaming with this; otherwise it doesn't matter too much).

In order to visualize things, I now use tvtime (apt-get install tvtime). There's a handy configuration XML file in /etc/tvtime.xml where you can make all the default settings. The cool thing about this utility is that you can change a large number of settings without restarting and adjust the image quality, brightness or contrast on the fly. Great for outdoor flying and getting a bit more out of the display.

The USB Live 2 uses the cx231xx kernel driver.

Sunday, August 01, 2010

Mesh networks

An interesting technology to play with is wireless self-healing mesh networks. Mesh networks are characterized by a rather chaotic topology, where the data sent by one node is not necessarily sent directly to the receiver, but may make a number of hops before it gets there. This allows a number of routes between the source of the data and the coordinating node where the data needs to go.

A lot of propositions offer mesh networking for near real-time video display and those types of applications, instantly pushing the limits of bandwidth and network stability. I find the low-speed networks slightly more interesting myself, because of the low cost of setting them up and the speed at which such networks can be realized. This makes them very interesting for situations in which you need to cover a certain area for monitoring of some type. As long as the modules being placed have the right kinds of sensor and logic unit, you can start collecting data from this environment in no time.

Sometimes you want to collect more data than the sensors can give you, for example due to bandwidth constraints of this technology or the sensors that the installation offers. In that case, the cooperation of other mobile sensors that can be directed to positions can help. Think of UAVs being directed to such a location, while still having the means to be 'part' of the network as they move around a larger area of, say, 10 km, to assist with the sensors they offer. The whole idea is to plot interesting data about this larger area in no time and to do this as autonomously as possible.

Saturday, May 22, 2010

Joystick pong (how to write custom usb drivers for the nanoboard)

I've just worked out how to write custom drivers for the nanoboard, to extend the capabilities of the platform through different USB devices. It isn't really that difficult for similar devices (you could just copy things across), but for more complicated devices you need to understand how the USB protocol works. The nanoboard USB host driver already takes care of a lot of the details for you. The specific device driver that you are writing only has to indicate whether it wants to be the device driver for a particular detected device. The device driver call basically does a test using the data that is passed in; this data comes straight from the interface and device descriptors that were received from the bus. This is sometimes where the problems start. On Windows, my joystick isn't recognized as a joystick per se. It's a Human Interface Device without a boot capability. It works properly in games, but that's about it. It doesn't pass the HID protocol (which should be 04, for joysticks). This causes issues for people writing device drivers, because you can't know what type of device something is.

Also, not all joysticks pass the same ranges of data. The USB HID descriptor does indicate the ranges, but this complicates things a bit for the device driver writer.

Anyway, the sample joystick driver is here and a new version of the pong game is here.
( the joystick driver should be put in the "Library/Software Platform/services/usbhost" directory of Altium designer).

In order to write drivers for other devices, it's helpful to boot into Linux and inspect the descriptors that are sent across the wire. A useful page comes from a guy who rewired his old joystick to make it USB compatible. Other useful pages are the HID tools page and a page explaining how to use the usbmon software to analyze the USB traffic in more detail. Wireshark can also capture things from the USB. (usbmon on my Lucid Ubuntu seems to have been compiled into the kernel already; just check whether the /sys/kernel/debug/usb tree is there.)

You can then use the kernel debug interface to inspect what's going on in the system. Under linux, cd'ing to:
/sys/kernel/debug/usb
will give you a devices file, which is useful for finding out how a device is recognized. My joystick is reported incorrectly, because it doesn't provide subclasses or protocols, as can be seen here:

T: Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=1.5 MxCh= 0
D: Ver= 1.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=c215 Rev= 2.04
S: Manufacturer=Logitech
S: Product=Logitech Extreme 3D
C:* #Ifs= 1 Cfg#= 1 Atr=80 MxPwr= 30mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=00 Prot=00 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 7 Ivl=10ms

Compare that to the mouse (but notice how the same driver is used):
T:  Bus=07 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=1.5 MxCh= 0
D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 8 #Cfgs= 1
P: Vendor=046d ProdID=c018 Rev=43.01
S: Manufacturer=Logitech
S: Product=USB Optical Mouse
C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 1 Cls=03(HID ) Sub=01 Prot=02 Driver=usbhid
E: Ad=81(I) Atr=03(Int.) MxPS= 5 Ivl=10ms
Another cool thing to do is to look under:
/sys/kernel/debug/hid
This will show a couple of weird-looking IDs, but those are the currently attached devices. In each of those directories you can inspect the rdesc file, which describes the format of the data and thus tells the host driver how it should interpret the data stream.

Devices may have multiple interfaces. Keyboards and mice typically have a 'boot' interface, which allows the device to be used with a fixed, simplified report format during the boot sequence. Notice that under Linux the same 'usbhid' driver takes care of most HID devices. This is possible because of the report descriptor, which tells the driver how to interact with the device. I'm not exactly sure at the moment how the driver communicates this to the applications or the HAL layer above it in a generic way.

Notice that HIDs are not always strictly input devices. Some HIDs have output reports (host-to-device data) which allow setting LEDs, force feedback, etc.

So how does one read these HID files to discover the data format? Let's look at a simpler example, the mouse:
Field(0)
Physical(GenericDesktop.Pointer)
Usage(3)
Button.0001
Button.0002
Button.0003
Logical Minimum(0)
Logical Maximum(1)
Report Size(1)
Report Count(3)
Report Offset(0)
Flags( Variable Absolute )
Field(1)
Physical(GenericDesktop.Pointer)
Usage(3)
GenericDesktop.X
GenericDesktop.Y
GenericDesktop.Wheel
Logical Minimum(-127)
Logical Maximum(127)
Report Size(8)
Report Count(3)
Report Offset(8)
Flags( Variable Relative )
The logical minimum and maximum determine the value range; this could also be used to rescale the values if you want consistent output. The "Report Size" is the size in bits of one individual field. "Usage(x)" indicates the number of fields being described, and "Report Count(x)" does the same. The "Report Offset(x)" gives the offset of these fields within the entire data stream. So this mouse protocol uses three bits in the first byte to describe the state of up to three buttons, followed by 3 bytes indicating the X and Y motions and the motion of the wheel.
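To make that concrete, here is a small Python sketch (hypothetical host-side code, not the nanoboard driver) that unpacks a 4-byte report laid out as described above:

import struct

# byte 0: bits 0-2 are buttons 1-3, the remaining bits are padding
# bytes 1-3: signed 8-bit relative X, Y and wheel values (-127..127)
def parse_mouse_report(report):
    buttons = report[0]
    dx, dy, wheel = struct.unpack('bbb', report[1:4])
    return {
        'button1': bool(buttons & 0x01),
        'button2': bool(buttons & 0x02),
        'button3': bool(buttons & 0x04),
        'dx': dx, 'dy': dy, 'wheel': wheel,
    }

print(parse_mouse_report(bytes([0x01, 0x05, 0xFB, 0x00])))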

This is basically how I figured out how my joystick works. You could also specify your own rdesc/HID file using the HID tools mentioned above. Of course, devices don't communicate this in ASCII; the compiled HID descriptor is an array of bytes that conveys the same information. The vendor and product IDs are 16-bit fields that can be looked up in a file on the host somewhere, if one cares about it.

Update: Here's another version using a lower resolution and a smaller color palette. Because of the USB code and the other drivers, the generated ROM image is larger than 32k and doesn't fit in local processor memory. Code therefore has to be read in and executed on demand from the external SRAM on the nanoboard, which is shared with the VGA driver. Because the VGA driver gets priority, the game slows down whenever the processor tries to update the frame: the processor cannot fetch code from SRAM while the frame is being drawn to the monitor.

Thursday, May 20, 2010

Pong!

This is a late-night shot of the game Pong!, running on a soft-core TSK-3000 RISC processor on the nanoboard. You control the paddle with the remote, left and right. There aren't many examples on the Internet of how to use the actual VGA output of the nanoboard; actually, I found none. The tutorials shipped with Altium Designer contain two examples that demonstrate how the VGA connector should be used, and there are a number that use the TFT display on the board, but that one is too small for some purposes.

Pong! on the nanoboard goes public domain. It's imperfect and inefficient, but it does show how to render things on the VGA monitor (not the TFT). This implementation uses double-buffering, which in embedded devices is very costly because of memory requirements.

The display is not yet stable; it flickers due to issues with VSYNC. I haven't figured out yet how to use the interrupts properly and have the framework driver rely on them, or whether I've used the API calls correctly. I'm hoping for some feedback in the forums, after which I can provide some updates here and there and show better practices.

Another thing I'm looking into is the best practice for writing your own USB drivers. There's a software stack available that can already inspect the USB signals and packets that tell a host what a connected device is about. There's very likely going to be a point where someone wants to build their own USB device for interfacing with the nanoboard, but wonders whether to start from scratch at the signal level, or whether the usb-host software platform can be reused in some way, without hacking into the standard Altium Designer library that is.

Well, there's plenty of ground to cover still with this device. It's a lot of fun and I'm starting to find my way around it already. I'm feeling quite confident about my ability to prototype using the hardware on and around the board; it's just so easy. The difficulties start to arise when more complicated designs are required, especially those needing more memory or specific hardware protocols or interfaces (custom designs). As mentioned above, my next goal will be to find out how to connect a USB joystick (and I'll open-source that as well). And from there... onwards...

Wednesday, May 19, 2010

Tutorials finished!

Well, I got things up and running like a charm in the meantime. After the initial GB-sized downloads (which may get fixed; see the comments on the previous post!), I've completed most tutorials of the Nanoboard today. There were only three that I couldn't run due to some interoperability issue with the Xilinx tools, but that happens sometimes. It's been logged, noted and people are working on it!

So far I can say it's truly amazing and a pleasure to work with this board, especially for newbies in the field of electronics. The tutorials only provide a very basic overview of what the board can really do, so keep that in mind; of course those applications could be done quite easily on Arduino boards as well.

But then have a look at the extensive royalty-free library... you'll discover 119 subdirectories loaded with microprocessors, circuits, memory blocks, opamps and plenty of other electronics that can be inserted into the schematic diagrams. People at work sometimes use fairly exotic hardware, and this makes it likely I can find the same chips and ICs they work with. In combination with the other tools and outputs, you can generate all kinds of testbeds for running their code on this FPGA, analyzing the signals, or other things to help out.

It wasn't immediately clear to me that the real use of the board is probably trying out different processors and designs up-front, before you solder things together. But now that I think about it, this certainly makes sense. At work some people work on UAVs, for example. They require very small, light-weight boards that provide lots of functionality and a good amount of processing power; everything is embedded and hardware-supported. You couldn't put a nanoboard in one of those planes due to weight and power concerns :). But simulating a particular board or processor, or just having a platform to run embedded ideas on, is a great nice-to-have. Since the board sits right next to a PC that could run a simulator like FlightGear, you could hook it up to a FlightGear stream of simulation data, where the board must attempt to fly the plane level or perform other kinds of autopilot functions.

So yes, I am certainly pleased with my purchase. If anybody is ever thinking of going into FPGA programming, this is the board you want. It's probably twice the price of a regular board, but you get three times the value back for it: the extensive library of 'soft-design' hardware, the ability to still run VHDL code on the chip, the way it interfaces with your computer, the Altium Designer software (only for a year though, so check the conditions) and the overall hardware that is attached to the board (TFT touch, PS/2, S/PDIF, USB, SD-card, MIDI, audio, RS-485, RS-232, Ethernet, VGA-out).

Definitely recommended! It's Arduino++.

Monday, May 17, 2010

Altium Nanoboard 3000

I've managed to get my hands on a really cool FPGA board from Altium. It's very promising with a TFT touchscreen display, RS485, RS232, 3x USB 2.0, S/PDIF in-out, SD-card reader, headphones/line-in/line-out, 4-ch DAC, 4-ch ADC, 4x relays, PWM, SVGA output, MIDI and a TSK-3000A RISC 32-bit processor.

Unfortunately, while this technology is really amazing, getting things to work on your computer is not. Your mind will boggle at all the license files, registrations, threats about non-compliance with export restrictions and whatever else these companies come up with to make your life difficult. For someone who has gotten used to Ubuntu and software that just installs, getting a minimal "Hello World" out of this board is proving extremely challenging for a newbie. And I'm saying that as an experienced programmer.

The biggest crap factor is the Xilinx tools. They offer a "webpack", but the download is around 2.6GB (not much of a "webpack" if you ask me). The 12.1 version is incompatible with Altium Designer, so that's where the problems start. So I downloaded 11.1 as required, but that is not supported on Windows 7, whereas Altium Designer so far has no particular problems with Windows 7. I then upgraded Altium Designer as far as possible, but that didn't improve the chances of compatibility with 12.1.

Basically, I got a dreaded "Xilinx ISE not found" message in red above my FPGA chip, which was actually due to a missing environment variable (no longer set in 12.1?). Altium Designer just picks up the "XILINX" path variable, which should point to the "ISE" subdirectory of the Xilinx ISE installation. If XILINX is not there, it will show the above message. Populating XILINX as the env variable by itself is not enough, as even a very simple example won't compile due to differences in compilation options.

If you get: "No Process Flow", then you should simple "configure" the project by right-clicking on the nanoboard icon in the devices view and generate the constraint files.

The information about what to do next is not really available on the Internet. Most suppliers lock down their forums entirely, so you need a login to find out whether others have had the same problem. Since most customers probably follow the instructions on which OS to use and so on, this doesn't seem to have occurred much. Luckily, I found one solution that seems to work. You need a lot of patience and you have to flick your common sense into "off" mode for this to work:
  1. Download the ISE Webpack 11.1 anyway (2.6 GB or so).
  2. Download the ISE Webpack 11.3 as well (another 2.7 GB..... *sigh*). Downloading only 11.3 will not work, because the installer asks for the 11.1 install directory (hooray!), even though the size of the installation file clearly suggests it could stand on its own.
  3. Then install 11.1, but not from the main folder. Go into "bin/nt" and start "xsetup.exe" from there.
  4. Wait for this to install. It takes a while. Feel free to *not* install "webtalk", because that just sends your tool usage to Xilinx, and why would you want that?
  5. Then install 11.3. It will ask for where you installed 11.1 to. 11.3 is luckily compatible with x64 already.
  6. This upgrade takes quite some time (apparently it checks every file to see whether it needs upgrading?).
  7. Restart Altium Designer... and now it should show up. ( See "XILINX" environment variable if it doesn't ).
Another word on something these companies do that I find very strange. Altium Designer and Xilinx Webtalk basically monitor the way you use the tools and send this to their servers "for improving your experience". So you've basically got a hidden spy on your computer that tells them what you use, how long you use it, on which days, etc... ugh!

I'm certainly not an experienced FPGA designer; I just decided that I wanted to learn about this technology and play around with it. My intention was to grab a board that could do a number of interesting things, rather than starting from the bottom-level basics. Altium Designer allows me to focus on high-level topics, so that I can create some truly interesting marvels before having to delve into the low-level details. I prefer it that way, because I don't get hit in the face by the nit-picky low-level details immediately. Eventually, however, I've already noticed that one needs to gain knowledge about electronics, wiring and so on in order to use these things effectively and knowledgeably.


My install finished and it's all compiling now. I'm going to work through the tutorials, see if I can write a couple of things myself and play around a bit. Given the software that Altium provides, if you are a learner like me, I can certainly recommend what they offer with these nanoboards...

Wednesday, May 12, 2010

Field Programmable Gate Arrays

I've been looking at FPGAs and ordered my own board, which I hope will arrive in the mail pretty soon. The difference between a normal CPU and an FPGA is best explained here. Notice how the logic elements of an FPGA act directly and independently on the signal, whereas a CPU typically executes one instruction per clock cycle, and may need to retrieve values from memory, put them into registers, execute another instruction, push results back from register to memory and so on. Once you think about it, it's already pretty amazing that a single CPU can be so darn fast.

FPGAs, however, typically do not need to do this. They require a totally different way of thinking about computing, but benefit from using much simpler logic elements wired together in different ways. This configuration of logic elements is what gives the chip its application.

The FPGA's limitations are more about the number of elements (and thereby the functionality it can possibly embed), because its program is embedded in hardware and isn't read from memory and then executed by the same device. This is also how they differ fundamentally. FPGAs are also not programmed with a standard programming language, since the design is inherently parallel and the 'program' is basically the configuration of how the logic elements are wired together. A hardware description language called VHDL is used for this purpose instead, and it requires a different mindset: one of the reasons computers are easy to program is that we can think about events serially, as happening one by one. Multi-threaded computing is already more complicated, because you need to imagine how threads and processes may interact and the results that may create.

You can build a host of very interesting things on FPGAs. One is an M&M sorter :), but even a not-too-complex board can simulate a C-64 and run the game Commando.

Of course, the FPGA gets very interesting when you need to process a large, complex signal. Think about video signals for video compression, for example, or speech recognition; lots of research is being undertaken there at the moment. Simpler implementations are also possible, for example PID controllers for temperature regulators, although for those I do think an FPGA is slightly overkill and possibly unnecessarily complicated.

An alternative to FPGAs are Arduino boards, which are basically CPUs again and can be programmed in C. Lots of fun if you want a new kind of hobby.

Sunday, March 14, 2010

AI Challenge progress





This video shows some further progress with the AI Challenge. The previous video demonstrated how the computer was directly manipulating the joints through a neural network. In this video, there's a proportional controller attached to each joint, which only tries to drive the angle of that joint to some particular target value.

These target values are set by a joystick that I attached to the computer. I've tried to control the arms directly, but that's basically hopeless: besides just looking at the position, you also need to control the angular velocities at the same time, for three joints at a time. This implementation therefore reduces the 'cognitive stress', and it's already starting to become possible to make predictions and expectations and to (sometimes) actually hit the ball. It remains somewhat difficult, however.
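For reference, a proportional controller per joint is only a few lines. This is a bare sketch of the idea in Python; the gain and the joint values are made-up examples, not taken from the actual simulation:

# One proportional (P) controller per joint: the command is simply
# proportional to the error between the target and the current angle.
KP = 4.0  # hand-tuned gain (example value)

def p_control(target_angle, current_angle):
    error = target_angle - current_angle
    return KP * error  # torque/force command for this joint

# Each simulation step: the joystick sets the targets, the controllers
# compute one command per joint.
targets = [0.3, -1.2, 0.8]
currents = [0.0, -1.0, 1.1]
commands = [p_control(t, c) for t, c in zip(targets, currents)]
print(commands)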

So far I haven't used IK solvers or any complicated math to do this, just an observation of the angles and a direct reaction to whether each angle is desirable or not. The intention is to see how far I can take this and whether it is actually useful. Right now I am supplying the target angles directly, but progress can still be made by controlling something else (the hand?) and letting the simulation iteratively figure out how to get there. The rest of the solution could then be given a really simple understanding of the trajectory of the ball: find out when and where it will be in range, and finally project a path over time along which the joints actually need to start doing something.

Monday, March 08, 2010

Inductive vs. imperative programming

For work I'm currently looking at LISP programming. Lisp is one of a family of functional programming languages like Haskell, OCaml, Erlang and Scheme. Lisp itself is the oldest of these and it is based on the manipulation of lists (..of lists...(...of lists)). If you look at Lisp code, you'll quickly get dizzy from all the parentheses on the screen.

One interesting property of Lisp is that it can be a huge stream of code intermixed with data, or data becoming code (messages), or code producing more data (recursion etc.). There have been some libraries that were written for Lisp, which gave it more capabilities including screen widgets and so forth. Standard Lisp however doesn't have unification and backtracking abilities like Prolog does. So, even though Lisp has been used in AI applications, it isn't directly comparable with Prolog for example.

This article is a still-relevant consideration of the use of Prolog and why it hasn't taken off as much as it could have. Most programmers work out particular problems procedurally themselves, sometimes giving rise to very costly and sneaky bugs, many of them due to incorrect control-flow statements and the like.

Logic programming is an area where you inductively reach a conclusion. When you state a fact, the interpreter confirms that this fact may be stated (is true). If you state a relation with a variable (a query), the interpreter tries to find all possible valuations that do not contradict any other logical facts or statements made about those valuations. The result is either nothing (null / nil / no) or a number of valuations that the interpreter knows about.

Logic programming is inductive programming: some facts are declared ahead of time, and the program builds up an execution tree and figures out the rest. Imperative programming, on the contrary, means that the programmer decides the entire flow of what the program does, when it does it and how. This is the most common way of programming (C, PHP, C#, VB, C++, Python, Java, etc.). So there is no 'hidden' code in imperative programming, besides the grouping of CPU instructions through assembly language -> 2nd generation languages -> 3rd generation languages and upwards.

A recent area of interest is called constraint programming, which is quite similar to logic programming, except that you don't really derive facts from other facts or prove them right/wrong. There are libraries available for a host of different programming languages, including the imperative programming ones.

Now, without a lot of very intelligent optimizations, these solvers would basically run loops over loops over loops and take a very long time to compute queries. This can have a large impact on the usefulness of inductive programming in real-world scenarios. Also, considering the article above, the success of a language doesn't really depend on its capabilities, but on how easy it is to get started with it and how quickly someone can verify that an intention works out as intended. In that light, imperative languages are easier to follow and verify, because the entire execution diagram is right in front of you (except for the funny race conditions, deadlocks, non-reentrant code, library errors, static variables and so on).

Constraints are both problematic to code for correctly and incredibly helpful for coming up with solutions within your own lifetime :). Some article I read used the example of a number of ships for which a berthing schedule needs to be developed, where any berth can be taken at any particular moment in time. Scheduling problems are the best example for constraint programming, because they can have a large number of constraints associated with them and often have a very large number of alternatives (multiplying the number of loops in total).
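As a tiny illustration of that berth example, here is a hedged sketch using the python-constraint library linked at the end of this post; the ships, berths and time slots are made up:

from constraint import Problem

problem = Problem()
ships = ["shipA", "shipB", "shipC"]
# Each ship's domain is a (berth, timeslot) tuple.
slots = [(berth, t) for berth in ("berth1", "berth2") for t in (0, 1)]
for ship in ships:
    problem.addVariable(ship, slots)

# Constraint: no two ships may occupy the same berth at the same time.
for i, a in enumerate(ships):
    for b in ships[i + 1:]:
        problem.addConstraint(lambda x, y: x != y, (a, b))

solutions = problem.getSolutions()
print(len(solutions), "feasible schedules, for example:", solutions[0])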

The idea is that constraints let you decrease the number of items under consideration, but this is not the whole story. Sometimes constraints are imposed on the combination of values from domain R with values from domain S. In such cases, the rule can keep firing while some iteration mechanism dumbly retries other valuations, continuously hitting the same constraint. The general point is that research focuses on smarter algorithms to bring down the number of iterations required.

Why is this all so relevant? Because planning and reasoning systems with formal declarations still depend on this brute-force mechanism of trying out combinations of different values. For very complex problems, it is very difficult to write imperative software that always computes the correct solutions and results for all cases, given an expanding set of constraints that may be imposed on them. The idea is that, once you have a generic iteration mechanism and a generic method for declaring rules and properties of objects, you only need to extend a so-called knowledge base and the system does the rest.

I think, however, that it is wise to start thinking about blends of programming languages: either by extending the compiler to embed inductive code into imperative code, or by creating embeddable interpreters that can be loaded with particular objects, exposed inside the interpreter runtime and then fired off with a query to find any objects that apply. But certainly, by restating the problem or by starting off with an intelligent choice made by a human, one can sometimes resolve the problem directly with procedural code. For schedules, however, I don't see that happening any time soon.

Check out the following libraries and projects:

Python: http://labix.org/python-constraint

Minion: http://minion.sourceforge.net/cases.html

C++: http://research.microsoft.com/apps/pubs/default.aspx?id=64335

Java: http://www.emn.fr/z-info/choco-solver/index.html

Saturday, February 27, 2010

AI Challenge



Simon Funk put an AI challenge online, where you need to control two mechanical hands and flap a ball from right to left. The simulation itself is all in Python and getting started is really easy.

My approach extends some ideas I am looking at with a colleague. The video above shows some initial results of my implementation. The best performance seems to be at the start, I reckon, although you can definitely see that the left hand replays its strategy once it gets into the same situation (the ball slowly rolling towards it).

The research is highly focused on how intention, (intrinsic) reward and perceptions from the external environment all work together to hopefully do something useful in the end. It's a great challenge to work on; the physical simulation has great dynamics.

I'm using an ATI GPU for the computation, which I hope to extend further in the future to get better results. Notice in the video how the arms don't always move very decisively. A large amount of the time they are fighting gravity and seem to have trouble generating enough energy to overcome the counterforces playing on their 'muscles'. They don't really need to know much about the state of the entire world in order to execute some "good" actions (once again, 'good' is in the eye of the beholder and depends on how much fun something is, social interaction, the benefits it "feels" it gets, etc.).

Anyway, these are the results after one day so far. Now let's see if it can be made better.

Monday, February 15, 2010

FlightGear hacking

On the right is an image of a free flight simulator called FlightGear. It's used a lot for research purposes, but certainly also just for fun, and it is licensed under the GPL. You can fly anywhere in the world with the planes provided, and the entire framework is reasonably extensible. There is actually a choice of which Flight Dynamics Model (FDM) you wish to use. Each FDM has different strengths and weaknesses: one focuses on airflow around the body parts and weight, another more on the steering and so forth. The objective is realism and simulating how things behave prior to real-life development and tests, which cuts down the costs of development significantly. Its objective is not to compete with consumer-level flight simulators with nice flashy graphical details and so on; the focus is on correctness of the flight model and the other items around it.

For example, the airports and runway placement are correct, the approach lights are there, the sky and environment model are accurate with regard to stars, sun and moon for the given time of day, the handling of the instruments is correct (the magnetic compass lags behind a bit, just as in a real airplane), and some planes may break when put under too much stress (although the pilot is likely to break before that!). The scenery uses the SRTM data that I also used to do the hillshading in mapnik (see previous post).

There are a lot of customization capabilities for FlightGear too, which makes it very suitable for research. The way it works is pretty simple: you specify an option parameter that states a direction of input or output (if input, the simulator accepts an external file or socket with instructions to control ailerons, rudder, etc.). More details can be found under /usr/share/games/FlightGear/Docs/README.IO if interested. The option looks roughly like this:
--generic=socket,out,10,localhost,16661,udp,myprotocol
So this chooses a socket transport, outgoing data, 10 times per second, localhost, port 16661, the UDP protocol, using the data layout specified in the myprotocol.xml file. Easy!

The receiving application can then start processing the stream of data coming from the flight simulator. It can also send back a number of commands to control the aircraft, the environment, sounds, AI models or whatever. Basically, it's quite a nice way to try out a couple of concepts before having to dig too deep into the bowels of the flight simulator package itself. The same mechanism is actually used to store the flight path of a plane and replay the flight later (at the specified frequency), and to connect FlightGear instances together for multiplayer flights or to track the positions of FlightGear users on a large map.
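As a minimal sketch of the receiving side (assuming myprotocol.xml was defined to emit comma-separated ASCII values, which is an assumption for this example), a Python listener could look like this:

import socket

# Listen on the port given in the --generic option above.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("localhost", 16661))

while True:
    data, addr = sock.recvfrom(1024)
    # Assuming the protocol file defines comma-separated ASCII fields,
    # e.g. altitude, airspeed, heading.
    fields = data.decode().strip().split(",")
    print(fields)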

Sunday, February 14, 2010

Hill shading OpenStreet maps


I've been fooling around with OpenStreetMap again, this time just trying out a couple of experiments that others did, hopefully developing some interesting ideas for using the data in the process. I've blogged about OpenStreetMap before. A couple of things have changed since Karmic came out: some build steps require you to download more software, and some packages may have been removed since then. I had bigger problems getting mapnik to work, but that was related to the projection used when importing OSM into PostGIS. When starting 'osm2pgsql', don't use the "-l" flag, because that indicates conversion to latlong, whereas the standard mapnik projection is mercator. Most of the time, when nothing shows up in the rendered image, it's a projection problem somewhere. Just verify that everything makes sense working backwards from the mapnik process and you should find it quickly enough.

Anyway, a very interesting experiment I tried relates to hillshading. This is an effect where elevation data from different sources is merged (nowadays people say 'mashed') with map data in order to shade particular areas of the map and show where the hills are located. Elevation data for most of the world can be downloaded for free from the Shuttle Radar Topography Mission; you should get version 2.1. For the US there's elevation data available with 30m horizontal accuracy; the rest of the world only gets 90m (or 3 arcseconds). Some tools exist to help you manage that data: srtm2osm (I didn't use this) or a script called srtm_generate_hdr.sh (which I did use). I downloaded the data first, then produced the HDR files.

Then run a simple script:

cd /home/gt/srtm/
for X in *.hgt.zip
do
yes | /home/gt/srtm/srtm_generate_hdr.sh $X
done

This basically generates a number of TIF files for each particular region. These TIF files are height maps. When you look into them, you may be able to recognize certain parts of your country, but don't over-interpret the images you see, because the actual height doesn't really show up there.

In the Ubuntu repositories there is a package called "python-gdal", which provides Python bindings and utilities for GDAL. This can easily be used to stitch the generated TIF files together into one huge height map for the entire region. I've used this to get a height map for the Benelux, for example, resulting in a 55.0MB file.
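For what it's worth, here is a hedged example of that stitching step, driving gdal_merge.py (which ships with python-gdal) from Python; the paths are just placeholders:

import glob
import subprocess

# Stitch all per-tile TIFs into one big height map (placeholder paths).
tifs = sorted(glob.glob("/home/gt/srtm/*.tif"))
subprocess.check_call(["gdal_merge.py", "-o", "combined.tif"] + tifs)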

The problem is that the projection is incorrect, so you need to assign the latlong projection first, then warp the image, reprojecting it to mercator (what mapnik uses), and in the process take out any invalid data points (set to 32767).

#> gdal_translate -of GTiff -co "TILED=YES" -a_srs "+proj=latlong" combined.tif combined_adapted.tif
#> gdalwarp -of GTiff -co "TILED=YES" -srcnodata 32767 -t_srs "+proj=merc +ellps=sphere +R=6378137 +a=6378137 +units=m" -rcs -order 3 -tr 30 30 -multi combined_adapted.tif warped.tif

Cool! So we now have a warped.tif in mercator projection that is basically an elevation map of the terrain. This is, however, not yet an image that uses the elevation data to 'shade' particular areas of the map. Think of putting the elevation data under a spotlight and then moving the spotlight to some particular origin (somewhere NW), located high above the map. This creates shady areas on the parts of the map where hills face away from the light source; the other sides remain normally lit.
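The shading computation itself boils down to comparing the slope and aspect of each cell with the direction of the light. Below is a rough numpy sketch of that idea (not Perry Geo's actual implementation; the azimuth/altitude values are just example settings and the sign conventions for the aspect vary between implementations):

import numpy as np

def hillshade(elevation, azimuth_deg=315.0, altitude_deg=45.0, cellsize=90.0):
    # Terrain gradient, slope and aspect per cell.
    dy, dx = np.gradient(elevation, cellsize)
    slope = np.arctan(np.hypot(dx, dy))
    aspect = np.arctan2(-dx, dy)
    az = np.radians(azimuth_deg)    # light coming from the NW
    alt = np.radians(altitude_deg)  # height of the light above the horizon
    shaded = (np.sin(alt) * np.cos(slope) +
              np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return (np.clip(shaded, 0, 1) * 255).astype(np.uint8)

# Example on a random patch of 'terrain'.
demo = np.random.rand(50, 50) * 100.0
print(hillshade(demo))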

Perry Geo luckily did this work for me. You can download his utility here. I had to modify the Makefile to set the GDAL include path (-I/usr/include/gdal) and modify the included gdal library (gdal1.5.0). These tools create a 400-500MB hillshade file (18555x24128 pixels) if you use it for the entire Benelux region. That's still manageable, but in the process of getting there you may need 4GB of memory and a fast processor. The following image shows a minute detail of the Dutch coastline (scaled down 33%).

So that's cool. This image can be used as a bottom layer for mapnik-generated tiles. If mapnik then paints the other layers above it slightly translucent, the shading shines through, creating a sense of height throughout the map.

A standard mapnik image of a region in Belgium close to Maastricht, which has a number of hills, looks like this:

The same region in Google Maps, rendering terrain data, looks like this:

So, my rendered image looks different from Google's, but the styling wasn't optimized for terrain data in the mapnik renderer.

There's also a reason why the zoom level wasn't increased to about 18 or so. Remember that we downloaded the SRTM3 set, which provides 90m accuracy in the horizontal plane. The height map therefore interpolates between subplanes of equal elevation, which doesn't necessarily improve quality :). Google Maps doesn't go beyond zoom level 15, which is reasonable. Trying to render images at higher zoom levels creates pixelated images that don't look nice at all. The height map can be improved slightly with better algorithms, but no matter what you do, it will never be perfect. Google Maps even has a couple of areas with hills that look a bit blotchy, irregular or pixelated. The idea is just to get a sense of the height of a mountain range anyway, and drawing the contours on this map certainly helps to find out where the mountains are (especially for hiking and cycling maps, which use these techniques regularly).

Original sources with the tutorials are here and here.