The WITAS Project

TALPlanner Wins First Prize at AIPS Planning Competition


The Second AI Planning Systems Competition took place at the recent 5th International Conference on Artificial Intellligence Planning and Scheduling (AIPS-00) in Toronto, Canada. A presentation of the results has been posted on the competition webpage, and is interesting reading in many ways. In particular, it suggests very strongly that we have to reconsider the relationship between logics for actions and change and one hand, and planning systems on the other.

The following is what happened. There were fifteen competitors, two tracks, and five domains. Each domain allowed examples of varying size, for example, the blocks-world domain where the number of blocks could be varied. One track was for STRIPS and ADL planners not using any domain-specific information; the other was for hand-tailored planners allowing the teams to add domain dependent information to guide or constrain the search. There were twelve competitors in the "automatic" track and four in the hand-tailored track. (Fangzhen Lin's System R competed in both tracks).

In the judgement of the Organizing Committee chair, Fahiem Bacchus, "there are many metrics, and it is impossible to say that any one planner was the best". Also, different domains sometimes seemed to favor different planners. However, one planner was selected as the most distinguished competitor in each of the tracks, namely

Four more planners were selected for a wider set of distinguished planners, namely: Three of these six planners are wholly or partly based on HSP-type heuristic search, namely HSP2, FF, and STAN. The latter two also use techniques from Graphplan. MIPS uses BDDs together with heuristic search engines, System R uses a situation calculus-based regression and progression algorithm, and TALPlanner uses forward chaining in the TAL logic for actions and change.

In other words, two out of six distinguished planners in this competition were based directly on contemporary logics for reasoning about actions and change. This disproves the widespread belief that such logics are hopelessly impractical and not suitable for computational use.

A closer look at the figures shows this even more clearly. In the automatic track, System R did particularly well for the Miconic elevator domain where it came in second, after STAN. It was one of those planners in the Logistics domain that did well enough to scale to bigger problems, in a second round of the competition, although it did generate quite long plans in this domain. In the Blocks world it was the only automatic planner that scaled up consistently to relatively large problems (50 blocks in 1000 seconds).

For the hand-tailored track, two out of four contestants were logic-based: TALPlanner and System R. The other two were SHOP from the University of Maryland (also an HTN based planner) and PbR from USC/ISI which is based on plan rewriting.

The difference in performance was dramatic for most of the domains. For the Logistics domain, for example, problems of around 100 boxes were consistently solved in less than a second by TALPlanner, whereas SHOP and System R required 500 times as much, or more.

For the Blocks world, the TALPlanner consistently solved 500-blocks problems in little more than a second; System R did the same problems but in a minute and a half. The other two contestants required at least an order of magnitude more. One comment that was heard at at the conference after these results was that now the Blocks world is no longer an interesting problem even as a benchmark.

The most applied domain was the Miconic elevator domain, a sophisticated elevator controller that plans elevator movements under a number of customer-specific constraints as well as a resource constraint of a maximum of six persons in the elevator at any time. TALPlanner and PropPlan (an automatic planner) were the only contestants. PropPlan required 1000 seconds for problems of twenty items; TALPlanner solved problems of up to sixty items and usually in less than a second; only a few cases above ten seconds and one case above 100 seconds.

Therefore, TALPlanner was orders of magnitude more efficient than any of the competitors in many of the cases. Some qualifications must be made, however. SHOP produced somewhat shorter plans for the Logistics domain, not by a big percentage, but enough to count if the cost of executing a longer plan is high. For the Freecell domain (a solitaire game that comes with Microsoft Windows), all the participating hand-tailored planners behaved erratically (failing some problems or producing very long plans), as did also the automatic planners for this domain.

The general picture is clear, however: A planner based on a full-fledged logic of actions and change has outperformed conventional planners on their home ground. Consequently, the further development of AI planning systems must now be a joint concern for what used to be two distinct communities.

These results may also have an influence within the camp of actions and change. We are used to comparing our major approaches (situation calculus, event calculus, etc) with respect to (a) expressiveness, (b) ability to obtain the 'right' conclusions for various test cases, and (c) other attractive properties. For example, one of the major arguments in the defense of the situation calculus has been that planning comes essentially for free there. This qualitative argument can now be converted into a quantitative one, since we are able to compare the actual performance of a planner based on sitcalc and one based on an explicit-time logic.

There are plenty of things to discuss here; contributions to this discussion are most welcome.

Maintenance
information:
Latest update 12.5.2000 by EMTEK group.
Edit mode aml, position code D.witas.achiev.aipscom.