When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take as its starting point the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function even for very small sets of observations.
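The idea of treating inconsistent behavior as random deviations from an underlying utility function can be illustrated with a standard random-utility (logit) choice model, estimated in batch by maximum likelihood. This is a minimal sketch of that general technique, not the paper's actual algorithms; all function names and parameters (`softmax_choice_probs`, `learn_utilities`, `temperature`, `lr`) are illustrative assumptions.

```python
import math

def softmax_choice_probs(utils, temperature=1.0):
    """Choice probabilities under a logit (random-utility) model: the
    decision maker picks each option with probability proportional to
    exp(utility / temperature), so deviations from the utility-maximizing
    choice occur at random rather than never."""
    exps = [math.exp(u / temperature) for u in utils]
    z = sum(exps)
    return [e / z for e in exps]

def learn_utilities(observations, n_outcomes, steps=2000, lr=0.5):
    """Batch maximum-likelihood estimation of the utilities by gradient
    ascent on the logit log-likelihood.  Each observation is a pair
    (options, chosen): the outcome indices that were available, and the
    position in that list of the option actually chosen."""
    utils = [0.0] * n_outcomes
    for _ in range(steps):
        grad = [0.0] * n_outcomes
        for options, chosen in observations:
            probs = softmax_choice_probs([utils[o] for o in options])
            for i, o in enumerate(options):
                # Gradient of the log-likelihood: observed minus predicted.
                grad[o] += (1.0 if i == chosen else 0.0) - probs[i]
        # Average the gradient so the step size is independent of data size.
        utils = [u + lr * g / len(observations) for u, g in zip(utils, grad)]
    return utils
```

Note that the utilities are identifiable only up to an additive constant (the temperature fixes the scale). An adaptive variant in the spirit of the second algorithm would instead update the estimate after each new observation, letting it track preferences that drift over time.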
Artificial Intelligence, 2004, Vol 160, Issue 1-2, p. 53-78