Since 1993 we have explored a number of aspects related to the efficient implementation of the dynamic programming (DP) algorithm for the UKP on dedicated VLSI architectures, general purpose parallel machines, and also on sequential machines (using algorithmic advances). The parallel software solution is described in Irisa techreport PI-740 and led to our recent work on tiling optimizations (see for example Irisa techreport PI-999). My ongoing work involves implementing our most efficient algorithm on FPGA based coprocessors. The three main results/papers on the UKP are summarized below:
We present a parallel solution to the unbounded knapsack problem on a linear systolic array that achieves optimal speedup on a model of computation that is weaker than the PRAM. Our array is correct by construction, as it is formally derived by transforming a recurrence equation specifying the algorithm.
This recurrence has dynamic dependencies, a property that puts it beyond the scope of previous methods for automatic systolic synthesis. Our derivation thus serves as a case study. We generalize the technique and propose a systematic method for deriving systolic arrays by non-linear transformations of recurrences. We give sufficient conditions that the transformations must satisfy, thus extending systolic synthesis methods.
We address a number of pragmatic considerations: implementing the array on only a fixed number of PEs, simplifying the control to just two counters and a few latches, and loading the coefficients so that successive problems can be pipelined without any loss of throughput.
Using a register level model of VLSI, we formulate a non-linear optimization problem to minimize the expected running time of the array. The analytical solution of this problem allows us to choose the memory size of each PE in an optimal manner.
We systematically derive an improved algorithm (called the sparse algorithm) for the general knapsack problem which has better average case performance than the standard (dense) dynamic programming algorithm. The derivation is based on transformation of the standard recurrences into a stream functional programs, and cannot be achieved by the usual space-time mapping techniques because the dependencies are statically unpredictable. Furthermore such a sparse algorithm for the general knapsack problem has not been proposed in the literature, to the best of our knowledge. We also implement the sparse algorithm on a linear asynchronous VLSI array with constant size memory on each PE (i.e., a wavefront array processor). Using LPGS partitioning, the algorithm can run on an arbitrary size ring and has optimal time speedup.
This paper further extends the above idea of sparsity, namely, to avoid redundant computations. We present EDUK, an efficient dynamic programming algorithm for the UKP. It is based primarily on a new and useful dominance relations between object types called threshold dominance. We demonstrate that threshold dominance is a strict generalization of all the previously known dominance relations. We then show that combining it with a sparse representation of the iteration domain and the periodicity property leads to a drastic reduction of the solution space. Numerous computational experiments with various data instances are presented to validate our ideas and demonstrate their efficiency. We also compare EDUK with the available and widely used exact algorithm MTU2.