Improvement of the Branch and Bound Algorithm for Solving the Knapsack Linear Integer Problem

The paper presents a new reformulation approach to reduce the complexity of a branch and bound algorithm for solving the knapsack linear integer problem. The branch and bound algorithm in general relies on the usual strategy of first relaxing the integer problem into a linear programming (LP) model. If the LP optimal solution is integer, then the optimal solution to the integer problem is available. If the LP optimal solution is not integer, then a variable with a fractional value is selected to create two sub-problems such that part of the feasible region is discarded without eliminating any of the feasible integer solutions. The process is repeated on all variables with fractional values until an integer solution is found. In the proposed approach, variable sum constraints and additional constraints are generated and added to the original problem before solving. To do this, the objective bound of the knapsack problem is quickly determined. The bound is then used to generate a set of variable sum limits and four additional constraints. From the variable sum limits, initial sub-problems are constructed and solved. The optimal solution is then obtained as the best solution from all the sub-problems in terms of the objective value. The proposed procedure results in sub-problems that have reduced complexity and are easier to solve than the original problem in terms of the number of branch and bound iterations or sub-problems.


Introduction
The linear integer programming problem has very important real-life applications. It arises in capital budgeting, transportation, traveling salesman, facility location, scheduling and knapsack problems, among others. Even though this model is very easy to formulate mathematically, it has proved to be very difficult to solve. See [1][2][3][4][5] for more on linear integer models.
The paper presents a new reformulation approach to reduce the complexity of a branch and bound algorithm for solving the knapsack linear integer problem. The branch and bound algorithm [6,7] in general relies on the usual strategy of first relaxing the integer problem into a linear programming (LP) model. If the LP optimal solution is integer, then the optimal solution to the integer problem is available. If the LP optimal solution is not integer, then a variable with a fractional value is selected to create two sub-problems such that part of the feasible region is discarded without eliminating any of the feasible integer solutions. The process is repeated on all variables with fractional values until an integer solution is found. In the proposed approach, variable sum constraints and additional constraints are generated and added to the original problem before solving. To do this, the objective bound of the knapsack problem is quickly determined. The bound is then used to generate a set of variable sum limits and four additional constraints. From the variable sum limits, initial sub-problems are constructed and solved. The optimal solution is then obtained as the best solution from all the sub-problems in terms of the objective value. The proposed procedure results in sub-problems that have reduced complexity and are easier to solve than the original problem in terms of the number of branch and bound iterations or sub-problems.
Knapsack problem reformulation is not a new idea. Reformulation approaches have previously been used to solve knapsack and other problems [8,9].
The knapsack problem is a special form of the general linear integer problem. There are many types of knapsack problems. These include the zero-one, multiple, multiple-choice, bounded, unbounded, quadratic, multi-objective, multi-dimensional, collapsing zero-one and set-union knapsack problems. The zero-one knapsack problem [10,11] is one in which the variables assume the values 0 and 1 only. The reason is that an item can be either chosen or not chosen. In other words, it is not possible to have fractional amounts of items. This is regarded as the easiest class of knapsack problems: its LP relaxation can be solved in polynomial time by interior point algorithms, and the integer problem can be solved in pseudo-polynomial time by dynamic programming approaches. The multiple-choice knapsack problem is a generalization of the ordinary knapsack problem, where the set of items is partitioned into classes. The zero-one choice of taking an item is replaced by the selection of exactly one item out of each class of items.
The multiple knapsack problem [12][13][14] is a generalization of the standard knapsack problem formed by combining single knapsacks into a group of knapsacks having different capacities. In this case the objective is to assign each item to at most one of the knapsacks in such a way that all capacity constraints are satisfied and the total profit of all the items put into knapsacks is maximized. In the bounded knapsack problem [15] there is a knapsack capacity and a set of items, each having a positive integer value, a positive integer weight, and a positive integer limit or bound on its availability. With the bounded knapsack problem the main objective is to select the number of each item type to add to the knapsack in such a way that the capacity is not exceeded and the total value is a maximum.

Literature review and problem statement
In the unbounded knapsack problem, types of items of different values and volumes are given, and it is required to find the most valuable set of items that fit in a knapsack of fixed volume. The main difference from the bounded knapsack problem is that the number of items of each type is unbounded. A quadratic knapsack problem [16][17][18][19] is a knapsack problem in which the objective is expressed as a quadratic function subject to a set of linear constraints. The variables in this knapsack problem can be either zero-one or general integers. With the multi-objective knapsack problem, the objective changes from a single objective into many objectives within the same problem. For example, in agriculture there may be objectives to maximize profit while at the same time minimizing transportation costs and maximizing the number of employees. In multi-objective knapsack problems [20,21] there is the dilemma of dealing with environmental, social, political and economic concerns. In the multidimensional knapsack problem, several dimensions are considered in the formulation of the problem. The multidimensional knapsack problem [22][23][24][25] basically consists of finding a subset of objects that maximizes the total profit while observing some capacity restrictions.
The collapsing zero-one knapsack problem is a type of non-linear knapsack problem in which the knapsack size is a non-increasing function of the number of items included. The set-union knapsack problem [26] is a variation of the zero-one knapsack problem in which each item is a set of elements, each item has a nonnegative value, and each element has a nonnegative weight. The weight of a selection of items is given by the total weight of the elements in the union of the items' sets.
The branch and bound algorithm was developed in 1960 [6] for these linear integer models. This method was further modified in 1965 to solve the mixed linear integer problem [7]. Many improvements have been made to the branch and bound algorithm. Cuts were added to obtain the branch and cut algorithm [19][27][28][29]. Pricing was introduced within the context of branch and bound to obtain the branch and price algorithm [30,31]. The improved versions, branch and cut and branch and price, were also combined to obtain the branch, cut and price algorithm [32][33][34]. In addition to using cuts and pricing within the context of a branch and bound algorithm, preprocessing can reduce the number of sub-problems needed to verify optimality. Even with all these efforts the general linear integer problem is still very difficult to solve. In fact the general linear integer problem, including the knapsack problem, is NP-hard [10,18,19,26,35,36] and the authors are not aware of any consistently efficient algorithm for these problems. These difficult problems include the knapsack problem, which is a special case with only one constraint.
The proposed algorithm has the advantage that it is parallelizable and independent processors can be used.
The knapsack problem has many real-life applications. These include home energy management [37], cognitive radio networks [38], mining operations [39], relay selection in secure cooperative wireless communication [40], electrical power allocation management [41], production planning [42], selection of renovation actions [43], waste management [44], and the formulation and solution method for tour conducting and optimization of content delivery networks [45].
Nowadays an electricity network grid that incorporates the users' input or actions is in use and is known as a smart grid. The smart grid is very important for the sustainable, economical and secure supply of electrical power. Knapsack optimization is used in the management and distribution of power.
Knapsack problem formulation is used in channel and power allocation for cognitive radio (CR) networks. In this formulation it is assumed that the total available spectrum is divided into several bands, each consisting of a group of channels. A centralized base station, enabled by spectrum sensing, is assumed to have the knowledge of all vacant channels, which will be assigned to various CRs according to their requests. In this case the objective of resource allocation is to maximize the sum data rate of all CRs.
An extension of the precedence constrained knapsack problem, where the knapsack can be filled in multiple periods, has applications in mining operations. This problem formulation is known in the mining industry as the open-pit mine production scheduling problem. Both exact and heuristic methods are used in solving the LP relaxation of this problem.
Knapsack formulation is used in cooperative jamming. Cooperative jamming schemes support secure wireless communication in the presence of eavesdroppers. Large numbers of cooperative relays provide a better secrecy rate while increasing the communication and synchronization needs associated with cooperative beamforming.
When renovating a building structure, the construction manager needs to select the most feasible renovation activities and the order in which they must be done. The main challenge is to determine whether to renovate the existing structure or to construct a new building. Such a decision requires the use of decision tools such as knapsack modeling.
Waste is another challenge that may cause serious environmental damage if not properly managed. Plastic and paper waste is a serious issue in most developing countries. The main problem of waste in developing countries is that the level of recycling is very low. In other words, only a small amount of plastic and paper waste is recycled and the rest is sent to landfills or dumped on the streets. For recycling to be financially profitable, effective and efficient ways are needed to select the items to be produced from a large number of candidate items given a limited amount of money and other resources. This is where knapsack modeling is applied to minimize waste management costs.
Land conservation projects require proper managing and planning for the benefits to be seen. For these land projects there is always the dilemma of how to select the most profitable land projects subject to financial constraints. A multiple knapsack formulation is employed in making such decisions and it outperforms other decision making tools such as benefit targeting, cost-effectiveness analysis, and sequential binary integer programming.
Fast-growing populations and the introduction of healthcare systems have resulted in increased numbers of both inpatients and outpatients at public hospitals, particularly those hospitals that provide special and comprehensive health services in large countries such as China and India. The huge numbers of patients result in overcrowding, and these overcrowded conditions are a concern for hospital managers. The obvious question is how to manage these huge numbers of patients effectively, given that some patients require less attention than others and that the lengths of stay of some patients are more predictable than those of others. In other words, those who require less clinical care are less likely to stay longer at the hospital. In order to alleviate the challenge of overcrowding, a multi-criteria knapsack model is used for disease selection in the reception or observation ward of public hospitals.
Many studies have been done in the field of optimization methods. Unfortunately, sometimes it is not possible to directly apply these optimization methods to practical problems. As an example, a tourist deciding on a traveling schedule within a traveling time limit needs to select tourist spots from lists so as to be satisfied as far as possible. This is a problem that cannot be solved by the available conventional methods. This problem is formulated as a tour conducting knapsack problem. The formulation and solution method of the tour conducting knapsack problem are based on those of the traveling salesman problem and the knapsack problem.
The knapsack problem has important applications in many areas of business and engineering, and it is necessary to develop efficient solution algorithms for it. Most of the algorithms for the knapsack problem are branch and bound based, and in this paper the branch and bound algorithm is improved for the knapsack problem.
Even though researchers have put a lot of effort into developing an efficient and consistent solution method, such a method does not exist. The knapsack problem is NP-hard [10,18,19,26,35,36] and an optimal solution is very difficult to obtain. For example, in [10] a parallel algorithm for solving the NP-complete knapsack problem was proposed. NP-complete problems are the NP-hard problems that also belong to NP. In [18] it is pointed out that the knapsack problem is an NP-hard optimization problem with diverse applications in industrial and management engineering; however, the computational complexities associated with this problem remain. In [19] it is also made clear that the knapsack problem is a well-known NP-hard combinatorial optimisation problem with many practical applications. Approximation methods are still being developed [13,25,26,35,45] for this problem. The reason for using heuristics is that there are no efficient and consistent exact methods for the knapsack problem. In [35], three heuristic approaches are proposed because of the high computational complexity of the knapsack problem. The paper [26] presents a recent heuristic, which indicates that efficient exact approaches for this knapsack problem are not available.
An approximate solution for the knapsack problem is easy to obtain and good for quick decisions, but the difference between the approximate solution and the exact solution may be in the millions of dollars for large projects such as the UN humanitarian projects and the US military operations. There is a need for exact methods for the knapsack problem.

The aim and objectives of the study
The aim of the study is to reformulate the knapsack problem so that it is easier to solve by the branch and bound algorithm. To achieve this aim, the following tasks were set:
- to determine the objective bound $Z_B^0$;
- to use the objective bound to generate the variable sum limits $\ell_1, \ell_2, \ldots, \ell_k$ and additional constraints;
- to construct the k initial parallel sub-problems;
- to illustrate by an example how to reformulate a knapsack problem;
- to give classes and examples of difficult knapsack problems.

1. General form of knapsack problem
The knapsack linear integer problem is a special case of the general integer problem. Even though this integer problem has only one constraint, it is believed to be NP complete and very difficult to solve.
Minimize $Z = c_1 x_1 + c_2 x_2 + \dots + c_n x_n$, subject to a single linear constraint, where $x_j$ is integer. (1)

2. Totally unimodular transportation matrix
The constraints of any linear integer problem can be expressed as (2).
where A is the transportation coefficient matrix.
Theorem 2: If matrix A is totally unimodular, then every vertex solution of (2) is integral.
Proof of 1 & 2. Let $A_k$ denote a $k \times k$ square submatrix of A. Note that every column of A has exactly two 1's, thus any column of $A_k$ has either: 1) two 1's; 2) only one 1; 3) no 1 at all.
If $A_k$ contains a column that has no 1, then clearly $\det[A_k] = 0$ and (i) is done. Thus now assume that every column of $A_k$ contains at least one 1. There are two cases that must be considered here. The first case is where every column of $A_k$ contains two 1's. Then one of the 1's must come from the source rows and the other one must come from the destination rows. Hence subtracting the sum of all source rows from the sum of all destination rows in $A_k$ gives the zero vector, so the rows are linearly dependent and $\det[A_k] = 0$. The second case is where at least one column of $A_k$ contains exactly one 1. Expanding the determinant along that column gives $\det[A_k] = \pm\det[A_{k-1}]$, where the sign depends on the indices of that particular 1. Now the theorem is proved by repeating the argument on matrix $A_{k-1}$. Therefore the matrix A is totally unimodular. More on unimodular matrices can be found in [46].
The variable sum inequalities constructed in this paper have zeros and ones as the only coefficients. Making the coefficient matrix of every linear integer problem unimodular is a very difficult task. This paper relies on the strategy of introducing new constraints to the knapsack problem with only zeros (0s) and ones (1s) as coefficients. This does not make the knapsack problem unimodular but makes the problem easier to solve than the original form.
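The total unimodularity of a transportation matrix discussed above can be checked directly on a small instance. The following is a minimal sketch, assuming a hypothetical transportation problem with 2 sources and 2 destinations; the function names `det` and `is_totally_unimodular` are illustrative and not taken from the paper. It enumerates every square submatrix and confirms that each determinant is 0, 1 or -1, which is only practical for tiny matrices.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        if entry == 0:
            continue
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def is_totally_unimodular(a):
    """Check that every square submatrix of a has determinant -1, 0 or 1."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Hypothetical 2-source, 2-destination transportation matrix.
# Columns correspond to x11, x12, x21, x22; every column has exactly two 1's.
A = [
    [1, 1, 0, 0],  # source 1
    [0, 0, 1, 1],  # source 2
    [1, 0, 1, 0],  # destination 1
    [0, 1, 0, 1],  # destination 2
]

print(is_totally_unimodular(A))
print(is_totally_unimodular([[1, 1], [-1, 1]]))  # this 2x2 matrix has determinant 2
```

The second call uses a matrix whose determinant is 2, so it fails the check, which illustrates why general integer constraint matrices cannot be assumed to be unimodular.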

3. Branch and Bound Algorithm
The branch and bound algorithm in general relies on the usual strategy of first relaxing the integer problem into a linear programming (LP) model. If the LP optimal solution is integer, then the optimal solution to the integer problem is available. If the LP optimal solution is not integer, then a variable with a fractional value is selected to create two sub-problems such that part of the feasible region is discarded without eliminating any of the feasible integer solutions. The process is repeated on all variables with fractional values until an integer solution is found. The knapsack linear integer problem is NP-complete, and in the worst case the number of sub-problems generated by the branch and bound algorithm grows exponentially. The number of sub-problems can easily reach levels that are not manageable.
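The relax, branch and prune cycle described above can be sketched in a few lines. The code below is an illustration only and not the paper's implementation: it assumes a zero-one maximization knapsack with positive integer values and weights, because in that case the LP relaxation can be solved by a greedy value-to-weight fill and has at most one fractional variable. The names `solve_relaxation` and `branch_and_bound` and the data are hypothetical.

```python
def solve_relaxation(values, weights, capacity, fixed):
    """Solve the LP relaxation of a zero-one knapsack with some variables fixed.

    Returns (bound, x), or None when the fixed variables already exceed capacity.
    """
    x = [0.0] * len(values)
    remaining = capacity
    bound = 0.0
    for j, v in fixed.items():
        x[j] = float(v)
        remaining -= weights[j] * v
        bound += values[j] * v
    if remaining < 0:
        return None
    # Fill the free variables greedily by value/weight ratio (optimal for this LP).
    free = [j for j in range(len(values)) if j not in fixed]
    free.sort(key=lambda j: values[j] / weights[j], reverse=True)
    for j in free:
        if weights[j] <= remaining:
            x[j] = 1.0
            remaining -= weights[j]
            bound += values[j]
        else:
            x[j] = remaining / weights[j]  # at most one fractional variable
            bound += values[j] * x[j]
            break
    return bound, x


def branch_and_bound(values, weights, capacity):
    """Return (best value, best x, number of sub-problems explored)."""
    best_value, best_x = 0, [0] * len(values)  # the empty knapsack is feasible
    stack = [{}]  # each entry maps a variable index to its fixed value
    explored = 0
    while stack:
        fixed = stack.pop()
        explored += 1
        relaxed = solve_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue  # infeasible sub-problem
        bound, x = relaxed
        if bound <= best_value:
            continue  # cannot improve on the incumbent
        fractional = [j for j, xj in enumerate(x) if 0.0 < xj < 1.0]
        if not fractional:
            best_value, best_x = round(bound), [int(xj) for xj in x]
            continue
        j = fractional[0]
        stack.append({**fixed, j: 0})  # branch x_j = 0
        stack.append({**fixed, j: 1})  # branch x_j = 1
    return best_value, best_x, explored


value, x, explored = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
print(value, x, explored)  # value 220 with x = [0, 1, 1]
```

The third returned number counts the sub-problems explored, which is the complexity measure used throughout the paper.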

4. Variable sum equality
A constraint of the form $x_1 + x_2 + \dots + x_n = \ell$, where $\ell$ is an integer, is called a variable sum equality. Note that the coefficients are only ones. This equality is not new and has been used as a clique inequality in general integer programming. Variable sum equalities can be generated for (1).
where SINT stands for the smallest integer. The objective bound $Z_B^0$ can be found as (3) and can be expressed as (4).
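As a rough illustration of how an objective bound of this kind can be computed, the sketch below makes several assumptions that are not confirmed by the text: the knapsack is taken to be of the form minimize $c_1x_1 + \dots + c_nx_n$ subject to $a_1x_1 + \dots + a_nx_n \ge b$ with positive integer data, and the bound is taken to be the cheapest solution that uses a single variable, with SINT read as rounding up to the next integer. The function name `objective_bound` and the data are hypothetical.

```python
def objective_bound(c, a, b):
    """Return (bound, j): the cheapest single-variable feasible solution.

    Assumes the form: minimize c.x subject to a.x >= b, x >= 0 and integer,
    with positive integer coefficients (an assumption of this sketch).
    """
    best = None
    for j, (cj, aj) in enumerate(zip(c, a)):
        units = -(-b // aj)  # smallest integer >= b / a_j
        value = cj * units   # cost of covering b with variable j alone
        if best is None or value < best[0]:
            best = (value, j)
    return best


# Hypothetical data, not the paper's numerical illustration.
c = [5, 7, 3]
a = [4, 6, 2]
b = 13
z_bound, j_star = objective_bound(c, a, b)
print(z_bound, j_star)  # 20 0
```

Because the single-variable solution is feasible, its cost is an upper limit on the optimal objective value of the minimization problem.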
The variable sum bounds ($\ell_1$ and $\ell_k$), which are integers, can now be determined once the objective bound is known. These two integral bounds satisfy (5).
The two variable sum bounds may be found by solving the following two linear programming models (6), (7).
Maximize $\ell_k = x_1 + x_2 + \dots + x_n$, where $x_j$ is integer.
The variable sum equality was used recently in [17] to improve the optimality verification process. If parallel processors are available, the two models can be solved at the same time; otherwise they can be solved as the combined problem given in (8).
Maximize $y_1 + y_2 + \dots + y_n$, where $x_j, y_j \ge 0$ are the unknown variables. In this case the y variables are used for the second problem.
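The two variable sum bounds can be illustrated on a small instance. The sketch below does not solve the linear programming models; as a stand-in it enumerates the integer points directly, which is only feasible for tiny problems. It assumes the form minimize $c \cdot x$ subject to $a \cdot x \ge b$ with positive integer data, and it assumes that only points whose cost does not exceed the objective bound need to be considered. The function name `variable_sum_bounds` and the data are hypothetical.

```python
from itertools import product


def variable_sum_bounds(c, a, b, z_bound):
    """Return (smallest, largest) value of x_1 + ... + x_n over the integer
    points with a.x >= b and c.x <= z_bound (brute-force enumeration)."""
    # c.x <= z_bound with positive costs limits each x_j to z_bound // c_j.
    ranges = [range(z_bound // cj + 1) for cj in c]
    sums = [
        sum(x)
        for x in product(*ranges)
        if sum(aj * xj for aj, xj in zip(a, x)) >= b
        and sum(cj * xj for cj, xj in zip(c, x)) <= z_bound
    ]
    return min(sums), max(sums)


# Hypothetical data; 20 is the single-variable bound min_j c_j * ceil(b / a_j).
c = [5, 7, 3]
a = [4, 6, 2]
b = 13
lower, upper = variable_sum_bounds(c, a, b, 20)
print(lower, upper)  # 3 6
```

For this instance the variable sum can only take the values 3, 4, 5 and 6, so four initial branches would be generated.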

5. Initial branches
Once the variable sum bounds have been determined, the variable sum constraints can be constructed as given in (9). Each variable sum equality is an initial branch for the branch and bound procedure, which implies that the knapsack problem has k initial branches to be explored. The branches are shown in Fig. 1. The k initial branches illustrated in Fig. 1 can be explored independently, thus allowing the use of parallel processors.

6. Two additional constraints
Two additional binding constraints can be constructed and added to the original knapsack problem so that the complexity is reduced further. If the variable giving the objective bound is $x_j$, then an additional variable $x_{n+1}$ can be introduced such that (10) holds, i.e.
$x_1 + x_2 + \dots + x_n - x_{n+1} = 0$. (11)
The variable $x_j$ is excluded from the sum of variables in (11). Note that (11) is obtained by rearranging the variables and that the two constraints (10) and (11) have only 0s and ±1s as coefficients. The addition of these two constraints to each branch significantly reduces the complexity of the problem.

1. Numerical illustration
where $x_j \ge 0$ and integer $\forall j$. This is a very small problem, and the 1,351 sub-problems used to verify the optimal solution are too many.
There is definitely a need to preprocess the knapsack linear problem before solving it by the branch and bound method. In this paper, variable sum constraints and additional constraints are generated and added to the original problem, which is then solved.

2. Reformulation procedure
Given any knapsack linear integer problem of the form
Minimize $Z = c_1 x_1 + c_2 x_2 + \dots + c_n x_n$.
The k initial sub-problems can be solved independently and the optimal solution is the best solution (in terms of objective value) from the k sub-problems.
Electronic copy available at: https://ssrn.com/abstract=3708287

3. Algorithm
In other words the knapsack linear integer problem is solved using the following steps.
Step 1: Determine the objective bound $Z_B^0$.
Step 2: Use the objective bound to generate the variable sum limits $\ell_1, \ell_2, \ldots, \ell_k$ and additional constraints.
Step 3: Construct the k initial sub-problems.
Step 4: Solve the k sub-problems to obtain the optimal solution as the best solution from the k sub-problems in terms of the objective value.
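The four steps above can be sketched end to end on a tiny instance. This is a hedged illustration rather than the paper's procedure: it assumes the form minimize $c \cdot x$ subject to $a \cdot x \ge b$ with positive integer data, takes the objective bound to be the cheapest single-variable solution, omits the additional constraints, and solves each sub-problem by brute-force enumeration instead of branch and bound. The function names and the data are hypothetical.

```python
from itertools import product


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve_by_reformulation(c, a, b):
    """Return (best value, best x, per-branch results) for
    minimize c.x subject to a.x >= b, x >= 0 and integer."""
    # Step 1: objective bound from the cheapest single-variable solution (assumed form).
    z_bound = min(cj * -(-b // aj) for cj, aj in zip(c, a))

    # Integer points that are feasible and no worse than the bound.
    ranges = [range(z_bound // cj + 1) for cj in c]
    points = [x for x in product(*ranges)
              if dot(a, x) >= b and dot(c, x) <= z_bound]

    # Step 2: variable sum limits are all integers between the two bounds.
    lower = min(sum(x) for x in points)
    upper = max(sum(x) for x in points)

    # Steps 3 and 4: one sub-problem per variable sum equality, solved independently.
    branches = {}
    for limit in range(lower, upper + 1):
        branch = [x for x in points if sum(x) == limit]
        if branch:
            best = min(branch, key=lambda x: dot(c, x))
            branches[limit] = (dot(c, best), best)

    best_value, best_x = min(branches.values())
    return best_value, best_x, branches


# Hypothetical data, not the paper's numerical illustration.
best_value, best_x, branches = solve_by_reformulation([5, 7, 3], [4, 6, 2], 13)
print(best_value, best_x)
for limit, (value, x) in sorted(branches.items()):
    print("variable sum", limit, "-> best value", value, "at", x)
```

Each loop iteration touches only its own branch, so the sub-problems could be dispatched to separate processors, which is the parallelism the paper refers to.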

4. Using the numerical illustration from 1
Minimize: where $x_j \ge 0$ and integer $\forall j$.
i. e.
x x x x x x x x x x x x x x = + + + + + .
The initial branches and their corresponding numbers of sub-problems are given in Table 1.

Table 1 No. of sub-problems for each branch

where $x_j = 0$ or $1$ $\forall j$ and n is even. The behaviour of the standard branch and bound method for n = 4, 6, 8, 16, …, is given in Table 2. The number of sub-problems increases exponentially as n increases. This shows that branch and bound on its own is not a very good approach.

2. Second class of bizarre knapsack problems
This is a modification of Class 7.1, and the general form is given in (21).
Maximize: or minimize $x_n$. Such that: where $x_j = 0$ or $1$ $\forall j$, $3 \le \kappa \le n - 1$, $\kappa$ is odd and n is even. The bizarre behaviour of the branch and bound method is given in Table 3.

Table 3 Complexity of the problem as n increases

These are small problems and the branch and bound method is not expected to struggle to solve them.

3. Third class of knapsack problem pure integer case
Changing of variables from binary to pure integer in any difficult knapsack problem automatically increases the complexity of the problem.
Maximize: Such that: where $x_j \ge 0$, integer $\forall j$, $1 \le \kappa \le n - 1$, $\kappa$ is odd and n is even. The behaviour of the branch and bound method for n = 4, 6, 8 and 16 for this class of difficult problems is given in Table 4.

Table 4 Complexity of the problem as n increases

Changing from binary to general integer means expanding the problem. The number of sub-problems increases and this is expected.
The standard branch and bound method cannot solve most of these problems for large values of κ. For example, a knapsack problem with the parameters n = 4, k = 91, λ = 97 explodes to an unmanageable number of sub-problems.
Minimize $Z = x_n$. Such that: where $x_j \ge 0$, integer $\forall j$. The branch and bound method requires 7449 sub-problems to verify the optimal solution. For large values of λ the knapsack problems are very difficult to solve by the standard branch and bound algorithm on its own. These numerical illustrations and more on the complexity of knapsack problems and other linear integer models are given in [9].

Discussion of experimental results
The knapsack problem has been reformulated and a numerical illustration is used to show the reformulation process.
The numerical illustration is given in Section 5. Branch and bound on its own took 1,351 sub-problems to verify optimality. The same knapsack problem is reformulated into 10 parallel problems which can be solved independently, as given in Section 5.4. The reformulated knapsack problem is then solved by the branch and bound algorithm. The number of iterations required to verify optimality ranges from 3 to 11 for the parallel problems, as given in Table 1. Reducing the complexity from 1,351 to a worst case of 11 is a very significant improvement. The branch and bound algorithm is a general purpose algorithm for solving the general linear integer problem. Unfortunately this approach on its own has serious weaknesses, as presented in Section 6 in Tables 2 to 4.
The reformulated knapsack can be identified by the following features. The new knapsack problem is split into several independent parallel problems. The number of constraints increases from 1 to 5 for each parallel problem. The 4 new constraints for each parallel problem include an objective bound. Splitting the problem into many parallel problems, increasing the number of constraints from only 1 to 5 and increasing the number of variables by 2 for each split problem are the weaknesses of the proposed approach. The most important feature of the new problem is that it is easier to solve by the branch and bound algorithm than the original single constraint knapsack form.
There is a need to compare the proposed approach with other methods. The absence of such a comparison is a limitation and a shortcoming of this study. What seems to be an obvious weakness is that the reformulation splits the single problem into many problems, although each is easier to solve. The challenge of splitting the problem into parallel independent sub-problems can be alleviated by the use of parallel computer processors. There is a need to further reduce the number of branch and bound iterations needed to solve each sub-problem. The main challenge with this is that the complexity of the general integer problem increases with an increase in the number of variables.

Conclusions
1. An objective bound ($Z_B^0$) to the knapsack problem was determined. The objective bound is the initial upper limit to the objective value of the problem. In this study all the n given variables in the original knapsack problem were used in determining the objective bound.
2. Once the objective bound was determined it became easy to generate the k variable sum limits $\ell_1, \ell_2, \ldots, \ell_k$ and the 2 additional constraints. To do this, only $\ell_1$ and $\ell_k$ were calculated, and the rest were generated as all the integers between $\ell_1$ and $\ell_k$. The first additional constraint was easily generated from the objective bound and objective row and the other two constraints were generated from the variable sum limits.
3. The k parallel initial sub-problems were constructed from the k variable sum limits $\ell_1, \ell_2, \ldots, \ell_k$ and 3 additional constraints. Even though the original knapsack problem looked simpler than each of the k parallel sub-problems, the reformulated parallel problems were easier to solve than the original problem, as attested to by the numerical illustration. The available computing power in the form of parallel processing can be taken advantage of.
4. The numerical illustration was shown in Section 5. Reformulation is the way forward for a knapsack problem of the form given in this paper.
5. Classes of difficult problems were presented in this study. There is a need for more research on knapsack problems, as shown by the various applications.