
Author: Amund Tveit (amundt@idi.ntnu.no)  December 1997
The most used method of finding logical rules from data, inductive logic programming (ILP), has shown successful, but unfortunately not very scalable with increasing problem size. In this report a model for doing induction of logical rules, using the concepts of the potentially more scalable method of genetic algorithm, is suggested.
Five strategies of reducing the search space in the representation are suggested: pruning by logical entailment, pruning by integrity constraints, pruning by logic factorization, pruning by range restrictedness, and pruning using a heuristic fitness function on the cohesion of literals. The genetic operators suggested are applying these pruning search strategies.
The model has yet to be implemented and tried out in an experimental setting.
Table 6.1 Background Knowledge 17
Table 6.2 Positive and negative training examples 18
Table 7.1 Effect of integrity constraint pruning of gilp(n) 28
Table 9.1  The GILP calculation programs 35
Table 9.2  The GILP system 35
Introduction
Genetic Algorithms (GA) and Inductive Logic Programming (ILP) are both research fields of Artificial Intelligence (AI). Both fields are quite young, and have experienced considerable growth during the last decade. ILP (See Bergadano F., Gunetti D. (1996), Inductive Logic Programming  From,See Lavrac N., Dzeroski S. (1993), Inductive Logic Programming  Techniques and,See NienhuysCheng S.H., Wolf R. de (1997), Foundations of Inductive Logic) is a very specific field, aimed at inductively constructing logic programs, whereas GA (See Heitkoetter J. (1997), The HitchHiker's Guide to Evolutionary Computation,,See Koza J.R. (1992), Genetic Programming  On the Programming of Computers) is a broader field aimed at
solving different types of problems inspired by evolution of species in nature.
More and more information is being stored in databases around the world. As the amount of information stored electronically rises, so does the wish to get something more out of the data than just being able to retrieve what was entered. From this wish, the fields of Knowledge Discovery in Databases (KDD) and the related field Data Mining emerged (See Fayyad U.M., PiatetskyShapiro G., Smyth P., Uthurusamy R. (1996), Advances in).
Central in KDD is the development of efficient computerized methods to analyze databases, and ILP can be used to do that with partial success (See Alfheim E., Tveit A. (1996), Inductive Logic Programming Applied in).
Example of results after data mining (See Palmeri C.(1997), Believe in yourself, believe in the merchandise,):
WalMart knows that customers who buy Barbie dolls (it sells one every 20 seconds)
have a 60% likelihood of buying one of three types of candy bars. What does WalMart
do with information like that? "I don't have a clue," says WalMart's chief of
merchandising, Lee Scott.
Fortunately, this is not a typical example of a data mining and its interpretations. More
useful applications of data mining include: finding patterns in molecular structures, global climate change models, and numerous of others from science and business (See Fayyad U.M., PiatetskyShapiro G., Smyth P., Uthurusamy R. (1996), Advances in).
One problem with ILP systems is the performance, its search strategies are to not known to efficiently utilize modern parallel computer systems. From this wish, and from the success of genetic algorithms and evolutionary methods (See Haynes T.D., Schoenfeld D.A., Wainwrigth R.L. (1996), Type Inheritance in,See Levine D. (1994), A Parallel Genetic Algorithm for the Set Partitioning Problem,,See Chu P.C., Beasley J.E. (1997), A Genetic Algorithm for the Multidimensional,See Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms: and See Koza J.R. (1996), Future Work and Practical Applications of Genetic), the main
goal for this thesis is to create methods of solving ILP problems in GA setting.
Prerequisites
This report assumes that the reader has a basic knowledge of logic programming,
algorithmitic complexity analysis, and artificial intelligence. For logic and logic
programming a book such as See Nilsson U., Matuszynski J. (1990), Logic, Programming and Prolog, Wiley should cover the required material. For algorithmic complexity analysis a book such as See Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms, (and possibly ) is recommended. For the artificial intelligence part, the book by Russel and Norvig (See Russel S., Norvig P.(1995), Artificial Intelligence  A modern approach,) is recommended.
Structure of this document
Chapter 3 describes graphs, hypergraphs, and how horn clauses can be modelled by
hypergraphs with the corresponding matrix representation. Chapter 4 describes mathematical programming, and how it has been applied to solve combinatorical integer problems which regarding to constraints and variables are similar to the hypergraph model of horn clauses. Chapter 5 describes evolutionary computation and genetic algorithms and programming.Chapter 6 describes inductive logic programming and some characteristics of logic
programs. Chapter 7 describes a horn clause vector model based on the hypergraph model for horn clauses, and suggests and discusses ways of pruning the vector model's search space. Chapter 8 suggests genetic algorithms based on the results from chapter 7. Chapter 9
gives a brief overview of the implementations done. Chapter 10 presents the conclusion and suggestions to further research.
Purpose
The motivation for performing the experiments in this document is threefold. First I wanted
to investigate a hypergraph matrix model for representation of horn clauses, and if possible improve it towards being applicable in a genetic algorithm (GA) setting. Second, suggest GA operators, and how to create good initial GA population, both regarding to efficient pruning of the search space. The main purpose for the GA operators is to do a stochastic search in a possible set of horn clause hypotheses and to get the best horn clause for a a set of both positive and negative examples, background knowledge and facts. This is similar to the goal of inductive logic programming . Third, implement the suggested GA methods and test them using standard problems known from inductive logic programming.
Graphs, Hypergraphs and Horn Logic
This chapter describes graphs, hypergraphs and applications of those to horn logic.
Horn logic representation
The representation of a relation like (See Sperschneider V., Antoniou G. (1991), Logic A Foundation for Computer Science,)
P if Q1 and Q2 and .. and Qn
is called a Horn Clause (after Horn who studied them). Logic programming language implementations are usually based on Horn Clause representation of relations for efficiency reasons. Prolog is the most well known logic programming language based on Horn Logic representation, an example of syntax for a clause is shown below.
grandfather(X,Y) : father(X, Z), father(Z, Y).
Graphs
A graph is a set of points and a set of arrows, where each arrow is joining a pair of points. The points are called the vertices of the graph, and the arrows are called the arcs or the edges of the graph.
A graph G consisting of a set of vertices V and a set of edges E can be defined formally as shown below.
Formal definition of a graph G
There are two standard ways to represent a graph, either as an adjacency matrix or as a collection of adjacency lists.
The adjacency matrix for the example graph in See Example of a graphis shown in See Adjancy matrix. Each row represents the start vertex, and each column the stop vertex for an edge, the edge between the vertices m and n exists if position (m,n) in the matrix is 1.
The collection of adjacency lists for the example graph in See Example of a graphis shown in See Adjancy lists. Here each list belongs to a vertex, and the list is ordered based on the vertices the vertex has an edge to.
Space usage for an adjacency matrix representation is
and the space usage for a
collection of adjacency lists is
. When the graph is dense
a matrix representation is favorable, but in a practical problem the graph is usually sparse
and a collection of adjacency lists is the only applicable way of representing the graph.
Examples of problems which can modeled using graphs: topological sorting in planning problems, minimum span tree in electronic circuit layout, shortest path problems.
Hypergraphs
A hypergraph is a generalization of a graph, allowing edges to be curves connecting more than two vertices (See Berge C. (1976), Graphs and Hypergraphs, NorthHolland). A dual representation of a logic clause (See Nytrø, Ø. (1992), Aspects of structure in Horn Clause programs,) like
grandfather(Y,X) :uncle(Z,X), parent(W,X), brothers(Z, W, Q), father(Y, Q).
is a hypergraph as shown in See Example of a hypergraph. The variables are represented as vertices and the predicates as edges.
A matrix hypergraph representation of horn logic clauses
A possible hypergraph representation for the clause below could be to have a matrix with each column representing parameter positions from left to right, this example will have 13 such columns. Each row in this representation is 1 if a variable number rownum is selected for a column and 0 otherwise.
grandfather(X0,X1),brothers(X2,X3,X4) : father(X0, X2), father(X2,X1),uncle(X3,X1),brother(X2,X4).
The matrix using this representation is shown in See Hypergraph matrix representation of clause. This is a very sparse matrix because of the natural constraint that you can not have more than one variable per parameter position in the clause. Another natural constraint is that the sum of 1s for each row is constrained by the rowlength. These constraints are formally (for a zero indexed n*n matrix A with elements aij) shown in See Hypergraph matrix constraints
Hypergraph matrix constraints
To make this a complete hypergraph representation information about which vertices (variables; Xi for aij=1, i is the row and j is the column number,
) belongs to which edges is needed (predicates, i.e the predicate brothers/3 map to parameter 2, 3 and 4).
Mathematical programming
Mathematical programming is `programming' in the sense of `planning' (See Williams H.P. (1993), Model building in mathematical programming,).
The common feature which mathematical programming models have is that they all involve optimization, either by maximizing or minimizing an objective function within certain
constraints. Mathematical programming models can be classified into linear programming models, nonlinear programming models, integer programming models or combinations of those.
Due to the integer nature of the hypergraph matrix representation for horn logic, only linear and integer programming model will be described here. For a thorough description of the other types of mathematical programming see See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles and See Williams H.P. (1993), Model building in mathematical programming,.
Linear programming models
1. decision variables involved in the problem are nonnegative (i.e. pos. or zero)
2. The criterion for selecting the "best" values of the decision variables can be
described by a linear function of these variables, that is, mathematically function
involving only the first powers of the variables with no cross products. The criterion
function is normally referred to as the objective function.
3. The operating rules governing the process (e.g., scarcity of resources) can be
expressed as a set of linear equations or linear equalities. This set is referred to as the
constraint set.
An example of a general LP minimization problem is shown below
Example of a LP minimization problem
Linear Programming Solution Methods
The most used method to solve LP problems is the simplex method (or variants of it)
developed by G.B. Dantzig. The simplex method is an iterative procedure for solving LP problems in standard form. To define the principles of the simplex methods two definitions are needed to describe the state of the LP system.
1. Basic variables (variables with value greater than 0) with coefficients different
from zero only exists in one of the equality constraints
2. Maximum of one such nonzero basic variable coefficient per equality constraint.
1. The objective is of the maximization or minimization type.
2. All constraints are expressed as equations.
3. All variables are restricted to be nonnegative
4. The righthand side of each constraint is nonnegative.
Principles of the Simplex Method
1. Start with an initial basic feasible solution in Canonical form
2. Improve the initial solution if possible by finding another basic feasible solution
with a better objective function value. At this step the simplex method implicitly
eliminates from consideration all those basic feasible solutions whose objective
function values are worse than the presented one.
3. Continue to find a better basic feasible solutions improving the objective function
values. When a particular basic feasible solution cannot be improved any further, it
becomes an optimal solution and the simplex method terminates.
Integer Programming Models
An integer programming model is a specialized LP model.
Requirements to be a Pure Integer Problem (abbreviated PIP)
1. The requirements in See Requirements to be a Linear Programming (abbreviated LP) ([21])Themust be satisfied
2. The possible values of the decision variables must be in the domain of natural
numbers including zero.
Examples of problems which can modeled as a PIP : the set partitioning problem,
the knapsack problem, the transport problem and the assignment problem.
Some classic IP problems and their solutions
The transport problem
The transport problem is minimizing the cost of transportation from a set of suppliers to a set of users given constraints like supply limits and that the demand requirements need to be fulfilled.
This problem can be solved by applying several (heuristic) strategies (See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles), i.e. the northwest corner rule, the least cost rule or Vogel's approximation method. These methods have all in common that they represent the cost matrix as a table, where each row corresponds to a supplier and each column to a user (demand). Each cell in the table is the cost of transport from a supplier to a user. The differences between the methods is the strategy of variable selection. These methods does not guarantee an optimal solution, optimality can be checked by applying the steppingstone method. A method that guarantees an optimal solution is the
uv method, this method uses optimality checks (similar to the ones in the simplex method) for each variable it takes into the basis until an optimal solution is found. If an optimal
solution doesn't exist, it will degenerate.
The assignment problem
The assignment problem is minizing the cost of allocating a number of items of one type for a number of items of another type, i.e. allocating machines to jobs (See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles).
This problem can be solved either by transforming it to a transport problem, or more
efficiently, by applying the hungarian method. This method is based on that there is no change in the optimal solution of a linear program by adding or subtracting a constant to the objective function, this is done by subtracting a sufficiently large number from rows and columns in the cost matrix until a solution is found by inspection.
The shortest path problem
The shortest path problem is to find the shortest path between nodes in a network (the path length can be seen as the cost).
To find the shortest path between two nodes in a network, Dijkstra's algorithm can be applied. To find the path between several nodes in sparse graph, Johnson's algorithm is
preferred (See Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms,).
Solution methods for general IP problems
The branch and bound algorithm
The branch and bound algorithm is the most commonly used algorithm to solve IP problems in practice (See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles, See Williams H.P. (1993), Model building in mathematical programming,).
It is based on first solving the IP problem with the simplex method as if it was a regular LP problem, then select a variable with a noninteger value and create two subproblems added new constraints for the selected variable, if x1 (=7.6) is the selected variable from an optimal LP basis, the first subproblem will get the constraint x1 <= 7, and the second
subproblem will get the constraint x1 >= 8. Then solve the new subproblems. If the
subproblems don't give an integer solution, branch on a new variable.
The first feasible integer solution is the greatest lower bound for the optimal solution, and nodes with values less than the least upper bound are fathomed, that is, any integer solution
from subnodes of these are going to be less than the greatest lower bound. Further
branching on such nodes is thereby unnecessary since a better solution can't be obtained. If a node gets an unfeasible solution, it is also fathomed, and no further subproblems are investigated. Branch and bound terminates when there are no more open nodes in the
problem tree, then the largest integer solution in the tree is the optimal one. (There is though possible to have a branch and bound tree with only noninteger solutions and unfeasible solutions, which means a feasible integer solution couldn't be found)
Selection of which variables to branch on is usually done by either selecting the variable with the largest fractional value, the variable with the relatively largest fractional value (i.e. if x1=1.3 and x3 = 4.3, x1 will be selected), or selection of the most important variable.
In addition to the LP relaxation of an IP problem solved by branch and bound, other
relaxations like lagrangean, surrogate and composite relaxations which gives tighter
constraints than the LP ones, have been used together in the branch and bound algorithm
(solving of the multidimensional knapsack problem, see See Chu P.C., Beasley J.E. (1997), A Genetic Algorithm for the Multidimensional and See Gavish B., Pirkul H. (1985), Efficient algorithms for solving multiconstraint)
Gomory's method
Gomory's method (See Nygreen B. (1980), Heltallsproblemer (Integer Problems), Lecture book in) is similar to branch and bound since it is adding new constraints and using LP relaxation, the difference is that it is also adding new variables, thereby increasing the dimension of the basic matrix and the number of calculations per new
solution. If there is both an upper and a lower limit for the objective function, Gomory's method is proved to have convergence.
Dynamic Programming
Dynamic programming solves problems by combining the solution of subproblems (See Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms,). It is only applicable when subproblems are not independent, that is, they share subsub
problems. Each time a new subproblem is solved, its solution is stored such that other subproblems sharing the stored subproblem can use the stored value instead of doing a
recalculation, thereby saving work compared to applying the divideandconquer principle on the same problem which would have recalculated everything.
The development of a dynamic programming algorithm is done in four steps (See Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms,)
Characterize the structure of an optimal solution

Recursively define the value of an optimal solution

Compute the value of an optimal solution in a bottomup fashion

Construct an optimal solution from computed information
Dynamic programming have been applied to integer problems like the knapsack problem (See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles), the shortestpath problem (Dijkstras algorithm can be formulated as a dynamic
programming algorithm), and in speech recognition in combination with Hidden Markov Models (See Russel S., Norvig P.(1995), Artificial Intelligence  A modern approach,).
Dynamic programming has also been applied in nonlinear integer programming (See Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles)
Evolutionary Computation
Evolutionary Computation (EC) is usually divided into five different methods (See Heitkoetter J. (1997), The HitchHiker's Guide to Evolutionary Computation,); Genetic Algorithms (GA), Evolutionary Programming (EP), Evolution Strategies (ES), Classifier Systems (CFS), Probalistic Incremental Program Evolution (PIPE) and Genetic
Programming (and there also exist some less well known problem solving methods more or less connected to EC).
All the methods have their strategies motivated by Darwin's principles about the means of natural selection, survival of the fittest, and the theories of evolution. The methods don't have strategies for the simulation of mutual influence, like in that of a "society". This
strategy is though supported by the related field artificial life.
Since this thesis is to mostly related to GA and GP, only these method will be described here. (See See Heitkoetter J. (1997), The HitchHiker's Guide to Evolutionary Computation, for a description of the other methods)
Genetic Algorithms
Genetic Algorithms is a model of machine learning inspired by some of the mechanisms of evolution in nature. It is usually applied to optimization problems, similar to integer
programming (section See Integer Programming Models). The model consists of a set of character strings, called a
population of individuals represented by chromosomes, and a set of operators for those character strings, called genetic operators. The most common chromosome type is a fixed length string of binary values. An overview of the traditional GA algorithm is shown in
See Flowchart for the traditional Genetic Algorithm(See Koza J.R. (1992), Genetic Programming  On the Programming of Computers). Each time all the chromosome individuals in the population have been used together with an operator, the population is going from one generation to the next.
The fitness measure function
The fitness measure function gives a value for a chromosome, usually scaled to a value between 0 to 1 (if possible). If solving a mathematical linear programming problem like See Example of a LP minimization problem (assuming integer variable requirements), the fitness function could be equal to the negated value of the objective function, such that it is transformed into a GA maximization problem. However, there is nothing wrong in having the objective function as the GA
fitness function, but that requires the selection criterias to be changed such that the element with the least fitness value is most likely to survive.
Operators
Application of an operator on a chromosome in the population is done stochastically, with certain probabilities for application of each of the operators.
The Reproduction Operator
Depending on the fitness measure or estimate, the chromosome will survive unchanged from one generation to another, this is controlled by the reproduction operator.
The Crossover Operator
Depending on fitness for three randomly selected chromosomes from the population, the two fittest of them will be combined systematically (crossed) to create a new
chromosome replacing the least fittest chromosome. There also exists other selection
criterias, i.e. one where two fit (relative to the rest of the population) are selected, then
crossovered and the least fit element of the two "parents" are selected for replacement.
This operator is contributing by combining fit elements to possibly get an even fitter one, though depending on how the crossover is done.
Differences between GA and GP
The main differences between GA and Genetic Programming (GP) is that GP extends the representation, such that program structures and not only character strings can be
represented (See Koza J.R. (1992), Genetic Programming  On the Programming of Computers). Another difference is that GP usually doesn't allow mutation, but allows growing of the chromosome in size (that is, program size), i.e. a crossover operation in GP will usually cause the chromosome to grow in size. Since programs can be represented directly in the structure there is usually no preprocessing of inputs or postprocessing of outputs required, the inputs, intermediate results and outputs are usually expressed in the terms of the natural terminology of the problem (See Koza J.R. (1992), Genetic Programming  On the Programming of Computers).
Genetic Programming and LISP
Genetic Programming applications frequently use LISP (See Haynes T.D., Schoenfeld D.A., Wainwrigth R.L. (1996), Type Inheritance in, See Koza J.R. (1992), Genetic Programming  On the Programming of Computers) as the main chromosome
representation language, one reason for the selection of LISP is that both programs and data have the same form, Sexpressions. This makes it possible to do genetic manipulation of the program (as data), and then execute it to check fitness (as a program).
The other main reason for selection of LISP, is that LISP's Sexpressions are equivalent to the parse tree for the computer program, this parse tree is needed to make syntactically
correct genetic changes in the program. In other languages this parse tree is usually not accessible, and if it were accessible it would have another syntax and representation than the language itself.
Data Structures in Genetic Programming
Traditional GP doesn't support data structures known from software engineering, i.e. queues, stacks, lists and more complex data structures. Since a human programmer is used to data structures like those, there is a gap between programs generated by GP and the ones a human programmer is used to handle. By allowing the mentioned data structures the code generated by GP will also be more compact and being able to express more information in a shorter length. William B. Langdon was able to evolve a stack structure (See Langdon W.B. (1995), Evolving Data Structures with Genetic Programming,, See Langdon W.B. (1996), Using Data Structures within Genetic Programming,) using GP where each individual consisted of five trees representing five stack operations: makenull, top, pop, push and empty. The GP primitives used to develop the operations were: the
constants 0 and 1, max(10), the variable aux, the operations Inc_Aux and Dec_Aux to increase and decrease the variable aux.
Strongly Typed Genetic Programming
To decrease the search space and thereby possibly increase performance in GP, Strongly Type Genetic Programming (STGP) can be applied. This is done by creating inheritance hierarchies of types with a maximum tree depth (See Haynes T.D., Schoenfeld D.A., Wainwrigth R.L. (1996), Type Inheritance in), and by using these types, a
syntactically correct tree will be created.
Populations and generations in GA and GP
A good rule of thumb for an initial population is that it should have a broad selection of individiual chromosomes, preferebly all uniqe, but there is usually no rule on how large a
population should be, but there has been a tendency towards big populations and few
generations. Gathercole and Ross (See Gathercole C., Ross P. (1997), Small populations over Many Generations can beat) has shown that for some problems that there is an advance in having a relatively small population, but instead having more generations and apply Dynamic Subset Selection (DSS) and Limited Error Fitness (LEF) learning methods in addition to GP. One argument in which they explain the results, is that there might be a larger probability for many small increments in fitness per generation in a GP solution with a small population, compared to the probability of large increments per generation in a GP solution with a large population.
Examples of applications of GA and GP
GA and GP has been shown to give good (though not guaranteed optimal) solutions on NPcomplete problems like the facility layout problems (See GarcesPerez J., Schoenefeld D.A., Wainwright R.L. (1994), Solving Facility), the multidimensional knapsack problem (See Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms:) and the setpartitioning problem (See Levine D. (1994), A Parallel Genetic Algorithm for the Set Partitioning Problem,,See Chu P.C., Beasley J.E. (1997), A Genetic Algorithm for the Multidimensional). These are problems previously solved using mathematical programming concepts like LP or IP (see chapter See Mathematical programming). In
automatic discovering of protein classification rules (the DEAD box family of proteins,
see See Koza J.R. (1996), Future Work and Practical Applications of Genetic), rules discovered by GP is better than any known rules found by other methods or by human experts. GA has also been applied to the autoparallelization of algorithms (See Koza J.R. (1996), Future Work and Practical Applications of Genetic), this is a highly relevant problem, since new processors will probably rely on "smarter"
compilers to create instructions to be run in parallel (i.e. HP and Intel's planned Merced CPU, see Byte magazine december 1997). Genetic Algorithms in the form of evolvable hardware applying devices like field programmable gate arrays (FPGA), have been applied to evolve a frequency discriminator circuit and a robot controller, this field is expected an explosive growth (See Koza J.R. (1996), Future Work and Practical Applications of Genetic)
Inductive Logic Programming
Inductive logic programming (ILP) can be seen as the intersection between logic programming and inductive machine learning (See Muggleton S., Raedt L.D.(1994), Inductive Logic Programming: Theory).
Machine learning is an area within AI which explores how computers can learn (See Midelfart, H. (1996), Knowledge Discovery in Databases applied in Inductive). One motive for developing ILP theory was the lack of expressiveness in machine learning.
There have been attempts to solve this problem by adding a 1.order logic framework from logic programming with possibilities to use substantial background knowledge and
recursion to ILP.
ILP can be explained more precisely by first describing the concepts of logic programming and induction separately. The main idea of logic programming (LP) is to use a computer to draw conclusions from declarative descriptions. A logic program is a finite set of logic
formulas describing facts and relations. LP's main principle of inference is deduction, which is the derivation from facts and general rules to new facts. An algorithm for
mechanical deduction called resolution is used.
Induction is described as "the act, process, or result or an instance of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal" in
Webster's dictionary. This can be summarized into that induction is the creation of general rules based on facts and examples.
ILP is divided into two major sub fields, interactive ILP and empirical ILP. Interactive ILP is most often used on relatively small data sets and the user has possibilities to participate during learning of clauses. Empirical ILP is automatic extraction of regularities from big and possibly imperfect data sets without user interaction (See Lavrac N., Dzeroski S. (1993), Inductive Logic Programming  Techniques and).
Some areas where ILP is applied: drugs' structure activity rule discoveries, learning rules for predicting protein secondary structure, automated program generation, learning rules for chess playing and in general as a tool for data mining and knowledge discovery in databases (See Lavrac N., Dzeroski S. (1993), Inductive Logic Programming  Techniques and).
Example of ILP
Here is an example of the facts, examples and rules which inductive logic programming can start to induce from.
Background Knowledge
Background
Knowledge


mother(eve,sue).
mother(ann,tom).
father(ann,pat).
father(tom,sue).
parent(X,Y)
mother(X,Y)
parent(X,Y)
father(X,Y)

female(ann).
male(pat).
female(sue).
male(tom).
female(eve).

Positive and negative training examples
Positive training examples

Negative training examples

+ daughter(sue,eve).
+ daughter(ann,pat).

 daughter(tom,ann).
 daughter(eve,ann).

In See Background Knowledgethe known background information is represented, e.g., it is not possible to be a mother without being a parent. In See Positive and negative training examplesrepresents true and false observations. The
reason why ILP algorithms need negative examples is to specialize a generated rule that is inconsistent with negative examples.
The task here is to learn the relation daughter(X,Y) with the background knowledge
predicates female and parent. Using ILP methods the concluded rule, called a hypothesis is shown below
daughter(X,Y)
female(X), parent(X,Y).
This inference is not deductively sound as we have generalized beyond what was known with certainty.
Properties of a logic program
The set of positive examples is in this setting called E+, and correspondingly for negative examples, E.
A logic program P is complete (with respect to E+) iff, for all examples e
E+, P  e.
A logic program is consistent (with respect to E) iff, for no example e
E+, P  e.
A horn clause C is rangerestricted iff every variable in the head of C is found in
the body of C.
Basic ILP Techniques
Theta  subsumption
The
 subsumption lattice induces a generality ordering on clauses.
Inverse resolution
Inverse resolution (See Fayyad U.M., PiatetskyShapiro G., Smyth P., Uthurusamy R. (1996), Advances in) is used to produce general clauses from specific examples by
inverting the resolution rule of deductive inference. For generalization inverse substitution is used, which mean replacing constants with variables and at the same time ensuring that the original example(s) can be restored by ordinary substitution.
Relative Least General Generalization (rlgg)
Least general generalization (lgg) of two clauses is such that by applying two substitutions on the same variable in the lgg, the two clauses can be regenerated, e.g.
C1 = daughter(mary,ann)
female(mary), parent(ann,mary)
and
C2 = daughter(eve,tom)
female(eve), parent(tom,eve)
then the least general generalization of the two clauses 
lgg(C1,C2) = daughter(X,Y)
female(X), parent(Y,X)
Relative least general generalization is the least general clause more general than both C1 and C2 regarding the background knowledge B.
ILP Applications
Most ILP applications use topdown searching, starting with the most general hypothesis and it clause until it supports none negative examples.
Progol is currently the best performing ILP system, it uses topdown searching like MIS and FOIL, but searches in a lattice which is bounded both above and below and is therefore
better constrained (See Muggleton S. (1995), Inverse entailment and Progol, Oxford University). To search the lattice which is ordered by
subsumption, Progol uses an A*  like algorithm with refinement operators (See Russel S., Norvig P.(1995), Artificial Intelligence  A modern approach,).
Golem is the a well performing ILP system based on bottomup searching by applying
relative least generalization. Other methods for bottomup searching are inverse resolution and inverse implication (See Bergadano F., Gunetti D. (1996), Inductive Logic Programming  From).
The GILP data model and its constraints
The goal for the GILP data model
The GILP data model must be able to represent horn clauses in a clear and
nonambiguous way eligible for genetic algorithm operations and representation.
In this chapter vector and clause are both used to describe a generated hypothesis.
Mathematical Model of GILP constraints
Matrix based GILP formulation
In the hypergraph matrix representation constraints formalized in See Hypergraph matrix constraintsthe first
constraint is unnecessary since the second one makes sure there is maximum one
occurrence of the value 1 per column and there are n rows which leads to a maximum row sum of n.
If starting from the left in the matrix and generating elements in columns a new variable (that is a new row without a value 1, getting a value 1). What this row number is for a new variable doesn't have any importance, so to avoid duplicate semantics for different matrices, the least free row number will be selected when entering a new variable. This can be formulated with adding a new vector of dimension 1*n
consisting of elements
which is the minimum available row index at column i. Since it is the minimum available row number at position i, it is greater than the maximum j for akj=1 and k < i. Since the maximum number of new variables going one column right is 1, constraints between different elements in
are added. The new model is shown below.
GILP Matrix Model Constraints
Vector based GILP formulation
Since there is only one variable per column in the matrix model it could be transformed into a vector model storing the variable number which is active for each column in a vector. This is done to decrease the storage requirements and the total number of constraints. The vector formulation is shown below
GILP Vector Model Constraints
here mi is the maximum variable number position i, and vk is the selected variable number at position k. Each GILP vector can be uniquely encoded to a number by
Investigating of the search space
All investigation of the search space is based the vector constraint model presented in See GILP Vector Model Constraints.
Search space for GILP
The search space for GILP is the number of allowed vectors for a given vector length.
It is from this point called gilp(n) where n is the vector length.
gilp(n)
n!;
The first and the second constraint in See GILP Vector Model Constraints ensures that the growth in number of
variables from position k to k+1 is less than or equal to 1 and m0 = 1
=> gilp(n)
2*3*..*n
n!
Calculating gilp(n)
Recurrence equations for calculation of gilp(n)
This has very high running time if solved in a straight order fashion. Since there are many common recursive steps there are overlaps of calculations of the various
and a program based on the dynamic programming principle was developed (see appendix ASee gilpcalc.cc)
Calculations of subsearchspaces of gilp(n)
A subsearchspace of gilp(n) is in this context the number of vectors with a specific number of vector elements different from zero.
Since comb(n,k) = n!/(k!(nk)!), the left side of the equality symbol is then
simplified to n/(nk) and lim of that when k is fixed and n goes towards infinity is 1.
comb(n,k) is not an exponentially growing function for a fixed k.
Let gilpnz(n,k) be equal to the number of vectors based on a gilp vector length
of n with k elements greater than 0.
gilpnz(n+1,k)>gilpnz(n,k) and gilpnz(n,k) > 0 for all positive and
integer n and k where k < n
Since by increasing the vector length n by 1, you keep the search space for the
previous vector length but also allows more combinations, like adding all the
vectors with nonzero of length k1 from the vectors generated for length n and
having a nonzero element in the new position allowed, thereby increasing the
overall search space. From this we can say that gilpnz(n+1,k)/gilpnz(n,k) > 1
for all allowed n and k.
By inspection of the data calculated for each length, an assumption of that
comb(n+1,k)/comb(n,k) > gilpnz(n+1,k)/gilpnz(n,k) seems to hold for all data and
even a stronger assumption for growing n or k (or both).
If the previous assumption holds, applying the squeeze theorem to gilpnz(n+1,k)/
gilpnz(n,k) shows that the limit for any allowed k when n goes towards infinity is 1,
since gilpnz(n,k) is growing slower than comp(n,k) then by lemma 7.3 gilpnz(n,k) is
not an exponentially growing function.
Constraints on gilp clause variable unification
A variable in a gilp clause is not allowed to unify to the same values as the other
variables in the clause
A consequence of definition See Constraints on gilp clause variable unification is that a generated clause
grandfather(X,Y) : father(X,Z), father(Z,Y).
will instead be represented in a prolog query as
grandfather(X,Y), \+ X=Y : father(X,Z), \+ X=Z, father(Z,Y), \+ Z=Y.
In See gilpnz(n,k) for k of 5, 10, 20, 40 and 100there is an example of the growth of gilpnz(n,k) for k of 5, 10, 20, 40 and 100,
as we can see from the figure, the curves are growing slower than straight lines and since the yaxis scale is logarithmic that supports the previously proved nonexponential growth of gilpnz(n,k) for any fixed k.
Pruning by entailment
GILP completeness fitness  gilpcompfit(H,E+)
gilpcompfit(H,E+) is equal to the number of unique members of E+ matched by
the query Head,Body where Head and Body are the head and body part of the
Horn Clause hypothesis H , divided by the total number of positive examples in E+
gilpcompfit(H,E+) is a number between 0 and 1 (inclusive), if gilpcompfit(H,E+) is equal to 1, H is complete (see definition See Completeness of a logic program ([1])).
GILP consistency fitness  gilpconsfit(H,E)
gilpconsfit(H,E) is equal one minus, the number of unique members of E matched
by the query Head,Body where Head and Body are the head and body part of the
Horn Clause hypothesis H divided by the total number of negative examples in
E
gilpconsfit(H,E) is a number between 0 and 1 (inclusive), if gilpconsfit(H,E) is equal to 1, H is consistent (see definition See Consistency of a logic program ([1])).
Because of the formulation of gilp vectors, having a gilp vector
with at least one vector element equal to zero, if adding an allowed nonzero number to that currently zero position, the new vector will be logically entailed by
. That means that
is more general than the new vector. If
is checked for completeness fitness and found to be unfit there is no need to check the new vector since it is more special and it is upper bound by the
gilpcompfit(
,E) regarding to completeness fitness.
The reason why the completeness fitness function is used is the fact that the search space gilp(n) is "very large" and this is a way to prune clauses in gilpnz(n,k2) based on failed clauses in gilpnz(n,k1) where k1 < k2 <= n and the vector in gilp(n,k2) is entailed by a vector in gilp(n,k1).
If a vector generated get a nonzero completeness fitness the consistency fitness gilpconsfit will be measured. The reason why this isn't done for every generated vector is that the
gilpcompfit() and gilpconsfit() functions are computationally heavy operations and it is
usually not of great interest in a practical problem to measure the comsistency fitness of generated vector which has a zero completeness fitness. (Though, this could be of
theoretical interest in some problems).
This can be summarized in the algorithm defined below
Algorithm for entailment pruning algorithm of a gilp vector
1. Check if
(in gilpnz(n,k2)) is entailed by investigating each tree of stored
failed hypotheses for lengths from 0 to k2  1, if such an element is found give
the generated vector completeness fitness of 0
2. If no entailment vector is found calculate gilpcompfit(
,E+), if it is 0, store
it in the failedtree for gilpnz(n,k2) (tree number k2)
3. If gilpcompfit(
,E+) > 0, calculate gilpconsfit(
,E), if it is equal to 1
(
is complete) remove the positive examples covered by
One problem with this algorithm are the storage requirements to store one tree of failed clauses for each k from 0 to n in gilpnz(n,k), the partial solution to this could be to store trees with a nonzero length of k up to a certain percentage of the vector length n.
I.e. for a template clause length of 40, an upper bound for the number of vectors with a nonzero length of 6, is gilpnz(40,6) = 6.62*10^8, which is only a 3.31*10^(28) part of gilp(40) = 2.35*10^36.
6.62*10^8 is still a significantly high number nodes of vectors in a binary tree regarding to memory requirements, but this number will be significantly reduced by other pruning
operations, like integrity constraints (see section See Pruning using Integrity Constraints), logic factorization (see section See Pruning using logic factorization), requirements of connectivity between predicates in a clause (see section See Pruning using heuristics on literal cohesion), and requirements that a generated clause has to be rangerestricted (see section See Pruning by demanding rangerestrictedness).
Another feature of the entailment algorithm is that the vectors of short nonzero lengths are the most general and when the tree is starting to fill up many new generated
vectors will found to be entailed and thereby removed as a possible hypothesis.
Pruning using Integrity Constraints
Integrity constraints(See Nytrø Ø., Østvold B.M. (1996), Pruning the specialization search space using) are restrictions on a predicate parameter variable binding constrained by previous bindings of that variable. A simple example can be for the predicate father/2, the (natural) integrity constraint for the second parameter is that it can't be bound to the same variable as the first parameter was bound to, i.e. father(X,X) is not allowed (a father can't be the father of himself), but father(X,Y) is valid by integrity constraints. There can also be integrity constraints between predicates in a template clause, in:
father/2, brothers/2, female/1, mother/2, male/1
there are integrity constraints between the first parameter in father/2 and the parameter in female/1 (a father can't be a female), father/2 is the head predicate. The unconstrained search space gilp(8) = 21147, and the integrity constrained search space for this example template = 4496 (the unconstrained search space is reduced with 78.74%).
Do predicate ordering matter?
Since the number of possible variables for a position p in a gilp vector is growing with p,
there seems to be a point in positioning the predicates such that integrity constraints occur early (as much to the left as possible) in the template clause, this could possibly prune a larger part of the search space  gilp(n) than an integrity constraints further to the right in the template clause. The template clause in See father/2, brothers/2, female/1, mother/2, male/1 could be rearranged to
father/2, female/1, male/1, brothers/2, mother/2
to get the following collisionposition pairs: (3,1), (4,3), (5,3), (6,3), (7,1), (7,4), (7,5) and (7,6) instead of the previous collisionposition pairs: (5,1), (5,3), (5,4), (6,1), (6,3), (6,4), (8,5) and (8,6). The new constrained search space is still the same as the old one, 4496.
By splitting up the predicates and allowing parameters for different predicates to be moved around independently the clause could be rearranged to
father/2, female/1, mother_param_1, male/1, brothers/2, mother_param_2
getting the corresponding collisionposition pairs: (3,1), (4,1), (5,3), (5,4), (6,3), (6,4), (7,3) and (7,4). There gives no decrease in the constrained search space, it is still 4496. By
allowing splitting of the head predicates the clause can be rearranged to
father_param_1, female/1, mother_param_1, male/1, brothers/2, mother_param_2, father_param_2
getting the corresponding collisionposition pairs: (2,1), (3,1), (4,2), (4,3), (5,2), (5,3), (6,2)
and (6,3). This last template search space is still 4496. Reconsidering the assumed pruning effect of rearranging predicates or parameter places, it seems plausible that there is no effect in the rearranging of parameter places regarding to search space, this is because moving an integrity constraint to the right will prune smaller subtrees of the search space, but it will also prune an increasing number of subtrees. Rearranging integrity constraints is only a
syntactic change, not a change of the allowed semantics of clauses. There is still a point in having the integrity constraints as far left as possible, in an iterative or recursive program for generation of clauses the running time would be better caused by fewer and larger
cuts of the search space.
Effect of integrity constraint pruning of gilp(n)
n

10

11

12

13

14

Minimum

39746

289920

583907

10346072

71444707

Average

382090

2408232

14730743

82480584

540248088

Maximum

678570

4213597

27644437

190899322

1192059223

Min/Max %

5.86

6.88

2.11

5.42

5.99

#Trials

78

63

48

33

18

In See Effect of integrity constraint pruning of gilp(n)effects of pruning using integrity constraints are shown. The minimum row is the minimum search space obtained for a generated integrity constraint, the maximum row is the maximum search space obtained for a generated integrity constraint with a upper bound of gilp(n). To generate integrity constraints a program accepting 3 parameters: vector length, probability for having zero integrity constraints for a parameter position and a
ratio (between 0 and 1) limiting the maximum number of integrity constraints for a
parameter, a parameter with more than zero integrity constraints are limited to the integer value of the product fraction value times (position number  1), i.e. with a fraction of 0.5 for position 8 there there can be a maximum of trunc(0.5*(81)) = 3 integrity constraints. The integrity constraint generator program can be found in in appendix ASee gilp_generator.perl and the calculation program in appendix ASee gilpcalc_ic.cc.
The reason why there is a decrease in number of trials for increasing n is the exponential growth in time it takes to search with integrity constraints, and dynamic programming can't be applied since there are dependencies both to positions to the left and right for any given
position in a vector (calculationwise).
Pruning using logic factorization
In a gilp template clause, there will typically be repeated use of predicates, and when
generating a clause using that template, there is no gain in information to have repeated
predicates with the same variables setup. I.e. if a part of a clause looks like
.., father(X,Y), .., father(X,Y),..
the last occurence of father(X,Y) is superfluous. Removing such repeated predicates is a part of doing logic factorization, which is removing repeated and superfluous predicates in a horn clause until no more predicates can be removed without altering the semantics of the clause. In logic factorization of a generated clause like
grandfather(X0,X2) : father(X0,X1), father(X1,X2), father(X0,X3),father(X3,X2)
the last two father/2 predicates will be removed, but this will not be done by the
singlepredicate logic factorization suggested in gilp.
The strict constraints for unification of variables in definition See Constraints on gilp clause variable unification will probably also help pruning the search space, since the completeness fitness for the clause above is the upper bound for the completeness fitness of the same clause with unification constraints.
Pruning using heuristics on literal cohesion
High cohesion of literals in a clause means that there is a high number of repeated
occurrences of variables in different parameter positions in a clause. I.e.
grandfather(X,Y) : father(X,Z), father(Z,X).
has higher cohesion than
grandfather(X,Y) : father(X,Z), father(Q,Y).
In the extreme case of low cohesion, there are no reoccurrences of variables in a clause, this is equivalent of having independent predicates in a clause, which means that a fitness query for such a clause is the same as making the fitness query for each individual clause
consisting of one predicate in the order given by the template mesh, which is (usually) of no interest. The other extrema is the highest cohesion, where only one variable exists
throughout the parameter places of the clause, then the expressibility of the clause is almost reduced to that of a propositional logic, since the predicates "have taken the role as" atoms.
To avoid getting close to the possible cohesion extremas mentioned for a generated clause, a heuristic approach is used. Since the reoccurrences of variables in parameter places is
connected to the cohesion, the heuristic function here is estimated based on those.
The gilp heuristic estimate  gilpheuristic
The heuristic function gilpheuristic requires the following elements:
The generated clause vector

A vector
which registers variable occurrence in predicates, initially all the elements are initialized to 1, when a variable k is added to a position mapped to predicate m. if(vc(k) < maxpredicate+1) then begin if(vc(k) = 1) then vc(k) = m else if(v(k) != m) then v(k) = maxpredicate+1 end. From
three questions about a variable k can then be answered, 1. is it use? (v(k) != 1), 2. is it used in exactly one predicate? (1<v(k) < maxpredicate+1), and 3. is it used in more than 1 predicate (v(k) = maxpredicate + 1).
needs a variable for the number of elements equal to maxpredicate+1.

A vector
of variable numbers counting number of occurences per variable, i.e. if variable 4 is created at first, vn(4) = 1, if variable 4 reoccurs, just do increase(vn(4),1).
needs a variable for the number of elements greater than 1.

A vector to store and a vector
of occurences of variable occurences, i.e. is if variable 3 and 5 are the only ones with 5 occurences, then ov(5) is equal to 2 (this method is inspired by "the bucket sort" algorithm, which is O(n) given its requirements, see section 9.2 of See Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms,). If a variable reoccurs, i.e. variable number 3 increases from 5 to 6 occurrences, then decrease(ov(5),1) and increase(ov(6),1) to keep
consistent with the current
.
needs a variable for the number of elements greater than a user selected value.

Also needed are bookkeeping variables like: the vector length, the head and body lengths, a maximum occurrence counter, a maximum variable number counter, and a vector position counter. To map a parameter place to predicate, a fourth vector is used. The regular maxvariable vector is also used.
The calculation of all the above elements has a time complexity of O(n), where n is the gilp
vector length. Going from one step to another has a time complexity of O(1).
Heuristic fitness function for gilp
The gilp heuristic fitness function =
w1*(number of elements in
with values greater than 1 divided by the total
number of nonzero elements in
) +
w2*(number of elements in
with the value equal to maxpredicate + 1 divided
by the number of positive elements in
, zero inclusive) +
w3*(number of elements in
with a value greater than a user limit
divided by the total number of nonzero elements in
).
To make the fitness value be between 0 and 1, let w1+w2+w3 = 1
If the creation of a gilp vector is done incrementally the heuristic fitness has a time complexity of O(1) from one step to another, which gives a total time complexity O(n) for
calculating the heuristic fitness of the gilp vector.
Pruning by demanding rangerestrictedness
In the previous section the problems with low cohesion in a clause were mentioned, another example of low cohesion could be
grandfather(X,Y) : father(Q,W), father(W,R).
in my opinion the above clause is practically worthless since there is no cohesion between the head and body literals of the clause, i.e. if you combine the clause with the facts:
father(haakon,olav). father(olav,harald).
you can't for sure get any information about the grandfather relation for those 3 atoms
(haakon, olav and harald) , except that there exists a positive example of grandfather with some atoms.
This is the motivation to only allow rangerestricted clauses (see definition See Rangerestricted horn clause ([49])) in gilp. This can be done iteratively such that for each new head variable created, a counter is increased, and each time a head variable is used in the body and at the same time not allready used in more than one predicate, the counter is decreased. This criteria could be checked by the
vector, assuming that there is only one head predicate. If the head variables are not necessarily in order from 0 to maximum headvar, the variable check to see if its a headvar is increased from O(1) to O(lg HV) where HV is the set of head variables. Summarized the check for a rangerestricted clause is O(n) (and can be done iteratively simultaneously with the factorizing checks) if the head variable numbers occur in an continous range and there is only one head predicate. If the head variables are not in a continous range, the check is increased to O(n lgHV). If there are more than one head predicate, the
vector structure would have to be changed to uniquely represent which predicates a variable occur in, a
possible solution could be to have an associative array (implemented using binary trees), giving a time complexity of (at least) O(n (HP lg(TP) + lgHV)), where HP are the
number of head predicates, and TP are total number of predicates in the clause.
The GILP algorithm
The methods for constraining the GILP search space in the methods suggested here, are described in the previous chapter.
Creation of the initial population
Creation of an individual
An individual in the GILP population is represented by a chromosome vector of length n.
It is P predicates in the vector template, with HP predicates in the head part and BP
predicates in the body part.
GILP's algorithm for initial creation of a chromosome vector.
1. Randomly select the first position in the vector to be of value 0 or 1
2. If the current position is the last position in the vector, go to 7
3. Calculate the set PV of possible values for next pos in the vector based on:
3a. The integrity constraint check, if it fails go to 6
3b. The simplified logic factorization check, if it fails go to 6
4. If the set of possible values generated is empty go to 6
5a. Go to the next position vp in the vector
5b. Insert a random value from the generated set PV, and insert in position vp
5c. Check if the vector is rangerestricted, if it is go to 7, otherwise go to 2
6. Return error message
7. Return the generated vector
Creation of a population
A GILP population is represented by an ordered set P of individuals. The size of the
population is k.
GILP's algorithm for the creation of a population
1. If the population size P is equal to k go to 7
2. Create a new individual using the method in definition See GILP's algorithm for initial creation of a chromosome vector.
3a. Do the entailment check, if it fails go to 1
3b. Calculate the completeness fitness for the individual, if "low" or zero go to 5a
3c. Calculate the consistency fitness for the individual, if "low" or zero go to 6
4. Save the individual in the set of good solutions and go to 6
5a. Insert the clause in the entailment storage tree (for the correct nonzero length)
5b. Go to 1
6. Insert the individual into P and go to 1
(NOTE: this is a set insert, so if it already exists in P, P will not grow in size)
7. Return the generated population
The reason why the heuristic function on literal cohesion isn't used in the creation of an
initial population, is that the primary wish for the initial population is to have individuals which are rangerestricted (which is a strict constraint) but also entails many elements in gilp(n) by having a short nonzero length (see section ). Another approach for the
creation of an initial population could be to use an recursive programs like ASee gilpcalc.cc or ASee gilpcalc_ic.cc.
By prioritizing completeness fitness over consistency fitness in the the algorithm in
defintion See GILP's algorithm for the creation of a population, GILP is a topdown GA search algorithm.
Reproduction
GILP's primary method for reproduction
Based on fitness value there is a probability pr for a survival of an individual to
the next generation. This gives no new contribution to next generations population
but most likely it will contribute to sustain or increase the average fitness of the
next generation by "helping" fit individuals to survive.
GILP's primary algorithm for mutation
Based on the fitness value there is a probability pm for mutation of a vector.
Select a random position in the vector and select a random variable from that
positions set of possible variable numbers (which may have to be recalculated),
select another random position in the clause and repeat. If the vector is mutated
from being rangerestricted to not being it, repetition of the mutation method is
applied. When reaching a certain number of repeated mutations on a clause
GILP's supporting method for mutation is applied.
GILP's supporting algorithm for mutation
If the first mutation method failed, the second one replaces the vector using
an algorithm similar to the one in definition See GILP's algorithm for initial creation of a chromosome vector., but it is also adding a check of
of the heuristic fitness value for cohesion in 5c. such that vectors with greater
nonzero lengths can be created, and which have a higher likelihood of having
a sufficient cohesion.
GILP's primary algorithm for crossover
Based on the fitness value there is a probability pc for a crossover.
Select two random individuals i1, i2 in the population, combine them into a third by:
Calculate the heuristic fitness value for cohesion for the ix with
highest completeness fitness.
Case 1: If the heuristic fitness value and the consistency fitness value both are
low, do a crossover equal to the specialization of ix adding new variables at
positions where ix has a value of 0, but the other vector has positive values. To
ensure feasibility according to constraints, the minimum of the
maximum allowed value at that position s (that is ms) and the positive value in
the opposite vector of ix. Insert the new element with instead of element with the
lowest weighted sum of fitness, that is v1*(completeness fitness) +
v2*(consistency fitness) + v3*(heuristic fitness),
To make the fitness value be between 0 and 1, let v1+v2+v3 = 1.
Since the two actual fitness measures are more precise than the heuristic, v1, v2 > v3
Case 2: Loop through the pairs of vectors and randomly select which vector
to take a variable from, if the variable is not allowed to select, randomly select
between the variable in other vector and the median value of the allowed values
calculated for the current position until an allowed variable is found.
Case 1 in the crossover algorithm is a with to specialize the vector with the locally best completeness fitness, but is quite general and covers too many negative examples
(low consistency fitness). The weighted replacement criteria function allows different types of solutions by changing the weight values, i.e. if v1 is 1 the vector which was used as the basis for the specialization is kept.
Case 2 is a crossover algorithm similar to the crossover in See Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms:, except that the repair
operator is built into the crossover operator in GILP. The reason why the median value of allowed is chosen to make a feasible solution is 1. to make the crossover a less "random" operation (by not selecting a random allowed value), and 2. a wish to decrease the number of new variables (such as selecting the largest allowed value possibly would be), since the addition of a new variable will not contribute to make the vector get a higher cohesion.
Implementation overview
 The GILP calculation programs
Filename


Description

gilpcalc.cc


Calculates the unconstrained search space gilp(n) using dynamic programming.

gilp_generator.perl


Generates an stochastic integrity
constraint input file to gilpcalc_ic, which can be tuned by parameters

gilpcalc_ic.cc


Calculates the integrity constrained search space

 The GILP system
Filename


Description

Imakefile


Imakefile to create a Makefile for the GILP system

gilp_assoc.(hcc)


Associative array class

gilp_common.h


Common constants and includes for GILP

gilp_communicator.(hcc)


Communication class for controlling a CLP(R) or Prolog Program using Unix pipes

gilp_element.(hcc)


Chromosome individual class, consists of an array of gilp_predicate objects and a static array gilp_predicatemesh objects

gilp_main.cc


The main program for GILP

gilp_parameter.(hcc)


A class which takes care of all user interaction,
including reading of configuration files and
command line parameters

gilp_population.(hcc)


Population class, consists of an array of gilp_element objects

gilp_predicate.(hcc)


Class to represent predicates (within an individual)

gilp_predicatemesh.(hcc)


Class to represent constraints like logic
factorization and integrity constraints for a predicate in a mesh

gilp_rand.(hcc)


Random number generator class

gilp_statistic.(hcc)


Statistic class with functions to simulate some
statistical distributions

gilp_string.(hcc)


String class, implemented because of problems with the string class that ships with the C++
Standard Template Library (STL)

The GILP system is an unfinished (GA) system implementing the algorithms suggested in chapter 7 and 8. To be able do fitness queries, Prolog (or CLP(R)) is needed integrated with GILP, either through pipes/sockets or using the PrologC API.
Conclusion
We have in this report investigated ways of applying genetic algorithms (GA) concepts in methods for the discovering of rules represented as horn logic in data mining problems. Such problems have usually been solved using inductive logic programming (ILP).
The representation model for horn clauses is a GA vector. When unconstrained, this vectors search space is extremely large, at least exponentially for the vector lenght. This has moved most of the attention to the investigation of methods to constrain this search space. Five strategies to prune this search space are suggested (chapter 7):
Integrity constraints

Simple logical factorization

Entailment, storing failed vectors with low fitness

Rangerestricted clauses.

Heuristic function for estimation of literal cohesion
The GA algorithms suggested in chapter 8, are created using the five pruning strategies mentioned.
An system to test the suggested algorithms is partly developed, but needs more work before
tests can be done.
Applied research and development
Continue on the GILP system implementation and test it on various ILP problems, and compare the results with the ILP systems. This would make a suitable semester project for one or two students.
Extend the algorithms to support atoms and lists as parameters in a generated clause, this could be done by adding one predicate per atomic value, i.e. norway_pred(norway)., such that every time norway_pred(X) is used, X is unified with the atom norway, without the need of having the constants directly in the generated clause, they are instead a part of the facts.
Implement ILP operators like relative least general generalization or refinement operators, and integrate them with the suggested GA methods to possibly improve performance and quality of the search.
Do statistical experiments measuring the performance of the GILP system as a function of the weights in the heuristic function or as a function of the weights in the crossover
operator.
Combine the GILP system with evolutionary hardware, i.e. a field programmable gate array (FPGA), to create a regulator for an industriell process which automatically adapts
to changes in input signals.
Look at the visualization possibilities for the generated hypotheses, either as a hypergraph or as a vector or matrix. I.e is there a way to "sense" the fitness for a population, by looking at the visualization of all the vectors in a population simultaneously?
Theoretical research
Do a thorough complexity analysis of the suggested GILP system, including kolmogorov complexity () and a possible classification of the time complexity of GILP ().
Investigate heuristic methods applied to which have been applied to NP complete problems like The Travelling Salesman problem with some success to see if they are applicable to prune the GILP search space.
Develop parallel models of GILP, this is motivated by results with parallel genetic
algorithms, where the (combinatorical) quadratic assignment problem with more than one billion binary variables has been solved ()!
Evalution of the project
The project progression was rather slow in the beginning, mainly caused by work on other projects and wrong prioritations of time, but also because of a slight confusion about what the approach to the project should be. Finally, after doing some initial implementation of the GILP system, I ended up with a more theoretical approach, investigating ways of pruning of the search space. Unfortunately I didn't finish the GILP system so that practical tests of its applicability could be done, but hopefully there have been some useful algorithm
suggestions.
Bibliography
Bergadano F., Gunetti D. (1996), Inductive Logic Programming  From
Machine Learning to Software Engineering, MIT Press
Berge C. (1976), Graphs and Hypergraphs, NorthHolland
Burton D.M. (1989), Elementary Number Theory, WCB Press
Cormen T.H., Leiserson C.E., Rivest R.L.(1992), Introduction to Algorithms,
MIT Press
Fayyad U.M., PiatetskyShapiro G., Smyth P., Uthurusamy R. (1996), Advances in
Knowledge discovery and Data Mining, MIT Press
Haynes T.D., Schoenfeld D.A., Wainwrigth R.L. (1996), Type Inheritance in
Strongly Typed Genetic Programming, Chapter 18 of Advances in Genetic
Programming, MIT Press
Heitkoetter J. (1997), The HitchHiker's Guide to Evolutionary Computation,
FAQ for usenet news group comp.ai.genetic
Jájá J. (1992), An Introduction to Parallel Algorithms, Addison Wesley
Jones N.D. (1997), Computability and Complexity  From a Programming
Perspective, MIT Press
Judson T.W. (1994), Abstract Algebra  Theory and Applications,
PWS Publishing Company
Koza J.R. (1992), Genetic Programming  On the Programming of Computers
by Means of Natural Selection, MIT Press
Lavrac N., Dzeroski S. (1993), Inductive Logic Programming  Techniques and
Applications, Prentice Hall.
Levine D. (1994), A Parallel Genetic Algorithm for the Set Partitioning Problem,
Ph.D thesis at Illinois Institute of Technology
Li M, Vitányi P. (1997), An Introduction to Kolmogorov Complexity and
Its Applications, SpringerVerlag
Midelfart, H. (1996), Knowledge Discovery in Databases applied in Inductive
Logic Programming, Diploma (M.Sc) thesis at the Norwegian University of
Science and Technology  NTNU
Nakhaeizadeh G., Taylor C. C. (1997), Machine Learning and Statistics  The
Interface, John Wiley & sons
NienhuysCheng S.H., Wolf R. de (1997), Foundations of Inductive Logic
Programming, SpringerVerlag
Nilsson U., Matuszynski J. (1990), Logic, Programming and Prolog, Wiley
Nygreen B. (1980), Heltallsproblemer (Integer Problems), Lecture book in
Operations Research (norw. text) at the University of Trondheim, Norway
Nytrø, Ø. (1992), Aspects of structure in Horn Clause programs,
Ph.D thesis at the University of Trondheim, Norway
Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research Principles
and Practise, John Wiley & sons
Russel S., Norvig P.(1995), Artificial Intelligence  A modern approach,
Prentice Hall
Sperschneider V., Antoniou G. (1991), Logic A Foundation for Computer Science,
Addison Wesley
Sterling, L. (1990), The Practise of Prolog, MIT Press
Taylor H.M., Karlin S. (1993), An Introduction to stochastic modeling,
Academic Press
Truss, J.K. (1991), Discrete Mathematics for computer scientists, Addison Wesley
Van De Velde, E.F. (1994), Concurrent Scientific Computing, SpringerVerlag
Walpole R.E., Myers R.H. (1993), Probability and Statistics for Engineers and
Scientists, Prentice Hall
Williams H.P. (1993), Model building in mathematical programming,
John Wiley & sons
References
Alfheim E., Tveit A. (1996), Inductive Logic Programming Applied in
Knowledge Discovery In Databases, semester project at NTNU, Norway
Banzhaf W. (1994), GenotypePhenotypeMapping and Neutral Variation 
A case study in Genetic Programming, Department of Computer Science,
Dortmund University, Germany
Chu P.C., Beasley J.E. (1997), A Genetic Algorithm for the Multidimensional
Knapsack Problem, The Management School, Imperial College, England
Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms:
the Set Partitioning Problem, The Management School, Imperial College, England
Corcoran A.L., Wainwright R.L. (1993), The Performance of a Genetic Algorithm
on a Chaotic Objective Function, University of Tulsa, Oklahoma, USA
Crawford K.D., Wainwrigth R.L., Vasicek D.J. (1995), Detecting Multiple `
Outliers in Regression Data Using Genetic Algorithms, Symposium of Applied
Computing, Nashville TN, USA
GarcesPerez J., Schoenefeld D.A., Wainwright R.L. (1994), Solving Facility
Layout Problems Using Genetic Algorithms, University of Tulsa, Oklahoma, USA
Finn P., Muggleton S., Page D., Srinivasan A. (1996), Discovery of
Pharmacophores, Computational Chemistry, Pfizer Central Research, England
Gathercole C., Ross P. (1997), Small populations over Many Generations can beat
Large Populations over Few Generations in Genetic Programming,
Department of Artificial Intelligence, University of Edinburgh, Scotland
Gavish B., Pirkul H. (1985), Efficient algorithms for solving multiconstraint
zeroone knapsack problems to optimality, Mathematical Programming, issue 31
Genetic Programming 1998 Conference, Abstracts and Speaker Biographies
for 22 tutorials, to be held at the University of Wisconsin  Madison, Wisconsin,
USA
Koza J.R. (1996), Future Work and Practical Applications of Genetic
Programming, Stanford University, USA
Langdon W.B. (1995), Evolving Data Structures with Genetic Programming,
Department of Computer Science, University College, London, England
Langdon W.B. (1995), Pareto, Population Partitioning, Price and Genetic
Programming, Department of Computer Science, University College, London,
England
Langdon W.B. (1996), Using Data Structures within Genetic Programming,
Department of Computer Science, University College, London, England
Muggleton S., Raedt L.D.(1994), Inductive Logic Programming: Theory
and methods, Journal Of Logic Programming 1994:19,20:629679
Muggleton S., Page Jr. C.D. (1994), A Learnability Model for Universal
Representations, Oxford University Computing Laboratory, England
Muggleton S. (1995), Inverse entailment and Progol, Oxford University
Computing Laboratory, England
Muggleton S. (1996), Machine intelligibility and the duality principle,
Oxford University Computing Laboratory, England
Muggleton S. (1996), Stochastic Logic Programs, Oxford University
Computing Laboratory, England
Nytrø Ø., Østvold B.M. (1996), Pruning the specialization search space using
integrity constraints, the Norwegian University of Science and Technology,
Norway
Palmeri C.(1997), Believe in yourself, believe in the merchandise,
Forbes v160, n5,USA
Salustowicz R., Schmidhuber J. (1997), Probalistic Incremental Program
Evolution, IDSIA, Switzerland
Salustowicz R., Schmidhuber J. (1997), Probalistic Incremental Program:
Stochastic Search Through Program Space, IDSIA, Switzerland


