Author: Amund Tveit (amundt@idi.ntnu.no) - December 1997

Abstract

The most used method for finding logical rules from data, inductive logic programming (ILP), has proven successful, but unfortunately does not scale very well with increasing problem size. In this report a model for inducing logical rules, using concepts from the potentially more scalable method of genetic algorithms, is suggested.

 

Five strategies for reducing the search space in the representation are suggested: pruning by logical entailment, pruning by integrity constraints, pruning by logic factorization, pruning by range restrictedness, and pruning using a heuristic fitness function on the cohesion of literals. The suggested genetic operators apply these pruning strategies.

 

The model has yet to be implemented and tried out in an experimental setting.

 

 

 

 

 

Acknowledgements

I would like to thank my supervisor Øystein Nytrø for his initial suggestion of a hypergraph model for Horn clauses, and for being patient enough to give me free rein in my rather stochastic search towards what is presented in this thesis. I would also like to thank Geir Sørensen for help with the Standard Template Library for C++, even if I didn't end up using it.

 

Abstract

Acknowledgements

List of Figures

List of Tables

Chapter 1 Introduction

Chapter 2 Purpose

Chapter 3 Graphs, Hypergraphs and Horn Logic

3.1 Horn logic representation

3.2 Graphs

3.3 Hypergraphs

3.4 A matrix hypergraph representation of Horn logic clauses

Chapter 4 Mathematical programming

4.1 Linear programming models

4.2 Linear programming solution methods

4.3 Integer programming models

4.4 Some classic IP problems and their solutions

List of Tables

Table 6.1 Background knowledge

Table 6.2 Positive and negative training examples

Table 7.1 Effect of integrity constraint pruning of gilp(n)

Table 9.1 The GILP calculation programs

Table 9.2 The GILP system

Introduction

Genetic Algorithms (GA) and Inductive Logic Programming (ILP) are both research fields of Artificial Intelligence (AI). Both fields are quite young and have experienced considerable growth during the last decade. ILP (Bergadano & Gunetti 1996; Lavrac & Dzeroski 1993; Nienhuys-Cheng & de Wolf 1997) is a very specific field, aimed at inductively constructing logic programs, whereas GA (Heitkoetter 1997; Koza 1992) is a broader field aimed at solving different types of problems, inspired by the evolution of species in nature.

 

More and more information is being stored in databases around the world. As the amount of information stored electronically rises, so does the wish to get something more out of the data than just being able to retrieve what was entered. From this wish, the field of Knowledge Discovery in Databases (KDD) and the related field of Data Mining emerged (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy 1996).

Central to KDD is the development of efficient computerized methods to analyze databases, and ILP can be used to do that with partial success (Alfheim & Tveit 1996).

 

An example of results after data mining (Palmeri 1997):

Wal-Mart knows that customers who buy Barbie dolls (it sells one every 20 seconds) have a 60% likelihood of buying one of three types of candy bars. What does Wal-Mart do with information like that? "I don't have a clue," says Wal-Mart's chief of merchandising, Lee Scott.

 

Fortunately, this is not a typical example of data mining and its interpretation. More useful applications of data mining include finding patterns in molecular structures, global climate change models, and numerous others from science and business (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy 1996).

 

One problem with ILP systems is performance; their search strategies are not known to utilize modern parallel computer systems efficiently. From this, and from the success of genetic algorithms and evolutionary methods (Haynes, Schoenfeld & Wainwright 1996; Levine 1994; Chu & Beasley 1997a; Chu & Beasley 1997b; Koza 1996), the main goal of this thesis is to create methods for solving ILP problems in a GA setting.

Prerequisites

This report assumes that the reader has a basic knowledge of logic programming, algorithmic complexity analysis, and artificial intelligence. For logic and logic programming, a book such as Nilsson & Matuszynski (1990) should cover the required material. For algorithmic complexity analysis, a book such as Cormen, Leiserson & Rivest (1992) is recommended. For the artificial intelligence part, the book by Russel & Norvig (1995) is recommended.

Structure of this document

Chapter 3 describes graphs, hypergraphs, and how Horn clauses can be modelled by hypergraphs with the corresponding matrix representation. Chapter 4 describes mathematical programming, and how it has been applied to solve combinatorial integer problems whose constraints and variables are similar to those of the hypergraph model of Horn clauses. Chapter 5 describes evolutionary computation, genetic algorithms and genetic programming. Chapter 6 describes inductive logic programming and some characteristics of logic programs. Chapter 7 describes a Horn clause vector model based on the hypergraph model for Horn clauses, and suggests and discusses ways of pruning the vector model's search space. Chapter 8 suggests genetic algorithms based on the results from chapter 7. Chapter 9 gives a brief overview of the implementations done. Chapter 10 presents the conclusion and suggestions for further research.

Purpose

The motivation for performing the experiments in this document is threefold. First, I wanted to investigate a hypergraph matrix model for the representation of Horn clauses and, if possible, improve it towards being applicable in a genetic algorithm (GA) setting. Second, to suggest GA operators and ways of creating a good initial GA population, both with regard to efficient pruning of the search space. The main purpose of the GA operators is to do a stochastic search in a set of possible Horn clause hypotheses and to find the best Horn clause for a set of positive and negative examples, background knowledge and facts. This is similar to the goal of inductive logic programming. Third, to implement the suggested GA methods and test them using standard problems known from inductive logic programming.

 

 

 

 

 

Graphs, Hypergraphs and Horn Logic

This chapter describes graphs, hypergraphs, and applications of those to Horn logic.

Horn logic representation

The representation of a relation like (Sperschneider & Antoniou 1991)

P if Q1 and Q2 and .. and Qn

is called a Horn clause (after Horn, who studied them). Logic programming language implementations are usually based on the Horn clause representation of relations for efficiency reasons. Prolog is the most well-known logic programming language based on Horn logic representation; an example of the syntax for a clause is shown below.

grandfather(X,Y) :- father(X, Z), father(Z, Y).

Graphs

A graph is a set of points and a set of arrows, where each arrow joins a pair of points. The points are called the vertices of the graph, and the arrows are called the arcs or edges of the graph.

 

Figure: Example of a graph

 

A graph G consisting of a set of vertices V and a set of edges E can be defined formally as shown below.

Formal definition of a graph G

G = (V, E), where E is a set of vertex pairs (u, v) with u, v in V.

 

 

 

 

There are two standard ways to represent a graph, either as an adjacency matrix or as a collection of adjacency lists.

 

Figure: Adjacency matrix

 

The adjacency matrix for the example graph in Figure 'Example of a graph' is shown in Figure 'Adjacency matrix'. Each row represents the start vertex and each column the stop vertex for an edge; the edge between the vertices m and n exists if position (m,n) in the matrix is 1.

Figure: Adjacency lists

 

The collection of adjacency lists for the example graph in Figure 'Example of a graph' is shown in Figure 'Adjacency lists'. Here each list belongs to a vertex, and the list holds the vertices that vertex has an edge to.

 

Space usage for an adjacency matrix representation is O(|V|^2), and the space usage for a collection of adjacency lists is O(|V|+|E|). When the graph is dense a matrix representation is favorable, but in a practical problem the graph is usually sparse, and a collection of adjacency lists is then the only applicable way of representing the graph.
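As a small illustration of the two representations, here is a minimal C++ sketch (not taken from the thesis) storing the same directed graph both ways:

    #include <iostream>
    #include <vector>

    int main() {
        const int n = 4;  // number of vertices
        // Adjacency matrix: O(|V|^2) space regardless of the number of edges.
        std::vector<std::vector<int>> matrix(n, std::vector<int>(n, 0));
        // Adjacency lists: O(|V| + |E|) space.
        std::vector<std::vector<int>> lists(n);

        auto add_edge = [&](int from, int to) {
            matrix[from][to] = 1;
            lists[from].push_back(to);
        };
        add_edge(0, 1); add_edge(1, 2); add_edge(2, 3); add_edge(3, 0);

        // Edge query: O(1) in the matrix, O(out-degree) in the lists.
        std::cout << "edge 0->1 exists: " << matrix[0][1] << '\n';
        return 0;
    }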

 

Examples of problems which can be modelled using graphs: topological sorting in planning problems, minimum spanning trees in electronic circuit layout, and shortest path problems.

Hypergraphs

A hypergraph is a generalization of a graph, allowing edges to be curves connecting more than two vertices (Berge 1976). A dual representation of a logic clause (Nytrø 1992) like

grandfather(Y,X) :- uncle(Z,X), parent(W,X), brothers(Z, W, Q), father(Y, Q).

is a hypergraph, as shown in Figure 'Example of a hypergraph'. The variables are represented as vertices and the predicates as edges.

Figure: Example of a hypergraph

A matrix hypergraph representation of horn logic clauses

A possible hypergraph representation for the clause below is a matrix with each column representing a parameter position from left to right; this example has 13 such columns. Entry (rownum, column) in this representation is 1 if variable number rownum is selected for that column, and 0 otherwise.

grandfather(X0,X1),brothers(X2,X3,X4) :- father(X0, X2), father(X2,X1),uncle(X3,X1),brother(X2,X4).

The matrix using this representation is shown in Figure 'Hypergraph matrix representation of clause'. It is a very sparse matrix because of the natural constraint that there cannot be more than one variable per parameter position in the clause. Another natural constraint is that the sum of 1s in each row is bounded by the row length. These constraints are shown formally (for a zero-indexed n*n matrix A with elements aij) in 'Hypergraph matrix constraints'.

Hypergraph matrix constraints
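The original constraint figure is not reproduced here; a plausible LaTeX reconstruction from the description above, with the row-sum constraint first and the one-variable-per-position constraint second (the ordering implied by the later discussion in chapter 'The GILP data model and its constraints'):

    \sum_{j=0}^{n-1} a_{ij} \le n \quad \text{for all } i,
    \qquad
    \sum_{i=0}^{n-1} a_{ij} \le 1 \quad \text{for all } j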

To make this a complete hypergraph representation, information about which vertices (variables; Xi for aij = 1, where i is the row and j is the column number) belong to which edges (predicates) is needed; i.e., the predicate brothers/3 maps to parameter positions 2, 3 and 4.

Figure: Hypergraph matrix representation of clause

The hypergraph matrix representation of a clause is the basis representation for the suggested GILP data model described in chapter 'The GILP data model and its constraints'.
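To make the representation concrete, here is a minimal C++ sketch (illustrative only, not the thesis code) that builds the 0/1 matrix from one variable number per parameter position for the clause above:

    #include <iostream>
    #include <vector>

    int main() {
        // grandfather(X0,X1), brothers(X2,X3,X4) :- father(X0,X2),
        //     father(X2,X1), uncle(X3,X1), brother(X2,X4).
        // One variable number per parameter position, left to right:
        std::vector<int> vars = {0,1, 2,3,4, 0,2, 2,1, 3,1, 2,4};
        const int n = static_cast<int>(vars.size());  // 13 positions
        std::vector<std::vector<int>> A(n, std::vector<int>(n, 0));
        for (int col = 0; col < n; ++col)
            A[vars[col]][col] = 1;  // exactly one 1 per column
        // Rows are variable numbers, columns are parameter positions.
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) std::cout << A[i][j];
            std::cout << '\n';
        }
        return 0;
    }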

 

Mathematical programming

Mathematical programming is `programming' in the sense of `planning' (Williams 1993). The common feature of mathematical programming models is that they all involve optimization: maximizing or minimizing an objective function within certain constraints. Mathematical programming models can be classified into linear programming models, non-linear programming models, integer programming models, or combinations of those.

 

Due to the integer nature of the hypergraph matrix representation for Horn logic, only linear and integer programming models will be described here. For a thorough description of the other types of mathematical programming, see Ravindran, Phillips & Solberg (1987) and Williams (1993).

Linear programming models

Requirements for a Linear Programming (LP) model (Ravindran, Phillips & Solberg 1987)

1. The decision variables involved in the problem are nonnegative (i.e., positive or zero).

2. The criterion for selecting the "best" values of the decision variables can be described by a linear function of these variables, that is, a mathematical function involving only the first powers of the variables with no cross products. The criterion function is normally referred to as the objective function.

3. The operating rules governing the process (e.g., scarcity of resources) can be expressed as a set of linear equations or linear inequalities. This set is referred to as the constraint set.

 

An example of a general LP minimization problem is shown below.

Figure: Example of a LP minimization problem
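The original figure is not reproduced here; a generic LP minimization problem of the kind referred to (an illustrative reconstruction, in LaTeX):

    \begin{aligned}
    \min \quad & z = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \\
    \text{subject to} \quad & a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n \ge b_i,
        \quad i = 1, \ldots, m \\
    & x_1, x_2, \ldots, x_n \ge 0
    \end{aligned}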

Linear Programming Solution Methods

The most used method to solve LP problems is the simplex method (or variants of it), developed by G.B. Dantzig. The simplex method is an iterative procedure for solving LP problems in standard form. To define the principles of the simplex method, two definitions are needed to describe the state of the LP system.

Canonical form of LP (Ravindran, Phillips & Solberg 1987)

1. Basic variables (variables with value greater than 0) with coefficients different from zero exist in only one of the equality constraints.

2. There is a maximum of one such non-zero basic variable coefficient per equality constraint.

LP in standard form (Ravindran, Phillips & Solberg 1987)

1. The objective is of the maximization or minimization type.

2. All constraints are expressed as equations.

3. All variables are restricted to be non-negative.

4. The right-hand side of each constraint is non-negative.

Principles of the Simplex Method

1. Start with an initial basic feasible solution in canonical form.

2. Improve the initial solution if possible by finding another basic feasible solution with a better objective function value. At this step the simplex method implicitly eliminates from consideration all basic feasible solutions whose objective function values are worse than the present one.

3. Continue to find better basic feasible solutions, improving the objective function value. When a particular basic feasible solution cannot be improved any further, it is an optimal solution and the simplex method terminates.

Integer Programming Models

An integer programming model is a specialized LP model.

Requirements for a Pure Integer Problem (abbreviated PIP)

1. The requirements for an LP model (see above) must be satisfied.

2. The possible values of the decision variables must be in the domain of the natural numbers including zero.

 

Examples of problems which can be modelled as a PIP: the set partitioning problem, the knapsack problem, the transport problem and the assignment problem.

Some classic IP problems and their solutions

The transport problem

The transport problem is to minimize the cost of transportation from a set of suppliers to a set of users, given constraints like supply limits and the requirement that demands be fulfilled.

 

This problem can be solved by applying several (heuristic) strategies (Ravindran, Phillips & Solberg 1987), e.g. the northwest corner rule, the least cost rule or Vogel's approximation method. These methods all have in common that they represent the cost matrix as a table, where each row corresponds to a supplier and each column to a user (demand). Each cell in the table is the cost of transport from a supplier to a user. The difference between the methods is the strategy of variable selection. These methods do not guarantee an optimal solution; optimality can be checked by applying the stepping-stone method. A method that guarantees an optimal solution is the u-v method, which uses optimality checks (similar to the ones in the simplex method) for each variable it takes into the basis until an optimal solution is found. If an optimal solution doesn't exist, the method will degenerate.

The assignment problem

The assignment problem is minimizing the cost of allocating a number of items of one type to a number of items of another type, e.g. allocating machines to jobs (Ravindran, Phillips & Solberg 1987).

This problem can be solved either by transforming it into a transport problem or, more efficiently, by applying the Hungarian method. This method is based on the fact that the optimal solution of a linear program does not change when a constant is added to or subtracted from the objective function; a sufficiently large number is subtracted from rows and columns in the cost matrix until a solution can be found by inspection.

The max flow problem

The max flow problem is to find the maximum flow in a graph from a source to a sink.

The maximum value of this problem is bounded above by the minimum capacity of a cut where the sink and the source belong to different sides of the cut. The problem can be solved using the Ford-Fulkerson or the Edmonds-Karp algorithm (Cormen, Leiserson & Rivest 1992; Ravindran, Phillips & Solberg 1987).

The shortest path problem

The shortest path problem is to find the shortest path between nodes in a network (the path length can be seen as the cost).

To find the shortest path between two nodes in a network, Dijkstra's algorithm can be applied. To find the paths between all pairs of nodes in a sparse graph, Johnson's algorithm is preferred (Cormen, Leiserson & Rivest 1992).
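A minimal C++ sketch of Dijkstra's algorithm over an adjacency-list graph (illustrative, not from the thesis):

    #include <functional>
    #include <iostream>
    #include <limits>
    #include <queue>
    #include <vector>

    int main() {
        using Edge = std::pair<int, int>;  // (neighbour, weight) / (dist, vertex)
        std::vector<std::vector<Edge>> g = {
            {{1, 4}, {2, 1}},  // edges from vertex 0
            {{3, 1}},          // edges from vertex 1
            {{1, 2}, {3, 5}},  // edges from vertex 2
            {}                 // vertex 3 (the target)
        };
        const int src = 0;
        const int INF = std::numeric_limits<int>::max();
        std::vector<int> dist(g.size(), INF);
        // Min-heap of (distance, vertex) pairs; stale entries are skipped lazily.
        std::priority_queue<Edge, std::vector<Edge>, std::greater<Edge>> pq;
        dist[src] = 0;
        pq.push({0, src});
        while (!pq.empty()) {
            auto [d, u] = pq.top();
            pq.pop();
            if (d > dist[u]) continue;  // outdated queue entry
            for (auto [v, w] : g[u])
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.push({dist[v], v});
                }
        }
        std::cout << "shortest distance 0 -> 3: " << dist[3] << '\n';  // 4
        return 0;
    }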

Solution methods for general IP problems

The branch and bound algorithm

The branch and bound algorithm is the most commonly used algorithm for solving IP problems in practice (Ravindran, Phillips & Solberg 1987; Williams 1993).

 

It is based on first solving the IP problem with the simplex method as if it were a regular LP problem, then selecting a variable with a non-integer value and creating two subproblems with new constraints added for the selected variable. If x1 (= 7.6) is the selected variable from an optimal LP basis, the first subproblem gets the constraint x1 <= 7 and the second subproblem gets the constraint x1 >= 8. The new subproblems are then solved. If the subproblems don't give an integer solution, a new variable is branched on.

 

The first feasible integer solution gives the greatest lower bound for the optimal solution, and nodes with values less than this greatest lower bound are fathomed; that is, any integer solution from subnodes of these is going to be less than the greatest lower bound. Further branching on such nodes is thereby unnecessary, since a better solution can't be obtained. If a node gets an infeasible solution, it is also fathomed, and no further subproblems are investigated. Branch and bound terminates when there are no more open nodes in the problem tree; the largest integer solution in the tree is then the optimal one. (It is, though, possible to get a branch and bound tree with only non-integer and infeasible solutions, which means a feasible integer solution couldn't be found.)

 

Selection of which variable to branch on is usually done by selecting the variable with the largest fractional value, the variable with the relatively largest fractional value (i.e. if x1 = 1.3 and x3 = 4.3, x1 will be selected), or the most important variable.
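To make the bounding and fathoming concrete, here is a small C++ branch and bound sketch for a 0/1 knapsack problem (an illustration under simplifying assumptions: a greedy fractional bound plays the role of the LP relaxation, and branching fixes x_i to 1 or 0):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    struct Item { double value, weight; };

    double best = 0.0;  // greatest lower bound found so far (the incumbent)

    // Fractional relaxation bound: fill the remaining capacity greedily with
    // items pre-sorted by value/weight; this stands in for the LP optimum.
    double bound(const std::vector<Item>& items, std::size_t i,
                 double cap, double val) {
        for (; i < items.size() && cap > 0; ++i) {
            double take = std::min(1.0, cap / items[i].weight);
            val += take * items[i].value;
            cap -= take * items[i].weight;
        }
        return val;
    }

    void branch(const std::vector<Item>& items, std::size_t i,
                double cap, double val) {
        best = std::max(best, val);  // val is a feasible integer solution
        if (i == items.size()) return;
        if (bound(items, i, cap, val) <= best) return;  // fathom this node
        if (items[i].weight <= cap)  // subproblem with x_i = 1
            branch(items, i + 1, cap - items[i].weight, val + items[i].value);
        branch(items, i + 1, cap, val);  // subproblem with x_i = 0
    }

    int main() {
        // Items sorted by value/weight ratio (6, 5, 4); capacity 50.
        std::vector<Item> items = {{60, 10}, {100, 20}, {120, 30}};
        branch(items, 0, 50.0, 0.0);
        std::cout << "optimum = " << best << '\n';  // prints 220
    }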

 

In addition to the LP relaxation of an IP problem solved by branch and bound, other relaxations, like Lagrangean, surrogate and composite relaxations, which give tighter constraints than the LP ones, have been used within the branch and bound algorithm (for solving the multidimensional knapsack problem, see Chu & Beasley 1997a and Gavish & Pirkul 1985).

Gomory's method

Gomory's method (Nygreen 1980) is similar to branch and bound in that it adds new constraints and uses LP relaxation; the difference is that it also adds new variables, thereby increasing the dimension of the basis matrix and the number of calculations per new solution. If there is both an upper and a lower limit for the objective function, Gomory's method is proved to converge.

Enumerative Methods

Enumerative methods look only at integer solutions (represented as paths in trees). Since there is an exponential number of possible integer solutions (i.e. 2^n for n binary variables), "smart" selection and pruning operators have to be applied. E.g., in implicit enumeration (Nygreen 1980; Ravindran, Phillips & Solberg 1987), two feasibility test operators and two optimality test operators are used for pruning the search tree.

Dynamic Programming

Dynamic programming solves problems by combining the solutions of subproblems (Cormen, Leiserson & Rivest 1992). It is only applicable when the subproblems are not independent, that is, when they share subsubproblems. Each time a new subproblem is solved, its solution is stored, such that other subproblems sharing the stored subproblem can use the stored value instead of doing a recalculation. This saves work compared to applying the divide-and-conquer principle to the same problem, which would have recalculated everything.

 

The development of a dynamic programming algorithm is done in four steps (Cormen, Leiserson & Rivest 1992), illustrated by the sketch below:

1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution
3. Compute the value of an optimal solution in a bottom-up fashion
4. Construct an optimal solution from computed information
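Since the knapsack problem is mentioned below as a dynamic programming application, here is a minimal bottom-up C++ sketch (illustrative only):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Bottom-up dynamic programming for the 0/1 knapsack problem:
    // dp[c] holds the best value achievable with capacity c using the
    // items processed so far (step 3: compute bottom-up).
    int main() {
        std::vector<int> value  = {60, 100, 120};
        std::vector<int> weight = {10, 20, 30};
        const int capacity = 50;

        std::vector<int> dp(capacity + 1, 0);
        for (std::size_t i = 0; i < value.size(); ++i)
            for (int c = capacity; c >= weight[i]; --c)
                dp[c] = std::max(dp[c], dp[c - weight[i]] + value[i]);

        std::cout << "optimum = " << dp[capacity] << '\n';  // prints 220
    }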

 

Dynamic programming has been applied to integer problems like the knapsack problem (Ravindran, Phillips & Solberg 1987), the shortest-path problem (Dijkstra's algorithm can be formulated as a dynamic programming algorithm), and speech recognition in combination with Hidden Markov Models (Russel & Norvig 1995).

 

Dynamic programming has also been applied in non-linear integer programming (Ravindran, Phillips & Solberg 1987).

 

Evolutionary Computation

Evolutionary Computation (EC) is usually divided into different methods (Heitkoetter 1997): Genetic Algorithms (GA), Evolutionary Programming (EP), Evolution Strategies (ES), Classifier Systems (CFS), Probabilistic Incremental Program Evolution (PIPE) and Genetic Programming (GP); there also exist some less well known problem solving methods more or less connected to EC.

 

All the methods have their strategies motivated by Darwin's principles of natural selection, survival of the fittest, and the theories of evolution. The methods don't have strategies for simulating mutual influence, as in a "society"; such strategies are, though, supported by the related field of artificial life.

 

Since this thesis is mostly related to GA and GP, only these methods will be described here (see Heitkoetter 1997 for a description of the other methods).

Genetic Algorithms

Genetic algorithms are a model of machine learning inspired by some of the mechanisms of evolution in nature. They are usually applied to optimization problems, similar to integer programming (see section 'Integer Programming Models'). The model consists of a set of character strings, called a population of individuals represented by chromosomes, and a set of operators on those character strings, called genetic operators. The most common chromosome type is a fixed-length string of binary values. An overview of the traditional GA is shown in Figure 'Flowchart for the traditional Genetic Algorithm' (Koza 1992). Each time all the chromosome individuals in the population have been used together with an operator, the population goes from one generation to the next.
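A minimal GA skeleton in C++ (an illustrative sketch, not the thesis implementation), with a population of fixed-length binary chromosomes, tournament selection, one-point crossover and mutation; the toy fitness function counts 1-bits:

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    using Chromosome = std::vector<int>;  // fixed-length binary string

    std::mt19937 rng(42);

    // Toy fitness: fraction of 1-bits ("OneMax"), scaled to [0,1].
    double fitness(const Chromosome& c) {
        return std::count(c.begin(), c.end(), 1) / double(c.size());
    }

    const Chromosome& tournament(const std::vector<Chromosome>& pop) {
        std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);
        const Chromosome& a = pop[pick(rng)];
        const Chromosome& b = pop[pick(rng)];
        return fitness(a) >= fitness(b) ? a : b;  // fitter of the two wins
    }

    int main() {
        const std::size_t len = 32, popsize = 40;
        std::bernoulli_distribution coin(0.5), flip(0.01);
        std::vector<Chromosome> pop(popsize, Chromosome(len));
        for (auto& c : pop) for (auto& bit : c) bit = coin(rng);

        for (int gen = 0; gen < 100; ++gen) {  // one generation per iteration
            std::vector<Chromosome> next;
            std::uniform_int_distribution<std::size_t> cut(1, len - 1);
            while (next.size() < popsize) {
                const Chromosome& a = tournament(pop);
                const Chromosome& b = tournament(pop);
                std::size_t k = cut(rng);  // one-point crossover
                Chromosome child(a.begin(), a.begin() + k);
                child.insert(child.end(), b.begin() + k, b.end());
                for (auto& bit : child) if (flip(rng)) bit ^= 1;  // mutation
                next.push_back(child);
            }
            pop = std::move(next);
        }
        double best = 0;
        for (auto& c : pop) best = std::max(best, fitness(c));
        std::cout << "best fitness: " << best << '\n';
    }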

The fitness measure function

The fitness measure function gives a value for a chromosome, usually scaled to a value between 0 and 1 (if possible). When solving a mathematical programming problem like the LP minimization example above (assuming integer variable requirements), the fitness function could be the negated value of the objective function, so that the problem is transformed into a GA maximization problem. There is, however, nothing wrong in having the objective function itself as the GA fitness function, but that requires the selection criteria to be changed such that the element with the least fitness value is the most likely to survive.

Operators

Application of an operator on a chromosome in the population is done stochastically, with certain probabilities for application of each of the operators.

The Reproduction Operator

Depending on the fitness measure or estimate, a chromosome may survive unchanged from one generation to the next; this is controlled by the reproduction operator.

The Crossover Operator

Depending on the fitness of three randomly selected chromosomes from the population, the two fittest of them are combined systematically (crossed) to create a new chromosome replacing the least fit chromosome. Other selection criteria also exist, e.g. one where two fit chromosomes (relative to the rest of the population) are selected and crossed over, and the less fit of the two "parents" is selected for replacement. This operator contributes by combining fit elements to possibly get an even fitter one, though this depends on how the crossover is done.

The Mutation Operator

The mutation operator does a random change in a chromosome. This is often done to "loosen up" the population in cases of premature convergence. The mutation operator is usually the only operator doing such a dramatic change, even if there can exist crossover operators with some randomness (Chu & Beasley 1997a).

Other Operators

There can exist many problem-specific operators, e.g. a repair operator (Chu & Beasley 1997a), which is used to guarantee feasibility even after random mutation in a chromosome.

 

Figure: Flowchart for the traditional Genetic Algorithm

Genetic Programming

Differences between GA and GP

The main difference between GA and Genetic Programming (GP) is that GP extends the representation, such that program structures, and not only character strings, can be represented (Koza 1992). Another difference is that GP usually doesn't allow mutation, but allows the chromosome to grow in size (that is, program size); e.g., a crossover operation in GP will usually cause the chromosome to grow. Since programs can be represented directly in the structure, there is usually no preprocessing of inputs or postprocessing of outputs required; the inputs, intermediate results and outputs are usually expressed in the natural terminology of the problem (Koza 1992).

 

Probabilistic Incremental Program Evolution

Probabilistic Incremental Program Evolution (PIPE) is a novel method similar to GP (Salustowicz & Schmidhuber 1997a, 1997b), except that instead of a crossover operator it uses a "probabilistic prototype tree" to combine the experiences of different programs to generate better programs. On some regression problems it has been shown to give better solutions than GP, but it also has more variance in the fitness within the population.

Genetic Programming and LISP

Genetic Programming applications frequently use LISP (Haynes, Schoenfeld & Wainwright 1996; Koza 1992) as the main chromosome representation language. One reason for the selection of LISP is that both programs and data have the same form, S-expressions. This makes it possible to do genetic manipulation of a program (as data), and then execute it to check fitness (as a program).

The other main reason for selecting LISP is that LISP's S-expressions are equivalent to the parse tree of the computer program; this parse tree is needed to make syntactically correct genetic changes in the program. In other languages the parse tree is usually not accessible, and if it were accessible it would have another syntax and representation than the language itself.

Extensions of traditional Genetic Programming

Several suggestions have been made to extend the GP model, mainly motivated by increasing the domain of programs which GP can represent and handle (Koza 1996), but also by performance issues.

Data Structures in Genetic Programming

Traditional GP doesn't support the data structures known from software engineering, e.g. queues, stacks, lists and more complex data structures. Since a human programmer is used to such data structures, there is a gap between programs generated by GP and the ones a human programmer is used to handling. By allowing the mentioned data structures, the code generated by GP will also be more compact and able to express more information in a shorter length. William B. Langdon was able to evolve a stack structure using GP (Langdon 1995, 1996), where each individual consisted of five trees representing five stack operations: makenull, top, pop, push and empty. The GP primitives used to develop the operations were: the constants 0 and 1, max(10), the variable aux, and the operations Inc_Aux and Dec_Aux to increase and decrease the variable aux.

Strongly Typed Genetic Programming

To decrease the search space, and thereby possibly increase performance in GP, Strongly Typed Genetic Programming (STGP) can be applied. This is done by creating inheritance hierarchies of types with a maximum tree depth (Haynes, Schoenfeld & Wainwright 1996); by using these types, a syntactically correct tree will be created.

Populations and generations in GA and GP

A good rule of thumb for an initial population is that it should have a broad selection of individual chromosomes, preferably all unique. There is usually no rule on how large a population should be, but there has been a tendency towards big populations and few generations. Gathercole and Ross (1997) have shown that for some problems there is an advantage in having a relatively small population but more generations, applying the Dynamic Subset Selection (DSS) and Limited Error Fitness (LEF) learning methods in addition to GP. One argument by which they explain the results is that there might be a larger probability of many small increments in fitness per generation in a GP solution with a small population, compared to the probability of large increments per generation in a GP solution with a large population.

Examples of applications of GA and GP

GA and GP have been shown to give good (though not guaranteed optimal) solutions to NP-complete problems like the facility layout problem (Garces-Perez, Schoenefeld & Wainwright 1994), the multidimensional knapsack problem (Chu & Beasley 1997b) and the set-partitioning problem (Levine 1994; Chu & Beasley 1997a). These are problems previously solved using mathematical programming concepts like LP or IP (see chapter 'Mathematical programming'). In automatic discovery of protein classification rules (the D-E-A-D box family of proteins, see Koza 1996), rules discovered by GP are better than any known rules found by other methods or by human experts. GA has also been applied to the autoparallelization of algorithms (Koza 1996); this is a highly relevant problem, since new processors will probably rely on "smarter" compilers to create instructions to be run in parallel (e.g. HP and Intel's planned Merced CPU, see Byte magazine, December 1997). Genetic algorithms in the form of evolvable hardware, applying devices like field programmable gate arrays (FPGAs), have been used to evolve a frequency discriminator circuit and a robot controller; this field is expected to grow explosively (Koza 1996).

 

Inductive Logic Programming

Inductive logic programming (ILP) can be seen as the intersection of logic programming and inductive machine learning (Muggleton & De Raedt 1994).

 

Machine learning is an area within AI which explores how computers can learn (Midelfart 1996). One motive for developing ILP theory was the lack of expressiveness in machine learning. There have been attempts to solve this problem by adding to ILP a first-order logic framework from logic programming, with the possibility of using substantial background knowledge and recursion.

 

ILP can be explained more precisely by first describing the concepts of logic programming and induction separately. The main idea of logic programming (LP) is to use a computer to draw conclusions from declarative descriptions. A logic program is a finite set of logic formulas describing facts and relations. LP's main principle of inference is deduction, which is the derivation of new facts from facts and general rules. An algorithm for mechanical deduction called resolution is used.

 

Induction is described as "the act, process, or result of an instance of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal" in Webster's dictionary. This can be summarized as: induction is the creation of general rules based on facts and examples.

 

ILP is divided into two major subfields, interactive ILP and empirical ILP. Interactive ILP is most often used on relatively small data sets, and the user can participate during the learning of clauses. Empirical ILP is the automatic extraction of regularities from big and possibly imperfect data sets without user interaction (Lavrac & Dzeroski 1993).

 

Some areas where ILP is applied: discovery of structure-activity rules for drugs, learning rules for predicting protein secondary structure, automated program generation, learning rules for chess playing, and in general as a tool for data mining and knowledge discovery in databases (Lavrac & Dzeroski 1993).

Example of ILP

Here is an example of the facts, examples and rules from which inductive logic programming can start to induce.

 

Background Knowledge

mother(eve,sue).
mother(ann,tom).
father(ann,pat).
father(tom,sue).
parent(X,Y) :- mother(X,Y).
parent(X,Y) :- father(X,Y).
female(ann).
female(sue).
female(eve).
male(pat).
male(tom).

 

Positive and negative training examples

Positive training examples:
+ daughter(sue,eve).
+ daughter(ann,pat).

Negative training examples:
- daughter(tom,ann).
- daughter(eve,ann).

The table 'Background Knowledge' represents the known background information; e.g., it is not possible to be a mother without being a parent. The table 'Positive and negative training examples' represents true and false observations. The reason why ILP algorithms need negative examples is to be able to specialize a generated rule that is inconsistent with the negative examples.

 

The task here is to learn the relation daughter(X,Y) from the background knowledge predicates female and parent. Using ILP methods, the concluded rule, called a hypothesis, is shown below.

daughter(X,Y) :- female(X), parent(Y,X).

 

This inference is not deductively sound as we have generalized beyond what was known with certainty.

Properties of a logic program

The set of positive examples is in this setting called E+, and correspondingly for negative examples, E-.

 

Completeness of a logic program (Bergadano & Gunetti 1996)

A logic program P is complete (with respect to E+) iff, for all examples e in E+, P |- e.

Consistency of a logic program (Bergadano & Gunetti 1996)

A logic program P is consistent (with respect to E-) iff, for no example e in E-, P |- e.

Range-restricted Horn clause (Muggleton 1996)

A Horn clause C is range-restricted iff every variable in the head of C is also found in the body of C.

Basic ILP Techniques

Theta-subsumption

The θ-subsumption lattice induces a generality ordering on clauses.

θ-subsumption (Midelfart 1996)

A clause C θ-subsumes a clause D iff there exists a substitution θ such that Cθ ⊆ D; C is then a generalization of D under θ-subsumption.

 

Inverse resolution

Inverse resolution (Fayyad, Piatetsky-Shapiro, Smyth & Uthurusamy 1996) is used to produce general clauses from specific examples by inverting the resolution rule of deductive inference. For generalization, inverse substitution is used, which means replacing constants with variables while ensuring that the original example(s) can be restored by ordinary substitution.

Relative Least General Generalization (rlgg)

The least general generalization (lgg) of two clauses is such that by applying two substitutions to the same variables in the lgg, the two clauses can be regenerated, e.g.

lgg(parent(ann,mary), parent(ann,tom)) = parent(ann,X).

An rlgg is the most specific generalization of two clauses relative to the given background knowledge (Lavrac & Dzeroski 1993). E.g., with

C1 = daughter(mary,ann) :- female(mary), parent(ann,mary)

and

C2 = daughter(eve,tom) :- female(eve), parent(tom,eve)

the least general generalization of the two clauses is

lgg(C1,C2) = daughter(X,Y) :- female(X), parent(Y,X)

The relative least general generalization is the least general clause more general than both C1 and C2 with regard to the background knowledge B.

ILP Applications

Most ILP applications use top-down searching, starting with the most general hypothesis and specializing it until it covers no negative examples.

 

Progol is currently the best performing ILP system. It uses top-down searching like MIS and FOIL, but searches in a lattice which is bounded both above and below and is therefore better constrained (Muggleton 1995). To search the lattice, which is ordered by θ-subsumption, Progol uses an A*-like algorithm with refinement operators (Russel & Norvig 1995).

 

Golem is a well performing ILP system based on bottom-up searching, applying relative least general generalization. Other methods for bottom-up searching are inverse resolution and inverse implication (Bergadano & Gunetti 1996).

The GILP data model and its constraints

The goal for the GILP data model

The GILP data model must be able to represent Horn clauses in a clear and unambiguous way, suitable for genetic algorithm operations and representation.

 

In this chapter vector and clause are both used to describe a generated hypothesis.

Mathematical Model of GILP constraints

Matrix based GILP formulation

Among the hypergraph matrix representation constraints formalized in 'Hypergraph matrix constraints', the first constraint is unnecessary, since the second one makes sure there is a maximum of one occurrence of the value 1 per column, and with n columns this leads to a maximum row sum of n.

 

When generating elements column by column, starting from the left of the matrix, a new variable appears as a row without any value 1 getting a value 1. Which row number a new variable gets is of no importance, so to avoid duplicate semantics for different matrices, the least free row number is selected when a new variable is entered. This can be formulated by adding a new vector m of dimension 1*n, whose element m_i is the minimum available row index at column i. Since m_i is the minimum available row number at column i, it is greater than the maximum row index already used in the columns to the left of i. Since at most one new variable can be introduced when going one column to the right, constraints between the different elements of m are added. The new model is shown below.

GILP Matrix Model Constraints

Vector based GILP formulation

Since there is only one variable per column in the matrix model, it can be transformed into a vector model which stores, for each column, the number of the variable that is active there. This is done to decrease the storage requirements and the total number of constraints. The vector formulation is shown below.

GILP Vector Model Constraints

Here m_i is the maximum variable number at position i, and v_k is the selected variable number at position k. Each GILP vector can be uniquely encoded as a number.
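The encoding formula itself is not reproduced in this version of the document. One plausible scheme — an assumption for illustration, not necessarily the thesis's encoding — is a mixed-radix number, since position k admits the variable numbers 0..m_k:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Hypothetical mixed-radix encoding of a GILP vector: position k can hold
    // variable numbers 0..m_k, so it contributes a digit in base m_k + 1.
    uint64_t encode(const std::vector<int>& v, const std::vector<int>& m) {
        uint64_t code = 0;
        for (std::size_t k = 0; k < v.size(); ++k)
            code = code * (m[k] + 1) + v[k];  // shift by this position's radix
        return code;
    }

    int main() {
        std::vector<int> v = {1, 2, 1, 3};  // selected variable numbers
        std::vector<int> m = {1, 2, 3, 4};  // maximum variable number per position
        std::cout << encode(v, m) << '\n';
    }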

Investigation of the search space

All investigation of the search space is based on the vector constraint model presented in 'GILP Vector Model Constraints'.

Search space for GILP

The search space for GILP is the number of allowed vectors for a given vector length; it is from this point on called gilp(n), where n is the vector length.

 

gilp(n) <= n!

The first and the second constraints in 'GILP Vector Model Constraints' ensure that the growth in the number of variables from position k to k+1 is less than or equal to 1 and that m_0 = 1,

=> gilp(n) <= 2*3*..*n <= n!

 

 

Figure: GILP unconstrained search space vs. the factorial function

 

 

 

Calculating gilp(n)

Recurrence equations for calculation of gilp(n)

 

 

This has a very high running time if solved in a straightforward fashion. Since there are many common recursive steps, there are overlapping calculations of the various terms, and a program based on the dynamic programming principle was developed (see appendix A, gilpcalc.cc).

 

Growth of comb(n,k)

 

 

Calculations of sub-searchspaces of gilp(n)

A sub-searchspace of gilp(n) is in this context the number of vectors with a specific number of vector elements different from zero.

 

Since comb(n,k) = n!/(k!(n-k)!), the ratio comb(n,k)/comb(n-1,k) simplifies to n/(n-k), and the limit of this for fixed k when n goes towards infinity is 1.

Hence comb(n,k) is not an exponentially growing function for a fixed k.

 

Let gilpnz(n,k) be equal to the number of vectors based on a gilp vector length of n with k elements greater than 0.

Then gilpnz(n+1,k) > gilpnz(n,k) and gilpnz(n,k) > 0 for all positive integers n and k where k < n.

 

 

 

 

By increasing the vector length n by 1, the search space for the previous vector length is kept, but more combinations are also allowed, e.g. taking the vectors with k-1 non-zero elements generated for length n and placing a non-zero element in the newly allowed position, thereby increasing the overall search space. From this we can say that gilpnz(n+1,k)/gilpnz(n,k) > 1 for all allowed n and k.

 

By inspection of the data calculated for each length, the assumption that comb(n+1,k)/comb(n,k) > gilpnz(n+1,k)/gilpnz(n,k) seems to hold for all data, and an even stronger assumption seems to hold for growing n or k (or both).

 

If the previous assumption holds, applying the squeeze theorem to gilpnz(n+1,k)/gilpnz(n,k) shows that the limit for any allowed k when n goes towards infinity is 1. Since gilpnz(n,k) grows more slowly than comb(n,k), it follows by lemma 7.3 that gilpnz(n,k) is not an exponentially growing function.

Constraints on gilp clause variable unification

A variable in a gilp clause is not allowed to unify to the same values as the other variables in the clause.

 

A consequence of the definition 'Constraints on gilp clause variable unification' is that a generated clause

grandfather(X,Y) :- father(X,Z), father(Z,Y).

will instead be represented in a Prolog query as

grandfather(X,Y), \+ X=Y :- father(X,Z), \+ X=Z, father(Z,Y), \+ Z=Y.

Figure 'gilpnz(n,k) for k of 5, 10, 20, 40 and 100' shows an example of the growth of gilpnz(n,k) for k of 5, 10, 20, 40 and 100. As we can see from the figure, the curves grow slower than straight lines, and since the y-axis scale is logarithmic this supports the previously proved non-exponential growth of gilpnz(n,k) for any fixed k.

Pruning by entailment

GILP completeness fitness - gilpcompfit(H,E+)

gilpcompfit(H,E+) is equal to the number of unique members of E+ matched by the query Head,Body, where Head and Body are the head and body parts of the Horn clause hypothesis H, divided by the total number of positive examples in E+.

gilpcompfit(H,E+) is a number between 0 and 1 (inclusive); if gilpcompfit(H,E+) is equal to 1, H is complete (see the definition 'Completeness of a logic program').

GILP consistency fitness - gilpconsfit(H,E-)

gilpconsfit(H,E-) is equal to one minus the number of unique members of E- matched by the query Head,Body, where Head and Body are the head and body parts of the Horn clause hypothesis H, divided by the total number of negative examples in E-.

gilpconsfit(H,E-) is a number between 0 and 1 (inclusive); if gilpconsfit(H,E-) is equal to 1, H is consistent (see the definition 'Consistency of a logic program').
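In symbols (a direct transcription of the two definitions above, in LaTeX):

    \mathrm{gilpcompfit}(H, E^{+}) = \frac{|\{\, e \in E^{+} : H \vdash e \,\}|}{|E^{+}|},
    \qquad
    \mathrm{gilpconsfit}(H, E^{-}) = 1 - \frac{|\{\, e \in E^{-} : H \vdash e \,\}|}{|E^{-}|}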

 

Because of the formulation of gilp vectors, if a gilp vector v has at least one element equal to zero and an allowed non-zero number is put into that currently zero position, the new vector will be logically entailed by v. That means that v is more general than the new vector. If v is checked for completeness fitness and found to be unfit, there is no need to check the new vector, since it is more special and its completeness fitness is bounded above by gilpcompfit(v,E+).

 

The reason why the completeness fitness function is used is the fact that the search space gilp(n) is "very large", and this is a way to prune clauses in gilpnz(n,k2) based on failed clauses in gilpnz(n,k1), where k1 < k2 <= n and the vector in gilpnz(n,k2) is entailed by a vector in gilpnz(n,k1).

 

If a generated vector gets a non-zero completeness fitness, the consistency fitness gilpconsfit will be measured. The reason why this isn't done for every generated vector is that the gilpcompfit() and gilpconsfit() functions are computationally heavy operations, and in a practical problem it is usually of little interest to measure the consistency fitness of a generated vector which has a zero completeness fitness. (Though this could be of theoretical interest in some problems.)

 

This can be summarized in the algorithm defined below.

Algorithm for entailment pruning of a gilp vector v

1. Check if v (in gilpnz(n,k2)) is entailed, by investigating each tree of stored failed hypotheses for lengths from 0 to k2 - 1; if such an element is found, give the generated vector a completeness fitness of 0.

2. If no entailing vector is found, calculate gilpcompfit(v,E+); if it is 0, store v in the failed-tree for gilpnz(n,k2) (tree number k2).

3. If gilpcompfit(v,E+) > 0, calculate gilpconsfit(v,E-); if it is equal to 1 (v is consistent), remove the positive examples covered by v.
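A compact C++ sketch of this loop (illustrative only: the fitness functions are stubbed, and the stored failed hypotheses are kept in plain lists rather than the trees used in GILP):

    #include <map>
    #include <vector>

    using Vec = std::vector<int>;

    // Stubs standing in for the real GILP fitness queries over E+ and E-.
    double gilpcompfit(const Vec&) { return 0.5; }
    double gilpconsfit(const Vec&) { return 1.0; }
    void remove_covered_positives(const Vec&) {}

    int nonzero(const Vec& v) {
        int k = 0;
        for (int x : v) if (x != 0) ++k;
        return k;
    }

    // v is entailed by a more general vector w if every non-zero
    // element of w agrees with v.
    bool entailed_by(const Vec& w, const Vec& v) {
        for (std::size_t i = 0; i < v.size(); ++i)
            if (w[i] != 0 && w[i] != v[i]) return false;
        return true;
    }

    std::map<int, std::vector<Vec>> failed;  // failed[k]: k non-zero elements

    double evaluate(const Vec& v) {
        int k2 = nonzero(v);
        for (int k = 0; k < k2; ++k)              // step 1: entailment check
            for (const Vec& w : failed[k])
                if (entailed_by(w, v)) return 0.0;
        double comp = gilpcompfit(v);             // step 2
        if (comp == 0.0) { failed[k2].push_back(v); return 0.0; }
        if (gilpconsfit(v) == 1.0)                // step 3
            remove_covered_positives(v);
        return comp;
    }

    int main() { return evaluate({1, 0, 2, 0}) > 0 ? 0 : 1; }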

 

One problem with this algorithm is the storage requirement of keeping one tree of failed clauses for each k from 0 to n in gilpnz(n,k); a partial solution could be to store trees only for non-zero lengths k up to a certain percentage of the vector length n.

 

I.e., for a template clause length of 40, an upper bound for the number of vectors with a non-zero length of 6 is gilpnz(40,6) = 6.62*10^8, which is only a 3.31*10^(-28) part of gilp(40) = 2.35*10^36.

 

6.62*10^8 is still a significantly high number of vector nodes in a binary tree with regard to memory requirements, but this number will be significantly reduced by other pruning operations, like integrity constraints (see section 'Pruning using Integrity Constraints'), logic factorization (see section 'Pruning using logic factorization'), requirements of connectivity between predicates in a clause (see section 'Pruning using heuristics on literal cohesion'), and the requirement that a generated clause be range-restricted (see section 'Pruning by demanding range-restrictedness').

 

Another feature of the entailment algorithm is that the vectors of short non-zero length are the most general, so as the trees start to fill up, many newly generated vectors will be found to be entailed and thereby removed as possible hypotheses.

 

Figure: gilpnz(n,k) for k of 5, 10, 20, 40 and 100

 

Figure: gilp(n) vs. gilpnz(n,k) where k=floor(0.2*n)

Pruning using Integrity Constraints

Integrity constraints (Nytrø & Østvold 1996) are restrictions on a predicate parameter's variable binding, constrained by previous bindings of that variable. A simple example: for the predicate father/2, the (natural) integrity constraint for the second parameter is that it can't be bound to the same variable as the first parameter, i.e. father(X,X) is not allowed (a father can't be the father of himself), but father(X,Y) is valid by integrity constraints. There can also be integrity constraints between predicates in a template clause; in

father/2, brothers/2, female/1, mother/2, male/1

there are integrity constraints between the first parameter of father/2 and the parameter of female/1 (a father can't be a female); father/2 is the head predicate. The unconstrained search space gilp(8) = 21147, and the integrity-constrained search space for this example template is 4496 (the unconstrained search space is reduced by 78.74%).

Do predicate ordering matter?

Since the number of possible variables for a position p in a gilp vector grows with p, there seems to be a point in positioning the predicates such that integrity constraints occur early (as far to the left as possible) in the template clause; this could possibly prune a larger part of the search space gilp(n) than an integrity constraint further to the right in the template clause. The template clause in the example above could be rearranged to

father/2, female/1, male/1, brothers/2, mother/2

to get the following collision-position pairs: (3,1), (4,3), (5,3), (6,3), (7,1), (7,4), (7,5) and (7,6), instead of the previous collision-position pairs: (5,1), (5,3), (5,4), (6,1), (6,3), (6,4), (8,5) and (8,6). The new constrained search space is still the same as the old one, 4496.

By splitting up the predicates and allowing parameters of different predicates to be moved around independently, the clause could be rearranged to

father/2, female/1, mother_param_1, male/1, brothers/2, mother_param_2

giving the corresponding collision-position pairs: (3,1), (4,1), (5,3), (5,4), (6,3), (6,4), (7,3) and (7,4). This gives no decrease in the constrained search space; it is still 4496. By allowing splitting of the head predicate, the clause can be rearranged to

father_param_1, female/1, mother_param_1, male/1, brothers/2, mother_param_2, father_param_2

giving the corresponding collision-position pairs: (2,1), (3,1), (4,2), (4,3), (5,2), (5,3), (6,2) and (6,3). This last template's search space is still 4496. Reconsidering the assumed pruning effect of rearranging predicates or parameter places, it seems plausible that rearranging parameter places has no effect on the size of the search space: moving an integrity constraint to the right will prune smaller subtrees of the search space, but it will also prune an increasing number of subtrees. Rearranging integrity constraints is only a syntactic change, not a change of the allowed semantics of clauses. There is still a point in having the integrity constraints as far left as possible: in an iterative or recursive program for generating clauses, the running time would be better because of fewer and larger cuts of the search space.

Effect of integrity constraint pruning of gilp(n)

n            10        11         12          13           14
Minimum      39746     289920     583907      10346072     71444707
Average      382090    2408232    14730743    82480584     540248088
Maximum      678570    4213597    27644437    190899322    1192059223
Min/Max %    5.86      6.88       2.11        5.42         5.99
#Trials      78        63         48          33           18

The table 'Effect of integrity constraint pruning of gilp(n)' shows the effects of pruning using integrity constraints. The minimum row is the minimum search space obtained for a generated integrity constraint; the maximum row is the maximum search space obtained for a generated integrity constraint, with an upper bound of gilp(n). To generate integrity constraints, a program was used that accepts 3 parameters: the vector length, the probability of having zero integrity constraints for a parameter position, and a ratio (between 0 and 1) limiting the maximum number of integrity constraints for a parameter. A parameter with more than zero integrity constraints is limited to the integer value of the fraction times (position number - 1); i.e., with a fraction of 0.5 for position 8 there can be a maximum of trunc(0.5*(8-1)) = 3 integrity constraints. The integrity constraint generator program can be found in appendix A (gilp_generator.perl) and the calculation program in appendix A (gilpcalc_ic.cc).

 

The reason why there is a decrease in the number of trials for increasing n is the exponential growth in the time it takes to search with integrity constraints; dynamic programming can't be applied, since for any given position in a vector there are dependencies (calculation-wise) on positions both to the left and to the right.

Pruning using logic factorization

In a gilp template clause there will typically be repeated use of predicates, and when generating a clause using that template there is no gain in information from having repeated predicates with the same variable setup. I.e., if a part of a clause looks like

.., father(X,Y), .., father(X,Y), ..

the last occurrence of father(X,Y) is superfluous. Removing such repeated predicates is a part of logic factorization, which removes repeated and superfluous predicates in a Horn clause until no more predicates can be removed without altering the semantics of the clause. In logic factorization of a generated clause like

grandfather(X0,X2) :- father(X0,X1), father(X1,X2), father(X0,X3), father(X3,X2)

the last two father/2 predicates would be removed, but this will not be done by the single-predicate logic factorization suggested in gilp.
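A tiny C++ sketch of the single-predicate factorization described above (illustrative only): body literals that repeat an earlier literal verbatim are dropped.

    #include <iostream>
    #include <set>
    #include <string>
    #include <vector>

    int main() {
        // Body literals of a generated clause, as flat strings.
        std::vector<std::string> body = {
            "father(X,Y)", "uncle(X,Q)", "father(X,Y)"};
        std::set<std::string> seen;
        std::vector<std::string> factored;
        for (const auto& lit : body)
            if (seen.insert(lit).second)  // keep only the first occurrence
                factored.push_back(lit);
        for (const auto& lit : factored)
            std::cout << lit << '\n';     // father(X,Y)  uncle(X,Q)
        return 0;
    }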

 

The strict constraints on the unification of variables in the definition 'Constraints on gilp clause variable unification' will probably also help prune the search space, since the completeness fitness of the clause above is an upper bound for the completeness fitness of the same clause with unification constraints.

Pruning using heuristics on literal cohesion

High cohesion of literals in a clause means that there is a high number of repeated occurrences of variables in different parameter positions in the clause. I.e.

grandfather(X,Y) :- father(X,Z), father(Z,X).

has higher cohesion than

grandfather(X,Y) :- father(X,Z), father(Q,Y).

In the extreme case of low cohesion there are no reoccurrences of variables in a clause. This is equivalent to having independent predicates in the clause, which means that a fitness query for such a clause is the same as making a fitness query for each individual clause consisting of one predicate, in the order given by the template mesh, which is (usually) of no interest. The other extreme is the highest cohesion, where only one variable exists throughout the parameter places of the clause; then the expressibility of the clause is almost reduced to that of propositional logic, since the predicates "have taken the role of" atoms.

 

To avoid getting close to the cohesion extremes mentioned above for a generated clause, a heuristic approach is used. Since the reoccurrences of variables in parameter places are connected to the cohesion, the heuristic function here is estimated based on those reoccurrences.

The gilp heuristic estimate - gilpheuristic

The heuristic function gilpheuristic requires the following elements:

1. The generated clause vector.
2. A vector vc which registers variable occurrence in predicates. Initially all the elements are initialized to -1; when a variable k is added to a position mapped to predicate m: if vc(k) < maxpredicate+1 then (if vc(k) = -1 then vc(k) = m, else if vc(k) != m then vc(k) = maxpredicate+1). From vc three questions about a variable k can then be answered: 1. is it used? (vc(k) != -1); 2. is it used in exactly one predicate? (-1 < vc(k) < maxpredicate+1); and 3. is it used in more than one predicate? (vc(k) = maxpredicate+1). vc needs a counter for the number of elements equal to maxpredicate+1.
3. A vector vn of variable numbers counting the number of occurrences per variable, i.e. if variable 4 is created at first, vn(4) = 1, and if variable 4 reoccurs, just do increase(vn(4),1). vn needs a counter for the number of elements greater than 1.
4. A vector ov to store the occurrences of variable occurrences, i.e. if variables 3 and 5 are the only ones with 5 occurrences, then ov(5) is equal to 2 (this method is inspired by the bucket sort algorithm, which is O(n) given its requirements, see section 9.2 of Cormen T.H., Leiserson C.E., Rivest R.L. (1992), Introduction to Algorithms). If a variable reoccurs, i.e. variable number 3 increases from 5 to 6 occurrences, then decrease(ov(5),1) and increase(ov(6),1) to keep ov consistent with the current occurrence counts. ov needs a counter for the number of elements greater than a user-selected value.
5. Also needed are bookkeeping variables like: the vector length, the head and body lengths, a maximum occurrence counter, a maximum variable number counter, and a vector position counter. To map a parameter place to a predicate, a fourth vector is used. The regular max-variable vector is also used.

 

The calculation of all the above elements has a time complexity of O(n), where n is the gilp vector length. Going from one step to another has a time complexity of O(1).
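
As an illustration, here is a minimal C++ sketch of the incremental bookkeeping for vc, vn and ov described in the list above; sizes, names and the main program are assumptions made for the illustration.

#include <vector>
#include <cstdio>

// Sketch of the bookkeeping when a variable k is added to a position
// mapped to predicate m. vc, vn and ov are the vectors described above.
struct GilpBookkeeping {
    std::vector<int> vc, vn, ov;
    int maxpredicate;

    GilpBookkeeping(int maxvar, int maxpred, int maxocc)
        : vc(maxvar, -1), vn(maxvar, 0), ov(maxocc, 0),
          maxpredicate(maxpred) {}

    void add(int k, int m) {
        // vc: which predicate(s) variable k occurs in
        if (vc[k] < maxpredicate + 1) {
            if (vc[k] == -1) vc[k] = m;                    // first predicate
            else if (vc[k] != m) vc[k] = maxpredicate + 1; // >1 predicate
        }
        // vn: occurrence count per variable; ov: bucket of those counts
        if (vn[k] > 0) ov[vn[k]]--;  // leave the old occurrence bucket
        vn[k]++;
        ov[vn[k]]++;                 // enter the new occurrence bucket
    }
};

int main() {
    GilpBookkeeping b(10, 5, 20);
    b.add(3, 1); b.add(3, 2);  // variable 3 used in predicates 1 and 2
    printf("vc(3)=%d vn(3)=%d ov(2)=%d\n", b.vc[3], b.vn[3], b.ov[2]);
    return 0;
}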

Heuristic fitness function for gilp

The gilp heuristic fitness function =

w1*(number of elements in vn with values greater than 1, divided by the total number of non-zero elements in vn) +

w2*(number of elements in vc with the value equal to maxpredicate + 1, divided by the number of non-negative (positive, zero inclusive) elements in vc) +

w3*(number of elements in ov with a value greater than a user limit, divided by the total number of non-zero elements in ov).

To make the fitness value lie between 0 and 1, let w1+w2+w3 = 1.

 

If the creation of a gilp vector is done incrementally, the heuristic fitness has a time complexity of O(1) from one step to another, which gives a total time complexity of O(n) for calculating the heuristic fitness of the gilp vector.
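
A sketch of the fitness computation over the three vectors is given below, continuing the bookkeeping sketch above. The counters would in practice be maintained incrementally rather than recomputed; the names and example values are illustrative assumptions.

#include <vector>
#include <cstdio>

// Sketch of the gilp heuristic fitness over the vectors vn, vc and ov.
// w1+w2+w3 = 1 keeps the result between 0 and 1.
double gilpheuristic(const std::vector<int>& vn, const std::vector<int>& vc,
                     const std::vector<int>& ov, int maxpredicate,
                     int userlimit, double w1, double w2, double w3) {
    int vn_gt1 = 0, vn_nonzero = 0;
    for (size_t i = 0; i < vn.size(); ++i) {
        if (vn[i] != 0) vn_nonzero++;
        if (vn[i] > 1) vn_gt1++;
    }
    int vc_multi = 0, vc_used = 0;
    for (size_t i = 0; i < vc.size(); ++i) {
        if (vc[i] >= 0) vc_used++;                 // zero inclusive
        if (vc[i] == maxpredicate + 1) vc_multi++;
    }
    int ov_gt = 0, ov_nonzero = 0;
    for (size_t i = 0; i < ov.size(); ++i) {
        if (ov[i] != 0) ov_nonzero++;
        if (ov[i] > userlimit) ov_gt++;
    }
    double f = 0.0;
    if (vn_nonzero) f += w1 * vn_gt1 / vn_nonzero;
    if (vc_used)    f += w2 * vc_multi / vc_used;
    if (ov_nonzero) f += w3 * ov_gt / ov_nonzero;
    return f;
}

int main() {
    std::vector<int> vn(4, 0), vc(4, -1), ov(5, 0);
    vn[1] = 2; vn[2] = 1;   // variable 1 occurs twice, variable 2 once
    vc[1] = 6; vc[2] = 0;   // maxpredicate = 5: variable 1 in >1 predicate
    ov[1] = 1; ov[2] = 1;
    printf("%f\n", gilpheuristic(vn, vc, ov, 5, 2, 0.4, 0.4, 0.2));
    return 0;
}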

Pruning by demanding range-restrictedness

In the previous section the problems with low cohesion in a clause were mentioned; another example of low cohesion could be

grandfather(X,Y) :- father(Q,W), father(W,R).

In my opinion the above clause is practically worthless, since there is no cohesion between the head and body literals of the clause, i.e. if you combine the clause with the facts:

father(haakon,olav). father(olav,harald).

you cannot get any certain information about the grandfather relation for those 3 atoms (haakon, olav and harald), except that there exists a positive example of grandfather with some atoms.

 

This is the motivation for only allowing range-restricted clauses (see the definition of range-restricted horn clauses) in gilp. The check can be done iteratively, such that for each new head variable created a counter is increased, and each time a head variable is used in the body, and at the same time is not already used in more than one predicate, the counter is decreased. This criterion could be checked using the vc vector, assuming that there is only one head predicate. If the head variables are not necessarily numbered in order from 0 to the maximum head variable number, the check to see if a variable is a head variable is increased from O(1) to O(lg |HV|), where HV is the set of head variables. Summarized, the check for a range-restricted clause is O(n) (and can be done iteratively, simultaneously with the factorization checks) if the head variable numbers occur in a continuous range and there is only one head predicate. If the head variables are not in a continuous range, the check is increased to O(n lg |HV|). If there is more than one head predicate, the vc vector structure would have to be changed to uniquely represent which predicates a variable occurs in; a possible solution could be to have an associative array (implemented using binary trees), giving a time complexity of (at least) O(n (|HP| lg |TP| + lg |HV|)), where HP is the set of head predicates and TP the set of all predicates in the clause.
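
A minimal C++ sketch of the O(n) check in the simple case (single head predicate, head variables numbered in the continuous range 0..|HV|-1) is given below; the function and variable names are assumptions for the illustration.

#include <vector>
#include <cstdio>

// Sketch of the O(n) range-restrictedness check for a single head
// predicate, assuming head variables are numbered in the continuous
// range 0..hv-1. The counter reaches 0 exactly when every head variable
// has been seen in the body at least once.
bool range_restricted(const std::vector<int>& body_vars, int hv) {
    std::vector<bool> seen(hv, false);
    int counter = hv;                  // head variables not yet covered
    for (size_t i = 0; i < body_vars.size(); ++i) {
        int v = body_vars[i];
        if (v < hv && !seen[v]) {      // O(1) head-variable test
            seen[v] = true;
            counter--;
        }
    }
    return counter == 0;
}

int main() {
    std::vector<int> body;             // variables in the body literals
    body.push_back(0); body.push_back(2); body.push_back(1);
    printf("%s\n", range_restricted(body, 2) ? "range-restricted" : "not");
    return 0;
}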

The GILP algorithm

The methods for constraining the GILP search space that are used in the algorithms suggested here were described in the previous chapter.

Creation of the initial population

Creation of an individual

An individual in the GILP population is represented by a chromosome vector of length n. There are |P| predicates in the vector template, with |HP| predicates in the head part and |BP| predicates in the body part.

GILP's algorithm for initial creation of a chromosome vector.

1. Randomly select the first position in the vector to be of value 0 or 1

2. If the current position is the last position in the vector, go to 7

3. Calculate the set PV of possible values for the next position in the vector, based on:

3a. The integrity constraint check, if it fails go to 6

3b. The simplified logic factorization check, if it fails go to 6

4. If the set of possible values generated is empty go to 6

5a. Go to the next position vp in the vector

5b. Select a random value from the generated set PV, and insert it at position vp

5c. Check if the vector is range-restricted, if it is go to 7, otherwise go to 2

6. Return error message

7. Return the generated vector
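
The following C++ skeleton mirrors the steps above; the three checks are stubs standing in for GILP's integrity constraint, logic factorization and range-restrictedness machinery, and all names and value ranges are illustrative assumptions.

#include <vector>
#include <cstdlib>
#include <cstdio>

// Skeleton of the initial creation of a chromosome vector. The stubs
// stand in for the checks in steps 3a, 3b and 5c.
static bool integrity_ok(const std::vector<int>&, int, int)     { return true; }
static bool factorization_ok(const std::vector<int>&, int, int) { return true; }
static bool is_range_restricted(const std::vector<int>&)        { return false; }

bool create_chromosome(std::vector<int>& v, int n, int maxval) {
    v.assign(n, 0);
    v[0] = rand() % 2;                          // 1. random first position
    for (int pos = 1; pos < n; ++pos) {         // 2. until the last position
        std::vector<int> pv;                    // 3. possible values
        for (int val = 0; val <= maxval; ++val)
            if (integrity_ok(v, pos, val) &&    // 3a.
                factorization_ok(v, pos, val))  // 3b.
                pv.push_back(val);
        if (pv.empty()) return false;           // 4./6. error
        v[pos] = pv[rand() % pv.size()];        // 5a./5b. random insert
        if (is_range_restricted(v)) break;      // 5c. done early
    }
    return true;                                // 7. return the vector
}

int main() {
    std::vector<int> v;
    if (create_chromosome(v, 8, 3))
        for (size_t i = 0; i < v.size(); ++i) printf("%d ", v[i]);
    printf("\n");
    return 0;
}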

Creation of a population

A GILP population is represented by an ordered set P of individuals. The size of the

population is k.

GILP's algorithm for the creation of a population

1. If the population size |P| is equal to k go to 7

2. Create a new individual using the method in the definition of GILP's algorithm for initial creation of a chromosome vector

3a. Do the entailment check, if it fails go to 1

3b. Calculate the completeness fitness for the individual, if "low" or zero go to 5a

3c. Calculate the consistency fitness for the individual, if "low" or zero go to 6

4. Save the individual in the set of good solutions and go to 6

5a. Insert the clause in the entailment storage tree (for the correct non-zero length)

5b. Go to 1

6. Insert the individual into P and go to 1

(NOTE: this is a set insert, so if the individual already exists in P, P will not grow in size)

7. Return the generated population
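
A compact C++ skeleton of this population creation loop is shown below; the entailment check and the two fitness queries (which in GILP go to a Prolog or CLP(R) process) are placeholder stubs, and all thresholds and names are assumptions for the illustration.

#include <set>
#include <vector>
#include <cstdlib>
#include <cstdio>

typedef std::vector<int> Chromosome;

// Stubs: in GILP the individual is created by the algorithm in the
// previous definition, and the fitness values come from fitness queries.
static Chromosome new_individual() {
    Chromosome c(8);
    for (int i = 0; i < 8; ++i) c[i] = rand() % 3;
    return c;
}
static bool entailment_check_fails(const Chromosome&) { return false; } // 3a
static double completeness_fitness(const Chromosome&) { return 0.5; }   // 3b
static double consistency_fitness(const Chromosome&)  { return 0.5; }   // 3c

std::set<Chromosome> create_population(size_t k, double low) {
    std::set<Chromosome> P, good, entailment_store;
    while (P.size() < k) {                       // 1. until |P| = k
        Chromosome c = new_individual();         // 2.
        if (entailment_check_fails(c)) continue; // 3a. back to 1
        if (completeness_fitness(c) <= low) {    // 3b. "low" or zero
            entailment_store.insert(c);          // 5a. store the vector
            continue;                            // 5b. back to 1
        }
        if (consistency_fitness(c) > low)        // 3c./4. a good solution
            good.insert(c);
        P.insert(c);                             // 6. set insert into P
    }
    return P;                                    // 7.
}

int main() {
    std::set<Chromosome> P = create_population(5, 0.1);
    printf("population size: %u\n", (unsigned)P.size());
    return 0;
}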

 

The reason why the heuristic function on literal cohesion isn't used in the creation of an initial population is that the primary wish for the initial population is to have individuals which are range-restricted (a strict constraint) but also entail many elements of gilp(n) by having a short non-zero length (see section ). Another approach to the creation of an initial population could be to use recursive programs like those in appendix A (gilpcalc.cc or gilpcalc_ic.cc).

 

By prioritizing completeness fitness over consistency fitness in the algorithm in the definition of GILP's algorithm for the creation of a population, GILP is a top-down GA search algorithm.

Genetic Operations

Reproduction

GILP's primary method for reproduction

Based on the fitness value there is a probability pr for the survival of an individual into the next generation. This gives no new contribution to the next generation's population, but it will most likely help sustain or increase the average fitness of the next generation by "helping" fit individuals survive.

Mutation
GILP's primary algorithm for mutation

Based on the fitness value there is a probability pm for the mutation of a vector. Select a random position in the vector and select a random variable from that position's set of possible variable numbers (which may have to be recalculated); then select another random position in the clause and repeat. If the vector is mutated from being range-restricted to not being range-restricted, the mutation method is repeated. When a certain number of repeated mutations of a clause is reached, GILP's supporting method for mutation is applied.
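
A minimal C++ sketch of the primary mutation is given below; the two helpers are stubs for GILP's possible-variable calculation and range-restrictedness check, and max_tries stands in for the "certain number of repeated mutations" (all names are illustrative assumptions).

#include <vector>
#include <cstdlib>
#include <cstdio>

// Sketch of the primary mutation: pick a random position, replace its
// variable by a random member of the position's set of possible variable
// numbers, and retry if range-restrictedness is lost. After max_tries
// failed attempts the caller applies the supporting mutation method.
static bool is_range_restricted(const std::vector<int>&) { return true; }
static std::vector<int> possible_variables(const std::vector<int>&, int) {
    std::vector<int> pv;
    for (int v = 0; v <= 3; ++v) pv.push_back(v);
    return pv;
}

bool mutate(std::vector<int>& v, int max_tries) {
    for (int t = 0; t < max_tries; ++t) {
        std::vector<int> trial = v;
        int pos = rand() % (int)trial.size();
        std::vector<int> pv = possible_variables(trial, pos);
        if (pv.empty()) continue;
        trial[pos] = pv[rand() % pv.size()];
        if (is_range_restricted(trial)) { v = trial; return true; }
    }
    return false;  // fall back to GILP's supporting mutation method
}

int main() {
    std::vector<int> v(6, 1);
    printf("mutated: %s\n", mutate(v, 10) ? "yes" : "no");
    return 0;
}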

GILP's supporting algorithm for mutation

If the first mutation method fails, the second one replaces the vector using an algorithm similar to the one in the definition of GILP's algorithm for initial creation of a chromosome vector, but with an added check of the heuristic fitness value for cohesion in step 5c, such that vectors with greater non-zero lengths, which have a higher likelihood of sufficient cohesion, can be created.

Crossover
GILP's primary algorithm for crossover

Based on the fitness value there is a probability pc for a crossover.

Select two random individuals i1, i2 in the population, and combine them into a third by first calculating the heuristic fitness value for cohesion for the individual ix (of i1 and i2) with the highest completeness fitness.

Case 1: If the heuristic fitness value and the consistency fitness value are both low, do a crossover equal to a specialization of ix, adding new variables at the positions where ix has the value 0 but the other vector has positive values. To ensure feasibility according to the constraints, use the minimum of the maximum allowed value at that position s (that is, ms) and the positive value at the same position in the vector opposite to ix. Insert the new individual in place of the individual with the lowest weighted sum of fitness, that is v1*(completeness fitness) + v2*(consistency fitness) + v3*(heuristic fitness). To make the fitness value lie between 0 and 1, let v1+v2+v3 = 1. Since the two actual fitness measures are more precise than the heuristic one, v1, v2 > v3.

Case 2: Loop through the pairs of vector positions and randomly select which vector to take a variable from; if that variable is not allowed, randomly select between the variable in the other vector and the median of the allowed values calculated for the current position, until an allowed variable is found.

Case 1 in the crossover algorithm is a wish to specialize the vector with the locally best completeness fitness that is still quite general and covers too many negative examples (low consistency fitness). The weighted replacement criterion allows different types of solutions by changing the weight values, i.e. if v1 is 1, the vector which was used as the basis for the specialization is kept.

 

Case 2 is a crossover algorithm similar to the crossover in Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms: the Set Partitioning Problem, except that the repair operator is built into the crossover operator in GILP. The reason why the median of the allowed values is chosen to obtain a feasible solution is 1. to make the crossover a less "random" operation (by not selecting a random allowed value), and 2. a wish to decrease the number of new variables (which selecting the largest allowed value possibly would add), since the addition of a new variable will not contribute to a higher cohesion of the vector.
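
A minimal C++ sketch of case 2, with the repair step built into the loop, is shown below; the two helpers are stubs for GILP's allowed-value calculation, and all names and the value range are illustrative assumptions.

#include <vector>
#include <cstdlib>
#include <cstdio>

// Sketch of crossover case 2: for each position randomly pick the
// variable from one of the two parents; if it is not allowed, choose at
// random between the other parent's variable and the median of the
// allowed values, until an allowed variable is found.
static std::vector<int> allowed_values(const std::vector<int>&, int) {
    std::vector<int> pv;
    for (int v = 0; v <= 3; ++v) pv.push_back(v);  // already sorted
    return pv;
}
static bool is_allowed(const std::vector<int>& child, int pos, int val) {
    std::vector<int> pv = allowed_values(child, pos);
    for (size_t i = 0; i < pv.size(); ++i)
        if (pv[i] == val) return true;
    return false;
}

std::vector<int> crossover(const std::vector<int>& p1,
                           const std::vector<int>& p2) {
    std::vector<int> child(p1.size());
    for (size_t pos = 0; pos < p1.size(); ++pos) {
        int pick = (rand() % 2) ? p1[pos] : p2[pos];
        int other = (pick == p1[pos]) ? p2[pos] : p1[pos];
        while (!is_allowed(child, (int)pos, pick)) {    // repair step
            std::vector<int> pv = allowed_values(child, (int)pos);
            int median = pv[pv.size() / 2];             // median value
            pick = (rand() % 2) ? other : median;
        }
        child[pos] = pick;
    }
    return child;
}

int main() {
    std::vector<int> p1(6, 1), p2(6, 2), c = crossover(p1, p2);
    for (size_t i = 0; i < c.size(); ++i) printf("%d ", c[i]);
    printf("\n");
    return 0;
}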

 

Implementation overview

Table 9.1 - The GILP calculation programs

Filename             Description
gilpcalc.cc          Calculates the unconstrained search space gilp(n) using dynamic programming
gilp_generator.perl  Generates a stochastic integrity constraint input file for gilpcalc_ic, tunable by parameters
gilpcalc_ic.cc       Calculates the integrity constrained search space

Table 9.2 - The GILP system

Filename                   Description
Imakefile                  Imakefile to create a Makefile for the GILP system
gilp_assoc.(h|cc)          Associative array class
gilp_common.h              Common constants and includes for GILP
gilp_communicator.(h|cc)   Communication class for controlling a CLP(R) or Prolog program using Unix pipes
gilp_element.(h|cc)        Chromosome individual class; consists of an array of gilp_predicate objects and a static array of gilp_predicatemesh objects
gilp_main.cc               The main program for GILP
gilp_parameter.(h|cc)      A class which takes care of all user interaction, including reading of configuration files and command line parameters
gilp_population.(h|cc)     Population class; consists of an array of gilp_element objects
gilp_predicate.(h|cc)      Class to represent predicates (within an individual)
gilp_predicatemesh.(h|cc)  Class to represent constraints like logic factorization and integrity constraints for a predicate in a mesh
gilp_rand.(h|cc)           Random number generator class
gilp_statistic.(h|cc)      Statistic class with functions to simulate some statistical distributions
gilp_string.(h|cc)         String class, implemented because of problems with the string class that ships with the C++ Standard Template Library (STL)

 

 

The GILP system is an unfinished GA system implementing the algorithms suggested in chapters 7 and 8. To be able to do fitness queries, Prolog (or CLP(R)) needs to be integrated with GILP, either through pipes/sockets or using the Prolog-C API.
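
As a minimal illustration of the pipe-based alternative on a Unix system, the sketch below starts a child process and reads one answer line back; an echo command is a stand-in for the Prolog invocation, since the concrete Prolog command line is not specified here.

#include <cstdio>

// Minimal sketch of pipe-based control of an external process, as the
// gilp_communicator class does for Prolog/CLP(R). The echo command is a
// stand-in for starting Prolog and sending it a fitness query.
int main() {
    FILE* prolog = popen("echo yes", "r");  // stand-in for a Prolog call
    if (!prolog) return 1;
    char answer[256];
    if (fgets(answer, sizeof(answer), prolog))
        printf("fitness query answered: %s", answer);
    pclose(prolog);
    return 0;
}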

Conclusion

We have in this report investigated ways of applying genetic algorithm (GA) concepts in methods for discovering rules, represented as horn logic, in data mining problems. Such problems have usually been solved using inductive logic programming (ILP).

 

The representation model for horn clauses is a GA vector. When unconstrained, this vector's search space is extremely large, growing at least exponentially with the vector length. This has moved most of the attention to the investigation of methods to constrain this search space. Five strategies to prune this search space are suggested (chapter 7):

1. Integrity constraints
2. Simple logic factorization
3. Entailment, storing failed vectors with low fitness
4. Range-restricted clauses
5. Heuristic function for estimation of literal cohesion

 

The GA algorithms suggested in chapter 8 are created using the five pruning strategies mentioned.

 

A system to test the suggested algorithms has been partly developed, but it needs more work before tests can be done.

Further research

Applied research and development

Continue the GILP system implementation, test it on various ILP problems, and compare the results with those of existing ILP systems. This would make a suitable semester project for one or two students.

 

Extend the algorithms to support atoms and lists as parameters in a generated clause. This could be done by adding one predicate per atomic value, e.g. norway_pred(norway), such that every time norway_pred(X) is used, X is unified with the atom norway, without the need to have the constants directly in the generated clause; they are instead a part of the facts.

 

Implement ILP operators like relative least general generalization or refinement operators, and integrate them with the suggested GA methods to possibly improve performance and quality of the search.

 

Do statistical experiments measuring the performance of the GILP system as a function of the weights in the heuristic function or as a function of the weights in the crossover operator.

 

Combine the GILP system with evolutionary hardware, i.e. a field programmable gate array (FPGA), to create a regulator for an industrial process which automatically adapts to changes in input signals. [6]

 

Look at the visualization possibilities for the generated hypotheses, either as a hypergraph or as a vector or matrix. I.e. is there a way to "sense" the fitness of a population by looking at a visualization of all the vectors in the population simultaneously?

Theoretical research

Do a thorough complexity analysis of the suggested GILP system, including Kolmogorov complexity and a possible classification of the time complexity of GILP.

 

Investigate heuristic methods which have been applied with some success to NP-complete problems like the Travelling Salesman Problem, to see if they are applicable to pruning the GILP search space.

 

Develop parallel models of GILP. This is motivated by results with parallel genetic algorithms, where the (combinatorial) quadratic assignment problem with more than one billion binary variables has been solved!

Evaluation of the project

The project progression was rather slow in the beginning, mainly caused by work on other projects and wrong prioritization of time, but also because of a slight confusion about what the approach to the project should be. Finally, after doing some initial implementation of the GILP system, I ended up with a more theoretical approach, investigating ways of pruning the search space. Unfortunately I didn't finish the GILP system, so practical tests of its applicability could not be done, but hopefully there have been some useful algorithm suggestions.

Bibliography

Bergadano F., Gunetti D. (1996), Inductive Logic Programming - From Machine Learning to Software Engineering, MIT Press
Berge C. (1976), Graphs and Hypergraphs, North-Holland
Burton D.M. (1989), Elementary Number Theory, WCB Press
Cormen T.H., Leiserson C.E., Rivest R.L. (1992), Introduction to Algorithms, MIT Press
Fayyad U.M., Piatetsky-Shapiro G., Smyth P., Uthurusamy R. (1996), Advances in Knowledge Discovery and Data Mining, MIT Press
Haynes T.D., Schoenefeld D.A., Wainwright R.L. (1996), Type Inheritance in Strongly Typed Genetic Programming, Chapter 18 of Advances in Genetic Programming, MIT Press
Heitkoetter J. (1997), The Hitch-Hiker's Guide to Evolutionary Computation, FAQ for usenet news group comp.ai.genetic
JáJá J. (1992), An Introduction to Parallel Algorithms, Addison Wesley
Jones N.D. (1997), Computability and Complexity - From a Programming Perspective, MIT Press
Judson T.W. (1994), Abstract Algebra - Theory and Applications, PWS Publishing Company
Koza J.R. (1992), Genetic Programming - On the Programming of Computers by Means of Natural Selection, MIT Press
Lavrac N., Dzeroski S. (1993), Inductive Logic Programming - Techniques and Applications, Prentice Hall
Levine D. (1994), A Parallel Genetic Algorithm for the Set Partitioning Problem, Ph.D thesis at Illinois Institute of Technology
Li M., Vitányi P. (1997), An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag
Midelfart H. (1996), Knowledge Discovery in Databases applied in Inductive Logic Programming, Diploma (M.Sc) thesis at the Norwegian University of Science and Technology - NTNU
Nakhaeizadeh G., Taylor C.C. (1997), Machine Learning and Statistics - The Interface, John Wiley & Sons
Nienhuys-Cheng S.-H., Wolf R. de (1997), Foundations of Inductive Logic Programming, Springer-Verlag
Nilsson U., Maluszynski J. (1990), Logic, Programming and Prolog, Wiley
Nygreen B. (1980), Heltallsproblemer (Integer Problems), lecture book in Operations Research (Norwegian text) at the University of Trondheim, Norway
Nytrø Ø. (1992), Aspects of Structure in Horn Clause Programs, Ph.D thesis at the University of Trondheim, Norway
Ravindran A., Phillips D.T., Solberg J.J. (1987), Operations Research - Principles and Practice, John Wiley & Sons
Russell S., Norvig P. (1995), Artificial Intelligence - A Modern Approach, Prentice Hall
Sperschneider V., Antoniou G. (1991), Logic - A Foundation for Computer Science, Addison Wesley
Sterling L. (1990), The Practice of Prolog, MIT Press
Taylor H.M., Karlin S. (1993), An Introduction to Stochastic Modeling, Academic Press
Truss J.K. (1991), Discrete Mathematics for Computer Scientists, Addison Wesley
Van De Velde E.F. (1994), Concurrent Scientific Computing, Springer-Verlag
Walpole R.E., Myers R.H. (1993), Probability and Statistics for Engineers and Scientists, Prentice Hall
Williams H.P. (1993), Model Building in Mathematical Programming, John Wiley & Sons

References

Alfheim E., Tveit A. (1996), Inductive Logic Programming Applied in Knowledge Discovery in Databases, semester project at NTNU, Norway
Banzhaf W. (1994), Genotype-Phenotype-Mapping and Neutral Variation - A Case Study in Genetic Programming, Department of Computer Science, Dortmund University, Germany
Chu P.C., Beasley J.E. (1997), A Genetic Algorithm for the Multidimensional Knapsack Problem, The Management School, Imperial College, England
Chu P.C., Beasley J.E. (1997), Constraint Handling in Genetic Algorithms: the Set Partitioning Problem, The Management School, Imperial College, England
Corcoran A.L., Wainwright R.L. (1993), The Performance of a Genetic Algorithm on a Chaotic Objective Function, University of Tulsa, Oklahoma, USA
Crawford K.D., Wainwright R.L., Vasicek D.J. (1995), Detecting Multiple Outliers in Regression Data Using Genetic Algorithms, Symposium of Applied Computing, Nashville TN, USA
Garces-Perez J., Schoenefeld D.A., Wainwright R.L. (1994), Solving Facility Layout Problems Using Genetic Algorithms, University of Tulsa, Oklahoma, USA
Finn P., Muggleton S., Page D., Srinivasan A. (1996), Discovery of Pharmacophores, Computational Chemistry, Pfizer Central Research, England
Gathercole C., Ross P. (1997), Small Populations over Many Generations can beat Large Populations over Few Generations in Genetic Programming, Department of Artificial Intelligence, University of Edinburgh, Scotland
Gavish B., Pirkul H. (1985), Efficient Algorithms for Solving Multiconstraint Zero-One Knapsack Problems to Optimality, Mathematical Programming, issue 31
Genetic Programming 1998 Conference, Abstracts and Speaker Biographies for 22 tutorials, to be held at the University of Wisconsin - Madison, Wisconsin, USA
Koza J.R. (1996), Future Work and Practical Applications of Genetic Programming, Stanford University, USA
Langdon W.B. (1995), Evolving Data Structures with Genetic Programming, Department of Computer Science, University College London, England
Langdon W.B. (1995), Pareto, Population Partitioning, Price and Genetic Programming, Department of Computer Science, University College London, England
Langdon W.B. (1996), Using Data Structures within Genetic Programming, Department of Computer Science, University College London, England
Muggleton S., Raedt L.D. (1994), Inductive Logic Programming: Theory and Methods, Journal of Logic Programming 1994:19,20:629-679
Muggleton S., Page Jr. C.D. (1994), A Learnability Model for Universal Representations, Oxford University Computing Laboratory, England
Muggleton S. (1995), Inverse Entailment and Progol, Oxford University Computing Laboratory, England
Muggleton S. (1996), Machine Intelligibility and the Duality Principle, Oxford University Computing Laboratory, England
Muggleton S. (1996), Stochastic Logic Programs, Oxford University Computing Laboratory, England
Nytrø Ø., Østvold B.M. (1996), Pruning the Specialization Search Space Using Integrity Constraints, the Norwegian University of Science and Technology, Norway
Palmeri C. (1997), Believe in Yourself, Believe in the Merchandise, Forbes v160, n5, USA
Salustowicz R., Schmidhuber J. (1997), Probabilistic Incremental Program Evolution, IDSIA, Switzerland
Salustowicz R., Schmidhuber J. (1997), Probabilistic Incremental Program Evolution: Stochastic Search Through Program Space, IDSIA, Switzerland

Footnotes

1. A matrix with a high proportion of zero elements (typically more than 90%)
2. Exactly how high was not determined, but experiments showed that it was possibly "around" O(k^(n^2)) for a constant k.
3. Solved on today's "average" supercomputer, i.e. a Cray T3E
4. Getting close to the maximum number of vectors for that specific non-zero length
5. Assuming no change of gender.. #:-)
6. I apologize for the "sceptic, anti-scifi&magic" feeling that the reader might get from this :-)


