On MatBase ’s algorithm for preventing cycles in binary Cartesian function products

This paper introduces the algorithm that MatBase (an intelligent knowledge and database management system prototype) uses for enforcing acyclicities of binary Cartesian function products, characterizes it -proving that it is complete, sound, optimal, and linear- and, besides its pseudocode embedding SQL, also provides an example implementation in standard ANSI-99 SQL and MS VBA.


Introduction
Database (db) constraint enforcement is crucial for guaranteeing db instances plausibility [1]. While this is fairly simple for the Relational Data Model (RDM) [1,2,3], for which all RDM-based Management Systems (RDBMS) provide six types of constraints (namely, domain/range, not-null, reference integrity, key/uniqueness, tuple/check, and default values), things are not at all simple for most of the non-relational ones, e.g., those provided by the (Elementary) Mathematical Data Model ((E)MDM) [4,5,6].
For example, let us consider the RDM table scheme and instance from Figure 1 (where the primary key is x, an integer autonumber generated by the underlying RDBMS, Name, a mandatory text made of at most 64 ASCII characters, is a unique key as well, whereas Mother, Father, and Spouse are foreign keys referencing x; please recall that for any function f : D  C, its image, denoted Im(f), is the set of all the values taken by f, i.e. Im(f) = {yC | xD, f(x) = y}  C).
We carefully entered data in the table PERSONS from Figure 1, so that these non-relational constraints are satisfied as well. Generally, however, especially when working with huge data, end-users should never be left unassisted in face of non-relational constraints as, for example, in this case, they are generally not historians and even historians are humans, so prone to errors.

MatBase
MatBase [5 -13] is an intelligent knowledge and database management system prototype developed by us and based on both the (E)MDM, RDM, Datalog [3,6,13], and the Entity-Relationship (E-R) Data Model (E-RDM) [1,14,15]. MatBase accepts (E)MDM schemes and automatically translates them into RDM ones and may extract corresponding E-R Diagrams (E-RD); dually, it accepts both E-RDs and RDM schemes and automatically translates them into (E)MDM ones. Finally, it accepts Datalog programs as well and, whenever correct, it computes their results against the corresponding RDM schemes and instances.
Moreover, MatBase automatically generates software applications for managing the data it stores, including code for enforcing both relational and non-relational constraints.
Currently, MatBase has two versions: one developed in MS Access VBA, for small dbs, and a MS C# and SQL Server, for professional management of big and huge dbs.

Enforcing the non-relational constraints
There are only two possible ways to enforce non-relational constraints: either use an intelligent DBMS as MatBase (that are not commercially available) or through event-driven software db applications (APP) managing data stored by RDBMSes.
Whenever possible, it is best to enforce them preventively, i.e. to filter out unplausible data that would violate the constraints and provide end-users with only plausible data to choose from.
For example, constraint Spouse irreflexive may be easily enforced as follows: Spouse should be implemented in the PERSONS graphical user interface (GUI) form as a combo-box (storing the x values, but presenting end-users the corresponding Name values, alphabetically ordered); whenever end-users (or the APP) successfully moved the cursor on another data row of the underlying PERSONS table, the APP recomputes the Spouse combo-box, eliminating the current person from it (of course that on new lines, nothing has to be eliminated as the current person data is not yet saved in the db).
For example, to do it in MS VBA, only the following six lines of code must be added to the Form_Current method of the PERSONS class (which is automatically invoked by the system each time the data cursor arrives on another line): If NewRecord Then Spouse.RowSource = "SELECT x, Name FROM PERSONS ORDER BY Name" Else Spouse.RowSource = "SELECT x, Name FROM PERSONS WHERE x <> " & x & " ORDER BY Name" Endif Spouse.Requery However, most of the non-relational constraints may not be enforced preventively, but only curatively, i.e. after endusers chose data the APP should check whether it is plausible and reject unplausible values.
For example, enforcing the constraint Spouse symmetric should be done as follows: -after end-users changed data in the Spouse combo-box for the current person and successfully saved it in the db, the APP should analyze the context and: -if Spouse was nullified, then the Spouse value for the former spouse must be automatically nullified as well; -if Spouse was null and it has become not-null, then the Spouse value of the newly saved spouse must be automatically set to the current person's x value; -if Spouse was not null and was updated to another not-null value, then, automatically, the Spouse value of the former spouse must be nullified and the one of the current spouse must be set to the current person's x value; -recompute the Spouse combo-box for all persons.
For example, in MS VBA this needs some 18 code lines to be added to the Form_AfterUpdate method of the PERSONS class (which is automatically invoked by the system each time the data of the current line was saved in the db).
Enforcing acyclicity of self-maps is even costlier; for example, enforcing the constraint Mother acyclic may be done as follows: -after end-users changed data in the Mother combo-box for the current person and ask for saving it in the virtual memory, the APP should analyze the context and: -if Mother was nullified, then nothing has to be done (as null values may not violate acyclicity); -if Mother was set to a not-null value (distinct of the previous one, if any), then a corresponding call to a library Boolean function (called IsPath in MatBase) must be made, passing the corresponding x value, the self-map name and its domain name (Mother and PERSONS, in this case), as well as its value for the current data row (i.e. y = Mother(x), in this case); this method returns true if there is a path in the self-map's graph between y and x (i.e. a cycle would close in this graph were y accepted) or false otherwise; if it returns false, then the new value is saved, otherwise it is rejected (with a corresponding error message).
For example, in MS VBA this needs 3 code lines to be added to the Mother_BeforeUpdate method of the PERSONS class (which is automatically invoked by the system each time the data in the Mother combo-box was changed and end-users would like to save it in the virtual memory) and 28 code lines for the IsPath method of the CONSTRAINTS library.
Please note that, for any person, the corresponding graph of Mother/Father is a binary tree growing upwards for storing his/her ancestors adjacent with a n-ary tree growing downwards for storing his/her descendants (n natural). However, for the product Mother  Father the corresponding graph is not tree-like, but lattice-like, i.e., as a undirected graph it may have cycles, as there are persons descending more than one time from another one, both on same and on different length paths (see, e.g., [6,10,13]).
Consequently, enforcing acyclicity for binary homogeneous function products is not at all a trivial task, especially when both functions may also take null values.
However, to our knowledge, no research was done on the acyclicity of homogeneous binary Cartesian function products.

Paper outline
The second section of this paper presents and characterizes the pseudocode MatBase's algorithm for enforcing acyclicity of homogeneous binary Cartesian function products, while the third one provides its MS VBA embedding SQL implementation. The paper ends with conclusion and references.

MatBase's algorithm for enforcing acyclicities of homogeneous binary Cartesian function products
MatBase's algorithm xFunctProductAcyclic for enforcing acyclicity of binary Cartesian function products is based on the computeFunctProductInstantiationTransClosure algorithm presented and characterized in section 3 of [13], for n = 2.
The main difference between them is that, after each insertion performed within the while loop, the instance of the result table TransClosure is searched for the values of f(x) and g(x), provided they are not null; moreover, whenever f and g are self-maps, the value of x is also searched for (as, for example, nobody may be an ancestor of his/her mother/father); if such a value is found, i.e., a cycle would be created in the graph of f  g were the values of f(x) and g(x) saved in the db, then the computation stops and the xFunctProductAcyclic Boolean function displays a corresponding error message and returns true; otherwise, computation of the transitive closure goes on and when it ends without any cycle detected the xFunctProductAcyclic Boolean returns false. xFunctProductAcyclic should be automatically called whenever on the current line x either f(x) or/and g(x) have been updated, at least one of them is not null, and saving of data in the db for the current line has been requested (either implicitly or explicitly). For example, in MS Access VBA the corresponding event-driven method is Form_BeforeUpdate, while in MS .NET the corresponding event is called Validating. Obviously, whenever xFunctProductAcyclic returns false saving of the new data should be allowed, while otherwise it must be rejected.
For example, let us call the function xFunctProductAcyclic from Figure 2 with the following input from Figure 1: R = "PERSONS"; xName = "x"; fName = "Mother"; gName = "Father"; TransClosure = "TransClosure"; a = "RelatedPersons"; x = 1; f = 5; g  NULLS (which corresponds to an attempt of saving the fact the Queen Mary of Romania were the mother of her grandmother, Queen Victoria of UK); here is what would happen (supposing table TransClosure does not exist; corresponding TransClosure instance is shown in Figure 3  TransClosure, a -the names of the desired table (distinct in the db) and its column for storing the closure; x -The value of the surrogate key of R for the current element;   The instance of TransClosure for the second above example (from which 2 nd and 3 rd lines were deleted) Please note that xFunctProductAcyclic needs a TransClosureDuplicates SQL view to delete any duplicates that might pollute the transitive closure.
Let us now investigate the characterization of xFunctProductAcyclic: Theorem: Algorithm xFunctProductAcyclic from Figure 2 has the following four properties: (i) it is linear in the cardinal of the input relation (ii) it is sound (i.e., it is not returning true, except when a f  g acyclicity violation would occur) (iii) it is complete (i.e., it is returning true whenever a f  g acyclicity violation would occur) (iv) it is optimal (i.e., it detects any f  g acyclicity violation attempt in the least number of steps possible) Proof: (i) Obviously, as it has only two finite loops (so, as it also deletes any duplicates that might occur in the transitive closure, it never loops infinitely) depending on the sum of the heights in the graphs of f and g, which are at most n = |R| (trivially, as db instances are finite, any such height is finite and at most equal to n -1); consequently, in the worst case (i.e. no null values for both f and g and no cycle found), the algorithm performs 2 * (n -1) steps, so its complexity is O(n). (ii) Obviously, the only times when xFunctProductAcyclic returns true is when either in the subset of "descendants" or in the one of "ancestors" the transitive closure computation added either f(x) or/and g(x), on any other level than -1 (those of the "parents" of x), or/and, only when f and g are self-maps, the current element x from R. (iii) Obviously, as variable S stores any not null value of f(x) and g(x), plus, for self-maps, the one of x as well, any time when a cycle might be closed xFunctProductAcyclic returns true. (iv) Obviously, for both while loops, as soon as the previous loop iteration did not add any new elements to the result, the process stops (i.e. the algorithm only computes the first two fixpoints, which is the minimum possible in order to discover the least fixpoint). Moreover, the algorithm never generates duplicates neither on a same level (as it joins to the input relation only the current result elements that were added in the previous iteration), nor on different ones (as, using the TransClosureDuplicates SQL view, duplicates are deleted as soon as they occur). Finally, xFunctProductAcyclic stops as soon as a would-be cycle is detected. q.e.d. Figure 5 presents, as an example, the MatBase's MS Access VBA embedding SQL implementation of the xFunctProductAcyclic Boolean function (of the Constraints module) whose pseudocode algorithm is shown in Figure 2.

Conclusion
This paper provides an algorithm (both in pseudocode and in MS Access VBA embedding SQL) for enforcing acyclicities of binary homogeneous Cartesian function products, examples of using it for a genealogical db, as well as a formal proof that this algorithm is linear, sound, complete, and optimal. This algorithm is embedded in MatBase, an intelligent knowledge and db management system prototype designed and developed by us.
Declaring and enforcing acyclicities of binary homogeneous Cartesian function products further contribute to guaranteeing db instances quality. Consequently, it is our firm belief that this type of non-relational constraints should also be added to all DBMSes, so that developers need not enforce them through their code.