A NEW SOLUTION FOR ASYNCHRONOUS INCREMENTAL MAINTENANCE OF MATERIALIZED VIEWS

N g u y e n T r a n Q u o c V i n h PhD* Е-mail: ntquocvinh@ued.udn.vn T r a n T r o n g N h a n Master of Science in Computer Science* Е-mail: trongnhan.tran93@gmail.com L e V a n K h a n h Master of Computer Science* Е-mail: khanhlv@lekhanhtech.com T r a n D a n g H u n g PhD, Associate Professor Faculty of Information Technology Hanoi National University of Education Xuan Thuy str., 136, Cau Giay District, Hanoi, Vietnam, 110000 Email: hungtd@hnue.edu.vn A b e e r A l s a d o o n PhD, Associate Professor** Email: AAlsadoon@studygroup.com P W C h a n d a n a P r a s a d PhD, Associate Professor** Email: CWithana@studygroup.com P h a m D u o n g T h u H a n g PhD student* Е-mail: ntquocvinh@ued.udn.vn *Faculty of Information Technology The University of Da Nang – University of Science and Education Ton Duc Thang, 459, Lien Chieu Dist., Da Nang city, Vietnam, 550000 **School of Computing and Mathematics, Sydney Campus Charles Sturt University Level 1, Oxford str., 63, Darlinghurst NSW 2010, Australia Матерiалiзованi представлення можуть значно пiдвищити швидкiсть виконання запитiв, використовуючи всi або частину збережених попередньо обчислених результатiв запитiв. При iнкрементному обслуговуваннi матерiалiзованi представлення оновлюються вiдповiдно до змiн у вiдповiдних базових таблицях. Часто це бiльш ефективно, нiж повне оновлення, що замiнює таблицi матерiалiзованих уявлень новим результатом виконання запиту. Асинхронне обслуговування, що приводить матерiалiзованi представлення до фактичного стану, не є складовою частиною транзакцiї, що вносить змiни в базовi таблицi. Бiльшiсть опублiкованих робiт присвячено синхронному iнкрементному оновленню представлень, алгоритми якого вимагають доступу до стану базових таблиць перед оновленням i не можуть застосовуватися безпосередньо до асинхронним оновлень, якi виконуються в станi пiсля оновлення. Кiлька робiт присвячено асинхронному обслуговуванню представлень, або обмежують змiни тiльки в однiй з базових таблиць, або передбачають наявнiсть лише однiєї базової таблицi, що недоцiльно, або невiрно, або надає занадто високий рiвень i складнi алгоритми iнкрементного оновлення, або може бути реалiзовано, тiльки якщо система управлiння базами даних пiдтримує управлiння версiями даних на рiвнi таблиць i рядкiв. У данiй роботi запропоноване рiшення для асинхронного iнкрементного оновлення представлень, яке може бути реалiзовано з будь-якими системами управлiння базами даних. Ми збираємо змiни в базових таблицях, отримуємо доступ до стану перед оновленням базових таблиць, використовуючи процес ущiльнення, i застосовуємо алгоритми iнкрементного обслуговування перед оновленням для асинхронного обслуговування до стану базових таблиць пiсля оновлення, враховуючи особливостi асинхронного обслуговування. Це може бути застосовано для запитiв SPJ з внутрiшнiми з'єднаннями, запитiв з внутрiшнiми з'єднаннями i агрегатами. Створено прототип та наведено експерименти з автоматичної генерацiї вихiдних кодiв на мовi Сi для збору змiн в базових таблицях i виконання асинхронного iнкрементного оновлення матерiалiзованих представлень в PostgreSQL Ключовi слова: матерiалiзоване представлення, стан перед оновленням, асинхронне iнкрементне обслуговування, синтез вихiдного коду, PostgreSQL UDC 004.65 DOI: 10.15587/1729-4061.2020.193715

It is necessary to make MV actual to the base tables after data changes on them. Incremental update of MV calculates the part of records needed to be removed or inserted into MV according to the changed records in the base tables. The updating process can be done in a synchronous or asynchronous manner. Synchronous (eager, immediate) update is performed within the transaction which changes data in the base tables. In opposite, asynchronous (deferred, lazy) is done outside of those transactions, by user requests, on-demand when a query that uses the MV appears, periodically or by some schedule. Synchronous maintenance of MV can be performed before (more often) or after the base tables are updated, i. e. on the pre-update or post-update states of the base tables and often after changes in each base table. On the opposite, asynchronous one is done on the post-update state of base tables and often after data changes in more than one table.
Asynchronous incremental maintenance of MV is important. There are a number of works dedicated to the asynchronous incremental update of views, which either restrict changes in only one of base tables or assume the query uses only one base table, which is impractical, either is wrong, either provides too high level and complicated incremental update algorithms or can be implemented only if the database management system supports data versioning at the table and row levels. Most of database management systems don't support data versioning at the table and row levels and access to full system transaction log is not always available. Therefore, it is necessary to develop a new technique to incrementally maintain MV in asynchronous manner that can be applied and implemented to any database management system.

Literature review and problem statement
Most of the works are devoted to the synchronous incremental update [2][3][4][5][6][7], a little number of them are dedicated to the asynchronous incremental update of MV [8][9][10]. The works [5,11] build and provide experiments on triggers in programming languages PL/pgSQL and C, implementing the same incremental update algorithms in a synchronous manner on the same MV based on the same query with the same base tables with the clustered indices on primary keys and foreign keys. It is shown that triggers written in C are more effective by about 13 %. The work [5] suggests a sequence of optimizations to the incremental update algorithm relative to [11]. The work [2] proposes a solution for synchronous incremental of recursive materialized view based on SQL query. Although it may be more effective if the incremental update code is embedded into a database management system, solution using triggers has its advantages because it is almost independent of database management system versions and has higher mobility.
It is clear that data in base tables are different at the two states that lead to the necessity of different incremental update algorithms, at least the calculations of the sets of records need to be removed from and/or inserted to the MV according to the data changes made in the base tables. However, most of the published researches didn't care about those different states, which leads to state bug, so that the incremental update algorithms proposed by those works may be wrong [9]. The work [9] develops a new algorithm espe-cially for asynchronous maintenance with different update strategies for helping to reduce the time the MV tables and base tables are locked, which may affect other applications that use the tables negatively. The solution proposed by this paper seems to be very complicated and very difficult to implement. It also doesn't show how to access the pre-update state of the base table to calculate the delta stream for MV.
After the work [8], almost asynchronous incremental maintenance is studied for warehousing and distributed environment [12][13][14][15][16] employing a versioning mechanism of the database management systems. The work [12] delivers that in the very large scale distributed shared-nothing data storage systems, remote view tables are better for index, equijoin, and selection MV, while local ones are more useful for MV with aggregations so that those two types of MV implementation can provide a good tradeoff between system throughput and minimizing view staleness. The work [13] shows the effectiveness of combining eager and lazy MV maintenance policies for social networking applications on a database management system using a distributed memory cache, but there is a tradeoff between accuracy and freshness of the MV. The work [14] tends to find the cost metric which helps to balance the accuracy, the latency, and the source utilization effectively for updating policies of MV with joins on continuous queries over data stream and linked data.
The work [16] utilizes the versions store and deferred maintaining mechanism of the work [8] and approach proposed by the work [17] to build an incremental update algorithm using version store in a warehousing environment. The work [8] exploits the ability of Microsoft SQL Server (from ver. 2005) that supports data versioning at the table level down to record level to access the pre-update state of base tables. That paper also provides condense operators to "compress" the record sets which can help to dramatically reduce the size of the sets of changed record sets in base tables to be processed in deferred maintaining mode. It improves performance. The experiments provided by the paper show that asynchronous maintenance is more effective than synchronous one even in on-demand mode, i. e. the asynchronous update is performed when there is a query that uses the MV appears. The work [15] states the same. In fact, if the data changes in base tables are accumulated too much, the incremental update process may take too much time and system resources, the query owner has to wait longer and there may be also a negative effect on other applications executing parallelly.
According to optimizations suggested by the work [5], the positive effects are gained from condense operators that may be not so high as obtained in the work [8]. The reason is that changes in base tables on attributes participating in GROUP BY clause but not in WHERE clause and are not parameters of an aggregate function can be applied directly to the MV without any relational algebra expression calculation. Anyway, the solution can be applied only to the database management systems which support data versioning to the table and record levels with the ability to access full system transaction log as needed.
PostgreSQL is the world's advanced open source relational database management system. It supports materialized view with the asynchronous full update of MV, which re-executes the underlying query and replaces the previous result in the MV table. As almost open source database management systems, PostgreSQL doesn't support data versioning on the table and record levels. So, we can't apply the solution proposed in the work [8] to implement support of the asynchronous incremental update of MV. Although spending on system resource among the system operational time with the transactions is one of the advantages of synchronous incremental maintenance of MV, it always exploits system resource so can affect the system negatively. It can lengthen the transaction execution time too much especially when multiple MV are affected and needed to be updated, which is not acceptable in many applications. In some cases, for example in data warehouse and distributed systems, the base tables are not always available for synchronous maintenance. On the opposite, asynchronous maintenance may allow MV to be not actual to data in the base tables in some period between the last update time to the next update time. Thus, asynchronous maintenance is very important and needs to be studied and implemented.

The aims and objectives of the study
This research aims to find a solution for asynchronous incremental maintenance of MV which can be applied to any database management system. The following objectives are established and archived to reach the aims: -carefully study the related published works regarding the incremental update of MV at all and asynchronous incremental update in particular; -formally show the state bug error and propose a technique to access the pre-update state of base tables exploiting the condensing process which avoids the state bug error; -build the updating expressions accessing the pre-update state of base tables and the mechanism of asynchronous incremental maintenance which can be implemented with every database management system; -build a prototype for implementing the proposed solution with PostgreSQL.

.1. SQL query
It is necessary to formulate the SQL queries that are used to create MV. Each th x SPJ query x Q which th x MV is based on consists of: where: - -set of fields that are selected and presented in SELECT predicates; - -set of base tables that participate in FROM predicates. FROM predicate x F is the combination of x T and :

join conditions between base tables in ;
x T x W -WHERE predicates, the conditions on each record in joining result of .
x F In case of implicit joins, x J is empty and it is contained in .
x W Otherwise, it is not empty. Let . x Suppose that x J and x W are converted into a conjunctive canonical form.

Each th
x query x Q with aggregate functions which th x MV is based on consists of: where x S now can contain aggregate functions and x G is the set of groups by attributes.

2. Incremental update of MV and state bug
The work [9] first shows examples about state bug when the pre-update incremental maintenance algorithms are applied to the post-update state of the database in the asynchronous incremental update of MV. In this paper, this problem is shown in a formal form.
The standard execution order for SPJ query is join operations, where predicates and then selection. The standard execution order for queries with aggregations is join operation, where predicates, group-by operations and aggregate functions and then selections. Incremental update of those both MV types is based on the distributive property of inner join operation of relational algebra. It is known that a record of a th i base and there is a record set . .
Now, suppose the current pre-update state (instance) of the database is with the set of base tables .
x T The execution result of ( ) , , , The eq. (5) can be presented in the form of a relational algebra expression as follows: If there is a set of records The database now has a new instance and inferring from eq. (3)-(5), a new execution result of x Q is then: x x postM is in the form of a relational algebra expression as follows: is the set of records that must be inserted into MV .
If the pre-update expression (7) is applied directly to the post-update state of the database, we have: The record set that must be inserted to MV with SPJ query according to data changes in base tables in every case must be identified. But the result of expression (8) is wrong and now they are different in expressions (7) and (8): In general, if all n base tables have changed with insertions, it is not difficult to prove that: ..
x dnewpreM and x dnewpostM must be identical, but expressions (9) and (10) show the opposite. It means that the expression (8) is wrong, we can't directly apply pre-update MV incremental maintenance algorithms to the post-update state of the database. It is analogical for delete operations. Suppose we have a base table x   i   T with the pre-update state .

3. Proposed technique
x i oldT There is a record set It is necessary to access the pre-update state of base tables to do incremental maintenance of MV. In this paper, we exploit the idea of condensing process mainly to access the pre-update state of the base tables, so that the state bug error is avoided and the pre-update expressions can be used to do the asynchronous incremental update of MV at the post-update state of base tables. Certainly, we use the condensing process to "compress" the record set

3. 1. Condense processing
At the post-update state of , postT is already in the database.

3. 3. Update expressions
It is not difficult to refer from expression (7) and we adopt the update expression ..
The expressions for MV based on a query with aggregative functions are different but can be referred analogically.

3. Used asynchronous incremental update algorithms
The incremental update algorithms used for MV based on SPJ query and MV based on a query with aggregations are the same and adopted from the work [5]. They are not repeated here to reduce the length of the paper. Anyway, the algorithms for the asynchronous maintenance must have specifics: -changed data is collected by triggers which are fired on each data manipulation event for each base table with the transaction id and the timestamp the insert/update/delete statement started; -the sets of changed rows in base tables must be condensed; -the expression (13) is applied to one set of inserted into base tables records instead of (7) and similarly for the set of deleted from base table records, the expression (14) is used. It is the main difference between synchronous and asynchronous incremental update of MV; applied for the cases when there may be more than one base table changed at a time; the pre-update and post-update states are marked for each base table during each asynchronous update for an MV.

System model
The system for asynchronous incremental maintenance of MV is suggested to consist of three components (Fig. 1): i) MV manager; ii) asynchronous incremental update manager. MV manager i) analyses the input MV query getting meta-data about base tables (key, detail information about attributes…) from the database, ii) generates triggers and asynchronous incremental update source in C and trigger registration code in SQL and iii) updates MV configuration to the database and create the MV table. The generated triggers on each of insert/update/delete events for each base table which will gather the changed records with transaction id, data manipulating statement timestamp and action information in base tables and will save into the This process is synchronously done within the transaction which makes changes in the base tables. It is committed or rollbacked together with the main transaction.
The source code in C is synthesized similarly as in the works [2,5], will be compiled and linked, then saved in the form of dynamic link library (.dll). The MV manager uses generated SQL script and compiled trigger functions code in .dll to register trigger. All the codes implementing expressions (13) and (14) are synthesized at this stage. The structure "…((SELECT all_columns FROM new_table EX-CEPT SELECT all_collums FROM dnew_table) UNION SELECT all_collums FROM dold_table) AS table…" is used to implement expression (12) to access the pre-update state of a base table.
The MV manager also creates the structure for the tables that contain MV configurations and information if needed. It also creates tables to store the changed data in the base table during the process of trigger definition.
The code that implements the asynchronous incremental update of MV is saved in the library and used by the update manager. When an MV update arrived, the update manager will mark the pre-update and post-update state for the changed base tables, mark the current point of auxiliary tables corresponding to those changed base tables for current MV, do the condensing process and then invoke the save code to undertake incremental maintenance of MV.

Experiments on changes gathering and asynchronous incremental update
PostgreSQL is chosen as a database management system to build a prototype with. It now supports trigger for statements allowing to see all the changed records in the body of the trigger. The trigger functions can be written in PL/ pgSQL, C… There was the work showing that triggers written in C are more effective, so C is chosen as a programming language to generate trigger functions.

1. Prototype
A high-level application programming interface is provided to simplify the process of source code synthesis. Different sub-modules have their own functions: -connecting and communication director; -definition of trigger function structures; -trigger parameters checking; -processing of numerical and string constants; -data type conversions; -short-hands for string processing; -debug and log manager; -SQL query executor; -query execution result cache manager; -query execution result data extractor; -changed in base tables data extractor; -query execution result validating; - -changed in base tables data condensing templates and rules; -incremental update templates and rules. The prototype is realized as two standalone applications. The first does the tasks of MV manager and the secondasynchronous incremental update manager.

2. Experiments
There are base tables countries, customers, sales and costs have numbers of attributes of 10, 23, 8  The built source code synthesizer is used to produce i) script to create all auxiliary tables, ii) codes of triggers on all data manipulation events for all 4 base tables and iii) codes for the incremental update of MV mv_total. All those source codes are identical to codes created manually and function well. The insert/update/delete events on base tables are now similar, almost without a difference, to the previous case when there was not any trigger included. It is because of that triggers are very 'light', they do only read the deleted/inserted records in base tables and write to auxiliary tables.
The experiments were provided on a personal computer with configuration CPU Intel Core i5 3317U, RAM DDR3 4GB, HDD SATA3 5400rpm, and PostgreSQL v11.6 64bit installed. The asynchronous maintenance policy in the experiment is on request. Table 1 shows the execution time measured in milliseconds. Two cases of the number of records were evaluated per updating action on each base table: i) one record is manipulated by one SQL statement and ii) 10 records are manipulated by 10 SQL statements. The time is measured for each record per statement. The time in Table 1 is cumulative in the case of synchronous maintenance with 10 records changed. Table 1 Incremental update time in milliseconds  Sales  22  90  138  29  55  75  108  Customers  15  56  77  28  47  71  10 records/10 commands/action/base table  Sales  211  925 1174  40  66  89  117  Customers  139  612  757  28  54  77 For asynchronous maintenance, we examine not only the general case when there is a mix of actions (insert or delete of update) but also the case when there is only one of the actions separately. The optimization mentioned for the case of base table customers is not applied to check the general case of that there are more than one base table changed between the two pre-update and post-update states.

3. Discussion of experimental results
The closest to this solution is the one suggested in the work [8]. As mentioned above, its main distinguishing point to this research is the mechanism for accessing the pre-update states of base tables and gathering changed records. Its disadvantages are in system resources needed to read the right version of base tables and gathering changes from transaction log, the performance of which is strongly dependent on the concrete database management system. Our solution avoids those costs, but it has a disadvantage in system resources required to calculate the pre-update state of the base tables following expression (12). So, the performance of the two solutions may be competitive, but our solution can be implemented with any database management system, not only with the ones that support data versioning at the row and table levels.
The experimental results show that the commutative resource required for the synchronous incremental update is grown almost with the factor of the number of invokes, but fortunately, the cost is spread over time and is negligibly small for each call.
It may be because of the small number of manipulated records, the time of maintenance seems almost for code invoking for both synchronous and asynchronous cases. For asynchronous maintenance, the difference between updating 1 record and 10 records is different by 12 %. Opposite to synchronous one, the accumulative update in an asynchronous manner costs about 30-33 % of that when the updates are performed separately. The total asynchronous updating time is about 3 % (10 records/10 commands per action in each base table) -27 % (1 record/1 command per action in each base table) in comparison to the synchronous incremental update. These good results are due to the effects of condensing process and smaller number of code invoking.
If we apply the optimization suggested in the work [5] for the cases like base table customers, the update of MV according to data changes in customers will be performed separately.

Conclusions
1. We formally showed the state bug when applying the expressions for the incremental update of MV at the pre-update state of base tables, which is used in synchronous maintenance to the post-update state of base tables, which is required in asynchronous one.
2. We proposed a solution for the asynchronous incremental update of MV with the new technique for accessing pre-state of base tables. So that the state bug is avoided and the expressions for calculation of changes to MV which is often used in the synchronous incremental update can be correctly applied to the asynchronous update. We exploited the idea of condensing operators and described them detailly to prove the correctness of the suggested accessing pre-state of base tables technique. The incremental update algorithm is adopted from other our published works applying the updating expressions and specifics for asynchronous maintenance. It is the main contribution to the field of this research.
3. We built a prototype which can synthesize the source code in an automatic manner for supporting the asynchronous incremental update and provided experiments to ensure the correctness of the solution. The total time of asyn-chronous update is 4-33 times smaller than synchronous ones. The accumulative update in an asynchronous manner reduces the cost by about 67-70 % of that when the updates are performed separately.