Oil Storages and Main Oil Pipelines Pumps Database Development

The object of research is a data model that characterizes the pumping units of main oil pipelines. The paper considers the creation of pumps database and their characteristics as a component of software systems designed to search for sources of energy saving in pipeline transportation of oil and oil products.<br><br>An overview of existing programs for calculating the joint operation of an oil pumping station and a pipeline is given. It is shown that most programs are inaccessible for analyzing the applied data storage technologies, or involve manual input of entry data without using databases. Comparative analysis of spreadsheets and relational databases is performed. It is shown that relational databases better meet the requirements of data convenience, availability, scalability, and performance. A physical model of a relational database is presented. The parent and child entities have been established, which make it possible to fully reflect all information about the pumping unit, namely:<br><br>– coefficients of hydraulic characteristics depending on the diameter of the impeller rotor;<br><br>– characteristics depending only on the pump make;<br><br>– classification of the pump by design or purpose. Relationship types and referential integrity rules are defined between entities when deleting or updating data. For data management, the SQLite system is proposed, which provides data manipulation in the SQL query language and does not require the development of additional software. The analysis of possible ways of organizing multilingualism using a database is carried out. The text of the SQL query is proposed, which allows to select the pump with its characteristics by the pump make and impeller diameter. A database structure is proposed that allows storing information about equipment for oil pipelines for various purposes, which makes it possible for automated calculations of complex technological processes. The pumping unit database is an open source project that is posted on the public web service GitHub.


Introduction
Oil and petroleum distribution systems are subject to constant changes, which are caused by the following reasons: -new oil fields are discovered and the volumes of oil production on existing oil fields are changed; -refineries can increase or decrease the volume of oil refining, which leads to a change in the volume of oil supplies; -countries that import oil or transport it through their territory can diversify their suppliers of oil and oil products, which can also affect the operation of their oil transportation systems. For example, in Ukraine, Ukrtransnafta has adopted a number of strategic directions. One of the priorities is to promote Ukraine's integration into the Eurasian oil transport corridor project, which involves the construction of the Brody-Plock oil pipeline. At the moment, the pipeline system of Ukraine is operating in an underloaded mode. This is caused by a significant decrease in the production of petroleum products at local oil refineries and a decrease in the volume of oil transit through its territory. Nevertheless, the scope of Ukraine's oil transportation system is large. 4767.4 km of main pipelines, the «Yuzhny» offshore oil terminal and 28 oil pumping stations (OPS) are being involved [1].
Oil transportation systems consume significant amounts of energy. Energy resources are limited and expensive, so countries legislate energy conservation strategies.
For example, Ukraine has adopted a law «On Energy Saving» and a number of other regulatory documents [2], which prescribe organizational and technical measures to optimize the operating modes of oil pipelines in underloaded sections.
To ensure the required performance on the operating pipelines, taking into account the criteria for energy efficiency, alternative (optimization) calculations are used. Such calculations are usually performed with software products.
There are a lot of references of computer programs using for solving many problems of oil transportation and storage, though [3][4][5], there is no evidence about the information technologies used in the development of these programs. There are also no recommendations for building equipment databases, used in oil transportation and storage. A database provides the ability to organize a single access to the characteristics of equipment and their control, and also provides the ability to apply generally accepted design patterns for software products. Production enterprises, design organizations, scientific and educational institutions can show significant interest in the accumulation and structuring of information about oil technological equipment.

ISSN 2664-9969
In view of insufficient coverage of issues related to the storage of information about technological equipment of oil pipelines, the description of the principles of building databases seems to be an urgent task.

The object of research and its technological audit
The object of research is a data model that characterizes the pumping units of main oil pipelines. Currently, mainline pumps in Ukraine are operating in an underloaded condition. The pumps are driven by electric motors, the power consumption of which depends on the pump performance. Pump operating modes are far from optimal [6], which can lead to overestimated specific power consumption of the drive. In general, the reasons for the low load of oil pipelines can be as follows: -insufficient amount of pumped liquid in the tanks of the head oil pumping station or overfilling of the tanks of the final destination; -repair of the pipeline without stopping the pumping; -change in the viscosity of the pumped oil; -technological limitations of the pressure value at the inlet and outlet of pumping stations (PS); -presence of travel discharges. Since most of the operating costs of the oil pipeline are due to the operation of electric motors [7], then the positive effect of realizing the energy saving potential of pumping units is also the greatest [8]. When choosing rational modes of oil pipelines operation, a number of measures are considered: -regulation of the pipeline performance by changing the number of operating pumps at the pumping station; -regulation by changing the pump operation scheme at the pumping station; -bypassing; -regulation by turning the pump impellers; -frequency regulation of the pump.
To search for modes that meet the criteria of the highest energy efficiency and the lowest capital intensity, hundreds of options for the operation of the oil pipeline are calculated. To perform such calculations, as a rule, computer programs are written. Not only the amount of calculated data is significant, but also the data supplied to the input of the program. These include the characteristics of pumping equipment and the linear part of the main pipeline. This is explained by the fact that pumps of various brands and with various standard sizes of impellers operate on the existing oil pipelines. Pump data is usually stored in a tabular (Table 1) or graphically in books and catalogs [9][10][11].
In the hydraulic calculations of the oil pipeline, data are used to describe the characteristics of the pump: -total head dependence H: (1) -dependency of the efficiency η p from productivity Q (m 3 /hr): where H, η p -total head (m) and pump efficiency at productivity Q (m 3 /hr) respectively; H 0 , b, c 0 , c 1 , c 2 -empirical coefficients.
Manual entry of these data into the program is inconvenient, since it requires concentration of attention and significant labor costs.

The aim and objectives of research
The aim of research is to develop a database of pumping units for main oil pipelines.
To achieve the aim, it is necessary to solve the following objectives: 1. Design a physical database model, defining parent and child entities, types of relationships between them and data integrity rules, taking into account various variations of pump marks.
2. Develop the methods for organizing a multilingual database of pumping units.

Research of existing solutions of the problem
Most of the existing programs devoted to the calculation of the operating modes of oil pipelines are not available or work without databases.
For example, the NS1 program for calculating the productivity of an oil pipeline section adjacent to the main pumping station [13] prompts the user to manually enter data for pumps and pipeline through the console. In order to repeat the calculation with a changed value of one of the input parameters, it is necessary to re-enter the remaining parameters, which seems inconvenient. Entering data for pumping units from books and catalogs [9,11,12] requires significant labor costs. Reading speed is slow, errors are possible.
The RABT.BAS program has been developed in the BASIC language for calculating the joint operation of an oil pipeline and a pump station [14]. The program works without a database. The input of the coefficients of hydraulic characteristics of the pumps is done manually.
In [15], the text of the program for calculating the operating mode of an oil pipeline with the use of anti-turbulent additives is given. Additive data is read from TXT files. The characteristic coefficients of the booster and main pumps are assigned directly to the variables. Database is not used. A program for calculating non-stationary modes of main oil pipelines has been implemented in Pascal [16]. The program lacks the code responsible for connecting to the database. Pump and piping data are set manually by assigning variables.
«Transit» program for calculating the operating modes of the oil pipeline [17,18] provides a convenient way to define main and booster pumps. The program has an interface. The selection of pumps is carried out using the drop-down list.
The «Transit» program is paid. Its source code is not available. Literary sources lack information about the implemented method of data storage, which excludes the possibility of using developers' experience of the «Transit» program for designing new databases.
During the research on the development and improvement of modeling and construction of effective algorithms for oil pipeline systems hydraulic calculation, an industrial complex of programs was created for the hydraulic calculation of main oil pipelines operating modes [19]. The software package was put into commercial operation, but it is not freely available and there is no information about the method of data organizing.
The method of storing information using MS Excel spreadsheets has become widespread. For example, in the program [20], pumping equipment data is organized into tables, in which each column corresponds to a certain characteristic of the pump. Thanks to the built-in filtration, the program allows to select a pump among many modifications. Among the disadvantages of spreadsheets, the following should be noted: -MS Excel is a paid software product, therefore it is not available everywhere; -many programming languages have libraries for working with Excel files, but the code for fetching data seems to be more cumbersome and less intuitive in comparison, for example, with SQL queries to relational databases; -Excel file processing speed is slow when there are a significant number of rows [21][22][23].
To achieve the goal of the work, the most preferable is the use of relational databases. Their widespread use [24,25] is due to the following reasons [26]: -the relational data organization model is quite simple and at the same time powerful, suitable for solving the vast majority of data storage and processing tasks; -this model is characterized by simplicity of data structure and user-friendly tabular presentation; -ensures data integrity, scalability and independence from the application that uses the data. The choice of a relational database management system (RDBMS) is not critical. For small projects, it seems convenient to use a lightweight RDBMS SQlite. Its advantage is a simplified data structure and no need to use a database server.

Methods of research
For the convenience of structuring the data, the pump characteristics will be divided into attributes and properties. Attributes will be considered characteristics that determine various combinations (variations) of the equipment mark. For example, for the pump NM 2500-230 such attributes as rotor number and impeller diameter can be distinguished D 2 . The rest of the characteristics are defined as properties. For example, the combination of the «impeller = 1» and «D 2 = 440» attributes uniquely determines the property Н 0 = 281.5 m (Table 1).
When creating a relational database of pumping units, the following parent entities are defined: equipment, category, feature and attribute_group.
To store different variations of pump mark, feature values and attributes, the following child entities are defined: combination, feature_value, and attribute. Fig. 1 shows a physical database model that describes data types and relationships between entities.
Relationship types and referential integrity rules are defined between entities when deleting or updating data from parent entities. Let's take a closer look at each of the tables. The table «equipment» contains the mark of the pump. As will be shown below, it is convenient to store the pump mark itself in a separate table containing translations into various languages. The uniqueness of each record is ensured by the id_equipment field with a primary key (PK) and integer values. The «marking» field takes a text data type and is used to store the alias of the equipment brand.
When transporting oil and oil products, pumps of various types can be used. It is convenient to store information about possible types of pumps in the table «category» (Fig. 1). When developing its structure, one should take into account the criterion of scalability, which implies the possibility of creating an unlimited number of subcategories. For example, the category «pumps» can have subcategories «centrifugal pumps», «gear pumps», «piston pumps», which can be designated as subcategories of the second level. In turn, among the «centrifugal pumps» sectional pumps of the NM type and spiral pumps of the NM type can be distinguished, which are subcategories of the third level. The id_category field with the primary key contains a unique integer category identifier. An integer entry for the id_parent field points to the id_category of the parent category. The category_alias field is optional. It was created for a more understandable visual perception of the table and contains unique category aliases, which in this case correspond to the translations of categories into English.
Links between pumps and categories are defined in the «category_equipment» table (Fig. 1) table for two reasons: firstly, several marks of equipment can belong to one category at once, and secondly, one mark of equipment can belong to several categories at once. However, records that are a combination of the two mentioned fields are unique. Data integrity is ensured by foreign keys (FK) for each of the fields. The category_equipment table provides a manyto-many relationship between the pumping equipment and categories entities.
The Table 1 analysis of the main oil pump reference data shows that it is not possible to establish an unambiguous relationship between the brand of the pump and its characteristics. The reason is that pump performance can be determined not only by its mark, but also by combinations of other pump characteristics, such as impeller and impeller diameter. In order to determine the relationship between the mark of the pump and its combinations a table «combination» should be created. The table has two fields. The id_combination field contains unique identifiers of variations. The id_equipment field, like id_combination, takes an integer data type. The field is assigned a foreign key that refers to the equipment ID in the equipment table. Since one entity in the «equipment» table can correspond to one or more entities in the «combination» table, and one entity in the «combination» table can correspond only to one entity in the «equipment» table, the relationship between them can be defined as «one-to-many».
The names of the pump attributes are defined as separate entities and are placedin the «attribute_group» table (Fig. 1). The table consists of fields id_attribute_group and group_alias with data types INTERGER and TEXT, respectively. The id_attribute_group field with the primary key contains attribute IDs. The group_alias field contains a unique name for the attribute.
A separate table «attribute» is intended for storing attribute values (Fig. 1). Each attribute value has its own unique identifier, which is recorded in the id_attribute field. The attribute table also has an id_attribute_group field with a foreign key that points to the id of the attribute name id_attribute_group in the attribute_group The relationship between the entities «attribute» and «attribute_group» is defined as «one-to-many», since the name of an attribute can correspond to many of its values, and the value of an attribute -only one name.
The relationship between attribute values and pump variation is specified using the «combination_attribute» table (Fig. 1). The table has two fields id_attribute id_com-bination and integer data, which due to foreign keys indicate unique identifiers and attribute combinations, respectively. The combination_attribute table establishes a many-to-many relationship between entities in the attribute and combination tables. This relationship means the following: several pumps can correspond to one attribute value, and vice versa, several attributes can correspond to one pump variation at once.
Storage of equipment properties is organized using two tables -«feature» and «feature_value» (Fig. 1). The feature table consists of two fields -feature_alias and id_feature. The feature_alias field is textual and is used to store the names of equipment properties. The integer identifier of the property is written in the id_feature field. Both fields are unique.
The feature_value table is used to store property values. Values are written into the text field feature_value_alias. Due to the field id_feature and its foreign key each value of the property referrs to its name in «feature» table.
There is a one-to-many relationship between the feature and feature_value tables, since multiple values can correspond to one property.
Connections between combinations, combination properties and their values are set using the «feature_combination» table (Fig. 1). Feature_combination table consists of three fields that take integer values. The id_combination field, using a foreign key, refers to a unique identifier in the «combination» table, which in turn is associated with the equipment identifier. The id_feature field of the «fea-ture_combination» table refers to the unique identifier of the property in the «feature» table. The id_feature_value field of the feature_combination table indicates the identifier of the property value that is stored in the feature_value table. Each row of the table is unique. The feature_combination table provides a relationship between the pump variation entity and the many-to-many pump property entity, since multiple property values can correspond to a single pump variation and multiple pump variations can have one property.
When creating software products aimed at users from different countries, it is important to include support for multilingualism in the database architecture. Further, using the example of pump categories, various options for organizing multilingualism using a database are be considered. Fig. 2 shows a way to support multilingualism, in which a separate field is created for each language in the category table.
The disadvantage of this approach is the need to make changes to the table structure every time when adding a new language. This method can be used only in cases where the number of supported languages is known in advance and when each entity (in this example, the category name) must exist in all language variants. Another way of multilingualism can be implemented using information in a serialized form (Fig. 3).
The essence of the implementation is that information in JSON, XML, binary, or another format is written TECHNOLOGY AUDIT AND PRODUCTION RESERVES -№ 5/2(55), 2020 ISSN 2664-9969 to each field that requires translation. In this case, the written object can be an associative array or dictionary with keys in the form of language identifiers (for example, ru, en, uk) and values in the form of translations into the corresponding languages. The disadvantages of this implementation include the impossibility of using the query language for translation management. For example, in order to remove all translations of the English language a programming language (Python, PHP) should be used with the need to read and write the corresponding row of the table. The use of the SQL query language in this case is not possible.

Fig. 3. A way of multilingualism with complex serialized data
It is also possible to create separate records for each translation in the same table that contains the entity that needs localization (Fig. 4). In this case, a table is also created that describes the available languages.
The disadvantage is that for one entity several objects are created in the table at once, which leads to duplicate data that does not require translation and, in the presence of several relationships with other tables, greatly complicates the business logic of the software product.
There is also an example of multilingualism implementation, when for each language a separate table is created with fields that require translation (Fig. 5).  A one-to-one relationship is established between entities, since one translation line corresponds to one category and one category record can be associated with only one line in the translation table for the corresponding language.
The disadvantage of this structure is the need to create a new table when adding a new language. With this implementation, each entity requiring localization will have a set of tables with translations. For example, if two new languages are required to be added, assuming there are five entities, ten new language tables should be created, which seems extremely inconvenient.
The most common method for implementing multilingualism is to create two tables for each entity (Fig. 6), where the main table contains fields that do not depend on localization, and the second contains fields that require translations. A table is also created with a list of available languages. Since it is not known in advance which languages will be used in the designed database, this method of multilingual support is chosen as the most flexible and easily scalable. The general principle of creating multilingualism is described using the example of categories. The entity «category» is stored in the main table «category». Each language is also a separate entity, which is recorded in the «lang» table. In the id_lang field with the primary key the language identifier is recorded. The iso_code field contains a unique two-letter language designator in accordance with ISO 639-1, for example, ru. The lang field is intended to store the name of the language in the original, for example -Russian. The alias_lang field contains the name of the language in English, for example -russian. The «category_lang» table contains fields that require translation. Because of both fields id_category and id_lang each row of the table determines which category and which language the translation relates to. The «category_lang» table provides a many-to-many relationship between the category entity and the language entity, since several languages can correspond to one entity in the «category» table and one language entity can be used for several categories.
By the same principle, language tables are created for other entities, for example, combinations and properties.

Research results
As an example let's consider the data structure of the pump NM 2500-230 and its two variations depending on the diameter of the impeller (Fig. 7). Each variation has its own unique identifier, which is defined in the id_combination field of the «combination» table. This table specifies the relationship between the pump mark id_equipment identifier and its variation identifier (combination). For each variation and language in the «combination_lang» table a short «short_description» and a long «description» of the pump are indicated. Each row of the «combination_lang» table with the «id_lang» identifier indicates the language for which the corresponding translations are created. The available languages are defined in the «lang» table. Referential data integrity is ensured by foreign keys, for example, deleting a pump in the equipment table automatically removes all its variations in the combination table and translations in the combination_lang table. Likewise, for example, deleting the Russian language from the «lang» table removes all Russian translations in the «combination_lang» table.
The tables in Fig. 8 show which parameters (attributes) the pump variations depend on.
The relationships between attributes and variations are defined in the «combination_attribute» table. For example, TECHNOLOGY AUDIT AND PRODUCTION RESERVES -№ 5/2(55), 2020   The relationship of the «attribute» table to the at-tribute_group table is provided by the foreign key of the id_attribute_group field. For example, the values of attributes with identifiers 1 and 3 correspond to attributes with identifiers 1 and 2, where 1 and 2 correspond to the identifiers of the rotor and the diameter of the pump impeller, respectively.
The names and values of pumps characteristics are stored in the tables «feature» and «feature_value», respectively (Fig. 9).
The feature_combination table defines the relationship between pump variation and pump performance. For example, a pump variation with id_combination = 1 has two properties with IDs 1 and 2. Property values are indicated by identifiers in the id_feature_value field. Thus, it can be determined that the reviewed pump has two properties -empirical coefficients H 0 and b in formula (1) with values H 0 = 246.7 m and b = 16.8•10 -6 hr 2 /m 5 .
The process of obtaining specific equipment from the database of the proposed architecture can be formalized through the use of SQL queries. Fig. 9 shows the tables of the database, which store the characteristics of the pump NM 2500-230 in two variations: 1) rotor 0.5, impeller diameter D 2 = 425 mm; 2) rotor 0.7, impeller diameter D 2 = 405 mm.

Fig. 9. Pump characteristics
As an example, Table 2 shoes a request for obtaining the characteristics of the pump NM 2500-230 with a rotor of 0.5 and an impeller diameter of 425 mm.
The proposed query has a quite complex structure, since it contains a subquery enclosed between lines 4 and 14 ( Table 2). The task of the nested query is to find the unique identifier of the pump variation id_combination, which in this example is equal to one (Table 3).
In general, the search for the id_combination identifier is carried out according to two criteria: equipment labeling and attribute values. In this example, the specified criteria are reflected in lines 11-13 of the query (Table 2). The COUNT aggregation function after the SELECT statement (line 5) counts the number of rows corresponding to each pump variation. After calling the COUNT function, the combinations_count alias is specified, under which the result of the function will be available. Due to the GROUP BY clause (line 14), rows with the same id_combination values are combined into one. Since several pump variations can meet the criteria specified in lines 11-13 at once, to obtain the required one, it is neccesary use the HAVING operator with the combinations_count field value equal to two (by the number of attributes). As a result of the work of the nested query, a The structure of the developed database is posted on the GitHub web service [27] and is available for making changes.

SWOT analysis of research results
Strength. Compared to manual input of entry data into a software product, their storage in electronic form can significantly reduce the complexity of calculating the characteristics of oil pumps and operating modes of oil pipelines. The advantage of relational databases over spreadsheets is as follows: -free; -allow to store significant amounts of data and provide high performance; -can be easily used in conjunction with popular programming languages (Python, PHP, Java, etc.) and packages of applied mathematical programs (Matlab, Mathematica, Scilab); -expand the scope for scaling projects, for example, multilingual support can be implemented with a table structure that best suits the requirements of the project; -use a convenient query language SQL. Weaknesses. Relational databases use the SQL query language directly or through a visual editor to add, modify and delete data (phpMyadmin, dbviewer, dbeaver). This is not convenient enough in both cases. To address the problem, data may be imported from files in json, xml, csv or xls format, which requires additional time-consuming development of import modules.
Opportunities. The research results can be used in the design of databases of oil and gas equipment of various types, for example, heat exchangers, dust collectors, gas pressure regulators of gas distribution stations, centrifugal blowers, gas turbine engines and oil reservoirs. Due to the well-thought-out architecture of the database, it is possible to create both highly specialized programs designed to solve one problem, and large software systems for calculating complex technological processes.
Threats. Working with relational databases places higher demands on the qualifications of a program developer, as it requires knowledge of designing database architectures, composing tables and relationships between them, placing indexes and determining keys.

Conclusions
1. Analysis of the tasks of pipeline transportation of oil and oil products and modern programs for calculating the operating modes of oil pipelines showed the relevance of creating databases of pumping equipment. It is shown that due to the complex data structure of the pumps, it is preferable to use relational databases than Excel spreadsheets. When designing the physical model of the database, parent and child entities were selected in such a way that it was possible to reflect all information about the pumping unit: design features, scope, pump characteristics, depending on its variations for replaceable rotors. In the process of modeling, data integrity rules are determined, and relationships of the «one-to-many» and «many-to-many» types are established between the entities of the physical model. The developed database architecture provides modifiability, scalability, modularity and effective use in programs for calculating the operating modes of oil pipelines.
2. An overview of various ways of multilingualism organizing in a database has been made. To achieve the aim of research, it seems most preferable to use a structure in which each entity is stored in two tables. One table stores data that does not depend on translations, and the other contains data with translations.