Differentiating the Level of Territorial Development of the Transport and Logistics Infrastructure in Ukraine by Adapting the Cluster Analysis Methodology

Recently, much attention has been paid to ideas and theories that emphasize the cluster approach to the territorial organization of the economic system. Identifying and developing a transport and logistics cluster contributes to competitive advantages, helps secure the country's leading position in the world market, and enhances the investment attractiveness of the region in which the cluster operates.
The use of cluster analysis, a multidimensional statistical method, to determine the homogeneity of individual regions of Ukraine helps to identify transport and logistics clusters. For this reason, the article analyzes the role of cluster analysis in studying the development of transport and logistics services and shows that a transport and logistics cluster is formed on the basis of the real socio-economic situation of a particular territory, taking into account its development potential. It is substantiated that the transport and logistics cluster, in addition to meeting the needs of the economy, performs a number of important functions: it creates opportunities for the development of sectors of the national economy, contributes to the development of the region's infrastructure, and increases the investment attractiveness of the region. A review of existing cluster analysis methods for grouping the regions of Ukraine by the level and potential of development of the transport and logistics services market made it possible to single out ways of estimating the efficiency of cluster functioning, which differ in their external and internal effects. A cluster analysis of the territorial level and potential of logistics development in the regions of Ukraine is carried out on the basis of three groups of indicators (socio-economic indicators, indicators of transport performance in the region, and indicators characterizing the composition of the region's transport system and its potential capacities). As a result of the cluster analysis, the regions of Ukraine were grouped into three clusters by the level of logistics development and into two clusters by logistics development potential.
It is determined that classifying the regions into homogeneous groups will make it possible to build typological regression equations of the interaction of market factors for each cluster, which will increase the accuracy of studying the dynamics of the market environment in the regions where elements of the transport and logistics infrastructure may be located.


Keywords: cluster, clustering, cluster analysis, Ward method, k-means method, transport and logistics services.
Introduction. The early 21st century marked the transition to a post-industrial society based on innovation, the knowledge economy, and high-precision, high-performing industries, with a prevailing share of innovative and high-quality services in GDP, including transport and logistics services. The newly born post-industrial paradigm is closely associated with theoretical and methodological developments in economics and with in-depth analytical studies using multidimensional statistical methods, whose results are designed to support decisions that promote the commercialization of ideas and innovations and bring the national economy onto an innovation-driven path.
This raises the importance of using cluster analysis to group the regions of Ukraine by logistics performance and by the potential of the transport and logistics services market.
Literature review. Applications of methods based on cluster analysis for the analysis of various components of the socio-economic environment at country level are explored by domestic researchers: А. Holovach [1], А. Yerina [2], І. Mantsurov [3], О. Osaulenko [4], N. Parfentseva [5], Т. Chala [6] and others. Problems involved in building up a methodological framework for creating an integrated transport and logistics system in Ukraine from the perspective of cluster approach are elaborated by О. Poliakova [7] and Ye. Sych [8].
In spite of the extensive coverage of the above aspects, further studies need to focus on grouping the Ukrainian regions by logistics performance and logistics potential on the basis of the cluster approach.
The article's objective is to adapt cluster analysis methods for grouping of the Ukrainian regions by (i) logistics performance and (ii) potentials of the logistics services market.
Results. Cluster analysis is an important tool for exploring the performance of transport and logistics services. Clustering is the process of dividing a set of tangible or intangible objects into groups of similar objects according to a set of division principles [10]. Clustering is a very important component of various data analysis tasks, such as regression, forecasting, and data mining [11]. According to L. Rokach, clustering divides the studied objects into subsets in such a way that similar objects are placed in the same cluster [12]. The clustering structure can be formally represented as a set of subsets $S_1, S_2, \ldots, S_k$ of the set $S$ of studied objects, such that the following condition is met:

$$S = \bigcup_{i=1}^{k} S_i, \qquad S_i \cap S_j = \emptyset \quad \text{for } i \neq j \qquad (1)$$

The method of cluster analysis is far more complicated than supervised (controlled) classification, because it has no clearly predefined class features on which a clustering model could rely. While in supervised classification these features provide the basis for grouping the studied objects, in clustering, where such features do not exist, it is difficult to decide to which group a studied object belongs.
A. Jain suggests that the main purpose of clustering is to find a real grouping (or groupings) of the studied points or objects [13].
As there is no standard, clear definition of "cluster", various methods for cluster analysis have been proposed, depending on the principle of grouping. C. Fraley and A. Raftery [14] divide these methods into two groups: hierarchical and partitioning (distributive). J. Han, M. Kamber and J. Pei [15] propose further categories: density-based methods, model-based methods, and grid-based methods. An alternative division, based on the induction principle of the various methods for cluster analysis, is presented by V. Castro and J. Yang [16].
Cluster analysis is designed to reduce the number of studied objects by identifying homogeneous groups [17]. Clustering produces groups whose members have similar features, whereas studied objects in different groups have essentially different features.
In our study cluster analysis is used to define groups of Ukrainian regions with a similar performance of the transport and logistics services market and to combine them into clusters.
The first phase of cluster analysis is to define the characteristics, i.e. the indicators that will be used for segmentation of the logistics market in Ukraine by region. The indicators for clustering are usually selected according to the theory and the topic of a study. We propose to group the Ukrainian regions by performance and potential of the transport and logistics services market using the indicators that best describe the location and workload of the objects belonging to the transport and logistics infrastructure. These indicators include:
1. Socio-economic indicators:
- overall exports of goods (million USD);
- overall imports of goods (million USD).
2. Indicators of transport operation in a region:
- cargo transportation by automobile transport (million tons);
- cargo turnover for automobile transport (million ton-km);
- distribution of cargo transportation by automobile transport (%);
- cargo transportation by automobile transport enterprises (million tons);
- distribution of cargo transportation by automobile transport enterprises (%);
- cargo turnover of automobile transport enterprises (million ton-km);
- average distance of transportation of 1 ton of cargo by automobile transport (km);
- average distance of transportation of 1 ton of cargo by automobile transport enterprises (km).
3. Infrastructure factors (indicators measuring the composition of the transport system in a region and its potential capacities):
- operating length of the railway line of general purpose (thousand km);
- overall length of the motorway of general purpose (thousand km);
- gasoline stations (units).
The second phase of the cluster analysis is the choice of a clustering method. There are several methods for cluster analysis, but the most widely used is the non-hierarchical approach based on the k-means method, owing to its ability to find a stable solution, which improves the reliability of results [18]. The general types of clustering methods include hierarchical clustering and the k-means method.
The computations are made in the applied software package "Statistica" version 6.0, module "Cluster analysis".
SCIENTIFIC BULLETIN OF THE NATIONAL ACADEMY OF STATISTICS, ACCOUNTING AND AUDIT, 2020, № 1-2
STATISTICS

The module of cluster analysis or multidimensional classification consists of the following procedures [19]:
1) joining (tree clustering);
2) k-means clustering;
3) two-way joining.
Because the abovementioned indicators have different measurement units, they are normalized.
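The normalization step can be illustrated with a minimal sketch. The article does not state which normalization was applied, so the column-wise z-score used here is an assumption, and the function name `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Column-wise z-scores: (value - column mean) / column standard deviation.

    Assumption: z-score standardization; the article only says "normalized".
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical values for three regions:
# column 0 = exports (million USD), column 1 = cargo transportation (million tons)
raw = [[100.0, 10.0], [200.0, 20.0], [300.0, 60.0]]
z = standardize(raw)
```

After this transformation every indicator has zero mean and unit standard deviation, so indicators measured in million USD and in million tons contribute on a comparable scale to the distances between regions.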
The hierarchical algorithm for cluster analysis starts with n clusters, where each studied object forms a separate cluster, and ends with one cluster incorporating all the studied objects. At each step, the two nearest studied objects or groups of studied objects are combined into one new cluster. The course of the cluster analysis is recorded in a special tree graph, the so-called dendrogram, which shows the individual steps of the hierarchical algorithm, including the distances at which separate clusters (or studied objects) were combined. The dendrogram is also used for presentation of results. The starting point of the clustering is to choose a measure of the similarity (distance) between separate cases. In our study the Euclidean distance is chosen as the distance metric. It is a standard geometric distance, well established for multidimensional data.
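As an illustration of the distance step, the following minimal sketch builds a Euclidean distance matrix and finds the pair of objects that a hierarchical algorithm would combine first. The region labels and coordinates are hypothetical and stand in for normalized indicator values.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two objects described by the same variables."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def distance_matrix(points):
    """Symmetric matrix of pairwise Euclidean distances."""
    return [[euclidean(p, q) for q in points] for p in points]


# Hypothetical normalized indicator values for three regions
regions = {"Region A": [0.0, 0.0], "Region B": [3.0, 4.0], "Region C": [10.0, 4.0]}
names = list(regions)
dist = distance_matrix([regions[name] for name in names])

# The two nearest objects are the first candidates for merging
closest = min(
    ((i, j) for i in range(len(names)) for j in range(i + 1, len(names))),
    key=lambda pair: dist[pair[0]][pair[1]],
)
```

Here `closest` identifies Region A and Region B, which lie at distance 5, as the first pair to be combined.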
The chosen metric enabled us to build a matrix of distances between the Ukrainian regions. At the first step Ward's method is used. It is a hierarchical clustering method, recommended by many experts, that creates clusters with the maximum possible internal homogeneity. Ward's method, based on analysis of variance, selects and combines the pair of clusters whose merger gives the minimal increase in the sum of squares:

$$\Delta E = \sum_{i=1}^{G}\sum_{j=1}^{n}\left(x_{gij}-v_{gj}\right)^{2}-\sum_{i=1}^{A}\sum_{j=1}^{n}\left(x_{aij}-v_{aj}\right)^{2}-\sum_{i=1}^{B}\sum_{j=1}^{n}\left(x_{bij}-v_{bj}\right)^{2} \qquad (2)$$

where $\Delta E$ is the increase in the sum of squares caused by the merger; g, a and b are clusters (g being the cluster formed by combining a and b); A is the number of objects in cluster a; B is the number of objects in cluster b; G is the number of objects in cluster g; i is the number of an object (element) of the respective cluster; j is the number of an indicator (variable) characterizing an object; n is the number of indicators (variables) characterizing an object; $x_{gij}$ is the value of variable j for element i in cluster g; $v_{gj}$ is the mean value of variable j in cluster g; $x_{aij}$ is the value of variable j for element i in cluster a; $v_{aj}$ is the mean value of variable j in cluster a; $x_{bij}$ is the value of variable j for element i in cluster b; $v_{bj}$ is the mean value of variable j in cluster b.
The main condition for the application of Ward's method is that the distance between objects is expressed by the squared Euclidean distance, which is derived by the formula:

$$D_E(x, y)^{2} = \sum_{i=1}^{n}\left(x_i - y_i\right)^{2} \qquad (3)$$

where $D_E(x, y)^{2}$ is the squared Euclidean distance between objects x and y; $x_i$ is the value of variable i of object x; $y_i$ is the value of variable i of object y; n is the number of variables [20]. In our study the hierarchical clustering is used only to derive the number of clusters that tend to form in a natural way.
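The merging criterion of Ward's method can be checked numerically. The sketch below computes the increase in the sum of squares directly from its definition and compares it with the well-known equivalent closed form, A·B/(A+B) times the squared Euclidean distance between the two cluster means. The function names and the sample clusters are hypothetical.

```python
def centroid(cluster):
    """Mean value of every variable over the objects of a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def sum_of_squares(cluster):
    """Sum of squared deviations of all objects from the cluster mean."""
    v = centroid(cluster)
    return sum((x - vj) ** 2 for obj in cluster for x, vj in zip(obj, v))


def ward_increase(a, b):
    """Increase in the sum of squares when clusters a and b are merged."""
    return sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)


# Hypothetical clusters: a holds two objects, b holds one
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[5.0, 4.0]]
delta = ward_increase(a, b)

# Equivalent closed form based on the squared Euclidean distance of the means
va, vb = centroid(a), centroid(b)
closed_form = (
    len(a) * len(b) / (len(a) + len(b)) * sum((p - q) ** 2 for p, q in zip(va, vb))
)
```

Both `delta` and `closed_form` equal 64/3 for this example, which shows why the squared Euclidean distance is the natural metric for Ward's method.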
The dendrogram with the results of the cluster analysis by Ward's method is given in Figure 1, with the vertical axis showing the Ukrainian regions and the horizontal axis showing the distance between clusters.
Analysis of Figure 1 allows us to assume that the division of the Ukrainian regions by logistics performance will be optimal when three clusters are involved.

Figure 1. Dendrogram of the grouping of the Ukrainian regions by logistics performance (cluster analysis by Ward's method)
Source: developed and constructed by the author
Hierarchical clustering is quite simple and easy to interpret, but for a large data set it becomes cumbersome. In that case, preference is given to iterative procedures [2].
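The hierarchical procedure described above can be sketched in a few lines. This is a naive illustration, not the algorithm of the "Statistica" package: at every step it scans all pairs of clusters, which is exactly what makes the approach cumbersome for large data sets. It uses the closed form of Ward's criterion, A·B/(A+B) times the squared Euclidean distance between cluster means, and all names and data are hypothetical.

```python
def centroid(cluster):
    """Mean value of every variable over the objects of a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def ward_cost(a, b):
    """Increase in the sum of squares caused by merging clusters a and b."""
    va, vb = centroid(a), centroid(b)
    weight = len(a) * len(b) / (len(a) + len(b))
    return weight * sum((p - q) ** 2 for p, q in zip(va, vb))


def agglomerate(points, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every object starts alone
    merges = []  # record of (cluster, cluster, cost), as a dendrogram would show
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(
                    [points[k] for k in clusters[i]],
                    [points[k] for k in clusters[j]],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        merges.append((clusters[i], clusters[j], cost))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical normalized data: six objects forming three visible groups
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]]
clusters, merges = agglomerate(points, n_clusters=3)
```

The list `merges` holds the costs at which clusters were combined; a sharp rise in these costs is one informal signal of where to cut the dendrogram.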
K-means is a simple iterative clustering algorithm. Using distance as the metric and fixing K classes in a data set, the mean distance is calculated, the initial gravitation centers are found, the classes are defined, and each class is characterized by its gravitation center (centroid). For a data set X containing n multidimensional data points that needs to be divided into K categories, the Euclidean distance is chosen as the similarity index, and the clustering procedure minimizes the within-cluster sum of squares [21]:

$$d = \sum_{k=1}^{K}\sum_{x_i \in S_k}\lVert x_i - u_k\rVert^{2} \qquad (4)$$

where K is the number of cluster centers; $u_k$ is the k-th center; $x_i$ is the i-th point in the data set; $S_k$ is the set of points assigned to cluster k; n is the number of data points. The gravitation center $u_k$ is derived as follows:

$$\frac{\partial d}{\partial u_k} = \frac{\partial}{\partial u_k}\sum_{x_i \in S_k}\left(x_i - u_k\right)^{2} = -2\sum_{x_i \in S_k}\left(x_i - u_k\right) \qquad (5)$$

where $(x_i - u_k)^{2}$ is the dispersion (a deviation of the values of a random variable from its expectation), in our case the dispersion of point $x_i$ in the data set from the gravitation center $u_k$ of its cluster.
Setting (5) equal to zero gives $u_k = \frac{1}{n_k}\sum_{x_i \in S_k} x_i$, where $n_k$ is the number of points in $S_k$, so each gravitation center is the mean of the points in its cluster.
For the clustering by the k-means method, 10 indicators for 24 regions of Ukraine are used, and the maximal mean distance is used to derive the initial gravitation centers. After that the regions were iteratively assigned to the cluster with the nearest gravitation center using the squared Euclidean distance. As mentioned before, clustering by the k-means method starts with choosing the number of clusters, which can be derived by several approaches.
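The iterative procedure can be illustrated with a minimal sketch of the standard k-means iteration: assign every point to the nearest gravitation center by squared Euclidean distance, then recompute each center as the mean of its points. The initial centers are simply passed in by the caller, which is an assumption made for illustration and not the initialization rule of the "Statistica" package; all names and data are hypothetical.

```python
def squared_distance(x, u):
    """Squared Euclidean distance between point x and center u."""
    return sum((xi - ui) ** 2 for xi, ui in zip(x, u))


def k_means(points, initial_centers, max_iter=100):
    """Alternate assignment and center update until the labels stop changing."""
    centers = [list(c) for c in initial_centers]  # do not modify the caller's data
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))
            for x in points
        ]
        if new_labels == labels:
            break  # converged: no point changed its cluster
        labels = new_labels
        for k in range(len(centers)):
            members = [x for x, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def objective(points, labels, centers):
    """Within-cluster sum of squares minimized by k-means."""
    return sum(squared_distance(x, centers[k]) for x, k in zip(points, labels))


# Hypothetical normalized data and hypothetical initial centers
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 0.0], [10.0, 1.0]]
labels, centers = k_means(points, [points[0], points[2], points[4]])
total = objective(points, labels, centers)
```

Because the result depends on the initial centers, it is common to start from a grouping suggested by another method, such as a hierarchical clustering of the same data.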
At the third phase of cluster analysis the number of clusters is chosen. In clustering by the k-means method, the number of clusters has to be chosen by the analyst using various rules or expert knowledge. Of the several possible approaches, we used the grouping obtained earlier by Ward's method.
Finally, once the assumption on the number of clusters is made, the results of the clustering can be interpreted with reference to the basic theory and the field of study.
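Analogous results with respect to the structural interpretation of the grouping were derived using the iterative method of cluster analysis, i.e. clustering by the k-means method with division into three clusters (see Table 1).

Table 1
Cluster 1 (17 regions): Vinnytsia, Volyn, Zhytomyr, Zaporizhzhia, Ivano-Frankivsk, Kirovohrad, Luhansk, Mykolaiv, Poltava, Rivne, Sumy, Ternopil, Kherson, Khmelnytskyi, Cherkasy, Chernivtsi, Chernihiv
Cluster 2 (2 regions): Dnipropetrovsk, Donetsk
Cluster 3 (5 regions): Transcarpathia, Kyiv, Lviv, Odesa, Kharkiv
Source: computed by the author

The clustering by the k-means method allowed us to compute mean normalized values of the indicators for each derived cluster. Classification of the regions by logistics performance is shown in Figure 2.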
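Computing the mean normalized values per cluster is a simple grouping operation, sketched below. The function name `cluster_means`, the normalized values, and the cluster labels are hypothetical and do not reproduce the data of the study.

```python
def cluster_means(rows, labels):
    """Mean of each (already normalized) indicator within each cluster."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in sorted(grouped.items())
    }


# Hypothetical normalized values of two indicators for four regions
z = [[-1.0, -0.5], [-0.5, -1.0], [1.5, 0.5], [0.0, 1.0]]
labels = [0, 0, 1, 1]  # hypothetical cluster assignment
profile = cluster_means(z, labels)
```

Plotting the rows of `profile` against the indicator names gives a chart of the kind used to compare clusters: values below zero mark indicators on which a cluster is weaker than the average region, and values above zero mark indicators on which it is stronger.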
The results of our study lead to the following conclusions.
The first cluster includes the regions with low values of all the indicators under study: Vinnytsia, Volyn, Zhytomyr, Zaporizhzhia, Ivano-Frankivsk, Kirovohrad, Luhansk, Mykolaiv, Poltava, Rivne, Sumy, Ternopil, Kherson, Khmelnytskyi, Cherkasy, Chernivtsi, and Chernihiv.
The second cluster (Dnipropetrovsk and Donetsk regions) has high values for the majority of the indicators, except for "Cargo turnover for automobile transport", "Cargo turnover of automobile transport enterprises", "Average distance of transportation of 1 ton of cargo by automobile transport", and "Average distance of transportation of 1 ton of cargo by automobile transport enterprises". It should be emphasized that the latter two indicators, sometimes called the average length or the lever of transportation, are the most significant of all the factors affecting the technical, operational and economic performance of a transport system, and in this cluster they are even lower than in the first one.
In the third cluster, the indicators "[...] transport enterprises", "Average distance of transportation of 1 ton of cargo by automobile transport", and "Average distance of transportation of 1 ton of cargo by automobile transport enterprises" have their highest values.

Figure 2. Mean normalized values of indicators for the clusters
Source: computed by the author
For grouping the Ukrainian regions by logistics performance, it is sufficient to include indicators measuring one basic transport category. This category of transport need not meet the requirements of high carrying capacity or effective long-distance transportation, as is the case in the second cluster derived by us, but it needs to be highly dynamic, have simple infrastructure, and be easily accessible. These requirements are fully met by automobile transport. However, automobile transport alone is not sufficient for clustering the Ukrainian regions by logistics potential, because large volumes of cargo have to be moved over long distances to reach external markets or support cross-regional links.
These requirements can be met by two transport categories: water (sea transport in the first place) and railway. Therefore, to be developed from the logistics point of view, a region needs to have at least two main transport categories: automobile and either water or railway. While the former supports internal links and the system's integrity, the latter provides connections with other clusters or with systems of higher rank. It can be logically assumed that, other things being equal, the more transport categories are involved in the operation of a territorial economic system, the higher the probability that a transport and logistics cluster will emerge in it. Conversely, wherever two effective transport categories and a well-established transport and logistics sector are found, the other transport categories will logically be included in the system.
It is for this reason that the cluster analysis of the transport and logistics services market by logistics potential included, for all the Ukrainian regions, three core infrastructure factors (i.e. the indicators reflecting the structure of the regional transport system and its potential) for two transport categories, railway and automobile:
- operating length of the railway line of general purpose (km);
- overall length of the motorway of general purpose (thousand km);
- gasoline stations (units).
Figure 3 shows the dendrogram for the results of the cluster analysis by Ward's method, with the vertical axis showing the regions of Ukraine and the horizontal axis showing the distance between clusters.

Figure 3. Dendrogram of the grouping of Ukrainian regions by logistics potential (cluster analysis by Ward's method)
Source: developed and constructed by the author
Analysis of Figure 3 allows us to assume that the division of the Ukrainian regions by logistics potential will be optimal when two clusters are involved.
Analogous results for the compositional interpretation of the grouping were produced using the iterative method of cluster analysis, i.e. clustering by the k-means method with division into two clusters (see Table 2).

The mean normalized values are computed for all the derived clusters in the process of clustering by the k-means method. The division of regions into classes by logistics potential is shown in Figure 4. The results of this study lead to the following conclusions. The first cluster includes the regions with a high level of all the indicators under study: Vinnytsia, Dnipropetrovsk, Donetsk, Zhytomyr, Zaporizhzhia, Kyiv, Lviv, Odesa, Poltava, and Kharkiv.
Conclusions. The study confirms that methods of cluster analysis can be useful for grouping the Ukrainian regions by performance and potential of the transport and logistics services market. The cluster analysis by the k-means method made it possible to group the Ukrainian regions into three clusters by logistics performance and two clusters by logistics potential. Dividing the regions into homogeneous groups is expected to be an effective tool for constructing typological regression equations for each cluster in order to explore the interactions of market factors, which can improve the accuracy of studies of business climate dynamics in the regions where components of the transport and logistics infrastructure may be located.