High performance cluster - SUPERPHI PTE.LTD.
High performance cluster
Date: 2018-12-17. High Performance Computing Cluster (HPCC), also known as a Data Analytics Supercomputer, is an open-source, data-intensive computing system platform developed by LexisNexis Risk Solutions.
 
HPCC (High Performance Computing Cluster), also known as DAS (Data Analytics Supercomputer), is an open-source, data-intensive computing system platform developed by LexisNexis Risk Solutions. The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications that use big data. The platform includes system configurations that support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). It also includes ECL, a data-centric declarative programming language for parallel data processing.
 
The HPCC system architecture includes two distinct cluster processing environments, each of which can be optimized independently for its parallel data processing purpose. The first of these platforms is called a data refinery. Its overall purpose is the general processing of massive volumes of raw data of any type, but it is typically used for data cleansing and hygiene; extract, transform, and load (ETL) processing of raw data; record linking and entity resolution; large-scale complex analytics; and the creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. The data refinery is also known as Thor, a name that symbolizes crushing large amounts of raw data into useful information. A Thor cluster is similar in its function, execution environment, file system, and capabilities to the Google and Hadoop MapReduce platforms.
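The batch steps described above (data hygiene, redistribution by key, and creation of keyed data) can be illustrated with a small Python sketch. This is not ECL and does not use any HPCC API; the function names, the worker count, and the toy hash are assumptions made purely for illustration, with threads standing in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 3  # hypothetical number of worker nodes


def clean(record):
    """Data hygiene: trim whitespace and normalize capitalization."""
    name, city = record
    return (name.strip().title(), city.strip().title())


def clean_chunk(chunk):
    """Work done by one worker on its share of the raw data."""
    return [clean(rec) for rec in chunk]


def stable_hash(key):
    """Toy deterministic hash (an assumption, not what HPCC uses)."""
    return sum(ord(ch) for ch in key)


def refine(raw_records, workers=NUM_WORKERS):
    # 1. Split the raw input across the workers (round-robin).
    chunks = [raw_records[i::workers] for i in range(workers)]

    # 2. Clean every chunk in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        cleaned = list(pool.map(clean_chunk, chunks))

    # 3. Redistribute by key so records with the same name meet
    #    on the same worker.
    parts = [[] for _ in range(workers)]
    for chunk in cleaned:
        for rec in chunk:
            parts[stable_hash(rec[0]) % workers].append(rec)

    # 4. Deduplicate and build keyed data: name -> sorted list of cities.
    #    The per-worker results are merged into one dict for simplicity.
    index = {}
    for part in parts:
        for name, city in part:
            index.setdefault(name, set()).add(city)
    return {name: sorted(cities) for name, cities in sorted(index.items())}


raw = [(" alice ", "paris"), ("ALICE", "Paris "), ("bob", "oslo"), ("alice", "rome")]
refined = refine(raw)
print(refined)
```

In this sketch the duplicate "Alice / Paris" records collapse into one entry after cleaning, which is the kind of hygiene and keying work the paragraph attributes to the data refinery.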






Figure 1 shows a representation of a physical Thor processing cluster, which serves as the batch job execution engine for scalable data-intensive computing applications. In addition to the Thor master and slave nodes, further auxiliary and common components are required to implement a complete HPCC processing environment.
 
The second parallel data processing platform is called Roxie and serves as a rapid data delivery engine. It is designed as an online high-performance structured query and analysis platform, or data warehouse, that meets the parallel data access and processing needs of online applications through web service interfaces, supporting thousands of simultaneous queries and users with sub-second response times. Roxie uses a distributed indexed file system, together with an execution environment and file system optimized for high-performance online processing, to process queries in parallel. A Roxie cluster is similar in function and capabilities to Hadoop with HBase and Hive added, but provides near-real-time, predictable query latencies. Both Thor and Roxie clusters use the ECL programming language to implement applications, which improves continuity and programmer productivity.
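The idea of answering many simultaneous queries from an index that is spread over several nodes can be sketched in Python as follows. The class names, the hash-based routing, and the node count are hypothetical; the sketch does not reproduce Roxie's actual index format or query routing, and threads stand in for cluster nodes.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def stable_hash(key):
    """Toy deterministic hash (an assumption, not what HPCC uses)."""
    return sum(ord(ch) for ch in key)


class IndexPartition:
    """One node's slice of a keyed index, kept sorted for binary search."""

    def __init__(self, entries):
        self.entries = sorted(entries)  # list of (key, payload) pairs
        self.keys = [key for key, _ in self.entries]

    def lookup(self, key):
        # Binary search finds the run of entries that share this key.
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [payload for _, payload in self.entries[lo:hi]]


class QueryCluster:
    """Hypothetical query cluster: routes each key to one index partition."""

    def __init__(self, entries, nodes=3):
        buckets = [[] for _ in range(nodes)]
        for key, payload in entries:
            buckets[stable_hash(key) % nodes].append((key, payload))
        self.partitions = [IndexPartition(bucket) for bucket in buckets]

    def query(self, key):
        part = self.partitions[stable_hash(key) % len(self.partitions)]
        return part.lookup(key)

    def query_many(self, keys):
        # Serve several queries concurrently; results keep the input order.
        with ThreadPoolExecutor(max_workers=len(self.partitions)) as pool:
            return list(pool.map(self.query, keys))


cluster = QueryCluster([("Alice", "Paris"), ("Bob", "Oslo"), ("Alice", "Rome")])
print(cluster.query("Alice"))
print(cluster.query_many(["Bob", "Alice", "Zed"]))
```

Because each lookup touches only one sorted partition, the cost of a query stays small and roughly constant as data grows across nodes, which is the intuition behind the predictable latency described above.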




Figure 2 shows a representation of a physical Roxie processing cluster, which serves as the online query execution engine for high-performance query and data warehouse applications. A Roxie cluster contains multiple nodes running the server and worker processes that handle queries; an additional auxiliary component, the ESP server, which provides the interface through which external clients access the cluster; and other common components shared with the Thor cluster in an HPCC environment. Although a Thor processing cluster can be implemented and used without a Roxie cluster, an HPCC environment that includes a Roxie cluster should also include a Thor cluster. The Thor cluster is used to build the distributed index files used by the Roxie cluster and to develop the online queries that are deployed to the Roxie cluster together with those index files.
 
The HPCC software architecture comprises the Thor and Roxie clusters, common middleware components, an external communication layer, client interfaces that provide end-user services and system management tools, and auxiliary components that support monitoring and facilitate loading and storing file system data from external sources. An HPCC environment can contain Thor clusters only, or both Thor and Roxie clusters.




