Although Flash, NAS and SAN may be the current “go-to” storage solutions in the High-Performance Computing (HPC) world, that storage paradigm has proven limited in handling the rapidly growing data sets of the 21st century. As IT pros search for cost-effective storage that scales easily with their HPC needs, object-based storage solutions are often at the top of their list.
On Tuesday, July 17, I’ll be hosting our monthly Tech Tuesday webinar, and this month’s topic is Object Storage for HPC. My featured guest will be Pat Ray, Caringo Swarm Integration Engineering Lead.
As Pat and I have been developing the webinar, we’ve talked a lot about the pain points of traditional HPC storage and how a “cloud approach” to HPC storage helps alleviate them. This approach benefits a wide range of public and private organizations, including educational and research institutions, laboratories, businesses and others faced with relentless data growth. The table below maps each pain point to how Swarm object storage helps address it:
| Pain Point | How Swarm Helps | Benefits/Results |
|---|---|---|
| File system limitations (e.g., billions of files, complex directory structures, fixed block sizes) | No dependence on file systems (inodes, block sizes, etc. do not matter); objects are managed and located based on their characteristics (metadata), not on where they live in a file system (directories/paths) | Easily scales to billions of objects or more; data with the same or similar characteristics can be queried and collated into dynamic collections (saved queries), regardless of where it “lives” in storage |
| Data integrity concerns (multiple backups/copies usually needed) | Uses auto-correction to prevent object corruption; offers flexible data protection schemes (replication and erasure coding options) | Continuous built-in integrity checking and protection deliver multiple “nines” of data durability while optimizing the data footprint |
| Inefficient parallel access and storage silos | Global Unified Namespace (replacing hierarchical file systems); Stateless Gateways handle multiple protocol personalities; Lustre HSM tiering via SwarmNFS | Simultaneous access via native API, S3, NFS, and HDFS; native HDFS support for big data workflows; cloud enablement (S3 and Azure compatibility) |
| Lack of multi-tenant access and/or quota support (data typically pools into single-tenant apps and databases) | Multi-protocol, multi-tenant access is built in (not tacked on); “fine-grained” access control with comprehensive authentication/authorization support | Enables collaboration throughout your user base with secure multi-tenancy; eliminates storage silos and builds an active archive for your data |
| Lack of “web-accessible” storage (percolating data to web shares leads to overly complex web/app/database tier deployments) | Collapses traditional web/app/database/storage layers into a simplified, streamlined RESTful/web access method | Easily supports multi-protocol access for next-gen applications and workflows |
| Difficulty operating on data subsets locally (object metadata is trapped in a database, decoupled from the data) | Metadata and data are combined (metadata + data are created and stored together as an “object”) | Both data and metadata are managed and protected by the storage system over the object’s lifecycle |
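To make the first row’s “metadata, not paths” idea concrete, here is a toy in-memory sketch in Python. It is not Swarm’s API or implementation; the class and method names (`StoredObject`, `ObjectStore`, `put`, `query`, `save_query`, `collection`) are hypothetical. It only illustrates how objects in a flat namespace can be found by their metadata and grouped into saved-query collections that pick up new matching objects automatically.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    # Hypothetical stand-in for an object: data and metadata kept together
    name: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy flat-namespace store: no directories, objects are found by metadata."""

    def __init__(self):
        self._objects = {}        # name -> StoredObject (flat, no hierarchy)
        self._saved_queries = {}  # collection name -> metadata criteria

    def put(self, obj):
        self._objects[obj.name] = obj

    def query(self, **criteria):
        # Return names of objects whose metadata matches every criterion
        return sorted(
            o.name
            for o in self._objects.values()
            if all(o.metadata.get(k) == v for k, v in criteria.items())
        )

    def save_query(self, collection_name, **criteria):
        self._saved_queries[collection_name] = criteria

    def collection(self, collection_name):
        # Re-evaluated on every call, so the collection stays "dynamic"
        return self.query(**self._saved_queries[collection_name])


store = ObjectStore()
store.put(StoredObject("run-001.h5", b"\x00", {"project": "climate", "instrument": "lidar"}))
store.put(StoredObject("run-002.h5", b"\x00", {"project": "genomics"}))
store.save_query("climate-data", project="climate")
print(store.collection("climate-data"))  # ['run-001.h5']

store.put(StoredObject("run-003.h5", b"\x00", {"project": "climate"}))
print(store.collection("climate-data"))  # ['run-001.h5', 'run-003.h5']
```

Because the collection stores criteria rather than a list of paths, a newly written object joins it as soon as its metadata matches, with no directory reorganization.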
On Tuesday, we will discuss all of these pain points, as well as considerations for selecting and deploying an object storage solution in an HPC environment. Reserve your seat for the webcast and live Q&A, or register now to be notified when the recording is available on demand.