The performance gap between compute and storage is considerable, and it results in a mismatch between what applications need from storage and what storage can deliver. The full potential of storage devices cannot be harnessed until every layer of the I/O hierarchy functions efficiently. Despite advanced optimizations applied across the various layers along the odyssey of data access, the I/O stack remains volatile, and the problems caused by inefficient data management are amplified in shared-resource Big Data environments. The Linux OS (host) block layer is the most critical part of the I/O hierarchy, as it orchestrates the I/O requests from different applications to the underlying storage. Unfortunately, despite its significance, the block layer, and in particular the block I/O scheduler, has not evolved to meet the needs of Big Data. Hard disk drives (HDDs) form the backbone of data center storage, and their data access time is governed largely by disk arm movements, which occur mainly when data is not accessed sequentially. Big Data applications exhibit evident sequentiality, but because of contention with other applications submitting I/O, their accesses get multiplexed, which leads to more disk arm movement. We have designed and developed two contention avoidance storage solutions in the Linux block layer, collectively known as “BID: Bulk I/O Dispatch”, specifically suited to multi-tenant, multi-tasking, shared Big Data environments.
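The cost of multiplexing can be illustrated with a toy model. This sketch is not the BID implementation. It assumes that head travel is proportional to the LBA distance between consecutive requests (real seek time is nonlinear and also includes rotational latency). The helper names (`seek_distance`, `interleaved`, `batched`) and the LBA ranges are hypothetical. It compares two tenants whose requests are dispatched round-robin against the same requests dispatched in per-tenant batches.

```python
from itertools import chain, zip_longest

def seek_distance(lbas, start=0):
    """Total head travel, assuming cost = |LBA delta| (toy model, hypothetical)."""
    total, head = 0, start
    for lba in lbas:
        total += abs(lba - head)
        head = lba
    return total

def interleaved(streams):
    """Round-robin dispatch: take one request from each stream in turn."""
    return [req for group in zip_longest(*streams)
            for req in group if req is not None]

def batched(streams):
    """Bulk dispatch: drain each stream's sequential run before switching."""
    return list(chain.from_iterable(streams))

# Two tenants, each reading its own contiguous region sequentially (hypothetical LBAs).
app_a = list(range(1000, 1008))  # LBAs 1000..1007
app_b = list(range(9000, 9008))  # LBAs 9000..9007

print("interleaved:", seek_distance(interleaved([app_a, app_b])))  # → 120993
print("batched:", seek_distance(batched([app_a, app_b])))          # → 9007
```

Under this simplified cost model, interleaving makes the head cross the gap between the two regions on almost every request. Batching preserves each tenant's sequentiality, so the gap is crossed only once.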