Large data stores are pushing the limits of modern technology. Parallel file systems provide high I/O throughput to large data stores, but are limited to particular operating system and hardware platforms, lack seamless integration and modern security features, and suffer from slow offsite performance. Meanwhile, advanced research collaborations require higher bandwidth as well as concurrent and secure access to large datasets across myriad platforms and parallel file systems, forming a schism between file systems and their users.

It is my thesis that a distributed file system can improve I/O throughput to modern parallel file system architectures, achieving new levels of scalability, performance, security, heterogeneity, transparency, and independence. This dissertation describes and examines prototypes of three data access architectures that use the NFSv4 distributed filing protocol as a foundation for remote data access to parallel file systems while maintaining file system independence.

The first architecture, Split-Server NFSv4, targets parallel file system architectures that disallow customization and/or direct storage access. Split-Server NFSv4 distributes I/O across the available parallel file system nodes, offering secure, heterogeneous, and transparent remote data access. While scalable, the Split-Server NFSv4 prototype demonstrates that the absence of direct data access limits I/O throughput.

Remote data access performance can be increased for parallel file system architectures that allow direct data access plus some customization. The second architecture analyzes the pNFS protocol, which uses storage-specific layout drivers to distribute I/O across the bisectional bandwidth of a storage network between filing nodes and storage. Storage-specific layout drivers allow universal storage protocol support and flexible security and data access semantics, but can diminish the level of heterogeneity and transparency.
The third architecture, Direct-pNFS, uses a commodity distributed file system for direct access to a parallel file system's storage nodes, bridging the gap between performance and transparency. The dissertation describes why both direct data access architectures are important and necessary, depending on user and system requirements. I analyze prototypes of both direct data access architectures and demonstrate their ability to match and even exceed the performance of the underlying parallel file system.