Reasons to use LUSTRE
Visit the LUSTRE website.
Use LUSTRE ONLY if:
- You need data files (such as large restart files) to persist between runs,
- You generate scratch files larger than 400 GB but smaller than 3.2 TB, or
- You need to share more data through Global Arrays than fits in main memory.
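The three criteria above amount to an either-or test. The sketch below is a hypothetical helper (`should_use_lustre` is not part of any MSCF tool). It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the page does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: decimal units; the page does not say whether G/T mean
 * 10^9/10^12 or 2^30/2^40 bytes. */
#define GB_BYTES 1000000000ULL
#define TB_BYTES 1000000000000ULL

/* Hypothetical helper: true if any of the three reasons to use LUSTRE
 * (/dtemp) applies, false if the job should stay on /scratch or in memory. */
bool should_use_lustre(bool needs_persistence,
                       uint64_t scratch_file_bytes,
                       uint64_t shared_ga_bytes,
                       uint64_t main_memory_bytes)
{
    /* 1. Data files (e.g. restart files) must survive between runs. */
    if (needs_persistence)
        return true;

    /* 2. Scratch files too big for a fat node's 400 GB /scratch but
     *    below the 3.2 TB default LUSTRE file-size limit. */
    if (scratch_file_bytes > 400 * GB_BYTES &&
        scratch_file_bytes < 32 * TB_BYTES / 10)
        return true;

    /* 3. Global Arrays data that will not fit in main memory. */
    if (shared_ga_bytes > main_memory_bytes)
        return true;

    return false;
}
```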
Drawbacks
- Slow: about 100 MB/s for one writer per OST, versus about 235 MB/s for the local scratch disk (/scratch).
- Multiple writers to a single file are very slow.
- No automatic failover capability yet for Object Storage Targets (OSTs) or the Meta-Data Server (MDS).
If an OST dies, all files that had a stripe on that OST are temporarily unavailable until the failover OST is up and running.
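To put the bandwidth figures in perspective, here is the arithmetic for one writer producing a single 400 GB file (the size of a fat node's /scratch), assuming decimal units and that the quoted rates hold for the whole write:

```latex
\begin{aligned}
t &= \frac{S}{R}, \qquad S = 400\ \text{GB} = 4\times10^{5}\ \text{MB} \\
t_{\text{LUSTRE}} &= \frac{4\times10^{5}\ \text{MB}}{100\ \text{MB/s}} = 4000\ \text{s} \approx 67\ \text{min} \\
t_{\text{/scratch}} &= \frac{4\times10^{5}\ \text{MB}}{235\ \text{MB/s}} \approx 1702\ \text{s} \approx 28\ \text{min}
\end{aligned}
```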
Details
Logistics
- File system is mounted as /dtemp.
- Your file names must start with /dtemp/your_login_id/.
- Normal file I/O calls are used (fopen, fclose, ...).
- No backup is done.
- The default maximum file size is 3.2 terabytes. If you need larger LUSTRE files, please ask the MSCF consultants.
- Total space available is 53 terabytes.
- Files are striped across Object Storage Targets (OSTs) in blocks.
Stripe size and count may be tuned per application.
- Use the lstripe command to set the stripe count, stripe size, and stripe start for a file.
- Use the lfind command to see the stripe count and start for a file.
- The default stripe count is 8, with a stripe size of 65536 bytes.
- To change the default stripe size and count for all of your jobs, do the following in your submit script:
- Set the environment variable LD_PRELOAD to /home/mscf/lib/liblls.so.
- Control the default stripe count, stripe start, and stripe size with the environment variables LLSTRIPE_COUNT, LLSTRIPE_START, and LLSTRIPE_SIZE. When liblls.so is used, their default values are 1, -1, and 65536, respectively.
- LLSTRIPE_START=-1 lets the LUSTRE file system choose the starting stripe for you.
- If you want different stripe parameters for individual files within a job, you can precreate the files with the lstripe command, or pass special parameters to your open calls and make additional ioctl calls in your code. See Chapter 22, "File striping configuration", in "The Lustre Storage Architecture" document for examples.
- The best write performance for LUSTRE files comes from a single writer per file, which minimizes lock contention.
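The striping rules above can be illustrated with a small C sketch. It assumes LUSTRE's round-robin (RAID-0 style) layout, in which consecutive stripe-size blocks of a file go to consecutive stripes. The names `stripe_layout`, `layout_from_env`, `stripe_index`, and `object_offset` are hypothetical, and `layout_from_env` only mirrors the documented liblls.so defaults rather than reproducing that library's code. It does not model which physical OST holds each stripe, since with LLSTRIPE_START=-1 that choice is left to the file system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stripe parameters for one file (hypothetical struct, not a Lustre type). */
struct stripe_layout {
    long count;     /* number of stripes (OST objects) the file spans */
    long start;     /* first OST index; -1 means "let LUSTRE choose" */
    uint64_t size;  /* bytes per stripe block */
};

/* Read an integer environment variable; fall back to a default when it is
 * unset, empty, or not a clean base-10 integer. */
static long env_long(const char *name, long fallback)
{
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return fallback;
    char *end;
    long v = strtol(s, &end, 10);
    return (*end == '\0') ? v : fallback;
}

/* Mirror the documented liblls.so defaults (count 1, start -1, size 65536).
 * Assumption: non-positive count/size values fall back to the defaults. */
struct stripe_layout layout_from_env(void)
{
    struct stripe_layout l;
    long count = env_long("LLSTRIPE_COUNT", 1);
    long size = env_long("LLSTRIPE_SIZE", 65536);
    l.count = count > 0 ? count : 1;
    l.start = env_long("LLSTRIPE_START", -1);
    l.size = size > 0 ? (uint64_t) size : 65536;
    return l;
}

/* Round-robin striping: block k of the file (k = offset / size) is stored
 * in stripe k % count. Returns that stripe index, 0 .. count-1. */
long stripe_index(const struct stripe_layout *l, uint64_t offset)
{
    assert(l->count > 0 && l->size > 0);
    uint64_t block = offset / l->size;
    return (long) (block % (uint64_t) l->count);
}

/* Offset of the same byte inside its stripe's object: every full "row" of
 * count blocks adds one block to each object. */
uint64_t object_offset(const struct stripe_layout *l, uint64_t offset)
{
    assert(l->count > 0 && l->size > 0);
    uint64_t block = offset / l->size;
    return (block / (uint64_t) l->count) * l->size + offset % l->size;
}
```

With the page's default of 8 stripes of 65536 bytes, bytes 0 to 65535 fall in stripe 0, bytes 65536 to 131071 in stripe 1, and byte 524288 wraps back to stripe 0 at offset 65536 within that stripe's object.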
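One way to follow the single-writer-per-file advice together with the /dtemp/your_login_id/ naming rule is to give each process its own file. The helpers `dtemp_path` and `write_rank_file` below are hypothetical; the sketch assumes the login id is available in `$USER` and that each writer has an integer rank (for example an MPI rank). It uses only ordinary stdio calls, as the page says LUSTRE files need.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "/dtemp/<login>/<base>.<rank>" so that each
 * writer gets its own file. Returns 0 on success, -1 if the login is empty
 * or contains '/', or if the path did not fit in buf. */
int dtemp_path(char *buf, size_t len, const char *login,
               const char *base, int rank)
{
    if (login == NULL || *login == '\0' || strchr(login, '/') != NULL)
        return -1;
    int n = snprintf(buf, len, "/dtemp/%s/%s.%d", login, base, rank);
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}

/* Write one rank's data to its own file. Assumption: the login id is
 * read from $USER. Returns 0 on success, -1 on any failure. */
int write_rank_file(const char *base, int rank,
                    const void *data, size_t nbytes)
{
    char path[4096];
    if (dtemp_path(path, sizeof path, getenv("USER"), base, rank) != 0)
        return -1;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, nbytes, fp);
    /* fclose flushes buffered data, so its result is checked too. */
    if (fclose(fp) != 0 || written != nbytes)
        return -1;
    return 0;
}
```

Each process calls, for example, `write_rank_file("restart", rank, buf, n)` and never opens another process's file, so no two writers contend for locks on the same file.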
Layout
- 33 reserved nodes not schedulable by users. Each node has a primary processor and a failover processor.
- 1 Meta Data Server node (MDS)
The MDS stores inode-like information.
- 32 Object Storage Target nodes (OSTs)
These store the actual file blocks that make up the stripes.
Each OST has two 880 GB LUNs attached for storing blocks of LUSTRE files.
- Client nodes: 566 fat nodes (8 GB memory and 400 GB /scratch) and 369 thin nodes (6 GB memory and 10 GB /scratch)
These nodes are user schedulable MPP2 nodes.
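The node and disk counts above can be cross-checked with a little arithmetic, assuming decimal GB and TB:

```latex
\begin{aligned}
N_{\text{reserved}} &= 1\ (\text{MDS}) + 32\ (\text{OSTs}) = 33 \\
C_{\text{raw}} &= 32\ \text{OSTs} \times 2\ \tfrac{\text{LUNs}}{\text{OST}} \times 880\ \tfrac{\text{GB}}{\text{LUN}} = 56\,320\ \text{GB} \approx 56.3\ \text{TB} \\
N_{\text{clients}} &= 566\ (\text{fat}) + 369\ (\text{thin}) = 935
\end{aligned}
```

The raw LUN total is somewhat higher than the 53 terabytes quoted as available space.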
