This tutorial is a guide to installing and configuring the Sector file system and the Sphere programming environment, with an emphasis on security settings and access control. It covers system requirements, setup of masters and slaves, client configuration, and programming with Sector, including file operations and user-defined functions. Key topics include security server configuration, access control lists (ACLs), SSH setup, and data management techniques.
Sector & Sphere Tutorial
Yunhong Gu, Univ. of Illinois at Chicago
@ Booz Allen Hamilton, Aug 6, 2009
Outline
• Installation
• Sector File System
• Sphere Programming
Installation: System Requirements
• Linux (Debian recommended; XFS recommended)
• gcc 3.4 or above
• OpenSSL development library
• FUSE development library (optional)
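System Architecture (diagram)
• Security Server: security_node.key, security_node.cert, ./users, master_acl.conf, slave_acl.conf
• Masters: master_node.key, master_node.cert, master.conf, topology.conf, slaves.list
• Clients: master_node.cert, client.conf
• Slaves: master_node.cert, slave.conf
• Links: SSL between the security server and the masters, SSL between clients and masters, and data transfer between clients and slaves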
ls ./codeblue2
• Makefile, client, conf, gmp, master, slave, common, doc, lib, security, udt
Configure the Security Server
• For a testing system, you can use the default configuration
• Otherwise, update the slave ACL (slave_acl.conf), the master ACL (master_acl.conf), and the user accounts
Access Control List (ACL)
• Format: a list of IP addresses and/or IP/mask ranges, e.g. IP1 IP2 IP3/Mask
• Example: 10.0.0.1 192.168.0.0/24
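The slide gives only the entry syntax. The sketch below shows how such entries can be matched against a client address, assuming the usual CIDR meaning (a bare address matches only itself, and /24 compares the first 24 bits). The functions parseIPv4, aclEntryMatches, and aclAllows are hypothetical; this is not Sector's own ACL code.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse dotted-quad IPv4 text into a 32-bit integer (host order).
// Returns false on malformed input.
bool parseIPv4(const std::string& text, uint32_t& out) {
    std::istringstream in(text);
    uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255) return false;
        value = (value << 8) | static_cast<uint32_t>(octet);
        if (i < 3) {
            char dot = 0;
            if (!(in >> dot) || dot != '.') return false;
        }
    }
    char extra;
    if (in >> extra) return false;  // trailing characters are not allowed
    out = value;
    return true;
}

// True if `ip` matches an entry "a.b.c.d" (exact host) or "a.b.c.d/n"
// (first n bits must agree, 0 <= n <= 32).
bool aclEntryMatches(const std::string& entry, const std::string& ip) {
    uint32_t addr = 0;
    if (!parseIPv4(ip, addr)) return false;

    std::string base = entry;
    int prefix = 32;  // a bare address is an exact match
    std::size_t slash = entry.find('/');
    if (slash != std::string::npos) {
        base = entry.substr(0, slash);
        try {
            prefix = std::stoi(entry.substr(slash + 1));
        } catch (...) {
            return false;
        }
        if (prefix < 0 || prefix > 32) return false;
    }
    uint32_t net = 0;
    if (!parseIPv4(base, net)) return false;
    // Shifting a 32-bit value by 32 is undefined, so /0 is handled separately.
    uint32_t mask = prefix == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix);
    return (addr & mask) == (net & mask);
}

// An address is allowed if any entry in the list matches it.
bool aclAllows(const std::vector<std::string>& acl, const std::string& ip) {
    for (const auto& entry : acl)
        if (aclEntryMatches(entry, ip)) return true;
    return false;
}
```

With the example ACL (10.0.0.1 and 192.168.0.0/24), 192.168.0.77 is allowed while 192.168.1.5 and 10.0.0.2 are rejected.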
User Account
• All accounts are stored in ./conf/users
• One account per file
• Example: ./conf/users/test is the account configuration for account “test”
User Account (example file)
PASSWORD
    xxx
READ_PERMISSION
    /
WRITE_PERMISSION
    /test
    /angle
EXEC_PERMISSION
    TRUE
ACL
    0.0.0.0/0
QUOTA
    1000000
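READ_PERMISSION and WRITE_PERMISSION list directories. One plausible reading, assumed here rather than taken from Sector's source, is that a path is permitted if it equals one of those directories or lies beneath it. The hypothetical pathPermitted function sketches that prefix check, taking care that /test does not also grant /testing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical check: is `path` one of the listed directories, or inside one?
// (Assumed semantics for READ_PERMISSION / WRITE_PERMISSION entries.)
bool pathPermitted(const std::vector<std::string>& allowedDirs,
                   const std::string& path) {
    for (std::string dir : allowedDirs) {
        // Normalize "/test/" to "/test"; keep the root "/" as-is.
        while (dir.size() > 1 && dir.back() == '/') dir.pop_back();
        if (dir == "/") {
            // The root grants every absolute path.
            if (!path.empty() && path[0] == '/') return true;
            continue;
        }
        // Prefix match that must end at a path-component boundary,
        // so "/test" matches "/test" and "/test/x" but not "/testing".
        if (path.compare(0, dir.size(), dir) == 0 &&
            (path.size() == dir.size() || path[dir.size()] == '/'))
            return true;
    }
    return false;
}
```

Under this reading, the example account can read anywhere (READ_PERMISSION /) but write only under /test and /angle.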
Start the Security Server
• ./sserver <port>
• The default port is 5000
Configure the Master Server
• ./conf/master.conf
SECTOR_PORT
    6000
SECURITY_SERVER
    ncdm161.lac.uic.edu:5000
REPLICA_NUM
    2
DATA_DIRECTORY
    /home/u2/yunhong/work/data/
Configure the Slaves
• ./conf/slave.conf
MASTER_ADDRESS
    ncdm161.lac.uic.edu:6000
DATA_DIRECTORY
    /raid/sector/data/
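Both master.conf and slave.conf are keyword/value settings. Below is a minimal reader, assuming that a keyword starts at column 0, that its values follow on the same line or on indented lines below it, and that lines beginning with # are comments. parseConf is a hypothetical name, not part of Sector.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse keyword/value settings. A line starting at column 0 begins a new
// keyword; further tokens on that line, and tokens on following indented
// lines, become its values. Blank lines and '#' comments are skipped.
std::map<std::string, std::vector<std::string>> parseConf(std::istream& in) {
    std::map<std::string, std::vector<std::string>> conf;
    std::string line, current;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') continue;
        std::istringstream tokens(line);
        if (first == 0) {       // keyword line
            tokens >> current;
            conf[current];      // create the entry even if no value follows
        }
        if (current.empty()) continue;  // value before any keyword: ignore
        std::string tok;
        while (tokens >> tok) conf[current].push_back(tok);
    }
    return conf;
}
```

For example, reading slave.conf this way yields conf["MASTER_ADDRESS"][0] == "ncdm161.lac.uic.edu:6000".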
Start Masters and Slaves
• ./start_master
• ./start_slave
• ./start_all and ./stop_all start or stop the whole system; they require password-free SSH to the slaves listed in ./conf/slaves.list
./conf/slaves.list
gu@192.168.136.1 /home/gu/codeblue2/slave/
gu@192.168.136.2 /home/gu/codeblue2/slave/
gu@192.168.136.3 /home/gu/codeblue2/slave/
• Format: username@slave_ip, then a blank or tab, then slave_path
• slave_path is the slave program directory of the installation, NOT the slave data directory!
• Sector automatically restarts an offline slave if its address is on this list
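The slaves.list format above is easy to parse. This hypothetical parseSlavesList splits each line into user, IP address, and slave_path; skipping # comment lines is an assumption, and paths containing spaces are not supported.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One line of ./conf/slaves.list: "username@slave_ip<blank/tab>slave_path".
struct SlaveEntry {
    std::string user;  // SSH login name
    std::string host;  // slave IP address
    std::string path;  // slave program directory (not the data directory)
};

// Splits on whitespace, then splits the first field on '@'.
// Malformed lines are skipped.
std::vector<SlaveEntry> parseSlavesList(std::istream& in) {
    std::vector<SlaveEntry> out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string login, path;
        if (!(fields >> login >> path) || login[0] == '#') continue;
        std::size_t at = login.find('@');
        if (at == std::string::npos || at == 0 || at + 1 == login.size())
            continue;
        out.push_back({login.substr(0, at), login.substr(at + 1), path});
    }
    return out;
}
```

Each entry supplies an SSH target (user@host) and the directory on that slave where the slave program lives.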
Configure the Client
• ./conf/client.conf
• Optional, but useful for the client tools and examples
MASTER_ADDRESS
    ncdm161.lac.uic.edu:6000
USERNAME
    test
PASSWORD
    xxx
CERTIFICATE
    /home/gu/codeblue2/conf/master_node.cert
Check System Status
$ cd client
$ cd tools
$ ./sysinfo
• Displays system information: the list of masters and slaves, available disk space, etc.
• The master log is in ./master/sector.log
Accessing Sector FS
• Tools: ./client/tools
  • ls, mkdir, stat, rm, download, upload, cp, mv
• FUSE: ./client/fuse
  • Build: make
  • Mount: ./sector-fuse <local dir>
  • Unmount: fusermount -u <local dir>
Programming with Sector
• #include <fsclient.h>
• Sector::init(master_ip, master_port);
• Sector::login(username, password, cert);
• Sector::logout();
• Sector::close();
Programming with Sector
• Sector::list(path, vector<SNode>& attr)
• Sector::stat(path, SNode& attr)
• Sector::mkdir(path)
• Sector::move(src, dst)
• Sector::remove(path)
• Sector::copy(src, dst)
• Sector::utime(path, ts)
SNode
• std::string m_strName;
• bool m_bIsDir;
• std::set<Address, AddrComp> m_sLocation;
• int64_t m_llTimeStamp;
• int64_t m_llSize;
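As an illustration of using the attributes returned by Sector::list, the sketch below totals the size of the regular files in a listing and counts those stored at fewer locations than a given replica count (for example REPLICA_NUM from master.conf). Address and AddrComp are stand-ins defined only so the example compiles, reading m_sLocation as one address per replica is an assumption, and summarize is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Stand-ins for Sector's Address/AddrComp; the real ones come from Sector.
struct Address { std::string m_strIP; int m_iPort; };
struct AddrComp {
    bool operator()(const Address& a, const Address& b) const {
        return a.m_strIP != b.m_strIP ? a.m_strIP < b.m_strIP
                                      : a.m_iPort < b.m_iPort;
    }
};

// Fields as listed on the SNode slide.
struct SNode {
    std::string m_strName;
    bool m_bIsDir;
    std::set<Address, AddrComp> m_sLocation;  // assumed: one entry per copy
    int64_t m_llTimeStamp;
    int64_t m_llSize;
};

struct ListingSummary { int64_t bytes = 0; int underReplicated = 0; };

// Sum sizes of regular files and count files with fewer than `replicas`
// recorded locations; directories are skipped.
ListingSummary summarize(const std::vector<SNode>& attr, std::size_t replicas) {
    ListingSummary s;
    for (const SNode& n : attr) {
        if (n.m_bIsDir) continue;
        s.bytes += n.m_llSize;
        if (n.m_sLocation.size() < replicas) ++s.underReplicated;
    }
    return s;
}
```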
Sector Files
• SectorFile handle;
• handle.open(path, mode);
• handle.read(buf, size);
• handle.write(buf, size);
• handle.close();
• Also: seekp, seekg, tellp, tellg, upload, download
Sphere Programming
Serial version:
    for each file F in (SDSS datasets)
        for each image I in F
            findBrownDwarf(I, …);
Sphere version:
    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc.run(sdss, "findBrownDwarf", …);
    myproc.read(result);
UDF:
    findBrownDwarf(char* image, int isize, char* result, int rsize);
Record Offset Index
• Data: Text1 text1 text1 text1 Text2 text2 Text3 text3 text3
• Index: 0 23 44 61
• The index is a binary file of 64-bit integers, named with the suffix “.idx”
• Example: user.dat / user.dat.idx
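A minimal sketch of building such an index for newline-terminated records. The slide's example lists four offsets for three records, so this sketch assumes the index holds the start of every record plus a final entry equal to the data size; it also assumes native byte order for the 64-bit integers. buildIndex and writeIndex are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Start offset of each newline-terminated record, plus a final entry equal
// to the data size, so record i spans [index[i], index[i+1]).
std::vector<int64_t> buildIndex(const std::string& data) {
    std::vector<int64_t> index{0};
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i] == '\n') index.push_back(static_cast<int64_t>(i + 1));
    // An unterminated last record still needs its end offset.
    if (index.back() != static_cast<int64_t>(data.size()))
        index.push_back(static_cast<int64_t>(data.size()));
    return index;
}

// Write the offsets as raw 64-bit integers to "<data file>.idx",
// e.g. user.dat -> user.dat.idx.
bool writeIndex(const std::string& dataPath, const std::vector<int64_t>& index) {
    std::ofstream out(dataPath + ".idx", std::ios::binary);
    out.write(reinterpret_cast<const char*>(index.data()),
              static_cast<std::streamsize>(index.size() * sizeof(int64_t)));
    out.close();
    return !out.fail();
}
```

With N+1 offsets, a reader can locate record i without scanning the data file, and the record count is simply the number of offsets minus one.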
Hashing and Bucket Files
• Similar to the Reduce process in MapReduce
• Each output record is assigned a bucket ID
• Records with the same bucket ID are sent to the same bucket file
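The bucket assignment can be pictured as hashing a key and taking it modulo the number of buckets, then grouping records by the resulting ID. The sketch below uses std::hash as a stand-in; how a real UDF derives the key and the ID it stores in m_piBucketID is up to the job, and bucketOf and partitionByBucket are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Bucket ID in [0, numBuckets) derived from a hash of the record key.
int bucketOf(const std::string& key, int numBuckets) {
    std::size_t h = std::hash<std::string>{}(key);
    return static_cast<int>(h % static_cast<std::size_t>(numBuckets));
}

// In-memory stand-in for routing: records with the same bucket ID end up
// in the same group, as they would end up in the same bucket file.
std::map<int, std::vector<std::string>> partitionByBucket(
        const std::vector<std::string>& keys, int numBuckets) {
    std::map<int, std::vector<std::string>> buckets;
    for (const auto& k : keys) buckets[bucketOf(k, numBuckets)].push_back(k);
    return buckets;
}
```

Because the ID depends only on the key, every record with a given key lands in one bucket, which is what makes the step comparable to the Reduce shuffle.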
User-Defined Function (UDF)
• int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
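UDF::SInput
struct SInput
{
    char* m_pcUnit;       // input data
    int m_iRows;          // number of records in m_pcUnit
    int64_t* m_pllIndex;  // offsets of the records in m_pcUnit
    char* m_pcParam;      // parameter passed to run()
    int m_iPSize;         // size of m_pcParam
};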
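UDF::SOutput
struct SOutput
{
    char* m_pcResult;        // result buffer
    int m_iBufSize;          // size of the m_pcResult buffer
    int m_iResSize;          // size of the result data
    int64_t* m_pllIndex;     // offsets of the result records
    int m_iIndSize;          // size of m_pllIndex
    int m_iRows;             // number of result records
    int* m_piBucketID;       // bucket ID of each result record
    int64_t m_llOffset;      // file position to restart from; -1 when done
    std::string m_strError;  // error message
};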
UDF::SOutput
• If m_pcResult or m_pllIndex is not large enough, resize it
• When processing a file, if the result is too large, set m_llOffset to the current file position; the UDF will then be called again to resume processing from m_llOffset, until the UDF sets m_llOffset to -1
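The restart protocol can be simulated in isolation. In the sketch below, MiniOutput is a cut-down stand-in for SOutput, copyLinesUdf stops once its result would exceed a byte limit and saves the resume position in m_llOffset, and runUntilDone plays the framework's role of calling it again until m_llOffset is -1. That the saved offset comes back to the UDF on the next call through the same field is an assumption; all three names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Cut-down stand-in for SOutput: only what this simulation needs.
struct MiniOutput {
    std::string m_result;     // stands in for m_pcResult / m_iResSize
    int64_t m_llOffset = 0;   // resume position; -1 when the file is done
};

// Copies whole lines of `file` into the result, starting at m_llOffset,
// and stops before the result would exceed `limit` bytes. At least one
// line is always emitted so each call makes progress.
void copyLinesUdf(const std::string& file, std::size_t limit, MiniOutput* out) {
    out->m_result.clear();
    std::size_t pos = static_cast<std::size_t>(out->m_llOffset);
    while (pos < file.size()) {
        std::size_t end = file.find('\n', pos);
        end = (end == std::string::npos) ? file.size() : end + 1;
        if (!out->m_result.empty() && out->m_result.size() + (end - pos) > limit) {
            out->m_llOffset = static_cast<int64_t>(pos);  // resume here
            return;
        }
        out->m_result.append(file, pos, end - pos);
        pos = end;
    }
    out->m_llOffset = -1;  // whole file processed
}

// Framework role: keep calling the UDF until it reports -1.
std::vector<std::string> runUntilDone(const std::string& file, std::size_t limit) {
    std::vector<std::string> chunks;
    MiniOutput out;
    do {
        copyLinesUdf(file, limit, &out);
        chunks.push_back(out.m_result);
    } while (out.m_llOffset != -1);
    return chunks;
}
```

For "aaa\nbbb\nccc\n" with an 8-byte limit, the first call returns the first two lines and stops at offset 8; the second call returns the last line and sets m_llOffset to -1.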
UDF::SFile
struct SFile
{
    std::string m_strHomeDir;
    std::string m_strLibDir;
    std::string m_strTempDir;
    std::set<std::string> m_sstrFiles;
};
• Results can be written to local files; put their paths into m_sstrFiles
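A UDF might write a side result to a local file and register it roughly as follows. Writing into m_strTempDir and recording an absolute path are assumptions, and writeLocalResult is a hypothetical helper.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <string>

// SFile as listed on the slide.
struct SFile {
    std::string m_strHomeDir;
    std::string m_strLibDir;
    std::string m_strTempDir;
    std::set<std::string> m_sstrFiles;
};

// Write `data` to <m_strTempDir>/<name> and record the path in m_sstrFiles.
// Returns false (and records nothing) if the file cannot be written.
bool writeLocalResult(SFile* file, const std::string& name, const std::string& data) {
    std::string path = file->m_strTempDir + "/" + name;
    std::ofstream out(path, std::ios::binary);
    out << data;
    out.close();
    if (!out) return false;
    file->m_sstrFiles.insert(path);
    return true;
}
```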
UDF
• _FUNCTION_.cpp
#include <sphere.h>
extern "C"
{
int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
{
    // process input, fill output
    return 0;  // assumed success code
}
}
• Compile it into a shared library file (e.g. FUNC.so)
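A complete toy UDF, compiled against local copies of the SInput, SOutput, and SFile layouts shown on the slides (normally provided by <sphere.h>). It upper-cases every input record. It assumes that input->m_pllIndex holds m_iRows + 1 offsets, that output buffers were allocated with new[] so the UDF may replace them when they are too small, and that 0 signals success; toUpper is a hypothetical operator name.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <set>
#include <string>

// Layouts copied from the slides; normally these come from <sphere.h>.
struct SInput { char* m_pcUnit; int m_iRows; int64_t* m_pllIndex;
                char* m_pcParam; int m_iPSize; };
struct SOutput { char* m_pcResult; int m_iBufSize; int m_iResSize;
                 int64_t* m_pllIndex; int m_iIndSize; int m_iRows;
                 int* m_piBucketID; int64_t m_llOffset; std::string m_strError; };
struct SFile { std::string m_strHomeDir, m_strLibDir, m_strTempDir;
               std::set<std::string> m_sstrFiles; };

extern "C" {
// Emits each input record upper-cased, with a matching output index.
// Offsets are taken relative to the first record, so this works whether
// the input index starts at 0 or at a file position.
int toUpper(const SInput* input, SOutput* output, SFile*) {
    const int rows = input->m_iRows;
    const int64_t base = input->m_pllIndex[0];
    const int64_t total = input->m_pllIndex[rows] - base;

    if (output->m_iBufSize < total) {          // grow the result buffer
        delete[] output->m_pcResult;
        output->m_pcResult = new char[total];
        output->m_iBufSize = static_cast<int>(total);
    }
    if (output->m_iIndSize < rows + 1) {       // grow the result index
        delete[] output->m_pllIndex;
        output->m_pllIndex = new int64_t[rows + 1];
        output->m_iIndSize = rows + 1;
    }

    output->m_pllIndex[0] = 0;
    for (int r = 0; r < rows; ++r) {
        const int64_t begin = input->m_pllIndex[r] - base;
        const int64_t end = input->m_pllIndex[r + 1] - base;
        for (int64_t i = begin; i < end; ++i)
            output->m_pcResult[i] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(input->m_pcUnit[i])));
        output->m_pllIndex[r + 1] = end;
    }
    output->m_iResSize = static_cast<int>(total);
    output->m_iRows = rows;
    return 0;
}
}
```

Because the output keeps one record per input record and an index of the same shape, the result could itself serve as input to a further Sphere stage.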
A Sphere Program
#include <dcclient.h>
Sector::init(master_ip, master_port);
Sector::login(…);
SphereStream input;
SphereStream output;
SphereProcess myProc;
myProc.loadOperator("func.so");
myProc.run(input, output, "func", 0);
SphereResult* result = NULL;
myProc.read(result);
myProc.close();
Sector::logout();
Sector::close();
Sphere Stream
• Input
vector<string> files;
files.insert(files.end(), "/html");
SphereStream s;
s.init(files);
• Output
SphereStream temp;
temp.setOutputPath("/result", "bucket");
temp.init(256);
Upload UDF and Related Files
• SphereProcess::loadOperator(path)
• Sends the UDF to all slaves selected for the current process
• Can also send any other files (applications, parameter data, etc.)
• On the slaves, the location of the uploaded files is given in SFile::m_strLibDir
Run a Sphere Process
• int run(const SphereStream& input, SphereStream& output, const string& op, const int& rows, const char* param = NULL, const int& size = 0);
• rows: number of rows to pass to the UDF each time
  • N > 0: N rows
  • 0: the whole segment
  • -1: the whole file
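The rows argument can be modeled as the sizes of the record batches the UDF receives. The sketch below takes the record counts of one file's segments; it only illustrates the three cases listed above, is not Sphere's scheduler, and assumes N-row batches never span two segments. udfCallSizes is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Number of records passed to each UDF call for one file, given the record
// count of each of its segments and the `rows` argument of run().
std::vector<int> udfCallSizes(const std::vector<int>& segmentRows, int rows) {
    std::vector<int> calls;
    if (rows == -1) {               // -1: the whole file in one call
        int total = 0;
        for (int n : segmentRows) total += n;
        calls.push_back(total);
    } else if (rows == 0) {         // 0: one call per segment
        calls = segmentRows;
    } else if (rows > 0) {          // N > 0: batches of at most N rows
        for (int n : segmentRows)
            for (int done = 0; done < n; done += rows)
                calls.push_back(n - done < rows ? n - done : rows);
    }
    // Any other value is treated as invalid here and yields no calls.
    return calls;
}
```

For a file split into segments of 10 and 5 records, rows = 4 gives calls of 4, 4, 2, 4, and 1 records under this model.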
Read Result and Check Progress
• SphereProcess::read(SphereResult*& res, const bool& inorder = false, const bool& wait = true);
• If the output stream is initialized with output.init(0), results are sent back to the client
• int checkProgress();