Trials and Tribulations: Archiving Electronic Records. Adam Jansen Digital Archivist Washington State Archives. If - Information is power… And - Records are storage of information Then – Records must be preserved for future generations. Records and Information or, Why we do what we do.
Trials and Tribulations: Archiving Electronic Records
Adam Jansen, Digital Archivist, Washington State Archives
Records and Information, or, Why we do what we do
• If: Information is power…
• And: Records are storage of information
• Then: Records must be preserved for future generations
Shifting Media
• Historically, records were stored on paper and kept in filing cabinets
• When the cabinet was full, records were sent to the file room
• Now records are stored electronically on computers
• When the computer is ‘full’, add more hard drives
• Basic skills to manage and maintain records have been lost, replaced by infinite storage
Higher Standards
• As electronic records become more integrated into society, producers of those records will be held to higher standards of conduct
• HIPAA
• SOx
• Federal and State Mandates
• Case Law
WA Public Records Laws
• As defined in RCW 40.14: ANY records that have been made by or received by any agency of the state of Washington in connection with the transaction of public business
Records Retention
• Any destruction of official public records shall be pursuant to a schedule approved under RCW 40.14
• Why? The foundation of democracy in America is government accountability to the people
So the question becomes… who takes care of the records, and do they have the knowledge?
Caretakers of Information
• Historically, records were sent to the file room; staff maintained access to records and managed their lifecycle based on need and legal requirements
• Now records are managed by users and IT staff, based on capacity and cost
• Neither group is trained in the ‘science of information management’
Why a Digital Archives?
• Comply with statutory & regulatory mandates: the law requires preservation of certain public records; it doesn’t specify whether those records are paper or electronic. All records must be given the same care.
• Avoid loss of legal & historical records: as technology changes, the older media (5 ¼” floppy disks, for instance) become harder to read.
• Centralize records: centralization means uniformity in maintenance; ‘trained professionals’ serve as caretakers.
• Preserve rare and ‘at-risk’ paper records
• Improved access for citizens: by centralizing historical electronic records in one location, ‘one-stop shopping’ will provide the information more quickly and easily.
What the Digital Archives is not
• Not mass storage for active business applications & data
• Not remote back-up for state & local government networks & data
The Digital Archives will:
• Preserve electronic records with long-term legal, historical and/or fiscal significance
• Assure platform-neutral retrieval 50, 100, or more years from now
• Provide security back-up of certain permanent electronic legal records (courts, vital records, land records, etc.)
Project History
• 2001 Session: Legislative approval (SSB 6155, 2001-2003 Capital Budget)
• January to September 2002: Building programming
• January 2003: Building construction begins
• September 2003: ISB technology review
• October 2004: Grand opening
• Q4 2006: Full implementation
Monies In and Out
• Primary funding source: $1 surcharge
• Expenditures: $14.5M joint-use facility; $1.5M technology acquisition; $950,000 software development
• Ongoing budget of $2.1M/year
Requirements to E-Archive
• Hardware
• Software
• Management
• Authenticity
Hardware
• File room of the 21st century
• Capacity and speed double every 18 months
• Many choices: tape, optical, spinning disc
• First Immutable Law of Digital Archiving: “What hardware you use today will be obsolete within four years”
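The doubling claim can be turned into a quick calculation. The sketch below takes the slide's 18-month figure at face value and assumes smooth exponential growth, which is a simplification; the function name is hypothetical and not part of the presentation.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Return how many times larger capacity is after `months`,
    assuming it doubles once every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    # Four years is the horizon named in the First Immutable Law.
    print(round(growth_factor(48), 2))
```

Under that assumption, hardware bought today is outclassed by a factor of roughly six after four years, which is one way to read the obsolescence rule of thumb.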
Digital Archives Hardware
• Network: Cisco backbone end to end, LAN and SAN
• EMC: SAN storage, 5 TB now, 20 TB by end of year
• HP: Servers and desktops
• ADIC: Tape library for offsite, disaster recovery
• Microsoft: Software and development w/EDS
Archival Software Formats
• Native
• ASCII
• TIF
• PDF/A
• XML
• Whenever possible seek the open, documented solution!
• Remember WordStar and dBase II?
File Formats
Digital Archives multi-pronged approach:
• Stored as BLOBs in DB with metadata: maintain native format, wrapped; create open file format version; render XML-formatted version, wrapped
• Acquire original hardware and software
Content Management
• Essential to maintain control of the information explosion
• Allows hard-coded rules and information exchange
• BUT still requires a strong knowledge, understanding and implementation of basic records management
• Second Immutable Law of Digital Archiving: “Data is data, a record is a record. It is the content that drives retention, not the media”
‘Content Management’
• Not true CM but rather archival storage and retrieval
• DoD 5015.2-STD compliant system
• Wrap original file in native format
• Wrap XML copy
• Apply metadata & XML for indexing, searching & retrieval
• Provide chain of custody & authenticity
‘Content Management’
• Microsoft solution
• Custom-coded .NET front end
• SQL Server back end
• BizTalk translation utility
• SSH Tectia for secure transport
Authenticity
• Maintain chain of custody
• In the care of trusted 3rd party
• Received from trusted, known source
Data Security
• Encrypted SSH FTP transmission
• Issue digital certificate
• Verify IP and computer information
• MD5 hash on all original files
• Copy of FTP on tape prior to ingestion
• DB backups on tape
• Record-level security for confidential info
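The "MD5 hash on all original files" step is a fixity check: the digest recorded at transfer time is compared with one recomputed later. Below is a minimal sketch using Python's standard `hashlib`; the function names and chunk size are illustrative assumptions, not the Digital Archives' actual code.

```python
import hashlib


def md5_fixity(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so large records do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex):
    """Return True if the file still matches the digest recorded earlier."""
    return md5_fixity(path) == expected_hex.strip().lower()
```

A sender would record `md5_fixity(path)` before transmission, and the receiving side would call `verify_fixity` after the transfer and again at later audits. MD5 is used here only because the slide names it; `hashlib` exposes other algorithms through the same interface.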
FTP Fingerprint
FTPUpload Date="8/23/2005 9:13:05 AM" NTUserName="temp" Domain="CRISPLUS" SFTPUserName="FranklinCoAuditor"
HostInformation WindowsVersion="Microsoft Windows NT 5.0.2195.0"
CPU ID="x86 Family 15 Model 2 Stepping 9, GenuineIntel" Level="15"
Local Area Connection:
    Connection-specific DNS Suffix . : annex.co.franklin.wa.us
    Description . . . . . . . . . . . : Intel(R) PRO/100 VE
    Physical Address. . . . . . . . . : 00-0D-60-3C-22-34
    DHCP Enabled. . . . : Yes
    Autoconfiguration Enabled . . . . : Yes
    IP Address. . . . . . . . . . . . : 172.30.7.39
    Subnet Mask . . . . . . . . . . . : 255.255.255.0
    DNS Servers . . . . . . . . . . . : 172.30.7.2, 198.239.73.3
    Primary WINS Server . . . . . . . : 172.30.7.2
    Secondary WINS Server . . . . . . : 198.239.73.3
Record Level Security
• Restrict records at item, field or series level
• Restrict to individual, dept, office or global
• Uses authenticated login to reveal fields
• Anonymous users see ‘Restricted’
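Field-level restriction as described above can be sketched as a masking step applied before a record is displayed. Everything in this example (the function name, the shape of the record, the set of authorized users) is a hypothetical illustration; the slides do not describe how the production system stores its restrictions.

```python
RESTRICTED = "Restricted"


def visible_record(record, restricted_fields, user=None, allowed_users=frozenset()):
    """Return a copy of `record` suitable for showing to `user`.

    Authenticated users listed in `allowed_users` see every field.
    Anyone else, including anonymous visitors (user=None), sees the
    word 'Restricted' in place of each restricted field's value.
    """
    if user is not None and user in allowed_users:
        return dict(record)
    return {
        field: (RESTRICTED if field in restricted_fields else value)
        for field, value in record.items()
    }
```

Restricting at the series or item level would follow the same pattern with the check applied to the whole record instead of to individual fields.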
Restricted Record Confidential
Ingestion Process
• MUST be flexible
• No mandate and 3300 agencies
• Microsoft BizTalk 2004
• Transforms, adds metadata based on business rules
• Creates ‘deep storage’ copy wrapping original file in XML, with hash
• Creates ‘web’ version of original file
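The ‘deep storage’ step, wrapping the original file in XML together with a hash, might look roughly like the sketch below. The element names (`Record`, `RecordCommon`, `OriginalFile`, `Fixity`) and the base64 encoding are assumptions made for illustration; the production system uses BizTalk and its own schema, which the slides do not show in full.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def wrap_for_deep_storage(filename, payload, metadata):
    """Wrap raw file bytes in an XML envelope with metadata and an MD5 hash.

    `metadata` maps element names to text values; keys must be valid
    XML element names (hypothetical examples: "Who", "When").
    """
    root = ET.Element("Record")
    common = ET.SubElement(root, "RecordCommon")
    for key, value in metadata.items():
        ET.SubElement(common, key).text = value
    # The original bytes are base64-encoded so any format survives inside XML.
    original = ET.SubElement(root, "OriginalFile", name=filename, encoding="base64")
    original.text = base64.b64encode(payload).decode("ascii")
    # The hash is computed over the original bytes, not the encoded text.
    fixity = ET.SubElement(root, "Fixity", algorithm="MD5")
    fixity.text = hashlib.md5(payload).hexdigest()
    return ET.tostring(root, encoding="unicode")
```

Because the hash travels inside the envelope, a later reader can decode `OriginalFile`, recompute the digest, and confirm the record is unchanged without consulting the database.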
BizTalk Predefined Pipelines
• Field-name variants: fname, firstname, First_Name, Fst_name, first
• Date variants: Jun-07-05, 07-Jun-05, 06/07/2005, 06/07/05, normalized to 06/07/2005
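The pipelines map each agency's field names and date formats onto one common form. The sketch below imitates that idea in Python using the variants shown on the slide; choosing `First_Name` as the canonical field name is an assumption, and the real transformations are BizTalk maps, not this code.

```python
from datetime import datetime

# Variants from the slide, lower-cased; the canonical name is an assumption.
FIELD_ALIASES = {
    "fname": "First_Name",
    "firstname": "First_Name",
    "first_name": "First_Name",
    "fst_name": "First_Name",
    "first": "First_Name",
}

# Tried in order; four-digit years are tested before two-digit years.
DATE_FORMATS = ("%b-%d-%y", "%d-%b-%y", "%m/%d/%Y", "%m/%d/%y")


def canonical_field(name):
    """Map a known field-name variant to its canonical form; pass others through."""
    return FIELD_ALIASES.get(name.strip().lower(), name)


def normalize_date(text):
    """Convert any recognized date variant to MM/DD/YYYY."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")
```

Month abbreviations such as `Jun` are parsed with `%b`, which depends on the process locale, so this sketch assumes an English locale.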
Deep Storage XML Schema
• Record Common: Who, What, When, Where, Original File, ‘web’ file, Security, Fixity
• Vital Records: Type; Birth (Date of; Father, Mother; Hospital)
Archive Database
• Designed around latest industry standards
• Open source, non-proprietary file storage
• Applies metadata ‘tags’ to save information about record creator, date, agency, subject, etc.
• Provides chain of custody & authenticity of record
• Allows search and retrieval of archival records through a web page
Web Design Wire Frame www.digitalarchives.wa.gov
Admin Pages
• Requires authenticated log-in
• Allows viewing of confidential information
• E-Transmittal process
• Viewing of open orders
Who’s Visiting?
• Average of over 300 visits per day
• Average length of stay: 9 minutes
• 6% .gov, 4% .edu, 1% .org
• 13% came from Internet search (Google, MSN, Yahoo)
• Visitors from: Canada, US Military, Romania, Germany, France, Australia, Japan, UK, Netherlands, Russia, Thailand, Portugal, Belgium, Poland, Italy, Indonesia, Singapore, Sweden, Mexico, New Zealand, Czech Republic, Hungary, Brazil, Norway, Colombia, Austria, Greece, Bulgaria, China, Yugoslavia, Philippines, Spain, South Korea, Denmark, Oman, Pakistan, South Africa, Jamaica, Switzerland
Risks
• Distributed, non-standardized environment
• No mandate to use Digital Archives
• Limited technology expertise in some agencies
• Unpredictable data growth rate
• Few business models
• Emerging technologies
• Limited internal expertise
Management Issues
• Authenticity of record
• Metadata
• File naming conventions
• Corporate culture
• Start small with e-mail, web page
• Use existing retention schedules
• Educate
• Shift AWAY from desktops
• Management software is a must!
• Privacy of sensitive data
Third Immutable Law
• “Anything that you do today will need a major overhaul in two years”
• Technology and industry are changing at unprecedented rates…
• But more records are ‘lost’ every day!
• Key is to be flexible and attack with forethought
Digital Archives
Eastern Washington University, Cheney, Washington
Adam Jansen, Digital Archivist
ajansen@secstate.wa.gov
Custom FTP Configuration
• Uses SSH Tectia client
• 128-bit encryption
• Ease of use
• Minimal user interaction/intervention
• Simple notification
• XML log file output
• Digital footprint
Notifications
• Minimal notification
• Minimal user interaction
• Notifications that are easy to understand
• Quick notification of errors
• Easy cleanup of sent files