1 / 51

Raw Data engine

Raw Data engine. Perl – a Generic Cure For a Perennial Problem. The Task. Get from a customer data files in his format + specs Create from these files data understood by ALEPH system. General Description. Input. Output. Manipulation. Input - Overview. Input. Record. Output.

jamar
Télécharger la présentation

Raw Data engine

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Raw Data engine Perl – a Generic Cure For a Perennial Problem

  2. The Task Get from a customer data files in his format + specs Create from these files data understood by ALEPH system Perl – a Generic Cure For a Perennial Problem

  3. General Description Input Output Manipulation Perl – a Generic Cure For a Perennial Problem

  4. Input - Overview Input Record Output Manipulation Record Record Record Record Record Perl – a Generic Cure For a Perennial Problem

  5. Input Structure Input field,field, field, field, field, field Output Manipulation Record Record Record Record Record Perl – a Generic Cure For a Perennial Problem

  6. Input Variety Input field,field, field, field, field, field field;field;field;field;field;field f1 f2 f3 f4 f5 Perl – a Generic Cure For a Perennial Problem

  7. Input - Multiline Input Rec name value name value name value …………….. Rec name value name value name value …………….. Perl – a Generic Cure For a Perennial Problem

  8. Make Input Uniform All input into a hash: %input_hash; $input_hash{f1} = f1_val; $input_hash{f2} = f2_val; Perl – a Generic Cure For a Perennial Problem

  9. Repeatable Fields 32137001535410 I0010.. 32137001535410 I0020.. 005 I0050.. 1992-07-29 I0060.. TB55 I0250.. 0004688683 I0300.. PERP I0330.. PERP I0360.. WCC3 I0390.. $***0.00 I0250.. 0004688684 I0300.. PERP I0330.. PERP I0360.. WCC3 I0390.. $***5.00 Perl – a Generic Cure For a Perennial Problem

  10. It’s a Perl Hash Some of %input hash entries become pointers to lists: x \%input_hash 0 HASH(0x1a7e8f0) 'I0010' => 32137001535410 'I0020' => 005 'I0050' => '1992-07-29' 'I0060' => 'TB55' 'I0250' => ARRAY(0x1c3e990) 0 0004688683 1 0004688684 'I0300' => ARRAY(0x1c3e96c) 0 'PERP' 1 'PERP' 'I0330' => ARRAY(0x1d40328) 0 'PERP' 1 'PERP' 'I0360' => ARRAY(0x1d4031c) 0 'WCC3' 1 'WCC3' 'I0390' => ARRAY(0x1d40388) 0 '$***0.00' 1 '$***5.00' Perl – a Generic Cure For a Perennial Problem

  11. Some Code… ## no value yet – simple scalar unless (defined $input_hash{$current_name}) { $input_hash{$current_name} = $value;}## already value for key else {## already list – just push if (ref $input_hash{$current_name}) {push @{$input_hash{$current_name}}, $value; }## only scalar – make list with existing + new else {my @lst = ($input_hash{$current_name}, $value); $input_hash{$current_name}= \@lst; } Perl – a Generic Cure For a Perennial Problem

  12. Output Let’s look at one typical output: • Each record is one simple string • Each field has a fixed width (Typical for file to be imported into DB) Perl – a Generic Cure For a Perennial Problem

  13. Output – Needed Structures To build a correct output record we need: • The values – value per field • Their correct order • Their correct format (I.e. witdh per field) Perl – a Generic Cure For a Perennial Problem

  14. Output Info Source How do we know what the output record looks like? • From DB ‘describe’ of relevant table • From Data Definition file (COBOL copy, C h file etc) • From some special purpose definition file Perl – a Generic Cure For a Perennial Problem

  15. Output From COBOL Copy 01 Z71. 02 Z71-REC-KEY. 03 Z71-SEQUENCE-TYPE PICTURE X(1). 03 Z71-DOC-NUMBER PICTURE 9(9). 03 Z71-COPY-ORDER-SEQUENCE PICTURE 9(5). 03 Z71-REC-SEQUENCE PICTURE 9(15). 02 Z71-USER-NAME PICTURE X(10). 02 Z71-OPEN-DATE PICTURE 9(8). 02 Z71-OPEN-HOUR PICTURE 9(4). 02 Z71-OPEN-SECOND PICTURE 9(4). 02 Z71-ACTION-DATE PICTURE 9(8). 02 Z71-TYPE PICTURE X(2). 02 Z71-ALPHA PICTURE X(1). 02 Z71-TEXT PICTURE X(200). 02 Z71-DATA PICTURE X(2000). 02 Z71-TRIGGERED PICTURE X(1). Perl – a Generic Cure For a Perennial Problem

  16. Field Order @z71_cols_copy = ( 'z71_sequence_type', 'z71_doc_number', 'z71_copy_order_sequence', 'z71_rec_sequence', 'z71_user_name', 'z71_open_date', 'z71_open_hour', 'z71_open_second', 'z71_action_date', 'z71_type', 'z71_alpha', 'z71_text', 'z71_data', 'z71_triggered' ); Perl – a Generic Cure For a Perennial Problem

  17. Output format $z71_format_copy = '%-1s%09s%05s%015s %-10s%08s%04s%04s%08s%-2s%-1s%-200s%-2000s%-1s'; A regular Perl format for (s)print based on the COPY info • Each character string (X(n)) is left justified • Each number is right justified ans zero padded. Perl – a Generic Cure For a Perennial Problem

  18. Writing Correct Output Now suppose we have built %zhash with values corresponding to the Z71 copy. The following code writes out a record foreachmy $f (@z71_cols_copy) {push @out_vals, $$zhash{$f};} ##sprintf receives the format and a list $out_str = sprintf($z71_format_copy , @out_vals);print $out_str . "\n"; Perl – a Generic Cure For a Perennial Problem

  19. Making It Generic $current_cols = $ztable . “_cols_copy”; *cols = *$current_cols; $current_format = $ztable . “_format_copy”; *format = *current_format; foreachmy $f (@cols) {push @out_vals, $$zhash{$f};} ##sprintf receives the format and a list $out_str = sprintf($format , @out_vals);print $out_str . "\n"; Perl – a Generic Cure For a Perennial Problem

  20. Output Sum up To Create • Read all relevant Copy files • For each znn table create znn_cols_copy, znn_format_copy • Use Data::Dumper to store in zvars_copy.txt Perl – a Generic Cure For a Perennial Problem

  21. Output Sum up To use: • do “zvars_copy.txt” • establish current $ztable • use glob * to get correct cols and format • write output Perl – a Generic Cure For a Perennial Problem

  22. Input – Simplifying Variety Define to which type belongs current input: • Fields separated by delimiter – specify delimiter, name fields • Fields with fixed widths – specify names and widths • Record multiline – specify new record, field format • Others (cvs, Excel etc.) Perl – a Generic Cure For a Perennial Problem

  23. File Type Definitions auth_kul_ext type fixed auth_kul_ext names d1:0-1;lnk:2-19;d2:20-40;val:41-70 ser_copy_data type fixed ser_copy_data names CNUM:8;LOCATION:6; ser_copy_data names COPY_NUM:5;SUPPLIER:12; item_data type delimited:; item_data names num;price;date cash_data type tag_multiline:rec=14;sep=2;tag=5;sep=3;val=* acq_order_data type tag_multiline:delim=> acq_order_data new_rec **STORE Perl – a Generic Cure For a Perennial Problem

  24. Input Implementation • one module-object per input type • Each has the new constructor • Each has a process_infile function for actual input reading • A module ChooseInputFormat.pm selects the right object and return a pointer to it. • new just does some preparations based on the type • process_infile reads the actual file • for each input record – builds %input_hash and calls build_output() received as argument Perl – a Generic Cure For a Perennial Problem

  25. Reading Delimited • names: n1;n2;n3;… @lst = split /;/, $names; ##map name to its position $ind = 0; foreach (@lst) { $lst_names{$_} = $ind; $ind++; } @vals = split /$delim/, $_; foreachmy $name (keys %lst_names) { ## take value of name from correct position my $val = $vals[$lst_names{$name}]; $input_hash{$name} = $val;} Perl – a Generic Cure For a Perennial Problem

  26. Reading Fixed my @lst = split /\s*;\s*/, $names;my $field_num = 0; $line_pat = "";foreachmy $lst (@lst) {my ($name, $width) = split /:/, $lst; ## position per name $lst_name{$name} = $field_num++; ## unpack for reading $line_pat .= 'A' . $width . ' '; } my %input_hash = (); my @lst = unpack ($line_pat, $_);foreachmy $name (keys %lst_name) { $input_hash{$name} = $lst[$lst_name{$name}]; } Perl – a Generic Cure For a Perennial Problem

  27. From In To Out Manipulation Out File In File input_hash zhash ??? Perl – a Generic Cure For a Perennial Problem

  28. General Look Z30-DOC-NUMBER sequence[,1,9] Z30-ITEM-SEQUENCE sequence[docno,10,6] Z30-SUB-LIBRARY PERM_LOC[1:4][sub_lib_map] Z30-MATERIAL MATERIAL[material_map] Z30-ITEM-STATUS "09" Z30-OPEN-DATE INVENTORY_DATE[format_date] Z30-CALL-NO CALL_NO Z30-PRICE PRICE[decimal] Z30-NOTE-OPAC NOTE1 & “;” & NOTE2 Z30-NOTE-CIRCULATION NOTES[substring_on_word:200:1] Z30-NOTE-INTERNAL NOTES[substring_on_word:200:2] Perl – a Generic Cure For a Perennial Problem

  29. Building zhash – Syntax 1 Z30-ITEM-STATUS "09" Build Z30-ITEM-STATUS from literal “09” Z30-CALL-NO CALL_NO Build Z30-CALL-NO from input field CALL_NO as is Z30-SUB-LIBRARY PERM_LOC[1:4][sub_lib_map] Build Z30-SUB-LIBRARY from input field PERM_LOC, take first 4 characters, then map result using sub_lib_map Z30-PRICE PRICE[decimal] Build Z30-PRICE from input PRICE, extract just the decimal portion Perl – a Generic Cure For a Perennial Problem

  30. Building zhash – Syntax 2 Z30-NOTE-OPAC NOTE1 & “;” & NOTE2 To build Z30-NOTE-OPAC concatenate input NOTE1, then “;”, then input NOTE2 Z30-DOC-NUMBER sequence[,1,9] Build a sequence with delta 1, width 9, leading zeroes Z30-ITEM-SEQUENCE sequence[docno,10,6] Build a sequence with delta 10, width 6, leading zeroes, restart the sequence for each input docno with differen values. 3 input records with docno=500: 000010,000020,000030 2 input records with docno=732: 000010,000020 Perl – a Generic Cure For a Perennial Problem

  31. Implementation - 1 Implementing the functions is done in 2 levels: • Group type • Using modifiers Types: • Just a literal • An input field – no modifiers • An input field with modifiers(s) • Concatenation (Each element may include modifiers) • Various sequences Perl – a Generic Cure For a Perennial Problem

  32. Implementation - 2 For each output field we build an info hash, that includes: • the pointer to the function to call • the target • the source • other info – needed per specific function All hashes are pushed into a list Perl – a Generic Cure For a Perennial Problem

  33. The Functional Backbone $f = “ZbuildOutFuncs::'$func”; $handle_funcs[$func_ind]{func} = \&{$f}; $handle_funcs[$func_ind]{target} = $target; $handle_funcs[$func_ind]{source} = $source; $handle_funcs[$func_ind]{…} = …; Perl – a Generic Cure For a Perennial Problem

  34. Actually Building zhash foreachmy $handle (@handle_funcs) { my $res = $handle->{'func'}($handle, \%zhash, $input_hash); } Each function has something like: $target = $handle->{target}; ## What to build $source = $handle->{source}; ## How to build my $res; ………. evaluates… ## Build ………. $$zhash{$target} = $res; ## Prepare for output return $res; Perl – a Generic Cure For a Perennial Problem

  35. Getting Values sub get_gen_field_val { my ($my_input_hash, $column) = @_; my ($org_val, $field, $modifiers); if (defined $$input_hash{$column}) { $org_val = $$input_hash{$column}; return $org_val; } if ($column =~ /^_\w+$/) { $org_val = $$zhash{$column}; return $org_val; } ($org_val, $field, $modifiers) = analyze_complex($column); $org_val = get_modified_val($org_val, $modifiers); return $org_val; } Perl – a Generic Cure For a Perennial Problem

  36. Modifying a Value sub get_modified_val { my ($org_val, $modifiers) = @_; my $non_square = '([^\[\]]+)'; my @mods = $modifiers =~ /$non_square/go; foreach my $mod (@mods) { next if $mod =~ /^\s*$/; my ($type, $action) = $mod =~ /^(\w+):(.*)/; if ($type) { $org_val = $mod_functions{$type}->($org_val, $action); } else { exit_on_err(); } } return $org_val; } Perl – a Generic Cure For a Perennial Problem

  37. Simple Modifier - uppercase sub mod_uppercase {my ($val, $action) = @_;returnuc $val;} What if $val is list ?? Perl – a Generic Cure For a Perennial Problem

  38. Handling Lists sub handle_list {my ($val, $action, $func) = @_;my @val_lst;foreachmy $v (@$val) {my $res = $func->($v, $action);push @val_lst, $res; }return \@val_lst;} Perl – a Generic Cure For a Perennial Problem

  39. Modifier with List sub mod_uppercase {my ($val, $action) = @_;return handle_list($val, $action, \&mod_uppercase) ifref $val; returnuc $val;} Perl – a Generic Cure For a Perennial Problem

  40. regex Definition regex_check my_reg /\d{4}\D\d{4}/ Code sub build_regex_check { my ($name, $regex) = $_ =~ /(\w+)\s+(.*)/; my $code_src = "sub {my \$val=shift;my \$res=\$val=~$regex;return \$res}"; my $code = eval $code_src; } Perl – a Generic Cure For a Perennial Problem

  41. Running Index We need several running indexes. Maybe the easiest way to do it: sub build_running_index{ my $cnt = 0; return sub { return ++$cnt; } } Each instance: • starts from 0 • increments by 1 Perl – a Generic Cure For a Perennial Problem

  42. Using it $cnt1 = build_running_index();$cnt2 = build_running_index();$res1 = &$cnt1();$res1 = &$cnt1();$res2 = &$cnt2();$res1 = &$cnt1();print"c1: $res1 c2: $res2\n";sub build_running_index{ my $cnt = 0;returnsub {return ++$cnt; }}output: c1: 3 c2: 1 Perl – a Generic Cure For a Perennial Problem

  43. More General • Different start • Different delta sub build_own_index_gen{ my ($cnt, $delta) = @_;returnsub {return $cnt += $delta; }} Perl – a Generic Cure For a Perennial Problem

  44. Usage $cnt1 = build_own_index_gen(0, 3);$cnt2 = build_own_index_gen(5,10);$res1 = &$cnt1();$res2 = &$cnt2();$res1 = &$cnt1();$res1 = &$cnt1();print"c1: $res1 c2: $res2\n";sub build_own_index_gen{ my ($cnt, $delta) = @_;returnsub {return $cnt += $delta; }} output: c1: 9 c2: 15 Perl – a Generic Cure For a Perennial Problem

  45. Timer - 1 We need several timers We also need a total timer sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem

  46. Timer Timer for differences sub get_time_diff { my $tls = time(); return sub { my $tle = time(); my $tlt = $tle - $tls; $tls = $tle; return $tlt; } } Perl – a Generic Cure For a Perennial Problem

  47. Timer - Usage $t1 = get_time_diff(); $t2 = get_time_diff(); sleep(1); $r = $t1->(); $r1 = $t2->(); print “r: $r; r1: $r1\n"; sleep(1); $r = $t1->(); $r1 = $t2->(); print “r: $r1; r1: $r1\n"; Results: r: 1; r1: 1r: 1; r1: 1 Perl – a Generic Cure For a Perennial Problem

  48. Timer 2 Timer with total sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem

  49. Timer 2 - Usage $t1 = get_time_diff(); $t2 = get_time_diff(); sleep(1); $r = $t1->(); $r1 = $t2->(1); print “r: $r; r1: $r1\n"; sleep(1); $r = $t1->(); $r1 = $t2->(1); print “r: $r1; r1: $r1\n"; Results: r: 1; r1: 1r: 1; r1: 2 Perl – a Generic Cure For a Perennial Problem

  50. Timer We need several timers We also need a total timer sub get_time_diff { my $tls = time(); return sub { my $start = shift; my $tle = time(); my $tlt = $tle - $tls; $tls = $tle unless $start; return $tlt; } } Perl – a Generic Cure For a Perennial Problem

More Related