EOVSA EST DPP Testing
J. McTiernan
EOVSA Prototype Review, 24-Sep-2012
Testing the DPP with EST data:
• The DPP is a multi-threaded process, using OpenMP, written in C and FORTRAN. Input is correlator data (adding state frame data is TBD). Outputs are visibility datasets in MIRIAD format. The different processing steps run simultaneously, but can be thought of in sequence. Here is how the DPP test program for EST UDP data (DPP_test_est12) works.
• Setup: There are four structures, each shared as a FORTRAN common block and a C global external structure:
  • Dppcorin holds the input packet data.
  • Pkt_register_in holds packet header information (packet number, accumulation number, and packet address in the packet buffer Dppcorin).
  • Pkt_register holds packet information, sorted by packet number (1-8) and accumulation number.
  • Procflags holds flags and pointers for buffer and packet buffer processing.
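A minimal C sketch of how the four shared structures might be laid out is below. The slides give no field names, array sizes, or types, so everything here (NBUF = 4, PKT_WORDS, the field names, unsigned-byte flags, and the helpers dpp_init and dpp_slot_address) is a hypothetical illustration, and indices are 0-based rather than the 1-based positions used in the slides. In the real program these would be `extern` declarations matching the FORTRAN common blocks; with gfortran, for example, a common block /dppcorin/ is by default visible from C as the global symbol `dppcorin_`, and the C layout must match the FORTRAN declaration exactly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define NBUF       4               /* accumulations held in the buffers    */
#define NPKT       8               /* packets per accumulation             */
#define NSLOT      (NBUF * NPKT)   /* N = NBUF*8 packet positions          */
#define PKT_WORDS  16              /* 4-byte integers per packet (assumed) */
#define FLAG_EMPTY 255             /* initial value of every flag          */

/* Dppcorin: the raw input packet data. */
struct dppcorin_t {
    int32_t data[NSLOT * PKT_WORDS];
};

/* Pkt_register_in: header info, stored in input (arrival) order. */
struct pkt_register_in_t {
    int32_t pkt_num[NSLOT];   /* 1..NPKT */
    int32_t acc_num[NSLOT];   /* accumulation number */
    int32_t address[NSLOT];   /* index of the packet's first word in Dppcorin */
    uint8_t flag[NSLOT];      /* 255 empty, 0 filled, 1 ordered */
};

/* Pkt_register: the same info, sorted by accumulation and packet number. */
struct pkt_register_t {
    int32_t acc_num[NSLOT];
    int32_t address[NSLOT];
    uint8_t flag[NSLOT];      /* 255 empty, 0 filled */
};

/* Procflags: one processing flag per accumulation, plus per-thread state. */
struct procflags_t {
    uint8_t proc_flag[NBUF];  /* 255 pending, 1 dataframe done, 2 spectra done */
    int32_t position[5];      /* one "pointer" per thread (assumed layout) */
    int32_t done_flag[5];     /* one done_flag per thread */
};

struct dppcorin_t        dppcorin;
struct pkt_register_in_t pkt_register_in;
struct pkt_register_t    pkt_register;
struct procflags_t       procflags;

/* Clear everything, then set every flag to 255 before the threads start. */
void dpp_init(void)
{
    memset(&pkt_register_in, 0, sizeof pkt_register_in);
    memset(&pkt_register, 0, sizeof pkt_register);
    memset(&procflags, 0, sizeof procflags);
    memset(pkt_register_in.flag, FLAG_EMPTY, sizeof pkt_register_in.flag);
    memset(pkt_register.flag, FLAG_EMPTY, sizeof pkt_register.flag);
    memset(procflags.proc_flag, FLAG_EMPTY, sizeof procflags.proc_flag);
}

/* Address of a packet slot: the index of its first 4-byte integer. */
int32_t dpp_slot_address(int slot)
{
    assert(slot >= 0 && slot < NSLOT);
    return slot * PKT_WORDS;
}
```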
[Figure: packet buffer of N = NBUF*8 packet positions, Pkt1 … PktN]
Threads 1 and 2) Input: two streams of packet data. The streams are created by an IDL process that reads in one file and randomly decides which of two output files each packet goes into. These two files are then used as inputs to the DPP_test_est12 program. The C routine est_read1 reads packets from one file, and est_read2 reads from the other. Data from est_read1 is stored in the "top half" of the packet buffer, i.e., positions 1 to N/2; data from est_read2 is stored in the "bottom half", positions N/2+1 to N. (This may change to even/odd positions in the future.) Packet header information is stored in another common block, pkt_register_in, in the same order as the input packets. Inputs are timed so that approximately 16 msec elapses per accumulation. Each position in pkt_register_in has a flag, which starts at 255 and is set to zero when the position is filled. When its half-buffer is full, each input process loops back to the start of that half.
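The half-buffer scheme for threads 1 and 2 can be sketched as a single-threaded simulation with 0-based slots: stream 1 owns slots 0 to N/2-1 and stream 2 owns N/2 to N-1. The function store_packet, the arrays standing in for Dppcorin and pkt_register_in, and the sizes are all hypothetical; the 16 msec pacing and the file reading done by est_read1 and est_read2 are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, for illustration only. */
#define NBUF      4
#define NPKT      8
#define NSLOT     (NBUF * NPKT)
#define HALF      (NSLOT / 2)
#define PKT_WORDS 16

static int32_t buffer[NSLOT * PKT_WORDS];           /* stands in for Dppcorin */
static int32_t reg_pkt[NSLOT], reg_acc[NSLOT], reg_addr[NSLOT];
static uint8_t reg_flag[NSLOT];                     /* stands in for pkt_register_in flags */

/* Per-stream write cursor within that stream's half of the buffer. */
static int cursor[2] = {0, 0};

/* Store one packet arriving on `stream` (1 or 2); return the slot used. */
int store_packet(int stream, int pkt_num, int acc_num, const int32_t *words)
{
    assert(stream == 1 || stream == 2);
    int base = (stream == 1) ? 0 : HALF;            /* top or bottom half */
    int slot = base + cursor[stream - 1];

    memcpy(&buffer[slot * PKT_WORDS], words, PKT_WORDS * sizeof *words);
    reg_pkt[slot]  = pkt_num;                       /* header info, input order */
    reg_acc[slot]  = acc_num;
    reg_addr[slot] = slot * PKT_WORDS;
    reg_flag[slot] = 0;                             /* 0 = position filled */

    /* When this half is full, loop back to its start. */
    cursor[stream - 1] = (cursor[stream - 1] + 1) % HALF;
    return slot;
}
```

Because each stream writes only to its own half, the two input threads never contend for a slot, which is one reason such a split is convenient.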
[Figure: Pkt_register_in, packets in arrival order (Pkt3 Pkt4 Pkt6 Pkt2 Pkt5 Pkt7 Pkt8 Pkt1 …)]
Thread 3) The subroutine dpp_fill_register orders the packets. It reads the pkt_register_in common block in input order, starting with positions 1 and N/2+1, and then fills another common block, pkt_register, with the packet information in order (i.e., for each accumulation number in turn, packets 1 to 8 in order). There is a wait time associated with this step (currently 100 msec): if a flag in pkt_register_in does not go to zero within this time, we move on. Each position in pkt_register also has a flag, which starts at 255 and is set to zero when the position is filled. In addition, the flag for that packet in pkt_register_in is changed from 0 to 1, to denote that the packet has been ordered. For EOVSA, the state frame interpretation (assigning times and UV values to accumulation numbers) could be included here. Once we have gone through the pkt_register_in buffer, we loop back to the start and go again.
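A sketch of one dpp_fill_register pass follows, assuming that the ordered position of a packet is (accumulation number mod NBUF) * 8 + (packet number - 1), which is one way to get "for each accumulation in turn, packets 1 to 8". The array and function names are hypothetical, indices are 0-based, and the 100 msec wait is left out: a slot whose flag is still 255 is simply skipped, as if its wait had already expired.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define NBUF       4
#define NPKT       8
#define NSLOT      (NBUF * NPKT)
#define HALF       (NSLOT / 2)
#define FLAG_EMPTY 255

/* Stand-in for pkt_register_in (input order). */
int32_t in_pkt[NSLOT], in_acc[NSLOT], in_addr[NSLOT];
uint8_t in_flag[NSLOT];   /* 255 empty, 0 filled, 1 ordered */

/* Stand-in for pkt_register (accumulation order, packets 1..8). */
int32_t out_acc[NSLOT], out_addr[NSLOT];
uint8_t out_flag[NSLOT];  /* 255 empty, 0 filled */

/* Ordered slot for packet pkt_num (1..8) of accumulation acc_num. */
static int ordered_slot(int acc_num, int pkt_num)
{
    assert(acc_num >= 0 && pkt_num >= 1 && pkt_num <= NPKT);
    return (acc_num % NBUF) * NPKT + (pkt_num - 1);
}

/* Copy input slot i to its ordered slot if it is filled but not yet ordered. */
static int order_one(int i)
{
    if (in_flag[i] != 0)          /* 255: not filled yet; 1: already ordered */
        return 0;
    int j = ordered_slot(in_acc[i], in_pkt[i]);
    out_acc[j]  = in_acc[i];
    out_addr[j] = in_addr[i];
    out_flag[j] = 0;              /* ordered position is now filled */
    in_flag[i]  = 1;              /* mark the input packet as ordered */
    return 1;
}

/* One pass over both halves in input order, starting at slots 0 and HALF.
 * Returns the number of packets ordered in this pass. */
int fill_register_pass(void)
{
    int n = 0;
    for (int i = 0; i < HALF; i++)
        n += order_one(i) + order_one(HALF + i);
    return n;
}
```

Calling fill_register_pass again is harmless: packets already ordered carry flag 1 and are skipped.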
[Figure: Pkt_register_in to Pkt_register. Packets arriving from Input 1 and Input 2 in the order Pkt1 Pkt3 Pkt4 Pkt6 Pkt2 Pkt5 Pkt7 Pkt8 are reordered as Pkt1 Pkt2 Pkt3 Pkt4 Pkt5 Pkt6 Pkt7 Pkt8.]
[Figure: Pkt_register, packets in order, Pkt1 through Pkt8]
Thread 4) With the packets ordered, the next process, dpp_process_1, loops through pkt_register and checks the flags for each accumulation. When all of the flags for an accumulation are zero, the packet addresses (the position in the packet buffer of the first 4-byte integer of each packet) are saved to an array. There is a wait time associated with this step (currently 1 second): if the flags are not all zero after this time, we move to the next accumulation. Otherwise, we send the packet addresses to the subroutine DPP_PROCESS_DATAFRAME, G. Hurford's program that processes one accumulation. There is another flag associated with this processing: it starts at 255 and is set to 1 if the processing succeeds; otherwise it is set to 255.
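The completeness check in dpp_process_1 might look like the sketch below. The slides do not give the signature of DPP_PROCESS_DATAFRAME, so the sketch stops at gathering the eight addresses and recording the result; collect_accumulation, record_dataframe_result, and the array names are hypothetical, and the 1-second wait is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes and flag values, for illustration only. */
#define NBUF                4
#define NPKT                8
#define FLAG_EMPTY          255
#define PROC_DATAFRAME_DONE 1

int32_t reg_addr[NBUF * NPKT];  /* pkt_register: packet addresses       */
uint8_t reg_flag[NBUF * NPKT];  /* pkt_register: 0 = packet present     */
uint8_t proc_flag[NBUF];        /* Procflags: one flag per accumulation */

/* If all NPKT packets of accumulation slot a are present (flags all 0),
 * copy their addresses into addrs and return 1; otherwise return 0. */
int collect_accumulation(int a, int32_t addrs[NPKT])
{
    assert(a >= 0 && a < NBUF);
    for (int p = 0; p < NPKT; p++) {
        if (reg_flag[a * NPKT + p] != 0)
            return 0;                /* a packet is still missing */
        addrs[p] = reg_addr[a * NPKT + p];
    }
    return 1;
}

/* Record the outcome of DPP_PROCESS_DATAFRAME for accumulation slot a:
 * 1 on success, 255 on failure. */
void record_dataframe_result(int a, int success)
{
    assert(a >= 0 && a < NBUF);
    proc_flag[a] = success ? PROC_DATAFRAME_DONE : FLAG_EMPTY;
}
```

In the processing loop, a call would look like `if (collect_accumulation(a, addrs)) record_dataframe_result(a, ok);`, where `ok` is the success status from DPP_PROCESS_DATAFRAME in whatever form the real routine reports it.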
[Figure: Flag Buffer of size NBUF, one flag per accumulation, positions 1 … NBUF]
Thread 5) The next process, dpp_process_2, loops through the process flag array and checks for a flag value of 1. There is a wait time associated with this step (currently 1 second): if the flag is not 1 after this time, we move to the next accumulation. Otherwise, we call DPP_PROCESS_SPECTRAFRAME, G. Hurford's program that currently processes one accumulation (but for EOVSA operations will process 1 second of data). If the processing succeeds, the flag is set to 2; otherwise it is set to 255. For EST data, the output program est_data_write, which calls MIRIAD routines to open and write the data, is called in this step. For EOVSA, output will be farmed out to a separate process.
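The timed waits used in threads 3 to 5 can be illustrated with the dpp_process_2 flag check. This sketch assumes POSIX clock_gettime/nanosleep and C11 atomics for the shared flags (plain, unsynchronized reads and writes of a flag shared between threads are a data race in C); wait_for_flag, record_spectra_result, and the 1 msec polling interval are hypothetical choices, not taken from the DPP code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sizes and flag values, for illustration only. */
#define NBUF                4
#define FLAG_EMPTY          255
#define PROC_DATAFRAME_DONE 1
#define PROC_SPECTRA_DONE   2

_Atomic uint8_t proc_flag[NBUF];   /* Procflags: one flag per accumulation */

/* Current monotonic time in seconds. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Poll until proc_flag[a] == want or timeout_s elapses.
 * Returns 1 if the flag reached the wanted value, 0 on timeout. */
int wait_for_flag(int a, uint8_t want, double timeout_s)
{
    assert(a >= 0 && a < NBUF);
    const struct timespec nap = {0, 1000000};   /* 1 ms between polls */
    double deadline = now_s() + timeout_s;
    while (atomic_load(&proc_flag[a]) != want) {
        if (now_s() >= deadline)
            return 0;                           /* give up; move to the next accumulation */
        nanosleep(&nap, NULL);
    }
    return 1;
}

/* Record the outcome of DPP_PROCESS_SPECTRAFRAME: 2 on success, 255 on failure. */
void record_spectra_result(int a, int success)
{
    assert(a >= 0 && a < NBUF);
    atomic_store(&proc_flag[a], success ? PROC_SPECTRA_DONE : FLAG_EMPTY);
}
```

For thread 5, the per-accumulation step would be: if `wait_for_flag(a, PROC_DATAFRAME_DONE, 1.0)` succeeds, call DPP_PROCESS_SPECTRAFRAME and pass its success status to record_spectra_result.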
Testing the DPP with EST data:
• This process currently works. At least, it produces output that uvlist can read, and the amplitudes and phases correspond to what is expected from examining the visibility output written to text files.
• When multi-threading, it turns out that flagging alone is not sufficient to avoid segmentation faults. So in addition to a flag value, each process gets a pointer, to ensure that the following process never catches up with it.
• There is no check that processing of a given packet is finished before its position in the packet buffer is overwritten. It is assumed that processing 20 msec of data will take less than 20 msec (hopefully much less). We can split processes if necessary; e.g., one process could do the even buffer positions and another the odd ones, or three processes could each take every third buffer position, etc.
• Each process has a done_flag, so that every process can tell when to stop. Each process checks the done_flag of the process before it before starting a new loop through the buffers. Processing ends when the last process has gone through the buffers twice after the previous process ends.
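One way to read the "pointer" and done_flag rules is sketched below: each stage publishes a running count of the buffer positions it has finished, and a stage may start position p only when the stage before it has finished p. The counters never wrap (the buffer index would be p mod N), so the comparison stays valid across loops through the buffers. This is an assumption about how the pointers work, not the DPP's code: the two input threads are treated as a single stage for simplicity, applying the "two more passes" stopping rule to every stage is also an assumption, and, as in the slides, nothing here stops the input stage from overwriting positions still being processed.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical: the two input threads are merged into stage 0, then
 * stage 1 = dpp_fill_register, 2 = dpp_process_1, 3 = dpp_process_2. */
#define NSTAGE 4

/* Each stage's "pointer": how many buffer positions it has finished.
 * Counters never wrap; the buffer index would be position % N. */
atomic_long stage_pos[NSTAGE];
atomic_int  done_flag[NSTAGE];

/* Stage k may start position p only after stage k-1 has finished it,
 * so a later stage can never overtake the one before it. */
int may_start(int k, long p)
{
    assert(k > 0 && k < NSTAGE);
    return p < atomic_load(&stage_pos[k - 1]);
}

/* Stage k publishes that it has finished position p. */
void finish_position(int k, long p)
{
    assert(k >= 0 && k < NSTAGE);
    atomic_store(&stage_pos[k], p + 1);
}

/* Stage k stops once the stage before it has set its done_flag and
 * stage k has since made two full passes through the buffers. */
int should_stop(int k, int passes_since_upstream_done)
{
    assert(k > 0 && k < NSTAGE);
    return atomic_load(&done_flag[k - 1]) != 0 && passes_since_upstream_done >= 2;
}
```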
Hardware?
• The final version should have 7 threads, so a large number of processors is a good idea.
• The current version runs well on an 8-processor machine (2 quad-core processors), but there was a significant slowdown when going from 4 to 5 threads. We guess this is because one of the quad-core processors was also running system tasks, but we really don't know.
• In any case, we would like at least a 16-processor system (2 octo-core processors) with 32 to 64 GB of RAM. (Note that my current desktop, with 8 processors and 32 GB of RAM, cost $5000 in 2009.)