I/O Performance Lessons
Disclaimer
• I won’t be mentioning the good things here.
• In hindsight, everything looks obvious.
• No plan survives first contact with the data intact.
Design problems
• Transient/persistent (T/P) separation is way too complex for an average HEP physicist turned programmer
  • This led to a copy/paste approach; even good documentation can’t help that
  • Code bloat
• Very difficult to remove obsolete persistent classes/converters
• Needed tools were added late: custom compressions, DataPool; only now working on error-matrix compression
• No unit-test infrastructure
  • Should have had a way to create a “full” object
  • Should have forced a loopback t-p-t (transient → persistent → transient) test
• No central place to control what is written out
• No tools for automatic code generation
  • Fabrizio spent 3 months just fixing part of the Trigger classes
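The “forced loopback t-p-t” point deserves a concrete shape. A minimal sketch of such a unit test, with a hypothetical `Track` class and converter functions standing in for real transient/persistent classes (none of these names are actual ATLAS code):

```python
from dataclasses import dataclass

# Hypothetical transient class: what algorithms work with in memory.
@dataclass
class Track:
    pt: float
    eta: float
    phi: float

def to_persistent(t: Track) -> dict:
    # Stand-in for a T/P converter's "write" direction: flatten to
    # plain persistent fields.
    return {"pt": t.pt, "eta": t.eta, "phi": t.phi}

def from_persistent(p: dict) -> Track:
    # The "read" direction: rebuild the transient object.
    return Track(**p)

def test_tp_loopback():
    # Create a "full" object with every field set, then force the
    # t -> p -> t round trip and demand nothing was lost.
    original = Track(pt=42.0, eta=-1.2, phi=0.7)
    restored = from_persistent(to_persistent(original))
    assert restored == original, "t-p-t loopback lost information"

test_tp_loopback()
```

Run for every persistent class, such a test catches converters that silently drop or mangle fields long before the data hits disk.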
Management problems
• At least some performance tests should have been done before full system deployment, at minimum to understand what affects performance.
• Trying to understand, after the fact, changes in what was written out is not how things should go.
• Way too many tools (ARA, mana, EventLoop, …)
  • No real support
  • Code bloat
  • Opportunity cost
• Still there is no single good tool
  • that people would be happy to use
  • that would give us the possibility to monitor and optimize
• Starting to think about analysis metadata storage and access one year after data taking started is a bit late.
Management problems
• No recommended way to do analysis
  • Waiting to see what people would do is not the best idea. People can’t know whether their approach will scale or not.
  • We can’t test all approaches, and we surely can’t optimize sites for all of them.
[Diagram: analysis chain — group production: AOD → D3PDmaker → NTUPLE → D3PD → skim/slim/download → PROOF-based analysis or simple ROOT analysis; stages marked Grid (G) or Local (L), CPU-bound or I/O-bound]
DPD problems
• Having thousands of simple variables in DPD files is just so… not OO.
• DPDs are expensive to produce
  • Train-based production will alleviate the problem
  • If train production will not use the TAG DB, the TAG DB should be dropped.
• Probably way too large, and difficult to overhaul. If we have problems finding out whether an AOD collection is used or not, the problem is 10 times bigger with DPDs.
• Files are too small
  • Should/could be merged using the latest ROOT version
  • Even worse with skimmed/slimmed ones. No simple grid-based merge tool?
• No single, generally used framework to read them
• In half a year it will be way too late to start thinking about all of this, as people will already be used to their hacked-together but working tools.
DPD problems
• Current situation
  • local disk
  • ROOT 5.28.00e
  • A lot of room for improvement!
• Improvements to come
  • Proper basket-size optimization
  • Too many reads even with TTreeCache (TTC)
  • Multi-tree TTC
• Now so much better that we are becoming HDD-seek limited, even when 100% of the data is read
  • Two jobs, or one read/write job, bring CPU efficiency down to an unacceptable level
  • Calculations typically done in analysis jobs won’t hide disk latency on a 4- or 8-core CPU
  • Needs better file organization
  • Even reordering existing files would make sense
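A back-of-the-envelope model shows why uncached reads are seek-limited and why TTreeCache helps so much. All numbers below are illustrative assumptions, not measurements: without a cache each branch basket costs roughly one seek, while with TTC one large read serves a whole cluster of baskets for all branches.

```python
# Toy model of HDD-limited TTree reading (illustrative numbers only).
SEEK_TIME_S = 0.008        # ~8 ms average seek on a desktop HDD
BANDWIDTH_BPS = 100e6      # ~100 MB/s sequential throughput

def read_time(total_bytes, n_reads):
    """Wall time = one seek per read + streaming the bytes."""
    return n_reads * SEEK_TIME_S + total_bytes / BANDWIDTH_BPS

# A hypothetical 2 GB DPD with 600 branches and 500 basket clusters.
branches, clusters, file_bytes = 600, 500, 2e9

# Without TTreeCache: roughly one read per basket (branch x cluster).
t_no_cache = read_time(file_bytes, branches * clusters)

# With TTreeCache: roughly one large read per cluster.
t_cache = read_time(file_bytes, clusters)

print(f"no cache: {t_no_cache:7.1f} s")   # dominated by seeks
print(f"with TTC: {t_cache:7.1f} s")      # dominated by bandwidth
```

In this toy model the cached read is nearly two orders of magnitude faster; the same arithmetic also shows why, once TTC is on, the seeks from a second concurrent job become the new bottleneck.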
No efficiency feedback
• Up to now we had no resource contention. That’s changing.
• People would not mind running faster and more efficiently, but they have no idea how good or bad they are, or what to change.
• Will someone see the effect of not turning on TTC in a grid-based job? Not likely. Consequently they won’t turn it on.
• I don’t know how to optimally split a task. Do you?
• This can be changed relatively easily: once a week, send mail to all people who used the grid, saying what they consumed, how efficient they were, and what they can do to improve.
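The weekly mail could be driven by a simple per-user aggregation over job accounting records. A minimal sketch, assuming hypothetical job records with CPU time, wall time, and a TTC flag (the record format and thresholds are inventions for illustration):

```python
# Sketch of the weekly efficiency report: aggregate per-user job
# records and attach concrete hints. Record format is hypothetical.

def efficiency_report(jobs):
    """jobs: iterable of dicts with user, cpu_s, wall_s, ttc_enabled."""
    per_user = {}
    for j in jobs:
        u = per_user.setdefault(j["user"],
                                {"cpu": 0.0, "wall": 0.0, "no_ttc": 0})
        u["cpu"] += j["cpu_s"]
        u["wall"] += j["wall_s"]
        u["no_ttc"] += 0 if j["ttc_enabled"] else 1
    report = {}
    for user, u in per_user.items():
        eff = u["cpu"] / u["wall"]          # CPU efficiency of the week
        hints = []
        if eff < 0.5:
            hints.append("CPU efficiency below 50%: jobs likely I/O bound")
        if u["no_ttc"]:
            hints.append(f"{u['no_ttc']} job(s) ran without TTreeCache")
        report[user] = {"efficiency": round(eff, 2), "hints": hints}
    return report

jobs = [
    {"user": "alice", "cpu_s": 3600, "wall_s": 9000, "ttc_enabled": False},
    {"user": "bob",   "cpu_s": 7000, "wall_s": 8000, "ttc_enabled": True},
]
print(efficiency_report(jobs))
```

The point is not the exact metric but closing the feedback loop: a user who sees “40% efficient, TTC off” every Monday has an obvious first thing to fix.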
My 2 cents
• If one core is not 100% used when reading a fully optimized file, why bother with multicore things?
• Due to the very low density of “real” information in production DPDs, any analysis/slimming/skimming scheme is bound to be very inefficient.
• A user who has done at least one round of slim/skim can opt for a map-reduce approach in the form of PROOF.
  • But PROOF is not an option for the really large scale: production DPDs.
• Thinking big: a SkimSlimService
  • Organize large-scale map-reduce at a set of sites capable of holding all of the production DPDs on disk.
  • Has to return (register) the produced skimmed/slimmed dataset in under 5 minutes. Limit the size of the returned dataset.
  • That eliminates 50% of all grid jobs.
  • Makes users produce 100-variable instead of 600-variable slims.
  • Relieves them of thinking about efficiency and gives a result in 5 minutes instead of 2 days.
  • We don’t distribute DPDs.
  • We do the optimizations.
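The skim/slim operation itself maps cleanly onto map-reduce. A minimal sketch of the idea, with events as plain dicts and a hypothetical pt cut; this is a model of the data flow, not a real SkimSlimService interface:

```python
# Skim/slim as map-reduce over DPD events held at several sites.
# Event layout and the selection cut are hypothetical examples.

def skim_slim(events, selection, keep_vars):
    """Map step at one site: drop events failing the cut (skim) and
    keep only the requested variables (slim)."""
    for ev in events:
        if selection(ev):
            yield {v: ev[v] for v in keep_vars}

def merge(partial_outputs):
    """Reduce step: concatenate per-site skims into one dataset."""
    return [ev for part in partial_outputs for ev in part]

# Two "sites", each holding a slice of the production DPD.
site_a = [{"pt": 50.0, "eta": 0.1, "ntrk": 12},
          {"pt":  5.0, "eta": 2.0, "ntrk": 3}]
site_b = [{"pt": 80.0, "eta": -1.0, "ntrk": 20}]

cut = lambda ev: ev["pt"] > 20.0              # skim: high-pt events only
slimmed = [list(skim_slim(s, cut, ["pt", "eta"]))
           for s in (site_a, site_b)]
dataset = merge(slimmed)
print(dataset)   # [{'pt': 50.0, 'eta': 0.1}, {'pt': 80.0, 'eta': -1.0}]
```

Because the map step runs where the DPDs already sit on disk, only the small merged result travels to the user, which is exactly what makes the 5-minute turnaround and the “we don’t distribute DPDs” policy plausible.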