The Versus Comparison Framework Kenton McHenry, Ph.D. Research Scientist National Center for Supercomputing Applications
The Problem • The abundance of file formats is a problem when preserving electronic records • Why? • Will there be software to load the file in the future? • If not, will the specification for the format still exist? • Was the specification ever available to begin with (closed/proprietary formats)?
*.pdf (*.prc, *.u3d) *.k3d *.ma, *.mb, *.mp *.w3d *.lwo *.blend *.iam *.max, *.3ds *.c4d *.dwg *.vtk, *.vtp *.skp
Converting Formats • To preserve content for future use, one option is to convert the file to an open/standardized format that is likely to be supported for some time. • Store both the converted file and the original for provenance • Ideally, with a single file format per content type, it will be easy for users to view/use the data.
Converting Formats (continued…) • How, and to which format? • Fully supporting the many available formats is an enormous undertaking • If a file format is closed/proprietary it may be difficult to retrieve the data directly from the file • It may be possible to reverse engineer the format and recover some of the content • Vendor file formats sometimes store application-specific information that is not supported in other formats • Examples include: animations, physics, … • When converting to a format that has no place for such information, we must drop it. • Information loss…
Converting Formats (continued…) • There are different ways of storing the 3D content itself • Faceted: • Composed of vertices and faces • Popular within the graphics community • Boundary Representation: • Composed of vertices, edges, edge loops, and primitive surfaces • Popular among CAD users • Constructive Solid Geometry: • Composed of Boolean operations on primitive volumes • …
Converting Formats (continued…) • Translating between geometry representations may not be trivial • B-Rep to Faceted: • Translating involves triangulating the surfaces created from the bounded primitives (tessellation) • The resulting sampled surface will suffer from aliasing at high viewing resolutions • Can be mitigated by performing a finer triangulation (i.e. more triangles and a larger file) • Faceted to B-Rep: • Translating in this direction is non-trivial! • How does one decide whether a group of triangles should be grouped together as part of some larger primitive (e.g. part of a cylinder)?
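To make the trade-off between triangle count and fidelity concrete, here is a minimal sketch that approximates the lateral surface of a cylinder with flat strips and measures how far the faceted area falls short of the exact area. The cylinder, the function names, and the use of area as the error measure are illustrative assumptions, not something taken from Polyglot or Versus.

```python
import math


def faceted_cylinder_area(radius, height, segments):
    """Lateral area when the circular cross-section is replaced by a regular
    polygon with `segments` sides (each flat strip is two triangles in a mesh)."""
    chord = 2.0 * radius * math.sin(math.pi / segments)  # width of one flat strip
    return segments * chord * height


def exact_cylinder_area(radius, height):
    """Exact lateral area of the cylinder."""
    return 2.0 * math.pi * radius * height


def relative_error(radius, height, segments):
    """Fraction of the exact area that the faceted approximation misses."""
    exact = exact_cylinder_area(radius, height)
    return (exact - faceted_cylinder_area(radius, height, segments)) / exact


if __name__ == "__main__":
    for segments in (8, 32, 128):
        triangles = 2 * segments
        print(segments, triangles, relative_error(1.0, 2.0, segments))
```

The error shrinks as the segment count grows, while the triangle count (and so the file size) grows linearly with it.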
Converting Formats (continued…) • How do we measure information loss? • Which is the best format to use for preservation?
NCSA Polyglot (2009) • Conversion service built on any and all available third-party software • Imposed Code Reuse: re-attaching a programmable interface to compiled software • Scripted operations within software • GUI scripting (e.g. AutoHotkey) • Created a simple workflow referred to as an Input/Output Graph • Compared files before/after conversion to measure information loss • Distributed across multiple machines • Web access
ISDA File Migration Tools • Conversion Software Registry • Software Servers • Polyglot • Versus
Software that can Convert between Formats • There is a lot of software available, each with its own unique capabilities • A lot of it is not free • It would be expensive to buy a package just to check if it truly is capable of converting between a desired pair of formats • How can someone know what software to get for their needs? http://isda.ncsa.illinois.edu/NARA/CSR
The Conversion Software Registry (example shown: Adobe 3D Reviewer)
Input/Output Graphs (example shown: Adobe 3D Reviewer)
Input/Output Graphs • 3DS Max • Adobe 3D Reviewer • AutoCAD • Blender • Cinema 4D • K-3D • LightWave 3D • Maya • Wings 3D
Input/Output Graphs • Shortest conversion path
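A shortest conversion path over an input/output graph can be found with a breadth-first search, treating formats as nodes and each application's input-to-output capability as a directed edge. The sketch below is one way to do this under that assumption; the edge list, the application names `AppA` to `AppE`, and the function name are hypothetical and do not describe what any real package supports or how Polyglot implements the search.

```python
from collections import deque


def shortest_conversion_path(edges, source, target):
    """Return the fewest-hop conversion chain from `source` to `target`.

    `edges` is an iterable of (input_format, output_format, application).
    The result is a list of (application, input_format, output_format) hops,
    an empty list if no conversion is needed, or None if no path exists.
    """
    graph = {}
    for src, dst, app in edges:
        graph.setdefault(src, []).append((dst, app))

    if source == target:
        return []

    previous = {source: None}  # format -> (previous format, application used)
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for nxt, app in graph.get(fmt, []):
            if nxt in previous:
                continue  # already reached by a path that is at least as short
            previous[nxt] = (fmt, app)
            if nxt == target:
                # Walk back from the target to rebuild the chain of hops.
                path = []
                node = target
                while previous[node] is not None:
                    prev_fmt, prev_app = previous[node]
                    path.append((prev_app, prev_fmt, node))
                    node = prev_fmt
                path.reverse()
                return path
            queue.append(nxt)
    return None


if __name__ == "__main__":
    # Hypothetical capabilities, for illustration only.
    example_edges = [
        ("dwg", "3ds", "AppA"),
        ("3ds", "obj", "AppB"),
        ("dwg", "stl", "AppC"),
        ("stl", "ply", "AppD"),
        ("ply", "obj", "AppE"),
    ]
    print(shortest_conversion_path(example_edges, "dwg", "obj"))
```

Breadth-first search minimizes the number of conversions only; it says nothing about how much information each hop loses.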
Software Server • To program against software • i.e. to write new code that can use functionality within arbitrary compiled software whose source code is probably not available. • Imposed Code Reuse (or Software Reuse): The process of attaching an API-like interface to software so that its functionality can be called from new code.
Software Servers • Share the functionality of software over the web • In contrast to services which share data: ftpd, nfsd, smbd, httpd • Similar to services such as: telnetd, sshd, VNC, rdesktop • The main difference is in the interface: • Uniform across all software: http://host:8182/software/<Application>/<Task>/<Output Format>/<InputFile> • Simple • Widely accessible • Capable of being programmed against • Allows any desktop application to become a cloud-based web service*
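Because the interface is a single URL pattern, client code can build a request address from four values. The following is a minimal sketch that only assembles the URL from the pattern shown above; the helper name, the task name `convert`, the example host, and the percent-encoding of each path segment are assumptions, since the slides do not specify task names or how the input file is encoded.

```python
from urllib.parse import quote


def software_server_url(host, application, task, output_format, input_file, port=8182):
    """Fill in http://host:8182/software/<Application>/<Task>/<Output Format>/<InputFile>.

    Each segment is percent-encoded so that spaces and slashes inside a value
    cannot be mistaken for path separators (an assumption about the server).
    """
    segments = [application, task, output_format, input_file]
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"http://{host}:{port}/software/{path}"


if __name__ == "__main__":
    # "convert" is a hypothetical task name used only for illustration.
    print(software_server_url("localhost", "Blender", "convert", "obj", "model.stl"))
    # prints http://localhost:8182/software/Blender/convert/obj/model.stl
```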
Software • 3D Studio Max • Adobe 3D Reviewer • Google SketchUp • Wings 3D • Blender • K-3D • ParaView • VTK • CyberwarePLYTool • NIST X3D Tool • ImageMagick • IrfanView • GIMP • Microsoft Paint • Excel 2010 • Word 2010 • PowerPoint 2010 • Publisher 2010 • OneNote • Access 2007 • Wordpad • Notepad • Calculator2010 • Internet Explorer 8 • WinZip • Photoshop CS5 • Adobe Acrobat • ABBYY
Polyglot • Listens for Software Server broadcasts on the network • Catalogues available input/output operations and constructs an I/O graph • Identifies conversion paths between input and output formats • Carries out CHAINED conversions
Versus • Java library/framework for comparing file content • Distributed architecture • RESTful Web Interface • http://<host>/versus/comparisons • dataset1, dataset2 • adapter, extractor, measure
Measuring Information Loss • We would like to assign a value to each conversion edge … • With a “universal” converter we could convert files from every format A to every other format B • Assuming we then had a loader for both format A and format B we could load and compare the 3D content independent of how it is stored.
Measuring 3D Information Loss • Adobe 3D Reviewer • Blender • CyberwarePlyTool • K-3D • NIST VRML/X3D • VTK • Edge value scale: not so good (e.g. 0.1) to good (e.g. 1.0)
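Once each conversion edge carries a value between "not so good" and "good", a path can be chosen by quality rather than by hop count. The sketch below assumes that the values lie in (0, 1] and that the quality of a chain is the product of its edge values; both are assumptions made for illustration, since the slides do not say how edge values are combined. The format names and the function name are hypothetical.

```python
import heapq


def best_quality_path(edges, source, target):
    """Return (quality, path) for the chain with the highest product of edge values.

    `edges` is an iterable of (input_format, output_format, quality) with
    0 < quality <= 1. Returns (0.0, None) when the target is unreachable.
    """
    graph = {}
    for src, dst, quality in edges:
        if not 0.0 < quality <= 1.0:
            raise ValueError("edge quality must be in (0, 1]")
        graph.setdefault(src, []).append((dst, quality))

    best = {source: 1.0}
    # Max-heap via negated quality; products never increase along a path,
    # so the first time the target is popped its quality is optimal.
    heap = [(-1.0, source, [source])]
    while heap:
        negated, fmt, path = heapq.heappop(heap)
        quality = -negated
        if fmt == target:
            return quality, path
        if quality < best.get(fmt, 0.0):
            continue  # stale entry superseded by a better path
        for nxt, edge_quality in graph.get(fmt, []):
            candidate = quality * edge_quality
            if candidate > best.get(nxt, 0.0):
                best[nxt] = candidate
                heapq.heappush(heap, (-candidate, nxt, path + [nxt]))
    return 0.0, None


if __name__ == "__main__":
    # Hypothetical edge values, for illustration only.
    example = [("a", "b", 0.9), ("b", "c", 0.9), ("a", "c", 0.5)]
    print(best_quality_path(example, "a", "c"))
```

In this example the two-hop chain is preferred over the direct edge, which shows why the fewest-hop path is not always the least lossy one.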
Measuring 3D Information Loss • Data representation • Meshes • Loaders • Use 3D similarity as a means of comparing 3D models • Statistics • Surface Area [Brunnermeier, RTI 1999] • Spin Images [Johnson, PAMI 1999] • Light Fields [Chen, Eurographics 2003]
Statistics • Use the mean and standard deviation of the vertices to represent the model • Simple but fast to compute • Sensitive to size and orientation of the model
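A minimal sketch of this measure, assuming the mean and standard deviation are taken per coordinate axis and that two models are compared by the Euclidean distance between their six-number signatures. The function names and the distance choice are illustrative assumptions, not the Versus implementation.

```python
import math


def vertex_statistics(vertices):
    """Per-axis mean and (population) standard deviation of (x, y, z) vertices."""
    count = len(vertices)
    if count == 0:
        raise ValueError("model has no vertices")
    mean = tuple(sum(v[i] for v in vertices) / count for i in range(3))
    std = tuple(
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vertices) / count)
        for i in range(3)
    )
    return mean, std


def statistics_distance(vertices_a, vertices_b):
    """Euclidean distance between the two (mean, std) signatures; 0 means identical."""
    mean_a, std_a = vertex_statistics(vertices_a)
    mean_b, std_b = vertex_statistics(vertices_b)
    return math.dist(mean_a + std_a, mean_b + std_b)


if __name__ == "__main__":
    cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in cube]
    print(vertex_statistics(cube))
    print(statistics_distance(cube, scaled))
```

Scaling the cube changes the signature, which illustrates the sensitivity to size noted above.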
Surface Area • Use the sum of face areas to represent the model • Also simple and fast to compute • Sensitive to size, somewhat sensitive to shape. Will detect loss of faces.
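The sum of face areas can be computed directly from a triangle mesh using the cross product of two edge vectors per face. The sketch below assumes triangular faces given as vertex-index triples; the `area_similarity` ratio used to turn two areas into a score in [0, 1] is an illustrative assumption rather than the measure defined in Versus.

```python
import math


def triangle_area(a, b, c):
    """Area of the triangle with corners a, b, c (each an (x, y, z) tuple)."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    # Half the length of the cross product of the two edge vectors.
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def surface_area(vertices, faces):
    """Sum of triangle areas; `faces` holds (i, j, k) indices into `vertices`."""
    return sum(triangle_area(vertices[i], vertices[j], vertices[k]) for i, j, k in faces)


def area_similarity(area_before, area_after):
    """Ratio of the smaller to the larger area: 1.0 means no change in area."""
    if area_before == 0.0 and area_after == 0.0:
        return 1.0
    return min(area_before, area_after) / max(area_before, area_after)


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    before = surface_area(square, [(0, 1, 2), (0, 2, 3)])
    after = surface_area(square, [(0, 1, 2)])  # one face lost in conversion
    print(before, after, area_similarity(before, after))
```

Dropping one of the two faces halves the area, so the lost face shows up directly in the score.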
Light Fields [Chen, 2003] • Compares silhouettes from various viewing angles around a model.
Light Fields • Fairly fast to compute • Sensitive to shape of convex hull, invariant to rigid transformations
Spin Images [Johnson, 1999] • 2D histograms of the in-plane and out-of-plane distances of vertices neighboring a given vertex. [Figure: spin-image geometry, labels N, q, b, a, p]
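A simplified sketch of such a histogram for a single vertex follows. It assumes the vertex comes with a surface normal, measures each neighbor's signed distance along the normal (out-of-plane) and its distance from the normal line (in-plane), and counts neighbors into fixed-size bins. The parameter names `bins` and `support`, and the plain counting without interpolation, are simplifying assumptions and may differ from the procedure in the cited paper.

```python
import math


def spin_image(point, normal, neighbors, bins=4, support=1.0):
    """2D histogram of (out-of-plane, in-plane) distances around `point`.

    Rows index the signed distance along the normal in [-support, support);
    columns index the distance from the normal line in [0, support).
    Neighbors outside that range are ignored.
    """
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]
    image = [[0] * bins for _ in range(bins)]
    for neighbor in neighbors:
        offset = [neighbor[i] - point[i] for i in range(3)]
        out_of_plane = sum(offset[i] * unit[i] for i in range(3))
        squared = sum(c * c for c in offset) - out_of_plane * out_of_plane
        in_plane = math.sqrt(max(0.0, squared))  # clamp rounding noise
        if in_plane >= support or abs(out_of_plane) >= support:
            continue
        column = int(in_plane / support * bins)
        row = int((out_of_plane + support) / (2.0 * support) * bins)
        image[row][column] += 1
    return image


if __name__ == "__main__":
    neighbors = [(0.5, 0, 0), (0, 0, 0.5), (0, 0, -0.5), (2, 0, 0)]
    for row in spin_image((0, 0, 0), (0, 0, 1), neighbors, bins=2, support=1.0):
        print(row)
```

Because both distances are measured relative to the vertex and its normal, rotating or translating the whole model leaves the histogram unchanged.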