Wednesday, 19 November 2008

Don't be simple - XML

Recently I have been working with Metamod, a metadata database application, which can be seen, among other places, on the Damocles website. The metadata of scientific data is nowadays usually expressed in XML, e.g. DIF or ISO 19115. There are several ways to work with XML files; the best known are probably DOM, SAX and StAX, often used together with XPath. Each comes with a lot of commands, implemented in different languages. Processing XML data has quite a learning curve, and this may be one reason why a lot of people still believe that "just use ASCII" is much better.
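To give a feeling for how different these approaches are, here is a minimal sketch in Python that reads the same tiny document once with DOM and once with SAX. The record and its element names are made up for illustration; it is not real DIF or ISO 19115, and Python only stands in for whichever language you use.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical metadata record, invented for this example.
DOC = "<record><title>Sea ice extent</title><title>Arctic drift</title></record>"


def titles_dom(text):
    """DOM: load the whole tree into memory, then navigate it."""
    dom = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in dom.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to events while the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # collects text while we are inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles
```

Both functions return the same list of titles. The DOM version is shorter, while the SAX version never holds the whole document in memory, which is roughly the trade-off that makes people choose one or the other.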

Help comes in the form of modules like XML::Simple (Perl) or SimpleXML (PHP). These integrate XML nicely into their respective language, converting it to a Perl data structure or a PHP class. These modules have their place when somebody needs to parse an XML document written by somebody else and does not want to learn about XML.
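The idea behind the simple modules can be sketched in a few lines of Python. This is only a loose imitation of the approach, not the actual behaviour of XML::Simple or SimpleXML: the function `to_simple` and the sample record are hypothetical, and attributes are ignored entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record, invented for this example.
DOC = (
    "<record>"
    "<title>Sea ice extent</title>"
    "<keyword>ice</keyword>"
    "<keyword>arctic</keyword>"
    "</record>"
)


def to_simple(element):
    """Turn an element into plain data: a string for leaves, a dict for parents."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = to_simple(child)
        if child.tag in result:
            # A repeated tag is collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


record = to_simple(ET.fromstring(DOC))
```

After this, `record["title"]` is an ordinary string and `record["keyword"]` an ordinary list, so no XML knowledge is needed to read the data. The price is that attributes, element order across different tags and everything else the conversion dropped are gone.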

Whenever I have had to work with XML, it started with something very simple, a good case for the simple modules. But shortly afterwards I had to extend the XML, start using namespaces, modify the original file or do something else. And that is where the simple modules fail. The author of XML::Simple has recognized the same problem and written an article on how to step up from XML::Simple to DOM/XPath.
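As a rough illustration of what that step up buys you, here is a small Python sketch that handles a namespaced document with XPath-style queries and then modifies it. The namespace URI, the element names and the helper functions `retitle` and `serialize` are all made up for this example; Python's ElementTree also supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and document, invented for this example.
MD = "http://example.org/metadata"
NS = {"md": MD}
DOC = """<md:record xmlns:md="http://example.org/metadata">
  <md:title>Sea ice extent</md:title>
</md:record>"""


def retitle(text, new_title):
    """Change the title and add a keyword, keeping the namespace intact."""
    root = ET.fromstring(text)
    # The prefix in the query is resolved through the NS mapping.
    title = root.find("md:title", NS)
    title.text = new_title
    # New elements need the full {uri}name form to land in the namespace.
    ET.SubElement(root, "{%s}keyword" % MD).text = "arctic"
    return root


def serialize(root):
    """Write the tree back out as text, using the md prefix."""
    ET.register_namespace("md", MD)
    return ET.tostring(root, encoding="unicode")


root = retitle(DOC, "Arctic sea ice extent")
output = serialize(root)
```

Note that the parsed tag name is really `{http://example.org/metadata}title`, not `title`. A naive conversion to a plain data structure either loses that information or keys everything on a prefix that another document may spell differently, and writing the modified document back out is where a tree-based API earns its learning curve.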

The problem with the "simple" modules seems to be an old one. To quote Einstein: "Make everything as simple as possible, but not simpler."