I inherited the challenge of importing a huge quantity of XML web content (describing published articles and their contents) into a new website as Drupal content. The client had no resources to rewrite the content manually, so I had no choice but to convert the data.
Each piece of content needed to be linked to the relevant content types, which varied depending on how the items were associated.
There was some existing functionality for pulling in one ‘article’ at a time, but it could not import multiple articles, nor did it rectify formatting issues in the XML structure, such as carriage returns around `CDATA` tags.
To solve these problems, I changed how PHP's `SimpleXMLElement` class works by default, so that it automatically filters and formats the imported data, no matter how much junk whitespace or how many stray carriage returns the submitted data contains.
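As a rough illustration of that kind of clean-up (not the exact code used on the project), the sketch below loads XML with libxml's `LIBXML_NOCDATA` and `LIBXML_NOBLANKS` options and collapses stray whitespace when a value is read. The helper names `load_clean_xml()` and `clean_text()` and the sample `<article>` markup are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: parse XML so that CDATA sections become plain text
// and whitespace-only text nodes are discarded.
function load_clean_xml(string $xml): SimpleXMLElement
{
    // trim() removes stray whitespace before the first tag, which would
    // otherwise break parsing when an XML declaration is present.
    return new SimpleXMLElement(trim($xml), LIBXML_NOCDATA | LIBXML_NOBLANKS);
}

// Hypothetical helper: read an element's text and collapse any run of
// whitespace (spaces, tabs, carriage returns, newlines) into one space.
function clean_text(SimpleXMLElement $node): string
{
    return trim(preg_replace('/\s+/u', ' ', (string) $node));
}

// Sample input with carriage returns wrapped around a CDATA section.
$xml = "<article>\r\n  <title>\r\n<![CDATA[ Hello   World ]]>\r\n</title>\r\n</article>";

$article = load_clean_xml($xml);
echo clean_text($article->title), PHP_EOL; // prints "Hello World"
```

Keeping the clean-up in one place means every importer downstream receives the same normalised values, whatever state the submitted file is in.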
The data structures were then iterated over at each level (‘articles’, ‘volumes’ and ‘journals’) and fed into a set of object handlers, which built the data into Drupal content that was laid out correctly and accessible, just like any normal, manually created Drupal content.
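The iteration can be sketched roughly as follows. This is a minimal outline under several assumptions: that journals contain volumes, which in turn contain articles; that the elements are named `journal`, `volume` and `article` and carry an `id` attribute; and that each handler returns an identifier for what it created. The function `import_feed()` and the recording handlers are hypothetical stand-ins; in a real site the handlers would create the Drupal content instead of appending to an array.

```php
<?php
// Hypothetical walker: visit each level of the feed and pass every element
// to the handler for that level, along with the ID of its parent.
function import_feed(
    SimpleXMLElement $feed,
    callable $onJournal,
    callable $onVolume,
    callable $onArticle
): void {
    foreach ($feed->journal as $journal) {
        $journalId = $onJournal($journal, null);
        foreach ($journal->volume as $volume) {
            $volumeId = $onVolume($volume, $journalId);
            foreach ($volume->article as $article) {
                $onArticle($article, $volumeId);
            }
        }
    }
}

// Stand-in handlers: they only record what would be created and which
// parent each item would be linked to.
$created = [];
$record = function (string $type) use (&$created): callable {
    return function (SimpleXMLElement $node, ?string $parentId) use ($type, &$created): string {
        $id = $type . ':' . (string) $node['id'];
        $created[] = ['id' => $id, 'parent' => $parentId];
        return $id;
    };
};

// Assumed sample structure, for illustration only.
$feed = new SimpleXMLElement(
    '<feed><journal id="j1"><volume id="v1">'
    . '<article id="a1"/><article id="a2"/>'
    . '</volume></journal></feed>'
);

import_feed($feed, $record('journal'), $record('volume'), $record('article'));

foreach ($created as $row) {
    echo $row['id'], ' <- ', $row['parent'] ?? '(top level)', PHP_EOL;
}
```

Passing the parent's identifier down to each handler is one simple way to keep the links between levels intact while still letting each handler deal with a single content type.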