Archive

Archive for the ‘XML’ Category

XML documents should not be large

August 4th, 2004

XML.com: Decomposition, Process, Recomposition :-

So my first line of advice has always been: don’t go processing gigabyte files in XML formats. I have been working with XML for about 8 years. I have used XML in numerous ways with numerous tools for numerous purposes. I have never come across a natural need for an XML file more than a few megabytes. There may be terabytes of raw data in a system, but if you’re following XML (and classic data design) best practices, this should be structured into a number of files in some sensible way.

I’m saving this here so I can find it again when one of my customers complains about poor performance or memory issues with the Oracle XDK.

It happens about once a month, and nine times out of ten they have huge XML files. The problem is that they don’t think their files are huge. They consider 1 GB a reasonable size, and offer spurious arguments like “a powerful database like Oracle should be able to handle this much data”.

I have always told them that 1 MB is a more sensible limit, but have never been able to find a solid reference to back myself up.
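The advice above amounts to decomposing one huge document into many small ones. Below is a rough sketch of that decomposition step in Python, using the standard library’s xml.etree.ElementTree.iterparse rather than the Oracle XDK. It streams a large file and writes its records out in fixed-size chunks, so only the current batch is held in memory. The function name split_records and its parameters are hypothetical. The sketch also assumes that the records are direct children of the document element.

```python
import xml.etree.ElementTree as ET


def split_records(source, record_tag, chunk_size, make_path):
    """Stream `source`, writing every `chunk_size` <record_tag> elements to a
    separate well-formed file. Returns the list of paths written.

    Hypothetical helper; assumes records are direct children of the root."""
    paths = []
    batch = []
    root = None

    def flush():
        if not batch:
            return
        # Wrap the batch in a new element with the same tag as the original root.
        wrapper = ET.Element(root.tag)
        wrapper.extend(batch)
        path = make_path(len(paths))
        ET.ElementTree(wrapper).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
        batch.clear()

    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first element started is the document element
        elif elem.tag == record_tag:
            batch.append(elem)
            # Detach the finished record so the in-memory tree stays small.
            root.remove(elem)
            if len(batch) == chunk_size:
                flush()
    flush()  # write any leftover records
    return paths
```

A call might look like `split_records("big.xml", "record", 1000, lambda i: "part-%04d.xml" % i)`, where the file and tag names are made up. Choose chunk_size so each output file stays comfortably within the size you would recommend.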

Paul Oracle XDK, XML

XSLProcessor.setParam

June 1st, 2004

XSLProcessor.setParam(uri, name, value) lets you set the value of a top-level stylesheet parameter.

The uri argument is the namespace URI of the parameter name, because the XSLT 1.0 specification defines parameter names as qualified names (QNames).

That means that you can have:

    <xsl:stylesheet xmlns:foo="urn:foo" xmlns:bar="http://bar.com/params" ...>
    <xsl:param name="foo:param1"/>
    <xsl:param name="bar:param2"/>
    <xsl:param name="param3"/>

So parameter names can be namespace-qualified, with the namespace prefix serving as shorthand for the full namespace URI it is bound to.

So, to set values for all three of these parameters above, you would do:

    xslProc.setParam("urn:foo","param1","'val1'");
    xslProc.setParam("http://bar.com/params","param2","'val2'");
    xslProc.setParam("","param3","'val3'");
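The mapping from the prefixed names in the stylesheet to the (uri, name) pairs passed to setParam can be checked mechanically. Below is a sketch in Python using the standard library’s ElementTree, not the Oracle XDK. The helper param_keys is hypothetical. It collects namespace declarations from iterparse’s start-ns events without tracking scope, so it assumes each prefix is bound to only one URI throughout the stylesheet.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def param_keys(stylesheet):
    """Return the (namespace URI, local name) pair for each top-level
    xsl:param, i.e. the first two arguments you would pass to setParam.

    Hypothetical helper; assumes no prefix is rebound within the file."""
    nsmap = {}  # prefix -> URI, collected from every declaration in the file
    root = None
    for event, item in ET.iterparse(stylesheet, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            nsmap[prefix] = uri
        elif root is None:
            root = item  # the first element started is xsl:stylesheet
    keys = []
    for param in root.findall("{%s}param" % XSL_NS):
        prefix, colon, local = param.get("name").rpartition(":")
        # An unprefixed name is in no namespace: in XSLT 1.0 the default
        # namespace does not apply to parameter names.
        keys.append((nsmap[prefix] if colon else "", local))
    return keys
```

For a complete version of the stylesheet fragment above, this yields the same three (uri, name) pairs used in the setParam calls.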

The parameter value is expected to be a valid XPath expression, so string literal values have to be explicitly quoted (hence 'val1' rather than val1).
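Because the value is parsed as XPath, a string that itself contains an apostrophe can’t simply be wrapped in single quotes, and XPath 1.0 has no escape character. Below is a sketch of a quoting helper in Python (the hypothetical xpath_string_literal); in Java you would write the equivalent before calling setParam. It falls back to double quotes, and when both quote characters appear it builds the string with concat().

```python
def xpath_string_literal(value):
    """Return `value` quoted as an XPath 1.0 string literal.

    Hypothetical helper. XPath 1.0 has no escape character, so a value
    containing both quote characters is assembled with concat()."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Split on apostrophes and rejoin them as double-quoted "'" pieces.
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i:
            pieces.append('"\'"')
        if part:
            pieces.append("'" + part + "'")
    return "concat(" + ", ".join(pieces) + ")"
```

For example, xpath_string_literal("val1") gives 'val1', the same form as the values in the calls above.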

[Gleaned from mailing list email from Steve Muench and documentation]

Paul Oracle, XSLT Processing

Latest XML.com articles

June 12th, 2003

Shortening XSLT Stylesheets [Jun. 11, 2003]: “To avoid hurting your brain, avoid cascaded xsl:import statements.”

“XML Data Bindings in Python” (http://www.xml.com/pub/a/2003/06/11/py-xml.html) looks at generateDS.py, “A tool for generating Python data structures from XML Schema”.

“Regular Expression Matching in XSLT 2” (http://www.xml.com/pub/a/2003/06/04/tr.html): “The parsing power that regular expressions add to XSLT lets you output XML with more value than your input XML, because XML that identifies data at a finer-grained level is XML that you can do more with.” Apparently regular expressions are quite useful; I really must sit down for a while and learn to use them properly (something I’ve been saying for a couple of years now).

Paul XML

xmltramp

May 16th, 2003

xmltramp: Make XML documents easily accessible.

Nice and easy XML handling in Python. From Aaron Swartz.

Paul Python, XML

Safe HTML Comment checker

February 25th, 2003

Simon Willison has enabled a subset of HTML in his comments by writing his own XML parser to check validity:

“The system I have implemented works by running submitted posts through an XML parser, which checks that each element is in my list of allowed elements, is nested correctly (you can’t put a blockquote inside a p for example) and doesn’t have any illegal attributes.”
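Those three checks are easy to sketch with Python’s standard xml.etree.ElementTree. This is not Simon’s code. The whitelist (ALLOWED, TOP_LEVEL) and the function name check_comment are hypothetical. Because it uses a real XML parser, HTML entities such as &nbsp; are rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: tag -> (allowed attributes, allowed child tags).
ALLOWED = {
    "p": ({"class"}, {"a", "em", "strong", "code"}),
    "blockquote": (set(), {"p"}),
    "a": ({"href", "title"}, {"em", "strong", "code"}),
    "em": (set(), set()),
    "strong": (set(), set()),
    "code": (set(), set()),
}
# Tags allowed at the top level of a comment.
TOP_LEVEL = {"p", "blockquote"}


def check_comment(text):
    """Return a list of problems; an empty list means the comment is acceptable."""
    try:
        # Wrap the fragment so several top-level elements still form one document.
        root = ET.fromstring("<comment>" + text + "</comment>")
    except ET.ParseError as err:
        return ["not well-formed: %s" % err]

    problems = []

    def walk(parent, allowed_children):
        for child in parent:
            if child.tag not in ALLOWED:
                problems.append("element <%s> is not allowed" % child.tag)
                continue
            if child.tag not in allowed_children:
                problems.append("<%s> may not appear inside <%s>" % (child.tag, parent.tag))
            attrs, kids = ALLOWED[child.tag]
            for name in child.attrib:
                if name not in attrs:
                    problems.append("attribute %s not allowed on <%s>" % (name, child.tag))
            walk(child, kids)

    walk(root, TOP_LEVEL)
    return problems
```

The nesting rule lives in the whitelist itself: each tag lists the children it may contain. That is why a blockquote inside a p is caught.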

This post was brought to you by the numbers 2, 6 and 3 and the plugin Textile (Brad Choate: MT-Textile, http://www.bradchoate.com/past/mttextile.php).

Paul XML

Purge Completed Todo Utility

February 21st, 2003

In Simple XML Processing With elementtree, Uche Ogbuji claimed that “elementtree is fast, pythonic and very simple to use”. I decided to put that to the test with something I’d been meaning to do for a while: a script to purge completed entries from the Zaurus ToDo application. It didn’t let me down.

Maybe I’m blind, but I just couldn’t find a way to remove entries in bulk through the application’s limited GUI, and I was building up hundreds of completed entries, which must slow things down over time. Luckily help was at hand, because Sharp, in their wisdom, decided to store the data in XML documents.

Here is an example of todolist.xml:

    <!DOCTYPE Tasks>
    <Tasks>
    <RIDMax>
    223
    </RIDMax>
     <Task Completed="1" HasDate="0" Priority="3"
      Categories="-1042017918" Description="make a cup of tea"
      Uid="-1045778896" rid="221" rinfo="0"/>
     <Task Completed="0" HasDate="0" Priority="2"
      Categories="-1044048088" Description="write photolog plugin"
      Uid="-1045038462" rid="153" rinfo="0"/>
    </Tasks>

The Task element has a Completed attribute: 1 for completed, 0 for not completed. Removing all the completed tasks should be simple.

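    # Purge completed tasks from the Zaurus ToDo list
    from elementtree.ElementTree import ElementTree

    TODO_FILE = "/home/root/Applications/todolist/todolist.xml"

    tasks = ElementTree(file=TODO_FILE)
    root = tasks.getroot()

    # Only <Task> elements carry a Completed attribute. Elements are removed
    # through their parent (the root <Tasks> element), and we loop over a
    # copy of the iterator's results so removal can't disturb the loop.
    for task in list(tasks.getiterator("Task")):
        if task.get("Completed") == "1":
            root.remove(task)

    f = open(TODO_FILE, "w")
    tasks.write(f)
    f.close()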

There were just a couple of things to work out. First, you can’t remove elements through the iterator, so you have to use getroot() as well as getiterator() and call remove() on the parent. Also, the examples made it look as though you could do

    tasks.write("/home/root/Applications/todolist/todolist.xml")

but I couldn’t get this to work.

Note: make sure the ToDo application isn’t “fastloaded” on the system, otherwise it will overwrite your changes with its in-memory copy.

Finally, to run this on the Zaurus you’ll need more than the standard Python IPK: you also need python-xml, python-compress and python-codecs (all available from Riverbank). You might also want python-devel for installing modules, but it isn’t strictly necessary.
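For comparison, in a present-day Python 3 the same purge needs only the standard library’s xml.etree.ElementTree. getiterator() has since been removed from it in favour of iter() and findall(). Below is a minimal sketch; the function name purge_completed is illustrative. ElementTree discards the <!DOCTYPE Tasks> line when parsing, so this version writes it back by hand, on the assumption that the ToDo application expects it.

```python
import xml.etree.ElementTree as ET


def purge_completed(path):
    """Remove every <Task Completed="1"> from the todo list at `path`.
    Returns the number of tasks removed. (Illustrative helper.)"""
    tree = ET.parse(path)
    root = tree.getroot()
    # Build the list first, so removing from the root can't disturb the loop.
    done = [t for t in root.findall("Task") if t.get("Completed") == "1"]
    for task in done:
        root.remove(task)
    with open(path, "w", encoding="utf-8") as f:
        # ElementTree drops the DOCTYPE on parsing, so write it back by hand
        # (assumption: the ToDo application wants it there).
        f.write("<!DOCTYPE Tasks>\n")
        tree.write(f, encoding="unicode")
    return len(done)
```

The fastload caveat above still applies: run this only while the ToDo application isn’t holding the list in memory.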

Paul Python, XML, Zaurus