Re: [Omaha.pm] Suggested XML modules...
On Sun, Nov 30, 2008 at 3:26 PM, Dan Linder <dan@linder.org> wrote:
> I was looking at it a bit because our XML files have the potential to get
> quite large (>50GB dumps). On the other hand, the day-to-day files should
> stay quite manageable (between 100K to 10M), so XML::Twig's ability to
> process only a portion of an XML file might be overkill.
Just a quick note of warning: it can be very surprising how much RAM
is required for processing XML documents as they get large. Loading
the entire document into memory has a way of ballooning really fast.
We ran into some issues with that on a project at my previous
employer.
As noted in the Perl XML FAQ:
"The memory requirements of a tree based parser can be surprisingly
high. Because each node in the tree needs to keep track of links to
ancestor, sibling and child nodes, the memory required to build a tree
can easily reach 10-30 times the size of the source document. You
probably don't need to worry about that though unless your documents
are multi-megabytes (or you're running on lower spec hardware)."
We had a couple of XML files that were under 10MB and they were
causing memory usage of nearly 500MB in the initial version of the
processing application.
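For what it's worth, here is a minimal sketch of the chunk-at-a-time
style that XML::Twig allows, which keeps memory roughly flat however
large the file gets. The element name "record" and the count_records
helper are made up for illustration; substitute whatever repeating
element your files actually contain.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: counts <record> elements in a file without
# ever holding the whole document in memory. 'record' is an assumed
# element name, not something taken from the real data.
sub count_records {
    my ($file) = @_;
    my $count = 0;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a complete <record> element has been parsed.
            record => sub {
                my ($t, $elt) = @_;
                $count++;
                # ... do the real per-record work with $elt here ...
                $t->purge;    # release everything parsed so far
            },
        },
    );

    $twig->parsefile($file);
    return $count;
}

# Usage: perl count_records.pl dump.xml
print count_records($ARGV[0]), "\n" if @ARGV;
```

The purge call is what matters: without it the twig still builds up
the full tree as it goes, and you are back to the tree-parser memory
costs described in the FAQ.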
> Dan
--
Christopher