[Omaha.pm] lines2perl: partition.pl



Paul Johnson wrote:
On Fri, Apr 28, 2006 at 06:47:27PM -0500, Jay Hannah wrote:
Any chance you've got some magic up your sleeve which can prune my GEDCOM to only those people related to me? Some magical concoction of toolsets? Thanks!

I don't think there's anything built in (he says without looking), but
there are ancestors and descendents methods which could be used to
create a closure in a fairly inefficient way.  Or parents, children and
siblings methods which could probably do the job more efficiently.  Or
I'll bet there's another useful LifeLines script out there somewhere you
could translate :)
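
For illustration, the "closure" idea can be sketched as a breadth-first walk over the family links. This is only a rough sketch in Python, not the Gedcom.pm API: the function name `related_closure`, the `links` dictionary, and the IDs in it are all made up, and a real script would build the neighbour lists from the parents, children, siblings and spouses of each record. Each person is visited once, so the cost grows with the number of links rather than with repeated ancestor/descendant scans.

```python
from collections import deque

def related_closure(start, neighbours):
    """Return the set of IDs reachable from `start` by following links.

    `neighbours` maps a person ID to the IDs of that person's parents,
    children, siblings and spouses (hypothetical structure, built by hand here).
    """
    seen = {start}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for other in neighbours.get(person, ()):
            if other not in seen:      # visit each person only once
                seen.add(other)
                queue.append(other)
    return seen

# Made-up toy data: I1, I2, I3 form one family; I4 and I5 another.
links = {
    "I1": ["I2", "I3"],
    "I2": ["I1", "I3"],
    "I3": ["I1", "I2"],
    "I4": ["I5"],
    "I5": ["I4"],
}
family = related_closure("I1", links)
```

Anyone not in the returned set would be pruned from the file.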

Oooo...

ftp://ftp.cac.psu.edu/pub/genealogy/lines/reports/INDEX.html

partition
  Jim Eggert
  Version 8, March 31, 1995
  Requires: LifeLines 2.3.3 or higher
  This program partitions individuals in a database into disjoint partitions. Each partition is composed of people related by one or more multiples of the following relations: parent, sibling, child, spouse. There is no known relationship between people in different partitions. The partitions are written to the report in overview form, full form, or in GEDCOM form, with the partitions delimited by a long line. You will have to edit the GEDCOM output to divide it up into its constituent files to be able to import the GEDCOM back into any application.
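
As a rough picture of what "disjoint partitions" means here, the sketch below groups people with a union-find structure. It is an assumption-laden Python illustration, not Jim Eggert's LifeLines algorithm or the lines2perl output: the function name `partition`, the `people` list and the `relations` pairs are invented for the example, and each pair stands for one parent, sibling, child or spouse link.

```python
def partition(people, relations):
    """Group `people` into disjoint sets connected by `relations` pairs."""
    root = {p: p for p in people}

    def find(p):
        # Follow links up to the representative, shortening the path as we go.
        while root[p] != p:
            root[p] = root[root[p]]
            p = root[p]
        return p

    for a, b in relations:
        ra, rb = find(a), find(b)
        if ra != rb:
            root[ra] = rb          # merge the two groups

    groups = {}
    for p in people:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

# Made-up toy data: two unrelated families.
people = ["I1", "I2", "I3", "I4", "I5"]
relations = [("I1", "I2"), ("I2", "I3"), ("I4", "I5")]
parts = partition(people, relations)
```

With this toy input there are two partitions, and no link connects a person in one to a person in the other.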

Sounds perfect. So I used lines2perl to create a Perl script and went for it -- all partitions for my entire GEDCOM. 8 HOURS later it still wasn't done. Amazing. My GEDCOM is only 1.5 MB (4000ish people), how can it possibly go for 8 hours?

So I restarted, trying to get just MY relatives:

$ time ./partition.pl -gedcom_file jay.ged
reading.................................................................
Enter a person for just one partition, nothing for all partitions: I0313
Enter 0 for overview, 1 for full, 2 for GEDCOM report: 2
Enter filename for GEDCOM partition: new.ged

1: 1 5 17 3    167225824
         I225   Shirl                         24 Jul 2006
55 63 69 125 125 174 178 184 5    167225824
         I2     Helen FORD                     1 Jan 1912    1 Jan 1998
464 464 478 793 793 843 845 1062 7    153290844
         I513   William A. SETON               9 Jan 1861    9 Jul 1866
1519 1519 1595 1598 1804 1809 1809

That's burned 6.5 CPU HOURS so far. Wow. Still using < 50MB of RAM.

$ ps -ef | grep perl
jhannah  10280 10014 99 09:22 pts/1    06:31:15 /usr/bin/perl -w ./partition.pl -gedcom_file jay.ged
$ ps l
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000 10280 10014  25   0  49576 47812 -      R+   pts/1    391:17 /usr/bin/perl -w ./partition.pl -gedcom_file jay.ged

Even top is amazed by this process:

top - 16:06:19 up 5 days, 18:19,  1 user,  load average: 1.00, 1.00, 1.00
Tasks:  77 total,   3 running,  74 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.7% us,  0.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.3% si
Mem:    256968k total,   244480k used,    12488k free,    41020k buffers
Swap:   262136k total,     2192k used,   259944k free,    84036k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10280 jhannah   25   0 49576  46m 1812 R 99.8 18.6 402:37.17 partition.pl

:)

This project has forced me to finally learn to use GNU screen. Way cool. I love discovering awesome tools that stopped being developed in 1994. :)

I wonder if the native LifeLines script would be fast?

j