[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [Omaha.pm] Sort quickie



See Jay's IP sorting routine below: Very cool :-)

Now, who's up for a discussion about whether it's a good idea to convert first and then sort, or leave the solution as given?

While this is a cheap, clear solution for small arrays, actually using this technique for large arrays with N elements, means that the conversions will need to be done O (N * ln N) times with recent version of perl (which use the heap sort).  Where if you convert first, then sort then convert back, the conversion only needs to be done 2*N times.  Using older versions of perl where the quick sort method was used, if the data was already sorted, then quick sort performs very badly, and the comparisons jump to O (N^2).  There is fascinating (to me anyway) discussion happening on the "perl-5-porters" email list, which can be found near the bottom of the page here:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2004-06/thrd2.html#00441
The really interesting portion of the thread starts where you see the subject "Short circuiting sort".

To give you a quick taste of the discussion; as you may or may not know, sort in scalar context (what it returns) is undefined.  Several options have been discussed on what it should return in scalar context. Some want the first element in the resulting sorted list, some want a truth value of whether it was already sorted, some want the number of swaps performed, others want a ratio of how sorted the list was, etc.  Thus, due to the range of desires, and no clear cut "best option" it has been left undefined. That is the prehistory of this topic.

Recently someone presented a patch for sort to actually go play "nethack" if someone called it in scalar context. It was, of course, a joke, but some people ran with that joke...  Anyway, while that absurdity was being discussed, someone else said they would like to see the ability to short circuit the sort routine if they only wanted the first M elements resulting from a sort of N elements.  The argument being, "why sort the whole list if I only want the top 5 elements?".  Thus, the discussion of the pro's and con's of various sorting algorithms, how to implement a short circuiting ability without making a complete sort any more expensive etc.

Cool stuff.

-Scott


-----Original Message-----
From: omaha-pm-bounces@pm.org [mailto:omaha-pm-bounces@pm.org]On Behalf
Of Jay Hannah
Sent: Thursday, July 08, 2004 12:12 AM
To: Perl Mongers of Omaha, Nebraska USA
Subject: Re: [Omaha.pm] Sort quickie



On Jul 7, 2004, at 3:07 PM, Miller, Scott L (Omaha Networks) wrote:
> Another interesting possibility, expanding on what Jay started;
> it might be possible to use Jay's technique to sort IP addresses
> without first converting the addresses to their "long int" form...

cmp was driving me crazy so I jumped on IRC (irc.freenode.net #perl)...

The 2nd code block below is an IPv4 IP sorter for you... The 1st code 
block is a faulty one. -grin-

j


--------------------------------

This content is stored as  http://sial.org/pbot/3335.

From: "Omaha" at 68.13.20.113
Summary: Confused by cmp

I was writing a quick demo of how to sort IPv4 addresses. The problem 
is that the by_ip sort below should NOT work, as I undestand cmp, yet 
somehow it already does...

Shouldn't "sort @x" and "sort by_ip @x" as written below both return 
the same series? I thought '$a cmp $b' was the default behavior of 
sort?

Confused...

my @ips = qw(
    20.0.50.0
    20.0.100.0
    77.0.0.0
    100.0.0.0
);

print join ", ", sort @ips;
print "\n";
print join ", ", sort by_ip @ips;
print "\n";

sub by_ip {
    my ($a, $b) = @_;
    $a cmp $b;
}

----------

<pasteling> "Omaha" at 68.13.20.113 pasted "Confused by cmp" (23 lines, 
554B) at http://sial.org/pbot/3335
<broquaint> Omaha: that's because you're comparing two undef vars i.e 
$a & $b aren't put in @_, they're magical package level vars
<broquaint> drop my($a,$b) = @_ and it works as expected
<Omaha> ... ahhh... ok. So my cmp always returned 0, leaving the array 
in original order, which just happened to be sorted by IP already. Got 
it...
<Omaha> Any way to local($a, $b) so I can manipulate them? -ponder- 
Just trying to save a couple lines of code I guess...
<apeiron> Maybe more descriptive variable names would be a prudent idea.
* Omaha grins
<broquaint> avoid $a & $b outside of sort { ... }, it saves all sorts 
of headaches

-----------

This content is stored as  http://sial.org/pbot/3336.

From: "Omaha" at 68.13.20.113
Summary: sort by_ip -- feedback anyone?

Quick and dirty IPv4 sorter?

my @ips = qw( 20.0.100.0 20.0.50.0 100.0.0.0 77.0.0.0 );

print join ", ", sort @ips;
print "\n";
print join ", ", sort by_ip @ips;
print "\n";

sub by_ip {
    my ($j, $k) = ($a, $b);
    for ($j, $k) {
       $_ = sprintf("%03d.%03d.%03d.%03d", split /\./);
    }
    $j cmp $k;
}

------------------------

<pasteling> "Omaha" at 68.13.20.113 pasted "sort by_ip -- feedback 
anyone?" (17 lines, 327B) at http://sial.org/pbot/3336
<cfedde> Omaha: I'd use Sockets, inet_aton and {$a cmp $b}
<cfedde> Omaha: but yours works too

---------------

<apeiron> Beware False Hubris (inventing your own wheel), False 
Impatience (thinking you can make one more quickly than it'd take to 
learn an existing one), and False Laziness (thinking that making your 
own is less effort).

--------------

<nictuku> is this fine? if (!$lasttimestamp) { $lasttimestamp = 0; }
<Omaha> $lasttimestamp ||= 0;
<nictuku> Omaha, cool.
<Omaha> ya. :)


_______________________________________________
Omaha-pm mailing list
Omaha-pm@pm.org
http://www.pm.org/mailman/listinfo/omaha-pm