Chip's Tips for Developers

Contains coding, but not narcotic.

Synthesis 2.0.3: Regex rocks

February 2nd, 2010 9:56:15 am pst by Sterling Camden

Synthesis version 2.0.3, which you can download below, includes numerous beneficial (I hope) changes to Regex.  I have nearly filled out all of the features that I want to add to this class (I’ve only left a few to-dos on the list).  What I added in this version can be separated into new syntax and performance enhancements.

Syntax extensions

  • Similarly to *, +, or ?, you can now also use {n}, where n is a number of iterations.  Or, you can specify {n,m}, where n is the minimum and m is the maximum.  You can also omit one of these by keeping the comma.  Thus, {3,} means “at least 3, no maximum” and {,8} means “up to 8, no minimum”.  For the implementation, I considered replicating the state machine the appropriate number of times with transitions through or around them, but it occurred to me that you could easily create a monster with something like /(a{,1000}b){,2000}/.  Instead, I created a new facility to determine whether a transition is enabled within the current context.  This proved very valuable, and made the implementation of word boundary checking (see below) dead simple.
  • For *, +, ?, and {}, you can immediately follow them with a ? to specify a non-greedy search (the default has always been greedy).  For this, I added a scoring mechanism that adds to or subtracts from a match’s score for each repeating character, depending on the greediness mode of the current context.
  • Like Perl, Ruby, et al I added \A as a ^ equivalent.  It anchors to the beginning of the text, but unlike ^ does not anchor following a newline when in Multiline mode.  Likewise, \z anchors to the very end without regard for newlines.  \Z (uppercase) anchors to the end of text, but if the last character is a newline it anchors to the position of the newline (regardless of Multiline mode).
  • \b now anchors to a word boundary.  To use \b for backspace instead, you must enclose it in a character class – e.g., [\b].  \B can be used to match a word non-boundary.
  • \a matches a bell character (\x07), and \e matches escape (\x1b).
  • When not in a character class, \Q specifies that all characters up to the next \E will be treated as literals.
  • We now have named group support, a la Python:  (?P<group>) captures a sub-expression named “group”, (?P=group) can be used to back-reference it within the expression, and \g<group> can be used in the substitution string of replace to insert its last captured content.
  • I added character class subtraction even though very few regex parsers include it, just because it’s so cool.  For instance, [a-z-[aeiou]] matches all lowercase consonants.

Performance enhancements

The above syntax extensions added a huge number of tests, but I wanted to make sure I covered the code well.  Unfortunately, all that new code meant that running htmlcover on test_regex would run for days.  Since CodeCoverage makes ample use of Regex, improving the performance of Regex itself would doubly benefit the elapsed time needed to test Regex testing.  I implemented the following optimizations:

  • If an expression begins with an unconditional \A or ^, then we don’t need to test the initial state of the machine on any part of the string except the beginning of a line.  In the case of \A, or when using ^ with Multiline off, we can also give up as soon as the number of remaining states equals zero (because we won’t add any more).
  • In previous versions, we always gathered all possible matches first and then picked the best one based on position, greediness score, and length.  But since we process the string left to right and the highest priority goes to leftmost position, we can eliminate some possibilities as soon as we accept any match.  We no longer need to test each remaining character against the initial machine state, and if the number of remaining states falls to zero at any point then we can take the best one so far, if we have one.

Obviously, the optimizations listed above have greater or lesser benefit depending on the expression and the input it is processing.  The best cases reduce matching from having to walk the entire string to exiting after the first character.  The worst cases impose a mild penalty for trying to optimize something that doesn’t benefit at all.

CodeCoverage performance

Based on these observations, I was able to tune the Regexen I use in CodeCoverage to boost performance even more.  First of all, rather than jumping right into a Regex match, I compare the initial part of each debugger line literally for strings that could be what I’m seeking, then use a Regex to parse out the file name and line number.  Adding a ^ anchor to each of those expressions prevents the overhead of the Regex engine trying to restart its state machine on each subsequent character.

I also replaced the associative lists in CodeCoverage with Hashes.  That boosts performance on looking up the source file and line number associations in the Map, which happens many thousands of times during code sampling.

With all of those changes, I am now able to run coverage on test_regex in under 12 hours.  So it’s once again something I can kick off at the end of the day and see the results in the morning.  While that may still seem long, remember that Regex is a very complex case, involving lots of in-memory operations (so lots of statements to step through), and that test_regex tests all of the pathological cases that I could imagine.  One test in particular (matching /(a*b*)*/ against a string of a’s and b’s, took a few hours all by itself when sampling, though in normal execution it finishes in a few seconds.

CodeCoverage also now provides a mechanism for getting feedback on its progress, and I’ve modified htmlcover to take advantage of that.  During the Sample phase, it reports on each new line number within the main routine, so you can get an idea of where it is in the overall process.

I corrected a few bugs in this version as well.  Please see the change log for details.

Posted in Regexen, SynergyDE | 1 Comment » RSS 2.0 | Sphere it!

Masked coder saves the day with updates to tee utility for Windows

January 28th, 2010 11:05:37 am pst by Sterling Camden

A commenter going by Anonymous Coward (I bet that’s not his real name, and I beg forgiveness if the masculine pronoun doesn’t apply) provided a solution to the double-spaced output problem users occasionally experienced with my tee utility for Windows.

Naturally, this issue occurred because of the inconsistent usage of carriage-returns and linefeeds to terminate lines of text, combined with Console::ReadLine’s interpretation thereof.  The solution, simply enough, was to use BinaryReader for input, and BinaryWriter for output, so no interpretation or translation of the characters would occur.

Like most good programmers, once he had inserted hands in the code, Anonymous Coward couldn’t leave this one alone.  He coded up several more features and emailed them to me.  I made a few corrections and included the new version here.

New features include:

  • Duplicated file names are silently ignored, rather than causing a file open conflict.
  • /? or –help displays usage information.
  • -i or –ignore ignores cancel (Ctrl+C), instead of always ignoring it.
  • -f optionally marks the end of switches and interprets all remaining arguments as filenames.

Thanks, Anonymous Coward – whoever you are!

Posted in .NET, C and C++, Windows | 1 Comment » RSS 2.0 | Sphere it!

Synthesis 2.0.2: more improvements to regexen, lists, and code coverage

January 20th, 2010 11:34:12 am pst by Sterling Camden

I have just released version 2.0.2 of the Synthesis library for Synergy/DE 9.3 and above.  You can download the library sources using the link at the bottom of this page.  This version includes a few minor corrections, and the following enhancements:

Regular Expressions (Regex class)

You can now back-reference a captured group within the pattern itself.  This is particularly useful for matching pairs of items, like quotes or HTML tags.  For instance, R$("/(['""]).*\1/") matches a quoted string, either in single or double quotes, but not mixed. Note that the duplicate " in the character class is merely used to insert a quote into a quoted string in Synergy/DE.  For repeating groups, the back-reference uses the most recently encountered value.  If a reference is made to a group that has not yet been captured, then a zero-length string is used.

If you want to using grouping but don’t want it numbered among potential back-references, you can now enclose it within (?: ). That can be useful when you need more than 9 groupings, because you can’t have more than 9 back-references (at least, not until I implement named groups).

You can now insert a comment in your regex even if you aren’t using Extended syntax mode, by enclosing it within (?# ).

Lists (ls class)

While tuning a QuickSort algorithm in C for a client, it occurred to me to try some of the same optimizations on the quicksort method of ls.  Changing the algorithm to use the “collapse the walls” approach and falling back to an insertion sort when the list contains fewer than five elements both improved performance significantly.  I experimented with trying to optimize the pivot value using both a median of 3 and a pseudomedian of 9, but neither of those approaches yielded any benefit.  I think that the overhead of invoking the compare test generally outweighs the benefit of using a better pivot value, extreme cases excepted.

Then it occurred to me to take another look at my mergesort method to see if that could be tuned at all.  Sure enough, I figured out how to cut down the number of copy operations by about half.  This improved performance dramatically (as you might expect), and made mergesort faster than quicksort when the list contains about 100,000 elements or more (down from 900,000 in the previous version, even with the improvements to quicksort).  Below 100,000 elements, quicksort still wins consistently.

So, I decided to make the sort method into a hybrid.  It chooses quicksort for lists of less than 100,000 items.  For more than that, it starts out as a mergesort, but for any segment that contains less than 100,000 items it switches to quicksort.  Since quicksort switches to Insertion sort on less than 5 items, we get a combination of three different sort algorithms, depending on the size of the set being sorted.  This hybrid performs comparably to quicksort up to 100,000 (as you would expect), but beyond that point it outperforms both mergesort and quicksort by taking advantage of the strengths of both approaches.

Of course, MergeSort and QuickSort bring other considerations to the table besides performance.  MergeSort uses O(N) additional memory, while QuickSort does everything in place.  However, MergeSort is stable (meaning, the order of duplicates is preserved), while my new quicksort algorithm is not.  Since the sort method combines these approaches, it is neither stable nor lean on memory, but it performs best.  So, depending on your needs, you can choose which sort to use:

  • Fast:  sort
  • Lean: quicksort
  • Stable:  mergesort

Code coverage

I improved the performance of the CodeCoverage class by caching the source file object, rather than always looking it up in the associative list.  That benefits sampling of executables that stay in one source file for a number of lines at a time.  I also improved the code coverage for most of the test routines.

Posted in Regexen, SynergyDE | No Comments » RSS 2.0 | Sphere it!

Synthesis 2.0.1: chasing bugs out of the woodwork with coverage analysis

December 31st, 2009 2:50:45 pm pst by Sterling Camden

I just released version 2.0.1 of my Synthesis library for Synergy/DE, which you can download below.

As you can see from the change log, this version grew mostly from the application of my tools for analyzing code coverage to my own creations.  By fleshing out my own unit tests to nearly 100% coverage, I shook out all nine corrections listed for this version.  So you can see from my own example that achieving 100% test coverage is not merely an academic exercise.  Code always looks more solid when it’s never executed.

These tests also revealed some areas in which my code coverage analysis needed improvement.  The six enhancements listed for the CodeCoverage class all came from those observations.

In fact, this version includes only one change that did not spring from my automated navel-gazing:  the addition of the ‘x’ option to Regex.  This, of course, allows white space, line breaks, and comments within a regular expression.  My other enhancement to Regex (optimizing out unnecessary code) was a result of code coverage analysis — because otherwise I might never have determined that the code removed was not necessary.

My code coverage only suffers from one major flaw, and that is not its own:  the debugger (on which it relies for sampling) sometimes exits prematurely when stepping through code (Synergex tr#30947).  You can tell when this happens by examining the coverage report for your mainline to see where coverage stops.  Unfortunately, one of the places where it regularly happens is within U_START, so analyzing an application that uses the Synergy UI Toolkit shall be an exercise in futility until such time as that tracker shall have been resolved.

However, I would highly recommend using this utility to analyze test coverage for code that doesn’t rely on the Toolkit – as well as for Toolkit code once that debugger bug is fixed.  It’s certainly pointed out some oversights in my code.

I have a few ideas on how to enhance this further:

  • Currently, the command line tools only include sampling from one executable.  I’d like to add the ability to sample multiple runs, combining their statistics.  That way, you could sample a whole test suite and see how well your code is covered by it.
  • Unreachable code is currently included in the analysis.  For instance, code that’s .ifdef’d out, or an mreturn at the end of a method that always throws an exception (the mreturn being required by the compiler).  I’m thinking of creating a way to tag code as “not to be included in coverage analysis”.
  • Once analyzing Toolkit code becomes possible, I can foresee sampling taking a lot longer than it needs to, because most users wouldn’t want to include the internals of Toolkit routines in the analysis.  I’d like to create a way to specify the source files that you want to include in sampling, so that the sampler could simply skip over the rest.

Anything else that you’d like to see here?

Posted in Regexen, SynergyDE, UI Toolkit | 1 Comment » RSS 2.0 | Sphere it!

Synthesis 2.0: code coverage analysis, .NET compatibility, and more

December 26th, 2009 3:08:06 pm pst by Sterling Camden

I can’t believe that it’s been more than two months since I unleashed the Synthesis library on the unsuspecting Synergy/DE world.  I promised then to enhance it further, so herewith I announce the arrival of Synthesis 2.0, which you can download via the link below.  I bumped the primary version number because this version involves numerous changes throughout, and some significant new features.  See the full change log here.

Synergy/DE Version 9.3.1

Synthesis 2.0 is fully compatible with Synergy/DE version 9.3.1.  Unfortunately, I was unable to maintain compatibility with earlier versions of Synergy/DE, so don’t upgrade Synthesis until you’re ready to upgrade Synergy/DE as well.

Synergy.NET beta

Synergy/DE 9.3.1 also includes a beta version of Synergy.NET (Synergy/DE for the .NET Framework) on 32-bit Windows platforms.  Synthesis 2.0 includes limited compatibility with this version as well.  See the Known Issues for what’s missing.  You can also search the code for DOTNET_BETA_WORKAROUND to see the code that I expect to support in the future versus what I’m doing now.

Code Coverage analysis

Synthesis 2.0 includes two new utilities for analyzing code coverage:  cover and htmlcover.  These are command-line utilities that can be run with either dbr or dbs.  The only difference between them is the format of their output – cover writes tab-delimited data to stdout, while htmlcover writes an HTML-formatted report to a file.

Both utilities sample code execution by driving a debug session of the executable and stepping a line at a time through the code.  For this purpose, they use the Synergy/DE remote debugger, which operates over Telnet.  Thus, I added classes to Synthesis to support the Telnet protocol and sockets.

You can sample the code coverage for any Synergy/DE executable (except for .NET, because the Telnet debugger isn’t supported there).  I’ve added a ‘coverage’ dependency to the makefile in the tests directory to generate HTML reports for each of the Synthesis tests into the coverage subdirectory.  I didn’t include those in the download because of their size, but you can generate them with a ‘build coverage’ in the tests directory, if you have PVCS Configuration Builder and you’re patient enough to wait for them (especially test_regex).  They’ve already revealed some areas where I need to beef up my tests.

For more information about how this works, see the CodeCoverage class.

Bug fixes

In several places, the comparison operators used for alphanumeric did not distinguish between strings of unequal length that match up to the lesser length (e.g., “bug” == “bugger”).  That’s been corrected.

I fixed two issues related to Regular Expressions:

  1. You may now enclose a begin-line or end-line anchor within parentheses.  Previously, they could only occur as the very first or last character in the pattern, respectively.  Now, for example,
    R$('/(^front|or_not)/')

    will correctly find “front” at the beginning of the line, or “or_not” anywhere.  Likewise for a $ anchor.

  2. Certain complex nested sub-expressions had a bug that prevented them from ever matching.

For the future

I still have much more to add to this library.  Here are some of the things on my list:

  • Improved .NET compatibility
  • Additional Regex features (full Perl 5 syntax)
  • More methods for ls (inject/fold, etc.)
  • Prevent multiple evaluation of assertion arguments when formatting a message
  • … and more.

Is there anything you’d like to see added?

Posted in .NET, OpenVMS, Regexen, SynergyDE, UNIX, Windows | 1 Comment » RSS 2.0 | Sphere it!

Prime numbers in Common Lisp

October 16th, 2009 12:37:24 pm pst by Sterling Camden

As an exercise in my spare time, I decided to translate my Ruby prime number generator to Common Lisp.  What — isn’t that the sort of thing you do in your spare time?

Common Lisp does not have call/cc (that’s a Scheme thing), although it’s possible to implement it.  But in this generator, I really only need one piece of state information:  the list of known primes, from which I can derive the highest known prime and the next number to test.  So I just created a lexical closure for that list, which is shared by the functions that need it.

Other than the use of continuations, the algorithm follows pretty closely the one I developed for Ruby.  I “prime” the list with (2 3) and then test every odd number thereafter.  If the number does not have a factor within the list of known primes from 2 up to the square root of the number itself (inclusive), then it must be prime.  That works because the list of known primes is kept in order from lowest to highest, so once we exceed the square root of our number, it can’t be a factor without the number also having a lower factor.

Given our prime number generator, we can then expose the following functions:

nth-prime n => prime
primep n => n OR nil
prime-factors n => list
lcd &rest integers+ => integer
gcf &rest integers+ => integer

Of course, Common Lisp already implements the last two as lcm and gcd – but they made for a good exercise in mapping lists.  I found through experimentation that, at least in CLISP, union and intersection may or may not remove duplicates from the original lists – it seems to depend on which list comes first.  So I force uniqueness by calling remove-duplicates, then use a method similar to my Ruby implementation to generate the max or min occurrences of each factor.  Rather than creating a new list of the non-uniqued union or intersection, however, I just exponentiate each factor by the appropriate number of occurrences.

The final test in test_prime.lisp is to compute the 10000th prime number.  This takes about 10 seconds on my system.  The same test in my Ruby implementation takes about 4 seconds.  I’m uncertain how much of that difference to attribute to some flaw in my Lisp implementation, the lack of continuations, or an inherent performance difference between the language implementations.  Any improvements on my algorithm will be welcomed.

Posted in Lisp | 4 Comments » RSS 2.0 | Sphere it!

Prime ruby

October 12th, 2009 5:35:25 pm pst by Sterling Camden

While watching the MLB playoffs yesterday, I could overhear my wife helping our son with his homework – fractions.  She explained “making the bottom number the same” in terms that he could understand, but I couldn’t help ruminating on the algorithm for computing Least Common Denominator using prime factors.

Then it occurred to me that before you can factor anything, you need access to the set of prime numbers.  Because that set is infinite, this seemed like the perfect application for a continuation-based generator, as outlined in The Ruby Way, by Hal Fulton.

I adapted Hal’s algorithm into a set of class methods, because I wanted to be able to say:

Prime[n] => m

where m is the nth prime.  Because of this random access to prime numbers, I memoized all previously computed primes in an array.

UPDATE 2009-10-13: I streamlined the algorithm for computing the next prime by breaking out of the loop when the next prime to test as a factor is greater than the square root of the candidate prime.

How about determining whether a number is prime?  I reopened Integer and added a prime? method that asks the Prime class about that, by calling Prime.ordinal(n).

I also added an is_factor method to Integer in preparation for computing factors.  You might think that this was as simple as self % factor == 0, but testing revealed that I needed to add tests for zero also.  Then I was able to add:

Prime.factors(n) => ary

where ary is an array of the prime factors of n.

Now that I had prime factorization, I was ready for LCD and GCF.  Implementing those functions, I realized that the array methods | (union) and & (intersection) assume set rules – i.e., they both eliminate duplicates from the original lists.  That won’t work for my purposes, because to compute, for instance, the LCD of 20 and 30, we need the non-uniqued union of their factor lists:  {2, 2, 5} | {2, 3, 5} => {2, 2, 3, 5}, but array union yields [2, 3, 5].  Likewise for GCF we need a non-uniqued intersection.  So I added two methods to Array:  union_dups and intersect_dups.  Each of these methods uses the uniqued version of the respective operation, counts how many of each occur in each array, and puts the max or min (respectively) of that number in the resulting array.

I find it hard to believe that Ruby arrays don’t already have some easier way to do this.  To implement it, I also had to add a how_many method (similar to Lisp’s count-if).  Am I reinventing the wheel here?  Or is this just a “prime” example of how young Ruby still is?

Posted in Ruby | 2 Comments » RSS 2.0 | Sphere it!

WiPeD debugger for WordPress now has traceback

October 10th, 2009 4:13:41 pm pst by Sterling Camden

Sam Mela liked my WiPeD debugger for WordPress, but he wanted an easy way to include call traceback information.  So he added it, and he kindly shared his addition with me.

I’ve adapted what he did to my own tastes and included it in the download below.  Whenever you want to add the current call stack to the debug log, just call:

WPD_backtrace();

In order to test this, I added a new option in the Options page:

Trace the following actions (comma-separated)

Just enter the name of any action hooks on which you want to get a traceback added to the debug log.  This is not only useful for testing the plugin, it can also be used to find out where in the twisty little passages of WordPress each of the actions gets invoked.

Largely because of this feature, I now initialize this plugin whenever it is activated.  Previously I delayed setting it up until WPD_print was invoked.  So this plugin now imposes a small overhead if it is activated and not used.  Be advised.

I also adopted Sam’s idea of providing default styling (which I modified significantly).  If you don’t like it, you can remove or modify WiPeD.css.

Posted in CSS, PHP, Web, WordPress | No Comments » RSS 2.0 | Sphere it!

A synthesis of new capabilities in Synergy/DE version 9

October 6th, 2009 12:43:46 pm pst by Sterling Camden

As I threatened in my last post, I’ve combined a lot of my Synergy/DE version 9 work into a single library, which I have named “synthesis”.  Why synthesis?

  • It’s a synthesis of most of my explorations of what I can do with Synergy/DE version 9.
  • The programming model is a synthesis of object-oriented and functional programming.
  • The terminology synthesizes vocabulary from Lisp, Ruby, C++, .NET, and historical Synergy/DE.
  • It needed to start with "syn" to fit my tagging conventions here on ChipsTips.
  • It embodies my “thesis” on the future of Synergy/DE programming.

Included in the download below are all the sources, mixins, tests, documentation, and PVCS archives for the synthesis library.  You can also access the documentation online here.  As usual, this content is published under the Open Works License, so you’re free to pull it apart, modify it, and reuse it in any application, commercial or personal, open or closed source – but please retain credit for the author (and I’d appreciate a link back here).

In the root directory you will find setenv.sh and setenv.bat.  These are scripts to setup the environment variables used for building the library.  setenv.sh is a *nix shell script, and setenv.bat is a Windows batch file.  If you don’t want to use the PVCS archives, also set NOVCS=1 in your environment.  Next, you’ll see the following directories:

  • sources – here’s where you build the library, which ends up in the main directory.
  • mixins – include files for mixins
  • prototypes – you’ll need to create this directory to hold the dbh files
  • tests – build the unit tests here
  • documentation – full documentation, starting at index.html

The build assumes you have PVCS Configuration Builder, so if you don’t you will need to translate makefile.mak in the sources and tests directories to whatever build script you prefer.

Enhancements

One of my reasons for combining these modules into a single library was so I could add more enhancements without duplicating effort.  Here are a few that I have added already:

I’m just getting warmed up, folks.

Posted in SynergyDE | 1 Comment » RSS 2.0 | Sphere it!

Regular Expression parser for Synergy/DE

September 26th, 2009 5:06:51 pm pst by Sterling Camden

Yes, Richard, I did it – I created another problem for everyone.  I’ve implemented a regex parser for Synergy/DE.  And because it’s written in Synergy/DE (as opposed to say, a C DLL), it’s portable between any number of Unices, Linux, Windows, and even OpenVMS – as long as you’re running Synergy/DE version 9.1.5 or above.

It performs pretty well, too – mostly because I started with an excellent (if limited) deterministic algorithm from Amer Gerzic.  I must admit that extending this algorithm to track parenthesized sub-expressions was one of the greatest programming challenges I’ve faced in quite some time.  It forced me to make the engine not quite as deterministic (there can now be more than one active state for each potential start of a match), but I completely avoided back-tracking – so performance suffers very little.  I was also able to optimize for the case in which no sub-expressions were compiled.

The breakthrough occurred to me while I was waiting in the doctor’s office for a colonoscopy – during which they would administer the Milk of Amnesia to make me forget the procedure.  I was so worried that it would wash away my brilliant algorithm that I asked my wife for a piece of paper so I could jot down some notes.  Fortunately, though, the anesthetic seemed to have only a limited effect (I was awake throughout the procedure – ewwww) and I remembered everything about my algorithm afterwards.

The trick was to abstract the notion of inputs beyond just characters received.  My implementation includes a transition for entering and exiting a potential sub-expression.  Whenever we can move in such a direction, we create a fork in the road for our deterministic engine – but rather than trying them one at a time, we try them all at once.  Each character thereafter collects the new states for each path we’re on until they dead-end or reach an accepting state.

I used a similar approach for handling line anchors.  Character classes also have more abstract transitions than just the characters they include, which reduces the number of paths we have to maintain in the state table — though it can increase the number of paths we’re following simultaneously.

Any way, it works, it’s fast, and I’m happy.  You can find full documentation here, and download the sources below.

I also added “match” and “replace” methods to Var, created mappers MapMatched(regex) and MapReplace(regex, with), and added some more useful methods to ls – all of which are included below.  I haven’t documented these or updated the official downloads for Var and ls yet, because it will involve adding everything from Regex.

Speaking of which, I’m thinking of consolidating the downloads for all of my Synergy/DE version 9 stuff and organizing it into a single library with a single include file and a right proper makefile.  What do you Synergistas think of that plan?

OK, I’ve given you regexen, now go save the day.

UPDATE: now merged into synthesis library.

Posted in Regexen, SynergyDE, Wildly popular | No Comments » RSS 2.0 | Sphere it!

Synergy/DE ls class gets set

September 11th, 2009 10:09:35 am pst by Sterling Camden

In a continuing project, I needed a Set data structure.  Sets are similar to lists, but with the additional constraint that they do not contain duplicated elements.  I considered creating a new Set class that would impose that constraint, but decided instead to follow the guiding principle of empowering users rather than protecting them from themselves.  Thus, I added a few methods to the ls class to support set-like behavior, while not preventing you from doing decidedly anti-setlike things with it if you so choose.  If you want to shoot yourself in the foot, be my guest – just don’t aim my way.

The first Set characteristic, adding members only if they aren’t already members, is essentially a union of the original set with the set of members being added.  So I extended the union method in two ways:  (1) it can now accept a non-arraylist object as an argument, in which case it will treat that as a list of one member, and (2) I added a union$ method that modifies the list in-place.  So the equivalent of the theoretical Set.add(object) is ls.union$(object).

Conversely, you also want to be able to remove members of a set without knowing their indices.  I added the methods remove and remove$, which take an object as argument and remove all occurrences of that object (even though when a list is used as a set, there should be no more than one).  Note that if the argument is a list, it will still be treated as an object by these methods, so you can remove an element that is itself a list.  To achieve set subtraction, I added the methods subtract and subtract$, which remove all the elements of one list from another.  I then provided a minus (-) operator, which invokes subtract for a list argument and remove for any other argument.

Next, I added some idiomatic Set language.  Method contains returns true if an object is a member of a list.  Like remove, it makes no distinction between lists and other objects.  For that distinction, I added methods issubsetof and issupersetof, which determine whether all the elements of one list are contained in another. The override for Object.Equals already provided a robust test for set equality.

Now I’m all set to get to work!

UPDATE: Merged into synthesis library.

Posted in SynergyDE | No Comments » RSS 2.0 | Sphere it!

Multimap support for Synergy/DE ls class

September 10th, 2009 12:54:06 pm pst by Sterling Camden

I’m working on a project in which I needed a Multimap data structure – that is, an association of key values with zero or more objects for each key.  Both the Hash and the alist methods of ls are based on the model of zero or one value per key.  But because ls implements the key/value pair as a sublist, it could be naturally extended to support more than one value.  That is what I have done.

I added a new method, keyadd, which adds a value to a key without removing any previous associations (as keyset and the indexer do).

The method keyget already returns multiple values as an ls, but if there is only one associated value then just that object is returned.  If there is no association, keyget returns null.  So if you’re expecting multiple values, you would need to check for null, then check whether the returned object is a list, then access its members.  To shortcut all that, I added a method keygetl, which always returns a list whether it contains many, one, or no elements.

I also noticed in my testing that using new ls() to create an empty list is not always syntactically convenient.  You can’t use new in every context where you can use a method call.  So I added a static method empty, which simply creates and returns an empty ls.

Posted in SynergyDE | No Comments » RSS 2.0 | Sphere it!

Synergy/DE ls class gets array conversion and deep mapping

September 5th, 2009 5:22:50 pm pst by Sterling Camden

Based on a request from Richard Barndt, I’ve added the ability to convert a System.Array (aka array@class) to an ls.  I added three new overloads to the from method, so you can convert arrays of up to three dimensions.  I’ll add more if there’s demand, but I hate repeating myself, and the Synergy/DE argument type matching doesn’t provide a way to define an argument as an array without specifying its dimensions – so I need one method for each number of possible dimensions.  The Synergy/DE macro facility isn’t smart enough to be able to write all of these methods for me — or maybe I’m not smart enough to figure out how to do it.  We’ll see if I have a sudden epiphany in my sleep or in the shower that makes this possible.

If the array has only one dimension, passing it to ls.from will produce a simple ls.  If two dimensions, the result is an ls containing ls members that contain the members of the array.  Three dimensions adds another level of lists within lists.

In testing this feature, I tried converting an array of strings, as in Richard’s request.  But I found that the result was not equal to a list that I composed by appending the same alpha values directly to a new ls.  That’s because ls.append automatically boxes primitive alpha values as Var, not string – but ls.from keeps the original type of each element.

If you want the resulting list to contain Vars instead of strings, then you can map the result using the map method:

ls.from(array).map(new MapAlpha())

This works fine for a one-dimensional array, but we’ll have a problem with multi-dimensioned arrays:  the sublists will get converted to their string representation.  Not what we wanted.  So, I added a new Mapper:  MapDeep.  MapDeep takes a mapper as a constructor argument, but it doesn’t apply it to members that are also lists – instread, it recurses its deep mapping to each of that list’s members.  So, converting a multi-dimensioned array of strings to a nested list of Vars can be accomplished like so:

ls.from(array).map(new MapDeep(new MapAlpha()))

UPDATE 2009-09-13: I added support for real arrays of primitive types, but I ran into a compiler bug (tr#30719) so it only works in version 9.3 and above (the fix didn’t make it into the 9.2.1 beta). It also only supports one dimensional arrays, because I’m having further compiler trouble that I haven’t been able to narrow down yet. I also had to make a change for 9.3 compatibility to the Object.Equals override in both ls.dbl and var.dbl.

Posted in SynergyDE | No Comments » RSS 2.0 | Sphere it!

Singleton mixin for Synergy/DE

September 1st, 2009 1:03:16 pm pst by Sterling Camden

One of the most common (and perhaps most misused) design patterns is the Singleton:  a class that may have only one instance.  In many cases it’s a symptom of poor design – functioning as an object-oriented equivalent of globals, or merely serving to aggregate common functions.  But sometimes it is the right design – for instance, when it represents an encapsulated resource that can only exist once, like an embedded HTTP server, or the set of environment variables associated with the current process.

To make a class into a Singleton, you have to prevent it from being instantiated more than once (usually by making its constructor private), and provide a way to access the sole instance – creating it if needed.  You must also prevent the instance from being cloned.  That’s at least three (usually four) members that you have to write almost the same way for every Singleton class.  Wouldn’t it be nice to automatically apply that pattern to a class with just a few statements?  Sounds like a job for a mixin to me.

In the download below you will find an include module named Singleton.dbl, which provides the Singleton pattern to any class in which it is included.  Somewhere within your class definition, add:

.define SINGLETON_CLASS myclass
.include “Singleton”

Where myclass is the name of the class in which we’re including the module.  We have to define the class we’re in, because the Synergy/DE compiler does not provide that to us (it’s on my wish list) and we need it for the name of the constructor as well as for the type returned (to avoid casting).

The include generates the private constructor and static member to hold the instance, as well as the public static property “instance” that returns that instance (so to access the single object, use myclass.instance).  It also overrides Object.MemberwiseClone to throw a System.InvalidOperationException.

The generated constructor calls a member method “initialize” to initialize the single instance.  You must provide this method.  It takes no arguments, and does not need to return a value (any value returned will be ignored).  It can be private or public, though I’d recommend private because reinitializing the instance would be sort of counter to the spirit of the Singleton pattern.

Unlike some Singleton implementations, the instance is not created until the first access to the “instance” property.  That’s mainly because Synergy/DE classes do not have an initialization phase of their own – but it also provides the option of deferring allocation until (or unless) needed.  You should be aware that your “initialize” code will therefore be called on that first access, so should you need to up-front that processing simply access the “instance” property at the beginning.

The included test_singleton.dbl uses assertions to test the mechanism.  Instantiation via “new” is prevented at compile-time, so my test for that is commented out.

Posted in SynergyDE | No Comments » RSS 2.0 | Sphere it!

Ruby expression evaluator updated

August 31st, 2009 4:39:38 pm pst by Sterling Camden

It had been more than three and a half years since I looked at the code for my expression evaluator in Ruby, but a user named Xagyg ran into a problem with it.  He added a function “max” that worked fine for

max(1,2,3)
3

But didn’t like negative parameters:

max(-1,-2,-3)
-6

Huh?  It turned out that because I was only treating comma as a whitespace delimiter, the above expression was equivalent to:

max(-1 -2 -3)

So the first unary negation was interpreted correctly, but the rest were read as subtraction.

To cure this, I introduced a new type of syntactic element:  the Separator.  A separator simulates grouping using the same mechanism as for operator precedence – it creates implied parentheses.  Comma now automatically maps to a Separator (though you could remove it from the @operators hash if you don’t like that behavior).  So now:

max(-1,-2,-3)

is equivalent to:

max(-1 (-2) (-3))

But now you must avoid something like this:

max -1, -2, –3

Because the comma will force the evaluation of max before proceeding to the –2.

Revisiting this old code, I see a lot that I would write differently now.  It’s far too complex for the task it performs, even though I was amazed by its simplicity when I wrote it a few years ago.  It was, after all, my first serious piece of Ruby code.  Perhaps one day I’ll redo it right.

Posted in Ruby | No Comments » RSS 2.0 | Sphere it!

Lazy math added to Synergy/DE mappers

August 26th, 2009 12:11:53 pm pst by Sterling Camden

I mentioned in my last post on my ls class for Synergy/DE that it would be nice to include the mathematical operators in my support for lazy evaluation of mapping functors.  If you don’t understand what that means, perhaps you should go read that post on mapping and filtering Arraylists.

Version 1.5 of ls includes absolutely no changes to the ls class itself.  But it does include the following extensions to its functors:

  • MapVar – maps an object to a Var.  This isn’t particularly useful in itself – but it represents a jumping off point for what I needed to accomplish.  MapVar provides operators for +, –, *, and / that each return a derived class of MapVar to perform the arithmetic operation on each object being mapped, and return the result as a Var.
  • MapAlpha – this one existed before, but its now derived from MapVar, so you can use the same operators on it to do lazy string manipulation.
  • MapDec – maps the object to a decimal Var.  Also derived from MapVar.
  • MapInt – maps the object to an integer Var.  Also derived from MapVar.
  • CompareMap1 – a Compare functor that uses a MapObject to map the first object it receives before comparing it against the second one.
  • CompareMap2 – a Compare functor that uses a MapObject to map both of the objects it receives before comparing them.

What do these functors give you?  Well, for instance if you had an array of integers and you wanted to increment them all by 5, you could do:

myls.map$(new MapInt() + 5)

Or how about if you want to count the elements in a list that when multiplied by a discount percentage exceed a certain threshold?

num_over = myls.countif(new CompareMap1(new MapVarDec() * discount) > threshold)

Maybe you want to sort an array so that all null elements are at the end:

myls.sort$(new CompareMap2(new MapIf(new MapNull(), 1, 0), new CompareVar()))

In that last example, we’re mapping each object to a 1 or 0 depending on whether it’s null and then sorting on that mapped value using a CompareVar(), which will treat 1 as greater than 0.

That’s about all I want to do for lazy evaluators for now, other than to add some more operations to Var and MapVar:  modulo, bitwise operators, roots, powers, logarithms for numeric — and regular expressions for strings.

Posted in SynergyDE | 3 Comments » RSS 2.0 | Sphere it!

Categories now have feeds, thanks to a new WordPress plugin

August 25th, 2009 12:02:26 pm pst by Sterling Camden

image On this site I write examples in a variety of programming languages.  I’m sure that some of my WordPress (PHP) audience gets tired of hearing about Synergy/DE, Ruby, and C#.  Some Rubyist readers have commented that I should stick to writing about Ruby.  I experienced a wave of subscriptions from Lispers when I posted twice about Lisp, but they’re probably sick of all the non-Lisp posts I’ve written since then.  So, it’s time to give subscribers more choice.

Take a look over at the right sidebar (here, if you’re in a feed reader).  Scroll down below the ads and notice the new “Categories” widget.  You can now subscribe to any category by clicking the orange RSS icon beside its name.  Categories that contain other categories also include their content in the feed.  So, you can subscribe to specific programming languages, or the “WordPress” feed, or the “.NET” feed, or the “OPML” feed – depending on what you’re interested in reading.  Note that I often post under more than one category, so for instance if you subscribe to “PHP” you’ll get a superset of “WordPress” — because all WordPress posts are also under PHP – at least, so far.

How you, too, can display this glorious new Categories widget on your site

Surprise!  It’s really my trusty OPML Browser widget.  But it’s feeding from an OPML file that is automatically updated by a new WordPress plugin that you can download below.  This plugin generates an OPML file containing links to your categories and their feeds.  You can override the feed URLs (in case you want to feed them through FeedBurner), or let the plugin use the default feed URL.  You can generate the file on demand, but the plugin will automatically regenerate it any time you add, change, or delete a category.

Once you’ve got the Categories OPML plugin all set up, just add the OPML Browser widget to your sidebar and tell it where the file is located.  Make sure you have the latest version of the OPML Browser – the OPML for hierarchical categories posed a few challenges for the older version and required an update.

Posted in OPML, PHP, RSS, Site News, Web, WordPress | 4 Comments » RSS 2.0 | Sphere it!

OPML Browser for WordPress gets a new parser

August 24th, 2009 3:36:49 pm pst by Sterling Camden

I thought of a new use for my OPML Browser plugin, but the OPML file I intended to use with it threw me a couple of curve balls.

First of all, the file contained nested outline elements, with xmlURL attributes on the containing elements.  Until now, I displayed the feed icon if an item had a feed, and only displayed the folder icon if it didn’t (and it had child elements).  Because the only way to expand/collapse the tree is by clicking on the folder icon, I’ve changed this to display both icons in that case.

Second, my file nests the outline tags more than two levels deep.  I was using preg_match to parse the XML, with a non-greedy search for the end tag.  Can you guess what the problem was with that approach?  I’d get the end tag for an internal outline item when I was looking for the end of the outer item.  This only happened, of course, when an inner item was terminated by </outline>.

Rather than trying to get even fancier with my regular expressions, I decided to implement a real XML parser instead.  I took a look at SimpleXML, but it requires PHP5 and allow_fopen_url.  Then I looked at PHP’s built-in XML parser functions, but that’s event-based and I needed to wade through the document on my own terms.  So I implemented my own simple XML DOM parser, and included that with the plugin.

Posted in OPML, PHP, WordPress | No Comments » RSS 2.0 | Sphere it!

Simple XML DOM parser for PHP

August 24th, 2009 2:57:47 pm pst by Sterling Camden

image I realize that I’ve just added to the pile of reinvented wheels here, but I needed something simpler than SimpleXML (particularly in the requirements department) that nevertheless translated the event-based model of PHP’s built-in parser to a DOM model.

Thus, I created the very simple parser which you can download below.  It’s only 67 lines long.  It’s partially based on the example from wolfon, but simplified even further by adding an xmlElement class instead of trying to force all of the relationships into an array format.

To use:

require("xmlparse.php");

$dom = new xmlDOM($xml_text);   // Parses $xml_text

$dom->root is an xmlElement with a zero-length tagName whose children member is an array of the top-level elements (of which there should only be 1) .

$dom->find_tag($tagName, $from);

Returns the first xmlElement whose tagName matches $tagName, starting with the children of $from.  If $from is null, it starts from the root.  If not found, false is returned.

An xmlElement has the following members:

parent – the parent element
tagName – the tag name
attributes – an associative array of name => value
children – an array of child elements
data – any character data within the element

find_tag($tagName) – finds a descendant element with the specified tag name.  Returns false if none found.

Posted in PHP, XML | 2 Comments » RSS 2.0 | Sphere it!

Tee utility for Windows updated

August 18th, 2009 4:32:07 pm pst by Sterling Camden

How could something so simple turn into something so complicated?

One of my most popular (perhaps the most popular) post on this site is my tee utility for Windows.  I wrote it back in 2003 using the Visual Studio vintage of the same year.  Rather than including stdio.h and writing it in good old plain vanilla C code, I decided to employ the then-new managed extensions for C++ so it could <scare-quotes> benefit from </scare-quotes> the .NET Framework 1.1.  That’s where my troubles were begotten – but it took a while for them to gestate.

The first problem I encountered was trying to run it from a network share.  Microsoft, in a truly dramatic performance of security theater, decided to treat network shares as “Intranet Zone” and untrusted by default.  Sure, there were ways to establish a trusted zone, but it was a pain to remember.  Thankfully, they’ve solved that one for me.

Then recently, user Alex Armstrong noticed a few anomalies when using tee with ping.  He likes to use ping –t to repeat until interrupted, at which point ping prints out its statistics and then stops.  But when ping was piped to tee, Ctrl+C would interrupt tee instead – which closed the pipe and killed ping before it could relieve itself of its statistics.  Not only that, but since tee’s output files were not flushed, some of the output that had already been displayed on the console would not make it to the file(s).

I tried setting the Console property TreatControlCAsInput to true, but when the input stream is a pipe that caused tee to go into la-la land.  Reading further, I found the CancelKeyPress event – but for some reason I couldn’t get it to compile with the old syntax that I had used back in 2003.  With a little research, I found that the only reason why the compiler was tolerating most of my code was because the project upgrade wizard had added the /clr:oldSyntax switch when upgrading to VS2008.  When I removed that switch, all hell broke loose.  How could they change the syntax for writing managed code in C++ so drastically?  This little utility, which is only about 35 lines of real code, required changes on 15 of those lines.  Yes, they were mostly trivial – but how annoying.  It’s not the same language.  Yes, it’s probably better syntax, but it’s not the same language.

One change I would have liked to make is to use List<T>.ForEach to iterate through the list, rather than a for loop.  But I couldn’t find any way to write the Action code inline until lambdas arrive in VS2010.  Now that’s a syntax change I could get excited about.

Long story short, canceling the CancelKeyPress event (is that a double negative?) worked — Ctrl+C is now seen by ping, which prints out its statistics and closes the pipe, which stops tee normally.  If you should ever need to interrupt tee itself, use Ctrl+Break.

One more thing (as Columbo says) – Alex also noticed that ping’s output is double-spaced when piped through tee.  That’s because ping ends each line with a double carriage-return.  Console::ReadLine treats each of those as a terminator.  I don’t know why the command prompt window doesn’t have an issue with that, but I do.  I can’t just ignore zero-length lines, because then I’d lose spacing from other programs that don’t use a double carriage-return.  Any ideas?  This is supposed to be simple.

Posted in .NET, C and C++, Windows | 10 Comments » RSS 2.0 | Sphere it!

Better Tag Cloud