Contains coding, but not narcotic.

Managing simultaneous events in Synergy/DE UI Toolkit

January 14th, 2012 5:31:05 pm pst by Sterling Camden

synthesisI’ve written before (PDF) about the unique event model of the Synergy/DE UI Toolkit. For historical reasons, these events are often called “menu entries,” and even though they are handled in a synchronous, delayed manner you can’t have more than one such event pending at any given time. This didn’t cause too many problems back in the old days when they were almost limited to user-initiated events. But now that they’ve also become a standard means for converting asynchronous system-level events into synchronous Toolkit events, the need to generate more than one at a time has become something much more than academic.

Consider, for instance, an input field that needs to perform non-trivial processing at three points: when it is modified, when it is displayed, and when focus leaves the field. Each of these events can be hooked to a “method” routine, but if those routines need to do anything involving user interaction, then on Windows it needs to be deferred to the outer loop by means of passing an event to M_SIGNAL. Why? Because in an asychronous callback, the UI Toolkit can’t process Windows messages, so the user will not see any changes to the UI until after that event has been handled and the routine returns. Non-Windows platforms like Unix and OpenVMS don’t have that problem, but if your code is portable to Windows then you probably want to write it that way anyway.

Now, when the user types something in the field and then advances to the next field (via Tab, etc.), all three of those events will fire: first the “change” event, then the “display” event (to display the reformatted value after validation), then the “leave” event. All three associated methods will be called in that order. If each of them signals a menu entry event, then only the last one will be seen.

I’ve added a class to my Synthesis library to extend menu-entry signaling to maintain a queue of pending events. To make use of it, you replace M_SIGNAL with multi_signal. In the outer routine that consumes events, you change M_SIGNAL(…, D_REMOVE) (or testing g_select and g_entnam, after the older form), with a loop of calls to multi_remove, until it returns ^null. Each non-null return value will be a string containing an event that was signaled, in the order in which they were signaled.

To ease the transition, I’ve made multisignal somewhat compatible with M_SIGNAL. If the consuming code isn’t changed to the new paradigm, then it will still see the most recent event — whether it was passed to M_SIGNAL or multisignal.

See the test program (test_multisignal.dbl in the tests subdirectory) for an example of how to solve the scenario I related above.

With this feature, I’m putting a lid on Synthesis version 2.2.1. Any further changes will go in the next version.

BitBucket repository
tarball
Zip archive

Posted in .NET, SynergyDE, UI Toolkit, Unix, Windows | No Comments » RSS 2.0

Synergy/DE command line utility for printing labels

September 11th, 2011 5:13:24 pm pst by Sterling Camden

I recently started brewing beer again, after a 13 year hiatus. Now that my first batch has been in the bottle for a couple of weeks, and my second batch is fermenting, I decided I should label my bottles to keep the batches straight. I found some removable bottle labels at OnlineLabels.com, which I ordered and received.

Next came the question of how to print them? Back in the old days, I would painstakingly copy/paste/align each label in Microsoft Paint for every batch. There had to be an easier way.

I took a look at Microsoft Publisher, which appears to have the necessary options, but for some reason they just don’t work.

Then it occurred to me: why not write a little program using the Synergy/DE Windows Print API (which I helped design, after all)? I designed a command-line interface for specifying the origin, size, and spacing of the labels; as well as an image to print on them, optional outline (for testing alignment), and print preview. Implementation and testing took about an hour. ‘Tis a thing of beauty.

And so is the design for the labels for my first batch:

In case you’re wondering about the name — when I brewed my very first batch of beer in May of 1994, it was a nut brown ale. It turned out so good that I decided that was a good place to start for round 2. The good folks at Olympic Brewing Supplies provided an easy kit to get me back in the saddle. It isn’t my finest creation, but it’s a good bit nicer than anything you can buy in a store. Maybe I’m biased.

Anyway, I created a batch file to align for these labels, and I’ve included it as an example of usage. You still have to add the -image (or -outline for testing) and -n switches, but the spacing and sizing are all worked out for you.

Unfortunately, this program will only work on Windows platforms, because it makes use of the Windows Print API.

BitBucket repository
tarball
tarball

Posted in SynergyDE, Windows | No Comments » RSS 2.0

Oi: Object-integer mapping for Synergy/DE

March 28th, 2011 2:44:50 pm pst by Sterling Camden

synthesisOne of the most important features about the (new since version 9) object-oriented syntax in Synergy/DE is that it’s not a purist’s OOP. You can mix object-oriented and imperative programming at will. Thus, if you want to start using object-orientation in a decades-old application, you don’t have to rewrite a large portion of your existing code in order to do so.

However, not all existing code is well-suited to adding objects. In particular, existing routines that expect an argument of a primitive type (alpha, decimal, integer) cannot be passed an object instead. Since Synergy/DE supports boxing of primitives, you could go and change the type of each such parameter to object (@*), but then you’d have to go and change every place where those routines are called in order to box the existing primitive parameters. That could become a huge effort, and for the majority of cases it would add an unnecessary performance penalty.

In the case of the optional method data arguments passed through UI Toolkit routines to callback methods, Synergy/DE introduced a new parameter type, “any”. You can pass either an object or a primitive to an “any” parameter, but in a routine that accesses it in any way other than to pass it on, the parameter must be declared with a compatible type for the object passed. A good example of this would be a selection list that you’d like to load from an ls. In order to get a reference to the ls through to the load method, you can pass it as an optional argument to l_process. In the load method for that list, declare the corresponding argument as type @ls, and you can then use it.

But what if you do your list processing within a wrapper routine, so you don’t have control over the argument list? One way to attach data to a list is via the l_user routine, which stores or retrieves a record associated with the list for user-defined purposes. You can’t store an object (like an ls) in that record — Synergy/DE does not support that kind of serialization. What you need instead is a way to convert the object reference into a primitive type that you can then use later on to restore the reference. That’s the purpose of a new class I just added to the Synthesis library: Oi, which stands for Object-integer.

For any object, you can call Oi.o2i, passing the object, and retrieve a unique integer that only refers to that object. if you call it more than once for the same object, you’ll get the same integer (unless you drop it in between). To reverse the operation, pass the integer to Oi.i2o, and you’ll get back the object reference.

Naturally, this involves maintaining a reference to each object so registered within Oi. Thus, when you’re all done with an object, call Oi.drop and pass it either the object reference or its integer value, and that object will no longer be tracked.

Using Oi, you can therefore pass an integer around to any place that needs an object reference, without having to deal with the difficulties of passing an object. Obviously, a cleaner approach would be to change an API here or there, or store object references as properties of other objects. But the evolution of large systems doesn’t always lend itself to a proper redesign in order to get from point A to point B. Oi is intended to help you along your migration path, not to become a permanent solution or a recommended practice.

BitBucket repository
tarball
Zip archive

Posted in .NET, SynergyDE, Unix, Windows | 1 Comment » RSS 2.0

Synthesis 2.2: full compatibility for Unix, .NET, Synergy/DE 9.5

November 16th, 2010 1:33:35 pm pst by Sterling Camden

synthesis
Synergex will soon release Synergy/DE version 9.5, which includes a released version of Synergy for the .NET Framework. In preparation for this event, I have modified the Synthesis library to be fully compatible with the released version of 9.5. You can download it at the links below.

One late-breaking (double entendre alert) surprise in Synergy.NET involves class fields that are primitive .NET types, such as “int”. If class members having these types are not enclosed in a record (aka, “loose”), then you may not pass them as arguments to any of the built-in Synergy/DE subroutines or functions, because the runtime cannot pin them properly (resulting in all sorts of unpredictable memory access havoc). The easy fix for this is to enclose member fields within a member record. You can also use Synergy/DE types instead of .NET types (for instance, i4 instead of int), but it’s sometimes hard to tell what types the compiler will translate into .NET types. If you fail to catch one of these, the compiler should provide a “loose int” error to help you round up these promiscuous bitties.

Simultaneously, I purchased a Synergy/DE for Linux license and installed it on my FreeBSD 8.1-STABLE system using the Linux kernel module and the Fedora 10 port (ports/emulators/linux_base-f10). I still don’t have terminal input or termcap working right, but everything else seems to work, so that was enough for testing Synthesis. Rather than buying PVCS Configuration Builder for Unix, I translated the PVCS makefiles into standard Unix Makefiles, and those are now included in the distribution. I also made minor changes to the sources for compatibility with either Unix/Linux or make.

Third, I decided to move the archives for Synthesis to Mercurial on BitBucket and remove the PVCS archives from the distribution. This makes the distribution much smaller. If you need access to older versions, just contact me. I’ve also provided a tarball distribution in addition to the traditional ZIP format. Here are all those links:

BitBucket repository
tarball
Zip archive

Posted in .NET, SynergyDE, Unix, Windows | 1 Comment » RSS 2.0

Synthesis 2.1.4: split the difference

May 26th, 2010 10:12:32 am pst by Sterling Camden

Richard Barndt sent me a first pass implementation of a split method for Regex – that is, the ability to split a string at the location of an expression.  He was also nice enough to include several test cases using assertions.  I’ve adapted his syntax to my own preferences and included Regex.split in version 2.1.4 of Synthesis, which you can download below.

Whereas Richard’s version returns an ArrayList of strings, I elected to return an ls of Vars.  An ls can be used as an Arraylist, but it adds many useful methods.  Var is the normal way to store strings in an ls, for various reasons.  I also elected not to include the static Regex.split method – it seems needlessly repetitive, since you have to supply an expression anyway.  I did, however, add a split method to Var for convenience.

This implementation of split uses the GlobalSearch option (g) to determine whether to split at every occurrence of the expression or only the first one.  Captured groups within the expression are returned in the array as well at their respective positions – otherwise the match to the expression is not returned.  There’s also an option to include empty strings that result from the split, which is true by default.

Thanks, Richard, for the idea, the code, and the test cases!

download

Posted in Regexen, SynergyDE | 1 Comment » RSS 2.0

Changing the text and size of drill buttons in Synergy/DE

April 14th, 2010 10:16:53 am pst by Sterling Camden

Glynis Lyttle recently asked on the synergy-l mailing list about how to change the appearance of the drill buttons on Synergy/DE UI Toolkit input windows.  These are buttons that are placed to the right of an input field if it has a drill method (IDRILL_METHOD).  They’re always sized to the width of a vertical scroll bar, and contain the text “…” as show below:

image

Usually, these buttons are used to launch some sort of lookup function that will populate the field.  But Glynis has multiple uses for these buttons, and would like a way to indicate the distinction between launching the calendar control for a date field versus launching a new email to the address contained in an email address field.

The Synergy UI Toolkit does not provide a documented way to do this, but it can be achieved without resorting to any undocumented functionality.

The download below includes a new function named set_drill_text, which has the following syntax:

%set_drill_text(wndid, field, text) => success

where

  • wndid is the ID of the input window
  • field is either the name or the index of the input field whose drill button will be modified
  • text is the new text for the button
  • success is returned true if successful, false if not

The text of the button will be changed to the right-trimmed value of text, and the button will be resized appropriately.  The height of the button will not be changed, but its width will be set to the width of the text plus the width of two 3D edges as described by the current theme.  For example:

image

The function can fail for a number of reasons:

  • wndid is not a valid input window ID
  • field is not the name of a field in the window or a numeric value within the range of 1 to the number of fields
  • the drill button for the field has not yet been created.  Either I_FRAMES or I_INPUT must have been previously called for an input set that contains the field.
  • probably something else I haven’t thought of.

Note:  whenever the input set controls are recreated, the drill button text and size revert to their original form until you call this function again.  Input set controls are recreated whenever you switch the active input set for I_INPUT/I_FRAMES, or call I_INPFLD, as well as some cases of I_FLDMOD for that field.

It would be nice to be able to change the button to display an image instead of text.  Unfortunately, that’s quite a bit more involved and would require some non-Synergy/DE code (probably C or C++) to subclass the button’s WM_PAINT handler.  We’ll leave that as an exercise for another day.

download

Posted in SynergyDE, UI Toolkit, Windows | 3 Comments » RSS 2.0

JSON for Synergy/DE

March 18th, 2010 1:49:09 pm pst by Sterling Camden

In modern web applications, JavaScript Object Notation (JSON) has become a popular syntax for exchanging information between client and server, because (as you might guess from its name) it plays well with JavaScript.  Therefore, parsers and emitters for JSON exist for a wide variety of server-side programming languages.  Synergy/DE should be no different in this respect.

The download below includes classes for generating and parsing JSON in Synergy/DE.  The parser converts JSON into Synergy/DE objects, using a Mapper (or a DefaultMapper if none is provided) to control how that conversion takes place.  Conversely, the Mapper also determines how to convert Synergy/DE objects into JSON.  You can also combine the effects of different Mappers together, so you don’t have to reinvent anything.

I have added these classes to the Synthesis library, which is now at version 2.1.3.

download

Posted in AJAX, JSON, SynergyDE | 1 Comment » RSS 2.0

Memoization in Synergy/DE

March 8th, 2010 6:03:53 pm pst by Sterling Camden

Memoization is an optimization technique in which previously obtained results from a function are saved for quick retrieval rather than invoking the function again with the same arguments.  It’s a particularly useful strategy for recursive functions.  Take, for instance, the canonical Fibonacci function:

   1: function fib, d, reentrant

   2: D_INARG nth, i

   3: proc

   4:     using nth select

   5:     (0),

   6:       freturn 0

   7:     (1),

   8:       freturn 1

   9:     (),

  10:       freturn fib(nth-2) + fib(nth-1)

  11:     endusing

  12: end

Because the result for any value over 1 requires two recursion paths back to 1, each of which also branches, the complexity of this algorithm has nth as its exponent – meaning any moderately high value for nth will run for a very long time.  We can reduce that complexity by remembering previously computed values for nth, by using a new macro memoize:

   1: function memofib, d, reentrant

   2: D_INARG nth, i

   3: proc

   4:     using nth select

   5:     (0),

   6:       freturn 0

   7:     (1),

   8:       freturn 1

   9:     (),

  10:       begin

  11:         data n1, @*

  12:         data n2, @*

  13:

  14:         memoize(n1, memofib, nth - 2)

  15:         memoize(n2, memofib, nth - 1)

  16:

  17:         freturn (d)n1 + (d)n2

  18:       end

  19:     endusing

  20: end

Memoize assigns the result to its first argument (because I haven’t yet figured out how to provide a ternary IF in Synergy/DE that doesn’t evaluate both results).  If the function and argument have already been memoized, it returns that saved result.  Otherwise, it invokes the function with the argument, stores and returns the result.  This version computes memofib(35) approximately 1400 times faster than the first version.

There are restrictions on this macro, though.  The variable into which the result is returned must be typed object (@*), so you end up casting the result later on.  You can only supply one name for the function and one argument (which can be any type).  For the function name, you can use a static method as ‘Class.method’, or a member method as ‘obj.method’, but in the latter case the object variable must have the same name or memoization will not occur.

To get past these restrictions, you can use the underlying Memo class instead.  Multiple strings could be concatenated, and multiple objects (an instance, or multiple arguments, or both) could be joined in a list, because the object.Equals override for the ls class applies object.Equals to each list’s members.

We can rewrite the Fibonacci function using the Memo class directly and get a marginal performance improvement:

   1: function memofib2, d, reentrant

   2: D_INARG nth, i

   3: record

   4:     o   ,@*

   5:     rv  ,d28

   6: proc

   7:     if ((o = Memo.retrieve("memofib2", (object)nth)) != ^null)

   8:       freturn (d)o

   9:

  10:     using nth select

  11:     (0),

  12:       rv = 0

  13:     (1),

  14:       rv = 1

  15:     (),

  16:       rv = memofib2(nth-2) + memofib2(nth-1)

  17:     endusing

  18:     Memo.store((object)rv, "memofib2", (object)nth)

  19:     freturn rv

  20: end

As you can see, eschewing the macro means you have to do more casting yourself, as well as repeating the function name and argument.  But it allows you to separate the memoization from the function invocation.

I’ve included this class and macro in Synthesis 2.1.2, which you can download below.

download

Posted in SynergyDE | 1 Comment » RSS 2.0

Synthesis 2.1.1: a grab-bag of new features

February 22nd, 2010 2:46:37 pm pst by Sterling Camden

I lied.  I said that Regex was feature complete, and then I went and added a new feature:  the escape method, which you can use to produce a copy of a string with all Regex special characters escaped.  You can then turn around and include that in a regular expression and it will match the original string.  This idea came from Ruby.

I also improved Regex performance in two areas: compilation should be much faster, and I was able to optimize matching for an expression that begins with an unconditional \G (start from end of last match).  This optimization is very similar to the one for an initial ^ or \A.  Whenever you can anchor the beginning of the expression, you should see much better performance.

This version includes a couple of new classes: Random (pseudorandom number generator) and CommandLine (command line parser).  I used the former to replace previous calls to RANDM, and I used the latter to extend the command line syntax for the htmlcover utility so you can specify more than one execution sample per run.

I also added reduce and reduceRight methods to ls, similar to reduce in Lisp, JavaScript, Perl, Python, R, and Ruby; also known as inject in Ruby and Smalltalk; or fold in Erlang, Haskell, Maple, Mathematica, Mythryl, OCaml, Scala, and Scheme.  Because Synergy/DE doesn’t support anonymous functions yet, that necessitated the addition of a new functor class: Reducer.

Posted in Regexen, SynergyDE | 1 Comment » RSS 2.0

Synthesis 2.1.0: Regex rules

February 17th, 2010 10:33:38 am pst by Sterling Camden

I’ve just released version 2.1 of my Synthesis library for Synergy/DE.  I bumped the version number to 2.1 because I now consider the Regex class to be feature-complete.  Here’s the change log, but I’ll discuss some of the new features in more detail.

Replace gets $ tokens

Adding $ tokens to the replace brought several advantages, and only one minor disadvantage.  The disadvantage is that now you may have to escape a $ in your replacement string ($$ or \$ will work for that).  However, if the $ is not followed by a recognized escape token, then the dollar remains a dollar and does not need to be escaped.  The advantages to the new syntax include:

  • You can now reference a group whose number is greater than 9:  e.g., ${12}
  • The dollar syntax is familiar to users of Perl, PHP, JavaScript, Java, and even .NET regular expressions.
  • The $ syntax for named groups is more intuitive: ${group}

\G for end of last match

This feature works differently across various regex flavors, and mine is no exception to the exception.  Because there isn’t any way to attach data directly to a string in Synergy/DE, or even to get a unique value for a string instance, \G is not string-specific.  It is Regex-specific, which means that if you change strings on the same Regex then \G refers to the last match on the other string.  But, you can override that via the ContinueFrom public member.  Conversely, if you want to use a new Regex on the same string but pick up the last match from your previous Regex, then you just need to copy the ContinueFrom value from the first Regex to the second one.

Duplicate group names

In earlier versions, duplicate named groups caused a Regex compilation error.  I’ve decided that this is one thing that .NET does right, and I now allow duplicate groups to capture into the same back-reference.  That lets you have the same group name in alternate paths within the Regex, and you can capture either one into a specified group.  For instance:

R$('/(?P<num>\d+)A|B(?P<num>\d+)/')

will match one or more digits followed by A, or B followed by one or more digits — and will capture the numbers from either one into the named group "num".

.NET named groups

Speaking of .NET, I’ve added support for the .NET syntax for capturing and back-referencing named groups — both flavors (angle brackets or single quotes).  Additionally, I’ve emulated the arguably insane numbering scheme used by .NET for named groups – but only when the .NET syntax for group capture is used.  I decided to follow JGSoft’s implementation for that – not because I like the .NET behavior (I don’t), nor because I enjoyed the extra work (OK, maybe I did), but rather because this insures that you can copy/paste an expression from a .NET source and use it without worrying about obscure differences in behavior.  If you mix in Python-styled named groups (which you can do), they will be numbered according to where they occur, but .NET-style named groups are numbered after all other groups in the order they occurred.

Numbering of named groups

Of course, where duplicate named groups occur, they are numbered according to the number of the first such group, and that number is determined by its syntax.  So, to try to create an intelligible rule for group numbering:

  • unnamed groups and Python-style groups are numbered left-to-right, according to where their left parenthesis occurs
  • .NET named groups are numbered afterwards, according to where their left parenthesis occurs
  • EXCEPT named groups that share names adopt the number determined for the first such group according to the rules above
  • EXCEPT if the ExplicitCapture option is true, then unnamed groups aren’t numbered at all.

POSIX character classes

I added two flavors of support for POSIX character classes:  the usual [:classname:] syntax (which must be inside a [] character class), and the Java \p{Classname} syntax, which can be either inside our outside a character class.  Even though both flavors are standardized as case-sensitive and differ on their case requirements, I have elected to make them case-insensitive for a few reasons:

  • It’s more forgiving
  • None of the names collide without case-sensitivity
  • It was easier for me to implement

Check out the details here.

Posted in Regexen, SynergyDE | 2 Comments » RSS 2.0

Synthesis 2.0.4: Regex reaches adulthood

February 12th, 2010 4:34:57 pm pst by Sterling Camden

Earlier today I released version 2.0.4 of the Synthesis library for Synergy/DE.  This version rounds out most of the important features for Regex, along with a major performance enhancement for the Regex compilation phase and another one for ls, and a few bug fixes.  Take a look at the change log for details.

I’ve added a few more items to the Regex to-do list, each of which addresses wider compatibility with expressions used by other engines.  My goal is to make it possible for you to copy and paste a regular expression intended for any engine and use it as-is.

However, I can’t implement everything that anyone has included in regex syntax.  Some variations contradict more generally accepted features, and others don’t work with my implementation strategy, or with how Synergy/DE stores strings.  Thus, I have added a not-to-do list – which I like to call my (?!) list.  Each item in that list sports an explanation for why I don’t intend to implement it.  But this list is not immutable, so if it contains something that you cherish or I’ve missed some feature altogether, please leave a comment here.

For my feature reference, I used the excellent pages at regular-expressions.info.  I’d recommend it for anyone who wants to learn more about using regexen, or to write their own parser.

Posted in Regexen, SynergyDE | 1 Comment » RSS 2.0

Synthesis 2.0.3: Regex rocks

February 2nd, 2010 9:56:15 am pst by Sterling Camden

Synthesis version 2.0.3, which you can download below, includes numerous beneficial (I hope) changes to Regex.  I have nearly filled out all of the features that I want to add to this class (I’ve only left a few to-dos on the list).  What I added in this version can be separated into new syntax and performance enhancements.

Syntax extensions

  • Similarly to *, +, or ?, you can now also use {n}, where n is a number of iterations.  Or, you can specify {n,m}, where n is the minimum and m is the maximum.  You can also omit one of these by keeping the comma.  Thus, {3,} means “at least 3, no maximum” and {,8} means “up to 8, no minimum”.  For the implementation, I considered replicating the state machine the appropriate number of times with transitions through or around them, but it occurred to me that you could easily create a monster with something like /(a{,1000}b){,2000}/.  Instead, I created a new facility to determine whether a transition is enabled within the current context.  This proved very valuable, and made the implementation of word boundary checking (see below) dead simple.
  • For *, +, ?, and {}, you can immediately follow them with a ? to specify a non-greedy search (the default has always been greedy).  For this, I added a scoring mechanism that adds to or subtracts from a match’s score for each repeating character, depending on the greediness mode of the current context.
  • Like Perl, Ruby, et al I added \A as a ^ equivalent.  It anchors to the beginning of the text, but unlike ^ does not anchor following a newline when in Multiline mode.  Likewise, \z anchors to the very end without regard for newlines.  \Z (uppercase) anchors to the end of text, but if the last character is a newline it anchors to the position of the newline (regardless of Multiline mode).
  • \b now anchors to a word boundary.  To use \b for backspace instead, you must enclose it in a character class – e.g., [\b].  \B can be used to match a word non-boundary.
  • \a matches a bell character (\x07), and \e matches escape (\x1b).
  • When not in a character class, \Q specifies that all characters up to the next \E will be treated as literals.
  • We now have named group support, a la Python:  (?P<group>) captures a sub-expression named “group”, (?P=group) can be used to back-reference it within the expression, and \g<group> can be used in the substitution string of replace to insert its last captured content.
  • I added character class subtraction even though very few regex parsers include it, just because it’s so cool.  For instance, [a-z-[aeiou]] matches all lowercase consonants.

Performance enhancements

The above syntax extensions added a huge number of tests, but I wanted to make sure I covered the code well.  Unfortunately, all that new code meant that running htmlcover on test_regex would run for days.  Since CodeCoverage makes ample use of Regex, improving the performance of Regex itself would doubly benefit the elapsed time needed to test Regex testing.  I implemented the following optimizations:

  • If an expression begins with an unconditional \A or ^, then we don’t need to test the initial state of the machine on any part of the string except the beginning of a line.  In the case of \A, or when using ^ with Multiline off, we can also give up as soon as the number of remaining states equals zero (because we won’t add any more).
  • In previous versions, we always gathered all possible matches first and then picked the best one based on position, greediness score, and length.  But since we process the string left to right and the highest priority goes to leftmost position, we can eliminate some possibilities as soon as we accept any match.  We no longer need to test each remaining character against the initial machine state, and if the number of remaining states falls to zero at any point then we can take the best one so far, if we have one.

Obviously, the optimizations listed above have greater or lesser benefit depending on the expression and the input it is processing.  The best cases reduce matching from having to walk the entire string to exiting after the first character.  The worst cases impose a mild penalty for trying to optimize something that doesn’t benefit at all.

CodeCoverage performance

Based on these observations, I was able to tune the Regexen I use in CodeCoverage to boost performance even more.  First of all, rather than jumping right into a Regex match, I compare the initial part of each debugger line literally for strings that could be what I’m seeking, then use a Regex to parse out the file name and line number.  Adding a ^ anchor to each of those expressions prevents the overhead of the Regex engine trying to restart its state machine on each subsequent character.

I also replaced the associative lists in CodeCoverage with Hashes.  That boosts performance on looking up the source file and line number associations in the Map, which happens many thousands of times during code sampling.

With all of those changes, I am now able to run coverage on test_regex in under 12 hours.  So it’s once again something I can kick off at the end of the day and see the results in the morning.  While that may still seem long, remember that Regex is a very complex case, involving lots of in-memory operations (so lots of statements to step through), and that test_regex tests all of the pathological cases that I could imagine.  One test in particular (matching /(a*b*)*/ against a string of a’s and b’s, took a few hours all by itself when sampling, though in normal execution it finishes in a few seconds.

CodeCoverage also now provides a mechanism for getting feedback on its progress, and I’ve modified htmlcover to take advantage of that.  During the Sample phase, it reports on each new line number within the main routine, so you can get an idea of where it is in the overall process.

I corrected a few bugs in this version as well.  Please see the change log for details.

Posted in Regexen, SynergyDE | 1 Comment » RSS 2.0

Synthesis 2.0.2: more improvements to regexen, lists, and code coverage

January 20th, 2010 11:34:12 am pst by Sterling Camden

I have just released version 2.0.2 of the Synthesis library for Synergy/DE 9.3 and above.  You can download the library sources using the link at the bottom of this page.  This version includes a few minor corrections, and the following enhancements:

Regular Expressions (Regex class)

You can now back-reference a captured group within the pattern itself.  This is particularly useful for matching pairs of items, like quotes or HTML tags.  For instance, R$("/(['""]).*\1/") matches a quoted string, either in single or double quotes, but not mixed. Note that the duplicate " in the character class is merely used to insert a quote into a quoted string in Synergy/DE.  For repeating groups, the back-reference uses the most recently encountered value.  If a reference is made to a group that has not yet been captured, then a zero-length string is used.

If you want to using grouping but don’t want it numbered among potential back-references, you can now enclose it within (?: ). That can be useful when you need more than 9 groupings, because you can’t have more than 9 back-references (at least, not until I implement named groups).

You can now insert a comment in your regex even if you aren’t using Extended syntax mode, by enclosing it within (?# ).

Lists (ls class)

While tuning a QuickSort algorithm in C for a client, it occurred to me to try some of the same optimizations on the quicksort method of ls.  Changing the algorithm to use the “collapse the walls” approach and falling back to an insertion sort when the list contains fewer than five elements both improved performance significantly.  I experimented with trying to optimize the pivot value using both a median of 3 and a pseudomedian of 9, but neither of those approaches yielded any benefit.  I think that the overhead of invoking the compare test generally outweighs the benefit of using a better pivot value, extreme cases excepted.

Then it occurred to me to take another look at my mergesort method to see if that could be tuned at all.  Sure enough, I figured out how to cut down the number of copy operations by about half.  This improved performance dramatically (as you might expect), and made mergesort faster than quicksort when the list contains about 100,000 elements or more (down from 900,000 in the previous version, even with the improvements to quicksort).  Below 100,000 elements, quicksort still wins consistently.

So, I decided to make the sort method into a hybrid.  It chooses quicksort for lists of less than 100,000 items.  For more than that, it starts out as a mergesort, but for any segment that contains less than 100,000 items it switches to quicksort.  Since quicksort switches to Insertion sort on less than 5 items, we get a combination of three different sort algorithms, depending on the size of the set being sorted.  This hybrid performs comparably to quicksort up to 100,000 (as you would expect), but beyond that point it outperforms both mergesort and quicksort by taking advantage of the strengths of both approaches.

Of course, MergeSort and QuickSort bring other considerations to the table besides performance.  MergeSort uses O(N) additional memory, while QuickSort does everything in place.  However, MergeSort is stable (meaning, the order of duplicates is preserved), while my new quicksort algorithm is not.  Since the sort method combines these approaches, it is neither stable nor lean on memory, but it performs best.  So, depending on your needs, you can choose which sort to use:

  • Fast:  sort
  • Lean: quicksort
  • Stable:  mergesort

Code coverage

I improved the performance of the CodeCoverage class by caching the source file object, rather than always looking it up in the associative list.  That benefits sampling of executables that stay in one source file for a number of lines at a time.  I also improved the code coverage for most of the test routines.

Posted in Regexen, SynergyDE | No Comments » RSS 2.0

Synthesis 2.0.1: chasing bugs out of the woodwork with coverage analysis

December 31st, 2009 2:50:45 pm pst by Sterling Camden

I just released version 2.0.1 of my Synthesis library for Synergy/DE, which you can download below.

As you can see from the change log, this version grew mostly from the application of my tools for analyzing code coverage to my own creations.  By fleshing out my own unit tests to nearly 100% coverage, I shook out all nine corrections listed for this version.  So you can see from my own example that achieving 100% test coverage is not merely an academic exercise.  Code always looks more solid when it’s never executed.

These tests also revealed some areas in which my code coverage analysis needed improvement.  The six enhancements listed for the CodeCoverage class all came from those observations.

In fact, this version includes only one change that did not spring from my automated navel-gazing:  the addition of the ‘x’ option to Regex.  This, of course, allows white space, line breaks, and comments within a regular expression.  My other enhancement to Regex (optimizing out unnecessary code) was a result of code coverage analysis — because otherwise I might never have determined that the code removed was not necessary.

My code coverage only suffers from one major flaw, and that is not its own:  the debugger (on which it relies for sampling) sometimes exits prematurely when stepping through code (Synergex tr#30947).  You can tell when this happens by examining the coverage report for your mainline to see where coverage stops.  Unfortunately, one of the places where it regularly happens is within U_START, so analyzing an application that uses the Synergy UI Toolkit shall be an exercise in futility until such time as that tracker shall have been resolved.

However, I would highly recommend using this utility to analyze test coverage for code that doesn’t rely on the Toolkit – as well as for Toolkit code once that debugger bug is fixed.  It’s certainly pointed out some oversights in my code.

I have a few ideas on how to enhance this further:

  • Currently, the command line tools only include sampling from one executable.  I’d like to add the ability to sample multiple runs, combining their statistics.  That way, you could sample a whole test suite and see how well your code is covered by it.
  • Unreachable code is currently included in the analysis.  For instance, code that’s .ifdef’d out, or an mreturn at the end of a method that always throws an exception (the mreturn being required by the compiler).  I’m thinking of creating a way to tag code as “not to be included in coverage analysis”.
  • Once analyzing Toolkit code becomes possible, I can foresee sampling taking a lot longer than it needs to, because most users wouldn’t want to include the internals of Toolkit routines in the analysis.  I’d like to create a way to specify the source files that you want to include in sampling, so that the sampler could simply skip over the rest.

Anything else that you’d like to see here?

Posted in Regexen, SynergyDE, UI Toolkit | 1 Comment » RSS 2.0

Synthesis 2.0: code coverage analysis, .NET compatibility, and more

December 26th, 2009 3:08:06 pm pst by Sterling Camden

I can’t believe that it’s been more than two months since I unleashed the Synthesis library on the unsuspecting Synergy/DE world.  I promised then to enhance it further, so herewith I announce the arrival of Synthesis 2.0, which you can download via the link below.  I bumped the primary version number because this version involves numerous changes throughout, and some significant new features.  See the full change log here.

Synergy/DE Version 9.3.1

Synthesis 2.0 is fully compatible with Synergy/DE version 9.3.1.  Unfortunately, I was unable to maintain compatibility with earlier versions of Synergy/DE, so don’t upgrade Synthesis until you’re ready to upgrade Synergy/DE as well.

Synergy.NET beta

Synergy/DE 9.3.1 also includes a beta version of Synergy.NET (Synergy/DE for the .NET Framework) on 32-bit Windows platforms.  Synthesis 2.0 includes limited compatibility with this version as well.  See the Known Issues for what’s missing.  You can also search the code for DOTNET_BETA_WORKAROUND to see the code that I expect to support in the future versus what I’m doing now.

Code Coverage analysis

Synthesis 2.0 includes two new utilities for analyzing code coverage:  cover and htmlcover.  These are command-line utilities that can be run with either dbr or dbs.  The only difference between them is the format of their output – cover writes tab-delimited data to stdout, while htmlcover writes an HTML-formatted report to a file.

Both utilities sample code execution by driving a debug session of the executable and stepping a line at a time through the code.  For this purpose, they use the Synergy/DE remote debugger, which operates over Telnet.  Thus, I added classes to Synthesis to support the Telnet protocol and sockets.

You can sample the code coverage for any Synergy/DE executable (except for .NET, because the Telnet debugger isn’t supported there).  I’ve added a ‘coverage’ dependency to the makefile in the tests directory to generate HTML reports for each of the Synthesis tests into the coverage subdirectory.  I didn’t include those in the download because of their size, but you can generate them with a ‘build coverage’ in the tests directory, if you have PVCS Configuration Builder and you’re patient enough to wait for them (especially test_regex).  They’ve already revealed some areas where I need to beef up my tests.

For more information about how this works, see the CodeCoverage class.

Bug fixes

In several places, the comparison operators used for alphanumeric did not distinguish between strings of unequal length that match up to the lesser length (e.g., “bug” == “bugger”).  That’s been corrected.

I fixed two issues related to Regular Expressions:

  1. You may now enclose a begin-line or end-line anchor within parentheses.  Previously, they could only occur as the very first or last character in the pattern, respectively.  Now, for example,
    R$('/(^front|or_not)/')

    will correctly find “front” at the beginning of the line, or “or_not” anywhere.  Likewise for a $ anchor.

  2. Certain complex nested sub-expressions had a bug that prevented them from ever matching.

For the future

I still have much more to add to this library.  Here are some of the things on my list:

  • Improved .NET compatibility
  • Additional Regex features (full Perl 5 syntax)
  • More methods for ls (inject/fold, etc.)
  • Prevent multiple evaluation of assertion arguments when formatting a message
  • … and more.

Is there anything you’d like to see added?

Posted in .NET, OpenVMS, Regexen, SynergyDE, Unix, Windows | 1 Comment » RSS 2.0

A synthesis of new capabilities in Synergy/DE version 9

October 6th, 2009 12:43:46 pm pst by Sterling Camden

As I threatened in my last post, I’ve combined a lot of my Synergy/DE version 9 work into a single library, which I have named “synthesis”.  Why synthesis?

  • It’s a synthesis of most of my explorations of what I can do with Synergy/DE version 9.
  • The programming model is a synthesis of object-oriented and functional programming.
  • The terminology synthesizes vocabulary from Lisp, Ruby, C++, .NET, and historical Synergy/DE.
  • It needed to start with "syn" to fit my tagging conventions here on ChipsTips.
  • It embodies my “thesis” on the future of Synergy/DE programming.

Included in the download below are all the sources, mixins, tests, documentation, and PVCS archives for the synthesis library.  You can also access the documentation online here.  As usual, this content is published under the Open Works License, so you’re free to pull it apart, modify it, and reuse it in any application, commercial or personal, open or closed source – but please retain credit for the author (and I’d appreciate a link back here).

In the root directory you will find setenv.sh and setenv.bat.  These are scripts to setup the environment variables used for building the library.  setenv.sh is a *nix shell script, and setenv.bat is a Windows batch file.  If you don’t want to use the PVCS archives, also set NOVCS=1 in your environment.  Next, you’ll see the following directories:

  • sources – here’s where you build the library, which ends up in the main directory.
  • mixins – include files for mixins
  • prototypes – you’ll need to create this directory to hold the dbh files
  • tests – build the unit tests here
  • documentation – full documentation, starting at index.html

The build assumes you have PVCS Configuration Builder, so if you don’t you will need to translate makefile.mak in the sources and tests directories to whatever build script you prefer.

Enhancements

One of my reasons for combining these modules into a single library was so I could add more enhancements without duplicating effort.  Here are a few that I have added already:

I’m just getting warmed up, folks.

Posted in SynergyDE | 1 Comment » RSS 2.0

Regular Expression parser for Synergy/DE

September 26th, 2009 5:06:51 pm pst by Sterling Camden

Yes, Richard, I did it – I created another problem for everyone.  I’ve implemented a regex parser for Synergy/DE.  And because it’s written in Synergy/DE (as opposed to say, a C DLL), it’s portable between any number of Unices, Linux, Windows, and even OpenVMS – as long as you’re running Synergy/DE version 9.1.5 or above.

It performs pretty well, too – mostly because I started with an excellent (if limited) deterministic algorithm from Amer Gerzic.  I must admit that extending this algorithm to track parenthesized sub-expressions was one of the greatest programming challenges I’ve faced in quite some time.  It forced me to make the engine not quite as deterministic (there can now be more than one active state for each potential start of a match), but I completely avoided back-tracking – so performance suffers very little.  I was also able to optimize for the case in which no sub-expressions were compiled.

The breakthrough occurred to me while I was waiting in the doctor’s office for a colonoscopy – during which they would administer the Milk of Amnesia to make me forget the procedure.  I was so worried that it would wash away my brilliant algorithm that I asked my wife for a piece of paper so I could jot down some notes.  Fortunately, though, the anesthetic seemed to have only a limited effect (I was awake throughout the procedure – ewwww) and I remembered everything about my algorithm afterwards.

The trick was to abstract the notion of inputs beyond just characters received.  My implementation includes a transition for entering and exiting a potential sub-expression.  Whenever we can move in such a direction, we create a fork in the road for our deterministic engine – but rather than trying them one at a time, we try them all at once.  Each character thereafter collects the new states for each path we’re on until they dead-end or reach an accepting state.

I used a similar approach for handling line anchors.  Character classes also have more abstract transitions than just the characters they include, which reduces the number of paths we have to maintain in the state table — though it can increase the number of paths we’re following simultaneously.

Any way, it works, it’s fast, and I’m happy.  You can find full documentation here, and download the sources below.

I also added “match” and “replace” methods to Var, created mappers MapMatched(regex) and MapReplace(regex, with), and added some more useful methods to ls – all of which are included below.  I haven’t documented these or updated the official downloads for Var and ls yet, because it will involve adding everything from Regex.

Speaking of which, I’m thinking of consolidating the downloads for all of my Synergy/DE version 9 stuff and organizing it into a single library with a single include file and a right proper makefile.  What do you Synergistas think of that plan?

OK, I’ve given you regexen, now go save the day.

UPDATE: now merged into synthesis library.

Posted in Regexen, SynergyDE, Wildly popular | No Comments » RSS 2.0

Synergy/DE ls class gets set

September 11th, 2009 10:09:35 am pst by Sterling Camden

In a continuing project, I needed a Set data structure.  Sets are similar to lists, but with the additional constraint that they do not contain duplicated elements.  I considered creating a new Set class that would impose that constraint, but decided instead to follow the guiding principle of empowering users rather than protecting them from themselves.  Thus, I added a few methods to the ls class to support set-like behavior, while not preventing you from doing decidedly anti-setlike things with it if you so choose.  If you want to shoot yourself in the foot, be my guest – just don’t aim my way.

The first Set characteristic, adding members only if they aren’t already members, is essentially a union of the original set with the set of members being added.  So I extended the union method in two ways:  (1) it can now accept a non-arraylist object as an argument, in which case it will treat that as a list of one member, and (2) I added a union$ method that modifies the list in-place.  So the equivalent of the theoretical Set.add(object) is ls.union$(object).

Conversely, you also want to be able to remove members of a set without knowing their indices.  I added the methods remove and remove$, which take an object as argument and remove all occurrences of that object (even though when a list is used as a set, there should be no more than one).  Note that if the argument is a list, it will still be treated as an object by these methods, so you can remove an element that is itself a list.  To achieve set subtraction, I added the methods subtract and subtract$, which remove all the elements of one list from another.  I then provided a minus (-) operator, which invokes subtract for a list argument and remove for any other argument.

Next, I added some idiomatic Set language.  Method contains returns true if an object is a member of a list.  Like remove, it makes no distinction between lists and other objects.  For that distinction, I added methods issubsetof and issupersetof, which determine whether all the elements of one list are contained in another. The override for Object.Equals already provided a robust test for set equality.

Now I’m all set to get to work!

UPDATE: Merged into synthesis library.

Posted in SynergyDE | No Comments » RSS 2.0

Multimap support for Synergy/DE ls class

September 10th, 2009 12:54:06 pm pst by Sterling Camden

I’m working on a project in which I needed a Multimap data structure – that is, an association of key values with zero or more objects for each key.  Both the Hash and the alist methods of ls are based on the model of zero or one value per key.  But because ls implements the key/value pair as a sublist, it could be naturally extended to support more than one value.  That is what I have done.

I added a new method, keyadd, which adds a value to a key without removing any previous associations (as keyset and the indexer do).

The method keyget already returns multiple values as an ls, but if there is only one associated value then just that object is returned.  If there is no association, keyget returns null.  So if you’re expecting multiple values, you would need to check for null, then check whether the returned object is a list, then access its members.  To shortcut all that, I added a method keygetl, which always returns a list whether it contains many, one, or no elements.

I also noticed in my testing that using new ls() to create an empty list is not always syntactically convenient.  You can’t use new in every context where you can use a method call.  So I added a static method empty, which simply creates and returns an empty ls.

Posted in SynergyDE | No Comments » RSS 2.0

Synergy/DE ls class gets array conversion and deep mapping

September 5th, 2009 5:22:50 pm pst by Sterling Camden

Based on a request from Richard Barndt, I’ve added the ability to convert a System.Array (aka array@class) to an ls.  I added three new overloads to the from method, so you can convert arrays of up to three dimensions.  I’ll add more if there’s demand, but I hate repeating myself, and the Synergy/DE argument type matching doesn’t provide a way to define an argument as an array without specifying its dimensions – so I need one method for each number of possible dimensions.  The Synergy/DE macro facility isn’t smart enough to be able to write all of these methods for me — or maybe I’m not smart enough to figure out how to do it.  We’ll see if I have a sudden epiphany in my sleep or in the shower that makes this possible.

If the array has only one dimension, passing it to ls.from will produce a simple ls.  If two dimensions, the result is an ls containing ls members that contain the members of the array.  Three dimensions adds another level of lists within lists.

In testing this feature, I tried converting an array of strings, as in Richard’s request.  But I found that the result was not equal to a list that I composed by appending the same alpha values directly to a new ls.  That’s because ls.append automatically boxes primitive alpha values as Var, not string – but ls.from keeps the original type of each element.

If you want the resulting list to contain Vars instead of strings, then you can map the result using the map method:

ls.from(array).map(new MapAlpha())

This works fine for a one-dimensional array, but we’ll have a problem with multi-dimensioned arrays:  the sublists will get converted to their string representation.  Not what we wanted.  So, I added a new Mapper:  MapDeep.  MapDeep takes a mapper as a constructor argument, but it doesn’t apply it to members that are also lists – instread, it recurses its deep mapping to each of that list’s members.  So, converting a multi-dimensioned array of strings to a nested list of Vars can be accomplished like so:

ls.from(array).map(new MapDeep(new MapAlpha()))

UPDATE 2009-09-13: I added support for real arrays of primitive types, but I ran into a compiler bug (tr#30719) so it only works in version 9.3 and above (the fix didn’t make it into the 9.2.1 beta). It also only supports one dimensional arrays, because I’m having further compiler trouble that I haven’t been able to narrow down yet. I also had to make a change for 9.3 compatibility to the Object.Equals override in both ls.dbl and var.dbl.

Posted in SynergyDE | No Comments » RSS 2.0