The download below contains code for evaluating expressions in a string. While we could simply use the eval function, in a typical application that would be unsafe. Users could key in function or variable references that might affect the code of the application itself. Besides, your application will likely want to do something with the contents of variables that get assigned along the way. So we handle this by evaluating the expression in code, and we let the client application pass us a code block that evaluates and stores values associated with identifiers. In the download, calc.rb contains all of the reusable classes and modules, while testcalc.rb is an example client application that accepts an expression and writes out the result in a loop.
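To make the idea of a client-supplied code block concrete, here is a minimal sketch. It is not the code in calc.rb: the class name IdentifierDemo, the two-argument block signature, and the default of 0 for unknown identifiers are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the evaluator: it knows nothing about storage
# and asks the client's block whenever it meets an identifier.
class IdentifierDemo
  def initialize(&lookup)
    @lookup = lookup
  end

  # Read an identifier: no new value is offered to the block.
  def read(name)
    @lookup.call(name, nil)
  end

  # Assign to an identifier: the block stores the value and returns it.
  def assign(name, value)
    @lookup.call(name, value)
  end
end

# The client owns the variable storage and decides what to do with it.
variables = {}
demo = IdentifierDemo.new do |name, value|
  variables[name] = value unless value.nil?  # optionally store a value
  variables.fetch(name, 0)                   # always return one (0 is an assumed default)
end

demo.assign("x", 5)
puts demo.read("x")   # prints 5
puts demo.read("y")   # prints 0
```

Because the storage lives in the client's Hash, the application can inspect or reuse the assigned values after evaluation, which a bare eval would not give you safely.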
All of the operators, including assignment and grouping (e.g., parentheses), can be defined by the client application. Thus, it is possible to write a purely arithmetic expression processor, or a purely Boolean one, or some other type I haven’t imagined yet. The example in testcalc.rb combines arithmetic and Boolean support.
Some syntactic elements are assumed. Space and comma can be used to separate tokens, but are not required unless you have two literals or two identifiers next to each other. Literal numeric values are automatically processed. If one contains a “.”, it is presumed to be a floating point literal; otherwise it is treated as an integer. That distinction can affect arithmetic: with Ruby’s native operators, for example, 7 / 2 yields 3 while 7 / 2.0 yields 3.5. Any run of adjacent word characters is first checked against the list of registered operators. If it is a defined operator (or function), it is used as such. Otherwise, it is taken to be an identifier, which is then passed to the code block supplied by the client application in order to optionally store a value and always return one. A single non-word (and non-numeric) character is presumed to be an operator. If no operator for that symbol is registered, then an exception is raised.
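The tokenizing rules above can be sketched in a few lines. This is a simplified illustration and not the scanner in calc.rb: the OPERATORS list, the tokenize method, and the choice of ArgumentError are assumptions.

```ruby
# Hypothetical operator registry; in the real library the client registers these.
OPERATORS = %w[+ - * / = ( ) sum].freeze

# Split an expression into [type, value] pairs following the rules in the text:
# spaces and commas separate tokens, digits form numeric literals, runs of word
# characters are operators or identifiers, and any other single character must
# be a registered operator.
def tokenize(expression)
  expression.scan(/\d+\.\d+|\d+|\w+|[^\s\w,]/).map do |text|
    case text
    when /\A\d+\.\d+\z/ then [:float, text.to_f]     # contains "." => float
    when /\A\d+\z/      then [:integer, text.to_i]   # otherwise integer
    when /\A\w+\z/
      OPERATORS.include?(text) ? [:operator, text] : [:identifier, text]
    else
      raise ArgumentError, "unknown operator: #{text}" unless OPERATORS.include?(text)
      [:operator, text]
    end
  end
end

p tokenize("total = sum 1, 2.5")
```

Here "sum" is recognized as an operator because it is in the registry, while "total" falls through to become an identifier. An expression such as "3 ? 4" raises, because "?" is not registered.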
Operators and functions are synonymous. An operator can specify a symbol, which can be a word or a non-word character. Unfortunately, it cannot be more than one non-word character (e.g., “==” and “!=” cannot be used, but “=”, “eq”, and “ne” can be). Perhaps with a little more work I could figure out an elegant way to support multi-character non-word operators. The problem with supporting them is that if I allow “==” to go through as a single token, then “=-” wants to as well, which creates problems for expressions such as x=-1.
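The ambiguity is easy to reproduce with two regular expressions. The patterns below are illustrative only and are not taken from calc.rb.

```ruby
# One non-word character per token (the current behavior described above).
single = "x=-1".scan(/\w+|[^\s\w,]/)

# Greedy variant that lets adjacent non-word characters fuse into one token.
greedy = "x=-1".scan(/\w+|[^\s\w,]+/)

p single                             # ["x", "=", "-", "1"]
p greedy                             # ["x", "=-", "1"]
p "a==b".scan(/\w+|[^\s\w,]+/)       # ["a", "==", "b"]
```

The greedy pattern gets "==" right but turns the assignment of a negative number into an unknown "=-" operator, which is exactly the trade-off described above.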
Argument gathering for an operation is somewhat context-free: the arguments can precede or follow the operator. This means that (given the usual operators):
(* (+ 3 2) 5)
all evaluate to 25. So whether you like it Polish, Reverse Polish, algebraic, or Lispish, you can have it your way. You can even mix them (using grouping where needed to resolve ambiguities). Likewise, for functions, the following are synonymous (given the function “sum” as in testcalc.rb):
x=(1 2 3 sum)
x=(1 sum (2 3))
(= x (sum 1 2 3))
As you can see, arguments in parentheses (or equivalent operators) are merely aggregated with the other arguments of the current function or operator.
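A toy model of that aggregation, assuming a group is represented as a nested Array and the symbol :sum marks where the function appears (both are conventions invented for this sketch, not how calc.rb stores its tokens):

```ruby
# Collect every argument around :sum, splicing nested groups into one flat list.
def gather_sum(items)
  args = items.reject { |item| item == :sum }.flatten
  args.sum
end

puts gather_sum([1, 2, 3, :sum])     # like (1 2 3 sum)
puts gather_sum([1, :sum, [2, 3]])   # like (1 sum (2 3))
puts gather_sum([:sum, 1, 2, 3])     # like (sum 1 2 3)
```

All three calls print 6: the position of the function and the extra grouping make no difference once the arguments are flattened together.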
I have written expression processors before, but none so compact and powerful as this one (a mere 260 lines of code, excluding comments and blank lines). It’s a real testimony to the power and elegance of the Ruby programming language. It also serves as an example of many of the features of Ruby, including code blocks, mix-in modules, reflection and dynamic method invocation. Since I used this exercise to teach myself Ruby, I am certain that somebody out there can improve upon it. Please send me your suggestions.
I blogged about this topic here.
UPDATE 2009-08-31: Changed the operation of the comma so that it is no longer merely a delimiter. It now forces evaluation of the expression back to the previous comma or open parenthesis. This allows a unary negative (or positive, for that matter) to be passed as an argument to a function without enclosing it in parentheses.
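The effect of the change can be shown with a deliberately tiny model that only understands integers joined by + and -. The three method names are hypothetical, and the real evaluator in calc.rb works on its own token stream rather than on strings.

```ruby
# Evaluate a flat run of integers joined by + and -, allowing a leading sign.
def evaluate_segment(text)
  text.scan(/[+-]?\s*\d+/).sum { |term| term.delete(" ").to_i }
end

# Old behavior: the comma is only a delimiter, so the whole list is one run
# and the minus sign is read as binary subtraction.
def args_without_comma_rule(text)
  [evaluate_segment(text.delete(","))]
end

# New behavior: each comma forces evaluation of the segment before it,
# so a leading minus in the next segment is unary.
def args_with_comma_rule(text)
  text.split(",").map { |segment| evaluate_segment(segment) }
end

p args_without_comma_rule("5, -3")   # [2]
p args_with_comma_rule("5, -3")      # [5, -3]
```

With the old rule a function would receive the single argument 2; with the new rule it receives 5 and -3 as two separate arguments.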