The General Schema Language (GSL)

Description

GSL is a scripting language developed by iMatix Corporation. It was first designed as a schema language for code generation and has grown into a powerful tool for manipulating XML data. GSL resembles a reporting language of the kind used to generate reports from a relational database, in that it provides a mechanism for iterating through the data, performing calculations and outputting text based on the data. Unlike a reporting language it can also manipulate, create, load and save data. Many GSL concepts are borrowed directly from database terminology, to which GSL is closely related.

Important Concepts

Data Types

GSL recognises two data types: numeric and string. It makes no formal distinction between them; if a value looks numeric then it is treated as such, otherwise it is treated as a string.

Constants

A string constant is delimited by either single or double quotes, for example: "ABC". String constants may continue over several source lines. The line break is considered part of the string constant, unless the last character in the line is a single backslash (`\'), in which case neither the backslash nor the line break is part of the string.

A numeric constant is a simple number with an optional sign and optional decimal characters, for example 123 and -0.3.

Scopes

A scope corresponds to an XML item or, more precisely, it is the presentation of an XML item to a piece of GSL script. It typically, although not necessarily, has the same name as the XML item. A scope is created by the `for' and `new' instructions and closed by the corresponding `endfor' and `endnew'. Between these instructions, the value and attributes of the XML item can be defined and accessed. Child items (henceforth children) of the XML item can also be made available by introducing another scope with another `for' instruction.

At the start of processing of a script, two scopes are implicitly defined by GSLgen.
The first corresponds to the top-level item of the XML file. We refer to this scope as the root scope. The second is named `global' and holds the predefined identifiers (see below) and command-line switches. Because these scopes are opened in this order, command-line switches take precedence over attributes of the root XML item when no scope name is specified.

Scopes may be referred to by name or by number. A positive number refers to the index of the scope counting from the first opened scope. A negative number or zero refers to the index of the scope relative to the most recently opened scope.

One more complication:

Extended Scopes

An extended scope also corresponds to an XML item. In its simplest form it is just a scope specification. It may also contain any number of `member' specifications (`->'); these refer to children of the XML item, their children, and so on. This lets you avoid introducing a new scope to access an only child, or when you are only interested in the first child of that name.

Some examples of extended scopes:

world
    The simplest form - this refers to a scope named `world'.
.
    Shorthand for the most recently opened scope.
1-> field
    The first child named `field' of the XML item corresponding to the first open scope.
-> parent-> baby
    The first child named `baby' of the first child named `parent' of the XML item corresponding to the most recently opened scope.

Identifiers

GSL identifiers refer to XML attribute or item values. (An item value is the text between its open and close tags.) An identifier generally consists of an extended scope specification and an attribute name. If the attribute name is missing then the identifier refers to the item value. There are also some short-hand forms. There are a total of five different forms of identifier specification:
To avoid clashes with GSL reserved words, the names of scopes, attributes and items may be enclosed in square brackets, for example: [for].

Some examples:

TABLE.FIELD:
LENGTH:
.NAME:
TABLE-> INDEX. FIELD:
.:

Case Sensitivity

GSLgen has two modes of handling the case of XML item and attribute names. In the default mode, GSLgen matches names without regard to the case (upper or lower) used to specify them. In certain substitutions GSLgen modifies the case of the value of the identifier to match the case used to specify the attribute name. In case-sensitive mode, GSLgen matches names taking the case into account, and does not modify the case of the result. See the description of substitutions for details.

To change modes, set the value of the identifier `ignorecase' in the global scope to 0 or 1. For example: `global.ignorecase = 0'

Expressions

GSL expressions are much the same as expressions in other high-level programming languages. They include the following operators:
Operator precedence is standard (multiplicative, additive, default, comparative, logical) and brackets are treated as you would expect. Logical operators treat zero as FALSE and non-zero as TRUE.

GSLgen optimises expression evaluation to the extent that the second operand of a binary logical operator (`|', `&') is not evaluated if the result of the expression is determined by the first operand. This allows you to use expressions such as defined (X) & X, since the second operand is not evaluated when X is undefined.

The default operator allows undefined expressions to be replaced by another expression. The value of <expr1> ? [<expr2>] is equal to the value of <expr1>, if defined; otherwise it is equal to the value of <expr2>, whether or not the latter is defined. If the second operand <expr2> is omitted then the evaluation of the expression is `safe', that is, GSLgen does not object (when this is feasible) to the result of the expression being undefined. This feature can be used in symbol definitions and substitutions to make GSLgen accept an undefined expression. See the description of these instructions for details.

The safe comparative operators return the same result as their equivalent comparative operators when both operands are defined. If one or both operands are undefined, the safe operators return FALSE while the normal operators produce an error. Notice that `a ?<> b' returns TRUE if both a and b are defined and they are not equal, and FALSE otherwise.

If an operand is not a constant then its type depends on its value; if it looks like a number then it is treated as a number, otherwise it is treated as a string.

Generally, additive, multiplicative and logical operators only apply to numeric operands. There are two cases where an arithmetic operator can apply to string values:
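Stepping back to the default and safe comparative operators described in this section, their behaviour can be modelled with a short Python sketch. This is not GSLgen's implementation: the function names and the `UndefinedError` class are hypothetical, and an undefined GSL value is represented here by Python's `None` purely as an assumption for illustration.

```python
class UndefinedError(Exception):
    """Hypothetical error: a normal operator met an undefined operand."""


def default(expr1, expr2=None):
    """Model of `<expr1> ? [<expr2>]`.

    Returns expr1 if it is defined, otherwise expr2, whether or not
    expr2 is itself defined. Omitting expr2 gives the `safe' form,
    whose result may be undefined (None in this model).
    """
    return expr1 if expr1 is not None else expr2


def not_equal(a, b):
    """Model of the normal `<>' operator: undefined operands are an error."""
    if a is None or b is None:
        raise UndefinedError("undefined operand")
    return a != b


def safe_not_equal(a, b):
    """Model of `?<>': TRUE only if both operands are defined and differ."""
    return a is not None and b is not None and a != b
```

In this model `default(None, "fallback")` yields `"fallback"`, while `safe_not_equal(None, 1)` yields `False` where `not_equal(None, 1)` raises an error, mirroring the distinction drawn in the text.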
Built-In Functions
Template and Script Modes

GSL is useful as both a template and a scripting language. In template mode, the default mode when GSLgen starts with an XML file, script lines begin with a point (`.') and all other lines are template lines. In script mode, the default mode when GSLgen starts with a GSL script, template lines begin with a `>' (not necessarily in the first column) and all other lines are script lines. You can perform exactly the same operations in template and script modes - the only difference is convenience for the type of application.

You can change between template and script mode with the `template' and `endtemplate' commands. See the description of these commands below for details.

Template Lines

The simplest template line is just text, which is copied verbatim to the current output file. If no output file has been opened, or if the last output file has been closed, the output is copied to the standard output.

If the last character of a template line is a backslash (`\') then the line is output with no line terminator; otherwise a line terminator follows the template line. A backslash followed by another character is generally replaced by that character; this allows characters which would normally be interpreted as script commands to be output literally. There is one exception: `\n' is replaced by a line break. To output a backslash, use a double backslash (`\\').

In template mode a template line is any line which does not begin with a point (`.'). If a template line must begin with a point, use a backslash immediately before the point. In script mode, a template line begins with a greater-than sign (`>'), which is dropped before the line is output.

Script Lines

In template mode, these are introduced by a point (`.') as the first non-space character in the line. In script mode, any line that does not begin with a greater-than sign (`>') is a command line.
In script mode, a script line may also begin with a point; this allows script commands to work when the current mode is unknown. The script commands are described below. If a script command line ends with a backslash (`\') then the following script line is treated as a continuation of the current line.

Comments

There are three ways to include comments in GSL scripts. The first is to place a hyphen (`-') immediately after the point (`.') in template mode, or as the first character of the line in script mode. The second is to place a hash (`#') after a GSL command. Any characters following the hyphen or hash are ignored by GSLgen. The third way (pace Tony Blair) is to enclose comment text (which may continue over more than one line) inside comment markers (`/*' and `*/') just as in C. However, if GSLgen finds these characters in a template line (but not inside a substitution) it assumes that they are destined for output, so does not treat them as a comment.

Examples:

    .- This entire line is a comment
    .output "file" # This is a trailing comment
    .output /* This is an embedded multi-line comment */ "file"
    If this is a template line then /* this is not a comment */ $("but "/* this is */)

Substituting Symbols and Expressions

At any point within a template line, and in many places (described below) in a script line, a substitute construct may be used instead of literal text. The format for expression substitution is:

    $( <expr> [% format] [: pretty-print] )

The order of the format and pretty-print modifiers is not important.

If the expression has a default operator with no second operand, and its result is undefined, then the substitution resolves to an empty string.

If a format string is provided, it is used to format the result before continuing. The format string is similar to that used by the printf function in C.
It must contain exactly one conversion specification, consisting of zero or more of the flags `#', `0', `-', ` ' and `+', an optional minimum field width, an optional precision consisting of a point (`.') followed by an optional number, and a mandatory conversion specifier among the following: `d', `i', `o', `u', `x', `X', `e', `E', `f', `g', `c' and `s'. The data are always converted to the appropriate type (one of long int, double, char or char *) for the conversion string. Note that not all legal C format strings are allowed in GSL.

The pretty-print modifier specifies how case modification and replacement of certain characters takes place. The valid pretty-print modifiers (not case-sensitive) are:
More than one pretty-print modifier may be specified; they should be separated by commas.

If GSLgen is in ignore-case mode, and a substitution expression consists of a single identifier and no case modifier is specified (c or cobol may still be specified), the case in which the identifier name is written is used as an example to determine whether the case of the result should be modified to UPPER, lower or Neat.

A final exception is that if an empty pretty-print string is provided, no case modification is performed.

Some examples: Assume the identifier IDENT has the value `IDENT value' and identifier XXX is undefined.
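The case-by-example rule described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not GSLgen's code: the function name `case_by_example` is hypothetical, `Neat' is assumed to mean capitalising the first letter of each word (approximated here with Python's `str.title`), and a name written in any other mixed case is assumed to leave the value unchanged.

```python
def case_by_example(name, value):
    """Pick a case for `value` from the way the identifier `name` is written.

    Assumed mapping (hypothetical, for illustration only):
      NAME -> UPPER, name -> lower, Name -> Neat, anything else -> unchanged.
    """
    if name.isupper():
        return value.upper()
    if name.islower():
        return value.lower()
    if name[:1].isupper() and name[1:].islower():
        # Assumption: `Neat' capitalises the first letter of each word.
        return value.title()
    return value


# Using the identifier assumed in the text: IDENT has the value `IDENT value'.
for spelling in ("IDENT", "ident", "Ident"):
    print(spelling, "->", case_by_example(spelling, "IDENT value"))
```

The loop shows how the same stored value would be rendered differently depending only on how the identifier name is spelled in the substitution.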
What You Can Substitute

A substitution can appear at any place inside straight text (template line or string constant) or as an operand in an expression. It can also replace a single name in an identifier specification, but not a point (`.') or member (`->').

Some examples: Assume the identifier IDENT has the value `NUM' and identifier NUM has the value `1'.
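Because a substitution can stand in for a name inside another identifier specification, substitutions can nest. The Python sketch below models only that one aspect, expanding the innermost `$(name)` first; it is an illustration under assumptions, not GSLgen's algorithm. The function name `substitute` and the plain dictionary of symbols are hypothetical, names are matched exactly, and formats, pretty-print modifiers, scopes and expressions are all ignored.

```python
import re

# Matches a substitution that contains no nested substitution.
_INNERMOST = re.compile(r"\$\(([^()$]*)\)")


def substitute(text, symbols):
    """Expand $(name) constructs, innermost first, until none remain.

    An undefined name raises KeyError in this simplified model.
    """
    def lookup(match):
        return symbols[match.group(1).strip()]

    while True:
        expanded = _INNERMOST.sub(lookup, text)
        if expanded == text:
            return expanded
        text = expanded


# Using the identifiers assumed in the text: IDENT is `NUM' and NUM is `1'.
symbols = {"IDENT": "NUM", "NUM": "1"}
print(substitute("$(IDENT)", symbols))     # the value of IDENT
print(substitute("$($(IDENT))", symbols))  # the value of the identifier named by IDENT
```

In the second call the inner substitution first yields the name NUM, and the outer substitution then resolves that name to its value.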
Shuffle

GSLgen can help to keep code neat by enlarging or shrinking white space so that column numbers match as far as possible between the script and the output file. For instance, if the value of the identifier X is ABCDEF then:

    $(X) .

evaluates to

    ABCDEF .

but

    $(X?"Undefined") .

evaluates to

    ABCDEF .

The shuffle algorithm uses a parameter `shuffle' (actually an attribute of the global scope) whose numeric value influences the operation. It expands a block of white space longer than `shuffle' as much as necessary so that the text following the white space is output in the same column. It also shrinks white space down to a minimum of `shuffle' to make space for text preceding the white space. If `shuffle' is zero, then shuffle is disabled. The default value of `shuffle' is 1; this is the value which produces the results shown above.

If the current output ends with a backslash, then the shuffle continues on the following line. Thus

    $(X?"Undefined")\ .

evaluates to

    ABCDEF .

Shuffle can cause problems in some cases, for example when outputting literal text where the size of white space is important. In this case shuffle should be disabled with

    .shuffle = 0

COBOL

GSLgen helps you make neat COBOL code by automatically filling the first six characters of each line with the four-digit line number followed by two zeroes. To enable this function, define an attribute `cobol' of the root item either using

    .cobol = 1

or

    gslgen -cobol etc.

when you invoke GSLgen, or even (yuk) define an attribute COBOL right in your XML file.

Predefined Identifiers

There are some identifiers whose value is maintained by GSLgen. They are defined as attributes of the global item.