Node:Top, Next:, Previous:(dir), Up:(dir)

Semantic is a program for Emacs which includes, at its core, a lexer, and a compiler compiler (bovinator). Additional tools include a bnf->semantic table converter, example tables, and a speedbar tool.

The core utility is the "semantic bovinator", which behaves much like yacc or bison. Since it is not designed to be as feature rich as these tools, it uses the term "bovine" for cow, a lesser cousin of the yak and bison.

To send bug reports, or participate in discussions about semantic, use the mailing list cedet-semantic@sourceforge.net via the URL: <http://lists.sourceforge.net/lists/listinfo/cedet-semantic>


Node:Install, Next:, Previous:Top, Up:Top

Installation

To install semantic, untar the distribution into a subdirectory, such as /usr/share/emacs/site-lisp/semantic-#.#. Next, add the following lines into your individual .emacs file, or into site-lisp/site-start.el.

(setq semantic-load-turn-everything-on t)
(load-file "/path/to/semantic/semantic-load.el")

If you would like to turn individual tools on or off in your init file, skip the first line.


Node:Overview, Next:, Previous:Install, Up:Top

Overview

Semantic is a tool primarily for the Emacs-Lisp programmer. However, it comes with "applications" that non-programmers might find useful. This chapter is mostly for the benefit of these non-programmers, as it gives brief descriptions of basic concepts such as grammars, parsers, compiler-compilers, parse trees, etc.

The grammar of a natural language defines rules by which valid phrases and sentences can be composed using words, the fundamental units with which all sentences are created. In a similar fashion, a "context-free grammar" defines the rules by which programs can be composed using the fundamental units of the language, i.e., numbers, symbols, punctuation, etc. Context-free grammars are often specified in a well-known form called Backus-Naur Form, BNF for short. This is a systematic way of representing context-free grammars such that programs can read files with grammars written in BNF and generate code for a parser of that language. YACC (Yet Another Compiler Compiler) is one such program that has been part of UNIX operating systems since the 1970s. YACC is pronounced the same as "yak", the long-haired ox found in Asia. The parser generated by YACC is usually a C program. Bison is also a "compiler compiler" that takes BNF grammars and produces parsers in the C language. The difference between YACC and Bison is that Bison is free software and upward-compatible with YACC. It also comes with an excellent manual.

Semantic is similar in spirit to YACC and Bison. Semantic, however, is referred to as a bovinator rather than as a parser, because it is a lesser cousin of YACC and Bison. It is lesser in that it does not perform a full parse like YACC or Bison. Instead, it bovinates. "Bovination" refers to partial parsing which creates parse trees of only the topmost expressions rather than parsing every nested expression. This is sufficient for the purposes for which semantic was designed. Semantic is meant to be used within Emacs for providing editor-related features such as code browsers and translators, rather than for compiling, which requires far more complex and complete parsers. Semantic is not designed to be able to create full parse trees.

One key benefit of semantic is that it creates parse trees (perhaps the term bovine tree may be more accurate) with the same structure regardless of the type of language involved. Higher level applications written to work with bovine trees will then work with any language for which the grammar is available. For example, a code browser written today that supports C, C++, and Java may work without any change on other languages that do not even exist yet. All one has to do is to write the BNF specification for the new language. The rest of the work is done by semantic. For certain languages, it is hard if not impossible to specify the syntax of the language in BNF form, e.g., texinfo and other document oriented languages. Semantic provides a parser for texinfo nevertheless. Instead of BNF grammar, texinfo files are "parsed" using Regexps.

Semantic comes with grammars for these languages:

Several tools that employ semantic to provide user-observable features are listed in the Tools section.


Node:Semantic Components, Next:, Previous:Overview, Up:Top

Semantic Components

This chapter gives an overview of the major components of semantic and how they interact with each other to perform their job.

The first step of parsing is to break up the input file into its fundamental components. This step is called lexing. The output of the lexer is a list of tokens that make up the file.

        syntax table, keywords list, and options
                         |
                         |
                         v
    input file  ---->  Lexer   ----> token stream

The next step is the parsing shown below.

                    bovine table
                         |
                         v
    token stream --->  Parser  ----> parse tree

The end result, the parse tree, is created based on the "bovine table", which is the internal representation of the BNF language grammar used by semantic.

The semantic database caches parse trees by automatically saving them into files named semantic.cache and loading them when appropriate instead of re-parsing. This saves the time it takes to parse a file, which can be several seconds or more for large files.

Finally, semantic provides an API for the Emacs-Lisp programmer to access the information in the parse tree.


Node:Lexing, Next:, Previous:Semantic Components, Up:Top

Preparing your language for Lexing

In order to reduce a source file into a token list, it must first be converted into a token stream. Tokens are syntactic elements such as whitespace, symbols, strings, lists, and punctuation.

The lexer uses the major-mode's syntax table for conversion. See Syntax Tables. As long as that is set up correctly (along with the important comment-start and comment-start-skip variables) the lexer should already work for your language.

The primary entry point of the lexer is the semantic-flex function shown below. Normally, you do not need to call this function. It is usually called by semantic-bovinate-toplevel for you.

semantic-flex start end &optional depth length Function
Using the syntax table, do something roughly equivalent to flex. Semantically check between START and END. Optional argument DEPTH indicates at what level to scan over entire lists. The return value is a token stream. Each element is a list of the form (symbol start-expression . end-expression). END does not mark the end of the text scanned, only the end of the beginning of text scanned. Thus, if a string extends past END, the end of the return token will be larger than END. To truly restrict scanning, use `narrow-to-region'. The last argument, LENGTH, specifies that semantic-flex should only return LENGTH tokens.


Node:Lexer Overview, Next:, Previous:Lexing, Up:Lexing

Lexer Overview

The semantic lexer breaks up the content of an Emacs buffer into a list of tokens. This process is based mostly on regular expressions, which in turn depend on the syntax table of the buffer's major mode being set up properly. See Major Modes. See Syntax Tables. See Regexps.

Specifically, the following regular expressions which rely on syntax tables are used:

\\s-
whitespace characters
\\sw
word constituent
\\s_
symbol constituent
\\s.
punctuation character
\\s<
comment starter
\\s>
comment ender
\\s\\
escape character
\\s)
close parenthesis character
\\s$
paired delimiter
\\s\"
string quote
\\s\'
expression prefix

In addition, Emacs' built-in features such as comment-start-skip, forward-comment, forward-list, and forward-sexp are employed.


Node:Lexer Output, Next:, Previous:Lexer Overview, Up:Lexing

Lexer Output

The lexer, semantic-flex, scans the content of a buffer and returns a token list. Let's illustrate this using this simple example.

00: /*
01:  * Simple program to demonstrate semantic.
02:  */
03:
04: #include <stdio.h>
05:
06: int i_1;
07:
08: int
09: main(int argc, char** argv)
10: {
11:     printf("Hello world.\n");
12: }

Evaluating (semantic-flex (point-min) (point-max)) within the buffer with the code above returns the following token list. The input line and string that produced each token are shown after each semicolon.

((punctuation     52 .  53)     ; 04: #
 (INCLUDE         53 .  60)     ; 04: include
 (punctuation     61 .  62)     ; 04: <
 (symbol          62 .  67)     ; 04: stdio
 (punctuation     67 .  68)     ; 04: .
 (symbol          68 .  69)     ; 04: h
 (punctuation     69 .  70)     ; 04: >
 (INT             72 .  75)     ; 06: int
 (symbol          76 .  79)     ; 06: i_1
 (punctuation     79 .  80)     ; 06: ;
 (INT             82 .  85)     ; 08: int
 (symbol          86 .  90)     ; 09: main
 (semantic-list   90 . 113)     ; 09: (int argc, char** argv)
 (semantic-list  114 . 147)     ; 10-12: body of main function
 )

As shown above, the token list is a list of "tokens". Each token in turn is a list of the form

(TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)

where TOKEN-TYPE is a symbol, and the other two are integers indicating the buffer positions that delimit the token such that

(buffer-substring BEGINNING-POSITION ENDING-POSITION)

would return the string form of the token.
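The relationship between a token and its text can be tried out without running the lexer. The following is a minimal sketch: my-token-text and my-demo-token-texts are hypothetical helpers, not part of semantic, and the tokens are copied by hand from the listing above.

```elisp
;; Hypothetical helper, not part of semantic.
(defun my-token-text (token)
  "Return the buffer text covered by lexical TOKEN.
TOKEN is assumed to have the form (TOKEN-TYPE START . END)."
  (buffer-substring-no-properties (cadr token) (cddr token)))

(defun my-demo-token-texts ()
  "Return the text of three tokens taken from the sample program."
  (with-temp-buffer
    ;; Lines 00-06 of the sample program, without the line-number prefixes.
    (insert "/*\n"
            " * Simple program to demonstrate semantic.\n"
            " */\n"
            "\n"
            "#include <stdio.h>\n"
            "\n"
            "int i_1;\n")
    (mapcar #'my-token-text
            '((INCLUDE 53 . 60) (symbol 62 . 67) (symbol 76 . 79)))))

(my-demo-token-texts) ; => ("include" "stdio" "i_1")
```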

Note that one line (line 4 above) can produce seven tokens while the whole body of the function produces a single token. This is because the depth parameter of semantic-flex was not specified. Let's see the output when depth is set to 1. Evaluate (semantic-flex (point-min) (point-max) 1) in the same buffer. Note the third argument of 1.

((punctuation    52 .  53)     ; 04: #
 (INCLUDE        53 .  60)     ; 04: include
 (punctuation    61 .  62)     ; 04: <
 (symbol         62 .  67)     ; 04: stdio
 (punctuation    67 .  68)     ; 04: .
 (symbol         68 .  69)     ; 04: h
 (punctuation    69 .  70)     ; 04: >
 (INT            72 .  75)     ; 06: int
 (symbol         76 .  79)     ; 06: i_1
 (punctuation    79 .  80)     ; 06: ;
 (INT            82 .  85)     ; 08: int
 (symbol         86 .  90)     ; 09: main

 (open-paren     90 .  91)     ; 09: (
 (INT            91 .  94)     ; 09: int
 (symbol         95 .  99)     ; 09: argc
 (punctuation    99 . 100)     ; 09: ,
 (CHAR          101 . 105)     ; 09: char
 (punctuation   105 . 106)     ; 09: *
 (punctuation   106 . 107)     ; 09: *
 (symbol        108 . 112)     ; 09: argv
 (close-paren   112 . 113)     ; 09: )

 (open-paren    114 . 115)     ; 10: {
 (symbol        120 . 126)     ; 11: printf
 (semantic-list 126 . 144)     ; 11: ("Hello world.\n")
 (punctuation   144 . 145)     ; 11: ;
 (close-paren   146 . 147)     ; 12: }
 )

The depth parameter "peeled away" one more level of "list" delimited by matching parentheses or braces. The depth parameter can be specified to be any number. However, the parser needs to be able to handle the extra tokens.

This is an interesting benefit of the lexer having the full resources of Emacs at its disposal. Skipping over matched parentheses is achieved by simply calling the built-in functions forward-list and forward-sexp.
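As a rough illustration of how a whole parenthesized list can be turned into a single token, the sketch below uses forward-list on a temporary buffer. The function my-skip-list-token is a hypothetical name and only approximates what the lexer does internally.

```elisp
;; Hypothetical helper, not part of semantic.
(defun my-skip-list-token ()
  "Skip the balanced list starting at point.
Return a token of the form (semantic-list START . END)."
  (let ((start (point)))
    (forward-list 1)                     ; jump past the matching close paren
    (cons 'semantic-list (cons start (point)))))

(defun my-demo-skip-list ()
  "Return the token covering the argument list of a sample declaration."
  (with-temp-buffer
    (insert "main(int argc, char** argv)")
    (goto-char (point-min))
    (search-forward "main")              ; point is now on the open paren
    (my-skip-list-token)))

(my-demo-skip-list) ; => (semantic-list 5 . 28)
```

The token spans 23 characters, the same width as the (semantic-list 90 . 113) token in the listings above.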

All common token symbols are enumerated below. Additional token symbols aside from these can be generated by the lexer if user option semantic-flex-extensions is set. It is up to the user to add matching extensions to the parser to deal with the lexer extensions. An example use of semantic-flex-extensions is in semantic-make.el where semantic-flex-extensions is set to the value of semantic-flex-make-extensions which may generate shell-command tokens.

Default syntactic tokens if the lexer is not extended.

bol
Empty string matching a beginning of line. This token is produced only if the user set semantic-flex-enable-bol to non-nil.
charquote
String sequences that match \\s\\+.
close-paren
Characters that match \\s). These are typically ), }, ], etc.
comment
A comment chunk. These token types are not produced by default. They are produced only if the user set semantic-ignore-comments to nil.
newline
Characters matching \\s-*\\(\n\\|\\s>\\). This token is produced only if the user set semantic-flex-enable-newlines to non-nil.
open-paren
Characters that match \\s(. These are typically (, {, [, etc. Note that these are not usually generated unless the depth argument to semantic-flex is greater than 0.
punctuation
Characters matching \\(\\s.\\|\\s$\\|\\s'\\).
semantic-list
String delimited by matching parenthesis, braces, etc. that the lexer skipped over, because the depth parameter to semantic-flex was not high enough.
string
Quoted strings, i.e., string sequences that start and end with characters matching \\s\". The lexer relies on forward-sexp to find the matching end.
symbol
String sequences that match \\(\\sw\\|\\s_\\)+.
whitespace
Characters that match the `\\s-+' regexp. This token is produced only if the user set semantic-flex-enable-whitespace to non-nil. If semantic-ignore-comments is also non-nil, comments are treated as whitespace.
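To see how the syntax-class regexps listed above translate into tokens, here is a deliberately simplified lexer sketch. The function my-mini-flex is hypothetical and handles only symbols, punctuation and whitespace skipping; it is not how semantic-flex is implemented, and it ignores comments, strings and lists.

```elisp
;; Hypothetical, much simplified stand-in for a lexer.
(defun my-mini-flex (start end)
  "Return a list of (TYPE START . END) tokens between START and END.
Only symbol and punctuation tokens are produced; whitespace is skipped."
  (let (tokens)
    (save-excursion
      (goto-char start)
      (while (< (point) end)
        (cond
         ;; Whitespace is skipped without producing a token.
         ((looking-at "\\s-+")
          (goto-char (match-end 0)))
         ;; Runs of word and symbol constituents become symbol tokens.
         ((looking-at "\\(\\sw\\|\\s_\\)+")
          (push (cons 'symbol (cons (match-beginning 0) (match-end 0)))
                tokens)
          (goto-char (match-end 0)))
         ;; A single punctuation-like character becomes a punctuation token.
         ((looking-at "\\(\\s.\\|\\s$\\|\\s'\\)")
          (push (cons 'punctuation (cons (match-beginning 0) (match-end 0)))
                tokens)
          (goto-char (match-end 0)))
         ;; Anything else is ignored in this sketch.
         (t (forward-char 1)))))
    (nreverse tokens)))

(defun my-demo-mini-flex ()
  "Lex a one-line declaration in a temporary buffer."
  (with-temp-buffer
    (insert "int i_1;")
    (my-mini-flex (point-min) (point-max))))

(my-demo-mini-flex)
;; => ((symbol 1 . 4) (symbol 5 . 8) (punctuation 8 . 9))
```

The result depends on the syntax table in effect; the temporary buffer here uses the standard syntax table, where _ is a symbol constituent and ; is punctuation.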


Node:Lexer Options, Next:, Previous:Lexer Output, Up:Lexing

Lexer Options

Although most lexer functions are called for you by other semantic functions, there are ways for you to extend or customize the lexer. The variables shown below serve this purpose.

semantic-flex-unterminated-syntax-end-function Variable
Function called when unterminated syntax is encountered. This should be set to one function. That function should take three parameters: SYNTAX, the type of syntax which is unterminated; SYNTAX-START, where the broken syntax begins; and FLEX-END, where the lexical analysis was asked to end. This function can be used for languages that can intelligently fix up broken syntax, or to exit lexical analysis via throw or signal when finding unterminated syntax.

semantic-flex-extensions Variable
Buffer local extensions to the lexical analyzer. This should contain an alist with a key of a regex and a data element of a function. The function should both move point, and return a lexical token of the form:
( TYPE START . END)

nil is a valid return value. TYPE can be any type of symbol, as long as it doesn't occur as a nonterminal in the language definition.
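An extension function might look like the following sketch. The token type at-tag, the function my-flex-at-tag and the variable my-flex-extensions are all made up for illustration; only the alist shape and the (TYPE START . END) return form come from the description above.

```elisp
;; Hypothetical extension function, not part of semantic.
(defun my-flex-at-tag ()
  "Lex an @name tag at point.
Move point past the tag and return an (at-tag START . END) token,
or return nil without moving when no tag starts at point."
  (when (looking-at "@\\(\\sw\\|\\s_\\)+")
    (goto-char (match-end 0))
    (cons 'at-tag (cons (match-beginning 0) (match-end 0)))))

;; An alist in the shape described for `semantic-flex-extensions':
;; the key is a regexp, the value is the function to call on a match.
(defvar my-flex-extensions
  '(("@" . my-flex-at-tag))
  "Example lexer extensions for a hypothetical language.")
```

A language mode would then assign my-flex-extensions to semantic-flex-extensions in its own buffers, and add matching rules to its grammar for the new at-tag token.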

semantic-flex-syntax-modifications Variable
Changes the syntax table for a given buffer. These changes are active only while the buffer is being flexed. This is a list where each element has the form
(CHAR CLASS)

CHAR is the char passed to `modify-syntax-entry', and CLASS is the string also passed to `modify-syntax-entry' to define what syntax class CHAR has.

(setq semantic-flex-syntax-modifications '((?. "_")))

This makes the period . a symbol constituent. This may be necessary if filenames are prevalent, such as in Makefiles.
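The effect of such a modification can be observed directly with a temporary syntax table. In this sketch, my-symbol-at-start is a hypothetical helper that applies a list of (CHAR CLASS) entries to a copy of the syntax table and then matches one symbol with the same regexp the lexer uses for symbols.

```elisp
;; Hypothetical helper, not part of semantic.
(defun my-symbol-at-start (text modifications)
  "Return the first symbol in TEXT after applying MODIFICATIONS.
MODIFICATIONS is a list of (CHAR CLASS) entries, applied to a copy
of the syntax table so the change lasts only for this call."
  (with-temp-buffer
    (insert text)
    (let ((table (copy-syntax-table (syntax-table))))
      (dolist (mod modifications)
        (modify-syntax-entry (car mod) (cadr mod) table))
      (with-syntax-table table
        (goto-char (point-min))
        (when (looking-at "\\(\\sw\\|\\s_\\)+")
          (match-string 0))))))

(my-symbol-at-start "main.c: main.o" nil)            ; => "main"
(my-symbol-at-start "main.c: main.o" '((?. "_")))   ; => "main.c"
```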

semantic-flex-enable-newlines Variable
When flexing, report 'newlines as syntactic elements. Useful for languages where the newline is a special case terminator. Only set this on a per mode basis, not globally.

semantic-flex-enable-whitespace Variable
When flexing, report 'whitespace as syntactic elements. Useful for languages where the syntax is whitespace dependent. Only set this on a per mode basis, not globally.

semantic-flex-enable-bol Variable
When flexing, report beginning of lines as syntactic elements. Useful for languages like python which are indentation sensitive. Only set this on a per mode basis, not globally.

semantic-number-expression Variable
Regular expression for matching a number. If this value is nil, no number extraction is done during lex. This expression tries to match C and Java like numbers.
DECIMAL_LITERAL:
    [1-9][0-9]*
  ;
HEX_LITERAL:
    0[xX][0-9a-fA-F]+
  ;
OCTAL_LITERAL:
    0[0-7]*
  ;
INTEGER_LITERAL:
    <DECIMAL_LITERAL>[lL]?
  | <HEX_LITERAL>[lL]?
  | <OCTAL_LITERAL>[lL]?
  ;
EXPONENT:
    [eE][+-]?[0-9]+
  ;
FLOATING_POINT_LITERAL:
    [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
  | [.][0-9]+<EXPONENT>?[fFdD]?
  | [0-9]+<EXPONENT>[fFdD]?
  | [0-9]+<EXPONENT>?[fFdD]
  ;
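As a partial illustration, the INTEGER_LITERAL rule above can be written as an Emacs regexp. This is only a sketch: my-integer-literal-regexp and my-integer-literal-p are hypothetical names, the floating point rules are left out, and the actual default value of semantic-number-expression may be written differently.

```elisp
;; Hypothetical regexp covering only the INTEGER_LITERAL rule.
(defconst my-integer-literal-regexp
  (concat "\\`\\(?:"
          "[1-9][0-9]*"            ; DECIMAL_LITERAL
          "\\|0[xX][0-9a-fA-F]+"   ; HEX_LITERAL
          "\\|0[0-7]*"             ; OCTAL_LITERAL
          "\\)[lL]?\\'")
  "Regexp matching a whole string that is a C or Java like integer.")

(defun my-integer-literal-p (text)
  "Return non-nil if TEXT is an integer literal."
  (and (string-match-p my-integer-literal-regexp text) t))

(my-integer-literal-p "42")    ; => t
(my-integer-literal-p "0x1F")  ; => t
(my-integer-literal-p "017L")  ; => t
(my-integer-literal-p "09")    ; => nil, 9 is not an octal digit
(my-integer-literal-p "1e5")   ; => nil, this is a floating point literal
```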


Node:Keywords, Next:, Previous:Lexer Options, Up:Lexing

Keywords

Another important piece of the lexer is the keyword table (see Settings). Your language will want to set up a keyword table for fast conversion of symbol strings to language terminals.

The keywords table can also be used to store additional information about those keywords. The following programming functions can be useful when examining text in a language buffer.

semantic-flex-keyword-p text Function
Return non-nil if TEXT is a keyword in the keyword table.

semantic-flex-keyword-put text property value Function
For keyword TEXT, set PROPERTY to VALUE.

semantic-token-put-no-side-effect token key value Function
For TOKEN, put the property KEY on it with VALUE without side effects. If VALUE is nil, then remove the property from TOKEN. All cons cells in the property list are replicated so that there are no side effects if TOKEN is in shared lists.

semantic-flex-keyword-get text property Function
For keyword TEXT, get the value of PROPERTY.

semantic-flex-map-keywords fun &optional property Function
Call function FUN on every semantic keyword. If optional PROPERTY is non-nil, call FUN only on every keyword which has a PROPERTY value. FUN receives a semantic keyword as argument.

semantic-flex-keywords &optional property Function
Return a list of semantic keywords. If optional PROPERTY is non-nil, return only keywords which have PROPERTY set.

Keyword properties can be set up in a BNF file for ease of maintenance. While examining the text in a language buffer, this can provide an easy and quick way of storing details about text in the buffer.
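The idea of a keyword table that also carries properties can be sketched with an ordinary obarray. Everything named my-... below is hypothetical and exists only to show the concept; the real table is built from the BNF file and queried with the semantic-flex-keyword-... functions described above. The property name summary is likewise an assumption.

```elisp
;; Hypothetical keyword table, independent of semantic.
(defvar my-keyword-table (obarray-make 13)
  "Obarray mapping keyword text to a terminal symbol.")

(defun my-keyword-add (text terminal)
  "Register TEXT as a keyword that lexes to TERMINAL."
  (set (intern text my-keyword-table) terminal))

(defun my-keyword-p (text)
  "Return the terminal for TEXT, or nil if TEXT is not a keyword."
  (let ((sym (intern-soft text my-keyword-table)))
    (and sym (boundp sym) (symbol-value sym))))

(defun my-keyword-put (text property value)
  "For keyword TEXT, set PROPERTY to VALUE."
  (let ((sym (intern-soft text my-keyword-table)))
    (when sym (put sym property value))))

(defun my-keyword-get (text property)
  "For keyword TEXT, get the value of PROPERTY."
  (let ((sym (intern-soft text my-keyword-table)))
    (when sym (get sym property))))

(my-keyword-add "int" 'INT)
(my-keyword-put "int" 'summary "Integral primitive type")
(my-keyword-p "int")             ; => INT
(my-keyword-p "moose")           ; => nil
(my-keyword-get "int" 'summary)  ; => "Integral primitive type"
```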


Node:Keyword Properties, Previous:Keywords, Up:Lexing

Standard Keyword Properties

Add known properties here when they are known.


Node:Bovinating, Next:, Previous:Lexing, Up:Top

Preparing a bovine table for your language

When converting a source file into a nonterminal token stream (parse-tree) it is important to specify rules to accomplish this. The rules are stored in the buffer local variable semantic-toplevel-bovine-table.

While it is certainly possible to write this table yourself, you will most likely want to use the BNF converter (see BNF conversion). This is an easier method for specifying your rules. You will still need to specify a variable in your language for the table, however. A good rule of thumb is to call it language-toplevel-bovine-table if it is part of the language, or semantic-toplevel-language-bovine-table if you donate it to the semantic package.

When initializing a major-mode for your language, you will set the variable semantic-toplevel-bovine-table to the contents of your language table. semantic-toplevel-bovine-table is always buffer local.
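A minimal sketch of that initialization follows. The names my-lang-toplevel-bovine-table, my-lang-setup-bovine-table and my-lang-mode-hook are hypothetical, and the table is left empty here because its contents would normally be generated.

```elisp
;; Declared here only so that the sketch loads without semantic;
;; semantic itself defines the real variable.
(defvar semantic-toplevel-bovine-table nil)

(defvar my-lang-toplevel-bovine-table nil
  "Bovine table for the hypothetical language my-lang.")

(defun my-lang-setup-bovine-table ()
  "Install the my-lang bovine table in the current buffer only."
  (set (make-local-variable 'semantic-toplevel-bovine-table)
       my-lang-toplevel-bovine-table))

;; Run the setup whenever the hypothetical my-lang-mode is enabled.
(add-hook 'my-lang-mode-hook #'my-lang-setup-bovine-table)
```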

Since it is important to know the format of the table when debugging, you should still attempt to understand the basics of the table.

Please see the documentation for the variable semantic-toplevel-bovine-table for details on its format.

* add more doc here *


Node:BNF conversion, Next:, Previous:Bovinating, Up:Top

Using the BNF converter to make bovine tables

The BNF converter takes a file in "Bovine Normal Form" which is similar to "Backus-Naur Form". If you have ever used yacc or bison, you will find it similar. The BNF form used by semantic, however, does not include token precedence rules, and several other features needed to make real parser generators.

It is important to have an Emacs Lisp file with a variable ready to take the output of your table (see Bovinating). Also, make sure that the file semantic-bnf.el is loaded. Give your language file the extension .bnf and you are ready.

The comment character is #.

When you want to test your file, use the keyboard shortcut C-c C-c to parse the file, generate the variable, and load the new definition in. It will then use the settings specified above to determine what to do. Use the shortcut C-c c to do the same thing, but spend extra time indenting the table nicely.

Make sure that you create the variable specified in the %parsetable token before trying to convert the BNF file. A simple definition like this is sufficient.

(defvar semantic-toplevel-lang-bovine-table
   nil
   "Table for use with semantic for parsing LANG.")

If you use tokens (created with the %token specifier), also make sure you have a keyword table available, like this:

(defvar semantic-lang-keyword-table
   nil
   "Table for use with semantic for keywords.")

Specify the name of the keyword table with the %keywordtable specifier.

The BNF file has two sections. The first is the settings section, and the second is the language definition, or list of semantic rules.


Node:Settings, Next:, Previous:BNF conversion, Up:BNF conversion

Settings

A setting is a keyword starting with a %. (This syntax is taken from yacc and bison. See (bison).)

There are several settings that can be made in the settings section. They are:

%start <nonterminal> Setting
Specify an alternative to bovine-toplevel. (See below)

%scopestart <nonterminal> Setting
Specify an alternative to bovine-inner-scope.

%outputfile <filename> Setting
Required. Specifies the file into which this file's output is stored.

%parsetable <lisp-variable-name> Setting
Required. Specifies a lisp variable into which the output is stored.

%setupfunction <lisp-function-name> Setting
Required. Name of a function into which setup code is to be inserted.

%keywordtable <lisp-variable-name> Setting
Required if there are %token keywords. Specifies a lisp variable into which the output of a keyword table is stored. This obarray is used to turn symbols into keywords when applicable.

%token <name> "<text>" Setting
Optional. Specify a new token NAME. This is added to a lexical keyword list using TEXT. The symbol is then converted into a new lexical terminal. This requires that the %keywordtable specified variable is available in the file specified by %outputfile.

%token <name> type "<text>" Setting
Optional. Specify a new token NAME. It is made from an existing lexical token of type TYPE. TEXT is a string which will be matched explicitly. NAME can be used in match rules as though it were a flex token, but it is converted back to TYPE "text" internally.

%put <NAME> symbol <VALUE> Setting
%put <NAME> ( symbol1 <VALUE1> symbol2 <VALUE2> ... ) Setting
%put ( <NAME1> <NAME2>...) symbol <VALUE> Setting
Tokens created without a type are considered keywords, and placed in a keyword table. Use %put to apply properties to that keyword. (see Lexing).

%languagemode <lisp-function-name> Setting
%languagemode ( <lisp-function-name1> <lisp-function-name2> ... ) Setting
Optional. Specifies the Emacs major mode associated with the language being specified. When the language is converted, all buffers of this mode will get the new table installed.

%quotemode backquote Setting
Optional. Specifies how symbol quoting is handled in the Optional Lambda Expressions. (See below)

%( )% Setting
Specify setup code to be inserted into the %setupfunction. It will be inserted between two specifier strings, or added to the end of the function.

When working inside %( ... )% tokens, any lisp expression can be entered which will be placed inside the setup function. In general, you probably want to set variables that tell Semantic and related tools how the language works.
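The expressions placed in the setup function are ordinary Lisp. As a sketch of the kind of code that could end up there, the hypothetical function below sets a few lexer variables buffer-locally. The values are examples only, and the function name my-lang-semantic-setup is an assumption rather than something the converter generates.

```elisp
;; Hypothetical setup function; the converter inserts your %( ... )%
;; expressions into the function named by %setupfunction.
(defun my-lang-semantic-setup ()
  "Tell semantic how the hypothetical language my-lang should be lexed."
  (setq-local semantic-flex-depth 1)            ; look inside one level of lists
  (setq-local semantic-ignore-comments t)       ; strip comments when flexing
  (setq-local semantic-case-fold nil)           ; keywords are case sensitive
  (setq-local semantic-flex-syntax-modifications
              '((?. "_"))))                     ; treat . as a symbol constituent
```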

Here are some variables that control how different programs will work with your language.

semantic-flex-depth Variable
Default flexing depth. This specifies how many lists to create tokens in.

semantic-number-expression Variable
Regular expression for matching a number. If this value is nil, no number extraction is done during lex. Symbols which match this expression are returned as number tokens instead of symbol tokens.

The default value for this variable should work in most languages.

semantic-flex-extensions Variable
Buffer local extensions to the lexical analyzer. This should contain an alist with a key of a regex and a data element of a function. The function should both move point, and return a lexical token of the form:
( TYPE START . END)

nil is also a valid return. TYPE can be any type of symbol, as long as it doesn't occur as a nonterminal in the language definition.

semantic-flex-syntax-modifications Variable
Updates to the syntax table for this buffer. These changes are active only while this file is being flexed. This is a list where each element is of the form:
(CHAR CLASS)
Where CHAR is the char passed to modify-syntax-entry, and CLASS is the string also passed to modify-syntax-entry to define what class of syntax CHAR is.

semantic-flex-enable-newlines Variable
When flexing, report 'newlines as syntactic elements. Useful for languages where the newline is a special case terminator. Only set this on a per mode basis, not globally.

semantic-ignore-comments Variable
Default comment handling. t means to strip comments when flexing. nil means to keep comments as part of the token stream.

semantic-symbol->name-assoc-list Variable
Association between symbols returned, and a string. The string is used to represent a group of objects of the given type. It is sometimes useful for a language to use a different string in place of the default, even though that language will still return a symbol. For example, Java returns includes, but the string can be replaced with Imports.

semantic-case-fold Variable
Value for case-fold-search when parsing.

semantic-expand-nonterminal Variable
Function to call for each nonterminal production. Return a list of non-terminals derived from the first argument, or nil if it does not need to be expanded. Languages with compound definitions should use this function to expand from one compound symbol into several. For example, in C the definition
int a, b;
is easily parsed into one token, but represents multiple variables. A function should be written which takes this compound token and turns it into two tokens, one for A, and the other for B.

Within the language definition (the .bnf sources), it is often useful to set the NAME slot of a token with a list of items that distinguish each element in the compound definition.

This list can then be detected by the function set in semantic-expand-nonterminal to create multiple tokens. This function has one additional duty of managing the overlays created by semantic. It is possible to use the single overlay in the compound token for all your tokens, but this can pose problems identifying all tokens covering a given definition.

Please see semantic-java.el for an example of managing overlays when expanding a token into multiple definitions.
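The expansion step alone, leaving overlay management aside, might be sketched as follows. The function my-expand-nonterminal is hypothetical, and the token layout (NAME CLASS TYPE) used in the example is an assumption made for illustration; the only property relied on is that the NAME slot comes first and holds a list for compound definitions.

```elisp
;; Hypothetical expansion function; it does not manage overlays.
(defun my-expand-nonterminal (token)
  "Expand TOKEN into one token per name when its NAME slot is a list.
Return nil when TOKEN does not need to be expanded."
  (let ((names (car token)))
    (when (consp names)
      (mapcar (lambda (name)
                ;; Each new token gets its own copy of the remaining slots.
                (cons name (copy-sequence (cdr token))))
              names))))

(my-expand-nonterminal '(("a" "b") variable "int"))
;; => (("a" variable "int") ("b" variable "int"))
(my-expand-nonterminal '("a" variable "int"))
;; => nil
```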

semantic-override-table Variable
Buffer local semantic function overrides alist. These overrides provide a hook for a `major-mode' to override specific behaviors with respect to generated semantic toplevel nonterminals and things that these non-terminals are useful for. Each element must be of the form: (SYM . FUN) where SYM is the symbol to override, and FUN is the function to override it with.

Available override symbols:

SYMBOL                                PARAMETERS            DESCRIPTION
find-dependency                       (token)               Find the dependency file.
find-nonterminal                      (token & parent)      Find token in buffer.
find-documentation                    (token & nosnarf)     Find doc comments.
abbreviate-nonterminal                (token & parent)      Return summary string.
summarize-nonterminal                 (token & parent)      Return summary string.
prototype-nonterminal                 (token)               Return a prototype string.
concise-prototype-nonterminal         (tok & parent color)  Return a concise prototype string.
uml-abbreviate-nonterminal            (tok & parent color)  Return a UML standard abbreviation string.
uml-prototype-nonterminal             (tok & parent color)  Return a UML like prototype string.
uml-concise-prototype-nonterminal     (tok & parent color)  Return a UML like concise prototype string.
prototype-file                        (buffer)              Return a file in which prototypes are placed.
nonterminal-children                  (token)               Return first rate children. These are children which may contain overlays.
nonterminal-external-member-parent    (token)               Parent of TOKEN.
nonterminal-external-member-p         (parent token)        Non nil if TOKEN has PARENT, but is not in PARENT.
nonterminal-external-member-children  (token & usedb)       Get all external children of TOKEN.
nonterminal-protection                (token & parent)      Return protection as a symbol.
nonterminal-abstract                  (token & parent)      Return if TOKEN is abstract.
nonterminal-leaf                      (token & parent)      Return if TOKEN is leaf.
nonterminal-static                    (token & parent)      Return if TOKEN is static.
beginning-of-context                  (& point)             Move to the beginning of the current context.
end-of-context                        (& point)             Move to the end of the current context.
up-context                            (& point)             Move up one context level.
get-local-variables                   (& point)             Get local variables.
get-all-local-variables               (& point)             Get all local variables.
get-local-arguments                   (& point)             Get arguments to this function.
end-of-command                                              Move to the end of the current command.
beginning-of-command                                        Move to the beginning of the current command.
ctxt-current-symbol                   (& point)             List of related symbols.
ctxt-current-assignment               (& point)             Variable being assigned to.
ctxt-current-function                 (& point)             Function being called at point.
ctxt-current-argument                 (& point)             The index to the argument of the current function the cursor is in.

Parameters mean:

&
Following parameters are optional
buffer
The buffer in which a token was found.
token
The nonterminal token we are doing stuff with
parent
If a TOKEN is stripped (of positional information) then this will be the parent token which should have positional information in it.
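An override table is a plain alist, so its shape can be shown with a short sketch. The names my-lang-prototype-nonterminal, my-lang-override-table and my-call-override are hypothetical, as is the (NAME CLASS TYPE) token layout; semantic performs the real lookup itself, and my-call-override only imitates the idea.

```elisp
;; Hypothetical override for the prototype-nonterminal symbol.
(defun my-lang-prototype-nonterminal (token)
  "Return a prototype string for TOKEN.
Assumes, for illustration only, that TOKEN looks like (NAME CLASS TYPE)."
  (format "%s %s" (nth 2 token) (car token)))

(defvar my-lang-override-table
  '((prototype-nonterminal . my-lang-prototype-nonterminal))
  "Example value for `semantic-override-table' in a hypothetical mode.")

(defun my-call-override (symbol &rest args)
  "Call the override registered for SYMBOL with ARGS.
Return nil when `my-lang-override-table' has no entry for SYMBOL."
  (let ((fun (cdr (assq symbol my-lang-override-table))))
    (when fun
      (apply fun args))))

(my-call-override 'prototype-nonterminal '("i_1" variable "int"))
;; => "int i_1"
(my-call-override 'find-documentation '("i_1" variable "int"))
;; => nil, no override is registered for this symbol
```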

semantic-type-relation-separator-character Variable
Character strings used to separate a parent/child relationship. This list of strings is used for displaying or finding separators in variable field dereferencing. The first entry will be used for display. In C, a type field is separated like this: "type.field", thus the string is ".". In C, an additional value of "->" would be in the list, so that "type->field" could be found.
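For C, such a list could be used to split a field reference as in the sketch below. The variable my-type-relation-separators and the function my-split-field-reference are hypothetical; they only show how a list of separator strings can drive the split.

```elisp
;; Hypothetical stand-in for the value a C mode might use.
(defvar my-type-relation-separators '("." "->")
  "Separator strings between a parent and a child field.")

(defun my-split-field-reference (text)
  "Split TEXT at any separator in `my-type-relation-separators'."
  (split-string text (regexp-opt my-type-relation-separators)))

(my-split-field-reference "obj->field.sub") ; => ("obj" "field" "sub")
```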

semantic-dependency-include-path Variable
Defines the include path used when searching for files. This should be a list of directories to search which is specific to the file being included. This variable can also be set to a single function. If it is a function, it will be called with one argument, the file to find as a string, and it should return the full path to that file, or nil.
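When a function is used, it could be as simple as the following sketch. The names my-include-directories and my-find-include are hypothetical, and the directories listed are placeholders to adapt to your system.

```elisp
;; Hypothetical search path for the sketch.
(defvar my-include-directories '("/usr/include" "/usr/local/include")
  "Directories searched by `my-find-include'.")

(defun my-find-include (file)
  "Return the full path of FILE in `my-include-directories', or nil."
  (locate-file file my-include-directories))
```

Setting semantic-dependency-include-path to my-find-include would then hand each included file name to this function.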

This configures Imenu to use semantic parsing.

imenu-create-index-function Variable
The function to use for creating a buffer index.

It should be a function that takes no arguments and returns an index of the current buffer as an alist.

Simple elements in the alist look like (INDEX-NAME . INDEX-POSITION). Special elements look like (INDEX-NAME INDEX-POSITION FUNCTION ARGUMENTS...). A nested sub-alist element looks like (INDEX-NAME SUB-ALIST). The function imenu--subalist-p tests an element and returns t if it is a sub-alist.

This function is called within a save-excursion.

The variable is buffer-local.

These are specific to the document tool.

document-comment-start
Comment start string.
document-comment-line-prefix
Comment prefix string. Used at the beginning of each line.
document-comment-end
Comment end string.


Node:Rules, Next:, Previous:Settings, Up:BNF conversion

Rules

Writing the rules should be very similar to bison for basic syntax. Each rule is of the form

RESULT : MATCH1 (optional-lambda-expression)
       | MATCH2 (optional-lambda-expression)
       ;

RESULT is a non-terminal, or a token synthesized in your grammar. MATCH is a list of elements that are to be matched if RESULT is to be made. The optional lambda expression is a list containing simplified rules for concocting the parse tree.

In bison, each time an element of a MATCH is found, it is "shifted" onto the parser stack. (The stack of matched elements.) When all of MATCH1's elements have been matched, it is "reduced" to RESULT. See (bison)Algorithm.

The first RESULT written into your language specification should be bovine-toplevel, or the symbol specified with %start. When starting a parse for a file, this is the default token iterated over. You can use any token you want in place of bovine-toplevel if you specify what that nonterminal will be with a %start token in the settings section.

MATCH is made up of symbols and strings. A symbol such as foo means that a syntactic token of type foo must be matched. A string in the mix means that the previous symbol must have the additional constraint of exactly matching it. Thus, the combination:

symbol "moose"

means that a symbol must first be encountered, and then it must string-match "moose". Be especially careful to remember that the string is a regular expression. The code:

punctuation "."

will match any punctuation.
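
The regular expression pitfall can be checked in plain Emacs Lisp, independent of the bovinator. This sketch only shows how the strings behave under string-match; how the bovinator applies the expression to a token is not shown here.

```elisp
;; "." is a regular expression that matches any single character
;; (other than a newline), so it also matches a semicolon.
(string-match "." ";")        ; returns 0, a match at position 0

;; Escaping the period restricts the match to a literal ".".
(string-match "\\." ";")      ; returns nil, no match
(string-match "\\." ".")      ; returns 0

;; `regexp-quote' produces the escaped form for you.
(regexp-quote ".")            ; returns "\\."
```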

For the above example in bison, a LEX rule would be used to create a new token MOOSE. In this case, the MOOSE token would appear. For the bovinator, this task was mixed into the language definition to simplify implementation, though Bison's technique is more efficient.

To make a symbol match explicitly for keywords, for example, you can use the %token command in the settings section to create new symbols.

%token MOOSE "moose"

find_a_moose: MOOSE
            ;

will match "moose" explicitly, unlike the previous example where moose need only appear in the symbol. This is because "moose" will be converted to MOOSE in the lexical analysis stage. Thus the symbol MOOSE won't be available any other way.

If we specify our token in this way:

%token MOOSE symbol "moose"

find_a_moose: MOOSE
            ;

then MOOSE will match the string "moose" explicitly, but it won't do so at the lexical level, allowing use of the text "moose" in other forms of regular expressions.

Non-symbol tokens are also allowed. For example:

%token PERIOD punctuation "."

filename : symbol PERIOD symbol
         ;

will explicitly match one period when used in the above rule.

See Default syntactic tokens.


Node:Optional Lambda Expression, Next:, Previous:Rules, Up:BNF conversion

Optional Lambda Expressions

The OLE (Optional Lambda Expression) is converted into a bovine lambda (see Bovinating). This lambda has special short-cuts to simplify reading the Emacs BNF definition. An OLE like this:

( $1 )

results in a lambda return which consists entirely of the string or object found by matching the first (zeroth) element of match. An OLE like this:

( ,(foo $1) )

executes `foo' on the first argument, and then splices its return into the return list whereas:

( (foo $1) )

executes foo, and that is placed in the return list.

Here are other things that can appear inline:

$1
the first object matched.
,$1
the first object spliced into the list (assuming it is a list from a non-terminal)
'$1
the first object matched, placed in a list. i.e. ( $1 )
foo
the symbol foo (exactly as displayed)
(foo)
a function call to foo which is stuck into the return list.
,(foo)
a function call to foo which is spliced into the return list.
'(foo)
a function call to foo which is stuck into the return list in a list.
(EXPAND $1 nonterminal depth)
a list starting with EXPAND performs a recursive parse on the token passed to it (represented by $1 above.) The semantic list is a common token to expand, as there are often interesting things in the list. The nonterminal is a symbol in your table which the bovinator will start with when parsing. nonterminal's definition is the same as any other nonterminal. depth should be at least 1 when descending into a semantic list.
(EXPANDFULL $1 nonterminal depth)
is like EXPAND, except that the parser will iterate over nonterminal until there are no more matches. (The same way the parser iterates over bovine-toplevel.) This lets you have much simpler rules in this specific case, and also lets you have positional information in the returned tokens, and error skipping.
(ASSOC symbol1 value1 symbol2 value2 ... )
This is used for creating an association list. Each SYMBOL is included in the list if the associated VALUE is non-nil. While the items are all listed explicitly, the created structure is an association list of the form:
( ( symbol1 . value1) (symbol2 . value2) ... )
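
The behaviour described for ASSOC can be imitated in plain Emacs Lisp. The function example-assoc below is a hypothetical stand-in written for this illustration, not the code the bovinator generates; it only mirrors the rule that pairs with a nil value are left out.

```elisp
(defun example-assoc (&rest pairs)
  "Build an alist from PAIRS, a flat list SYMBOL1 VALUE1 SYMBOL2 VALUE2 ...
Pairs whose value is nil are left out."
  (let ((result nil))
    (while pairs
      (let ((symbol (car pairs))
            (value (cadr pairs)))
        ;; Only non-nil values earn an entry in the alist.
        (when value
          (push (cons symbol value) result)))
      (setq pairs (cddr pairs)))
    (nreverse result)))

(example-assoc 'const t 'pointer nil 'parent "Moose")
;; returns ((const . t) (parent . "Moose"))
```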

If the symbol %quotemode backquote is specified, then use ,@ to splice a list in, and , to evaluate the expression. This lets you send $1 as a symbol into a list instead of having it expanded inline.


Node:Examples, Next:, Previous:Optional Lambda Expression, Up:BNF conversion

Examples

The rule:

SYMBOL : symbol

is equivalent to

SYMBOL : symbol
         ( $1 )

which, if it matched the string "A", would return

( "A" )

If this rule were used like this:

ASSIGN: SYMBOL punctuation "=" SYMBOL
        ( $1 $3 )

it would match "A=B", and return

( ("A") ("B") )

The letters A and B come back in lists because SYMBOL is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use , to splice lists in like this:

ASSIGN: SYMBOL punctuation "=" SYMBOL
        ( ,$1 ,$3 )

which would return

( "A" "B" )
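
The difference between placing a list in the result and splicing it in can be reproduced with ordinary Emacs Lisp lists. This is only an analogy: the variables match1 and match3 stand in for $1 and $3, and the standard backquote operator ,@ is used here for splicing.

```elisp
(let ((match1 '("A"))   ; what the nonterminal SYMBOL returned for $1
      (match3 '("B")))  ; what the nonterminal SYMBOL returned for $3
  (list
   ;; Placing the lists in the result keeps the nesting.
   (list match1 match3)       ; (("A") ("B"))
   ;; Splicing inserts the elements of each list instead.
   `(,@match1 ,@match3)))     ; ("A" "B")
```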


Node:Style Guide, Previous:Examples, Up:BNF conversion

Semantic Token Style Guide

In order for a generalized program using Semantic to work with multiple languages, it is important to have a consistent meaning for the contents of the tokens returned. The variable semantic-toplevel-bovine-table is documented with the complete list of tokens that a functional or OO language may use. While any given language is free to create its own tokens, such a language definition would not produce a stream of tokens usable by a generalized tool.

Minimum Requirements

In general, all tokens returned from a parser should be generated with the following form:

("NAME" type-symbol ... "DOCSTRING" PROPERTIES OVERLAY)

NAME and type-symbol are the only syntactic elements of a nonterminal which are guaranteed to exist. This means that a parser which uses nil for either of these two slots, or some value which is not type consistent is wrong.

NAME is also guaranteed to be a string. This string represents the name of the nonterminal, usually a named definition which the language will use elsewhere as a reference to the syntactic element found.

type-symbol is a symbol representing the type of the nonterminal. Valid type-symbols can be anything, as long as it is an Emacs Lisp symbol.

DOCSTRING is a required slot in the nonterminal, but can be nil. Some languages have the documentation saved as a comment nearby. In these cases, DOCSTRING is nil, and the function `semantic-find-documentation' is used to find it.

PROPERTIES is a slot generated by the semantic parser harness, and need not be provided by a language author. Use semantic-token-put and semantic-token-get to access nonterminal properties programmatically.

OVERLAY represents positional information for this token. It is automatically generated by the semantic parser harness, and need not be provided by the language author, unless they provide a nonterminal expansion function via semantic-expand-nonterminal.

The OVERLAY property is accessed via several functions returning the beginning, end, and buffer of a token. Use these functions unless the overlay is really needed (see Token Queries). Depending on the overlay in a program can be dangerous because sometimes the overlay is replaced with an integer pair

[ START END ]
when the buffer the token belongs to is not in memory. This happens when a user has activated the Semantic Database semanticdb.
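
The following sketch shows why code that reads the position slot directly has to handle both representations. It assumes the integer pair is stored as a vector, as the bracket notation suggests, and the example-token-* helpers are hypothetical names; in real programs the query functions provided by Semantic are the better choice.

```elisp
(defun example-token-position-slot (token)
  "Return the last slot of TOKEN, where the positional data lives."
  (car (last token)))

(defun example-token-start (token)
  "Return the start position of TOKEN for either representation."
  (let ((slot (example-token-position-slot token)))
    (if (overlayp slot)
        (overlay-start slot)   ; buffer is loaded: a real overlay
      (aref slot 0))))         ; buffer not loaded: [ START END ]

;; A hand-written variable token whose buffer is "not in memory".
(let ((token '("moose" variable "int" nil nil nil nil [10 20])))
  (list (car token)                    ; NAME, always a string
        (nth 1 token)                  ; type-symbol, always a symbol
        (example-token-start token)))
;; returns ("moose" variable 10)
```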

Nonterminals for Functional Languages

If a parser produces tokens for a functional language, then the following token formats are available.

Variable
("NAME" variable "TYPE" DEFAULT-VALUE EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
TYPE is a string representing the type of this variable. TYPE can be nil for untyped languages. Languages which support variable declarations without a type (such as C) should supply a string representing the default type for that language.

DEFAULT-VALUE can be a string, or something pre-parsed and language specific. Hopefully this slot will be better defined in future versions of Semantic.

EXTRA-SPEC are extra specifiers. See below.

Function
("NAME" function "TYPE" ( ARG-LIST ) EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
TYPE is a string representing the return type of this function or method. TYPE can be nil for untyped languages, or for procedures in languages which support functions with no return data. See above for more.

ARG-LIST is a list of arguments passed to this function. Each element in the arg list can be one of the following:

Semantic Token
A full semantic token with positional information.
A partial semantic token
Partial tokens may contain the NAME slot, token-symbol, and possibly a TYPE.
String
A string representing the name of the argument. Common in untyped languages.

Type Declaration
("NAME" type "TYPE" ( PART-LIST ) ( PARENTS ) EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
TYPE is a string representing the kind of the type, such as (in C) "struct", "union", "enum", "typedef", or "class". The TYPE for a type token should not be nil, as even untyped languages with structures have type types.

PART-LIST is the list of individual entries inside compound types. Structures, for example, can contain several fields which can be represented as variables. Valid entries in a PART-LIST are:

Semantic Token
A full semantic token with positional information.
A partial semantic token
Partial tokens may contain the NAME slot, token-symbol, and possibly a TYPE.
String
A string representing the name of the slot or field. Common in untyped languages.

PARENTS represents a list of parents of this type. Parents are used in two situations.

Inheritance
For types which inherit from other types of the same type-type (such as classes).
Aliases
For types which are aliases of other types, the parent type is the type being aliased. The type's TYPE is the command specifying that it is an alias (such as "typedef" in C or C++).

The structure of the PARENTS list is of this form:

( EXPLICIT-PARENTS . INTERFACE-PARENTS)
EXPLICIT-PARENTS can be a single string (just one parent) or a list of parents (in a multiple inheritance situation). It can also be nil.

INTERFACE-PARENTS is a list of strings representing the names of all INTERFACES, or abstract classes inherited from. It can also be nil.

This slot can be interesting because the form:

( nil "string")
is a valid parent where there is no explicit parent, and only an interface.
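
Because PARENTS is a cons of the explicit parents and the interface list, the printed form can look surprising. The sketch below builds a few PARENTS values by hand; the class and interface names are invented for illustration.

```elisp
(let ((interface-only (cons nil '("Runnable")))
      (single-parent  (cons "Base" '("Runnable" "Comparable")))
      (multi-parent   (cons '("BaseA" "BaseB") nil)))
  (list
   ;; No explicit parent, one interface: prints as (nil "Runnable").
   (equal interface-only '(nil "Runnable"))      ; t
   ;; The car is always EXPLICIT-PARENTS, the cdr INTERFACE-PARENTS.
   (car single-parent)                           ; "Base"
   (cdr single-parent)                           ; ("Runnable" "Comparable")
   ;; Multiple inheritance with no interfaces.
   (car multi-parent)                            ; ("BaseA" "BaseB")
   (cdr multi-parent)))                          ; nil
```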
Include files
("FILE" include SYSTEM "DOCSTRING" PROPERTIES OVERLAY)
A statement which gets additional definitions from outside the current file, such as an #include statement in C. In this case, instead of NAME, a FILE is specified. FILE can be a subset of the actual file to be loaded.

SYSTEM is true if this include is part of a set of system includes. This field isn't currently being used and may be eliminated.

Package & Provide statements
("NAME" package DETAIL "DOCSTRING" PROPERTIES OVERLAY)
A statement which declares a given file is part of a package, such as the Java package statement, or a provide in Emacs Lisp.

DETAIL might be an associated file name, or some other language specific bit of information.

Extra Specifiers

Some default token types have a slot EXTRA-SPEC, for extra specifiers. These specifiers provide additional details not commonly used, or not available in all languages. This list is an alist, and if a given key is nil, it is not in the list, saving space. Some valid extra specifiers are:

(parent . "text")
Name of a parent type/class. This is not the same as a parent for a type. C++ and CLOS allow the creation of a function outside the body of its class. Such functions will set the parent specifier to a plain text string which is the name of that parent.
(dereference . INT)
Number of levels of dereference. In C, the number of array dimensions.
(pointer . INT)
Number of levels of pointers. In C, the number of * characters.
(typemodifiers . ( "text" ... ))
Keyword modifiers for a type. In C, such words would include `register' and `volatile'.
(suffix . "text")
Suffix information for a variable. Not currently used.
(const . t)
This exists if the variable or function return value is constant.
(throws . ( "text" ... ))
For functions or methods in languages that support typed signal throwing, this is a list of exceptions that can be thrown.
(destructor . t)
This exists for functions which are destructor methods in a class definition. In C++, a destructor's name excludes the ~ character. When producing the name of the function, the ~ is added back in.
(constructor . t)
This exists for functions which are constructors in a class definition. In C++ this is t when the name of this function is the same as the name of the parent class.
(user-visible . t)
For functions in interpreted languages such as Emacs Lisp, this signals that a function or variable is user visible. In Emacs Lisp, this means a function is interactive.
(prototype . t)
For functions or variables that are not declared locally, a prototype is something that will define that function or variable for use. In C, the term represents prototypes generally used in header files. In Emacs Lisp, the autoload statement creates prototypes.
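
Since EXTRA-SPEC is an ordinary association list, a missing key simply yields nil. The sketch below reads specifiers from a hand-written variable token with assq; example-variable-token and example-extra-spec are hypothetical names, and the slot index is taken from the Variable format given earlier.

```elisp
;; A hand-written variable token following the Variable format:
;; ("NAME" variable "TYPE" DEFAULT-VALUE EXTRA-SPEC
;;  "DOCSTRING" PROPERTIES OVERLAY)
(defvar example-variable-token
  '("argv" variable "char" nil ((pointer . 2) (const . t)) nil nil nil)
  "Sample token used for illustration only.")

(defun example-extra-spec (token spec)
  "Return the value of SPEC in the EXTRA-SPEC slot of a variable TOKEN."
  ;; EXTRA-SPEC is the fifth slot (index 4) of a variable token.
  (cdr (assq spec (nth 4 token))))

(example-extra-spec example-variable-token 'pointer)   ; returns 2
(example-extra-spec example-variable-token 'const)     ; returns t
(example-extra-spec example-variable-token 'suffix)    ; returns nil, key absent
```

The index differs between token types (a type token has PARENTS before EXTRA-SPEC), which is one reason to prefer the accessor functions described under Token Queries.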


Node:Compiling, Next:, Previous:BNF conversion, Up:Top

Compiling a language file with the bovinator

From a program you can use the function semantic-bovinate-toplevel. This function takes one optional parameter specifying if the cache should be refreshed. By default, the cached results of the last parse are always used. Specifying that the cache should be checked will cause it to be flushed if it is out of date.

Another function you can use is semantic-bovinate-nonterminal. This command takes a token stream returned by the function semantic-flex followed by a DEPTH (as above). This takes an additional optional argument of NONTERMINAL which is the nonterminal in your table it is to start parsing with.

bovinate &optional clear Command
Bovinate the current buffer. Show output in a temp buffer. Optional argument CLEAR will clear the cache before bovinating.

semantic-clear-toplevel-cache Command
Clear the toplevel bovine cache for the current buffer. Clearing the cache will force a complete reparse next time a token stream is requested.

semantic-bovinate-toplevel &optional checkcache Function
Bovinate the entire current buffer. If the optional argument CHECKCACHE is non-nil, then flush the cache iff there has been a size change.


Node:Debugging, Next:, Previous:Compiling, Up:Top

Debugging

Writing language files using BNF is significantly easier than writing them using regular expressions in a functional manner. Debugging them, however, can still prove challenging.

There are two ways to debug a language definition if it is not behaving as expected. One way is to debug against the source .bnf file. The second is to debug against the lisp table created from the .bnf source, or perhaps written by hand.

If your language definition was written in BNF notation, debugging is quite easy. The command bovinate-debug will start you off.

bovinate-debug Command
Bovinate the current buffer and run in debug mode.

If you prefer debugging against the Lisp table, find the table in a buffer, place the cursor in it, and use the command semantic-bovinate-debug-set-table in it.

semantic-bovinate-debug-set-table &optional clear Command
Set the table for the next debug to be here. Optional argument CLEAR to unset the debug table.

After the table is set, the bovinate-debug command can be run at any time for the given language.

While debugging, two windows are visible. One window shows the file being parsed, and the syntactic token being tested is highlighted. The second window shows the table being used (either in the BNF source, or the Lisp table) with the current rule highlighted. The cursor will sit on the specific match rule being tested against.

In the minibuffer, a brief summary of the current situation is listed. The first element is the syntactic token which is a list of the form:

(TYPE START . END)

The rest of the display is a list of all strings collected for the currently tested rule. Each time a new rule is entered, the list is restarted. Upon returning from a rule into a previous match list, the previous match list is restored, with the production of the dependent rule in the list.
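
The syntactic token form (TYPE START . END) shown in the minibuffer is a plain dotted list, so it can be taken apart with car, cadr and cddr. The helper example-syntactic-token-text is a hypothetical name used only in this sketch.

```elisp
(defun example-syntactic-token-text (token)
  "Return the buffer text covered by the syntactic TOKEN.
TOKEN has the form (TYPE START . END)."
  (let ((start (cadr token))
        (end (cddr token)))
    (buffer-substring-no-properties start end)))

(with-temp-buffer
  (insert "int moose;")
  (let ((token '(symbol 5 . 10)))
    (list (car token)                              ; the TYPE, here symbol
          (example-syntactic-token-text token))))  ; the text "moose"
```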

Use C-g to stop debugging. There are no commands for any fancier types of debugging.


Node:Programming, Next:, Previous:Debugging, Up:Top

Programming

Once a source file has been parsed, the following APIs can be used to write programs that use the token stream most effectively.


Node:Token Queries, Next:, Previous:Programming, Up:Programming

Token Queries

When writing programs that use the bovinator, the following functions are needed to get details out of a nonterminal.

semantic-equivalent-tokens-p token1 token2 Function
Compare TOKEN1 and TOKEN2 and return non-nil if they are equivalent. Use eq to test if two tokens are the same. Use this function if tokens are being copied and regrouped, to test whether two tokens represent the same thing but may be constructed of different cons cells.

semantic-token-token token Function
Retrieve from TOKEN the token identifier. i.e., the symbol 'variable, 'function, 'type, or other.

semantic-token-name token Function
Retrieve the name of TOKEN.

semantic-token-docstring token &optional buffer Function
Retrieve the documentation of TOKEN. Optional argument BUFFER indicates where to get the text from. If not provided, then only the POSITION can be provided.

semantic-token-overlay token Function
Retrieve the OVERLAY part of TOKEN. The returned item may be an overlay or an unloaded buffer representation.

semantic-token-extent token Function
Retrieve the extent (START END) of TOKEN.

semantic-token-start token Function
Retrieve the start location of TOKEN.

semantic-token-end token Function
Retrieve the end location of TOKEN.

semantic-token-type token Function
Retrieve the type of TOKEN.

semantic-token-put token property value Function
On token, set property to value.

semantic-token-get token property Function
For token get the value of property.

semantic-token-extra-spec token spec Function
Retrieve a specifier for the variable TOKEN. SPEC is the symbol whose modifier value to get. This function can get specifiers from any type of TOKEN. Do not use this function if you know what type of token you are dereferencing. Instead, use the function specific to that token type. It will be faster.

semantic-token-type-parts token Function
Retrieve the parts of the type TOKEN.

semantic-token-type-parent token Function
Retrieve the parent of the type TOKEN. The return value is a list. A value of nil means no parents. The car of the list is either the parent class, or a list of parent classes. The cdr of the list is the list of interfaces, or abstract classes which are parents of TOKEN.

semantic-token-type-parent-superclass token Function
Retrieve the parent super classes of the type TOKEN.

semantic-token-type-parent-implement token Function
Retrieve the parent interfaces of the type TOKEN.

semantic-token-type-modifiers token Function
Retrieve the type modifiers for the type TOKEN.

semantic-token-type-extra-specs token Function
Retrieve the extra specifiers for the type TOKEN.

semantic-token-type-extra-spec token spec Function
Retrieve an extra specifier for the type TOKEN. SPEC is the symbol whose modifier value to get.

semantic-token-function-args token Function
Retrieve the arguments of the function TOKEN.

semantic-token-function-modifiers token Function
Retrieve the type modifiers of the function TOKEN.

semantic-token-function-destructor token Function
Non-nil if TOKEN is a destructor function.

semantic-token-function-extra-specs token Function
Retrieve the extra specifiers of the function TOKEN.

semantic-token-function-extra-spec token spec Function
Retrieve a specifier for the function TOKEN. SPEC is a symbol whose specifier value to get.

semantic-token-function-throws token Function
Retrieve the throws signal of the function TOKEN. This is an optional field, and returns nil if it doesn't exist.

semantic-token-function-parent token Function
The parent of the function TOKEN. A function has a parent if it is a method of a class, and if the function does not appear in the body of its parent class.

semantic-token-variable-const token Function
Retrieve the status of constantness from the variable TOKEN.

semantic-token-variable-default token Function
Retrieve the default value of the variable TOKEN.

semantic-token-variable-modifiers token Function
Retrieve type modifiers for the variable TOKEN.

semantic-token-variable-extra-specs token Function
Retrieve extra specifiers for the variable TOKEN.

semantic-token-variable-extra-spec token spec Function
Retrieve a specifier value for the variable TOKEN. SPEC is the symbol whose specifier value to get.

semantic-token-include-system token Function
Retrieve the flag indicating if the include TOKEN is a system include.

For override methods that query a token, see Token Details.


Node:Nonterminal Streams, Next:, Previous:Token Queries, Up:Programming

Nonterminal streams

These functions take some key, and return information found inside the nonterminal stream. Some will return one token (the first matching item found). Others will return a list of all items matching a given criterion. All these functions work regardless of a buffer being in memory or not.

semantic-find-nonterminal-by-name name streamorbuffer &optional search-parts search-include Function
Find a nonterminal NAME within STREAMORBUFFER. NAME is a string. If SEARCH-PARTS is non-nil, search children of tokens. If SEARCH-INCLUDE is non-nil, search include files.

semantic-find-nonterminal-by-property property value streamorbuffer &optional search-parts search-includes Function
Find all nonterminals with PROPERTY equal to VALUE in STREAMORBUFFER. Properties can be added with semantic-token-put. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-by-extra-spec spec streamorbuffer &optional search-parts search-includes Function
Find all nonterminals with a given SPEC in STREAMORBUFFER. SPEC is a symbol key into the modifiers association list. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-by-extra-spec-value spec value streamorbuffer &optional search-parts search-includes Function
Find all nonterminals with a given SPEC equal to VALUE in STREAMORBUFFER. SPEC is a symbol key into the modifiers association list. VALUE is the value that SPEC should match. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-by-position position streamorbuffer &optional nomedian Function
Find a nonterminal covering POSITION within STREAMORBUFFER. POSITION is a number, or marker. If NOMEDIAN is non-nil, don't do the median calculation, and return nil.

semantic-find-innermost-nonterminal-by-position position streamorbuffer &optional nomedian Function
Find a list of nonterminals covering POSITION within STREAMORBUFFER. POSITION is a number, or marker. If NOMEDIAN is non-nil, don't do the median calculation, and return nil. This function will find the topmost item, and recurse until no more details are available or findable.

semantic-find-nonterminal-by-token token streamorbuffer &optional search-parts search-includes Function
Find all nonterminals with a token TOKEN within STREAMORBUFFER. TOKEN is a symbol representing the type of the tokens to find. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-standard streamorbuffer &optional search-parts search-includes Function
Find all nonterminals in STREAMORBUFFER which define simple token types. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-by-type type streamorbuffer &optional search-parts search-includes Function
Find all nonterminals with type TYPE within STREAMORBUFFER. TYPE is a string which is the name of the type of the token returned. Optional arguments SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function.

semantic-find-nonterminal-by-function function streamorbuffer &optional search-parts search-includes Function
Find all nonterminals for which FUNCTION matches within STREAMORBUFFER. FUNCTION must return non-nil if an element of STREAM will be included in the new list.

If optional argument SEARCH-PARTS is non-nil, all sub-parts of tokens are searched. The over-loadable function semantic-nonterminal-children is used for searching the child lists. If SEARCH-PARTS is the symbol 'positiononly, then only children that have positional information are searched.

If SEARCH-INCLUDES is non-nil, then all include files are also searched for matches.

semantic-find-nonterminal-by-function-first-match function streamorbuffer &optional search-parts search-includes Function
Find the first nonterminal for which FUNCTION matches within STREAMORBUFFER. FUNCTION must return non-nil if an element of STREAM will be included in the new list. If optional argument SEARCH-PARTS is non-nil, all sub-parts of tokens are searched. The over-loadable function semantic-nonterminal-children is used for searching. If SEARCH-INCLUDES is non-nil, then all include files are also searched for matches.

semantic-recursive-find-nonterminal-by-name name buffer Function
Recursively find the first occurrence of NAME. Start search with BUFFER. Recurse through all dependencies till found. The return item is of the form (BUFFER TOKEN) where BUFFER is the buffer in which TOKEN (the token found to match NAME) was found.


Node:Nonterminals at point, Next:, Previous:Nonterminal Streams, Up:Programming

Nonterminals at point

When you just want to get at a nonterminal the cursor is on, there is a more efficient mechanism than using semantic-find-nonterminal-by-position. This mechanism directly queries the overlays the parsing step leaves in the buffer. This provides for very rapid retrieval of what function or variable the cursor is currently in.

These functions query the current buffer's overlay system for tokens.

semantic-find-nonterminal-by-overlay &optional positionormarker buffer Function
Find all nonterminals covering POSITIONORMARKER by using overlays. If POSITIONORMARKER is nil, use the current point. Optional BUFFER is used if POSITIONORMARKER is a number, otherwise the current buffer is used. This finds all tokens covering the specified position by checking for all overlays covering the current spot. They are then sorted from largest to smallest via the start location.

semantic-find-nonterminal-by-overlay-in-region start end &optional buffer Function
Find all nonterminals which exist in whole or in part between START and END. Uses overlays to determine position. Optional BUFFER argument specifies the buffer to use.

semantic-current-nonterminal Function
Return the current nonterminal in the current buffer. If there is more than one in the same location, return the smallest token.

semantic-current-nonterminal-parent Function
Return the current nonterminal's parent in the current buffer. A token's parent would be a containing structure, such as a type containing a field. Return nil if there is no parent.


Node:Nonterminal Sorting, Next:, Previous:Nonterminals at point, Up:Programming

Nonterminal sorting

Sometimes it is important to reorganize a token stream into a form that is better for display to a user. It is important not to use functions with side effects when doing this, as they could affect the token cache.

There are some existing utility functions which will reorganize the token list for you.

semantic-bucketize tokens &optional parent filter Function
Sort TOKENS into a group of buckets based on token type. Unknown types are placed in a Misc bucket. Type bucket names are defined by `semantic-symbol->name-assoc-list'. If PARENT is specified, then TOKENS belong to this PARENT in some way. This will use `semantic-symbol->name-assoc-list-for-type-parts' to generate bucket names. Optional argument FILTER is a filter function to be applied to each bucket. The filter function will take one argument, which is a list of tokens, and may re-organize the list with side-effects.
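
The idea of bucketing can be sketched with plain lists. This is a simplified, hypothetical stand-in (example-bucketize), not the implementation of semantic-bucketize: it keys buckets on the raw type symbol, has no Misc bucket and no name lookup, and it builds new list structure so the input token list is left untouched.

```elisp
(defun example-bucketize (tokens)
  "Group TOKENS by their type symbol, the second element of each token.
Return an alist of (TYPE-SYMBOL TOKEN ...) in first-seen order."
  (let ((buckets nil))
    (dolist (token tokens)
      (let* ((class (cadr token))
             (bucket (assq class buckets)))
        (if bucket
            ;; Extend our own bucket; the input list is never modified.
            (setcdr bucket (append (cdr bucket) (list token)))
          (setq buckets (append buckets (list (list class token)))))))
    buckets))

(example-bucketize '(("a" variable) ("f" function) ("b" variable)))
;; returns ((variable ("a" variable) ("b" variable))
;;          (function ("f" function)))
```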

semantic-bucketize-token-token Variable
Function used to get a symbol describing the class of a token. This function must take one argument of a semantic token. It should return a symbol found in `semantic-symbol->name-assoc-list' which semantic-bucketize uses to bin up tokens. To create new bins for an application augment `semantic-symbol->name-assoc-list', and `semantic-symbol->name-assoc-list-for-type-parts' in addition to setting this variable (locally in your function).

semantic-adopt-external-members tokens Function
Rebuild TOKENS so that externally defined members are regrouped. Some languages such as C++ and CLOS permit the declaration of member functions outside the definition of the class. It is easier to study the structure of a program when such methods are grouped together more logically.

This function uses semantic-nonterminal-external-member-p to determine when a potential child is an externally defined member.

Note: Applications which use this function must account for token types which do not have a position, but have children which *do* have positions.

Applications should use semantic-mark-external-member-function to modify all tokens which are found as externally defined to some type. For example, changing the token type for generating extra buckets with the bucket function.

semantic-orphaned-member-metaparent-type Variable
In semantic-adopt-external-members, the type of 'type for metaparents. A metaparent is a made-up type semantic token used to hold the child list of orphaned members of a named type.

semantic-mark-external-member-function Variable
Function called when an externally defined orphan is found. By default, the token is always marked with the adopted property. This function should be locally bound by a program that needs to add additional behaviors into the token list. This function is called with one argument which is a shallow copy of the token to be modified. This function should return the token (or a copy of it) which is then integrated into the revised token list.


Node:Nonterminal Completion, Next:, Previous:Nonterminal Sorting, Up:Programming

Nonterminal completion

These functions provide ways of reading the names of items in a buffer with completion.

semantic-read-symbol prompt &optional default stream filter Function
Read a symbol name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from. FILTER provides a filter on the types of things to complete. FILTER must be a function to call on each element. (See !!!

semantic-read-variable prompt &optional default stream Function
Read a variable name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from.

semantic-read-function prompt &optional default stream Function
Read a function name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from.

semantic-read-type prompt &optional default stream Function
Read a type name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from.


Node:Override Methods, Next:, Previous:Nonterminal Completion, Up:Programming

Override Methods

These functions are called `override methods' because they provide generic behaviors, which a given language can override. For example, finding a dependency file in Emacs Lisp can be done with the `locate-library' command (which overrides the default behavior). In C, a dependency can be found by searching a generic search path which can be passed in via a variable.


Node:Token->Text, Next:, Previous:Override Methods, Up:Override Methods

Token->Text

Any given token consists of meta information which is best viewed in some textual form. This could be as simple as the token's name, or as complex as a prototype to be added to a header file in C. Not only are there several default converters from a token into text, but there are also some convenient variables that can be used with them. Use these variables to allow options on output forms when displaying tokens in your programs.

semantic-token->text-functions Variable
List of functions which convert a token to text. Each function must take the parameters TOKEN &optional PARENT COLOR. TOKEN is the token to convert. PARENT is a parent token or name which refers to the structure or class which contains TOKEN. PARENT is NOT a class which a TOKEN would claim as a parent. COLOR indicates that the generated text should be colored using font-lock.

semantic-token->text-custom-list Variable
A list used by customizable variables to choose a token-to-text function. Use this variable in the :type field of a customizable variable.

Every token to text conversion function must take the same parameters, which are TOKEN, the token to be converted, PARENT, the containing parent (like a structure which contains a variable), and COLOR, which is a flag specifying that color should be applied to the returned string.

When creating, or using these strings, particularly with color, use concat to build up larger strings instead of format. This will preserve text properties.
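
The following sketch illustrates building a larger string with concat while keeping the text properties of a colored piece. It uses only standard Emacs Lisp; the helper name my-label-colored-name is hypothetical, and the propertized string merely stands in for the output of a token-to-text function called with COLOR non-nil.

```elisp
(defun my-label-colored-name (label name)
  "Return LABEL followed by NAME, keeping any text properties on NAME."
  ;; `concat' copies the text properties of its string arguments.
  (concat label ": " name))

(let* ((name (propertize "main" 'face 'font-lock-function-name-face))
       (line (my-label-colored-name "Function" name)))
  ;; The last character of LINE comes from NAME, so it still has the face.
  (get-text-property (1- (length line)) 'face line))
;; => font-lock-function-name-face
```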

semantic-name-nonterminal token &optional parent color Function
Return the name string describing TOKEN. The name is the shortest possible representation. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.

semantic-summarize-nonterminal token &optional parent color Function
Summarize TOKEN in a reasonable way. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.

semantic-prototype-nonterminal token &optional parent color Function
Return a prototype for TOKEN. This function should be overloaded, though it need not be used. This is because it can be used to create code by language independent tools. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.

semantic-prototype-file buffer Function
Return a file in which prototypes belonging to BUFFER should be placed. Default behavior (if not overridden) looks for a token specifying the prototype file, or the existence of an EDE variable indicating which file prototypes belong in.

semantic-abbreviate-nonterminal token &optional parent color Function
Return an abbreviated string describing TOKEN. The abbreviation is to be short, with possible symbols indicating the type of token, or other information. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.

semantic-concise-prototype-nonterminal token &optional parent color Function
Return a concise prototype for TOKEN. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.

semantic-uml-abbreviate-nonterminal token &optional parent color Function
Return a UML style abbreviation for TOKEN. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors.
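
As a sketch of how the variables and functions in this section might fit together, the following defines a customizable variable whose :type is taken from semantic-token->text-custom-list, and a helper that calls whichever converter the user selected. The names my-token-display-function and my-describe-token are hypothetical, and the code assumes Semantic is already loaded so that the variable and the converters are defined.

```elisp
;; Hypothetical option; the :type form is evaluated, so the list held in
;; `semantic-token->text-custom-list' becomes the customization type.
(defcustom my-token-display-function 'semantic-abbreviate-nonterminal
  "Function used by `my-describe-token' to convert a token to text."
  :group 'semantic
  :type semantic-token->text-custom-list)

(defun my-describe-token (token &optional parent)
  "Show TOKEN in the echo area using `my-token-display-function'.
PARENT is passed through to the conversion function."
  ;; Every converter takes TOKEN, PARENT, and COLOR; COLOR is nil here
  ;; because plain text is sufficient for the echo area.
  (message "%s" (funcall my-token-display-function token parent nil)))
```

Because all of the converters share one calling convention, the helper does not need to know which one was chosen.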


Node:Token Details, Next:, Previous:Token->Text, Up:Override Methods

Token Details

These functions help derive information about tokens that may not be obvious for non-traditional languages with their own token types.

semantic-nonterminal-children token &optional positionalonly Function
Return the list of top level children belonging to TOKEN. Children are any sub-tokens which may contain overlays. The default behavior (if not overridden with nonterminal-children) is to return type parts for a type, and arguments for a function.

If optional argument POSITIONALONLY is non-nil, then only return valid children if they contain positions. Some languages may choose to create lists of children without position/overlay information.

If this function is overridden, use semantic-nonterminal-children-default to also include the default behavior, and merely extend your own.

Note for language authors: If a mode defines a language that has tokens in it with overlays that should not be considered children, you should still return them with this function. If you do not, then token re-parsing, and database saving will fail.

semantic-nonterminal-external-member-parent token Function
Return a parent for TOKEN when TOKEN is an external member. TOKEN is an external member if it is defined at a toplevel and has some sort of label defining a parent. The parent return will be a string.

The default behavior, if not overridden with nonterminal-external-member-parent, is to get the 'parent extra specifier of TOKEN.

If this function is overridden, use semantic-nonterminal-external-member-parent-default to also include the default behavior, and merely extend your own.

semantic-nonterminal-external-member-p parent token Function
Return non-nil if PARENT is the parent of TOKEN. TOKEN is an external member of PARENT when it is somehow tagged as having PARENT as its parent.

The default behavior, if not overridden with nonterminal-external-member-p is to match 'parent extra specifier in the name of TOKEN.

If this function is overridden, use semantic-nonterminal-external-member-p-default to also include the default behavior, and merely extend your own.

semantic-nonterminal-external-member-children token &optional usedb Function
Return the list of children which are not *in* TOKEN. If optional argument USEDB is non-nil, then also search files in the Semantic Database. If USEDB is a list of databases, search those databases.

Children in this case are functions or types which are members of TOKEN, such as the parts of a type, but which are not defined inside the class. C++ and CLOS both permit methods of a class to be defined outside the bounds of the class' definition.

The default behavior, if not overridden with nonterminal-external-member-children is to search using semantic-nonterminal-external-member-p in all top level definitions with a parent of TOKEN.

If this function is overridden, use semantic-nonterminal-external-member-children-default to also include the default behavior, and merely extend your own.

semantic-nonterminal-protection token &optional parent Function
Return protection information about TOKEN with optional PARENT. This function returns one of the following symbols:

  nil - No special protection. Language dependent.
  'public - Anyone can access this TOKEN.
  'private - Only methods in the local scope can access TOKEN.
  'friend - Like private, except some outer scopes are allowed access to TOKEN.

Some languages may choose to provide additional return symbols specific to themselves. Use of this function should allow for this.

The default behavior (if not overridden with nonterminal-protection) is to return a symbol based on type modifiers.

semantic-nonterminal-abstract token &optional parent Function
Return non-nil if TOKEN is abstract. Optional PARENT is the parent token of TOKEN. In UML, abstract methods and classes have special meaning and behavior in how methods are overridden. In UML, abstract methods are italicized.

The default behavior (if not overridden with nonterminal-abstract) is to return true if abstract is in the type modifiers.

semantic-nonterminal-leaf token &optional parent Function
Return non-nil if TOKEN is a leaf. Optional PARENT is the parent token of TOKEN. In UML, leaf methods and classes have special meaning and behavior.

The default behavior (if not overridden with nonterminal-leaf) is to return true if leaf is in the type modifiers.

semantic-nonterminal-static token &optional parent Function
Return non-nil if TOKEN is static. Optional PARENT is the parent token of TOKEN. In UML, static methods and attributes mean that they are allocated in the parent class, and are not instance specific. UML notation specifies that STATIC entries are underlined.

The default behavior (if not overridden with nonterminal-static) is to return true if static is in the type modifiers.

semantic-find-dependency token Function
Find the filename represented by TOKEN. TOKEN may be a stripped element, in which case PARENT specifies a parent token that has positional information. Depends on semantic-dependency-include-path for searching. Always searches `.' first, then searches additional paths.

semantic-dependency-include-path Variable
Defines the include path used when searching for files. This should be a list of directories to search which is specific to the file being included. This variable can also be set to a single function. If it is a function, it will be called with one argument, the file to find as a string, and it should return the full path to that file, or nil.
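
As a sketch of the function form of this variable, the following defines a lookup function with the calling convention described above (one file name in, full path or nil out) and installs it for C buffers. The names my-include-dirs and my-find-include are hypothetical, and the directories listed are placeholders only.

```elisp
;; Placeholder directories; adjust for the local system.
(defvar my-include-dirs '("/usr/include" "/usr/local/include")
  "Directories searched by `my-find-include'.")

(defun my-find-include (file)
  "Return the full path of FILE found in `my-include-dirs', or nil."
  (let ((dirs my-include-dirs)
        (found nil))
    ;; Stop at the first directory that contains FILE.
    (while (and dirs (not found))
      (let ((candidate (expand-file-name file (car dirs))))
        (when (file-exists-p candidate)
          (setq found candidate)))
      (setq dirs (cdr dirs)))
    found))

;; Install the function buffer-locally in C buffers.  A plain list of
;; directories could be assigned here instead of the function.
(add-hook 'c-mode-hook
          (lambda ()
            (make-local-variable 'semantic-dependency-include-path)
            (setq semantic-dependency-include-path 'my-find-include)))
```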

semantic-find-nonterminal token &optional parent Function
Find the location of TOKEN. TOKEN may be a stripped element, in which case PARENT specifies a parent token that has position information. Different behaviors are provided depending on the type of token. For example, dependencies (includes) will seek out the file that is depended on, and functions will move to the specified definition.

semantic-find-documentation token Function
Find documentation from TOKEN and return it as a clean string. TOKEN might have DOCUMENTATION set in it already. If not, there may be some documentation in a comment preceding TOKEN's definition which we can look for. When appropriate, this can be overridden by a language specific enhancement.


Node:Local Context, Next:, Previous:Token Details, Up:Override Methods

Local Context

semantic-up-context &optional point Function
Move point up one context from POINT. Return non-nil if there are no more context levels. Overloaded functions using up-context take no parameters.

semantic-beginning-of-context &optional point Function
Move POINT to the beginning of the current context. Return non-nil if there is no upper context. The default behavior uses semantic-up-context. It can be overridden with beginning-of-context.

semantic-end-of-context &optional point Function
Move POINT to the end of the current context. Return non-nil if there is no upper context. By default, this uses semantic-up-context, and assumes parenthetical block delimiters. This can be overridden with end-of-context.

semantic-get-local-variables &optional point Function
Get the local variables based on POINT's context. Local variables are returned in Semantic token format. By default, this calculates the current bounds using context blocks navigation, then uses the parser with bovine-inner-scope to parse tokens at the beginning of the context. This can be overridden with get-local-variables.

semantic-get-local-arguments &optional point Function
Get arguments (variables) from the current context at POINT. Parameters are available if the point is in a function or method. This function returns a list of tokens. If the local token returns just a list of strings, then this function will convert them to tokens. Part of this behavior can be overridden with get-local-arguments.

semantic-get-all-local-variables &optional point Function
Get all local variables for this context, and parent contexts. Local variables are returned in Semantic token format. By default, this gets local variables, and local arguments. This can be overridden with get-all-local-variables. Optional argument POINT is the location to start getting the variables from.
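
As a minimal sketch, the command below lists the names of the local variables and arguments visible at point. It assumes Semantic is loaded, that the current buffer's language supports local context parsing, and that semantic-token-name is available to extract the name of each returned token. The command name my-show-local-variables is hypothetical.

```elisp
(defun my-show-local-variables ()
  "Echo the names of the local variables and arguments visible at point."
  (interactive)
  ;; POINT is omitted, so the search starts from the current position.
  (let ((tokens (semantic-get-all-local-variables)))
    (message "Locals: %s"
             (mapconcat (lambda (token) (semantic-token-name token))
                        tokens ", "))))
```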

The next set of functions handles local context parsing. This means looking at the code (locally), navigating, and fetching information such as the type of the parameter the cursor may be typing in.

semantic-end-of-command Function
Move to the end of the current command. By default, uses semantic-command-separation-character. Override with end-of-command.

semantic-beginning-of-command Function
Move to the beginning of the current command. By default, uses semantic-command-separation-character. Override with beginning-of-command.

semantic-ctxt-current-symbol &optional point Function
Return the current symbol the cursor is on at POINT in a list. This will include a list of type/field names when applicable. This can be overridden using ctxt-current-symbol.

semantic-ctxt-current-assignment &optional point Function
Return the current assignment near the cursor at POINT. Return a list as per semantic-ctxt-current-symbol. Return nil if there is nothing relevant. Override with ctxt-current-assignment.

semantic-ctxt-current-function &optional point Function
Return the current function the cursor is in at POINT. The function returned is the one accepting the arguments that the cursor is currently in. This can be overridden with ctxt-current-function.

semantic-ctxt-current-argument &optional point Function
Return the current symbol the cursor is on at POINT. Override with ctxt-current-argument.

semantic-ctxt-scoped-types &optional point Function
Return a list of type names currently in scope at POINT. Override with ctxt-scoped-types.

For details on using these functions to get more detailed information about the current context: See Context Analysis.


Node:Making New Methods, Previous:Local Context, Up:Override Methods

Making New Methods


Node:Parser Hooks, Next:, Previous:Override Methods, Up:Programming

Parser Hooks

If you write a program that uses the stream of tokens in a persistent display or database, it is necessary to know when tokens change so that your displays can be updated. This is especially important as tokens can be replaced, changed, or deleted, and the associated overlays will then throw errors when you try to use them. Complete integration with token changes can be achieved via several very important hooks.

One interesting way to interact with the parser is to let it know that changes you are going to make will not require re-parsing.

semantic-edits-are-safe Variable
When non-nil, modifications do not require a reparse. This prevents tokens from being marked dirty, and it prevents top level edits from causing a cache check. Use this when writing programs that could cause a full reparse, but will not change the tag structure, such as adding or updating top-level comments.
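
A minimal sketch of this technique, assuming the text being inserted really is a top-level comment that leaves the tag structure unchanged. The function name my-insert-top-comment is hypothetical.

```elisp
;; Declare the variable special so that `let' binds it dynamically even
;; in a file that uses lexical binding.
(defvar semantic-edits-are-safe)

(defun my-insert-top-comment (line)
  "Insert LINE, assumed to be a comment in the buffer's language, at the top.
`semantic-edits-are-safe' is bound to t for the duration of the edit."
  (let ((semantic-edits-are-safe t))
    (save-excursion
      (goto-char (point-min))
      (insert line "\n"))))
```

For example, (my-insert-top-comment ";; Generated file") in an Emacs Lisp buffer adds that line above the existing contents.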

Next, it is sometimes useful to know what the current parsing state is. These functions can let you know what level of re-parsing may be needed. Careful choices on when to reparse can make your program much faster.

semantic-bovine-toplevel-full-reparse-needed-p &optional checkcache Function
Return non-nil if the current buffer needs a full reparse. Optional argument CHECKCACHE indicates if the cache check should be made.

semantic-bovine-toplevel-partial-reparse-needed-p &optional checkcache Function
Return non-nil if the current buffer needs a partial reparse. This only returns non-nil if semantic-bovine-toplevel-full-reparse-needed-p returns nil. Optional argument CHECKCACHE indicates if the cache check should be made when checking semantic-bovine-toplevel-full-reparse-needed-p.
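
As a sketch, a program could consult these two predicates before deciding how much work to do. The function name my-report-parse-state is hypothetical, and the messages stand in for whatever refresh the program would actually perform.

```elisp
(defun my-report-parse-state ()
  "Echo which kind of reparse, if any, the current buffer needs."
  (cond
   ;; Check the full reparse first; the partial check is only
   ;; meaningful when a full reparse is not needed.
   ((semantic-bovine-toplevel-full-reparse-needed-p)
    (message "A full reparse is needed"))
   ((semantic-bovine-toplevel-partial-reparse-needed-p)
    (message "A partial reparse is needed"))
   (t
    (message "The token list is up to date"))))
```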

If you need very close interaction with the user's editing, then these two hooks can be used to find out when a given tag is being changed. These hooks could even be used to cut down on re-parsing if used correctly.

For all hooks, be careful to add your function as a local hook if you only want to affect a single buffer. Setting it globally can cause unwanted effects if your program is concerned with a single buffer.

semantic-dirty-token-hooks Variable
Hooks run after a token is marked as dirty (edited by the user). The functions must take TOKEN, START, and END as parameters. This hook will only be called once when a token is first made dirty; subsequent edits will not cause this to run a second time unless that token is first cleaned. Any token marked as dirty will also be called with semantic-clean-token-hooks, unless a full reparse is done instead.

semantic-clean-token-hooks Variable
Hooks run after a token is marked as clean (re-parsed after user edits.) The functions must take a TOKEN as a parameter. Any token sent to this hook will have first been called with semantic-dirty-token-hooks. This hook is not called for tokens marked dirty if the buffer is completely re-parsed. In that case, use semantic-after-toplevel-cache-change-hook.

semantic-change-hooks Variable
Hooks run when semantic detects a change in a buffer. Each hook function must take three arguments, identical to the common hook after-change-functions.

Lastly, if you just want to know when a buffer changes, use this hook.

semantic-after-toplevel-bovinate-hook Variable
Hooks run after a toplevel token parse. It is not run if the toplevel parse command is called, and the buffer does not need to be fully re-parsed. This hook is also run when the toplevel cache is flushed, and the cache is emptied. For language specific hooks, make sure you define this as a local hook.

This hook should not be used any more. Use semantic-after-toplevel-cache-change-hook instead.

semantic-after-toplevel-cache-change-hook Variable
Hooks run after the buffer token list has changed. This list will change when a buffer is re-parsed, or when the token list in a buffer is cleared. It is *NOT* called if the current token list is partially re-parsed.

Hook functions must take one argument, which is the new list of tokens associated with this buffer.

For language specific hooks, make sure you define this as a local hook.

semantic-after-partial-cache-change-hook Variable
Hooks run after the buffer token list has been updated. This list will change when the current token list has been partially re-parsed.

Hook functions must take one argument, which is the list of tokens updated among the ones associated with this buffer.

For language specific hooks, make sure you define this as a local hook.

semantic-before-toplevel-cache-flush-hook Variable
Hooks run before the toplevel nonterminal cache is flushed. For language specific hooks, make sure you define this as a local hook. This hook is called before a corresponding semantic-after-toplevel-cache-change-hook which is also called during a flush when the cache is given a new value of nil.
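
As a minimal sketch of the advice above about local hooks, the following adds a function to semantic-after-toplevel-cache-change-hook for the current buffer only. The function names are hypothetical. The fourth argument to add-hook makes the addition buffer-local; very old Emacs versions may additionally require make-local-hook.

```elisp
(defun my-report-token-count (tokens)
  "Report how many top-level TOKENS the buffer has after a cache change."
  ;; TOKENS is the new token list; it is nil when the cache was flushed.
  (message "Buffer now has %d top-level tokens" (length tokens)))

(defun my-enable-token-report ()
  "Run `my-report-token-count' after cache changes in this buffer only."
  (add-hook 'semantic-after-toplevel-cache-change-hook
            'my-report-token-count nil t))
```

Calling my-enable-token-report from a major mode hook keeps the report confined to buffers of that mode.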


Node:Example Programs, Previous:Parser Hooks, Up:Programming

Programmi