Semantic is a program for Emacs which includes, at its core, a lexer, and a compiler compiler (bovinator). Additional tools include a bnf->semantic table converter, example tables, and a speedbar tool.
The core utility is the "semantic bovinator", which behaves much like yacc or bison. Since it is not designed to be as feature rich as these tools, it uses the term "bovine" for cow, a lesser cousin of the yak and bison.
To send bug reports, or participate in discussions about semantic,
use the mailing list cedet-semantic@sourceforge.net via the URL:
<http://lists.sourceforge.net/lists/listinfo/cedet-semantic>
To install semantic, untar the distribution into a subdirectory, such as
/usr/share/emacs/site-lisp/semantic-#.#. Next, add the following
lines into your individual .emacs file, or into
site-lisp/site-start.el.
    (setq semantic-load-turn-everything-on t)
    (load-file "/path/to/semantic/semantic-load.el")
If you would like to turn individual tools on or off in your init file, skip the first line.
Semantic is a tool primarily for the Emacs-Lisp programmer. However, it comes with "applications" that non-programmers might find useful. This chapter is mostly for the benefit of these non-programmers, as it gives brief descriptions of basic concepts such as grammars, parsers, compiler-compilers, parse trees, etc.
The grammar of a natural language defines rules by which valid phrases and sentences can be composed using words, the fundamental units with which all sentences are created. In a similar fashion, a "context-free grammar" defines the rules by which programs can be composed using the fundamental units of the language, i.e., numbers, symbols, punctuation, etc. Context-free grammars are often specified in a well-known form called Backus-Naur Form, BNF for short. This is a systematic way of representing context-free grammars such that programs can read files with grammars written in BNF and generate code for a "parser" of that language. YACC (Yet Another Compiler Compiler) is one such program that has been part of UNIX operating systems since the 1970s. YACC is pronounced the same as "yak", the long-haired ox found in Asia. The parser generated by YACC is usually a C program. Bison is also a "compiler compiler" that takes BNF grammars and produces parsers in the C language. The difference between YACC and Bison is that Bison is free software and upward-compatible with YACC. It also comes with an excellent manual.
Semantic is similar in spirit to YACC and Bison. Semantic, however, is referred to as a bovinator rather than as a parser, because it is a lesser cousin of YACC and Bison. It is lesser in that it does not perform a full parse like YACC or Bison. Instead, it bovinates. "Bovination" refers to partial parsing which creates parse trees of only the topmost expressions rather than parsing every nested expression. This is sufficient for the purposes for which semantic was designed. Semantic is meant to be used within Emacs for providing editor-related features such as code browsers and translators rather than for compiling, which requires far more complex and complete parsers. Semantic is not designed to be able to create full parse trees.
One key benefit of semantic is that it creates parse trees (perhaps the term bovine tree may be more accurate) with the same structure regardless of the type of language involved. Higher level applications written to work with bovine trees will then work with any language for which the grammar is available. For example, a code browser written today that supports C, C++, and Java may work without any change on other languages that do not even exist yet. All one has to do is to write the BNF specification for the new language. The rest of the work is done by semantic. For certain languages, it is hard if not impossible to specify the syntax of the language in BNF form, e.g., texinfo and other document oriented languages. Semantic provides a parser for texinfo nevertheless. Instead of BNF grammar, texinfo files are "parsed" using Regexps.
Semantic comes with grammars for these languages:
Several tools that employ semantic to provide user-observable features are listed in the Tools section.
This chapter gives an overview of the major components of semantic and how they interact with each other to do their job.
The first step of parsing is to break up the input file into its fundamental components. This step is called lexing. The output of the lexer is a list of tokens that make up the file.
        syntax table, keywords list, and options
                          |
                          |
                          v
    input file  ---->   Lexer   ---->  token stream
The next step is the parsing shown below.
                      bovine table
                           |
                           v
    token stream  ---->  Parser  ---->  parse tree
The end result, the parse tree, is created based on the "bovine table", which is the internal representation of the BNF language grammar used by semantic.
Semantic database provides caching of the parse trees by saving them
into files named semantic.cache automatically then loading them
when appropriate instead of re-parsing. The reason for this is to save the
time it takes to parse a file which could take several seconds or more
for large files.
Finally, semantic provides an API for the Emacs-Lisp programmer to access the information in the parse tree.
In order to reduce a source file into a token list, it must first be converted into a token stream. Tokens are syntactic elements such as whitespace, symbols, strings, lists, and punctuation.
The lexer uses the major-mode's syntax table for conversion.
See Syntax Tables.
As long as that is set up correctly (along with the important
comment-start and comment-start-skip variables) the lexer
should already work for your language.
The primary entry point of the lexer is the semantic-flex function shown below. Normally, you do not need to call this function. It is usually called by semantic-bovinate-toplevel for you.
| semantic-flex start end &optional depth length | Function |
| Using the syntax table, do something roughly equivalent to flex. Semantically check between START and END. Optional argument DEPTH indicates at what level to scan over entire lists. The return value is a token stream. Each element is a list of the form (symbol start-expression . end-expression). END does not mark the end of the text scanned, only the end of the beginning of text scanned. Thus, if a string extends past END, the end of the return token will be larger than END. To truly restrict scanning, use `narrow-to-region'. The last argument, LENGTH, specifies that semantic-flex should only return LENGTH tokens. |
The semantic lexer breaks up the content of an Emacs buffer into a list of tokens. This process is based mostly on regular expressions, which in turn depend on the syntax table of the buffer's major mode being set up properly. See Major Modes. See Syntax Tables. See Regexps.
Specifically, the following regular expressions which rely on syntax tables are used:
\\s-
\\sw
\\s_
\\s.
\\s<
\\s>
\\s\\
\\s)
\\s$
\\s\"
\\s\'
In addition, Emacs' built-in features such as
comment-start-skip,
forward-comment,
forward-list,
and
forward-sexp
are employed.
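The syntax-class regexps above take their meaning from the buffer's syntax table. As a minimal sketch using only built-in Emacs functions (the `my-` helpers are hypothetical and not part of semantic), the following shows the same character being classified differently once the table changes:

```elisp
;; Sketch only: `my-classify-at-point' is a hypothetical helper, not part
;; of semantic.  It tests the text at point against a few syntax-class
;; regexps in the order a lexer might try them.
(defun my-classify-at-point ()
  "Return a symbol naming the syntax class of the text at point."
  (cond ((looking-at "\\s-+") 'whitespace)
        ((looking-at "\\(\\sw\\|\\s_\\)+") 'symbol)
        ((looking-at "\\s(") 'open-paren)
        ((looking-at "\\s)") 'close-paren)
        ((looking-at "\\s\"") 'string)
        ((looking-at "\\(\\s.\\|\\s$\\|\\s'\\)") 'punctuation)
        (t 'unknown)))

(defun my-classify-string (text table)
  "Classify the beginning of TEXT using the syntax TABLE."
  (with-temp-buffer
    (set-syntax-table table)
    (insert text)
    (goto-char (point-min))
    (my-classify-at-point)))

;; With the standard table "." is punctuation; after marking it as a
;; symbol constituent, the same character lexes as part of a symbol.
(let ((table (make-syntax-table)))
  (list (my-classify-string "." table)
        (progn (modify-syntax-entry ?. "_" table)
               (my-classify-string "." table))))
```

This is why a correctly configured syntax table matters more to the lexer than any individual regular expression.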
The lexer, semantic-flex, scans the content of a buffer and returns a token list. Let's illustrate this using this simple example.
00: /*
01: * Simple program to demonstrate semantic.
02: */
03:
04: #include <stdio.h>
05:
06: int i_1;
07:
08: int
09: main(int argc, char** argv)
10: {
11: printf("Hello world.\n");
12: }
Evaluating (semantic-flex (point-min) (point-max))
within the buffer with the code above returns the following token list.
The input line and string that produced each token is shown after
each semi-colon.
((punctuation 52 . 53) ; 04: #
(INCLUDE 53 . 60) ; 04: include
(punctuation 61 . 62) ; 04: <
(symbol 62 . 67) ; 04: stdio
(punctuation 67 . 68) ; 04: .
(symbol 68 . 69) ; 04: h
(punctuation 69 . 70) ; 04: >
(INT 72 . 75) ; 06: int
(symbol 76 . 79) ; 06: i_1
(punctuation 79 . 80) ; 06: ;
(INT 82 . 85) ; 08: int
(symbol 86 . 90) ; 09: main
(semantic-list 90 . 113) ; 09: (int argc, char** argv)
(semantic-list 114 . 147) ; 10-12: body of main function
)
As shown above, the token list is a list of "tokens". Each token in turn is a list of the form
(TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)
where TOKEN-TYPE is a symbol, and the other two are integers indicating the buffer positions that delimit the token such that
(buffer-substring BEGINNING-POSITION ENDING-POSITION)
would return the string form of the token.
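As a small sketch of this token form, the hypothetical helper `my-token-text` below (not part of semantic) recovers the text of hand-written tokens in a temporary buffer:

```elisp
;; Sketch only: `my-token-text' is a hypothetical helper, not part of
;; semantic.  TOKEN has the form (TOKEN-TYPE START . END).
(defun my-token-text (token)
  "Return the buffer text delimited by the lexical TOKEN."
  (buffer-substring (car (cdr token)) (cdr (cdr token))))

(with-temp-buffer
  (insert "int i_1;")
  ;; Hand-written tokens for the text above; buffer positions start at 1.
  (mapcar #'my-token-text
          '((INT 1 . 4) (symbol 5 . 8) (punctuation 8 . 9))))
```

Because the positions are stored as a dotted pair, the start is the second element of the token and the end is everything after it.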
Note that one line (line 4 above) can produce seven tokens while
the whole body of the function produces a single token.
This is because the depth parameter of semantic-flex was
not specified.
Let's see the output when depth is set to 1.
Evaluate (semantic-flex (point-min) (point-max) 1) in the same buffer.
Note the third argument of 1.
((punctuation 52 . 53) ; 04: #
(INCLUDE 53 . 60) ; 04: include
(punctuation 61 . 62) ; 04: <
(symbol 62 . 67) ; 04: stdio
(punctuation 67 . 68) ; 04: .
(symbol 68 . 69) ; 04: h
(punctuation 69 . 70) ; 04: >
(INT 72 . 75) ; 06: int
(symbol 76 . 79) ; 06: i_1
(punctuation 79 . 80) ; 06: ;
(INT 82 . 85) ; 08: int
(symbol 86 . 90) ; 09: main
(open-paren 90 . 91) ; 09: (
(INT 91 . 94) ; 09: int
(symbol 95 . 99) ; 09: argc
(punctuation 99 . 100) ; 09: ,
(CHAR 101 . 105) ; 09: char
(punctuation 105 . 106) ; 09: *
(punctuation 106 . 107) ; 09: *
(symbol 108 . 112) ; 09: argv
(close-paren 112 . 113) ; 09: )
(open-paren 114 . 115) ; 10: {
(symbol 120 . 126) ; 11: printf
(semantic-list 126 . 144) ; 11: ("Hello world.\n")
(punctuation 144 . 145) ; 11: ;
(close-paren 146 . 147) ; 12: }
)
The depth parameter "peeled away" one more level of "list" delimited by matching parentheses or braces. The depth parameter can be set to any number. However, the parser needs to be able to handle the extra tokens.
This is an interesting benefit of the lexer having the full
resources of Emacs at its disposal.
Skipping over matched parentheses is achieved by simply calling
the built-in functions forward-list and forward-sexp.
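A minimal sketch of that idea, assuming nothing beyond built-in Emacs functions (the helper name `my-list-token-at-point` is hypothetical and this is not semantic's own code):

```elisp
;; Sketch only: `my-list-token-at-point' is a hypothetical helper.  It
;; mimics how a lexer could emit one token for a whole parenthesized
;; group by letting `forward-list' find the matching close.
(defun my-list-token-at-point ()
  "Return (semantic-list START . END) for the list starting at point."
  (let ((start (point)))
    (forward-list 1)
    (cons 'semantic-list (cons start (point)))))

(with-temp-buffer
  (insert "main(int argc, char** argv) { }")
  (goto-char (point-min))
  (search-forward "main")   ; leaves point on the open parenthesis
  (my-list-token-at-point))
```

The lexer never has to track nesting itself; the syntax table tells `forward-list` which characters pair up.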
All common token symbols are enumerated below. Additional token
symbols aside from these can be generated by the lexer if user option
semantic-flex-extensions is set. It is up to the user to add
matching extensions to the parser to deal with the lexer
extensions. An example use of semantic-flex-extensions is in
semantic-make.el where semantic-flex-extensions is set to
the value of semantic-flex-make-extensions which may generate
shell-command tokens.
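bol
    The beginning of a line. This token is produced only if
    semantic-flex-enable-bol is non-nil.
charquote
    String sequences that match \\s\\+.
close-paren
    Characters that match \\s).
    These are typically ), }, ], etc.
comment
    A comment. This token is produced only if
    semantic-ignore-comments is nil.
newline
    Characters that match \\s-*\\(\n\\|\\s>\\).
    This token is produced only if the user set
    semantic-flex-enable-newlines to
    non-nil.
open-paren
    Characters that match \\s(.
    These are typically (, {, [, etc.
    Note that these are not usually generated unless the depth
    argument to semantic-flex is greater than 0.
punctuation
    Characters that match \\(\\s.\\|\\s$\\|\\s'\\).
semantic-list
    A list delimited by matching parentheses, braces, etc., returned
    as a single token when it is nested deeper than the depth argument.
string
    Quoted strings, delimited by characters that match \\s\".
    The lexer relies on forward-sexp to find the
    matching end.
symbol
    String sequences that match \\(\\sw\\|\\s_\\)+.
whitespace
    Whitespace characters (\\s-). This token is produced only if
    semantic-flex-enable-whitespace is non-nil. If
    semantic-ignore-comments is non-nil too, comments are
    considered whitespace.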
Although most lexer functions are called for you by other semantic functions, there are ways for you to extend or customize the lexer. The variables shown below serve this purpose.
| semantic-flex-unterminated-syntax-end-function | Variable |
| Function called when unterminated syntax is encountered. This should be set to one function. That function should take three parameters: SYNTAX, the type of syntax which is unterminated; SYNTAX-START, where the broken syntax begins; and FLEX-END, where the lexical analysis was asked to end. This function can be used for languages that can intelligently fix up broken syntax, or to exit lexical analysis via throw or signal when finding unterminated syntax. |
| semantic-flex-extensions | Variable |
Buffer local extensions to the lexical analyzer.
This should contain an alist with a key of a regex and a data element of
a function. The function should both move point, and return a lexical
token of the form:
( TYPE START . END)
|
| semantic-flex-syntax-modifications | Variable |
Changes the syntax table for a given buffer.
These changes are active only while the buffer is being flexed.
This is a list where each element has the form
    (CHAR CLASS)
CHAR is the char passed to `modify-syntax-entry', and CLASS is the string also passed to `modify-syntax-entry' to define what syntax class CHAR has.
    (setq semantic-flex-syntax-modifications '((?. "_")))
This makes the period . a symbol constituent. This may be necessary if filenames are prevalent, such as in Makefiles. |
| semantic-flex-enable-newlines | Variable |
When flexing, report 'newlines as syntactic elements.
Useful for languages where the newline is a special case terminator.
Only set this on a per mode basis, not globally.
|
| semantic-flex-enable-whitespace | Variable |
When flexing, report 'whitespace as syntactic elements.
Useful for languages where the syntax is whitespace dependent.
Only set this on a per mode basis, not globally.
|
| semantic-flex-enable-bol | Variable |
| When flexing, report beginning of lines as syntactic elements. Useful for languages like Python which are indentation sensitive. Only set this on a per mode basis, not globally. |
| semantic-number-expression | Variable |
Regular expression for matching a number.
If this value is nil, no number extraction is done during lex.
This expression tries to match C and Java like numbers.
    DECIMAL_LITERAL:
        [1-9][0-9]*
      ;
    HEX_LITERAL:
        0[xX][0-9a-fA-F]+
      ;
    OCTAL_LITERAL:
        0[0-7]*
      ;
    INTEGER_LITERAL:
        <DECIMAL_LITERAL>[lL]?
      | <HEX_LITERAL>[lL]?
      | <OCTAL_LITERAL>[lL]?
      ;
    EXPONENT:
        [eE][+-]?[0-9]+
      ;
    FLOATING_POINT_LITERAL:
        [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
      | [.][0-9]+<EXPONENT>?[fFdD]?
      | [0-9]+<EXPONENT>[fFdD]?
      | [0-9]+<EXPONENT>?[fFdD]
      ;
|
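To make the grammar for numbers concrete, here is one way it might be transcribed into an Emacs regexp. This is only a sketch: `my-number-regexp` and `my-number-p` are hypothetical names, and the regexp is built from the grammar above rather than copied from semantic's actual default value.

```elisp
;; Sketch only: `my-number-regexp' is a hypothetical stand-in built from
;; the grammar above; it is not the value semantic itself uses.
(defconst my-number-regexp
  (let* ((exponent "[eE][+-]?[0-9]+")
         (integer (concat "\\(?:[1-9][0-9]*\\|0[xX][0-9a-fA-F]+\\|0[0-7]*\\)"
                          "[lL]?"))
         (float (concat "\\(?:"
                        "[0-9]+[.][0-9]*\\(?:" exponent "\\)?[fFdD]?"
                        "\\|[.][0-9]+\\(?:" exponent "\\)?[fFdD]?"
                        "\\|[0-9]+" exponent "[fFdD]?"
                        "\\|[0-9]+\\(?:" exponent "\\)?[fFdD]"
                        "\\)")))
    ;; Try floats first so "1.5" is not cut short as the integer "1".
    (concat "\\`\\(?:" float "\\|" integer "\\)\\'"))
  "Anchored regexp matching a whole C- or Java-like numeric literal.")

(defun my-number-p (text)
  "Return t if TEXT as a whole is a numeric literal, else nil."
  (let ((case-fold-search nil))
    (and (string-match my-number-regexp text) t)))

(mapcar #'my-number-p '("42" "0x1F" "017" "1.5e-3" ".5f" "10L" "abc" "0x"))
```

The anchors make the helper accept or reject a whole string; a lexer scanning a buffer would instead use an unanchored form with `looking-at`.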
Another important piece of the lexer is the keyword table (see Settings). Your language will want to set up a keyword table for fast conversion of symbol strings to language terminals.
The keywords table can also be used to store additional information about those keywords. The following programming functions can be useful when examining text in a language buffer.
| semantic-flex-keyword-p text | Function |
Return non-nil if TEXT is a keyword in the keyword table.
|
| semantic-flex-keyword-put text property value | Function |
| For keyword TEXT, set PROPERTY to VALUE. |
| semantic-token-put-no-side-effect token key value | Function |
For TOKEN, put the property KEY on it with VALUE without side effects.
If VALUE is nil, then remove the property from TOKEN.
All cons cells in the property list are replicated so that there
are no side effects if TOKEN is in shared lists.
|
| semantic-flex-keyword-get text property | Function |
| For keyword TEXT, get the value of PROPERTY. |
| semantic-flex-map-keywords fun &optional property | Function |
| Call function FUN on every semantic keyword. If optional PROPERTY is non-nil, call FUN only on every keyword which has a PROPERTY value. FUN receives a semantic keyword as argument. |
| semantic-flex-keywords &optional property | Function |
| Return a list of semantic keywords. If optional PROPERTY is non-nil, return only keywords which have PROPERTY set. |
Keyword properties can be set up in a BNF file for ease of maintenance. While examining the text in a language buffer, this can provide an easy and quick way of storing details about text in the buffer.
Add known properties here when they are known.
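The idea behind a keyword table with per-keyword properties can be sketched with a plain obarray. The `my-` functions below are hypothetical stand-ins written for illustration; they mimic the roles of semantic-flex-keyword-p, semantic-flex-keyword-put, and semantic-flex-keyword-get but are not semantic's implementation, and the `summary` property name is an assumption.

```elisp
;; Sketch only: the `my-' names are hypothetical stand-ins that mimic a
;; keyword table with a plain obarray; they are not semantic's own code.
(defvar my-keyword-table (obarray-make)
  "Obarray mapping keyword text to a lexical terminal symbol.")

(defun my-keyword-add (text terminal)
  "Record that the string TEXT lexes as the symbol TERMINAL."
  (set (intern text my-keyword-table) terminal))

(defun my-keyword-p (text)
  "Return the terminal for TEXT, or nil if TEXT is not a keyword."
  (let ((sym (intern-soft text my-keyword-table)))
    (and sym (boundp sym) (symbol-value sym))))

(defun my-keyword-put (text property value)
  "For keyword TEXT, set PROPERTY to VALUE."
  (let ((sym (intern-soft text my-keyword-table)))
    (when sym (put sym property value))))

(defun my-keyword-get (text property)
  "For keyword TEXT, get the value of PROPERTY."
  (let ((sym (intern-soft text my-keyword-table)))
    (and sym (get sym property))))

(my-keyword-add "int" 'INT)
(my-keyword-put "int" 'summary "Integral primitive type")
(list (my-keyword-p "int")
      (my-keyword-p "i_1")
      (my-keyword-get "int" 'summary))
```

Storing each keyword as a symbol gives fast lookup and a ready-made property list for extra details.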
When converting a source file into a nonterminal token stream
(parse-tree) it is important to specify rules to accomplish this. The
rules are stored in the buffer local variable
semantic-toplevel-bovine-table.
While it is certainly possible to write this table yourself, it is most
likely you will want to use the BNF converter (see BNF conversion).
This is an easier method for specifying your rules. You will still need
to specify a variable in your language for the table, however. A good
rule of thumb is to call it language-toplevel-bovine-table if it is
part of the language, or semantic-toplevel-language-bovine-table
if you donate it to the semantic package.
When initializing a major-mode for your language, you will set the
variable semantic-toplevel-bovine-table to the contents of your
language table. semantic-toplevel-bovine-table is always buffer
local.
Since it is important to know the format of the table when debugging, you should still attempt to understand the basics of the table.
Please see the documentation for the variable
semantic-toplevel-bovine-table for details on its format.
* add more doc here *
The BNF converter takes a file in "Bovine Normal Form" which is similar to "Backus-Naur Form". If you have ever used yacc or bison, you will find it similar. The BNF form used by semantic, however, does not include token precedence rules, and several other features needed to make real parser generators.
It is important to have an Emacs Lisp file with a variable ready to take
the output of your table (see Bovinating). Also, make sure that the
file semantic-bnf.el is loaded. Give your language file the
extension .bnf and you are ready.
The comment character is #.
When you want to test your file, use the keyboard shortcut C-c C-c to parse the file, generate the variable, and load the new definition in. It will then use the settings specified above to determine what to do. Use the shortcut C-c c to do the same thing, but spend extra time indenting the table nicely.
Make sure that you create the variable specified in the
%parsetable token before trying to convert the BNF file. A
simple definition like this is sufficient.
    (defvar semantic-toplevel-lang-bovine-table nil
      "Table for use with semantic for parsing LANG.")
If you use tokens (created with the %token specifier), also
make sure you have a keyword table available, like this:
    (defvar semantic-lang-keyword-table nil
      "Table for use with semantic for keywords.")
Specify the name of the keyword table with the %keywordtable
specifier.
The BNF file has two sections. The first is the settings section, and the second is the language definition, or list of semantic rules.
A setting is a keyword starting with a %. (This syntax is taken from yacc and bison. See (bison).)
There are several settings that can be made in the settings section. They are:
| %start <nonterminal> | Setting |
Specify an alternative to bovine-toplevel. (See below)
|
| %scopestart <nonterminal> | Setting |
Specify an alternative to bovine-inner-scope.
|
| %outputfile <filename> | Setting |
| Required. Specifies the file into which this file's output is stored. |
| %parsetable <lisp-variable-name> | Setting |
| Required. Specifies a lisp variable into which the output is stored. |
| %setupfunction <lisp-function-name> | Setting |
| Required. Name of a function into which setup code is to be inserted. |
| %keywordtable <lisp-variable-name> | Setting |
Required if there are %token keywords.
Specifies a lisp variable into which the output of a keyword table is
stored. This obarray is used to turn symbols into keywords when applicable.
|
| %token <name> "<text>" | Setting |
Optional. Specify a new token NAME. This is added to a lexical
keyword list using TEXT. The symbol is then converted into a new
lexical terminal. This requires that the %keywordtable specified
variable is available in the file specified by %outputfile.
|
| %token <name> type "<text>" | Setting |
| Optional. Specify a new token NAME. It is made from an existing lexical token of type TYPE. TEXT is a string which will be matched explicitly. NAME can be used in match rules as though they were flex tokens, but are converted back to TYPE "text" internally. |
| %put <NAME> symbol <VALUE> | Setting |
| %put <NAME> ( symbol1 <VALUE1> symbol2 <VALUE2> ... ) | Setting |
| %put ( <NAME1> <NAME2>...) symbol <VALUE> | Setting |
Tokens created without a type are considered keywords, and placed in a
keyword table. Use %put to apply properties to that keyword.
(see Lexing).
|
| %languagemode <lisp-function-name> | Setting |
| %languagemode ( <lisp-function-name1> <lisp-function-name2> ... ) | Setting |
| Optional. Specifies the Emacs major mode associated with the language being specified. When the language is converted, all buffers of this mode will get the new table installed. |
| %quotemode backquote | Setting |
| Optional. Specifies how symbol quoting is handled in the Optional Lambda Expressions. (See below) |
| %( ... )% | Setting |
Specify setup code to be inserted into the %setupfunction.
It will be inserted between two specifier strings, or added to
the end of the function.
|
When working inside %( ... )% tokens, any lisp expression can be
entered which will be placed inside the setup function. In general, you
probably want to set variables that tell Semantic and related tools how
the language works.
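As an illustration of the kind of code that might go into a setup function, the sketch below sets a few of the lexer variables described in the Lexing chapter, buffer-locally. The function name `my-lang-setup` and the Makefile-like scenario are assumptions; the bare `defvar` forms only declare the variables so the sketch can be evaluated without semantic loaded.

```elisp
;; Sketch only: `my-lang-setup' is a hypothetical setup function.  The
;; variables are the ones documented in this manual; the bare `defvar'
;; forms merely declare them so the sketch stands on its own.
(defvar semantic-flex-enable-newlines)
(defvar semantic-flex-syntax-modifications)
(defvar semantic-ignore-comments)

(defun my-lang-setup ()
  "Configure the current buffer for a Makefile-like language."
  ;; Newlines terminate rules, so report them as tokens.
  (set (make-local-variable 'semantic-flex-enable-newlines) t)
  ;; Treat "." as a symbol constituent so file names stay in one token.
  (set (make-local-variable 'semantic-flex-syntax-modifications)
       '((?. "_")))
  ;; Strip comments from the token stream.
  (set (make-local-variable 'semantic-ignore-comments) t))
```

Making each setting buffer-local follows the advice to set these variables on a per mode basis rather than globally.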
Here are some variables that control how different programs will work with your language.
| semantic-flex-depth | Variable |
| Default flexing depth. This specifies how many lists to create tokens in. |
| semantic-number-expression | Variable |
Regular expression for matching a number.
If this value is nil, no number extraction is done during lex.
Symbols which match this expression are returned as number
tokens instead of symbol tokens.
The default value for this variable should work in most languages. |
| semantic-flex-extensions | Variable |
Buffer local extensions to the lexical analyzer.
This should contain an alist with a key of a regex and a data element of
a function. The function should both move point, and return a lexical
token of the form:
( TYPE START . END)
|
| semantic-flex-syntax-modifications | Variable |
Updates to the syntax table for this buffer.
These changes are active only while this file is being flexed.
This is a list where each element is of the form:
    (CHAR CLASS)
where CHAR is the char passed to modify-syntax-entry, and CLASS is the string also passed to modify-syntax-entry to define what class of syntax CHAR is. |
| semantic-flex-enable-newlines | Variable |
When flexing, report 'newlines as syntactic elements.
Useful for languages where the newline is a special case terminator.
Only set this on a per mode basis, not globally.
|
| semantic-ignore-comments | Variable |
Default comment handling.
t means to strip comments when flexing. nil means to keep comments
as part of the token stream.
|
| semantic-symbol->name-assoc-list | Variable |
Association between symbols returned, and a string.
The string is used to represent a group of objects of the given type.
It is sometimes useful for a language to use a different string
in place of the default, even though that language will still
return a symbol. For example, Java returns includes, but the
string can be replaced with Imports.
|
| semantic-case-fold | Variable |
Value for case-fold-search when parsing.
|
| semantic-expand-nonterminal | Variable |
Function to call for each nonterminal production.
Return a list of non-terminals derived from the first argument, or nil
if it does not need to be expanded.
Languages with compound definitions should use this function to expand
from one compound symbol into several. For example, in C the
definition
    int a, b;
is easily parsed into one token, but represents multiple variables. A function should be written which takes this compound token and turns it into two tokens, one for A, and the other for B.
Within the language definition (the This list can then be detected by the function set in
Please see |
| semantic-override-table | Variable |
|
Buffer local semantic function overrides alist.
These overrides provide a hook for a `major-mode' to override specific
behaviors with respect to generated semantic toplevel nonterminals and
things that these non-terminals are useful for.
Each element must be of the form: (SYM . FUN)
where SYM is the symbol to override, and FUN is the function to
override it with.
Available override symbols:
Parameters mean:
|
| semantic-type-relation-separator-character | Variable |
| Character strings used to separate a parent/child relationship. This list of strings is used for displaying or finding separators in variable field dereferencing. The first character will be used for display. In C, a type field is separated like this: "type.field", thus the character is a ".". In C, an additional value of "->" would be in the list, so that "type->field" could be found. |
| semantic-dependency-include-path | Variable |
| Defines the include path used when searching for files. This should be a list of directories to search which is specific to the file being included. This variable can also be set to a single function. If it is a function, it will be called with one argument, the file to find as a string, and it should return the full path to that file, or nil. |
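The compound-definition case described under semantic-expand-nonterminal can be sketched as follows. The function `my-expand-nonterminal` is hypothetical, and the sketch assumes the grammar stored a list of names in the NAME slot of the token; it also ignores positional information, which a real expansion function would need to handle.

```elisp
;; Sketch only: `my-expand-nonterminal' is a hypothetical expansion
;; function.  It assumes the grammar stored a list of names in the NAME
;; slot of a compound variable token such as "int a, b;".
(defun my-expand-nonterminal (token)
  "Split TOKEN into one token per name, or return nil if not compound."
  (let ((names (car token)))
    (when (consp names)
      (mapcar (lambda (name) (cons name (copy-sequence (cdr token))))
              names))))

;; A compound token for "int a, b;" with the remaining slots left nil.
(my-expand-nonterminal '(("a" "b") variable "int" nil nil nil))
```

Returning nil for an ordinary token matches the documented contract that nil means no expansion is needed.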
This configures Imenu to use semantic parsing.
| imenu-create-index-function | Variable |
|
The function to use for creating a buffer index.
It should be a function that takes no arguments and returns an index of the current buffer as an alist. Simple elements in the alist look like (INDEX-NAME . INDEX-POSITION). This function is called within a save-excursion. The variable is buffer-local. |
These are specific to the document tool.
document-comment-start
document-comment-line-prefix
document-comment-end
Writing the rules should be very similar to bison for basic syntax. Each rule is of the form
    RESULT : MATCH1 (optional-lambda-expression)
           | MATCH2 (optional-lambda-expression)
           ;
RESULT is a non-terminal, or a token synthesized in your grammar. MATCH is a list of elements that are to be matched if RESULT is to be made. The optional lambda expression is a list containing simplified rules for concocting the parse tree.
In bison, each time an element of a MATCH is found, it is "shifted" onto the parser stack. (The stack of matched elements.) When all of MATCH1's elements have been matched, it is "reduced" to RESULT. See (bison)Algorithm.
The first RESULT written into your language specification should
be bovine-toplevel, or the symbol specified with %start.
When starting a parse for a file, this is the default token iterated
over. You can use any token you want in place of bovine-toplevel
if you specify what that nonterminal will be with a %start token
in the settings section.
MATCH is made up of symbols and strings. A symbol such as
foo means that a syntactic token of type foo must be
matched. A string in the mix means that the previous symbol must have
the additional constraint of exactly matching it. Thus, the
combination:
symbol "moose"
means that a symbol must first be encountered, and then it must
string-match "moose". Be especially careful to remember that the
string is a regular expression. The code:
punctuation "."
will match any punctuation.
For the above example in bison, a LEX rule would be used to create a new token MOOSE. In this case, the MOOSE token would appear. For the bovinator, this task was mixed into the language definition to simplify implementation, though Bison's technique is more efficient.
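The regular-expression pitfall described above is easy to check with plain `string-match` calls. This sketch only illustrates regexp behavior in Emacs; it does not show the bovinator's own matching code.

```elisp
;; Sketch only: plain `string-match' calls illustrating the pitfall.
(list (string-match "." ",")                 ; unescaped dot matches a comma
      (string-match "\\." ",")               ; escaped dot does not
      (string-match "moose" "mooses")        ; unanchored match inside a symbol
      (string-match "\\`moose\\'" "mooses")) ; anchored match fails
```

A string constraint that is meant literally therefore needs its special characters quoted, for instance with `regexp-quote`.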
To make a symbol match explicitly for keywords, for example, you can use
the %token command in the settings section to create new symbols.
%token MOOSE "moose"
find_a_moose: MOOSE
;
will match "moose" explicitly, unlike the previous example where moose need only appear in the symbol. This is because "moose" will be converted to MOOSE in the lexical analysis stage. Thus the symbol MOOSE won't be available any other way.
If we specify our token in this way:
%token MOOSE symbol "moose"
find_a_moose: MOOSE
;
then MOOSE will match the string "moose" explicitly, but it won't
do so at the lexical level, allowing use of the text "moose" in other
forms of regular expressions.
Non symbol tokens are also allowed. For example:
%token PERIOD punctuation "."
filename : symbol PERIOD symbol
;
will explicitly match one period when used in the above rule.
The OLE (Optional Lambda Expression) is converted into a bovine lambda (see Bovinating). This lambda has special short-cuts to simplify reading the Emacs BNF definition. An OLE like this:
( $1 )
results in a lambda return which consists entirely of the string or object found by matching the first (zeroth) element of match. An OLE like this:
( ,(foo $1) )
executes `foo' on the first argument, and then splices its return into the return list whereas:
( (foo $1) )
executes foo, and that is placed in the return list.
Here are other things that can appear inline:
$1
,$1
'$1
foo
(foo)
,(foo)
'(foo)
(EXPAND $1 nonterminal depth)
(EXPANDFULL $1 nonterminal depth)
bovine-toplevel. This lets you have
much simpler rules in this specific case, and also lets you have
positional information in the returned tokens, and error skipping.
(ASSOC symbol1 value1 symbol2 value2 ... )
( ( symbol1 . value1) (symbol2 . value2) ... )
If the symbol %quotemode backquote is specified, then use
,@ to splice a list in, and , to evaluate the expression.
This lets you send $1 as a symbol into a list instead of having
it expanded inline.
The rule:
SYMBOL : symbol
is equivalent to
SYMBOL : symbol
( $1 )
which, if it matched the string "A", would return
( "A" )
If this rule were used like this:
ASSIGN: SYMBOL punctuation "=" SYMBOL
( $1 $3 )
it would match "A=B", and return
( ("A") ("B") )
The letters A and B come back in lists because SYMBOL is a nonterminal, not an actual lexical element.
To get a better result with nonterminals, use , to splice lists in like this:
ASSIGN: SYMBOL punctuation "=" SYMBOL
( ,$1 ,$3 )
which would return
( "A" "B" )
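The difference between evaluating and splicing can be tried out with ordinary Emacs Lisp backquote, which is the notation used when %quotemode backquote is in effect. In this sketch the hypothetical functions take S1 and S3 in place of $1 and $3, each holding the list returned by the SYMBOL rule.

```elisp
;; Sketch only: ordinary Emacs Lisp backquote standing in for an OLE
;; written under "%quotemode backquote".  S1 and S3 play the roles of
;; $1 and $3.
(defun my-ole-evaluate (s1 s3)
  "Insert S1 and S3 as elements, keeping them as nested lists."
  `(,s1 ,s3))

(defun my-ole-splice (s1 s3)
  "Splice the elements of S1 and S3 into one flat list."
  `(,@s1 ,@s3))

(list (my-ole-evaluate '("A") '("B"))
      (my-ole-splice '("A") '("B")))
```

Note that in the default quoting mode a plain comma already splices, whereas in backquote mode splicing is spelled `,@`.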
In order for a generalized program using Semantic to work with
multiple languages, it is important to have a consistent meaning for
the contents of the tokens returned. The variable
semantic-toplevel-bovine-table is documented with the complete
list of tokens that a functional or OO language may use. While any
given language is free to create its own tokens, such a language
definition would not produce a stream of tokens usable by a
generalized tool.
In general, all tokens returned from a parser should be generated with the following form:
("NAME" type-symbol ... "DOCSTRING" PROPERTIES OVERLAY)
NAME and type-symbol are the only syntactic elements of a
nonterminal which are guaranteed to exist. This means that a parser
which uses nil for either of these two slots, or some value
which is not type consistent is wrong.
NAME is also guaranteed to be a string. This string represents the name of the nonterminal, usually a named definition which the language will use elsewhere as a reference to the syntactic element found.
type-symbol is a symbol representing the type of the nonterminal. Valid type-symbols can be anything, as long as it is an Emacs Lisp symbol.
DOCSTRING is a required slot in the nonterminal, but can be nil. Some languages have the documentation saved as a comment nearby. In these cases, DOCSTRING is nil, and the function `semantic-find-documentation' can be used to locate the documentation.
PROPERTIES is a slot generated by the semantic parser harness,
and need not be provided by a language author. Programmatically access
nonterminal properties with semantic-token-put and
semantic-token-get.
OVERLAY represents positional information for this token. It is
automatically generated by the semantic parser harness, and need not
be provided by the language author, unless they provide a nonterminal
expansion function via semantic-expand-nonterminal.
The OVERLAY property is accessed via several functions returning the beginning, end, and buffer of a token. Use these functions unless the overlay is really needed (see Token Queries). Depending on the overlay in a program can be dangerous because sometimes the overlay is replaced with an integer pair
    [ START END ]
when the buffer the token belongs to is not in memory. This happens when a user has activated the Semantic Database semanticdb.
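The reason to avoid touching the OVERLAY slot directly can be seen in a sketch that has to handle both representations. The `my-position-` helpers are hypothetical; in real code the accessor functions provided by semantic should be preferred.

```elisp
;; Sketch only: `my-position-start' and `my-position-end' are
;; hypothetical helpers; prefer semantic's own accessor functions.
(defun my-position-start (pos)
  "Return the start of POS, an overlay or a [START END] vector."
  (if (overlayp pos) (overlay-start pos) (aref pos 0)))

(defun my-position-end (pos)
  "Return the end of POS, an overlay or a [START END] vector."
  (if (overlayp pos) (overlay-end pos) (aref pos 1)))

(with-temp-buffer
  (insert "int i_1;")
  (let ((live (make-overlay 5 8))  ; the buffer is in memory
        (saved [5 8]))             ; as stored when the buffer is not
    (list (my-position-start live) (my-position-end live)
          (my-position-start saved) (my-position-end saved))))
```

Code that calls `overlay-start` unconditionally would signal an error as soon as it met the vector form.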
If a parser produces tokens for a functional language, then the following token formats are available.
("NAME" variable "TYPE" DEFAULT-VALUE EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
TYPE is nil for untyped languages. Languages which
support variable declarations without a type (such as C) should supply
a string representing the default type for that language.
DEFAULT-VALUE can be a string, or something pre-parsed and language specific. Hopefully this slot will be better defined in future versions of Semantic.
EXTRA-SPEC are extra specifiers. See below.
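The slot layout of a variable token can be sketched with ordinary list functions. The token below is built by hand, and the `my-` names are hypothetical helpers that exist only for this illustration. Real programs should prefer the query functions documented later (such as semantic-token-name), since the parser harness normally fills in PROPERTIES and OVERLAY itself.

```elisp
;; A hand-built token in the variable form:
;; ("NAME" variable "TYPE" DEFAULT-VALUE EXTRA-SPEC
;;  "DOCSTRING" PROPERTIES OVERLAY)
(defvar my-variable-token
  (list "hit_count" 'variable "int" "0"
        '((const . t))            ; EXTRA-SPEC
        "Number of hits so far."  ; DOCSTRING
        nil                       ; PROPERTIES (normally set by the harness)
        nil)                      ; OVERLAY (normally set by the harness)
  "Hypothetical variable token used only for illustration.")

(defun my-token-name (token)
  "Return the NAME slot (first element) of TOKEN."
  (car token))

(defun my-token-class (token)
  "Return the type-symbol slot (second element) of TOKEN."
  (nth 1 token))

(defun my-variable-type (token)
  "Return the TYPE slot (third element) of the variable TOKEN."
  (nth 2 token))

(defun my-variable-default (token)
  "Return the DEFAULT-VALUE slot (fourth element) of the variable TOKEN."
  (nth 3 token))

(my-token-name my-variable-token)    ; => "hit_count"
(my-token-class my-variable-token)   ; => variable
```

NAME and type-symbol sit in the first two positions in every token form, which is why they can be relied on regardless of the kind of token.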
("NAME" function "TYPE" ( ARG-LIST ) EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
TYPE is nil for untyped languages, or for
procedures in languages which support functions with no return data.
See above for more.
ARG-LIST is a list of arguments passed to this function. Each element in the arg list can be one of the following:
("NAME" type "TYPE" ( PART-LIST ) ( PARENTS ) EXTRA-SPEC
"DOCSTRING" PROPERTIES OVERLAY)
PART-LIST is the list of individual entries inside compound types. Structures, for example, can contain several fields which can be represented as variables. Valid entries in a PART-LIST are:
PARENTS represents a list of parents of this type. Parents are used in two situations.
The structure of the PARENTS list is of this form:
( EXPLICIT-PARENTS . INTERFACE-PARENTS)
EXPLICIT-PARENTS can be a single string (just one parent) or a list of parents (in a multiple inheritance situation). It can also be nil.
INTERFACE-PARENTS is a list of strings representing the names of all INTERFACES, or abstract classes inherited from. It can also be nil.
This slot can be interesting because the form:
( nil "string")
is a valid parent where there is no explicit parent, and only an interface.
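Because the PARENTS slot is a cons cell whose car may be nil, a string, or a list, code that reads it has to normalize the explicit parents. The following is a minimal sketch under that reading of the format. The `my-` function names and the class names are hypothetical.

```elisp
(defun my-parents-explicit (parents)
  "Return the explicit parents in PARENTS as a list of strings.
PARENTS has the form (EXPLICIT-PARENTS . INTERFACE-PARENTS)."
  (let ((explicit (car parents)))
    ;; A single parent is stored as a bare string; wrap it in a list.
    (if (stringp explicit) (list explicit) explicit)))

(defun my-parents-interfaces (parents)
  "Return the interface parents in PARENTS, a list of strings or nil."
  (cdr parents))

;; One explicit parent and one interface.
(my-parents-explicit '("Base" . ("Runnable")))    ; => ("Base")
;; Multiple inheritance, no interfaces.
(my-parents-explicit '(("Left" "Right")))         ; => ("Left" "Right")
;; The form (nil "string"): no explicit parent, only an interface.
(my-parents-explicit '(nil "Runnable"))           ; => nil
(my-parents-interfaces '(nil "Runnable"))         ; => ("Runnable")
```

Note that `(nil "Runnable")` and `(nil . ("Runnable"))` are the same Lisp object when read, which is why the short form is valid.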
("FILE" include SYSTEM "DOCSTRING" PROPERTIES OVERLAY)
This token represents an include, such as a #include statement in C.
In this case, instead of NAME, a FILE is specified.
FILE can be a subset of the actual file to be loaded.
SYSTEM is true if this include is part of a set of system
includes. This field isn't currently being used and may be
eliminated.
("NAME" package DETAIL "DOCSTRING" PROPERTIES OVERLAY)
This token represents a package declaration, such as a package statement, or a provide in Emacs Lisp.
DETAIL might be an associated file name, or some other language specific bit of information.
Some default token types have a slot EXTRA-SPEC, for extra specifiers. These specifiers provide additional details not commonly used, or not available in all languages. This list is an alist, and if a given key is nil, it is not in the list, saving space. Some valid extra specifiers are:
(parent . "text")
(dereference . INT)
(pointer . INT)
* characters.
(typemodifiers . ( "text" ... ))
`register' and `volatile'
(suffix . "text")
(const . t)
(throws . ( "text" ... ))
(destructor . t)
(constructor . t)
(user-visible . t)
(prototype . t)
autoload statement creates prototypes.
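Since EXTRA-SPEC is an alist in which absent keys simply mean nil, a lookup can be sketched with assq. The alist contents and the `my-extra-spec-get` helper are hypothetical. In real code the semantic-token-*-extra-spec functions documented later serve this purpose.

```elisp
(defvar my-extra-spec
  '((const . t)
    (pointer . 2)
    (typemodifiers . ("static" "volatile")))
  "Hypothetical EXTRA-SPEC alist used only for illustration.")

(defun my-extra-spec-get (extra-spec key)
  "Return the value stored under KEY in EXTRA-SPEC, or nil if absent."
  (cdr (assq key extra-spec)))

(my-extra-spec-get my-extra-spec 'pointer)     ; => 2
(my-extra-spec-get my-extra-spec 'prototype)   ; => nil (key not stored)
```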
From a program you can use the function semantic-bovinate-toplevel.
This function takes one optional parameter specifying if the cache
should be refreshed. By default, the cached results of the last parse
are always used. Specifying that the cache should be checked will cause
it to be flushed if it is out of date.
Another function you can use is semantic-bovinate-nonterminal.
This command takes a token stream returned by the function
semantic-flex followed by a DEPTH (as above). This takes an
additional optional argument of NONTERMINAL which is the nonterminal in
your table it is to start parsing with.
| bovinate &optional clear | Command |
| Bovinate the current buffer. Show output in a temp buffer. Optional argument CLEAR will clear the cache before bovinating. |
| semantic-clear-toplevel-cache | Command |
| Clear the toplevel bovine cache for the current buffer. Clearing the cache will force a complete reparse next time a token stream is requested. |
| semantic-bovinate-toplevel &optional checkcache | Function |
Bovinate the entire current buffer.
If the optional argument CHECKCACHE is non-nil, then flush the cache iff
there has been a size change.
|
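As a rough sketch of calling the parser from a program, the function below collects the names of the top-level tokens in the current buffer. It assumes the version of Semantic described in this manual is loaded and that the buffer has a language table set up. The fboundp guard makes it return nil elsewhere. `my-toplevel-token-names` is a hypothetical name.

```elisp
(defun my-toplevel-token-names ()
  "Return the names of the top-level tokens in the current buffer.
Return nil when the Semantic functions described here are not loaded."
  (when (and (fboundp 'semantic-bovinate-toplevel)
             (fboundp 'semantic-token-name))
    ;; A non-nil CHECKCACHE asks for the cache to be checked before
    ;; the cached parse results are reused.
    (mapcar (lambda (token) (semantic-token-name token))
            (semantic-bovinate-toplevel t))))
```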
Writing language files using BNF is significantly easier than writing them using regular expressions in a functional manner. Debugging them, however, can still prove challenging.
There are two ways to debug a language definition if it is not
behaving as expected. One way is to debug against the source .bnf
file. The second is to debug against the Lisp table created from the
.bnf source, or perhaps written by hand.
If your language definition was written in BNF notation, debugging is
quite easy. The command bovinate-debug will start you off.
| bovinate-debug | Command |
| Bovinate the current buffer and run in debug mode. |
If you prefer debugging against the Lisp table, find the table in a
buffer, place the cursor in it, and use the command
semantic-bovinate-debug-set-table.
| semantic-bovinate-debug-set-table &optional clear | Command |
| Set the table for the next debug to be here. Optional argument CLEAR to unset the debug table. |
After the table is set, the bovinate-debug command can be run
at any time for the given language.
While debugging, two windows are visible. One window shows the file being parsed, and the syntactic token being tested is highlighted. The second window shows the table being used (either in the BNF source, or the Lisp table) with the current rule highlighted. The cursor will sit on the specific match rule being tested against.
In the minibuffer, a brief summary of the current situation is listed. The first element is the syntactic token which is a list of the form:
(TYPE START . END)
The rest of the display is a list of all strings collected for the currently tested rule. Each time a new rule is entered, the list is restarted. Upon returning from a rule into a previous match list, the previous match list is restored, with the production of the dependent rule in the list.
Use C-g to stop debugging. There are no commands for any fancier types of debugging.
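The syntactic token form (TYPE START . END) shown in the minibuffer is a dotted list, so START and END are reached with cadr and cddr. A small sketch, using a hypothetical helper name and a made-up token:

```elisp
(defun my-syntactic-token-text (lex-token)
  "Return the text in the current buffer covered by LEX-TOKEN.
LEX-TOKEN is a list of the form (TYPE START . END)."
  (buffer-substring-no-properties (cadr lex-token) (cddr lex-token)))

(with-temp-buffer
  (insert "int x;")
  ;; A made-up token of type `symbol' covering positions 1 to 4.
  (my-syntactic-token-text '(symbol 1 . 4)))   ; => "int"
```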
Once a source file has been parsed, the following APIs can be used to write programs that use the token stream most effectively.
When writing programs that use the bovinator, the following functions are needed to get details out of a nonterminal.
| semantic-equivalent-tokens-p token1 token2 | Function |
Compare TOKEN1 and TOKEN2 and return non-nil if they are equivalent.
Use eq to test if two tokens are the same. Use this function if tokens
are being copied and regrouped to test if two tokens represent the same
thing, but may be constructed of different cons cells.
|
| semantic-token-token token | Function |
Retrieve from TOKEN the token identifier.
i.e., the symbol 'variable, 'function, 'type, or other.
|
| semantic-token-name token | Function |
| Retrieve the name of TOKEN. |
| semantic-token-docstring token &optional buffer | Function |
| Retrieve the documentation of TOKEN. Optional argument BUFFER indicates where to get the text from. If not provided, then only the POSITION can be provided. |
| semantic-token-overlay token | Function |
| Retrieve the OVERLAY part of TOKEN. The returned item may be an overlay or an unloaded buffer representation. |
| semantic-token-extent token | Function |
| Retrieve the extent (START END) of TOKEN. |
| semantic-token-start token | Function |
| Retrieve the start location of TOKEN. |
| semantic-token-end token | Function |
| Retrieve the end location of TOKEN. |
| semantic-token-type token | Function |
| Retrieve the type of TOKEN. |
| semantic-token-put token property value | Function |
| On token, set property to value. |
| semantic-token-get token property | Function |
| For token get the value of property. |
| semantic-token-extra-spec token spec | Function |
| Retrieve a specifier for the variable TOKEN. SPEC is the symbol whose modifier value to get. This function can get specifiers from any type of TOKEN. Do not use this function if you know what type of token you are dereferencing. Instead, use the function specific to that token type. It will be faster. |
| semantic-token-type-parts token | Function |
| Retrieve the parts of the type TOKEN. |
| semantic-token-type-parent token | Function |
Retrieve the parent of the type TOKEN.
The return value is a list. A value of nil means no parents.
The car of the list is either the parent class, or a list
of parent classes. The cdr of the list is the list of
interfaces, or abstract classes which are parents of TOKEN.
|
| semantic-token-type-parent-superclass token | Function |
| Retrieve the parent super classes of the type TOKEN. |
| semantic-token-type-parent-implement token | Function |
| Retrieve the parent interfaces of the type TOKEN. |
| semantic-token-type-modifiers token | Function |
| Retrieve the type modifiers for the type TOKEN. |
| semantic-token-type-extra-specs token | Function |
| Retrieve the extra specifiers for the type TOKEN. |
| semantic-token-type-extra-spec token spec | Function |
| Retrieve an extra specifier for the type TOKEN. SPEC is the symbol whose modifier value to get. |
| semantic-token-function-args token | Function |
| Retrieve the arguments of the function TOKEN. |
| semantic-token-function-modifiers token | Function |
| Retrieve the type modifiers of the function TOKEN. |
| semantic-token-function-destructor token | Function |
Non-nil if TOKEN is a destructor function.
|
| semantic-token-function-extra-specs token | Function |
| Retrieve the extra specifiers of the function TOKEN. |
| semantic-token-function-extra-spec token spec | Function |
| Retrieve a specifier for the function TOKEN. SPEC is a symbol whose specifier value to get. |
| semantic-token-function-throws token | Function |
Retrieve the throws signal of the function TOKEN.
This is an optional field, and returns nil if it doesn't exist.
|
| semantic-token-function-parent token | Function |
| The parent of the function TOKEN. A function has a parent if it is a method of a class, and if the function does not appear in the body of its parent class. |
| semantic-token-variable-const token | Function |
| Retrieve the status of constantness from the variable TOKEN. |
| semantic-token-variable-default token | Function |
| Retrieve the default value of the variable TOKEN. |
| semantic-token-variable-modifiers token | Function |
| Retrieve type modifiers for the variable TOKEN. |
| semantic-token-variable-extra-specs token | Function |
| Retrieve extra specifiers for the variable TOKEN. |
| semantic-token-variable-extra-spec token spec | Function |
| Retrieve a specifier value for the variable TOKEN. SPEC is the symbol whose specifier value to get. |
| semantic-token-include-system token | Function |
| Retrieve the flag indicating if the include TOKEN is a system include. |
For override methods that query a token, see Token Details.
These functions take some key, and return information found inside the nonterminal stream. Some will return one token (the first matching item found). Others will return a list of all items matching a given criterion. All these functions work regardless of a buffer being in memory or not.
| semantic-find-nonterminal-by-name name streamorbuffer &optional search-parts search-include | Function |
Find a nonterminal NAME within STREAMORBUFFER. NAME is a string.
If SEARCH-PARTS is non-nil, search children of tokens.
If SEARCH-INCLUDE is non-nil, search include files.
|
| semantic-find-nonterminal-by-property property value streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals with PROPERTY equal to VALUE in STREAMORBUFFER. Properties can be added with semantic-token-put. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-by-extra-spec spec streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals with a given SPEC in STREAMORBUFFER. SPEC is a symbol key into the modifiers association list. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-by-extra-spec-value spec value streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals with a given SPEC equal to VALUE in STREAMORBUFFER. SPEC is a symbol key into the modifiers association list. VALUE is the value that SPEC should match. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-by-position position streamorbuffer &optional nomedian | Function |
Find a nonterminal covering POSITION within STREAMORBUFFER.
POSITION is a number, or marker. If NOMEDIAN is non-nil, don't do
the median calculation, and return nil.
|
| semantic-find-innermost-nonterminal-by-position position streamorbuffer &optional nomedian | Function |
Find a list of nonterminals covering POSITION within STREAMORBUFFER.
POSITION is a number, or marker. If NOMEDIAN is non-nil, don't do
the median calculation, and return nil.
This function will find the topmost item, and recurse until no more
details are available or findable.
|
| semantic-find-nonterminal-by-token token streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals with a token TOKEN within STREAMORBUFFER. TOKEN is a symbol representing the type of the tokens to find. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-standard streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals in STREAMORBUFFER which define simple token types. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-by-type type streamorbuffer &optional search-parts search-includes | Function |
| Find all nonterminals with type TYPE within STREAMORBUFFER. TYPE is a string which is the name of the type of the token returned. Optional argument SEARCH-PARTS and SEARCH-INCLUDES are passed to semantic-find-nonterminal-by-function. |
| semantic-find-nonterminal-by-function function streamorbuffer &optional search-parts search-includes | Function |
Find all nonterminals for which FUNCTION matches within STREAMORBUFFER.
FUNCTION must return non-nil if an element of STREAM will be included
in the new list.
If optional argument SEARCH-PARTS is non-nil, all sub-parts of tokens are searched.
If SEARCH-INCLUDES is non-nil, then all include files are also
searched for matches.
|
| semantic-find-nonterminal-by-function-first-match function streamorbuffer &optional search-parts search-includes | Function |
Find the first nonterminal for which FUNCTION matches within STREAMORBUFFER.
FUNCTION must return non-nil if an element of STREAM will be included
in the new list.
If optional argument SEARCH-PARTS is non-nil, all sub-parts of tokens are searched.
The over-loadable function semantic-nonterminal-children is used for
searching.
If SEARCH-INCLUDES is non-nil, then all include files are also
searched for matches.
|
| semantic-recursive-find-nonterminal-by-name name buffer | Function |
| Recursively find the first occurrence of NAME. Start search with BUFFER. Recurse through all dependencies till found. The return item is of the form (BUFFER TOKEN) where BUFFER is the buffer in which TOKEN (the token found to match NAME) was found. |
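The predicate-based search behind semantic-find-nonterminal-by-function and its first-match variant can be pictured as a filter over a list of tokens. The sketch below uses plain cl-lib functions on a hand-built stream. It does not search sub-parts or include files, and the `my-` names and token contents are hypothetical.

```elisp
(require 'cl-lib)

(defvar my-stream
  '(("main" function "int" nil nil nil nil nil)
    ("hit_count" variable "int" "0" nil nil nil nil)
    ("helper" function nil nil nil nil nil nil))
  "Hypothetical token stream used only for illustration.")

(defun my-find-by-function (function stream)
  "Return all tokens in STREAM for which FUNCTION returns non-nil."
  (cl-remove-if-not function stream))

(defun my-find-by-function-first-match (function stream)
  "Return the first token in STREAM for which FUNCTION returns non-nil."
  (cl-find-if function stream))

;; Names of all function tokens.
(mapcar #'car
        (my-find-by-function
         (lambda (token) (eq (nth 1 token) 'function))
         my-stream))
;; => ("main" "helper")

;; First token whose TYPE slot is "int".
(car (my-find-by-function-first-match
      (lambda (token) (equal (nth 2 token) "int"))
      my-stream))
;; => "main"
```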
When you just want to get at a nonterminal the cursor is on, there is
a more efficient mechanism than using
semantic-find-nonterminal-by-position. This mechanism
directly queries the overlays the parsing step leaves in the buffer.
This provides for very rapid retrieval of what function or variable
the cursor is currently in.
These functions query the current buffer's overlay system for tokens.
| semantic-find-nonterminal-by-overlay &optional positionormarker buffer | Function |
Find all nonterminals covering POSITIONORMARKER by using overlays.
If POSITIONORMARKER is nil, use the current point.
Optional BUFFER is used if POSITIONORMARKER is a number, otherwise the current
buffer is used. This finds all tokens covering the specified position
by checking for all overlays covering the current spot. They are then sorted
from largest to smallest via the start location.
|
| semantic-find-nonterminal-by-overlay-in-region start end &optional buffer | Function |
| Find all nonterminals which exist in whole or in part between START and END. Uses overlays to determine position. Optional BUFFER argument specifies the buffer to use. |
| semantic-current-nonterminal | Function |
| Return the current nonterminal in the current buffer. If there is more than one in the same location, return the smallest token. |
| semantic-current-nonterminal-parent | Function |
Return the current nonterminal's parent in the current buffer.
A token's parent would be a containing structure, such as a type
containing a field. Return nil if there is no parent.
|
Sometimes it is important to reorganize a token stream into a form that is better for display to a user. It is important not to use functions with side effects when doing this, as that could affect the token cache.
There are some existing utility functions which will reorganize the token list for you.
| semantic-bucketize tokens &optional parent filter | Function |
| Sort TOKENS into a group of buckets based on token type. Unknown types are placed in a Misc bucket. Type bucket names are defined by `semantic-symbol->name-assoc-list'. If PARENT is specified, then TOKENS belong to this PARENT in some way. This will use `semantic-symbol->name-assoc-list-for-type-parts' to generate bucket names. Optional argument FILTER is a filter function to be applied to each bucket. The filter function will take one argument, which is a list of tokens, and may re-organize the list with side-effects. |
| semantic-bucketize-token-token | Variable |
| Function used to get a symbol describing the class of a token. This function must take one argument of a semantic token. It should return a symbol found in `semantic-symbol->name-assoc-list' which semantic-bucketize uses to bin up tokens. To create new bins for an application augment `semantic-symbol->name-assoc-list', and `semantic-symbol->name-assoc-list-for-type-parts' in addition to setting this variable (locally in your function). |
| semantic-adopt-external-members tokens | Function |
|
Rebuild TOKENS so that externally defined members are regrouped.
Some languages such as C++ and CLOS permit the declaration of member
functions outside the definition of the class. It is easier to study
the structure of a program when such methods are grouped together
more logically.
This function uses semantic-nonterminal-external-member-p to determine when a potential child is an externally defined member. Note: Applications which use this function must account for token types which do not have a position, but have children which *do* have positions. Applications should use |
| semantic-orphaned-member-metaparent-type | Variable |
In semantic-adopt-external-members, the type of 'type for metaparents.
A metaparent is a made-up type semantic token used to hold the child list
of orphaned members of a named type.
|
| semantic-mark-external-member-function | Variable |
Function called when an externally defined orphan is found.
By default, the token is always marked with the adopted property.
This function should be locally bound by a program that needs
to add additional behaviors into the token list.
This function is called with one argument which is a shallow copy
of the token to be modified. This function should return the
token (or a copy of it) which is then integrated into the
revised token list.
|
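The idea of regrouping tokens without touching the originals can be illustrated with a simplified grouping function. This is only a sketch of bucketing by type-symbol, not the behavior of semantic-bucketize itself: it has no Misc bucket, no bucket names, and no filter. `my-bucketize` and the sample tokens are hypothetical.

```elisp
(defun my-bucketize (tokens)
  "Group TOKENS into an alist keyed by each token's type-symbol.
Only new list structure is built; the tokens are not modified."
  (let (buckets)
    (dolist (token tokens)
      (let* ((class (nth 1 token))
             (bucket (assq class buckets)))
        (if bucket
            ;; Append to an existing bucket, preserving token order.
            (setcdr bucket (append (cdr bucket) (list token)))
          ;; Start a new bucket at the end of the bucket list.
          (setq buckets (append buckets (list (list class token)))))))
    buckets))

(let ((buckets (my-bucketize
                '(("main" function "int" nil nil nil nil nil)
                  ("hit_count" variable "int" "0" nil nil nil nil)
                  ("helper" function nil nil nil nil nil nil)))))
  (list (mapcar #'car buckets)
        (mapcar #'car (cdr (assq 'function buckets)))))
;; => ((function variable) ("main" "helper"))
```

Building fresh bucket lists, rather than sorting or splicing the stream in place, keeps the cached token list intact.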
These functions provide ways of reading the names of items in a buffer with completion.
| semantic-read-symbol prompt &optional default stream filter | Function |
| Read a symbol name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from. FILTER provides a filter on the types of things to complete. FILTER must be a function to call on each element. |
| semantic-read-variable prompt &optional default stream | Function |
| Read a variable name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from. |
| semantic-read-function prompt &optional default stream | Function |
| Read a function name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from. |
| semantic-read-type prompt &optional default stream | Function |
| Read a type name from the user for the current buffer. PROMPT is the prompt to use. Optional arguments: DEFAULT is the default choice. If no default is given, one is read from under point. STREAM is the list of tokens to complete from. |
These functions are called `override methods' because they provide generic behaviors, which a given language can override. For example, finding a dependency file in Emacs Lisp can be done with the `locate-library' command (which overrides the default behavior). In C, a dependency can be found by searching a generic search path which can be passed in via a variable.
Any given token consists of meta information which is best viewed in some textual form. This could be as simple as the token's name, or as complex as a prototype to be added to a header file in C. Not only are there several default converters from a token into text, but there are also some convenient variables that can be used with them. Use these variables to allow options on output forms when displaying tokens in your programs.
| semantic-token->text-functions | Variable |
List of functions which convert a token to text.
Each function must take the parameters TOKEN &optional PARENT COLOR.
TOKEN is the token to convert.
PARENT is a parent token or name which refers to the structure
or class which contains TOKEN. PARENT is NOT a class which a TOKEN
would claim as a parent.
COLOR indicates that the generated text should be colored using
font-lock.
|
| semantic-token->text-custom-list | Variable |
A list used by customizable variables to choose a token to text function.
Use this variable in the :type field of a customizable variable.
|
Every token to text conversion function must take the same parameters, which are TOKEN, the token to be converted, PARENT, the containing parent (like a structure which contains a variable), and COLOR, which is a flag specifying that color should be applied to the returned string.
When creating, or using these strings, particularly with color, use concat to build up larger strings instead of format. This will preserve text properties.
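As a small illustration of building a colored string with concat, the sketch below joins a propertized name with plain text and checks that the face survives on the name only. `my-colored-prototype` is a hypothetical helper, and the choice of face is just an example.

```elisp
(defun my-colored-prototype (name args)
  "Return NAME followed by ARGS, with a face applied to NAME only."
  (concat (propertize name 'face 'font-lock-function-name-face)
          " " args))

(let ((text (my-colored-prototype "main" "(int argc)")))
  (list (get-text-property 0 'face text)    ; inside "main"
        (get-text-property 4 'face text)))  ; the space after "main"
;; => (font-lock-function-name-face nil)
```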
| semantic-name-nonterminal token &optional parent color | Function |
| Return the name string describing TOKEN. The name is the shortest possible representation. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
| semantic-summarize-nonterminal token &optional parent color | Function |
| Summarize TOKEN in a reasonable way. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
| semantic-prototype-nonterminal token &optional parent color | Function |
| Return a prototype for TOKEN. This function should be overloaded, though it need not be used. This is because it can be used to create code by language independent tools. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
| semantic-prototype-file buffer | Function |
| Return a file in which prototypes belonging to BUFFER should be placed. Default behavior (if not overridden) looks for a token specifying the prototype file, or the existence of an EDE variable indicating which file prototypes belong in. |
| semantic-abbreviate-nonterminal token &optional parent color | Function |
| Return an abbreviated string describing TOKEN. The abbreviation is to be short, with possible symbols indicating the type of token, or other information. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
| semantic-concise-prototype-nonterminal token &optional parent color | Function |
| Return a concise prototype for TOKEN. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
| semantic-uml-abbreviate-nonterminal token &optional parent color | Function |
| Return a UML style abbreviation for TOKEN. Optional argument PARENT is the parent type if TOKEN is a detail. Optional argument COLOR means highlight the prototype with font-lock colors. |
These functions help derive information about tokens that may not be obvious for non-traditional languages with their own token types.
| semantic-nonterminal-children token &optional positionalonly | Function |
Return the list of top level children belonging to TOKEN.
Children are any sub-tokens which may contain overlays.
The default behavior (if not overridden with nonterminal-children)
is to return type parts for a type, and arguments for a function.
If optional argument POSITIONALONLY is non- If this function is overridden, use semantic-nonterminal-children-default to also include the default behavior, and merely extend your own. Note for language authors: If a mode defines a language that has tokens in it with overlays that should not be considered children, you should still return them with this function. If you do not, then token re-parsing, and database saving will fail. |
| semantic-nonterminal-external-member-parent token | Function |
|
Return a parent for TOKEN when TOKEN is an external member.
TOKEN is an external member if it is defined at a toplevel and
has some sort of label defining a parent. The parent return will
be a string.
The default behavior, if not overridden with
If this function is overridden, use semantic-nonterminal-external-member-parent-default to also include the default behavior, and merely extend your own. |
| semantic-nonterminal-external-member-p parent token | Function |
Return non-nil if PARENT is the parent of TOKEN.
TOKEN is an external member of PARENT when it is somehow tagged
as having PARENT as its parent.
The default behavior, if not overridden with
If this function is overridden, use
|
| semantic-nonterminal-external-member-children token &optional usedb | Function |
Return the list of children which are not *in* TOKEN.
If optional argument USEDB is non-nil, then also search files in
the Semantic Database. If USEDB is a list of databases, search those
databases.
Children in this case are functions or types which are members of TOKEN, such as the parts of a type, but which are not defined inside the class. C++ and CLOS both permit methods of a class to be defined outside the bounds of the class' definition. The default behavior, if not overridden with
If this function is overridden, use semantic-nonterminal-external-member-children-default to also include the default behavior, and merely extend your own. |
| semantic-nonterminal-protection token &optional parent | Function |
Return protection information about TOKEN with optional PARENT.
This function returns one of the following symbols:
nil - No special protection. Language dependent.
'public - Anyone can access this TOKEN.
'private - Only methods in the local scope can access TOKEN.
'friend - Like private, except some outer scopes are allowed
access to token.
Some languages may choose to provide additional return symbols specific
to themselves. Use of this function should allow for this.
The default behavior (if not overridden with |
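Because languages may return protection symbols beyond the ones listed, callers of semantic-nonterminal-protection should keep a fallback branch. A minimal sketch of such handling follows. `my-protection-label`, its label strings, and the `protected` symbol in the last call are hypothetical.

```elisp
(defun my-protection-label (protection)
  "Return a display string for PROTECTION, a symbol or nil."
  (cond ((null protection) "default")
        ((eq protection 'public) "public")
        ((eq protection 'private) "private")
        ((eq protection 'friend) "friend")
        ;; Fallback for language-specific symbols not listed above.
        (t (format "other (%s)" protection))))

(my-protection-label 'public)      ; => "public"
(my-protection-label nil)          ; => "default"
(my-protection-label 'protected)   ; => "other (protected)"
```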
| semantic-nonterminal-abstract token &optional parent | Function |
Return non-nil if TOKEN is abstract.
Optional PARENT is the parent token of TOKEN.
In UML, abstract methods and classes have special meaning and behavior
in how methods are overridden. In UML, abstract methods are italicized.
The default behavior (if not overridden with |
| semantic-nonterminal-leaf token &optional parent | Function |
Return non-nil if TOKEN is a leaf.
Optional PARENT is the parent token of TOKEN.
In UML, leaf methods and classes have special meaning and behavior.
The default behavior (if not overridden with |
| semantic-nonterminal-static token &optional parent | Function |
Return non-nil if TOKEN is static.
Optional PARENT is the parent token of TOKEN.
In UML, static methods and attributes mean that they are allocated
in the parent class, and are not instance specific.
UML notation specifies that STATIC entries are underlined.
The default behavior (if not overridden with |
| semantic-find-dependency token | Function |
Find the filename represented by TOKEN.
TOKEN may be a stripped element, in which case PARENT specifies a
parent token that has positional information.
Depends on semantic-dependency-include-path for searching. Always searches
`.' first, then searches additional paths.
|
| semantic-find-nonterminal token &optional parent | Function |
| Find the location of TOKEN. TOKEN may be a stripped element, in which case PARENT specifies a parent token that has position information. Different behaviors are provided depending on the type of token. For example, dependencies (includes) will seek out the file that is depended on, and functions will move to the specified definition. |
| semantic-find-documentation token | Function |
| Find documentation from TOKEN and return it as a clean string. TOKEN might have DOCUMENTATION set in it already. If not, there may be some documentation in a comment preceding TOKEN's definition which we can look for. When appropriate, this can be overridden by a language specific enhancement. |
| semantic-up-context &optional point | Function |
Move point up one context from POINT.
Return non-nil if there are no more context levels.
Overloaded functions using up-context take no parameters.
|
| semantic-beginning-of-context &optional point | Function |
Move POINT to the beginning of the current context.
Return non-nil if there is no upper context.
The default behavior uses semantic-up-context. It can
be overridden with beginning-of-context.
|
| semantic-end-of-context &optional point | Function |
Move POINT to the end of the current context.
Return non-nil if there is no upper context.
By default, this uses semantic-up-context, and assumes parenthetical
block delimiters. This can be overridden with end-of-context.
|
| semantic-get-local-variables &optional point | Function |
Get the local variables based on POINT's context.
Local variables are returned in Semantic token format.
By default, this calculates the current bounds using context blocks
navigation, then uses the parser with bovine-inner-scope to
parse tokens at the beginning of the context.
This can be overridden with get-local-variables.
|
| semantic-get-local-arguments &optional point | Function |
Get arguments (variables) from the current context at POINT.
Parameters are available if the point is in a function or method.
This function returns a list of tokens. If the local token returns
just a list of strings, then this function will convert them to tokens.
Part of this behavior can be overridden with get-local-arguments.
|
| semantic-get-all-local-variables &optional point | Function |
Get all local variables for this context, and parent contexts.
Local variables are returned in Semantic token format.
By default, this gets local variables, and local arguments.
This can be overridden with get-all-local-variables.
Optional argument POINT is the location to start getting the variables from.
|
This next set of functions handles local context parsing. This means looking at the code (locally) and navigating, and fetching information such as the type of the parameter the cursor may be typing in.
| semantic-end-of-command | Function |
Move to the end of the current command.
By default, uses semantic-command-separation-character.
Override with end-of-command.
|
| semantic-beginning-of-command | Function |
Move to the beginning of the current command.
By default, uses semantic-command-separation-character.
Override with beginning-of-command.
|
| semantic-ctxt-current-symbol &optional point | Function |
Return the current symbol the cursor is on at POINT in a list.
This will include a list of type/field names when applicable.
This can be overridden using ctxt-current-symbol.
|
| semantic-ctxt-current-assignment &optional point | Function |
Return the current assignment near the cursor at POINT.
Return a list as per semantic-ctxt-current-symbol.
Return nil if there is nothing relevant.
Override with ctxt-current-assignment.
|
| semantic-ctxt-current-function &optional point | Function |
Return the current function the cursor is in at POINT.
The function returned is the one accepting the arguments that
the cursor is currently in.
This can be overridden with ctxt-current-function.
|
| semantic-ctxt-current-argument &optional point | Function |
Return the current symbol the cursor is on at POINT.
Override with ctxt-current-argument.
|
| semantic-ctxt-scoped-types &optional point | Function |
Return a list of type names currently in scope at POINT.
Override with ctxt-scoped-types.
|
For details on using these functions to get more information about the current context, see Context Analysis.
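As a small sketch of the local context functions above, the helper below joins the list returned by semantic-ctxt-current-symbol into a single dotted string. The name my-symbol-path-at-point is hypothetical, and the sketch assumes the returned list contains strings with the outermost name first. The call is guarded with fboundp so that nothing breaks when Semantic is not loaded.

```elisp
;; Hypothetical helper; not part of Semantic.
(defun my-symbol-path-at-point (&optional point)
  "Return the symbol at POINT as one dotted string, or nil.
Return nil when `semantic-ctxt-current-symbol' is not defined or
when it finds nothing."
  (when (fboundp 'semantic-ctxt-current-symbol)
    ;; Assumption: the list holds strings, outermost name first.
    (let ((parts (semantic-ctxt-current-symbol point)))
      (when parts
        (mapconcat #'identity parts ".")))))
```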
If you write a program that uses the stream of tokens in a persistent display or database, it is necessary to know when tokens change so that your displays can be updated. This is especially important as tokens can be replaced, changed, or deleted, and the associated overlays will then throw errors when you try to use them. Complete integration with token changes can be achieved via several very important hooks.
One interesting way to interact with the parser is to let it know that changes you are going to make will not require re-parsing.
| semantic-edits-are-safe | Variable |
When non-nil, modifications do not require a reparse.
This prevents tokens from being marked dirty, and it
prevents top level edits from causing a cache check.
Use this when writing programs that could cause a full
reparse, but will not change the tag structure, such
as adding or updating top-level comments.
|
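A minimal sketch of this idea follows. It binds semantic-edits-are-safe around an edit that only adds a top-level comment, which is the kind of change the description above calls safe. The helper name my-insert-banner-comment and the ";; " comment prefix are assumptions chosen for illustration.

```elisp
;; Declare the variable special so the `let' below binds it dynamically
;; even if Semantic has not been loaded yet.
(defvar semantic-edits-are-safe)

;; Hypothetical helper; not part of Semantic.
(defun my-insert-banner-comment (text)
  "Insert TEXT as a comment line at the top of the current buffer.
The edit is made while `semantic-edits-are-safe' is non-nil."
  (let ((semantic-edits-are-safe t))
    (save-excursion
      (goto-char (point-min))
      ;; Assumption: the buffer's language uses ";; " to start a comment.
      (insert ";; " text "\n"))))
```

Binding the variable with let, rather than setting it, restores the previous value as soon as the edit finishes, so later edits by the user are tracked as usual.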
Next, it is sometimes useful to know what the current parsing state is. These functions let you know what level of re-parsing may be needed. Careful choices about when to reparse can make your program much faster.
| semantic-bovine-toplevel-full-reparse-needed-p &optional checkcache | Function |
Return non-nil if the current buffer needs a full reparse.
Optional argument CHECKCACHE indicates if the cache check should be made.
|
| semantic-bovine-toplevel-partial-reparse-needed-p &optional checkcache | Function |
Return non-nil if the current buffer needs a partial reparse.
This only returns non-nil if semantic-bovine-toplevel-full-reparse-needed-p
returns nil.
Optional argument CHECKCACHE indicates if the cache check should be made
when checking semantic-bovine-toplevel-full-reparse-needed-p.
|
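One way a program might combine the two predicates is sketched below. The helper my-reparse-level is hypothetical; it checks for a full reparse first, since the partial predicate only returns non-nil when a full reparse is not needed. Both calls are guarded with fboundp, so the helper returns nil when the predicates are not defined.

```elisp
;; Hypothetical helper; not part of Semantic.
(defun my-reparse-level ()
  "Return `full', `partial', or nil for the current buffer.
Return nil when nothing needs re-parsing or when the Semantic
predicates are not defined."
  (cond
   ((and (fboundp 'semantic-bovine-toplevel-full-reparse-needed-p)
         (semantic-bovine-toplevel-full-reparse-needed-p))
    'full)
   ((and (fboundp 'semantic-bovine-toplevel-partial-reparse-needed-p)
         (semantic-bovine-toplevel-partial-reparse-needed-p))
    'partial)
   (t nil)))
```

A caller could use the result to skip expensive display updates when it is nil, or to postpone them when it is full.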
If you need very close interaction with the user's editing, then the following hooks can be used to find out when a given tag is being changed. These hooks could even be used to cut down on re-parsing if used correctly.
For all hooks, be careful to add your function as a local hook if you only want to affect a single buffer. Setting it globally can cause unwanted effects if your program is concerned with a single buffer.
| semantic-dirty-token-hooks | Variable |
Hooks run after a token is marked as dirty (edited by the user).
The functions must take TOKEN, START, and END as parameters.
This hook will only be called once when a token is first made dirty;
subsequent edits will not cause this to run a second time unless that
token is first cleaned. Any token marked as dirty will
also be passed to semantic-clean-token-hooks, unless a full
reparse is done instead.
|
| semantic-clean-token-hooks | Variable |
Hooks run after a token is marked as clean (re-parsed after user edits).
The functions must take a TOKEN as a parameter.
Any token sent to this hook will have first been called with
semantic-dirty-token-hooks. This hook is not called for tokens
marked dirty if the buffer is completely re-parsed. In that case, use
semantic-after-toplevel-cache-change-hook.
|
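The sketch below shows one way to pair the dirty and clean hooks while keeping them local to a single buffer. All names beginning with my- are hypothetical, and the sketch assumes that a token is a list whose first element is its name. The fourth argument to add-hook makes each hook buffer-local.

```elisp
;; Hypothetical bookkeeping; not part of Semantic.
(defvar-local my-dirty-token-names nil
  "Names of tokens currently marked dirty in this buffer.")

(defun my-note-dirty-token (token _start _end)
  "Record that TOKEN was edited.  START and END are ignored."
  ;; Assumption: the first element of a token is its name.
  (push (car token) my-dirty-token-names))

(defun my-note-clean-token (token)
  "Forget TOKEN once it has been re-parsed."
  (setq my-dirty-token-names
        (delete (car token) my-dirty-token-names)))

(defun my-track-tokens-in-buffer ()
  "Install the two hook functions in the current buffer only."
  ;; The final t adds each function to the buffer-local hook value.
  (add-hook 'semantic-dirty-token-hooks #'my-note-dirty-token nil t)
  (add-hook 'semantic-clean-token-hooks #'my-note-clean-token nil t))
```

Because a full reparse does not run the clean hook, a program using this approach would also want to reset its list from semantic-after-toplevel-cache-change-hook.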
| semantic-change-hooks | Variable |
Hooks run when semantic detects a change in a buffer.
Each hook function must take three arguments, identical to those of the
common hook after-change-functions.
|
Lastly, if you just want to know when a buffer changes, use this hook.
| semantic-after-toplevel-bovinate-hook | Variable |
Hooks run after a toplevel token parse.
It is not run if the toplevel parse command is called and the buffer does
not need to be fully re-parsed.
This hook is also run when the toplevel cache is flushed and
the cache is emptied.
For language specific hooks, make sure you define this as a local hook.
This hook should not be used any more.
Use semantic-after-toplevel-cache-change-hook instead.
|
| semantic-after-toplevel-cache-change-hook | Variable |
Hooks run after the buffer token list has changed.
This list will change when a buffer is re-parsed, or when the token
list in a buffer is cleared. It is *NOT* called if the current token
list is partially re-parsed.
Hook functions must take one argument, which is the new list of tokens associated with this buffer. For language specific hooks, make sure you define this as a local hook.
|
| semantic-after-partial-cache-change-hook | Variable |
Hooks run after the buffer token list has been updated.
This list will change when the current token list has been partially
re-parsed.
Hook functions must take one argument, which is the list of tokens updated among the ones associated with this buffer. For language specific hooks, make sure you define this as a local hook.
|
| semantic-before-toplevel-cache-flush-hook | Variable |
Hooks run before the toplevel nonterminal cache is flushed.
For language specific hooks, make sure you define this as a local hook.
This hook is called before a corresponding
semantic-after-toplevel-cache-change-hook which is also called
during a flush when the cache is given a new value of nil.
|
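To tie the cache hooks together, here is a minimal sketch of a program that keeps a per-buffer copy of the token list. The my-display- names are hypothetical and stand in for whatever display or database your program maintains. Each hook function takes the single argument described above, and both are installed as local hooks.

```elisp
;; Hypothetical state for an imaginary display; not part of Semantic.
(defvar-local my-display-tokens nil
  "Token list last reported for this buffer after a full reparse.")

(defvar-local my-display-updated nil
  "Tokens reported by the most recent partial reparse in this buffer.")

(defun my-display-reset (tokens)
  "Rebuild display state from TOKENS, the buffer's new token list.
TOKENS is nil when the cache has been flushed."
  (setq my-display-tokens tokens
        my-display-updated nil))

(defun my-display-refresh (updated)
  "Remember UPDATED, the tokens changed by a partial reparse."
  (setq my-display-updated updated))

(defun my-display-watch-buffer ()
  "Install both hook functions in the current buffer only."
  ;; The final t adds each function to the buffer-local hook value.
  (add-hook 'semantic-after-toplevel-cache-change-hook
            #'my-display-reset nil t)
  (add-hook 'semantic-after-partial-cache-change-hook
            #'my-display-refresh nil t))
```

Handling the nil case in the full-reparse function covers cache flushes, so a separate function on semantic-before-toplevel-cache-flush-hook is only needed if the program must act while the old tokens are still available.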