If you maintain support for a language mode in Emacs, or would like to be a maintainer for an unsupported language, then integrating the use of your language with CEDET can be a quick way to get some advanced features.
The tools in the CEDET Suite represent a lot of infrastructure, and have been designed to provide useful and clean interfaces to maintainers of different kinds of tools. Adding support for a language under CEDET can be challenging, as the language will need to be "made to fit" a generic tagging infrastructure so that applications built on top of CEDET can work in a language-agnostic way.
This document describes at a high level how to add support for a language, and is ordered so as to maximize the number of features provided by your efforts.
The first thing to do is to join the cedet-devel mailing list. This is for two reasons:
The most critical feature to start with is to write a tagging parser for CEDET for your language. The tagging system used by CEDET is in the tool called Semantic.
The very first step in this process is to make sure there is a traditional Emacs major-mode for the language. This major mode must have a robust and tested syntax-table that can correctly identify strings and comments, and has an accurate representation of all parenthetical expressions.
The next step is to choose how to parse your language. A parser for your language can take one of several forms.
Writing a lexer/grammar in Semantic
The most obvious tactic is to write a lexical analyzer, and a syntactic analyzer using Semantic. Within Semantic, there is a port of Bison called Wisent.
You should choose the Semantic lexer/grammar format for your language if it has a deterministic grammar. Often you can download a pre-existing BNF grammar for a language. These BNF grammars can be converted to Wisent grammar format fairly easily.
Using the Semantic grammar format provides the greatest number of features to the CEDET tool suite.
Refer to the Semantic documentation for more on how to do this.
Write a regular-expression based tag generator
If the grammar of your file is irregular, or contains a lot of free-form text, you probably want to write a regular expression based tag generator.
Writing a regular expression based tag generator works fine with the Semantic tool set. However, you do not get the advantage of incremental parsing or invalid syntax highlighting with these parsers.
To get started, look at the existing tag generator for Texinfo in cedet/semantic/semantic-texi.el. Be sure to read the Semantic guide on the TAG API to learn how to create and cook your tags.
Integrate an external parsing tool
You can also pull in an external tool like Exuberant CTags to parse your files. This can be handy to bootstrap support for your language. External parsing tools often have the problem that the actual extents of a TAG in your buffer are unknown. They also require that the buffer be saved before they can work.
Adding an external parser can also be useful in conjunction with a Semantic lexical analyzer/grammar. Files not in buffers can be parsed by the external tool, while files in buffers can be parsed by the internal grammar.
Exuberant CTags is already partially supported by Semantic. Extending ECtags support to new languages can be straightforward. See the file cedet/semantic/ctags/semantic-ectag-lang.el for an example of sh script support.
Once you have a good tagging system in place, many tools become enabled for your language. Tools such as tag decoration mode, stickyfunc mode, ECB code browser, and tag jumping are all enabled. In addition, the Semantic Database system will start storing and searching your tags.
The next step is to start tuning the infrastructure to your language. The only way to do this is to dig in, try out different Semantic tools, and find out where things either don't look right or don't behave properly.
A simple place to start is with the command: semantic-test-all-format-tag-functions. This command will execute a range of functions that will format your tags in different ways. These routines are used in hundreds of places, and are a simple way to start learning about the mode-local tool, and how it works.
For example, in C, the basic way a tag name is formatted is augmented so that, if the tag is a function pointer, the syntax for that is added to the name. The code that does this is in semantic-c.el.
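As a rough illustration of that kind of override, here is a minimal sketch of a mode-local implementation of semantic-format-tag-name. It is modeled on the behavior described above rather than copied from semantic-c.el, so the real code may differ. The major mode name mylang-mode is hypothetical, and the sketch assumes the parser marks function pointers with a :function-pointer tag attribute.

```elisp
;; Feature names below follow the CEDET bundled with Emacs; the standalone
;; CEDET distribution uses file names such as semantic-format.el instead.
(require 'semantic)
(require 'semantic/format)
(require 'mode-local)

;; `mylang-mode' is a hypothetical major mode used only for illustration.
(define-mode-local-override semantic-format-tag-name
  mylang-mode (tag &optional parent color)
  "Convert TAG to a string that is the print name for TAG.
PARENT and COLOR are passed through to the default implementation."
  (let ((name (semantic-format-tag-name-default tag parent color))
        ;; Assumption: the parser sets this attribute on function pointers.
        (fnptr (semantic-tag-get-attribute tag :function-pointer)))
    (if fnptr
        (concat "(*" name ")")
      name)))
```

The override only takes effect in buffers whose major mode is (or derives from) the named mode, which is what lets the same formatting call behave differently per language.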
Here are some functions you may need to override
There is a wide array of functions to override for your language. If you have a hard time getting your language tuned, join the cedet-devel mailing list and find out more.
EDE provides the infrastructure for how files relate to each other. The Smart Completion tool cannot work in a system involving multiple grouped directories if it cannot rely on EDE to provide this infrastructure.
If your language is typically used in a single directory, or if the links between files are explicit, you may be able to skip this step.
There are three possible ways to involve EDE.
If your language uses Makefiles and compilers the way C does, then you can extend the ede-proj-obj.el infrastructure. It would likely involve creating a few compiler declarations and that's about it.
If your language is more complex, but still supported by AutoMake, then your job is a little more complex. Both the Emacs Lisp support and the Texinfo support contain code that you could use as examples.
If your language's build system uses a set of predictable files, such as "Makefile.am", then you can write an EDE project that maps the EDE project structure onto your files. EDE's internal structure happens to map to directories and files, so you will need to find a way to do the same mapping. This will allow you to associate build commands with specific buffers.
The ede-emacs.el and ede-linux.el projects are very specific examples of how something like this might work.
You can also create EDE project wrappers that need to be explicitly created in a user's .emacs file. These are a little easier for the developer to create and manage, but harder for a user to use. They are helpful if there are too many unpredictable ways to store and build the source code for your language, and they provide an explicit place for users to keep project-local settings.
The ede-cpp-root.el project is a great example of how to build such a project.
You probably still want to jump to the Smart Completion? Well, to do that, you need to get your language's local context parsing to work. The suite of functions is in semantic-ctxt.el. The code here allows basic movement within your code, up and down nested code blocks, parsing local variables, and identifying scope modifiers, such as the C++ using statement.
Parenthetical languages like C or Java probably don't need a lot of work here. Non-parenthetical languages that have complex code constructs will need heavy customization.
A key function to start with is semantic-get-local-variables. Try it out by evaluating it in a buffer of your language.
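One way to try it, as a sketch: place point inside a function body, type M-:, and evaluate one of the forms below. The second form is only a convenience added here to make the result easier to read.

```elisp
;; Feature names follow the CEDET bundled with Emacs; standalone CEDET
;; uses semantic-ctxt.el instead.
(require 'semantic)
(require 'semantic/ctxt)

;; Returns the list of local variable tags visible at point.
(semantic-get-local-variables)

;; Convenience: show only the variable names.
(mapcar #'semantic-tag-name (semantic-get-local-variables))
```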
The condensed output may not be enough to examine the result. If so, it can help to configure your .emacs file so that results are shown in a more browsable form.
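One possible configuration, offered as an assumption since the original snippet is not preserved in this text, is to rebind M-: to the Data Debug evaluator that ships with CEDET. Results then open in an expandable Data Debug buffer instead of the echo area.

```elisp
(require 'data-debug)

;; Show the result of M-: in a Data Debug buffer instead of the echo area.
(global-set-key "\M-:" 'data-debug-eval-expression)
```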
Some other functions that are handy to have working are:
Another important function suite to get right are the current-symbol set of functions. These are all core to the smart completion engine having something to complete. Try them out using M-:. The key ones are:
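As a sketch of how to exercise this suite, the forms below can each be evaluated with M-: while point is on or just after a symbol. These four are members of the current-symbol family in semantic-ctxt.el; they are listed here as likely candidates, not as a restatement of the original list, and the comments paraphrase their documented purpose.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'semantic/ctxt)

(semantic-ctxt-current-symbol)      ; the symbol under point, split into its parts
(semantic-ctxt-current-assignment)  ; the symbol being assigned to near point
(semantic-ctxt-current-function)    ; the function whose argument list point is in
(semantic-ctxt-current-argument)    ; index of the argument point is on
```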
Hopefully the logic in these key functions won't need to be overridden. They will, however, lead to other smaller scoped functions that you can override for your language. For example, you may need to provide values for, or override these:
The next set of commands you may need to override for your language is probably for the smart completion tool. Getting this tool to work for your language is a tricky proposition, and there are no hard and fast rules. You will need to try out the different parts of the analyzer framework, and tune the output results as needed.
The completion engine is made of two key parts: a Context Analyzer and a Possible Completions engine. The first thing to do is try out the context analysis in a buffer of your language. It should then show a dump of what it found.
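The command name is not preserved in this text; the sketch below assumes it is semantic-analyze-current-context, which is the entry point of the context analyzer. Run interactively, it displays the analyzed context; called from Lisp, it returns the context object.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'semantic/analyze)

;; Interactively: M-x semantic-analyze-current-context RET
;; From Lisp, request the same analysis for the current position:
(semantic-analyze-current-context (point))
```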
To get this information, it needs to derive a type cache and a local scope. The type cache collects all the datatypes known for the buffer and sorts them together for fast lookups. It can be queried interactively.
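A sketch of such a query, assuming the intended command is semanticdb-typecache-dump (the original snippet is not preserved here):

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'semantic/db-typecache)

;; Equivalent to: M-x semanticdb-typecache-dump RET
;; Dumps the type cache for the current buffer.
(call-interactively 'semanticdb-typecache-dump)
```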
The output buffer is in 'Data Debug' mode, so use SPC to open up different lines to see what else is there.
The local scope can then learn about the classes and "using" type statements, and put a Scope together. The scope can also be queried interactively.
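A sketch of a scope query, assuming the intended command is semantic-calculate-scope (the original snippet is not preserved here). When invoked interactively, it is expected to show the computed scope for point in a Data Debug buffer.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'semantic/scope)

;; Equivalent to: M-x semantic-calculate-scope RET
(call-interactively 'semantic-calculate-scope)
```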
Once these two tools are producing rational output for your language, you can move back to debugging the analyzer. A reasonable place to start is with the analyzer's debugging command.
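The command name is not preserved in this text; the sketch below assumes it is semantic-analyze-debug-assist, which matches the behavior described here.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'semantic/analyze/debug)

;; Equivalent to: M-x semantic-analyze-debug-assist RET
;; Run it with point on the symbol that fails to complete.
(call-interactively 'semantic-analyze-debug-assist)
```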
This command will report everything it can find out about your context, with suggestions as to why it cannot identify a symbol under point.
Each of the above tools has a wide range of functions that can be overridden. Some example functions that need customization for C that are used by the completion engine are:
Once you get your first smart completion working, consider adding a new test file to semantic/semantic-ia-utest.el and putting your samples there. Re-running these tests will ensure you do not regress as you get more and more bits of the smart-completion engine working.
This step could also occur after your basic language parser is working. It does not need Smart Completion or EDE support, but does use some local context parsing.
Semantic's code generation system is in SRecode, the Semantic Recoder. This tool uses template files to specify basic structure for recoding tags generated by your language grammar.
Use SRecode minor mode
Enable SRecode minor mode in your .emacs file.
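A minimal sketch of the .emacs lines, assuming the global variant of the minor mode is what you want:

```elisp
;; Feature name follows the CEDET bundled with Emacs; standalone CEDET
;; uses srecode-mode.el instead.
(require 'srecode/mode)

;; Turn on SRecode minor mode in all buffers that support it.
(global-srecode-minor-mode 1)
```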
Create a template file
The first step is to create a template file for your language, and add it to the SRecode template path. You can do this by putting your files in your ~/.srecode directory, or modifying the srecode-map-load-path to the location of your templates.
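If you prefer the second option, a sketch of the .emacs change might look like this. The directory name is hypothetical, and running M-x srecode-map-update-map afterward to refresh the template map is an assumption about your CEDET version.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'srecode/map)

;; "~/my-templates/" is a hypothetical directory holding your .srt files.
(add-to-list 'srecode-map-load-path
             (expand-file-name "~/my-templates/"))
```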
Now create an empty template file, such as srecode-LANG.srt. Use the SRecode minor mode to fill the buffer with C-c / /. It will prompt for the template to use, suggesting the empty template. Press RET to fill in the buffer.
A good start here is to create your own "empty" template. Creating an empty template is a great way to learn about quite a few SRecode template features, as you can include the default filecomment template, and learn about the :file and :user template arguments that fill in template macro values.
Create a custom template argument
The next step is to code up an argument for your language. Keywords such as :time and :user are template arguments. They map to Emacs Lisp functions called srecode-semantic-handle-:time and srecode-semantic-handle-:user. These functions fill in the dictionary values calculated from the Emacs environment. Look at the argument handler srecode-semantic-handle-:cpp to learn more about how the FILENAME_SYMBOL is filled in as a C++ specific macro.
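A minimal sketch of such a handler follows. The argument name :mylang and the macro name MODULE_NAME are hypothetical, chosen only to show the shape: a template listing :mylang among its arguments would cause SRecode to call this function with the dictionary being built.

```elisp
;; Feature name follows the CEDET bundled with Emacs.
(require 'srecode/dictionary)

;; Hypothetical handler for a hypothetical :mylang template argument.
(defun srecode-semantic-handle-:mylang (dict)
  "Add macros into the dictionary DICT for the hypothetical mylang language."
  (when (buffer-file-name)
    ;; MODULE_NAME is an illustrative macro: the file name without its
    ;; directory or extension.
    (srecode-dictionary-set-value
     dict "MODULE_NAME"
     (file-name-sans-extension
      (file-name-nondirectory (buffer-file-name))))))
```

A template could then refer to the value as {{MODULE_NAME}}, assuming the default escape characters are in use.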
The next step is to start creating templates following the Semantic tag generation naming convention. See the SRecode info manual for more on this topic.
Some SRecode applications, such as the srecode-document-insert-comment command, also have custom templates that may need to be filled in. Adding support for any such application templates is the same as creating a generic set of templates. To do so, create a new template file and fill it in, making sure to specify an application for that file.
Copyright(C) 1997,98,99,2000,01,02,03,04,05,06,07,08,09,10,11 Eric M. Ludlam
Verbatim copying and distribution is permitted in any medium, provided this notice is preserved.