Syntax Highlighting in QTextEdit

The appropriate use of colors and fonts to highlight the different elements of programming and markup languages helps the brain to grasp document structures. By flagging syntax errors, interactive syntax highlighting also helps reduce the time spent in the "compile, run, test" cycle. Highlighting can even be used to flag spelling mistakes in text documents. With Qt, adding syntax highlighting to a QTextEdit is very easy, as this article will demonstrate.

Qt 3.1 introduced QSyntaxHighlighter as an abstract base class to provide syntax highlighting to a QTextEdit. In Qt 4, the class was moved to the Qt3Support library, and a new QSyntaxHighlighter class took its place. The new QSyntaxHighlighter class is based on Qt 4's new rich text engine and works on arbitrary QTextDocument objects. In this article, we will concentrate on the new class.

Subclassing QSyntaxHighlighter

Adding syntax highlighting to a QTextEdit involves subclassing QSyntaxHighlighter, reimplementing the highlightBlock() function, and instantiating the QSyntaxHighlighter subclass with the QTextEdit's underlying QTextDocument (returned by QTextEdit::document()) as the parent.

QTextDocument will then call highlightBlock() for every line in the document as necessary. In the highlightBlock() reimplementation, we can call setFormat() to set the formatting of the different elements in the line. The following code snippet shows a trivial reimplementation of highlightBlock(), where we set every non-alphanumeric character to be green:

    void MyHighlighter::highlightBlock(const QString &text)
    {
        for (int i = 0; i < text.length(); ++i) {
            if (!text.at(i).isLetterOrNumber())
                setFormat(i, 1, Qt::green);
        }
    }
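
The loop above issues one setFormat() call per matching character. As a rough, Qt-free sketch of the same scan, adjacent matches can be merged into (start, count) ranges, each of which would map onto a single setFormat(start, count, ...) call. The Span type and the isAlnum() and nonAlnumSpans() helpers are hypothetical names introduced for this illustration; they are not part of Qt. Note also that std::isalnum() only understands single-byte characters, whereas QChar::isLetterOrNumber() is Unicode-aware.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper type: one contiguous range of characters to format.
struct Span {
    int start;
    int count;
};

// Wraps std::isalnum() so that negative char values are handled safely.
static bool isAlnum(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Collects the runs of non-alphanumeric characters found in one line.
std::vector<Span> nonAlnumSpans(const std::string &text)
{
    std::vector<Span> spans;
    const int len = static_cast<int>(text.length());
    int i = 0;
    while (i < len) {
        if (isAlnum(text[i])) {
            ++i;
            continue;
        }
        const int start = i;
        while (i < len && !isAlnum(text[i]))
            ++i;
        spans.push_back({start, i - start});
    }
    return spans;
}
```

For the line "a+=b;", this yields two ranges, one covering "+=" and one covering ";", instead of three single-character calls.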

The QSyntaxHighlighter::setFormat() function exists in three versions, with the following signatures:

    void setFormat(int start, int count,
                   const QTextCharFormat &format);
    void setFormat(int start, int count, const QColor &color);
    void setFormat(int start, int count, const QFont &font);

The start parameter is the index of the first character to format in the text string, and count is the number of characters to format. The third parameter can be a QTextCharFormat object, a QColor, or a QFont.

Keeping Track of State Information

In the trivial example above, the highlighting of each line could be done independently of other lines. For most realistic languages, this assumption does not hold. Take a C++ syntax highlighter as an example. If the user opens a C-style comment on line 10 and closes it on line 15, the lines in between should be highlighted differently than if they were not part of a comment.

QSyntaxHighlighter makes this possible through its "state" mechanism. When we finish highlighting a line, we can associate a state with the line (e.g., "Inside C-Style Comment"), which we can retrieve when we start highlighting the following line. The state is stored as an int.

The following code shows how to handle both C-style and C++-style comments in a C++ syntax highlighter. The highlighter has two states: NormalState and InsideCStyleComment. We define NormalState as equal to -1 because QSyntaxHighlighter's state at the top of the document is always -1.

    void CppHighlighter::highlightBlock(const QString &text)
    {
        enum { NormalState = -1, InsideCStyleComment };
    
        int state = previousBlockState();
        int start = 0;
    
        for (int i = 0; i < text.length(); ++i) {
    
            if (state == InsideCStyleComment) {
                if (text.mid(i, 2) == "*/") {
                    state = NormalState;
                    setFormat(start, i - start + 2, Qt::blue);
                    ++i;    // skip the '/' so that "*//" isn't read as "//"
                }
            } else {
                if (text.mid(i, 2) == "//") {
                    setFormat(i, text.length() - i, Qt::red);
                    break;
                } else if (text.mid(i, 2) == "/*") {
                    start = i;
                    state = InsideCStyleComment;
                    ++i;    // skip the '*' so that "/*/" isn't closed at once
                }
            }
        }
        if (state == InsideCStyleComment)
            setFormat(start, text.length() - start, Qt::blue);
    
        setCurrentBlockState(state);
    }

At the beginning of the function, we retrieve the previous line's state using QSyntaxHighlighter::previousBlockState() and store it in the state local variable. Then we iterate over the characters in text and update the state as necessary when we meet /* or */.

We also highlight // comments. Since these comments cannot span multiple lines, we don't need a separate state for these. At the end of the function, we call setCurrentBlockState() with the new state so that it's available when highlighting the next line.
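
To make the hand-over of state between lines easier to follow, here is a minimal Qt-free model of the same state machine. The scanLine() and scanDocument() functions are hypothetical names for this sketch: scanLine() plays the role of highlightBlock() without the formatting calls, and scanDocument() stands in for the caller that passes each line's final state to the next line. The sketch skips the second character of each two-character token so that a sequence such as /*/ is not read as a comment that opens and closes at once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names for this sketch; they mirror the states used above.
enum { NormalState = -1, InsideCStyleComment = 0 };

// Scans one line that starts in prevState; returns the state at its end.
int scanLine(const std::string &text, int prevState)
{
    int state = prevState;
    const std::size_t len = text.length();
    for (std::size_t i = 0; i < len; ++i) {
        if (state == InsideCStyleComment) {
            if (text.compare(i, 2, "*/") == 0) {
                state = NormalState;
                ++i; // skip the '/' of "*/"
            }
        } else if (text.compare(i, 2, "//") == 0) {
            break; // rest of the line is a comment; the state is unaffected
        } else if (text.compare(i, 2, "/*") == 0) {
            state = InsideCStyleComment;
            ++i; // skip the '*' of "/*"
        }
    }
    return state;
}

// Feeds the end state of each line into the scan of the following line.
std::vector<int> scanDocument(const std::vector<std::string> &lines)
{
    std::vector<int> states;
    int state = NormalState; // the state before the first line
    for (const std::string &line : lines) {
        state = scanLine(line, state);
        states.push_back(state);
    }
    return states;
}
```

A line that contains no comment tokens simply returns the state it was given, which is how the lines between an opening /* and the closing */ stay in the InsideCStyleComment state.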

In addition to keeping track of the state, we also call setFormat() to show C-style comments in blue and C++-style comments in red.

The Syntax Highlighter example provided with Qt 4.2 also demonstrates many of the same principles as this example, but relies on regular expressions to locate different tokens in C++ source code. The state-based approach used by QSyntaxHighlighter allows for a certain degree of flexibility in the way highlighters are implemented.

Example: A Basic HTML Highlighter

We will now show the full code for a slightly more complex example: a class to highlight HTML entities (e.g., &mdash;), tags (e.g., <p>), and comments (e.g., <!-- sep -->).

In addition to reimplementing highlightBlock(), as in the previous example, we also provide functions to let the user of the class specify which colors to use for the various HTML constructs. Let's start with the class definition.

    class HtmlHighlighter : public QSyntaxHighlighter
    {
        Q_OBJECT
    
    public:
        enum Construct {
            Entity,
            Tag,
            Comment,
            LastConstruct = Comment
        };
    
        HtmlHighlighter(QTextDocument *document);
    
        void setFormatFor(Construct construct,
                          const QTextCharFormat &format);
        QTextCharFormat formatFor(Construct construct) const
            { return m_formats[construct]; }
    
    protected:
        enum State {
            NormalState = -1,
            InComment,
            InTag
        };
    
        void highlightBlock(const QString &text);
    
    private:
        QTextCharFormat m_formats[LastConstruct + 1];
    };

The setFormatFor() and formatFor() functions let the user access the formatting used for the supported HTML constructs (entities, tags, and comments). The State enum specifies the three states that our HTML parser can be in after parsing one line. The NormalState is set to -1, the default state in QSyntaxHighlighter.

Let's review the implementation, starting with the constructor:

    HtmlHighlighter::HtmlHighlighter(QTextDocument *document)
        : QSyntaxHighlighter(document)
    {
        QTextCharFormat entityFormat;
        entityFormat.setForeground(QColor(0, 128, 0));
        entityFormat.setFontWeight(QFont::Bold);
        setFormatFor(Entity, entityFormat);
    
        QTextCharFormat tagFormat;
        tagFormat.setForeground(QColor(192, 16, 112));
        tagFormat.setFontWeight(QFont::Bold);
        setFormatFor(Tag, tagFormat);
    
        QTextCharFormat commentFormat;
        commentFormat.setForeground(QColor(128, 10, 74));
        commentFormat.setFontItalic(true);
        setFormatFor(Comment, commentFormat);
    }

In the constructor, we set the default formats for HTML entities, tags, and comments. A format is represented by a QTextCharFormat object. We specify the properties that we want for the highlighted text, such as the foreground color, the font weight (bold or not), and the underline style (single, "wiggly", etc.), and these are applied on top of the existing attributes of the QTextDocument.

    void HtmlHighlighter::setFormatFor(Construct construct,
                                const QTextCharFormat &format)
    {
        m_formats[construct] = format;
        rehighlight();
    }

The setFormatFor() function sets the QTextCharFormat for a given HTML construct. We call QSyntaxHighlighter::rehighlight() to immediately apply the change on the whole document.

The only function left to review is highlightBlock(), which is reimplemented from QSyntaxHighlighter. It's a rather big function, so we will study it chunk by chunk.

    void HtmlHighlighter::highlightBlock(const QString &text)
    {
        int state = previousBlockState();
        int len = text.length();
        int start = 0;
        int pos = 0;

As in the C++ highlighter example, we start by retrieving the previous line's state using previousBlockState(). Inside the loop, we switch on the current state, interpreting characters differently depending on whether we are inside a tag or a comment.

        while (pos < len) {
            switch (state) {
            case NormalState:
            default:
                while (pos < len) {
                    QChar ch = text.at(pos);
                    if (ch == '<') {
                        if (text.mid(pos, 4) == "<!--") {
                            state = InComment;
                        } else {
                            state = InTag;
                        }
                        break;
                    } else if (ch == '&') {
                        start = pos;
                        while (pos < len
                               && text.at(pos++) != ';')
                            ;
                        setFormat(start, pos - start,
                                  m_formats[Entity]);
                    } else {
                        ++pos;
                    }
                }
                break;

For the NormalState, we advance looking for a left angle bracket (<) or an ampersand (&) character.

If we hit a left angle bracket (<), we enter the InTag or InComment state. If we hit an ampersand (&), we deal with the HTML entity on the spot, formatting it as required. (We could have used an InEntity state instead and handled it in its own switch case, but it isn't necessary because entities, unlike tags and comments, cannot span multiple lines.)
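
The entity scan is compact enough that its arithmetic is worth spelling out. The following Qt-free sketch isolates it in a hypothetical entityLength() helper (not part of Qt or of the class above) that returns the number of characters the entity format would cover. It assumes that text[pos] is the ampersand.

```cpp
#include <string>

// Hypothetical helper: returns the length of the entity that starts at pos,
// including the closing ';' when there is one on this line.
int entityLength(const std::string &text, int pos)
{
    const int len = static_cast<int>(text.length());
    const int start = pos;
    // The post-increment moves pos past the ';' before the loop exits.
    while (pos < len && text[pos++] != ';')
        ;
    return pos - start;
}
```

Because the post-increment steps past the semicolon, the returned length includes it. If the line ends before a semicolon is found, the length extends to the end of the line, so an unterminated entity is formatted up to the line break.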

            case InComment:
                start = pos;
                while (pos < len) {
                    if (text.mid(pos, 3) == "-->") {
                        pos += 3;
                        state = NormalState;
                        break;
                    } else {
                        ++pos;
                    }
                }
                setFormat(start, pos - start,
                          m_formats[Comment]);
                break;

If we are inside a comment, we look for the closing --> token. If we find it, we enter the NormalState. In all cases, we highlight the comment using QSyntaxHighlighter::setFormat().

            case InTag:
                QChar quote = QChar::Null;
                start = pos;
                while (pos < len) {
                    QChar ch = text.at(pos);
                    if (quote.isNull()) {
                        if (ch == '\'' || ch == '"') {
                            quote = ch;
                        } else if (ch == '>') {
                            ++pos;
                            state = NormalState;
                            break;
                        }
                    } else if (ch == quote) {
                        quote = QChar::Null;
                    }
                    ++pos;
                }
                setFormat(start, pos - start, m_formats[Tag]);
            }
        }

If we are inside a tag, we look for the > token, skipping any quoted attribute value (e.g., <img alt=">>>>">). If we find a >, we enter the NormalState. In all cases, we highlight the tag.
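
The quote handling can be tried out in isolation with a Qt-free sketch such as the one below. The TagScan type and the scanTag() function are hypothetical names introduced for this illustration. The function reports where the tag formatting would end and whether the closing > was found on the current line.

```cpp
#include <string>

// Hypothetical result type for this sketch.
struct TagScan {
    int end;     // index one past the last character that belongs to the tag
    bool closed; // true if the closing '>' was found on this line
};

// Scans a tag starting at pos, ignoring any '>' inside a quoted value.
TagScan scanTag(const std::string &text, int pos)
{
    const int len = static_cast<int>(text.length());
    char quote = '\0'; // '\0' means "not inside a quoted value"
    while (pos < len) {
        const char ch = text[pos];
        if (quote == '\0') {
            if (ch == '\'' || ch == '"') {
                quote = ch; // remember which quote character opened the value
            } else if (ch == '>') {
                return {pos + 1, true};
            }
        } else if (ch == quote) {
            quote = '\0'; // the matching quote closes the value
        }
        ++pos;
    }
    return {pos, false}; // the tag continues on the next line
}
```

When closed is false, the highlighter would stay in the InTag state for the next line. Like the function above, this sketch does not remember an open quote across lines.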

        setCurrentBlockState(state);
    }

At the end of the function, we store the current state so that the next line can start in the correct state.

To use the highlighter, we simply need to instantiate it with a QTextDocument as the parent. For example:

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);
        QTextEdit editor;
        HtmlHighlighter highlighter(editor.document());
        editor.show();
        return app.exec();
    }

If we wanted to improve the HTML highlighter further, we would use different highlighting for the tag delimiters, names, attributes, and attribute values.

Summary

QSyntaxHighlighter makes it very easy to write interactive syntax highlighters. When the user edits the text, it only rehighlights the parts of the text that need to be updated. For example, if the user modifies line 3 of a 2,000-line document, QSyntaxHighlighter will start by rehighlighting line 3, and will continue only if the state associated with line 3 has changed as a result, stopping at the first line for which the old and new states are identical.
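
The stopping rule can be expressed as a short loop. The sketch below is a simplified, Qt-free model of the behavior described above, not Qt's actual implementation. The LineScanner type and the rehighlightFrom() function are hypothetical names, and the scanner stands in for highlightBlock() by mapping a line and the state before it to the state after it.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical signature: (line text, state before line) -> state after line.
using LineScanner = std::function<int(const std::string &, int)>;

// Rehighlights from line 'edited' onward and returns the number of lines
// processed. 'states' holds the stored end state of every line and is
// updated in place.
int rehighlightFrom(const std::vector<std::string> &lines,
                    std::vector<int> &states,
                    std::size_t edited,
                    const LineScanner &scan)
{
    int processed = 0;
    for (std::size_t i = edited; i < lines.size(); ++i) {
        const int prev = (i == 0) ? -1 : states[i - 1];
        const int oldState = states[i];
        states[i] = scan(lines[i], prev);
        ++processed;
        if (states[i] == oldState)
            break; // later lines would be highlighted exactly as before
    }
    return processed;
}
```

In this model, an edit that leaves the line's end state unchanged costs a single call to the scanner, however long the document is.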

A Graphics View-based syntax highlighter

In Qt 4, QSyntaxHighlighter isn't restricted to QTextEdit. It can be used with any editor based on QTextDocument, including the new QGraphicsTextItem for QGraphicsView, making it possible to provide highlighted text editing facilities for transformed text.

The new QSyntaxHighlighter is also extremely flexible when it comes to handling character formats, thanks to the powerful QTextCharFormat class.


This document is licensed under the Creative Commons Attribution-Share Alike 2.5 license.

Copyright © 2007 Trolltech