CodeToRestSphinx.py - a Sphinx extension to translate source code to reST

This module enables Sphinx to read source files by converting the source code to reST before passing the file on to Sphinx. The overall design:

  1. Monkeypatch Sphinx to include source files in the build, keeping the source file’s extension intact. (Sphinx strips the extension of reST files.)

  2. When Sphinx reads a source file, check to see if the file’s extension is intact. If so, it’s a source file; translate it to reST then pass it on to Sphinx.

Imports

These are listed in the order prescribed by PEP 8.

Standard library

import contextlib
import os
from pathlib import Path
from typing import Dict
 

Third-party imports

import sphinx
from sphinx.application import Sphinx
import sphinx.builders
from sphinx.config import Config
import sphinx.io
import sphinx.project
import sphinx.util
 

Importing path_stabilize from sphinx.util was deprecated in Sphinx v5.1.0; import it from sphinx.util.osutil in newer versions.

if sphinx.version_info[:3] >= (5, 1, 0):
    from sphinx.util.osutil import path_stabilize
else:
    from sphinx.util import path_stabilize
 

The exception FiletypeNotFoundError was deprecated in Sphinx v2.4.0 by moving it from sphinx.io to sphinx.errors.

if sphinx.version_info[:3] >= (2, 4, 0):
    from sphinx.errors import FiletypeNotFoundError
else:
    from sphinx.io import FiletypeNotFoundError
import pygments.util
 

Local application imports

from .CodeToRest import code_to_rest_string, add_highlight_language
from .CodeToMarkdown import code_to_markdown_string
from .CommentDelimiterInfo import SUPPORTED_GLOBS
from .SourceClassifier import get_lexer
from . import __version__
 
 

Utility

exclude_small_files

Python requires __init__.py files; these are often small or even empty. To reduce noise in the docs, this function excludes small files matching the given glob from a Sphinx build. Invoke it from a setup function in a Sphinx extension.

def exclude_small_files(

The Sphinx application object.

    app_: Sphinx,

A glob pattern specifying which files should be excluded if they are no larger than max_size.

    pattern: str,

Optional; the maximum size of an ignorable file, in bytes. Defaults to 0.

    max_size: int = 0,
):

Define a function which will be called by the config-inited event.

    def excluder(app: Sphinx, config: Config):

The path must start in the srcdir.

        root_path = Path(app.srcdir)

This is slightly inefficient, since it doesn’t consult the existing excludes and therefore also searches already-excluded paths.

        app.config.exclude_patterns += [  # type: ignore

Paths must be relative to the srcdir.

            x.relative_to(root_path).as_posix()
            for x in root_path.glob(pattern)
            if x.stat().st_size <= max_size
        ]
 

Connect this to the config-inited Sphinx event.

    app_.connect("config-inited", excluder)
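The glob-and-size test inside excluder can be tried without Sphinx. The following is a minimal sketch, assuming a hypothetical helper named find_small_files and a throwaway directory tree built by a hypothetical demo function; neither is part of this module.

```python
import tempfile
from pathlib import Path


def find_small_files(root: Path, pattern: str, max_size: int = 0) -> list:
    """Hypothetical helper: POSIX paths, relative to root, of small files."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.stat().st_size <= max_size
    )


def demo() -> list:
    # Build a tiny tree: one empty __init__.py and one with content.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg" / "sub").mkdir(parents=True)
        (root / "pkg" / "__init__.py").write_text("")
        (root / "pkg" / "sub" / "__init__.py").write_text("VALUE = 1\n")
        return find_small_files(root, "**/__init__.py")
```

With the default max_size of 0, only the empty file is selected; the non-empty one stays in the build.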
 
 

source-read event

Create a logger for issuing warnings during the build process.

logger = sphinx.util.logging.getLogger(__name__)
 
 

The source-read event occurs when a source file is read. If it’s code, this routine changes it into reST or Markdown.

def _source_read(
    app,

docname: The name of the document that was read. It contains a path relative to the project directory and (typically) no extension.

    docname,

A list whose single element is the contents of the source file.

    source,
):
    if is_source_code(app.env, docname):

See if it’s an extension we should process.

        try:

See if docname matches any of the globs.

            lexer = None
            lfg = app.config.CodeChat_lexer_for_glob
            path_docname = Path(docname)
            for glob, lexer_alias in lfg.items():
                if path_docname.match(glob):

On a match, pass the specified lexer alias.

                    lexer = get_lexer(alias=lexer_alias)
                    break

Do this after checking the CodeChat_lexer_for_glob list, since this will raise an exception on failure.

            lexer = lexer or get_lexer(filename=docname, code=source[0])
 

Translate code to reST or Markdown.

            if is_markdown_docname(app.config, docname):
                source[0] = code_to_markdown_string(source[0], lexer=lexer)
                markup = "Markdown"
            else:
                source[0] = code_to_rest_string(source[0], lexer=lexer)
                source[0] = add_highlight_language(source[0], lexer)
                markup = "reST"
            logger.info(
                "Converted as {} using the {} lexer.".format(markup, lexer.name)
            )

        except (KeyError, pygments.util.ClassNotFound) as e:

We don’t support this language.

            logger.warning(
                "Unsupported source code language: " + str(e), location=docname
            )
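The glob lookup at the start of _source_read can be sketched in isolation. This is an illustration only: alias_for is a hypothetical name, and the glob and alias strings in the usage note are assumptions rather than values taken from this project.

```python
from pathlib import Path


def alias_for(docname, lexer_for_glob):
    """Hypothetical helper: return the first alias whose glob matches, else None."""
    path = Path(docname)
    for glob, alias in lexer_for_glob.items():
        # Path.match compares a relative pattern against the end of the path,
        # so "*.ino" matches a file in any directory.
        if path.match(glob):
            return alias
    return None
```

For instance, alias_for("firmware/blink.ino", {"*.ino": "cpp"}) selects "cpp", while a docname that matches no glob yields None, which is the case where _source_read falls back to get_lexer(filename=..., code=...).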
 
 

Return True if the supplied docname is source code.

def is_source_code(
    env,

See docname.

    docname,
):

If the docname’s extension doesn’t change when asking for its full path, then it’s source code. Normally, the docname of foo.rst is foo; only for source code is the docname of foo.c also foo.c. Look up the name and extension using doc2path.

    docname_ext = env.doc2path(docname, None)
    return Path(docname_ext) == Path(docname)
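The comparison above can be illustrated with a toy lookup table in place of env.doc2path. The table PATHS and the function name looks_like_source_code are hypothetical, chosen only for this sketch.

```python
from pathlib import Path

# Hypothetical docname -> path table, standing in for env.doc2path.
PATHS = {
    "index": "index.rst",      # reST: the docname has no extension
    "src/foo.c": "src/foo.c",  # source code: the docname keeps its extension
}


def looks_like_source_code(docname):
    # Source code is the case where the path and the docname are identical.
    return Path(PATHS[docname]) == Path(docname)
```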
 
 

Return True if the supplied docname is Markdown; False means reST.

def is_markdown_docname(

The Sphinx config object.

    config,

See docname.

    docname,
):

Get all of the extensions: given a file named a.foo.bar, this produces [".foo", ".bar"]; given a.bar, it produces [".bar"]. The second-to-last entry, if present, is the extension to check.

    docname_suffixes = Path(docname).suffixes

See if this is a recognized Markdown extension.

    return (
        len(docname_suffixes) > 1
        and config.source_suffix.get(docname_suffixes[-2]) == "markdown"
    )
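A short sketch of the suffix test, assuming a plain dict in place of config.source_suffix; the function name second_suffix_is_markdown and the sample file names are hypothetical.

```python
from pathlib import Path


def second_suffix_is_markdown(source_suffix, docname):
    """Hypothetical stand-in for is_markdown_docname, taking a plain dict."""
    suffixes = Path(docname).suffixes
    return len(suffixes) > 1 and source_suffix.get(suffixes[-2]) == "markdown"


# Assumed suffix-to-filetype mapping, in the shape Sphinx uses.
SOURCE_SUFFIX = {".rst": "restructuredtext", ".md": "markdown"}
```

Under these assumptions, a file named notes.md.py is treated as Markdown, while notes.py and notes.rst.py are treated as reST.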
 
 

Monkeypatch

Sphinx doesn’t naturally look for source files. Simply adding all supported source file extensions to conf.py’s source_suffix doesn’t work, since foo.c and foo.h will now both be seen as the docname foo, making them indistinguishable. See also my post on sphinx-users.

path2doc patch

For source files, make their docname the same as the file name; for reST files, allow Sphinx to strip off the extension as before. This patch accomplishes this. It comes from sphinx.project.Project, line 79 and following in Sphinx 7.2.6.

def _path2doc(self, filename: str | os.PathLike[str]) -> str | None:

Return the docname for the filename if the file is a document.

filename should be absolute or relative to the source directory.

    try:
        return self._path_to_docname[filename]  # type: ignore[index]
    except KeyError:
        if os.path.isabs(filename):
            with contextlib.suppress(ValueError):
                filename = os.path.relpath(filename, self.srcdir)

        for suffix in self.source_suffix:
            if os.path.basename(filename).endswith(suffix):
                return path_stabilize(filename).removesuffix(suffix)
 

The following code was added.

        if is_supported_language(filename):
            return filename
 

The file does not have a docname.

        return None


sphinx.project.Project.path2doc = _path2doc
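The effect of the patch can be sketched without Sphinx. The names toy_path2doc, SOURCE_SUFFIX and SUPPORTED below are hypothetical stand-ins (for the patched method, the project’s source_suffix and SUPPORTED_GLOBS respectively), and the sketch omits the docname cache and path normalization.

```python
from pathlib import Path

SOURCE_SUFFIX = (".rst",)            # assumed reST suffixes
SUPPORTED = ("*.c", "*.h", "*.py")   # assumed stand-in for SUPPORTED_GLOBS


def toy_path2doc(filename):
    # reST files: strip the suffix, as unpatched Sphinx does.
    for suffix in SOURCE_SUFFIX:
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    # Source files: the docname is the file name, extension included.
    if any(Path(filename).match(glob) for glob in SUPPORTED):
        return filename
    # Anything else is not a document.
    return None
```

Because the extension is kept, src/foo.c and src/foo.h map to distinct docnames instead of both collapsing to src/foo.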
 

Avoid recomputing the value of this variable by defining it globally.

source_suffixpatterns = None
 
 

Return True if the provided filename is a source code language CodeChat supports.

def is_supported_language(filename):

Type: (str) -> bool. Initialize this if necessary.

    global source_suffixpatterns
    if not source_suffixpatterns:
        source_suffixpatterns = SUPPORTED_GLOBS | set(
            _config.CodeChat_lexer_for_glob.keys()
        )
    path_filename = Path(filename)
    for source_suffixpattern in source_suffixpatterns:
        if path_filename.match(source_suffixpattern):
            return True
    return False
 
 

doc2path patch

Next, the way docnames get transformed back to a full path needs to be fixed for source files. Specifically, a docname might be the source file, without adding an extension. This code comes from sphinx.project.Project of Sphinx 7.2.6.

def _doc2path(self, docname: str, absolute: bool) -> str:

Return the filename for the document name.

If absolute is True, return as an absolute path. Else, return as a relative path to the source directory.

    try:
        filename = self._docname_to_path[docname]
    except KeyError:

Three lines of code added here – check for the no-extension case.

        if os.path.isfile(os.path.join(self.srcdir, docname)):
            filename = docname
        else:

Backwards compatibility: the document does not exist

            filename = docname + self._first_source_suffix

    if absolute:
        return os.path.join(self.srcdir, filename)
    return filename


sphinx.project.Project.doc2path = _doc2path
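The added fallback can be exercised in a temporary directory. This is a sketch under stated assumptions: toy_doc2path and demo are hypothetical names, the default suffix ".rst" is assumed, and the cached lookup in _docname_to_path is left out.

```python
import os
import tempfile


def toy_doc2path(srcdir, docname, first_suffix=".rst"):
    # If the docname already names a file, it is a source file: use it as-is.
    if os.path.isfile(os.path.join(srcdir, docname)):
        return docname
    # Otherwise, fall back to appending the first source suffix.
    return docname + first_suffix


def demo():
    with tempfile.TemporaryDirectory() as srcdir:
        # Create an empty source file named foo.c; no file named "index" exists.
        open(os.path.join(srcdir, "foo.c"), "w").close()
        return toy_doc2path(srcdir, "foo.c"), toy_doc2path(srcdir, "index")
```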
 
 

get_filetype patch

The get_filetype function raises an exception if it can’t determine the type of a file. Patch it to also recognize source code as reST. This was taken from sphinx.util, version 7.2.6.

def _get_filetype(source_suffix: Dict[str, str], filename: str) -> str:
    for suffix, filetype in source_suffix.items():
        if filename.endswith(suffix):

If default filetype (None), considered as restructuredtext.

            return filetype or "restructuredtext"
    else:

The following code was added.

        if is_supported_language(filename):
            return (
                "markdown"
                if is_markdown_docname(_config, filename)
                else "restructuredtext"
            )

This was the existing code.

        raise FiletypeNotFoundError
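The for ... else structure above runs its else branch only when the loop finishes without returning. A simplified sketch follows; toy_get_filetype and the is_code callback are hypothetical, LookupError stands in for FiletypeNotFoundError, and the Markdown branch is omitted.

```python
def toy_get_filetype(source_suffix, filename, is_code):
    for suffix, filetype in source_suffix.items():
        if filename.endswith(suffix):
            # A filetype of None means the default, reST.
            return filetype or "restructuredtext"
    else:
        # Reached only when no suffix matched.
        if is_code(filename):
            return "restructuredtext"
        raise LookupError(filename)


def ends_with_c(filename):
    # Hypothetical stand-in for is_supported_language.
    return filename.endswith(".c")
```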
 
 

Per the where to patch docs, patch this where the get_filetype function is used, not where it’s defined:

SPHINX_VERSION = sphinx.version_info[:3]
if SPHINX_VERSION >= (2, 4, 0) and SPHINX_VERSION < (4, 0, 0):

The function sphinx.io.get_filetype was deprecated in Sphinx v2.4.0; it was renamed to sphinx.util.get_filetype instead. Sphinx uses sphinx.deprecation._ModuleWrapper to perform deprecation. Since get_filetype is used in sphinx.io, we need to monkeypatch inside it, hence the _module (a member of the _ModuleWrapper).

    sphinx.io._module.get_filetype = _get_filetype
elif SPHINX_VERSION >= (4, 0, 0) and SPHINX_VERSION < (5, 0, 0):

In these versions, get_filetype is used in sphinx.io. It’s no longer deprecated, but removed; therefore, a direct monkeypatch works.

    sphinx.io.get_filetype = _get_filetype
else:

In current Sphinx, get_filetype is used in several places:

    sphinx.builders.get_filetype = _get_filetype

Current Sphinx (7.2.6) doesn’t need this.

    sphinx.io.get_filetype = _get_filetype
    sphinx.transforms.i18n.get_filetype = _get_filetype
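Why the patch targets the using module can be shown with two toy modules built at run time. Everything here is hypothetical: toy_util plays the role of the defining module and toy_io the role of a module that imported the function by name.

```python
import types

# Stand-in for the defining module.
toy_util = types.ModuleType("toy_util")
exec("def get_filetype(name):\n    return 'original'\n", toy_util.__dict__)

# Stand-in for a module that uses the function.
toy_io = types.ModuleType("toy_io")
toy_io.get_filetype = toy_util.get_filetype  # like `from toy_util import get_filetype`
exec("def read(name):\n    return get_filetype(name)\n", toy_io.__dict__)


def patched(name):
    return "patched"


# Patching only the defining module leaves the user's own reference untouched.
toy_util.get_filetype = patched
before = toy_io.read("foo.c")

# Patching the name inside the using module takes effect.
toy_io.get_filetype = patched
after = toy_io.read("foo.c")
```

Here before is still "original" and after is "patched", since toy_io.read looks the name up in its own module namespace.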
 
 

Correct naming for the “show source” option

The following function corrects the extension of source files in the “source” link. By default, Sphinx (in sphinx.builders.html.StandaloneHTMLBuilder.get_doc_context) creates a sourcename by appending a file’s extension to the value returned by doc2path. For non-source files, doc2path’s return value contains no extension, so this works fine. However, for source files, doc2path’s return value contains an extension, so that appending the extension to source files produces a doubled extension – .py.py, for example.

def _html_page_context(

See app.

    app,

The canonical name of the page being rendered, that is, without the .html suffix and using slashes as path separators.

    pagename,

The name of the template to render; this will be ‘page.html’ for all pages from reST documents.

    templatename,

A dictionary of values that are given to the template engine to render the page and can be modified to include custom values. Keys must be strings.

    context,

A doctree when the page is created from a reST document; None when the page is created from an HTML template alone.

    doctree,
):
    sourcename = context.get("sourcename")
    ext = Path(pagename).suffix

The extension Sphinx uses optionally includes the html_sourcelink_suffix.

    sphinx_ext = ext + (
        ""
        if ext == app.config.html_sourcelink_suffix
        else app.config.html_sourcelink_suffix
    )
    double_ext = ext + sphinx_ext

Only provide the rename if necessary.

    if sourcename and ext and sourcename.endswith(double_ext):

Take off the second of the double extensions.

        context["sourcename"] = sourcename[: -len(double_ext)] + sphinx_ext
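The renaming logic can be checked as a pure function. In this sketch, fix_sourcename is a hypothetical name, and the default of ".txt" for the source link suffix is an assumption standing in for app.config.html_sourcelink_suffix.

```python
from pathlib import Path


def fix_sourcename(pagename, sourcename, sourcelink_suffix=".txt"):
    ext = Path(pagename).suffix
    # The suffix Sphinx appends: the extension, plus the link suffix unless identical.
    sphinx_ext = ext + ("" if ext == sourcelink_suffix else sourcelink_suffix)
    double_ext = ext + sphinx_ext
    if sourcename and ext and sourcename.endswith(double_ext):
        # Replace the doubled extension with a single one.
        return sourcename[: -len(double_ext)] + sphinx_ext
    return sourcename
```

A page named foo.py with the source name foo.py.py.txt becomes foo.py.txt, while index.rst.txt for the page index is left alone because the page name has no extension.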
 
 

Extension setup

This routine defines the entry point called by Sphinx to initialize this extension.

def setup(

See app.

    app,
):

Use require_sphinx to ensure that Sphinx is new enough.

    app.require_sphinx("2.0")
 

Use the source-read event hook to transform source code to reST before Sphinx processes it.

    app.connect("source-read", _source_read)
 

Add the CodeChat.css style sheet using add_css_file.

    app.add_css_file("CodeChat.css")
 

Add the CodeChat_lexer_for_glob config value. See add_config_value.

    app.add_config_value("CodeChat_lexer_for_glob", {}, "html")
 

Use the html-page-context event to correct the extension of source files.

    app.connect("html-page-context", _html_page_context)
 

An ugly hack: we need to get to the Config object after conf.py’s values have been loaded. They aren’t loaded yet, so we store the config object to access it later when it is loaded.

    global _config
    _config = app.config
 
    return {"version": __version__, "parallel_read_safe": True}
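A project would typically enable this extension from its conf.py. The fragment below is a sketch under assumptions: the import path CodeChat.CodeToRestSphinx is inferred from this file’s name and may differ in an installed package, and the glob and lexer alias are illustrative values only.

```python
# conf.py (sketch)

# Assumed import path of this module; adjust to match the installed package.
extensions = ["CodeChat.CodeToRestSphinx"]

# Optional: map file globs to lexer aliases (illustrative values).
CodeChat_lexer_for_glob = {"*.ino": "cpp"}
```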