RestToCode.py - a module to translate reST to source code¶
Imports¶
These are listed in the order prescribed by PEP 8.
Third-party imports¶
For the docutils default style sheet and template
Local application imports¶
Supporting Functions¶
This section covers the functions that support the two main functions, rest_to_code_string and rest_to_code_file.
find_file_ext: Find the file extension needed, given the language name
See lang.
find_lexer_class operates on the lexer's name, which is different from its alias. The name is an attribute of every lexer class.
This grabs the first filename.
Use only the .py part of *.py and similar.
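A minimal sketch of the extension step described above, assuming the filename patterns have already been read from a lexer class. The function name extension_from_patterns and the sample list are hypothetical; they stand in for the real lookup and are not part of this module.

```python
import os


def extension_from_patterns(patterns):
    """Return the extension of the first filename pattern, e.g. '.py' for '*.py'."""
    if not patterns:
        raise ValueError("no filename patterns available")
    # Grab the first pattern only, as the text above describes.
    first = patterns[0]
    # os.path.splitext('*.py') yields ('*', '.py'); keep the extension part.
    _, ext = os.path.splitext(first)
    return ext


# Hypothetical sample resembling a lexer's list of filename patterns.
sample_patterns = ["*.py", "*.pyw"]
print(extension_from_patterns(sample_patterns))
```

Only the first pattern is consulted, so a language with several extensions always maps to the one listed first.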
language_comment_type: Allows the use of languages that have only inline comments or only block comments. Checks to make sure the comment type is available and returns that information.
('#', '"""', '"""') for the key 'Python'
('//', '/*', '*/') for the key 'C'
If the language supports inline comments, index zero will have a sequence containing a non-empty string as its first element. If the language supports block comments, index one will have a non-empty string.
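The check can be sketched as follows. This is a simplified illustration, not the module's implementation: the COMMENT_DELIMITERS table is a hypothetical excerpt (the real data lives in CommentDelimiterInfo.py), the 'CSS' entry is an assumed example of a block-only language, and the function returns plain Booleans rather than the structure described above.

```python
# Hypothetical excerpt of a delimiter table:
# (inline delimiter, block opening delimiter, block closing delimiter).
COMMENT_DELIMITERS = {
    "Python": ("#", '"""', '"""'),
    "C": ("//", "/*", "*/"),
    # Assumed entry for a language that has block comments only.
    "CSS": ("", "/*", "*/"),
}


def supported_comment_types(lang):
    """Return (has_inline, has_block) for the given language key."""
    inline, block_open, block_close = COMMENT_DELIMITERS[lang]
    # An empty string means that kind of comment is unavailable.
    has_inline = bool(inline)
    has_block = bool(block_open and block_close)
    return has_inline, has_block
```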
formulate_comment: Tells the program whether to make a block comment or an inline comment. A run of comments currently needs 10,000 consecutive lines before it is turned into a block comment. Block comments are also used if the language has no inline comments, and inline comments are also used if the language has no block comments.
line: This is a string of reST that will be turned into a comment by placing the correct comment delimiters in the correct places.
See lang.
is_block_comment: This is a Boolean value that tells the program whether to make line into an inline comment or a block comment. This might be overruled if the language does not support the wanted type of comment.
position: This variable is an integer, and when paired with line_counter, it allows block comments to be reformatted. The integer starts at the same value as line_counter and is decremented by one for each line that is written to the string. Once it reaches 0, the end comment delimiter is placed at the end of the line.
line_counter: This variable is an integer. It is the number of lines that exist in this block comment. It remains constant as position decrements. It is also used to check to see if there are enough lines to be considered a block comment.
Grab the comment delimiters for the given language
Check to see what kinds of comments the language supports
Create an inline comment
Stops the whole program if there is no inline comment delimiter and an inline comment is requested.
Formulates an inline comment
Create a block comment
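The fallback between comment kinds can be sketched as a small decision function. This is an assumption-laden illustration: choose_comment_style and its parameters are hypothetical names, and the sketch models only the fallback rule (use the other kind when the wanted kind is unsupported), not the point where the program stops.

```python
def choose_comment_style(wants_block, has_inline, has_block):
    """Pick 'block' or 'inline', falling back when the wanted kind is missing."""
    if not has_inline and not has_block:
        raise ValueError("language has no comment delimiters")
    if wants_block:
        # Honor the request when possible, otherwise fall back to inline.
        return "block" if has_block else "inline"
    # An inline comment was requested; fall back to block if needed.
    return "inline" if has_inline else "block"
```

For example, a request for an inline comment in a block-only language yields "block", which matches the single-line block comment case handled by formulate_block_comment.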
formulate_block_comment: Formulates a block comment one line at a time.
See line.
See comment_delimiters.
See position.
See line_counter.
This covers the case that the language does not support inline comments. It places block comment delimiters around a single line to give an inline effect. There is no added space between the end of the comment and the end delimiter because this space is not taken out in the Code to reST translation. If a space is added, it is no longer round trip stable. (It adds a space every time it is translated.)
This is the regular block comment case.
It places the open comment delimiter in front of the first line of the comment.
It places a ' * ' in front of every other line, including the last line in the comment, for consistency and visual appeal.
It places the closing comment delimiter at the end of the last line of the comment.
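The countdown with position and line_counter can be sketched on a whole list of lines at once. This is a hedged illustration rather than the module's code: format_block_comment is a hypothetical name, the C-style default delimiters are chosen for the example, and the single space after the opening delimiter is an assumption.

```python
def format_block_comment(lines, open_delim="/*", close_delim="*/"):
    """Wrap a list of comment lines in block comment delimiters."""
    line_counter = len(lines)  # Stays constant for the whole comment.
    position = line_counter    # Counts down as lines are written.
    out = []
    for line in lines:
        # The first line gets the opening delimiter, every other line ' * '.
        if position == line_counter:
            prefix = open_delim + " "
        else:
            prefix = " * "
        position -= 1
        # Once position reaches 0, this is the last line: close the comment
        # with no added space, which keeps the round trip stable.
        suffix = close_delim if position == 0 else ""
        out.append(prefix + line + suffix)
    return "\n".join(out)


print(format_block_comment(["first", "second", "third"]))
```

A one-line list takes both branches at once, so it is wrapped in the opening and closing delimiters on a single line.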
Core Functions¶
This section contains the two main functions, rest_to_code_string and rest_to_code_file.
rest_to_code_file: This function uses rest_to_code_string to convert a reST file into another language. Inputs a reST file, outputs a code file.
lang: Specify the language that the reST will be translated into. This is the key to the dictionary found in CommentDelimiterInfo.py - Info on comment delimiters for many languages. Ex: 'Python' or 'C'
source_rst_path: Path to a source reST file to process.
out_path: Path to a destination code file to create. It will be overwritten if it already exists. If out_path is None, the output path will be the source_rst_path but with the correct file extension for the given lang.
input_encoding: Encoding to use for the input file. The default of None detects the encoding of the input file.
output_encoding: Encoding to use for the output file.
Find the file extension of the given language.
Obtain the file path without the .rst extension.
Place the file extension on the file path.
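The default output path can be derived in two steps, sketched below with the standard library. The helper name derive_out_path is hypothetical, and the extension is assumed to arrive with its leading dot.

```python
import os


def derive_out_path(source_path, ext):
    """Swap the extension of source_path for ext (e.g. '.py')."""
    # Obtain the file path without its current extension.
    base, _ = os.path.splitext(source_path)
    # Place the new extension on the file path.
    return base + ext


print(derive_out_path("docs/example.rst", ".py"))
```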
Use docutils' I/O classes to better handle and sniff encodings.
Note: both these classes automatically close themselves after a read or write.
Gather the entire file into a single string for easy parsing.
Convert the string of reST to code.
Write the code to the output file.
Remove the CodeChat style from the beginning of the given reST if it’s present.
If the reST begins with the CodeChat style,
snip it off.
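Removing a known prefix is a short check-and-slice, sketched below. The helper name and the placeholder prefix are hypothetical; the actual CodeChat style text is not reproduced here.

```python
def strip_leading_style(rest_str, style_prefix):
    """Return rest_str without style_prefix if it starts with that prefix."""
    if rest_str.startswith(style_prefix):
        # Snip off exactly the prefix and keep everything after it.
        return rest_str[len(style_prefix):]
    return rest_str


# Placeholder standing in for the real style block.
PLACEHOLDER_STYLE = ".. placeholder-style::\n\n"
print(strip_leading_style(PLACEHOLDER_STYLE + "Some text.\n", PLACEHOLDER_STYLE))
```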
rest_to_code_string: Takes a string of reST as input and returns a string of code. The string is separated into lines and fed through the conversion one line at a time.
rest_str: The string of reST that will get converted into code. This string is generally multiple lines. The program separates and processes all the lines it is given.
See lang.
This replaces all tabs with four spaces. This is put into place to maintain a consistent translation and promote healthy habits.
Split the reST string into lines. These are compiled into the line_list.
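These two preparation steps can be sketched as follows. The helper name prepare_lines is hypothetical, and splitting on '\n' is an assumption about how the lines are separated.

```python
def prepare_lines(rest_str):
    """Replace tabs with four spaces, then split the text into lines."""
    # Every tab becomes exactly four spaces, regardless of column.
    normalized = rest_str.replace("\t", "    ")
    return normalized.split("\n")


line_list = prepare_lines("if x:\n\treturn 1")
print(line_list)
```

Note that str.replace differs from str.expandtabs, which pads to the next tab stop instead of always inserting four spaces.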
While there are still lines left, convert them.
This try/except pair is put in place to catch unexpected input. If the try doesn’t work, it checks to see if it is even valid reST input.
This is translation for regular code, not comments
Makes sure that the lines that are supposed to be there are there.
See Boolean.
Take the front space off
Add the line of code to the output string.
Skips over the added code, not including the setline part of the code. Makes sure that the lines that are supposed to be there are there.
See Boolean.
This exception catches the case that the file ends with code rather than a comment.
Makes sure that the lines that are supposed to be there are there.
See Boolean.
This is to find the <div> comments and turn them into comments.
Makes sure that the lines that are supposed to be there are there.
See Boolean.
Used to control the line number of the document
Splits the line into ' <div style="margin-left' and '{size}em;">'
Splits '{size}em;">' into '{size}' and 'm;">'
Turns '{size}' into a number of divs. Ex. 1.0 or 3.5
Gets the number of spaces needed to put the comment(s) back where it was. Also, needs to be int() in order to work in the for loop. size starts out as a float.
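The size extraction can be sketched with a regular expression in place of the successive splits. Several things here are assumptions made for illustration: the helper name indent_spaces, the exact shape of the div line, and the em_per_space conversion factor of 0.5, which the text above does not state.

```python
import re

# Assumed shape of the indentation marker: <div style="margin-left:3.5em;">
DIV_RE = re.compile(r'<div style="margin-left:\s*(\d+(?:\.\d+)?)em;">')


def indent_spaces(line, em_per_space=0.5):
    """Return the number of leading spaces encoded in a div line, or None."""
    match = DIV_RE.search(line)
    if match is None:
        return None
    # size starts out as a float, e.g. 1.0 or 3.5.
    size = float(match.group(1))
    # int() is needed so the result can drive a loop or string repetition.
    return int(size / em_per_space)


spaces = indent_spaces(' <div style="margin-left:3.5em;">')
print(" " * spaces + "# comment restored at its original indent")
```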
Makes sure that the lines that are supposed to be there are there.
See Boolean.
Skips over the added code including the setline part of the code.
All <div> comments are considered to be inline comments.
Makes sure that the lines that are supposed to be there are there.
See Boolean.
Skips over the added code.
Makes sure that the lines that are supposed to be there are there.
See Boolean.
Check to see how many lines the comment has.
This line sets the lower limit for consecutive comments turning into block comments
Actually formulate the comments.
This catches the case that the string runs out of lines.
This ensures that the while loop does not loop forever due to the next line being None
See Boolean.
See Boolean.
Boolean: If Boolean is set to True, there was a line of invalid reST, so the program returns an error string.
Return an error message.
html_to_code_file: This function uses html_to_code_string to convert an HTML file into another language. Inputs an HTML file, outputs a code file.
See lang.
source_html_path: Path to a source HTML file to process.
See out_path. If out_path is None, the output path will be the source_html_path but with the correct file extension for the given lang.
See input_encoding.
See output_encoding.
Find the file extension of the given language.
Obtain the file path without the source file's extension.
Place the file extension on the file path.
Use docutils' I/O classes to better handle and sniff encodings.
Note: both these classes automatically close themselves after a read or write.
Gather the entire file into a single string for easy parsing.
Convert the string of HTML to code.
Write the code to the output file.
html_to_code_string: Takes a string of HTML as input and returns a string of code. The string is separated into lines and fed through the conversion one line at a time.
html_str: The string of HTML that will get converted into code. This string is generally multiple lines. The program separates and processes all the lines it is given.
See lang.
    lang,
):
    string_out = "\n.. set-line:: -3\n\n..\n\n"

    def traverse_etree(element):
        nonlocal string_out
        if isinstance(element.tag, str):
            tag = element.tag
            if tag == "p":
                string_out += "\n.. set-line:: -3\n\n..\n\n"
                if element.get("id") is not None:
                    s = element.get("id")
                    s = s.replace("-", "_")
                    string_out += ".. _{}:\n\n".format(s)
                if element.text is not None:
                    # element.text starts out as NoneType, so tell the program to convert it to a string.
                    string_out += str(element.text)
            elif tag == "div":
                div(element)
                if element.get("class") == "contents topic":
                    return
            elif tag == "pre":
                s = str(element.text)
                s = s.replace("\n", "\n ")
                s, _ = s.rsplit(" ", 1)
                string_out += "\n.. fenced-code::\n\n Beginning fence\n {} Ending fence\n\n..\n\n".format(
                    s
                )
            elif tag == "a":
                link(element)
            elif tag == "b":
                string_out += "**{}**".format(element.text)
                if element.tail is not None:
                    string_out += str(element.tail)
            elif tag == "i" or tag == "em":
                string_out += "*{}*".format(element.text)
                if element.tail is not None:
                    string_out += str(element.tail)
            elif tag == "span":
                if element.get("class") == "target":
                    string_out += "_`{}`".format(element.text)
                    if element.tail is not None:
                        string_out += str(element.tail)
            elif tag == "tt":
                if element.get("class") == "docutils literal":
                    string_out += "``{}".format(element.text)
        for child in element:
            traverse_etree(child)
        if isinstance(element.tag, str):
            tag = element.tag
            if tag == "p":
                string_out += "\n"
            elif tag == "div":
                div_end(element)
            elif tag == "tt":
                string_out += "``"
                if element.tail is not None:
                    string_out += str(element.tail)

    def div(element):
        nonlocal string_out
        if element.get("style") is not None:
            string_out += '\n.. raw:: html\n\n <div style="{}">\n\n'.format(
                element.get("style")
            )
        elif element.get("class") == "contents topic":
            string_out += "\n.. contents::\n\n"

    def div_end(element):
        nonlocal string_out
        if element.get("style") is not None:
            string_out += "\n\n.. raw:: html\n\n </div>\n\n..\n\n"

    def link(element):
        nonlocal string_out
        if element.get("class") == "reference internal":
            target = element.get("href")
            _, target = target.split("#")
            target = target.replace("-", "_")
            string_out += "`{} <{}_>`_".format(element.text, target)
        elif element.get("class") == "reference external":
            target = element.get("href")
            string_out += "`{} <{}>`_".format(element.text, target)
        if element.tail is not None:
            string_out += str(element.tail)
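The text and tail handling used by traverse_etree can be tried in isolation. The following is a minimal sketch, assuming the standard library's xml.etree.ElementTree (the module itself may use a different etree implementation); inline_to_rest is a hypothetical helper that covers only bold and italic markup.

```python
import xml.etree.ElementTree as ET

# reST markers for the inline tags handled in this sketch.
MARKERS = {"b": "**", "i": "*", "em": "*"}


def inline_to_rest(root):
    """Convert bold/italic children of an element into reST inline markup."""
    parts = []

    def visit(node, is_root):
        marker = MARKERS.get(node.tag, "")
        parts.append(marker)
        # node.text is the text before the first child; it may be None.
        if node.text is not None:
            parts.append(node.text)
        for child in node:
            visit(child, False)
        parts.append(marker)
        # node.tail is the text after the closing tag; skip it for the root.
        if not is_root and node.tail is not None:
            parts.append(node.tail)

    visit(root, True)
    return "".join(parts)


paragraph = ET.fromstring("<p>Hello <b>bold</b> and <i>it</i> text</p>")
print(inline_to_rest(paragraph))
```

Each element contributes its own text before its children and its tail after its closing tag, so the output preserves document order.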
Take out the xml encoding; it messes up the tree
If the style sheets are not in the string, get out of the for loop.
Take out the 2 style sheets; they are not needed
If the style sheets are not in the string, get out of the for loop.
This needs to be preprocessed because there are cases where there are <span>(s) in the middle, preventing the end ticks from being placed in the right spot.
html_str = html_str.replace('<tt class="docutils literal">', '``')
html_str = html_str.replace('</tt>', '``')
Need to write a recursive function so it can catch all the levels of divs.
    string_out = rest_to_code_string(string_out, lang)
    return string_out


    """ # Take out the 2 style sheets; they are not needed
    for i in range(2):
        try:
            html_list = html_str.split('<style type="text/css">', 1)
            html_list2 = html_list[1].split('</style>\n', 1)
            html_str = html_list[0] + html_list2[1]
        # If the style sheets are not in the string, get out of the for loop.
        except:
            break
    has_code_block = True
    string_out = ""
    while has_code_block:
        try:
            html_to_run, html_to_parse = html_str.split('<pre class="code literal-block">\n', 1)
            rst = convert_text(html_to_run, to='rst', format='html')
            code, html_str = html_to_parse.split('</pre>', 1)
            string_out = string_out + rst + code
        except:
            rst = convert_text(html_str, to='rst', format='html')
            string_out = string_out + rst
            has_code_block = False
    #string_out = html_str
    return string_out
    """