From: Ben Schmidt
Date: Tue, 28 Feb 2012 13:46:35 +0000 (+1100)
Subject: Add wrapping modes to facilitate wrapping non-English texts.
X-Git-Tag: RELEASE_1_2_18a1~13
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=dbd4145109593b4628800a8a227c7daa7714792d;p=thirdparty%2Fmlmmj.git

Add wrapping modes to facilitate wrapping non-English texts.

- Add %wordwrap%, %charwrap% and %userwrap% line-breaking modes.
- \ now means a non-breakable space, not a break opportunity.
- Introduce \/ to mark a break opportunity.
- Introduce \= to inhibit a break.
---

diff --git a/ChangeLog b/ChangeLog
index 55e7434b..5984229f 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,4 @@
+ o Add different wrapping modes to facilitate wrapping many languages
  o Fix backslash escaping mechanism so double backslash can't effectively
    recurse and form part of another escape sequence, other non-unicode escapes
    aren't ignored, and first lines of included files don't 'escape' escaping.
@@ -9,10 +10,9 @@
  o Make mlmmj-sub and +subscribe[-digest|-nomail] switch existing
    subscriptions.
  o Add a switch to bypass notifying the owner on subscribe/unsubscribe.
- o Introduce \ to indicate line-break positions to enable sensible
-   wrapping of Chinese and similar text.
- o Allow lines to be longer than the wrapping width if there are no spaces,
-   as generated email addresses (e.g. for moderation) won't work if split.
+ o Introduce \ to indicate non-breakable space, \= to mark other
+   locations where breaks should not occur, and \/ to mark locations where
+   breaks can occur
  o Add rejection of posts and obstruction of subscriptions.
  o Avoid bogus error messages when logging that the list address has been found
    in To: or CC: headers.
diff --git a/README.listtexts b/README.listtexts
index 84f872bf..f1e7b2e5 100644
--- a/README.listtexts
+++ b/README.listtexts
@@ -15,8 +15,11 @@ This file documents the following aspects of list texts:
 - Supported list texts
 - Format
 - Conditionals
-- Formatting and formatted substitutions
+- Wrapping
+- Formatting and comments
+- Formatted substitutions
 - Unformatted substitutions
+- Escapes
 
 Naming scheme
 -------------
@@ -240,21 +243,75 @@ need to worry about it.
 Note that when multiple parameters can be given for the directives, these have
 'or' behaviour; to get 'and' behaviour, nest conditionals.
 
-Formatting and formatted substitutions
---------------------------------------
+Wrapping
+--------
 
-These formatting-related directives work with multiple lines, so are generally
-not appropriate for use in headers. They are:
+There are various directives available to assist with wrapping and formatting.
+Wrapping needs to be enabled for each paragraph with:
 
 - %wrap%
 - %wrap W%
-  lines until the next blank line are concatenated and are then rewrapped to a
-  width of W (or 76 if W is omitted); lines have whitespace trimmed before
-  being joined with a single space; lines are broken at spaces or at points
-  marked for breaking with \; the width is reckoned including any text
+  concatenate and rewrap lines until the next blank line to a width of W (or 76
+  if W is omitted); second and later lines are preceded with as many spaces as
+  the width preceding the directive; the width is reckoned including any text
   preceding the directive and any indentation preserved from a file which
-  included the current one, so it is an absolute maximum width; it is measured
-  in bytes
+  included the current one, so it is an absolute maximum width
+
+To cater for various languages, there are a number of different wrapping modes
+that can be set. These can be set either before or after wrapping is specified,
+and can even be changed part way through a paragraph if desired. The following
+directives control them:
+
+- %wordwrap%
+- %ww%
+  use word-wrapping (this is the default; good for English, French, Greek and
+  other languages that use an alphabet and spaces between words); lines have
+  whitespace trimmed from both ends and are joined with a single space; lines
+  are broken at spaces or at points marked for breaking with \/, but not at
+  spaces escaped with a backslash
+
+- %charwrap%
+- %cw%
+  use character-wrapping (good for Chinese, Japanese and Korean which use
+  characters without spaces between words); lines have only leading whitespace
+  trimmed and are joined without inserting anything at the joint; lines are
+  broken at space or any non-ASCII character except where disallowed with \=
+
+- %userwrap%
+- %uw%
+  use user-wrapping (for more complex languages or wherever complete manual
+  control is desired); lines have only leading whitespace trimmed and are
+  joined without inserting anything at the joint; lines are broken only where
+  marked for breaking with \/
+
+If a line with any of the directives in this section, after processing,
+contains only whitespace, the line does not appear at all in the output (the
+newline and any whitespace is omitted).
+
+Formatting and comments
+-----------------------
+
+The following directives are available to assist with formatting and
+readability:
+
+- %^%
+  start the line here; anything preceding this directive is ignored (useful for
+  using indentation for readability without ruining the formatting of the text
+  when it is processed)
+
+- %comment%
+- %$%
+  end the line here; anything following this directive is ignored
+
+If a line with any of these directives, after processing, contains only
+whitespace, the line does not appear at all in the output (the newline and any
+whitespace is omitted).
+
+Formatted substitutions
+-----------------------
+
+These formatted substitutions work with multiple lines, so are generally not
+appropriate for use in headers. They are:
 
 - %text T%
   text from the file named T in the listdir/text directory; the name may only
@@ -303,27 +360,12 @@ not appropriate for use in headers. They are:
   the list of indexes of messages which may not have been received as they
   bounced
 
-- %^%
-  start the line here; anything preceding this directive is ignored (useful for
-  using indentation for readability without ruining the formatting of the text
-  when it is processed)
-
-- %comment%
-- %$%
-  end the line here; anything following this directive is ignored
-
-- %%
-  a single %
-
 Directives which include a list of items have the behaviour that each item is
 preceded and followed by the same text as preceded and followed the directive
-on its line. Only one such directive is supported per line.
-
-The %wrap% and %wrap W% directives, as well as those which include a block of
-text, have the behaviour that second and later lines are preceded with as many
-spaces as there were characters preceding the directive. Apart from the
-%wrap% and %wrap W% directives, any text following the directive on the same
-line is omitted.
+on its line; only one such directive is supported per line. Those which include
+a block of text have the behaviour that second and later lines are preceded
+with as many spaces as there were bytes preceding the directive; any text
+following such directives on the same line is omitted.
 
 If a line with any of these directives, after processing, contains only
 whitespace, the line does not appear at all in the output (the newline and any
@@ -332,6 +374,8 @@ whitespace is omitted).
 Unformatted substitutions
 -------------------------
 
+Unformatted substitutions that are available are:
+
 - $bouncenumbers$
   (available only in probe)
   the formatted list of indexes of messages which may not have been received as
@@ -494,18 +538,35 @@ Unformatted substitutions
   newline stripped; the name may only include letters, digits, underscore, dot
   and hyphen; note that there is a formatted version of this directive
 
+Escapes
+-------
+
+These allow you to avoid special meanings of characters used for other purposes
+in list texts, as well as control the construction of the texts at a fairly low
+level.
+
 - $$
   a single $
 
+- %%
+  a single %
+
+- \\
+  a single \
+
 - \uNNNN
-  (NNNN are hex digits)
+  (NNNN represents four hex digits)
   a Unicode character (this is not really appropriate for use in a header,
   except perhaps the Subject: header as Mlmmj does automatic quoting for that
   header as described above)
 
 - \ 
+  a space, but don't allow the line to be broken here when wrapping
+
+- \/
   nothing, but allow the line to be broken here when wrapping
 
-- \\
-  a single \
+- \=
+  nothing, but don't allow the line to be broken here when wrapping
+
 
diff --git a/src/prepstdreply.c b/src/prepstdreply.c
index 9eeed326..53da1cf5 100644
--- a/src/prepstdreply.c
+++ b/src/prepstdreply.c
@@ -98,6 +98,13 @@ enum conditional_target {
 };
 
 
+enum wrap_mode {
+	WRAP_WORD,
+	WRAP_CHAR,
+	WRAP_USER
+};
+
+
 struct text {
 	char *action;
 	char *reason;
@@ -108,6 +115,7 @@ struct text {
 	formatted *fmts;
 	int wrapindent;
 	int wrapwidth;
+	enum wrap_mode wrapmode;
 	conditional *cond;
 	conditional *skip;
 };
@@ -458,6 +466,7 @@ text *open_text_file(const char *listdir, const char *filename)
 	txt->fmts = NULL;
 	txt->wrapindent = 0;
 	txt->wrapwidth = 0;
+	txt->wrapmode = WRAP_WORD;
 	txt->cond = NULL;
 	txt->skip = NULL;
 
@@ -916,6 +925,20 @@ static int handle_directive(text *txt, char **line_p, char **pos_p,
 			*line_p = line;
 			return 0;
 		}
+	} else if(strcmp(token, "ww") == 0 ||
+			strcmp(token, "wordwrap") == 0 ||
+			strcmp(token, "cw") == 0 ||
+			strcmp(token, "charwrap") == 0 ||
+			strcmp(token, "uw") == 0 ||
+			strcmp(token, "userwrap") == 0) {
+		if (*token == 'w') txt->wrapmode = WRAP_WORD;
+		if (*token == 'c') txt->wrapmode = WRAP_CHAR;
+		if (*token == 'u') txt->wrapmode = WRAP_USER;
+		line = concatstr(2, line, endpos + 1);
+		*pos_p = line + (*pos_p - *line_p);
+		myfree(*line_p);
+		*line_p = line;
+		return 0;
 	} else if(strncmp(token, "control ", 8) == 0) {
 		token = filename_token(token + 8);
 		if (token != NULL) {
@@ -990,8 +1013,8 @@ char *get_processed_text_line(text *txt, int headers,
 	char *tmp;
 	char *prev = NULL;
 	int len, i;
-	int directive;
 	int incision, spc;
+	int directive, inhibitbreak;
 	int peeking = 0; /* for a failed conditional without an else */
 	int skipwhite; /* skip whitespace after a conditional directive */
 	int swallow;
@@ -1047,8 +1070,11 @@ char *get_processed_text_line(text *txt, int headers,
 			/* Wrapping */
 			len = strlen(prev);
 			pos = prev + len - 1;
-			while (pos > prev && (*pos == ' ' || *pos == '\t'))
-				pos--;
+			if (txt->wrapmode == WRAP_WORD) {
+				while (pos > prev &&
+						(*pos == ' ' || *pos == '\t'))
+					pos--;
+			}
 			pos++;
 			*pos = '\0';
 			len = pos - prev;
@@ -1071,8 +1097,12 @@ char *get_processed_text_line(text *txt, int headers,
 			if (*prev == '\0') {
 				tmp = mystrdup(pos);
 			} else {
-				tmp = concatstr(3, prev, " ", pos);
-				len++;
+				if (txt->wrapmode == WRAP_WORD) {
+					tmp = concatstr(3, prev, " ", pos);
+					len++;
+				} else {
+					tmp = concatstr(2, prev, pos);
+				}
 			}
 			myfree(line);
 			line = tmp;
@@ -1096,9 +1126,13 @@ char *get_processed_text_line(text *txt, int headers,
 			incision = -1;
 		}
 		directive = 0;
+		inhibitbreak = 0;
 		while (*pos != '\0') {
 			if (txt->wrapwidth != 0 && len >= txt->wrapwidth &&
 					!peeking && spc != -1) break;
+			if ((unsigned char)*pos > 0xbf && txt->skip == NULL &&
+					txt->wrapmode == WRAP_CHAR &&
+					!inhibitbreak) spc = len - 1;
 			if (*pos == '\r') {
 				*pos = '\0';
 				pos++;
@@ -1113,23 +1147,35 @@ char *get_processed_text_line(text *txt, int headers,
 				txt->src->upcoming = mystrdup(pos);
 				break;
 			} else if (*pos == ' ') {
-				if (txt->skip == NULL) {
-					spc = pos - line;
-				}
+				if (txt->skip == NULL &&
+						txt->wrapmode != WRAP_USER &&
+						!inhibitbreak) spc = len;
+				inhibitbreak = 0;
 			} else if (*pos == '\t') {
 				/* Avoid breaking due to peeking */
+				inhibitbreak = 0;
 			} else if (txt->src->transparent) {
 				/* Do nothing if the file is to be included
 				 * transparently */
 				if (peeking && txt->skip == NULL) break;
+				inhibitbreak = 0;
 			} else if (*pos == '\\' && txt->skip == NULL) {
 				if (peeking) break;
-				if (*(pos + 1) == ' ') {
+				if (*(pos + 1) == '/') {
 					spc = len - 1;
 					tmp = pos + 2;
+					inhibitbreak = 0;
+				} else if (*(pos + 1) == '=') {
+					tmp = pos + 2;
+					/* Ensure we don't wrap the next
+					 * character */
+					inhibitbreak = 1;
 				} else {
-					/* Includes backslash */
+					/* Includes space and backslash */
 					tmp = pos + 1;
+					/* Ensure we don't wrap a space */
+					if (*(pos+1) == ' ') inhibitbreak = 1;
+					else inhibitbreak = 0;
 				}
 				*pos = '\0';
 				tmp = concatstr(2, line, tmp);
@@ -1143,6 +1189,10 @@ char *get_processed_text_line(text *txt, int headers,
 				substitute_one(&line, &pos, listaddr,
 						listdelim, listdir, txt);
 				if (len != pos - line) {
+					/* Cancel any break inhibition if the
+					 * length changed (which will be
+					 * because of $$) */
+					inhibitbreak = 0;
 					len = pos - line;
 				}
 				skipwhite = 0;
@@ -1175,6 +1225,11 @@ char *get_processed_text_line(text *txt, int headers,
 					}
 				}
 				if (len != pos - line) {
+					/* Cancel any break inhibition if the
+					 * length changed (which will be
+					 * because of %% or %^% or an empty
+					 * list) */
+					inhibitbreak = 0;
 					len = pos - line;
 				}
 				/* handle_directive() sets up for the next
@@ -1217,7 +1272,8 @@ char *get_processed_text_line(text *txt, int headers,
 				continue;
 			}
 			if (spc != -1) {
-				if (txt->wrapmode == WRAP_WORD &&
+				if (line[spc] == ' ') line[spc] = '\0';
+				if (txt->wrapmode == WRAP_WORD &&
+						line[spc] == ' ') line[spc] = '\0';
 				spc++;
 				if (line[spc] == '\0') spc = -1;
 			}