Very similar localization strings

During the translation work I found several very similar strings with only slight differences. Most differ only in form (e.g. letter case or punctuation), but some differ in wording while meaning the same thing (e.g. "Recent activity" and "Latest activity"). This redundancy also inflates the total number of strings to an enormous level (currently more than half a million). Eventually we will need to find a programmatic method to rationalize these “quasi-duplicates”. As a first step, I started collecting them in one place to get a better overview.
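A programmatic first pass could be as simple as normalizing every source string and grouping the strings whose normalized forms collide. The Python sketch below only illustrates that idea. The normalization rules (case folding, stripping punctuation, collapsing whitespace) and the helper names `normalize` and `group_quasi_duplicates` are assumptions and are not part of l10n_server.

```python
import re
import string
from collections import defaultdict

# Assumed "noise" characters: ASCII punctuation plus typographic quotes and the
# ellipsis, roughly mirroring the Punctuation category of the legend.
_PUNCT = re.compile("[" + re.escape(string.punctuation + "’‘…") + "]")
_SPACE = re.compile(r"\s+")


def normalize(s: str) -> str:
    """Return a comparison key: case-folded, punctuation removed, whitespace collapsed."""
    s = s.casefold()
    s = _PUNCT.sub("", s)
    return _SPACE.sub(" ", s).strip()


def group_quasi_duplicates(strings):
    """Group strings that share a normalized key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for s in strings:
        groups[normalize(s)].append(s)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    sample = ["Save configuration", "Save Configuration", "save configuration.",
              "Recent activity", "Latest activity"]
    print(group_quasi_duplicates(sample))
```

With the sample list, the three "Save configuration" variants form one group. "Recent activity" and "Latest activity" stay separate, because no amount of normalization recognizes a difference in wording. Such pairs still need human review.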

You are warmly invited to submit newly found pairs of strings via the Form, or to browse the pairs already collected in the Spreadsheet.

Legend of Categorization

To get an overview of the differences, I sorted these couple of hundred string pairs into the following categories:

| Main category | Sub-category | Description |
|---|---|---|
| Identical | | The strings seem to be exactly the same. Could a bug in the l10n_server module be the reason they exist in parallel? |
| Different case | lowercase / Uppercase | One-word String 'A' has a lowercase initial, but String 'B' starts with an uppercase letter. |
| Different case | lowercase / Sentence case | All words of multi-word String 'A' consist only of lowercase letters, but the first word of String 'B' starts with an uppercase initial. |
| Different case | Sentence case / Title Case | Only the first word of String 'A' has an uppercase initial, but every word of String 'B' starts with a capital letter. |
| Acronym | Uppercase / CAPITALIZED | A misspelled acronym (the shortened form of a longer expression, made from the initials of its words). |
| Punctuation | | String 'B' has an extra character, usually one of the following: comma (,), period (.), colon (:), dash (-), underscore (_), asterisk (*), ellipsis (...), apostrophe ('), single quote (’), etc. |
| Whitespace | Compound word divided | A word that should be written as one is split into parts by a space. |
| Content | Same meaning, same length | String 'A' and String 'B' carry the same meaning and have the same number of characters. |
| Content | Same meaning, different length | String 'A' and String 'B' mean (nearly) the same, but one of them is longer in number of characters. |
| Plurals | | String 'B' is the plural form of String 'A'. |
| Variable name | | String 'A' and String 'B' are identical or mean the same, but contain different variable names. |
| Formatting | | String 'A' and String 'B' are identical or mean the same, but one of them is HTML-formatted. |

Generally, the left column (String 'A') holds the preferred form, and the right column (String 'B') is considered incorrect.
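Several of these categories could be detected mechanically. The following Python sketch is a rough heuristic and not an existing l10n_server feature. The function name `classify`, the assumed Drupal-style placeholder syntax (`@name`, `%name`, `!name`) and the naive English plural rules are all illustrative assumptions. The function tests the categories in a fixed order and leaves content-level differences for manual review.

```python
import re

# Hypothetical heuristics for the mechanically detectable categories of the legend.
_TAG = re.compile(r"<[^>]+>")                 # crude HTML tag matcher
_PLACEHOLDER = re.compile(r"[@%!]\w+")        # assumed Drupal-style @name / %name / !name
_PUNCT = str.maketrans("", "", ",.:-_*'’…")   # characters listed in the Punctuation row


def classify(a: str, b: str) -> str:
    """Return a legend category for a pair (String 'A' preferred, String 'B' suspect)."""
    if a == b:
        return "Identical"
    if a.lower() == b.lower():
        if len(a.split()) == 1:
            # Single word: an all-caps B suggests an acronym such as "Url" / "URL".
            return "Acronym" if b.isupper() else "Different case: lowercase / Uppercase"
        if a.islower():
            return "Different case: lowercase / Sentence case"
        return "Different case: Sentence case / Title Case"
    if a.replace(" ", "") == b.replace(" ", ""):
        return "Whitespace: Compound word divided"
    if a.translate(_PUNCT).strip() == b.translate(_PUNCT).strip():
        return "Punctuation"
    if _TAG.sub("", a) == _TAG.sub("", b):
        return "Formatting"
    if _PLACEHOLDER.sub("@var", a) == _PLACEHOLDER.sub("@var", b):
        return "Variable name"
    # Naive English plural rules, applied to the end of the whole string only.
    if b in (a + "s", a + "es") or (a.endswith("y") and b == a[:-1] + "ies"):
        return "Plurals"
    return "Content (needs manual review)"


if __name__ == "__main__":
    for pair in [("Url", "URL"), ("Login", "Log in"), ("Settings", "Settings:"),
                 ("Content type", "Content types"), ("Recent activity", "Latest activity")]:
        print(pair, "->", classify(*pair))
```

Each rule only fires when the pair differs in exactly one way. A pair that differs in both case and punctuation (e.g. "save configuration" / "Save configuration.") falls through to manual review. Treat the result as a hint for sorting pairs into the spreadsheet, not as a verdict.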

Related issues

  • #194141: Implement msgmerge type fuzzy matching
  • #563228: Suggestion for similar strings
  • #371056: Duplicate translations verifier should be case sensitive
  • #691790: Advanced search: Case sensitive search
  • #1791612: Logorrhea killer
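Issues #194141 and #563228 point toward suggesting similar strings automatically. The following sketch is a rough illustration only. It uses the similarity ratio from Python's standard `difflib` module, which is not the msgmerge fuzzy-matching algorithm that #194141 refers to. The function name `suggest_similar` and the 0.8 cutoff are assumptions.

```python
import difflib


def suggest_similar(new: str, existing: list[str], n: int = 3,
                    cutoff: float = 0.8) -> list[str]:
    """Return up to n existing strings that closely resemble `new`.

    Comparison is case-insensitive, so case-only variants score 1.0.
    """
    # Map each case-folded key back to the original strings that produce it.
    folded: dict[str, list[str]] = {}
    for s in existing:
        folded.setdefault(s.casefold(), []).append(s)
    # get_close_matches returns the best-scoring keys first.
    matches = difflib.get_close_matches(new.casefold(), list(folded), n=n, cutoff=cutoff)
    return [orig for key in matches for orig in folded[key]][:n]


if __name__ == "__main__":
    known = ["Save Configuration", "Delete content", "Recent activity"]
    print(suggest_similar("Save configuration", known))
```

Because the comparison is case-folded, case-only variants always come first. Pairs that differ in wording, such as "Recent activity" / "Latest activity", score only about 0.73 and fall below this cutoff. They still have to be collected by hand.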