Differences

This shows you the differences between two versions of the page.

--- public:nnels:etext:regex [2018/07/11 22:31]
leah.brochu
+++ public:nnels:etext:regex [2024/05/29 18:37]
rachel.osolen
@@ Line 18: / Line 18: @@
       * If you wanted to remove the hyphen from "BB-8" you would enter ''\1\2'' (i.e., the two groups with nothing between them) into the Replace field. Or, if you wanted to change the hyphen to a space, you would enter ''\1 \2'' (i.e., the two groups with a space between them) into the Replace field.
       * Another example: searching for ''(John) (Smith)'' and replacing it with ''\2 \1'' (note the spaces in both the search and replace strings) will produce ''Smith John''.
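The same group-and-backreference idea works in most regex engines, so one way to check a pattern before running it in Word is a quick test in Python's ''re'' module. This is a sketch under assumptions: Python is not part of the workflow on this page, and the ''BB-8'' search pattern below is hypothetical because its Find string is not shown in this excerpt.

```python
import re

# Swap two captured groups: \2 refers to "Smith", \1 refers to "John".
swapped = re.sub(r"(John) (Smith)", r"\2 \1", "John Smith")
print(swapped)  # Smith John

# Hypothetical pattern for "BB-8": two capital letters, a hyphen, then digits.
pattern = r"([A-Z]{2})-([0-9]+)"
joined = re.sub(pattern, r"\1\2", "BB-8")   # the two groups with nothing between them
spaced = re.sub(pattern, r"\1 \2", "BB-8")  # the two groups with a space between them
print(joined, "|", spaced)  # BB8 | BB 8
```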
+<note tip>Word has many options for finding letters (^$) and numbers (^#) when using the non-regex [[public:nnels:etext:find-and-replace|Find & Replace]], but these only work with the wildcard option //off// (the default). Only turn the wildcard option on if you're using regex options. Read the info page carefully to see which codes apply with the wildcard option on or off.
+</note>
-====Tips====
+<note tip>A lot of the codes for special characters (e.g. page break) are under the "Special..." button.
+</note>
-[[https://support.office.com/en-ca/article/Find-and-replace-text-and-other-data-in-your-Word-2010-files-c6728c16-469e-43cd-afe4-7708c6c779b7?ui=en-US&rs=en-CA&ad=CA#__toc282774574|Using wildcards in Microsoft Word]] (this is similar to regular expressions, but Word has a lot of its own syntax)
-  * Word has a lot of options to find letters (^$) and numbers (^#) but these only work with the wildcard option //off// (which it is by default). Only turn the wildcard option on if you're using regex options. Read the info page carefully on when things apply with the wildcard option on/off.
-  * A lot of the codes for special characters (e.g. page break) are under the "Special..." button.
+<note>If you discover a solution to a problem that is not on this page, please contact the Production Coordinator. They can show you how to add your solution by updating this wiki page!</note>
-{{:public:nnels:regex.png?400|}}
-==== In LibreOffice & OpenOffice ====
-Make sure that the ''Regular expressions'' box is checked on the Alternative Find & Replace dialog for all of the search and replace actions below.
-[[https://help.libreoffice.org/Common/List_of_Regular_Expressions|Regular expressions in LibreOffice]]
+=====Problems and Solutions Using Regular Expressions=====
-[[https://wiki.openoffice.org/wiki/Documentation/How_Tos/Regular_Expressions_in_Writer|Regular Expressions in OpenOffice]]
-===== Conversion Fixes =====
+In this section you will find examples of different ways to use ''Find and Replace'' to help you with some common reformatting issues.
-The following fixes assume you are using Word, unless otherwise stated.
-<note>Contribute your problems and regex solutions below. Attach your screenshots of both the problem and solution.</note>
+<note tip>If you don't see the solution to your problem on this page, go to the [[public:nnels:etext:find-and-replace|Using Find & Replace]] page. If you still can't find it, then try writing your own regex, or using a wildcard search in Find & Replace.</note>
 ----
@@ Line 77: / Line 72: @@
 <WRAP center round box 80%>
-**PROBLEM:** OCR converted some "1" digits to "i/I" letters, resulting in dates like "i984" or numbers like "3i".
+**PROBLEM**: Hyphens that break up a single word (not words split over two lines).
+**SOLUTION**: Replace with the same text minus the hyphen.
+Find: ''([a-z])-([a-z])''
+Replace with: ''\1\2''
+Using ''a-z'' restricts the search to lowercase letters.
+You will likely have to do this again for lines that end with a comma, and possibly for en and em dashes. Look through your document for any other patterns it might have missed.
+</WRAP>
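As a rough way to preview what this Find/Replace pair will change, the sketch below applies the same strings with Python's ''re.sub'' (Python is an assumption for illustration; the pattern itself is taken from this page). It also shows a side effect worth checking for: real compound words such as "well-known" are joined too, while hyphens next to capitals or digits are left alone.

```python
import re

text = "The docu-ment was printed in well-known type."
# [a-z] on both sides limits matches to a hyphen between two lowercase letters.
fixed = re.sub(r"([a-z])-([a-z])", r"\1\2", text)
print(fixed)  # The document was printed in wellknown type.

# Hyphens next to capital letters or digits do not match the pattern.
untouched = re.sub(r"([a-z])-([a-z])", r"\1\2", "BB-8 and Jean-Paul")
print(untouched)  # BB-8 and Jean-Paul
```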
+----
+<WRAP center round box 80%>
+**PROBLEM:** OCR converted some "1" digits to "i/I" letters, resulting in dates like "i984" or numbers like "3I".
 **SOLUTION:** Replace "i/I"s that come immediately before or after a number with "1"s. This will be done in two steps
@@ Line 90: / Line 101: @@
 ----
 <WRAP center round box 80%>
-**PROBLEM**: There are extra paragraph breaks. We want to keep the real paragraph breaks and remove the fake extra paragraph breaks.
-**SOLUTION**: Use MS Word's find and replace to remove the extra paragraph breaks using special Word symbols.
+**PROBLEM:** OCR did not recognize spaces around quotation marks.
+  * Example A: As one of Montgomery's British staff officers later put ''it,"I'' feel Monty was astonishing in his relationship with all the Dominion troops.
+  * Example B: The "nasty little ''troublemaker,"as'' Montgomery was widely known in the British army...
+This problem has an added complexity; the pattern has two different solutions:
+  * Example A will need to say: ... later put ''it, "I'' feel Monty... (i.e., comma-space-quotation mark)
+  * Example B will need to say: The "nasty little troublemaker''," as'' Montgomery... (i.e., comma-quotation mark-space)
+**SOLUTIONS:**
+Example A:\\
+Find: ''([,])(["])([A-z])''\\
+Replace: ''\1 \2\3''
+Example B:\\
+Find: ''([,])(["])([A-z])''\\
+Replace: ''\1\2 \3''
-Find: ''^p^p'' (you can also search for more than 2 paragraph breaks, i.e. ''^p^p^p'')
+Notes:
+  * You will **not** be able to use "replace all" in this situation. You will need to keep hitting ''Find Next'' and replacing the pattern with the appropriate solution.
+  * You will also need to re-do this, searching for periods instead of commas.
-Replace with: ''^p''
 </WRAP>
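To see why Replace All does not work here, the sketch below runs both replacements with Python's ''re.sub'' on the two examples (Python is an assumption for illustration only). It uses ''[A-Za-z]'' instead of ''[A-z]'', because in Python's ASCII ordering ''[A-z]'' also matches ''[ \ ] ^ _'' and the backtick.

```python
import re

# The page's pattern, with [A-Za-z] in place of [A-z] for Python's ASCII ranges.
pattern = r'([,])(["])([A-Za-z])'

example_a = 'later put it,"I feel Monty was astonishing'
example_b = 'The "nasty little troublemaker,"as Montgomery was widely known'

fixed_a = re.sub(pattern, r"\1 \2\3", example_a)  # comma, space, quotation mark
fixed_b = re.sub(pattern, r"\1\2 \3", example_b)  # comma, quotation mark, space
print(fixed_a)  # later put it, "I feel Monty was astonishing
print(fixed_b)  # The "nasty little troublemaker," as Montgomery was widely known

# Applying Example A's replacement to Example B gives the wrong result,
# which is why each match has to be checked with Find Next.
wrong_b = re.sub(pattern, r"\1 \2\3", example_b)
print(wrong_b)  # The "nasty little troublemaker, "as Montgomery was widely known
```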
 ----
 <WRAP center round box 80%>
-**PROBLEM**: There are newlines/line breaks (↵) instead of paragraph marks (¶).
+**PROBLEM**: There are extra paragraph breaks. We want to keep the real paragraph breaks and remove the fake extra paragraph breaks.
-**SOLUTION**: Find and remove all line breaks and replace with a single paragraph break.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
+</WRAP>
-Find: ''^m''
+----
-Replace with: ''^p''
+<WRAP center round box 80%>
+**PROBLEM**: There are newlines/line breaks (↵) instead of paragraph marks (¶).
-In LibreOffice, replace all ''\n'' with ''\p'' to convert them to paragraphs.
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
 </WRAP>
@@ Line 121: / Line 152: @@
 ''231(paragraph break)MacG_9781770494220_5p_all_r1.indd 231(paragraph break)10/27/14 11:56 AM(paragraph break)''
-**SOLUTION**: Without using wildcards:
+**SOLUTION**: See: [[public:nnels:etext:find-and-replace|Find & Replace]]
-Find:  ''^#^#^#^pMacG_9781770494220_5p_all_r1.indd ^#^#^#^p10/27/14 11:56 AM^p''
-Replace with: nothing. If you're doing a paginated title, replace with page breaks.
-You will need to remove one of the ^# at the beginning and after the .indd to remove it for 2 digit page numbers, and one last time for single digit page numbers. The following screenshot is an example with a 1-digit page number (see below), followed by the command used to isolate all such instances.
-<WRAP center round box 60%>
-{{:nnels:documentation:content:production:screen_shot_2015-08-06_at_6.10.55_pm.png?300|}}
-Find: ^#^pMacG_9781770494220_5p_all_r1.indd ^#^p10/27/14 11:56 AM^p
-</WRAP>
-You will also need to do it with the leading ^#^p to catch the footer text that does not have any page numbers with it.
 </WRAP>
@@ Line 152: / Line 168: @@
   * ''\p.+\s+[0-9OoIil]{1,3}\p'' ### Detect bad line breaks ###
   * ''[^\."?!]$''
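The second pattern flags lines whose last character is not a period, double quotation mark, question mark or exclamation mark, which can point to a paragraph that was broken mid-sentence. Assuming plain text with one paragraph per line, a minimal Python sketch of the same check (the sample text is made up) lists the lines to review:

```python
import re

# Matches a final character that is not . " ? or ! (the backslash before the
# period is optional inside a character class).
no_end_punct = re.compile(r'[^\."?!]$')

text = 'This paragraph ends properly.\nThis one was split in the\nmiddle of a sentence.\n"Quoted speech."'

# Check each line separately so that $ means "end of this line".
flagged = [line for line in text.splitlines() if no_end_punct.search(line)]
print(flagged)  # ['This one was split in the']
```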
+[[public:nnels:etext:start|Return to main eText Page]]

BC Libraries Coop wiki
