This shows you the differences between two versions of the page.
public:nnels:etext:regex [2017/04/10 23:01] farrah.little |
public:nnels:etext:regex [2018/07/11 22:31] leah.brochu |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Regular Expressions ====== | ====== Regular Expressions ====== | ||
Regular expressions (aka regex) are useful for replacing patterns of text, such as headers/ | Regular expressions (aka regex) are useful for replacing patterns of text, such as headers/ | ||
+ | |||
+ | With regex, you can define patterns of text in a number of different ways, but the most commonly used ones for our purposes are **Ranges** and **Groups**. For more information about others, you can take a look at [[https:// | ||
+ | * Ranges | ||
+ | * Square brackets are always used in pairs and are used to identify //specific characters// | ||
+ | * [A-Z] will find any upper case letter; | ||
+ | * [a-z] will find any lower case letter; | ||
+ | * [A-Za-z] will find any letter (upper or lower case); | ||
+ | * [0-9] will find any single digit; | ||
+ | * [abc] will find any of the letters a, b, or c. | ||
+ | * [F] will find upper case “F” | ||
+ | * [Fred] will find " | ||
+ | * Groups | ||
+ | * Round brackets are used in pairs to enclose //groups//. For example: | ||
+ | * '' | ||
+ | * They must be used in pairs and are addressed by number in the replacement. In the replace field, \1 represents the first group, \2 represents the second group, and so on. For example: | ||
+ | * If you wanted to remove the hyphen from " | ||
+ | * Another example: '' | ||
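The ranges and groups described above can be tried out in any regex engine. Below is a minimal sketch in Python, used here only for illustration: Word wildcards and LibreOffice use slightly different syntax, and the sample sentence is a made-up example, not text from this page.

```python
import re

# Hypothetical sample text, not taken from the wiki page.
sample = "Fred paid 42 dollars for an e-book."

# Ranges: square brackets match ONE character from the listed set.
upper = re.findall(r"[A-Z]", sample)   # any upper case letter
digits = re.findall(r"[0-9]", sample)  # any single digit
abc = re.findall(r"[abc]", sample)     # any of the letters a, b, or c

# [Fred] is a set of four characters, not the word "Fred".
fred = re.findall(r"[Fred]", "Fred")

# Groups: round brackets capture text so the replacement can refer
# to it by number (\1 is the first group, \2 is the second).
# Here the hyphen between two lower case letters is dropped.
no_hyphen = re.sub(r"([a-z])-([a-z])", r"\1\2", sample)

print(upper)      # ['F']
print(digits)     # ['4', '2']
print(abc)        # ['a', 'a', 'a', 'b']
print(fred)       # ['F', 'r', 'e', 'd']
print(no_hyphen)  # Fred paid 42 dollars for an ebook.
```

Note that a range only ever matches one character at a time, which is why the digits of 42 come back separately.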
====Tips==== | ====Tips==== | ||
Line 23: | Line 40: | ||
---- | ---- | ||
+ | <WRAP center round box 80%> | ||
**PROBLEM**: | **PROBLEM**: | ||
- | **SOLUTION**: | + | **SOLUTION**: |
In Word, this will only work with wildcards turned on. | In Word, this will only work with wildcards turned on. | ||
Line 33: | Line 51: | ||
Replace with: '' | Replace with: '' | ||
- | This looks for the pattern: any-letter space paragraph-break any-letter | + | This looks for the pattern: |
The parentheses are used to group what it finds, so \1 refers to the first " | The parentheses are used to group what it finds, so \1 refers to the first " | ||
In this way, you are putting back exactly what it found minus the paragraph break. | In this way, you are putting back exactly what it found minus the paragraph break. | ||
+ | </ | ||
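The "any-letter space paragraph-break any-letter" pattern described above can be sketched in Python as follows. This is an illustration under assumptions: the newline character stands in for the paragraph break, the sample text is invented, and the exact Find and Replace strings used in Word or LibreOffice may differ.

```python
import re

# Hypothetical text where a paragraph break ("\n") splits a sentence.
broken = "The quick brown \nfox jumps over the lazy \ndog.\nA new paragraph."

# Group 1 captures the letter before "space + break";
# group 2 captures the letter that starts the next line.
pattern = r"([A-Za-z]) \n([A-Za-z])"

# Put both letters back with a single space between them,
# which removes only the paragraph break.
joined = re.sub(pattern, r"\1 \2", broken)

print(joined)
```

The break after "dog." is left alone because it follows a period rather than a letter and a space, so real paragraph endings survive.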
---- | ---- | ||
+ | <WRAP center round box 80%> | ||
**PROBLEM**: | **PROBLEM**: | ||
Line 52: | Line 72: | ||
You will likely have to do it again for lines that end with a comma, and possibly with an en or em dash. Look through your document for any other patterns it might have missed. | You will likely have to do it again for lines that end with a comma, and possibly with an en or em dash. Look through your document for any other patterns it might have missed. | ||
+ | </ | ||
+ | |||
+ | ---- | ||
+ | |||
+ | <WRAP center round box 80%> | ||
+ | **PROBLEM: | ||
+ | |||
+ | **SOLUTION: | ||
+ | |||
+ | - | ||
+ | - Find: '' | ||
+ | - Replace: '' | ||
+ | - | ||
+ | - Find: '' | ||
+ | - Replace: '' | ||
+ | </ | ||
---- | ---- | ||
+ | <WRAP center round box 80%> | ||
**PROBLEM**: | **PROBLEM**: | ||
Line 62: | Line 99: | ||
Replace with: '' | Replace with: '' | ||
+ | </ | ||
---- | ---- | ||
+ | <WRAP center round box 80%> | ||
**PROBLEM**: | **PROBLEM**: | ||
Line 74: | Line 113: | ||
In LibreOffice, | In LibreOffice, | ||
+ | </ | ||
---- | ---- | ||
- | <note important> | + | <WRAP center round box 80%> |
- | + | ||
- | < | + | |
- | - Paragraphs will be separated by a blank line. Replace those with a unique set of characters that won't be in the text, e.g. '' | + | |
- | - If the lines all end with a space, replace all '' | + | |
- | - Finally, replace all '' | + | |
- | * If the lines wrap properly but there is still a blank line between paragraphs, then a simple replace '' | + | |
- | + | ||
- | < | + | |
- | - Find and replace all double paragraphs | + | |
- | * initiate a find for, ^p^p | + | |
- | - Replace with a unique symbol or code, eg, ' xswedc ' | + | |
- | * (I found that placing a space before and after helps make it even more unique and keeps it from bunching up with other double paragraphs.) There isn't anything special about these letters, other than that they are a unique string of letters we can search on later | + | |
- | - Find and replace all remaining single paragraphs, find = ^p, replace = [single keyboard space] | + | |
- | - Find and replace all the double paragraphs you previously changed into a special symbol or code and change back to a single paragraph | + | |
- | - Find and remove all line breaks, change into double or single paragraphs instead (find = ^m, replace = ^p )</del> | + | |
- | + | ||
- | ---- | + | |
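The placeholder approach in the removed steps above (protect the double paragraph breaks, join the remaining single breaks, then restore the protected ones) can be sketched in Python. This is only an illustration: the function name `unwrap` and the sample text are hypothetical, and "\n" stands in for Word's ^p.

```python
# Placeholder string from the steps above; any string that does not
# occur in the text would work equally well.
MARK = " xswedc "


def unwrap(text: str) -> str:
    """Join hard-wrapped lines while keeping paragraph breaks."""
    protected = text.replace("\n\n", MARK)  # 1. protect double breaks
    joined = protected.replace("\n", " ")   # 2. single breaks become spaces
    return joined.replace(MARK, "\n")       # 3. restore one break per paragraph


# Hypothetical hard-wrapped text with a blank line between paragraphs.
wrapped = "First line of para one\nsecond line.\n\nPara two starts\nand ends here."
result = unwrap(wrapped)
print(result)
```

The order matters: if the single breaks were replaced first, the double breaks would be lost along with them.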
**PROBLEM**: | **PROBLEM**: | ||
'' | '' | ||
Line 115: | Line 137: | ||
You will also need to do it with the leading ^#^p to catch the footer text that does not have any page number with it. | You will also need to do it with the leading ^#^p to catch the footer text that does not have any page number with it. | ||
+ | </ | ||
In LibreOffice: | In LibreOffice: | ||
Line 130: | Line 153: | ||
* '' | * '' | ||
- | ---- |