This shows you the differences between two versions of the page.
public:nnels:etext:regex [2017/10/01 16:48] sabina.iseli-otto Page moved from public:nnels:regex to public:nnels:public:nnels:etext:regex |
public:nnels:etext:regex [2017/11/02 18:46] farrah.little |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Regular Expressions ====== | ||
+ | Regular expressions (aka regex) are useful for replacing patterns of text, such as headers/ | |
+ | |||
+ | ====Tips==== | ||
+ | |||
+ | [[https:// | ||
+ | |||
+ | * Word has many special codes for finding letters (^$) and digits (^#), but these only work with the wildcard option //off// (the default). Turn the wildcard option on only when you are using regex-style patterns. Read the info page carefully to see which codes apply with the wildcard option on or off. | |
+ | |||
+ | * A lot of the codes for special characters (e.g. page break) are under the " | ||
+ | {{: | ||
+ | ==== In LibreOffice & OpenOffice ==== | ||
+ | Make sure that the '' | ||
+ | |||
+ | [[https:// | ||
+ | [[https:// | ||
+ | |||
+ | ===== Conversion Fixes ===== | ||
+ | The following fixes assume you are using Word, unless otherwise stated. | ||
+ | |||
+ | < | ||
+ | |||
+ | ---- | ||
+ | |||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | In Word, this will only work with wildcards turned on. | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | This looks for the pattern: any-letter space paragraph-break any-letter | ||
+ | |||
+ | The parentheses are used to group what it finds, so \1 refers to the first " | ||
+ | |||
+ | In this way, you are putting back exactly what it found minus the paragraph break. | ||
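The grouping idea described above can also be tried outside Word. The following is a minimal Python sketch, not the Find/Replace strings from this page: the sample text is hypothetical, and a newline stands in for Word's paragraph break.

```python
import re

# Hypothetical sample: a sentence split by a stray paragraph break.
text = "The quick brown \nfox jumps over the lazy dog.\n"

# ([A-Za-z]) captures the letter before the break as group 1,
# then a space and a newline (standing in for the paragraph break),
# then ([A-Za-z]) captures the following letter as group 2.
pattern = re.compile(r"([A-Za-z]) \n([A-Za-z])")

# Put back both letters with one space between them, dropping the break.
joined = pattern.sub(r"\1 \2", text)
print(joined)
```

Because each match consumes both letters, a line that is only one letter long may need a second pass.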
+ | |||
+ | ---- | ||
+ | |||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | Using a-z restricts what it finds to lowercase. | ||
+ | |||
+ | You will likely have to repeat this for lines that end with a comma, and possibly for lines that end with an en dash or em dash. Look through your document for any other patterns it might have missed. | |
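As a rough illustration of the lowercase restriction and the repeated passes, here is a Python sketch. It assumes the break falls directly after the last character of the line and that a newline stands in for the paragraph break; the sample text and the choice to add no space after a dash are assumptions, not taken from this page.

```python
import re

# Hypothetical sample: breaks after a lowercase letter, a comma, and an em dash,
# plus one genuine paragraph break before a capital letter.
text = "the first line\ncontinues here,\nand again\u2014\nuntil a real end.\nNew paragraph."

# One pass per line-ending character, mirroring the repeated Find/Replace runs.
# [a-z] matches lowercase only, so a break before a capital letter is kept.
passes = [
    (r"([a-z])\n([a-z])", r"\1 \2"),           # break after a lowercase letter
    (r"(,)\n([a-z])", r"\1 \2"),               # break after a comma
    (r"([\u2013\u2014])\n([a-z])", r"\1\2"),   # break after an en or em dash
]

result = text
for find, replace in passes:
    result = re.sub(find, replace, result)

print(result)
```

The break before "New paragraph." survives every pass because the next line starts with an uppercase letter.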
+ | |||
+ | ---- | ||
+ | |||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | ---- | ||
+ | |||
+ | **PROBLEM**: | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: '' | ||
+ | |||
+ | In LibreOffice, | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | |||
+ | **PROBLEM**: | ||
+ | '' | ||
+ | |||
+ | **SOLUTION**: | ||
+ | |||
+ | Find: '' | ||
+ | |||
+ | Replace with: nothing. If you're doing a paginated title, replace with page breaks. | ||
+ | |||
+ | To catch 2-digit page numbers, remove one of the ^# at the beginning and one after the .indd, then run the search again; remove one more of each for single-digit page numbers. The screenshot below shows an example with a 1-digit page number, followed by the Find string used to isolate all such instances. | |
+ | |||
+ | <WRAP center round box 60%> | ||
+ | |||
+ | {{: | ||
+ | |||
+ | Find: ^# | ||
+ | </ | ||
+ | |||
+ | You will also need to do it with the leading ^#^p to catch footer text that does not have a page number with it. | |
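With a regex quantifier, the separate runs for 3-, 2- and 1-digit page numbers can be folded into one pass. The Python sketch below assumes a footer shape that is not confirmed by this page: an optional page number on its own line, followed by a hypothetical file name `Book.indd` and the page number again. Adjust the pattern to the footer text actually found in your document.

```python
import re

# Hypothetical extracted text. Each footer is an optional page-number line,
# then an InDesign file name with the page number repeated.
text = (
    "End of page seven.\n"
    "7\n"
    "Book.indd 7\n"
    "Start of page eight.\n"
    "Book.indd 8\n"
    "Middle of the book.\n"
    "123\n"
    "Book.indd 123\n"
    "The end.\n"
)

# (?:\d{1,3}\n)? optionally matches the leading page-number line (1 to 3 digits),
# so footers without it are caught in the same pass.
# \d{1,3} after ".indd " covers 1-, 2- and 3-digit page numbers at once.
footer = re.compile(r"^(?:\d{1,3}\n)?Book\.indd \d{1,3}\n", re.MULTILINE)

# Replace with nothing, as in the Word fix above.
cleaned = footer.sub("", text)
print(cleaned)
```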
+ | |||
+ | In LibreOffice: | ||
+ | |||
+ | * Verso (left hand) | ||
+ | * '' | ||
+ | * taken piece-by-piece, | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * '' | ||
+ | * Recto (right hand) | ||
+ | * '' | ||
+ | * '' | ||