visual git cherry-pick & rebase

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Cherry Pick

The cherry-pick command "copies" a commit: it makes a new commit on the current branch that introduces exactly the same changes.

Rebase

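A rebase is an alternative to a merge. A merge combines two parent branches into a single commit, so the history is not linear. A rebase replays the commits of the current branch onto another branch, so the history stays linear. In essence, it is an automated, linearized series of cherry-picks.

The command above is run on the topic branch, not on master: it replays topic's commits on top of master and then moves the topic branch to point at the new tip. Note that the old commits are no longer referenced and will be garbage-collected.

To limit how far back the rebase goes, use the --onto option. The following command replays onto master only the commits on the current branch made after 169a6, namely 2c33a.

There is also git rebase --interactive, which makes complex operations easier, such as dropping, reordering, editing, and squashing commits. There are no pictures for this; see git-rebase(1) for details.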
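To make "an automated series of cherry-picks" concrete, here is a toy Python model, a sketch rather than how git is implemented: a commit is an immutable object with one parent and a label for its change, cherry_pick re-creates a commit on a new parent, and rebase_onto(newbase, upstream, branch) loosely mirrors the arguments of git rebase --onto <newbase> <upstream> <branch>. The ids 169a6, 2c33a, and ed489 are borrowed from the text; base and m1 are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: an id, a single parent, and a label for the change it introduces."""
    id: str
    parent: Optional[Commit]
    change: str


def history(tip: Optional[Commit]) -> list[Commit]:
    """Commits reachable from `tip` by following parents, newest first."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = tip.parent
    return out


def ids(tip: Commit) -> list[str]:
    return [c.id for c in history(tip)]


def cherry_pick(commit: Commit, onto: Commit) -> Commit:
    """Re-apply `commit`'s change on top of `onto` as a brand-new commit (id gets a trailing prime)."""
    return Commit(commit.id + "'", onto, commit.change)


def rebase_onto(newbase: Commit, upstream: Commit, branch: Commit) -> Commit:
    """Replay the commits reachable from `branch` but not from `upstream` onto `newbase`."""
    excluded = {c.id for c in history(upstream)}
    to_replay = [c for c in history(branch) if c.id not in excluded]
    tip = newbase
    for c in reversed(to_replay):  # oldest first: one cherry-pick per commit
        tip = cherry_pick(c, tip)
    return tip                     # the rebased branch would now point here


def rebase(upstream: Commit, branch: Commit) -> Commit:
    """A plain rebase is the --onto form with newbase == upstream."""
    return rebase_onto(upstream, upstream, branch)


# Hypothetical history: master = base -> m1 -> ed489, topic = base -> 169a6 -> 2c33a.
base = Commit("base", None, "init")
master = Commit("ed489", Commit("m1", base, "m1"), "m2")
c169a6 = Commit("169a6", base, "t1")
topic = Commit("2c33a", c169a6, "t2")

print(ids(rebase(master, topic)))               # replays 169a6 and 2c33a onto ed489
print(ids(rebase_onto(master, c169a6, topic)))  # replays only 2c33a onto ed489
```

The original topic commits are left untouched by the model; in git, once no branch refers to them they become eligible for garbage collection.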

visual git reset & merge

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Reset

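The reset command moves the current branch to another position, and optionally updates the working directory and the index. It is also used to copy files from the history into the index without touching the working directory.

If a commit is given with no option, the current branch is moved to that commit and the index is updated to match it (this is the default, --mixed). With --hard, the working directory is updated as well. With --soft, neither the index nor the working directory changes.

If no commit is given, it defaults to HEAD. The branch then stays where it is, but the index is rolled back to the last commit; with --hard, the working directory is too.

If file names (or the -p option) are given, reset works much like checkout with file names, except that only the index is updated, not the working directory.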
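The three reset modes can be summarized with a toy Python model of the branch, the index, and the working directory. Repo and its fields are hypothetical names for this sketch; snapshots are plain dicts of path to content, and resets with file names are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    """Toy three-tree model; every snapshot is a dict of path -> content."""
    commits: dict     # commit id -> snapshot
    branch_tip: str   # commit the current branch points to
    index: dict       # staged snapshot
    worktree: dict    # working-directory snapshot

    def reset(self, commit=None, mode="mixed"):
        """Roughly `git reset [--soft | --mixed | --hard] [<commit>]` with no file names."""
        if mode not in ("soft", "mixed", "hard"):
            raise ValueError(f"unknown mode: {mode}")
        target = self.branch_tip if commit is None else commit  # no commit given: HEAD
        self.branch_tip = target                                 # every mode moves the branch
        if mode in ("mixed", "hard"):   # the default (--mixed) also resets the index
            self.index = dict(self.commits[target])
        if mode == "hard":              # --hard overwrites the working directory too
            self.worktree = dict(self.commits[target])


repo = Repo(
    commits={"a": {"f.c": "v1"}, "b": {"f.c": "v2"}},
    branch_tip="b",
    index={"f.c": "v3"},
    worktree={"f.c": "v4"},
)
repo.reset("a")  # default mode: branch and index now match "a"; the worktree keeps v4
```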

Merge

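The merge command combines different branches. Before merging, the index must match the current commit. If the other branch is an ancestor of the current commit, merge does nothing. If instead the current commit is an ancestor of the other branch, the result is a fast-forward merge: the branch reference simply moves to the other commit, which is then checked out. No new commit is created.

Otherwise it is a true merge. By default, git does a three-way merge between the current commit (ed489, shown below), the other commit (33104), and their common ancestor (b325c). The result is saved to the working directory and the index, and then a new commit is made with 33104 as an extra parent.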
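The three cases above (nothing to do, fast-forward, true merge) depend only on ancestry, so they can be sketched in Python over a toy commit graph. The graph is an assumption: ids are borrowed from the text plus a hypothetical root da985. merge_bases is a brute-force version of the idea behind git merge-base (common ancestors that are not ancestors of another common ancestor), not git's actual algorithm.

```python
from __future__ import annotations


def reachable(graph: dict[str, list[str]], commit: str) -> set[str]:
    """Every commit reachable from `commit` by following parent links (itself included)."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph[c])
    return seen


def merge_kind(graph: dict[str, list[str]], current: str, other: str) -> str:
    if other in reachable(graph, current):
        return "already up to date"  # other is an ancestor: nothing to do
    if current in reachable(graph, other):
        return "fast-forward"        # just move the branch; no new commit
    return "three-way merge"         # a real merge commit with two parents


def merge_bases(graph: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = reachable(graph, a) & reachable(graph, b)
    return {
        c for c in common
        if not any(c != d and c in reachable(graph, d) for d in common)
    }


# Hypothetical history: commit id -> list of parent ids.
GRAPH = {
    "da985": [],
    "b325c": ["da985"],
    "ed489": ["b325c"],
    "33104": ["b325c"],
}

print(merge_kind(GRAPH, "ed489", "33104"), merge_bases(GRAPH, "ed489", "33104"))
```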

visual git checkout

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Checkout

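The checkout command copies files from the history (or the index) into the working directory, and can also be used to switch branches.

When a file name is given (and/or the -p option), git copies the file from the given commit into both the index and the working directory. For example, git checkout HEAD~ foo.c copies foo.c from commit HEAD~ (the parent of the current commit) into the working directory and also stages it. (If no commit is given, the file is copied from the index.) Note that the current branch does not change.

When no file name is given but a (local) branch is, HEAD moves to that branch (that is, we "switch" to it), and the index and working directory are then updated to match the commit HEAD now points to. Every file in the new commit (a47c3 in the figure below) is copied into the index and working directory; files that exist only in the old commit (ed489) are deleted; files that belong to neither are ignored and left untouched.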
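The file rules for switching branches can be sketched as a small Python function. It is a toy model, not git's implementation: snapshots are plain dicts of path to content, the file names are hypothetical, only the working directory is modeled (the index is updated the same way), and the safety check real git performs when local changes would be overwritten is skipped.

```python
def switch_branch(worktree: dict, old_commit: dict, new_commit: dict) -> dict:
    """Toy model of the file updates done by `git checkout <branch>`.

    Snapshots are dicts of path -> content. Assumes there are no local
    modifications (real git refuses to switch if they would be overwritten).
    """
    result = dict(worktree)
    for path in old_commit:
        if path not in new_commit:
            result.pop(path, None)   # only in the old commit: deleted
    result.update(new_commit)        # everything in the new commit: copied in
    return result                    # anything else (e.g. untracked files): untouched


old = {"foo.c": "old", "gone.c": "x"}    # hypothetical snapshot of ed489
new = {"foo.c": "new", "added.c": "y"}   # hypothetical snapshot of a47c3
print(switch_branch({**old, "notes.txt": "mine"}, old, new))
```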

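If neither a file name nor a branch name is given, but instead a tag, a remote branch, a SHA-1 hash, or something like master~3, you get an anonymous branch called a detached HEAD. This is handy for jumping around in history. For example, to build version 1.6.6.1 of git, you can run git checkout v1.6.6.1 (a tag, not a branch name), build and install, and then switch back to another branch, say with git checkout master. However, committing with a detached HEAD works slightly differently; see below.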

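Committing with a Detached HEAD

When HEAD is detached (not attached to any branch), commits work as usual, except that no named branch is updated. (You can think of this as updating an anonymous branch.)

Once you check out something else, say master, the commit is (probably) never referenced again and will eventually be discarded. Note that after this command, nothing references 2eecb.

If, however, you want to keep this state, create a new named branch with git checkout -b name.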

visual git diff & commit

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Diff

There are many ways to view the changes between commits. Below are some examples.

Commit

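When you commit, git creates a new commit from the files in the index and makes the current commit its parent. It then points the current branch at the new commit. In the figure below, the current branch is master. Before the command, master points to ed489; afterwards, master points to the new commit f0cec, whose parent is ed489.

Git does the same even when the current branch is an ancestor of another branch. In the figure below, a commit is made on maint, an ancestor of master, producing 1800b. Afterwards, maint is no longer an ancestor of master. To join the two histories, a merge (or rebase) is needed.

To amend a commit, use git commit --amend. Git makes a new commit with the same parent as the current commit; the old commit is discarded if nothing else references it.

Another example is committing with a detached HEAD, covered later.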

visual git

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

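Basic Usage

The four commands above copy files between the working directory, the staging area (also called the index), and the repository.

git add files copies the current state of files into the staging area.
git commit takes a snapshot of the staging area and commits it.
git reset -- files unstages files, copying them from the last commit into the staging area; this undoes a git add files. You can also use git reset alone to unstage everything.
git checkout -- files copies files from the staging area to the working directory, discarding local changes.

You can use git reset -p, git checkout -p, or git add -p to enter interactive mode.
You can also skip the staging area and check out files directly from the repository, or commit files directly.

git commit -a is equivalent to running git add on every file that existed in the last commit (the tracked files) and then running git commit.
git commit files makes a new commit containing the contents of the last commit plus a snapshot of files taken from the working directory. The files are also copied to the staging area.
git checkout HEAD -- files copies files from the last commit to both the staging area and the working directory, rolling them back to the last commit.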

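Conventions

In the following sections, diagrams take the form shown below.

Commits are shown in green as 5-character IDs, and each points to its parent. Branches are shown in orange and point to particular commits. The current branch is identified by the HEAD attached to it. This image shows the last five commits, with ed489 being the most recent. master points to this commit, while another branch, maint, points to an ancestor of it.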

Markdown syntax

Posted on 2017-02-07 | Edited on 2020-09-17 | In tool

**Bold**
*Emphasis*
++Underline++
~~Strikethrough~~
==Highlight==
^Superscript^
~Subscript~

![pic](pic.jpg)
[git](http://github.com)
@[]()[?]

- unordered list
1. Ordered list
- [ ] Task[?]

[^id]
[^id]:xxx

`code`
\`\`\`code block\`\`\`

***page break

---section break

___sentence break

[TOC]

Unicode block

Posted on 2017-02-07 | Edited on 2020-09-17 | In regex


Range	Block property	Block name
0000-007F	\p{InBasicLatin}	C0 Controls and Basic Latin
0080-00FF	\p{InLatin-1Supplement}	C1 Controls and Latin-1 Supplement
0100-017F	\p{InLatinExtended-A}	Latin Extended-A
0180-024F	\p{InLatinExtended-B}	Latin Extended-B
0250-02AF	\p{InIPAExtensions}	IPA Extensions
02B0-02FF	\p{InSpacingModifierLetters}	Spacing Modifier Letters
0300-036F	\p{InCombiningDiacriticalMarks}	Combining Diacritical Marks
0370-03FF	\p{InGreekandCoptic}	Greek and Coptic
0400-04FF	\p{InCyrillic}	Cyrillic
0500-052F	\p{InCyrillicSupplement}	Cyrillic Supplement
0530-058F	\p{InArmenian}	Armenian
0590-05FF	\p{InHebrew}	Hebrew
0600-06FF	\p{InArabic}	Arabic
0700-074F	\p{InSyriac}	Syriac
0750-077F	\p{InArabicSupplement}	Arabic Supplement
0780-07BF	\p{InThaana}	Thaana
07C0-07FF	\p{InNKo}	NKo
0800-083F	\p{InSamaritan}	Samaritan
0840-085F	\p{InMandaic}	Mandaic
08A0-08FF	\p{InArabicExtended-A}	Arabic Extended-A
0900-097F	\p{InDevanagari}	Devanagari
0980-09FF	\p{InBengali}	Bengali
0A00-0A7F	\p{InGurmukhi}	Gurmukhi
0A80-0AFF	\p{InGujarati}	Gujarati
0B00-0B7F	\p{InOriya}	Oriya
0B80-0BFF	\p{InTamil}	Tamil
0C00-0C7F	\p{InTelugu}	Telugu
0C80-0CFF	\p{InKannada}	Kannada
0D00-0D7F	\p{InMalayalam}	Malayalam
0D80-0DFF	\p{InSinhala}	Sinhala
0E00-0E7F	\p{InThai}	Thai
0E80-0EFF	\p{InLao}	Lao
0F00-0FFF	\p{InTibetan}	Tibetan
1000-109F	\p{InMyanmar}	Myanmar
10A0-10FF	\p{InGeorgian}	Georgian
1100-11FF	\p{InHangulJamo}	Hangul Jamo
1200-137F	\p{InEthiopic}	Ethiopic
1380-139F	\p{InEthiopicSupplement}	Ethiopic Supplement
13A0-13FF	\p{InCherokee}	Cherokee
1400-167F	\p{InUnifiedCanadianAboriginalSyllabics}	Unified Canadian Aboriginal Syllabics
1680-169F	\p{InOgham}	Ogham
16A0-16FF	\p{InRunic}	Runic
1700-171F	\p{InTagalog}	Tagalog
1720-173F	\p{InHanunoo}	Hanunoo
1740-175F	\p{InBuhid}	Buhid
1760-177F	\p{InTagbanwa}	Tagbanwa
1780-17FF	\p{InKhmer}	Khmer
1800-18AF	\p{InMongolian}	Mongolian
18B0-18FF	\p{InUnifiedCanadianAboriginalSyllabicsExtended}	Unified Canadian Aboriginal Syllabics Extended
1900-194F	\p{InLimbu}	Limbu
1950-197F	\p{InTaiLe}	Tai Le
1980-19DF	\p{InNewTaiLue}	New Tai Lue
19E0-19FF	\p{InKhmerSymbols}	Khmer Symbols
1A00-1A1F	\p{InBuginese}	Buginese
1A20-1AAF	\p{InTaiTham}	Tai Tham
1B00-1B7F	\p{InBalinese}	Balinese
1B80-1BBF	\p{InSundanese}	Sundanese
1BC0-1BFF	\p{InBatak}	Batak
1C00-1C4F	\p{InLepcha}	Lepcha
1C50-1C7F	\p{InOlChiki}	Ol Chiki
1CC0-1CCF	\p{InSundaneseSupplement}	Sundanese Supplement
1CD0-1CFF	\p{InVedicExtensions}	Vedic Extensions
1D00-1D7F	\p{InPhoneticExtensions}	Phonetic Extensions
1D80-1DBF	\p{InPhoneticExtensionsSupplement}	Phonetic Extensions Supplement
1DC0-1DFF	\p{InCombiningDiacriticalMarksSupplement}	Combining Diacritical Marks Supplement
1E00-1EFF	\p{InLatinExtendedAdditional}	Latin Extended Additional
1F00-1FFF	\p{InGreekExtended}	Greek Extended
2000-206F	\p{InGeneralPunctuation}	General Punctuation
2070-209F	\p{InSuperscriptsandSubscripts}	Superscripts and Subscripts
20A0-20CF	\p{InCurrencySymbols}	Currency Symbols
20D0-20FF	\p{InCombiningDiacriticalMarksforSymbols}	Combining Diacritical Marks for Symbols
2100-214F	\p{InLetterlikeSymbols}	Letterlike Symbols
2150-218F	\p{InNumberForms}	Number Forms
2190-21FF	\p{InArrows}	Arrows
2200-22FF	\p{InMathematicalOperators}	Mathematical Operators
2300-23FF	\p{InMiscellaneousTechnical}	Miscellaneous Technical
2400-243F	\p{InControlPictures}	Control Pictures
2440-245F	\p{InOpticalCharacterRecognition}	Optical Character Recognition
2460-24FF	\p{InEnclosedAlphanumerics}	Enclosed Alphanumerics
2500-257F	\p{InBoxDrawing}	Box Drawing
2580-259F	\p{InBlockElements}	Block Elements
25A0-25FF	\p{InGeometricShapes}	Geometric Shapes
2600-26FF	\p{InMiscellaneousSymbols}	Miscellaneous Symbols
2700-27BF	\p{InDingbats}	Dingbats
27C0-27EF	\p{InMiscellaneousMathematicalSymbols-A}	Miscellaneous Mathematical Symbols-A
27F0-27FF	\p{InSupplementalArrows-A}	Supplemental Arrows-A
2800-28FF	\p{InBraillePatterns}	Braille Patterns
2900-297F	\p{InSupplementalArrows-B}	Supplemental Arrows-B
2980-29FF	\p{InMiscellaneousMathematicalSymbols-B}	Miscellaneous Mathematical Symbols-B
2A00-2AFF	\p{InSupplementalMathematicalOperators}	Supplemental Mathematical Operators
2B00-2BFF	\p{InMiscellaneousSymbolsandArrows}	Miscellaneous Symbols and Arrows
2C00-2C5F	\p{InGlagolitic}	Glagolitic
2C60-2C7F	\p{InLatinExtended-C}	Latin Extended-C
2C80-2CFF	\p{InCoptic}	Coptic
2D00-2D2F	\p{InGeorgianSupplement}	Georgian Supplement
2D30-2D7F	\p{InTifinagh}	Tifinagh
2D80-2DDF	\p{InEthiopicExtended}	Ethiopic Extended
2DE0-2DFF	\p{InCyrillicExtended-A}	Cyrillic Extended-A
2E00-2E7F	\p{InSupplementalPunctuation}	Supplemental Punctuation
2E80-2EFF	\p{InCJKRadicalsSupplement}	CJK Radicals Supplement
2F00-2FDF	\p{InKangxiRadicals}	Kangxi Radicals
2FF0-2FFF	\p{InIdeographicDescriptionCharacters}	Ideographic Description Characters
3000-303F	\p{InCJKSymbolsandPunctuation}	CJK Symbols and Punctuation
3040-309F	\p{InHiragana}	Hiragana
30A0-30FF	\p{InKatakana}	Katakana
3100-312F	\p{InBopomofo}	Bopomofo
3130-318F	\p{InHangulCompatibilityJamo}	Hangul Compatibility Jamo
3190-319F	\p{InKanbun}	Kanbun
31A0-31BF	\p{InBopomofoExtended}	Bopomofo Extended
31C0-31EF	\p{InCJKStrokes}	CJK Strokes
31F0-31FF	\p{InKatakanaPhoneticExtensions}	Katakana Phonetic Extensions
3200-32FF	\p{InEnclosedCJKLettersandMonths}	Enclosed CJK Letters and Months
3300-33FF	\p{InCJKCompatibility}	CJK Compatibility
3400-4DBF	\p{InCJKUnifiedIdeographsExtensionA}	CJK Unified Ideographs Extension A
4DC0-4DFF	\p{InYijingHexagramSymbols}	Yijing Hexagram Symbols
4E00-9FFF	\p{InCJKUnifiedIdeographs}	CJK Unified Ideographs
A000-A48F	\p{InYiSyllables}	Yi Syllables
A490-A4CF	\p{InYiRadicals}	Yi Radicals
A4D0-A4FF	\p{InLisu}	Lisu
A500-A63F	\p{InVai}	Vai
A640-A69F	\p{InCyrillicExtended-B}	Cyrillic Extended-B
A6A0-A6FF	\p{InBamum}	Bamum
A700-A71F	\p{InModifierToneLetters}	Modifier Tone Letters
A720-A7FF	\p{InLatinExtended-D}	Latin Extended-D
A800-A82F	\p{InSylotiNagri}	Syloti Nagri
A830-A83F	\p{InCommonIndicNumberForms}	Common Indic Number Forms
A840-A87F	\p{InPhags-pa}	Phags-pa
A880-A8DF	\p{InSaurashtra}	Saurashtra
A8E0-A8FF	\p{InDevanagariExtended}	Devanagari Extended
A900-A92F	\p{InKayahLi}	Kayah Li
A930-A95F	\p{InRejang}	Rejang
A960-A97F	\p{InHangulJamoExtended-A}	Hangul Jamo Extended-A
A980-A9DF	\p{InJavanese}	Javanese
AA00-AA5F	\p{InCham}	Cham
AA60-AA7F	\p{InMyanmarExtended-A}	Myanmar Extended-A
AA80-AADF	\p{InTaiViet}	Tai Viet
AAE0-AAFF	\p{InMeeteiMayekExtensions}	Meetei Mayek Extensions
AB00-AB2F	\p{InEthiopicExtended-A}	Ethiopic Extended-A
ABC0-ABFF	\p{InMeeteiMayek}	Meetei Mayek
AC00-D7AF	\p{InHangulSyllables}	Hangul Syllables
D7B0-D7FF	\p{InHangulJamoExtended-B}	Hangul Jamo Extended-B
D800-DB7F	\p{InHighSurrogates}	High Surrogates
DB80-DBFF	\p{InHighPrivateUseSurrogates}	High Private Use Surrogates
DC00-DFFF	\p{InLowSurrogates}	Low Surrogates
E000-F8FF	\p{InPrivateUseArea}	Private Use Area
F900-FAFF	\p{InCJKCompatibilityIdeographs}	CJK Compatibility Ideographs
FB00-FB4F	\p{InAlphabeticPresentationForms}	Alphabetic Presentation Forms
FB50-FDFF	\p{InArabicPresentationForms-A}	Arabic Presentation Forms-A
FE00-FE0F	\p{InVariationSelectors}	Variation Selectors
FE10-FE1F	\p{InVerticalForms}	Vertical Forms
FE20-FE2F	\p{InCombiningHalfMarks}	Combining Half Marks
FE30-FE4F	\p{InCJKCompatibilityForms}	CJK Compatibility Forms
FE50-FE6F	\p{InSmallFormVariants}	Small Form Variants
FE70-FEFF	\p{InArabicPresentationForms-B}	Arabic Presentation Forms-B
FF00-FFEF	\p{InHalfwidthandFullwidthForms}	Halfwidth and Fullwidth Forms
FFF0-FFFF	\p{InSpecials}	Specials
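Python's built-in re module has no \p{In…} block syntax, so as a sketch the same test can be done by looking up a character's code point in the ranges above. Only a handful of rows are copied here (the BLOCKS list is deliberately partial), and in_block is a hypothetical helper that roughly plays the role of ^\p{InName}+$.

```python
import bisect

# A few rows copied from the table above: (first code point, last code point, block name).
BLOCKS = [
    (0x0000, 0x007F, "C0 Controls and Basic Latin"),
    (0x0080, 0x00FF, "C1 Controls and Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(ch: str):
    """Name of the block containing ch, or None if this partial table does not cover it."""
    cp = ord(ch)
    i = bisect.bisect_right(STARTS, cp) - 1   # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


def in_block(text: str, name: str) -> bool:
    """Rough stand-in for matching ^\\p{In<name>}+$ against the whole string."""
    return bool(text) and all(block_of(c) == name for c in text)


print(block_of("A"), block_of("\u4e2d"), in_block("\u3042\u3044", "Hiragana"))
```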

Match whole words

Posted on 2017-02-06 | Edited on 2018-12-16 | In regex

Problem

My cat is brown
category
octocat
staccato

Find the word ‘cat’
Find words beginning with ‘cat’
Find words ending with ‘cat’
Find words containing ‘cat’
Find words not beginning with ‘cat’
Find words not ending with ‘cat’
Find words not containing ‘cat’

Solution

Word boundaries

\bcat\b

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonboundaries

\Bcat\B

Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

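\bcat       (?<!\w)(?=\w)cat      ‘cat’ at the start of a word
cat\b       cat(?<=\w)(?!\w)      ‘cat’ at the end of a word
\Bcat       (?<=\w)cat            ‘cat’ not at the start of a word
cat\B       cat(?=\w)             ‘cat’ not at the end of a word
\b(?!\w*?cat\w*?)\w+?\b           a whole word that does not contain ‘cat’

\b -> (?<=\w)(?!\w)|(?<!\w)(?=\w)
\B -> (?<=\w)(?=\w)|(?<!\w)(?!\w)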
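The Solution above covers only some of the tasks in the problem list. As an illustration, here is one possible Python 3 pattern per task, run against the sample text; the patterns for the tasks not shown above are suggestions, not taken from the source, and ‹\w› here is Python 3's Unicode-aware version.

```python
import re

TEXT = "My cat is brown\ncategory\noctocat\nstaccato"

# One possible pattern per task; each match is the whole word (or just 'cat' for the first).
PATTERNS = {
    "word 'cat'": r"\bcat\b",
    "begins with 'cat'": r"\bcat\w*",
    "ends with 'cat'": r"\w*cat\b",
    "contains 'cat'": r"\w*cat\w*",
    "does not begin with 'cat'": r"\b(?!cat)\w+",          # lookahead vetoes 'cat' at the start
    "does not end with 'cat'": r"\b\w+\b(?<!cat)",         # lookbehind vetoes 'cat' at the end
    "does not contain 'cat'": r"\b(?!\w*cat)\w+",          # lookahead scans the whole word
}

for task, pattern in PATTERNS.items():
    print(f"{task:28} {pattern:20} {re.findall(pattern, TEXT)}")
```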

Discussion

‹\b› matches in these three positions:

Before the first character in the subject, if the first character is a word character
After the last character in the subject, if the last character is a word character
Between two characters in the subject, where one is a word character and the other is not a word character

‹\B› matches in these five positions:

Before the first character in the subject, if the first character is not a word character
After the last character in the subject, if the last character is not a word character
Between two word characters
Between two nonword characters
The empty string

Word Characters

Java:
- Java 4 to 6: ‹\w› matches only ASCII characters.
- Java 7: ‹\w› also matches Unicode letters and digits if you set the UNICODE_CHARACTER_CLASS flag.
- In these Java versions, ‹\b› is Unicode-enabled and supports any script, even though ‹\w› is ASCII-only by default.

.NET, JavaScript, PCRE, Perl, Python, and Ruby:
- ‹\b› matches between two characters where one is matched by ‹\w› and the other by ‹\W›.
- ‹\B› always matches between two characters where both are matched by ‹\w› or both by ‹\W›.
JavaScript, PCRE, and Ruby: ‹\w› is identical to ‹[a-zA-Z0-9_]›, so a “whole words only” search works only for text in languages written with the unaccented letters A to Z, such as English.
.NET: treats letters and digits from all scripts as word characters, so you can do a “whole words only” search in any language.
Python 2.x: non-ASCII characters are included only if you pass the UNICODE (U) flag when creating the regex.
Python 3.x: non-ASCII characters are included by default, but you can exclude them with the ASCII (A) flag. This flag affects ‹\b› and ‹\w› equally.
Perl: whether ‹\w› is pure ASCII or includes all Unicode letters, digits, and underscores depends on your Perl version and the /adlu flags.
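A quick Python 3 check of the Python points: by default ‹\w› and ‹\b› treat accented Latin and Cyrillic letters as word characters, while the re.ASCII flag restricts both to ASCII. The sample string is made up for illustration.

```python
import re

TEXT = "na\u00efve caf\u00e9 \u043a\u043e\u0442 cat"   # "naïve café кот cat"

# Python 3 default: \w and \b are Unicode-aware.
unicode_words = re.findall(r"\b\w+\b", TEXT)
# re.ASCII restricts \w (and therefore \b) to [a-zA-Z0-9_].
ascii_words = re.findall(r"\b\w+\b", TEXT, re.ASCII)

# A "whole words only" search for 'caf' gives a false hit under re.ASCII,
# because 'é' no longer counts as a word character.
print(re.findall(r"\bcaf\b", TEXT))             # no match
print(re.findall(r"\bcaf\b", TEXT, re.ASCII))   # matches inside "café"
```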

Match "start" & "end"

Posted on 2017-02-06 | Edited on 2018-12-16 | In regex

Problem

alpha…..↵
alpha…..↵
begin…..↵
…….end↵
….omega↵
….omega↵

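Match 'alpha' only where it occurs at the very beginning of the subject
Match both 'alpha's, each at the start of a line
Match 'omega' only where it occurs at the very end of the subject
Match both 'omega's, each at the end of a line
Match 'begin' at the start of a line
Match 'end' at the end of a line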

Solution

Start of the subject

^alpha

Regex options: None (“^ and $ match at line breaks” must not be set; otherwise both occurrences match)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\Aalpha

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
End of the subject

omega$

Regex options: None (“^ and $ match at line breaks” must not be set; otherwise both occurrences match)
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

omega\Z

Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Start of a line

^begin

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

End of a line

end$

Regex options: ^ and $ match at line breaks
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
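The solutions above can be checked in Python, one of the listed flavors. This sketch assumes the sample subject with "..." standing in for the dotted filler and a line break after every line, including the last; re.MULTILINE (re.M) is Python's name for “^ and $ match at line breaks”.

```python
import re

# The sample subject: every line, including the last, ends with a line break (↵).
SUBJECT = "alpha...\nalpha...\nbegin...\n...end\n...omega\n...omega\n"

start_of_subject = re.findall(r"^alpha", SUBJECT)            # no MULTILINE: first 'alpha' only
start_of_subject_a = re.findall(r"\Aalpha", SUBJECT, re.M)   # \A ignores MULTILINE
start_of_lines = re.findall(r"^alpha", SUBJECT, re.M)        # both 'alpha's
end_of_subject = re.findall(r"omega$", SUBJECT)              # last 'omega' only
end_of_lines = re.findall(r"omega$", SUBJECT, re.M)          # both 'omega's
line_start = re.findall(r"^begin", SUBJECT, re.M)
line_end = re.findall(r"end$", SUBJECT, re.M)

print(start_of_subject, start_of_lines, end_of_subject, end_of_lines, line_start, line_end)
```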

Discussion

JavaScript does not support ‹\A›.
The anchor ‹^› is equivalent to ‹\A›, as long as you do not turn on the “^ and $ match at line breaks” option.

The anchor ‹$› is equivalent to ‹\Z›, as long as you do not turn on the “^ and $ match at line breaks” option.

In Java, this option is Pattern.MULTILINE; its Javadoc reads:

/**
 * Enables multiline mode.
 *
 * <p> In multiline mode the expressions <tt>^</tt> and <tt>$</tt> match
 * just after or just before, respectively, a line terminator or the end of
 * the input sequence.  By default these expressions only match at the
 * beginning and the end of the entire input sequence.
 *
 * <p> Multiline mode can also be enabled via the embedded flag
 * expression&nbsp;<tt>(?m)</tt>.  </p>
 */

The anchors ‹\Z› and ‹\z› always match at the very end of the subject text, after the last character. ‹\Z› can also match just before a line break at the very end, so you can search for ‹omega\Z› without having to worry about stripping a trailing line break off your subject text.
‹\Z›: a single trailing line break (\r\n, \r or \n, shown as ↵) is ignored.
‹\z›: a trailing line break is not ignored; ‹\z› matches only after the very last character.

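JavaScript does not support ‹\A›, ‹\Z›, or ‹\z› at all.
.NET, Java, PCRE, Perl, and Ruby support both ‹\Z› and ‹\z›.
Python supports only ‹\Z›, and Python's ‹\Z› behaves like ‹\z› in the other flavors: it matches only at the very end of the subject, never before a trailing line break.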
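Because Python's ‹\Z› does not skip a trailing line break, ‹omega\Z› fails in Python when the subject ends with ↵, as the sample does. In Python it is ‹$› without re.MULTILINE that may match just before a final "\n". A small check, using a shortened copy of the sample subject:

```python
import re

SUBJECT = "...omega\n...omega\n"   # ends with a trailing line break

dollar = re.search(r"omega$", SUBJECT)                           # $ may match before a final "\n"
big_z = re.search(r"omega\Z", SUBJECT)                           # Python's \Z: only at the very end
big_z_stripped = re.search(r"omega\Z", SUBJECT.rstrip("\n"))     # works once the "\n" is removed

print(dollar, big_z, big_z_stripped)
```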

Variations

.NET, Java, XRegExp, PCRE, Perl, and Python: the mode modifier ‹(?m)› turns on “^ and $ match at line breaks”.
Ruby uses ‹(?m)› to turn on “dot matches line breaks” mode instead.
In Ruby, ‹^› and ‹$› always match at the start and end of each line.
‹(?-m)› turns the option off again, in flavors that support turning modes off.
‹(?i)› turns on case-insensitive matching.
‹(?s)› turns on “dot matches line breaks” (in all of these flavors except Ruby).
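The same mode modifiers in Python, applied to the sample subject (with "..." standing in for the filler, as an assumption). Python expects global inline flags at the very start of the pattern; newer versions reject them elsewhere.

```python
import re

SUBJECT = "alpha...\nalpha...\nbegin...\n...end\n...omega\n...omega\n"

multiline = re.findall(r"(?m)^alpha", SUBJECT)     # same as passing re.MULTILINE
ignore_case = re.findall(r"(?i)^ALPHA", SUBJECT)   # case-insensitive, but ^ is still start of subject
combined = re.findall(r"(?im)^ALPHA", SUBJECT)     # both modes at once
dotall = re.search(r"(?s)begin.*end", SUBJECT)     # '.' also matches "\n"
no_dotall = re.search(r"begin.*end", SUBJECT)      # '.' stops at line breaks

print(multiline, ignore_case, combined, dotall, no_dotall)
```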

Full-text retrieval fundamental

Posted on 2017-02-06 | Edited on 2018-12-16 | In search

Build Index

Collect the data source (the documents) to be indexed.
Lexical analysis and language processing: convert the text into terms.
Create the term dictionary and the posting lists (the inverted index).
Write the index to disk or other storage.
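The index-building steps can be sketched in a few lines of Python. This is a toy, not a real search engine: tokenize only lowercases and splits on word characters (no stemming or stop-word removal), the documents are hypothetical, and the "write to disk" step is represented by serializing the index to a JSON string.

```python
from __future__ import annotations

import json
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Toy lexical analysis: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Term dictionary: each term maps to its sorted posting list of document ids."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


DOCS = {  # hypothetical documents
    1: "The cat sat on the mat",
    2: "The dog chased the cat",
    3: "A dog and a bird",
}
INDEX = build_index(DOCS)
SERIALIZED = json.dumps(INDEX)  # this string would be written to disk or other storage
print(INDEX["cat"], INDEX["dog"])
```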

Query process

A. Parse the input query.
B. Lexical analysis and language processing: convert the query into terms.
C. Syntax analysis: build a query tree.
D. Load the index from disk into memory.
E. Following the query tree, fetch each term's document list, then combine the lists with AND/OR/NOT operations to get the result documents.
F. Sort the result documents by relevance.
G. Return the query results.
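Steps E to G can be sketched in Python as a set-based evaluator over a query tree. Parsing (A to C) is skipped: the tree is written by hand as nested tuples, a format invented for this sketch. The index is built inline from hypothetical documents rather than loaded from disk, and "relevance" is just a raw count of query-term occurrences, a stand-in for real scoring such as TF-IDF.

```python
from __future__ import annotations

import re
from typing import Union

# A query tree is a term string, ("NOT", subtree), or ("AND" | "OR", left, right).
Query = Union[str, tuple]

DOCS = {  # hypothetical documents
    1: "The cat sat on the mat",
    2: "The dog chased the cat",
    3: "A dog and a bird",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


# Step D stand-in: the inverted index is simply built in memory here.
INDEX: dict[str, set[int]] = {}
for doc_id, text in DOCS.items():
    for term in tokenize(text):
        INDEX.setdefault(term, set()).add(doc_id)


def evaluate(node: Query) -> set[int]:
    """Step E: fetch each term's document set and combine them with AND/OR/NOT."""
    if isinstance(node, str):
        return set(INDEX.get(node.lower(), set()))
    op = node[0]
    if op == "AND":
        return evaluate(node[1]) & evaluate(node[2])
    if op == "OR":
        return evaluate(node[1]) | evaluate(node[2])
    if op == "NOT":
        return set(DOCS) - evaluate(node[1])
    raise ValueError(f"unknown operator: {op!r}")


def query_terms(node: Query) -> list[str]:
    """All terms mentioned anywhere in the query tree."""
    if isinstance(node, str):
        return [node.lower()]
    return [t for child in node[1:] for t in query_terms(child)]


def search(tree: Query) -> list[int]:
    """Steps E to G: matching ids, most relevant first (ties broken by id)."""
    terms = query_terms(tree)

    def score(doc_id: int) -> int:
        words = tokenize(DOCS[doc_id])
        return sum(words.count(t) for t in terms)  # toy relevance: raw term counts

    return sorted(evaluate(tree), key=lambda d: (-score(d), d))


print(search(("OR", "cat", "dog")))
print(search(("AND", "cat", ("NOT", "dog"))))
```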