BurningBright


visual git cherry-pick & rebase

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Cherry Pick

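The cherry-pick command "copies" a commit: it creates a new commit on the current branch with the same message and the same changes as the original commit.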
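To make the idea concrete, here is a toy in-memory model in Python. It is not how git is implemented. In this model, a commit is a full snapshot of files plus a parent. Cherry-picking computes the change a commit introduces relative to its parent and applies that change on top of the current branch. The `Commit` class, the integer IDs standing in for SHA-1s, and the file names are all hypothetical. Real git performs this step as a three-way merge and can stop with conflicts, which this sketch ignores.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_id = count(1)  # stand-in for SHA-1 IDs: every new commit gets a fresh number

@dataclass(eq=False)
class Commit:
    message: str
    tree: dict                        # full snapshot: path -> file content
    parent: Optional["Commit"] = None
    id: int = field(default_factory=lambda: next(_next_id))

def changes(commit):
    """The patch a commit introduces relative to its parent (None = file deleted)."""
    before = commit.parent.tree if commit.parent else {}
    patch = {p: c for p, c in commit.tree.items() if before.get(p) != c}
    patch.update({p: None for p in before if p not in commit.tree})
    return patch

def cherry_pick(head, commit):
    """Apply `commit`'s patch on top of `head` as a brand-new commit (no conflict handling)."""
    tree = dict(head.tree)
    for path, content in changes(commit).items():
        if content is None:
            tree.pop(path, None)
        else:
            tree[path] = content
    return Commit(commit.message, tree, parent=head)

base = Commit("base", {"a.txt": "1"})
master = Commit("add m.txt", {"a.txt": "1", "m.txt": "m"}, parent=base)
fix = Commit("fix a.txt", {"a.txt": "2"}, parent=base)   # lives on another branch

picked = cherry_pick(master, fix)
print(picked.message, picked.tree)   # fix a.txt {'a.txt': '2', 'm.txt': 'm'}
print(picked.id != fix.id)           # True: same change, new commit
```

The picked commit carries the same message and change as `fix`. It has a different parent and a different ID, which is why cherry-picked commits never compare equal to their originals.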

Rebase

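Rebase is an alternative to merge for combining branches. A merge creates a single commit with two parents, leaving a non-linear history. A rebase replays the commits of the current branch onto another branch, leaving a linear history. In essence, a rebase is an automated way of performing several cherry-picks in a row.

The command illustrated here is run on the topic branch, not on master. It takes the commits that are on topic but not on master, replays them on top of master, and then moves topic to point at the new tip. Note that the old commits are no longer referenced and will eventually be garbage-collected.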

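To limit how far back the rebase goes, use the --onto option. The following command (git rebase --onto master 169a6) replays onto master only the commits on the current branch made after 169a6 (exclusive). Here that is just 2c33a.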
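Rebase as "several cherry-picks in a row" can be sketched with a toy model. In this hypothetical Python sketch:

- each commit carries only its patch, and history is linear;
- `upstream` is the commit after which replaying starts (for a plain rebase onto master, that is the fork point; real git instead computes the commits on topic that are not on master);
- replayed commits get a trailing `'` to show they are new objects.

169a6 and 2c33a come from the text; the other names are made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Commit:
    name: str                          # short ID, as in the figures
    patch: dict                        # toy: the change this commit introduces
    parent: Optional["Commit"] = None

def commits_after(tip, upstream):
    """Commits from `upstream` (exclusive) to `tip` (inclusive), oldest first."""
    chain = []
    while tip is not upstream:
        if tip is None:
            raise ValueError("upstream is not an ancestor of tip")
        chain.append(tip)
        tip = tip.parent
    return chain[::-1]

def rebase_onto(newbase, upstream, tip):
    """Toy `git rebase --onto newbase upstream`: cherry-pick each commit in order."""
    head = newbase
    for c in commits_after(tip, upstream):
        # A replayed commit is a new object (marked with ') carrying the same patch.
        head = Commit(c.name + "'", dict(c.patch), parent=head)
    return head  # the branch is moved here; the old commits become unreferenced

base = Commit("base", {})
master = Commit("master-tip", {"m.txt": "m"}, parent=base)
c169a6 = Commit("169a6", {"x.txt": "x"}, parent=base)
c2c33a = Commit("2c33a", {"y.txt": "y"}, parent=c169a6)

# Plain rebase onto master: replay everything after the fork point.
tip = rebase_onto(master, base, c2c33a)
print(tip.name, tip.parent.name, tip.parent.parent.name)  # 2c33a' 169a6' master-tip

# With --onto master 169a6, only the commits after 169a6 are replayed.
tip = rebase_onto(master, c169a6, c2c33a)
print(tip.name, tip.parent.name)  # 2c33a' master-tip
```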

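There is also git rebase --interactive, which makes more complex operations easier, such as dropping, reordering, editing, and squashing commits. There is no figure for this; see git-rebase(1) for details.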

visual git reset & merge

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Reset

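The reset command moves the current branch to another position and optionally updates the index and the working directory. It can also copy files from a commit in the history into the index without touching the working directory.

If a commit is given without file names, the current branch is moved to that commit and the index is updated to match it (this is the default, --mixed). With --hard, the working directory is updated too. With --soft, neither the index nor the working directory changes.

If no commit is given, it defaults to HEAD. The branch then stays where it is, but the index is reset to the last commit; with --hard, so is the working directory.

If file names (and/or -p) are given, reset works much like checkout with file names, except that only the index is updated, not the working directory.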
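The three reset modes differ only in which areas they overwrite. Below is a minimal Python sketch of that, assuming a toy `Repo` that stores snapshots as dictionaries. The class, the field names, and the file contents are hypothetical; only the mode semantics follow the text.

```python
from dataclasses import dataclass

@dataclass
class Repo:
    commits: dict    # commit id -> snapshot (path -> content)
    head: str        # commit the current branch points to
    index: dict      # staged snapshot
    worktree: dict   # files on disk

def reset(repo, commit=None, mode="mixed"):
    """Toy `git reset [--soft | --mixed | --hard] [<commit>]` (no paths)."""
    target = commit or repo.head          # no commit given -> HEAD
    repo.head = target                    # every mode moves the branch
    if mode in ("mixed", "hard"):         # --mixed is the default
        repo.index = dict(repo.commits[target])
    if mode == "hard":
        repo.worktree = dict(repo.commits[target])

def reset_paths(repo, paths, commit=None):
    """Toy `git reset [<commit>] -- <paths>`: copy into the index only."""
    snapshot = repo.commits[commit or repo.head]
    for path in paths:
        if path in snapshot:
            repo.index[path] = snapshot[path]
        else:
            repo.index.pop(path, None)    # file absent in that commit: unstage it

def fresh():
    return Repo({"b325c": {"a": "v1"}, "ed489": {"a": "v2"}},
                head="ed489", index={"a": "v3"}, worktree={"a": "v4"})

for mode in ("soft", "mixed", "hard"):
    r = fresh()
    reset(r, "b325c", mode)
    print(mode, r.head, r.index["a"], r.worktree["a"])
# soft b325c v3 v4
# mixed b325c v1 v4
# hard b325c v1 v1
```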

Merge

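The merge command combines branches by creating a new commit that incorporates the changes from other commits. Before merging, the index must match the current commit. In the trivial case, the other branch is an ancestor of the current commit, and merge does nothing. In the next simplest case, the current commit is an ancestor of the other branch, which results in a fast-forward merge. The branch reference simply moves forward and the new commit is checked out; no new commit is created.

Otherwise, a real merge happens. By default, git performs a three-way merge of the current commit (ed489 in the figure), the other commit (33104), and their common ancestor (b325c). The result is written to the working directory and the index. Then a new commit is made with 33104 as an extra parent.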
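The decision between "nothing to do", fast-forward, and a real three-way merge depends only on ancestry. The toy Python model below uses hypothetical classes; the IDs mirror the figure and the file contents are made up. It does three things:

- checks ancestry by walking parent links;
- finds the common ancestor;
- merges snapshots file by file.

Real git merges inside files hunk by hunk and offers several strategies. This sketch simply raises an error whenever both sides changed the same file differently.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)           # eq=False keeps identity hashing, so commits fit in sets
class Commit:
    name: str
    tree: dict                 # snapshot: path -> content
    parents: list = field(default_factory=list)

def ancestors(commit):
    """Every commit reachable from `commit` (itself included)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(c.parents)
    return seen

def merge_base(a, b):
    """A best common ancestor (toy: assumes there is exactly one)."""
    common = ancestors(a) & ancestors(b)
    best = [c for c in common
            if not any(d is not c and c in ancestors(d) for d in common)]
    return best[0]

def three_way(base, ours, theirs):
    """Per-file three-way merge of snapshots."""
    merged = {}
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o            # both sides agree (possibly both deleted)
        elif o == b:
            result = t            # only the other side changed it
        elif t == b:
            result = o            # only our side changed it
        else:
            raise ValueError(f"merge conflict in {path}")
        if result is not None:
            merged[path] = result
    return merged

def merge(current, other):
    if other in ancestors(current):
        return current                        # already merged: nothing to do
    if current in ancestors(other):
        return other                          # fast-forward: just move the branch
    base = merge_base(current, other)
    tree = three_way(base.tree, current.tree, other.tree)
    return Commit("merge", tree, parents=[current, other])  # extra parent: other

b325c = Commit("b325c", {"a": "1", "b": "1"})
ed489 = Commit("ed489", {"a": "2", "b": "1"}, [b325c])    # current branch
c33104 = Commit("33104", {"a": "1", "b": "2"}, [b325c])   # other branch

m = merge(ed489, c33104)
print(sorted(m.tree.items()), [p.name for p in m.parents])
# [('a', '2'), ('b', '2')] ['ed489', '33104']
print(merge(b325c, ed489).name)  # ed489 (fast-forward)
```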

visual git checkout

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Checkout

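The checkout command copies files from a commit in the history (or from the index) into the working directory. It can also be used to switch branches.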

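When a file name (and/or -p) is given, git copies the file from the given commit into both the index and the working directory. For example, git checkout HEAD~ foo.c copies foo.c from HEAD~ (the parent of the current commit) into the working directory and stages it. (If no commit is given, the file is copied from the index.) Note that the current branch does not change.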

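When no file name is given but a (local) branch is, HEAD moves to that branch (that is, we "switch" to it). The index and working directory are then updated to match the commit HEAD now points to. The update follows three rules:

- Every file in the new commit (a47c3 in the figure) is copied.
- Files that exist only in the old commit (ed489) are deleted.
- Files that are in neither commit are ignored and left untouched.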
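The three rules for updating files when switching branches can be written as a small function. This is a hypothetical Python sketch on dictionary snapshots. Real git also refuses to switch when uncommitted local changes would be overwritten; that check is omitted here.

```python
def checkout_branch(old_tree, new_tree, worktree):
    """Toy model of how `git checkout <branch>` updates the index and working directory.

    old_tree / new_tree: snapshots (path -> content) of the old and new commits.
    Returns (index, worktree) as new dictionaries.
    """
    index = dict(new_tree)                   # the index matches the new commit
    result = dict(worktree)
    for path in old_tree.keys() - new_tree.keys():
        result.pop(path, None)               # only in the old commit: deleted
    result.update(new_tree)                  # in the new commit: copied
    return index, result                     # anything else is left alone

old = {"a.c": "old", "gone.c": "x"}          # e.g. ed489 (hypothetical contents)
new = {"a.c": "new", "added.c": "y"}         # e.g. a47c3 (hypothetical contents)
wt = {"a.c": "old", "gone.c": "x", "notes.txt": "untracked"}

index, wt = checkout_branch(old, new, wt)
print(sorted(wt))  # ['a.c', 'added.c', 'notes.txt']
```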

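Sometimes you give neither a file name nor a branch name, but a tag, a remote branch, a SHA-1 ID, or something like master~3. You then get an anonymous branch known as a detached HEAD. This is handy for jumping around the history. For example, to build version 1.6.6.1 of git, you can run git checkout v1.6.6.1 (a tag, not a branch), then build and install. Afterwards, switch back to another branch with, say, git checkout master. Committing with a detached HEAD, however, works slightly differently, as described below.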

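Committing with a Detached HEAD

When HEAD is detached (not attached to any branch), commits work as usual, except that no named branch is updated. (You can think of this as updating an anonymous branch.)

Once you check out something else, say master, the commit is (presumably) never referenced again and eventually gets discarded. Note that after this command, nothing references 2eecb.

If you do want to keep this state, create a new named branch with git checkout -b name.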
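Whether a commit survives depends on whether any reference can still reach it by following parent links. The sketch below computes reachability in Python on a hypothetical history; only 2eecb comes from the text. Real git also counts reflog entries as references, so such commits stick around for a while before garbage collection actually removes them.

```python
def reachable(refs, parents):
    """IDs reachable from any ref by following parent links.

    refs: ref name -> commit ID; parents: commit ID -> list of parent IDs.
    """
    seen, stack = set(), list(refs.values())
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

# Hypothetical history: 2eecb was committed on a detached HEAD on top of ed489.
parents = {"b325c": [], "ed489": ["b325c"], "2eecb": ["ed489"]}

refs = {"master": "ed489"}          # after `git checkout master`, HEAD -> master
print(set(parents) - reachable(refs, parents))   # {'2eecb'}: unreachable

refs["name"] = "2eecb"              # what `git checkout -b name` would have kept
print(set(parents) - reachable(refs, parents))   # set()
```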

visual git diff & commit

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

Diff

There are various ways to look at the differences between commits. Some common examples are shown in the figure below.

Commit

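When you commit, git creates a new commit from the files in the index and sets its parent to the current commit. It then points the current branch at the new commit. In the figure, the current branch is master. Before the command runs, master points to ed489. Afterwards, master points to the new commit f0cec, whose parent is ed489.

The same thing happens even when the current branch is an ancestor of another branch. In the figure, a commit is made on the maint branch, an ancestor of master, producing 1800b. Afterwards, maint is no longer an ancestor of master, and a merge (or rebase) is needed to bring the two histories back together.

To fix a mistake in a commit, use git commit --amend. Git creates a new commit with the same parent as the current commit. The old commit is discarded if nothing else references it.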
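Commit and amend differ only in which commit becomes the new commit's parent. Below is a toy Python sketch with hypothetical classes; f0cec and ed489 mirror the figure, and the contents and the amended commit's name are made up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Commit:
    name: str
    tree: dict                       # snapshot taken from the index
    parent: Optional["Commit"]

def commit(branches, current, index, name):
    """Toy `git commit`: snapshot the index, parent = branch tip, move the branch."""
    new = Commit(name, dict(index), branches[current])
    branches[current] = new
    return new

def amend(branches, current, index, name):
    """Toy `git commit --amend`: reuse the tip's parent; the old tip loses its ref."""
    new = Commit(name, dict(index), branches[current].parent)
    branches[current] = new
    return new

ed489 = Commit("ed489", {"a.txt": "1"}, None)
branches = {"master": ed489}

f0cec = commit(branches, "master", {"a.txt": "2"}, "f0cec")
print(branches["master"].name, f0cec.parent.name)   # f0cec ed489

fixed = amend(branches, "master", {"a.txt": "2!"}, "fixed")   # hypothetical ID
print(branches["master"].name, fixed.parent.name)   # fixed ed489
```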

Committing with a detached HEAD is another special case, covered later.

visual git

Posted on 2017-02-08 | Edited on 2020-06-17 | In tool

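Basic Usage

The four commands below copy files between the working directory, the index (also called the stage), and the repository (the history of commits).

  • git add files copies files, in their current state, into the index.
  • git commit saves a snapshot of the index as a new commit.
  • git reset -- files unstages files: it copies them from the latest commit into the index, undoing a git add files. Run git reset with no file names to unstage everything.
  • git checkout -- files copies files from the index into the working directory, throwing away local changes.

You can use git reset -p, git checkout -p, or git add -p instead of (or in addition to) file names to choose interactively which hunks to copy.
It is also possible to bypass the index and check files out directly from the history, or commit files directly.

  • git commit -a is equivalent to running git add on all files that existed in the latest commit (new, untracked files are not included) and then running git commit.
  • git commit files creates a new commit containing the contents of the latest commit plus a snapshot of files taken from the working directory. Those files are also copied into the index.
  • git checkout HEAD -- files copies files from the latest commit into both the index and the working directory.

Conventions

In the rest of this series, figures take the following form.

Commits are shown in green as 5-character IDs, and each commit points to its parent. Branches are shown in orange and point to particular commits. The current branch is marked by the special reference HEAD, which is attached to it. The figure shows the last five commits, with ed489 the most recent. master (the current branch) points to this commit, while maint (another branch) points to an ancestor of master's commit.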

Markdown syntax

Posted on 2017-02-07 | Edited on 2020-09-17 | In tool
**Bold**
*Emphasize*
++Underline++
~~Strikethrough~~
==Highlight==
^Superscript^
~Subscript~

![pic](pic.jpg)
[git](http://github.com)
@[]()[?]

- unordered list
1. Ordered list
- [ ] Task[?]

[^id]
[^id]:xxx

`code`
\`\`\`code block\`\`\`

***page break

---section break

___sentence break

[TOC]

Unicode block

Posted on 2017-02-07 | Edited on 2020-09-17 | In regex
| Range | Block property (Java regex syntax) | Unicode block |
|---|---|---|
| 0000-007F | \p{InBasicLatin} | Basic Latin |
| 0080-00FF | \p{InLatin-1Supplement} | Latin-1 Supplement |
| 0100-017F | \p{InLatinExtended-A} | Latin Extended-A |
| 0180-024F | \p{InLatinExtended-B} | Latin Extended-B |
| 0250-02AF | \p{InIPAExtensions} | IPA Extensions |
| 02B0-02FF | \p{InSpacingModifierLetters} | Spacing Modifier Letters |
| 0300-036F | \p{InCombiningDiacriticalMarks} | Combining Diacritical Marks |
| 0370-03FF | \p{InGreekandCoptic} | Greek and Coptic |
| 0400-04FF | \p{InCyrillic} | Cyrillic |
| 0500-052F | \p{InCyrillicSupplement} | Cyrillic Supplement |
| 0530-058F | \p{InArmenian} | Armenian |
| 0590-05FF | \p{InHebrew} | Hebrew |
| 0600-06FF | \p{InArabic} | Arabic |
| 0700-074F | \p{InSyriac} | Syriac |
| 0750-077F | \p{InArabicSupplement} | Arabic Supplement |
| 0780-07BF | \p{InThaana} | Thaana |
| 07C0-07FF | \p{InNKo} | NKo |
| 0800-083F | \p{InSamaritan} | Samaritan |
| 0840-085F | \p{InMandaic} | Mandaic |
| 08A0-08FF | \p{InArabicExtended-A} | Arabic Extended-A |
| 0900-097F | \p{InDevanagari} | Devanagari |
| 0980-09FF | \p{InBengali} | Bengali |
| 0A00-0A7F | \p{InGurmukhi} | Gurmukhi |
| 0A80-0AFF | \p{InGujarati} | Gujarati |
| 0B00-0B7F | \p{InOriya} | Oriya |
| 0B80-0BFF | \p{InTamil} | Tamil |
| 0C00-0C7F | \p{InTelugu} | Telugu |
| 0C80-0CFF | \p{InKannada} | Kannada |
| 0D00-0D7F | \p{InMalayalam} | Malayalam |
| 0D80-0DFF | \p{InSinhala} | Sinhala |
| 0E00-0E7F | \p{InThai} | Thai |
| 0E80-0EFF | \p{InLao} | Lao |
| 0F00-0FFF | \p{InTibetan} | Tibetan |
| 1000-109F | \p{InMyanmar} | Myanmar |
| 10A0-10FF | \p{InGeorgian} | Georgian |
| 1100-11FF | \p{InHangulJamo} | Hangul Jamo |
| 1200-137F | \p{InEthiopic} | Ethiopic |
| 1380-139F | \p{InEthiopicSupplement} | Ethiopic Supplement |
| 13A0-13FF | \p{InCherokee} | Cherokee |
| 1400-167F | \p{InUnifiedCanadianAboriginalSyllabics} | Unified Canadian Aboriginal Syllabics |
| 1680-169F | \p{InOgham} | Ogham |
| 16A0-16FF | \p{InRunic} | Runic |
| 1700-171F | \p{InTagalog} | Tagalog |
| 1720-173F | \p{InHanunoo} | Hanunoo |
| 1740-175F | \p{InBuhid} | Buhid |
| 1760-177F | \p{InTagbanwa} | Tagbanwa |
| 1780-17FF | \p{InKhmer} | Khmer |
| 1800-18AF | \p{InMongolian} | Mongolian |
| 18B0-18FF | \p{InUnifiedCanadianAboriginalSyllabicsExtended} | Unified Canadian Aboriginal Syllabics Extended |
| 1900-194F | \p{InLimbu} | Limbu |
| 1950-197F | \p{InTaiLe} | Tai Le |
| 1980-19DF | \p{InNewTaiLue} | New Tai Lue |
| 19E0-19FF | \p{InKhmerSymbols} | Khmer Symbols |
| 1A00-1A1F | \p{InBuginese} | Buginese |
| 1A20-1AAF | \p{InTaiTham} | Tai Tham |
| 1B00-1B7F | \p{InBalinese} | Balinese |
| 1B80-1BBF | \p{InSundanese} | Sundanese |
| 1BC0-1BFF | \p{InBatak} | Batak |
| 1C00-1C4F | \p{InLepcha} | Lepcha |
| 1C50-1C7F | \p{InOlChiki} | Ol Chiki |
| 1CC0-1CCF | \p{InSundaneseSupplement} | Sundanese Supplement |
| 1CD0-1CFF | \p{InVedicExtensions} | Vedic Extensions |
| 1D00-1D7F | \p{InPhoneticExtensions} | Phonetic Extensions |
| 1D80-1DBF | \p{InPhoneticExtensionsSupplement} | Phonetic Extensions Supplement |
| 1DC0-1DFF | \p{InCombiningDiacriticalMarksSupplement} | Combining Diacritical Marks Supplement |
| 1E00-1EFF | \p{InLatinExtendedAdditional} | Latin Extended Additional |
| 1F00-1FFF | \p{InGreekExtended} | Greek Extended |
| 2000-206F | \p{InGeneralPunctuation} | General Punctuation |
| 2070-209F | \p{InSuperscriptsandSubscripts} | Superscripts and Subscripts |
| 20A0-20CF | \p{InCurrencySymbols} | Currency Symbols |
| 20D0-20FF | \p{InCombiningDiacriticalMarksforSymbols} | Combining Diacritical Marks for Symbols |
| 2100-214F | \p{InLetterlikeSymbols} | Letterlike Symbols |
| 2150-218F | \p{InNumberForms} | Number Forms |
| 2190-21FF | \p{InArrows} | Arrows |
| 2200-22FF | \p{InMathematicalOperators} | Mathematical Operators |
| 2300-23FF | \p{InMiscellaneousTechnical} | Miscellaneous Technical |
| 2400-243F | \p{InControlPictures} | Control Pictures |
| 2440-245F | \p{InOpticalCharacterRecognition} | Optical Character Recognition |
| 2460-24FF | \p{InEnclosedAlphanumerics} | Enclosed Alphanumerics |
| 2500-257F | \p{InBoxDrawing} | Box Drawing |
| 2580-259F | \p{InBlockElements} | Block Elements |
| 25A0-25FF | \p{InGeometricShapes} | Geometric Shapes |
| 2600-26FF | \p{InMiscellaneousSymbols} | Miscellaneous Symbols |
| 2700-27BF | \p{InDingbats} | Dingbats |
| 27C0-27EF | \p{InMiscellaneousMathematicalSymbols-A} | Miscellaneous Mathematical Symbols-A |
| 27F0-27FF | \p{InSupplementalArrows-A} | Supplemental Arrows-A |
| 2800-28FF | \p{InBraillePatterns} | Braille Patterns |
| 2900-297F | \p{InSupplementalArrows-B} | Supplemental Arrows-B |
| 2980-29FF | \p{InMiscellaneousMathematicalSymbols-B} | Miscellaneous Mathematical Symbols-B |
| 2A00-2AFF | \p{InSupplementalMathematicalOperators} | Supplemental Mathematical Operators |
| 2B00-2BFF | \p{InMiscellaneousSymbolsandArrows} | Miscellaneous Symbols and Arrows |
| 2C00-2C5F | \p{InGlagolitic} | Glagolitic |
| 2C60-2C7F | \p{InLatinExtended-C} | Latin Extended-C |
| 2C80-2CFF | \p{InCoptic} | Coptic |
| 2D00-2D2F | \p{InGeorgianSupplement} | Georgian Supplement |
| 2D30-2D7F | \p{InTifinagh} | Tifinagh |
| 2D80-2DDF | \p{InEthiopicExtended} | Ethiopic Extended |
| 2DE0-2DFF | \p{InCyrillicExtended-A} | Cyrillic Extended-A |
| 2E00-2E7F | \p{InSupplementalPunctuation} | Supplemental Punctuation |
| 2E80-2EFF | \p{InCJKRadicalsSupplement} | CJK Radicals Supplement |
| 2F00-2FDF | \p{InKangxiRadicals} | Kangxi Radicals |
| 2FF0-2FFF | \p{InIdeographicDescriptionCharacters} | Ideographic Description Characters |
| 3000-303F | \p{InCJKSymbolsandPunctuation} | CJK Symbols and Punctuation |
| 3040-309F | \p{InHiragana} | Hiragana |
| 30A0-30FF | \p{InKatakana} | Katakana |
| 3100-312F | \p{InBopomofo} | Bopomofo |
| 3130-318F | \p{InHangulCompatibilityJamo} | Hangul Compatibility Jamo |
| 3190-319F | \p{InKanbun} | Kanbun |
| 31A0-31BF | \p{InBopomofoExtended} | Bopomofo Extended |
| 31C0-31EF | \p{InCJKStrokes} | CJK Strokes |
| 31F0-31FF | \p{InKatakanaPhoneticExtensions} | Katakana Phonetic Extensions |
| 3200-32FF | \p{InEnclosedCJKLettersandMonths} | Enclosed CJK Letters and Months |
| 3300-33FF | \p{InCJKCompatibility} | CJK Compatibility |
| 3400-4DBF | \p{InCJKUnifiedIdeographsExtensionA} | CJK Unified Ideographs Extension A |
| 4DC0-4DFF | \p{InYijingHexagramSymbols} | Yijing Hexagram Symbols |
| 4E00-9FFF | \p{InCJKUnifiedIdeographs} | CJK Unified Ideographs |
| A000-A48F | \p{InYiSyllables} | Yi Syllables |
| A490-A4CF | \p{InYiRadicals} | Yi Radicals |
| A4D0-A4FF | \p{InLisu} | Lisu |
| A500-A63F | \p{InVai} | Vai |
| A640-A69F | \p{InCyrillicExtended-B} | Cyrillic Extended-B |
| A6A0-A6FF | \p{InBamum} | Bamum |
| A700-A71F | \p{InModifierToneLetters} | Modifier Tone Letters |
| A720-A7FF | \p{InLatinExtended-D} | Latin Extended-D |
| A800-A82F | \p{InSylotiNagri} | Syloti Nagri |
| A830-A83F | \p{InCommonIndicNumberForms} | Common Indic Number Forms |
| A840-A87F | \p{InPhags-pa} | Phags-pa |
| A880-A8DF | \p{InSaurashtra} | Saurashtra |
| A8E0-A8FF | \p{InDevanagariExtended} | Devanagari Extended |
| A900-A92F | \p{InKayahLi} | Kayah Li |
| A930-A95F | \p{InRejang} | Rejang |
| A960-A97F | \p{InHangulJamoExtended-A} | Hangul Jamo Extended-A |
| A980-A9DF | \p{InJavanese} | Javanese |
| AA00-AA5F | \p{InCham} | Cham |
| AA60-AA7F | \p{InMyanmarExtended-A} | Myanmar Extended-A |
| AA80-AADF | \p{InTaiViet} | Tai Viet |
| AAE0-AAFF | \p{InMeeteiMayekExtensions} | Meetei Mayek Extensions |
| AB00-AB2F | \p{InEthiopicExtended-A} | Ethiopic Extended-A |
| ABC0-ABFF | \p{InMeeteiMayek} | Meetei Mayek |
| AC00-D7AF | \p{InHangulSyllables} | Hangul Syllables |
| D7B0-D7FF | \p{InHangulJamoExtended-B} | Hangul Jamo Extended-B |
| D800-DB7F | \p{InHighSurrogates} | High Surrogates |
| DB80-DBFF | \p{InHighPrivateUseSurrogates} | High Private Use Surrogates |
| DC00-DFFF | \p{InLowSurrogates} | Low Surrogates |
| E000-F8FF | \p{InPrivateUseArea} | Private Use Area |
| F900-FAFF | \p{InCJKCompatibilityIdeographs} | CJK Compatibility Ideographs |
| FB00-FB4F | \p{InAlphabeticPresentationForms} | Alphabetic Presentation Forms |
| FB50-FDFF | \p{InArabicPresentationForms-A} | Arabic Presentation Forms-A |
| FE00-FE0F | \p{InVariationSelectors} | Variation Selectors |
| FE10-FE1F | \p{InVerticalForms} | Vertical Forms |
| FE20-FE2F | \p{InCombiningHalfMarks} | Combining Half Marks |
| FE30-FE4F | \p{InCJKCompatibilityForms} | CJK Compatibility Forms |
| FE50-FE6F | \p{InSmallFormVariants} | Small Form Variants |
| FE70-FEFF | \p{InArabicPresentationForms-B} | Arabic Presentation Forms-B |
| FF00-FFEF | \p{InHalfwidthandFullwidthForms} | Halfwidth and Fullwidth Forms |
| FFF0-FFFF | \p{InSpecials} | Specials |
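Python's built-in re module does not understand Java-style \p{In...} block properties. To use the table in Python, one option is to write a block as an explicit code-point range. Another is to look a character up in a sorted list of ranges. The sketch below copies a handful of rows from the table; the helper names and the sample string are illustrative.

```python
import bisect
import re

# A few rows copied from the table: (first code point, last code point, block).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
STARTS = [start for start, _, _ in BLOCKS]   # sorted, for binary search

def block_of(ch):
    """Name of the listed block containing `ch`, or None if it is not listed."""
    i = bisect.bisect_right(STARTS, ord(ch)) - 1
    if i >= 0 and ord(ch) <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None

# Same idea as Java's \p{InCJKUnifiedIdeographs}, written as a range.
CJK = re.compile(r"[\u4e00-\u9fff]")

sample = "Git \u5165\u95e8 \u3072\u3089\u304c\u306a \ud55c\uae00"   # "Git 入门 ひらがな 한글"
print(CJK.findall(sample))           # ['入', '门']
print([block_of(c) for c in "a\u3072\u5165\ud55c\u00e9"])
# ['Basic Latin', 'Hiragana', 'CJK Unified Ideographs', 'Hangul Syllables', None]
```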

Match whole words

Posted on 2017-02-06 | Edited on 2018-12-16 | In regex

Problem

My cat is brown
category
octocat
staccato

  • Find the word ‘cat’
  • Find words that begin with ‘cat’
  • Find words that end with ‘cat’
  • Find words that contain ‘cat’
  • Find words that contain ‘cat’ but do not begin with it
  • Find words that contain ‘cat’ but do not end with it
  • Find words that do not contain ‘cat’

Solution

Word boundaries

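\bcat\b

  • Regex options: None
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Nonboundaries

\Bcat\B

  • Regex options: None
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Partial words, with each anchor next to an equivalent lookaround:

\bcat      (?<!\w)cat     word begins with ‘cat’
cat\b      cat(?!\w)      word ends with ‘cat’
\Bcat      (?<=\w)cat     ‘cat’ not at the start of a word
cat\B      cat(?=\w)      ‘cat’ not at the end of a word
\b(?!\w*cat)\w+\b         whole word that does not contain ‘cat’

‹\b› is equivalent to ‹(?<=\w)(?!\w)|(?<!\w)(?=\w)›. (JavaScript supports lookbehind only from ES2018 on.)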
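A quick way to check these patterns is to run them against each word of the sample text. This Python 3 sketch uses the standard re module; the helper name is made up. Each word is tested on its own. This gives the same result as searching the full text, because the start and end of a string are word boundaries just like a space or a line break.

```python
import re

TEXT = "My cat is brown\ncategory\noctocat\nstaccato"
WORDS = re.findall(r"\w+", TEXT)

def words_matching(pattern):
    """Words of TEXT in which `pattern` finds a match."""
    return [w for w in WORDS if re.search(pattern, w)]

print(words_matching(r"\bcat\b"))   # ['cat']
print(words_matching(r"\bcat"))     # ['cat', 'category']
print(words_matching(r"cat\b"))     # ['cat', 'octocat']
print(words_matching(r"\Bcat"))     # ['octocat', 'staccato']
print(words_matching(r"cat\B"))     # ['category', 'staccato']
print(words_matching(r"\Bcat\B"))   # ['staccato']
print(re.findall(r"\b(?!\w*cat)\w+\b", TEXT))  # ['My', 'is', 'brown']

# Each anchor and its lookaround spelling select the same words:
for anchor, lookaround in [(r"\bcat", r"(?<!\w)cat"), (r"cat\b", r"cat(?!\w)"),
                           (r"\Bcat", r"(?<=\w)cat"), (r"cat\B", r"cat(?=\w)")]:
    assert words_matching(anchor) == words_matching(lookaround)
```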

Discussion

‹\b› matches in these three positions:

  • Before the first character in the subject, if the first character is a word character
  • After the last character in the subject, if the last character is a word character
  • Between two characters in the subject, where one is a word character and the other
    is not a word character

‹\B› matches in these five positions:

  • Before the first character in the subject, if the first character is not a word character
  • After the last character in the subject, if the last character is not a word character
  • Between two word characters
  • Between two nonword characters
  • The empty string
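To see the boundary positions for a concrete subject, the Python sketch below reports every index where a zero-width pattern matches. It also checks that ‹\b› agrees with its lookaround spelling. The helper and constant names are made up.

```python
import re

BOUNDARY_AS_LOOKAROUND = r"(?<=\w)(?!\w)|(?<!\w)(?=\w)"

def positions(pattern, subject):
    """Every index of `subject` at which the zero-width `pattern` matches."""
    return [m.start() for m in re.finditer(pattern, subject)]

s = "cat!"
print(positions(r"\b", s))   # [0, 3]     before 'c'; between 't' and '!'
print(positions(r"\B", s))   # [1, 2, 4]  between word characters; after the final '!'
print(positions(BOUNDARY_AS_LOOKAROUND, s) == positions(r"\b", s))  # True
```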

Word Characters

  • Java:
    • Java 4 to 6: ‹\w› matches only ASCII characters.
    • Java 7 and later: ‹\w› matches only ASCII characters by default, but matches Unicode characters if you set the UNICODE_CHARACTER_CLASS flag.
    • In all Java versions, ‹\b› is Unicode-enabled, supporting any script.
  • In .NET, JavaScript, PCRE, Perl, Python, and Ruby, ‹\b› and ‹\B› are defined in terms of ‹\w›:
    • ‹\b› matches between two characters where one is matched by ‹\w› and the other by ‹\W›.
    • ‹\B› matches between two characters that are both matched by ‹\w› or both matched by ‹\W›.
  • JavaScript, PCRE, and Ruby: by default ‹\w› is identical to ‹[a-zA-Z0-9_]›. A “whole words only” search therefore works only in languages written with the letters A to Z without diacritics, such as English.
  • .NET: letters and digits from all scripts are word characters, so you can do a “whole words only” search on words in any language.

  • Python 2.x: non-ASCII characters are included only if you pass the UNICODE or U flag when creating the regex.

  • Python 3.x: non-ASCII characters are included by default, but you can exclude them with the ASCII or A flag. This flag affects both ‹\b› and ‹\w› equally.

  • Perl: whether ‹\w› is pure ASCII or includes all Unicode letters, digits, and underscores depends on your version of Perl and on the /a, /d, /l, and /u modifiers.
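In Python 3 the difference between Unicode and ASCII word characters is one flag away. Here is a small sketch with the standard re module; the sample words are illustrative.

```python
import re

text = "na\u00efve caf\u00e9"   # "naïve café" (precomposed ï and é)

print(re.findall(r"\w+", text))            # ['naïve', 'café']
print(re.findall(r"\w+", text, re.ASCII))  # ['na', 've', 'caf']

# \b follows the same definition of word characters, so it changes too:
print(re.search(r"\bcaf\b", text))                        # None
print(re.search(r"\bcaf\b", text, re.ASCII) is not None)  # True
```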

Match "start" & "end"

Posted on 2017-02-06 | Edited on 2018-12-16 | In regex

Problem

alpha…..↵
alpha…..↵
begin…..↵
…….end↵
….omega↵
….omega↵

  • Match 'alpha' only at the very beginning of the subject
  • Match both 'alpha's, at the start of each line
  • Match 'omega' only at the very end of the subject
  • Match both 'omega's, at the end of each line
  • Match 'begin' at the start of a line
  • Match 'end' at the end of a line

Solution

Start of the subject

^alpha
  • Regex options: None (the “^ and $ match at line breaks” option must be off; otherwise both lines match)
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

\Aalpha
  • Regex options: None
  • Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

End of the subject

omega$
  • Regex options: None (the “^ and $ match at line breaks” option must be off; otherwise both lines match)
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

omega\Z
  • Regex options: None
  • Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby

Start of a line

^begin
  • Regex options: ^ and $ match at line breaks
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

End of a line

end$
  • Regex options: ^ and $ match at line breaks
  • Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
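These solutions can be tried in Python, where the “^ and $ match at line breaks” option is spelled re.MULTILINE (re.M). This sketch counts matches in the problem text, written with plain periods and with \n for ↵. The helper name is hypothetical.

```python
import re

TEXT = ("alpha.....\nalpha.....\nbegin.....\n"
        ".......end\n....omega\n....omega\n")

def count(pattern, flags=0):
    """Number of non-overlapping matches of `pattern` in TEXT."""
    return len(re.findall(pattern, TEXT, flags))

print(count(r"^alpha"), count(r"^alpha", re.M))      # 1 2
print(count(r"\Aalpha", re.M))                       # 1  (\A ignores re.M)
print(count(r"omega$"), count(r"omega$", re.M))      # 1 2
print(count(r"^begin", re.M), count(r"end$", re.M))  # 1 1
print(count(r"^begin"), count(r"end$"))              # 0 0
```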

Discussion

  • JavaScript does not support ‹\A›.
  • The anchor ‹^› is equivalent to ‹\A›, as long as you do not turn on the “^ and $ match at line breaks” option.
  • The anchor ‹$› is equivalent to ‹\Z›, as long as you do not turn on the “^ and $ match at line breaks” option.

    • In Java, this is the Pattern.MULTILINE option, documented in the JDK source as:
      /**
      * Enables multiline mode.
      *
      * <p> In multiline mode the expressions <tt>^</tt> and <tt>$</tt> match
      * just after or just before, respectively, a line terminator or the end of
      * the input sequence. By default these expressions only match at the
      * beginning and the end of the entire input sequence.
      *
      * <p> Multiline mode can also be enabled via the embedded flag
      * expression&nbsp;<tt>(?m)</tt>. </p>
      */
  • The anchors ‹\Z› and ‹\z› always match at the very end of the subject text, after the last character, whether or not “^ and $ match at line breaks” is on.

  • ‹\Z› also matches before a line break that ends the subject. You can therefore use it without having to strip off a trailing line break first.
  • ‹\z› matches only at the very end; a trailing line break is not ignored.

  • JavaScript does not support ‹\Z› or ‹\z› at all
  • .NET, Java, PCRE, Perl, and Ruby support both ‹\Z› and ‹\z›.
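  • Python supports only ‹\Z›, and it behaves like ‹\z› in the other flavors: it matches only at the very end of the subject, never before a trailing line break.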
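In Python, the "ignore a trailing line break" behavior belongs to $ (without re.MULTILINE), not to \Z. Here is a short check with the standard re module; the sample string is made up.

```python
import re

subject = "alpha\nomega\n"   # ends with a line break

# `$` also matches just before a final "\n" ...
print(re.search(r"omega$", subject) is not None)    # True
# ... but Python's \Z is strict, like \z in .NET, Java, PCRE, Perl, and Ruby.
print(re.search(r"omega\Z", subject) is not None)   # False
print(re.search(r"omega\Z", subject.rstrip("\n")) is not None)  # True
```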

Variations

  • .NET, Java, XRegExp, PCRE, Perl, and Python: the mode modifier ‹(?m)› turns on “^ and $ match at line breaks”.
  • Ruby uses ‹(?m)› to turn on “dot matches line breaks” instead.
  • In Ruby, ‹^› and ‹$› always match at the start and end of each line.
  • ‹(?-m)› turns the option off again (Python accepts this only in the scoped form ‹(?-m:…)›).
  • ‹(?i)› turns on case-insensitive matching.
  • ‹(?s)› turns on “dot matches line breaks” (in all of these flavors except Ruby).
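Python's re accepts these inline modifiers at the start of a pattern, so the variations can be tried directly. Here is a small sketch; the sample text is made up.

```python
import re

text = "First line\nSECOND line"

print(re.findall(r"(?m)^\w+", text))                    # ['First', 'SECOND']
print(re.findall(r"^\w+", text))                        # ['First']
print(re.search(r"(?i)second", text) is not None)       # True
print(re.search(r"line.SECOND", text) is not None)      # False: '.' stops at '\n'
print(re.search(r"(?s)line.SECOND", text) is not None)  # True
```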

Full-text retrieval fundamental

Posted on 2017-02-06 | Edited on 2018-12-16 | In search

Build Index

  1. Prepare the data source: the documents to be indexed.
  2. Run lexical analysis and language processing to turn the text into terms.
  3. Build the term dictionary and the posting lists (the inverted index).
  4. Write the index to disk or other storage.
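Steps 2 and 3 can be sketched in a few lines of Python. This is a toy inverted index, not a real search engine. The analyzer just lower-cases the text and splits it on non-word characters. The stop-word list and documents are hypothetical, and step 4 (writing to disk) is left out.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "is", "of", "the"}   # hypothetical stop-word list

def analyze(text):
    """Step 2 (toy): lower-case, split into word tokens, drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]

def build_index(docs):
    """Step 3: term -> posting list of (doc_id, term frequency), sorted by doc_id."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return {term: sorted(freqs.items()) for term, freqs in sorted(postings.items())}

docs = {                                  # step 1: hypothetical documents
    1: "Git is a version control system",
    2: "The index of a book",
    3: "Git stores the index in .git/index",
}
index = build_index(docs)
print(index["git"])     # [(1, 1), (3, 2)]
print(index["index"])   # [(2, 1), (3, 2)]
```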

Query process

  • A. Parse the input query.
  • B. Run lexical analysis and language processing to turn the query into terms.
  • C. Run syntax analysis to build a query tree.
  • D. Load the index from disk into memory.
  • E. Using the query tree, fetch each term's document list, then combine the lists with AND/OR/NOT operations to get the result documents.
  • F. Sort the result documents by relevance.
  • G. Return the query result.
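Steps A to G can be sketched against a tiny in-memory index. In this hypothetical Python sketch:

- operators are written in upper case and applied left to right, with no parentheses or precedence;
- AND, OR, and NOT map to set intersection, union, and difference;
- "relevance" is simply the summed term frequency.

Real engines use scoring models such as TF-IDF or BM25.

```python
import re

# Hypothetical index, already loaded into memory: term -> {doc_id: term frequency}.
INDEX = {
    "git":   {1: 1, 3: 2},
    "index": {2: 1, 3: 2},
    "book":  {2: 1},
}
ALL_DOCS = {1, 2, 3}

def parse(query):
    """Steps A-C: tokenize and build a left-associative query tree of tuples."""
    tree, op, negate = None, "AND", False
    for tok in re.findall(r"\w+", query):
        if tok in ("AND", "OR"):
            op = tok
        elif tok == "NOT":
            negate = True
        else:
            node = ("TERM", tok.lower())
            if negate:
                node, negate = ("NOT", node), False
            tree = node if tree is None else (op, tree, node)
            op = "AND"                      # adjacent terms default to AND
    return tree

def evaluate(node):
    """Step E: combine document sets with set operations."""
    if node[0] == "TERM":
        return set(INDEX.get(node[1], {}))
    if node[0] == "NOT":
        return ALL_DOCS - evaluate(node[1])
    left, right = evaluate(node[1]), evaluate(node[2])
    return left & right if node[0] == "AND" else left | right

def positive_terms(node):
    """Terms that count toward the score (negated terms do not)."""
    if node[0] == "TERM":
        return [node[1]]
    if node[0] == "NOT":
        return []
    return positive_terms(node[1]) + positive_terms(node[2])

def search(query):
    """Steps F-G: rank hits by summed term frequency, then by doc id."""
    tree = parse(query)
    terms = positive_terms(tree)
    def score(doc):
        return sum(INDEX.get(t, {}).get(doc, 0) for t in terms)
    return sorted(evaluate(tree), key=lambda doc: (-score(doc), doc))

print(search("git AND index"))        # [3]
print(search("git OR index"))         # [3, 1, 2]
print(search("index AND NOT book"))   # [3]
```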

Leon

© 2017 – 2024 Leon