File: ucs-normalize.el.html

This program passes NormalizationTest-5.2.0.txt.

References: https://www.unicode.org/reports/tr15/ https://www.unicode.org/review/pr-29.html

HFS-Normalization: Reference: https://developer.apple.com/library/archive/technotes/tn/tn1150.html

HFS normalization excludes the following ranges from decomposition:

 U+02000 .. U+02FFF :: Punctuation, symbols, dingbats, arrows, etc.
                       (Characters in this region will be composed.)
 U+0F900 .. U+0FAFF :: CJK compatibility Ideographs.
 U+2F800 .. U+2FFFF :: CJK compatibility Ideographs.

HFS-Normalization is useful for normalizing text involving CJK Ideographs.
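
The exclusion rule above can be sketched in Python. This is only an approximation under stated assumptions: it uses Python's `unicodedata` module (whose Unicode version differs from the one this library targets), the names `HFS_EXCLUDED_RANGES`, `in_excluded_range` and `hfs_like_nfd` are hypothetical, and canonical reordering across the boundary of an excluded run is ignored.

```python
import unicodedata
from itertools import groupby

# Ranges excluded from decomposition, copied from the list above.
HFS_EXCLUDED_RANGES = [(0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FFFF)]


def in_excluded_range(ch):
    """Return True if CH lies in one of the excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HFS_EXCLUDED_RANGES)


def hfs_like_nfd(text):
    """Approximate HFS-NFD: decompose canonically, but leave excluded runs untouched."""
    parts = []
    for excluded, run in groupby(text, key=in_excluded_range):
        chunk = "".join(run)
        parts.append(chunk if excluded else unicodedata.normalize("NFD", chunk))
    return "".join(parts)
```

With this sketch, U+00E9 is decomposed to "e" plus U+0301, while the CJK compatibility ideograph U+F900 is left as it is, whereas plain NFD would replace it with a unified ideograph.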

Implementation Notes on NFC/HFS-NFC.

   <Stages>  Decomposition  Composition
   NFD:      'nfd           nil
   NFC:      'nfd           t
   NFKD:     'nfkd          nil
   NFKC:     'nfkd          t
   HFS-NFD:  'hfs-nfd       'hfs-nfd-comp-p
   HFS-NFC:  'hfs-nfd       t
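
The two-stage view in the table (a decomposition kind, then an optional composition) can be illustrated for the four standard forms. This is a sketch in Python with `unicodedata` as a stand-in; `STAGES` and `normalize_in_stages` are hypothetical names, and the HFS rows are left out because the standard library has no equivalent.

```python
import unicodedata

# form -> (decomposition kind, compose afterwards?), mirroring the table above.
STAGES = {
    "NFD": ("canonical", False),
    "NFC": ("canonical", True),
    "NFKD": ("compatibility", False),
    "NFKC": ("compatibility", True),
}


def normalize_in_stages(form, text):
    """Normalize TEXT by decomposing first and composing only if the form asks for it."""
    kind, compose = STAGES[form]
    decomposed = unicodedata.normalize(
        "NFD" if kind == "canonical" else "NFKD", text
    )
    # Composition is always canonical, even after a compatibility decomposition.
    return unicodedata.normalize("NFC", decomposed) if compose else decomposed
```

For any input, the result should agree with calling `unicodedata.normalize(form, text)` directly, which is a convenient way to check the staging.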

Algorithm for Normalization

Before normalization, the following data are prepared.

1. quick-check-list

 quick-check-list consists of characters that will be decomposed
 during normalization. It includes composition-exclusions,
 singletons, non-starter-decompositions and decomposable
 characters.

 quick-check-regexp will search the above characters plus
 combining characters.

2. decomposition-translation

 decomposition-translation is a translation table that will be
 used to decompose the characters.
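
A rough Python sketch of how such data could be prepared for canonical decomposition is given here. It is an assumption-laden illustration, not the library's code: the scan is limited to an arbitrary small range to keep it fast, only one-step canonical mappings are stored, and the names `canonical_decomposition`, `decomposition_translation`, `quick_check_list` and `quick_check_regexp` are reused only for readability.

```python
import re
import unicodedata


def canonical_decomposition(ch):
    """Return the one-step canonical decomposition of CH as a string, or None."""
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):  # none, or compatibility-only
        return None
    return "".join(chr(int(f, 16)) for f in fields)


# Arbitrary small scan ranges, chosen only to keep the example quick.
decomposition_translation = {}
for cp in range(0x00A0, 0x0250):
    mapping = canonical_decomposition(chr(cp))
    if mapping is not None:
        decomposition_translation[chr(cp)] = mapping

quick_check_list = sorted(decomposition_translation)
combining_chars = [
    chr(cp) for cp in range(0x0300, 0x0370) if unicodedata.combining(chr(cp)) != 0
]
# Matches any decomposable character or combining character in the scanned ranges.
quick_check_regexp = re.compile(
    "[" + re.escape("".join(quick_check_list + combining_chars)) + "]"
)
```

Text with no match for `quick_check_regexp` can be skipped entirely, which is the point of the quick check.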


Normalization Process

A. Searching (ucs-normalize-region)

   The region is searched with quick-check-regexp to find points
   that may need normalization.

B. Identification of Normalization Block

   (1) start of the block
       If the matched character is a starter and does not combine
       with the previous character, the block begins at the matched
       character. If the matched character is a combining character,
       the block begins at the previous character.
   (2) end of the block
       The block ends at a non-composable starter character.
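
The block identification can be sketched as follows. This is a simplified Python illustration with a hypothetical function name, `block_bounds`: a "starter" is taken to be any character whose canonical combining class is 0, and starters that compose with a preceding character (Hangul jamo, for instance) are deliberately ignored, so the block simply ends at the next starter.

```python
import unicodedata


def block_bounds(text, pos):
    """Return (start, end) of the block around text[pos]; END is exclusive."""
    start = pos
    # A combining character belongs to the block of the preceding starter.
    while start > 0 and unicodedata.combining(text[start]) != 0:
        start -= 1
    end = pos + 1
    # Simplification: stop at the next character with combining class 0,
    # without checking whether that starter could still compose.
    while end < len(text) and unicodedata.combining(text[end]) != 0:
        end += 1
    return start, end
```

For the text "x", "e", U+0301, U+0323, "y" and a match at index 2, this returns the span covering "e" and both marks.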

C. Decomposition (ucs-normalize-block)

   The entire block is decomposed using the
   decomposition-translation table.

D. Sorting and Composition of Smaller Blocks (ucs-normalize-block-compose-chars)

   The block is split into smaller blocks at starter characters.
   Each smaller block is sorted, and composed if necessary.
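
Splitting and sorting can be sketched in Python as follows, assuming that sorting means ordering by canonical combining class as in Unicode's canonical ordering. The helper names `split_at_starters` and `sort_block` are hypothetical.

```python
import unicodedata


def split_at_starters(chars):
    """Split CHARS into lists, each beginning at a character with class 0."""
    blocks = []
    for ch in chars:
        if unicodedata.combining(ch) == 0 or not blocks:
            blocks.append([ch])
        else:
            blocks[-1].append(ch)
    return blocks


def sort_block(block):
    """Order a block by combining class.

    Python's sort is stable, so marks of equal class keep their relative
    order, and a leading starter (class 0) stays in front.
    """
    return sorted(block, key=unicodedata.combining)
```

For "a", U+0301 (class 230), U+0323 (class 220), the sorted block is "a", U+0323, U+0301, which matches the NFD form of that sequence.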

E. Composition of Entire Block (ucs-normalize-compose-chars)

  The composed blocks are collected and composed again.
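
Composition by pair lookup can be sketched in Python. This follows the general shape of the canonical composition algorithm in UAX #15 rather than this library's code. `build_pair_table`, `PAIRS` and `compose_chars` are hypothetical names, the table covers only an arbitrary code point range, Hangul is not handled, and composition exclusions are approximated by keeping only characters that NFC leaves unchanged.

```python
import unicodedata


def build_pair_table(codepoints):
    """Map (first, second) -> primary composite for the given code points."""
    table = {}
    for cp in codepoints:
        ch = chr(cp)
        fields = unicodedata.decomposition(ch).split()
        # Keep two-character canonical decompositions only.
        if len(fields) != 2 or fields[0].startswith("<"):
            continue
        # Assumption: a character that NFC rewrites is composition-excluded.
        if unicodedata.normalize("NFC", ch) != ch:
            continue
        table[(chr(int(fields[0], 16)), chr(int(fields[1], 16)))] = ch
    return table


PAIRS = build_pair_table(range(0x00C0, 0x2000))


def compose_chars(chars):
    """Compose a canonically ordered list of characters using PAIRS."""
    if not chars:
        return []
    result = [chars[0]]
    starter_pos = 0
    last_class = unicodedata.combining(chars[0])
    if last_class != 0:
        last_class = 256  # input begins with a mark: nothing may compose with it
    for ch in chars[1:]:
        ch_class = unicodedata.combining(ch)
        composite = PAIRS.get((result[starter_pos], ch))
        # A mark is blocked if a kept mark of equal or higher class precedes it.
        if composite is not None and (last_class < ch_class or last_class == 0):
            result[starter_pos] = composite
        else:
            if ch_class == 0:
                starter_pos = len(result)
            last_class = ch_class
            result.append(ch)
    return result
```

Feeding the NFD form of a string through `compose_chars` should reproduce its NFC form for characters covered by the table.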

Defined variables (2)

ucs-normalize-combining-chars-regexp: Regular expression to match a sequence of combining characters.
ucs-normalize-decomposition-pair-to-primary-composite: Hash table mapping a decomposed pair to its primary composite.

Defined functions (25)

ucs-normalize-HFS-NFC-region(FROM TO)
ucs-normalize-HFS-NFC-string(STR)
ucs-normalize-HFS-NFD-region(FROM TO)
ucs-normalize-HFS-NFD-string(STR)
ucs-normalize-NFC-region(FROM TO)
ucs-normalize-NFC-string(STR)
ucs-normalize-NFD-region(FROM TO)
ucs-normalize-NFD-string(STR)
ucs-normalize-NFKC-region(FROM TO)
ucs-normalize-NFKC-string(STR)
ucs-normalize-NFKD-region(FROM TO)
ucs-normalize-NFKD-string(STR)
ucs-normalize-block(FROM TO &optional DECOMPOSITION-TRANSLATION-TABLE COMPOSITION-PREDICATE)
ucs-normalize-block-compose-chars(CHARS COMPOSITION-PREDICATE)
ucs-normalize-ccc(CHAR)
ucs-normalize-compose-chars(CHARS COMPOSITION-PREDICATE)
ucs-normalize-hfs-nfd-comp-p(CHAR)
ucs-normalize-hfs-nfd-post-read-conversion(LEN)
ucs-normalize-hfs-nfd-pre-write-conversion(FROM TO)
ucs-normalize-make-hash-table-from-alist(ALIST)
ucs-normalize-make-translation-table-from-alist(ALIST)
ucs-normalize-primary-composite(DECOMPOSITION-PAIR COMPOSITION-PREDICATE)
ucs-normalize-region(FROM TO QUICK-CHECK-REGEXP TRANSLATION-TABLE COMPOSITION-PREDICATE)
ucs-normalize-sort(CHARS)
ucs-normalize-string(UCS-NORMALIZE-REGION)

Defined faces (0)