  1. Jan 14, 2025
  2. Feb 14, 2022
  3. Jan 21, 2022
  4. Jan 17, 2022
  5. Oct 12, 2021
  6. Oct 11, 2021
  7. May 01, 2021
  8. Sep 10, 2020
  9. Mar 25, 2020
  10. Feb 03, 2020
    • kbuild: rename hostprogs-y/always to hostprogs/always-y · 5f2fb52f
      Masahiro Yamada authored
      
      In the old days, the "host-progs" syntax was used for specifying host
      programs. It was renamed to the current "hostprogs-y" in 2004.
      
      The -y suffix is typically useful in scripts/Makefile because it allows
      Kbuild to selectively compile host programs based on the kernel configuration.
      
      This commit renames them as follows:
      
        always       ->  always-y
        hostprogs-y  ->  hostprogs
      
      So, scripts/Makefile will look like this:
      
        always-$(CONFIG_BUILD_BIN2C) += ...
        always-$(CONFIG_KALLSYMS)    += ...
            ...
        hostprogs := $(always-y) $(always-m)
      
      I think this makes more sense because a host program is always a host
      program, irrespective of the kernel configuration. We want to specify
      which ones to compile by CONFIG options, so always-y will be handier.
      
      The "always", "hostprogs-y", "hostprogs-m" will be kept for backward
      compatibility for a while.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
  11. Sep 17, 2019
  12. Jun 20, 2019
  13. Jun 05, 2019
  14. May 21, 2019
  15. May 12, 2019
  16. Apr 28, 2019
    • unicode: refactor the rule for regenerating utf8data.h · 28ba53c0
      Masahiro Yamada authored
      
      scripts/mkutf8data is used only when regenerating utf8data.h,
      which never happens in the normal kernel build. However, it is built
      unconditionally whenever CONFIG_UNICODE is enabled.
      
      Moreover, there is no good reason for it to reside in the scripts/
      directory since it is only used in fs/unicode/.
      
      Hence, move it from scripts/ to fs/unicode/.
      
      In some cases, we skip regenerating build artifacts in the normal build.
      The conventional way to do so is to surround the generation rule with
      ifdef REGENERATE_*.
      
      For example,
      
       - 7373f4f8 ("kbuild: add implicit rules for parser generation")
       - 6aaf49b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")
      
      I rewrote the rule in a more kbuild'ish style.
      
      In the normal build, utf8data.h is just shipped from the checked-in file.
      
      $ make
        [ snip ]
        SHIPPED fs/unicode/utf8data.h
        CC      fs/unicode/utf8-norm.o
        CC      fs/unicode/utf8-core.o
        CC      fs/unicode/utf8-selftest.o
        AR      fs/unicode/built-in.a
      
      If you want to generate utf8data.h based on UCD, put *.txt files into
      fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
      The mkutf8data tool will be automatically compiled to generate the
      utf8data.h from the *.txt files.
      
      $ make REGENERATE_UTF8DATA=1
        [ snip ]
        HOSTCC  fs/unicode/mkutf8data
        GEN     fs/unicode/utf8data.h
        CC      fs/unicode/utf8-norm.o
        CC      fs/unicode/utf8-core.o
        CC      fs/unicode/utf8-selftest.o
        AR      fs/unicode/built-in.a
      
      I renamed the checked-in utf8data.h to utf8data.h_shipped so that this
      also works for out-of-tree builds.
      
      You can update it based on the latest UCD like this:
      
      $ make REGENERATE_UTF8DATA=1 fs/unicode/
      $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped
      
      Also, I added entries to .gitignore and dontdiff.
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  17. Apr 25, 2019
    • unicode: update unicode database unicode version 12.1.0 · 1215d239
      Gabriel Krisman Bertazi authored
      
      Regenerate utf8data.h based on the latest UCD files and run tests
      against the latest version.
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • unicode: introduce test module for normalized utf8 implementation · f0d6cc00
      Gabriel Krisman Bertazi authored
      
      This implements an in-kernel sanity test module for the utf8
      normalization core.  At probe time, it runs basic sequences through
      the utf8n core to identify problems with equivalent sequences and
      with the normalization/casefold code.  This is supposed to be useful for
      regression testing when adding support for a new version of utf8 to
      linux.
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • unicode: implement higher level API for string handling · 9d53690f
      Gabriel Krisman Bertazi authored
      
      This patch integrates the utf8n patches with a higher-level API to
      perform UTF-8 string comparison, normalization and casefolding
      operations.  The implementation uses a variation of NFD, and casefolding
      is performed by applying a full casefold on top of NFD.  These algorithms are
      based on the core implemented by Olaf Weber from SGI.
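
      A minimal sketch of the compare-by-normalizing idea, in C. The helper
      names toy_fold() and fold_compare() are hypothetical. toy_fold() is only
      a stand-in: it lowercases ASCII A-Z and copies every other byte. The
      real code applies NFD followed by a full casefold, driven by the
      generated UCD tables. What the sketch shows is the shape of the
      operation: normalize both inputs, then compare the resulting bytes.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical stand-in normalizer: lowercases ASCII A-Z and copies
       * every other byte unchanged. This is NOT NFD or a full casefold.
       * Returns the output length, or (size_t)-1 if out is too small. */
      static size_t toy_fold(const char *in, size_t len, char *out, size_t outsz)
      {
          size_t i;

          if (len > outsz)
              return (size_t)-1;
          for (i = 0; i < len; i++) {
              unsigned char c = (unsigned char)in[i];

              out[i] = (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : (char)c;
          }
          return len;
      }

      /* Normalize both strings into scratch buffers, then compare the bytes.
       * Returns 0 if equal, 1 if different, -1 if an input is too long. */
      static int fold_compare(const char *a, size_t alen,
                              const char *b, size_t blen)
      {
          char fa[256], fb[256];
          size_t na = toy_fold(a, alen, fa, sizeof(fa));
          size_t nb = toy_fold(b, blen, fb, sizeof(fb));

          if (na == (size_t)-1 || nb == (size_t)-1)
              return -1;
          if (na != nb)
              return 1;
          return memcmp(fa, fb, na) != 0;
      }
      ```

      For example, fold_compare("Hello", 5, "hELLO", 5) returns 0 because both
      sides fold to "hello". The fixed 256-byte scratch buffers keep the sketch
      short, so it simply rejects longer inputs.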
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • unicode: reduce the size of utf8data[] · a8384c68
      Olaf Weber authored
      
      Remove the Hangul decompositions from the utf8data trie, and do
      algorithmic decomposition to calculate them on the fly. To store the
      decomposition, the caller of utf8lookup()/utf8nlookup() must provide a
      12-byte buffer, which is used to synthesize a leaf with the
      decomposition. This significantly reduces the size of the utf8data[]
      array.
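
      A sketch of algorithmic Hangul decomposition, using the constants and
      arithmetic from the Unicode Standard (section 3.12, Conjoining Jamo
      Behavior). The function names are hypothetical. The sketch writes only
      the jamo as UTF-8 and does not reproduce the synthesized-leaf layout
      that the kernel places in its 12-byte buffer. Every conjoining jamo
      lies in U+1100..U+11FF and encodes in three bytes, so a syllable needs
      at most 9 bytes of output.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Constants from the Unicode Standard, section 3.12. */
      #define SBASE  0xAC00u
      #define LBASE  0x1100u
      #define VBASE  0x1161u
      #define TBASE  0x11A7u
      #define LCOUNT 19u
      #define VCOUNT 21u
      #define TCOUNT 28u
      #define NCOUNT (VCOUNT * TCOUNT)   /* 588 */
      #define SCOUNT (LCOUNT * NCOUNT)   /* 11172 */

      /* All conjoining jamo lie in U+1100..U+11FF, so each one is a
       * three-byte UTF-8 sequence. */
      static size_t put_jamo_utf8(uint32_t cp, unsigned char *out)
      {
          out[0] = (unsigned char)(0xE0 | (cp >> 12));
          out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
          out[2] = (unsigned char)(0x80 | (cp & 0x3F));
          return 3;
      }

      /* Decompose a precomposed Hangul syllable into L, V and (optionally) T
       * jamo, written as UTF-8 into buf, which must hold at least 9 bytes.
       * Returns the number of bytes written, or 0 if cp is not a syllable. */
      static size_t hangul_decompose(uint32_t cp, unsigned char *buf)
      {
          uint32_t s, l, v, t;
          size_t n = 0;

          if (cp < SBASE || cp >= SBASE + SCOUNT)
              return 0;
          s = cp - SBASE;
          l = LBASE + s / NCOUNT;
          v = VBASE + (s % NCOUNT) / TCOUNT;
          t = TBASE + s % TCOUNT;

          n += put_jamo_utf8(l, buf + n);
          n += put_jamo_utf8(v, buf + n);
          if (t != TBASE)            /* TBASE itself means "no trailing jamo" */
              n += put_jamo_utf8(t, buf + n);
          return n;
      }
      ```

      For U+D55C this yields U+1112 U+1161 U+11AB, nine bytes in total.
      U+AC00 has no trailing consonant and yields six bytes. No table is
      needed for any of the 11172 syllables, which is why dropping them from
      the trie saves so much space.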
      
      Changes made by Gabriel:
        Rebase to mainline
        Fix checkpatch errors
        Extract robustness fixes and merge back to original mkutf8data.c patch
        Regenerate utf8data.h
      
      Signed-off-by: Olaf Weber <olaf@sgi.com>
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • unicode: introduce code for UTF-8 normalization · 44594c2f
      Olaf Weber authored
      
      Supporting functions for UTF-8 normalization are in utf8norm.c with the
      header utf8norm.h. Two normalization forms are supported: nfdi and
      nfdicf.
      
        nfdi:
         - Apply unicode normalization form NFD.
         - Remove any Default_Ignorable_Code_Point.
      
        nfdicf:
         - Apply unicode normalization form NFD.
         - Remove any Default_Ignorable_Code_Point.
         - Apply a full casefold (C + F).
      
      For the purposes of the code, a string is valid UTF-8 if:
      
       - The values encoded are 0x1..0x10FFFF.
       - The surrogate codepoints 0xD800..0xDFFF are not encoded.
       - The shortest possible encoding is used for all values.
      
      The supporting functions work on null-terminated strings (utf8 prefix)
      and on length-limited strings (utf8n prefix).
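
      The three validity rules can be sketched in C for both calling
      conventions. The names utf8n_valid_sketch() and utf8_valid_sketch() are
      hypothetical and are not the kernel's functions. The length-limited
      version does the work, and the null-terminated version passes strlen()
      as the limit.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Returns 1 if the first len bytes of str are valid UTF-8 under the
       * three rules above, 0 otherwise. */
      static int utf8n_valid_sketch(const char *str, size_t len)
      {
          const unsigned char *s = (const unsigned char *)str;
          size_t i = 0;

          while (i < len) {
              unsigned char c = s[i];
              uint32_t cp;
              size_t need, k;

              if (c == 0)
                  return 0;                /* U+0000 is outside 0x1..0x10FFFF */
              if (c < 0x80) {
                  i++;                     /* single-byte ASCII */
                  continue;
              }
              if ((c & 0xE0) == 0xC0) {
                  cp = c & 0x1F; need = 1;
              } else if ((c & 0xF0) == 0xE0) {
                  cp = c & 0x0F; need = 2;
              } else if ((c & 0xF8) == 0xF0) {
                  cp = c & 0x07; need = 3;
              } else {
                  return 0;                /* stray continuation or bad lead byte */
              }
              if (len - i <= need)
                  return 0;                /* sequence truncated by len */
              for (k = 1; k <= need; k++) {
                  if ((s[i + k] & 0xC0) != 0x80)
                      return 0;            /* expected a continuation byte */
                  cp = (cp << 6) | (s[i + k] & 0x3F);
              }
              /* Shortest-encoding rule: reject overlong forms. */
              if ((need == 1 && cp < 0x80) || (need == 2 && cp < 0x800) ||
                  (need == 3 && cp < 0x10000))
                  return 0;
              if (cp >= 0xD800 && cp <= 0xDFFF)
                  return 0;                /* surrogates are not encoded */
              if (cp > 0x10FFFF)
                  return 0;
              i += need + 1;
          }
          return 1;
      }

      /* Null-terminated variant: the string's own length is the limit. */
      static int utf8_valid_sketch(const char *str)
      {
          return utf8n_valid_sketch(str, strlen(str));
      }
      ```

      For example, "\xC0\xAF" (an overlong '/') and "\xED\xA0\x80" (the
      surrogate U+D800) are rejected, while "\xC3\xA9" (U+00E9) is accepted.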
      
      From the original SGI patch and for conformity with coding standards,
      the utf8data_t typedef was dropped, since it was just masking the struct
      keyword.  On other occasions, namely utf8leaf_t and utf8trie_t, I
      decided to keep it, since they are simple pointers to memory buffers,
      and using uchars here wouldn't provide any more meaningful information.
      
      Relative to the original submission, we also converted from the
      compatibility decomposition (NFKD) to the canonical one (NFD).
      
      Changes made by Gabriel:
        Rebase to Mainline
        Fix up checkpatch.pl warnings
        Drop typedefs
        Move out of libxfs
        Convert from NFKD to NFD
      
      Signed-off-by: Olaf Weber <olaf@sgi.com>
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • unicode: introduce UTF-8 character database · 955405d1
      Gabriel Krisman Bertazi authored

      The decomposition and casefolding of UTF-8 characters are described in a
      prefix tree in utf8data.h, which is generated from the Unicode
      Character Database (UCD), published by the Unicode Consortium, and
      should not be edited by hand.  The structures in utf8data.h are meant to
      be used for lookup operations by the unicode subsystem, when decoding a
      utf-8 string.
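
      The prefix-tree idea can be illustrated with a minimal byte-keyed trie
      in C. Each edge is one byte of a UTF-8 sequence, and a node that ends a
      complete character may carry its decomposition. All names are
      hypothetical. The pointer-based layout (256 child pointers per node) is
      chosen only for clarity. The generated utf8data.h stores its tree in the
      utf8data[] array, in a format not reproduced here.

      ```c
      #include <stddef.h>
      #include <stdlib.h>

      /* One node per byte position; a node that ends a complete key carries
       * that key's mapping in value. */
      struct trie_node {
          struct trie_node *child[256];
          const char *value;
      };

      static struct trie_node *trie_new(void)
      {
          return calloc(1, sizeof(struct trie_node));
      }

      /* Map the NUL-terminated byte string key to value.
       * Returns 0 on success, -1 on allocation failure. */
      static int trie_insert(struct trie_node *root, const char *key,
                             const char *value)
      {
          struct trie_node *n = root;
          const unsigned char *p;

          for (p = (const unsigned char *)key; *p; p++) {
              if (!n->child[*p]) {
                  n->child[*p] = trie_new();
                  if (!n->child[*p])
                      return -1;
              }
              n = n->child[*p];
          }
          n->value = value;
          return 0;
      }

      /* Follow len bytes from the root; return the stored mapping, or NULL if
       * the path is absent or ends at a node without a value. */
      static const char *trie_lookup(const struct trie_node *root,
                                     const char *s, size_t len)
      {
          const struct trie_node *n = root;
          size_t i;

          for (i = 0; i < len && n; i++)
              n = n->child[(unsigned char)s[i]];
          return n ? n->value : NULL;
      }

      static void trie_free(struct trie_node *n)
      {
          size_t i;

          if (!n)
              return;
          for (i = 0; i < 256; i++)
              trie_free(n->child[i]);
          free(n);
      }
      ```

      For example, inserting the key "\xC3\xA9" (U+00E9) with the value
      "e\xCC\x81" (U+0065 U+0301, its canonical decomposition) makes a lookup
      of those two bytes return the decomposition. A lookup of the
      intermediate node "\xC3" alone returns NULL.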
      
      mkutf8data.c is the source for a program that generates utf8data.h. It
      was written by Olaf Weber from SGI and originally proposed to be merged
      into Linux in 2014.  The original proposal performed the compatibility
      decomposition, NFKD, but the current version was modified by me to do
      canonical decomposition, NFD, as suggested by the community.  The
      changes from the original submission are:
      
        * Rebase to mainline.
        * Fix out-of-tree-build.
        * Update makefile to build 11.0.0 ucd files.
        * Drop references to XFS.
        * Convert NFKD to NFD.
        * Merge back robustness fixes from original patch. Requested by
          Dave Chinner.
      
      The original submission is archived at:
      
      <https://linux-xfs.oss.sgi.narkive.com/Xx10wjVY/rfc-unicode-utf-8-support-for-xfs>
      
      The utf8data.h file can be regenerated using the instructions in
      fs/unicode/README.utf8data.
      
      - Notes on the update from 8.0.0 to 11.0.0:
      
      The structure of the ucd files and special cases have not experienced
      any changes between versions 8.0.0 and 11.0.0.  8.0.0 saw the addition
      of Cherokee LC characters, which is an interesting case for
      case-folding.  The update is accompanied by new tests in the test_ucd
      module to catch specific cases.  No changes to the mkutf8data program were
      required for the updates.
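
      Cherokee is unusual for case-folding. The script's original letters
      were encoded first and later became the uppercase set. As we read the
      UCD's CaseFolding.txt, the lowercase letters added in 8.0.0 fold to
      those uppercase letters, which is the reverse of most scripts. Below is
      a minimal sketch of that mapping. The function name is hypothetical,
      and the sketch covers only the two Cherokee small-letter ranges, not a
      general casefold.

      ```c
      #include <stdint.h>

      /* Fold Cherokee small letters to the corresponding capital letters, as
       * (we believe) CaseFolding.txt does. Other code points are unchanged. */
      static uint32_t cherokee_casefold(uint32_t cp)
      {
          if (cp >= 0xAB70 && cp <= 0xABBF)   /* Cherokee Supplement small letters */
              return cp - 0xAB70 + 0x13A0;
          if (cp >= 0x13F8 && cp <= 0x13FD)   /* small letters YE .. MV */
              return cp - 0x13F8 + 0x13F0;
          return cp;
      }
      ```

      A test that assumed "casefold means lowercase" would get these
      characters wrong, which is why specific test cases are worthwhile here.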
      
      Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>