pcre2serialize(3) — Linux manual page

NAME | SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS | SECURITY CONCERNS | SAVING COMPILED PATTERNS | RE-USING PRECOMPILED PATTERNS | AUTHOR | REVISION | COLOPHON

PCRE2SERIALIZE(3)       Library Functions Manual       PCRE2SERIALIZE(3)

NAME         top

       PCRE2 - Perl-compatible regular expressions (revised API)

SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS         top


       int32_t pcre2_serialize_decode(pcre2_code **codes,
         int32_t number_of_codes, const uint8_t *bytes,
         pcre2_general_context *gcontext);

       int32_t pcre2_serialize_encode(const pcre2_code **codes,
         int32_t number_of_codes, uint8_t **serialized_bytes,
         PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext);

       void pcre2_serialize_free(uint8_t *bytes);

       int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes);

       If you are running an application that uses a large number of
       regular expression patterns, it may be useful to store them in a
       precompiled form instead of having to compile them every time the
       application is run. However, if you are using the just-in-time
       optimization feature, it is not possible to save and reload the
       JIT data, because it is position-dependent. The host on which the
       patterns are reloaded must be running the same version of PCRE2,
       with the same code unit width, and must also have the same
       endianness, pointer width and PCRE2_SIZE type. For example,
       patterns compiled on a 32-bit system using PCRE2's 16-bit library
       cannot be reloaded on a 64-bit system, nor can they be reloaded
       using the 8-bit library.

       Note that "serialization" in PCRE2 does not convert compiled
       patterns to an abstract format like Java or .NET serialization.
       The serialized output is really just a bytecode dump, which is
       why it can only be reloaded in the same environment as the one
       that created it. Hence the restrictions mentioned above.
       Applications that are not statically linked with a fixed version
       of PCRE2 must be prepared to recompile patterns from their
       sources, in order to be immune to PCRE2 upgrades.

SECURITY CONCERNS         top


       The facility for saving and restoring compiled patterns is
       intended for use within individual applications. As such, the
       data supplied to pcre2_serialize_decode() is expected to be
       trusted data, not data from arbitrary external sources. There is
       only some simple consistency checking, not complete validation of
       what is being re-loaded. Corrupted data may cause undefined
       results. For example, if the length field of a pattern in the
       serialized data is corrupted, the deserializing code may read
       beyond the end of the byte stream that is passed to it.

SAVING COMPILED PATTERNS         top


       Before compiled patterns can be saved they must be serialized,
       which in PCRE2 means converting the pattern to a stream of bytes.
       A single byte stream may contain any number of compiled patterns,
       but they must all use the same character tables. A single copy of
       the tables is included in the byte stream (its size is 1088
       bytes). For more details of character tables, see the section on
       locale support in the pcre2api documentation.

       The function pcre2_serialize_encode() creates a serialized byte
       stream from a list of compiled patterns. Its first two arguments
       specify the list, being a pointer to a vector of pointers to
       compiled patterns, and the length of the vector. The third and
       fourth arguments point to variables which are set to point to the
       created byte stream and its length, respectively. The final
       argument is a pointer to a general context, which can be used to
       specify custom memory management functions. If this argument is
       NULL, malloc() is used to obtain memory for the byte stream. The
       yield of the function is the number of serialized patterns, or
       one of the following negative error codes:

         PCRE2_ERROR_BADDATA      the number of patterns is zero or less
         PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the
       patterns
         PCRE2_ERROR_NOMEMORY     memory allocation failed
         PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same
       tables
         PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL

       PCRE2_ERROR_BADMAGIC means either that a pattern's code has been
       corrupted, or that a slot in the vector does not point to a
       compiled pattern.

       Once a set of patterns has been serialized you can save the data
       in any appropriate manner. Here is sample code that compiles two
       patterns and writes them to a file. It assumes that the variable
       fd refers to a file that is open for output. The error checking
       that should be present in a real application has been omitted for
       simplicity.

         int errorcode;
         uint8_t *bytes;
         PCRE2_SIZE erroroffset;
         PCRE2_SIZE bytescount;
         pcre2_code *list_of_codes[2];
         list_of_codes[0] = pcre2_compile("first pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         list_of_codes[1] = pcre2_compile("second pattern",
           PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
         errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
           &bytescount, NULL);
         errorcode = fwrite(bytes, 1, bytescount, fd);

       Note that the serialized data is binary data that may contain any
       of the 256 possible byte values. On systems that make a
       distinction between binary and non-binary data, be sure that the
       file is opened for binary output.

       Serializing a set of patterns leaves the original data untouched,
       so they can still be used for matching. Their memory must
       eventually be freed in the usual way by calling
       pcre2_code_free(). When you have finished with the byte stream,
       it too must be freed by calling pcre2_serialize_free(). If this
       function is called with a NULL argument, it returns immediately
       without doing anything.

RE-USING PRECOMPILED PATTERNS         top


       In order to re-use a set of saved patterns you must first make
       the serialized byte stream available in main memory (for example,
       by reading from a file). The management of this memory block is
       up to the application. You can use the
       pcre2_serialize_get_number_of_codes() function to find out how
       many compiled patterns are in the serialized data without
       actually decoding the patterns:

         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes =
       pcre2_serialize_get_number_of_codes(bytes);

       The pcre2_serialize_decode() function reads a byte stream and
       recreates the compiled patterns in new memory blocks, setting
       pointers to them in a vector. The first two arguments are a
       pointer to a suitable vector and its length, and the third
       argument points to a byte stream. The final argument is a pointer
       to a general context, which can be used to specify custom memory
       management functions for the decoded patterns. If this argument
       is NULL, malloc() and free() are used. After deserialization, the
       byte stream is no longer needed and can be discarded.

         pcre2_code *list_of_codes[2];
         uint8_t *bytes = <serialized data>;
         int32_t number_of_codes =
           pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);

       If the vector is not large enough for all the patterns in the
       byte stream, it is filled with those that fit, and the remainder
       are ignored. The yield of the function is the number of decoded
       patterns, or one of the following negative error codes:

         PCRE2_ERROR_BADDATA    second argument is zero or less
         PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
         PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2
       version
         PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
         PCRE2_ERROR_MEMORY     memory allocation failed
         PCRE2_ERROR_NULL       first or third argument is NULL

       PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that
       it was compiled on a system with different endianness.

       Decoded patterns can be used for matching in the usual way, and
       must be freed by calling pcre2_code_free(). However, be aware
       that there is a potential race issue if you are using multiple
       patterns that were decoded from a single byte stream in a
       multithreaded application. A single copy of the character tables
       is used by all the decoded patterns and a reference count is used
       to arrange for its memory to be automatically freed when the last
       pattern is freed, but there is no locking on this reference
       count. Therefore, if you want to call pcre2_code_free() for these
       patterns in different threads, you must arrange your own locking,
       and ensure that pcre2_code_free() cannot be called by two threads
       at the same time.

       If a pattern was processed by pcre2_jit_compile() before being
       serialized, the JIT data is discarded and so is no longer
       available after a save/restore cycle. You can, however, process a
       restored pattern with pcre2_jit_compile() if you wish.

AUTHOR         top


       Philip Hazel
       Retired from University Computing Service
       Cambridge, England.

REVISION         top


       Last updated: 27 June 2018
       Copyright (c) 1997-2018 University of Cambridge.

COLOPHON         top

       This page is part of the PCRE (Perl Compatible Regular
       Expressions) project.  Information about the project can be found
       at ⟨http://www.pcre.org/⟩.  If you have a bug report for this
       manual page, see
       ⟨http://bugs.exim.org/enter_bug.cgi?product=PCRE⟩.  This page was
       obtained from the tarball fetched from
       ⟨https://github.com/PhilipHazel/pcre2.git⟩ on 2024-06-14.  If you
       discover any rendering problems in this HTML version of the page,
       or you believe there is a better or more up-to-date source for
       the page, or you have corrections or improvements to the
       information in this COLOPHON (which is not part of the original
       manual page), send a mail to [email protected]

PCRE2 10.32                   27 June 2018             PCRE2SERIALIZE(3)

Pages that refer to this page: pcre2test(1)