Class FST<T>

java.lang.Object
org.apache.lucene.util.fst.FST<T>
All Implemented Interfaces:
Accountable

public final class FST<T> extends Object implements Accountable
Represents an finite state machine (FST), using a compact byte[] format.

The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).

See the package documentation for some simple examples.

  • Field Details

    • metadata

      final FST.FSTMetadata<T> metadata
    • BASE_RAM_BYTES_USED

      private static final long BASE_RAM_BYTES_USED
    • BIT_FINAL_ARC

      static final int BIT_FINAL_ARC
      See Also:
    • BIT_LAST_ARC

      static final int BIT_LAST_ARC
      See Also:
    • BIT_TARGET_NEXT

      static final int BIT_TARGET_NEXT
      See Also:
    • BIT_STOP_NODE

      static final int BIT_STOP_NODE
      See Also:
    • BIT_ARC_HAS_OUTPUT

      public static final int BIT_ARC_HAS_OUTPUT
      This flag is set if the arc has an output.
      See Also:
    • BIT_ARC_HAS_FINAL_OUTPUT

      static final int BIT_ARC_HAS_FINAL_OUTPUT
      See Also:
    • ARCS_FOR_DIRECT_ADDRESSING

      static final byte ARCS_FOR_DIRECT_ADDRESSING
      Value of the arc flags to declare a node with fixed length dense arcs and bit table designed for direct addressing.
      See Also:
    • ARCS_FOR_CONTINUOUS

      static final byte ARCS_FOR_CONTINUOUS
      Value of the arc flags to declare a node with continuous arcs designed for pos the arc directly with labelToPos - firstLabel. like ARCS_FOR_BINARY_SEARCH we use flag combinations that will not occur at the same time.
      See Also:
    • FILE_FORMAT_NAME

      private static final String FILE_FORMAT_NAME
      See Also:
    • VERSION_START

      public static final int VERSION_START
      First supported version, this is the version that was used when releasing Lucene 7.0.
      See Also:
    • VERSION_LITTLE_ENDIAN

      private static final int VERSION_LITTLE_ENDIAN
      See Also:
    • VERSION_CONTINUOUS_ARCS

      public static final int VERSION_CONTINUOUS_ARCS
      Version that started storing continuous arcs.
      See Also:
    • VERSION_CURRENT

      public static final int VERSION_CURRENT
      Current version.
      See Also:
    • VERSION_90

      public static final int VERSION_90
      Version that was used when releasing Lucene 9.0.
      See Also:
    • FINAL_END_NODE

      static final long FINAL_END_NODE
      See Also:
    • NON_FINAL_END_NODE

      static final long NON_FINAL_END_NODE
      See Also:
    • END_LABEL

      public static final int END_LABEL
      If arc has this label then that arc is final/accepted
      See Also:
    • fstReader

      private final FSTReader fstReader
      The reader of the FST, used to read bytes from the underlying FST storage
    • outputs

      public final Outputs<T> outputs
    • DEFAULT_MAX_BLOCK_BITS

      private static final int DEFAULT_MAX_BLOCK_BITS
  • Constructor Details

  • Method Details

    • flag

      private static boolean flag(int flags, int bit)
    • fromFSTReader

      public static <T> FST<T> fromFSTReader(FST.FSTMetadata<T> fstMetadata, FSTReader fstReader)
      Create a FST from a FSTReader. Return null if the metadata is null.
      Parameters:
      fstMetadata - the metadata
      fstReader - the FSTReader
      Returns:
      the FST
    • readMetadata

      public static <T> FST.FSTMetadata<T> readMetadata(DataInput metaIn, Outputs<T> outputs) throws IOException
      Read the FST metadata from DataInput
      Type Parameters:
      T - the output type
      Parameters:
      metaIn - the DataInput of the metadata
      outputs - the FST outputs
      Returns:
      the FST metadata
      Throws:
      IOException - if exception occurred during parsing
    • ramBytesUsed

      public long ramBytesUsed()
      Description copied from interface: Accountable
      Return the memory usage of this object in bytes. Negative values are illegal.
      Specified by:
      ramBytesUsed in interface Accountable
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • numBytes

      public long numBytes()
    • getEmptyOutput

      public T getEmptyOutput()
    • getMetadata

      public FST.FSTMetadata<T> getMetadata()
    • save

      public void save(DataOutput metaOut, DataOutput out) throws IOException
      Save the FST to DataOutput.
      Parameters:
      metaOut - the DataOutput to write the metadata to
      out - the DataOutput to write the FST bytes to
      Throws:
      IOException
    • saveMetadata

      public void saveMetadata(DataOutput metaOut) throws IOException
      Save the metadata to a DataOutput
      Parameters:
      metaOut - the DataOutput to write the metadata to
      Throws:
      IOException
    • save

      public void save(Path path) throws IOException
      Writes an automaton to a file.
      Throws:
      IOException
    • read

      public static <T> FST<T> read(Path path, Outputs<T> outputs) throws IOException
      Reads an automaton from a file.
      Throws:
      IOException
    • readLabel

      public int readLabel(DataInput in) throws IOException
      Reads one BYTE1/2/4 label from the provided DataInput.
      Throws:
      IOException
    • targetHasArcs

      public static <T> boolean targetHasArcs(FST.Arc<T> arc)
      returns true if the node at this address has any outgoing arcs
    • getNumPresenceBytes

      static int getNumPresenceBytes(int labelRange)
      Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.
    • readPresenceBytes

      private void readPresenceBytes(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Reads the presence bits of a direct-addressing node. Actually we don't read them here, we just keep the pointer to the bit-table start and we skip them.
      Throws:
      IOException
    • getFirstArc

      public FST.Arc<T> getFirstArc(FST.Arc<T> arc)
      Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
    • readLastTargetArc

      FST.Arc<T> readLastTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Follows the follow arc and reads the last arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
      Returns:
      Returns the second argument (arc).
      Throws:
      IOException
    • readUnpackedNodeTarget

      private long readUnpackedNodeTarget(FST.BytesReader in) throws IOException
      Throws:
      IOException
    • readFirstTargetArc

      public FST.Arc<T> readFirstTargetArc(FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Follow the follow arc and read the first arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
      Returns:
      Returns the second argument (arc).
      Throws:
      IOException
    • readFirstArcInfo

      private void readFirstArcInfo(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Throws:
      IOException
    • readFirstRealTargetArc

      public FST.Arc<T> readFirstRealTargetArc(long nodeAddress, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Throws:
      IOException
    • isExpandedTarget

      boolean isExpandedTarget(FST.Arc<T> follow, FST.BytesReader in) throws IOException
      Returns whether arc's target points to a node in expanded format (fixed length arcs).
      Throws:
      IOException
    • readNextArc

      public FST.Arc<T> readNextArc(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      In-place read; returns the arc.
      Throws:
      IOException
    • readNextArcLabel

      int readNextArcLabel(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!
      Throws:
      IOException
    • readArcByIndex

      public FST.Arc<T> readArcByIndex(FST.Arc<T> arc, FST.BytesReader in, int idx) throws IOException
      Throws:
      IOException
    • readArcByContinuous

      public FST.Arc<T> readArcByContinuous(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException
      Reads a Continuous node arc, with the provided index in the label range.
      Parameters:
      rangeIndex - The index of the arc in the label range. It must be within the label range.
      Throws:
      IOException
    • readArcByDirectAddressing

      public FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex) throws IOException
      Reads a present direct addressing node arc, with the provided index in the label range.
      Parameters:
      rangeIndex - The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.
      Throws:
      IOException
    • readArcByDirectAddressing

      private FST.Arc<T> readArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in, int rangeIndex, int presenceIndex) throws IOException
      Reads a present direct addressing node arc, with the provided index in the label range and its corresponding presence index (which is the count of presence bits before it).
      Throws:
      IOException
    • readLastArcByDirectAddressing

      public FST.Arc<T> readLastArcByDirectAddressing(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Reads the last arc of a direct addressing node. This method is equivalent to call readArcByDirectAddressing(Arc, BytesReader, int) with rangeIndex equal to arc.numArcs() - 1, but it is faster.
      Throws:
      IOException
    • readLastArcByContinuous

      public FST.Arc<T> readLastArcByContinuous(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Reads the last arc of a continuous node.
      Throws:
      IOException
    • readNextRealArc

      public FST.Arc<T> readNextRealArc(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Never returns null, but you should never call this if arc.isLast() is true.
      Throws:
      IOException
    • readArc

      private FST.Arc<T> readArc(FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Reads an arc.
      Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.
      Throws:
      IOException
    • readEndArc

      static <T> FST.Arc<T> readEndArc(FST.Arc<T> follow, FST.Arc<T> arc)
    • findTargetArc

      public FST.Arc<T> findTargetArc(int labelToMatch, FST.Arc<T> follow, FST.Arc<T> arc, FST.BytesReader in) throws IOException
      Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.
      Throws:
      IOException
    • seekToNextNode

      private void seekToNextNode(FST.BytesReader in) throws IOException
      Throws:
      IOException
    • getBytesReader

      public FST.BytesReader getBytesReader()
      Returns a FST.BytesReader for this FST, positioned at position 0.