Click or drag to resize

GeneralizedSuffixArray Class

Given a text string, the substring starting at position i and until the end of text is called the suffix starting at position i. For a text string with N characters there are N suffixes. This class gives an efficient algorithm to sort all N suffixes lexicographically, and to determine the length of the longest common prefix of each suffix with its lexicographical predecessor. This is generally done in O(N) time.
Inheritance Hierarchy
SystemObject
  Altaxo.Collections.TextGeneralizedSuffixArray

Namespace: Altaxo.Collections.Text
Assembly: AltaxoCore (in AltaxoCore.dll) Version: 4.8.3179.0 (4.8.3179.0)
Syntax
C#
public class GeneralizedSuffixArray

The GeneralizedSuffixArray type exposes the following members.

Constructors
 NameDescription
Public methodGeneralizedSuffixArray(IntegerText)Constructs a new instance of the GeneralizedSuffixArray class from IntegerText.
Protected methodGeneralizedSuffixArray(Int32, Int32, Int32, Int32, Int32)Constructs an instance of the GeneralizedSuffixArray class.
Top
Properties
 NameDescription
Public propertyInverseSuffixArray Maps the suffix that starts at position i in the text to the lexicographical order position of this suffix, which is the value of the i-th element of this array.
Public propertyLCPArray Stores the length of the Longest Common Prefix of the lexicographically i-th suffix and its lexicographical predecessor (the lexicographically (i-1)-th suffix). The element at index 0 is always 0.
Public propertyLCPSArray 
Public propertyMaximumLcp Maximum of all values in the _LCP array.
Public propertyNumberOfWords Number of words, if the text was separated into individual words. Otherwise, this field is equal to one.
Public propertySuffixArray Maps the lexicographical order position i of a suffix to the starting position of the suffix in the text, which is the value of the i-th element of this array.
Public propertyWordIndices Maps the lexicographical order position i of a suffix to the index of the word, in which this suffix starts. This means, that for instance the value of the i-th element contains the index of the word, in which the lexicographically i-th suffix that starts at position _suffixArray[i] begins. The contents of this array is only meaningful, if you provided text that was separated into words, for instance for the longest common substring problem.
Public propertyWordStartPositions Start positions of the words in which the original text was separated in the array _text.
Top
Methods
 NameDescription
Public methodEqualsDetermines whether the specified object is equal to the current object.
(Inherited from Object)
Protected methodFinalizeAllows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection.
(Inherited from Object)
Public methodStatic memberFromSeparateWords(IEnumerableString, Boolean)Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
Public methodStatic memberFromSeparateWordsT(IEnumerableIEnumerableT, Boolean)Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
Public methodStatic memberFromWords(IEnumerableString, Boolean, Boolean, IComparerChar)Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
Public methodStatic memberFromWordsT(IEnumerableIEnumerableT, Boolean, Boolean, IComparerT)Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
Public methodGetHashCodeServes as the default hash function.
(Inherited from Object)
Public methodGetTypeGets the Type of the current instance.
(Inherited from Object)
Protected methodMemberwiseCloneCreates a shallow copy of the current Object.
(Inherited from Object)
Public methodToStringReturns a string that represents the current object.
(Inherited from Object)
Top
Fields
 NameDescription
Public fieldStatic memberRequiredTextPaddingGets the required text padding, i.e. the number of additional elements at the end of the text array that must be left empty (zero).
Top
Remarks

First of all, we need an O(N) algorithm to calculate the suffix array in O(N) time. This algorithm is implemented according to J. Kärkkäinen and P. Sanders: "Simple linear work suffix array construction.", Proc. 30th International Colloquium on Automata, Languages and Programming, volume 2719 of Lecture Notes in Computer Science, pages 943-955, Berlin, 2003, Springer-Verlag."

Secondly, an O(N) algorithm is neccessary to calculated the length of longest common prefix of two adjacent suffixes. This is implemented according to T. Kasai, G. Lee, H. Arimura, S. Arikawa1, and K. Park: "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 181–192. Springer, Berlin (2001).

See also the very nice paper by Michael Arnold and Enno Ohlebusch, 'Linear Time Algorithms for Generalizations of the Longest Common Substring Problem', Algorithmica (2011) 60; 806-818; DOI: 10.1007/s00453-009-9369-1. The code here was partially adopted from the C++ sources from the web site of the authors at http://www.uni-ulm.de/in/theo/research/sequana.html.

See Also