GeneralizedSuffixArray Class

Given a text string, the substring starting at position i and until the end of text is called the suffix starting at position i. For a text string with N characters there are N suffixes. This class gives an efficient algorithm to sort all N suffixes lexicographically, and to determine the length of the longest common prefix of each suffix with its lexicographical predecessor. This is generally done in O(N) time.

Inheritance Hierarchy

SystemObject
Altaxo.Collections.TextGeneralizedSuffixArray

Namespace: Altaxo.Collections.Text
Assembly: AltaxoCore (in AltaxoCore.dll) Version: 4.8.3261.0 (4.8.3261.0)

Syntax

Copy

public class GeneralizedSuffixArray

The GeneralizedSuffixArray type exposes the following members.

Constructors

	Name	Description
	GeneralizedSuffixArray(IntegerText)	Constructs a new instance of the GeneralizedSuffixArray class from IntegerText.
	GeneralizedSuffixArray(Int32, Int32, Int32, Int32, Int32)	Constructs an instance of the GeneralizedSuffixArray class.

Top

Properties

	Name	Description
	InverseSuffixArray	Maps the suffix that starts at position i in the text to the lexicographical order position of this suffix, which is the value of the i-th element of this array.
	LCPArray	Stores the length of the Longest Common Prefix of the lexicographically i-th suffix and its lexicographical predecessor (the lexicographically (i-1)-th suffix). The element at index 0 is always 0.
	LCPSArray
	MaximumLcp	Maximum of all values in the _LCP array.
	NumberOfWords	Number of words, if the text was separated into individual words. Otherwise, this field is equal to one.
	SuffixArray	Maps the lexicographical order position i of a suffix to the starting position of the suffix in the text, which is the value of the i-th element of this array.
	WordIndices	Maps the lexicographical order position i of a suffix to the index of the word, in which this suffix starts. This means, that for instance the value of the i-th element contains the index of the word, in which the lexicographically i-th suffix that starts at position _suffixArray[i] begins. The contents of this array is only meaningful, if you provided text that was separated into words, for instance for the longest common substring problem.
	WordStartPositions	Start positions of the words in which the original text was separated in the array _text.

Top

Methods

	Name	Description
	Equals	Determines whether the specified object is equal to the current object. (Inherited from Object)
	Finalize	Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object)
	FromSeparateWords(IEnumerableString, Boolean)	Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
	FromSeparateWordsT(IEnumerableIEnumerableT, Boolean)	Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
	FromWords(IEnumerableString, Boolean, Boolean, IComparerChar)	Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
	FromWordsT(IEnumerableIEnumerableT, Boolean, Boolean, IComparerT)	Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class.
	GetHashCode	Serves as the default hash function. (Inherited from Object)
	GetType	Gets the Type of the current instance. (Inherited from Object)
	MemberwiseClone	Creates a shallow copy of the current Object. (Inherited from Object)
	ToString	Returns a string that represents the current object. (Inherited from Object)

Top

Fields

	Name	Description
	RequiredTextPadding	Gets the required text padding, i.e. the number of additional elements at the end of the text array that must be left empty (zero).

Top

Remarks

First of all, we need an O(N) algorithm to calculate the suffix array in O(N) time. This algorithm is implemented according to J. Kärkkäinen and P. Sanders: "Simple linear work suffix array construction.", Proc. 30th International Colloquium on Automata, Languages and Programming, volume 2719 of Lecture Notes in Computer Science, pages 943-955, Berlin, 2003, Springer-Verlag."

Secondly, an O(N) algorithm is neccessary to calculated the length of longest common prefix of two adjacent suffixes. This is implemented according to T. Kasai, G. Lee, H. Arimura, S. Arikawa1, and K. Park: "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 181–192. Springer, Berlin (2001).

See also the very nice paper by Michael Arnold and Enno Ohlebusch, 'Linear Time Algorithms for Generalizations of the Longest Common Substring Problem', Algorithmica (2011) 60; 806-818; DOI: 10.1007/s00453-009-9369-1. The code here was partially adopted from the C++ sources from the web site of the authors at http://www.uni-ulm.de/in/theo/research/sequana.html.

Reference

Altaxo.Collections.Text Namespace