Generalized |
public class GeneralizedSuffixArray
The GeneralizedSuffixArray type exposes the following members.
Name | Description | |
---|---|---|
GeneralizedSuffixArray(IntegerText) | Constructs a new instance of the GeneralizedSuffixArray class from IntegerText. | |
GeneralizedSuffixArray(Int32, Int32, Int32, Int32, Int32) | Constructs an instance of the GeneralizedSuffixArray class. |
Name | Description | |
---|---|---|
InverseSuffixArray | Maps the suffix that starts at position i in the text to the lexicographical order position of this suffix, which is the value of the i-th element of this array. | |
LCPArray | Stores the length of the Longest Common Prefix of the lexicographically i-th suffix and its lexicographical predecessor (the lexicographically (i-1)-th suffix). The element at index 0 is always 0. | |
LCPSArray | ||
MaximumLcp | Maximum of all values in the _LCP array. | |
NumberOfWords | Number of words, if the text was separated into individual words. Otherwise, this field is equal to one. | |
SuffixArray | Maps the lexicographical order position i of a suffix to the starting position of the suffix in the text, which is the value of the i-th element of this array. | |
WordIndices | Maps the lexicographical order position i of a suffix to the index of the word, in which this suffix starts. This means, that for instance the value of the i-th element contains the index of the word, in which the lexicographically i-th suffix that starts at position _suffixArray[i] begins. The contents of this array is only meaningful, if you provided text that was separated into words, for instance for the longest common substring problem. | |
WordStartPositions | Start positions of the words in which the original text was separated in the array _text. |
Name | Description | |
---|---|---|
Equals | Determines whether the specified object is equal to the current object. (Inherited from Object) | |
Finalize | Allows an object to try to free resources and perform other cleanup operations before it is reclaimed by garbage collection. (Inherited from Object) | |
FromSeparateWords(IEnumerableString, Boolean) | Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class. | |
FromSeparateWordsT(IEnumerableIEnumerableT, Boolean) | Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class. | |
FromWords(IEnumerableString, Boolean, Boolean, IComparerChar) | Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class. | |
FromWordsT(IEnumerableIEnumerableT, Boolean, Boolean, IComparerT) | Constructs the generalized suffix array from separate 'words'. Each list in the parameter words is treated as 'word'. The elements are treated as characters. The elements are converted to an integer alphabet by means of the IntegerText class. | |
GetHashCode | Serves as the default hash function. (Inherited from Object) | |
GetType | Gets the Type of the current instance. (Inherited from Object) | |
MemberwiseClone | Creates a shallow copy of the current Object. (Inherited from Object) | |
ToString | Returns a string that represents the current object. (Inherited from Object) |
Name | Description | |
---|---|---|
RequiredTextPadding | Gets the required text padding, i.e. the number of additional elements at the end of the text array that must be left empty (zero). |
First of all, we need an O(N) algorithm to calculate the suffix array in O(N) time. This algorithm is implemented according to J. Kärkkäinen and P. Sanders: "Simple linear work suffix array construction.", Proc. 30th International Colloquium on Automata, Languages and Programming, volume 2719 of Lecture Notes in Computer Science, pages 943-955, Berlin, 2003, Springer-Verlag."
Secondly, an O(N) algorithm is neccessary to calculated the length of longest common prefix of two adjacent suffixes. This is implemented according to T. Kasai, G. Lee, H. Arimura, S. Arikawa1, and K. Park: "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", Proc. 12th Annual Symposium on Combinatorial Pattern Matching. Lecture Notes in Computer Science, vol. 2089, pp. 181–192. Springer, Berlin (2001).
See also the very nice paper by Michael Arnold and Enno Ohlebusch, 'Linear Time Algorithms for Generalizations of the Longest Common Substring Problem', Algorithmica (2011) 60; 806-818; DOI: 10.1007/s00453-009-9369-1. The code here was partially adopted from the C++ sources from the web site of the authors at http://www.uni-ulm.de/in/theo/research/sequana.html.