Input Language



Input Language Summary

  Sort SMT-LIB C++ API
String Primitives
String literal String Any sequence of printable ASCII characters enclosed in double quotes (“”), such as “example123”. A double-quote can be escaped as \”. See the SMT-LIB 2.5 specification for details. Z3_mk_str
String variable String (declare-const X String), or equivalently, (declare-fun X () String)  
String concatenation String x String -> String (str.++ "exa" "mple") Z3_mk_str_concat
String length String -> Int (str.len "abc123") Z3_mk_str_length
String equality String x String -> Bool (= X "abc")  
Higher-Level Functions
Character at index String x Int -> String (str.at "abc" 2) Z3_mk_str_at
Substring String x Int x Int -> String (str.substr "abcdef" start count) Z3_mk_str_substr
Prefix predicate String x String -> Bool (str.prefixof sPrefix sFull) Z3_mk_str_prefixof
Suffix predicate String x String -> Bool (str.suffixof sSuffix sFull) Z3_mk_str_suffixof
Contains predicate String x String -> Bool (str.contains sHaystack sNeedle) Z3_mk_str_contains
Index of occurrence String x String x Int -> Int (str.indexof sHaystack sNeedle iStart) Z3_mk_str_indexof
Single replacement String x String x String -> String (str.replace sBase sTarget sReplacement) Z3_mk_str_replace
String to integer conversion String -> Int (str.to-int "123")  
Integer to string conversion Int -> String (str.from-int 123)  
Regular Expressions
String to regex (match literal) String -> Regex (str.to.re "abc123") Z3_mk_str_to_regex
Regular language membership String x Regex -> Bool (str.in.re S R) Z3_mk_str_in_regex
Regex concatenation Regex x Regex -> Regex (re.++ R1 R2) Z3_mk_regex_concat
Regex union Regex x Regex -> Regex (re.union R1 R2) Z3_mk_regex_union
Regex Kleene star Regex -> Regex (re.* R) Z3_mk_regex_star
Regex plus Regex -> Regex (re.+ "abc123") Z3_mk_regex_plus
Regex character range String x String -> Regex (re.range "a" "z") Z3_mk_regex_range
Bit-Vector Integration (Chapter TBD)
Library-Aware String Methods (Chapter TBD)

String Primitives


String literals

String literals are expressed as a sequence of ASCII printable characters enclosed in double quotes. In SMT-LIB 2.5 syntax, a double quote is escaped inside a string as "". All other escape sequences are interpreted in the usual way, e.g. the sequence \t is expanded to a tab character, the sequence \\ is expanded to a single backslash, etc. In addition, the character sequence \xNN is a hexadecimal escape, where NN is a two-digit hexadecimal constant (using any of 0-9a-fA-F as hex digits) and the sequence is converted to the single ASCII character having the given hexadecimal value.

Examples:

The following are all valid SMT-LIB 2.5 string literals:

"abc"
"123;456"
"say ""hello"" everyone"

The following is also a valid SMT-LIB 2.5 string literal:

"field1\tfield2"

It is expanded to the following sequence of characters (spaces added for clarity):

f i e l d 1 TAB f i e l d 2

The string literals “x41” and “A” represent identical string constants, even though they are written two different ways.


String variables

Variables may be declared with standard SMT-LIB 2.5 syntax, i.e. (declare-const X String) for a variable X of sort String, or equivalently, (declare-fun X () String).


String concatenation

Two strings may be concatenated with the str.++ operator. The interpretation of this operator is equal to a string formed by appending the characters in the second argument to the characters in the first argument, keeping them in the same relative order.

Examples:

(str.++ "abc" "def") ::= "abcdef"

(str.++ "123" "") ::= "123"


String length

The length of a string may be expressed using the str.len operator. The length of a string is equal to the number of characters in the string. This length is of sort Int and as a result is an arbitrary-precision natural number.

Examples:

(str.len "") ::= 0

(str.len "abcdef") ::= 6

For information on alternative length semantics that model fixed-precision overflow such as in C/C++ string operations, please refer to Chapter TBD.


String equality

Equality between any two terms of String sort may be asserted with the = operator. Two strings are considered equal iff they contain the same characters appearing in the same order.

Examples:

(= (str.++ "abc" "") (str.++ "" "abc")) ::= True

(= (str.++ "abc" "def) ("abcdefGHI)) ::= False



Higher-Level Functions


Character at index

The term (str.at Str Idx) selects the single character at integer index Idx in the string Str. The first character in a (non-empty) string occurs at index 0. If Idx is less than 0 or would index a character past the end of the string, the value of this term is the empty string "". Otherwise, the value of this term is a string of length 1 corresponding to the index character in Str.

Examples:

(str.at "abcde" 2) ::= "c"

(str.at "abcde" 5) ::= ""

(str.at "abcde" -1) ::= ""


Substring

The term (str.substr Str Start Len) denotes the substring of Str of length Len, taken beginning at index Start. If Start is less than 0 or would denote a position past the end of the string, or if the resulting substring would extend past the end of Str, the value of this term is undefined.

Examples:

(str.substr "abcde" 1 3) ::= "bcd"


Prefix predicate

The term (str.prefixof Pre Full) is a predicate that is true iff the string Pre is a prefix of the string Full. More precisely, (str.prefixof Pre Full) is true iff there exists a string S such that Full == (str.++ Pre S).

Examples:

(str.prefixof "abc" "abcde") ::= True

(str.prefixof "cde" "abcde") ::= False


Suffix predicate

The term (str.suffixof Post Full) is a predicate that is true iff the string Post is a suffix of the string Full. More precisely, (str.suffixof Post Full) is true iff there exists a string S such that Full == (str.++ S Post).

Examples:

(str.suffixof "cde" "abcde") ::= True

(str.suffixof "abc" "abcde") ::= True


Contains predicate

The term (str.contains Haystack Needle) is a predicate that is true iff the string Haystack contains the string Needle. More precisely, (str.contains Haystack Needle) is true iff there exist strings S and T such that Haystack == (str.++ S (str.++ Needle T)).

Examples:

(str.contains "abcde" "bc") ::= True

(str.contains "abcde" "ae") ::= False


Index of occurrence

The term (str.indexof Haystack Needle Start) is the integer position of the first occurrence of the string Needle in the string Haystack at or after the integer index Start, if any. If Needle is not a substring of Haystack, or Start is either negative or denotes a position beyond the end of Haystack, or Needle is the empty string, then the value of this term is -1.

Examples:

(str.indexof "AbcAbcAbc" "Abc" 2) ::= 3

(str.indexof "AbcAbcAbc" "xyz" 0) ::= -1

(str.indexof "AbcAbcAbc" "Abc" 10) ::= -1


Single replacement

The term (str.replace Base Target Replacement) is equal to the string obtained by replacing the first occurrence of Target in the string Base with the string Replacement. If the string Target does not occur in Base, the term is equal to the unchanged string Base.

Examples:

(str.replace "abcXYZdeXYZf" "XYZ" "999") ::= "abc999deXYZf"

(str.replace "ababc" "ab" "") ::= "abc"

(str.replace "abcdef" "xyz" "999") ::= "abcdef"


String to integer conversion

The term (str.to-int Str) is equal to the integer formed by the digits of the string Str. Conversion to an integer is performed as follows:

  • If Str is the empty string, the value of this term is -1.
  • If Str contains any non-digit characters, the value of this term is -1.
  • Otherwise, the value of this term is the base-10 natural number represented by the digits of Str.

Note that leading zeroes are permitted but are effectively ignored.

Examples:

(str.to-int "123") ::= 123

(str.to-int "00123") ::= 123

(str.to-int "nine thousand") ::= -1

(str.to-int "-50") ::= -1


Integer to string conversion

The term (str.from-int N) is equal to the minimal base-10 string representation of the integer N. Conversion to a string is performed as follows:

  • If N is less than 0, the value of this term is the empty string "".
  • If N is equal to 0, the value of this term is the string "0".
  • Otherwise, the value of this term is a string S containing only the digits 0 through 9, does not contain any leading zeroes, and is a base-10 string representation of N, i.e. it satisfies (str.to-int S) == N.

Examples:

(str.from-int 123) ::= "123"

(str.from-int 0) ::= "0"

(str.from-int -50) ::= ""



Regular Expressions

Note that in all regular expression terms, subexpressions of sort Regex must be constant expressions (i.e. they may not contain free variables, conditional expressions, etc.).


String to regex (match literal)

The term (str.to.re S) defines the regular language whose only member is the string S. The subexpression S must be a string constant or constant expression; otherwise, an error is raised.


Regular language membership

The term (str.in.re S R) denotes the predicate that the string S is a member of the regular language R.

Examples:

(str.in.re "abc" (str.to.re "abc")) ::= True

(str.in.re "A" (re.++ (str.to.re "A") (str.to.re "B"))) ::= False

(str.in.re "A" (re.union (str.to.re "A") (str.to.re "B"))) ::= False


Regex concatenation

The term (re.++ R1 R2) denotes the regular language formed by concatenating any string in R1 with any string in R2.

Examples:

(re.++ (str.to.re "A") (str.to.re "B")) ::= (str.to.re "AB")


Regex union

The term (re.union R1 R2) denotes the regular language that is the union of all strings in R1 and all strings in R2.

Examples:

(re.union (str.to.re "A") (str.to.re "B")) is the regular language {"A", "B"}.


Regex Kleene star

The term (re.* R) denotes the Kleene closure of the regular language R, i.e., the language of all finite-length strings that can be formed by concatenating zero or more arbitrary members of R.

Examples:

(re.* (str.to.re "A")) is the regular language {"", "A", "AA", "AAA", ...}.


Regex plus

The term (re.+ R) denotes the language of all finite-length strings that can be formed by concatenating at least one or more arbitrary members of R. Unlike the Kleene star, the empty string is not a member of this language.

Examples:

(re.+ (str.to.re "A")) is the regular language {"A", "AA", "AAA", ...}.


Regex character range

The term (re.range C1 C2) denotes a regular language containing each element in a range of ASCII characters. The strings C1, C2 must be string literals of length 1. The character range described is all ASCII characters from C1 up to and including C2. It is not valid if the ASCII value of C2 is smaller than that of C1.

Examples:

(re.range "a" "z") is the regular language containing the characters {"a", "b", "c", ..., "x", "y", "z"}.