SphinxQL 指南

SphinxQL是有我们自己特色的SQL子集,通过使用标准的SQL语句并加上Sphinx特定的扩展,将搜索驻守进程的功能对外提供。所有通过 SphinxAPI提供的功能都可以通过SphinxQL访问,不过反过来则不行,例如对RT索引的写操作只能通过SPhinxQL。本章的文档列出了所 支持的SphinxQL语句的语法。

7.1. SELECT(搜索查询)语法

SELECT
    select_expr [, select_expr ...]
    FROM index [, index2 ...]
    [WHERE where_condition]
    [GROUP BY {col_name | expr_alias}]
    [ORDER BY {col_name | expr_alias} {ASC | DESC} [, ...]]
    [WITHIN GROUP ORDER BY {col_name | expr_alias} {ASC | DESC}]
    [LIMIT offset, row_count]
    [OPTION opt_name = opt_value [, ...]]

SELECT 语句在版本0.9.9-rc2引入。 其语法基于通常的SQL,但是加入了一些Sphinx特定的扩展,并省略了一些功能(例如(目前)不支持多表联合(JOIN))。特别的,

  • Column list clause. Column names, arbitrary expressions, and star ('*') are all allowed (ie. SELECT @id, group_id*123+456 AS expr1 FROM test1 will work). Unlike in regular SQL, all computed expressions must be aliased with a valid identifier. Starting with version 2.0.1-beta, AS is optional. Special names such as @id and @weight should currently be used with leading at-sign. This at-sign requirement will be lifted in the future.

  • FROM clause. FROM clause should contain the list of indexes to search through. Unlike in regular SQL, comma means enumeration of full-text indexes as in Query()API call rather than JOIN.

  • WHERE clause. This clause will map both to fulltext query and filters. Comparison operators (=, !=, <, >, <=, >=), IN, AND, NOT, and BETWEEN are all supported and map directly to filters. OR is not supported yet but will be in the future. MATCH('query') is supported and maps to fulltext query. Query will be interpreted according to full-text query language rules. There must be at most one MATCH() in the clause. Starting with version 2.0.1-beta, {col_name | expr_alias} [NOT] IN @uservarcondition syntax is supported. (Refer to 第 7.7 节 “SET(设置服务端变量)语法” for a discussion of global user variables.)

  • GROUP BY clause. Currently only supports grouping by a single column. The column however can be a computed expression:

    SELECT *, group_id*1000+article_type AS gkey FROM example GROUP BY gkey

    Aggregate functions (AVG(), MIN(), MAX(), SUM()) in column list clause are supported. Arguments to aggregate functions can be either plain attributes or arbitrary expressions. COUNT(*) is implicitly supported as using GROUP BY will add @count column to result set. Explicit support might be added in the future. COUNT(DISTINCT attr) is supported. Currently there can be at most one COUNT(DISTINCT) per query and an argument needs to be an attribute. Both current restrictions on COUNT(DISTINCT) might be lifted in the future.

    SELECT *, AVG(price) AS avgprice, COUNT(DISTINCT storeid)
    FROM products
    WHERE MATCH('ipod')
    GROUP BY vendorid

    Starting with 2.0.1-beta, GROUP BY on a string attribute is supported, with respect for current collation (参见 第 5.12 节 “字符串排序规则”).

  • WITHIN GROUP ORDER BY clause. This is a Sphinx specific extension that lets you control how the best row within a group will to be selected. The syntax matches that of regular ORDER BY clause:

    SELECT *, INTERVAL(posted,NOW()-7*86400,NOW()-86400) AS timeseg
    FROM example WHERE MATCH('my search query')
    GROUP BY siteid
    WITHIN GROUP ORDER BY @weight DESC
    ORDER BY timeseg DESC, @weight DESC

    Starting with 2.0.1-beta, WITHIN GROUP ORDER BY on a string attribute is supported, with respect for current collation (参见 第 5.12 节 “字符串排序规则”).

  • ORDER BY clause. Unlike in regular SQL, only column names (not expressions) are allowed and explicit ASC and DESC are required. The columns however can be computed expressions:

    SELECT *, @weight*10+docboost AS skey FROM example ORDER BY skey

    Starting with 2.0.1-beta, ORDER BY on a string attribute is supported, with respect for current collation (参见 第 5.12 节 “字符串排序规则”).

  • LIMIT clause. Both LIMIT N and LIMIT M,N forms are supported. Unlike in regular SQL (but like in Sphinx API), an implicit LIMIT 0,20 is present by default.

  • OPTION clause. This is a Sphinx specific extension that lets you control a number of per-query options. The syntax is:

    OPTION <optionname>=<value> [ , ... ]

    Supported options and respectively allowed values are:

    Example:

    SELECT * FROM test WHERE MATCH('@title hello @body world')
    OPTION ranker=bm25, max_matches=3000,
        field_weights=(title=10, body=3)
    • 'ranker' - any of 'proximity_bm25', 'bm25', 'none', 'wordcount', 'proximity', 'matchany', or 'fieldmask'

    • 'max_matches' - integer (per-query max matches value)

    • 'cutoff' - integer (max found matches threshold)

    • 'max_query_time' - integer (max search time threshold, msec)

    • 'retry_count' - integer (distributed retries count)

    • 'retry_delay' - integer (distributed retry delay, msec)

    • 'field_weights' - a named integer list (per-field user weights for ranking)

    • 'index_weights' - a named integer list (per-index user weights for ranking)

    • 'reverse_scan' - 0 or 1, lets you control the order in which full-scan query processes the rows

7.2. SHOW META(显示查询状态信息)语法

SHOW META

SHOW META shows additional meta-information about the latest query such as query time and keyword statistics:

mysql> SELECT * FROM test1 WHERE MATCH('test|one|two');
+------+--------+----------+------------+
| id   | weight | group_id | date_added |
+------+--------+----------+------------+
|    1 |   3563 |      456 | 1231721236 |
|    2 |   2563 |      123 | 1231721236 |
|    4 |   1480 |        2 | 1231721236 |
+------+--------+----------+------------+
3 rows in set (0.01 sec)

mysql> SHOW META;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total         | 3     |
| total_found   | 3     |
| time          | 0.005 |
| keyword[0]    | test  |
| docs[0]       | 3     |
| hits[0]       | 5     |
| keyword[1]    | one   |
| docs[1]       | 1     |
| hits[1]       | 2     |
| keyword[2]    | two   |
| docs[2]       | 1     |
| hits[2]       | 2     |
+---------------+-------+
12 rows in set (0.00 sec)

7.3. SHOW WARNINGS(显示查询警告信息)语法

SHOW WARNINGS

SHOW WARNINGS statement, introduced in version 0.9.9-rc2, can be used to retrieve the warning produced by the latest query. The error message will be returned along with the query itself:

mysql> SELECT * FROM test1 WHERE MATCH('@@title hello') \G
ERROR 1064 (42000): index test1: syntax error, unexpected TOK_FIELDLIMIT
near '@title hello'

mysql> SELECT * FROM test1 WHERE MATCH('@title -hello') \G
ERROR 1064 (42000): index test1: query is non-computable (single NOT operator)

mysql> SELECT * FROM test1 WHERE MATCH('"test doc"/3') \G
*************************** 1. row ***************************
        id: 4
    weight: 2500
  group_id: 2
date_added: 1231721236
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS \G
*************************** 1. row ***************************
  Level: warning
   Code: 1000
Message: quorum threshold too high (words=2, thresh=3); replacing quorum operator
         with AND operator
1 row in set (0.00 sec)

7.4. SHOW STATUS(显示服务端状态信息)语法

SHOW STATUS, introduced in version 0.9.9-rc2, displays a number of useful performance counters. IO and CPU counters will only be available if searchd was started with --iostats and --cpustats switches respectively.

mysql> SHOW STATUS;
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| uptime             | 216   |
| connections        | 3     |
| maxed_out          | 0     |
| command_search     | 0     |
| command_excerpt    | 0     |
| command_update     | 0     |
| command_keywords   | 0     |
| command_persist    | 0     |
| command_status     | 0     |
| agent_connect      | 0     |
| agent_retry        | 0     |
| queries            | 10    |
| dist_queries       | 0     |
| query_wall         | 0.075 |
| query_cpu          | OFF   |
| dist_wall          | 0.000 |
| dist_local         | 0.000 |
| dist_wait          | 0.000 |
| query_reads        | OFF   |
| query_readkb       | OFF   |
| query_readtime     | OFF   |
| avg_query_wall     | 0.007 |
| avg_query_cpu      | OFF   |
| avg_dist_wall      | 0.000 |
| avg_dist_local     | 0.000 |
| avg_dist_wait      | 0.000 |
| avg_query_reads    | OFF   |
| avg_query_readkb   | OFF   |
| avg_query_readtime | OFF   |
+--------------------+-------+
29 rows in set (0.00 sec)

7.5. INSERT 和 REPLACE(数据插入和替换)语法

{INSERT | REPLACE} INTO index [(column, ...)]
	VALUES (value, ...)
	[, (...)]

INSERT statement, introduced in version 1.10-beta, is only supported for RT indexes. It inserts new rows (documents) into an existing index, with the provided column values.

ID column must be present in all cases. Rows with duplicate IDs will not be overwritten by INSERT; use REPLACE to do that.

index is the name of RT index into which the new row(s) should be inserted. The optional column names list lets you only explicitly specify values for some of the columns present in the index. All the other columns will be filled with their default values (0 for scalar types, empty string for text types).

Expressions are not currently supported in INSERT and values should be explicitly specified.

Multiple rows can be inserted using a single INSERT statement by providing several comma-separated, parens-enclosed lists of rows values.

7.6. DELETE(数据删除)语法

DELETE FROM index WHERE {id = value | id IN (val1 [, val2 [, ...]])}

DELETE statement, introduced in version 1.10-beta, is only supported for RT indexes. It deletes existing rows (documents) from an existing index based on ID.

index is the name of RT index from which the row should be deleted. value is the row ID to be deleted. Support for batch id IN (2,3,5) syntax was added in version 2.0.1-beta.

Additional types of WHERE conditions (such as conditions on attributes, etc) are planned, but not supported yet as of 1.10-beta.

7.7. SET(设置服务端变量)语法

SET [GLOBAL] server_variable_name = value
SET GLOBAL @user_variable_name = (int_val1 [, int_val2, ...])

SET statement, introduced in version 1.10-beta, modifies a server variable value. The variable names are case-insensitive. No variable value changes survive server restart. There are the following classes of the variables:

  1. per-session server variable (1.10-beta and above)

  2. global server variable (2.0.1-beta and above)

  3. global user variable (2.0.1-beta and above)

Global user variables are shared between concurrent sessions. Currently, the only supported value type is the list of BIGINTs, and these variables can only be used along with IN() for filtering purpose. The intended usage scenario is uploading huge lists of values to searchd (once) and reusing them (many times) later, saving on network overheads. Example:

// in session 1
mysql> SET GLOBAL @myfilter=(2,3,5,7,11,13);
Query OK, 0 rows affected (0.00 sec)

// later in session 2
mysql> SELECT * FROM test1 WHERE group_id IN @myfilter;
+------+--------+----------+------------+-----------------+------+
| id   | weight | group_id | date_added | title           | tag  |
+------+--------+----------+------------+-----------------+------+
|    3 |      1 |        2 | 1299338153 | another doc     | 15   |
|    4 |      1 |        2 | 1299338153 | doc number four | 7,40 |
+------+--------+----------+------------+-----------------+------+
2 rows in set (0.02 sec)

Per-session and global server variables affect certain server settings in the respective scope. Known per-session server variables are:

  • AUTOCOMMIT = {0 | 1}

  • Whether any data modification statement should be implicitly wrapped by BEGIN and COMMIT. 版本2.0.1-beta引入。

  • COLLATION_CONNECTION = collation_name

  • Selects the collation to be used for ORDER BY or GROUP BY on string values in the subsequent queries. Refer to 第 5.12 节 “字符串排序规则” for a list of known collation names. 版本2.0.1-beta引入。

  • CHARACTER_SET_RESULTS = charset_name

  • Does nothing; a placeholder to support frameworks, clients, and connectors that attempt to automatically enforce a charset when connecting to a Sphinx server. 版本2.0.1-beta引入。

Known global server variables are:

  • QUERY_LOG_FORMAT = {plain | sphinxql}

  • Changes the current log format. 版本2.0.1-beta引入。

  • LOG_LEVEL = {info | debug | debugv | debugvv}

  • Changes the current log verboseness level. 版本2.0.1-beta引入。

Examples:

mysql> SET autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET GLOBAL query_log_format=sphinxql;
Query OK, 0 rows affected (0.00 sec)

7.8. BEGIN, COMMIT, 以及 ROLLBACK(事务处理)语法

START TRANSACTION | BEGIN
COMMIT
ROLLBACK
SET AUTOCOMMIT = {0 | 1}

BEGIN, COMMIT, and ROLLBACK statements were 版本2.0.1-beta引入。 BEGIN statement (or its START TRANSACTION alias) forcibly commits pending transaction, if any, and begins a new one. COMMIT statement commits the current transaction, making all its changes permanent. ROLLBACK statement rolls back the current transaction, canceling all its changes. SET AUTOCOMMIT controls the autocommit mode in the active session.

AUTOCOMMIT is set to 1 by default, meaning that every statement that perfoms any changes on any index is implicitly wrapped in BEGIN and COMMIT.

Transactions are limited to a single RT index, and also limited in size. They are atomic, consistent, overly isolated, and durable. Overly isolated means that the changes are not only invisible to the concurrent transactions but even to the current session itself.

7.9. CALL SNIPPETS(摘要生成)语法

CALL SNIPPETS(data, index, query[, opt_value AS opt_name[, ...]])

CALL SNIPPETS statement, introduced in version 1.10-beta, builds a snippet from provided data and query, using specified index settings.

data is the source data string to extract a snippet from. index is the name of the index from which to take the text processing settings. query is the full-text query to build snippets for. Additional options are documented in 第 8.7.1 节 “BuildExcerpts (产生文本摘要和高亮)”. Usage example:

CALL SNIPPETS('this is my document text', 'test1', 'hello world',
    5 AS around, 200 AS limit)

7.10. CALL KEYWORDS(关键词生成)语法

CALL KEYWORDS(text, index, [hits])

CALL KEYWORDS statement, introduced in version 1.10-beta, splits text into particular keywords. It returns tokenized and normalized forms of the keywords, and, optionally, keyword statistics.

text is the text to break down to keywords. index is the name of the index from which to take the text processing settings. hits is an optional boolean parameter that specifies whether to return document and hit occurrence statistics.

7.11. SHOW TABLES(显示当前提供搜索服务的索引列表)语法

SHOW TABLES

SHOW TABLES statement, introduced in version 2.0.1-beta, enumerates all currently active indexes along with their types. As of 2.0.1-beta, existing index types arelocal, distributed, and rt respectively. Example:

mysql> SHOW TABLES;
+-------+-------------+
| Index | Type        |
+-------+-------------+
| dist1 | distributed |
| rt    | rt          |
| test1 | local       |
| test2 | local       |
+-------+-------------+
4 rows in set (0.00 sec)

7.12. DESCRIBE(显示指定搜索服务索引的字段信息)语法

{DESC | DESCRIBE} index

DESCRIBE statement, introduced in version 2.0.1-beta, lists index columns and their associated types. Columns are document ID, full-text fields, and attributes. The order matches that in which fields and attributes are expected by INSERT and REPLACE statements. As of 2.0.1-beta, column types are field, integer, timestamp, ordinal,bool, float, bigint, string, and mva. ID column will be typed either integer or bigint based on whether the binaries were built with 32-bit or 64-bit document ID support. Example:

mysql> DESC rt;
+---------+---------+
| Field   | Type    |
+---------+---------+
| id      | integer |
| title   | field   |
| content | field   |
| gid     | integer |
+---------+---------+
4 rows in set (0.00 sec)

7.13. CREATE FUNCTION(添加自定义函数)语法

CREATE FUNCTION udf_name
	RETURNS {INT | BIGINT | FLOAT}
	SONAME 'udf_lib_file'

CREATE FUNCTION statement, introduced in version 2.0.1-beta, installs a user-defined function (UDF) with the given name and type from the given library file. The library file must reside in a trusted plugin_dir directory. On success, the function is available for use in all subsequent queries that the server receives. Example:

mysql> CREATE FUNCTION avgmva RETURNS INT SONAME 'udfexample.dll';
Query OK, 0 rows affected (0.03 sec)

mysql> SELECT *, AVGMVA(tag) AS q from test1;
+------+--------+---------+-----------+
| id   | weight | tag     | q         |
+------+--------+---------+-----------+
|    1 |      1 | 1,3,5,7 | 4.000000  |
|    2 |      1 | 2,4,6   | 4.000000  |
|    3 |      1 | 15      | 15.000000 |
|    4 |      1 | 7,40    | 23.500000 |
+------+--------+---------+-----------+

7.14. DROP FUNCTION(删除自定义函数)语法

DROP FUNCTION udf_name

DROP FUNCTION statement, introduced in version 2.0.1-beta, deinstalls a user-defined function (UDF) with the given name. On success, the function is no longer available for use in subsequent queries. Pending concurrent queries will not be affected and the library unload, if necessary, will be postponed until those queries complete. Example:

mysql> DROP FUNCTION avgmva;
Query OK, 0 rows affected (0.00 sec)

7.15. SHOW VARIABLES(显示服务器端变量)语法

SHOW VARIABLES

Added in version 2.0.1-beta, this is currently a placeholder query that does nothing and reports success. That is in order to keep compatibility with frameworks and connectors that automatically execute this statement.

mysql> SHOW VARIABLES;
Query OK, 0 rows affected (0.00 sec)

7.16. SHOW COLLATION(显示字符集校对)语法

SHOW COLLATION

Added in version 2.0.1-beta, this is currently a placeholder query that does nothing and reports success. That is in order to keep compatibility with frameworks and connectors that automatically execute this statement.

mysql> SHOW COLLATION;
Query OK, 0 rows affected (0.00 sec)

7.17. UPDATE(数据更新)语法

UPDATE index SET col1 = newval1 [, ...] WHERE ID = docid

UPDATE statement was added in version 2.0.1-beta. It can currently update 32-bit integer attributes only. Multiple attributes and values can be specified. Both RT and disk indexes are supported. Updates on other attribute types are also planned.

mysql> UPDATE myindex SET enabled=0 WHERE id=123;
Query OK, 1 rows affected (0.00 sec)

7.18. 多结果集查询(批量查询)

Starting version 2.0.1-beta, SphinxQL supports multi-statement queries, or batches. Possible inter-statement optimizations described in 第 5.11 节 “批量查询” do apply to SphinxQL just as well. The batched queries should be separated by a semicolon. Your MySQL client library needs to support MySQL multi-query mechanism and multiple result set. For instance, mysqli interface in PHP and DBI/DBD libraries in Perl are known to work.

Here's a PHP sample showing how to utilize mysqli interface with Sphinx.

<?php

$link = mysqli_connect ( "127.0.0.1", "root", "", "", 9306 );
if ( mysqli_connect_errno() )
    die ( "connect failed: " . mysqli_connect_error() );

$batch = "SELECT * FROM test1 ORDER BY group_id ASC;";
$batch .= "SELECT * FROM test1 ORDER BY group_id DESC";

if ( !mysqli_multi_query ( $link, $batch ) )
    die ( "query failed" );

do
{
    // fetch and print result set
    if ( $result = mysqli_store_result($link) )
    {
        while ( $row = mysqli_fetch_row($result) )
            printf ( "id=%s\n", $row[0] );
        mysqli_free_result($result);
    }

    // print divider
    if ( mysqli_more_results($link) )
        printf ( "------\n" );

} while ( mysqli_next_result($link) );

Its output with the sample test1 index included with Sphinx is as follows.

$ php test_multi.php
id=1
id=2
id=3
id=4
------
id=3
id=4
id=1
id=2

The following statements can currently be used in a batch: SELECT, SHOW WARNINGS, SHOW STATUS, and SHOW META. Arbitrary sequence of these statements are allowed. The results sets returned should match those that would be returned if the batched queries were sent one by one.

7.19. COMMIT(注释)语法

Since version 2.0.1-beta, SphinxQL supports C-style comment syntax. Everything from an opening /* sequence to a closing */ sequence is ignored. Comments can span multiple lines, can not nest, and should not get logged. MySQL specific /*! ... */ comments are also currently ignored. (As the comments support was rather added for better compatibility with mysqldump produced dumps, rather than improving generaly query interoperability between Sphinx and MySQL.)

SELECT /*! SQL_CALC_FOUND_ROWS */ col1 FROM table1 WHERE ...

7.20. SphinxQL 保留关键字列表

A complete alphabetical list of keywords that are currently reserved in SphinxQL syntax (and therefore can not be used as identifiers).

AND
AS
ASC
AVG
BEGIN
BETWEEN
BY
CALL
COLLATION
COMMIT
COUNT
DELETE
DESC
DESCRIBE
DISTINCT
FALSE
FROM
GLOBAL
GROUP
ID
IN
INSERT
INTO
LIMIT
MATCH
MAX
META
MIN
NOT
NULL
OPTION
OR
ORDER
REPLACE
ROLLBACK
SELECT
SET
SHOW
START
STATUS
SUM
TABLES
TRANSACTION
TRUE
UPDATE
VALUES
VARIABLES
WARNINGS
WEIGHT
WHERE
WITHIN

7.21. SphinxQL 升级备注, version 2.0.1-beta

This section only applies to existing applications that use SphinxQL versions prior to 2.0.1-beta.

In previous versions, SphinxQL just wrapped around SphinxAPI and inherited its magic columns and column set quirks. Essentially, SphinxQL queries could return (slightly) different columns and in a (slightly) different order than it was explicitly requested in the query. Namely, weight magic column (which is not a real column in any index) was added at all times, and GROUP BY related @count, @group, and @distinct magic columns were conditionally added when grouping. Also, the order of columns (attributes) in the result set was actually taken from the index rather than the query. (So if you asked for columns C, B, A in your query but they were in the A, B, C order in the index, they would have been returned in the A, B, C order.)

In version 2.0.1-beta, we fixed that. SphinxQL is now more SQL compliant (and will be further brought in as much compliance with standard SQL syntax as possible). That is not yet a breaking change, because searchd now supports compat_sphinxql_magics directive that flips between the old "compatibility" mode and the new "compliance" mode. However, the compatibility mode support is going to be removed in future, so it's strongly advised to update SphinxQL applications and switch to the compliance mode.

The important changes are as follows:

  • @ID magic name is deprecated in favor of ID. Document ID is considered an attribute.

  • WEIGHT is no longer implicitly returned, because it is not actually a column (an index attribute), but rather an internal function computed per each row (a match). You have to explicitly ask for it, using the WEIGHT() function. (The requirement to alias the result will be lifted in the next release.)

    SELECT id, WEIGHT() w FROM myindex WHERE MATCH('test')
  • You can now use quoted reserved keywords as aliases. The quote character is backtick ("`", ASCII code 96 decimal, 60 hex). One particularly useful example would be returning weight column like the old mode:

    SELECT id, WEIGHT() `weight` FROM myindex WHERE MATCH('test')
  • The column order is now different and should now match the one expliclitly defined in the query. So if you are accessing columns based on their position in the result set rather than the name (for instance, by using mysql_fetch_row() rather than mysql_fetch_assoc() in PHP), check and fix the order of columns in your queries.

  • SELECT * return the columns in index order, as it used to, including the ID column. However, SELECT * does not automatically return WEIGHT(). To update such queries in case you access columns by names, simply add it to the query:

    SELECT *, WEIGHT() `weight` FROM myindex WHERE MATCH('test')

    Otherwise, i.e., in case you rely on column order, select ID, weight, and then other columns:

    SELECT id, *, WEIGHT() `weight` FROM myindex WHERE MATCH('test')
  • Magic @count and @distinct attributes are no longer implicitly returned. You now have to explicitly ask for them when using GROUP BY. (Also note that you currently have to alias them; that requirement will be lifted in the future.)

    SELECT gid, COUNT(*) q FROM myindex WHERE MATCH('test')
    GROUP BY gid ORDER BY q DESC