---
weight: 120
title: SQL to LogsQL tutorial
menu:
  docs:
    parent: "victorialogs"
    weight: 120
---

This tutorial explains how to migrate from SQL to [LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/).
It assumes that you are familiar with SQL and know [how to execute queries in VictoriaLogs](https://docs.victoriametrics.com/victorialogs/querying/).

## data model

SQL is usually used for querying relational tables. Every such table contains a pre-defined set of columns with pre-defined types.
LogsQL is used for querying logs. Logs are stored in [log streams](https://docs.victoriametrics.com/victorialogs/keyconcepts/#stream-fields).
So log streams are an analogue of tables in relational databases. Log streams and relational tables have the following major differences:

- Log streams are created automatically when the first log entry (row) is ingested into them.
- There is no pre-defined schema in log streams: logs with an arbitrary set of fields can be ingested into every log stream.
  Both names and values in every log entry have string type. They may contain arbitrary string data.
- Every log entry (row) can be represented as a flat JSON object: `{"f1":"v1",...,"fN":"vN"}`. See [these docs](https://docs.victoriametrics.com/victorialogs/keyconcepts/#data-model).
- By default VictoriaLogs selects log entries across all the log streams. The needed set of log streams can be specified
  via [stream filters](https://docs.victoriametrics.com/victorialogs/logsql/#stream-filter).
- By default VictoriaLogs returns all the fields across the selected logs. The set of returned fields
  can be limited with [`fields` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe).
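
As a rough illustration of the last two points, the query below narrows the search to one set of log streams and trims the output to a few fields. The `app="nginx"` stream label is a hypothetical example that depends on how the logs were ingested; `_time`, `_stream` and `_msg` are the standard VictoriaLogs field names.

```logsql
# `{app="nginx"}` is a stream filter; the `app` label and its value are assumptions made for this sketch.
{app="nginx"}
  | fields _time, _stream, _msg   # return only these fields instead of all the fields
```

In SQL terms this is roughly `SELECT _time, _stream, _msg FROM <table>`, where the stream filter plays the role of choosing the table.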

## query structure

SQL query structure is quite convoluted:

```sql
SELECT
  <fields, aggregations, calculations, transformations>
  FROM <table>
  <optional JOINs>
  <optional filters with optional subqueries>
  <optional GROUP BY>
  <optional HAVING>
  <optional ORDER BY>
  <optional LIMIT / OFFSET>
  <optional UNION>
```

[LogsQL](https://docs.victoriametrics.com/victorialogs/logsql/) query structure is much simpler:

```logsql
<filters>
  | <optional_pipe1>
  | ...
  | <optional_pipeN>
```

The `<filters>` part selects the needed logs (rows) according to the provided [filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters).
Then the provided [pipes](https://docs.victoriametrics.com/victorialogs/logsql/#pipes) are executed sequentially.
Every such pipe receives all the rows from the previous stage, performs some calculations and/or transformations,
and then pushes the resulting rows to the next stage. This simplifies reading and understanding the query: just read it from the beginning
to the end in order to understand what it does at every stage.
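
The stage-by-stage reading can be sketched with the following query. The `host` field and the `error` word are hypothetical and depend on the ingested logs; the query only illustrates how rows flow from the filters through each pipe.

```logsql
error                              # filters: select the logs with the `error` word in the log message
  | stats by (host) count() errors # stage 1: count the selected logs per `host` (a hypothetical field)
  | filter errors:>10              # stage 2: keep only the hosts with more than 10 errors
  | sort by (errors desc)          # stage 3: put the hosts with the most errors first
```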

LogsQL pipes cover all the functionality from SQL: aggregations, calculations, transformations, subqueries, joins, post-filters, sorting, etc.
See the [conversion rules](#conversion-rules) on how to convert SQL to LogsQL.

## conversion rules

The following rules must be used for converting an SQL query into a LogsQL query:

* If the SQL query contains `WHERE`, then convert it into [LogsQL filters](https://docs.victoriametrics.com/victorialogs/logsql/#filters).
  Otherwise just start LogsQL query with [`*`](https://docs.victoriametrics.com/victorialogs/logsql/#any-value-filter).
  For example, `SELECT * FROM table WHERE field1=value1 AND field2<>value2` is converted into `field1:=value1 field2:!=value2`,
  while `SELECT * FROM table` is converted into `*`.
* `IN` subqueries inside `WHERE` must be converted into [`in` filters](https://docs.victoriametrics.com/victorialogs/logsql/#multi-exact-filter).
  For example, `SELECT * FROM table WHERE id IN (SELECT id2 FROM table)` is converted into `id:in(* | fields id2)`.
* If the `SELECT` part isn't equal to `*` and there are no `GROUP BY` / aggregate functions in the SQL query, then enumerate
  the selected columns in the [`fields` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#fields-pipe).
  For example, `SELECT field1, field2 FROM table` is converted into `* | fields field1, field2`.
* If the SQL query contains `JOIN`, then convert it into [`join` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#join-pipe).
* If the SQL query contains `GROUP BY` / aggregate functions, then convert them to [`stats` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#stats-pipe).
  For example, `SELECT count(*) FROM table` is converted into `* | count()`, while `SELECT user_id, count(*) FROM table GROUP BY user_id`
  is converted to `* | stats by (user_id) count()`. Note how the LogsQL query mentions the `GROUP BY` fields only once,
  while SQL forces mentioning these fields twice: in `SELECT` and in `GROUP BY`. How many times did you hit the discrepancy
  between `SELECT` and `GROUP BY` fields?
* If the SQL query contains additional calculations and/or transformations in `SELECT`, which aren't covered yet by `GROUP BY`,
  then convert them into the corresponding [LogsQL pipes](https://docs.victoriametrics.com/victorialogs/logsql/#pipes).
  The most frequently used pipes are [`math`](https://docs.victoriametrics.com/victorialogs/logsql/#math-pipe)
  and [`format`](https://docs.victoriametrics.com/victorialogs/logsql/#format-pipe).
  For example, `SELECT field1 + 10 AS x, CONCAT('foo', field2) AS y FROM table` is converted into `* | math field1 + 10 as x | format "foo<field2>" as y | fields x, y`.
* If the SQL query contains `HAVING`, then convert it into [`filter` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#filter-pipe).
  For example, `SELECT user_id, count(*) AS c FROM table GROUP BY user_id HAVING c > 100` is converted into `* | stats by (user_id) count() c | filter c:>100`.
* If the SQL query contains `ORDER BY`, `LIMIT` and `OFFSET`, then convert them into [`sort` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#sort-pipe).
  For example, `SELECT * FROM table ORDER BY field1, field2 LIMIT 10 OFFSET 20` is converted into `* | sort by (field1, field2) limit 10 offset 20`.
* If the SQL query contains `UNION`, then convert it into [`union` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#union-pipe).
  For example, `SELECT * FROM table WHERE filters1 UNION ALL SELECT * FROM table WHERE filters2` is converted into `filters1 | union (filters2)`.
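
Several of the rules above can be combined in a single conversion. The sketch below is illustrative only: the `status` and `user_id` fields and the `table` name are hypothetical, and the SQL statement is given in comments for comparison.

```logsql
# SQL:
#   SELECT user_id, count(*) AS hits
#   FROM table
#   WHERE status = 'error'
#   GROUP BY user_id
#   HAVING hits > 100
#   ORDER BY hits DESC
#   LIMIT 10
#
# LogsQL:
status:=error                       # WHERE    -> filters
  | stats by (user_id) count() hits # GROUP BY -> `stats` pipe
  | filter hits:>100                # HAVING   -> `filter` pipe
  | sort by (hits desc) limit 10    # ORDER BY / LIMIT -> `sort` pipe
```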

SQL queries are frequently used for obtaining the top N column values, which are the most frequently seen in the selected rows.
For example, the query below returns the top 5 `user_id` values, which are present in the biggest number of rows:

```sql
SELECT user_id, count(*) hits FROM table GROUP BY user_id ORDER BY hits DESC LIMIT 5
```

LogsQL provides a shortcut syntax with [`top` pipe](https://docs.victoriametrics.com/victorialogs/logsql/#top-pipe) for this case:

```logsql
* | top 5 (user_id)
```

It is equivalent to the longer LogsQL query:

```logsql
* | by (user_id) count() hits | sort by (hits desc) limit 5
```

[LogsQL pipes](https://docs.victoriametrics.com/victorialogs/logsql/#pipes) support much wider functionality compared to SQL,
so spend some spare time reading the [pipe docs](https://docs.victoriametrics.com/victorialogs/logsql/) and playing with them
at the [VictoriaLogs playground](https://play-vmlogs.victoriametrics.com/).