Compare commits

...

20 Commits

Author SHA1 Message Date
Artem Fetishev
1f1c619abb port-rebase fixes
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-02-25 12:30:35 +01:00
Artem Fetishev
217d116c2c bump roaring bitmap version to 2.14.4
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-02-25 12:28:22 +01:00
Artem Fetishev
449d4ff1a1 byte size benchmark
Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-02-25 12:25:04 +01:00
Artem Fetishev
6128134e84 lib/uint64set: Add roaring64 bitmap to vendors and use it in benchmarks
uint64set has been temporarily replaced in benchmarks with roaring64.Bitmap
in order to compare the performance of the two implementations.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-02-25 12:24:57 +01:00
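
For context, a minimal sketch of how such a side-by-side benchmark could look; the import paths and item count are assumptions for illustration, not the actual benchmark from this commit:

``` go
package uint64set_test

import (
	"testing"

	"github.com/RoaringBitmap/roaring/v2/roaring64"
	"github.com/VictoriaMetrics/VictoriaMetrics/lib/uint64set"
)

// BenchmarkSetAdd compares sequential Add performance of uint64set.Set
// against roaring64.Bitmap over the same key range.
func BenchmarkSetAdd(b *testing.B) {
	const itemsCount = 1 << 16
	b.Run("uint64set", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var s uint64set.Set
			for n := uint64(0); n < itemsCount; n++ {
				s.Add(n)
			}
		}
	})
	b.Run("roaring64", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			bm := roaring64.New()
			for n := uint64(0); n < itemsCount; n++ {
				bm.Add(n)
			}
		}
	})
}
```
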
hagen1778
d467faf739 docs: add change lines after 673b2ca7db
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-02-25 11:27:53 +01:00
sias32
673b2ca7db dashboards/deployment: add links for vmalert (#10509)
### Describe Your Changes

1. Dashboard: add a link to an alert for quick access to it
(alert-statisticl)
2. Rules: replace localhost with $externalURL so the address is taken from
the --external.url flag

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

---------

Signed-off-by: sias32 <sias.32@yandex.ru>
2026-02-25 11:26:44 +01:00
hagen1778
40ccf0c333 app/vmalert: fix typo Minium => Minimum
Follow-up after a6200cc83d

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-02-25 09:28:04 +01:00
hklhai
fe341a4204 vmagent: Improve Influx parsing error message when raw newline (\n) appears inside quoted field (#10524)
# Investigation & Root Cause: InfluxDB Line Protocol Parsing with a Raw
Newline (`\n`)

This document describes the investigation process and root cause
analysis for Influx Line Protocol parsing errors in VictoriaMetrics when
a **raw newline (`\n`) byte appears inside a quoted field value**.

------------------------------------------------------------------------

## Background

According to the Influx Line Protocol specification:

-   Each point must be represented as a single line.
-   The newline character (`\n`) separates points.
-   Literal newline bytes are not allowed inside quoted field values.

Therefore, any raw newline byte (`0x0A`) inside a quoted string makes
the line invalid.

------------------------------------------------------------------------

## Related Issue

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067

------------------------------------------------------------------------

## Expected Behavior

VictoriaMetrics should reject Influx Line Protocol lines that contain a
raw newline inside a quoted field value, since this violates the
protocol specification.

The parsing failure itself is correct.

------------------------------------------------------------------------

## Actual Behavior

VictoriaMetrics rejects the line with the following error:

    cannot parse field value for "...": missing closing quote for quoted field value

While technically correct, the error message does not clearly indicate
that the root cause is a raw newline inside the quoted field value.

------------------------------------------------------------------------

## Minimal Reproducer

The issue can be reproduced without Telegraf or Jolokia:

``` bash
printf 'test value="hello
world"\n' | curl -X POST http://localhost:8428/write --data-binary @-
```

This produces:

    cannot parse field value for "value": missing closing quote for quoted field value

The failure occurs because the value contains an actual newline byte
(0x0A), not the escaped sequence `\n`.

------------------------------------------------------------------------

## Environment Setup

The issue was reproduced using the following stack:

-   VictoriaMetrics v1.127.0
-   InfluxDB 1.8
-   Spring Boot + Jolokia
-   Telegraf 1.36.2

Telegraf collects JVM `SystemProperties`, including:

``` json
"line.separator": "\n"
```

After JSON unmarshalling, this becomes a real newline byte in memory.
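
For illustration, a small Go sketch confirming that the two-character JSON escape decodes to a single raw byte:

``` go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// The JSON escape \n (two characters) decodes to one 0x0A byte.
	var props map[string]string
	_ = json.Unmarshal([]byte(`{"line.separator": "\n"}`), &props)
	fmt.Printf("%q -> bytes %v\n", props["line.separator"], []byte(props["line.separator"]))
	// "\n" -> bytes [10]
}
```
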

Detailed reproduction steps can be found here:

https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067#issuecomment-3896175100

------------------------------------------------------------------------

## Observed Serialized Line

Using breakpoint debugging in:

    lib/bytesutil/bytebuffer.go:58

The `ReadFrom` function reads and assembles an Influx line containing:

    SystemProperties.line.separator="
    ",

The quoted field contains an actual newline byte before the closing
quote.

This breaks the single-line assumption of Influx Line Protocol.

VictoriaMetrics splits on `\n`, resulting in:

-   A truncated first line
-   A missing closing quote
-   Parsing failure
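
A minimal Go sketch (not the actual VictoriaMetrics parser) of how splitting on `\n` truncates the quoted value:

``` go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// A serialized Influx line whose quoted value contains a raw
	// newline byte (0x0A), as in the Telegraf "line.separator" case.
	payload := "test value=\"hello\nworld\""

	// Line-protocol parsers treat '\n' as the point separator, so the
	// payload is split before the closing quote is ever seen.
	for i, line := range strings.Split(payload, "\n") {
		fmt.Printf("line %d: %q\n", i, line)
	}
	// line 0: "test value=\"hello"  <- truncated: "missing closing quote"
	// line 1: "world\""
}
```
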

------------------------------------------------------------------------

## Important Clarification

This issue is **not** caused by the escaped sequence `"\\n"`.

The failure occurs only when the serialized Influx line contains an
actual newline byte (`0x0A`) inside the quoted value.

Escaped `\n` (two characters: `\` and `n`) is valid.

------------------------------------------------------------------------

## Root Cause

-   Telegraf serializes a field containing a real newline byte.
-   Influx Line Protocol forbids literal newline characters inside
    quoted fields.
-   VictoriaMetrics correctly treats `\n` as a line separator.
-   The parser then encounters an incomplete quoted field and reports
    "missing closing quote".

The parsing behavior is correct per specification.

------------------------------------------------------------------------

## Proposed Improvement

The parsing logic should remain unchanged.

However, the error message can be improved to better indicate the root
cause.

Suggested error message:

    invalid Influx line protocol: missing closing quote for quoted field value;
    this may be caused by a raw newline (\n) inside the quoted field value

This makes the failure immediately actionable and easier to diagnose.
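
For illustration, one possible shape of the improved error; the function name and package are hypothetical, not the actual parser code:

``` go
package influx

import "fmt"

// errMissingClosingQuote is a hypothetical helper returning the
// improved error message suggested above.
func errMissingClosingQuote(fieldKey string) error {
	return fmt.Errorf("invalid Influx line protocol: missing closing quote for quoted field value of %q; "+
		"this may be caused by a raw newline (\\n) inside the quoted field value", fieldKey)
}
```
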

------------------------------------------------------------------------

## Summary

-   The failure is caused by a raw newline byte inside a quoted field
    value.
-   This violates the Influx Line Protocol specification.
-   VictoriaMetrics correctly rejects the line.
-   The error message should explicitly mention the possibility of a raw
    newline (`\n`) inside the quoted field.

Signed-off-by: hklhai <hkhai@outlook.com>
Co-authored-by: Max Kotliar <kotlyar.maksim@gmail.com>
2026-02-24 20:42:43 +02:00
Max Kotliar
83ebf00659 app/vmstorage: increase min free disk space from 10M to 100M (#10529)
### Describe Your Changes

The free disk space check is not continuous but occurs periodically. In
high-load environments with high ingestion rates, the system can consume
more than the remaining 10MB between checks. This can lead to a situation
where disk space is exhausted before the next check occurs, causing a panic.

Increase the default value 10x to cover this case.
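
For illustration, a back-of-the-envelope sketch of the race window; the check interval and ingestion rate below are assumed numbers, not VictoriaMetrics internals:

``` go
package main

import "fmt"

func main() {
	const checkIntervalSec = 1.0 // assumed interval between free-space checks
	const writeRate = 50e6       // assumed ingestion rate, bytes/second

	// Worst case, this many bytes land on disk between two checks,
	// easily exceeding a 10 MB threshold but not a 100 MB one.
	worstCase := writeRate * checkIntervalSec
	fmt.Printf("bytes written between checks: %.0f (old threshold 10e6, new 100e6)\n", worstCase)
}
```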

Fixes https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9561

### Checklist

The following checks are **mandatory**:

- [ ] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [ ] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2026-02-24 18:06:55 +02:00
Roman Khavronenko
5e602726f5 app/vmselect: properly apply extra filters for tenant tokens for /api/v1/label/../values (#10503)
Previously, extra filters were ignored for
`/api/v1/label/vm_account_id/values` and
`/api/v1/label/vm_project_id/values` calls. As a result, even if a user's
visibility was limited by the
`?extra_filters[]={vm_account_id="1"}` param, they could still get the list of
all available tenants in the system.
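
For illustration, a minimal sketch of the affected call; the vmselect address and tenant path are assumptions:

``` go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Tenant label values with visibility limited via extra_filters[].
	base := "http://vmselect:8481/select/multitenant/prometheus/api/v1/label/vm_account_id/values"
	params := url.Values{}
	params.Set("extra_filters[]", `{vm_account_id="1"}`)

	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// Before the fix this could list every tenant; with it, only account "1".
	fmt.Println(string(body))
}
```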

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>

(cherry picked from commit d2a033453e)
Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-02-24 15:42:13 +01:00
hagen1778
a6200cc83d app/vmalert: rename MiniMum => Minimum
Follow-up after a5811d3c3b

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-02-24 15:37:02 +01:00
Fedor Kanin
a5811d3c3b docs/vmalert: fix a typo by replacing maxiMum with maximum (#10516)
### Describe Your Changes

Fix a typo by replacing `maxiMum` with `maximum` in Markdown docs and
CLI flags help.

Resolve #10515 

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2026-02-24 15:34:41 +01:00
JAYICE
5962b47c31 document: enrich the description of buckets_limit (#10465)
### Describe Your Changes

fix https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10417

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2026-02-24 15:33:16 +01:00
Roman Khavronenko
9a4edc738a docs: re-visit Troubleshooting docs (#10512)
* remove the ToC at the beginning, as it duplicates the right-bar functionality
and is easy to get wrong. For example, it was missing the ZFS section
* simplify wording where possible
* reference new tools VM gained in recent releases
* re-prioritize the order of tips based on personal experience

---------

Signed-off-by: hagen1778 <roman@victoriametrics.com>
Signed-off-by: Roman Khavronenko <hagen1778@gmail.com>
Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Co-authored-by: Pablo (Tomas) Fernandez <46322567+TomFern@users.noreply.github.com>
2026-02-24 15:30:31 +01:00
Roman Khavronenko
30d01e9cae dashboards: filter out zero value for Major page faults panel (#10517)
Components like vmselect and vminsert rarely touch disk, so most of the
time their values are 0. Filtering out 0 values makes the panel cleaner.

Signed-off-by: hagen1778 <roman@victoriametrics.com>
2026-02-24 15:30:05 +01:00
Artem Fetishev
6b46f3920c lib/uint64set: move set un/marshal methods from Storage to uint64set (#10521)
A refactoring that moves uint64set.Set marshaling and unmarshaling from lib/storage/storage.go to lib/uint64set. Also adds function docs and tests.

Signed-off-by: Artem Fetishev <rtm@victoriametrics.com>
2026-02-24 11:15:49 +01:00
Zhu Jiekun
97b11146ee flaky test: disable GC during sync.Pool test (#10523)
Disable GC when testing the sync.Pool `Get` and `Put` logic, so that items in the pool aren't recycled too quickly.

Follow-up for 785daff65d.
2026-02-24 10:19:04 +01:00
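
For context, a minimal sketch of the pattern; the test and pooled type are illustrative, not the actual test from this commit:

``` go
package pool_test

import (
	"runtime/debug"
	"sync"
	"testing"
)

func TestPoolGetPut(t *testing.T) {
	// Disable GC for the duration of the test and restore the previous
	// setting on exit, so pooled items aren't reclaimed between Put and Get.
	defer debug.SetGCPercent(debug.SetGCPercent(-1))

	var p sync.Pool
	v := &struct{ n int }{n: 42}
	p.Put(v)
	if got := p.Get(); got != v {
		t.Fatalf("expected to get back the same pooled item, got %v", got)
	}
}
```
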
Fred Navruzov
2ef74bd6ea docs/vmanomaly - strip bad chars from filenames (#10525)
### Describe Your Changes

Strip spaces and `=` from filenames as suggested in #10522.

Now
```shellhelp
find ./docs | egrep '[ =]'
```
returns no matching files.

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).
2026-02-24 10:05:48 +02:00
Max Kotliar
845161e377 .github: Run apptests on a separate pool of runners
This should prevent apptest timeouts caused by runner saturation. When
apptests run alongside other tests and linters, they do not have enough
CPU to complete in time and often time out.

If the apptests are re-run shortly after, they are likely to pass
because the same runner has enough resources available (the other jobs
have finished).

Remove GOGC=10, as the runner has enough memory (16GB) to run apptests.

I ran some tests and observed a drop in overall test duration from 4.5m
to 3m-3m30s.
2026-02-23 14:16:40 +02:00
Vadim Rutkovsky
f176a6624a dashboards: operator dashboard should extract version from metrics (#10502)
### Describe Your Changes

Use vm_app_version to determine the operator version instead of static text

### Checklist

The following checks are **mandatory**:

- [x] My change adheres to [VictoriaMetrics contributing
guidelines](https://docs.victoriametrics.com/victoriametrics/contributing/#pull-request-checklist).
- [x] My change adheres to [VictoriaMetrics development
goals](https://docs.victoriametrics.com/victoriametrics/goals/).

Signed-off-by: Vadim Rutkovsky <vadim@vrutkovs.eu>
2026-02-23 13:32:14 +02:00
106 changed files with 30422 additions and 373 deletions

View File

@@ -86,7 +86,7 @@ jobs:
- run: go version
- name: Run tests
run: GOGC=10 make ${{ matrix.scenario}}
run: make ${{ matrix.scenario}}
- name: Publish coverage
uses: codecov/codecov-action@v5
@@ -95,7 +95,7 @@ jobs:
apptest:
name: apptest
runs-on: ubuntu-latest
runs-on: apptest
steps:
- name: Code checkout

View File

@@ -31,8 +31,8 @@ var (
"0 means no limit.")
ruleUpdateEntriesLimit = flag.Int("rule.updateEntriesLimit", 20, "Defines the max number of rule's state updates stored in-memory. "+
"Rule's updates are available on rule's Details page and are used for debugging purposes. The number of stored updates can be overridden per rule via update_entries_limit param.")
resendDelay = flag.Duration("rule.resendDelay", 0, "MiniMum amount of time to wait before resending an alert to notifier.")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maxiMum duration for automatic alert expiration, "+
resendDelay = flag.Duration("rule.resendDelay", 0, "Minimum amount of time to wait before resending an alert to notifier.")
maxResolveDuration = flag.Duration("rule.maxResolveDuration", 0, "Limits the maximum duration for automatic alert expiration, "+
"which by default is 4 times evaluationInterval of the parent group")
evalDelay = flag.Duration("rule.evalDelay", 30*time.Second, "Adjustment of the 'time' parameter for rule evaluation requests to compensate intentional data delay from the datasource. "+
"Normally, should be equal to '-search.latencyOffset' (cmd-line flag configured for VictoriaMetrics single-node or vmselect). "+

View File

@@ -62,7 +62,7 @@ var (
"Excess series are logged and dropped. This can be useful for limiting series churn rate. See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-limiter . "+
"See also -storage.maxHourlySeries")
minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 10e6, "The minimum free disk space at -storageDataPath after which the storage stops accepting new data")
minFreeDiskSpaceBytes = flagutil.NewBytes("storage.minFreeDiskSpaceBytes", 100e6, "The minimum free disk space at -storageDataPath after which the storage stops accepting new data")
cacheSizeStorageTSID = flagutil.NewBytes("storage.cacheSizeStorageTSID", 0, "Overrides max size for storage/tsid cache. "+
"See https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning")

View File

@@ -171,6 +171,26 @@ func TestClusterMultiTenantSelect(t *testing.T) {
t.Errorf("unexpected response (-want, +got):\n%s", diff)
}
// /api/v1/label/../value with extra_filters
wantVR := apptest.NewPrometheusAPIV1LabelValuesResponse(t,
`{"data": [
"5"
]
}`)
wantSR.Sort()
gotVR := vmselect.PrometheusAPIV1LabelValues(t, "vm_account_id", "foo", apptest.QueryOpts{
Start: "2022-05-10T08:00:00.000Z",
End: "2022-05-10T08:30:00.000Z",
ExtraFilters: []string{`{vm_account_id="5"}`},
Tenant: "multitenant",
})
gotSR.Sort()
if diff := cmp.Diff(wantVR, gotVR, cmpopts.IgnoreFields(apptest.PrometheusAPIV1LabelValuesResponse{}, "Status", "IsPartial")); diff != "" {
t.Errorf("unexpected response (-want, +got):\n%s", diff)
}
// Delete series from specific tenant
vmselect.APIV1AdminTSDBDeleteSeries(t, "foo_bar", apptest.QueryOpts{
Tenant: "5:15",

View File

@@ -506,6 +506,24 @@
"value": 200
}
]
},
{
"matcher": {
"id": "byName",
"options": "Alert"
},
"properties": [
{
"id": "links",
"value": [
{
"targetBlank": true,
"title": "Alert",
"url": "/alerting/${ds:text}/${__value.text}/find"
}
]
}
]
}
]
},
@@ -659,4 +677,4 @@
"uid": "ehXxUsGSk",
"version": 1,
"weekStart": ""
}
}

View File

@@ -91,8 +91,26 @@
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"fieldConfig": {
"defaults": {},
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
}
]
}
},
"overrides": []
},
"gridPos": {
@@ -103,17 +121,42 @@
},
"id": 24,
"options": {
"code": {
"language": "plaintext",
"showLineNumbers": false,
"showMiniMap": false
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"percentChangeColorMode": "standard",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "/^short_version$/",
"values": false
},
"content": "<div style=\"text-align: center;\">$version</div>",
"mode": "markdown"
"showPercentChange": false,
"textMode": "value",
"wideLayout": true
},
"pluginVersion": "12.3.0",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
"format": "table",
"instant": true,
"interval": "",
"legendFormat": "{{short_version}}",
"range": false,
"refId": "A"
}
],
"title": "Version",
"type": "text"
"type": "stat"
},
{
"datasource": {

View File

@@ -5129,7 +5129,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
"legendFormat": "__auto",
"range": true,
"refId": "A"
@@ -11153,7 +11153,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
"legendFormat": "{{instance}} ({{job}})",
"range": true,
"refId": "A"

View File

@@ -5174,7 +5174,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
"legendFormat": "__auto",
"range": true,
"refId": "A"
@@ -7667,7 +7667,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
"legendFormat": "{{instance}} ({{job}})",
"range": true,
"refId": "A"

View File

@@ -92,29 +92,72 @@
"type": "row"
},
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"fieldConfig": {
"defaults": {},
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": 0
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 3,
"h": 4,
"w": 4,
"x": 0,
"y": 1
},
"id": 24,
"options": {
"code": {
"language": "plaintext",
"showLineNumbers": false,
"showMiniMap": false
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"percentChangeColorMode": "standard",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "/^short_version$/",
"values": false
},
"content": "<div style=\"text-align: center;\">$version</div>",
"mode": "markdown"
"showPercentChange": false,
"textMode": "value",
"wideLayout": true
},
"pluginVersion": "12.3.0",
"targets": [
{
"datasource": {
"type": "victoriametrics-metrics-datasource",
"uid": "$ds"
},
"editorMode": "code",
"exemplar": false,
"expr": "vm_app_version{job=~\"$job\",instance=~\"$instance\"}",
"format": "table",
"instant": true,
"interval": "",
"legendFormat": "{{short_version}}",
"range": false,
"refId": "A"
}
],
"title": "Version",
"type": "text"
"type": "stat"
},
{
"datasource": {

View File

@@ -5130,7 +5130,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
"legendFormat": "__auto",
"range": true,
"refId": "A"
@@ -11154,7 +11154,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
"legendFormat": "{{instance}} ({{job}})",
"range": true,
"refId": "A"

View File

@@ -5175,7 +5175,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job) > 0",
"legendFormat": "__auto",
"range": true,
"refId": "A"
@@ -7668,7 +7668,7 @@
"uid": "${ds}"
},
"editorMode": "code",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance)",
"expr": "sum(rate(process_major_pagefaults_total{job=~\"$job\", instance=~\"$instance\"}[$__rate_interval])) by (job,instance) > 0",
"legendFormat": "{{instance}} ({{job}})",
"range": true,
"refId": "A"

View File

@@ -27,7 +27,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} will run out of disk space in 3 days"
description: "Taking into account current ingestion rate, free disk space will be enough only
for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.\n
@@ -51,7 +51,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} will become read-only in 3 days"
description: "Taking into account current ingestion rate, free disk space and -storage.minFreeDiskSpaceBytes
instance {{ $labels.instance }} will remain writable for {{ $value | humanizeDuration }}.\n
@@ -68,7 +68,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=20&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
Having less than 20% of free disk space could cripple merges processes and overall performance.
@@ -81,7 +81,7 @@ groups:
severity: warning
show_at: dashboard
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=52&var-instance={{ $labels.instance }}"
summary: "Too many errors served for {{ $labels.job }} path {{ $labels.path }} (instance {{ $labels.instance }})"
description: "Requests to path {{ $labels.path }} are receiving errors.
Please verify if clients are sending correct requests."
@@ -100,7 +100,7 @@ groups:
severity: warning
show_at: dashboard
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=44&var-instance={{ $labels.instance }}"
summary: "Too many RPC errors for {{ $labels.job }} (instance {{ $labels.instance }})"
description: "RPC errors are interconnection errors between cluster components.\n
Possible reasons for errors are misconfiguration, overload, network blips or unreachable components."
@@ -116,7 +116,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=102"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=102"
summary: "Churn rate is more than 10% for the last 15m"
description: "VM constantly creates new time series.\n
This effect is known as Churn Rate.\n
@@ -132,7 +132,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=102"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=102"
summary: "Too high number of new series created over last 24h"
description: "The number of created new time series over last 24h is 3x times higher than
current number of active series.\n
@@ -151,7 +151,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=108"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=108"
summary: "Percentage of slow inserts is more than 5% for the last 15m"
description: "High rate of slow inserts may be a sign of resource exhaustion
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series.
@@ -164,7 +164,7 @@ groups:
severity: warning
show_at: dashboard
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=139&var-instance={{ $labels.instance }}"
summary: "Connection between vminsert on {{ $labels.instance }} and vmstorage on {{ $labels.addr }} is saturated"
description: "The connection between vminsert (instance {{ $labels.instance }}) and vmstorage (instance {{ $labels.addr }})
is saturated by more than 90% and vminsert won't be able to keep up.\n

View File

@@ -15,7 +15,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=49&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=49&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} is dropping data from persistent queue"
description: "Vmagent dropped {{ $value | humanize1024 }} from persistent queue
on instance {{ $labels.instance }} for the last 10m."
@@ -26,7 +26,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=79&var-instance={{ $labels.instance }}"
summary: "Vmagent is dropping data blocks that are rejected by remote storage"
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} drops the rejected by
remote-write server data blocks. Check the logs to find the reason for rejects."
@@ -37,7 +37,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=31&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=31&var-instance={{ $labels.instance }}"
summary: "Vmagent fails to scrape one or more targets"
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} fails to scrape targets for last 15m"
@@ -61,7 +61,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=77&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=77&var-instance={{ $labels.instance }}"
summary: "Vmagent responds with too many errors on data ingestion protocols"
description: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} responds with errors to write requests for last 15m."
@@ -71,7 +71,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=61&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=61&var-instance={{ $labels.instance }}"
summary: "Job \"{{ $labels.job }}\" on instance {{ $labels.instance }} fails to push to remote storage"
description: "Vmagent fails to push data via remote write protocol to destination \"{{ $labels.url }}\"\n
Ensure that destination is up and reachable."
@@ -87,7 +87,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=84&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=84&var-instance={{ $labels.instance }}"
summary: "Remote write connection from \"{{ $labels.job }}\" (instance {{ $labels.instance }}) to {{ $labels.url }} is saturated"
description: "The remote write connection between vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }}) and destination \"{{ $labels.url }}\"
is saturated by more than 90% and vmagent won't be able to keep up.\n
@@ -101,7 +101,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=98&var-instance={{ $labels.instance }}"
summary: "Persistent queue writes for instance {{ $labels.instance }} are saturated"
description: "Persistent queue writes for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
are saturated by more than 90% and vmagent won't be able to keep up with flushing data on disk.
@@ -113,7 +113,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=99&var-instance={{ $labels.instance }}"
summary: "Persistent queue reads for instance {{ $labels.instance }} are saturated"
description: "Persistent queue reads for vmagent \"{{ $labels.job }}\" (instance {{ $labels.instance }})
are saturated by more than 90% and vmagent won't be able to keep up with reading data from the disk.
@@ -124,7 +124,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=88&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=88&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} reached 90% of the limit"
description: "Max series limit set via -remoteWrite.maxHourlySeries flag is close to reaching the max value.
Then samples for new time series will be dropped instead of sending them to remote storage systems."
@@ -134,7 +134,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/G7Z9GzMGz?viewPanel=90&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/G7Z9GzMGz?viewPanel=90&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} reached 90% of the limit"
description: "Max series limit set via -remoteWrite.maxDailySeries flag is close to reaching the max value.
Then samples for new time series will be dropped instead of sending them to remote storage systems."

View File

@@ -23,7 +23,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=13&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=13&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
summary: "Alerting rules are failing for vmalert instance {{ $labels.instance }}"
description: "Alerting rules execution is failing for \"{{ $labels.alertname }}\" from group \"{{ $labels.group }}\" in file \"{{ $labels.file }}\".
Check vmalert's logs for detailed error message."
@@ -34,7 +34,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=30&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=30&var-instance={{ $labels.instance }}&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
summary: "Recording rules are failing for vmalert instance {{ $labels.instance }}"
description: "Recording rules execution is failing for \"{{ $labels.recording }}\" from group \"{{ $labels.group }}\" in file \"{{ $labels.file }}\".
Check vmalert's logs for detailed error message."
@@ -45,7 +45,7 @@ groups:
labels:
severity: info
annotations:
dashboard: "http://localhost:3000/d/LzldHAVnz?viewPanel=33&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
dashboard: "{{ $externalURL }}/d/LzldHAVnz?viewPanel=33&var-file={{ $labels.file }}&var-group={{ $labels.group }}"
summary: "Recording rule {{ $labels.recording }} ({{ $labels.group }}) produces no data"
description: "Recording rule \"{{ $labels.recording }}\" from group \"{{ $labels.group }}\ in file \"{{ $labels.file }}\"
produces 0 samples over the last 30min. It might be caused by a misconfiguration

View File

@@ -11,7 +11,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
summary: "vmauth ({{ $labels.instance }}) reached concurrent requests limit"
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentRequests flag value,
deploy additional vmauth replicas, check requests latency at backend service.
@@ -22,7 +22,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
summary: "vmauth ({{ $labels.instance }}) has reached concurrent requests limit for username {{ $labels.username }}"
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentPerUserRequests flag value,
deploy additional vmauth replicas, check requests latency at backend service."
@@ -32,7 +32,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=10&var-instance={{ $labels.instance }}"
summary: "vmauth ({{ $labels.instance }}) has reached concurrent requests limit for unauthorized user"
description: "Possible solutions: increase -maxQueueDuration flag value, increase -maxConcurrentPerUserRequests flag value,
deploy additional vmauth replicas, check requests latency at backend service."
@@ -42,7 +42,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
summary: "Too many errors served for unauthorized user (instance {{ $labels.instance }})"
description: "Requests from unauthorized user are receiving errors.
Please check the vmauth logs to verify that the configuration is correct and clients are sending valid requests."
@@ -52,7 +52,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/nbuo5Mr4k?viewPanel=37&var-instance={{ $labels.instance }}"
summary: "Too many errors served for user {{ $labels.username }} (instance {{ $labels.instance }})"
description: "Requests from user {{ $labels.username }} are receiving errors.
Please check the vmauth logs to verify that the configuration is correct and clients are sending valid requests."

View File

@@ -27,7 +27,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} will run out of disk space soon"
description: "Taking into account current ingestion rate, free disk space will be enough only
for {{ $value | humanizeDuration }} on instance {{ $labels.instance }}.\n
@@ -51,7 +51,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/oS7Bi_0Wz?viewPanel=53&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/oS7Bi_0Wz?viewPanel=53&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} will become read-only in 3 days"
description: "Taking into account current ingestion rate and free disk space
instance {{ $labels.instance }} is writable for {{ $value | humanizeDuration }}.\n
@@ -68,7 +68,7 @@ groups:
labels:
severity: critical
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=53&var-instance={{ $labels.instance }}"
summary: "Instance {{ $labels.instance }} (job={{ $labels.job }}) will run out of disk space soon"
description: "Disk utilisation on instance {{ $labels.instance }} is more than 80%.\n
Having less than 20% of free disk space could cripple merge processes and overall performance.
@@ -80,7 +80,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=35&var-instance={{ $labels.instance }}"
summary: "Too many errors served for path {{ $labels.path }} (instance {{ $labels.instance }})"
description: "Requests to path {{ $labels.path }} are receiving errors.
Please verify if clients are sending correct requests."
@@ -96,7 +96,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
summary: "Churn rate is more than 10% on \"{{ $labels.instance }}\" for the last 15m"
description: "VM constantly creates new time series on \"{{ $labels.instance }}\".\n
This effect is known as Churn Rate.\n
@@ -112,7 +112,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=66&var-instance={{ $labels.instance }}"
summary: "Too high number of new series on \"{{ $labels.instance }}\" created over last 24h"
description: "The number of created new time series over last 24h is 3x times higher than
current number of active series on \"{{ $labels.instance }}\".\n
@@ -131,7 +131,7 @@ groups:
labels:
severity: warning
annotations:
dashboard: "http://localhost:3000/d/wNf0q_kZk?viewPanel=68&var-instance={{ $labels.instance }}"
dashboard: "{{ $externalURL }}/d/wNf0q_kZk?viewPanel=68&var-instance={{ $labels.instance }}"
summary: "Percentage of slow inserts is more than 5% on \"{{ $labels.instance }}\" for the last 15m"
description: "High rate of slow inserts on \"{{ $labels.instance }}\" may be a sign of resource exhaustion
for the current load. It is likely more RAM is needed for optimal handling of the current number of active time series.

View File

@@ -136,21 +136,21 @@ models:
Here's how default (backward-compatible) behavior looks like - anomalies will be tracked in `both` directions (`y > yhat` or `y < yhat`). This is useful when there is no domain expertise to filter the required direction.
![schema_detection_direction=both](schema_detection_direction=both.webp)
![schema_detection_direction=both](schema_detection_direction_both.webp)
When set to `above_expected`, anomalies are tracked only when `y > yhat`.
*Example metrics*: Error rate, response time, page load time, number of failed transactions - metrics where *lower values are better*, so **higher** values are typically tracked.
![schema_detection_direction=above_expected](schema_detection_direction=above_expected.webp)
![schema_detection_direction=above_expected](schema_detection_direction_above_expected.webp)
When set to `below_expected`, anomalies are tracked only when `y < yhat`.
*Example metrics*: Service Level Agreement (SLA) compliance, conversion rate, Customer Satisfaction Score (CSAT) - metrics where *higher values are better*, so **lower** values are typically tracked.
![schema_detection_direction=below_expected](schema_detection_direction=below_expected.webp)
![schema_detection_direction=below_expected](schema_detection_direction_below_expected.webp)
Config with a split example:
@@ -199,13 +199,13 @@ reader:
Visualizations below demonstrate this concept; the green zone defined as the `[yhat - min_dev_from_expected, yhat + min_dev_from_expected]` range excludes actual data points (`y`) from generating anomaly scores if they fall within that range.
![min_dev_from_expected-default](schema_min_dev_from_expected=0.webp)
![min_dev_from_expected-default](schema_min_dev_from_expected_0.webp)
![min_dev_from_expected-small](schema_min_dev_from_expected=1.0.webp)
![min_dev_from_expected-small](schema_min_dev_from_expected_1_0.webp)
![min_dev_from_expected-big](schema_min_dev_from_expected=5.0.webp)
![min_dev_from_expected-big](schema_min_dev_from_expected_5_0.webp)
Example config of how to use this param based on query results:

View File

@@ -1227,7 +1227,10 @@ Metric names are stripped from the resulting series. Add [keep_metric_names](#ke
#### buckets_limit
`buckets_limit(limit, buckets)` is a [transform function](#transform-functions), which limits the number
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
of [histogram buckets](https://valyala.medium.com/improving-histogram-usability-for-prometheus-and-grafana-bc7e5df0e350) to the given `limit`.
The result will preserve the first and the last bucket to improve accuracy for min and max values.
So, if the `limit` is greater than 0 and less than 3, the function will still return 3 buckets: the first bucket, the last bucket, and a selected bucket.
See also [prometheus_buckets](#prometheus_buckets) and [histogram_quantile](#histogram_quantile).

View File

@@ -12,96 +12,78 @@ aliases:
- /troubleshooting/index.html
- /troubleshooting/
---
This document contains troubleshooting guides for the most common issues when working with VictoriaMetrics:
- [General troubleshooting checklist](#general-troubleshooting-checklist)
- [Unexpected query results](#unexpected-query-results)
- [Slow data ingestion](#slow-data-ingestion)
- [Slow queries](#slow-queries)
- [Out of memory errors](#out-of-memory-errors)
- [Cluster instability](#cluster-instability)
- [Too much disk space used](#too-much-disk-space-used)
- [Monitoring](#monitoring)
This document contains troubleshooting guides for the most common issues when working with VictoriaMetrics.
## General troubleshooting checklist
If you hit some issue or have some question about VictoriaMetrics components,
then please follow these steps in order to quickly find the solution:
If you encounter an issue or have a question about VictoriaMetrics components, follow these steps to quickly find a solution:
1. Check the version of VictoriaMetrics component, you are troubleshooting and compare
it to [the latest available version](https://docs.victoriametrics.com/victoriametrics/changelog/).
If the used version is lower than the latest available version, then there are high chances
that the issue is already resolved in newer versions. Carefully read [the changelog](https://docs.victoriametrics.com/victoriametrics/changelog/)
between your version and the latest version and check whether the issue is already fixed there.
1. Check the version of the VictoriaMetrics component you are troubleshooting and compare
it with [the latest available version](https://docs.victoriametrics.com/victoriametrics/changelog/).
If the issue is already fixed in newer versions, then upgrade to the newer version and verify whether the issue is fixed:
If you are running an older version, the issue may already be fixed. Review the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/)
for all releases between your version and the latest release to see whether the problem has been resolved.
If the issue is fixed in a newer release, upgrade and verify that the problem no longer occurs:
- [How to upgrade single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-upgrade-victoriametrics)
- [How to upgrade VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#updating--reconfiguring-cluster-nodes)
Upgrade procedure for other VictoriaMetrics components is as simple as gracefully stopping the component
by sending `SIGINT` signal to it and starting the new version of the component.
The upgrade procedure for other VictoriaMetrics components is as simple as gracefully stopping the component
by sending it a `SIGINT` signal and starting the new version of the component.
There may be breaking changes between different versions of VictoriaMetrics components in rare cases.
These cases are documented in [the changelog](https://docs.victoriametrics.com/victoriametrics/changelog/).
So please read the changelog before the upgrade.
In rare cases, upgrades may include breaking changes. These cases are documented in the [changelog](https://docs.victoriametrics.com/victoriametrics/changelog/),
especially check the **Update notes** near the top of the changelog, as they point out any special actions or considerations to take when upgrading.
1. Inspect command-line flags passed to VictoriaMetrics components and remove flags that have unclear outcomes for your workload.
VictoriaMetrics components are designed to work optimally with the default command-line flag values (e.g. when these flags aren't set explicitly).
It is recommended to remove flags with unclear outcomes, since they may result in unexpected issues.
1. Review command-line flags passed to VictoriaMetrics components and remove any flags whose impact on your workload is unclear.
VictoriaMetrics components are optimized to work well with default settings (that is, when flags aren't explicitly set).
Unnecessary or poorly understood flags can lead to unexpected behavior, so it's best to remove them unless you clearly understand why they are needed.
1. Check for logs in VictoriaMetrics components. They may contain useful information about cause of the issue
and how to fix the issue. If the log message doesn't have enough useful information for troubleshooting,
then search the log message in Google. There are high chances that the issue is already reported
somewhere (docs, StackOverflow, Github issues, etc.) and the solution is already documented there.
1. Check logs. They often contain useful details about the root cause and possible fixes.
1. If VictoriaMetrics logs have no relevant information, then try searching for the issue in Google
via multiple keywords and phrases specific to the issue. There are high chances that the issue
and the solution is already documented somewhere.
If the logs don't provide enough information, try searching the error message on Google. In many cases, the issue has
already been discussed (in documentation, on Stack Overflow, or in GitHub issues), and a solution may already be available.
1. Try searching for the issue at [VictoriaMetrics GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
The signal/noise quality of search results here is much lower than in Google, but sometimes it may help
finding the relevant information about the issue when Google fails to find the needed information.
If you located the relevant GitHub issue, but it misses some information on how to diagnose or troubleshoot it,
then please provide this information in comments to the issue. This increases chances that it will be resolved soon.
1. If VictoriaMetrics logs do not have relevant information, then try searching for the issue on Google
using multiple keywords and phrases specific to the issue. In many cases, both the issue and its solution are already documented.
1. Try searching for information about the issue in [VictoriaMetrics source code](https://github.com/search?q=repo%3AVictoriaMetrics%2FVictoriaMetrics&type=code).
GitHub code search may be not very good in some cases, so it is recommended [checking out VictoriaMetrics source code](https://github.com/VictoriaMetrics/VictoriaMetrics/)
and perform local search in the checked out code.
Note that the source code for VictoriaMetrics cluster is located in [the cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) branch.
1. Try searching for the issue in [VictoriaMetrics GitHub](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
The signal-to-noise ratio of search results here is much lower than on Google, but sometimes it can help
find relevant information when Google fails.
If you located the relevant GitHub issue, but it lacks details for diagnosis or troubleshooting,
then please add them in the issue comments. This increases the chance that it will be resolved soon.
1. Try searching for information about the issue in the history of [VictoriaMetrics Slack chat](https://victoriametrics.slack.com).
There are non-zero chances that somebody already stuck with the same issue and documented the solution at Slack.
1. Try searching for information about the issue in the [VictoriaMetrics source code](https://github.com/search?q=repo%3AVictoriaMetrics%2FVictoriaMetrics&type=code).
GitHub code search may not be very effective in some cases, so it is recommended [to check out the VictoriaMetrics source code](https://github.com/VictoriaMetrics/VictoriaMetrics/)
and perform a local search in the code.
Note that the source code for the VictoriaMetrics cluster is located in [the cluster](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) branch.
1. If steps above didn't help finding the solution to the issue, then please [file a new issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new/choose)
by providing the maximum details on how to reproduce the issue.
1. If the steps above didn't help to find the solution to the issue, then please [file a new issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new/choose)
with as many details as possible on how to reproduce it.
After that you can post the link to the issue to [VictoriaMetrics Slack chat](https://victoriametrics.slack.com),
so VictoriaMetrics community could help finding the solution to the issue. It is better filing the issue at VictoriaMetrics GitHub
before posting your question to VictoriaMetrics Slack chat, since GitHub issues are indexed by Google,
while Slack messages aren't indexed by Google. This simplifies searching for the solution to the issue for future VictoriaMetrics users.
After that you can post the link to the issue in the [VictoriaMetrics Slack chat](https://victoriametrics.slack.com),
so the VictoriaMetrics community can help find a solution. It is better to file the issue on VictoriaMetrics GitHub
before posting your question to the VictoriaMetrics Slack chat, since GitHub issues are indexed by Google,
while Slack messages are not. This simplifies finding a solution to the issue for future VictoriaMetrics users.
1. Pro tip 1: if you see that [VictoriaMetrics docs](https://docs.victoriametrics.com/victoriametrics/) contain incomplete or incorrect information,
then please create a pull request with the relevant changes. This will help VictoriaMetrics community.
All the docs published at `https://docs.victoriametrics.com` are located in the [docs](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/docs)
folder inside VictoriaMetrics repository.
then please create a pull request with the relevant changes or a new issue explaining the problem. This will help the VictoriaMetrics community.
1. Pro tip 2: please provide links to existing docs / GitHub issues / StackOverflow questions
instead of copy-n-pasting the information from these sources when asking or answering questions
from VictoriaMetrics community. If the linked resources have no enough information,
then it is better posting the missing information in the web resource before providing links
to this information in Slack chat. This will simplify searching for this information in the future
instead of copying and pasting the information from these sources when asking or answering questions
to the VictoriaMetrics community. If the linked resources do not have enough information,
then it is better to add the missing information to the original web resource before linking it to Slack chat. This will simplify searching for this information in the future
for VictoriaMetrics users via Google and [Perplexity](https://www.perplexity.ai/).
1. Pro tip 3: if you are answering somebody's question about VictoriaMetrics components
at GitHub issues / Slack chat / StackOverflow, then the best answer is a direct link to the information
regarding the question.
in GitHub issues / Slack chat / StackOverflow, then the best answer is a direct link to the information
with the answer or solution to the question.
The better answer is a concise message with multiple links to the relevant information.
The worst answer is a message with misleading or completely wrong information.
1. Pro tip 4: if you can fix the issue on yourself, then please do it and provide the corresponding pull request!
We are glad to get pull requests from VictoriaMetrics community.
1. Pro tip 4: If you can fix the issue on your own, then please do it and provide the corresponding pull request!
We are happy to get pull requests from the VictoriaMetrics community.
## Unexpected query results
@@ -111,150 +93,150 @@ If you see unexpected or unreliable query results from VictoriaMetrics, then try
`sum(rate(http_requests_total[5m])) by (job)`, then check whether the following queries return
expected results:
- Remove the outer `sum` and execute `rate(http_requests_total[5m])`,
since aggregations could hide some missing series, gaps in data or anomalies in existing series.
If this query returns too many time series, then try adding more specific label filters to it.
- Remove the outer `sum` and execute `rate(http_requests_total[5m])`.
Aggregations could hide missing series, data gaps, or anomalies.
- If the query returns too many series, try adding more specific label filters.
For example, if you see that the original query returns unexpected results for the `job="foo"`,
then use `rate(http_requests_total{job="foo"}[5m])` query.
If this isn't enough, then continue adding more specific label filters, so the resulting query returns
manageable number of time series.
then use the `rate(http_requests_total{job="foo"}[5m])` query.
Continue adding more specific label filters until the resulting query returns a manageable number of time series.
- Remove the outer `rate` and execute `http_requests_total`. Additional label filters may be added here in order
to reduce the number of returned series.
- Remove the outer `rate` and execute `http_requests_total`. Add label filters to reduce the number of returned series
if needed.
Sometimes the query may be improperly constructed, so it returns unexpected results.
It is recommended reading and understanding [MetricsQL docs](https://docs.victoriametrics.com/victoriametrics/metricsql/),
Sometimes the query may be improperly constructed, leading to unexpected results.
It is recommended to read and understand [MetricsQL docs](https://docs.victoriametrics.com/victoriametrics/metricsql/),
especially [subqueries](https://docs.victoriametrics.com/victoriametrics/metricsql/#subqueries)
and [rollup functions](https://docs.victoriametrics.com/victoriametrics/metricsql/#rollup-functions) sections.
1. If the simplest query continues returning unexpected / unreliable results, then try verifying the correctness
of raw unprocessed samples in [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui) via the `Raw Query` tab.
Responses returned from [/api/v1/query](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#instant-query)
and [/api/v1/query_range](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query) contain **evaluated** data
instead of stored raw samples. In some cases, [staleness](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness),
[deduplication](https://docs.victoriametrics.com/victoriametrics/#deduplication), or irregular scrapes can affect evaluations.
See [this short video](https://www.youtube.com/watch?v=7AyVCC6uKfI) for details.
Raw data can be downloaded via the `Export` button in vmui's `Raw Query` tab or via an [/api/v1/export](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-export-data-in-json-line-format)
query on the given `[start..end]` time range; check whether the returned samples are expected:
```sh
single-node: curl http://victoriametrics:8428/api/v1/export -d 'match[]=http_requests_total' -d 'start=...' -d 'end=...' -d 'reduce_mem_usage=1'
cluster: curl http://<vmselect>:8481/select/<tenantID>/prometheus/api/v1/export -d 'match[]=http_requests_total' -d 'start=...' -d 'end=...' -d 'reduce_mem_usage=1'
```
When raising a GitHub ticket about query issues, please also attach the raw data, so maintainers can reproduce your case locally.
1. Try executing the query with [tracer](https://docs.victoriametrics.com/victoriametrics/#query-tracing) enabled. The trace
contains a lot of additional information about query execution, series matching, caches, and internal modifications.
When raising a GitHub ticket about query issues, please also attach the trace so maintainers can investigate.
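For example, a trace can be requested alongside the results by adding the `trace=1` query arg (the address and the query are illustrative):
```sh
# Request an execution trace together with the query results.
# The JSON response contains an additional `trace` field with per-step timings.
curl http://victoriametrics:8428/api/v1/query \
  -d 'query=sum(rate(http_requests_total[5m]))' \
  -d 'trace=1'
```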
1. If you migrate from InfluxDB, then pass the `-search.setLookbackToStep` command-line flag to single-node VictoriaMetrics
or to `vmselect` in the VictoriaMetrics cluster. See also [how to migrate from InfluxDB to VictoriaMetrics](https://docs.victoriametrics.com/guides/migrate-from-influx/).
1. If you observe gaps when plotting series, it is likely caused by irregular intervals for metrics collection (network delays
or target unavailability during scrapes, irregular pushes, irregular timestamps).
VictoriaMetrics automatically [fills the gaps](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#range-query)
based on the median interval between [data samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples).
This may yield incorrect results for irregular data, as the median will be skewed. In this case, it is recommended to either fix the
irregularities or switch to a static interval for gap filling by setting the `-search.minStalenessInterval=5m` command-line flag (`5m` is
used by Prometheus by default).
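A minimal sketch of switching to the static interval on single-node VictoriaMetrics (the binary and data path are illustrative):
```sh
# Fill gaps using a static 5m staleness interval instead of the per-series median interval.
./victoria-metrics -storageDataPath=/path/to/data -search.minStalenessInterval=5m
```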
1. Sometimes, response caching may lead to unexpected results when samples with older timestamps
are ingested into VictoriaMetrics (aka [backfilling](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#backfilling)).
Try disabling the response cache and see whether this helps:
- By clicking the `Disable cache` toggle in vmui.
- By passing the `-search.disableCache` command-line flag to single-node VictoriaMetrics
or to all the `vmselect` components if the cluster version of VictoriaMetrics is used.
- By passing the `nocache=1` query arg to every request to `/api/v1/query` and `/api/v1/query_range`.
If you use Grafana, then this query arg can be specified in the `Custom Query Parameters` field
in Prometheus datasource settings. See [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
If the problem was in the cache, try resetting it via the [resetRollupCache handler](https://docs.victoriametrics.com/victoriametrics/url-examples/#internalresetrollupresultcache).
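For a one-off check, the cache can also be bypassed per request (the address, query, and time range are illustrative):
```sh
# Bypass the response cache for a single range query via the nocache=1 query arg.
curl http://victoriametrics:8428/api/v1/query_range \
  -d 'query=sum(rate(http_requests_total[5m]))' \
  -d 'start=...' -d 'end=...' -d 'step=1m' \
  -d 'nocache=1'
```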
1. The cluster version of VictoriaMetrics may return partial responses by default when some of the `vmstorage` nodes are temporarily
unavailable. See [cluster availability docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-availability).
If you want to prioritize query consistency over cluster availability, then pass the `-search.denyPartialResponse` command-line flag to all the `vmselect` nodes.
This causes VictoriaMetrics to return an error during query execution if at least one `vmstorage` node is unavailable.
Another option is to pass the `deny_partial_response=1` query argument to `/api/v1/query` and `/api/v1/query_range`, as shown below.
If you use Grafana, then this query argument can be specified in the `Custom Query Parameters` field
in Prometheus/VictoriaMetrics datasource settings. See [these docs](https://grafana.com/docs/grafana/latest/datasources/prometheus/) for details.
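For example (the address, tenant, and query are illustrative):
```sh
# Fail the query instead of returning a partial response when a vmstorage node is unavailable.
curl http://<vmselect>:8481/select/<tenantID>/prometheus/api/v1/query \
  -d 'query=up' \
  -d 'deny_partial_response=1'
```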
1. If you pass the `-replicationFactor` command-line flag to `vmselect`, then it is recommended to remove this flag from `vmselect`,
since it may lead to incomplete responses when `vmstorage` nodes contain less than `-replicationFactor`
copies of the requested data.
1. If you observe that recently written data is not immediately visible/queryable, then read more about
[query latency](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#query-latency) behavior.
1. Try upgrading to the [latest available version of VictoriaMetrics](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
and verifying whether the issue is fixed there.
1. Inspect command-line flags passed to VictoriaMetrics components. If you don't clearly understand the purpose
or the effect of some flags, then remove them from the list of flags.
VictoriaMetrics components are optimized to work well with default settings (that is, when flags aren't explicitly set).
Unnecessary or poorly understood flags can lead to unexpected behavior, so it's best to remove them unless you clearly understand why they are needed.
1. If the steps above didn't help identify the root cause of unexpected query results,
then [file a bug report](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/new) with details on how to reproduce the issue.
Instead of sharing screenshots in the issue, consider sharing the query, [raw samples](https://docs.victoriametrics.com/victoriametrics/#vmui) and [trace](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing)
results via [VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui).
## Slow data ingestion
These are the most common reasons for slow data ingestion in VictoriaMetrics:
1. Memory shortage for the given number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series).
VictoriaMetrics (or `vmstorage` in the cluster version of VictoriaMetrics) maintains an in-memory cache `storage/tsid`
for a quick search for internal series IDs for each incoming metric. VictoriaMetrics automatically determines the maximum
size for this cache depending on the available memory on the host where VictoriaMetrics (or `vmstorage`) runs.
If the cache size isn't enough to hold all the entries for active time series, then VictoriaMetrics locates the required data on disk,
unpacks it, reconstructs the missing entry, and adds it to the cache. This takes additional CPU time and disk read I/O.
The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
contain a `Slow inserts` graph that shows the cache miss percentage for the `storage/tsid` cache during data ingestion.
If the `slow inserts` graph shows values greater than 5% for more than 10 minutes,
then it is likely that the current number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series)
cannot fit into the `storage/tsid` cache.
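If the dashboards aren't available, the slow insert percentage can be estimated with a query similar to the one below (a sketch; it assumes [monitoring](#monitoring) is set up and uses roughly the metrics behind the `TooHighSlowInsertsRate` alert):
```metricsql
sum(rate(vm_slow_row_inserts_total[5m])) by (instance)
  /
sum(rate(vm_rows_inserted_total[5m])) by (instance)
```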
The following solutions exist for this issue:
- Increase the available memory on the host where VictoriaMetrics runs until the `slow inserts` percentage
drops to 5% or less. If you run a VictoriaMetrics cluster, then you need to increase the total available
memory at the `vmstorage` nodes. This can be done in two ways: either increase the available memory
for each `vmstorage` node or add more `vmstorage` nodes to the cluster to spread the load.
- Reduce the number of active time series. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
contain a graph showing the number of active time series. Use the [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer)
to determine and fix the source of [high cardinality](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-cardinality).
- Insert performance can degrade when the same time series arrives with labels in a different order.
Ensure your ingestion client always sends labels in a consistent order for each series.
Prometheus and `vmagent` already guarantee this, but custom or third-party clients might not.
As a fallback, you can enable `-sortLabels=true` on VictoriaMetrics or on `vminsert` in cluster mode.
This forces the server to normalize label order, though it increases CPU usage during ingestion.
1. [High churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
When VictoriaMetrics encounters a sample for a new time series, it needs to register the time series
in the internal index (aka `indexdb`), so it can be quickly located during select queries.
The process of registering new time series in the internal index is an order of magnitude slower
than the process of adding a new sample to an already registered time series.
So VictoriaMetrics may work slower than expected under [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
provide a `Churn rate` graph, which shows the average number of new time series registered
during the last 24 hours. If this number exceeds the number of [active time series](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-an-active-time-series),
then you need to identify and fix the source of [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate).
The most common source of high churn rate is a label that frequently changes its value (like `timestamp` or `session_id`). **Try avoiding such labels.**
The [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer) can help identify such labels.
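A rough way to compare the 24h churn with the number of active time series (a sketch; it assumes [monitoring](#monitoring) is set up, and the active series count relies on the `storage/hour_metric_ids` cache metric used by the official dashboards):
```metricsql
sum(increase(vm_new_timeseries_created_total[24h]))
  /
sum(vm_cache_entries{type="storage/hour_metric_ids"})
```
Values above 1 indicate that more series are created per day than are currently active, i.e. high churn.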
1. Resource shortage. The [official Grafana dashboards for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
contain `Resource usage` graphs that show memory usage, CPU usage, disk I/O usage, etc.
Make sure VictoriaMetrics has enough free resources for gracefully handling potential spikes in workload
according to the following recommendations:
- 50% of free CPU
- 30% of free memory
- 20% of free disk space
If VictoriaMetrics components have lower amounts of free resources, then this may lead
to **significant** performance degradation when workload increases slightly.
For example:
- If the percentage of free CPU is close to 0, then VictoriaMetrics
may experience arbitrarily long delays during data ingestion, even with slight increases in the ingestion rate.
- If the percentage of free memory reaches 0, then the operating system where VictoriaMetrics components run
may not have enough memory for the [page cache](https://en.wikipedia.org/wiki/Page_cache).
VictoriaMetrics relies on the page cache for quick queries over recently ingested data.
If the operating system does not have enough free memory for the page cache, then it must
re-read the requested data from disk. This may **significantly** increase disk read I/O
and slow down both queries and data ingestion.
- If free disk space is below 20%, then VictoriaMetrics may be unable to perform optimal
background merge of the incoming data. This results in more data files on disk,
which, in turn, slows down both data ingestion and querying. See [these docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#storage) for details.
1. If you run the cluster version of VictoriaMetrics, then make sure the `vminsert` and `vmstorage` components
are located in the same network with low network latency between them.
`vminsert` packs incoming data into batch packets and sends them to `vmstorage` one by one.
It waits until `vmstorage` returns an `ack` response before sending the next packet.
If the network latency between `vminsert` and `vmstorage` is high (for example, if they run in different datacenters),
then this may become a limiting factor for data ingestion speed.
The [official Grafana dashboard for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
contains a `connection saturation` panel for `vminsert` components. If this panel reaches 100% (1s),
then it is likely you have issues with network latency between `vminsert` and `vmstorage`.
Another possible cause of 100% connection saturation between `vminsert` and `vmstorage`
is a resource shortage at the `vmstorage` nodes. In this case, you need to increase the amount
of available resources (CPU, RAM, disk I/O) at the `vmstorage` nodes or add more `vmstorage` nodes to the cluster.
1. Noisy neighbor. Make sure VictoriaMetrics components run in an environment without other resource-hungry apps.
Such apps may steal RAM, CPU, disk I/O, and network bandwidth that are needed for VictoriaMetrics components.
Issues like this are hard to catch via the [official Grafana dashboard for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
and proper diagnosis would require checking resource usage on the instances where VictoriaMetrics runs.
1. If you see a `TooHighSlowInsertsRate` [alert](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring) when single-node VictoriaMetrics or `vmstorage` has enough
free CPU and RAM, then increase the `-cacheExpireDuration` command-line flag at single-node VictoriaMetrics or at `vmstorage` to a value
that exceeds the interval between ingested samples for the same time series (aka `scrape_interval`).
See [this comment](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/3976#issuecomment-1476883183) for more details.
1. If you see constant and abnormally high CPU usage for a VictoriaMetrics component, check the `CPU spent on GC` panel
on the corresponding [Grafana dashboard](https://grafana.com/orgs/victoriametrics) in the `Resource usage` section. If the percentage of CPU time spent on garbage collection
is high, then CPU usage of the component can be reduced at the cost of higher memory usage by increasing the [GOGC](https://tip.golang.org/doc/gc-guide#GOGC) environment variable.
By default, VictoriaMetrics components use `GOGC=30`. Try running VictoriaMetrics components with `GOGC=100` and see whether this helps reduce CPU usage.
Note that higher `GOGC` values may increase memory usage.
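A minimal sketch for single-node VictoriaMetrics (the binary and data path are illustrative):
```sh
# Trade higher memory usage for less GC-related CPU time.
GOGC=100 ./victoria-metrics -storageDataPath=/path/to/data
```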
## Slow queries
Some queries may take more time and resources (CPU, RAM, network bandwidth) than others.
VictoriaMetrics logs slow queries if their execution time exceeds the duration passed
to `-search.logSlowQueryDuration` command-line flag (5s by default).
VictoriaMetrics provides a [`top queries` page in VMUI](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#top-queries) that shows
the longest-running queries, and [Query execution stats](https://docs.victoriametrics.com/victoriametrics/query-stats/) for dumping slow queries
to logs.
These are the solutions that exist for improving the performance of slow queries:
- Investigating the bottleneck in query execution using [query tracing](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing).
It will show the percentage of time spent on each execution step and help understand the volume of processed data.
- Adding more CPU and memory to VictoriaMetrics, so it may perform the slow query faster.
If you use the cluster version of VictoriaMetrics, then migrating `vmselect` nodes to machines
with more CPU and RAM should help improve the speed of slow queries. Query performance
is always limited by the resources of **one** `vmselect` that processes the query. For example, if 2 vCPU cores on `vmselect`
can't process queries fast enough, then migrating `vmselect` to a machine with 4 vCPU cores should increase heavy query performance by up to 2x.
If the line on the `concurrent select` graph from the [official Grafana dashboard for VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#monitoring)
is close to the limit, then prefer adding more `vmselect` nodes to the cluster.
Sometimes adding more `vmstorage` nodes can also help improve the speed for slow queries.
- Rewriting slow queries, so they become faster.
The main source of slow queries in practice is [alerting and recording rules](https://docs.victoriametrics.com/victoriametrics/vmalert/#rules)
with long lookbehind windows in square brackets. These queries are frequently used in SLI/SLO calculations such as [Sloth](https://github.com/slok/sloth).
For example, `avg_over_time(up[30d]) > 0.99` needs to read and process
all the [raw samples](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#raw-samples)
for the `up` [time series](https://docs.victoriametrics.com/victoriametrics/keyconcepts/#time-series) over the last 30 days
each time it executes. If this query is executed frequently, it can take a significant share of CPU, disk read I/O, network bandwidth, and RAM.
Such queries can be optimized in the following ways:
- To reduce the look-behind window in square brackets. For example, `avg_over_time(up[10d])` takes up to 3x less compute resources
than `avg_over_time(up[30d])`.
- To increase the evaluation interval for alerting and recording rules, so they are executed less frequently.
For example, increasing the `-evaluationInterval` command-line flag value at [vmalert](https://docs.victoriametrics.com/victoriametrics/vmalert/)
from `1m` to `2m` should reduce compute resource usage at VictoriaMetrics by 2x.
Another source of slow queries is improper use of [subqueries](https://docs.victoriametrics.com/victoriametrics/metricsql/#subqueries).
It is recommended to avoid subqueries if you don't clearly understand how they work.
It is easy to create a subquery without knowing about it.
For example, `rate(sum(some_metric))` is implicitly transformed into the following subquery
according to [implicit conversion rules for MetricsQL queries](https://docs.victoriametrics.com/victoriametrics/metricsql/#implicit-query-conversions):
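```metricsql
# A sketch of the expanded query: `1i` denotes the query step, and default_rollup()
# is the rollup that MetricsQL wraps plain series selectors into.
rate(
    sum(
        default_rollup(some_metric[1i])
    )[1i:1i]
)
```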
It is likely this query won't return the expected results. Use `sum(rate(some_metric))` instead.
See [this article](https://www.robustperception.io/rate-then-sum-never-sum-then-rate/) for more details.
VictoriaMetrics provides a [query tracing](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing) feature
that can help determine the source of a slow query.
See also [this article](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986),
which explains how to identify and optimize slow queries.
## Out of memory errors
The following are the most common sources of out-of-memory (aka OOM) crashes in VictoriaMetrics:
1. Improper command-line flag values. Inspect command-line flags passed to VictoriaMetrics components.
If you don't clearly understand the purpose or the effect of some flags, remove them
from the list of flags passed to VictoriaMetrics components. Improper command-line flag values
may lead to increased memory and CPU usage. Increased memory usage increases the risk of OOM crashes.
VictoriaMetrics is optimized to run with default flag values (e.g., when they aren't explicitly set).
For example, it isn't recommended to change cache sizes in VictoriaMetrics, as this frequently leads to OOM exceptions.
[These docs](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cache-tuning) refer to command-line flags that aren't
recommended to tune. If you see that VictoriaMetrics needs larger cache sizes for the current workload,
then it is better to migrate to a host with more memory instead of trying to tune cache sizes manually.
1. Unexpected heavy queries. A query is considered heavy if it needs to select and process millions of unique time series.
Such a query may cause an OOM exception, as VictoriaMetrics needs to keep some per-series data in memory.
VictoriaMetrics provides [various settings](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#resource-usage-limits)
that can help limit resource usage.
For more context, see [How to optimize PromQL and MetricsQL queries](https://valyala.medium.com/how-to-optimize-promql-and-metricsql-queries-85a1b75bf986).
VictoriaMetrics also provides a [query tracer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#query-tracing)
to help identify the source of heavy queries. Slow queries can be logged with additional details via [Query execution stats](https://docs.victoriametrics.com/victoriametrics/query-stats/).
1. Lack of free memory for processing workload spikes. If VictoriaMetrics components use almost all the available memory
under the current workload, then it is recommended to migrate to a host with a larger amount of memory.
This would protect from possible OOM crashes on workload spikes. It is recommended to have at least 50%
of free memory to gracefully handle possible workload spikes.
See [capacity planning for single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#capacity-planning)
and [capacity planning for the cluster version of VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#capacity-planning).
## Cluster instability
The VictoriaMetrics cluster may become unstable if there are not enough free resources (CPU, RAM, disk I/O, network bandwidth)
for processing the current workload.
The most common sources of cluster instability are:
- Workload spikes. For example, if the number of active time series increases by 2x while
the cluster does not have enough free resources for processing the increased workload, then it may become unstable.
VictoriaMetrics provides several configuration settings to limit unexpected workload spikes.
See [these docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#resource-usage-limits) for details.
- Various maintenance tasks, such as rolling upgrades or rolling restarts, during configuration changes.
For example, if a cluster contains `N=3` `vmstorage` nodes and they are restarted one-by-one (aka rolling restart),
then the cluster will have only `N-1=2` healthy `vmstorage` nodes during the rolling restart.
This means that the load on healthy `vmstorage` nodes increases by at least `100%/(N-1)=50%`
compared to the load before the rolling restart. E.g., they need to process 50% more incoming
data and to return 50% more data during queries. In reality, the load on the remaining `vmstorage`
nodes increases even more because they need to register new time series that were re-routed
from a temporarily unavailable `vmstorage` node. If `vmstorage` nodes had less than 50%
of free resources (CPU, RAM, disk I/O) before the rolling restart, then it
can lead to cluster overload and instability for both data ingestion and querying.
The workload increase during rolling restart can be reduced by increasing
the number of `vmstorage` nodes in the cluster. For example, if the VictoriaMetrics cluster contains
`N=11` `vmstorage` nodes, then the workload increase during rolling restart of `vmstorage` nodes
would be `100%/(N-1)=10%`. It is recommended to have at least 8 `vmstorage` nodes in the cluster.
The recommended number of `vmstorage` nodes should be multiplied by `-replicationFactor` if replication is enabled -
see [replication and data safety docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#replication-and-data-safety)
for details.
- Time series sharding. Received time series [are consistently sharded](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#architecture-overview)
by `vminsert` between the configured `vmstorage` nodes. As the sharding key, `vminsert` uses the time series name and labels,
respecting their order. If the order of labels in a time series is constantly changing, this could cause wrong sharding
calculation and result in uneven and suboptimal time series distribution across the available `vmstorage` nodes. It is expected
that the client pushing metrics ensures a consistent label order (as `Prometheus` or `vmagent` do during scraping).
If this can't be guaranteed, set the `-sortLabels=true` command-line flag on `vminsert`. Please note that sorting may increase
CPU usage for `vminsert`.
- Network instability between cluster components (`vminsert`, `vmselect`, `vmstorage`) may lead to increased error rates, timeouts, or degraded performance.
Short CPU spikes on the nodes where cluster components run may not be visible on Grafana dashboards due to the metrics collection resolution,
but can still cause transient network failures. In such cases, check CPU usage at the OS level with higher-resolution tools.
Consider increasing `-vmstorageDialTimeout` and `-rpc.handshakeTimeout`{{% available_from "v1.124.0" %}} to mitigate the effects of CPU spikes.
If resource usage appears normal but networking issues persist, the root cause is likely outside VictoriaMetrics.
This may be caused by unreliable or congested network links, especially across availability zones or regions.
In multi-AZ setups, consider [a multi-level cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multi-level-cluster-setup) with region-local load balancers to reduce cross-zone connections.
If the network cannot be improved, increasing timeouts such as `-vmstorageDialTimeout`, `-rpc.handshakeTimeout`{{% available_from "v1.124.0" %}}, or `-search.maxQueueDuration` may help, but should be done cautiously, as higher timeouts can impact cluster stability in other ways.
Keep in mind that VictoriaMetrics assumes reliable networking between components. If the network is unstable, the overall cluster stability may degrade regardless of resource availability.
The obvious solution to VictoriaMetrics cluster instability is to make sure cluster components
have sufficient free resources to handle the increased workload gracefully.
See [capacity planning docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#capacity-planning)
and [cluster resizing and scalability docs](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#cluster-resizing-and-scalability)
for details.
## Too much disk space
If too much disk space is used by a [single-node VictoriaMetrics](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) or by `vmstorage` component
at [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/), then please check the following:
- Make sure that there are no old snapshots, since they can occupy disk space. See [how to work with snapshots](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#how-to-work-with-snapshots),
[snapshot troubleshooting](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#snapshot-troubleshooting) and [vmbackup troubleshooting](https://docs.victoriametrics.com/victoriametrics/vmbackup/#troubleshooting).
- Under normal conditions, the size of the `<-storageDataPath>/indexdb` folder must be smaller than the size of the `<-storageDataPath>/data` folder, where `-storageDataPath`
is the corresponding command-line flag value. This can be checked by the following query if [VictoriaMetrics monitoring](#monitoring) is properly set up:
```metricsql
sum(vm_data_size_bytes{type=~"indexdb/.+"}) without(type)
  /
sum(vm_data_size_bytes{type=~"(storage|indexdb)/.+"}) without(type)
```
If this query returns values greater than 0.5, then it is likely there is a [high churn rate](https://docs.victoriametrics.com/victoriametrics/faq/#what-is-high-churn-rate) issue,
which results in excess disk space usage for both the `indexdb` and `data` folders under the `-storageDataPath` folder.
The solution is to identify and fix the source of the high churn rate with the [cardinality explorer](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#cardinality-explorer).
## Monitoring
Having proper [monitoring](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#monitoring)
can help identify and prevent most of the issues listed above.
[Grafana dashboards](https://grafana.com/orgs/victoriametrics/dashboards) contain panels reflecting the
health state, resource usage, and other specific metrics for VictoriaMetrics components.
Check the list of [recommended alerting rules](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#alerts)
for VictoriaMetrics components to get notified about issues and receive recommendations for resolving them.
Internally, we rely heavily on both dashboards and alerts, and we constantly improve them.
It is important to stay up to date with such changes.
## Data read corruption on ZFS
On some ZFS filesystems, mixing reads from memory-mapped files (`mmap`) with usage of the `mincore()` syscall can trigger a bug in the ZFS in-memory cache (ARC), potentially resulting in **data read corruption** in VictoriaMetrics processes. This scenario has been observed when VictoriaMetrics instances access data directories on ZFS.
Symptoms:
- Unexpected read errors when accessing data on ZFS.
- Corrupted or inconsistent query results.
- Crashes or panics in storage/query components when reading from ZFS.
It can be mitigated with the `--fs.disableMincore` flag:
```text
./bin/victoria-metrics --storageDataPath /path/to/zfs/data --fs.disableMincore
```
## tip

See also [LTS releases](https://docs.victoriametrics.com/victoriametrics/lts-releases/).
* FEATURE: all VictoriaMetrics components: expose `process_cpu_seconds_total`, `process_resident_memory_bytes`, and other process-level metrics when running on macOS. See [metrics#75](https://github.com/VictoriaMetrics/metrics/issues/75).
* FEATURE: [dashboards/vmauth](https://grafana.com/grafana/dashboards/21394): add `Request body buffering duration` panel to the `Troubleshooting` section. This panel shows the time spent buffering incoming client request bodies, helping identify slow client uploads and potential concurrency issues. The panel is only available when `-requestBufferSize` is non-zero. See [#10309](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10309).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/), [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/), `vminsert` and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): enable [ingestion](https://docs.victoriametrics.com/victoriametrics/vmagent/#metric-metadata) and in-memory [storage](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#metrics-metadata) of metrics metadata by default. Metadata ingestion can be disabled with `-enableMetadata=false`. See [#2974](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/2974).
* FEATURE: [dashboards/operator](https://grafana.com/grafana/dashboards/17869): extract the operator version from metrics instead of a hardcoded value.
* FEATURE: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): increase default value for `-storage.minFreeDiskSpaceBytes` flag from 10M to 100M to reduce risk of panics under high ingestion on small disks. See [#9561](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/9561).
* FEATURE: [vmagent](https://docs.victoriametrics.com/victoriametrics/vmagent/) and [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/): improve [InfluxDB ingestion](https://docs.victoriametrics.com/victoriametrics/integrations/influxdb/) parsing error message when a closing quote is missing for a quoted field value, by adding a hint that this may be caused by a raw newline (`\n`) inside the quoted field value. See [#10067](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067). Thanks to @hklhai for the contribution.
* FEATURE: [dashboards/alert-statistics](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/dashboards/alert-statistics.json): add a link to a specific alerting rule on the table of firing alerts. Thanks to @sias32.
* FEATURE: [alerts](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/rules): use `$externalURL` instead of `localhost` in the alerting rules. This should improve usability of the rules if `$externalURL` is correctly configured, without the need to update rule annotations. Thanks to @sias32.
* BUGFIX: [vmsingle](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/) and `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): prevent panic `error parsing regexp: expression nests too deeply` triggered by large repetition ranges in regex, for example `{"__name__"=~"a{0,1000}"}`. See [VictoriaLogs#1112](https://github.com/VictoriaMetrics/VictoriaLogs/issues/1112).
* BUGFIX: [vmui](https://docs.victoriametrics.com/victoriametrics/single-server-victoriametrics/#vmui): fix escaping for label names with special characters. See [#10485](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10485).
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly search tenants for [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) query request. See [#10422](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10422).
* BUGFIX: `vmstorage` in [VictoriaMetrics cluster](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/): properly apply `extra_filters[]` filter when querying `vm_account_id` or `vm_project_id` labels via [multitenant](https://docs.victoriametrics.com/victoriametrics/cluster-victoriametrics/#multitenancy) request for `/api/v1/label/…/values` API. Before, `extra_filters` was ignored.
## [v1.136.0](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.136.0)


```sh
curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'http://<victoriametrics-addr>:8428/write'
```
An arbitrary number of lines delimited by '\n' (aka newline char) can be sent in a single request.
> **Note:** The above refers to newline characters (`\n`) used as line separators
> between multiple points. According to the [Influx Line Protocol specification](https://docs.influxdata.com/influxdb/v2/reference/syntax/line-protocol/),
> raw newline bytes **must not** appear inside quoted tag or field values. For example:
>
> ```text
> SystemProperties.line.separator="
> "
> ```
>
> A raw newline inside a quoted value will result in a parsing error.
>
> If metrics are generated by Telegraf and written to VictoriaMetrics, fields containing raw newline characters
> (such as `SystemProperties.line.separator`) can be pre-processed using a `regex` processor to escape newline
> characters before serialization. For example:
<details>
<summary><strong>telegraf.conf</strong></summary>
```toml
[agent]
interval = "60s"
flush_interval = "30s"
debug = true
[[inputs.jolokia2_agent]]
urls = ["http://springboot:8080/jolokia"]
name_prefix = "springBoot."
[[inputs.jolokia2_agent.metric]]
name = "JavaRuntime"
mbean = "java.lang:type=Runtime"
paths = ["SystemProperties"]
[[processors.regex]]
namepass = ["springBoot.JavaRuntime"]
[[processors.regex.fields]]
key = "SystemProperties.line.separator"
pattern = '\n'
replacement = '\\n'
[[outputs.influxdb]]
urls = ["http://victoriametrics:8428"]
skip_database_creation = true
```
</details>
> This ensures the newline is properly escaped and the metric can be ingested successfully.
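For reference, a hypothetical write with the already-escaped value (the address, measurement, and field names are illustrative); the same payload with a raw newline byte inside the quotes would fail with the parsing error described above:
```sh
# The quoted field value carries the two characters '\' and 'n' (an escaped newline),
# not a raw 0x0A byte, so the point remains a single valid line-protocol line.
curl -d 'springBoot.JavaRuntime SystemProperties.line.separator="\\n"' \
  -X POST 'http://victoriametrics:8428/write'
```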
After that, the data can be read via the [/api/v1/export](https://docs.victoriametrics.com/victoriametrics/#how-to-export-data-in-json-line-format) endpoint:
```sh
curl -G 'http://<victoriametrics-addr>:8428/api/v1/export' -d 'match={__name__=~"measurement_.*"}'

```

See the docs at https://docs.victoriametrics.com/victoriametrics/vmalert/ .
-rule.evalDelay duration
Adjustment of the 'time' parameter for rule evaluation requests to compensate intentional data delay from the datasource. Normally, should be equal to '-search.latencyOffset' (cmd-line flag configured for VictoriaMetrics single-node or vmselect). This doesn't apply to groups with eval_offset specified. (default 30s)
-rule.maxResolveDuration duration
Limits the maximum duration for automatic alert expiration, which by default is 4 times evaluationInterval of the parent group
-rule.resendDelay duration
Minimum amount of time to wait before resending an alert to notifier.
-rule.resultsLimit int
Limits the number of alerts or recording results a single rule can produce. Can be overridden by the limit option under group if specified. If exceeded, the rule will be marked with an error and all its results will be discarded. 0 means no limit.
-rule.templates array

3
go.mod
View File

@@ -7,6 +7,7 @@ require (
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.21.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.1
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.4
github.com/RoaringBitmap/roaring/v2 v2.14.4
github.com/VictoriaMetrics/VictoriaLogs v0.0.0-20260218111324-95b48d57d032
github.com/VictoriaMetrics/easyproto v1.2.0
github.com/VictoriaMetrics/fastcache v1.13.3
@@ -73,6 +74,7 @@ require (
github.com/aws/smithy-go v1.24.0 // indirect
github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/bits-and-blooms/bitset v1.24.2 // indirect
github.com/clipperhouse/uax29/v2 v2.7.0 // indirect
github.com/cncf/xds/go v0.0.0-20260202195803-dba9d589def2 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.7 // indirect
@@ -106,6 +108,7 @@ require (
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee // indirect
github.com/mschoch/smat v0.2.0 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f // indirect
github.com/oklog/ulid/v2 v2.1.1 // indirect

6
go.sum
View File

@@ -52,6 +52,8 @@ github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapp
github.com/GoogleCloudPlatform/opentelemetry-operations-go/internal/resourcemapping v0.55.0/go.mod h1:Mf6O40IAyB9zR/1J8nGDDPirZQQPbYJni8Yisy7NTMc=
github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY=
github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU=
github.com/RoaringBitmap/roaring/v2 v2.14.4 h1:4aKySrrg9G/5oRtJ3TrZLObVqxgQ9f1znCRBwEwjuVw=
github.com/RoaringBitmap/roaring/v2 v2.14.4/go.mod h1:oMvV6omPWr+2ifRdeZvVJyaz+aoEUopyv5iH0u/+wbY=
github.com/VictoriaMetrics/VictoriaLogs v0.0.0-20260218111324-95b48d57d032 h1:kKVeXC+HAcMeMLefoKCWf934y9MoLU8V3Da7k6WP4K8=
github.com/VictoriaMetrics/VictoriaLogs v0.0.0-20260218111324-95b48d57d032/go.mod h1:WQ8hGgfKx1lXCCcS1SJSOklN9fToSbshtvKHp3xsv4w=
github.com/VictoriaMetrics/easyproto v1.2.0 h1:FJT9uNXA2isppFuJErbLqD306KoFlehl7Wn2dg/6oIE=
@@ -120,6 +122,8 @@ github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3 h1:6df1vn4bBlDDo
github.com/bboreham/go-loser v0.0.0-20230920113527-fcc2c21820a3/go.mod h1:CIWtjkly68+yqLPbvwwR/fjNJA/idrtULjZWh2v1ys0=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
github.com/bits-and-blooms/bitset v1.24.2 h1:M7/NzVbsytmtfHbumG+K2bremQPMJuqv1JD3vOaFxp0=
github.com/bits-and-blooms/bitset v1.24.2/go.mod h1:7hO7Gc7Pp1vODcmWvKMRA9BNmbv6a/7QIWpPxHddWR8=
github.com/bmatcuk/doublestar/v4 v4.10.0 h1:zU9WiOla1YA122oLM6i4EXvGW62DvKZVxIe6TYWexEs=
github.com/bmatcuk/doublestar/v4 v4.10.0/go.mod h1:xBQ8jztBU6kakFMg+8WGxn0c6z1fTSPVIjEY1Wr7jzc=
github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
@@ -330,6 +334,8 @@ github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd/go.mod h1:6dJ
github.com/modern-go/reflect2 v1.0.2/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee h1:W5t00kpgFdJifH4BDsTlE89Zl93FEloxaWZfGcifgq8=
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/mschoch/smat v0.2.0 h1:8imxQsjDm8yFEAVBe7azKmKSgzSkZXDuKkSq9374khM=
github.com/mschoch/smat v0.2.0/go.mod h1:kc9mz7DoBKqDyiRL7VZN8KvXQMWeTaVnttLRXOlotKw=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/mwitkow/go-conntrack v0.0.0-20190716064945-2f068394615f h1:KUppIJq7/+SVif2QVs3tOP0zanoHgBEVAwHxUSIzRqU=

View File

@@ -358,7 +358,11 @@ func parseFieldValue(s string, uc *unmarshalContext) (float64, error) {
}
if uc.hasQuotedFields && s[0] == '"' {
if len(s) < 2 || s[len(s)-1] != '"' {
return 0, fmt.Errorf("missing closing quote for quoted field value %s", s)
// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/10067
return 0, fmt.Errorf("missing closing quote for quoted field value %s; "+
"this may be caused by a raw newline (`\\n`) inside the quoted field value",
s,
)
}
// Try converting quoted string to number, since sometimes InfluxDB agents
// send numbers as strings.

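For illustration, a payload such as the following (names and timestamp are hypothetical) is split into lines at the raw newline, so `parseFieldValue` receives the truncated value `"hello` and reports the extended error above:

```text
measurement,tag1=a field1="hello
world" 1700000000000000000
```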
View File

@@ -2,6 +2,7 @@ package influx
import (
"reflect"
"strings"
"testing"
)
@@ -140,6 +141,25 @@ func TestRowsUnmarshalFailure(t *testing.T) {
f("GET /foo?bar=baz HTTP/1.0")
}
func TestParseFieldValue_MissingClosingQuoteWithRawNewlineHint(t *testing.T) {
uc := &unmarshalContext{
hasQuotedFields: true,
}
// Simulate the truncated value produced after the input
// is split on a raw newline.
input := "\"hello"
_, err := parseFieldValue(input, uc)
if err == nil {
t.Fatalf("expected error for missing closing quote")
}
if !strings.Contains(err.Error(), "this may be caused by a raw newline") {
t.Fatalf("unexpected error message: %s", err)
}
}
func TestRowsUnmarshalSuccess(t *testing.T) {
f := func(s string, rowsExpected *Rows) {
t.Helper()

View File

@@ -5,6 +5,7 @@ import (
"fmt"
"io"
"reflect"
"runtime/debug"
"sort"
"strings"
"sync"
@@ -821,6 +822,11 @@ func sortLabels(labels []prompb.Label) {
func TestPutBigWriteRequestContext(t *testing.T) {
f := func(l, c, expectC int) {
t.Helper()
// Disable GC here so the items in the pool won't be recycled too fast; re-enable it after the test.
prevPercent := debug.SetGCPercent(-1)
defer debug.SetGCPercent(prevPercent)
// let's reset the whole pool first, as different test case could interfere
wctxPool = sync.Pool{}

View File

@@ -891,7 +891,7 @@ func (s *Storage) mustLoadNextDayMetricIDs(date uint64) *nextDayMetricIDs {
}
// Unmarshal uint64set
m, tail, err := unmarshalUint64Set(src)
m, tail, err := uint64set.Unmarshal(src)
if err != nil {
logger.Infof("discarding %s because cannot load uint64set: %s", path, err)
return e
@@ -931,7 +931,7 @@ func (s *Storage) mustLoadHourMetricIDs(hour uint64, name string) *hourMetricIDs
}
// Unmarshal uint64set
m, tail, err := unmarshalUint64Set(src)
m, tail, err := uint64set.Unmarshal(src)
if err != nil {
logger.Infof("discarding %s because cannot load uint64set: %s", path, err)
return hm
@@ -952,7 +952,7 @@ func (s *Storage) mustSaveNextDayMetricIDs(e *nextDayMetricIDs) {
dst = encoding.MarshalUint64(dst, e.date)
// Marshal metricIDs
dst = marshalUint64Set(dst, &e.metricIDs)
dst = e.metricIDs.Marshal(dst)
fs.MustWriteSync(path, dst)
}
@@ -965,37 +965,11 @@ func (s *Storage) mustSaveHourMetricIDs(hm *hourMetricIDs, name string) {
dst = encoding.MarshalUint64(dst, hm.hour)
// Marshal hm.m
dst = marshalUint64Set(dst, hm.m)
dst = hm.m.Marshal(dst)
fs.MustWriteSync(path, dst)
}
func unmarshalUint64Set(src []byte) (*uint64set.Set, []byte, error) {
mLen := encoding.UnmarshalUint64(src)
src = src[8:]
if uint64(len(src)) < 8*mLen {
return nil, nil, fmt.Errorf("cannot unmarshal uint64set; got %d bytes; want at least %d bytes", len(src), 8*mLen)
}
m := &uint64set.Set{}
for range mLen {
metricID := encoding.UnmarshalUint64(src)
src = src[8:]
m.Add(metricID)
}
return m, src, nil
}
func marshalUint64Set(dst []byte, m *uint64set.Set) []byte {
dst = encoding.MarshalUint64(dst, uint64(m.Len()))
m.ForEach(func(part []uint64) bool {
for _, metricID := range part {
dst = encoding.MarshalUint64(dst, metricID)
}
return true
})
return dst
}
func mustGetMinTimestampForCompositeIndex(metadataDir string, isEmptyDB bool) int64 {
path := filepath.Join(metadataDir, "minTimestampForCompositeIndex")
minTimestamp, err := loadMinTimestampForCompositeIndex(path)

View File

@@ -1,6 +1,7 @@
package uint64set
import (
"fmt"
"math/bits"
"slices"
"sort"
@@ -8,6 +9,7 @@ import (
"sync/atomic"
"unsafe"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding"
"github.com/VictoriaMetrics/VictoriaMetrics/lib/slicesutil"
)
@@ -38,6 +40,49 @@ func (s *bucket32Sorter) Swap(i, j int) {
a[i], a[j] = a[j], a[i]
}
// Unmarshal creates an instance of a set from bytes.
//
// The first 8 src bytes contain the set length (the number of elements in the
// set). Since each element is 8 bytes long, the number of remaining src bytes
// must be at least 8*length, or else the function returns an error. The
// function reads exactly 8*length bytes and constructs a set from them. The
// remaining src bytes are returned along with the set.
func Unmarshal(src []byte) (*Set, []byte, error) {
if len(src) < 8 {
return nil, nil, fmt.Errorf("cannot unmarshal uint64set; got %d bytes; want at least 8 bytes", len(src))
}
sLen := encoding.UnmarshalUint64(src)
src = src[8:]
if uint64(len(src)) < 8*sLen {
return nil, nil, fmt.Errorf("cannot unmarshal uint64set; got %d bytes; want at least %d bytes", len(src), 8*sLen)
}
s := &Set{}
for range sLen {
e := encoding.UnmarshalUint64(src)
src = src[8:]
s.Add(e)
}
return s, src, nil
}
// Marshal encodes the set as a sequence of bytes.
//
// The first 8 bytes contain the length of the set (the number of elements the
// set contains). The subsequent bytes are the actual uint64 elements.
//
// The marshaling result is appended to the end of dst, i.e. the initial dst
// content is not overwritten.
func (s *Set) Marshal(dst []byte) []byte {
dst = encoding.MarshalUint64(dst, uint64(s.Len()))
s.ForEach(func(part []uint64) bool {
for _, e := range part {
dst = encoding.MarshalUint64(dst, e)
}
return true
})
return dst
}
// Clone returns an independent copy of s.
func (s *Set) Clone() *Set {
if s == nil || s.itemsCount == 0 {

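For reference, here is a minimal round-trip sketch of the format described in the doc comments above (an 8-byte big-endian length prefix followed by 8 bytes per element); the import path is an assumption:

```go
package main

import (
	"fmt"

	"github.com/VictoriaMetrics/VictoriaMetrics/lib/uint64set"
)

func main() {
	s := &uint64set.Set{}
	s.Add(1)
	s.Add(42)

	// 8 bytes of length plus 8 bytes per element: 24 bytes total.
	buf := s.Marshal(nil)
	fmt.Println(len(buf)) // 24

	s2, tail, err := uint64set.Unmarshal(buf)
	if err != nil {
		panic(err)
	}
	fmt.Println(s2.Equal(s), len(tail)) // true 0
}
```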
View File

@@ -1,6 +1,7 @@
package uint64set
import (
"encoding/binary"
"fmt"
"math/rand"
"reflect"
@@ -895,3 +896,115 @@ func TestSubtract(t *testing.T) {
f(a, b1, want1)
f(a, b2, want2)
}
func TestUnmarshal(t *testing.T) {
n := uint64(100_000)
src := make([]byte, (n+1)*8+10)
binary.BigEndian.PutUint64(src, n)
want := &Set{}
for i := range n {
binary.BigEndian.PutUint64(src[(i+1)*8:], i)
want.Add(i)
}
got, gotTail, err := Unmarshal(src)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if !got.Equal(want) {
diff := cmp.Diff(want.AppendTo(nil), got.AppendTo(nil))
t.Fatalf("unexpected set (-want, +got):\n%s", diff)
}
wantTail := make([]byte, 10)
if diff := cmp.Diff(wantTail, gotTail); diff != "" {
t.Fatalf("unexpected tail bytes (-want, +got):\n%s", diff)
}
}
func TestUnmarshal_zeroLenSet(t *testing.T) {
src := make([]byte, 8)
want := &Set{}
got, gotTail, err := Unmarshal(src)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if !got.Equal(want) {
diff := cmp.Diff(want.AppendTo(nil), got.AppendTo(nil))
t.Fatalf("unexpected set (-want, +got):\n%s", diff)
}
wantTail := []byte{}
if diff := cmp.Diff(wantTail, gotTail); diff != "" {
t.Fatalf("unexpected tail bytes (-want, +got):\n%s", diff)
}
}
func TestUnmarshal_tooShortToIncludeSetLen(t *testing.T) {
src := make([]byte, 7) // set length occupies 8 bytes.
got, gotTail, err := Unmarshal(src)
if err == nil {
t.Fatalf("expected error but got nil")
}
if got != nil {
t.Fatalf("expected nil set, got: %v", got.AppendTo(nil))
}
if gotTail != nil {
t.Fatalf("expected nil tail bytes, got: %v", gotTail)
}
}
func TestUnmarshal_numElementsLessThanLen(t *testing.T) {
n := uint64(10)
src := make([]byte, n*8) // contains only 9 elements instead of 10.
binary.BigEndian.PutUint64(src, n)
for i := range n - 1 {
binary.BigEndian.PutUint64(src[(i+1)*8:], i)
}
got, gotTail, err := Unmarshal(src)
if err == nil {
t.Fatalf("expected error but got nil")
}
if got != nil {
t.Fatalf("expected nil set, got: %v", got.AppendTo(nil))
}
if gotTail != nil {
t.Fatalf("expected nil tail bytes, got: %v", gotTail)
}
}
func TestMarshal_emptyDst(t *testing.T) {
n := uint64(100_000)
want := make([]byte, (n+1)*8)
binary.BigEndian.PutUint64(want, n)
s := &Set{}
for i := range n {
binary.BigEndian.PutUint64(want[(i+1)*8:], i)
s.Add(i)
}
got := s.Marshal(nil)
if diff := cmp.Diff(want, got); diff != "" {
t.Fatalf("unexpected bytes (-want, +got):\n%s", diff)
}
}
func TestMarshal_nonEmptyDst(t *testing.T) {
n := uint64(100_000)
got := make([]byte, 10)
want := make([]byte, 10+(n+1)*8)
for i := range 10 {
got[i] = byte(i)
want[i] = byte(i)
}
binary.BigEndian.PutUint64(want[10:], n)
s := &Set{}
for i := range n {
binary.BigEndian.PutUint64(want[10+(i+1)*8:], i)
s.Add(i)
}
got = s.Marshal(got)
if diff := cmp.Diff(want, got); diff != "" {
t.Fatalf("unexpected bytes (-want, +got):\n%s", diff)
}
}

View File

@@ -5,14 +5,14 @@ import (
"testing"
"time"
"github.com/RoaringBitmap/roaring/v2/roaring64"
"github.com/valyala/fastrand"
)
func BenchmarkAddMulti(b *testing.B) {
for _, itemsCount := range []int{1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7} {
start := uint64(time.Now().UnixNano())
sa := createRangeSet(start, itemsCount)
a := sa.AppendTo(nil)
a := createRangeSet(start, itemsCount).ToArray()
b.Run(fmt.Sprintf("items_%d", itemsCount), func(b *testing.B) {
benchmarkAddMulti(b, a)
})
@@ -22,8 +22,7 @@ func BenchmarkAddMulti(b *testing.B) {
func BenchmarkAdd(b *testing.B) {
for _, itemsCount := range []int{1e3, 1e4, 1e5, 1e6, 1e7} {
start := uint64(time.Now().UnixNano())
sa := createRangeSet(start, itemsCount)
a := sa.AppendTo(nil)
a := createRangeSet(start, itemsCount).ToArray()
b.Run(fmt.Sprintf("items_%d", itemsCount), func(b *testing.B) {
benchmarkAdd(b, a)
})
@@ -68,7 +67,7 @@ func benchmarkAdd(b *testing.B, a []uint64) {
b.SetBytes(int64(len(a)))
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
var s Set
s := roaring64.New()
for _, x := range a {
s.Add(x)
}
@@ -81,26 +80,26 @@ func benchmarkAddMulti(b *testing.B, a []uint64) {
b.SetBytes(int64(len(a)))
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
var s Set
s := roaring64.New()
n := 0
for n < len(a) {
m := min(n+64, len(a))
s.AddMulti(a[n:m])
s.AddMany(a[n:m])
n = m
}
}
})
}
func benchmarkUnion(b *testing.B, sa, sb *Set) {
func benchmarkUnion(b *testing.B, sa, sb *roaring64.Bitmap) {
b.ReportAllocs()
b.SetBytes(int64(sa.Len() + sb.Len()))
b.SetBytes(int64(sa.Stats().Cardinality + sb.Stats().Cardinality))
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
saCopy := sa.Clone()
sbCopy := sb.Clone()
saCopy.Union(sb)
sbCopy.Union(sa)
saCopy.Or(sb)
sbCopy.Or(sa)
}
})
}
@@ -138,15 +137,15 @@ func BenchmarkIntersectFullOverlap(b *testing.B) {
}
}
func benchmarkIntersect(b *testing.B, sa, sb *Set) {
func benchmarkIntersect(b *testing.B, sa, sb *roaring64.Bitmap) {
b.ReportAllocs()
b.SetBytes(int64(sa.Len() + sb.Len()))
b.SetBytes(int64(sa.Stats().Cardinality + sb.Stats().Cardinality))
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
saCopy := sa.Clone()
sbCopy := sb.Clone()
saCopy.Intersect(sb)
sbCopy.Intersect(sa)
saCopy.And(sb)
sbCopy.And(sa)
}
})
}
@@ -156,10 +155,10 @@ func BenchmarkSubtract(b *testing.B) {
sa := createRangeSet(startA, int(itemsCountA))
sb := createRangeSet(startB, int(itemsCountB))
b.ReportAllocs()
b.SetBytes(int64(sa.Len() + sb.Len()))
b.SetBytes(int64(sa.Stats().Cardinality + sb.Stats().Cardinality))
for b.Loop() {
saCopy := sa.Clone()
saCopy.Subtract(sb)
saCopy.AndNot(sb)
}
}
@@ -211,13 +210,13 @@ func BenchmarkSubtract(b *testing.B) {
}
}
func createRangeSet(start uint64, itemsCount int) *Set {
var s Set
func createRangeSet(start uint64, itemsCount int) *roaring64.Bitmap {
s := roaring64.New()
for i := range itemsCount {
n := start + uint64(i)
s.Add(n)
}
return &s
return s
}
func BenchmarkSetAddRandomLastBits(b *testing.B) {
@@ -231,7 +230,7 @@ func BenchmarkSetAddRandomLastBits(b *testing.B) {
var rng fastrand.RNG
for pb.Next() {
start := uint64(time.Now().UnixNano())
var s Set
s := roaring64.New()
for range int(itemsCount) {
n := start | (uint64(rng.Uint32()) & mask)
s.Add(n)
@@ -273,7 +272,7 @@ func BenchmarkSetAddWithAllocs(b *testing.B) {
for pb.Next() {
start := uint64(time.Now().UnixNano())
end := start + itemsCount
var s Set
s := roaring64.New()
n := start
for n < end {
s.Add(n)
@@ -357,13 +356,13 @@ func BenchmarkSetHasHitRandomLastBits(b *testing.B) {
mask := (uint64(1) << lastBits) - 1
b.Run(fmt.Sprintf("lastBits_%d", lastBits), func(b *testing.B) {
start := uint64(time.Now().UnixNano())
var s Set
s := roaring64.New()
var rng fastrand.RNG
for range int(itemsCount) {
n := start | (uint64(rng.Uint32()) & mask)
s.Add(n)
}
a := s.AppendTo(nil)
a := s.ToArray()
b.ResetTimer()
b.ReportAllocs()
@@ -371,7 +370,7 @@ func BenchmarkSetHasHitRandomLastBits(b *testing.B) {
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
for _, n := range a {
if !s.Has(n) {
if !s.Contains(n) {
panic("unexpected miss")
}
}
@@ -415,7 +414,7 @@ func BenchmarkSetHasHit(b *testing.B) {
b.Run(fmt.Sprintf("items_%d", itemsCount), func(b *testing.B) {
start := uint64(time.Now().UnixNano())
end := start + itemsCount
var s Set
s := roaring64.New()
n := start
for n < end {
s.Add(n)
@@ -429,7 +428,7 @@ func BenchmarkSetHasHit(b *testing.B) {
for pb.Next() {
n := start
for n < end {
if !s.Has(n) {
if !s.Contains(n) {
panic("unexpected miss")
}
n++
@@ -475,7 +474,7 @@ func BenchmarkSetHasMiss(b *testing.B) {
b.Run(fmt.Sprintf("items_%d", itemsCount), func(b *testing.B) {
start := uint64(time.Now().UnixNano())
end := start + itemsCount
var s Set
s := roaring64.New()
n := start
for n < end {
s.Add(n)
@@ -490,7 +489,7 @@ func BenchmarkSetHasMiss(b *testing.B) {
n := end
nEnd := end + itemsCount
for n < nEnd {
if s.Has(n) {
if s.Contains(n) {
panic("unexpected hit")
}
n++
@@ -531,3 +530,62 @@ func BenchmarkMapHasMiss(b *testing.B) {
})
}
}
func BenchmarkSizeBytes_uint64slice(b *testing.B) {
benchmarkSizeBytes(b, func(start, n, step uint64) uint64 {
s := []uint64{}
for i := range n {
v := start + i*step
s = append(s, v)
}
return uint64(len(s) * 8)
})
}
func BenchmarkSizeBytes_uint64set(b *testing.B) {
benchmarkSizeBytes(b, func(start, n, step uint64) uint64 {
s := &Set{}
for i := range n {
v := start + i*step
s.Add(v)
}
return s.SizeBytes()
})
}
func BenchmarkSizeBytes_roaring(b *testing.B) {
benchmarkSizeBytes(b, func(start, n, step uint64) uint64 {
s := roaring64.New()
for i := range n {
v := start + i*step
s.Add(v)
}
stats := s.Stats()
sizeBytes := stats.ArrayContainerBytes
sizeBytes += stats.BitmapContainerBytes
sizeBytes += stats.RunContainerBytes
return sizeBytes
})
}
func benchmarkSizeBytes(b *testing.B, sizeBytesFunc func(start, n, step uint64) uint64) {
f := func(b *testing.B, n, step uint64) {
start := uint64(time.Now().UnixNano())
var sizeBytes uint64
for b.Loop() {
sizeBytes = sizeBytesFunc(start, n, step)
}
b.ReportAllocs()
b.ReportMetric(float64(sizeBytes), "bytes")
}
for _, n := range []uint64{15_000_000} {
for _, step := range []uint64{1, 10, 100, 1e3, 1e4, 1e5, 1e6} {
name := fmt.Sprintf("%d/%d", n, step)
b.Run(name, func(b *testing.B) {
f(b, n, step)
})
}
}
}
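To compare the three representations directly, the new benchmarks can be run on their own; the package path below is an assumption based on the commit message:

```sh
go test -run='^$' -bench='^BenchmarkSizeBytes' ./lib/uint64set/
```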

19
vendor/github.com/RoaringBitmap/roaring/v2/.drone.yml generated vendored Normal file
View File

@@ -0,0 +1,19 @@
kind: pipeline
name: default
workspace:
base: /go
path: src/github.com/RoaringBitmap/roaring
steps:
- name: test
image: golang
commands:
- go get -t
- go test
- go build -tags appengine
- go test -tags appengine
- GOARCH=386 go build
- GOARCH=386 go test
- GOARCH=arm go build
- GOARCH=arm64 go build

View File

@@ -0,0 +1,5 @@
*~
roaring-fuzz.zip
workdir
coverage.out
testdata/all3.classic

View File

11
vendor/github.com/RoaringBitmap/roaring/v2/AUTHORS generated vendored Normal file
View File

@@ -0,0 +1,11 @@
# This is the official list of roaring authors for copyright purposes.
Todd Gruben (@tgruben),
Daniel Lemire (@lemire),
Elliot Murphy (@statik),
Bob Potter (@bpot),
Tyson Maly (@tvmaly),
Will Glynn (@willglynn),
Brent Pedersen (@brentp)
Maciej Biłas (@maciej),
Joe Nall (@joenall)

View File

@@ -0,0 +1,18 @@
# This is the official list of roaring contributors
Todd Gruben (@tgruben),
Daniel Lemire (@lemire),
Elliot Murphy (@statik),
Bob Potter (@bpot),
Tyson Maly (@tvmaly),
Will Glynn (@willglynn),
Brent Pedersen (@brentp),
Jason E. Aten (@glycerine),
Vali Malinoiu (@0x4139),
Forud Ghafouri (@fzerorubigd),
Joe Nall (@joenall),
(@fredim),
Edd Robinson (@e-dard),
Alexander Petrov (@alldroll),
Guy Molinari (@guymolinari),
Ling Jin (@JinLingChristopher)

235
vendor/github.com/RoaringBitmap/roaring/v2/LICENSE generated vendored Normal file
View File

@@ -0,0 +1,235 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2016 by the authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================================================
Portions of runcontainer.go are from the Go standard library, which is licensed
under:
Copyright (c) 2009 The Go Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2016 by the authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

432
vendor/github.com/RoaringBitmap/roaring/v2/README.md generated vendored Normal file
View File

@@ -0,0 +1,432 @@
# roaring
[![GoDoc](https://godoc.org/github.com/RoaringBitmap/roaring?status.svg)](https://godoc.org/github.com/RoaringBitmap/roaring) [![Go Report Card](https://goreportcard.com/badge/RoaringBitmap/roaring)](https://goreportcard.com/report/github.com/RoaringBitmap/roaring)
![Go-CI](https://github.com/RoaringBitmap/roaring/workflows/Go-CI/badge.svg)
![Go-ARM-CI](https://github.com/RoaringBitmap/roaring/workflows/Go-ARM-CI/badge.svg)
![Go-Windows-CI](https://github.com/RoaringBitmap/roaring/workflows/Go-Windows-CI/badge.svg)
=============
This is a go version of the Roaring bitmap data structure.
Roaring bitmaps are used by several major systems such as [Apache Lucene][lucene] and derivative systems such as [Solr][solr] and
[Elasticsearch][elasticsearch], [Apache Druid (Incubating)][druid], [LinkedIn Pinot][pinot], [Netflix Atlas][atlas], [Apache Spark][spark], [OpenSearchServer][opensearchserver], [anacrolix/torrent][anacrolix/torrent], [Whoosh][whoosh], [Redpanda](https://github.com/redpanda-data/redpanda), [Pilosa][pilosa], [Microsoft Visual Studio Team Services (VSTS)][vsts], and eBay's [Apache Kylin][kylin]. The YouTube SQL Engine, [Google Procella](https://research.google/pubs/pub48388/), uses Roaring bitmaps for indexing.
[lucene]: https://lucene.apache.org/
[solr]: https://lucene.apache.org/solr/
[elasticsearch]: https://www.elastic.co/products/elasticsearch
[druid]: https://druid.apache.org/
[spark]: https://spark.apache.org/
[opensearchserver]: http://www.opensearchserver.com
[anacrolix/torrent]: https://github.com/anacrolix/torrent
[whoosh]: https://bitbucket.org/mchaput/whoosh/wiki/Home
[pilosa]: https://www.pilosa.com/
[kylin]: http://kylin.apache.org/
[pinot]: http://github.com/linkedin/pinot/wiki
[vsts]: https://www.visualstudio.com/team-services/
[atlas]: https://github.com/Netflix/atlas
[quanta]: https://github.com/disney/quanta
Roaring bitmaps are found to work well in many important applications:
> Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods ([Wang et al., SIGMOD 2017](http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf))
The ``roaring`` Go library is used by
* [anacrolix/torrent]
* [InfluxDB](https://www.influxdata.com)
* [Pilosa](https://www.pilosa.com/)
* [Bleve](http://www.blevesearch.com)
* [Weaviate](https://github.com/weaviate/weaviate)
* [lindb](https://github.com/lindb/lindb)
* [Elasticell](https://github.com/deepfabric/elasticell)
* [SourceGraph](https://github.com/sourcegraph/sourcegraph)
* [M3](https://github.com/m3db/m3)
* [trident](https://github.com/NetApp/trident)
* [Husky](https://www.datadoghq.com/blog/engineering/introducing-husky/)
* [FrostDB](https://github.com/polarsignals/frostdb)
* [Disney Quanta](https://github.com/disney/quanta)
This library is used in production in several systems, and it is part of the [Awesome Go collection](https://awesome-go.com).
There are also [Java](https://github.com/RoaringBitmap/RoaringBitmap) and [C/C++](https://github.com/RoaringBitmap/CRoaring) versions. The Java, C, C++ and Go versions are binary compatible: e.g., you can save bitmaps
from a Java program and load them back in Go, and vice versa. We have a [format specification](https://github.com/RoaringBitmap/RoaringFormatSpec).
This code is licensed under Apache License, Version 2.0 (ASL2.0).
Copyright 2016-... by the authors.
When should you use a bitmap?
===================================
Sets are a fundamental abstraction in
software. They can be implemented in various
ways, as hash sets, as trees, and so forth.
In databases and search engines, sets are often an integral
part of indexes. For example, we may need to maintain a set
of all documents or rows (represented by numerical identifier)
that satisfy some property. Besides adding or removing
elements from the set, we need fast functions
to compute the intersection, the union, the difference between sets, and so on.
To implement a set
of integers, a particularly appealing strategy is the
bitmap (also called bitset or bit vector). Using n bits,
we can represent any set made of the integers from the range
[0,n): the ith bit is set to one if integer i is present in the set.
Commodity processors use words of W=32 or W=64 bits. By combining many such words, we can
support large values of n. Intersections, unions and differences can then be implemented
as bitwise AND, OR and ANDNOT operations.
More complicated set functions can also be implemented as bitwise operations.
When the bitset approach is applicable, it can be orders of
magnitude faster than other possible implementation of a set (e.g., as a hash set)
while using several times less memory.
However, a bitset, even a compressed one is not always applicable. For example, if
you have 1000 random-looking integers, then a simple array might be the best representation.
We refer to this case as the "sparse" scenario.
When should you use compressed bitmaps?
===================================
An uncompressed BitSet can use a lot of memory. For example, if you take a BitSet
and set the bit at position 1,000,000 to true and you have just over 100kB. That is over 100kB
to store the position of one bit. This is wasteful even if you do not care about memory:
suppose that you need to compute the intersection between this BitSet and another one
that has a bit at position 1,000,001 to true, then you need to go through all these zeroes,
whether you like it or not. That can become very wasteful.
This being said, there are definitively cases where attempting to use compressed bitmaps is wasteful.
For example, if you have a small universe size. E.g., your bitmaps represent sets of integers
from [0,n) where n is small (e.g., n=64 or n=128). If you can use uncompressed BitSet and
it does not blow up your memory usage, then compressed bitmaps are probably not useful
to you. In fact, if you do not need compression, then a BitSet offers remarkable speed.
The sparse scenario is another use case where compressed bitmaps should not be used.
Keep in mind that random-looking data is usually not compressible. E.g., if you have a small set of
32-bit random integers, it is not mathematically possible to use far less than 32 bits per integer,
and attempts at compression can be counterproductive.
How does Roaring compare with the alternatives?
==================================================
Most alternatives to Roaring are part of a larger family of compressed bitmaps that are run-length-encoded
bitmaps. They identify long runs of 1s or 0s and they represent them with a marker word.
If you have a local mix of 1s and 0s, you use an uncompressed word.
There are many formats in this family:
* Oracle's BBC is an obsolete format at this point: though it may provide good compression,
it is likely much slower than more recent alternatives due to excessive branching.
* WAH is a patented variation on BBC that provides better performance.
* Concise is a variation on the patented WAH. In some specific instances, it can compress
much better than WAH (up to 2x better), but it is generally slower.
* EWAH is both free of patent, and it is faster than all the above. On the downside, it
does not compress quite as well. It is faster because it allows some form of "skipping"
over uncompressed words. So though none of these formats are great at random access, EWAH
is better than the alternatives.
There is a big problem with these formats however that can hurt you badly in some cases: there is no random access. If you want to check whether a given value is present in the set, you have to start from the beginning and "uncompress" the whole thing. This means that if you want to intersect a big set with a large set, you still have to uncompress the whole big set in the worst case...
Roaring solves this problem. It works in the following manner. It divides the data into chunks of 2<sup>16</sup> integers
(e.g., [0, 2<sup>16</sup>), [2<sup>16</sup>, 2 x 2<sup>16</sup>), ...). Within a chunk, it can use an uncompressed bitmap, a simple list of integers,
or a list of runs. Whatever format it uses, they all allow you to check for the presence of any one value quickly
(e.g., with a binary search). The net result is that Roaring can compute many operations much faster than run-length-encoded
formats like WAH, EWAH, Concise... Maybe surprisingly, Roaring also generally offers better compression ratios.
### References
- Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 [arXiv:1709.07821](https://arxiv.org/abs/1709.07821)
- Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin,
Better bitmap performance with Roaring bitmaps,
Software: Practice and Experience 46 (5), 2016.[arXiv:1402.6407](http://arxiv.org/abs/1402.6407) This paper used data from http://lemire.me/data/realroaring2014.html
- Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. [arXiv:1603.06549](http://arxiv.org/abs/1603.06549)
### Dependencies
Dependencies are fetched automatically by giving the `-t` flag to `go get`.
They include:
- github.com/bits-and-blooms/bitset
- github.com/mschoch/smat
- github.com/glycerine/go-unsnap-stream
- github.com/philhofer/fwd
- github.com/jtolds/gls
Note that the smat library requires Go 1.15 or better.
#### Installation
- go get -t github.com/RoaringBitmap/roaring
### Instructions for contributors
Using bash or other common shells:
```
$ git clone git@github.com:RoaringBitmap/roaring.git
$ export GO111MODULE=on
$ go mod tidy
$ go test -v
```
### Example
Here is a simplified but complete example:
```go
package main
import (
"fmt"
"github.com/RoaringBitmap/roaring/v2"
"bytes"
)
func main() {
// example inspired by https://github.com/fzandona/goroar
fmt.Println("==roaring==")
rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
fmt.Println(rb1.String())
rb2 := roaring.BitmapOf(3, 4, 1000)
fmt.Println(rb2.String())
rb3 := roaring.New()
fmt.Println(rb3.String())
fmt.Println("Cardinality: ", rb1.GetCardinality())
fmt.Println("Contains 3? ", rb1.Contains(3))
rb1.And(rb2)
rb3.Add(1)
rb3.Add(5)
rb3.Or(rb1)
// computes union of the three bitmaps in parallel using 4 workers
roaring.ParOr(4, rb1, rb2, rb3)
// computes intersection of the three bitmaps in parallel using 4 workers
roaring.ParAnd(4, rb1, rb2, rb3)
// prints 1, 3, 4, 5, 1000
i := rb3.Iterator()
for i.HasNext() {
fmt.Println(i.Next())
}
fmt.Println()
// next we include an example of serialization
buf := new(bytes.Buffer)
rb1.WriteTo(buf) // we omit error handling
newrb:= roaring.New()
newrb.ReadFrom(buf)
if rb1.Equals(newrb) {
fmt.Println("I wrote the content to a byte stream and read it back.")
}
// you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}
```
If you wish to use serialization and handle errors, you might want to
consider the following sample of code:
```go
rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
buf := new(bytes.Buffer)
size,err:=rb.WriteTo(buf)
if err != nil {
fmt.Println("Failed writing") // return or panic
}
newrb:= New()
size,err=newrb.ReadFrom(buf)
if err != nil {
fmt.Println("Failed reading") // return or panic
}
// if buf is an untrusted source, you should validate the result
// (this adds a bit of complexity but it is necessary for security)
if newrb.Validate() != nil {
fmt.Println("Failed validation") // return or panic
}
if ! rb.Equals(newrb) {
fmt.Println("Cannot retrieve serialized version")
}
```
Given N integers in [0,x), the serialized size in bytes of
a Roaring bitmap should never exceed this bound:
`` 8 + 9 * ((long)x+65535)/65536 + 2 * N ``
That is, given a fixed overhead for the universe size (x), Roaring
bitmaps never use more than 2 bytes per integer. You can call
``BoundSerializedSizeInBytes`` for a more precise estimate.
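As a worked example: for x = 1,000,000 and N = 1,000, integer division gives (1,000,000 + 65,535) / 65,536 = 16, so the bound is 8 + 9 * 16 + 2 * 1,000 = 2,152 bytes, i.e. roughly 2.15 bytes per stored integer.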
### 64-bit Roaring
By default, roaring is used to store unsigned 32-bit integers. However, we also offer
an extension dedicated to 64-bit integers. It supports roughly the same functions:
```go
package main
import (
"fmt"
"github.com/RoaringBitmap/roaring/v2/roaring64"
"bytes"
)
func main() {
// example inspired by https://github.com/fzandona/goroar
fmt.Println("==roaring64==")
rb1 := roaring64.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
fmt.Println(rb1.String())
rb2 := roaring64.BitmapOf(3, 4, 1000)
fmt.Println(rb2.String())
rb3 := roaring64.New()
fmt.Println(rb3.String())
fmt.Println("Cardinality: ", rb1.GetCardinality())
fmt.Println("Contains 3? ", rb1.Contains(3))
rb1.And(rb2)
rb3.Add(1)
rb3.Add(5)
rb3.Or(rb1)
// prints 1, 3, 4, 5, 1000
i := rb3.Iterator()
for i.HasNext() {
fmt.Println(i.Next())
}
fmt.Println()
// next we include an example of serialization
buf := new(bytes.Buffer)
rb1.WriteTo(buf) // we omit error handling
newrb:= roaring64.New()
newrb.ReadFrom(buf)
if rb1.Equals(newrb) {
fmt.Println("I wrote the content to a byte stream and read it back.")
}
// you can iterate over bitmaps using ReverseIterator(), Iterator, ManyIterator()
}
```
Only the 32-bit roaring format is standard and cross-operable between Java, C++, C and Go. There is no guarantee that the 64-bit versions are compatible.
### Documentation
Current documentation is available at https://pkg.go.dev/github.com/RoaringBitmap/roaring and https://pkg.go.dev/github.com/RoaringBitmap/roaring/roaring64
### Goroutine safety
In general, it is not safe to access
the same bitmaps from different goroutines--they are left
unsynchronized for performance reasons. Should you want to access
a Bitmap from more than one goroutine, you should
provide synchronization. Typically this is done by using channels to pass
the *Bitmap around (in Go style; so there is only ever one owner),
or by using `sync.Mutex` to serialize operations on Bitmaps.
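A minimal sketch of the mutex approach (the `SafeBitmap` wrapper is illustrative, not part of the library):

```go
package main

import (
	"fmt"
	"sync"

	"github.com/RoaringBitmap/roaring/v2"
)

// SafeBitmap serializes all access to the wrapped bitmap with a mutex.
type SafeBitmap struct {
	mu sync.Mutex
	bm *roaring.Bitmap
}

func NewSafeBitmap() *SafeBitmap {
	return &SafeBitmap{bm: roaring.New()}
}

func (s *SafeBitmap) Add(x uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.bm.Add(x)
}

func (s *SafeBitmap) Contains(x uint32) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.bm.Contains(x)
}

func main() {
	s := NewSafeBitmap()
	var wg sync.WaitGroup
	for i := uint32(0); i < 100; i++ {
		wg.Add(1)
		go func(x uint32) {
			defer wg.Done()
			s.Add(x)
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Contains(42)) // true
}
```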
### Coverage
We test our software. For a report on our test coverage, see
https://coveralls.io/github/RoaringBitmap/roaring?branch=master
### Benchmark
Type
go test -bench Benchmark -run -
To run benchmarks on [Real Roaring Datasets](https://github.com/RoaringBitmap/real-roaring-datasets)
run the following:
```sh
go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -
```
### Interactive use
You can use roaring with gore:
- `go install github.com/x-motemen/gore/cmd/gore@latest`
- Make sure that ``$GOPATH/bin`` is in your ``$PATH``.
```go
$ gore
gore version 0.2.6 :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"
```
### Fuzzy testing
You can help us test the library further with fuzzy testing:
```sh
go get github.com/dvyukov/go-fuzz/go-fuzz
go get github.com/dvyukov/go-fuzz/go-fuzz-build
go test -tags=gofuzz -run=TestGenerateSmatCorpus
go-fuzz-build github.com/RoaringBitmap/roaring
go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200 -func FuzzSmat
```
Let it run, and if the # of crashers is > 0, check out the reports in
the workdir where you should be able to find the panic goroutine stack
traces.
You may also replace `-func FuzzSmat` by `-func FuzzSerializationBuffer` or `-func FuzzSerializationStream`.
### Alternative in Go
There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring
For an alternative implementation in Go, see https://github.com/fzandona/goroar
The two versions were written independently.
### Mailing list/discussion group
https://groups.google.com/g/roaring-bitmaps
## Stars
[![Star History Chart](https://api.star-history.com/svg?repos=RoaringBitmap/roaring&type=Date)](https://www.star-history.com/#RoaringBitmap/roaring&Date)
### Further reading
<p>Mastering Programming: From Testing to Performance in Go</p>
<div><a href="https://www.amazon.com/dp/B0FMPGSWR5"><img style="margin-left: auto; margin-right: auto;" src="https://m.media-amazon.com/images/I/61feneHS7kL._SL1499_.jpg" alt="" width="250px" /></a></div>

File diff suppressed because it is too large

File diff suppressed because it is too large

vendor/github.com/RoaringBitmap/roaring/v2/clz.go generated vendored Normal file

@@ -0,0 +1,19 @@
//go:build go1.9
// +build go1.9
// "go1.9", from Go version 1.9 onward
// See https://golang.org/pkg/go/build/#hdr-Build_Constraints
package roaring
import "math/bits"
// countLeadingZeros returns the number of leading zero bits in x; the result is 64 for x == 0.
func countLeadingZeros(x uint64) int {
return bits.LeadingZeros64(x)
}
// countLeadingOnes returns the number of leading one bits in x; the result is 0 for x == 0.
func countLeadingOnes(x uint64) int {
return bits.LeadingZeros64(^x)
}


@@ -0,0 +1,37 @@
//go:build !go1.9
// +build !go1.9
package roaring
// countLeadingZeros returns the number of consecutive most significant zero
// bits of x.
func countLeadingZeros(i uint64) int {
if i == 0 {
return 64
}
n := 1
x := uint32(i >> 32)
if x == 0 {
n += 32
x = uint32(i)
}
if (x >> 16) == 0 {
n += 16
x <<= 16
}
if (x >> 24) == 0 {
n += 8
x <<= 8
}
if x>>28 == 0 {
n += 4
x <<= 4
}
if x>>30 == 0 {
n += 2
x <<= 2
}
n -= int(x >> 31)
return n
}

vendor/github.com/RoaringBitmap/roaring/v2/ctz.go generated vendored Normal file

@@ -0,0 +1,21 @@
//go:build go1.9
// +build go1.9
// "go1.9", from Go version 1.9 onward
// See https://golang.org/pkg/go/build/#hdr-Build_Constraints
package roaring
import "math/bits"
// countTrailingZeros returns the number of trailing zero bits in x; the result is 64 for x == 0.
func countTrailingZeros(x uint64) int {
return bits.TrailingZeros64(x)
}
// countTrailingOnes returns the number of trailing one bits in x.
// The result is 64 for x == 18,446,744,073,709,551,615 (all bits set).
// The result is 0 for x == 0.
func countTrailingOnes(x uint64) int {
return bits.TrailingZeros64(^x)
}


@@ -0,0 +1,72 @@
//go:build !go1.9
// +build !go1.9
package roaring
// Reuse of portions of go/src/math/big standard lib code
// under this license:
/*
Copyright (c) 2009 The Go Authors. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
const deBruijn32 = 0x077CB531
var deBruijn32Lookup = []byte{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
}
const deBruijn64 = 0x03f79d71b4ca8b09
var deBruijn64Lookup = []byte{
0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
}
// countTrailingZeros returns the number of consecutive least significant zero
// bits of x.
func countTrailingZeros(x uint64) int {
// x & -x leaves only the right-most bit set in the word. Let k be the
// index of that bit. Since only a single bit is set, the value is two
// to the power of k. Multiplying by a power of two is equivalent to
// left shifting, in this case by k bits. The de Bruijn constant is
// such that all six-bit, consecutive substrings are distinct.
// Therefore, if we have a left shifted version of this constant we can
// find by how many bits it was shifted by looking at which six bit
// substring ended up at the top of the word.
// (Knuth, volume 4, section 7.3.1)
if x == 0 {
// We have to special case 0; the formula
// below doesn't work for 0.
return 64
}
return int(deBruijn64Lookup[((x&-x)*(deBruijn64))>>58])
}


@@ -0,0 +1,313 @@
package roaring
import (
"container/heap"
)
// Or function that requires repairAfterLazy
func lazyOR(x1, x2 *Bitmap) *Bitmap {
answer := NewBitmap()
pos1 := 0
pos2 := 0
length1 := x1.highlowcontainer.size()
length2 := x2.highlowcontainer.size()
main:
for (pos1 < length1) && (pos2 < length2) {
s1 := x1.highlowcontainer.getKeyAtIndex(pos1)
s2 := x2.highlowcontainer.getKeyAtIndex(pos2)
for {
if s1 < s2 {
answer.highlowcontainer.appendCopy(x1.highlowcontainer, pos1)
pos1++
if pos1 == length1 {
break main
}
s1 = x1.highlowcontainer.getKeyAtIndex(pos1)
} else if s1 > s2 {
answer.highlowcontainer.appendCopy(x2.highlowcontainer, pos2)
pos2++
if pos2 == length2 {
break main
}
s2 = x2.highlowcontainer.getKeyAtIndex(pos2)
} else {
c1 := x1.highlowcontainer.getContainerAtIndex(pos1)
answer.highlowcontainer.appendContainer(s1, c1.lazyOR(x2.highlowcontainer.getContainerAtIndex(pos2)), false)
pos1++
pos2++
if (pos1 == length1) || (pos2 == length2) {
break main
}
s1 = x1.highlowcontainer.getKeyAtIndex(pos1)
s2 = x2.highlowcontainer.getKeyAtIndex(pos2)
}
}
}
if pos1 == length1 {
answer.highlowcontainer.appendCopyMany(x2.highlowcontainer, pos2, length2)
} else if pos2 == length2 {
answer.highlowcontainer.appendCopyMany(x1.highlowcontainer, pos1, length1)
}
return answer
}
// In-place Or function that requires repairAfterLazy
func (x1 *Bitmap) lazyOR(x2 *Bitmap) *Bitmap {
pos1 := 0
pos2 := 0
length1 := x1.highlowcontainer.size()
length2 := x2.highlowcontainer.size()
main:
for (pos1 < length1) && (pos2 < length2) {
s1 := x1.highlowcontainer.getKeyAtIndex(pos1)
s2 := x2.highlowcontainer.getKeyAtIndex(pos2)
for {
if s1 < s2 {
pos1++
if pos1 == length1 {
break main
}
s1 = x1.highlowcontainer.getKeyAtIndex(pos1)
} else if s1 > s2 {
x1.highlowcontainer.insertNewKeyValueAt(pos1, s2, x2.highlowcontainer.getContainerAtIndex(pos2).clone())
pos2++
pos1++
length1++
if pos2 == length2 {
break main
}
s2 = x2.highlowcontainer.getKeyAtIndex(pos2)
} else {
c1 := x1.highlowcontainer.getWritableContainerAtIndex(pos1)
x1.highlowcontainer.containers[pos1] = c1.lazyIOR(x2.highlowcontainer.getContainerAtIndex(pos2))
x1.highlowcontainer.needCopyOnWrite[pos1] = false
pos1++
pos2++
if (pos1 == length1) || (pos2 == length2) {
break main
}
s1 = x1.highlowcontainer.getKeyAtIndex(pos1)
s2 = x2.highlowcontainer.getKeyAtIndex(pos2)
}
}
}
if pos1 == length1 {
x1.highlowcontainer.appendCopyMany(x2.highlowcontainer, pos2, length2)
}
return x1
}
// to be called after lazy aggregates
func (x1 *Bitmap) repairAfterLazy() {
for pos := 0; pos < x1.highlowcontainer.size(); pos++ {
c := x1.highlowcontainer.getContainerAtIndex(pos)
switch c.(type) {
case *bitmapContainer:
if c.(*bitmapContainer).cardinality == invalidCardinality {
c = x1.highlowcontainer.getWritableContainerAtIndex(pos)
c.(*bitmapContainer).computeCardinality()
if c.(*bitmapContainer).getCardinality() <= arrayDefaultMaxSize {
x1.highlowcontainer.setContainerAtIndex(pos, c.(*bitmapContainer).toArrayContainer())
} else if c.(*bitmapContainer).isFull() {
x1.highlowcontainer.setContainerAtIndex(pos, newRunContainer16Range(0, MaxUint16))
}
}
}
}
}
// FastAnd computes the intersection between many bitmaps quickly
// Compared to the And function, it can take many bitmaps as input, thus saving the trouble
// of manually calling "And" many times.
//
// Performance hints: if you have very large and tiny bitmaps,
// it may be beneficial performance-wise to put a tiny bitmap
// in first position.
func FastAnd(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
} else if len(bitmaps) == 1 {
return bitmaps[0].Clone()
}
answer := And(bitmaps[0], bitmaps[1])
for _, bm := range bitmaps[2:] {
answer.And(bm)
}
return answer
}
// FastOr computes the union between many bitmaps quickly, as opposed to having to call Or repeatedly;
// it might also be faster than the equivalent sequence of calls to Or.
func FastOr(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
} else if len(bitmaps) == 1 {
return bitmaps[0].Clone()
}
answer := lazyOR(bitmaps[0], bitmaps[1])
for _, bm := range bitmaps[2:] {
answer = answer.lazyOR(bm)
}
// here is where repairAfterLazy is called.
answer.repairAfterLazy()
return answer
}
// HeapOr computes the union between many bitmaps quickly using a heap.
// It might be faster than calling Or repeatedly.
func HeapOr(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
}
// TODO: for better speed, we could do the operation lazily, see Java implementation
pq := make(priorityQueue, len(bitmaps))
for i, bm := range bitmaps {
pq[i] = &item{bm, i}
}
heap.Init(&pq)
for pq.Len() > 1 {
x1 := heap.Pop(&pq).(*item)
x2 := heap.Pop(&pq).(*item)
heap.Push(&pq, &item{Or(x1.value, x2.value), 0})
}
return heap.Pop(&pq).(*item).value
}
// HeapXor computes the symmetric difference between many bitmaps quickly (as opposed to calling Xor repeatedly).
// Internally, this function uses a heap.
// It might be faster than calling Xor repeatedly.
func HeapXor(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
}
pq := make(priorityQueue, len(bitmaps))
for i, bm := range bitmaps {
pq[i] = &item{bm, i}
}
heap.Init(&pq)
for pq.Len() > 1 {
x1 := heap.Pop(&pq).(*item)
x2 := heap.Pop(&pq).(*item)
heap.Push(&pq, &item{Xor(x1.value, x2.value), 0})
}
return heap.Pop(&pq).(*item).value
}
// AndAny provides a result equivalent to x1.And(FastOr(bitmaps)).
// It's optimized to minimize allocations. It also might be faster than separate calls.
func (x1 *Bitmap) AndAny(bitmaps ...*Bitmap) {
if len(bitmaps) == 0 {
return
} else if len(bitmaps) == 1 {
x1.And(bitmaps[0])
return
}
type withPos struct {
bitmap *roaringArray
pos int
key uint16
}
filters := make([]withPos, 0, len(bitmaps))
for _, b := range bitmaps {
if b.highlowcontainer.size() > 0 {
filters = append(filters, withPos{
bitmap: &b.highlowcontainer,
pos: 0,
key: b.highlowcontainer.getKeyAtIndex(0),
})
}
}
basePos := 0
intersections := 0
keyContainers := make([]container, 0, len(filters))
var (
tmpArray *arrayContainer
tmpBitmap *bitmapContainer
minNextKey uint16
)
for basePos < x1.highlowcontainer.size() && len(filters) > 0 {
baseKey := x1.highlowcontainer.getKeyAtIndex(basePos)
// accumulate containers for current key, find next minimal key in filters
// and exclude filters that do not have related values anymore
i := 0
maxPossibleOr := 0
minNextKey = MaxUint16
for _, f := range filters {
if f.key < baseKey {
f.pos = f.bitmap.advanceUntil(baseKey, f.pos)
if f.pos == f.bitmap.size() {
continue
}
f.key = f.bitmap.getKeyAtIndex(f.pos)
}
if f.key == baseKey {
cont := f.bitmap.getContainerAtIndex(f.pos)
keyContainers = append(keyContainers, cont)
maxPossibleOr += cont.getCardinality()
f.pos++
if f.pos == f.bitmap.size() {
continue
}
f.key = f.bitmap.getKeyAtIndex(f.pos)
}
minNextKey = minOfUint16(minNextKey, f.key)
filters[i] = f
i++
}
filters = filters[:i]
if len(keyContainers) == 0 {
basePos = x1.highlowcontainer.advanceUntil(minNextKey, basePos)
continue
}
var ored container
if len(keyContainers) == 1 {
ored = keyContainers[0]
} else {
//TODO: special case for run containers?
if maxPossibleOr > arrayDefaultMaxSize {
if tmpBitmap == nil {
tmpBitmap = newBitmapContainer()
}
tmpBitmap.resetTo(keyContainers[0])
ored = tmpBitmap
} else {
if tmpArray == nil {
tmpArray = newArrayContainerCapacity(maxPossibleOr)
}
tmpArray.realloc(maxPossibleOr)
tmpArray.resetTo(keyContainers[0])
ored = tmpArray
}
for _, c := range keyContainers[1:] {
ored = ored.ior(c)
}
}
result := x1.highlowcontainer.getWritableContainerAtIndex(basePos).iand(ored)
if !result.isEmpty() {
x1.highlowcontainer.replaceKeyAndContainerAtIndex(intersections, baseKey, result, false)
intersections++
}
keyContainers = keyContainers[:0]
basePos = x1.highlowcontainer.advanceUntil(minNextKey, basePos)
}
x1.highlowcontainer.resize(intersections)
}
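A brief usage sketch of the exported aggregates above (our own example, assuming the v2 module path):
```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2"
)

func main() {
	a := roaring.BitmapOf(1, 2, 3, 100)
	b := roaring.BitmapOf(2, 3, 1000)
	c := roaring.BitmapOf(3, 100, 1000)

	// FastAnd intersects all inputs; per the performance hint in its
	// doc comment, placing a tiny bitmap first can help.
	fmt.Println(roaring.FastAnd(a, b, c).ToArray()) // [3]

	// FastOr unions all inputs via lazy ORs followed by one repair pass.
	fmt.Println(roaring.FastOr(a, b, c).ToArray()) // [1 2 3 100 1000]
}
```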


@@ -0,0 +1,215 @@
package internal
import (
"encoding/binary"
"io"
)
// ByteInput typed interface around io.Reader or raw bytes
type ByteInput interface {
// Next returns a slice containing the next n bytes from the buffer,
// advancing the buffer as if the bytes had been returned by Read.
Next(n int) ([]byte, error)
// NextReturnsSafeSlice returns true if Next() returns a safe slice as opposed
// to a slice that points to an underlying buffer possibly owned by another system.
// When NextReturnsSafeSlice returns false, the result from Next() should be copied
// before it is modified (i.e., it should be treated as immutable).
NextReturnsSafeSlice() bool
// ReadUInt32 reads uint32 with LittleEndian order
ReadUInt32() (uint32, error)
// ReadUInt16 reads uint16 with LittleEndian order
ReadUInt16() (uint16, error)
// GetReadBytes returns read bytes
GetReadBytes() int64
// SkipBytes skips exactly n bytes
SkipBytes(n int) error
}
// NewByteInputFromReader creates reader wrapper
func NewByteInputFromReader(reader io.Reader) ByteInput {
return &ByteInputAdapter{
r: reader,
readBytes: 0,
}
}
// NewByteInput creates raw bytes wrapper
func NewByteInput(buf []byte) ByteInput {
return &ByteBuffer{
buf: buf,
off: 0,
}
}
// ByteBuffer raw bytes wrapper
type ByteBuffer struct {
buf []byte
off int
}
// NewByteBuffer creates a new ByteBuffer.
func NewByteBuffer(buf []byte) *ByteBuffer {
return &ByteBuffer{
buf: buf,
}
}
var _ io.Reader = (*ByteBuffer)(nil)
// Read implements io.Reader.
func (b *ByteBuffer) Read(p []byte) (int, error) {
data, err := b.Next(len(p))
if err != nil {
return 0, err
}
copy(p, data)
return len(data), nil
}
// Next returns a slice containing the next n bytes from the buffer.
// If there are fewer than n bytes available, io.ErrUnexpectedEOF is returned.
func (b *ByteBuffer) Next(n int) ([]byte, error) {
m := len(b.buf) - b.off
if n > m {
return nil, io.ErrUnexpectedEOF
}
data := b.buf[b.off : b.off+n]
b.off += n
return data, nil
}
// NextReturnsSafeSlice returns false since ByteBuffer might hold
// an array owned by some other systems.
func (b *ByteBuffer) NextReturnsSafeSlice() bool {
return false
}
// ReadUInt32 reads uint32 with LittleEndian order
func (b *ByteBuffer) ReadUInt32() (uint32, error) {
if len(b.buf)-b.off < 4 {
return 0, io.ErrUnexpectedEOF
}
v := binary.LittleEndian.Uint32(b.buf[b.off:])
b.off += 4
return v, nil
}
// ReadUInt16 reads uint16 with LittleEndian order
func (b *ByteBuffer) ReadUInt16() (uint16, error) {
if len(b.buf)-b.off < 2 {
return 0, io.ErrUnexpectedEOF
}
v := binary.LittleEndian.Uint16(b.buf[b.off:])
b.off += 2
return v, nil
}
// GetReadBytes returns read bytes
func (b *ByteBuffer) GetReadBytes() int64 {
return int64(b.off)
}
// SkipBytes skips exactly n bytes
func (b *ByteBuffer) SkipBytes(n int) error {
m := len(b.buf) - b.off
if n > m {
return io.ErrUnexpectedEOF
}
b.off += n
return nil
}
// Reset resets the given buffer with a new byte slice
func (b *ByteBuffer) Reset(buf []byte) {
b.buf = buf
b.off = 0
}
// ByteInputAdapter reader wrapper
type ByteInputAdapter struct {
r io.Reader
readBytes int
buf [4]byte
}
var _ io.Reader = (*ByteInputAdapter)(nil)
// Read implements io.Reader.
func (b *ByteInputAdapter) Read(buf []byte) (int, error) {
m, err := io.ReadAtLeast(b.r, buf, len(buf))
b.readBytes += m
if err != nil {
return 0, err
}
return m, nil
}
// Next returns a slice containing the next n bytes from the buffer,
// advancing the buffer as if the bytes had been returned by Read.
func (b *ByteInputAdapter) Next(n int) ([]byte, error) {
buf := make([]byte, n)
_, err := b.Read(buf)
if err != nil {
return nil, err
}
return buf, nil
}
// NextReturnsSafeSlice returns true since ByteInputAdapter always returns a slice
// allocated with make([]byte, ...)
func (b *ByteInputAdapter) NextReturnsSafeSlice() bool {
return true
}
// ReadUInt32 reads uint32 with LittleEndian order
func (b *ByteInputAdapter) ReadUInt32() (uint32, error) {
buf := b.buf[:4]
_, err := b.Read(buf)
if err != nil {
return 0, err
}
return binary.LittleEndian.Uint32(buf), nil
}
// ReadUInt16 reads uint16 with LittleEndian order
func (b *ByteInputAdapter) ReadUInt16() (uint16, error) {
buf := b.buf[:2]
_, err := b.Read(buf)
if err != nil {
return 0, err
}
return binary.LittleEndian.Uint16(buf), nil
}
// GetReadBytes returns read bytes
func (b *ByteInputAdapter) GetReadBytes() int64 {
return int64(b.readBytes)
}
// SkipBytes skips exactly n bytes
func (b *ByteInputAdapter) SkipBytes(n int) error {
_, err := b.Next(n)
return err
}
// Reset resets the given buffer with a new stream
func (b *ByteInputAdapter) Reset(stream io.Reader) {
b.r = stream
b.readBytes = 0
}


@@ -0,0 +1,21 @@
package internal
import (
"sync"
)
var (
// ByteInputAdapterPool shared pool
ByteInputAdapterPool = sync.Pool{
New: func() interface{} {
return &ByteInputAdapter{}
},
}
// ByteBufferPool shared pool
ByteBufferPool = sync.Pool{
New: func() interface{} {
return &ByteBuffer{}
},
}
)

vendor/github.com/RoaringBitmap/roaring/v2/iter.go generated vendored Normal file

@@ -0,0 +1,44 @@
package roaring
import "iter"
// Values returns an iterator that yields the elements of the bitmap in
// increasing order. Starting with Go 1.23, users can use a for loop to iterate
// over it.
func Values(b *Bitmap) iter.Seq[uint32] {
return func(yield func(uint32) bool) {
it := b.Iterator()
for it.HasNext() {
if !yield(it.Next()) {
return
}
}
}
}
// Backward returns an iterator that yields the elements of the bitmap in
// decreasing order. Starting with Go 1.23, users can use a for loop to iterate
// over it.
func Backward(b *Bitmap) iter.Seq[uint32] {
return func(yield func(uint32) bool) {
it := b.ReverseIterator()
for it.HasNext() {
if !yield(it.Next()) {
return
}
}
}
}
// Unset creates an iterator that yields values in the range [min, max] that are NOT contained in the bitmap.
// The iterator becomes invalid if the bitmap is modified (e.g., with Add or Remove).
func Unset(b *Bitmap, min, max uint32) iter.Seq[uint32] {
return func(yield func(uint32) bool) {
it := b.UnsetIterator(uint64(min), uint64(max)+1)
for it.HasNext() {
if !yield(it.Next()) {
return
}
}
}
}
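A short usage sketch (ours) of the range-over-func iterators defined above; this requires Go 1.23 or later:
```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2"
)

func main() {
	bm := roaring.BitmapOf(1, 5, 9)

	// iter.Seq values can be ranged over directly since Go 1.23.
	for v := range roaring.Values(bm) {
		fmt.Println(v) // 1, 5, 9
	}
	for v := range roaring.Backward(bm) {
		fmt.Println(v) // 9, 5, 1
	}
}
```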


@@ -0,0 +1,32 @@
package roaring
type manyIterable interface {
nextMany(hs uint32, buf []uint32) int
nextMany64(hs uint64, buf []uint64) int
}
func (si *shortIterator) nextMany(hs uint32, buf []uint32) int {
n := 0
l := si.loc
s := si.slice
for n < len(buf) && l < len(s) {
buf[n] = uint32(s[l]) | hs
l++
n++
}
si.loc = l
return n
}
func (si *shortIterator) nextMany64(hs uint64, buf []uint64) int {
n := 0
l := si.loc
s := si.slice
for n < len(buf) && l < len(s) {
buf[n] = uint64(s[l]) | hs
l++
n++
}
si.loc = l
return n
}

vendor/github.com/RoaringBitmap/roaring/v2/parallel.go generated vendored Normal file

@@ -0,0 +1,612 @@
package roaring
import (
"container/heap"
"fmt"
"runtime"
"sync"
)
var defaultWorkerCount = runtime.NumCPU()
type bitmapContainerKey struct {
key uint16
idx int
bitmap *Bitmap
}
type multipleContainers struct {
key uint16
containers []container
idx int
}
type keyedContainer struct {
key uint16
container container
idx int
}
type bitmapContainerHeap []bitmapContainerKey
func (h bitmapContainerHeap) Len() int { return len(h) }
func (h bitmapContainerHeap) Less(i, j int) bool { return h[i].key < h[j].key }
func (h bitmapContainerHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *bitmapContainerHeap) Push(x interface{}) {
// Push and Pop use pointer receivers because they modify the slice's length,
// not just its contents.
*h = append(*h, x.(bitmapContainerKey))
}
func (h *bitmapContainerHeap) Pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
func (h bitmapContainerHeap) Peek() bitmapContainerKey {
return h[0]
}
func (h *bitmapContainerHeap) popIncrementing() (key uint16, container container) {
k := h.Peek()
key = k.key
container = k.bitmap.highlowcontainer.containers[k.idx]
newIdx := k.idx + 1
if newIdx < k.bitmap.highlowcontainer.size() {
k = bitmapContainerKey{
k.bitmap.highlowcontainer.keys[newIdx],
newIdx,
k.bitmap,
}
(*h)[0] = k
heap.Fix(h, 0)
} else {
heap.Pop(h)
}
return
}
func (h *bitmapContainerHeap) Next(containers []container) multipleContainers {
if h.Len() == 0 {
return multipleContainers{}
}
key, container := h.popIncrementing()
containers = append(containers, container)
for h.Len() > 0 && key == h.Peek().key {
_, container = h.popIncrementing()
containers = append(containers, container)
}
return multipleContainers{
key,
containers,
-1,
}
}
func newBitmapContainerHeap(bitmaps ...*Bitmap) bitmapContainerHeap {
// Initialize heap
var h bitmapContainerHeap = make([]bitmapContainerKey, 0, len(bitmaps))
for _, bitmap := range bitmaps {
if !bitmap.IsEmpty() {
key := bitmapContainerKey{
bitmap.highlowcontainer.keys[0],
0,
bitmap,
}
h = append(h, key)
}
}
heap.Init(&h)
return h
}
func repairAfterLazy(c container) container {
switch t := c.(type) {
case *bitmapContainer:
if t.cardinality == invalidCardinality {
t.computeCardinality()
}
if t.getCardinality() <= arrayDefaultMaxSize {
return t.toArrayContainer()
} else if c.(*bitmapContainer).isFull() {
return newRunContainer16Range(0, MaxUint16)
}
}
return c
}
func toBitmapContainer(c container) container {
switch t := c.(type) {
case *arrayContainer:
return t.toBitmapContainer()
case *runContainer16:
if !t.isFull() {
return t.toBitmapContainer()
}
}
return c
}
func appenderRoutine(bitmapChan chan<- *Bitmap, resultChan <-chan keyedContainer, expectedKeysChan <-chan int) {
expectedKeys := -1
appendedKeys := 0
var keys []uint16
var containers []container
for appendedKeys != expectedKeys {
select {
case item := <-resultChan:
if len(keys) <= item.idx {
keys = append(keys, make([]uint16, item.idx-len(keys)+1)...)
containers = append(containers, make([]container, item.idx-len(containers)+1)...)
}
keys[item.idx] = item.key
containers[item.idx] = item.container
appendedKeys++
case msg := <-expectedKeysChan:
expectedKeys = msg
}
}
answer := &Bitmap{
roaringArray{
make([]uint16, 0, expectedKeys),
make([]container, 0, expectedKeys),
make([]bool, 0, expectedKeys),
false,
},
}
for i := range keys {
if containers[i] != nil { // in case a resulting container was empty, see ParAnd function
answer.highlowcontainer.appendContainer(keys[i], containers[i], false)
}
}
bitmapChan <- answer
}
// ParHeapOr computes the union (OR) of all provided bitmaps in parallel,
// where the parameter "parallelism" determines how many workers are to be used
// (if it is set to 0, a default number of workers is chosen)
// ParHeapOr uses a heap to compute the union. In rare cases it might be faster than ParOr.
func ParHeapOr(parallelism int, bitmaps ...*Bitmap) *Bitmap {
bitmapCount := len(bitmaps)
if bitmapCount == 0 {
return NewBitmap()
} else if bitmapCount == 1 {
return bitmaps[0].Clone()
}
if parallelism == 0 {
parallelism = defaultWorkerCount
}
h := newBitmapContainerHeap(bitmaps...)
bitmapChan := make(chan *Bitmap)
inputChan := make(chan multipleContainers, 128)
resultChan := make(chan keyedContainer, 32)
expectedKeysChan := make(chan int)
pool := sync.Pool{
New: func() interface{} {
return make([]container, 0, len(bitmaps))
},
}
orFunc := func() {
// Assumes only structs with >=2 containers are passed
for input := range inputChan {
c := toBitmapContainer(input.containers[0]).lazyOR(input.containers[1])
for _, next := range input.containers[2:] {
c = c.lazyIOR(next)
}
c = repairAfterLazy(c)
kx := keyedContainer{
input.key,
c,
input.idx,
}
resultChan <- kx
pool.Put(input.containers[:0])
}
}
go appenderRoutine(bitmapChan, resultChan, expectedKeysChan)
for i := 0; i < parallelism; i++ {
go orFunc()
}
idx := 0
for h.Len() > 0 {
ck := h.Next(pool.Get().([]container))
if len(ck.containers) == 1 {
resultChan <- keyedContainer{
ck.key,
ck.containers[0],
idx,
}
pool.Put(ck.containers[:0])
} else {
ck.idx = idx
inputChan <- ck
}
idx++
}
expectedKeysChan <- idx
bitmap := <-bitmapChan
close(inputChan)
close(resultChan)
close(expectedKeysChan)
return bitmap
}
// ParAnd computes the intersection (AND) of all provided bitmaps in parallel,
// where the parameter "parallelism" determines how many workers are to be used
// (if it is set to 0, a default number of workers is chosen)
func ParAnd(parallelism int, bitmaps ...*Bitmap) *Bitmap {
bitmapCount := len(bitmaps)
if bitmapCount == 0 {
return NewBitmap()
} else if bitmapCount == 1 {
return bitmaps[0].Clone()
}
if parallelism == 0 {
parallelism = defaultWorkerCount
}
h := newBitmapContainerHeap(bitmaps...)
bitmapChan := make(chan *Bitmap)
inputChan := make(chan multipleContainers, 128)
resultChan := make(chan keyedContainer, 32)
expectedKeysChan := make(chan int)
andFunc := func() {
// Assumes only structs with >=2 containers are passed
for input := range inputChan {
c := input.containers[0].and(input.containers[1])
for _, next := range input.containers[2:] {
if c.isEmpty() {
break
}
c = c.iand(next)
}
// Send a nil explicitly if the result of the intersection is an empty container
if c.isEmpty() {
c = nil
}
kx := keyedContainer{
input.key,
c,
input.idx,
}
resultChan <- kx
}
}
go appenderRoutine(bitmapChan, resultChan, expectedKeysChan)
for i := 0; i < parallelism; i++ {
go andFunc()
}
idx := 0
for h.Len() > 0 {
ck := h.Next(make([]container, 0, 4))
if len(ck.containers) == bitmapCount {
ck.idx = idx
inputChan <- ck
idx++
}
}
expectedKeysChan <- idx
bitmap := <-bitmapChan
close(inputChan)
close(resultChan)
close(expectedKeysChan)
return bitmap
}
// ParOr computes the union (OR) of all provided bitmaps in parallel,
// where the parameter "parallelism" determines how many workers are to be used
// (if it is set to 0, a default number of workers is chosen)
func ParOr(parallelism int, bitmaps ...*Bitmap) *Bitmap {
var lKey uint16 = MaxUint16
var hKey uint16
bitmapsFiltered := bitmaps[:0]
for _, b := range bitmaps {
if !b.IsEmpty() {
bitmapsFiltered = append(bitmapsFiltered, b)
}
}
bitmaps = bitmapsFiltered
for _, b := range bitmaps {
lKey = minOfUint16(lKey, b.highlowcontainer.keys[0])
hKey = maxOfUint16(hKey, b.highlowcontainer.keys[b.highlowcontainer.size()-1])
}
if lKey == MaxUint16 && hKey == 0 {
return New()
} else if len(bitmaps) == 1 {
return bitmaps[0].Clone()
}
keyRange := int(hKey) - int(lKey) + 1
if keyRange == 1 {
// revert to FastOr. Since the key range spans a single key,
// no container-level aggregation parallelism is achievable
return FastOr(bitmaps...)
}
if parallelism == 0 {
parallelism = defaultWorkerCount
}
var chunkSize int
var chunkCount int
if parallelism*4 > int(keyRange) {
chunkSize = 1
chunkCount = int(keyRange)
} else {
chunkCount = parallelism * 4
chunkSize = (int(keyRange) + chunkCount - 1) / chunkCount
}
if chunkCount*chunkSize < int(keyRange) {
// it's fine to panic to indicate an implementation error
panic(fmt.Sprintf("invariant check failed: chunkCount * chunkSize < keyRange, %d * %d < %d", chunkCount, chunkSize, keyRange))
}
chunks := make([]*roaringArray, chunkCount)
chunkSpecChan := make(chan parChunkSpec, minOfInt(maxOfInt(64, 2*parallelism), int(chunkCount)))
chunkChan := make(chan parChunk, minOfInt(32, int(chunkCount)))
orFunc := func() {
for spec := range chunkSpecChan {
ra := lazyOrOnRange(&bitmaps[0].highlowcontainer, &bitmaps[1].highlowcontainer, spec.start, spec.end)
for _, b := range bitmaps[2:] {
ra = lazyIOrOnRange(ra, &b.highlowcontainer, spec.start, spec.end)
}
for i, c := range ra.containers {
ra.containers[i] = repairAfterLazy(c)
}
chunkChan <- parChunk{ra, spec.idx}
}
}
for i := 0; i < parallelism; i++ {
go orFunc()
}
go func() {
for i := 0; i < chunkCount; i++ {
spec := parChunkSpec{
start: uint16(int(lKey) + i*chunkSize),
end: uint16(minOfInt(int(lKey)+(i+1)*chunkSize-1, int(hKey))),
idx: int(i),
}
chunkSpecChan <- spec
}
}()
chunksRemaining := chunkCount
for chunk := range chunkChan {
chunks[chunk.idx] = chunk.ra
chunksRemaining--
if chunksRemaining == 0 {
break
}
}
close(chunkChan)
close(chunkSpecChan)
containerCount := 0
for _, chunk := range chunks {
containerCount += chunk.size()
}
result := Bitmap{
roaringArray{
containers: make([]container, containerCount),
keys: make([]uint16, containerCount),
needCopyOnWrite: make([]bool, containerCount),
},
}
resultOffset := 0
for _, chunk := range chunks {
copy(result.highlowcontainer.containers[resultOffset:], chunk.containers)
copy(result.highlowcontainer.keys[resultOffset:], chunk.keys)
copy(result.highlowcontainer.needCopyOnWrite[resultOffset:], chunk.needCopyOnWrite)
resultOffset += chunk.size()
}
return &result
}
type parChunkSpec struct {
start uint16
end uint16
idx int
}
type parChunk struct {
ra *roaringArray
idx int
}
func (c parChunk) size() int {
return c.ra.size()
}
func parNaiveStartAt(ra *roaringArray, start uint16, last uint16) int {
for idx, key := range ra.keys {
if key >= start && key <= last {
return idx
} else if key > last {
break
}
}
return ra.size()
}
func lazyOrOnRange(ra1, ra2 *roaringArray, start, last uint16) *roaringArray {
answer := newRoaringArray()
length1 := ra1.size()
length2 := ra2.size()
idx1 := parNaiveStartAt(ra1, start, last)
idx2 := parNaiveStartAt(ra2, start, last)
var key1 uint16
var key2 uint16
if idx1 < length1 && idx2 < length2 {
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
for key1 <= last && key2 <= last {
if key1 < key2 {
answer.appendCopy(*ra1, idx1)
idx1++
if idx1 == length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
} else if key1 > key2 {
answer.appendCopy(*ra2, idx2)
idx2++
if idx2 == length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
} else {
c1 := ra1.getFastContainerAtIndex(idx1, false)
answer.appendContainer(key1, c1.lazyOR(ra2.getContainerAtIndex(idx2)), false)
idx1++
idx2++
if idx1 == length1 || idx2 == length2 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
}
}
}
if idx2 < length2 {
key2 = ra2.getKeyAtIndex(idx2)
for key2 <= last {
answer.appendCopy(*ra2, idx2)
idx2++
if idx2 == length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
}
}
if idx1 < length1 {
key1 = ra1.getKeyAtIndex(idx1)
for key1 <= last {
answer.appendCopy(*ra1, idx1)
idx1++
if idx1 == length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
}
}
return answer
}
func lazyIOrOnRange(ra1, ra2 *roaringArray, start, last uint16) *roaringArray {
length1 := ra1.size()
length2 := ra2.size()
idx1 := 0
idx2 := parNaiveStartAt(ra2, start, last)
var key1 uint16
var key2 uint16
if idx1 < length1 && idx2 < length2 {
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
for key1 <= last && key2 <= last {
if key1 < key2 {
idx1++
if idx1 >= length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
} else if key1 > key2 {
ra1.insertNewKeyValueAt(idx1, key2, ra2.getContainerAtIndex(idx2))
ra1.needCopyOnWrite[idx1] = true
idx2++
idx1++
length1++
if idx2 >= length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
} else {
c1 := ra1.getFastContainerAtIndex(idx1, true)
ra1.containers[idx1] = c1.lazyIOR(ra2.getContainerAtIndex(idx2))
ra1.needCopyOnWrite[idx1] = false
idx1++
idx2++
if idx1 >= length1 || idx2 >= length2 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
}
}
}
if idx2 < length2 {
key2 = ra2.getKeyAtIndex(idx2)
for key2 <= last {
ra1.appendCopy(*ra2, idx2)
idx2++
if idx2 >= length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
}
}
return ra1
}
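A hedged usage sketch of the parallel aggregates defined in this file (our own example):
```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2"
)

func main() {
	a := roaring.BitmapOf(1, 2, 3)
	b := roaring.BitmapOf(3, 4)
	c := roaring.BitmapOf(4, 5)

	// parallelism = 0 selects the default worker count (runtime.NumCPU()).
	fmt.Println(roaring.ParOr(0, a, b, c).ToArray())  // [1 2 3 4 5]
	fmt.Println(roaring.ParAnd(0, a, b, c).ToArray()) // []
}
```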

vendor/github.com/RoaringBitmap/roaring/v2/popcnt.go generated vendored Normal file

@@ -0,0 +1,13 @@
//go:build go1.9
// +build go1.9
// "go1.9", from Go version 1.9 onward
// See https://golang.org/pkg/go/build/#hdr-Build_Constraints
package roaring
import "math/bits"
func popcount(x uint64) uint64 {
return uint64(bits.OnesCount64(x))
}


@@ -0,0 +1,103 @@
// +build amd64,!appengine,!go1.9
TEXT ·hasAsm(SB),4,$0-1
MOVQ $1, AX
CPUID
SHRQ $23, CX
ANDQ $1, CX
MOVB CX, ret+0(FP)
RET
#define POPCNTQ_DX_DX BYTE $0xf3; BYTE $0x48; BYTE $0x0f; BYTE $0xb8; BYTE $0xd2
TEXT ·popcntSliceAsm(SB),4,$0-32
XORQ AX, AX
MOVQ s+0(FP), SI
MOVQ s_len+8(FP), CX
TESTQ CX, CX
JZ popcntSliceEnd
popcntSliceLoop:
BYTE $0xf3; BYTE $0x48; BYTE $0x0f; BYTE $0xb8; BYTE $0x16 // POPCNTQ (SI), DX
ADDQ DX, AX
ADDQ $8, SI
LOOP popcntSliceLoop
popcntSliceEnd:
MOVQ AX, ret+24(FP)
RET
TEXT ·popcntMaskSliceAsm(SB),4,$0-56
XORQ AX, AX
MOVQ s+0(FP), SI
MOVQ s_len+8(FP), CX
TESTQ CX, CX
JZ popcntMaskSliceEnd
MOVQ m+24(FP), DI
popcntMaskSliceLoop:
MOVQ (DI), DX
NOTQ DX
ANDQ (SI), DX
POPCNTQ_DX_DX
ADDQ DX, AX
ADDQ $8, SI
ADDQ $8, DI
LOOP popcntMaskSliceLoop
popcntMaskSliceEnd:
MOVQ AX, ret+48(FP)
RET
TEXT ·popcntAndSliceAsm(SB),4,$0-56
XORQ AX, AX
MOVQ s+0(FP), SI
MOVQ s_len+8(FP), CX
TESTQ CX, CX
JZ popcntAndSliceEnd
MOVQ m+24(FP), DI
popcntAndSliceLoop:
MOVQ (DI), DX
ANDQ (SI), DX
POPCNTQ_DX_DX
ADDQ DX, AX
ADDQ $8, SI
ADDQ $8, DI
LOOP popcntAndSliceLoop
popcntAndSliceEnd:
MOVQ AX, ret+48(FP)
RET
TEXT ·popcntOrSliceAsm(SB),4,$0-56
XORQ AX, AX
MOVQ s+0(FP), SI
MOVQ s_len+8(FP), CX
TESTQ CX, CX
JZ popcntOrSliceEnd
MOVQ m+24(FP), DI
popcntOrSliceLoop:
MOVQ (DI), DX
ORQ (SI), DX
POPCNTQ_DX_DX
ADDQ DX, AX
ADDQ $8, SI
ADDQ $8, DI
LOOP popcntOrSliceLoop
popcntOrSliceEnd:
MOVQ AX, ret+48(FP)
RET
TEXT ·popcntXorSliceAsm(SB),4,$0-56
XORQ AX, AX
MOVQ s+0(FP), SI
MOVQ s_len+8(FP), CX
TESTQ CX, CX
JZ popcntXorSliceEnd
MOVQ m+24(FP), DI
popcntXorSliceLoop:
MOVQ (DI), DX
XORQ (SI), DX
POPCNTQ_DX_DX
ADDQ DX, AX
ADDQ $8, SI
ADDQ $8, DI
LOOP popcntXorSliceLoop
popcntXorSliceEnd:
MOVQ AX, ret+48(FP)
RET


@@ -0,0 +1,68 @@
//go:build amd64 && !appengine && !go1.9
// +build amd64,!appengine,!go1.9
package roaring
// *** the following functions are defined in popcnt_amd64.s
//go:noescape
func hasAsm() bool
// useAsm is a flag used to select the GO or ASM implementation of the popcnt function
var useAsm = hasAsm()
//go:noescape
func popcntSliceAsm(s []uint64) uint64
//go:noescape
func popcntMaskSliceAsm(s, m []uint64) uint64
//go:noescape
func popcntAndSliceAsm(s, m []uint64) uint64
//go:noescape
func popcntOrSliceAsm(s, m []uint64) uint64
//go:noescape
func popcntXorSliceAsm(s, m []uint64) uint64
func popcntSlice(s []uint64) uint64 {
if useAsm {
return popcntSliceAsm(s)
}
return popcntSliceGo(s)
}
func popcntMaskSlice(s, m []uint64) uint64 {
if useAsm {
return popcntMaskSliceAsm(s, m)
}
return popcntMaskSliceGo(s, m)
}
func popcntAndSlice(s, m []uint64) uint64 {
if useAsm {
return popcntAndSliceAsm(s, m)
}
return popcntAndSliceGo(s, m)
}
func popcntOrSlice(s, m []uint64) uint64 {
if useAsm {
return popcntOrSliceAsm(s, m)
}
return popcntOrSliceGo(s, m)
}
func popcntXorSlice(s, m []uint64) uint64 {
if useAsm {
return popcntXorSliceAsm(s, m)
}
return popcntXorSliceGo(s, m)
}


@@ -0,0 +1,18 @@
//go:build !go1.9
// +build !go1.9
package roaring
// bit population count, taken from
// https://code.google.com/p/go/issues/detail?id=4988#c11
// credit: https://code.google.com/u/arnehormann/
// credit: https://play.golang.org/p/U7SogJ7psJ
// credit: http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
func popcount(x uint64) uint64 {
x -= (x >> 1) & 0x5555555555555555
x = (x>>2)&0x3333333333333333 + x&0x3333333333333333
x += x >> 4
x &= 0x0f0f0f0f0f0f0f0f
x *= 0x0101010101010101
return x >> 56
}
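For a quick sanity check of the bit-twiddling trick above, the same formula can be compared against `math/bits` in a standalone sketch (the vendored `popcount` itself is unexported):
```go
package main

import (
	"fmt"
	"math/bits"
)

// swarPopcount reproduces the trick above: pairwise sums, nibble sums,
// then one multiply that accumulates all byte counts into the top byte.
func swarPopcount(x uint64) uint64 {
	x -= (x >> 1) & 0x5555555555555555
	x = (x>>2)&0x3333333333333333 + x&0x3333333333333333
	x += x >> 4
	x &= 0x0f0f0f0f0f0f0f0f
	x *= 0x0101010101010101
	return x >> 56
}

func main() {
	for _, v := range []uint64{0, 1, 0xff, 0xdeadbeef, ^uint64(0)} {
		fmt.Println(swarPopcount(v) == uint64(bits.OnesCount64(v))) // true
	}
}
```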


@@ -0,0 +1,24 @@
//go:build !amd64 || appengine || go1.9
// +build !amd64 appengine go1.9
package roaring
func popcntSlice(s []uint64) uint64 {
return popcntSliceGo(s)
}
func popcntMaskSlice(s, m []uint64) uint64 {
return popcntMaskSliceGo(s, m)
}
func popcntAndSlice(s, m []uint64) uint64 {
return popcntAndSliceGo(s, m)
}
func popcntOrSlice(s, m []uint64) uint64 {
return popcntOrSliceGo(s, m)
}
func popcntXorSlice(s, m []uint64) uint64 {
return popcntXorSliceGo(s, m)
}


@@ -0,0 +1,41 @@
package roaring
func popcntSliceGo(s []uint64) uint64 {
cnt := uint64(0)
for _, x := range s {
cnt += popcount(x)
}
return cnt
}
func popcntMaskSliceGo(s, m []uint64) uint64 {
cnt := uint64(0)
for i := range s {
cnt += popcount(s[i] &^ m[i])
}
return cnt
}
func popcntAndSliceGo(s, m []uint64) uint64 {
cnt := uint64(0)
for i := range s {
cnt += popcount(s[i] & m[i])
}
return cnt
}
func popcntOrSliceGo(s, m []uint64) uint64 {
cnt := uint64(0)
for i := range s {
cnt += popcount(s[i] | m[i])
}
return cnt
}
func popcntXorSliceGo(s, m []uint64) uint64 {
cnt := uint64(0)
for i := range s {
cnt += popcount(s[i] ^ m[i])
}
return cnt
}


@@ -0,0 +1,101 @@
package roaring
import "container/heap"
/////////////
// The priorityQueue is used to keep Bitmaps sorted.
////////////
type item struct {
value *Bitmap
index int
}
type priorityQueue []*item
func (pq priorityQueue) Len() int { return len(pq) }
func (pq priorityQueue) Less(i, j int) bool {
return pq[i].value.GetSizeInBytes() < pq[j].value.GetSizeInBytes()
}
func (pq priorityQueue) Swap(i, j int) {
pq[i], pq[j] = pq[j], pq[i]
pq[i].index = i
pq[j].index = j
}
func (pq *priorityQueue) Push(x interface{}) {
n := len(*pq)
item := x.(*item)
item.index = n
*pq = append(*pq, item)
}
func (pq *priorityQueue) Pop() interface{} {
old := *pq
n := len(old)
item := old[n-1]
item.index = -1 // for safety
*pq = old[0 : n-1]
return item
}
func (pq *priorityQueue) update(item *item, value *Bitmap) {
item.value = value
heap.Fix(pq, item.index)
}
/////////////
// The containerPriorityQueue is used to keep the containers of various Bitmaps sorted.
////////////
type containeritem struct {
value *Bitmap
keyindex int
index int
}
type containerPriorityQueue []*containeritem
func (pq containerPriorityQueue) Len() int { return len(pq) }
func (pq containerPriorityQueue) Less(i, j int) bool {
k1 := pq[i].value.highlowcontainer.getKeyAtIndex(pq[i].keyindex)
k2 := pq[j].value.highlowcontainer.getKeyAtIndex(pq[j].keyindex)
if k1 != k2 {
return k1 < k2
}
c1 := pq[i].value.highlowcontainer.getContainerAtIndex(pq[i].keyindex)
c2 := pq[j].value.highlowcontainer.getContainerAtIndex(pq[j].keyindex)
return c1.getCardinality() > c2.getCardinality()
}
func (pq containerPriorityQueue) Swap(i, j int) {
pq[i], pq[j] = pq[j], pq[i]
pq[i].index = i
pq[j].index = j
}
func (pq *containerPriorityQueue) Push(x interface{}) {
n := len(*pq)
item := x.(*containeritem)
item.index = n
*pq = append(*pq, item)
}
func (pq *containerPriorityQueue) Pop() interface{} {
old := *pq
n := len(old)
item := old[n-1]
item.index = -1 // for safety
*pq = old[0 : n-1]
return item
}
//func (pq *containerPriorityQueue) update(item *containeritem, value *Bitmap, keyindex int) {
// item.value = value
// item.keyindex = keyindex
// heap.Fix(pq, item.index)
//}

vendor/github.com/RoaringBitmap/roaring/v2/roaring.go generated vendored Normal file

File diff suppressed because it is too large


@@ -0,0 +1,106 @@
.PHONY: help all test format fmtcheck vet lint qa deps clean nuke ser fetch-real-roaring-datasets
# Display general help about this command
help:
@echo ""
@echo "The following commands are available:"
@echo ""
@echo " make qa : Run all the tests"
@echo " make test : Run the unit tests"
@echo ""
@echo " make format : Format the source code"
@echo " make fmtcheck : Check if the source code has been formatted"
@echo " make vet : Check for suspicious constructs"
@echo " make lint : Check for style errors"
@echo ""
@echo " make deps : Get the dependencies"
@echo " make clean : Remove any build artifact"
@echo " make nuke : Deletes any intermediate file"
@echo ""
@echo " make fuzz-smat : Fuzzy testing with smat"
@echo " make fuzz-stream : Fuzzy testing with stream deserialization"
@echo " make fuzz-buffer : Fuzzy testing with buffer deserialization"
@echo ""
# Alias for help target
all: help
test:
go test
# Format the source code
format:
@find ./ -type f -name "*.go" -exec gofmt -w {} \;
# Check if the source code has been formatted
fmtcheck:
@mkdir -p target
@find ./ -type f -name "*.go" -exec gofmt -d {} \; | tee target/format.diff
@test ! -s target/format.diff || { echo "ERROR: the source code has not been formatted - please use 'make format' or 'gofmt'"; exit 1; }
# Check for suspicious constructs
vet:
GOPATH=$(GOPATH) go vet ./...
# Check for style errors
lint:
GOPATH=$(GOPATH) PATH=$(GOPATH)/bin:$(PATH) golint ./...
# Alias to run all quality-assurance checks
qa: fmtcheck test vet lint
# --- INSTALL ---
# Get the dependencies
deps:
GOPATH=$(GOPATH) go get github.com/stretchr/testify
GOPATH=$(GOPATH) go get github.com/bits-and-blooms/bitset
GOPATH=$(GOPATH) go get github.com/golang/lint/golint
GOPATH=$(GOPATH) go get github.com/mschoch/smat
GOPATH=$(GOPATH) go get github.com/dvyukov/go-fuzz/go-fuzz
GOPATH=$(GOPATH) go get github.com/dvyukov/go-fuzz/go-fuzz-build
GOPATH=$(GOPATH) go get github.com/glycerine/go-unsnap-stream
GOPATH=$(GOPATH) go get github.com/philhofer/fwd
GOPATH=$(GOPATH) go get github.com/jtolds/gls
fuzz-smat:
go test -tags=gofuzz -run=TestGenerateSmatCorpus
go-fuzz-build -func FuzzSmat github.com/RoaringBitmap/roaring
go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200
fuzz-stream:
go-fuzz-build -func FuzzSerializationStream github.com/RoaringBitmap/roaring
go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200
fuzz-buffer:
go-fuzz-build -func FuzzSerializationBuffer github.com/RoaringBitmap/roaring
go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200
# Remove any build artifact
clean:
GOPATH=$(GOPATH) go clean ./...
# Deletes any intermediate file
nuke:
rm -rf ./target
GOPATH=$(GOPATH) go clean -i ./...
cover:
go test -coverprofile=coverage.out
go tool cover -html=coverage.out
fetch-real-roaring-datasets:
# pull github.com/RoaringBitmap/real-roaring-datasets -> testdata/real-roaring-datasets
git submodule init
git submodule update

File diff suppressed because it is too large


@@ -0,0 +1,31 @@
package roaring64
// FastAnd computes the intersection between many bitmaps quickly
// Compared to the And function, it can take many bitmaps as input, thus saving the trouble
// of manually calling "And" many times.
func FastAnd(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
} else if len(bitmaps) == 1 {
return bitmaps[0].Clone()
}
answer := And(bitmaps[0], bitmaps[1])
for _, bm := range bitmaps[2:] {
answer.And(bm)
}
return answer
}
// FastOr computes the union between many bitmaps quickly, as opposed to having to call Or repeatedly.
func FastOr(bitmaps ...*Bitmap) *Bitmap {
if len(bitmaps) == 0 {
return NewBitmap()
} else if len(bitmaps) == 1 {
return bitmaps[0].Clone()
}
answer := Or(bitmaps[0], bitmaps[1])
for _, bm := range bitmaps[2:] {
answer.Or(bm)
}
return answer
}


@@ -0,0 +1,31 @@
package roaring64
import "iter"
// Values returns an iterator that yields the elements of the bitmap in
// increasing order. Starting with Go 1.23, users can use a for loop to iterate
// over it.
func Values(b *Bitmap) iter.Seq[uint64] {
return func(yield func(uint64) bool) {
it := b.Iterator()
for it.HasNext() {
if !yield(it.Next()) {
return
}
}
}
}
// Backward returns an iterator that yields the elements of the bitmap in
// decreasing order. Starting with Go 1.23, users can use a for loop to iterate
// over it.
func Backward(b *Bitmap) iter.Seq[uint64] {
return func(yield func(uint64) bool) {
it := b.ReverseIterator()
for it.HasNext() {
if !yield(it.Next()) {
return
}
}
}
}


@@ -0,0 +1,169 @@
package roaring64
import (
"github.com/RoaringBitmap/roaring/v2"
)
// IntIterable64 allows you to iterate over the values in a Bitmap
type IntIterable64 interface {
HasNext() bool
Next() uint64
}
// IntPeekable64 allows you to look at the next value without advancing the
// iterator, and to advance as long as the next value is smaller than minval
type IntPeekable64 interface {
IntIterable64
// PeekNext peeks the next value without advancing the iterator
PeekNext() uint64
// AdvanceIfNeeded advances as long as the next value is smaller than minval
AdvanceIfNeeded(minval uint64)
}
type intIterator struct {
pos int
hs uint64
iter roaring.IntPeekable
highlowcontainer *roaringArray64
}
// HasNext returns true if there are more integers to iterate over
func (ii *intIterator) HasNext() bool {
return ii.pos < ii.highlowcontainer.size()
}
func (ii *intIterator) init() {
if ii.highlowcontainer.size() > ii.pos {
ii.iter = ii.highlowcontainer.getContainerAtIndex(ii.pos).Iterator()
ii.hs = uint64(ii.highlowcontainer.getKeyAtIndex(ii.pos)) << 32
}
}
// Next returns the next integer
func (ii *intIterator) Next() uint64 {
lowbits := ii.iter.Next()
x := uint64(lowbits) | ii.hs
if !ii.iter.HasNext() {
ii.pos = ii.pos + 1
ii.init()
}
return x
}
// PeekNext peeks the next value without advancing the iterator
func (ii *intIterator) PeekNext() uint64 {
return uint64(ii.iter.PeekNext()&maxLowBit) | ii.hs
}
// AdvanceIfNeeded advances as long as the next value is smaller than minval
func (ii *intIterator) AdvanceIfNeeded(minval uint64) {
to := minval >> 32
for ii.HasNext() && (ii.hs>>32) < to {
ii.pos++
ii.init()
}
if ii.HasNext() && (ii.hs>>32) == to {
ii.iter.AdvanceIfNeeded(lowbits(minval))
if !ii.iter.HasNext() {
ii.pos++
ii.init()
}
}
}
func newIntIterator(a *Bitmap) *intIterator {
p := new(intIterator)
p.pos = 0
p.highlowcontainer = &a.highlowcontainer
p.init()
return p
}
type intReverseIterator struct {
pos int
hs uint64
iter roaring.IntIterable
highlowcontainer *roaringArray64
}
// HasNext returns true if there are more integers to iterate over
func (ii *intReverseIterator) HasNext() bool {
return ii.pos >= 0
}
func (ii *intReverseIterator) init() {
if ii.pos >= 0 {
ii.iter = ii.highlowcontainer.getContainerAtIndex(ii.pos).ReverseIterator()
ii.hs = uint64(ii.highlowcontainer.getKeyAtIndex(ii.pos)) << 32
} else {
ii.iter = nil
}
}
// Next returns the next integer
func (ii *intReverseIterator) Next() uint64 {
x := uint64(ii.iter.Next()) | ii.hs
if !ii.iter.HasNext() {
ii.pos = ii.pos - 1
ii.init()
}
return x
}
func newIntReverseIterator(a *Bitmap) *intReverseIterator {
p := new(intReverseIterator)
p.highlowcontainer = &a.highlowcontainer
p.pos = a.highlowcontainer.size() - 1
p.init()
return p
}
// ManyIntIterable64 allows you to iterate over the values in a Bitmap
type ManyIntIterable64 interface {
// NextMany fills the given buffer with values and returns how many values were written
NextMany([]uint64) int
}
type manyIntIterator struct {
pos int
hs uint64
iter roaring.ManyIntIterable
highlowcontainer *roaringArray64
}
func (ii *manyIntIterator) init() {
if ii.highlowcontainer.size() > ii.pos {
ii.iter = ii.highlowcontainer.getContainerAtIndex(ii.pos).ManyIterator()
ii.hs = uint64(ii.highlowcontainer.getKeyAtIndex(ii.pos)) << 32
} else {
ii.iter = nil
}
}
func (ii *manyIntIterator) NextMany(buf []uint64) int {
n := 0
for n < len(buf) {
if ii.iter == nil {
break
}
moreN := ii.iter.NextMany64(ii.hs, buf[n:])
n += moreN
if moreN == 0 {
ii.pos = ii.pos + 1
ii.init()
}
}
return n
}
func newManyIntIterator(a *Bitmap) *manyIntIterator {
p := new(manyIntIterator)
p.pos = 0
p.highlowcontainer = &a.highlowcontainer
p.init()
return p
}


@@ -0,0 +1,297 @@
package roaring64
import (
"fmt"
"runtime"
"github.com/RoaringBitmap/roaring/v2"
)
var defaultWorkerCount = runtime.NumCPU()
// ParOr computes the union (OR) of all provided bitmaps in parallel,
// where the parameter "parallelism" determines how many workers are to be used
// (if it is set to 0, a default number of workers is chosen)
func ParOr(parallelism int, bitmaps ...*Bitmap) *Bitmap {
var lKey uint32 = maxUint32
var hKey uint32
bitmapsFiltered := bitmaps[:0]
for _, b := range bitmaps {
if !b.IsEmpty() {
bitmapsFiltered = append(bitmapsFiltered, b)
}
}
bitmaps = bitmapsFiltered
for _, b := range bitmaps {
lKey = minOfUint32(lKey, b.highlowcontainer.keys[0])
hKey = maxOfUint32(hKey, b.highlowcontainer.keys[b.highlowcontainer.size()-1])
}
if lKey == maxUint32 && hKey == 0 {
return New()
} else if len(bitmaps) == 1 {
return bitmaps[0]
}
// The following might overflow, and we do not want that,
// as it might lead to a channel of size 0 later which,
// on some systems, would block indefinitely.
keyRange := uint64(hKey) - uint64(lKey) + 1
if keyRange == 1 {
// All bitmaps have the same key,
// so we can merge the 32-bit roaring bitmaps in parallel
var bms32s = make([]*roaring.Bitmap, 0, len(bitmaps))
for _, b := range bitmaps {
bms32s = append(bms32s, b.highlowcontainer.containers...)
}
return roaring32AsRoaring64(roaring.ParOr(parallelism, bms32s...), lKey)
}
if parallelism == 0 {
parallelism = defaultWorkerCount
}
// We cannot use int since int is 32-bit on 32-bit systems.
var chunkSize int64
var chunkCount int64
if int64(parallelism)*4 > int64(keyRange) {
chunkSize = 1
chunkCount = int64(keyRange)
} else {
chunkCount = int64(parallelism) * 4
chunkSize = (int64(keyRange) + chunkCount - 1) / chunkCount
}
if chunkCount*chunkSize < int64(keyRange) {
// it's fine to panic to indicate an implementation error
panic(fmt.Sprintf("invariant check failed: chunkCount * chunkSize < keyRange, %d * %d < %d", chunkCount, chunkSize, keyRange))
}
chunks := make([]*roaringArray64, chunkCount)
chunkSpecChan := make(chan parChunkSpec, minOfInt(maxOfInt(64, 2*parallelism), int(chunkCount)))
chunkChan := make(chan parChunk, minOfInt(32, int(chunkCount)))
orFunc := func() {
for spec := range chunkSpecChan {
ra := orOnRange(&bitmaps[0].highlowcontainer, &bitmaps[1].highlowcontainer, spec.start, spec.end)
for _, b := range bitmaps[2:] {
ra = iorOnRange(ra, &b.highlowcontainer, spec.start, spec.end)
}
chunkChan <- parChunk{ra, spec.idx}
}
}
for i := 0; i < parallelism; i++ {
go orFunc()
}
go func() {
for i := int64(0); i < chunkCount; i++ {
spec := parChunkSpec{
start: uint32(int64(lKey) + i*chunkSize),
end: uint32(minOfInt64(int64(lKey)+(i+1)*chunkSize-1, int64(hKey))),
idx: int(i),
}
chunkSpecChan <- spec
}
}()
chunksRemaining := chunkCount
for chunk := range chunkChan {
chunks[chunk.idx] = chunk.ra
chunksRemaining--
if chunksRemaining == 0 {
break
}
}
close(chunkChan)
close(chunkSpecChan)
containerCount := 0
for _, chunk := range chunks {
containerCount += chunk.size()
}
result := Bitmap{
roaringArray64{
containers: make([]*roaring.Bitmap, containerCount),
keys: make([]uint32, containerCount),
needCopyOnWrite: make([]bool, containerCount),
},
}
resultOffset := 0
for _, chunk := range chunks {
copy(result.highlowcontainer.containers[resultOffset:], chunk.containers)
copy(result.highlowcontainer.keys[resultOffset:], chunk.keys)
copy(result.highlowcontainer.needCopyOnWrite[resultOffset:], chunk.needCopyOnWrite)
resultOffset += chunk.size()
}
return &result
}
type parChunkSpec struct {
start uint32
end uint32
idx int
}
type parChunk struct {
ra *roaringArray64
idx int
}
func (c parChunk) size() int {
return c.ra.size()
}
// parNaiveStartAt returns the index of the first key in the inclusive range [start, last].
// It returns the size if there is no such key.
func parNaiveStartAt(ra *roaringArray64, start uint32, last uint32) int {
for idx, key := range ra.keys {
if key >= start && key <= last {
return idx
} else if key > last {
break
}
}
return ra.size()
}
func orOnRange(ra1, ra2 *roaringArray64, start, last uint32) *roaringArray64 {
answer := &roaringArray64{}
length1 := ra1.size()
length2 := ra2.size()
idx1 := parNaiveStartAt(ra1, start, last)
idx2 := parNaiveStartAt(ra2, start, last)
var key1 uint32
var key2 uint32
if idx1 < length1 && idx2 < length2 {
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
for key1 <= last && key2 <= last {
if key1 < key2 {
answer.appendCopy(*ra1, idx1)
idx1++
if idx1 == length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
} else if key1 > key2 {
answer.appendCopy(*ra2, idx2)
idx2++
if idx2 == length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
} else {
c1 := ra1.getContainerAtIndex(idx1)
// answer.appendContainer(key1, c1.lazyOR(ra2.getContainerAtIndex(idx2)), false)
answer.appendContainer(key1, roaring.Or(c1, ra2.getContainerAtIndex(idx2)), false)
idx1++
idx2++
if idx1 == length1 || idx2 == length2 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
}
}
}
if idx2 < length2 {
key2 = ra2.getKeyAtIndex(idx2)
for key2 <= last {
answer.appendCopy(*ra2, idx2)
idx2++
if idx2 == length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
}
}
if idx1 < length1 {
key1 = ra1.getKeyAtIndex(idx1)
for key1 <= last {
answer.appendCopy(*ra1, idx1)
idx1++
if idx1 == length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
}
}
return answer
}
func iorOnRange(ra1, ra2 *roaringArray64, start, last uint32) *roaringArray64 {
length1 := ra1.size()
length2 := ra2.size()
idx1 := 0
idx2 := parNaiveStartAt(ra2, start, last)
var key1 uint32
var key2 uint32
if idx1 < length1 && idx2 < length2 {
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
for key1 <= last && key2 <= last {
if key1 < key2 {
idx1++
if idx1 >= length1 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
} else if key1 > key2 {
ra1.insertNewKeyValueAt(idx1, key2, ra2.getContainerAtIndex(idx2))
ra1.needCopyOnWrite[idx1] = true
idx2++
idx1++
length1++
if idx2 >= length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
} else {
c1 := ra1.getWritableContainerAtIndex(idx1)
// ra1.containers[idx1] = c1.lazyIOR(ra2.getContainerAtIndex(idx2))
c1.Or(ra2.getContainerAtIndex(idx2))
ra1.setContainerAtIndex(idx1, c1)
ra1.needCopyOnWrite[idx1] = false
idx1++
idx2++
if idx1 >= length1 || idx2 >= length2 {
break
}
key1 = ra1.getKeyAtIndex(idx1)
key2 = ra2.getKeyAtIndex(idx2)
}
}
}
if idx2 < length2 {
key2 = ra2.getKeyAtIndex(idx2)
for key2 <= last {
ra1.appendCopy(*ra2, idx2)
idx2++
if idx2 >= length2 {
break
}
key2 = ra2.getKeyAtIndex(idx2)
}
}
return ra1
}

File diff suppressed because it is too large


@@ -0,0 +1,462 @@
package roaring64
import (
"errors"
"github.com/RoaringBitmap/roaring/v2"
)
type roaringArray64 struct {
keys []uint32
containers []*roaring.Bitmap
needCopyOnWrite []bool
copyOnWrite bool
}
var (
ErrKeySortOrder = errors.New("keys were out of order")
ErrCardinalityConstraint = errors.New("size of arrays was not coherent")
)
// runOptimize compresses the element containers to minimize space consumed.
// Q: how does this interact with copyOnWrite and needCopyOnWrite?
// A: since we aren't changing the logical content, just the representation,
// we don't bother to check the needCopyOnWrite bits. We replace
// (possibly all) elements of ra.containers in-place with space-optimized
// versions.
func (ra *roaringArray64) runOptimize() {
for i := range ra.containers {
ra.containers[i].RunOptimize()
}
}
func (ra *roaringArray64) appendContainer(key uint32, value *roaring.Bitmap, mustCopyOnWrite bool) {
ra.keys = append(ra.keys, key)
ra.containers = append(ra.containers, value)
ra.needCopyOnWrite = append(ra.needCopyOnWrite, mustCopyOnWrite)
}
func (ra *roaringArray64) appendWithoutCopy(sa roaringArray64, startingindex int) {
mustCopyOnWrite := sa.needCopyOnWrite[startingindex]
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex], mustCopyOnWrite)
}
func (ra *roaringArray64) appendCopy(sa roaringArray64, startingindex int) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := (ra.copyOnWrite && sa.copyOnWrite) || sa.needsCopyOnWrite(startingindex)
if !copyonwrite {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex].Clone(), copyonwrite)
} else {
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex].Clone(), copyonwrite)
if !sa.needsCopyOnWrite(startingindex) {
sa.setNeedsCopyOnWrite(startingindex)
}
}
}
func (ra *roaringArray64) appendWithoutCopyMany(sa roaringArray64, startingindex, end int) {
for i := startingindex; i < end; i++ {
ra.appendWithoutCopy(sa, i)
}
}
func (ra *roaringArray64) appendCopyMany(sa roaringArray64, startingindex, end int) {
for i := startingindex; i < end; i++ {
ra.appendCopy(sa, i)
}
}
func (ra *roaringArray64) appendCopiesUntil(sa roaringArray64, stoppingKey uint32) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := ra.copyOnWrite && sa.copyOnWrite
for i := 0; i < sa.size(); i++ {
if sa.keys[i] >= stoppingKey {
break
}
thiscopyonewrite := copyonwrite || sa.needsCopyOnWrite(i)
if thiscopyonewrite {
ra.appendContainer(sa.keys[i], sa.containers[i], thiscopyonewrite)
if !sa.needsCopyOnWrite(i) {
sa.setNeedsCopyOnWrite(i)
}
} else {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[i], sa.containers[i].Clone(), thiscopyonewrite)
}
}
}
func (ra *roaringArray64) appendCopiesAfter(sa roaringArray64, beforeStart uint32) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := ra.copyOnWrite && sa.copyOnWrite
startLocation := sa.getIndex(beforeStart)
if startLocation >= 0 {
startLocation++
} else {
startLocation = -startLocation - 1
}
for i := startLocation; i < sa.size(); i++ {
thiscopyonewrite := copyonwrite || sa.needsCopyOnWrite(i)
if thiscopyonewrite {
ra.appendContainer(sa.keys[i], sa.containers[i], thiscopyonewrite)
if !sa.needsCopyOnWrite(i) {
sa.setNeedsCopyOnWrite(i)
}
} else {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[i], sa.containers[i].Clone(), thiscopyonewrite)
}
}
}
func (ra *roaringArray64) removeIndexRange(begin, end int) {
if end <= begin {
return
}
r := end - begin
copy(ra.keys[begin:], ra.keys[end:])
copy(ra.containers[begin:], ra.containers[end:])
copy(ra.needCopyOnWrite[begin:], ra.needCopyOnWrite[end:])
ra.resize(len(ra.keys) - r)
}
func (ra *roaringArray64) resize(newsize int) {
for k := newsize; k < len(ra.containers); k++ {
ra.keys[k] = 0
ra.needCopyOnWrite[k] = false
ra.containers[k] = nil
}
ra.keys = ra.keys[:newsize]
ra.containers = ra.containers[:newsize]
ra.needCopyOnWrite = ra.needCopyOnWrite[:newsize]
}
func (ra *roaringArray64) clear() {
ra.resize(0)
ra.copyOnWrite = false
}
func (ra *roaringArray64) clone() *roaringArray64 {
sa := roaringArray64{}
sa.copyOnWrite = ra.copyOnWrite
// this is where copyOnWrite is used.
if ra.copyOnWrite {
sa.keys = make([]uint32, len(ra.keys))
copy(sa.keys, ra.keys)
sa.containers = make([]*roaring.Bitmap, len(ra.containers))
copy(sa.containers, ra.containers)
sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))
ra.markAllAsNeedingCopyOnWrite()
sa.markAllAsNeedingCopyOnWrite()
// the container pointers themselves are shared; both sides must copy before writing
} else {
// make a full copy
sa.keys = make([]uint32, len(ra.keys))
copy(sa.keys, ra.keys)
sa.containers = make([]*roaring.Bitmap, len(ra.containers))
for i := range sa.containers {
sa.containers[i] = ra.containers[i].Clone()
}
sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))
}
return &sa
}
// clone all containers which have needCopyOnWrite set to true
// This can be used to make sure it is safe to munmap a []byte
// that the roaring array may still have a reference to.
func (ra *roaringArray64) cloneCopyOnWriteContainers() {
for i, needCopyOnWrite := range ra.needCopyOnWrite {
if needCopyOnWrite {
ra.containers[i] = ra.containers[i].Clone()
ra.needCopyOnWrite[i] = false
}
}
}
// unused function:
// func (ra *roaringArray64) containsKey(x uint32) bool {
// return (ra.binarySearch(0, int64(len(ra.keys)), x) >= 0)
// }
func (ra *roaringArray64) getContainer(x uint32) *roaring.Bitmap {
i := ra.binarySearch(0, int64(len(ra.keys)), x)
if i < 0 {
return nil
}
return ra.containers[i]
}
func (ra *roaringArray64) getContainerAtIndex(i int) *roaring.Bitmap {
return ra.containers[i]
}
func (ra *roaringArray64) getWritableContainerAtIndex(i int) *roaring.Bitmap {
if ra.needCopyOnWrite[i] {
ra.containers[i] = ra.containers[i].Clone()
ra.needCopyOnWrite[i] = false
}
return ra.containers[i]
}
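
getWritableContainerAtIndex is the heart of the copy-on-write discipline: a container flagged in needCopyOnWrite is cloned exactly once, on first write, after which the flag is cleared so later writes hit the private copy. A sketch of the same discipline in isolation (cowCell is illustrative, not a library type):

```go
package main

import "fmt"

// cowCell holds a possibly shared value and a flag saying whether it must
// be cloned before mutation -- mirroring needCopyOnWrite bookkeeping.
type cowCell struct {
	data   []uint64
	shared bool // analogue of needCopyOnWrite[i]
}

// writable returns a slice that is safe to mutate, cloning at most once.
func (c *cowCell) writable() []uint64 {
	if c.shared {
		clone := make([]uint64, len(c.data))
		copy(clone, c.data)
		c.data = clone
		c.shared = false // subsequent writes reuse the private copy
	}
	return c.data
}

func main() {
	orig := []uint64{1, 2, 3}
	cell := cowCell{data: orig, shared: true}
	w := cell.writable()
	w[0] = 99
	fmt.Println(orig[0], cell.data[0]) // 1 99 -- the shared backing array is untouched
}
```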
func (ra *roaringArray64) getIndex(x uint32) int {
// before the binary search, we optimize for frequent cases
size := len(ra.keys)
if (size == 0) || (ra.keys[size-1] == x) {
return size - 1
}
return ra.binarySearch(0, int64(size), x)
}
func (ra *roaringArray64) getKeyAtIndex(i int) uint32 {
return ra.keys[i]
}
func (ra *roaringArray64) insertNewKeyValueAt(i int, key uint32, value *roaring.Bitmap) {
ra.keys = append(ra.keys, 0)
ra.containers = append(ra.containers, nil)
copy(ra.keys[i+1:], ra.keys[i:])
copy(ra.containers[i+1:], ra.containers[i:])
ra.keys[i] = key
ra.containers[i] = value
ra.needCopyOnWrite = append(ra.needCopyOnWrite, false)
copy(ra.needCopyOnWrite[i+1:], ra.needCopyOnWrite[i:])
ra.needCopyOnWrite[i] = false
}
func (ra *roaringArray64) remove(key uint32) bool {
i := ra.binarySearch(0, int64(len(ra.keys)), key)
if i >= 0 { // key found
ra.removeAtIndex(i)
return true
}
return false
}
func (ra *roaringArray64) removeAtIndex(i int) {
copy(ra.keys[i:], ra.keys[i+1:])
copy(ra.containers[i:], ra.containers[i+1:])
copy(ra.needCopyOnWrite[i:], ra.needCopyOnWrite[i+1:])
ra.resize(len(ra.keys) - 1)
}
func (ra *roaringArray64) setContainerAtIndex(i int, c *roaring.Bitmap) {
ra.containers[i] = c
}
func (ra *roaringArray64) replaceKeyAndContainerAtIndex(i int, key uint32, c *roaring.Bitmap, mustCopyOnWrite bool) {
ra.keys[i] = key
ra.containers[i] = c
ra.needCopyOnWrite[i] = mustCopyOnWrite
}
func (ra *roaringArray64) size() int {
return len(ra.keys)
}
func (ra *roaringArray64) binarySearch(begin, end int64, ikey uint32) int {
low := begin
high := end - 1
for low+16 <= high {
middleIndex := low + (high-low)/2 // avoid overflow
middleValue := ra.keys[middleIndex]
if middleValue < ikey {
low = middleIndex + 1
} else if middleValue > ikey {
high = middleIndex - 1
} else {
return int(middleIndex)
}
}
for ; low <= high; low++ {
val := ra.keys[low]
if val >= ikey {
if val == ikey {
return int(low)
}
break
}
}
return -int(low + 1)
}
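
binarySearch follows the Java convention: a hit returns the index, a miss returns -(insertionPoint + 1), so one int encodes both outcomes (appendCopiesAfter decodes a miss with -result - 1). A small sketch of the same encoding built on sort.Search (find is a hypothetical helper):

```go
package main

import (
	"fmt"
	"sort"
)

// find reports (index, true) when key is present; otherwise it reports the
// insertion point and false -- binarySearch would encode the latter as
// -(pos+1), which callers decode back with pos = -result - 1.
func find(keys []uint32, key uint32) (int, bool) {
	i := sort.Search(len(keys), func(n int) bool { return keys[n] >= key })
	if i < len(keys) && keys[i] == key {
		return i, true
	}
	return i, false
}

func main() {
	keys := []uint32{2, 5, 9}
	fmt.Println(find(keys, 5)) // 1 true
	fmt.Println(find(keys, 6)) // 2 false -> insert at index 2
}
```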
func (ra *roaringArray64) equals(o interface{}) bool {
srb, ok := o.(roaringArray64)
if ok {
if srb.size() != ra.size() {
return false
}
for i, k := range ra.keys {
if k != srb.keys[i] {
return false
}
}
for i, c := range ra.containers {
if !c.Equals(srb.containers[i]) {
return false
}
}
return true
}
return false
}
func (ra *roaringArray64) hasRunCompression() bool {
for _, c := range ra.containers {
if c.HasRunCompression() {
return true
}
}
return false
}
/**
* Find the smallest integer index strictly larger than pos such that array[index].key >= min. If none can
* be found, return size. Based on code by O. Kaser.
*
* @param min minimal value
* @param pos index to exceed
* @return the smallest index greater than pos such that array[index].key is at least as large as
* min, or size if it is not possible.
*/
func (ra *roaringArray64) advanceUntil(min uint32, pos int) int {
lower := pos + 1
if lower >= len(ra.keys) || ra.keys[lower] >= min {
return lower
}
spansize := 1
for lower+spansize < len(ra.keys) && ra.keys[lower+spansize] < min {
spansize *= 2
}
var upper int
if lower+spansize < len(ra.keys) {
upper = lower + spansize
} else {
upper = len(ra.keys) - 1
}
if ra.keys[upper] == min {
return upper
}
if ra.keys[upper] < min {
// means the array has no item >= min (pos = array.length)
return len(ra.keys)
}
// we know that the next-smallest span was too small
lower += (spansize >> 1)
mid := 0
for lower+1 != upper {
mid = (lower + upper) >> 1
if ra.keys[mid] == min {
return mid
} else if ra.keys[mid] < min {
lower = mid
} else {
upper = mid
}
}
return upper
}
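
advanceUntil is an exponential ("galloping") search: it doubles the probe span until it overshoots min, then binary-searches the final bracket, so the cost grows with the logarithm of the distance advanced rather than of the whole array. A standalone sketch of the same strategy (gallop is a hypothetical helper):

```go
package main

import "fmt"

// gallop returns the smallest index > pos with keys[index] >= min,
// or len(keys) if none exists -- the strategy behind advanceUntil.
func gallop(keys []uint32, min uint32, pos int) int {
	lo := pos + 1
	if lo >= len(keys) || keys[lo] >= min {
		return lo
	}
	span := 1
	for lo+span < len(keys) && keys[lo+span] < min { // double until overshoot
		span *= 2
	}
	hi := lo + span
	if hi >= len(keys) {
		hi = len(keys) - 1
		if keys[hi] < min {
			return len(keys) // no item >= min
		}
	}
	lo += span / 2     // last probe known to be < min
	for lo+1 != hi {   // binary search the bracket (lo, hi]
		mid := (lo + hi) / 2
		if keys[mid] < min {
			lo = mid
		} else {
			hi = mid
		}
	}
	return hi
}

func main() {
	keys := []uint32{1, 2, 4, 8, 16, 32}
	fmt.Println(gallop(keys, 9, 0)) // 4 (keys[4] == 16)
}
```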
func (ra *roaringArray64) markAllAsNeedingCopyOnWrite() {
for i := range ra.needCopyOnWrite {
ra.needCopyOnWrite[i] = true
}
}
func (ra *roaringArray64) needsCopyOnWrite(i int) bool {
return ra.needCopyOnWrite[i]
}
func (ra *roaringArray64) setNeedsCopyOnWrite(i int) {
ra.needCopyOnWrite[i] = true
}
// should be dirt cheap
func (ra *roaringArray64) serializedSizeInBytes() uint64 {
answer := uint64(8)
for _, c := range ra.containers {
answer += 4
answer += c.GetSerializedSizeInBytes()
}
return answer
}
func (ra *roaringArray64) checkKeysSorted() bool {
if len(ra.keys) == 0 || len(ra.keys) == 1 {
return true
}
previous := ra.keys[0]
for nextIdx := 1; nextIdx < len(ra.keys); nextIdx++ {
next := ra.keys[nextIdx]
if previous >= next {
return false
}
previous = next
}
return true
}
// validate checks the referential integrity
// ensures len(keys) == len(containers), recurses and checks each container type
func (ra *roaringArray64) validate() error {
if !ra.checkKeysSorted() {
return ErrKeySortOrder
}
if len(ra.keys) != len(ra.containers) {
return ErrCardinalityConstraint
}
if len(ra.keys) != len(ra.needCopyOnWrite) {
return ErrCardinalityConstraint
}
for _, maps := range ra.containers {
err := maps.Validate()
if err != nil {
return err
}
if maps.IsEmpty() {
return errors.New("empty container")
}
}
return nil
}


@@ -0,0 +1,49 @@
package roaring64
import "github.com/RoaringBitmap/roaring/v2"
func highbits(x uint64) uint32 {
return uint32(x >> 32)
}
func lowbits(x uint64) uint32 {
return uint32(x & maxLowBit)
}
const maxLowBit = roaring.MaxUint32
const maxUint32 = roaring.MaxUint32
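
highbits/lowbits split each 64-bit element into a 32-bit key (selecting the container) and a 32-bit payload (stored in that container's roaring.Bitmap); maxLowBit masks the low half. A quick round-trip check of the split:

```go
package main

import "fmt"

func main() {
	x := uint64(0x0000000500000007) // key 5, low value 7
	hi := uint32(x >> 32)           // highbits(x)
	lo := uint32(x)                 // lowbits(x): truncation keeps the low 32 bits
	fmt.Println(hi, lo)             // 5 7
	fmt.Println((uint64(hi)<<32 | uint64(lo)) == x) // true: the split is lossless
}
```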
func minOfInt64(a, b int64) int64 {
if a < b {
return a
}
return b
}
func minOfInt(a, b int) int {
if a < b {
return a
}
return b
}
func maxOfInt(a, b int) int {
if a > b {
return a
}
return b
}
func maxOfUint32(a, b uint32) uint32 {
if a > b {
return a
}
return b
}
func minOfUint32(a, b uint32) uint32 {
if a < b {
return a
}
return b
}


@@ -0,0 +1,821 @@
package roaring
import (
"bytes"
"encoding/binary"
"errors"
"fmt"
"io"
"github.com/RoaringBitmap/roaring/v2/internal"
)
type container interface {
// addOffset returns the (low, high) parts of the shifted container.
// Whenever one of them would be empty, nil will be returned instead to
// avoid unnecessary allocations.
addOffset(uint16) (container, container)
clone() container
and(container) container
andCardinality(container) int
iand(container) container // i stands for inplace
andNot(container) container
iandNot(container) container // i stands for inplace
isEmpty() bool
getCardinality() int
// rank returns the number of integers that are
// smaller or equal to x. rank(infinity) would be getCardinality().
rank(uint16) int
iadd(x uint16) bool // inplace, returns true if x was new.
iaddReturnMinimized(uint16) container // may change return type to minimize storage.
iaddRange(start, endx int) container // i stands for inplace, range is [firstOfRange,endx)
iremove(x uint16) bool // inplace, returns true if x was present.
iremoveReturnMinimized(uint16) container // may change return type to minimize storage.
not(start, final int) container // range is [firstOfRange,lastOfRange)
inot(firstOfRange, endx int) container // i stands for inplace, range is [firstOfRange,endx)
xor(r container) container
getShortIterator() shortPeekable
getUnsetIterator() shortPeekable
iterate(cb func(x uint16) bool) bool
getReverseIterator() shortIterable
getManyIterator() manyIterable
contains(i uint16) bool
maximum() uint16
minimum() uint16
// equals is now logical equals; it does not require the
// same underlying container types, but compares across
// any of the implementations.
equals(r container) bool
fillLeastSignificant16bits(array []uint32, i int, mask uint32) int
or(r container) container
orCardinality(r container) int
isFull() bool
ior(r container) container // i stands for inplace
intersects(r container) bool // whether the two containers intersect
lazyOR(r container) container
lazyIOR(r container) container
getSizeInBytes() int
iremoveRange(start, final int) container // i stands for inplace, range is [firstOfRange,lastOfRange)
selectInt(x uint16) int // selectInt returns the xth integer in the container
serializedSizeInBytes() int
writeTo(io.Writer) (int, error)
numberOfRuns() int
toEfficientContainer() container
String() string
containerType() contype
safeMinimum() (uint16, error)
safeMaximum() (uint16, error)
nextValue(x uint16) int
previousValue(x uint16) int
nextAbsentValue(x uint16) int
previousAbsentValue(x uint16) int
validate() error
}
type contype uint8
const (
bitmapContype contype = iota
arrayContype
run16Contype
run32Contype
)
var (
ErrKeySortOrder = errors.New("keys were out of order")
ErrCardinalityConstraint = errors.New("size of arrays was not coherent")
)
// careful: range is [firstOfRange,lastOfRange]
func rangeOfOnes(start, last int) container {
if start > MaxUint16 {
panic("rangeOfOnes called with start > MaxUint16")
}
if last > MaxUint16 {
panic("rangeOfOnes called with last > MaxUint16")
}
if start < 0 {
panic("rangeOfOnes called with start < 0")
}
if last < 0 {
panic("rangeOfOnes called with last < 0")
}
return newRunContainer16Range(uint16(start), uint16(last)).toEfficientContainer()
}
type roaringArray struct {
keys []uint16
containers []container `msg:"-"` // don't try to serialize directly.
needCopyOnWrite []bool
copyOnWrite bool
}
func newRoaringArray() *roaringArray {
return &roaringArray{}
}
// runOptimize compresses the element containers to minimize space consumed.
// Q: how does this interact with copyOnWrite and needCopyOnWrite?
// A: since we aren't changing the logical content, just the representation,
//
// we don't bother to check the needCopyOnWrite bits. We replace
// (possibly all) elements of ra.containers in-place with space
// optimized versions.
func (ra *roaringArray) runOptimize() {
for i := range ra.containers {
ra.containers[i] = ra.containers[i].toEfficientContainer()
}
}
func (ra *roaringArray) appendContainer(key uint16, value container, mustCopyOnWrite bool) {
ra.keys = append(ra.keys, key)
ra.containers = append(ra.containers, value)
ra.needCopyOnWrite = append(ra.needCopyOnWrite, mustCopyOnWrite)
}
func (ra *roaringArray) appendWithoutCopy(sa roaringArray, startingindex int) {
mustCopyOnWrite := sa.needCopyOnWrite[startingindex]
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex], mustCopyOnWrite)
}
func (ra *roaringArray) appendCopy(sa roaringArray, startingindex int) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := (ra.copyOnWrite && sa.copyOnWrite) || sa.needsCopyOnWrite(startingindex)
if !copyonwrite {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex].clone(), copyonwrite)
} else {
ra.appendContainer(sa.keys[startingindex], sa.containers[startingindex], copyonwrite)
if !sa.needsCopyOnWrite(startingindex) {
sa.setNeedsCopyOnWrite(startingindex)
}
}
}
func (ra *roaringArray) appendWithoutCopyMany(sa roaringArray, startingindex, end int) {
for i := startingindex; i < end; i++ {
ra.appendWithoutCopy(sa, i)
}
}
func (ra *roaringArray) appendCopyMany(sa roaringArray, startingindex, end int) {
for i := startingindex; i < end; i++ {
ra.appendCopy(sa, i)
}
}
func (ra *roaringArray) appendCopiesUntil(sa roaringArray, stoppingKey uint16) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := ra.copyOnWrite && sa.copyOnWrite
for i := 0; i < sa.size(); i++ {
if sa.keys[i] >= stoppingKey {
break
}
thiscopyonewrite := copyonwrite || sa.needsCopyOnWrite(i)
if thiscopyonewrite {
ra.appendContainer(sa.keys[i], sa.containers[i], thiscopyonewrite)
if !sa.needsCopyOnWrite(i) {
sa.setNeedsCopyOnWrite(i)
}
} else {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[i], sa.containers[i].clone(), thiscopyonewrite)
}
}
}
func (ra *roaringArray) appendCopiesAfter(sa roaringArray, beforeStart uint16) {
// cow only if the two request it, or if we already have a lightweight copy
copyonwrite := ra.copyOnWrite && sa.copyOnWrite
startLocation := sa.getIndex(beforeStart)
if startLocation >= 0 {
startLocation++
} else {
startLocation = -startLocation - 1
}
for i := startLocation; i < sa.size(); i++ {
thiscopyonewrite := copyonwrite || sa.needsCopyOnWrite(i)
if thiscopyonewrite {
ra.appendContainer(sa.keys[i], sa.containers[i], thiscopyonewrite)
if !sa.needsCopyOnWrite(i) {
sa.setNeedsCopyOnWrite(i)
}
} else {
// since there is no copy-on-write, we need to clone the container (this is important)
ra.appendContainer(sa.keys[i], sa.containers[i].clone(), thiscopyonewrite)
}
}
}
func (ra *roaringArray) removeIndexRange(begin, end int) {
if end <= begin {
return
}
r := end - begin
copy(ra.keys[begin:], ra.keys[end:])
copy(ra.containers[begin:], ra.containers[end:])
copy(ra.needCopyOnWrite[begin:], ra.needCopyOnWrite[end:])
ra.resize(len(ra.keys) - r)
}
func (ra *roaringArray) resize(newsize int) {
for k := newsize; k < len(ra.containers); k++ {
ra.containers[k] = nil
}
ra.keys = ra.keys[:newsize]
ra.containers = ra.containers[:newsize]
ra.needCopyOnWrite = ra.needCopyOnWrite[:newsize]
}
func (ra *roaringArray) clear() {
ra.resize(0)
ra.copyOnWrite = false
}
func (ra *roaringArray) clone() *roaringArray {
sa := roaringArray{}
sa.copyOnWrite = ra.copyOnWrite
// this is where copyOnWrite is used.
if ra.copyOnWrite {
sa.keys = make([]uint16, len(ra.keys))
copy(sa.keys, ra.keys)
sa.containers = make([]container, len(ra.containers))
copy(sa.containers, ra.containers)
sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))
ra.markAllAsNeedingCopyOnWrite()
sa.markAllAsNeedingCopyOnWrite()
// the container pointers themselves are shared; both sides must copy before writing
} else {
// make a full copy
sa.keys = make([]uint16, len(ra.keys))
copy(sa.keys, ra.keys)
sa.containers = make([]container, len(ra.containers))
for i := range sa.containers {
sa.containers[i] = ra.containers[i].clone()
}
sa.needCopyOnWrite = make([]bool, len(ra.needCopyOnWrite))
}
return &sa
}
// clone all containers which have needCopyOnWrite set to true
// This can be used to make sure it is safe to munmap a []byte
// that the roaring array may still have a reference to.
func (ra *roaringArray) cloneCopyOnWriteContainers() {
for i, needCopyOnWrite := range ra.needCopyOnWrite {
if needCopyOnWrite {
ra.containers[i] = ra.containers[i].clone()
ra.needCopyOnWrite[i] = false
}
}
}
// unused function:
//func (ra *roaringArray) containsKey(x uint16) bool {
// return (ra.binarySearch(0, int64(len(ra.keys)), x) >= 0)
//}
// getContainer returns the container with key `x`
// if no such container exists `nil` is returned
func (ra *roaringArray) getContainer(x uint16) container {
i := ra.binarySearch(0, int64(len(ra.keys)), x)
if i < 0 {
return nil
}
return ra.containers[i]
}
func (ra *roaringArray) getContainerAtIndex(i int) container {
return ra.containers[i]
}
func (ra *roaringArray) getFastContainerAtIndex(i int, needsWriteable bool) container {
c := ra.getContainerAtIndex(i)
switch t := c.(type) {
case *arrayContainer:
c = t.toBitmapContainer()
case *runContainer16:
if !t.isFull() {
c = t.toBitmapContainer()
}
case *bitmapContainer:
if needsWriteable && ra.needCopyOnWrite[i] {
c = ra.containers[i].clone()
}
}
return c
}
// getUnionedWritableContainer switches behavior for in-place Or
// depending on whether the container requires a copy on write.
// If it does, using the non-inplace or() method leads to fewer allocations.
func (ra *roaringArray) getUnionedWritableContainer(pos int, other container) container {
if ra.needCopyOnWrite[pos] {
return ra.getContainerAtIndex(pos).or(other)
}
return ra.getContainerAtIndex(pos).ior(other)
}
func (ra *roaringArray) getWritableContainerAtIndex(i int) container {
if ra.needCopyOnWrite[i] {
ra.containers[i] = ra.containers[i].clone()
ra.needCopyOnWrite[i] = false
}
return ra.containers[i]
}
// getIndex returns the index of the container with key `x`
// if no such container exists a negative value is returned
func (ra *roaringArray) getIndex(x uint16) int {
// TODO: test
// before the binary search, we optimize for frequent cases
size := len(ra.keys)
if (size == 0) || (ra.keys[size-1] == x) {
return size - 1
}
return ra.binarySearch(0, int64(size), x)
}
func (ra *roaringArray) getKeyAtIndex(i int) uint16 {
return ra.keys[i]
}
func (ra *roaringArray) insertNewKeyValueAt(i int, key uint16, value container) {
ra.keys = append(ra.keys, 0)
ra.containers = append(ra.containers, nil)
copy(ra.keys[i+1:], ra.keys[i:])
copy(ra.containers[i+1:], ra.containers[i:])
ra.keys[i] = key
ra.containers[i] = value
ra.needCopyOnWrite = append(ra.needCopyOnWrite, false)
copy(ra.needCopyOnWrite[i+1:], ra.needCopyOnWrite[i:])
ra.needCopyOnWrite[i] = false
}
func (ra *roaringArray) remove(key uint16) bool {
i := ra.binarySearch(0, int64(len(ra.keys)), key)
if i >= 0 { // key found
ra.removeAtIndex(i)
return true
}
return false
}
func (ra *roaringArray) removeAtIndex(i int) {
copy(ra.keys[i:], ra.keys[i+1:])
copy(ra.containers[i:], ra.containers[i+1:])
copy(ra.needCopyOnWrite[i:], ra.needCopyOnWrite[i+1:])
ra.resize(len(ra.keys) - 1)
}
func (ra *roaringArray) setContainerAtIndex(i int, c container) {
ra.containers[i] = c
}
func (ra *roaringArray) replaceKeyAndContainerAtIndex(i int, key uint16, c container, mustCopyOnWrite bool) {
ra.keys[i] = key
ra.containers[i] = c
ra.needCopyOnWrite[i] = mustCopyOnWrite
}
func (ra *roaringArray) size() int {
return len(ra.keys)
}
// binarySearch returns the index of the key.
// negative value returned if not found
func (ra *roaringArray) binarySearch(begin, end int64, ikey uint16) int {
// TODO: add unit tests
low := begin
high := end - 1
for low+16 <= high {
middleIndex := low + (high-low)/2 // avoid overflow
middleValue := ra.keys[middleIndex]
if middleValue < ikey {
low = middleIndex + 1
} else if middleValue > ikey {
high = middleIndex - 1
} else {
return int(middleIndex)
}
}
for ; low <= high; low++ {
val := ra.keys[low]
if val >= ikey {
if val == ikey {
return int(low)
}
break
}
}
return -int(low + 1)
}
func (ra *roaringArray) equals(o interface{}) bool {
srb, ok := o.(roaringArray)
if ok {
if srb.size() != ra.size() {
return false
}
for i, k := range ra.keys {
if k != srb.keys[i] {
return false
}
}
for i, c := range ra.containers {
if !c.equals(srb.containers[i]) {
return false
}
}
return true
}
return false
}
func (ra *roaringArray) headerSize() uint64 {
size := uint64(len(ra.keys))
if ra.hasRunCompression() {
if size < noOffsetThreshold { // for small bitmaps, we omit the offsets
return 4 + (size+7)/8 + 4*size
}
return 4 + (size+7)/8 + 8*size // - 4 because we pack the size with the cookie
}
return 4 + 4 + 8*size
}
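
Worked example, assuming noOffsetThreshold = 4 as in upstream roaring: with run compression and 3 containers the header is 4 (cookie packed with the size) + ⌈3/8⌉ = 1 (is-run bitset) + 4·3 = 12 (descriptive header), i.e. 17 bytes and no offset table; without run containers it is 4 + 4 + 8·3 = 32 bytes, where the extra 4 bytes per container are the offset-table entries.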
// should be dirt cheap
func (ra *roaringArray) serializedSizeInBytes() uint64 {
answer := ra.headerSize()
for _, c := range ra.containers {
answer += uint64(c.serializedSizeInBytes())
}
return answer
}
// spec: https://github.com/RoaringBitmap/RoaringFormatSpec
func (ra *roaringArray) writeTo(w io.Writer) (n int64, err error) {
hasRun := ra.hasRunCompression()
isRunSizeInBytes := 0
cookieSize := 8
if hasRun {
cookieSize = 4
isRunSizeInBytes = (len(ra.keys) + 7) / 8
}
descriptiveHeaderSize := 4 * len(ra.keys)
preambleSize := cookieSize + isRunSizeInBytes + descriptiveHeaderSize
buf := make([]byte, preambleSize+4*len(ra.keys))
nw := 0
if hasRun {
binary.LittleEndian.PutUint16(buf[0:], uint16(serialCookie))
nw += 2
binary.LittleEndian.PutUint16(buf[2:], uint16(len(ra.keys)-1))
nw += 2
// compute isRun bitmap without temporary allocation
runbitmapslice := buf[nw : nw+isRunSizeInBytes]
for i, c := range ra.containers {
switch c.(type) {
case *runContainer16:
runbitmapslice[i/8] |= 1 << (uint(i) % 8)
}
}
nw += isRunSizeInBytes
} else {
binary.LittleEndian.PutUint32(buf[0:], uint32(serialCookieNoRunContainer))
nw += 4
binary.LittleEndian.PutUint32(buf[4:], uint32(len(ra.keys)))
nw += 4
}
// descriptive header
for i, key := range ra.keys {
binary.LittleEndian.PutUint16(buf[nw:], key)
nw += 2
c := ra.containers[i]
binary.LittleEndian.PutUint16(buf[nw:], uint16(c.getCardinality()-1))
nw += 2
}
startOffset := int64(preambleSize + 4*len(ra.keys))
if !hasRun || (len(ra.keys) >= noOffsetThreshold) {
// offset header
for _, c := range ra.containers {
binary.LittleEndian.PutUint32(buf[nw:], uint32(startOffset))
nw += 4
switch rc := c.(type) {
case *runContainer16:
startOffset += 2 + int64(len(rc.iv))*4
default:
startOffset += int64(getSizeInBytesFromCardinality(c.getCardinality()))
}
}
}
written, err := w.Write(buf[:nw])
if err != nil {
return n, err
}
n += int64(written)
for _, c := range ra.containers {
written, err := c.writeTo(w)
if err != nil {
return n, err
}
n += int64(written)
}
return n, nil
}
// spec: https://github.com/RoaringBitmap/RoaringFormatSpec
func (ra *roaringArray) toBytes() ([]byte, error) {
var buf bytes.Buffer
_, err := ra.writeTo(&buf)
return buf.Bytes(), err
}
// Reads a serialized roaringArray from a byte slice.
func (ra *roaringArray) readFrom(stream internal.ByteInput, cookieHeader ...byte) (int64, error) {
var cookie uint32
var err error
if len(cookieHeader) > 0 && len(cookieHeader) != 4 {
return int64(len(cookieHeader)), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: incorrect size of cookie header")
}
if len(cookieHeader) == 4 {
cookie = binary.LittleEndian.Uint32(cookieHeader)
} else {
cookie, err = stream.ReadUInt32()
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("error in roaringArray.readFrom: could not read initial cookie: %s", err)
}
}
// If NextReturnsSafeSlice is false, then willNeedCopyOnWrite should be true
willNeedCopyOnWrite := !stream.NextReturnsSafeSlice()
var size uint32
var isRunBitmap []byte
if cookie&0x0000FFFF == serialCookie {
size = uint32(cookie>>16 + 1)
// create is-run-container bitmap
isRunBitmapSize := (int(size) + 7) / 8
isRunBitmap, err = stream.Next(isRunBitmapSize)
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("malformed bitmap, failed to read is-run bitmap, got: %s", err)
}
} else if cookie == serialCookieNoRunContainer {
size, err = stream.ReadUInt32()
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("malformed bitmap, failed to read a bitmap size: %s", err)
}
} else {
return stream.GetReadBytes(), fmt.Errorf("error in roaringArray.readFrom: did not find expected serialCookie in header")
}
if size > (1 << 16) {
return stream.GetReadBytes(), fmt.Errorf("it is logically impossible to have more than (1<<16) containers")
}
// descriptive header
buf, err := stream.Next(2 * 2 * int(size))
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("failed to read descriptive header: %s", err)
}
keycard := byteSliceAsUint16Slice(buf)
if isRunBitmap == nil || size >= noOffsetThreshold {
if err := stream.SkipBytes(int(size) * 4); err != nil {
return stream.GetReadBytes(), fmt.Errorf("failed to skip bytes: %s", err)
}
}
// Allocate slices upfront as number of containers is known
if cap(ra.containers) >= int(size) {
ra.containers = ra.containers[:size]
} else {
ra.containers = make([]container, size)
}
if cap(ra.keys) >= int(size) {
ra.keys = ra.keys[:size]
} else {
ra.keys = make([]uint16, size)
}
if cap(ra.needCopyOnWrite) >= int(size) {
ra.needCopyOnWrite = ra.needCopyOnWrite[:size]
} else {
ra.needCopyOnWrite = make([]bool, size)
}
for i := uint32(0); i < size; i++ {
key := keycard[2*i]
card := int(keycard[2*i+1]) + 1
ra.keys[i] = key
ra.needCopyOnWrite[i] = willNeedCopyOnWrite
if isRunBitmap != nil && isRunBitmap[i/8]&(1<<(i%8)) != 0 {
// run container
nr, err := stream.ReadUInt16()
if err != nil {
return 0, fmt.Errorf("failed to read runtime container size: %s", err)
}
buf, err := stream.Next(int(nr) * 4)
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("failed to read runtime container content: %s", err)
}
nb := runContainer16{
iv: byteSliceAsInterval16Slice(buf),
}
ra.containers[i] = &nb
} else if card > arrayDefaultMaxSize {
// bitmap container
buf, err := stream.Next(arrayDefaultMaxSize * 2)
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("failed to read bitmap container: %s", err)
}
nb := bitmapContainer{
cardinality: card,
bitmap: byteSliceAsUint64Slice(buf),
}
ra.containers[i] = &nb
} else {
// array container
buf, err := stream.Next(card * 2)
if err != nil {
return stream.GetReadBytes(), fmt.Errorf("failed to read array container: %s", err)
}
nb := arrayContainer{
byteSliceAsUint16Slice(buf),
}
ra.containers[i] = &nb
}
}
return stream.GetReadBytes(), nil
}
func (ra *roaringArray) hasRunCompression() bool {
for _, c := range ra.containers {
switch c.(type) {
case *runContainer16:
return true
}
}
return false
}
/**
* Find the smallest integer index larger than pos such that array[index].key >= min. If none can
* be found, return size. Based on code by O. Kaser.
*
* @param min minimal value
* @param pos index to exceed
* @return the smallest index greater than pos such that array[index].key is at least as large as
* min, or size if it is not possible.
*/
func (ra *roaringArray) advanceUntil(min uint16, pos int) int {
lower := pos + 1
if lower >= len(ra.keys) || ra.keys[lower] >= min {
return lower
}
spansize := 1
for lower+spansize < len(ra.keys) && ra.keys[lower+spansize] < min {
spansize *= 2
}
var upper int
if lower+spansize < len(ra.keys) {
upper = lower + spansize
} else {
upper = len(ra.keys) - 1
}
if ra.keys[upper] == min {
return upper
}
if ra.keys[upper] < min {
// means the array has no item >= min (pos = array.length)
return len(ra.keys)
}
// we know that the next-smallest span was too small
lower += (spansize >> 1)
mid := 0
for lower+1 != upper {
mid = (lower + upper) >> 1
if ra.keys[mid] == min {
return mid
} else if ra.keys[mid] < min {
lower = mid
} else {
upper = mid
}
}
return upper
}
func (ra *roaringArray) markAllAsNeedingCopyOnWrite() {
for i := range ra.needCopyOnWrite {
ra.needCopyOnWrite[i] = true
}
}
func (ra *roaringArray) needsCopyOnWrite(i int) bool {
return ra.needCopyOnWrite[i]
}
func (ra *roaringArray) setNeedsCopyOnWrite(i int) {
ra.needCopyOnWrite[i] = true
}
func (ra *roaringArray) checkKeysSorted() bool {
if len(ra.keys) == 0 || len(ra.keys) == 1 {
return true
}
previous := ra.keys[0]
for nextIdx := 1; nextIdx < len(ra.keys); nextIdx++ {
next := ra.keys[nextIdx]
if previous >= next {
return false
}
previous = next
}
return true
}
// validate checks the referential integrity
// ensures len(keys) == len(containers), recurses and checks each container type
func (ra *roaringArray) validate() error {
if !ra.checkKeysSorted() {
return ErrKeySortOrder
}
if len(ra.keys) != len(ra.containers) {
return ErrCardinalityConstraint
}
if len(ra.keys) != len(ra.needCopyOnWrite) {
return ErrCardinalityConstraint
}
for _, container := range ra.containers {
err := container.validate()
if err != nil {
return err
}
}
return nil
}

File diff suppressed because it is too large.


@@ -0,0 +1,18 @@
package roaring
import (
"encoding/binary"
"io"
)
// writeTo for runContainer16 follows this
// spec: https://github.com/RoaringBitmap/RoaringFormatSpec
func (b *runContainer16) writeTo(stream io.Writer) (int, error) {
buf := make([]byte, 2+4*len(b.iv))
binary.LittleEndian.PutUint16(buf[0:], uint16(len(b.iv)))
for i, v := range b.iv {
binary.LittleEndian.PutUint16(buf[2+i*4:], v.start)
binary.LittleEndian.PutUint16(buf[2+2+i*4:], v.length)
}
return stream.Write(buf)
}


@@ -0,0 +1,145 @@
//go:build (!amd64 && !386 && !arm && !arm64 && !ppc64le && !mipsle && !mips64le && !mips64p32le && !wasm) || appengine
// +build !amd64,!386,!arm,!arm64,!ppc64le,!mipsle,!mips64le,!mips64p32le,!wasm appengine
package roaring
import (
"encoding/binary"
"errors"
"io"
)
func (b *arrayContainer) writeTo(stream io.Writer) (int, error) {
buf := make([]byte, 2*len(b.content))
for i, v := range b.content {
base := i * 2
buf[base] = byte(v)
buf[base+1] = byte(v >> 8)
}
return stream.Write(buf)
}
func (b *arrayContainer) readFrom(stream io.Reader) (int, error) {
err := binary.Read(stream, binary.LittleEndian, b.content)
if err != nil {
return 0, err
}
return 2 * len(b.content), nil
}
func (b *bitmapContainer) writeTo(stream io.Writer) (int, error) {
if b.cardinality <= arrayDefaultMaxSize {
return 0, errors.New("refusing to write bitmap container with cardinality of array container")
}
// Write set
buf := make([]byte, 8*len(b.bitmap))
for i, v := range b.bitmap {
base := i * 8
buf[base] = byte(v)
buf[base+1] = byte(v >> 8)
buf[base+2] = byte(v >> 16)
buf[base+3] = byte(v >> 24)
buf[base+4] = byte(v >> 32)
buf[base+5] = byte(v >> 40)
buf[base+6] = byte(v >> 48)
buf[base+7] = byte(v >> 56)
}
return stream.Write(buf)
}
func (b *bitmapContainer) readFrom(stream io.Reader) (int, error) {
err := binary.Read(stream, binary.LittleEndian, b.bitmap)
if err != nil {
return 0, err
}
b.computeCardinality()
return 8 * len(b.bitmap), nil
}
func (bc *bitmapContainer) asLittleEndianByteSlice() []byte {
by := make([]byte, len(bc.bitmap)*8)
for i := range bc.bitmap {
binary.LittleEndian.PutUint64(by[i*8:], bc.bitmap[i])
}
return by
}
func uint64SliceAsByteSlice(slice []uint64) []byte {
by := make([]byte, len(slice)*8)
for i, v := range slice {
binary.LittleEndian.PutUint64(by[i*8:], v)
}
return by
}
func uint16SliceAsByteSlice(slice []uint16) []byte {
by := make([]byte, len(slice)*2)
for i, v := range slice {
binary.LittleEndian.PutUint16(by[i*2:], v)
}
return by
}
func interval16SliceAsByteSlice(slice []interval16) []byte {
by := make([]byte, len(slice)*4)
for i, v := range slice {
binary.LittleEndian.PutUint16(by[i*2:], v.start)
binary.LittleEndian.PutUint16(by[i*2+2:], v.length)
}
return by
}
func byteSliceAsUint16Slice(slice []byte) []uint16 {
if len(slice)%2 != 0 {
panic("Slice size should be divisible by 2")
}
b := make([]uint16, len(slice)/2)
for i := range b {
b[i] = binary.LittleEndian.Uint16(slice[2*i:])
}
return b
}
func byteSliceAsUint64Slice(slice []byte) []uint64 {
if len(slice)%8 != 0 {
panic("Slice size should be divisible by 8")
}
b := make([]uint64, len(slice)/8)
for i := range b {
b[i] = binary.LittleEndian.Uint64(slice[8*i:])
}
return b
}
// byteSliceAsInterval16Slice converts a byte slice to an interval16 slice.
// The function assumes that the byte buffer is run container data
// encoded according to the Roaring Format Spec
func byteSliceAsInterval16Slice(byteSlice []byte) []interval16 {
if len(byteSlice)%4 != 0 {
panic("Slice size should be divisible by 4")
}
intervalSlice := make([]interval16, len(byteSlice)/4)
for i := range intervalSlice {
intervalSlice[i] = interval16{
start: binary.LittleEndian.Uint16(byteSlice[i*4:]),
length: binary.LittleEndian.Uint16(byteSlice[i*4+2:]),
}
}
return intervalSlice
}


@@ -0,0 +1,671 @@
//go:build (386 && !appengine) || (amd64 && !appengine) || (arm && !appengine) || (arm64 && !appengine) || (ppc64le && !appengine) || (mipsle && !appengine) || (mips64le && !appengine) || (mips64p32le && !appengine) || (wasm && !appengine)
// +build 386,!appengine amd64,!appengine arm,!appengine arm64,!appengine ppc64le,!appengine mipsle,!appengine mips64le,!appengine mips64p32le,!appengine wasm,!appengine
package roaring
import (
"encoding/binary"
"errors"
"io"
"reflect"
"runtime"
"unsafe"
)
func (ac *arrayContainer) writeTo(stream io.Writer) (int, error) {
buf := uint16SliceAsByteSlice(ac.content)
return stream.Write(buf)
}
func (bc *bitmapContainer) writeTo(stream io.Writer) (int, error) {
if bc.cardinality <= arrayDefaultMaxSize {
return 0, errors.New("refusing to write bitmap container with cardinality of array container")
}
buf := uint64SliceAsByteSlice(bc.bitmap)
return stream.Write(buf)
}
func uint64SliceAsByteSlice(slice []uint64) []byte {
// make a new slice header
header := *(*reflect.SliceHeader)(unsafe.Pointer(&slice))
// update its capacity and length
header.Len *= 8
header.Cap *= 8
// instantiate result and use KeepAlive so data isn't unmapped.
result := *(*[]byte)(unsafe.Pointer(&header))
runtime.KeepAlive(&slice)
// return it
return result
}
func uint16SliceAsByteSlice(slice []uint16) []byte {
// make a new slice header
header := *(*reflect.SliceHeader)(unsafe.Pointer(&slice))
// update its capacity and length
header.Len *= 2
header.Cap *= 2
// instantiate result and use KeepAlive so data isn't unmapped.
result := *(*[]byte)(unsafe.Pointer(&header))
runtime.KeepAlive(&slice)
// return it
return result
}
func interval16SliceAsByteSlice(slice []interval16) []byte {
// make a new slice header
header := *(*reflect.SliceHeader)(unsafe.Pointer(&slice))
// update its capacity and length
header.Len *= 4
header.Cap *= 4
// instantiate result and use KeepAlive so data isn't unmapped.
result := *(*[]byte)(unsafe.Pointer(&header))
runtime.KeepAlive(&slice)
// return it
return result
}
func (bc *bitmapContainer) asLittleEndianByteSlice() []byte {
return uint64SliceAsByteSlice(bc.bitmap)
}
// Deserialization code follows
//
// These methods (byteSliceAsUint16Slice, ...) do not make copies;
// they are pointer-based (unsafe). The caller is responsible for
// ensuring that the input slice does not get garbage collected, deleted
// or modified while holding the returned slice.
//
func byteSliceAsUint16Slice(slice []byte) (result []uint16) { // here we create a new slice holder
if len(slice)%2 != 0 {
panic("Slice size should be divisible by 2")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / 2
rHeader.Cap = bHeader.Cap / 2
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
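
These reflect.SliceHeader reinterpretations predate Go 1.17; reflect.SliceHeader is deprecated in recent Go releases, and the same zero-copy aliasing is expressible with unsafe.Slice. A sketch of the modern equivalent (an alternative, not what this vendored file does):

```go
package main

import (
	"fmt"
	"unsafe"
)

// asUint16 reinterprets b as a []uint16 without copying, like
// byteSliceAsUint16Slice, but via unsafe.Slice (Go 1.17+).
// The caller must keep b alive and unmodified while the result is in use.
func asUint16(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice size should be divisible by 2")
	}
	if len(b) == 0 {
		return nil
	}
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	fmt.Println(asUint16([]byte{0x01, 0x00, 0xff, 0x00})) // [1 255] on little-endian hosts
}
```

Like the vendored file (which is build-tagged to little-endian architectures), this relies on host byte order matching the serialized little-endian layout.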
func byteSliceAsUint64Slice(slice []byte) (result []uint64) {
if len(slice)%8 != 0 {
panic("Slice size should be divisible by 8")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / 8
rHeader.Cap = bHeader.Cap / 8
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsInterval16Slice(slice []byte) (result []interval16) {
if len(slice)%4 != 0 {
panic("Slice size should be divisible by 4")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / 4
rHeader.Cap = bHeader.Cap / 4
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsContainerSlice(slice []byte) (result []container) {
var c container
containerSize := int(unsafe.Sizeof(c))
if len(slice)%containerSize != 0 {
panic("Slice size should be divisible by unsafe.Sizeof(container)")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / containerSize
rHeader.Cap = bHeader.Cap / containerSize
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsBitsetSlice(slice []byte) (result []bitmapContainer) {
bitsetSize := int(unsafe.Sizeof(bitmapContainer{}))
if len(slice)%bitsetSize != 0 {
panic("Slice size should be divisible by unsafe.Sizeof(bitmapContainer)")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / bitsetSize
rHeader.Cap = bHeader.Cap / bitsetSize
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsArraySlice(slice []byte) (result []arrayContainer) {
arraySize := int(unsafe.Sizeof(arrayContainer{}))
if len(slice)%arraySize != 0 {
panic("Slice size should be divisible by unsafe.Sizeof(arrayContainer)")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / arraySize
rHeader.Cap = bHeader.Cap / arraySize
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsRun16Slice(slice []byte) (result []runContainer16) {
run16Size := int(unsafe.Sizeof(runContainer16{}))
if len(slice)%run16Size != 0 {
panic("Slice size should be divisible by unsafe.Sizeof(runContainer16)")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / run16Size
rHeader.Cap = bHeader.Cap / run16Size
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
func byteSliceAsBoolSlice(slice []byte) (result []bool) {
boolSize := int(unsafe.Sizeof(true))
if len(slice)%boolSize != 0 {
panic("Slice size should be divisible by unsafe.Sizeof(bool)")
}
// reference: https://go101.org/article/unsafe.html
// make a new slice header
bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))
// transfer the data from the given slice to a new variable (our result)
rHeader.Data = bHeader.Data
rHeader.Len = bHeader.Len / boolSize
rHeader.Cap = bHeader.Cap / boolSize
// instantiate result and use KeepAlive so data isn't unmapped.
runtime.KeepAlive(&slice) // still crucial: otherwise the GC could free the backing data
// return result
return
}
// FrozenView creates a static view of a serialized bitmap stored in buf.
// It uses CRoaring's frozen bitmap format.
//
// The format specification is available here:
// https://github.com/RoaringBitmap/CRoaring/blob/2c867e9f9c9e2a3a7032791f94c4c7ae3013f6e0/src/roaring.c#L2756-L2783
//
// The provided byte array (buf) is expected to be a constant.
// The function makes a best-effort attempt not to copy data.
// Only little endian is supported. The function will return an error if it
// detects a big endian serialized file.
// You should take care not to modify buf as it will likely result in
// unexpected program behavior.
// If said buffer comes from a memory map, it's advisable to give it read
// only permissions, either at creation or by calling Mprotect from the
// golang.org/x/sys/unix package.
//
// Resulting bitmaps are effectively immutable in the following sense:
// a copy-on-write marker is used so that when you modify the resulting
// bitmap, copies of selected data (containers) are made.
// You should *not* change the copy-on-write status of the resulting
// bitmaps (SetCopyOnWrite).
//
// If buf becomes unavailable, then a bitmap created with
// FrozenView would be effectively broken. Furthermore, any
// bitmap derived from this bitmap (e.g., via Or, And) might
// also be broken. Thus, before making buf unavailable, you should
// call CloneCopyOnWriteContainers on all such bitmaps.
func (rb *Bitmap) FrozenView(buf []byte) error {
return rb.highlowcontainer.frozenView(buf)
}
func (rb *Bitmap) MustFrozenView(buf []byte) error {
if err := rb.FrozenView(buf); err != nil {
return err
}
err := rb.Validate()
return err
}
/* Verbatim specification from CRoaring.
*
* FROZEN SERIALIZATION FORMAT DESCRIPTION
*
* -- (beginning must be aligned by 32 bytes) --
* <bitset_data> uint64_t[BITSET_CONTAINER_SIZE_IN_WORDS * num_bitset_containers]
* <run_data> rle16_t[total number of rle elements in all run containers]
* <array_data> uint16_t[total number of array elements in all array containers]
* <keys> uint16_t[num_containers]
* <counts> uint16_t[num_containers]
* <typecodes> uint8_t[num_containers]
* <header> uint32_t
*
* <header> is a 4-byte value which is a bit union of frozenCookie (15 bits)
* and the number of containers (17 bits).
*
* <counts> stores number of elements for every container.
* Its meaning depends on container type.
* For array and bitset containers, this value is the container cardinality minus one.
* For run container, it is the number of rle_t elements (n_runs).
*
* <bitset_data>,<array_data>,<run_data> are flat arrays of elements of
* all containers of respective type.
*
* <*_data> and <keys> are kept close together because they are not accessed
* during deserialization. This may reduce IO in case of large mmaped bitmaps.
* All members have their native alignments during deserialization except <header>,
* which is not guaranteed to be aligned by 4 bytes.
*/
const frozenCookie = 13766
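
The 4-byte trailer packs the 15-bit frozenCookie with a 17-bit container count: header = cookie | nCont<<15, undone in frozenView below via header&0x7fff and header>>15. A quick check of the packing:

```go
package main

import "fmt"

func main() {
	const frozenCookie = 13766
	nCont := 40000 // container counts up to 1<<16 fit in the 17 high bits
	header := uint32(frozenCookie | nCont<<15)
	fmt.Println(header&0x7fff == frozenCookie) // true: low 15 bits are the cookie
	fmt.Println(header >> 15)                  // 40000: high 17 bits are the count
}
```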
var (
// ErrFrozenBitmapInvalidCookie is returned when the header does not contain the frozenCookie.
ErrFrozenBitmapInvalidCookie = errors.New("header does not contain the frozenCookie")
// ErrFrozenBitmapBigEndian is returned when the header is big endian.
ErrFrozenBitmapBigEndian = errors.New("loading big endian frozen bitmaps is not supported")
// ErrFrozenBitmapIncomplete is returned when the buffer is too small to contain a frozen bitmap.
ErrFrozenBitmapIncomplete = errors.New("input buffer too small to contain a frozen bitmap")
// ErrFrozenBitmapOverpopulated is returned when the number of containers is too large.
ErrFrozenBitmapOverpopulated = errors.New("too many containers")
// ErrFrozenBitmapUnexpectedData is returned when the buffer contains unexpected data.
ErrFrozenBitmapUnexpectedData = errors.New("spurious data in input")
// ErrFrozenBitmapInvalidTypecode is returned when the typecode is invalid.
ErrFrozenBitmapInvalidTypecode = errors.New("unrecognized typecode")
// ErrFrozenBitmapBufferTooSmall is returned when the buffer is too small.
ErrFrozenBitmapBufferTooSmall = errors.New("buffer too small")
)
func (ra *roaringArray) frozenView(buf []byte) error {
if len(buf) < 4 {
return ErrFrozenBitmapIncomplete
}
headerBE := binary.BigEndian.Uint32(buf[len(buf)-4:])
if headerBE&0x7fff == frozenCookie {
return ErrFrozenBitmapBigEndian
}
header := binary.LittleEndian.Uint32(buf[len(buf)-4:])
buf = buf[:len(buf)-4]
if header&0x7fff != frozenCookie {
return ErrFrozenBitmapInvalidCookie
}
nCont := int(header >> 15)
if nCont > (1 << 16) {
return ErrFrozenBitmapOverpopulated
}
// 1 byte per type, 2 bytes per key, 2 bytes per count.
if len(buf) < 5*nCont {
return ErrFrozenBitmapIncomplete
}
types := buf[len(buf)-nCont:]
buf = buf[:len(buf)-nCont]
counts := byteSliceAsUint16Slice(buf[len(buf)-2*nCont:])
buf = buf[:len(buf)-2*nCont]
keys := byteSliceAsUint16Slice(buf[len(buf)-2*nCont:])
buf = buf[:len(buf)-2*nCont]
nBitmap, nArray, nRun := 0, 0, 0
nArrayEl, nRunEl := 0, 0
for i, t := range types {
switch t {
case 1:
nBitmap++
case 2:
nArray++
nArrayEl += int(counts[i]) + 1
case 3:
nRun++
nRunEl += int(counts[i])
default:
return ErrFrozenBitmapInvalidTypecode
}
}
if len(buf) < (1<<13)*nBitmap+4*nRunEl+2*nArrayEl {
return ErrFrozenBitmapIncomplete
}
bitsetsArena := byteSliceAsUint64Slice(buf[:(1<<13)*nBitmap])
buf = buf[(1<<13)*nBitmap:]
runsArena := byteSliceAsInterval16Slice(buf[:4*nRunEl])
buf = buf[4*nRunEl:]
arraysArena := byteSliceAsUint16Slice(buf[:2*nArrayEl])
buf = buf[2*nArrayEl:]
if len(buf) != 0 {
return ErrFrozenBitmapUnexpectedData
}
var c container
containersSz := int(unsafe.Sizeof(c)) * nCont
bitsetsSz := int(unsafe.Sizeof(bitmapContainer{})) * nBitmap
arraysSz := int(unsafe.Sizeof(arrayContainer{})) * nArray
runsSz := int(unsafe.Sizeof(runContainer16{})) * nRun
needCOWSz := int(unsafe.Sizeof(true)) * nCont
bitmapArenaSz := containersSz + bitsetsSz + arraysSz + runsSz + needCOWSz
bitmapArena := make([]byte, bitmapArenaSz)
containers := byteSliceAsContainerSlice(bitmapArena[:containersSz])
bitmapArena = bitmapArena[containersSz:]
bitsets := byteSliceAsBitsetSlice(bitmapArena[:bitsetsSz])
bitmapArena = bitmapArena[bitsetsSz:]
arrays := byteSliceAsArraySlice(bitmapArena[:arraysSz])
bitmapArena = bitmapArena[arraysSz:]
runs := byteSliceAsRun16Slice(bitmapArena[:runsSz])
bitmapArena = bitmapArena[runsSz:]
needCOW := byteSliceAsBoolSlice(bitmapArena)
iBitset, iArray, iRun := 0, 0, 0
for i, t := range types {
needCOW[i] = true
switch t {
case 1:
containers[i] = &bitsets[iBitset]
bitsets[iBitset].cardinality = int(counts[i]) + 1
bitsets[iBitset].bitmap = bitsetsArena[:1024]
bitsetsArena = bitsetsArena[1024:]
iBitset++
case 2:
containers[i] = &arrays[iArray]
sz := int(counts[i]) + 1
arrays[iArray].content = arraysArena[:sz]
arraysArena = arraysArena[sz:]
iArray++
case 3:
containers[i] = &runs[iRun]
runs[iRun].iv = runsArena[:counts[i]]
runsArena = runsArena[counts[i]:]
iRun++
}
}
// Not consuming the full input is a bug.
if iBitset != nBitmap || len(bitsetsArena) != 0 ||
iArray != nArray || len(arraysArena) != 0 ||
iRun != nRun || len(runsArena) != 0 {
panic("we missed something")
}
ra.keys = keys
ra.containers = containers
ra.needCopyOnWrite = needCOW
ra.copyOnWrite = true
return nil
}
// GetFrozenSizeInBytes returns the size in bytes of the frozen bitmap.
func (rb *Bitmap) GetFrozenSizeInBytes() uint64 {
nBits, nArrayEl, nRunEl := uint64(0), uint64(0), uint64(0)
for _, c := range rb.highlowcontainer.containers {
switch v := c.(type) {
case *bitmapContainer:
nBits++
case *arrayContainer:
nArrayEl += uint64(len(v.content))
case *runContainer16:
nRunEl += uint64(len(v.iv))
}
}
return 4 + 5*uint64(len(rb.highlowcontainer.containers)) +
(nBits << 13) + 2*nArrayEl + 4*nRunEl
}
// Freeze serializes the bitmap in CRoaring's frozen format.
func (rb *Bitmap) Freeze() ([]byte, error) {
sz := rb.GetFrozenSizeInBytes()
buf := make([]byte, sz)
_, err := rb.FreezeTo(buf)
return buf, err
}
// FreezeTo serializes the bitmap in CRoaring's frozen format.
func (rb *Bitmap) FreezeTo(buf []byte) (int, error) {
containers := rb.highlowcontainer.containers
nCont := len(containers)
nBits, nArrayEl, nRunEl := 0, 0, 0
for _, c := range containers {
switch v := c.(type) {
case *bitmapContainer:
nBits++
case *arrayContainer:
nArrayEl += len(v.content)
case *runContainer16:
nRunEl += len(v.iv)
}
}
serialSize := 4 + 5*nCont + (1<<13)*nBits + 4*nRunEl + 2*nArrayEl
if len(buf) < serialSize {
return 0, ErrFrozenBitmapBufferTooSmall
}
bitsArena := byteSliceAsUint64Slice(buf[:(1<<13)*nBits])
buf = buf[(1<<13)*nBits:]
runsArena := byteSliceAsInterval16Slice(buf[:4*nRunEl])
buf = buf[4*nRunEl:]
arraysArena := byteSliceAsUint16Slice(buf[:2*nArrayEl])
buf = buf[2*nArrayEl:]
keys := byteSliceAsUint16Slice(buf[:2*nCont])
buf = buf[2*nCont:]
counts := byteSliceAsUint16Slice(buf[:2*nCont])
buf = buf[2*nCont:]
types := buf[:nCont]
buf = buf[nCont:]
header := uint32(frozenCookie | (nCont << 15))
binary.LittleEndian.PutUint32(buf[:4], header)
copy(keys, rb.highlowcontainer.keys[:])
for i, c := range containers {
switch v := c.(type) {
case *bitmapContainer:
copy(bitsArena, v.bitmap)
bitsArena = bitsArena[1024:]
counts[i] = uint16(v.cardinality - 1)
types[i] = 1
case *arrayContainer:
copy(arraysArena, v.content)
arraysArena = arraysArena[len(v.content):]
elems := len(v.content)
counts[i] = uint16(elems - 1)
types[i] = 2
case *runContainer16:
copy(runsArena, v.iv)
runs := len(v.iv)
runsArena = runsArena[runs:]
counts[i] = uint16(runs)
types[i] = 3
}
}
return serialSize, nil
}
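
Freeze/FreezeTo and FrozenView round-trip the frozen format; a minimal sketch using only the exported API shown in this file (the frozen buffer must stay live and unmodified while the view is in use, per the FrozenView comment above):

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2"
)

func main() {
	rb := roaring.BitmapOf(1, 2, 1<<20)
	frozen, err := rb.Freeze() // allocates GetFrozenSizeInBytes() bytes and calls FreezeTo
	if err != nil {
		panic(err)
	}
	view := roaring.New()
	if err := view.FrozenView(frozen); err != nil { // copy-on-write view over frozen
		panic(err)
	}
	fmt.Println(view.Contains(1 << 20))                           // true
	fmt.Println(uint64(len(frozen)) == rb.GetFrozenSizeInBytes()) // true
}
```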
// WriteFrozenTo serializes the bitmap in CRoaring's frozen format.
func (rb *Bitmap) WriteFrozenTo(wr io.Writer) (int, error) {
// FIXME: this is a naive version that iterates 4 times through the
// containers and allocates 3*len(containers) bytes; it's quite likely
// it can be done more efficiently.
containers := rb.highlowcontainer.containers
written := 0
for _, c := range containers {
c, ok := c.(*bitmapContainer)
if !ok {
continue
}
n, err := wr.Write(uint64SliceAsByteSlice(c.bitmap))
written += n
if err != nil {
return written, err
}
}
for _, c := range containers {
c, ok := c.(*runContainer16)
if !ok {
continue
}
n, err := wr.Write(interval16SliceAsByteSlice(c.iv))
written += n
if err != nil {
return written, err
}
}
for _, c := range containers {
c, ok := c.(*arrayContainer)
if !ok {
continue
}
n, err := wr.Write(uint16SliceAsByteSlice(c.content))
written += n
if err != nil {
return written, err
}
}
n, err := wr.Write(uint16SliceAsByteSlice(rb.highlowcontainer.keys))
written += n
if err != nil {
return written, err
}
countTypeBuf := make([]byte, 3*len(containers))
counts := byteSliceAsUint16Slice(countTypeBuf[:2*len(containers)])
types := countTypeBuf[2*len(containers):]
for i, c := range containers {
switch c := c.(type) {
case *bitmapContainer:
counts[i] = uint16(c.cardinality - 1)
types[i] = 1
case *arrayContainer:
elems := len(c.content)
counts[i] = uint16(elems - 1)
types[i] = 2
case *runContainer16:
runs := len(c.iv)
counts[i] = uint16(runs)
types[i] = 3
}
}
n, err = wr.Write(countTypeBuf)
written += n
if err != nil {
return written, err
}
header := uint32(frozenCookie | (len(containers) << 15))
if err := binary.Write(wr, binary.LittleEndian, header); err != nil {
return written, err
}
written += 4
return written, nil
}


@@ -0,0 +1,22 @@
//go:build gofuzz
// +build gofuzz
package roaring
import "bytes"
func FuzzSerializationStream(data []byte) int {
newrb := NewBitmap()
if _, err := newrb.ReadFrom(bytes.NewReader(data)); err != nil {
return 0
}
return 1
}
func FuzzSerializationBuffer(data []byte) int {
newrb := NewBitmap()
if _, err := newrb.FromBuffer(data); err != nil {
return 0
}
return 1
}

vendor/github.com/RoaringBitmap/roaring/v2/setutil.go (generated, vendored)

@@ -0,0 +1,665 @@
package roaring
func difference(set1 []uint16, set2 []uint16, buffer []uint16) int {
if len(set2) == 0 {
buffer = buffer[:len(set1)]
copy(buffer, set1)
return len(set1)
}
if len(set1) == 0 {
return 0
}
pos := 0
k1 := 0
k2 := 0
buffer = buffer[:cap(buffer)]
s1 := set1[k1]
s2 := set2[k2]
for {
if s1 < s2 {
buffer[pos] = s1
pos++
k1++
if k1 >= len(set1) {
break
}
s1 = set1[k1]
} else if s1 == s2 {
k1++
k2++
if k1 >= len(set1) {
break
}
s1 = set1[k1]
if k2 >= len(set2) {
for ; k1 < len(set1); k1++ {
buffer[pos] = set1[k1]
pos++
}
break
}
s2 = set2[k2]
} else { // if (val1>val2)
k2++
if k2 >= len(set2) {
for ; k1 < len(set1); k1++ {
buffer[pos] = set1[k1]
pos++
}
break
}
s2 = set2[k2]
}
}
return pos
}
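
difference writes set1 \ set2 into buffer and returns the number of elements written (the caller must supply a buffer with capacity for at least len(set1)). Through the public API the same routine is exercised by AndNot when both sides are array containers; an illustrative round trip:

```go
package main

import (
	"fmt"

	"github.com/RoaringBitmap/roaring/v2"
)

func main() {
	a := roaring.BitmapOf(1, 2, 3, 5)
	b := roaring.BitmapOf(2, 5, 9)
	fmt.Println(roaring.AndNot(a, b).ToArray()) // [1 3]: elements of a not in b
}
```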
func exclusiveUnion2by2(set1 []uint16, set2 []uint16, buffer []uint16) int {
if 0 == len(set2) {
buffer = buffer[:len(set1)]
copy(buffer, set1[:])
return len(set1)
}
if 0 == len(set1) {
buffer = buffer[:len(set2)]
copy(buffer, set2[:])
return len(set2)
}
pos := 0
k1 := 0
k2 := 0
s1 := set1[k1]
s2 := set2[k2]
buffer = buffer[:cap(buffer)]
for {
if s1 < s2 {
buffer[pos] = s1
pos++
k1++
if k1 >= len(set1) {
for ; k2 < len(set2); k2++ {
buffer[pos] = set2[k2]
pos++
}
break
}
s1 = set1[k1]
} else if s1 == s2 {
k1++
k2++
if k1 >= len(set1) {
for ; k2 < len(set2); k2++ {
buffer[pos] = set2[k2]
pos++
}
break
}
if k2 >= len(set2) {
for ; k1 < len(set1); k1++ {
buffer[pos] = set1[k1]
pos++
}
break
}
s1 = set1[k1]
s2 = set2[k2]
} else { // if (val1>val2)
buffer[pos] = s2
pos++
k2++
if k2 >= len(set2) {
for ; k1 < len(set1); k1++ {
buffer[pos] = set1[k1]
pos++
}
break
}
s2 = set2[k2]
}
}
return pos
}
// union2by2Cardinality computes the cardinality of the union
func union2by2Cardinality(set1 []uint16, set2 []uint16) int {
pos := 0
k1 := 0
k2 := 0
if 0 == len(set2) {
return len(set1)
}
if 0 == len(set1) {
return len(set2)
}
s1 := set1[k1]
s2 := set2[k2]
for {
if s1 < s2 {
pos++
k1++
if k1 >= len(set1) {
pos += len(set2) - k2
break
}
s1 = set1[k1]
} else if s1 == s2 {
pos++
k1++
k2++
if k1 >= len(set1) {
pos += len(set2) - k2
break
}
if k2 >= len(set2) {
pos += len(set1) - k1
break
}
s1 = set1[k1]
s2 = set2[k2]
} else { // if (set1[k1]>set2[k2])
pos++
k2++
if k2 >= len(set2) {
pos += len(set1) - k1
break
}
s2 = set2[k2]
}
}
return pos
}
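// intersection2by2 intersects two sorted uint16 slices into buffer. When one set
// is more than 64 times smaller than the other, a one-sided galloping search is
// used; otherwise a plain linear merge.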
func intersection2by2(
set1 []uint16,
set2 []uint16,
buffer []uint16,
) int {
if len(set1)*64 < len(set2) {
return onesidedgallopingintersect2by2(set1, set2, buffer)
} else if len(set2)*64 < len(set1) {
return onesidedgallopingintersect2by2(set2, set1, buffer)
} else {
return localintersect2by2(set1, set2, buffer)
}
}
// intersection2by2Cardinality computes the cardinality of the intersection
func intersection2by2Cardinality(
set1 []uint16,
set2 []uint16,
) int {
if len(set1)*64 < len(set2) {
return onesidedgallopingintersect2by2Cardinality(set1, set2)
} else if len(set2)*64 < len(set1) {
return onesidedgallopingintersect2by2Cardinality(set2, set1)
} else {
return localintersect2by2Cardinality(set1, set2)
}
}
// intersects2by2 computes whether the two sets intersect
func intersects2by2(
set1 []uint16,
set2 []uint16,
) bool {
// could be optimized if one set is much larger than the other one
if (len(set1) == 0) || (len(set2) == 0) {
return false
}
index1 := 0
index2 := 0
value1 := set1[index1]
value2 := set2[index2]
mainwhile:
for {
if value2 < value1 {
for {
index2++
if index2 == len(set2) {
break mainwhile
}
value2 = set2[index2]
if value2 >= value1 {
break
}
}
}
if value1 < value2 {
for {
index1++
if index1 == len(set1) {
break mainwhile
}
value1 = set1[index1]
if value1 >= value2 {
break
}
}
} else {
// (set2[k2] == set1[k1])
return true
}
}
return false
}
func localintersect2by2(
set1 []uint16,
set2 []uint16,
buffer []uint16,
) int {
if (len(set1) == 0) || (len(set2) == 0) {
return 0
}
k1 := 0
k2 := 0
pos := 0
buffer = buffer[:cap(buffer)]
s1 := set1[k1]
s2 := set2[k2]
mainwhile:
for {
if s2 < s1 {
for {
k2++
if k2 == len(set2) {
break mainwhile
}
s2 = set2[k2]
if s2 >= s1 {
break
}
}
}
if s1 < s2 {
for {
k1++
if k1 == len(set1) {
break mainwhile
}
s1 = set1[k1]
if s1 >= s2 {
break
}
}
} else {
// (set2[k2] == set1[k1])
buffer[pos] = s1
pos++
k1++
if k1 == len(set1) {
break
}
s1 = set1[k1]
k2++
if k2 == len(set2) {
break
}
s2 = set2[k2]
}
}
return pos
}
// localintersect2by2Cardinality computes the cardinality of the intersection
func localintersect2by2Cardinality(
set1 []uint16,
set2 []uint16,
) int {
if (len(set1) == 0) || (len(set2) == 0) {
return 0
}
index1 := 0
index2 := 0
pos := 0
value1 := set1[index1]
value2 := set2[index2]
mainwhile:
for {
if value2 < value1 {
for {
index2++
if index2 == len(set2) {
break mainwhile
}
value2 = set2[index2]
if value2 >= value1 {
break
}
}
}
if value1 < value2 {
for {
index1++
if index1 == len(set1) {
break mainwhile
}
value1 = set1[index1]
if value1 >= value2 {
break
}
}
} else {
// (set2[k2] == set1[k1])
pos++
index1++
if index1 == len(set1) {
break
}
value1 = set1[index1]
index2++
if index2 == len(set2) {
break
}
value2 = set2[index2]
}
}
return pos
}
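// advanceUntil returns the index of the first element strictly after pos that is
// >= min, using exponential (galloping) search followed by binary search. It
// returns length when no such element exists.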
func advanceUntil(
array []uint16,
pos int,
length int,
min uint16,
) int {
lower := pos + 1
if lower >= length || array[lower] >= min {
return lower
}
spansize := 1
for lower+spansize < length && array[lower+spansize] < min {
spansize *= 2
}
var upper int
if lower+spansize < length {
upper = lower + spansize
} else {
upper = length - 1
}
if array[upper] == min {
return upper
}
if array[upper] < min {
// the array has no item >= min, so pos = array.length
return length
}
// we know that the next-smallest span was too small
lower += (spansize >> 1)
mid := 0
for lower+1 != upper {
mid = (lower + upper) >> 1
if array[mid] == min {
return mid
} else if array[mid] < min {
lower = mid
} else {
upper = mid
}
}
return upper
}
func onesidedgallopingintersect2by2(
smallset []uint16,
largeset []uint16,
buffer []uint16,
) int {
if 0 == len(smallset) {
return 0
}
buffer = buffer[:cap(buffer)]
k1 := 0
k2 := 0
pos := 0
s1 := largeset[k1]
s2 := smallset[k2]
mainwhile:
for {
if s1 < s2 {
k1 = advanceUntil(largeset, k1, len(largeset), s2)
if k1 == len(largeset) {
break mainwhile
}
s1 = largeset[k1]
}
if s2 < s1 {
k2++
if k2 == len(smallset) {
break mainwhile
}
s2 = smallset[k2]
} else {
buffer[pos] = s2
pos++
k2++
if k2 == len(smallset) {
break
}
s2 = smallset[k2]
k1 = advanceUntil(largeset, k1, len(largeset), s2)
if k1 == len(largeset) {
break mainwhile
}
s1 = largeset[k1]
}
}
return pos
}
func onesidedgallopingintersect2by2Cardinality(
smallset []uint16,
largeset []uint16,
) int {
if 0 == len(smallset) {
return 0
}
k1 := 0
k2 := 0
pos := 0
s1 := largeset[k1]
s2 := smallset[k2]
mainwhile:
for {
if s1 < s2 {
k1 = advanceUntil(largeset, k1, len(largeset), s2)
if k1 == len(largeset) {
break mainwhile
}
s1 = largeset[k1]
}
if s2 < s1 {
k2++
if k2 == len(smallset) {
break mainwhile
}
s2 = smallset[k2]
} else {
pos++
k2++
if k2 == len(smallset) {
break
}
s2 = smallset[k2]
k1 = advanceUntil(largeset, k1, len(largeset), s2)
if k1 == len(largeset) {
break mainwhile
}
s1 = largeset[k1]
}
}
return pos
}
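// binarySearch returns the index of ikey in array, or -(insertionPoint + 1) if
// ikey is absent (a Java-style encoding of the insertion point).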
func binarySearch(array []uint16, ikey uint16) int {
low := 0
high := len(array) - 1
for low+16 <= high {
middleIndex := int(uint32(low+high) >> 1)
middleValue := array[middleIndex]
if middleValue < ikey {
low = middleIndex + 1
} else if middleValue > ikey {
high = middleIndex - 1
} else {
return middleIndex
}
}
for ; low <= high; low++ {
val := array[low]
if val >= ikey {
if val == ikey {
return low
}
break
}
}
return -(low + 1)
}
// searchResult provides information about a search request.
// The values will depend on the context of the search
type searchResult struct {
value uint16
index int
exactMatch bool
}
// notFound reports whether the search failed to find an exact match.
// For the `previousValue` and `nextValue` cases, if the target is present in the slice
// this function returns `false`, otherwise `true`.
// For `nextAbsentValue` and `previousAbsentValue` this only returns `false`.
func (sr *searchResult) notFound() bool {
return !sr.exactMatch
}
// outOfBounds indicates whether the target was outside the lower and upper bounds of the container
func (sr *searchResult) outOfBounds() bool {
return sr.index <= -1
}
// binarySearchUntil is a helper function around binarySearchUntilWithBounds
// The user does not have to pass in the lower and upper bound
// The lower bound is taken to be `0` and the upper bound `len(array)-1`
func binarySearchUntil(array []uint16, target uint16) searchResult {
return binarySearchUntilWithBounds(array, target, 0, len(array)-1)
}
// binarySearchUntilWithBounds returns a `searchResult`.
// If an exact match is found the `searchResult{target, <index>, true}` will be returned, where `<index>` is
// `target`s index in `array`, and `result.notFound()` evaluates to `false`.
// If a match is not found but `target` was in-bounds, then result.index will be the index of the closest smaller value.
// Example: in [8, 9, 11, 12], if the target is 10, then `searchResult{9, 1, false}` is returned.
// If `target` was out of bounds `searchResult{0, -1, false}` will be returned.
func binarySearchUntilWithBounds(array []uint16, target uint16, lowIndex int, maxIndex int) searchResult {
highIndex := maxIndex
closestIndex := -1
if target < array[lowIndex] {
return searchResult{0, closestIndex, false}
}
if target > array[maxIndex] {
return searchResult{0, len(array), false}
}
for lowIndex <= highIndex {
middleIndex := (lowIndex + highIndex) / 2
middleValue := array[middleIndex]
if middleValue == target {
return searchResult{middleValue, middleIndex, true}
}
if target < middleValue {
if middleIndex > 0 && target > array[middleIndex-1] {
return searchResult{array[middleIndex-1], middleIndex - 1, false}
}
highIndex = middleIndex
} else {
if middleIndex < maxIndex && target < array[middleIndex+1] {
return searchResult{middleValue, middleIndex, false}
}
lowIndex = middleIndex + 1
}
}
return searchResult{array[closestIndex], closestIndex, false}
}
// binarySearchPast is a wrapper around binarySearchPastWithBounds
// The user does not have to pass in the lower and upper bound
// The lower bound is taken to be `0` and the upper bound `len(array)-1`
func binarySearchPast(array []uint16, target uint16) searchResult {
return binarySearchPastWithBounds(array, target, 0, len(array)-1)
}
// binarySearchPastWithBounds looks for the smallest value larger than or equal to `target`
// If `target` is out of bounds a `searchResult` indicating out of bounds is returned
// `target` does not have to exist in the slice.
//
// Example:
// Suppose the slice is [...10,13...] with `target` equal to 11
// The searchResult will have searchResult.value = 13
func binarySearchPastWithBounds(array []uint16, target uint16, lowIndex int, maxIndex int) searchResult {
highIndex := maxIndex
closestIndex := -1
if target < array[lowIndex] {
return searchResult{0, closestIndex, false}
}
if target > array[maxIndex] {
return searchResult{0, len(array), false}
}
for lowIndex <= highIndex {
middleIndex := (lowIndex + highIndex) / 2
middleValue := array[middleIndex]
if middleValue == target {
return searchResult{middleValue, middleIndex, true}
}
if target < middleValue {
if middleIndex > 0 && target > array[middleIndex-1] {
return searchResult{array[middleIndex], middleIndex, false}
}
highIndex = middleIndex
} else {
if middleIndex < maxIndex && target < array[middleIndex+1] {
return searchResult{array[middleIndex+1], middleIndex + 1, false}
}
lowIndex = middleIndex + 1
}
}
return searchResult{array[closestIndex], closestIndex, false}
}


@@ -0,0 +1,7 @@
//go:build arm64 && !gccgo && !appengine
// +build arm64,!gccgo,!appengine

package roaring
//go:noescape
func union2by2(set1 []uint16, set2 []uint16, buffer []uint16) (size int)


@@ -0,0 +1,132 @@
// +build arm64,!gccgo,!appengine

#include "textflag.h"
// This implements union2by2 using golang's version of arm64 assembly
// The algorithm is very similar to the generic one,
// but makes better use of arm64 features so is notably faster.
// The basic algorithm structure is as follows:
// 1. If either set is empty, copy the other set into the buffer and return the length
// 2. Otherwise, load the first element of each set into a variable (s1 and s2).
// 3. a. Compare the values of s1 and s2.
// b. add the smaller one to the buffer.
// c. perform a bounds check before incrementing.
// If one set is finished, copy the rest of the other set over.
// d. update s1 and or s2 to the next value, continue loop.
//
// Past the fact of the algorithm, this code makes use of several arm64 features
// Condition Codes:
// arm64's CMP operation sets 4 bits that can be used for branching,
// rather than just true or false.
// As a consequence, a single comparison gives enough information to distinguish the three cases
//
// Post-increment pointers after load/store:
// Instructions like `MOVHU.P 2(R0), R6`
// increment the register by a specified amount, in this example 2.
// Because uint16's are exactly 2 bytes and the length of the slices
// is part of the slice header,
// there is no need to separately track the index into the slice.
// Instead, the code can calculate the final read value and compare against that,
// using the post-increment reads to move the pointers along.
//
// TODO: CALL out to memmove once the list is exhausted.
// Right now it moves the necessary shorts so that the remaining count
// is a multiple of 4 and then copies 64 bits at a time.
TEXT ·union2by2(SB), NOSPLIT, $0-80
// R0, R1, and R2 for the pointers to the three slices
MOVD set1+0(FP), R0
MOVD set2+24(FP), R1
MOVD buffer+48(FP), R2
//R3 and R4 will be the values at which we will have finished reading set1 and set2.
// R3 should be R0 + 2 * set1_len+8(FP)
MOVD set1_len+8(FP), R3
MOVD set2_len+32(FP), R4
ADD R3<<1, R0, R3
ADD R4<<1, R1, R4
//Rather than counting the number of elements added separately
//Save the starting register of buffer.
MOVD buffer+48(FP), R5
// set1 is empty, just flush set2
CMP R0, R3
BEQ flush_right
// set2 is empty, just flush set1
CMP R1, R4
BEQ flush_left
// R6, R7 are the working space for s1 and s2
MOVD ZR, R6
MOVD ZR, R7
MOVHU.P 2(R0), R6
MOVHU.P 2(R1), R7
loop:
CMP R6, R7
BEQ pop_both // R6 == R7
BLS pop_right // R6 > R7
//pop_left: // R6 < R7
MOVHU.P R6, 2(R2)
CMP R0, R3
BEQ pop_then_flush_right
MOVHU.P 2(R0), R6
JMP loop
pop_both:
MOVHU.P R6, 2(R2) //could also use R7, since they are equal
CMP R0, R3
BEQ flush_right
CMP R1, R4
BEQ flush_left
MOVHU.P 2(R0), R6
MOVHU.P 2(R1), R7
JMP loop
pop_right:
MOVHU.P R7, 2(R2)
CMP R1, R4
BEQ pop_then_flush_left
MOVHU.P 2(R1), R7
JMP loop
pop_then_flush_right:
MOVHU.P R7, 2(R2)
flush_right:
MOVD R1, R0
MOVD R4, R3
JMP flush_left
pop_then_flush_left:
MOVHU.P R6, 2(R2)
flush_left:
CMP R0, R3
BEQ return
//figure out how many bytes to slough off. Must be a multiple of two
SUB R0, R3, R4
ANDS $6, R4
BEQ long_flush //handles the 0 mod 8 case
SUBS $4, R4, R4 // since possible values are 2, 4, 6, this splits evenly
BLT pop_single // exactly the 2 case
MOVW.P 4(R0), R6
MOVW.P R6, 4(R2)
BEQ long_flush // we're now aligned by 64 bits, as R4==4, otherwise 2 more
pop_single:
MOVHU.P 2(R0), R6
MOVHU.P R6, 2(R2)
long_flush:
// at this point we know R3 - R0 is a multiple of 8.
CMP R0, R3
BEQ return
MOVD.P 8(R0), R6
MOVD.P R6, 8(R2)
JMP long_flush
return:
// number of shorts written is (R5 - R2) >> 1
SUB R5, R2
LSR $1, R2, R2
MOVD R2, size+72(FP)
RET


@@ -0,0 +1,64 @@
//go:build !arm64 || gccgo || appengine
// +build !arm64 gccgo appengine

package roaring
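// union2by2 merges two sorted uint16 slices into buffer, emitting duplicates once,
// and returns the number of elements written. It assumes cap(buffer) is large
// enough to hold the union.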
func union2by2(set1 []uint16, set2 []uint16, buffer []uint16) int {
pos := 0
k1 := 0
k2 := 0
if 0 == len(set2) {
buffer = buffer[:len(set1)]
copy(buffer, set1[:])
return len(set1)
}
if 0 == len(set1) {
buffer = buffer[:len(set2)]
copy(buffer, set2[:])
return len(set2)
}
s1 := set1[k1]
s2 := set2[k2]
buffer = buffer[:cap(buffer)]
for {
if s1 < s2 {
buffer[pos] = s1
pos++
k1++
if k1 >= len(set1) {
copy(buffer[pos:], set2[k2:])
pos += len(set2) - k2
break
}
s1 = set1[k1]
} else if s1 == s2 {
buffer[pos] = s1
pos++
k1++
k2++
if k1 >= len(set1) {
copy(buffer[pos:], set2[k2:])
pos += len(set2) - k2
break
}
if k2 >= len(set2) {
copy(buffer[pos:], set1[k1:])
pos += len(set1) - k1
break
}
s1 = set1[k1]
s2 = set2[k2]
} else { // if (set1[k1]>set2[k2])
buffer[pos] = s2
pos++
k2++
if k2 >= len(set2) {
copy(buffer[pos:], set1[k1:])
pos += len(set1) - k1
break
}
s2 = set2[k2]
}
}
return pos
}


@@ -0,0 +1,102 @@
package roaring
type shortIterable interface {
hasNext() bool
next() uint16
}
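// shortPeekable extends shortIterable with one-value lookahead and with
// fast-forwarding past values below a given minimum.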
type shortPeekable interface {
shortIterable
peekNext() uint16
advanceIfNeeded(minval uint16)
}
type shortIterator struct {
slice []uint16
loc int
}
func (si *shortIterator) hasNext() bool {
return si.loc < len(si.slice)
}
func (si *shortIterator) next() uint16 {
a := si.slice[si.loc]
si.loc++
return a
}
func (si *shortIterator) peekNext() uint16 {
return si.slice[si.loc]
}
func (si *shortIterator) advanceIfNeeded(minval uint16) {
if si.hasNext() && si.peekNext() < minval {
si.loc = advanceUntil(si.slice, si.loc, len(si.slice), minval)
}
}
type reverseIterator struct {
slice []uint16
loc int
}
func (si *reverseIterator) hasNext() bool {
return si.loc >= 0
}
func (si *reverseIterator) next() uint16 {
a := si.slice[si.loc]
si.loc--
return a
}
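// arrayContainerUnsetIterator iterates, in increasing order, over the values
// that are absent from a sorted array container.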
type arrayContainerUnsetIterator struct {
content []uint16
// pos is the index of the next set bit that is >= nextVal.
// When nextVal reaches content[pos], pos is incremented.
pos int
nextVal int
}
func (acui *arrayContainerUnsetIterator) next() uint16 {
val := acui.nextVal
acui.nextVal++
for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {
acui.nextVal++
acui.pos++
}
return uint16(val)
}
func (acui *arrayContainerUnsetIterator) hasNext() bool {
return acui.nextVal < 65536
}
func (acui *arrayContainerUnsetIterator) peekNext() uint16 {
return uint16(acui.nextVal)
}
func (acui *arrayContainerUnsetIterator) advanceIfNeeded(minval uint16) {
if !acui.hasNext() || acui.peekNext() >= minval {
return
}
acui.nextVal = int(minval)
acui.pos = binarySearch(acui.content, minval)
if acui.pos < 0 {
acui.pos = -acui.pos - 1
}
for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {
acui.nextVal++
acui.pos++
}
}
func newArrayContainerUnsetIterator(content []uint16) *arrayContainerUnsetIterator {
acui := &arrayContainerUnsetIterator{content: content, pos: 0, nextVal: 0}
for acui.pos < len(acui.content) && uint16(acui.nextVal) >= acui.content[acui.pos] {
acui.nextVal++
acui.pos++
}
return acui
}

717
vendor/github.com/RoaringBitmap/roaring/v2/smat.go generated vendored Normal file

@@ -0,0 +1,717 @@
/*
# Instructions for smat testing for roaring
[smat](https://github.com/mschoch/smat) is a framework that provides
state machine assisted fuzz testing.
To run the smat tests for roaring...
## Prerequisites
Go 1.18 or later (for native fuzzing support).
## Steps
1. Generate initial smat corpus:
```
go test -tags=gofuzz -run=TestGenerateSmatCorpus
```
You should see a directory `workdir` created with initial corpus files.
2. Run the fuzz test:
```
go test -run='^$' -fuzz=FuzzSmat -fuzztime=300s -timeout=60s
```
Adjust `-fuzztime` as needed for longer or shorter runs. If crashes are found,
check the test output and the reproducer files in the `workdir` directory.
You may copy the reproducers to roaring_tests.go
*/
package roaring
import (
"encoding/base64"
"fmt"
"os"
"path/filepath"
"runtime/debug"
"sort"
"strings"
"time"
"github.com/bits-and-blooms/bitset"
"github.com/mschoch/smat"
)
// The native fuzz entry point lives in a _test.go file so the go test
// fuzz engine discovers it. See smat_fuzz_test.go for the fuzz wrapper.
var smatDebug = true
const max_value = 1048576
const max_pairs = 10
func smatLog(prefix, format string, args ...interface{}) {
if smatDebug {
fmt.Print(prefix)
fmt.Printf(format, args...)
}
}
type smatContext struct {
pairs []*smatPair
// Two registers, x & y.
x int
y int
actions int
// per-context last action for this fuzz worker
lastAction *actionRecord
}
// actionRecord stores a snapshot of the state just before an action runs.
type actionRecord struct {
Name string
X, Y int
PairSnapshots []string // base64-encoded MarshalBinary of each pair's Bitmap
}
type smatPair struct {
bm *Bitmap
bs *bitset.BitSet
// parent context (nil if unknown)
ctx *smatContext
}
// ------------------------------------------------------------------
var smatActionMap = smat.ActionMap{
smat.ActionID('X'): smatAction("x++", smatWrap(func(c *smatContext) { c.x = (c.x + 1) % max_value })),
smat.ActionID('x'): smatAction("x--", smatWrap(func(c *smatContext) { c.x = (c.x - 1 + max_value) % max_value })),
smat.ActionID('Y'): smatAction("y++", smatWrap(func(c *smatContext) { c.y = (c.y + 1) % max_value })),
smat.ActionID('y'): smatAction("y--", smatWrap(func(c *smatContext) { c.y = (c.y - 1 + max_value) % max_value })),
smat.ActionID('*'): smatAction("x*y", smatWrap(func(c *smatContext) { c.x = (c.x * c.y) % max_value })),
smat.ActionID('<'): smatAction("x<<", smatWrap(func(c *smatContext) { c.x = (c.x << 1) % max_value })),
smat.ActionID('^'): smatAction("swap", smatWrap(func(c *smatContext) { c.x, c.y = c.y, c.x })),
smat.ActionID('['): smatAction(" pushPair", smatWrap(smatPushPair)),
smat.ActionID(']'): smatAction(" popPair", smatWrap(smatPopPair)),
smat.ActionID('B'): smatAction(" setBit", smatWrap(smatSetBit)),
smat.ActionID('b'): smatAction(" removeBit", smatWrap(smatRemoveBit)),
smat.ActionID('o'): smatAction(" or", smatWrap(smatOr)),
smat.ActionID('a'): smatAction(" and", smatWrap(smatAnd)),
smat.ActionID('z'): smatAction(" xor", smatWrap(smatXor)),
smat.ActionID('#'): smatAction(" cardinality", smatWrap(smatCardinality)),
smat.ActionID('O'): smatAction(" orCardinality", smatWrap(smatOrCardinality)),
smat.ActionID('A'): smatAction(" andCardinality", smatWrap(smatAndCardinality)),
smat.ActionID('Z'): smatAction(" xorCardinality", smatWrap(smatXorCardinality)),
smat.ActionID('c'): smatAction(" clear", smatWrap(smatClear)),
smat.ActionID('r'): smatAction(" runOptimize", smatWrap(smatRunOptimize)),
smat.ActionID('e'): smatAction(" isEmpty", smatWrap(smatIsEmpty)),
smat.ActionID('i'): smatAction(" intersects", smatWrap(smatIntersects)),
smat.ActionID('f'): smatAction(" flip", smatWrap(smatFlip)),
smat.ActionID('-'): smatAction(" difference", smatWrap(smatDifference)),
}
var smatRunningPercentActions []smat.PercentAction
func init() {
var ids []int
for actionId := range smatActionMap {
ids = append(ids, int(actionId))
}
sort.Ints(ids)
pct := 100 / len(smatActionMap)
for _, actionId := range ids {
smatRunningPercentActions = append(smatRunningPercentActions,
smat.PercentAction{Percent: pct, Action: smat.ActionID(actionId)})
}
smatActionMap[smat.ActionID('S')] = smatAction("SETUP", smatSetupFunc)
smatActionMap[smat.ActionID('T')] = smatAction("TEARDOWN", smatTeardownFunc)
}
// We only have one smat state: running.
func smatRunning(next byte) smat.ActionID {
return smat.PercentExecute(next, smatRunningPercentActions...)
}
func smatAction(name string, f func(ctx smat.Context) (smat.State, error)) func(smat.Context) (smat.State, error) {
return func(ctx smat.Context) (smat.State, error) {
c := ctx.(*smatContext)
// Snapshot all pairs' bitmaps (base64 of MarshalBinary) before action
rec := actionRecord{Name: name, X: c.x, Y: c.y}
if len(c.pairs) > 0 {
rec.PairSnapshots = make([]string, 0, len(c.pairs))
for _, pair := range c.pairs {
if pair == nil || pair.bm == nil {
rec.PairSnapshots = append(rec.PairSnapshots, "<nil>")
continue
}
b, err := pair.bm.MarshalBinary()
if err != nil {
rec.PairSnapshots = append(rec.PairSnapshots, "<marshal-error:"+err.Error()+">")
} else {
rec.PairSnapshots = append(rec.PairSnapshots, base64.StdEncoding.EncodeToString(b))
}
}
}
// record per-context last action (no global mutex required)
if c != nil {
c.lastAction = &rec
}
// catch panics inside action to dump a repro and stack before re-panicking
defer func() {
if r := recover(); r != nil {
// best-effort: write quick repro with lastAction from context
var lastAction *actionRecord
if c != nil {
lastAction = c.lastAction
}
ts := time.Now().UnixNano()
repro := "// Reproducer generated by smat (panic)\n"
repro += "package roaring\n\n"
repro += "import (\n\t\"encoding/base64\"\n\t\"testing\"\n)\n\n"
repro += fmt.Sprintf("func TestFuzzerPanicRepro_%d(t *testing.T) {\n", ts)
// similar to checkEquals repro
if lastAction != nil && len(lastAction.PairSnapshots) > 0 {
pairIndex := lastAction.X % len(lastAction.PairSnapshots)
if pairIndex < len(lastAction.PairSnapshots) {
snapshot := lastAction.PairSnapshots[pairIndex]
if snapshot != "<nil>" && !strings.HasPrefix(snapshot, "<") {
repro += fmt.Sprintf("\tb, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshot)
repro += "\tbm := NewBitmap()\n"
repro += "\tbm.UnmarshalBinary(b)\n"
// perform the action that caused panic
if strings.Contains(lastAction.Name, "setBit") {
repro += fmt.Sprintf("\tbm.AddInt(%d)\n", lastAction.Y)
} else if strings.Contains(lastAction.Name, "removeBit") {
repro += fmt.Sprintf("\tbm.Remove(%d)\n", lastAction.Y)
} else if strings.Contains(lastAction.Name, "flip") {
repro += fmt.Sprintf("\tbm.Flip(uint64(%d), uint64(%d)+1)\n", lastAction.Y, lastAction.Y)
} else if strings.Contains(lastAction.Name, "runOptimize") {
repro += "\tbm.RunOptimize()\n"
} else if strings.Contains(lastAction.Name, "clear") {
repro += "\tbm.Clear()\n"
} else if lastAction.Name == " or" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.Or(bm2)\n"
}
}
} else if lastAction.Name == " and" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.And(bm2)\n"
}
}
} else if lastAction.Name == " difference" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.AndNot(bm2)\n"
}
}
} else if lastAction.Name == " xor" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.Xor(bm2)\n"
}
}
} else {
repro += fmt.Sprintf("\t// Unhandled action: %s\n", lastAction.Name)
}
} else {
repro += "\t// invalid snapshot\n"
}
}
}
repro += "}\n"
if path, werr := saveReproFile("smat_panic_repro", ts, repro); werr == nil {
fmt.Printf("wrote panic repro to %s\n", path)
} else {
fmt.Printf("failed writing panic repro: %v\n", werr)
}
fmt.Printf("PANIC in action %s: %v\n", rec.Name, r)
fmt.Printf("stack:\n%s\n", debug.Stack())
panic(r)
}
}()
c.actions++
return f(ctx)
}
}
// saveReproFile writes the given repro content to workdir/<prefix>_<ts>_test.go
// or falls back to the OS temp dir. Returns full path or error.
func saveReproFile(prefix string, ts int64, content string) (string, error) {
// try workdir
if err := os.MkdirAll("workdir", 0o755); err == nil {
fname := fmt.Sprintf("workdir/%s_%d_test.go", prefix, ts)
if err := os.WriteFile(fname, []byte(content), 0o644); err == nil {
return fname, nil
}
}
// fallback to temp
tmp := os.TempDir()
fname := fmt.Sprintf("%s_%d_test.go", prefix, ts)
full := filepath.Join(tmp, fname)
if err := os.WriteFile(full, []byte(content), 0o644); err == nil {
return full, nil
} else {
return "", err
}
}
// smatWrap creates a smat action func from a simple callback.
func smatWrap(cb func(c *smatContext)) func(smat.Context) (next smat.State, err error) {
return func(ctx smat.Context) (next smat.State, err error) {
c := ctx.(*smatContext)
cb(c)
return smatRunning, nil
}
}
// Invokes a callback function with the input v bounded to len(c.pairs).
func (c *smatContext) withPair(v int, cb func(*smatPair)) {
if len(c.pairs) > 0 {
if v < 0 {
v = -v
}
v = v % len(c.pairs)
cb(c.pairs[v])
}
}
// ------------------------------------------------------------------
func smatSetupFunc(ctx smat.Context) (next smat.State, err error) {
return smatRunning, nil
}
func smatTeardownFunc(ctx smat.Context) (next smat.State, err error) {
return nil, err
}
// ------------------------------------------------------------------
func smatPushPair(c *smatContext) {
if len(c.pairs) >= max_pairs {
return
}
p := &smatPair{
bm: NewBitmap(),
bs: bitset.New(100),
ctx: c,
}
c.pairs = append(c.pairs, p)
}
func smatPopPair(c *smatContext) {
if len(c.pairs) > 0 {
c.pairs = c.pairs[0 : len(c.pairs)-1]
}
}
func smatSetBit(c *smatContext) {
c.withPair(c.x, func(p *smatPair) {
p.Validate()
y := uint32(c.y)
p.bm.AddInt(int(y))
p.bs.Set(uint(y))
p.checkEquals()
})
}
func smatRemoveBit(c *smatContext) {
c.withPair(c.x, func(p *smatPair) {
p.Validate()
y := uint32(c.y)
p.bm.Remove(y)
p.bs.Clear(uint(y))
p.checkEquals()
})
}
func smatAnd(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
px.bm.And(py.bm)
px.bs = px.bs.Intersection(py.bs)
px.checkEquals()
py.checkEquals()
})
})
}
func smatOr(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
px.bm.Or(py.bm)
px.bs = px.bs.Union(py.bs)
px.checkEquals()
py.checkEquals()
})
})
}
func smatXor(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
px.bm.Xor(py.bm)
px.bs = px.bs.SymmetricDifference(py.bs)
px.checkEquals()
py.checkEquals()
})
})
}
func smatAndCardinality(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
c0 := px.bm.AndCardinality(py.bm)
c1 := px.bs.IntersectionCardinality(py.bs)
if c0 != uint64(c1) {
panic("expected same add cardinality")
}
px.checkEquals()
py.checkEquals()
})
})
}
func smatOrCardinality(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
c0 := px.bm.OrCardinality(py.bm)
c1 := px.bs.UnionCardinality(py.bs)
if c0 != uint64(c1) {
panic("expected same or cardinality")
}
px.checkEquals()
py.checkEquals()
})
})
}
func smatXorCardinality(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
c0 := px.bm.OrCardinality(py.bm) - px.bm.AndCardinality(py.bm)
c1 := px.bs.SymmetricDifferenceCardinality(py.bs)
if c0 != uint64(c1) {
panic("expected same xor cardinality")
}
px.checkEquals()
py.checkEquals()
})
})
}
func smatRunOptimize(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
px.Validate()
px.bm.RunOptimize()
px.checkEquals()
})
}
func smatClear(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
px.Validate()
px.bm.Clear()
px.bs = px.bs.ClearAll()
px.checkEquals()
})
}
func smatCardinality(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c0 := px.bm.GetCardinality()
c1 := px.bs.Count()
if c0 != uint64(c1) {
panic("expected same cardinality")
}
})
}
func smatIsEmpty(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c0 := px.bm.IsEmpty()
c1 := px.bs.None()
if c0 != c1 {
panic("expected same is empty")
}
})
}
func smatIntersects(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
v0 := px.bm.Intersects(py.bm)
v1 := px.bs.IntersectionCardinality(py.bs) > 0
if v0 != v1 {
panic("intersects not equal")
}
px.checkEquals()
py.checkEquals()
})
})
}
func smatFlip(c *smatContext) {
c.withPair(c.x, func(p *smatPair) {
p.Validate()
y := uint32(c.y)
p.bm.Flip(uint64(y), uint64(y)+1)
p.bs = p.bs.Flip(uint(y))
p.checkEquals()
})
}
func smatDifference(c *smatContext) {
c.withPair(c.x, func(px *smatPair) {
c.withPair(c.y, func(py *smatPair) {
px.Validate()
py.Validate()
px.bm.AndNot(py.bm)
px.bs = px.bs.Difference(py.bs)
px.checkEquals()
py.checkEquals()
})
})
}
func (p *smatPair) checkEquals() {
valid := p.bm.Validate()
if valid != nil {
// marshal current bitmap
var curSnap string
if p != nil && p.bm != nil {
if b, err := p.bm.MarshalBinary(); err == nil {
curSnap = base64.StdEncoding.EncodeToString(b)
} else {
curSnap = "<marshal-error:" + err.Error() + ">"
}
} else {
curSnap = "<nil>"
}
// collect last action summary from context (per-worker)
last := "<none>"
if p != nil && p.ctx != nil {
c := p.ctx
if c.lastAction != nil {
last = fmt.Sprintf("action=%s x=%d y=%d pairs=%d", c.lastAction.Name, c.lastAction.X, c.lastAction.Y, len(c.lastAction.PairSnapshots))
}
}
// If debugging is enabled, log extra info
smatLog("ERROR: ", "bitmap invalid: %v\n", valid)
// build a reproducible test snippet that reconstructs the bitmap and replays the failing action
ts := time.Now().UnixNano()
testName := fmt.Sprintf("TestFuzzerRepro_%d", ts)
repro := "// Reproducer generated by smat\n"
repro += "package roaring\n\n"
repro += "import (\n\t\"encoding/base64\"\n\t\"testing\"\n)\n\n"
repro += fmt.Sprintf("func %s(t *testing.T) {\n", testName)
var lastAction *actionRecord
if p != nil && p.ctx != nil {
lastAction = p.ctx.lastAction
}
// use the snapshot of the modified pair
if lastAction != nil && len(lastAction.PairSnapshots) > 0 {
// assume the modified pair is at index x % len(pairs), since pairs are stored in order and x is lastAction.X
pairIndex := lastAction.X % len(lastAction.PairSnapshots)
if pairIndex < len(lastAction.PairSnapshots) {
snapshot := lastAction.PairSnapshots[pairIndex]
if snapshot != "<nil>" && !strings.HasPrefix(snapshot, "<") {
repro += fmt.Sprintf("\tb, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshot)
repro += "\tbm := NewBitmap()\n"
repro += "\tbm.UnmarshalBinary(b)\n"
repro += "\tif err := bm.Validate(); err != nil {\n"
repro += "\t\tt.Errorf(\"Initial Validate failed: %v\", err)\n"
repro += "\t}\n"
// perform the action
if strings.Contains(lastAction.Name, "setBit") {
repro += fmt.Sprintf("\tbm.AddInt(%d)\n", lastAction.Y)
} else if strings.Contains(lastAction.Name, "removeBit") {
repro += fmt.Sprintf("\tbm.Remove(%d)\n", lastAction.Y)
} else if strings.Contains(lastAction.Name, "flip") {
repro += fmt.Sprintf("\tbm.Flip(uint64(%d), uint64(%d)+1)\n", lastAction.Y, lastAction.Y)
} else if strings.Contains(lastAction.Name, "runOptimize") {
repro += "\tbm.RunOptimize()\n"
} else if strings.Contains(lastAction.Name, "clear") {
repro += "\tbm.Clear()\n"
} else if lastAction.Name == " or" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.Or(bm2)\n"
}
}
} else if lastAction.Name == " and" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.And(bm2)\n"
}
}
} else if lastAction.Name == " difference" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.AndNot(bm2)\n"
}
}
} else if lastAction.Name == " xor" {
pairIndexY := lastAction.Y % len(lastAction.PairSnapshots)
if pairIndexY < len(lastAction.PairSnapshots) {
snapshotY := lastAction.PairSnapshots[pairIndexY]
if snapshotY != "<nil>" && !strings.HasPrefix(snapshotY, "<") {
repro += fmt.Sprintf("\tb2, _ := base64.StdEncoding.DecodeString(\"%s\")\n", snapshotY)
repro += "\tbm2 := NewBitmap()\n"
repro += "\tbm2.UnmarshalBinary(b2)\n"
repro += "\tbm.Xor(bm2)\n"
}
}
} else {
repro += fmt.Sprintf("\t// Unhandled action: %s\n", lastAction.Name)
}
repro += "\tif err := bm.Validate(); err != nil {\n"
repro += "\t\tt.Errorf(\"Validate failed: %v\", err)\n"
repro += "\t} else {\n"
repro += "\t\tt.Logf(\"Validate succeeded\")\n"
repro += "\t}\n"
} else {
repro += "\t// invalid snapshot\n"
}
}
}
repro += "}\n"
// print the repro snippet for the developer
fmt.Println()
fmt.Println("=== SMAT REPRODUCER SNIPPET ===")
if len(repro) > 10000 {
fmt.Println("// Reproducer too large, skipping full print")
} else {
fmt.Println(repro)
}
// also write the repro snippet to a timestamped file in workdir/
if len(repro) > 10000 {
repro = "// Reproducer too large, skipping\n"
}
if err := os.MkdirAll("workdir", 0o755); err == nil {
fname := fmt.Sprintf("workdir/smat_repro_%d_test.go", ts)
if werr := os.WriteFile(fname, []byte(repro), 0o644); werr == nil {
fmt.Printf("Wrote repro to %s\n", fname)
} else {
fmt.Printf("Failed writing repro file: %v\n", werr)
}
} else {
fmt.Printf("Failed creating workdir: %v\n", err)
}
panic(fmt.Sprintf("[checkEquals] bitmap invalid: %v\ncurrentBase64:%s\nlastAction:%s\n", valid, curSnap, last))
}
if !p.equalsBitSet(p.bs, p.bm) {
panic("bitset mismatch")
}
}
func (p *smatPair) Validate() {
valid := p.bm.Validate()
if valid != nil {
panic(fmt.Sprintf("[Validate] bitmap invalid: %v", valid))
}
}
func (p *smatPair) equalsBitSet(a *bitset.BitSet, b *Bitmap) bool {
for i, e := a.NextSet(0); e; i, e = a.NextSet(i + 1) {
if !b.ContainsInt(int(i)) {
fmt.Printf("in a bitset, not b bitmap, i: %d\n", i)
fmt.Printf(" a bitset: %s\n b bitmap: %s\n",
a.String(), b.String())
return false
}
}
i := b.Iterator()
for i.HasNext() {
v := i.Next()
if !a.Test(uint(v)) {
fmt.Printf("in b bitmap, not a bitset, v: %d\n", v)
fmt.Printf(" a bitset: %s\n b bitmap: %s\n",
a.String(), b.String())
return false
}
}
return true
}

313
vendor/github.com/RoaringBitmap/roaring/v2/util.go generated vendored Normal file

@@ -0,0 +1,313 @@
package roaring
import (
"math"
"math/rand"
"sort"
)
const (
arrayDefaultMaxSize = 4096 // containers with 4096 or fewer integers should be array containers.
arrayLazyLowerBound = 1024
maxCapacity = 1 << 16
serialCookieNoRunContainer = 12346 // only arrays and bitmaps
invalidCardinality = -1
serialCookie = 12347 // runs, arrays, and bitmaps
noOffsetThreshold = 4
// MaxUint32 is the largest uint32 value.
MaxUint32 = math.MaxUint32
// MaxRange is one more than the maximum allowed bitmap bit index, for use as an upper
// bound for ranges.
MaxRange uint64 = MaxUint32 + 1
// MaxUint16 is the largest 16 bit unsigned int.
// This is the largest value an interval16 can store.
MaxUint16 = math.MaxUint16
// Compute wordSizeInBytes, the size of a word in bytes.
_m = ^uint64(0)
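// Each term below is 1 exactly when the word type is wider than 8, 16, or 32
// bits respectively, so _logS = log2(word size in bytes) = 3 for uint64.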
_logS = _m>>8&1 + _m>>16&1 + _m>>32&1
wordSizeInBytes = 1 << _logS
// other constants used in ctz_generic.go
wordSizeInBits = wordSizeInBytes << 3 // word size in bits
)
const maxWord = 1<<wordSizeInBits - 1
// doesn't apply to runContainers
func getSizeInBytesFromCardinality(card int) int {
if card > arrayDefaultMaxSize {
// bitmapContainer
return maxCapacity / 8
}
// arrayContainer
return 2 * card
}
func fill(arr []uint64, val uint64) {
for i := range arr {
arr[i] = val
}
}
func fillRange(arr []uint64, start, end int, val uint64) {
for i := start; i < end; i++ {
arr[i] = val
}
}
func fillArrayAND(container []uint16, bitmap1, bitmap2 []uint64) {
if len(bitmap1) != len(bitmap2) {
panic("array lengths don't match")
}
// TODO: rewrite in assembly
pos := 0
for k := range bitmap1 {
bitset := bitmap1[k] & bitmap2[k]
for bitset != 0 {
t := bitset & -bitset
container[pos] = uint16((k*64 + int(popcount(t-1))))
pos = pos + 1
bitset ^= t
}
}
}
func fillArrayANDNOT(container []uint16, bitmap1, bitmap2 []uint64) {
if len(bitmap1) != len(bitmap2) {
panic("array lengths don't match")
}
// TODO: rewrite in assembly
pos := 0
for k := range bitmap1 {
bitset := bitmap1[k] &^ bitmap2[k]
for bitset != 0 {
t := bitset & -bitset
container[pos] = uint16((k*64 + int(popcount(t-1))))
pos = pos + 1
bitset ^= t
}
}
}
func fillArrayXOR(container []uint16, bitmap1, bitmap2 []uint64) {
if len(bitmap1) != len(bitmap2) {
panic("array lengths don't match")
}
// TODO: rewrite in assembly
pos := 0
for k := 0; k < len(bitmap1); k++ {
bitset := bitmap1[k] ^ bitmap2[k]
for bitset != 0 {
t := bitset & -bitset
container[pos] = uint16((k*64 + int(popcount(t-1))))
pos = pos + 1
bitset ^= t
}
}
}
func highbits(x uint32) uint16 {
return uint16(x >> 16)
}
func lowbits(x uint32) uint16 {
return uint16(x & maxLowBit)
}
func combineLoHi16(lob uint16, hob uint16) uint32 {
return combineLoHi32(uint32(lob), uint32(hob))
}
func combineLoHi32(lob uint32, hob uint32) uint32 {
return uint32(lob) | (hob << 16)
}
const maxLowBit = 0xFFFF
func flipBitmapRange(bitmap []uint64, start int, end int) {
if start >= end {
return
}
firstword := start / 64
endword := (end - 1) / 64
bitmap[firstword] ^= ^(^uint64(0) << uint(start%64))
for i := firstword; i < endword; i++ {
bitmap[i] = ^bitmap[i]
}
bitmap[endword] ^= ^uint64(0) >> (uint(-end) % 64)
}
func resetBitmapRange(bitmap []uint64, start int, end int) {
if start >= end {
return
}
firstword := start / 64
endword := (end - 1) / 64
if firstword == endword {
bitmap[firstword] &= ^((^uint64(0) << uint(start%64)) & (^uint64(0) >> (uint(-end) % 64)))
return
}
bitmap[firstword] &= ^(^uint64(0) << uint(start%64))
for i := firstword + 1; i < endword; i++ {
bitmap[i] = 0
}
bitmap[endword] &= ^(^uint64(0) >> (uint(-end) % 64))
}
func setBitmapRange(bitmap []uint64, start int, end int) {
if start >= end {
return
}
firstword := start / 64
endword := (end - 1) / 64
if firstword == endword {
bitmap[firstword] |= (^uint64(0) << uint(start%64)) & (^uint64(0) >> (uint(-end) % 64))
return
}
bitmap[firstword] |= ^uint64(0) << uint(start%64)
for i := firstword + 1; i < endword; i++ {
bitmap[i] = ^uint64(0)
}
bitmap[endword] |= ^uint64(0) >> (uint(-end) % 64)
}
func flipBitmapRangeAndCardinalityChange(bitmap []uint64, start int, end int) int {
before := wordCardinalityForBitmapRange(bitmap, start, end)
flipBitmapRange(bitmap, start, end)
after := wordCardinalityForBitmapRange(bitmap, start, end)
return int(after - before)
}
func resetBitmapRangeAndCardinalityChange(bitmap []uint64, start int, end int) int {
before := wordCardinalityForBitmapRange(bitmap, start, end)
resetBitmapRange(bitmap, start, end)
after := wordCardinalityForBitmapRange(bitmap, start, end)
return int(after - before)
}
func setBitmapRangeAndCardinalityChange(bitmap []uint64, start int, end int) int {
before := wordCardinalityForBitmapRange(bitmap, start, end)
setBitmapRange(bitmap, start, end)
after := wordCardinalityForBitmapRange(bitmap, start, end)
return int(after - before)
}
func wordCardinalityForBitmapRange(bitmap []uint64, start int, end int) uint64 {
answer := uint64(0)
if start >= end {
return answer
}
firstword := start / 64
endword := (end - 1) / 64
for i := firstword; i <= endword; i++ {
answer += popcount(bitmap[i])
}
return answer
}
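// selectBitPosition returns the position of the (j+1)-th set bit in w (j is
// 0-based), narrowing the search from a 32-bit half down to a final byte scan.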
func selectBitPosition(w uint64, j int) int {
seen := 0
// Divide 64bit
part := w & 0xFFFFFFFF
n := popcount(part)
if n <= uint64(j) {
part = w >> 32
seen += 32
j -= int(n)
}
w = part
// Divide 32bit
part = w & 0xFFFF
n = popcount(part)
if n <= uint64(j) {
part = w >> 16
seen += 16
j -= int(n)
}
w = part
// Divide 16bit
part = w & 0xFF
n = popcount(part)
if n <= uint64(j) {
part = w >> 8
seen += 8
j -= int(n)
}
w = part
// Lookup in final byte
var counter uint
for counter = 0; counter < 8; counter++ {
j -= int((w >> counter) & 1)
if j < 0 {
break
}
}
return seen + int(counter)
}
func panicOn(err error) {
if err != nil {
panic(err)
}
}
type ph struct {
orig int
rand int
}
type pha []ph
func (p pha) Len() int { return len(p) }
func (p pha) Less(i, j int) bool { return p[i].rand < p[j].rand }
func (p pha) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func getRandomPermutation(n int) []int {
r := make([]ph, n)
for i := 0; i < n; i++ {
r[i].orig = i
r[i].rand = rand.Intn(1 << 29)
}
sort.Sort(pha(r))
m := make([]int, n)
for i := range m {
m[i] = r[i].orig
}
return m
}
func minOfInt(a, b int) int {
if a < b {
return a
}
return b
}
func maxOfInt(a, b int) int {
if a > b {
return a
}
return b
}
func maxOfUint16(a, b uint16) uint16 {
if a > b {
return a
}
return b
}
func minOfUint16(a, b uint16) uint16 {
if a < b {
return a
}
return b
}

26
vendor/github.com/bits-and-blooms/bitset/.gitignore generated vendored Normal file

@@ -0,0 +1,26 @@
# Compiled Object files, Static and Dynamic libs (Shared Objects)
*.o
*.a
*.so
# Folders
_obj
_test
# Architecture specific extensions/prefixes
*.[568vq]
[568vq].out
*.cgo1.go
*.cgo2.c
_cgo_defun.c
_cgo_gotypes.go
_cgo_export.*
_testmain.go
*.exe
*.test
*.prof
target

37
vendor/github.com/bits-and-blooms/bitset/.travis.yml generated vendored Normal file

@@ -0,0 +1,37 @@
language: go
sudo: false
branches:
except:
- release
branches:
only:
- master
- travis
go:
- "1.11.x"
- tip
matrix:
allow_failures:
- go: tip
before_install:
- if [ -n "$GH_USER" ]; then git config --global github.user ${GH_USER}; fi;
- if [ -n "$GH_TOKEN" ]; then git config --global github.token ${GH_TOKEN}; fi;
- go get github.com/mattn/goveralls
before_script:
- make deps
script:
- make qa
after_failure:
- cat ./target/test/report.xml
after_success:
- if [ "$TRAVIS_GO_VERSION" = "1.11.1" ]; then $HOME/gopath/bin/goveralls -covermode=count -coverprofile=target/report/coverage.out -service=travis-ci; fi;

27
vendor/github.com/bits-and-blooms/bitset/LICENSE generated vendored Normal file

@@ -0,0 +1,27 @@
Copyright (c) 2014 Will Fitzgerald. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

176
vendor/github.com/bits-and-blooms/bitset/README.md generated vendored Normal file

@@ -0,0 +1,176 @@
# bitset
*Go language library to map between non-negative integers and boolean values*
[![Test](https://github.com/bits-and-blooms/bitset/workflows/Test/badge.svg)](https://github.com/willf/bitset/actions?query=workflow%3ATest)
[![Go Report Card](https://goreportcard.com/badge/github.com/willf/bitset)](https://goreportcard.com/report/github.com/willf/bitset)
[![PkgGoDev](https://pkg.go.dev/badge/github.com/bits-and-blooms/bitset?tab=doc)](https://pkg.go.dev/github.com/bits-and-blooms/bitset?tab=doc)
This library is part of the [awesome go collection](https://github.com/avelino/awesome-go). It is used in production by several important systems:
* [beego](https://github.com/beego/beego)
* [CubeFS](https://github.com/cubefs/cubefs)
* [Amazon EKS Distro](https://github.com/aws/eks-distro)
* [sourcegraph](https://github.com/sourcegraph/sourcegraph-public-snapshot)
* [torrent](https://github.com/anacrolix/torrent)
## Description
Package bitset implements bitsets, a mapping between non-negative integers and boolean values.
It should be more efficient than `map[uint]bool`.
It provides methods for setting, clearing, flipping, and testing individual integers.
It also provides set intersection, union, difference, complement, and symmetric difference operations, as well as tests to check whether any, all, or no bits are set, and ways to query a bitset's current length and number of set bits.
A BitSet is expanded as needed to hold its largest set bit, so memory allocation is approximately Max bits, where Max is the largest set bit index. BitSets are never shrunk automatically, but `Shrink` and `Compact` methods are available. On creation, a hint can be given for the number of bits that will be used.
Many of the methods, including Set, Clear, and Flip, return a BitSet pointer, which allows for chaining.
### Example use:
```go
package main
import (
"fmt"
"math/rand"
"github.com/bits-and-blooms/bitset"
)
func main() {
fmt.Printf("Hello from BitSet!\n")
var b bitset.BitSet
// play some Go Fish
for i := 0; i < 100; i++ {
card1 := uint(rand.Intn(52))
card2 := uint(rand.Intn(52))
b.Set(card1)
if b.Test(card2) {
fmt.Println("Go Fish!")
}
b.Clear(card1)
}
// Chaining
b.Set(10).Set(11)
for i, e := b.NextSet(0); e; i, e = b.NextSet(i + 1) {
fmt.Println("The following bit is set:", i)
}
if b.Intersection(bitset.New(100).Set(10)).Count() == 1 {
fmt.Println("Intersection works.")
} else {
fmt.Println("Intersection doesn't work???")
}
}
```
If you have Go 1.23 or better, you can iterate over the set bits like so:
```go
for i := range b.EachSet() {}
```
Package documentation is at: https://pkg.go.dev/github.com/bits-and-blooms/bitset?tab=doc
## Serialization
You may serialize a bitset safely and portably to a stream
of bytes as follows:
```Go
const length = 9585
const oneEvery = 97
bs := bitset.New(length)
// Add some bits
for i := uint(0); i < length; i += oneEvery {
bs = bs.Set(i)
}
var buf bytes.Buffer
n, err := bs.WriteTo(&buf)
if err != nil {
// failure
}
// Here n == buf.Len()
```
You can later deserialize the result as follows:
```Go
// Read back from buf
bs = bitset.New()
n, err = bs.ReadFrom(&buf)
if err != nil {
// error
}
// n is the number of bytes read
```
The `ReadFrom` function attempts to read the data into the existing
BitSet instance, to minimize memory allocations.
*Performance tip*:
When reading and writing to a file or a network connection, you may get better performance by
wrapping your streams with `bufio` instances.
E.g.,
```Go
f, err := os.Create("myfile")
w := bufio.NewWriter(f)
```
```Go
f, err := os.Open("myfile")
r := bufio.NewReader(f)
```
## Memory Usage
The memory usage of a bitset using `N` bits is at least `N/8` bytes. The number of bits in a bitset is at least as large as one plus the greatest bit index you have accessed. Thus it is possible to run out of memory while using a bitset. If you have lots of bits, you might prefer compressed bitsets, like the [Roaring bitmaps](https://roaringbitmap.org) and its [Go implementation](https://github.com/RoaringBitmap/roaring).
The `roaring` library allows you to go back and forth between compressed Roaring bitmaps and the conventional bitset instances:
```Go
mybitset := roaringbitmap.ToBitSet()
newroaringbitmap := roaring.FromBitSet(mybitset)
```
### Goroutine safety
In general, it's not safe to access the same BitSet from different goroutines, because BitSets are left unsynchronized for performance.
Should you want to access a BitSet from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *BitSet around (in Go style; so there is only ever one owner), or by using `sync.Mutex` to serialize operations on BitSets.
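As an illustration, here is a minimal sketch of a mutex-guarded wrapper (the `SafeBitSet` type is hypothetical, not part of this library), assuming the standard `sync` package is imported:
```Go
// SafeBitSet serializes all access to the underlying BitSet with a mutex.
type SafeBitSet struct {
	mu sync.Mutex
	bs *bitset.BitSet
}

func (s *SafeBitSet) Set(i uint) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.bs.Set(i)
}

func (s *SafeBitSet) Test(i uint) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.bs.Test(i)
}
```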
## Installation
```bash
go get github.com/bits-and-blooms/bitset
```
## Contributing
If you wish to contribute to this project, please branch and issue a pull request against master ("[GitHub Flow](https://guides.github.com/introduction/flow/)")
## Running all tests
Before committing the code, please check if it passes tests, has adequate coverage, etc.
```bash
go test
go test -cover
```
## Stars
[![Star History Chart](https://api.star-history.com/svg?repos=bits-and-blooms/bitset&type=Date)](https://www.star-history.com/#bits-and-blooms/bitset&Date)
## Further reading
<p>Mastering Programming: From Testing to Performance in Go</p>
<div><a href="https://www.amazon.com/dp/B0FMPGSWR5"><img style="margin-left: auto; margin-right: auto;" src="https://m.media-amazon.com/images/I/61feneHS7kL._SL1499_.jpg" alt="" width="250px" /></a></div>

5
vendor/github.com/bits-and-blooms/bitset/SECURITY.md generated vendored Normal file

@@ -0,0 +1,5 @@
# Security Policy
## Reporting a Vulnerability
You can privately report a vulnerability by email to daniel@lemire.me (the current maintainer).


@@ -0,0 +1,39 @@
# Go
# Build your Go project.
# Add steps that test, save build artifacts, deploy, and more:
# https://docs.microsoft.com/azure/devops/pipelines/languages/go
trigger:
- master
pool:
vmImage: 'Ubuntu-16.04'
variables:
GOBIN: '$(GOPATH)/bin' # Go binaries path
GOROOT: '/usr/local/go1.11' # Go installation path
GOPATH: '$(system.defaultWorkingDirectory)/gopath' # Go workspace path
modulePath: '$(GOPATH)/src/github.com/$(build.repository.name)' # Path to the module's code
steps:
- script: |
mkdir -p '$(GOBIN)'
mkdir -p '$(GOPATH)/pkg'
mkdir -p '$(modulePath)'
shopt -s extglob
shopt -s dotglob
mv !(gopath) '$(modulePath)'
echo '##vso[task.prependpath]$(GOBIN)'
echo '##vso[task.prependpath]$(GOROOT)/bin'
displayName: 'Set up the Go workspace'
- script: |
go version
go get -v -t -d ./...
if [ -f Gopkg.toml ]; then
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
dep ensure
fi
go build -v .
workingDirectory: '$(modulePath)'
displayName: 'Get dependencies, then build'

1767
vendor/github.com/bits-and-blooms/bitset/bitset.go generated vendored Normal file

File diff suppressed because it is too large


@@ -0,0 +1,23 @@
//go:build go1.23
// +build go1.23

package bitset
import (
"iter"
"math/bits"
)
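// EachSet returns a go1.23-style iterator (iter.Seq[uint]) over the indices of
// all set bits, in increasing order.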
func (b *BitSet) EachSet() iter.Seq[uint] {
return func(yield func(uint) bool) {
for wordIndex, word := range b.set {
idx := 0
for trail := bits.TrailingZeros64(word); trail != 64; trail = bits.TrailingZeros64(word >> idx) {
if !yield(uint(wordIndex<<log2WordSize + idx + trail)) {
return
}
idx += trail + 1
}
}
}
}

8866
vendor/github.com/bits-and-blooms/bitset/pext.gen.go generated vendored Normal file

File diff suppressed because it is too large

52
vendor/github.com/bits-and-blooms/bitset/popcnt.go generated vendored Normal file

@@ -0,0 +1,52 @@
package bitset
import "math/bits"
func popcntSlice(s []uint64) (cnt uint64) {
for _, x := range s {
cnt += uint64(bits.OnesCount64(x))
}
return
}
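// popcntMaskSlice computes the population count of the AND NOT (s &^ m) of two slices.
// It assumes that len(m) >= len(s) > 0.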
func popcntMaskSlice(s, m []uint64) (cnt uint64) {
// The next line is to help the bounds checker, it matters!
_ = m[len(s)-1] // BCE
for i := range s {
cnt += uint64(bits.OnesCount64(s[i] &^ m[i]))
}
return
}
// popcntAndSlice computes the population count of the AND of two slices.
// It assumes that len(m) >= len(s) > 0.
func popcntAndSlice(s, m []uint64) (cnt uint64) {
// The next line is to help the bounds checker, it matters!
_ = m[len(s)-1] // BCE
for i := range s {
cnt += uint64(bits.OnesCount64(s[i] & m[i]))
}
return
}
// popcntOrSlice computes the population count of the OR of two slices.
// It assumes that len(m) >= len(s) > 0.
func popcntOrSlice(s, m []uint64) (cnt uint64) {
// The next line is to help the bounds checker, it matters!
_ = m[len(s)-1] // BCE
for i := range s {
cnt += uint64(bits.OnesCount64(s[i] | m[i]))
}
return
}
// popcntXorSlice computes the population count of the XOR of two slices.
// It assumes that len(m) >= len(s) > 0.
func popcntXorSlice(s, m []uint64) (cnt uint64) {
// The next line is to help the bounds checker, it matters!
_ = m[len(s)-1] // BCE
for i := range s {
cnt += uint64(bits.OnesCount64(s[i] ^ m[i]))
}
return
}

47
vendor/github.com/bits-and-blooms/bitset/select.go generated vendored Normal file

@@ -0,0 +1,47 @@
package bitset
import "math/bits"
func select64(w uint64, j uint) uint {
seen := 0
// Divide 64bit
part := w & 0xFFFFFFFF
n := uint(bits.OnesCount64(part))
if n <= j {
part = w >> 32
seen += 32
j -= n
}
ww := part
// Divide 32bit
part = ww & 0xFFFF
n = uint(bits.OnesCount64(part))
if n <= j {
part = ww >> 16
seen += 16
j -= n
}
ww = part
// Divide 16bit
part = ww & 0xFF
n = uint(bits.OnesCount64(part))
if n <= j {
part = ww >> 8
seen += 8
j -= n
}
ww = part
// Lookup in final byte
counter := 0
for ; counter < 8; counter++ {
j -= uint((ww >> counter) & 1)
if j+1 == 0 {
break
}
}
return uint(seen + counter)
}

14
vendor/github.com/mschoch/smat/.gitignore generated vendored Normal file

@@ -0,0 +1,14 @@
#*
*.sublime-*
*~
.#*
.project
.settings
**/.idea/
**/*.iml
/examples/bolt/boltsmat-fuzz.zip
/examples/bolt/workdir/
.DS_Store
coverage.out
*.test
tags

Some files were not shown because too many files have changed in this diff