lib/storage: do not change timestamps to constant rate if values are constant or have constant delta

This breaks the original timestamps, which results in issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/120 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/141 .
app/vmstorage: add vm_concurrent_addrows_* metrics for tracking concurrency for Storage.AddRows calls
2026-06-07 19:06:17 +03:00 · 2019-08-06 15:40:07 +03:00 · 2019-08-06 15:08:33 +03:00 · 2019-08-05 19:21:36 +03:00 · 2019-08-05 18:30:57 +03:00 · 2019-08-05 10:33:21 +03:00
222 changed files with 18733 additions and 560 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,30 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Version**
+The line returned when passing `--version` command line flag to binary. For example:
+```
+$ ./victoria-metrics-prod --version
+victoria-metrics-20190730-121249-heads-single-node-0-g671d9e55
+```
+
+**Additional context**
+Add any other context about the problem here such as error logs, `/metrics` output, screenshots from [the official Grafana dashboard for VictoriaMetrics](https://grafana.com/dashboards/10229).
--- a/.github/ISSUE_TEMPLATE/feature_request.md
+++ b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
--- a/.gitignore
+++ b/.gitignore
@@ -9,3 +9,7 @@
 /victoria-metrics-data
 /vmstorage-data
 /vmselect-cache
+/package/temp-deb-*
+/package/temp-rpm-*
+/package/*.deb
+/package/*.rpm
--- a/.travis.yml
+++ b/.travis.yml
@@ -15,8 +15,12 @@ before_install:
 script:
  - make check_all
  - git diff --exit-code
-  - make test_full
+  - make test-full
+  - make test-pure
  - make victoria-metrics
+  - make victoria-metrics-pure
+  - make victoria-metrics-arm
+  - make victoria-metrics-arm64

 after_success:
-  - bash <(curl -s https://codecov.io/bash)
+  - bash <(curl -s https://codecov.io/bash)
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,76 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to making participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+ advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+ address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+ professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community. Examples of
+representing a project or community include using an official project e-mail
+address, posting via an official social media account, or acting as an appointed
+representative at an online or offline event. Representation of a project may be
+further defined and clarified by project maintainers.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at info@victoriametrics.com. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -0,0 +1,16 @@
+If you like VictoriaMetrics and want to contribute, then we need the following:
+
+- Filing issues and feature requests [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
+- Spreading a word about VictoriaMetrics: conference talks, articles, comments, experience sharing with colleagues.
+- Updating documentation.
+
+We are open to third-party pull requests provided they follow [KISS design principle](https://en.wikipedia.org/wiki/KISS_principle):
+
+- Prefer simple code and architecture.
+- Avoid complex abstractions.
+- Avoid magic code and fancy algorithms.
+- Avoid [big external dependencies](https://medium.com/@valyala/stripping-dependency-bloat-in-victoriametrics-docker-image-983fb5912b0d).
+- Minimize the number of moving parts in the distributed system.
+- Avoid automated decisions, which may hurt cluster availability, consistency or performance.
+
+Adhering `KISS` principle simplifies the resulting code and architecture, so it can be reviewed, understood and verified by many people.
--- a/12
+++ b/12
@@ -53,16 +53,22 @@ install-errcheck:
 check_all: fmt vet lint errcheck golangci-lint

 test:
-	GO111MODULE=on go test -mod=vendor ./lib/...
-	GO111MODULE=on go test -mod=vendor ./app/...
+	GO111MODULE=on go test -tags=integration -mod=vendor ./lib/... ./app/...

-test_full:
+test-pure:
+	GO111MODULE=on CGO_ENABLED=0 go test -tags=integration -mod=vendor ./lib/... ./app/...
+
+test-full:
 	GO111MODULE=on go test -tags=integration -mod=vendor -coverprofile=coverage.txt -covermode=atomic ./lib/... ./app/...

 benchmark:
 	GO111MODULE=on go test -mod=vendor -bench=. ./lib/...
 	GO111MODULE=on go test -mod=vendor -bench=. ./app/...

+benchmark-pure:
+	GO111MODULE=on CGO_ENABLED=0 go test -mod=vendor -bench=. ./lib/...
+	GO111MODULE=on CGO_ENABLED=0 go test -mod=vendor -bench=. ./app/...
+
 vendor-update:
 	GO111MODULE=on go get -u ./lib/...
 	GO111MODULE=on go get -u ./app/...
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
 [![Latest Release](https://img.shields.io/github/release/VictoriaMetrics/VictoriaMetrics.svg?style=flat-square)](https://github.com/VictoriaMetrics/VictoriaMetrics/releases/latest)
+[![Slack](https://img.shields.io/badge/join%20slack-%23victoriametrics-brightgreen.svg)](http://slack.victoriametrics.com/)
 [![GitHub license](https://img.shields.io/github/license/VictoriaMetrics/VictoriaMetrics.svg)](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/LICENSE)
 [![Go Report](https://goreportcard.com/badge/github.com/VictoriaMetrics/VictoriaMetrics)](https://goreportcard.com/report/github.com/VictoriaMetrics/VictoriaMetrics)
 [![Build Status](https://travis-ci.org/VictoriaMetrics/VictoriaMetrics.svg?branch=master)](https://travis-ci.org/VictoriaMetrics/VictoriaMetrics)
@@ -8,7 +9,7 @@

 ## Single-node VictoriaMetrics

-VictoriaMetrics is fast, cost-effective and scalable time series database. It can be used as a long-term remote storage for Prometheus.
+VictoriaMetrics is fast, cost-effective and scalable time-series database. It can be used as long-term remote storage for Prometheus.
 It is available in [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases),
 [docker images](https://hub.docker.com/r/victoriametrics/victoria-metrics/) and
 in [source code](https://github.com/VictoriaMetrics/VictoriaMetrics).
@@ -26,8 +27,8 @@ Cluster version is available [here](https://github.com/VictoriaMetrics/VictoriaM
  [Outperforms InfluxDB and TimescaleDB by up to 20x](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).
 * [Uses 10x less RAM than InfluxDB](https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893) when working with millions of unique time series (aka high cardinality).
 * High data compression, so [up to 70x more data points](https://medium.com/@valyala/when-size-matters-benchmarking-victoriametrics-vs-timescale-and-influxdb-6035811952d4)
-  may be crammed into a limited storage comparing to TimescaleDB.
-* Optimized for storage with high-latency IO and low iops (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc). See [graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
+  may be crammed into limited storage comparing to TimescaleDB.
+* Optimized for storage with high-latency IO and low IOPS (HDD and network storage in AWS, Google Cloud, Microsoft Azure, etc). See [graphs from these benchmarks](https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b).
 * A single-node VictoriaMetrics may substitute moderately sized clusters built with competing solutions such as Thanos, Uber M3, Cortex, InfluxDB or TimescaleDB.
  See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae)
  and [comparing Thanos to VictoriaMetrics cluster](https://medium.com/@valyala/comparing-thanos-to-victoriametrics-cluster-b193bea1683).
@@ -52,20 +53,24 @@ Cluster version is available [here](https://github.com/VictoriaMetrics/VictoriaM

 ### Table of contents

-  - [How to build from sources](#how-to-build-from-sources)
-    - [Development build](#development-build)
-    - [Production build](#production-build)
-    - [Building docker images](#building-docker-images)
  - [How to start VictoriaMetrics](#how-to-start-victoriametrics)
-  - [Setting up service](#setting-up-service)
-  - [Third-party contributions](#third-party-contributions)
  - [Prometheus setup](#prometheus-setup)
  - [Grafana setup](#grafana-setup)
  - [How to upgrade VictoriaMetrics?](#how-to-upgrade-victoriametrics)
  - [How to apply new config to VictoriaMetrics?](#how-to-apply-new-config-to-victoriametrics)
  - [How to send data from InfluxDB-compatible agents such as Telegraf?](#how-to-send-data-from-influxdb-compatible-agents-such-as-telegraf)
  - [How to send data from Graphite-compatible agents such as StatsD?](#how-to-send-data-from-graphite-compatible-agents-such-as-statsd)
+  - [Querying Graphite data](#querying-graphite-data)
  - [How to send data from OpenTSDB-compatible agents?](#how-to-send-data-from-opentsdb-compatible-agents)
+  - [How to build from sources](#how-to-build-from-sources)
+    - [Development build](#development-build)
+    - [Production build](#production-build)
+    - [ARM build](#arm-build)
+    - [Pure Go build (CGO_ENABLED=0)](#pure-go-build-cgo_enabled0)
+    - [Building docker images](#building-docker-images)
+  - [Start with docker-compose](#start-with-docker-compose)
+  - [Setting up service](#setting-up-service)
+  - [Third-party contributions](#third-party-contributions)
  - [How to work with snapshots?](#how-to-work-with-snapshots)
  - [How to delete time series?](#how-to-delete-time-series)
  - [How to export time series?](#how-to-export-time-series)
@@ -81,6 +86,7 @@ Cluster version is available [here](https://github.com/VictoriaMetrics/VictoriaM
  - [Tuning](#tuning)
  - [Monitoring](#monitoring)
  - [Troubleshooting](#troubleshooting)
+- [Roadmap](#roadmap)
 - [Contacts](#contacts)
 - [Community and contributions](#community-and-contributions)
 - [Reporting bugs](#reporting-bugs)
@@ -91,56 +97,22 @@ Cluster version is available [here](https://github.com/VictoriaMetrics/VictoriaM
  - [We kindly ask:](#we-kindly-ask)


-### How to build from sources
-
-We recommend using either [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or
-[docker images](https://hub.docker.com/r/victoriametrics/victoria-metrics/) instead of building VictoriaMetrics
-from sources. Building from sources is reasonable when developing an additional features specific
-to your needs.
-
-
-#### Development build
-
-1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
-2. Run `make victoria-metrics` from the root folder of the repository.
-   It will build `victoria-metrics` binary and put it into the `bin` folder.
-
-#### Production build
-
-1. [Install docker](https://docs.docker.com/install/).
-2. Run `make victoria-metrics-prod` from the root folder of the repository.
-   It will build `victoria-metrics-prod` binary and put it into the `bin` folder.
-
-#### Building docker images
-
-Run `make package-victoria-metrics`. It will build `victoriametrics/victoria-metrics:<PKG_TAG>` docker image locally.
-`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
-The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-victoria-metrics`.
-
-
 ### How to start VictoriaMetrics

-Just start VictoriaMetrics executable or docker image with the desired command-line flags.
+Just start VictoriaMetrics [executable](https://github.com/VictoriaMetrics/VictoriaMetrics/releases)
+or [docker image](https://hub.docker.com/r/victoriametrics/victoria-metrics/) with the desired command-line flags.

-The following command line flags are used the most:
+The following command-line flags are used the most:

 * `-storageDataPath` - path to data directory. VictoriaMetrics stores all the data in this directory.
 * `-retentionPeriod` - retention period in months for the data. Older data is automatically deleted.
-* `-httpListenAddr` - TCP address to listen to for http requests. By default it listens port `8428` on all the network interfaces.
-* `-graphiteListenAddr` - TCP and UDP address to listen to for Graphite data. By default it is disabled.
-* `-opentsdbListenAddr` - TCP and UDP address to listen to for OpenTSDB data. By default it is disabled.
+* `-httpListenAddr` - TCP address to listen to for http requests. By default, it listens port `8428` on all the network interfaces.
+* `-graphiteListenAddr` - TCP and UDP address to listen to for Graphite data. By default, it is disabled.
+* `-opentsdbListenAddr` - TCP and UDP address to listen to for OpenTSDB data. By default, it is disabled.

 Pass `-help` to see all the available flags with description and default values.

-
-### Setting up service
-
-Read [these instructions](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/43) on how to set up VictoriaMetrics as a service in your OS.
-
-
-### Third-party contributions
-
-* [Unofficial yum repository](https://copr.fedorainfracloud.org/coprs/antonpatsev/VictoriaMetrics/) ([source code](https://github.com/patsevanton/victoriametrics-rpm))
+It is recommended setting up [monitoring](#monitoring) for VictoriaMetrics.


 ### Prometheus setup
@@ -166,7 +138,7 @@ Prometheus writes incoming data to local storage and replicates it to remote sto
 This means the data remains available in local storage for `--storage.tsdb.retention.time` duration
 even if remote storage is unavailable.

-If you plan sending data to VictoriaMetrics from multiple Prometheus instances, then add the following lines into `global` section
+If you plan to send data to VictoriaMetrics from multiple Prometheus instances, then add the following lines into `global` section
 of [Prometheus config](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#configuration-file):

 ```yml
@@ -177,7 +149,7 @@ global:

 This instructs Prometheus to add `datacenter=dc-123` label to each time series sent to remote storage.
 The label name may be arbitrary - `datacenter` is just an example. The label value must be unique
-across Prometheus instances, so time series may be filtered and grouped by this label.
+across Prometheus instances, so those time series may be filtered and grouped by this label.


 It is recommended upgrading Prometheus to [v2.10.0](https://github.com/prometheus/prometheus/releases) or newer,
@@ -217,7 +189,7 @@ VictoriaMetrics must be restarted for applying new config:

 1) Send `SIGINT` signal to VictoriaMetrics process in order to gracefully stop it.
 2) Wait until the process stops. This can take a few seconds.
-3) Start VictoriaMetrics with new config.
+3) Start VictoriaMetrics with the new config.


 ### How to send data from InfluxDB-compatible agents such as [Telegraf](https://www.influxdata.com/time-series-platform/telegraf/)?
@@ -260,7 +232,7 @@ to local VictoriaMetrics using `curl`:
 curl -d 'measurement,tag1=value1,tag2=value2 field1=123,field2=1.23' -X POST 'http://localhost:8428/write'
 ```

-Arbitrary number of lines delimited by '\n' may be sent in a single request.
+An arbitrary number of lines delimited by '\n' may be sent in a single request.
 After that the data may be read via [/api/v1/export](#how-to-export-time-series) endpoint:

 ```
@@ -294,8 +266,8 @@ Example for writing data with Graphite plaintext protocol to local VictoriaMetri
 echo "foo.bar.baz;tag1=value1;tag2=value2 123 `date +%s`" | nc -N localhost 2003
 ```

-VictoriaMetrics sets the current time if timestamp is omitted.
-Arbitrary number of lines delimited by `\n` may be sent in one go.
+VictoriaMetrics sets the current time if the timestamp is omitted.
+An arbitrary number of lines delimited by `\n` may be sent in one go.
 After that the data may be read via [/api/v1/export](#how-to-export-time-series) endpoint:

 ```
@@ -309,6 +281,14 @@ The `/api/v1/export` endpoint should return the following response:
 ```


+### Querying Graphite data
+
+Data sent to VictoriaMetrics via `Graphite plaintext protocol` may be read either via
+[Prometheus querying API](https://prometheus.io/docs/prometheus/latest/querying/api/)
+or via [go-graphite/carbonapi](https://github.com/go-graphite/carbonapi/blob/master/cmd/carbonapi/carbonapi.example.prometheus.yaml).
+
+
+
 ### How to send data from OpenTSDB-compatible agents?

 1) Enable OpenTSDB receiver in VictoriaMetrics by setting `-opentsdbListenAddr` command line flag. For instance,
@@ -327,7 +307,7 @@ Example for writing data with OpenTSDB protocol to local VictoriaMetrics using `
 echo "put foo.bar.baz `date +%s` 123 tag1=value1 tag2=value2" | nc -N localhost 4242
 ```

-Arbitrary number of lines delimited by `\n` may be sent in one go.
+An arbitrary number of lines delimited by `\n` may be sent in one go.
 After that the data may be read via [/api/v1/export](#how-to-export-time-series) endpoint:

 ```
@@ -341,9 +321,79 @@ The `/api/v1/export` endpoint should return the following response:
 ```


+### How to build from sources
+
+We recommend using either [binary releases](https://github.com/VictoriaMetrics/VictoriaMetrics/releases) or
+[docker images](https://hub.docker.com/r/victoriametrics/victoria-metrics/) instead of building VictoriaMetrics
+from sources. Building from sources is reasonable when developing additional features specific
+to your needs.
+
+
+#### Development build
+
+1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
+2. Run `make victoria-metrics` from the root folder of the repository.
+   It builds `victoria-metrics` binary and puts it into the `bin` folder.
+
+#### Production build
+
+1. [Install docker](https://docs.docker.com/install/).
+2. Run `make victoria-metrics-prod` from the root folder of the repository.
+   It builds `victoria-metrics-prod` binary and puts it into the `bin` folder.
+
+#### ARM build
+
+ARM build may run on Raspberry Pi or on [energy-efficient ARM servers](https://blog.cloudflare.com/arm-takes-wing/).
+
+#### Development ARM build
+
+1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
+2. Run `make victoria-metrics-arm` or `make victoria-metrics-arm64` from the root folder of the repository.
+   It builds `victoria-metrics-arm` or `victoria-metrics-arm64` binary respectively and puts it into the `bin` folder.
+
+#### Production ARM build
+
+1. [Install docker](https://docs.docker.com/install/).
+2. Run `make victoria-metrics-arm-prod` or `make victoria-metrics-arm64-prod` from the root folder of the repository.
+   It builds `victoria-metrics-arm-prod` or `victoria-metrics-arm64-prod` binary respectively and puts it into the `bin` folder.
+
+#### Pure Go build (CGO_ENABLED=0)
+
+`Pure Go` mode builds only Go code without [cgo](https://golang.org/cmd/cgo/) dependencies.
+This is an experimental mode, which may result in a lower compression ratio and slower decompression performance.
+Use it with caution!
+
+1. [Install Go](https://golang.org/doc/install). The minimum supported version is Go 1.12.
+2. Run `make victoria-metrics-pure` from the root folder of the repository.
+   It builds `victoria-metrics-pure` binary and puts it into the `bin` folder.
+
+#### Building docker images
+
+Run `make package-victoria-metrics`. It builds `victoriametrics/victoria-metrics:<PKG_TAG>` docker image locally.
+`<PKG_TAG>` is auto-generated image tag, which depends on source code in the repository.
+The `<PKG_TAG>` may be manually set via `PKG_TAG=foobar make package-victoria-metrics`.
+
+
+### Start with docker-compose
+
+[Docker-compose](https://github.com/VictoriaMetrics/VictoriaMetrics/blob/master/deployment/docker/docker-compose.yml)
+helps to spin up VictoriaMetrics, Prometheus and Grafana with one command.
+More details may be found [here](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/master/deployment/docker#folder-contains-basic-images-and-tools-for-building-and-running-victoria-metrics-in-docker).
+
+
+### Setting up service
+
+Read [these instructions](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/43) on how to set up VictoriaMetrics as a service in your OS.
+
+
+### Third-party contributions
+
+* [Unofficial yum repository](https://copr.fedorainfracloud.org/coprs/antonpatsev/VictoriaMetrics/) ([source code](https://github.com/patsevanton/victoriametrics-rpm))
+
+
 ### How to work with snapshots?

-VictoriaMetrics is able to create [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
+VictoriaMetrics can create [instant snapshots](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282)
 for all the data stored under `-storageDataPath` directory.
 Navigate to `http://<victoriametrics-addr>:8428/snapshot/create` in order to create an instant snapshot.
 The page will return the following JSON response:
@@ -400,15 +450,15 @@ VictoriaMetrics exports [Prometheus-compatible federation data](https://promethe
 at `http://<victoriametrics-addr>:8428/federate?match[]=<timeseries_selector_for_federation>`.

 Optional `start` and `end` args may be added to the request in order to scrape the last point for each selected time series on the `[start ... end]` interval.
-`start` and `end` may contain either unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values. By default the last point
-on the interval `[now - max_lookback ... now]` is scraped for each time series. Default value for `max_lookback` is `5m` (5 minutes), but can be overridden.
+`start` and `end` may contain either unix timestamp in seconds or [RFC3339](https://www.ietf.org/rfc/rfc3339.txt) values. By default, the last point
+on the interval `[now - max_lookback ... now]` is scraped for each time series. The default value for `max_lookback` is `5m` (5 minutes), but can be overridden.
 For instance, `/federate?match[]=up&max_lookback=1h` would return last points on the `[now - 1h ... now]` interval. This may be useful for time series federation
 with scrape intervals exceeding `5m`.


 ### Capacity planning

-Rough estimation of the required resources for ingestion path:
+A rough estimation of the required resources for ingestion path:

 * RAM size: less than 1KB per active time series. So, ~1GB of RAM is required for 1M active time series.
  Time series is considered active if new data points have been added to it recently or if it has been recently queried.
@@ -430,15 +480,15 @@ Rough estimation of the required resources for ingestion path:

 * Network usage: outbound traffic is negligible. Ingress traffic is ~100 bytes per ingested data point via
  [Prometheus remote_write API](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write).
-  The actual ingress bandwitdh usage depends on the average number of labels per ingested metric and the average size
-  of label values. Higher number of per-metric lables and longer label values mean higher ingress bandwidth.
+  The actual ingress bandwidth usage depends on the average number of labels per ingested metric and the average size
+  of label values. The higher number of per-metric labels and longer label values mean the higher ingress bandwidth.


 The required resources for query path:

 * RAM size: depends on the number of time series to scan in each query and the `step`
  argument passed to [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries).
-  Higher number of scanned time series and lower `step` argument results in higher RAM usage.
+  The higher number of scanned time series and lower `step` argument results in the higher RAM usage.

 * CPU cores: a CPU core per 30 millions of scanned data points per second.

@@ -494,7 +544,7 @@ There is no downsampling support at the moment, but:
 - VictoriaMetrics has good compression for on-disk data. See [this article](https://medium.com/@valyala/victoriametrics-achieving-better-compression-for-time-series-data-than-gorilla-317bc1f95932)
  for details.

-These properties reduce the need in downsampling. We plan implementing downsampling in the future.
+These properties reduce the need in downsampling. We plan to implement downsampling in the future.
 See [this issue](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/36) for details.


@@ -506,7 +556,7 @@ Single-node VictoriaMetrics doesn't support multi-tenancy. Use [cluster version]
 ### Scalability and cluster version

 Though single-node VictoriaMetrics cannot scale to multiple nodes, it is optimized for resource usage - storage size / bandwidth / IOPS, RAM, CPU.
-This means that a single-node VictoriaMetrics may scale vertically and substitute moderately sized cluster built with competing solutions
+This means that a single-node VictoriaMetrics may scale vertically and substitute a moderately sized cluster built with competing solutions
 such as Thanos, Uber M3, InfluxDB or TimescaleDB. See [vertical scalability benchmarks](https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae).

 So try single-node VictoriaMetrics at first and then [switch to cluster version](https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster) if you still need
@@ -522,7 +572,7 @@ on [Prometheus side](https://prometheus.io/docs/alerting/overview/) or on [Grafa

 ### Security

-Do not forget protecting sensitive endpoints in VictoriaMetrics when exposing it to untrusted networks such as internet.
+Do not forget protecting sensitive endpoints in VictoriaMetrics when exposing it to untrusted networks such as the internet.
 Consider setting the following command-line flags:

 * `-tls`, `-tlsCertFile` and `-tlsKeyFile` for switching from HTTP to HTTPS.
@@ -537,10 +587,10 @@ For example, substitute `-graphiteListenAddr=:2003` with `-graphiteListenAddr=<i

 ### Tuning

-* There is no need in VictoriaMetrics tuning, since it uses reasonable defaults for command-line flags,
+* There is no need in VictoriaMetrics tuning since it uses reasonable defaults for command-line flags,
  which are automatically adjusted for the available CPU and RAM resources.
-* There is no need in Operating System tuning, since VictoriaMetrics is optimized for default OS settings.
-  The only option is increasing the limit on [the number open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a),
+* There is no need in Operating System tuning since VictoriaMetrics is optimized for default OS settings.
+  The only option is increasing the limit on [the number of open files in the OS](https://medium.com/@muhammadtriwibowo/set-permanently-ulimit-n-open-files-in-ubuntu-4d61064429a),
  so Prometheus instances could establish more connections to VictoriaMetrics.


@@ -575,11 +625,26 @@ The most interesting metrics are:
  Another option is to increase `-memory.allowedPercent` command-line flag value. Be careful with this
  option, since too big value for `-memory.allowedPercent` may result in high I/O usage.

+* VictoriaMetrics requires free disk space for [merging data files to bigger ones](https://medium.com/@valyala/how-victoriametrics-makes-instant-snapshots-for-multi-terabyte-time-series-data-e1f3fb0e0282).
+  It may slow down when there is no enough free space left. So make sure `-storageDataPath` directory
+  has at least 20% of free space comparing to disk size.
+
 * If VictoriaMetrics doesn't work because of certain parts are corrupted due to disk errors,
  then just remove directoreis with broken parts. This will recover VictoriaMetrics at the cost
-  of data loss stored in the broken parts. In the future `vmrecover` tool will be created
+  of data loss stored in the broken parts. In the future, `vmrecover` tool will be created
  for automatic recovering from such errors.

+## Roadmap
+
+- [ ] Replication [#118](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/118)
+- [ ] Support of Object Storages (GCS, S3, Azure Storage) [#38](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/38)
+- [ ] Data downsampling [#36](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/36)
+- [ ] Alert Manager Integration [#119](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/119) 
+- [ ] CLI tool for data migration, re-balancing and adding/removing nodes [#103](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/103)
+
+
+The discussion happens [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129). Feel free to comment any item or add own one.
+

 ## Contacts

@@ -596,7 +661,7 @@ Feel free asking any questions regarding VictoriaMetrics:
 - [google groups](https://groups.google.com/forum/#!forum/victorametrics-users)


-If you like VictoriaMetrics and want contributing, then we need the following:
+If you like VictoriaMetrics and want to contribute, then we need the following:

 - Filing issues and feature requests [here](https://github.com/VictoriaMetrics/VictoriaMetrics/issues).
 - Spreading a word about VictoriaMetrics: conference talks, articles, comments, experience sharing with colleagues.
--- a/app/victoria-metrics/Makefile
+++ b/app/victoria-metrics/Makefile
@@ -21,7 +21,52 @@ run-victoria-metrics:
 	$(MAKE) run-via-docker

 victoria-metrics-arm:
-	CC=arm-linux-gnueabi-gcc CGO_ENABLED=1 GOARCH=arm GO111MODULE=on go build -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/victoria-metrics-arm ./app/victoria-metrics
+	CGO_ENABLED=0 GOOS=linux GOARCH=arm GO111MODULE=on go build -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/victoria-metrics-arm ./app/victoria-metrics
+
+victoria-metrics-arm-prod:
+	APP_NAME=victoria-metrics APP_SUFFIX='-arm' DOCKER_OPTS='--env CGO_ENABLED=0 --env GOARCH=arm' $(MAKE) app-via-docker

 victoria-metrics-arm64:
-	CC=aarch64-linux-gnu-gcc CGO_ENABLED=1 GOARCH=arm64 GO111MODULE=on go build -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/victoria-metrics-arm64 ./app/victoria-metrics
+	CGO_ENABLED=0 GOOS=linux GOARCH=arm64 GO111MODULE=on go build -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/victoria-metrics-arm64 ./app/victoria-metrics
+
+victoria-metrics-arm64-prod:
+	APP_NAME=victoria-metrics APP_SUFFIX='-arm64' DOCKER_OPTS='--env CGO_ENABLED=0 --env GOARCH=arm64' $(MAKE) app-via-docker
+
+victoria-metrics-pure:
+	GO111MODULE=on CGO_ENABLED=0 go build -mod=vendor -ldflags "$(GO_BUILDINFO)" -o bin/victoria-metrics-pure ./app/victoria-metrics
+
+victoria-metrics-pure-prod:
+	APP_NAME=victoria-metrics APP_SUFFIX='-pure' DOCKER_OPTS='--env CGO_ENABLED=0' $(MAKE) app-via-docker
+
+### Packaging as DEB - amd64
+victoria-metrics-package-deb: victoria-metrics-prod
+	./package/package_deb.sh amd64
+
+### Packaging as DEB - arm64
+victoria-metrics-package-deb-arm64: victoria-metrics-arm64-prod
+	./package/package_deb.sh arm64
+
+### Packaging as DEB - all
+victoria-metrics-package-deb-all: \
+        victoria-metrics-package-deb \
+        victoria-metrics-package-deb-arm64
+
+### Packaging as RPM - amd64
+victoria-metrics-package-rpm: victoria-metrics-prod
+	./package/package_rpm.sh amd64
+
+### Packaging as RPM - arm64
+victoria-metrics-package-rpm-arm64: victoria-metrics-arm64-prod
+	./package/package_rpm.sh arm64
+
+### Packaging as RPM - all
+victoria-metrics-package-rpm-all: \
+        victoria-metrics-package-rpm \
+        victoria-metrics-package-rpm-arm64
+
+### Packaging as both DEB and RPM - all
+victoria-metrics-package-deb-rpm-all: \
+        victoria-metrics-package-deb \
+        victoria-metrics-package-deb-arm64 \
+        victoria-metrics-package-rpm \
+        victoria-metrics-package-rpm-arm64
--- a/app/vminsert/concurrencylimiter/concurrencylimiter.go
+++ b/app/vminsert/concurrencylimiter/concurrencylimiter.go
@@ -32,6 +32,17 @@ func Init() {
 func Do(f func() error) error {
 	// Limit the number of conurrent f calls in order to prevent from excess
 	// memory usage and CPU trashing.
+	select {
+	case ch <- struct{}{}:
+		err := f()
+		<-ch
+		return err
+	default:
+	}
+
+	// All the workers are busy.
+	// Sleep for up to waitDuration.
+	concurrencyLimitReached.Inc()
 	t := timerpool.Get(waitDuration)
 	select {
 	case ch <- struct{}{}:
@@ -41,9 +52,19 @@ func Do(f func() error) error {
 		return err
 	case <-t.C:
 		timerpool.Put(t)
-		concurrencyLimitErrors.Inc()
+		concurrencyLimitTimeout.Inc()
 		return fmt.Errorf("the server is overloaded with %d concurrent inserts; either increase -maxConcurrentInserts or reduce the load", cap(ch))
 	}
 }

-var concurrencyLimitErrors = metrics.NewCounter(`vm_concurrency_limit_errors_total`)
+var (
+	concurrencyLimitReached = metrics.NewCounter(`vm_concurrent_insert_limit_reached_total`)
+	concurrencyLimitTimeout = metrics.NewCounter(`vm_concurrent_insert_limit_timeout_total`)
+
+	_ = metrics.NewGauge(`vm_concurrent_insert_capacity`, func() float64 {
+		return float64(cap(ch))
+	})
+	_ = metrics.NewGauge(`vm_concurrent_insert_current`, func() float64 {
+		return float64(len(ch))
+	})
+)
--- a/app/vminsert/graphite/parser.go
+++ b/app/vminsert/graphite/parser.go
@@ -114,6 +114,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag) ([]Row, []Tag, error) {
 			var err error
 			tagsPool, err = r.unmarshal(s, tagsPool)
 			if err != nil {
+				err = fmt.Errorf("cannot unmarshal Graphite line %q: %s", s, err)
 				return dst, tagsPool, err
 			}
 			return dst, tagsPool, nil
@@ -121,6 +122,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag) ([]Row, []Tag, error) {
 		var err error
 		tagsPool, err = r.unmarshal(s[:n], tagsPool)
 		if err != nil {
+			err = fmt.Errorf("cannot unmarshal Graphite line %q: %s", s[:n], err)
 			return dst, tagsPool, err
 		}
 		s = s[n+1:]
--- a/app/vminsert/graphite/request_handler.go
+++ b/app/vminsert/graphite/request_handler.go
@@ -14,7 +14,10 @@ import (
 	"github.com/VictoriaMetrics/metrics"
 )

-var rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="graphite"}`)
+var (
+	rowsInserted  = metrics.NewCounter(`vm_rows_inserted_total{type="graphite"}`)
+	rowsPerInsert = metrics.NewSummary(`vm_rows_per_insert{type="graphite"}`)
+)

 // insertHandler processes remote write for graphite plaintext protocol.
 //
@@ -51,6 +54,7 @@ func (ctx *pushCtx) InsertRows() error {
 		ic.WriteDataPoint(nil, ic.Labels, r.Timestamp, r.Value)
 	}
 	rowsInserted.Add(len(rows))
+	rowsPerInsert.Update(float64(len(rows)))
 	return ic.FlushBufs()
 }

--- a/app/vminsert/influx/parser.go
+++ b/app/vminsert/influx/parser.go
@@ -196,6 +196,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag, fieldsPool []Field) ([]R
 			var err error
 			tagsPool, fieldsPool, err = r.unmarshal(s, tagsPool, fieldsPool)
 			if err != nil {
+				err = fmt.Errorf("cannot unmarshal Influx line %q: %s", s, err)
 				return dst, tagsPool, fieldsPool, err
 			}
 			return dst, tagsPool, fieldsPool, nil
@@ -203,6 +204,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag, fieldsPool []Field) ([]R
 		var err error
 		tagsPool, fieldsPool, err = r.unmarshal(s[:n], tagsPool, fieldsPool)
 		if err != nil {
+			err = fmt.Errorf("cannot unmarshal Influx line %q: %s", s[:n], err)
 			return dst, tagsPool, fieldsPool, err
 		}
 		s = s[n+1:]
--- a/app/vminsert/influx/request_handler.go
+++ b/app/vminsert/influx/request_handler.go
@@ -22,7 +22,10 @@ var (
 	skipSingleField           = flag.Bool("influxSkipSingleField", false, "Uses `{measurement}` instead of `{measurement}{separator}{field_name}` for metic name if Influx line contains only a single field")
 )

-var rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="influx"}`)
+var (
+	rowsInserted  = metrics.NewCounter(`vm_rows_inserted_total{type="influx"}`)
+	rowsPerInsert = metrics.NewSummary(`vm_rows_per_insert{type="influx"}`)
+)

 // InsertHandler processes remote write for influx line protocol.
 //
@@ -84,6 +87,7 @@ func (ctx *pushCtx) InsertRows(db string) error {
 	}
 	ic := &ctx.Common
 	ic.Reset(rowsLen)
+	rowsTotal := 0
 	for i := range rows {
 		r := &rows[i]
 		ic.Labels = ic.Labels[:0]
@@ -109,8 +113,10 @@ func (ctx *pushCtx) InsertRows(db string) error {
 			ic.AddLabel("", metricGroup)
 			ic.WriteDataPoint(ctx.metricNameBuf, ic.Labels[:1], r.Timestamp, f.Value)
 		}
-		rowsInserted.Add(len(r.Fields))
+		rowsTotal += len(r.Fields)
 	}
+	rowsInserted.Add(rowsTotal)
+	rowsPerInsert.Update(float64(rowsTotal))
 	return ic.FlushBufs()
 }

@@ -164,6 +170,7 @@ func (ctx *pushCtx) Read(r io.Reader, tsMultiplier int64) bool {
 		}
 	} else if tsMultiplier < 0 {
 		tsMultiplier = -tsMultiplier
+		currentTs -= currentTs % tsMultiplier
 		for i := range ctx.Rows.Rows {
 			row := &ctx.Rows.Rows[i]
 			if row.Timestamp == 0 {
--- a/app/vminsert/opentsdb/parser.go
+++ b/app/vminsert/opentsdb/parser.go
@@ -111,6 +111,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag) ([]Row, []Tag, error) {
 			var err error
 			tagsPool, err = r.unmarshal(s, tagsPool)
 			if err != nil {
+				err = fmt.Errorf("cannot unmarshal OpenTSDB line %q: %s", s, err)
 				return dst, tagsPool, err
 			}
 			return dst, tagsPool, nil
@@ -118,6 +119,7 @@ func unmarshalRows(dst []Row, s string, tagsPool []Tag) ([]Row, []Tag, error) {
 		var err error
 		tagsPool, err = r.unmarshal(s[:n], tagsPool)
 		if err != nil {
+			err = fmt.Errorf("cannot unmarshal OpenTSDB line %q: %s", s[:n], err)
 			return dst, tagsPool, err
 		}
 		s = s[n+1:]
--- a/app/vminsert/opentsdb/request_handler.go
+++ b/app/vminsert/opentsdb/request_handler.go
@@ -14,7 +14,10 @@ import (
 	"github.com/VictoriaMetrics/metrics"
 )

-var rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="opentsdb"}`)
+var (
+	rowsInserted  = metrics.NewCounter(`vm_rows_inserted_total{type="opentsdb"}`)
+	rowsPerInsert = metrics.NewSummary(`vm_rows_per_insert{type="opentsdb"}`)
+)

 // insertHandler processes remote write for OpenTSDB put protocol.
 //
@@ -51,6 +54,7 @@ func (ctx *pushCtx) InsertRows() error {
 		ic.WriteDataPoint(nil, ic.Labels, r.Timestamp, r.Value)
 	}
 	rowsInserted.Add(len(rows))
+	rowsPerInsert.Update(float64(len(rows)))
 	return ic.FlushBufs()
 }

--- a/app/vminsert/prometheus/request_handler.go
+++ b/app/vminsert/prometheus/request_handler.go
@@ -12,7 +12,10 @@ import (
 	"github.com/VictoriaMetrics/metrics"
 )

-var rowsInserted = metrics.NewCounter(`vm_rows_inserted_total{type="prometheus"}`)
+var (
+	rowsInserted  = metrics.NewCounter(`vm_rows_inserted_total{type="prometheus"}`)
+	rowsPerInsert = metrics.NewSummary(`vm_rows_per_insert{type="prometheus"}`)
+)

 // InsertHandler processes remote write for prometheus.
 func InsertHandler(r *http.Request, maxSize int64) error {
@@ -34,6 +37,7 @@ func insertHandlerInternal(r *http.Request, maxSize int64) error {
 	}
 	ic := &ctx.Common
 	ic.Reset(rowsLen)
+	rowsTotal := 0
 	for i := range timeseries {
 		ts := &timeseries[i]
 		var metricNameRaw []byte
@@ -41,8 +45,10 @@ func insertHandlerInternal(r *http.Request, maxSize int64) error {
 			r := &ts.Samples[i]
 			metricNameRaw = ic.WriteDataPointExt(metricNameRaw, ts.Labels, r.Timestamp, r.Value)
 		}
-		rowsInserted.Add(len(ts.Samples))
+		rowsTotal += len(ts.Samples)
 	}
+	rowsInserted.Add(rowsTotal)
+	rowsPerInsert.Update(float64(rowsTotal))
 	return ic.FlushBufs()
 }

--- a/app/vmselect/main.go
+++ b/app/vmselect/main.go
@@ -30,29 +30,49 @@ func Init() {
 	fs.RemoveDirContents(tmpDirPath)
 	netstorage.InitTmpBlocksDir(tmpDirPath)
 	promql.InitRollupResultCache(*vmstorage.DataPath + "/cache/rollupResult")
+
 	concurrencyCh = make(chan struct{}, *maxConcurrentRequests)
 }

-var concurrencyCh chan struct{}
-
 // Stop stops vmselect
 func Stop() {
 	promql.StopRollupResultCache()
 }

+var concurrencyCh chan struct{}
+
+var (
+	concurrencyLimitReached = metrics.NewCounter(`vm_concurrent_select_limit_reached_total`)
+	concurrencyLimitTimeout = metrics.NewCounter(`vm_concurrent_select_limit_timeout_total`)
+
+	_ = metrics.NewGauge(`vm_concurrent_select_capacity`, func() float64 {
+		return float64(cap(concurrencyCh))
+	})
+	_ = metrics.NewGauge(`vm_concurrent_select_current`, func() float64 {
+		return float64(len(concurrencyCh))
+	})
+)
+
 // RequestHandler handles remote read API requests for Prometheus
 func RequestHandler(w http.ResponseWriter, r *http.Request) bool {
 	// Limit the number of concurrent queries.
-	// Sleep for a while until giving up. This should resolve short bursts in requests.
-	t := timerpool.Get(*maxQueueDuration)
 	select {
 	case concurrencyCh <- struct{}{}:
-		timerpool.Put(t)
 		defer func() { <-concurrencyCh }()
-	case <-t.C:
-		timerpool.Put(t)
-		httpserver.Errorf(w, "cannot handle more than %d concurrent requests", cap(concurrencyCh))
-		return true
+	default:
+		// Sleep for a while until giving up. This should resolve short bursts in requests.
+		concurrencyLimitReached.Inc()
+		t := timerpool.Get(*maxQueueDuration)
+		select {
+		case concurrencyCh <- struct{}{}:
+			timerpool.Put(t)
+			defer func() { <-concurrencyCh }()
+		case <-t.C:
+			timerpool.Put(t)
+			concurrencyLimitTimeout.Inc()
+			httpserver.Errorf(w, "cannot handle more than %d concurrent requests", cap(concurrencyCh))
+			return true
+		}
 	}

 	path := strings.Replace(r.URL.Path, "//", "/", -1)
--- a/app/vmselect/netstorage/netstorage.go
+++ b/app/vmselect/netstorage/netstorage.go
@@ -49,8 +49,9 @@ func (r *Result) reset() {

 // Results holds results returned from ProcessSearchQuery.
 type Results struct {
-	tr       storage.TimeRange
-	deadline Deadline
+	tr        storage.TimeRange
+	fetchData bool
+	deadline  Deadline

 	tbf *tmpBlocksFile

@@ -103,10 +104,10 @@ func (rss *Results) RunParallel(f func(rs *Result, workerID uint)) error {
 					err = fmt.Errorf("timeout exceeded during query execution: %s", rss.deadline.Timeout)
 					break
 				}
-				if err = pts.Unpack(rss.tbf, rs, rss.tr, maxWorkersCount); err != nil {
+				if err = pts.Unpack(rss.tbf, rs, rss.tr, rss.fetchData, maxWorkersCount); err != nil {
 					break
 				}
-				if len(rs.Timestamps) == 0 {
+				if len(rs.Timestamps) == 0 && rss.fetchData {
 					// Skip empty blocks.
 					continue
 				}
@@ -149,7 +150,7 @@ type packedTimeseries struct {
 }

 // Unpack unpacks pts to dst.
-func (pts *packedTimeseries) Unpack(tbf *tmpBlocksFile, dst *Result, tr storage.TimeRange, maxWorkersCount int) error {
+func (pts *packedTimeseries) Unpack(tbf *tmpBlocksFile, dst *Result, tr storage.TimeRange, fetchData bool, maxWorkersCount int) error {
 	dst.reset()

 	if err := dst.MetricName.Unmarshal(bytesutil.ToUnsafeBytes(pts.metricName)); err != nil {
@@ -176,7 +177,7 @@ func (pts *packedTimeseries) Unpack(tbf *tmpBlocksFile, dst *Result, tr storage.
 			var err error
 			for addr := range workCh {
 				sb := getSortBlock()
-				if err = sb.unpackFrom(tbf, addr, tr); err != nil {
+				if err = sb.unpackFrom(tbf, addr, tr, fetchData); err != nil {
 					break
 				}

@@ -295,10 +296,12 @@ func (sb *sortBlock) reset() {
 	sb.NextIdx = 0
 }

-func (sb *sortBlock) unpackFrom(tbf *tmpBlocksFile, addr tmpBlockAddr, tr storage.TimeRange) error {
+func (sb *sortBlock) unpackFrom(tbf *tmpBlocksFile, addr tmpBlockAddr, tr storage.TimeRange, fetchData bool) error {
 	tbf.MustReadBlockAt(&sb.b, addr)
-	if err := sb.b.UnmarshalData(); err != nil {
-		return fmt.Errorf("cannot unmarshal block: %s", err)
+	if fetchData {
+		if err := sb.b.UnmarshalData(); err != nil {
+			return fmt.Errorf("cannot unmarshal block: %s", err)
+		}
 	}
 	timestamps := sb.b.Timestamps()

@@ -460,7 +463,7 @@ var ssPool sync.Pool
 var missingMetricNamesForMetricID = metrics.NewCounter(`vm_missing_metric_names_for_metric_id_total`)

 // ProcessSearchQuery performs sq on storage nodes until the given deadline.
-func ProcessSearchQuery(sq *storage.SearchQuery, deadline Deadline) (*Results, error) {
+func ProcessSearchQuery(sq *storage.SearchQuery, fetchData bool, deadline Deadline) (*Results, error) {
 	// Setup search.
 	tfss, err := setupTfss(sq.TagFilterss)
 	if err != nil {
@@ -476,35 +479,38 @@ func ProcessSearchQuery(sq *storage.SearchQuery, deadline Deadline) (*Results, e

 	sr := getStorageSearch()
 	defer putStorageSearch(sr)
-	sr.Init(vmstorage.Storage, tfss, tr, *maxMetricsPerSearch)
+	sr.Init(vmstorage.Storage, tfss, tr, fetchData, *maxMetricsPerSearch)

 	tbf := getTmpBlocksFile()
 	m := make(map[string][]tmpBlockAddr)
+	blocksRead := 0
 	for sr.NextMetricBlock() {
+		blocksRead++
 		addr, err := tbf.WriteBlock(sr.MetricBlock.Block)
 		if err != nil {
 			putTmpBlocksFile(tbf)
-			return nil, fmt.Errorf("cannot write data to temporary blocks file: %s", err)
+			return nil, fmt.Errorf("cannot write data block #%d to temporary blocks file: %s", blocksRead, err)
 		}
 		if time.Until(deadline.Deadline) < 0 {
 			putTmpBlocksFile(tbf)
-			return nil, fmt.Errorf("timeout exceeded while fetching data from storage: %s", deadline.Timeout)
+			return nil, fmt.Errorf("timeout exceeded while fetching data block #%d from storage: %s", blocksRead, deadline.Timeout)
 		}
 		metricName := sr.MetricBlock.MetricName
 		m[string(metricName)] = append(m[string(metricName)], addr)
 	}
 	if err := sr.Error(); err != nil {
 		putTmpBlocksFile(tbf)
-		return nil, fmt.Errorf("search error: %s", err)
+		return nil, fmt.Errorf("search error after reading %d data blocks: %s", blocksRead, err)
 	}
 	if err := tbf.Finalize(); err != nil {
 		putTmpBlocksFile(tbf)
-		return nil, fmt.Errorf("cannot finalize temporary blocks file: %s", err)
+		return nil, fmt.Errorf("cannot finalize temporary blocks file with %d blocks: %s", blocksRead, err)
 	}

 	var rss Results
 	rss.packedTimeseries = make([]packedTimeseries, len(m))
 	rss.tr = tr
+	rss.fetchData = fetchData
 	rss.deadline = deadline
 	rss.tbf = tbf
 	i := 0
--- a/app/vmselect/prometheus/prometheus.go
+++ b/app/vmselect/prometheus/prometheus.go
@@ -6,12 +6,15 @@ import (
 	"math"
 	"net/http"
 	"runtime"
+	"sort"
 	"strconv"
 	"strings"
+	"sync"
 	"time"

 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/netstorage"
 	"github.com/VictoriaMetrics/VictoriaMetrics/app/vmselect/promql"
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
 	"github.com/VictoriaMetrics/VictoriaMetrics/lib/storage"
 	"github.com/VictoriaMetrics/metrics"
 	"github.com/valyala/quicktemplate"
@@ -65,7 +68,7 @@ func FederateHandler(w http.ResponseWriter, r *http.Request) error {
 		MaxTimestamp: end,
 		TagFilterss:  tagFilterss,
 	}
-	rss, err := netstorage.ProcessSearchQuery(sq, deadline)
+	rss, err := netstorage.ProcessSearchQuery(sq, true, deadline)
 	if err != nil {
 		return fmt.Errorf("cannot fetch data for %q: %s", sq, err)
 	}
@@ -157,7 +160,7 @@ func exportHandler(w http.ResponseWriter, matches []string, start, end int64, fo
 		MaxTimestamp: end,
 		TagFilterss:  tagFilterss,
 	}
-	rss, err := netstorage.ProcessSearchQuery(sq, deadline)
+	rss, err := netstorage.ProcessSearchQuery(sq, true, deadline)
 	if err != nil {
 		return fmt.Errorf("cannot fetch data for %q: %s", sq, err)
 	}
@@ -230,9 +233,39 @@ var deleteDuration = metrics.NewSummary(`vm_request_duration_seconds{path="/api/
 func LabelValuesHandler(labelName string, w http.ResponseWriter, r *http.Request) error {
 	startTime := time.Now()
 	deadline := getDeadline(r)
-	labelValues, err := netstorage.GetLabelValues(labelName, deadline)
-	if err != nil {
-		return fmt.Errorf(`cannot obtain label values for %q: %s`, labelName, err)
+
+	if err := r.ParseForm(); err != nil {
+		return fmt.Errorf("cannot parse form values: %s", err)
+	}
+	var labelValues []string
+	if len(r.Form["match[]"]) == 0 && len(r.Form["start"]) == 0 && len(r.Form["end"]) == 0 {
+		var err error
+		labelValues, err = netstorage.GetLabelValues(labelName, deadline)
+		if err != nil {
+			return fmt.Errorf(`cannot obtain label values for %q: %s`, labelName, err)
+		}
+	} else {
+		// Extended functionality that allows filtering by label filters and time range
+		// i.e. /api/v1/label/foo/values?match[]=foobar{baz="abc"}&start=...&end=...
+		// is equivalent to `label_values(foobar{baz="abc"}, foo)` call on the selected
+		// time range in Grafana templating.
+		matches := r.Form["match[]"]
+		if len(matches) == 0 {
+			matches = []string{fmt.Sprintf("{%s!=''}", labelName)}
+		}
+		ct := currentTime()
+		end, err := getTime(r, "end", ct)
+		if err != nil {
+			return err
+		}
+		start, err := getTime(r, "start", end-defaultStep)
+		if err != nil {
+			return err
+		}
+		labelValues, err = labelValuesWithMatches(labelName, matches, start, end, deadline)
+		if err != nil {
+			return fmt.Errorf("cannot obtain label values for %q, match[]=%q, start=%d, end=%d: %s", labelName, matches, start, end, err)
+		}
 	}

 	w.Header().Set("Content-Type", "application/json")
@@ -241,6 +274,50 @@ func LabelValuesHandler(labelName string, w http.ResponseWriter, r *http.Request
 	return nil
 }

+func labelValuesWithMatches(labelName string, matches []string, start, end int64, deadline netstorage.Deadline) ([]string, error) {
+	if len(matches) == 0 {
+		logger.Panicf("BUG: matches must be non-empty")
+	}
+	tagFilterss, err := getTagFilterssFromMatches(matches)
+	if err != nil {
+		return nil, err
+	}
+	if start >= end {
+		start = end - defaultStep
+	}
+	sq := &storage.SearchQuery{
+		MinTimestamp: start,
+		MaxTimestamp: end,
+		TagFilterss:  tagFilterss,
+	}
+	rss, err := netstorage.ProcessSearchQuery(sq, false, deadline)
+	if err != nil {
+		return nil, fmt.Errorf("cannot fetch data for %q: %s", sq, err)
+	}
+
+	m := make(map[string]struct{})
+	var mLock sync.Mutex
+	err = rss.RunParallel(func(rs *netstorage.Result, workerID uint) {
+		labelValue := rs.MetricName.GetTagValue(labelName)
+		if len(labelValue) == 0 {
+			return
+		}
+		mLock.Lock()
+		m[string(labelValue)] = struct{}{}
+		mLock.Unlock()
+	})
+	if err != nil {
+		return nil, fmt.Errorf("error when data fetching: %s", err)
+	}
+
+	labelValues := make([]string, 0, len(m))
+	for labelValue := range m {
+		labelValues = append(labelValues, labelValue)
+	}
+	sort.Strings(labelValues)
+	return labelValues, nil
+}
+
 var labelValuesDuration = metrics.NewSummary(`vm_request_duration_seconds{path="/api/v1/label/{}/values"}`)

 // LabelsCountHandler processes /api/v1/labels/count request.
@@ -309,13 +386,16 @@ func SeriesHandler(w http.ResponseWriter, r *http.Request) error {
 	if len(matches) == 0 {
 		return fmt.Errorf("missing `match[]` arg")
 	}
-	// Set start to minTimeMsecs by default as Prometheus does.
-	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/91
-	start, err := getTime(r, "start", minTimeMsecs)
+	end, err := getTime(r, "end", ct)
 	if err != nil {
 		return err
 	}
-	end, err := getTime(r, "end", ct)
+	// Do not set start to minTimeMsecs by default as Prometheus does,
+	// since this leads to fetching and scanning all the data from the storage,
+	// which can take a lot of time for big storages.
+	// It is better setting start as end-defaultStep by default.
+	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/91
+	start, err := getTime(r, "start", end-defaultStep)
 	if err != nil {
 		return err
 	}
@@ -333,7 +413,7 @@ func SeriesHandler(w http.ResponseWriter, r *http.Request) error {
 		MaxTimestamp: end,
 		TagFilterss:  tagFilterss,
 	}
-	rss, err := netstorage.ProcessSearchQuery(sq, deadline)
+	rss, err := netstorage.ProcessSearchQuery(sq, false, deadline)
 	if err != nil {
 		return fmt.Errorf("cannot fetch data for %q: %s", sq, err)
 	}
--- a/app/vmselect/promql/aggr.go
+++ b/app/vmselect/promql/aggr.go
@@ -312,7 +312,11 @@ func aggrFuncCount(tss []*timeseries) []*timeseries {
 			}
 			count++
 		}
-		dst.Values[i] = float64(count)
+		v := float64(count)
+		if count == 0 {
+			v = nan
+		}
+		dst.Values[i] = v
 	}
 	return tss[:1]
 }
--- a/app/vmselect/promql/aggr_incremental.go
+++ b/app/vmselect/promql/aggr_incremental.go
@@ -348,7 +348,12 @@ func mergeAggrCount(dst, src *incrementalAggrContext) {
 }

 func finalizeAggrCount(iac *incrementalAggrContext) {
-	// Nothing to do
+	dstValues := iac.ts.Values
+	for i, v := range dstValues {
+		if v == 0 {
+			dstValues[i] = nan
+		}
+	}
 }

 func updateAggrSum2(iac *incrementalAggrContext, values []float64) {
--- a/app/vmselect/promql/aggr_incremental_test.go
+++ b/app/vmselect/promql/aggr_incremental_test.go
@@ -81,7 +81,7 @@ func TestIncrementalAggr(t *testing.T) {
 	})
 	t.Run("count", func(t *testing.T) {
 		t.Parallel()
-		valuesExpected := []float64{6, 0, 5, 5}
+		valuesExpected := []float64{6, nan, 5, 5}
 		f("count", valuesExpected)
 	})
 	t.Run("sum2", func(t *testing.T) {
--- a/app/vmselect/promql/eval.go
+++ b/app/vmselect/promql/eval.go
@@ -466,31 +466,19 @@ func evalRollupFuncWithSubquery(ec *EvalConfig, name string, rf rollupFunc, re *
 	preFunc, rcs := getRollupConfigs(name, rf, ec.Start, ec.End, ec.Step, window, sharedTimestamps)
 	tss := make([]*timeseries, 0, len(tssSQ)*len(rcs))
 	var tssLock sync.Mutex
+	removeMetricGroup := !rollupFuncsKeepMetricGroup[name]
 	doParallel(tssSQ, func(tsSQ *timeseries, values []float64, timestamps []int64) ([]float64, []int64) {
 		values, timestamps = removeNanValues(values[:0], timestamps[:0], tsSQ.Values, tsSQ.Timestamps)
 		preFunc(values, timestamps)
 		for _, rc := range rcs {
 			var ts timeseries
-			ts.MetricName.CopyFrom(&tsSQ.MetricName)
-			if len(rc.TagValue) > 0 {
-				ts.MetricName.AddTag("rollup", rc.TagValue)
-			}
-			ts.Values = rc.Do(ts.Values[:0], values, timestamps)
-			ts.Timestamps = sharedTimestamps
-			ts.denyReuse = true
-
+			doRollupForTimeseries(rc, &ts, &tsSQ.MetricName, values, timestamps, sharedTimestamps, removeMetricGroup)
 			tssLock.Lock()
 			tss = append(tss, &ts)
 			tssLock.Unlock()
 		}
 		return values, timestamps
 	})
-	if !rollupFuncsKeepMetricGroup[name] {
-		tss = copyTimeseriesMetricNames(tss)
-		for _, ts := range tss {
-			ts.MetricName.ResetMetricGroup()
-		}
-	}
 	return tss, nil
 }

@@ -582,7 +570,7 @@ func evalRollupFuncWithMetricExpr(ec *EvalConfig, name string, rf rollupFunc, me
 		MaxTimestamp: ec.End + ec.Step,
 		TagFilterss:  [][]storage.TagFilter{me.TagFilters},
 	}
-	rss, err := netstorage.ProcessSearchQuery(sq, ec.Deadline)
+	rss, err := netstorage.ProcessSearchQuery(sq, true, ec.Deadline)
 	if err != nil {
 		return nil, err
 	}
@@ -614,22 +602,16 @@ func evalRollupFuncWithMetricExpr(ec *EvalConfig, name string, rf rollupFunc, me
 	defer rml.Put(uint64(rollupMemorySize))

 	// Evaluate rollup
+	removeMetricGroup := !rollupFuncsKeepMetricGroup[name]
 	var tss []*timeseries
 	if iafc != nil {
-		tss, err = evalRollupWithIncrementalAggregate(iafc, rss, rcs, preFunc, sharedTimestamps)
+		tss, err = evalRollupWithIncrementalAggregate(iafc, rss, rcs, preFunc, sharedTimestamps, removeMetricGroup)
 	} else {
-		tss, err = evalRollupNoIncrementalAggregate(rss, rcs, preFunc, sharedTimestamps)
+		tss, err = evalRollupNoIncrementalAggregate(rss, rcs, preFunc, sharedTimestamps, removeMetricGroup)
 	}
 	if err != nil {
 		return nil, err
 	}
-
-	if !rollupFuncsKeepMetricGroup[name] {
-		tss = copyTimeseriesMetricNames(tss)
-		for _, ts := range tss {
-			ts.MetricName.ResetMetricGroup()
-		}
-	}
 	tss = mergeTimeseries(tssCached, tss, start, ec)
 	rollupResultCacheV.Put(name, ec, me, iafc, window, tss)

@@ -649,21 +631,19 @@ func getRollupMemoryLimiter() *memoryLimiter {
 }

 func evalRollupWithIncrementalAggregate(iafc *incrementalAggrFuncContext, rss *netstorage.Results, rcs []*rollupConfig,
-	preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64) ([]*timeseries, error) {
+	preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64, removeMetricGroup bool) ([]*timeseries, error) {
 	err := rss.RunParallel(func(rs *netstorage.Result, workerID uint) {
 		preFunc(rs.Values, rs.Timestamps)
 		ts := getTimeseries()
 		defer putTimeseries(ts)
 		for _, rc := range rcs {
 			ts.Reset()
-			ts.MetricName.CopyFrom(&rs.MetricName)
-			if len(rc.TagValue) > 0 {
-				ts.MetricName.AddTag("rollup", rc.TagValue)
-			}
-			ts.Values = rc.Do(ts.Values[:0], rs.Values, rs.Timestamps)
-			ts.Timestamps = sharedTimestamps
+			doRollupForTimeseries(rc, ts, &rs.MetricName, rs.Values, rs.Timestamps, sharedTimestamps, removeMetricGroup)
 			iafc.updateTimeseries(ts, workerID)
+
+			// ts.Timestamps points to sharedTimestamps. Zero it, so it can be re-used.
 			ts.Timestamps = nil
+			ts.denyReuse = false
 		}
 	})
 	if err != nil {
@@ -674,21 +654,14 @@ func evalRollupWithIncrementalAggregate(iafc *incrementalAggrFuncContext, rss *n
 }

 func evalRollupNoIncrementalAggregate(rss *netstorage.Results, rcs []*rollupConfig,
-	preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64) ([]*timeseries, error) {
+	preFunc func(values []float64, timestamps []int64), sharedTimestamps []int64, removeMetricGroup bool) ([]*timeseries, error) {
 	tss := make([]*timeseries, 0, rss.Len()*len(rcs))
 	var tssLock sync.Mutex
 	err := rss.RunParallel(func(rs *netstorage.Result, workerID uint) {
 		preFunc(rs.Values, rs.Timestamps)
 		for _, rc := range rcs {
 			var ts timeseries
-			ts.MetricName.CopyFrom(&rs.MetricName)
-			if len(rc.TagValue) > 0 {
-				ts.MetricName.AddTag("rollup", rc.TagValue)
-			}
-			ts.Values = rc.Do(ts.Values[:0], rs.Values, rs.Timestamps)
-			ts.Timestamps = sharedTimestamps
-			ts.denyReuse = true
-
+			doRollupForTimeseries(rc, &ts, &rs.MetricName, rs.Values, rs.Timestamps, sharedTimestamps, removeMetricGroup)
 			tssLock.Lock()
 			tss = append(tss, &ts)
 			tssLock.Unlock()
@@ -700,6 +673,20 @@ func evalRollupNoIncrementalAggregate(rss *netstorage.Results, rcs []*rollupConf
 	return tss, nil
 }

+func doRollupForTimeseries(rc *rollupConfig, tsDst *timeseries, mnSrc *storage.MetricName, valuesSrc []float64, timestampsSrc []int64,
+	sharedTimestamps []int64, removeMetricGroup bool) {
+	tsDst.MetricName.CopyFrom(mnSrc)
+	if len(rc.TagValue) > 0 {
+		tsDst.MetricName.AddTag("rollup", rc.TagValue)
+	}
+	if removeMetricGroup {
+		tsDst.MetricName.ResetMetricGroup()
+	}
+	tsDst.Values = rc.Do(tsDst.Values[:0], valuesSrc, timestampsSrc)
+	tsDst.Timestamps = sharedTimestamps
+	tsDst.denyReuse = true
+}
+
 func getRollupConfigs(name string, rf rollupFunc, start, end, step, window int64, sharedTimestamps []int64) (func(values []float64, timestamps []int64), []*rollupConfig) {
 	preFunc := func(values []float64, timestamps []int64) {}
 	if rollupFuncsRemoveCounterResets[name] {
--- a/app/vmselect/promql/exec.go
+++ b/app/vmselect/promql/exec.go
@@ -16,6 +16,8 @@ import (

 var logSlowQueryDuration = flag.Duration("search.logSlowQueryDuration", 5*time.Second, "Log queries with execution time exceeding this value. Zero disables slow query logging")

+var slowQueries = metrics.NewCounter(`vm_slow_queries_total`)
+
 // ExpandWithExprs expands WITH expressions inside q and returns the resulting
 // PromQL without WITH expressions.
 func ExpandWithExprs(q string) (string, error) {
@@ -36,6 +38,7 @@ func Exec(ec *EvalConfig, q string, isFirstPointOnly bool) ([]netstorage.Result,
 			if d >= *logSlowQueryDuration {
 				logger.Infof("slow query according to -search.logSlowQueryDuration=%s: duration=%s, start=%d, end=%d, step=%d, query=%q",
 					*logSlowQueryDuration, d, ec.Start/1000, ec.End/1000, ec.Step/1000, q)
+				slowQueries.Inc()
 			}
 		}()
 	}
--- a/app/vmselect/promql/exec_test.go
+++ b/app/vmselect/promql/exec_test.go
@@ -2158,6 +2158,26 @@ func TestExecSuccess(t *testing.T) {
 		resultExpected := []netstorage.Result{r1, r2}
 		f(q, resultExpected)
 	})
+	t.Run(`histogram_quantile(negative-bucket-count)`, func(t *testing.T) {
+		t.Parallel()
+		q := `sort(histogram_quantile(0.6,
+			label_set(90, "foo", "bar", "le", "10")
+			or label_set(-100, "foo", "bar", "le", "30")
+			or label_set(300, "foo", "bar", "le", "+Inf")
+		))`
+		resultExpected := []netstorage.Result{}
+		f(q, resultExpected)
+	})
+	t.Run(`histogram_quantile(nan-bucket-count)`, func(t *testing.T) {
+		t.Parallel()
+		q := `sort(histogram_quantile(0.6,
+			label_set(90, "foo", "bar", "le", "10")
+			or label_set(NaN, "foo", "bar", "le", "30")
+			or label_set(300, "foo", "bar", "le", "+Inf")
+		))`
+		resultExpected := []netstorage.Result{}
+		f(q, resultExpected)
+	})
 	t.Run(`median_over_time()`, func(t *testing.T) {
 		t.Parallel()
 		q := `median_over_time({})`
@@ -2343,10 +2363,10 @@ func TestExecSuccess(t *testing.T) {
 	})
 	t.Run(`count(multi-vector)`, func(t *testing.T) {
 		t.Parallel()
-		q := `count(label_set(10, "foo", "bar") or label_set((15-time()/100)^0.5, "baz", "sss"))`
+		q := `count(label_set(time()<1500, "foo", "bar") or label_set(time()<1800, "baz", "sss"))`
 		r := netstorage.Result{
 			MetricName: metricNameExpected,
-			Values:     []float64{2, 2, 2, 1, 1, 1},
+			Values:     []float64{2, 2, 2, 1, nan, nan},
 			Timestamps: timestampsExpected,
 		}
 		resultExpected := []netstorage.Result{r}
--- a/app/vmselect/promql/rollup.go
+++ b/app/vmselect/promql/rollup.go
@@ -54,10 +54,13 @@ var rollupFuncs = map[string]newRollupFunc{
 }

 var rollupFuncsMayAdjustWindow = map[string]bool{
-	"deriv":      true,
-	"deriv_fast": true,
-	"irate":      true,
-	"rate":       true,
+	"default_rollup":  true,
+	"first_over_time": true,
+	"last_over_time":  true,
+	"deriv":           true,
+	"deriv_fast":      true,
+	"irate":           true,
+	"rate":            true,
 }

 var rollupFuncsRemoveCounterResets = map[string]bool{
@@ -228,10 +231,27 @@ func getMaxPrevInterval(timestamps []int64) int64 {
 	}
 	d := (timestamps[len(timestamps)-1] - timestamps[0]) / int64(len(timestamps)-1)
 	if d <= 0 {
-		return 1
+		return int64(maxSilenceInterval)
 	}
-	// Slightly increase d in order to handle possible jitter in scrape interval.
-	return d + (d / 16)
+	// Increase d more for smaller scrape intervals in order to hide possible gaps
+	// when high jitter is present.
+	// See https://github.com/VictoriaMetrics/VictoriaMetrics/issues/139 .
+	if d <= 2*1000 {
+		return d + 4*d
+	}
+	if d <= 4*1000 {
+		return d + 2*d
+	}
+	if d <= 8*1000 {
+		return d + d
+	}
+	if d <= 16*1000 {
+		return d + d/2
+	}
+	if d <= 32*1000 {
+		return d + d/4
+	}
+	return d + d/8
 }

 func removeCounterResets(values []float64) {
--- a/app/vmselect/promql/rollup_test.go
+++ b/app/vmselect/promql/rollup_test.go
@@ -347,7 +347,7 @@ func TestRollupNoWindowNoPoints(t *testing.T) {
 		}
 		rc.Timestamps = getTimestamps(rc.Start, rc.End, rc.Step)
 		values := rc.Do(nil, testValues, testTimestamps)
-		valuesExpected := []float64{2, 0, 0, 0, 0, 0, 0, nan}
+		valuesExpected := []float64{2, 0, 0, 0, 0, 0, 0, 0}
 		timestampsExpected := []int64{120, 124, 128, 132, 136, 140, 144, 148}
 		testRowsEqual(t, values, rc.Timestamps, valuesExpected, timestampsExpected)
 	})
@@ -371,15 +371,15 @@ func TestRollupWindowNoPoints(t *testing.T) {
 	t.Run("afterEnd", func(t *testing.T) {
 		rc := rollupConfig{
 			Func:   rollupFirst,
-			Start:  141,
-			End:    171,
+			Start:  161,
+			End:    191,
 			Step:   10,
 			Window: 3,
 		}
 		rc.Timestamps = getTimestamps(rc.Start, rc.End, rc.Step)
 		values := rc.Do(nil, testValues, testTimestamps)
-		valuesExpected := []float64{34, nan, nan, nan}
-		timestampsExpected := []int64{141, 151, 161, 171}
+		valuesExpected := []float64{34, 34, 34, nan}
+		timestampsExpected := []int64{161, 171, 181, 191}
 		testRowsEqual(t, values, rc.Timestamps, valuesExpected, timestampsExpected)
 	})
 }
@@ -454,7 +454,7 @@ func TestRollupWindowPartialPoints(t *testing.T) {
 		}
 		rc.Timestamps = getTimestamps(rc.Start, rc.End, rc.Step)
 		values := rc.Do(nil, testValues, testTimestamps)
-		valuesExpected := []float64{44, 34, 34, nan}
+		valuesExpected := []float64{44, 34, 34, 34}
 		timestampsExpected := []int64{100, 120, 140, 160}
 		testRowsEqual(t, values, rc.Timestamps, valuesExpected, timestampsExpected)
 	})
--- a/app/vmselect/promql/transform.go
+++ b/app/vmselect/promql/transform.go
@@ -324,6 +324,14 @@ func transformHistogramQuantile(tfa *transformFuncArg) ([]*timeseries, error) {
 		if math.IsNaN(phi) {
 			return nan
 		}
+		// Verify for broken buckets with NaN or negative values.
+		for _, xs := range xss {
+			v := xs.ts.Values[i]
+			if math.IsNaN(v) || v < 0 {
+				// Broken bucket.
+				return nan
+			}
+		}
 		if phi < 0 {
 			return -inf
 		}
@@ -334,10 +342,6 @@ func transformHistogramQuantile(tfa *transformFuncArg) ([]*timeseries, error) {
 		for _, xs := range xss {
 			v := xs.ts.Values[i]
 			le := xs.le
-			if v <= vPrev {
-				v = vPrev
-				le = lePrev
-			}
 			if v < vReq {
 				vPrev = v
 				lePrev = le
--- a/app/vmstorage/main.go
+++ b/app/vmstorage/main.go
@@ -358,6 +358,29 @@ func registerStorageMetrics() {
 		return float64(idbm().SizeBytes)
 	})

+	metrics.NewGauge(`vm_rows_ignored_total{reason="big_timestamp"}`, func() float64 {
+		return float64(m().TooBigTimestampRows)
+	})
+	metrics.NewGauge(`vm_rows_ignored_total{reason="small_timestamp"}`, func() float64 {
+		return float64(m().TooSmallTimestampRows)
+	})
+
+	metrics.NewGauge(`vm_concurrent_addrows_limit_reached_total`, func() float64 {
+		return float64(m().AddRowsConcurrencyLimitReached)
+	})
+	metrics.NewGauge(`vm_concurrent_addrows_limit_timeout_total`, func() float64 {
+		return float64(m().AddRowsConcurrencyLimitTimeout)
+	})
+	metrics.NewGauge(`vm_concurrent_addrows_dropped_rows_total`, func() float64 {
+		return float64(m().AddRowsConcurrencyDroppedRows)
+	})
+	metrics.NewGauge(`vm_concurrent_addrows_capacity`, func() float64 {
+		return float64(m().AddRowsConcurrencyCapacity)
+	})
+	metrics.NewGauge(`vm_concurrent_addrows_current`, func() float64 {
+		return float64(m().AddRowsConcurrencyCurrent)
+	})
+
 	metrics.NewGauge(`vm_rows{type="storage/big"}`, func() float64 {
 		return float64(tm().BigRowsCount)
 	})
--- a/dashboards/victoriametrics.json
+++ b/dashboards/victoriametrics.json
@@ -60,12 +60,12 @@
      }
    ]
  },
-  "description": "Overview for single node VictoriaMetrics",
+  "description": "Overview for single node VictoriaMetrics v1.22.2 or higher",
  "editable": true,
  "gnetId": 10229,
  "graphTooltip": 0,
  "id": null,
-  "iteration": 1562261153865,
+  "iteration": 1563651131627,
  "links": [
    {
      "icon": "doc",
@@ -1693,7 +1693,7 @@
      },
      "id": 46,
      "panels": [],
-      "title": "Go runtime",
+      "title": "Resource usage",
      "type": "row"
    },
    {
@@ -1805,7 +1805,6 @@
      "bars": false,
      "dashLength": 10,
      "dashes": false,
-      "datasource": "${DS_PROMETHEUS}",
      "fill": 1,
      "gridPos": {
        "h": 8,
@@ -1813,13 +1812,13 @@
        "x": 12,
        "y": 69
      },
-      "id": 42,
+      "id": 57,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
-        "show": false,
+        "show": true,
        "total": false,
        "values": false
      },
@@ -1838,10 +1837,10 @@
      "steppedLine": false,
      "targets": [
        {
-          "expr": "sum(rate(go_gc_duration_seconds_sum{job=\"$job\"}[2m]))",
+          "expr": "rate(process_cpu_seconds_total{job=\"$job\"}[5m])",
          "format": "time_series",
-          "intervalFactor": 2,
-          "legendFormat": "gc duration",
+          "intervalFactor": 1,
+          "legendFormat": "Rate 5m",
          "refId": "A"
        }
      ],
@@ -1849,7 +1848,7 @@
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
-      "title": "GC duration",
+      "title": "CPU",
      "tooltip": {
        "shared": true,
        "sort": 0,
@@ -1865,7 +1864,7 @@
      },
      "yaxes": [
        {
-          "format": "s",
+          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
@@ -1986,6 +1985,92 @@
        "x": 12,
        "y": 77
      },
+      "id": 42,
+      "legend": {
+        "avg": false,
+        "current": false,
+        "max": false,
+        "min": false,
+        "show": false,
+        "total": false,
+        "values": false
+      },
+      "lines": true,
+      "linewidth": 1,
+      "links": [],
+      "nullPointMode": "null",
+      "options": {},
+      "percentage": false,
+      "pointradius": 2,
+      "points": false,
+      "renderer": "flot",
+      "seriesOverrides": [],
+      "spaceLength": 10,
+      "stack": false,
+      "steppedLine": false,
+      "targets": [
+        {
+          "expr": "sum(rate(go_gc_duration_seconds_sum{job=\"$job\"}[2m]))",
+          "format": "time_series",
+          "intervalFactor": 2,
+          "legendFormat": "gc duration",
+          "refId": "A"
+        }
+      ],
+      "thresholds": [],
+      "timeFrom": null,
+      "timeRegions": [],
+      "timeShift": null,
+      "title": "GC duration",
+      "tooltip": {
+        "shared": true,
+        "sort": 0,
+        "value_type": "individual"
+      },
+      "type": "graph",
+      "xaxis": {
+        "buckets": null,
+        "mode": "time",
+        "name": null,
+        "show": true,
+        "values": []
+      },
+      "yaxes": [
+        {
+          "format": "s",
+          "label": null,
+          "logBase": 1,
+          "max": null,
+          "min": null,
+          "show": true
+        },
+        {
+          "format": "short",
+          "label": null,
+          "logBase": 1,
+          "max": null,
+          "min": null,
+          "show": true
+        }
+      ],
+      "yaxis": {
+        "align": false,
+        "alignLevel": null
+      }
+    },
+    {
+      "aliasColors": {},
+      "bars": false,
+      "dashLength": 10,
+      "dashes": false,
+      "datasource": "${DS_PROMETHEUS}",
+      "fill": 1,
+      "gridPos": {
+        "h": 8,
+        "w": 12,
+        "x": 0,
+        "y": 85
+      },
      "id": 48,
      "legend": {
        "avg": false,
@@ -2011,7 +2096,7 @@
      "steppedLine": false,
      "targets": [
        {
-          "expr": "sum(go_threads{job=\"$job\"})",
+          "expr": "sum(process_num_threads{job=\"$job\"})",
          "format": "time_series",
          "intervalFactor": 2,
          "legendFormat": "threads",
@@ -2145,5 +2230,5 @@
  "timezone": "",
  "title": "VictoriaMetrics",
  "uid": "wNf0q_kZk",
-  "version": 1
+  "version": 2
 }
--- a/deployment/docker/Makefile
+++ b/deployment/docker/Makefile
@@ -1,5 +1,5 @@
 DOCKER_NAMESPACE := victoriametrics
-BUILDER_IMAGE := local/builder:go1.12.6
+BUILDER_IMAGE := local/builder:go1.12.7
 CERTS_IMAGE := local/certs:1.0.2

 package-certs:
@@ -19,8 +19,9 @@ app-via-docker: package-certs package-builder
 		--mount type=bind,src="$(shell pwd)/gocache-for-docker",dst=/gocache \
 		--env GOCACHE=/gocache \
 		--env GO111MODULE=on \
+		$(DOCKER_OPTS) \
 		$(BUILDER_IMAGE) \
-		go build $(RACE) -mod=vendor -ldflags "-s -w -extldflags '-static' $(GO_BUILDINFO)" -tags 'netgo osusergo' -o bin/$(APP_NAME)-prod $(PKG_PREFIX)/app/$(APP_NAME)
+		go build $(RACE) -mod=vendor -ldflags "-s -w -extldflags '-static' $(GO_BUILDINFO)" -tags 'netgo osusergo' -o bin/$(APP_NAME)$(APP_SUFFIX)-prod $(PKG_PREFIX)/app/$(APP_NAME)

 package-via-docker:
 	(docker image ls --format '{{.Repository}}:{{.Tag}}' | grep -q '$(DOCKER_NAMESPACE)/$(APP_NAME):$(PKG_TAG)$(RACE)') || (\
--- a/deployment/docker/builder/Dockerfile
+++ b/deployment/docker/builder/Dockerfile
@@ -1,2 +1,2 @@
-FROM golang:1.12.6
+FROM golang:1.12.7
 STOPSIGNAL SIGINT
--- a/go.mod
+++ b/go.mod
@@ -2,15 +2,17 @@ module github.com/VictoriaMetrics/VictoriaMetrics

 require (
 	github.com/VictoriaMetrics/fastcache v1.5.1
-	github.com/VictoriaMetrics/metrics v1.7.0
+	github.com/VictoriaMetrics/metrics v1.7.1
 	github.com/cespare/xxhash/v2 v2.0.1-0.20190104013014-3767db7a7e18
 	github.com/golang/snappy v0.0.1
+	github.com/google/go-cmp v0.3.0 // indirect
+	github.com/klauspost/compress v1.7.5
 	github.com/spaolacci/murmur3 v1.1.0 // indirect
 	github.com/valyala/fastjson v1.4.1
 	github.com/valyala/gozstd v1.5.1
 	github.com/valyala/histogram v1.0.1
 	github.com/valyala/quicktemplate v1.1.1
-	golang.org/x/sys v0.0.0-20190616124812-15dcb6c0061f
+	golang.org/x/sys v0.0.0-20190804053845-51ab0e2deafa
 )

 go 1.12
--- a/go.sum
+++ b/go.sum
@@ -3,8 +3,8 @@ github.com/OneOfOne/xxhash v1.2.5 h1:zl/OfRA6nftbBK9qTohYBJ5xvw6C/oNKizR7cZGl3cI
 github.com/OneOfOne/xxhash v1.2.5/go.mod h1:eZbhyaAYD41SGSSsnmcpxVoRiQ/MPUTjUdIIOT9Um7Q=
 github.com/VictoriaMetrics/fastcache v1.5.1 h1:qHgHjyoNFV7jgucU8QZUuU4gcdhfs8QW1kw68OD2Lag=
 github.com/VictoriaMetrics/fastcache v1.5.1/go.mod h1:+jv9Ckb+za/P1ZRg/sulP5Ni1v49daAVERr0H3CuscE=
-github.com/VictoriaMetrics/metrics v1.7.0 h1:+bdBpPEMOSgOwoQFf4KHqgeAy6xiXn/uzlrKx2YSCT8=
-github.com/VictoriaMetrics/metrics v1.7.0/go.mod h1:LU2j9qq7xqZYXz8tF3/RQnB2z2MbZms5TDiIg9/NHiQ=
+github.com/VictoriaMetrics/metrics v1.7.1 h1:g2qrY6Upn8rvlvR40cGHFY0crwi4hpqF0n9vJMNsCSg=
+github.com/VictoriaMetrics/metrics v1.7.1/go.mod h1:LU2j9qq7xqZYXz8tF3/RQnB2z2MbZms5TDiIg9/NHiQ=
 github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156 h1:eMwmnE/GDgah4HI848JfFxHt+iPb26b4zyfspmqY0/8=
 github.com/allegro/bigcache v1.2.1-0.20190218064605-e24eb225f156/go.mod h1:Cb/ax3seSYIx7SuZdm2G2xzfwmv3TPSk2ucNfQESPXM=
 github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko=
@@ -16,9 +16,14 @@ github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/golang/snappy v0.0.1 h1:Qgr9rKW7uDUkrbSmQeiDsGa8SjGyCOGtuasMWwvp2P4=
 github.com/golang/snappy v0.0.1/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
+github.com/google/go-cmp v0.3.0 h1:crn/baboCvb5fXaQ0IJ1SGTsTVrWpDsCWC8EGETZijY=
+github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
 github.com/klauspost/compress v1.4.0/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
 github.com/klauspost/compress v1.4.1/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
+github.com/klauspost/compress v1.7.5 h1:NMapGoDIKPKpk2hpcgAU6XHfsREHG2p8PIg7C3f/jpI=
+github.com/klauspost/compress v1.7.5/go.mod h1:RyIbtBH6LamlWaDj8nUwkbUhJ87Yi3uG0guNDohfE1A=
 github.com/klauspost/cpuid v0.0.0-20180405133222-e7e905edc00e/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
+github.com/klauspost/cpuid v1.2.0 h1:NMpwD2G9JSFOE1/TJjGSo5zG7Yb2bTe7eq1jH+irmeE=
 github.com/klauspost/cpuid v1.2.0/go.mod h1:Pj4uuM528wm8OyEC2QMXAi2YiTZ96dNQPGgoMS4s3ek=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
@@ -44,5 +49,5 @@ github.com/valyala/quicktemplate v1.1.1 h1:C58y/wN0FMTi2PR0n3onltemfFabany53j7M6
 github.com/valyala/quicktemplate v1.1.1/go.mod h1:EH+4AkTd43SvgIbQHYu59/cJyxDoOVRUAfrukLPuGJ4=
 github.com/valyala/tcplisten v0.0.0-20161114210144-ceec8f93295a/go.mod h1:v3UYOV9WzVtRmSR+PDvWpU/qWl4Wa5LApYYX4ZtKbio=
 golang.org/x/net v0.0.0-20180911220305-26e67e76b6c3/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
-golang.org/x/sys v0.0.0-20190616124812-15dcb6c0061f h1:25KHgbfyiSm6vwQLbM3zZIe1v9p/3ea4Rz+nnM5K/i4=
-golang.org/x/sys v0.0.0-20190616124812-15dcb6c0061f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
+golang.org/x/sys v0.0.0-20190804053845-51ab0e2deafa h1:KIDDMLT1O0Nr7TSxp8xM5tJcdn8tgyAONntO829og1M=
+golang.org/x/sys v0.0.0-20190804053845-51ab0e2deafa/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
--- a/lib/decimal/decimal_test.go
+++ b/lib/decimal/decimal_test.go
@@ -108,60 +108,60 @@ func testCalibrateScale(t *testing.T, a, b []int64, ae, be int16, aExpected, bEx
 }

 func TestMaxUpExponent(t *testing.T) {
-	testMaxUpExponent(t, 0, 1024)
-	testMaxUpExponent(t, -1<<63, 0)
-	testMaxUpExponent(t, (-1<<63)+1, 0)
-	testMaxUpExponent(t, (1<<63)-1, 0)
-	testMaxUpExponent(t, 1, 18)
-	testMaxUpExponent(t, 12, 17)
-	testMaxUpExponent(t, 123, 16)
-	testMaxUpExponent(t, 1234, 15)
-	testMaxUpExponent(t, 12345, 14)
-	testMaxUpExponent(t, 123456, 13)
-	testMaxUpExponent(t, 1234567, 12)
-	testMaxUpExponent(t, 12345678, 11)
-	testMaxUpExponent(t, 123456789, 10)
-	testMaxUpExponent(t, 1234567890, 9)
-	testMaxUpExponent(t, 12345678901, 8)
-	testMaxUpExponent(t, 123456789012, 7)
-	testMaxUpExponent(t, 1234567890123, 6)
-	testMaxUpExponent(t, 12345678901234, 5)
-	testMaxUpExponent(t, 123456789012345, 4)
-	testMaxUpExponent(t, 1234567890123456, 3)
-	testMaxUpExponent(t, 12345678901234567, 2)
-	testMaxUpExponent(t, 123456789012345678, 1)
-	testMaxUpExponent(t, 1234567890123456789, 0)
-	testMaxUpExponent(t, 923456789012345678, 0)
-	testMaxUpExponent(t, 92345678901234567, 1)
-	testMaxUpExponent(t, 9234567890123456, 2)
-	testMaxUpExponent(t, 923456789012345, 3)
-	testMaxUpExponent(t, 92345678901234, 4)
-	testMaxUpExponent(t, 9234567890123, 5)
-	testMaxUpExponent(t, 923456789012, 6)
-	testMaxUpExponent(t, 92345678901, 7)
-	testMaxUpExponent(t, 9234567890, 8)
-	testMaxUpExponent(t, 923456789, 9)
-	testMaxUpExponent(t, 92345678, 10)
-	testMaxUpExponent(t, 9234567, 11)
-	testMaxUpExponent(t, 923456, 12)
-	testMaxUpExponent(t, 92345, 13)
-	testMaxUpExponent(t, 9234, 14)
-	testMaxUpExponent(t, 923, 15)
-	testMaxUpExponent(t, 92, 17)
-	testMaxUpExponent(t, 9, 18)
-}
+	f := func(v int64, eExpected int16) {
+		t.Helper()

-func testMaxUpExponent(t *testing.T, v int64, eExpected int16) {
-	t.Helper()
+		e := maxUpExponent(v)
+		if e != eExpected {
+			t.Fatalf("unexpected e for v=%d; got %d; epxecting %d", v, e, eExpected)
+		}
+		e = maxUpExponent(-v)
+		if e != eExpected {
+			t.Fatalf("unexpected e for v=%d; got %d; expecting %d", -v, e, eExpected)
+		}
+	}

-	e := maxUpExponent(v)
-	if e != eExpected {
-		t.Fatalf("unexpected e for v=%d; got %d; epxecting %d", v, e, eExpected)
-	}
-	e = maxUpExponent(-v)
-	if e != eExpected {
-		t.Fatalf("unexpected e for v=%d; got %d; expecting %d", -v, e, eExpected)
-	}
+	f(0, 1024)
+	f(-1<<63, 0)
+	f((-1<<63)+1, 0)
+	f((1<<63)-1, 0)
+	f(1, 18)
+	f(12, 17)
+	f(123, 16)
+	f(1234, 15)
+	f(12345, 14)
+	f(123456, 13)
+	f(1234567, 12)
+	f(12345678, 11)
+	f(123456789, 10)
+	f(1234567890, 9)
+	f(12345678901, 8)
+	f(123456789012, 7)
+	f(1234567890123, 6)
+	f(12345678901234, 5)
+	f(123456789012345, 4)
+	f(1234567890123456, 3)
+	f(12345678901234567, 2)
+	f(123456789012345678, 1)
+	f(1234567890123456789, 0)
+	f(923456789012345678, 0)
+	f(92345678901234567, 1)
+	f(9234567890123456, 2)
+	f(923456789012345, 3)
+	f(92345678901234, 4)
+	f(9234567890123, 5)
+	f(923456789012, 6)
+	f(92345678901, 7)
+	f(9234567890, 8)
+	f(923456789, 9)
+	f(92345678, 10)
+	f(9234567, 11)
+	f(923456, 12)
+	f(92345, 13)
+	f(9234, 14)
+	f(923, 15)
+	f(92, 17)
+	f(9, 18)
 }

 func TestAppendFloatToDecimal(t *testing.T) {
@@ -207,83 +207,103 @@ func testAppendFloatToDecimal(t *testing.T, fa []float64, daExpected []int64, eE
 }

 func TestFloatToDecimal(t *testing.T) {
-	testFloatToDecimal(t, 0, 0, 0)
-	testFloatToDecimal(t, 1, 1, 0)
-	testFloatToDecimal(t, -1, -1, 0)
-	testFloatToDecimal(t, 0.9, 9, -1)
-	testFloatToDecimal(t, 0.99, 99, -2)
-	testFloatToDecimal(t, 9, 9, 0)
-	testFloatToDecimal(t, 99, 99, 0)
-	testFloatToDecimal(t, 20, 2, 1)
-	testFloatToDecimal(t, 100, 1, 2)
-	testFloatToDecimal(t, 3000, 3, 3)
-
-	testFloatToDecimal(t, 0.123, 123, -3)
-	testFloatToDecimal(t, -0.123, -123, -3)
-	testFloatToDecimal(t, 1.2345, 12345, -4)
-	testFloatToDecimal(t, -1.2345, -12345, -4)
-	testFloatToDecimal(t, 12000, 12, 3)
-	testFloatToDecimal(t, -12000, -12, 3)
-	testFloatToDecimal(t, 1e-30, 1, -30)
-	testFloatToDecimal(t, -1e-30, -1, -30)
-	testFloatToDecimal(t, 1e-260, 1, -260)
-	testFloatToDecimal(t, -1e-260, -1, -260)
-	testFloatToDecimal(t, 321e260, 321, 260)
-	testFloatToDecimal(t, -321e260, -321, 260)
-	testFloatToDecimal(t, 1234567890123, 1234567890123, 0)
-	testFloatToDecimal(t, -1234567890123, -1234567890123, 0)
-	testFloatToDecimal(t, 123e5, 123, 5)
-	testFloatToDecimal(t, 15e18, 15, 18)
-
-	testFloatToDecimal(t, math.Inf(1), vInfPos, 0)
-	testFloatToDecimal(t, math.Inf(-1), vInfNeg, 0)
-	testFloatToDecimal(t, 1<<63-1, 922337203685, 7)
-	testFloatToDecimal(t, -1<<63, -922337203685, 7)
-}
-
-func testFloatToDecimal(t *testing.T, f float64, vExpected int64, eExpected int16) {
-	t.Helper()
-
-	v, e := FromFloat(f)
-	if v != vExpected {
-		t.Fatalf("unexpected v for f=%e; got %d; expecting %d", f, v, vExpected)
-	}
-	if e != eExpected {
-		t.Fatalf("unexpected e for f=%e; got %d; expecting %d", f, e, eExpected)
+	f := func(f float64, vExpected int64, eExpected int16) {
+		t.Helper()
+		v, e := FromFloat(f)
+		if v != vExpected {
+			t.Fatalf("unexpected v for f=%e; got %d; expecting %d", f, v, vExpected)
+		}
+		if e != eExpected {
+			t.Fatalf("unexpected e for f=%e; got %d; expecting %d", f, e, eExpected)
+		}
 	}
+
+	f(0, 0, 0)
+	f(1, 1, 0)
+	f(-1, -1, 0)
+	f(0.9, 9, -1)
+	f(0.99, 99, -2)
+	f(9, 9, 0)
+	f(99, 99, 0)
+	f(20, 2, 1)
+	f(100, 1, 2)
+	f(3000, 3, 3)
+
+	f(0.123, 123, -3)
+	f(-0.123, -123, -3)
+	f(1.2345, 12345, -4)
+	f(-1.2345, -12345, -4)
+	f(12000, 12, 3)
+	f(-12000, -12, 3)
+	f(1e-30, 1, -30)
+	f(-1e-30, -1, -30)
+	f(1e-260, 1, -260)
+	f(-1e-260, -1, -260)
+	f(321e260, 321, 260)
+	f(-321e260, -321, 260)
+	f(1234567890123, 1234567890123, 0)
+	f(-1234567890123, -1234567890123, 0)
+	f(123e5, 123, 5)
+	f(15e18, 15, 18)
+
+	f(math.Inf(1), vInfPos, 0)
+	f(math.Inf(-1), vInfNeg, 0)
+	f(1<<63-1, 922337203685, 7)
+	f(-1<<63, -922337203685, 7)
+
+	// Test precision loss due to conversionPrecision.
+	f(0.1234567890123456, 12345678901234, -14)
+	f(-123456.7890123456, -12345678901234, -8)
 }

 func TestFloatToDecimalRoundtrip(t *testing.T) {
-	testFloatToDecimalRoundtrip(t, 0)
-	testFloatToDecimalRoundtrip(t, 1)
-	testFloatToDecimalRoundtrip(t, 0.123)
-	testFloatToDecimalRoundtrip(t, 1.2345)
-	testFloatToDecimalRoundtrip(t, 12000)
-	testFloatToDecimalRoundtrip(t, 1e-30)
-	testFloatToDecimalRoundtrip(t, 1e-260)
-	testFloatToDecimalRoundtrip(t, 321e260)
-	testFloatToDecimalRoundtrip(t, 1234567890123)
-	testFloatToDecimalRoundtrip(t, 12.34567890125)
-	testFloatToDecimalRoundtrip(t, 15e18)
+	f := func(f float64) {
+		t.Helper()

-	testFloatToDecimalRoundtrip(t, math.Inf(1))
-	testFloatToDecimalRoundtrip(t, math.Inf(-1))
-	testFloatToDecimalRoundtrip(t, 1<<63-1)
-	testFloatToDecimalRoundtrip(t, -1<<63)
+		v, e := FromFloat(f)
+		fNew := ToFloat(v, e)
+		if !equalFloat(fNew, f) {
+			t.Fatalf("unexpected fNew for v=%d, e=%d; got %g; expecting %g", v, e, fNew, f)
+		}
+
+		v, e = FromFloat(-f)
+		fNew = ToFloat(v, e)
+		if !equalFloat(fNew, -f) {
+			t.Fatalf("unexepcted fNew for v=%d, e=%d; got %g; expecting %g", v, e, fNew, -f)
+		}
+	}
+
+	f(0)
+	f(1)
+	f(0.123)
+	f(1.2345)
+	f(12000)
+	f(1e-30)
+	f(1e-260)
+	f(321e260)
+	f(1234567890123)
+	f(12.34567890125)
+	f(-1234567.8901256789)
+	f(15e18)
+
+	f(math.Inf(1))
+	f(math.Inf(-1))
+	f(1<<63 - 1)
+	f(-1 << 63)

 	for i := 0; i < 1e4; i++ {
-		f := rand.NormFloat64()
-		testFloatToDecimalRoundtrip(t, f)
-		testFloatToDecimalRoundtrip(t, f*1e-6)
-		testFloatToDecimalRoundtrip(t, f*1e6)
+		v := rand.NormFloat64()
+		f(v)
+		f(v * 1e-6)
+		f(v * 1e6)

-		testFloatToDecimalRoundtrip(t, roundFloat(f, 20))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, 10))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, 5))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, 0))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, -5))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, -10))
-		testFloatToDecimalRoundtrip(t, roundFloat(f, -20))
+		f(roundFloat(v, 20))
+		f(roundFloat(v, 10))
+		f(roundFloat(v, 5))
+		f(roundFloat(v, 0))
+		f(roundFloat(v, -5))
+		f(roundFloat(v, -10))
+		f(roundFloat(v, -20))
 	}
 }

@@ -292,22 +312,6 @@ func roundFloat(f float64, exp int) float64 {
 	return math.Trunc(f) * math.Pow10(exp)
 }

-func testFloatToDecimalRoundtrip(t *testing.T, f float64) {
-	t.Helper()
-
-	v, e := FromFloat(f)
-	fNew := ToFloat(v, e)
-	if !equalFloat(fNew, f) {
-		t.Fatalf("unexpected fNew for v=%d, e=%d; got %g; expecting %g", v, e, fNew, f)
-	}
-
-	v, e = FromFloat(-f)
-	fNew = ToFloat(v, e)
-	if !equalFloat(fNew, -f) {
-		t.Fatalf("unexepcted fNew for v=%d, e=%d; got %g; expecting %g", v, e, fNew, -f)
-	}
-}
-
 func equalFloat(f1, f2 float64) bool {
 	if math.IsInf(f1, 0) {
 		return math.IsInf(f1, 1) == math.IsInf(f2, 1) || math.IsInf(f1, -1) == math.IsInf(f2, -1)
--- a/lib/encoding/compress.go
+++ b/lib/encoding/compress.go
@@ -1,8 +1,8 @@
 package encoding

 import (
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/encoding/zstd"
 	"github.com/VictoriaMetrics/metrics"
-	"github.com/valyala/gozstd"
 )

 // CompressZSTDLevel appends compressed src to dst and returns
@@ -13,7 +13,7 @@ func CompressZSTDLevel(dst, src []byte, compressLevel int) []byte {
 	compressCalls.Inc()
 	originalBytes.Add(len(src))
 	dstLen := len(dst)
-	dst = gozstd.CompressLevel(dst, src, compressLevel)
+	dst = zstd.CompressLevel(dst, src, compressLevel)
 	compressedBytes.Add(len(dst) - dstLen)
 	return dst
 }
@@ -22,7 +22,7 @@ func CompressZSTDLevel(dst, src []byte, compressLevel int) []byte {
 // the appended dst.
 func DecompressZSTD(dst, src []byte) ([]byte, error) {
 	decompressCalls.Inc()
-	return gozstd.Decompress(dst, src)
+	return zstd.Decompress(dst, src)
 }

 var (
--- a/lib/encoding/encoding.go
+++ b/lib/encoding/encoding.go
@@ -296,7 +296,7 @@ func isDeltaConst(a []int64) bool {
 // i.e. arbitrary changing values.
 //
 // It is OK if a few gauges aren't detected (i.e. detected as counters),
-// since misdetected counters as gauges are much worse condition.
+// since misdetected counters as gauges leads to worser compression ratio.
 func isGauge(a []int64) bool {
 	// Check all the items in a, since a part of items may lead
 	// to incorrect gauge detection.
@@ -305,32 +305,36 @@ func isGauge(a []int64) bool {
 		return false
 	}

-	extremes := 0
-	plus := a[0] <= a[1]
-	v1 := a[1]
-	for _, v2 := range a[2:] {
-		if plus {
-			if v2 < v1 {
-				extremes++
-				plus = false
-			}
-		} else {
-			if v2 > v1 {
-				extremes++
-				plus = true
-			}
-		}
-		v1 = v2
+	resets := 0
+	vPrev := a[0]
+	if vPrev < 0 {
+		// Counter values cannot be negative.
+		return true
 	}
-	if extremes <= 2 {
-		// Probably counter reset.
+	for _, v := range a[1:] {
+		if v < vPrev {
+			if v < 0 {
+				// Counter values cannot be negative.
+				return true
+			}
+			if v > (vPrev >> 3) {
+				// Decreasing sequence detected.
+				// This is a gauge.
+				return true
+			}
+			// Possible counter reset.
+			resets++
+		}
+		vPrev = v
+	}
+	if resets <= 2 {
+		// Counter with a few resets.
 		return false
 	}

-	// A few extremes may indicate counter resets.
-	// Let it be a gauge if extremes exceed len(a)/32,
-	// otherwise assume counter reset.
-	return extremes > (len(a) >> 5)
+	// Let it be a gauge if resets exceeds len(a)/8,
+	// otherwise assume counter.
+	return resets > (len(a) >> 3)
 }

 func getCompressLevel(itemsCount int) int {
--- a/lib/encoding/encoding_cgo_test.go
+++ b/lib/encoding/encoding_cgo_test.go
@@ -0,0 +1,83 @@
+// +build cgo
+
+package encoding
+
+import (
+	"math/rand"
+	"testing"
+)
+
+func TestMarshalUnmarshalInt64Array(t *testing.T) {
+	var va []int64
+	var v int64
+
+	// Verify nearest delta encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 8*1024; i++ {
+		v += int64(rand.NormFloat64() * 1e6)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 23; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta)
+	}
+	for precisionBits := uint8(23); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
+	}
+
+	// Verify nearest delta2 encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 8*1024; i++ {
+		v += 30e6 + int64(rand.NormFloat64()*1e6)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 24; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta2)
+	}
+	for precisionBits := uint8(24); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
+	}
+
+	// Verify nearest delta encoding.
+	va = va[:0]
+	v = 1000
+	for i := 0; i < 6; i++ {
+		v += int64(rand.NormFloat64() * 100)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
+	}
+
+	// Verify nearest delta2 encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 6; i++ {
+		v += 3000 + int64(rand.NormFloat64()*100)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(5); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
+	}
+}
+
+func TestMarshalInt64ArraySize(t *testing.T) {
+	var va []int64
+	v := int64(rand.Float64() * 1e9)
+	for i := 0; i < 8*1024; i++ {
+		va = append(va, v)
+		v += 30e3 + int64(rand.NormFloat64()*1e3)
+	}
+
+	testMarshalInt64ArraySize(t, va, 1, 500, 1300)
+	testMarshalInt64ArraySize(t, va, 2, 500, 1400)
+	testMarshalInt64ArraySize(t, va, 3, 800, 1800)
+	testMarshalInt64ArraySize(t, va, 4, 1300, 2100)
+	testMarshalInt64ArraySize(t, va, 5, 2000, 3200)
+	testMarshalInt64ArraySize(t, va, 6, 3000, 4800)
+	testMarshalInt64ArraySize(t, va, 7, 4000, 6400)
+	testMarshalInt64ArraySize(t, va, 8, 6000, 8000)
+	testMarshalInt64ArraySize(t, va, 9, 7000, 8800)
+	testMarshalInt64ArraySize(t, va, 10, 8000, 10000)
+}
--- a/lib/encoding/encoding_pure_test.go
+++ b/lib/encoding/encoding_pure_test.go
@@ -0,0 +1,83 @@
+// +build !cgo
+
+package encoding
+
+import (
+	"math/rand"
+	"testing"
+)
+
+func TestMarshalUnmarshalInt64Array(t *testing.T) {
+	var va []int64
+	var v int64
+
+	// Verify nearest delta encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 8*1024; i++ {
+		v += int64(rand.NormFloat64() * 1e6)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 17; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta)
+	}
+	for precisionBits := uint8(23); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
+	}
+
+	// Verify nearest delta2 encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 8*1024; i++ {
+		v += 30e6 + int64(rand.NormFloat64()*1e6)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 15; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta2)
+	}
+	for precisionBits := uint8(24); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
+	}
+
+	// Verify nearest delta encoding.
+	va = va[:0]
+	v = 1000
+	for i := 0; i < 6; i++ {
+		v += int64(rand.NormFloat64() * 100)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(1); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
+	}
+
+	// Verify nearest delta2 encoding.
+	va = va[:0]
+	v = 0
+	for i := 0; i < 6; i++ {
+		v += 3000 + int64(rand.NormFloat64()*100)
+		va = append(va, v)
+	}
+	for precisionBits := uint8(5); precisionBits < 65; precisionBits++ {
+		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
+	}
+}
+
+func TestMarshalInt64ArraySize(t *testing.T) {
+	var va []int64
+	v := int64(rand.Float64() * 1e9)
+	for i := 0; i < 8*1024; i++ {
+		va = append(va, v)
+		v += 30e3 + int64(rand.NormFloat64()*1e3)
+	}
+
+	testMarshalInt64ArraySize(t, va, 1, 500, 1700)
+	testMarshalInt64ArraySize(t, va, 2, 600, 1800)
+	testMarshalInt64ArraySize(t, va, 3, 900, 2100)
+	testMarshalInt64ArraySize(t, va, 4, 1300, 2200)
+	testMarshalInt64ArraySize(t, va, 5, 2000, 3300)
+	testMarshalInt64ArraySize(t, va, 6, 3000, 5000)
+	testMarshalInt64ArraySize(t, va, 7, 4000, 6500)
+	testMarshalInt64ArraySize(t, va, 8, 6000, 8000)
+	testMarshalInt64ArraySize(t, va, 9, 7000, 8800)
+	testMarshalInt64ArraySize(t, va, 10, 8000, 17000)
+}
--- a/lib/encoding/encoding_test.go
+++ b/lib/encoding/encoding_test.go
@@ -43,27 +43,29 @@ func TestIsDeltaConst(t *testing.T) {
 }

 func TestIsGauge(t *testing.T) {
-	testIsGauge(t, []int64{}, false)
-	testIsGauge(t, []int64{0}, false)
-	testIsGauge(t, []int64{1, 2}, false)
-	testIsGauge(t, []int64{0, 1, 2, 3, 4, 5}, false)
-	testIsGauge(t, []int64{0, -1, -2, -3, -4}, false)
-	testIsGauge(t, []int64{0, 0, 0, 0, 0, 0, 0}, false)
-	testIsGauge(t, []int64{1, 1, 1, 1, 1}, false)
-	testIsGauge(t, []int64{1, 1, 2, 2, 2, 2}, false)
-	testIsGauge(t, []int64{1, 5, 2, 3}, false) // a single counter reset
-	testIsGauge(t, []int64{1, 5, 2, 3, 2}, true)
-	testIsGauge(t, []int64{-1, -5, -2, -3}, false) // a single counter reset
-	testIsGauge(t, []int64{-1, -5, -2, -3, -2}, true)
-}
-
-func testIsGauge(t *testing.T, a []int64, okExpected bool) {
-	t.Helper()
-
-	ok := isGauge(a)
-	if ok != okExpected {
-		t.Fatalf("unexpected result for isGauge(%d); got %v; expecting %v", a, ok, okExpected)
+	f := func(a []int64, okExpected bool) {
+		t.Helper()
+		ok := isGauge(a)
+		if ok != okExpected {
+			t.Fatalf("unexpected result for isGauge(%d); got %v; expecting %v", a, ok, okExpected)
+		}
 	}
+	f([]int64{}, false)
+	f([]int64{0}, false)
+	f([]int64{1, 2}, false)
+	f([]int64{0, 1, 2, 3, 4, 5}, false)
+	f([]int64{0, -1, -2, -3, -4}, true)
+	f([]int64{0, 0, 0, 0, 0, 0, 0}, false)
+	f([]int64{1, 1, 1, 1, 1}, false)
+	f([]int64{1, 1, 2, 2, 2, 2}, false)
+	f([]int64{1, 17, 2, 3}, false) // a single counter reset
+	f([]int64{1, 5, 2, 3}, true)
+	f([]int64{1, 5, 2, 3, 2}, true)
+	f([]int64{-1, -5, -2, -3}, true)
+	f([]int64{-1, -5, -2, -3, -2}, true)
+	f([]int64{5, 6, 4, 3, 2}, true)
+	f([]int64{4, 5, 6, 5, 4, 3, 2}, true)
+	f([]int64{1064, 1132, 1083, 1062, 856, 747}, true)
 }

 func TestEnsureNonDecreasingSequence(t *testing.T) {
@@ -87,73 +89,6 @@ func testEnsureNonDecreasingSequence(t *testing.T, a []int64, vMin, vMax int64,
 	}
 }

-func TestMarshalUnmarshalInt64Array(t *testing.T) {
-	testMarshalUnmarshalInt64Array(t, []int64{1, 20, 234}, 4, MarshalTypeNearestDelta2)
-	testMarshalUnmarshalInt64Array(t, []int64{1, 20, -2345, 678934, 342}, 4, MarshalTypeNearestDelta)
-	testMarshalUnmarshalInt64Array(t, []int64{1, 20, 2345, 6789, 12342}, 4, MarshalTypeNearestDelta2)
-
-	// Constant encoding
-	testMarshalUnmarshalInt64Array(t, []int64{1}, 4, MarshalTypeConst)
-	testMarshalUnmarshalInt64Array(t, []int64{1, 2}, 4, MarshalTypeDeltaConst)
-	testMarshalUnmarshalInt64Array(t, []int64{-1, 0, 1, 2, 3, 4, 5}, 4, MarshalTypeDeltaConst)
-	testMarshalUnmarshalInt64Array(t, []int64{-10, -1, 8, 17, 26}, 4, MarshalTypeDeltaConst)
-	testMarshalUnmarshalInt64Array(t, []int64{0, 0, 0, 0, 0, 0}, 4, MarshalTypeConst)
-	testMarshalUnmarshalInt64Array(t, []int64{100, 100, 100, 100}, 4, MarshalTypeConst)
-
-	var va []int64
-	var v int64
-
-	// Verify nearest delta encoding.
-	va = va[:0]
-	v = 0
-	for i := 0; i < 8*1024; i++ {
-		v += int64(rand.NormFloat64() * 1e6)
-		va = append(va, v)
-	}
-	for precisionBits := uint8(1); precisionBits < 23; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta)
-	}
-	for precisionBits := uint8(23); precisionBits < 65; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
-	}
-
-	// Verify nearest delta2 encoding.
-	va = va[:0]
-	v = 0
-	for i := 0; i < 8*1024; i++ {
-		v += 30e6 + int64(rand.NormFloat64()*1e6)
-		va = append(va, v)
-	}
-	for precisionBits := uint8(1); precisionBits < 24; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeZSTDNearestDelta2)
-	}
-	for precisionBits := uint8(24); precisionBits < 65; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
-	}
-
-	// Verify nearest delta encoding.
-	va = va[:0]
-	v = 1000
-	for i := 0; i < 6; i++ {
-		v += int64(rand.NormFloat64() * 100)
-		va = append(va, v)
-	}
-	for precisionBits := uint8(1); precisionBits < 65; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta)
-	}
-
-	// Verify nearest delta2 encoding.
-	va = va[:0]
-	v = 0
-	for i := 0; i < 6; i++ {
-		v += 3000 + int64(rand.NormFloat64()*100)
-		va = append(va, v)
-	}
-	for precisionBits := uint8(5); precisionBits < 65; precisionBits++ {
-		testMarshalUnmarshalInt64Array(t, va, precisionBits, MarshalTypeNearestDelta2)
-	}
-}
-
 func testMarshalUnmarshalInt64Array(t *testing.T, va []int64, precisionBits uint8, mtExpected MarshalType) {
 	t.Helper()

@@ -257,24 +192,18 @@ func TestMarshalUnmarshalValues(t *testing.T) {
 	}
 }

-func TestMarshalInt64ArraySize(t *testing.T) {
-	var va []int64
-	v := int64(rand.Float64() * 1e9)
-	for i := 0; i < 8*1024; i++ {
-		va = append(va, v)
-		v += 30e3 + int64(rand.NormFloat64()*1e3)
-	}
+func TestMarshalUnmarshalInt64ArrayGeneric(t *testing.T) {
+	testMarshalUnmarshalInt64Array(t, []int64{1, 20, 234}, 4, MarshalTypeNearestDelta2)
+	testMarshalUnmarshalInt64Array(t, []int64{1, 20, -2345, 678934, 342}, 4, MarshalTypeNearestDelta)
+	testMarshalUnmarshalInt64Array(t, []int64{1, 20, 2345, 6789, 12342}, 4, MarshalTypeNearestDelta2)

-	testMarshalInt64ArraySize(t, va, 1, 500, 1300)
-	testMarshalInt64ArraySize(t, va, 2, 600, 1400)
-	testMarshalInt64ArraySize(t, va, 3, 900, 1800)
-	testMarshalInt64ArraySize(t, va, 4, 1300, 2100)
-	testMarshalInt64ArraySize(t, va, 5, 2000, 3200)
-	testMarshalInt64ArraySize(t, va, 6, 3000, 4800)
-	testMarshalInt64ArraySize(t, va, 7, 4000, 6400)
-	testMarshalInt64ArraySize(t, va, 8, 6000, 8000)
-	testMarshalInt64ArraySize(t, va, 9, 7000, 8800)
-	testMarshalInt64ArraySize(t, va, 10, 8000, 10000)
+	// Constant encoding
+	testMarshalUnmarshalInt64Array(t, []int64{1}, 4, MarshalTypeConst)
+	testMarshalUnmarshalInt64Array(t, []int64{1, 2}, 4, MarshalTypeDeltaConst)
+	testMarshalUnmarshalInt64Array(t, []int64{-1, 0, 1, 2, 3, 4, 5}, 4, MarshalTypeDeltaConst)
+	testMarshalUnmarshalInt64Array(t, []int64{-10, -1, 8, 17, 26}, 4, MarshalTypeDeltaConst)
+	testMarshalUnmarshalInt64Array(t, []int64{0, 0, 0, 0, 0, 0}, 4, MarshalTypeConst)
+	testMarshalUnmarshalInt64Array(t, []int64{100, 100, 100, 100}, 4, MarshalTypeConst)
 }

 func testMarshalInt64ArraySize(t *testing.T, va []int64, precisionBits uint8, minSizeExpected, maxSizeExpected int) {
--- a/lib/encoding/zstd/zstd_cgo.go
+++ b/lib/encoding/zstd/zstd_cgo.go
@@ -0,0 +1,19 @@
+// +build cgo
+
+package zstd
+
+import (
+	"github.com/valyala/gozstd"
+)
+
+// Decompress appends decompressed src to dst and returns the result.
+func Decompress(dst, src []byte) ([]byte, error) {
+	return gozstd.Decompress(dst, src)
+}
+
+// CompressLevel appends compressed src to dst and returns the result.
+//
+// The given compressionLevel is used for the compression.
+func CompressLevel(dst, src []byte, compressionLevel int) []byte {
+	return gozstd.CompressLevel(dst, src, compressionLevel)
+}
--- a/lib/encoding/zstd/zstd_pure.go
+++ b/lib/encoding/zstd/zstd_pure.go
@@ -0,0 +1,78 @@
+// +build !cgo
+
+package zstd
+
+import (
+	"sync"
+	"sync/atomic"
+
+	"github.com/VictoriaMetrics/VictoriaMetrics/lib/logger"
+	"github.com/klauspost/compress/zstd"
+)
+
+var (
+	decoder *zstd.Decoder
+
+	mu sync.Mutex
+	av atomic.Value
+)
+
+type registry map[int]*zstd.Encoder
+
+func init() {
+	r := make(registry)
+	av.Store(r)
+
+	var err error
+	decoder, err = zstd.NewReader(nil)
+	if err != nil {
+		logger.Panicf("BUG: failed to create ZSTD reader: %s", err)
+	}
+}
+
+// Decompress appends decompressed src to dst and returns the result.
+func Decompress(dst, src []byte) ([]byte, error) {
+	return decoder.DecodeAll(src, dst)
+}
+
+// CompressLevel appends compressed src to dst and returns the result.
+//
+// The given compressionLevel is used for the compression.
+func CompressLevel(dst, src []byte, compressionLevel int) []byte {
+	e := getEncoder(compressionLevel)
+	return e.EncodeAll(src, dst)
+}
+
+func getEncoder(compressionLevel int) *zstd.Encoder {
+	r := av.Load().(registry)
+	e := r[compressionLevel]
+	if e != nil {
+		return e
+	}
+
+	mu.Lock()
+	// Create the encoder under lock in order to prevent from wasted work
+	// when concurrent goroutines create encoder for the same compressionLevel.
+	e = newEncoder(compressionLevel)
+	r1 := av.Load().(registry)
+	r2 := make(registry)
+	for k, v := range r1 {
+		r2[k] = v
+	}
+	r2[compressionLevel] = e
+	av.Store(r2)
+	mu.Unlock()
+
+	return e
+}
+
+func newEncoder(compressionLevel int) *zstd.Encoder {
+	level := zstd.EncoderLevelFromZstd(compressionLevel)
+	e, err := zstd.NewWriter(nil,
+		zstd.WithEncoderCRC(false), // Disable CRC for performance reasons.
+		zstd.WithEncoderLevel(level))
+	if err != nil {
+		logger.Panicf("BUG: failed to create ZSTD writer: %s", err)
+	}
+	return e
+}
--- a/lib/encoding/zstd/zstd_test.go
+++ b/lib/encoding/zstd/zstd_test.go
@@ -0,0 +1,96 @@
+// +build cgo
+
+package zstd
+
+import (
+	"math/rand"
+	"testing"
+
+	pure "github.com/klauspost/compress/zstd"
+	cgo "github.com/valyala/gozstd"
+)
+
+func TestCompressDecompress(t *testing.T) {
+	testCrossCompressDecompress(t, []byte("a"))
+	testCrossCompressDecompress(t, []byte("foobarbaz"))
+
+	var b []byte
+	for i := 0; i < 64*1024; i++ {
+		b = append(b, byte(rand.Int31n(256)))
+	}
+	testCrossCompressDecompress(t, b)
+}
+
+func testCrossCompressDecompress(t *testing.T, b []byte) {
+	testCompressDecompress(t, pureCompress, pureDecompress, b)
+	testCompressDecompress(t, cgoCompress, cgoDecompress, b)
+	testCompressDecompress(t, pureCompress, cgoDecompress, b)
+	testCompressDecompress(t, cgoCompress, pureDecompress, b)
+}
+
+func testCompressDecompress(t *testing.T, compress compressFn, decompress decompressFn, b []byte) {
+	bc, err := compress(nil, b, 5)
+	if err != nil {
+		t.Fatalf("unexpected error when compressing b=%x: %s", b, err)
+	}
+	bNew, err := decompress(nil, bc)
+	if err != nil {
+		t.Fatalf("unexpected error when decompressing b=%x from bc=%x: %s", b, bc, err)
+	}
+	if string(bNew) != string(b) {
+		t.Fatalf("invalid bNew; got\n%x; expecting\n%x", bNew, b)
+	}
+
+	prefix := []byte{1, 2, 33}
+	bcNew, err := compress(prefix, b, 5)
+	if err != nil {
+		t.Fatalf("unexpected error when compressing b=%x: %s", bcNew, err)
+	}
+	if string(bcNew[:len(prefix)]) != string(prefix) {
+		t.Fatalf("invalid prefix for b=%x; got\n%x; expecting\n%x", b, bcNew[:len(prefix)], prefix)
+	}
+	if string(bcNew[len(prefix):]) != string(bc) {
+		t.Fatalf("invalid prefixed bcNew for b=%x; got\n%x; expecting\n%x", b, bcNew[len(prefix):], bc)
+	}
+
+	bNew, err = decompress(prefix, bc)
+	if err != nil {
+		t.Fatalf("unexpected error when decompressing b=%x from bc=%x with prefix: %s", b, bc, err)
+	}
+	if string(bNew[:len(prefix)]) != string(prefix) {
+		t.Fatalf("invalid bNew prefix when decompressing bc=%x; got\n%x; expecting\n%x", bc, bNew[:len(prefix)], prefix)
+	}
+	if string(bNew[len(prefix):]) != string(b) {
+		t.Fatalf("invalid prefixed bNew; got\n%x; expecting\n%x", bNew[len(prefix):], b)
+	}
+}
+
+type compressFn func(dst, src []byte, compressionLevel int) ([]byte, error)
+
+func pureCompress(dst, src []byte, _ int) ([]byte, error) {
+	w, err := pure.NewWriter(nil,
+		pure.WithEncoderCRC(false), // Disable CRC for performance reasons.
+		pure.WithEncoderLevel(pure.SpeedBestCompression))
+	if err != nil {
+		return nil, err
+	}
+	return w.EncodeAll(src, dst), nil
+}
+
+func cgoCompress(dst, src []byte, compressionLevel int) ([]byte, error) {
+	return cgo.CompressLevel(dst, src, compressionLevel), nil
+}
+
+type decompressFn func(dst, src []byte) ([]byte, error)
+
+func pureDecompress(dst, src []byte) ([]byte, error) {
+	decoder, err := pure.NewReader(nil)
+	if err != nil {
+		return nil, err
+	}
+	return decoder.DecodeAll(src, dst)
+}
+
+func cgoDecompress(dst, src []byte) ([]byte, error) {
+	return cgo.Decompress(dst, src)
+}
--- a/lib/storage/block.go
+++ b/lib/storage/block.go
@@ -205,19 +205,6 @@ func (b *Block) MarshalData(timestampsBlockOffset, valuesBlockOffset uint64) ([]
 	b.bh.ValuesBlockSize = uint32(len(b.valuesData))
 	b.values = b.values[:0]

-	if len(timestamps) > 1 && (b.bh.ValuesMarshalType == encoding.MarshalTypeConst || b.bh.ValuesMarshalType == encoding.MarshalTypeDeltaConst) {
-		// Special case - values are constant or are changed with constant rate.
-		// In this case we may 'cheat' by assuming timestamps are changed
-		// at ideal constant rate. This improves timestamps' compression rate.
-		minTimestamp := timestamps[0]
-		maxTimestamp := timestamps[len(timestamps)-1]
-		delta := (maxTimestamp - minTimestamp) / int64(len(timestamps)-1)
-		ts := minTimestamp
-		for i := 1; i < len(timestamps); i++ {
-			ts += delta
-			timestamps[i] = ts
-		}
-	}
 	b.timestampsData, b.bh.TimestampsMarshalType, b.bh.MinTimestamp = encoding.MarshalTimestamps(b.timestampsData[:0], timestamps, b.bh.PrecisionBits)
 	b.bh.TimestampsBlockOffset = timestampsBlockOffset
 	b.bh.TimestampsBlockSize = uint32(len(b.timestampsData))
--- a/lib/storage/index_db.go
+++ b/lib/storage/index_db.go
@@ -1381,7 +1381,7 @@ func matchTagFilters(mn *MetricName, tfs []*tagFilter, kb *bytesutil.ByteBuffer)
 				continue
 			}

-			// Found the matching tag name. Match for the value.
+			// Found the matching tag name. Match the value.
 			b := tag.Marshal(kb.B)
 			kb.B = b[:len(kb.B)]
 			ok, err := matchTagFilter(b, tf)
@@ -1394,7 +1394,7 @@ func matchTagFilters(mn *MetricName, tfs []*tagFilter, kb *bytesutil.ByteBuffer)
 			tagMatched = true
 			break
 		}
-		if !tagMatched {
+		if !tagMatched && !tf.isNegative {
 			// Matching tag name wasn't found.
 			return false, nil
 		}
--- a/lib/storage/index_db_test.go
+++ b/lib/storage/index_db_test.go
@@ -753,8 +753,8 @@ func TestMatchTagFilters(t *testing.T) {
 	if err != nil {
 		t.Fatalf("unexpected error: %s", err)
 	}
-	if ok {
-		t.Fatalf("Shouldn't match")
+	if !ok {
+		t.Fatalf("Should match")
 	}
 	tfs.Reset()
 	if err := tfs.Add([]byte("non-existing-tag"), []byte("foob.+metric"), true, true); err != nil {
@@ -764,8 +764,19 @@ func TestMatchTagFilters(t *testing.T) {
 	if err != nil {
 		t.Fatalf("unexpected error: %s", err)
 	}
-	if ok {
-		t.Fatalf("Shouldn't match")
+	if !ok {
+		t.Fatalf("Should match")
+	}
+	tfs.Reset()
+	if err := tfs.Add([]byte("non-existing-tag"), []byte(".+"), true, true); err != nil {
+		t.Fatalf("cannot add regexp, negative filter: %s", err)
+	}
+	ok, err = matchTagFilters(&mn, toTFPointers(tfs.tfs), &bb)
+	if err != nil {
+		t.Fatalf("unexpected error: %s", err)
+	}
+	if !ok {
+		t.Fatalf("Should match")
 	}

 	// Negative match by existing tag
@@ -859,6 +870,17 @@ func TestMatchTagFilters(t *testing.T) {
 	if !ok {
 		t.Fatalf("Should match")
 	}
+	tfs.Reset()
+	if err := tfs.Add([]byte("key 3"), []byte(""), true, false); err != nil {
+		t.Fatalf("cannot add regexp, negative filter: %s", err)
+	}
+	ok, err = matchTagFilters(&mn, toTFPointers(tfs.tfs), &bb)
+	if err != nil {
+		t.Fatalf("unexpected error: %s", err)
+	}
+	if !ok {
+		t.Fatalf("Should match")
+	}

 	// Positive match by multiple tags and MetricGroup
 	tfs.Reset()
--- a/lib/storage/part_search.go
+++ b/lib/storage/part_search.go
@@ -28,6 +28,9 @@ type partSearch struct {
 	// tr is a time range to search.
 	tr TimeRange

+	// Skip populating timestampsData and valuesData in Block if fetchData=false.
+	fetchData bool
+
 	metaindex []metaindexRow

 	ibCache *indexBlockCache
@@ -48,6 +51,7 @@ func (ps *partSearch) reset() {
 	ps.p = nil
 	ps.tsids = ps.tsids[:0]
 	ps.tsidIdx = 0
+	ps.fetchData = true
 	ps.metaindex = nil
 	ps.ibCache = nil
 	ps.bhs = nil
@@ -61,7 +65,7 @@ func (ps *partSearch) reset() {
 }

 // Init initializes the ps with the given p, tsids and tr.
-func (ps *partSearch) Init(p *part, tsids []TSID, tr TimeRange) {
+func (ps *partSearch) Init(p *part, tsids []TSID, tr TimeRange, fetchData bool) {
 	ps.reset()
 	ps.p = p

@@ -72,6 +76,7 @@ func (ps *partSearch) Init(p *part, tsids []TSID, tr TimeRange) {
 		ps.tsids = append(ps.tsids[:0], tsids...)
 	}
 	ps.tr = tr
+	ps.fetchData = fetchData
 	ps.metaindex = p.metaindex
 	ps.ibCache = &p.ibCache

@@ -281,11 +286,14 @@ func (ps *partSearch) searchBHS() bool {

 func (ps *partSearch) readBlock(bh *blockHeader) {
 	ps.Block.Reset()
+	ps.Block.bh = *bh
+	if !ps.fetchData {
+		return
+	}
+
 	ps.Block.timestampsData = bytesutil.Resize(ps.Block.timestampsData[:0], int(bh.TimestampsBlockSize))
 	ps.p.timestampsFile.ReadAt(ps.Block.timestampsData, int64(bh.TimestampsBlockOffset))

 	ps.Block.valuesData = bytesutil.Resize(ps.Block.valuesData[:0], int(bh.ValuesBlockSize))
 	ps.p.valuesFile.ReadAt(ps.Block.valuesData, int64(bh.ValuesBlockOffset))
-
-	ps.Block.bh = *bh
 }
--- a/lib/storage/part_search_test.go
+++ b/lib/storage/part_search_test.go
@@ -1247,7 +1247,7 @@ func testPartSearch(t *testing.T, p *part, tsids []TSID, tr TimeRange, expectedR

 func testPartSearchSerial(p *part, tsids []TSID, tr TimeRange, expectedRawBlocks []rawBlock) error {
 	var ps partSearch
-	ps.Init(p, tsids, tr)
+	ps.Init(p, tsids, tr, true)
 	var bs []Block
 	for ps.NextBlock() {
 		var b Block
--- a/lib/storage/partition_search.go
+++ b/lib/storage/partition_search.go
@@ -56,7 +56,7 @@ func (pts *partitionSearch) reset() {
 // Init initializes the search in the given partition for the given tsid and tr.
 //
 // MustClose must be called when partition search is done.
-func (pts *partitionSearch) Init(pt *partition, tsids []TSID, tr TimeRange) {
+func (pts *partitionSearch) Init(pt *partition, tsids []TSID, tr TimeRange, fetchData bool) {
 	if pts.needClosing {
 		logger.Panicf("BUG: missing partitionSearch.MustClose call before the next call to Init")
 	}
@@ -85,7 +85,7 @@ func (pts *partitionSearch) Init(pt *partition, tsids []TSID, tr TimeRange) {
 	}
 	pts.psPool = pts.psPool[:len(pts.pws)]
 	for i, pw := range pts.pws {
-		pts.psPool[i].Init(pw.p, tsids, tr)
+		pts.psPool[i].Init(pw.p, tsids, tr, fetchData)
 	}

 	// Initialize the psHeap.
--- a/lib/storage/partition_search_test.go
+++ b/lib/storage/partition_search_test.go
@@ -238,7 +238,7 @@ func testPartitionSearchSerial(pt *partition, tsids []TSID, tr TimeRange, rbsExp

 	bs := []Block{}
 	var pts partitionSearch
-	pts.Init(pt, tsids, tr)
+	pts.Init(pt, tsids, tr, true)
 	for pts.NextBlock() {
 		var b Block
 		b.CopyFrom(pts.Block)
@@ -263,7 +263,7 @@ func testPartitionSearchSerial(pt *partition, tsids []TSID, tr TimeRange, rbsExp
 	}

 	// verify that empty tsids returns empty result
-	pts.Init(pt, []TSID{}, tr)
+	pts.Init(pt, []TSID{}, tr, true)
 	if pts.NextBlock() {
 		return fmt.Errorf("unexpected block got for an empty tsids list: %+v", pts.Block)
 	}
@@ -271,6 +271,16 @@ func testPartitionSearchSerial(pt *partition, tsids []TSID, tr TimeRange, rbsExp
 		return fmt.Errorf("unexpected error on empty tsids list: %s", err)
 	}
 	pts.MustClose()
+
+	pts.Init(pt, []TSID{}, tr, false)
+	if pts.NextBlock() {
+		return fmt.Errorf("unexpected block got for an empty tsids list: %+v", pts.Block)
+	}
+	if err := pts.Error(); err != nil {
+		return fmt.Errorf("unexpected error on empty tsids list: %s", err)
+	}
+	pts.MustClose()
+
 	return nil
 }

--- a/lib/storage/search.go
+++ b/lib/storage/search.go
@@ -110,7 +110,7 @@ func (s *Search) reset() {
 // Init initializes s from the given storage, tfss and tr.
 //
 // MustClose must be called when the search is done.
-func (s *Search) Init(storage *Storage, tfss []*TagFilters, tr TimeRange, maxMetrics int) {
+func (s *Search) Init(storage *Storage, tfss []*TagFilters, tr TimeRange, fetchData bool, maxMetrics int) {
 	if s.needClosing {
 		logger.Panicf("BUG: missing MustClose call before the next call to Init")
 	}
@@ -123,7 +123,7 @@ func (s *Search) Init(storage *Storage, tfss []*TagFilters, tr TimeRange, maxMet
 	// It is ok to call Init on error from storage.searchTSIDs.
 	// Init must be called before returning because it will fail
 	// on Seach.MustClose otherwise.
-	s.ts.Init(storage.tb, tsids, tr)
+	s.ts.Init(storage.tb, tsids, tr, fetchData)

 	if err != nil {
 		s.err = err
--- a/lib/storage/search_test.go
+++ b/lib/storage/search_test.go
@@ -193,7 +193,7 @@ func testSearch(st *Storage, tr TimeRange, mrs []MetricRow, accountsCount int) e
 		}

 		// Search
-		s.Init(st, []*TagFilters{tfs}, tr, 1e5)
+		s.Init(st, []*TagFilters{tfs}, tr, true, 1e5)
 		var mbs []MetricBlock
 		for s.NextMetricBlock() {
 			var b Block
--- a/lib/storage/storage.go
+++ b/lib/storage/storage.go
@@ -65,6 +65,13 @@ type Storage struct {

 	currHourMetricIDsUpdaterWG sync.WaitGroup
 	retentionWatcherWG         sync.WaitGroup
+
+	tooSmallTimestampRows uint64
+	tooBigTimestampRows   uint64
+
+	addRowsConcurrencyLimitReached uint64
+	addRowsConcurrencyLimitTimeout uint64
+	addRowsConcurrencyDroppedRows  uint64
 }

 // OpenStorage opens storage on the given path with the given number of retention months.
@@ -271,6 +278,15 @@ func (s *Storage) idb() *indexDB {

 // Metrics contains essential metrics for the Storage.
 type Metrics struct {
+	TooSmallTimestampRows uint64
+	TooBigTimestampRows   uint64
+
+	AddRowsConcurrencyLimitReached uint64
+	AddRowsConcurrencyLimitTimeout uint64
+	AddRowsConcurrencyDroppedRows  uint64
+	AddRowsConcurrencyCapacity     uint64
+	AddRowsConcurrencyCurrent      uint64
+
 	TSIDCacheSize       uint64
 	TSIDCacheSizeBytes  uint64
 	TSIDCacheRequests   uint64
@@ -308,6 +324,15 @@ func (m *Metrics) Reset() {

 // UpdateMetrics updates m with metrics from s.
 func (s *Storage) UpdateMetrics(m *Metrics) {
+	m.TooSmallTimestampRows += atomic.LoadUint64(&s.tooSmallTimestampRows)
+	m.TooBigTimestampRows += atomic.LoadUint64(&s.tooBigTimestampRows)
+
+	m.AddRowsConcurrencyLimitReached += atomic.LoadUint64(&s.addRowsConcurrencyLimitReached)
+	m.AddRowsConcurrencyLimitTimeout += atomic.LoadUint64(&s.addRowsConcurrencyLimitTimeout)
+	m.AddRowsConcurrencyDroppedRows += atomic.LoadUint64(&s.addRowsConcurrencyDroppedRows)
+	m.AddRowsConcurrencyCapacity = uint64(cap(addRowsConcurrencyCh))
+	m.AddRowsConcurrencyCurrent = uint64(len(addRowsConcurrencyCh))
+
 	var cs fastcache.Stats
 	s.tsidCache.UpdateStats(&cs)
 	m.TSIDCacheSize += cs.EntriesCount
@@ -713,15 +738,24 @@ func (s *Storage) AddRows(mrs []MetricRow, precisionBits uint8) error {
 	// Limit the number of concurrent goroutines that may add rows to the storage.
 	// This should prevent from out of memory errors and CPU trashing when too many
 	// goroutines call AddRows.
-	t := timerpool.Get(addRowsTimeout)
 	select {
 	case addRowsConcurrencyCh <- struct{}{}:
-		timerpool.Put(t)
 		defer func() { <-addRowsConcurrencyCh }()
-	case <-t.C:
-		timerpool.Put(t)
-		return fmt.Errorf("Cannot add %d rows to storage in %s, since it is overloaded with %d concurrent writers. Add more CPUs or reduce load",
-			len(mrs), addRowsTimeout, cap(addRowsConcurrencyCh))
+	default:
+		// Sleep for a while until giving up
+		atomic.AddUint64(&s.addRowsConcurrencyLimitReached, 1)
+		t := timerpool.Get(addRowsTimeout)
+		select {
+		case addRowsConcurrencyCh <- struct{}{}:
+			timerpool.Put(t)
+			defer func() { <-addRowsConcurrencyCh }()
+		case <-t.C:
+			timerpool.Put(t)
+			atomic.AddUint64(&s.addRowsConcurrencyLimitTimeout, 1)
+			atomic.AddUint64(&s.addRowsConcurrencyDroppedRows, uint64(len(mrs)))
+			return fmt.Errorf("Cannot add %d rows to storage in %s, since it is overloaded with %d concurrent writers. Add more CPUs or reduce load",
+				len(mrs), addRowsTimeout, cap(addRowsConcurrencyCh))
+		}
 	}

 	// Add rows to the storage.
@@ -760,8 +794,14 @@ func (s *Storage) add(rows []rawRow, mrs []MetricRow, precisionBits uint8) ([]ra
 			// doesn't know how to work with them.
 			continue
 		}
-		if mr.Timestamp < minTimestamp || mr.Timestamp > maxTimestamp {
-			// Skip rows with timestamps outside the retention.
+		if mr.Timestamp < minTimestamp {
+			// Skip rows with too small timestamps outside the retention.
+			atomic.AddUint64(&s.tooSmallTimestampRows, 1)
+			continue
+		}
+		if mr.Timestamp > maxTimestamp {
+			// Skip rows with too big timestamps significantly exceeding the current time.
+			atomic.AddUint64(&s.tooBigTimestampRows, 1)
 			continue
 		}
 		r := &rows[rowsLen+j]
--- a/lib/storage/storage_test.go
+++ b/lib/storage/storage_test.go
@@ -502,12 +502,24 @@ func testStorageDeleteMetrics(s *Storage, workerNum int) error {
 		MaxTimestamp: 2e10,
 	}
 	metricBlocksCount := func(tfs *TagFilters) int {
+		// Verify the number of blocks with fetchData=true
 		n := 0
-		sr.Init(s, []*TagFilters{tfs}, tr, 1e5)
+		sr.Init(s, []*TagFilters{tfs}, tr, true, 1e5)
 		for sr.NextMetricBlock() {
 			n++
 		}
 		sr.MustClose()
+
+		// Make sure the number of blocks with fetchData=false is the same.
+		m := 0
+		sr.Init(s, []*TagFilters{tfs}, tr, false, 1e5)
+		for sr.NextMetricBlock() {
+			m++
+		}
+		sr.MustClose()
+		if n != m {
+			return -1
+		}
 		return n
 	}
 	for i := 0; i < metricsCount; i++ {
--- a/lib/storage/table_search.go
+++ b/lib/storage/table_search.go
@@ -55,7 +55,7 @@ func (ts *tableSearch) reset() {
 // Init initializes the ts.
 //
 // MustClose must be called then the tableSearch is done.
-func (ts *tableSearch) Init(tb *table, tsids []TSID, tr TimeRange) {
+func (ts *tableSearch) Init(tb *table, tsids []TSID, tr TimeRange, fetchData bool) {
 	if ts.needClosing {
 		logger.Panicf("BUG: missing MustClose call before the next call to Init")
 	}
@@ -86,7 +86,7 @@ func (ts *tableSearch) Init(tb *table, tsids []TSID, tr TimeRange) {
 	}
 	ts.ptsPool = ts.ptsPool[:len(ts.ptws)]
 	for i, ptw := range ts.ptws {
-		ts.ptsPool[i].Init(ptw.pt, tsids, tr)
+		ts.ptsPool[i].Init(ptw.pt, tsids, tr, fetchData)
 	}

 	// Initialize the ptsHeap.
--- a/lib/storage/table_search_test.go
+++ b/lib/storage/table_search_test.go
@@ -251,7 +251,7 @@ func testTableSearchSerial(tb *table, tsids []TSID, tr TimeRange, rbsExpected []

 	bs := []Block{}
 	var ts tableSearch
-	ts.Init(tb, tsids, tr)
+	ts.Init(tb, tsids, tr, true)
 	for ts.NextBlock() {
 		var b Block
 		b.CopyFrom(ts.Block)
@@ -276,7 +276,7 @@ func testTableSearchSerial(tb *table, tsids []TSID, tr TimeRange, rbsExpected []
 	}

 	// verify that empty tsids returns empty result
-	ts.Init(tb, []TSID{}, tr)
+	ts.Init(tb, []TSID{}, tr, true)
 	if ts.NextBlock() {
 		return fmt.Errorf("unexpected block got for an empty tsids list: %+v", ts.Block)
 	}
@@ -284,5 +284,15 @@ func testTableSearchSerial(tb *table, tsids []TSID, tr TimeRange, rbsExpected []
 		return fmt.Errorf("unexpected error on empty tsids list: %s", err)
 	}
 	ts.MustClose()
+
+	ts.Init(tb, []TSID{}, tr, false)
+	if ts.NextBlock() {
+		return fmt.Errorf("unexpected block got for an empty tsids list with fetchData=false: %+v", ts.Block)
+	}
+	if err := ts.Error(); err != nil {
+		return fmt.Errorf("unexpected error on empty tsids list with fetchData=false: %s", err)
+	}
+	ts.MustClose()
+
 	return nil
 }
--- a/lib/storage/table_search_timing_test.go
+++ b/lib/storage/table_search_timing_test.go
@@ -26,7 +26,11 @@ func BenchmarkTableSearch(b *testing.B) {
 				b.Run(fmt.Sprintf("tsidsCount_%d", tsidsCount), func(b *testing.B) {
 					for _, tsidsSearch := range []int{1, 1e1, 1e2, 1e3, 1e4} {
 						b.Run(fmt.Sprintf("tsidsSearch_%d", tsidsSearch), func(b *testing.B) {
-							benchmarkTableSearch(b, rowsCount, tsidsCount, tsidsSearch)
+							for _, fetchData := range []bool{true, false} {
+								b.Run(fmt.Sprintf("fetchData_%v", fetchData), func(b *testing.B) {
+									benchmarkTableSearch(b, rowsCount, tsidsCount, tsidsSearch, fetchData)
+								})
+							}
 						})
 					}
 				})
@@ -103,9 +107,9 @@ func createBenchTable(b *testing.B, path string, startTimestamp int64, rowsPerIn
 	tb.MustClose()
 }

-func benchmarkTableSearch(b *testing.B, rowsCount, tsidsCount, tsidsSearch int) {
+func benchmarkTableSearch(b *testing.B, rowsCount, tsidsCount, tsidsSearch int, fetchData bool) {
 	startTimestamp := timestampFromTime(time.Now()) - 365*24*3600*1000
-	rowsPerInsert := int(maxRawRowsPerPartition)
+	rowsPerInsert := getMaxRawRowsPerPartition()

 	tb := openBenchTable(b, startTimestamp, rowsPerInsert, rowsCount, tsidsCount)
 	tr := TimeRange{
@@ -127,7 +131,7 @@ func benchmarkTableSearch(b *testing.B, rowsCount, tsidsCount, tsidsSearch int)
 			for i := range tsids {
 				tsids[i].MetricID = 1 + uint64(i)
 			}
-			ts.Init(tb, tsids, tr)
+			ts.Init(tb, tsids, tr, fetchData)
 			for ts.NextBlock() {
 			}
 			ts.MustClose()
--- a/lib/storage/tag_filters_test.go
+++ b/lib/storage/tag_filters_test.go
@@ -308,6 +308,62 @@ func TestTagFilterMatchSuffix(t *testing.T) {
 		mismatch("bar")
 		match("xhttpbar")
 	})
+	t.Run("non-empty-string-regexp-negative-match", func(t *testing.T) {
+		value := ".+"
+		isNegative := true
+		isRegexp := true
+		expectedPrefix := tvNoTrailingTagSeparator("")
+		init(value, isNegative, isRegexp, expectedPrefix)
+		if len(tf.orSuffixes) != 0 {
+			t.Fatalf("unexpected non-zero number of or suffixes: %d; %q", len(tf.orSuffixes), tf.orSuffixes)
+		}
+
+		match("")
+		mismatch("x")
+		mismatch("foo")
+	})
+	t.Run("non-empty-string-regexp-match", func(t *testing.T) {
+		value := ".+"
+		isNegative := false
+		isRegexp := true
+		expectedPrefix := tvNoTrailingTagSeparator("")
+		init(value, isNegative, isRegexp, expectedPrefix)
+		if len(tf.orSuffixes) != 0 {
+			t.Fatalf("unexpected non-zero number of or suffixes: %d; %q", len(tf.orSuffixes), tf.orSuffixes)
+		}
+
+		mismatch("")
+		match("x")
+		match("foo")
+	})
+	t.Run("match-all-regexp-negative-match", func(t *testing.T) {
+		value := ".*"
+		isNegative := true
+		isRegexp := true
+		expectedPrefix := tvNoTrailingTagSeparator("")
+		init(value, isNegative, isRegexp, expectedPrefix)
+		if len(tf.orSuffixes) != 0 {
+			t.Fatalf("unexpected non-zero number of or suffixes: %d; %q", len(tf.orSuffixes), tf.orSuffixes)
+		}
+
+		mismatch("")
+		mismatch("x")
+		mismatch("foo")
+	})
+	t.Run("match-all-regexp-match", func(t *testing.T) {
+		value := ".*"
+		isNegative := false
+		isRegexp := true
+		expectedPrefix := tvNoTrailingTagSeparator("")
+		init(value, isNegative, isRegexp, expectedPrefix)
+		if len(tf.orSuffixes) != 0 {
+			t.Fatalf("unexpected non-zero number of or suffixes: %d; %q", len(tf.orSuffixes), tf.orSuffixes)
+		}
+
+		match("")
+		match("x")
+		match("foo")
+	})
 }

 func TestGetOrValues(t *testing.T) {
--- a/package/VAR_BUILD
+++ b/package/VAR_BUILD
@@ -0,0 +1 @@
+1
--- a/package/VAR_VERSION
+++ b/package/VAR_VERSION
@@ -0,0 +1 @@
+1.7.0
--- a/package/deb/conffile
+++ b/package/deb/conffile
--- a/package/deb/control
+++ b/package/deb/control
@@ -0,0 +1,7 @@
+Package: victoria-metrics
+Maintainer: Aliaksandr Valialkin
+Depends: libc6 (>= 2.7-1)
+Section: net
+Priority: optional
+Homepage: https://github.com/VictoriaMetrics/VictoriaMetrics
+Description: VictoriaMetrics is fast, cost-effective and scalable time series database. It can be used as a long-term remote storage for Prometheus.
--- a/package/deb/postinst
+++ b/package/deb/postinst
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -e
+
+# Reload systemd unit
+systemctl daemon-reload
--- a/package/deb/postrm
+++ b/package/deb/postrm
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -e
+
+# Reload systemd unit
+systemctl daemon-reload
--- a/package/deb/prerm
+++ b/package/deb/prerm
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -e
+
+# Reload systemd unit
+systemctl stop victoria-metrics
--- a/package/package_deb.sh
+++ b/package/package_deb.sh
@@ -0,0 +1,103 @@
+#!/bin/bash
+
+ARCH="amd64"
+if [[ $# -ge 1 ]]
+then
+    ARCH="$1"
+fi
+
+# Map to Debian architecture
+if [[ "$ARCH" == "amd64" ]]; then
+    DEB_ARCH=amd64
+	EXENAME_SRC="victoria-metrics-prod"
+elif [[ "$ARCH" == "arm64" ]]; then
+    DEB_ARCH=arm64
+    EXENAME_SRC="victoria-metrics-arm64-prod"
+else
+    echo "*** Unknown arch $ARCH"
+    exit 1
+fi
+
+PACKDIR="./package"
+TEMPDIR="${PACKDIR}/temp-deb-${DEB_ARCH}"
+EXENAME_DST="victoria-metrics-prod"
+
+# Pull in version info
+
+VERSION=`cat ${PACKDIR}/VAR_VERSION | perl -ne 'chomp and print'`
+BUILD=`cat ${PACKDIR}/VAR_BUILD | perl -ne 'chomp and print'`
+
+# Create directories
+
+[[ -d "${TEMPDIR}" ]] && rm -rf "${TEMPDIR}"
+
+mkdir -p "${TEMPDIR}" && echo "*** Created   : ${TEMPDIR}"
+
+mkdir -p "${TEMPDIR}/usr/bin/"
+mkdir -p "${TEMPDIR}/lib/systemd/system/"
+
+echo "*** Version   : ${VERSION}-${BUILD}"
+echo "*** Arch      : ${DEB_ARCH}"
+
+OUT_DEB="victoria-metrics_${VERSION}-${BUILD}_$DEB_ARCH.deb"
+
+echo "*** Out .deb  : ${OUT_DEB}"
+
+# Copy the binary
+
+cp "./bin/${EXENAME_SRC}" "${TEMPDIR}/usr/bin/${EXENAME_DST}"
+
+# Copy supporting files
+
+cp "${PACKDIR}/victoria-metrics.service" "${TEMPDIR}/lib/systemd/system/"
+
+# Generate debian-binary
+
+echo "2.0" > "${TEMPDIR}/debian-binary"
+
+# Generate control
+
+echo "Version: $VERSION-$BUILD" > "${TEMPDIR}/control"
+echo "Installed-Size:" `du -sb "${TEMPDIR}" | awk '{print int($1/1024)}'` >> "${TEMPDIR}/control"
+echo "Architecture: $DEB_ARCH" >> "${TEMPDIR}/control"
+cat "${PACKDIR}/deb/control" >> "${TEMPDIR}/control"
+
+# Copy conffile
+
+cp "${PACKDIR}/deb/conffile" "${TEMPDIR}/conffile"
+
+# Copy postinst and postrm
+
+cp "${PACKDIR}/deb/postinst" "${TEMPDIR}/postinst"
+cp "${PACKDIR}/deb/prerm" "${TEMPDIR}/prerm"
+cp "${PACKDIR}/deb/postrm" "${TEMPDIR}/postrm"
+
+(
+    # Generate md5 sums
+
+    cd "${TEMPDIR}"
+
+    find ./usr ./lib -type f | while read i ; do
+        md5sum "$i" | sed 's/\.\///g' >> md5sums
+    done
+
+    # Archive control
+
+    chmod 644 control md5sums
+    chmod 755 postinst postrm prerm
+    fakeroot -- tar -c --xz -f ./control.tar.xz ./control ./md5sums ./postinst ./prerm ./postrm
+
+    # Archive data
+
+    fakeroot -- tar -c --xz -f ./data.tar.xz ./usr ./lib
+
+    # Make final archive
+
+    fakeroot -- ar -cr "${OUT_DEB}" debian-binary control.tar.xz data.tar.xz
+)
+
+ls -lh "${TEMPDIR}/${OUT_DEB}"
+
+cp "${TEMPDIR}/${OUT_DEB}" "${PACKDIR}"
+
+echo "*** Created   : ${PACKDIR}/${OUT_DEB}"
--- a/package/package_rpm.sh
+++ b/package/package_rpm.sh
@@ -0,0 +1,90 @@
+#!/bin/bash
+
+if ! which rpmbuild 2> /dev/null
+then
+	echo "*** Fatal: please install rpmbuild"
+    exit 1
+fi
+
+ARCH="amd64"
+if [[ $# -ge 1 ]]
+then
+    ARCH="$1"
+fi
+
+# Map to Debian architecture
+if [[ "$ARCH" == "amd64" ]]; then
+    RPM_ARCH=x86_64
+	EXENAME_SRC="victoria-metrics-prod"
+elif [[ "$ARCH" == "arm64" ]]; then
+    RPM_ARCH=aarch64
+    EXENAME_SRC="victoria-metrics-arm64-prod"
+else
+    echo "*** Unknown arch $ARCH"
+    exit 1
+fi
+
+PACKDIR="./package"
+TEMPDIR="${PACKDIR}/temp-rpm-${RPM_ARCH}"
+EXENAME_DST="victoria-metrics-prod"
+
+# Pull in version info
+
+VERSION=`cat ${PACKDIR}/VAR_VERSION | perl -ne 'chomp and print'`
+BUILD=`cat ${PACKDIR}/VAR_BUILD | perl -ne 'chomp and print'`
+
+# Create directories
+
+[[ -d "${TEMPDIR}" ]] && rm -rf "${TEMPDIR}"
+
+mkdir -p "${TEMPDIR}" && echo "*** Created   : ${TEMPDIR}"
+
+echo "*** Version   : ${VERSION}-${BUILD}"
+echo "*** Arch      : ${RPM_ARCH}"
+
+OUT_RPM="victoria-metrics-${VERSION}-${BUILD}.${RPM_ARCH}.rpm"
+
+echo "*** Out .rpm  : ${OUT_RPM}"
+
+cat > "${TEMPDIR}/victoria-metrics.spec" <<EOF
+Summary: The best long-term remote storage for Prometheus
+Name: victoria-metrics
+Version: ${VERSION}
+Release: ${BUILD}
+License: Apache License 2.0
+URL: https://victoriametrics.com/
+Group: System
+Packager: Aliaksandr Valialkin
+Requires: libpthread
+Requires: libc
+
+%description
+VictoriaMetrics is fast, cost-effective and scalable time series database. It can be used as a long-term remote storage for Prometheus.
+
+%files
+%attr(0744, root, root) /usr/bin/*
+%attr(0644, root, root) /lib/systemd/system/*
+
+%prep
+mkdir -p \$RPM_BUILD_ROOT/usr/bin/
+mkdir -p \$RPM_BUILD_ROOT/lib/systemd/system/
+
+cp ${PWD}/bin/${EXENAME_SRC} \$RPM_BUILD_ROOT/usr/bin/${EXENAME_DST}
+cp ${PWD}/package/victoria-metrics.service \$RPM_BUILD_ROOT/lib/systemd/system/
+
+%post
+/usr/bin/systemctl daemon-reload
+
+%preun
+/usr/bin/systemctl stop victoria-metrics
+
+%postun
+/usr/bin/systemctl daemon-reload
+EOF
+
+rpmbuild -bb --target "${RPM_ARCH}" \
+	"${TEMPDIR}/victoria-metrics.spec"
+
+cp "${HOME}/rpmbuild/RPMS/${RPM_ARCH}/${OUT_RPM}" "${PACKDIR}"
+
+echo "*** Created   : ${PACKDIR}/${OUT_RPM}"
--- a/package/rpm/README.md
+++ b/package/rpm/README.md
@@ -0,0 +1,16 @@
+# victoriametrics-rpm
+RPM for VictoriaMetrics - the best long-term remote storage for Prometheus
+
+*Get and started*
+
+```
+yum -y install yum-plugin-copr
+
+yum copr enable antonpatsev/VictoriaMetrics
+
+yum makecache
+
+yum -y install victoriametrics
+
+systemctl start victoriametrics
+```
--- a/package/rpm/victoriametrics-rpm.spec
+++ b/package/rpm/victoriametrics-rpm.spec
@@ -0,0 +1,53 @@
+Name:    victoriametrics
+Version: 1.23.0
+Release: 3
+Summary: The best long-term remote storage for Prometheus
+
+Group:   Development Tools
+License: ASL 2.0
+URL: https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v%{version}/victoria-metrics-v%{version}.tar.gz
+Source0: %{name}.service
+
+# Use systemd for fedora >= 18, rhel >=7, SUSE >= 12 SP1 and openSUSE >= 42.1
+%define use_systemd (0%{?fedora} && 0%{?fedora} >= 18) || (0%{?rhel} && 0%{?rhel} >= 7) || (!0%{?is_opensuse} && 0%{?suse_version} >=1210) || (0%{?is_opensuse} && 0%{?sle_version} >= 120100)
+
+%description
+VictoriaMetrics - the best long-term remote storage for Prometheus
+
+%prep
+curl -L %{url} > victoria-metrics.tar.gz
+tar -zxf victoria-metrics.tar.gz
+
+%install
+%{__install} -m 0755 -d %{buildroot}%{_bindir}
+cp victoria-metrics-prod %{buildroot}%{_bindir}/victoria-metrics-prod
+%{__install} -m 0755 -d %{buildroot}/var/lib/victoria-metrics-data
+
+%if %{use_systemd}
+%{__mkdir} -p %{buildroot}%{_unitdir}
+%{__install} -m644 %{SOURCE0} \
+    %{buildroot}%{_unitdir}/%{name}.service
+%endif
+
+%post
+%if %use_systemd
+/usr/bin/systemctl daemon-reload
+%endif
+
+%preun
+%if %use_systemd
+/usr/bin/systemctl stop %{name}
+%endif
+
+%postun
+%if %use_systemd
+/usr/bin/systemctl daemon-reload
+%endif
+
+%files
+%{_bindir}/victoria-metrics-prod
+/var/lib/victoria-metrics-data
+%if %{use_systemd}
+%{_unitdir}/%{name}.service
+%endif
+
--- a/package/rpm/victoriametrics.service
+++ b/package/rpm/victoriametrics.service
@@ -0,0 +1,17 @@
+[Unit]
+Description=High-performance, cost-effective and scalable time series database, long-term remote storage for Prometheus
+After=network.target
+
+[Service]
+Type=simple
+StartLimitBurst=5
+StartLimitInterval=0
+Restart=on-failure
+RestartSec=1
+ExecStart=/usr/bin/victoria-metrics-prod -storageDataPath=/var/lib/victoria-metrics-data
+ExecStop=/bin/kill -s SIGTERM $MAINPID
+LimitNOFILE=65536
+LimitNPROC=32000
+
+[Install]
+WantedBy=multi-user.target
--- a/package/victoria-metrics.service
+++ b/package/victoria-metrics.service
@@ -0,0 +1,17 @@
+[Unit]
+Description=High-performance, cost-effective and scalable time series database, long-term remote storage for Prometheus
+After=network.target
+
+[Service]
+Type=simple
+StartLimitBurst=5
+StartLimitInterval=0
+Restart=on-failure
+RestartSec=1
+ExecStart=/usr/bin/victoria-metrics-prod -storageDataPath=/var/lib/victoria-metrics-data
+ExecStop=/bin/kill -s SIGTERM $MAINPID
+LimitNOFILE=65536
+LimitNPROC=32000
+
+[Install]
+WantedBy=multi-user.target
--- a/vendor/github.com/VictoriaMetrics/metrics/process_metrics_linux.go
+++ b/vendor/github.com/VictoriaMetrics/metrics/process_metrics_linux.go
@@ -67,7 +67,11 @@ func writeProcessMetrics(w io.Writer) {
 	// It is expensive obtaining `process_open_fds` when big number of file descriptors is opened,
 	// don't do it here.

-	fmt.Fprintf(w, "process_cpu_seconds_total %g\n", float64(p.Utime+p.Stime)/userHZ)
+	utime := float64(p.Utime) / userHZ
+	stime := float64(p.Stime) / userHZ
+	fmt.Fprintf(w, "process_cpu_seconds_system_total %g\n", stime)
+	fmt.Fprintf(w, "process_cpu_seconds_total %g\n", utime+stime)
+	fmt.Fprintf(w, "process_cpu_seconds_user_total %g\n", utime)
 	fmt.Fprintf(w, "process_major_pagefaults_total %d\n", p.Majflt)
 	fmt.Fprintf(w, "process_minor_pagefaults_total %d\n", p.Minflt)
 	fmt.Fprintf(w, "process_num_threads %d\n", p.NumThreads)
--- a/vendor/github.com/klauspost/compress/LICENSE
+++ b/vendor/github.com/klauspost/compress/LICENSE
@@ -0,0 +1,28 @@
+Copyright (c) 2012 The Go Authors. All rights reserved.
+Copyright (c) 2019 Klaus Post. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+   * Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+   * Neither the name of Google Inc. nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/vendor/github.com/klauspost/compress/fse/README.md
+++ b/vendor/github.com/klauspost/compress/fse/README.md
@@ -0,0 +1,79 @@
+# Finite State Entropy
+
+This package provides Finite State Entropy encoding and decoding.
+            
+Finite State Entropy (also referenced as [tANS](https://en.wikipedia.org/wiki/Asymmetric_numeral_systems#tANS)) 
+encoding provides a fast near-optimal symbol encoding/decoding
+for byte blocks as implemented in [zstandard](https://github.com/facebook/zstd).
+
+This can be used for compressing input with a lot of similar input values to the smallest number of bytes.
+This does not perform any multi-byte [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) as LZ coders,
+but it can be used as a secondary step to compressors (like Snappy) that does not do entropy encoding. 
+
+* [Godoc documentation](https://godoc.org/github.com/klauspost/compress/fse)
+
+## News
+
+ * Feb 2018: First implementation released. Consider this beta software for now.
+
+# Usage
+
+This package provides a low level interface that allows to compress single independent blocks. 
+
+Each block is separate, and there is no built in integrity checks. 
+This means that the caller should keep track of block sizes and also do checksums if needed.  
+
+Compressing a block is done via the [`Compress`](https://godoc.org/github.com/klauspost/compress/fse#Compress) function.
+You must provide input and will receive the output and maybe an error.
+
+These error values can be returned:
+
+| Error               | Description                                                                 |
+|---------------------|-----------------------------------------------------------------------------|
+| `<nil>`             | Everything ok, output is returned                                           |
+| `ErrIncompressible` | Returned when input is judged to be too hard to compress                    |
+| `ErrUseRLE`         | Returned from the compressor when the input is a single byte value repeated |
+| `(error)`           | An internal error occurred.                                                 |
+
+As can be seen above there are errors that will be returned even under normal operation so it is important to handle these.
+
+To reduce allocations you can provide a [`Scratch`](https://godoc.org/github.com/klauspost/compress/fse#Scratch) object 
+that can be re-used for successive calls. Both compression and decompression accepts a `Scratch` object, and the same 
+object can be used for both.   
+
+Be aware, that when re-using a `Scratch` object that the *output* buffer is also re-used, so if you are still using this
+you must set the `Out` field in the scratch to nil. The same buffer is used for compression and decompression output.
+
+Decompressing is done by calling the [`Decompress`](https://godoc.org/github.com/klauspost/compress/fse#Decompress) function.
+You must provide the output from the compression stage, at exactly the size you got back. If you receive an error back
+your input was likely corrupted. 
+
+It is important to note that a successful decoding does *not* mean your output matches your original input. 
+There are no integrity checks, so relying on errors from the decompressor does not assure your data is valid.
+
+For more detailed usage, see examples in the [godoc documentation](https://godoc.org/github.com/klauspost/compress/fse#pkg-examples).
+
+# Performance
+
+A lot of factors are affecting speed. Block sizes and compressibility of the material are primary factors.  
+All compression functions are currently only running on the calling goroutine so only one core will be used per block.  
+
+The compressor is significantly faster if symbols are kept as small as possible. The highest byte value of the input
+is used to reduce some of the processing, so if all your input is above byte value 64 for instance, it may be 
+beneficial to transpose all your input values down by 64.   
+
+With moderate block sizes around 64k speed are typically 200MB/s per core for compression and 
+around 300MB/s decompression speed. 
+
+The same hardware typically does Huffman (deflate) encoding at 125MB/s and decompression at 100MB/s. 
+
+# Plans
+
+At one point, more internals will be exposed to facilitate more "expert" usage of the components. 
+
+A streaming interface is also likely to be implemented. Likely compatible with [FSE stream format](https://github.com/Cyan4973/FiniteStateEntropy/blob/dev/programs/fileio.c#L261).  
+
+# Contributing
+
+Contributions are always welcome. Be aware that adding public functions will require good justification and breaking 
+changes will likely not be accepted. If in doubt open an issue before writing the PR.  
--- a/vendor/github.com/klauspost/compress/fse/bitreader.go
+++ b/vendor/github.com/klauspost/compress/fse/bitreader.go
@@ -0,0 +1,107 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package fse
+
+import (
+	"errors"
+	"io"
+)
+
+// bitReader reads a bitstream in reverse.
+// The last set bit indicates the start of the stream and is used
+// for aligning the input.
+type bitReader struct {
+	in       []byte
+	off      uint // next byte to read is at in[off - 1]
+	value    uint64
+	bitsRead uint8
+}
+
+// init initializes and resets the bit reader.
+func (b *bitReader) init(in []byte) error {
+	if len(in) < 1 {
+		return errors.New("corrupt stream: too short")
+	}
+	b.in = in
+	b.off = uint(len(in))
+	// The highest bit of the last byte indicates where to start
+	v := in[len(in)-1]
+	if v == 0 {
+		return errors.New("corrupt stream, did not find end of stream")
+	}
+	b.bitsRead = 64
+	b.value = 0
+	b.fill()
+	b.fill()
+	b.bitsRead += 8 - uint8(highBits(uint32(v)))
+	return nil
+}
+
+// getBits will return n bits. n can be 0.
+func (b *bitReader) getBits(n uint8) uint16 {
+	if n == 0 || b.bitsRead >= 64 {
+		return 0
+	}
+	return b.getBitsFast(n)
+}
+
+// getBitsFast requires that at least one bit is requested every time.
+// There are no checks if the buffer is filled.
+func (b *bitReader) getBitsFast(n uint8) uint16 {
+	const regMask = 64 - 1
+	v := uint16((b.value << (b.bitsRead & regMask)) >> ((regMask + 1 - n) & regMask))
+	b.bitsRead += n
+	return v
+}
+
+// fillFast() will make sure at least 32 bits are available.
+// There must be at least 4 bytes available.
+func (b *bitReader) fillFast() {
+	if b.bitsRead < 32 {
+		return
+	}
+	// Do single re-slice to avoid bounds checks.
+	v := b.in[b.off-4 : b.off]
+	low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
+	b.value = (b.value << 32) | uint64(low)
+	b.bitsRead -= 32
+	b.off -= 4
+}
+
+// fill() will make sure at least 32 bits are available.
+func (b *bitReader) fill() {
+	if b.bitsRead < 32 {
+		return
+	}
+	if b.off > 4 {
+		v := b.in[b.off-4 : b.off]
+		low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
+		b.value = (b.value << 32) | uint64(low)
+		b.bitsRead -= 32
+		b.off -= 4
+		return
+	}
+	for b.off > 0 {
+		b.value = (b.value << 8) | uint64(b.in[b.off-1])
+		b.bitsRead -= 8
+		b.off--
+	}
+}
+
+// finished returns true if all bits have been read from the bit stream.
+func (b *bitReader) finished() bool {
+	return b.off == 0 && b.bitsRead >= 64
+}
+
+// close the bitstream and returns an error if out-of-buffer reads occurred.
+func (b *bitReader) close() error {
+	// Release reference.
+	b.in = nil
+	if b.bitsRead > 64 {
+		return io.ErrUnexpectedEOF
+	}
+	return nil
+}
--- a/vendor/github.com/klauspost/compress/fse/bitwriter.go
+++ b/vendor/github.com/klauspost/compress/fse/bitwriter.go
@@ -0,0 +1,168 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package fse
+
+import "fmt"
+
+// bitWriter will write bits.
+// First bit will be LSB of the first byte of output.
+type bitWriter struct {
+	bitContainer uint64
+	nBits        uint8
+	out          []byte
+}
+
+// bitMask16 is bitmasks. Has extra to avoid bounds check.
+var bitMask16 = [32]uint16{
+	0, 1, 3, 7, 0xF, 0x1F,
+	0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF,
+	0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF, 0xFFFF,
+	0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
+	0xFFFF, 0xFFFF} /* up to 16 bits */
+
+// addBits16NC will add up to 16 bits.
+// It will not check if there is space for them,
+// so the caller must ensure that it has flushed recently.
+func (b *bitWriter) addBits16NC(value uint16, bits uint8) {
+	b.bitContainer |= uint64(value&bitMask16[bits&31]) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// addBits16Clean will add up to 16 bits. value may not contain more set bits than indicated.
+// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
+func (b *bitWriter) addBits16Clean(value uint16, bits uint8) {
+	b.bitContainer |= uint64(value) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// addBits16ZeroNC will add up to 16 bits.
+// It will not check if there is space for them,
+// so the caller must ensure that it has flushed recently.
+// This is fastest if bits can be zero.
+func (b *bitWriter) addBits16ZeroNC(value uint16, bits uint8) {
+	if bits == 0 {
+		return
+	}
+	value <<= (16 - bits) & 15
+	value >>= (16 - bits) & 15
+	b.bitContainer |= uint64(value) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// flush will flush all pending full bytes.
+// There will be at least 56 bits available for writing when this has been called.
+// Using flush32 is faster, but leaves less space for writing.
+func (b *bitWriter) flush() {
+	v := b.nBits >> 3
+	switch v {
+	case 0:
+	case 1:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+		)
+	case 2:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+		)
+	case 3:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+		)
+	case 4:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+		)
+	case 5:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+		)
+	case 6:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+		)
+	case 7:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+			byte(b.bitContainer>>48),
+		)
+	case 8:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+			byte(b.bitContainer>>48),
+			byte(b.bitContainer>>56),
+		)
+	default:
+		panic(fmt.Errorf("bits (%d) > 64", b.nBits))
+	}
+	b.bitContainer >>= v << 3
+	b.nBits &= 7
+}
+
+// flush32 will flush out, so there are at least 32 bits available for writing.
+func (b *bitWriter) flush32() {
+	if b.nBits < 32 {
+		return
+	}
+	b.out = append(b.out,
+		byte(b.bitContainer),
+		byte(b.bitContainer>>8),
+		byte(b.bitContainer>>16),
+		byte(b.bitContainer>>24))
+	b.nBits -= 32
+	b.bitContainer >>= 32
+}
+
+// flushAlign will flush remaining full bytes and align to next byte boundary.
+func (b *bitWriter) flushAlign() {
+	nbBytes := (b.nBits + 7) >> 3
+	for i := uint8(0); i < nbBytes; i++ {
+		b.out = append(b.out, byte(b.bitContainer>>(i*8)))
+	}
+	b.nBits = 0
+	b.bitContainer = 0
+}
+
+// close will write the alignment bit and write the final byte(s)
+// to the output.
+func (b *bitWriter) close() error {
+	// End mark
+	b.addBits16Clean(1, 1)
+	// flush until next byte.
+	b.flushAlign()
+	return nil
+}
+
+// reset and continue writing by appending to out.
+func (b *bitWriter) reset(out []byte) {
+	b.bitContainer = 0
+	b.nBits = 0
+	b.out = out
+}
--- a/vendor/github.com/klauspost/compress/fse/bytereader.go
+++ b/vendor/github.com/klauspost/compress/fse/bytereader.go
@@ -0,0 +1,56 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package fse
+
+// byteReader provides a byte reader that reads
+// little endian values from a byte stream.
+// The input stream is manually advanced.
+// The reader performs no bounds checks.
+type byteReader struct {
+	b   []byte
+	off int
+}
+
+// init will initialize the reader and set the input.
+func (b *byteReader) init(in []byte) {
+	b.b = in
+	b.off = 0
+}
+
+// advance the stream b n bytes.
+func (b *byteReader) advance(n uint) {
+	b.off += int(n)
+}
+
+// Int32 returns a little endian int32 starting at current offset.
+func (b byteReader) Int32() int32 {
+	b2 := b.b[b.off : b.off+4 : b.off+4]
+	v3 := int32(b2[3])
+	v2 := int32(b2[2])
+	v1 := int32(b2[1])
+	v0 := int32(b2[0])
+	return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)
+}
+
+// Uint32 returns a little endian uint32 starting at current offset.
+func (b byteReader) Uint32() uint32 {
+	b2 := b.b[b.off : b.off+4 : b.off+4]
+	v3 := uint32(b2[3])
+	v2 := uint32(b2[2])
+	v1 := uint32(b2[1])
+	v0 := uint32(b2[0])
+	return v0 | (v1 << 8) | (v2 << 16) | (v3 << 24)
+}
+
+// unread returns the unread portion of the input.
+func (b byteReader) unread() []byte {
+	return b.b[b.off:]
+}
+
+// remain will return the number of bytes remaining.
+func (b byteReader) remain() int {
+	return len(b.b) - b.off
+}
--- a/vendor/github.com/klauspost/compress/fse/compress.go
+++ b/vendor/github.com/klauspost/compress/fse/compress.go
@@ -0,0 +1,684 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package fse
+
+import (
+	"errors"
+	"fmt"
+)
+
+// Compress the input bytes. Input must be < 2GB.
+// Provide a Scratch buffer to avoid memory allocations.
+// Note that the output is also kept in the scratch buffer.
+// If input is too hard to compress, ErrIncompressible is returned.
+// If input is a single byte value repeated ErrUseRLE is returned.
+func Compress(in []byte, s *Scratch) ([]byte, error) {
+	if len(in) <= 1 {
+		return nil, ErrIncompressible
+	}
+	if len(in) > (2<<30)-1 {
+		return nil, errors.New("input too big, must be < 2GB")
+	}
+	s, err := s.prepare(in)
+	if err != nil {
+		return nil, err
+	}
+
+	// Create histogram, if none was provided.
+	maxCount := s.maxCount
+	if maxCount == 0 {
+		maxCount = s.countSimple(in)
+	}
+	// Reset for next run.
+	s.clearCount = true
+	s.maxCount = 0
+	if maxCount == len(in) {
+		// One symbol, use RLE
+		return nil, ErrUseRLE
+	}
+	if maxCount == 1 || maxCount < (len(in)>>7) {
+		// Each symbol present maximum once or too well distributed.
+		return nil, ErrIncompressible
+	}
+	s.optimalTableLog()
+	err = s.normalizeCount()
+	if err != nil {
+		return nil, err
+	}
+	err = s.writeCount()
+	if err != nil {
+		return nil, err
+	}
+
+	if false {
+		err = s.validateNorm()
+		if err != nil {
+			return nil, err
+		}
+	}
+
+	err = s.buildCTable()
+	if err != nil {
+		return nil, err
+	}
+	err = s.compress(in)
+	if err != nil {
+		return nil, err
+	}
+	s.Out = s.bw.out
+	// Check if we compressed.
+	if len(s.Out) >= len(in) {
+		return nil, ErrIncompressible
+	}
+	return s.Out, nil
+}
+
+// cState contains the compression state of a stream.
+type cState struct {
+	bw         *bitWriter
+	stateTable []uint16
+	state      uint16
+}
+
+// init will initialize the compression state to the first symbol of the stream.
+func (c *cState) init(bw *bitWriter, ct *cTable, tableLog uint8, first symbolTransform) {
+	c.bw = bw
+	c.stateTable = ct.stateTable
+
+	nbBitsOut := (first.deltaNbBits + (1 << 15)) >> 16
+	im := int32((nbBitsOut << 16) - first.deltaNbBits)
+	lu := (im >> nbBitsOut) + first.deltaFindState
+	c.state = c.stateTable[lu]
+	return
+}
+
+// encode the output symbol provided and write it to the bitstream.
+func (c *cState) encode(symbolTT symbolTransform) {
+	nbBitsOut := (uint32(c.state) + symbolTT.deltaNbBits) >> 16
+	dstState := int32(c.state>>(nbBitsOut&15)) + symbolTT.deltaFindState
+	c.bw.addBits16NC(c.state, uint8(nbBitsOut))
+	c.state = c.stateTable[dstState]
+}
+
+// encode the output symbol provided and write it to the bitstream.
+func (c *cState) encodeZero(symbolTT symbolTransform) {
+	nbBitsOut := (uint32(c.state) + symbolTT.deltaNbBits) >> 16
+	dstState := int32(c.state>>(nbBitsOut&15)) + symbolTT.deltaFindState
+	c.bw.addBits16ZeroNC(c.state, uint8(nbBitsOut))
+	c.state = c.stateTable[dstState]
+}
+
+// flush will write the tablelog to the output and flush the remaining full bytes.
+func (c *cState) flush(tableLog uint8) {
+	c.bw.flush32()
+	c.bw.addBits16NC(c.state, tableLog)
+	c.bw.flush()
+}
+
+// compress is the main compression loop that will encode the input from the last byte to the first.
+func (s *Scratch) compress(src []byte) error {
+	if len(src) <= 2 {
+		return errors.New("compress: src too small")
+	}
+	tt := s.ct.symbolTT[:256]
+	s.bw.reset(s.Out)
+
+	// Our two states each encodes every second byte.
+	// Last byte encoded (first byte decoded) will always be encoded by c1.
+	var c1, c2 cState
+
+	// Encode so remaining size is divisible by 4.
+	ip := len(src)
+	if ip&1 == 1 {
+		c1.init(&s.bw, &s.ct, s.actualTableLog, tt[src[ip-1]])
+		c2.init(&s.bw, &s.ct, s.actualTableLog, tt[src[ip-2]])
+		c1.encodeZero(tt[src[ip-3]])
+		ip -= 3
+	} else {
+		c2.init(&s.bw, &s.ct, s.actualTableLog, tt[src[ip-1]])
+		c1.init(&s.bw, &s.ct, s.actualTableLog, tt[src[ip-2]])
+		ip -= 2
+	}
+	if ip&2 != 0 {
+		c2.encodeZero(tt[src[ip-1]])
+		c1.encodeZero(tt[src[ip-2]])
+		ip -= 2
+	}
+
+	// Main compression loop.
+	switch {
+	case !s.zeroBits && s.actualTableLog <= 8:
+		// We can encode 4 symbols without requiring a flush.
+		// We do not need to check if any output is 0 bits.
+		for ip >= 4 {
+			s.bw.flush32()
+			v3, v2, v1, v0 := src[ip-4], src[ip-3], src[ip-2], src[ip-1]
+			c2.encode(tt[v0])
+			c1.encode(tt[v1])
+			c2.encode(tt[v2])
+			c1.encode(tt[v3])
+			ip -= 4
+		}
+	case !s.zeroBits:
+		// We do not need to check if any output is 0 bits.
+		for ip >= 4 {
+			s.bw.flush32()
+			v3, v2, v1, v0 := src[ip-4], src[ip-3], src[ip-2], src[ip-1]
+			c2.encode(tt[v0])
+			c1.encode(tt[v1])
+			s.bw.flush32()
+			c2.encode(tt[v2])
+			c1.encode(tt[v3])
+			ip -= 4
+		}
+	case s.actualTableLog <= 8:
+		// We can encode 4 symbols without requiring a flush
+		for ip >= 4 {
+			s.bw.flush32()
+			v3, v2, v1, v0 := src[ip-4], src[ip-3], src[ip-2], src[ip-1]
+			c2.encodeZero(tt[v0])
+			c1.encodeZero(tt[v1])
+			c2.encodeZero(tt[v2])
+			c1.encodeZero(tt[v3])
+			ip -= 4
+		}
+	default:
+		for ip >= 4 {
+			s.bw.flush32()
+			v3, v2, v1, v0 := src[ip-4], src[ip-3], src[ip-2], src[ip-1]
+			c2.encodeZero(tt[v0])
+			c1.encodeZero(tt[v1])
+			s.bw.flush32()
+			c2.encodeZero(tt[v2])
+			c1.encodeZero(tt[v3])
+			ip -= 4
+		}
+	}
+
+	// Flush final state.
+	// Used to initialize state when decoding.
+	c2.flush(s.actualTableLog)
+	c1.flush(s.actualTableLog)
+
+	return s.bw.close()
+}
+
+// writeCount will write the normalized histogram count to header.
+// This is read back by readNCount.
+func (s *Scratch) writeCount() error {
+	var (
+		tableLog  = s.actualTableLog
+		tableSize = 1 << tableLog
+		previous0 bool
+		charnum   uint16
+
+		maxHeaderSize = ((int(s.symbolLen) * int(tableLog)) >> 3) + 3
+
+		// Write Table Size
+		bitStream = uint32(tableLog - minTablelog)
+		bitCount  = uint(4)
+		remaining = int16(tableSize + 1) /* +1 for extra accuracy */
+		threshold = int16(tableSize)
+		nbBits    = uint(tableLog + 1)
+	)
+	if cap(s.Out) < maxHeaderSize {
+		s.Out = make([]byte, 0, s.br.remain()+maxHeaderSize)
+	}
+	outP := uint(0)
+	out := s.Out[:maxHeaderSize]
+
+	// stops at 1
+	for remaining > 1 {
+		if previous0 {
+			start := charnum
+			for s.norm[charnum] == 0 {
+				charnum++
+			}
+			for charnum >= start+24 {
+				start += 24
+				bitStream += uint32(0xFFFF) << bitCount
+				out[outP] = byte(bitStream)
+				out[outP+1] = byte(bitStream >> 8)
+				outP += 2
+				bitStream >>= 16
+			}
+			for charnum >= start+3 {
+				start += 3
+				bitStream += 3 << bitCount
+				bitCount += 2
+			}
+			bitStream += uint32(charnum-start) << bitCount
+			bitCount += 2
+			if bitCount > 16 {
+				out[outP] = byte(bitStream)
+				out[outP+1] = byte(bitStream >> 8)
+				outP += 2
+				bitStream >>= 16
+				bitCount -= 16
+			}
+		}
+
+		count := s.norm[charnum]
+		charnum++
+		max := (2*threshold - 1) - remaining
+		if count < 0 {
+			remaining += count
+		} else {
+			remaining -= count
+		}
+		count++ // +1 for extra accuracy
+		if count >= threshold {
+			count += max // [0..max[ [max..threshold[ (...) [threshold+max 2*threshold[
+		}
+		bitStream += uint32(count) << bitCount
+		bitCount += nbBits
+		if count < max {
+			bitCount--
+		}
+
+		previous0 = count == 1
+		if remaining < 1 {
+			return errors.New("internal error: remaining<1")
+		}
+		for remaining < threshold {
+			nbBits--
+			threshold >>= 1
+		}
+
+		if bitCount > 16 {
+			out[outP] = byte(bitStream)
+			out[outP+1] = byte(bitStream >> 8)
+			outP += 2
+			bitStream >>= 16
+			bitCount -= 16
+		}
+	}
+
+	out[outP] = byte(bitStream)
+	out[outP+1] = byte(bitStream >> 8)
+	outP += (bitCount + 7) / 8
+
+	if uint16(charnum) > s.symbolLen {
+		return errors.New("internal error: charnum > s.symbolLen")
+	}
+	s.Out = out[:outP]
+	return nil
+}
+
+// symbolTransform contains the state transform for a symbol.
+type symbolTransform struct {
+	deltaFindState int32
+	deltaNbBits    uint32
+}
+
+// String prints values as a human readable string.
+func (s symbolTransform) String() string {
+	return fmt.Sprintf("dnbits: %08x, fs:%d", s.deltaNbBits, s.deltaFindState)
+}
+
+// cTable contains tables used for compression.
+type cTable struct {
+	tableSymbol []byte
+	stateTable  []uint16
+	symbolTT    []symbolTransform
+}
+
+// allocCtable will allocate tables needed for compression.
+// If existing tables a re big enough, they are simply re-used.
+func (s *Scratch) allocCtable() {
+	tableSize := 1 << s.actualTableLog
+	// get tableSymbol that is big enough.
+	if cap(s.ct.tableSymbol) < int(tableSize) {
+		s.ct.tableSymbol = make([]byte, tableSize)
+	}
+	s.ct.tableSymbol = s.ct.tableSymbol[:tableSize]
+
+	ctSize := tableSize
+	if cap(s.ct.stateTable) < ctSize {
+		s.ct.stateTable = make([]uint16, ctSize)
+	}
+	s.ct.stateTable = s.ct.stateTable[:ctSize]
+
+	if cap(s.ct.symbolTT) < 256 {
+		s.ct.symbolTT = make([]symbolTransform, 256)
+	}
+	s.ct.symbolTT = s.ct.symbolTT[:256]
+}
+
+// buildCTable will populate the compression table so it is ready to be used.
+func (s *Scratch) buildCTable() error {
+	tableSize := uint32(1 << s.actualTableLog)
+	highThreshold := tableSize - 1
+	var cumul [maxSymbolValue + 2]int16
+
+	s.allocCtable()
+	tableSymbol := s.ct.tableSymbol[:tableSize]
+	// symbol start positions
+	{
+		cumul[0] = 0
+		for ui, v := range s.norm[:s.symbolLen-1] {
+			u := byte(ui) // one less than reference
+			if v == -1 {
+				// Low proba symbol
+				cumul[u+1] = cumul[u] + 1
+				tableSymbol[highThreshold] = u
+				highThreshold--
+			} else {
+				cumul[u+1] = cumul[u] + v
+			}
+		}
+		// Encode last symbol separately to avoid overflowing u
+		u := int(s.symbolLen - 1)
+		v := s.norm[s.symbolLen-1]
+		if v == -1 {
+			// Low proba symbol
+			cumul[u+1] = cumul[u] + 1
+			tableSymbol[highThreshold] = byte(u)
+			highThreshold--
+		} else {
+			cumul[u+1] = cumul[u] + v
+		}
+		if uint32(cumul[s.symbolLen]) != tableSize {
+			return fmt.Errorf("internal error: expected cumul[s.symbolLen] (%d) == tableSize (%d)", cumul[s.symbolLen], tableSize)
+		}
+		cumul[s.symbolLen] = int16(tableSize) + 1
+	}
+	// Spread symbols
+	s.zeroBits = false
+	{
+		step := tableStep(tableSize)
+		tableMask := tableSize - 1
+		var position uint32
+		// if any symbol > largeLimit, we may have 0 bits output.
+		largeLimit := int16(1 << (s.actualTableLog - 1))
+		for ui, v := range s.norm[:s.symbolLen] {
+			symbol := byte(ui)
+			if v > largeLimit {
+				s.zeroBits = true
+			}
+			for nbOccurrences := int16(0); nbOccurrences < v; nbOccurrences++ {
+				tableSymbol[position] = symbol
+				position = (position + step) & tableMask
+				for position > highThreshold {
+					position = (position + step) & tableMask
+				} /* Low proba area */
+			}
+		}
+
+		// Check if we have gone through all positions
+		if position != 0 {
+			return errors.New("position!=0")
+		}
+	}
+
+	// Build table
+	table := s.ct.stateTable
+	{
+		tsi := int(tableSize)
+		for u, v := range tableSymbol {
+			// TableU16 : sorted by symbol order; gives next state value
+			table[cumul[v]] = uint16(tsi + u)
+			cumul[v]++
+		}
+	}
+
+	// Build Symbol Transformation Table
+	{
+		total := int16(0)
+		symbolTT := s.ct.symbolTT[:s.symbolLen]
+		tableLog := s.actualTableLog
+		tl := (uint32(tableLog) << 16) - (1 << tableLog)
+		for i, v := range s.norm[:s.symbolLen] {
+			switch v {
+			case 0:
+			case -1, 1:
+				symbolTT[i].deltaNbBits = tl
+				symbolTT[i].deltaFindState = int32(total - 1)
+				total++
+			default:
+				maxBitsOut := uint32(tableLog) - highBits(uint32(v-1))
+				minStatePlus := uint32(v) << maxBitsOut
+				symbolTT[i].deltaNbBits = (maxBitsOut << 16) - minStatePlus
+				symbolTT[i].deltaFindState = int32(total - v)
+				total += v
+			}
+		}
+		if total != int16(tableSize) {
+			return fmt.Errorf("total mismatch %d (got) != %d (want)", total, tableSize)
+		}
+	}
+	return nil
+}
+
+// countSimple will create a simple histogram in s.count.
+// Returns the biggest count.
+// Does not update s.clearCount.
+func (s *Scratch) countSimple(in []byte) (max int) {
+	for _, v := range in {
+		s.count[v]++
+	}
+	m := uint32(0)
+	for i, v := range s.count[:] {
+		if v > m {
+			m = v
+		}
+		if v > 0 {
+			s.symbolLen = uint16(i) + 1
+		}
+	}
+	return int(m)
+}
+
+// minTableLog provides the minimum logSize to safely represent a distribution.
+func (s *Scratch) minTableLog() uint8 {
+	minBitsSrc := highBits(uint32(s.br.remain()-1)) + 1
+	minBitsSymbols := highBits(uint32(s.symbolLen-1)) + 2
+	if minBitsSrc < minBitsSymbols {
+		return uint8(minBitsSrc)
+	}
+	return uint8(minBitsSymbols)
+}
+
+// optimalTableLog calculates and sets the optimal tableLog in s.actualTableLog
+func (s *Scratch) optimalTableLog() {
+	tableLog := s.TableLog
+	minBits := s.minTableLog()
+	maxBitsSrc := uint8(highBits(uint32(s.br.remain()-1))) - 2
+	if maxBitsSrc < tableLog {
+		// Accuracy can be reduced
+		tableLog = maxBitsSrc
+	}
+	if minBits > tableLog {
+		tableLog = minBits
+	}
+	// Need a minimum to safely represent all symbol values
+	if tableLog < minTablelog {
+		tableLog = minTablelog
+	}
+	if tableLog > maxTableLog {
+		tableLog = maxTableLog
+	}
+	s.actualTableLog = tableLog
+}
+
+var rtbTable = [...]uint32{0, 473195, 504333, 520860, 550000, 700000, 750000, 830000}
+
+// normalizeCount will normalize the count of the symbols so
+// the total is equal to the table size.
+func (s *Scratch) normalizeCount() error {
+	var (
+		tableLog          = s.actualTableLog
+		scale             = 62 - uint64(tableLog)
+		step              = (1 << 62) / uint64(s.br.remain())
+		vStep             = uint64(1) << (scale - 20)
+		stillToDistribute = int16(1 << tableLog)
+		largest           int
+		largestP          int16
+		lowThreshold      = (uint32)(s.br.remain() >> tableLog)
+	)
+
+	for i, cnt := range s.count[:s.symbolLen] {
+		// already handled
+		// if (count[s] == s.length) return 0;   /* rle special case */
+
+		if cnt == 0 {
+			s.norm[i] = 0
+			continue
+		}
+		if cnt <= lowThreshold {
+			s.norm[i] = -1
+			stillToDistribute--
+		} else {
+			proba := (int16)((uint64(cnt) * step) >> scale)
+			if proba < 8 {
+				restToBeat := vStep * uint64(rtbTable[proba])
+				v := uint64(cnt)*step - (uint64(proba) << scale)
+				if v > restToBeat {
+					proba++
+				}
+			}
+			if proba > largestP {
+				largestP = proba
+				largest = i
+			}
+			s.norm[i] = proba
+			stillToDistribute -= proba
+		}
+	}
+
+	if -stillToDistribute >= (s.norm[largest] >> 1) {
+		// corner case, need another normalization method
+		return s.normalizeCount2()
+	}
+	s.norm[largest] += stillToDistribute
+	return nil
+}
+
+// Secondary normalization method.
+// To be used when primary method fails.
+func (s *Scratch) normalizeCount2() error {
+	const notYetAssigned = -2
+	var (
+		distributed  uint32
+		total        = uint32(s.br.remain())
+		tableLog     = s.actualTableLog
+		lowThreshold = uint32(total >> tableLog)
+		lowOne       = uint32((total * 3) >> (tableLog + 1))
+	)
+	for i, cnt := range s.count[:s.symbolLen] {
+		if cnt == 0 {
+			s.norm[i] = 0
+			continue
+		}
+		if cnt <= lowThreshold {
+			s.norm[i] = -1
+			distributed++
+			total -= cnt
+			continue
+		}
+		if cnt <= lowOne {
+			s.norm[i] = 1
+			distributed++
+			total -= cnt
+			continue
+		}
+		s.norm[i] = notYetAssigned
+	}
+	toDistribute := (1 << tableLog) - distributed
+
+	if (total / toDistribute) > lowOne {
+		// risk of rounding to zero
+		lowOne = uint32((total * 3) / (toDistribute * 2))
+		for i, cnt := range s.count[:s.symbolLen] {
+			if (s.norm[i] == notYetAssigned) && (cnt <= lowOne) {
+				s.norm[i] = 1
+				distributed++
+				total -= cnt
+				continue
+			}
+		}
+		toDistribute = (1 << tableLog) - distributed
+	}
+	if distributed == uint32(s.symbolLen)+1 {
+		// all values are pretty poor;
+		//   probably incompressible data (should have already been detected);
+		//   find max, then give all remaining points to max
+		var maxV int
+		var maxC uint32
+		for i, cnt := range s.count[:s.symbolLen] {
+			if cnt > maxC {
+				maxV = i
+				maxC = cnt
+			}
+		}
+		s.norm[maxV] += int16(toDistribute)
+		return nil
+	}
+
+	if total == 0 {
+		// all of the symbols were low enough for the lowOne or lowThreshold
+		for i := uint32(0); toDistribute > 0; i = (i + 1) % (uint32(s.symbolLen)) {
+			if s.norm[i] > 0 {
+				toDistribute--
+				s.norm[i]++
+			}
+		}
+		return nil
+	}
+
+	var (
+		vStepLog = 62 - uint64(tableLog)
+		mid      = uint64((1 << (vStepLog - 1)) - 1)
+		rStep    = (((1 << vStepLog) * uint64(toDistribute)) + mid) / uint64(total) // scale on remaining
+		tmpTotal = mid
+	)
+	for i, cnt := range s.count[:s.symbolLen] {
+		if s.norm[i] == notYetAssigned {
+			var (
+				end    = tmpTotal + uint64(cnt)*rStep
+				sStart = uint32(tmpTotal >> vStepLog)
+				sEnd   = uint32(end >> vStepLog)
+				weight = sEnd - sStart
+			)
+			if weight < 1 {
+				return errors.New("weight < 1")
+			}
+			s.norm[i] = int16(weight)
+			tmpTotal = end
+		}
+	}
+	return nil
+}
+
+// validateNorm validates the normalized histogram table.
+func (s *Scratch) validateNorm() (err error) {
+	var total int
+	for _, v := range s.norm[:s.symbolLen] {
+		if v >= 0 {
+			total += int(v)
+		} else {
+			total -= int(v)
+		}
+	}
+	defer func() {
+		if err == nil {
+			return
+		}
+		fmt.Printf("selected TableLog: %d, Symbol length: %d\n", s.actualTableLog, s.symbolLen)
+		for i, v := range s.norm[:s.symbolLen] {
+			fmt.Printf("%3d: %5d -> %4d \n", i, s.count[i], v)
+		}
+	}()
+	if total != (1 << s.actualTableLog) {
+		return fmt.Errorf("warning: Total == %d != %d", total, 1<<s.actualTableLog)
+	}
+	for i, v := range s.count[s.symbolLen:] {
+		if v != 0 {
+			return fmt.Errorf("warning: Found symbol out of range, %d after cut", i)
+		}
+	}
+	return nil
+}
--- a/vendor/github.com/klauspost/compress/fse/decompress.go
+++ b/vendor/github.com/klauspost/compress/fse/decompress.go
@@ -0,0 +1,370 @@
+package fse
+
+import (
+	"errors"
+	"fmt"
+)
+
+const (
+	tablelogAbsoluteMax = 15
+)
+
+// Decompress a block of data.
+// You can provide a scratch buffer to avoid allocations.
+// If nil is provided a temporary one will be allocated.
+// It is possible, but by no way guaranteed that corrupt data will
+// return an error.
+// It is up to the caller to verify integrity of the returned data.
+// Use a predefined Scrach to set maximum acceptable output size.
+func Decompress(b []byte, s *Scratch) ([]byte, error) {
+	s, err := s.prepare(b)
+	if err != nil {
+		return nil, err
+	}
+	s.Out = s.Out[:0]
+	err = s.readNCount()
+	if err != nil {
+		return nil, err
+	}
+	err = s.buildDtable()
+	if err != nil {
+		return nil, err
+	}
+	err = s.decompress()
+	if err != nil {
+		return nil, err
+	}
+
+	return s.Out, nil
+}
+
+// readNCount will read the symbol distribution so decoding tables can be constructed.
+func (s *Scratch) readNCount() error {
+	var (
+		charnum   uint16
+		previous0 bool
+		b         = &s.br
+	)
+	iend := b.remain()
+	if iend < 4 {
+		return errors.New("input too small")
+	}
+	bitStream := b.Uint32()
+	nbBits := uint((bitStream & 0xF) + minTablelog) // extract tableLog
+	if nbBits > tablelogAbsoluteMax {
+		return errors.New("tableLog too large")
+	}
+	bitStream >>= 4
+	bitCount := uint(4)
+
+	s.actualTableLog = uint8(nbBits)
+	remaining := int32((1 << nbBits) + 1)
+	threshold := int32(1 << nbBits)
+	gotTotal := int32(0)
+	nbBits++
+
+	for remaining > 1 {
+		if previous0 {
+			n0 := charnum
+			for (bitStream & 0xFFFF) == 0xFFFF {
+				n0 += 24
+				if b.off < iend-5 {
+					b.advance(2)
+					bitStream = b.Uint32() >> bitCount
+				} else {
+					bitStream >>= 16
+					bitCount += 16
+				}
+			}
+			for (bitStream & 3) == 3 {
+				n0 += 3
+				bitStream >>= 2
+				bitCount += 2
+			}
+			n0 += uint16(bitStream & 3)
+			bitCount += 2
+			if n0 > maxSymbolValue {
+				return errors.New("maxSymbolValue too small")
+			}
+			for charnum < n0 {
+				s.norm[charnum&0xff] = 0
+				charnum++
+			}
+
+			if b.off <= iend-7 || b.off+int(bitCount>>3) <= iend-4 {
+				b.advance(bitCount >> 3)
+				bitCount &= 7
+				bitStream = b.Uint32() >> bitCount
+			} else {
+				bitStream >>= 2
+			}
+		}
+
+		max := (2*(threshold) - 1) - (remaining)
+		var count int32
+
+		if (int32(bitStream) & (threshold - 1)) < max {
+			count = int32(bitStream) & (threshold - 1)
+			bitCount += nbBits - 1
+		} else {
+			count = int32(bitStream) & (2*threshold - 1)
+			if count >= threshold {
+				count -= max
+			}
+			bitCount += nbBits
+		}
+
+		count-- // extra accuracy
+		if count < 0 {
+			// -1 means +1
+			remaining += count
+			gotTotal -= count
+		} else {
+			remaining -= count
+			gotTotal += count
+		}
+		s.norm[charnum&0xff] = int16(count)
+		charnum++
+		previous0 = count == 0
+		for remaining < threshold {
+			nbBits--
+			threshold >>= 1
+		}
+		if b.off <= iend-7 || b.off+int(bitCount>>3) <= iend-4 {
+			b.advance(bitCount >> 3)
+			bitCount &= 7
+		} else {
+			bitCount -= (uint)(8 * (len(b.b) - 4 - b.off))
+			b.off = len(b.b) - 4
+		}
+		bitStream = b.Uint32() >> (bitCount & 31)
+	}
+	s.symbolLen = charnum
+
+	if s.symbolLen <= 1 {
+		return fmt.Errorf("symbolLen (%d) too small", s.symbolLen)
+	}
+	if s.symbolLen > maxSymbolValue+1 {
+		return fmt.Errorf("symbolLen (%d) too big", s.symbolLen)
+	}
+	if remaining != 1 {
+		return fmt.Errorf("corruption detected (remaining %d != 1)", remaining)
+	}
+	if bitCount > 32 {
+		return fmt.Errorf("corruption detected (bitCount %d > 32)", bitCount)
+	}
+	if gotTotal != 1<<s.actualTableLog {
+		return fmt.Errorf("corruption detected (total %d != %d)", gotTotal, 1<<s.actualTableLog)
+	}
+	b.advance((bitCount + 7) >> 3)
+	return nil
+}
+
+// decSymbol contains information about a state entry,
+// Including the state offset base, the output symbol and
+// the number of bits to read for the low part of the destination state.
+type decSymbol struct {
+	newState uint16
+	symbol   uint8
+	nbBits   uint8
+}
+
+// allocDtable will allocate decoding tables if they are not big enough.
+func (s *Scratch) allocDtable() {
+	tableSize := 1 << s.actualTableLog
+	if cap(s.decTable) < int(tableSize) {
+		s.decTable = make([]decSymbol, tableSize)
+	}
+	s.decTable = s.decTable[:tableSize]
+
+	if cap(s.ct.tableSymbol) < 256 {
+		s.ct.tableSymbol = make([]byte, 256)
+	}
+	s.ct.tableSymbol = s.ct.tableSymbol[:256]
+
+	if cap(s.ct.stateTable) < 256 {
+		s.ct.stateTable = make([]uint16, 256)
+	}
+	s.ct.stateTable = s.ct.stateTable[:256]
+}
+
+// buildDtable will build the decoding table.
+func (s *Scratch) buildDtable() error {
+	tableSize := uint32(1 << s.actualTableLog)
+	highThreshold := tableSize - 1
+	s.allocDtable()
+	symbolNext := s.ct.stateTable[:256]
+
+	// Init, lay down lowprob symbols
+	s.zeroBits = false
+	{
+		largeLimit := int16(1 << (s.actualTableLog - 1))
+		for i, v := range s.norm[:s.symbolLen] {
+			if v == -1 {
+				s.decTable[highThreshold].symbol = uint8(i)
+				highThreshold--
+				symbolNext[i] = 1
+			} else {
+				if v >= largeLimit {
+					s.zeroBits = true
+				}
+				symbolNext[i] = uint16(v)
+			}
+		}
+	}
+	// Spread symbols
+	{
+		tableMask := tableSize - 1
+		step := tableStep(tableSize)
+		position := uint32(0)
+		for ss, v := range s.norm[:s.symbolLen] {
+			for i := 0; i < int(v); i++ {
+				s.decTable[position].symbol = uint8(ss)
+				position = (position + step) & tableMask
+				for position > highThreshold {
+					// lowprob area
+					position = (position + step) & tableMask
+				}
+			}
+		}
+		if position != 0 {
+			// position must reach all cells once, otherwise normalizedCounter is incorrect
+			return errors.New("corrupted input (position != 0)")
+		}
+	}
+
+	// Build Decoding table
+	{
+		tableSize := uint16(1 << s.actualTableLog)
+		for u, v := range s.decTable {
+			symbol := v.symbol
+			nextState := symbolNext[symbol]
+			symbolNext[symbol] = nextState + 1
+			nBits := s.actualTableLog - byte(highBits(uint32(nextState)))
+			s.decTable[u].nbBits = nBits
+			newState := (nextState << nBits) - tableSize
+			if newState > tableSize {
+				return fmt.Errorf("newState (%d) outside table size (%d)", newState, tableSize)
+			}
+			if newState == uint16(u) && nBits == 0 {
+				// Seems weird that this is possible with nbits > 0.
+				return fmt.Errorf("newState (%d) == oldState (%d) and no bits", newState, u)
+			}
+			s.decTable[u].newState = newState
+		}
+	}
+	return nil
+}
+
+// decompress will decompress the bitstream.
+// If the buffer is over-read an error is returned.
+func (s *Scratch) decompress() error {
+	br := &s.bits
+	br.init(s.br.unread())
+
+	var s1, s2 decoder
+	// Initialize and decode first state and symbol.
+	s1.init(br, s.decTable, s.actualTableLog)
+	s2.init(br, s.decTable, s.actualTableLog)
+
+	// Use temp table to avoid bound checks/append penalty.
+	var tmp = s.ct.tableSymbol[:256]
+	var off uint8
+
+	// Main part
+	if !s.zeroBits {
+		for br.off >= 8 {
+			br.fillFast()
+			tmp[off+0] = s1.nextFast()
+			tmp[off+1] = s2.nextFast()
+			br.fillFast()
+			tmp[off+2] = s1.nextFast()
+			tmp[off+3] = s2.nextFast()
+			off += 4
+			if off == 0 {
+				s.Out = append(s.Out, tmp...)
+			}
+		}
+	} else {
+		for br.off >= 8 {
+			br.fillFast()
+			tmp[off+0] = s1.next()
+			tmp[off+1] = s2.next()
+			br.fillFast()
+			tmp[off+2] = s1.next()
+			tmp[off+3] = s2.next()
+			off += 4
+			if off == 0 {
+				s.Out = append(s.Out, tmp...)
+				off = 0
+				if len(s.Out) >= s.DecompressLimit {
+					return fmt.Errorf("output size (%d) > DecompressLimit (%d)", len(s.Out), s.DecompressLimit)
+				}
+			}
+		}
+	}
+	s.Out = append(s.Out, tmp[:off]...)
+
+	// Final bits, a bit more expensive check
+	for {
+		if s1.finished() {
+			s.Out = append(s.Out, s1.final(), s2.final())
+			break
+		}
+		br.fill()
+		s.Out = append(s.Out, s1.next())
+		if s2.finished() {
+			s.Out = append(s.Out, s2.final(), s1.final())
+			break
+		}
+		s.Out = append(s.Out, s2.next())
+		if len(s.Out) >= s.DecompressLimit {
+			return fmt.Errorf("output size (%d) > DecompressLimit (%d)", len(s.Out), s.DecompressLimit)
+		}
+	}
+	return br.close()
+}
+
+// decoder keeps track of the current state and updates it from the bitstream.
+type decoder struct {
+	state uint16
+	br    *bitReader
+	dt    []decSymbol
+}
+
+// init will initialize the decoder and read the first state from the stream.
+func (d *decoder) init(in *bitReader, dt []decSymbol, tableLog uint8) {
+	d.dt = dt
+	d.br = in
+	d.state = uint16(in.getBits(tableLog))
+}
+
+// next returns the next symbol and sets the next state.
+// At least tablelog bits must be available in the bit reader.
+func (d *decoder) next() uint8 {
+	n := &d.dt[d.state]
+	lowBits := d.br.getBits(n.nbBits)
+	d.state = n.newState + lowBits
+	return n.symbol
+}
+
+// finished returns true if all bits have been read from the bitstream
+// and the next state would require reading bits from the input.
+func (d *decoder) finished() bool {
+	return d.br.finished() && d.dt[d.state].nbBits > 0
+}
+
+// final returns the current state symbol without decoding the next.
+func (d *decoder) final() uint8 {
+	return d.dt[d.state].symbol
+}
+
+// nextFast returns the next symbol and sets the next state.
+// This can only be used if no symbols are 0 bits.
+// At least tablelog bits must be available in the bit reader.
+func (d *decoder) nextFast() uint8 {
+	n := d.dt[d.state]
+	lowBits := d.br.getBitsFast(n.nbBits)
+	d.state = n.newState + lowBits
+	return n.symbol
+}
--- a/vendor/github.com/klauspost/compress/fse/fse.go
+++ b/vendor/github.com/klauspost/compress/fse/fse.go
@@ -0,0 +1,143 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+// Package fse provides Finite State Entropy encoding and decoding.
+//
+// Finite State Entropy encoding provides a fast near-optimal symbol encoding/decoding
+// for byte blocks as implemented in zstd.
+//
+// See https://github.com/klauspost/compress/tree/master/fse for more information.
+package fse
+
+import (
+	"errors"
+	"fmt"
+	"math/bits"
+)
+
+const (
+	/*!MEMORY_USAGE :
+	 *  Memory usage formula : N->2^N Bytes (examples : 10 -> 1KB; 12 -> 4KB ; 16 -> 64KB; 20 -> 1MB; etc.)
+	 *  Increasing memory usage improves compression ratio
+	 *  Reduced memory usage can improve speed, due to cache effect
+	 *  Recommended max value is 14, for 16KB, which nicely fits into Intel x86 L1 cache */
+	maxMemoryUsage     = 14
+	defaultMemoryUsage = 13
+
+	maxTableLog     = maxMemoryUsage - 2
+	maxTablesize    = 1 << maxTableLog
+	defaultTablelog = defaultMemoryUsage - 2
+	minTablelog     = 5
+	maxSymbolValue  = 255
+)
+
+var (
+	// ErrIncompressible is returned when input is judged to be too hard to compress.
+	ErrIncompressible = errors.New("input is not compressible")
+
+	// ErrUseRLE is returned from the compressor when the input is a single byte value repeated.
+	ErrUseRLE = errors.New("input is single value repeated")
+)
+
+// Scratch provides temporary storage for compression and decompression.
+type Scratch struct {
+	// Private
+	count          [maxSymbolValue + 1]uint32
+	norm           [maxSymbolValue + 1]int16
+	symbolLen      uint16 // Length of active part of the symbol table.
+	actualTableLog uint8  // Selected tablelog.
+	br             byteReader
+	bits           bitReader
+	bw             bitWriter
+	ct             cTable      // Compression tables.
+	decTable       []decSymbol // Decompression table.
+	zeroBits       bool        // no bits has prob > 50%.
+	clearCount     bool        // clear count
+	maxCount       int         // count of the most probable symbol
+
+	// Per block parameters.
+	// These can be used to override compression parameters of the block.
+	// Do not touch, unless you know what you are doing.
+
+	// Out is output buffer.
+	// If the scratch is re-used before the caller is done processing the output,
+	// set this field to nil.
+	// Otherwise the output buffer will be re-used for next Compression/Decompression step
+	// and allocation will be avoided.
+	Out []byte
+
+	// MaxSymbolValue will override the maximum symbol value of the next block.
+	MaxSymbolValue uint8
+
+	// TableLog will attempt to override the tablelog for the next block.
+	TableLog uint8
+
+	// DecompressLimit limits the maximum decoded size acceptable.
+	// If > 0 decompression will stop when approximately this many bytes
+	// has been decoded.
+	// If 0, maximum size will be 2GB.
+	DecompressLimit int
+}
+
+// Histogram allows to populate the histogram and skip that step in the compression,
+// It otherwise allows to inspect the histogram when compression is done.
+// To indicate that you have populated the histogram call HistogramFinished
+// with the value of the highest populated symbol, as well as the number of entries
+// in the most populated entry. These are accepted at face value.
+// The returned slice will always be length 256.
+func (s *Scratch) Histogram() []uint32 {
+	return s.count[:]
+}
+
+// HistogramFinished can be called to indicate that the histogram has been populated.
+// maxSymbol is the index of the highest set symbol of the next data segment.
+// maxCount is the number of entries in the most populated entry.
+// These are accepted at face value.
+func (s *Scratch) HistogramFinished(maxSymbol uint8, maxCount int) {
+	s.maxCount = maxCount
+	s.symbolLen = uint16(maxSymbol) + 1
+	s.clearCount = maxCount != 0
+}
+
+// prepare will prepare and allocate scratch tables used for both compression and decompression.
+func (s *Scratch) prepare(in []byte) (*Scratch, error) {
+	if s == nil {
+		s = &Scratch{}
+	}
+	if s.MaxSymbolValue == 0 {
+		s.MaxSymbolValue = 255
+	}
+	if s.TableLog == 0 {
+		s.TableLog = defaultTablelog
+	}
+	if s.TableLog > maxTableLog {
+		return nil, fmt.Errorf("tableLog (%d) > maxTableLog (%d)", s.TableLog, maxTableLog)
+	}
+	if cap(s.Out) == 0 {
+		s.Out = make([]byte, 0, len(in))
+	}
+	if s.clearCount && s.maxCount == 0 {
+		for i := range s.count {
+			s.count[i] = 0
+		}
+		s.clearCount = false
+	}
+	s.br.init(in)
+	if s.DecompressLimit == 0 {
+		// Max size 2GB.
+		s.DecompressLimit = (2 << 30) - 1
+	}
+
+	return s, nil
+}
+
+// tableStep returns the next table index.
+func tableStep(tableSize uint32) uint32 {
+	return (tableSize >> 1) + (tableSize >> 3) + 3
+}
+
+func highBits(val uint32) (n uint32) {
+	return uint32(bits.Len32(val) - 1)
+}
--- a/vendor/github.com/klauspost/compress/huff0/.gitignore
+++ b/vendor/github.com/klauspost/compress/huff0/.gitignore
@@ -0,0 +1 @@
+/huff0-fuzz.zip
--- a/vendor/github.com/klauspost/compress/huff0/README.md
+++ b/vendor/github.com/klauspost/compress/huff0/README.md
@@ -0,0 +1,87 @@
+# Huff0 entropy compression
+
+This package provides Huff0 encoding and decoding as used in zstd.
+            
+[Huff0](https://github.com/Cyan4973/FiniteStateEntropy#new-generation-entropy-coders), 
+a Huffman codec designed for modern CPU, featuring OoO (Out of Order) operations on multiple ALU 
+(Arithmetic Logic Unit), achieving extremely fast compression and decompression speeds.
+
+This can be used for compressing input with a lot of similar input values to the smallest number of bytes.
+This does not perform any multi-byte [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) as LZ coders,
+but it can be used as a secondary step to compressors (like Snappy) that does not do entropy encoding. 
+
+* [Godoc documentation](https://godoc.org/github.com/klauspost/compress/huff0)
+
+THIS PACKAGE IS NOT CONSIDERED STABLE AND API OR ENCODING MAY CHANGE IN THE FUTURE.
+
+## News
+
+ * Mar 2018: First implementation released. Consider this beta software for now.
+
+# Usage
+
+This package provides a low level interface that allows to compress single independent blocks. 
+
+Each block is separate, and there is no built in integrity checks. 
+This means that the caller should keep track of block sizes and also do checksums if needed.  
+
+Compressing a block is done via the [`Compress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress1X) and 
+[`Compress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress4X) functions.
+You must provide input and will receive the output and maybe an error.
+
+These error values can be returned:
+
+| Error               | Description                                                                 |
+|---------------------|-----------------------------------------------------------------------------|
+| `<nil>`             | Everything ok, output is returned                                           |
+| `ErrIncompressible` | Returned when input is judged to be too hard to compress                    |
+| `ErrUseRLE`         | Returned from the compressor when the input is a single byte value repeated |
+| `ErrTooBig`         | Returned if the input block exceeds the maximum allowed size (128 Kib)      |
+| `(error)`           | An internal error occurred.                                                 |
+
+
+As can be seen above some of there are errors that will be returned even under normal operation so it is important to handle these.
+
+To reduce allocations you can provide a [`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object 
+that can be re-used for successive calls. Both compression and decompression accepts a `Scratch` object, and the same 
+object can be used for both.   
+
+Be aware, that when re-using a `Scratch` object that the *output* buffer is also re-used, so if you are still using this
+you must set the `Out` field in the scratch to nil. The same buffer is used for compression and decompression output.
+
+The `Scratch` object will retain state that allows to re-use previous tables for encoding and decoding.  
+
+## Tables and re-use
+
+Huff0 allows for reusing tables from the previous block to save space if that is expected to give better/faster results. 
+
+The Scratch object allows you to set a [`ReusePolicy`](https://godoc.org/github.com/klauspost/compress/huff0#ReusePolicy) 
+that controls this behaviour. See the documentation for details. This can be altered between each block.
+
+Do however note that this information is *not* stored in the output block and it is up to the users of the package to
+record whether [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable) should be called,
+based on the boolean reported back from the CompressXX call. 
+
+If you want to store the table separate from the data, you can access them as `OutData` and `OutTable` on the 
+[`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object.
+
+## Decompressing
+
+The first part of decoding is to initialize the decoding table through [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable).
+This will initialize the decoding tables. 
+You can supply the complete block to `ReadTable` and it will return the data part of the block 
+which can be given to the decompressor. 
+
+Decompressing is done by calling the [`Decompress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress1X) 
+or [`Decompress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress4X) function.
+
+You must provide the output from the compression stage, at exactly the size you got back. If you receive an error back
+your input was likely corrupted. 
+
+It is important to note that a successful decoding does *not* mean your output matches your original input. 
+There are no integrity checks, so relying on errors from the decompressor does not assure your data is valid.
+
+# Contributing
+
+Contributions are always welcome. Be aware that adding public functions will require good justification and breaking 
+changes will likely not be accepted. If in doubt open an issue before writing the PR.
--- a/vendor/github.com/klauspost/compress/huff0/bitreader.go
+++ b/vendor/github.com/klauspost/compress/huff0/bitreader.go
@@ -0,0 +1,115 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package huff0
+
+import (
+	"errors"
+	"io"
+)
+
+// bitReader reads a bitstream in reverse.
+// The last set bit indicates the start of the stream and is used
+// for aligning the input.
+type bitReader struct {
+	in       []byte
+	off      uint // next byte to read is at in[off - 1]
+	value    uint64
+	bitsRead uint8
+}
+
+// init initializes and resets the bit reader.
+func (b *bitReader) init(in []byte) error {
+	if len(in) < 1 {
+		return errors.New("corrupt stream: too short")
+	}
+	b.in = in
+	b.off = uint(len(in))
+	// The highest bit of the last byte indicates where to start
+	v := in[len(in)-1]
+	if v == 0 {
+		return errors.New("corrupt stream, did not find end of stream")
+	}
+	b.bitsRead = 64
+	b.value = 0
+	b.fill()
+	b.fill()
+	b.bitsRead += 8 - uint8(highBit32(uint32(v)))
+	return nil
+}
+
+// getBits will return n bits. n can be 0.
+func (b *bitReader) getBits(n uint8) uint16 {
+	if n == 0 || b.bitsRead >= 64 {
+		return 0
+	}
+	return b.getBitsFast(n)
+}
+
+// getBitsFast requires that at least one bit is requested every time.
+// There are no checks if the buffer is filled.
+func (b *bitReader) getBitsFast(n uint8) uint16 {
+	const regMask = 64 - 1
+	v := uint16((b.value << (b.bitsRead & regMask)) >> ((regMask + 1 - n) & regMask))
+	b.bitsRead += n
+	return v
+}
+
+// peekBitsFast requires that at least one bit is requested every time.
+// There are no checks if the buffer is filled.
+func (b *bitReader) peekBitsFast(n uint8) uint16 {
+	const regMask = 64 - 1
+	v := uint16((b.value << (b.bitsRead & regMask)) >> ((regMask + 1 - n) & regMask))
+	return v
+}
+
+// fillFast() will make sure at least 32 bits are available.
+// There must be at least 4 bytes available.
+func (b *bitReader) fillFast() {
+	if b.bitsRead < 32 {
+		return
+	}
+	// Do single re-slice to avoid bounds checks.
+	v := b.in[b.off-4 : b.off]
+	low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
+	b.value = (b.value << 32) | uint64(low)
+	b.bitsRead -= 32
+	b.off -= 4
+}
+
+// fill() will make sure at least 32 bits are available.
+func (b *bitReader) fill() {
+	if b.bitsRead < 32 {
+		return
+	}
+	if b.off > 4 {
+		v := b.in[b.off-4 : b.off]
+		low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
+		b.value = (b.value << 32) | uint64(low)
+		b.bitsRead -= 32
+		b.off -= 4
+		return
+	}
+	for b.off > 0 {
+		b.value = (b.value << 8) | uint64(b.in[b.off-1])
+		b.bitsRead -= 8
+		b.off--
+	}
+}
+
+// finished returns true if all bits have been read from the bit stream.
+func (b *bitReader) finished() bool {
+	return b.off == 0 && b.bitsRead >= 64
+}
+
+// close the bitstream and returns an error if out-of-buffer reads occurred.
+func (b *bitReader) close() error {
+	// Release reference.
+	b.in = nil
+	if b.bitsRead > 64 {
+		return io.ErrUnexpectedEOF
+	}
+	return nil
+}
--- a/vendor/github.com/klauspost/compress/huff0/bitwriter.go
+++ b/vendor/github.com/klauspost/compress/huff0/bitwriter.go
@@ -0,0 +1,186 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package huff0
+
+import "fmt"
+
+// bitWriter will write bits.
+// First bit will be LSB of the first byte of output.
+type bitWriter struct {
+	bitContainer uint64
+	nBits        uint8
+	out          []byte
+}
+
+// bitMask16 is bitmasks. Has extra to avoid bounds check.
+var bitMask16 = [32]uint16{
+	0, 1, 3, 7, 0xF, 0x1F,
+	0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF,
+	0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF, 0xFFFF,
+	0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF,
+	0xFFFF, 0xFFFF} /* up to 16 bits */
+
+// addBits16NC will add up to 16 bits.
+// It will not check if there is space for them,
+// so the caller must ensure that it has flushed recently.
+func (b *bitWriter) addBits16NC(value uint16, bits uint8) {
+	b.bitContainer |= uint64(value&bitMask16[bits&31]) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// addBits16Clean will add up to 16 bits. value may not contain more set bits than indicated.
+// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
+func (b *bitWriter) addBits16Clean(value uint16, bits uint8) {
+	b.bitContainer |= uint64(value) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// addBits16Clean will add up to 16 bits. value may not contain more set bits than indicated.
+// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
+func (b *bitWriter) encSymbol(ct cTable, symbol byte) {
+	enc := ct[symbol]
+	b.bitContainer |= uint64(enc.val) << (b.nBits & 63)
+	b.nBits += enc.nBits
+}
+
+// addBits16ZeroNC will add up to 16 bits.
+// It will not check if there is space for them,
+// so the caller must ensure that it has flushed recently.
+// This is fastest if bits can be zero.
+func (b *bitWriter) addBits16ZeroNC(value uint16, bits uint8) {
+	if bits == 0 {
+		return
+	}
+	value <<= (16 - bits) & 15
+	value >>= (16 - bits) & 15
+	b.bitContainer |= uint64(value) << (b.nBits & 63)
+	b.nBits += bits
+}
+
+// flush will flush all pending full bytes.
+// There will be at least 56 bits available for writing when this has been called.
+// Using flush32 is faster, but leaves less space for writing.
+func (b *bitWriter) flush() {
+	v := b.nBits >> 3
+	switch v {
+	case 0:
+		return
+	case 1:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+		)
+		b.bitContainer >>= 1 << 3
+	case 2:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+		)
+		b.bitContainer >>= 2 << 3
+	case 3:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+		)
+		b.bitContainer >>= 3 << 3
+	case 4:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+		)
+		b.bitContainer >>= 4 << 3
+	case 5:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+		)
+		b.bitContainer >>= 5 << 3
+	case 6:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+		)
+		b.bitContainer >>= 6 << 3
+	case 7:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+			byte(b.bitContainer>>48),
+		)
+		b.bitContainer >>= 7 << 3
+	case 8:
+		b.out = append(b.out,
+			byte(b.bitContainer),
+			byte(b.bitContainer>>8),
+			byte(b.bitContainer>>16),
+			byte(b.bitContainer>>24),
+			byte(b.bitContainer>>32),
+			byte(b.bitContainer>>40),
+			byte(b.bitContainer>>48),
+			byte(b.bitContainer>>56),
+		)
+		b.bitContainer = 0
+		b.nBits = 0
+		return
+	default:
+		panic(fmt.Errorf("bits (%d) > 64", b.nBits))
+	}
+	b.nBits &= 7
+}
+
+// flush32 will flush out, so there are at least 32 bits available for writing.
+func (b *bitWriter) flush32() {
+	if b.nBits < 32 {
+		return
+	}
+	b.out = append(b.out,
+		byte(b.bitContainer),
+		byte(b.bitContainer>>8),
+		byte(b.bitContainer>>16),
+		byte(b.bitContainer>>24))
+	b.nBits -= 32
+	b.bitContainer >>= 32
+}
+
+// flushAlign will flush remaining full bytes and align to next byte boundary.
+func (b *bitWriter) flushAlign() {
+	nbBytes := (b.nBits + 7) >> 3
+	for i := uint8(0); i < nbBytes; i++ {
+		b.out = append(b.out, byte(b.bitContainer>>(i*8)))
+	}
+	b.nBits = 0
+	b.bitContainer = 0
+}
+
+// close will write the alignment bit and write the final byte(s)
+// to the output.
+func (b *bitWriter) close() error {
+	// End mark
+	b.addBits16Clean(1, 1)
+	// flush until next byte.
+	b.flushAlign()
+	return nil
+}
+
+// reset and continue writing by appending to out.
+func (b *bitWriter) reset(out []byte) {
+	b.bitContainer = 0
+	b.nBits = 0
+	b.out = out
+}
--- a/vendor/github.com/klauspost/compress/huff0/bytereader.go
+++ b/vendor/github.com/klauspost/compress/huff0/bytereader.go
@@ -0,0 +1,54 @@
+// Copyright 2018 Klaus Post. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
+
+package huff0
+
+// byteReader provides a byte reader that reads
+// little endian values from a byte stream.
+// The input stream is manually advanced.
+// The reader performs no bounds checks.
+type byteReader struct {
+	b   []byte
+	off int
+}
+
+// init will initialize the reader and set the input.
+func (b *byteReader) init(in []byte) {
+	b.b = in
+	b.off = 0
+}
+
+// advance the stream b n bytes.
+func (b *byteReader) advance(n uint) {
+	b.off += int(n)
+}
+
+// Int32 returns a little endian int32 starting at current offset.
+func (b byteReader) Int32() int32 {
+	v3 := int32(b.b[b.off+3])
+	v2 := int32(b.b[b.off+2])
+	v1 := int32(b.b[b.off+1])
+	v0 := int32(b.b[b.off])
+	return (v3 << 24) | (v2 << 16) | (v1 << 8) | v0
+}
+
+// Uint32 returns a little endian uint32 starting at current offset.
+func (b byteReader) Uint32() uint32 {
+	v3 := uint32(b.b[b.off+3])
+	v2 := uint32(b.b[b.off+2])
+	v1 := uint32(b.b[b.off+1])
+	v0 := uint32(b.b[b.off])
+	return (v3 << 24) | (v2 << 16) | (v1 << 8) | v0
+}
+
+// unread returns the unread portion of the input.
+func (b byteReader) unread() []byte {
+	return b.b[b.off:]
+}
+
+// remain will return the number of bytes remaining.
+func (b byteReader) remain() int {
+	return len(b.b) - b.off
+}
--- a/vendor/github.com/klauspost/compress/huff0/compress.go
+++ b/vendor/github.com/klauspost/compress/huff0/compress.go
@@ -0,0 +1,618 @@
+package huff0
+
+import (
+	"fmt"
+	"runtime"
+	"sync"
+)
+
+// Compress1X will compress the input.
+// The output can be decoded using Decompress1X.
+// Supply a Scratch object. The scratch object contains state about re-use,
+// So when sharing across independent encodes, be sure to set the re-use policy.
+func Compress1X(in []byte, s *Scratch) (out []byte, reUsed bool, err error) {
+	s, err = s.prepare(in)
+	if err != nil {
+		return nil, false, err
+	}
+	return compress(in, s, s.compress1X)
+}
+
+// Compress4X will compress the input. The input is split into 4 independent blocks
+// and compressed similar to Compress1X.
+// The output can be decoded using Decompress4X.
+// Supply a Scratch object. The scratch object contains state about re-use,
+// So when sharing across independent encodes, be sure to set the re-use policy.
+func Compress4X(in []byte, s *Scratch) (out []byte, reUsed bool, err error) {
+	s, err = s.prepare(in)
+	if err != nil {
+		return nil, false, err
+	}
+	if false {
+		// TODO: compress4Xp only slightly faster.
+		const parallelThreshold = 8 << 10
+		if len(in) < parallelThreshold || runtime.GOMAXPROCS(0) == 1 {
+			return compress(in, s, s.compress4X)
+		}
+		return compress(in, s, s.compress4Xp)
+	}
+	return compress(in, s, s.compress4X)
+}
+
+func compress(in []byte, s *Scratch, compressor func(src []byte) ([]byte, error)) (out []byte, reUsed bool, err error) {
+	// Nuke previous table if we cannot reuse anyway.
+	if s.Reuse == ReusePolicyNone {
+		s.prevTable = s.prevTable[:0]
+	}
+
+	// Create histogram, if none was provided.
+	maxCount := s.maxCount
+	var canReuse = false
+	if maxCount == 0 {
+		maxCount, canReuse = s.countSimple(in)
+	} else {
+		canReuse = s.canUseTable(s.prevTable)
+	}
+
+	// Reset for next run.
+	s.clearCount = true
+	s.maxCount = 0
+	if maxCount >= len(in) {
+		if maxCount > len(in) {
+			return nil, false, fmt.Errorf("maxCount (%d) > length (%d)", maxCount, len(in))
+		}
+		if len(in) == 1 {
+			return nil, false, ErrIncompressible
+		}
+		// One symbol, use RLE
+		return nil, false, ErrUseRLE
+	}
+	if maxCount == 1 || maxCount < (len(in)>>7) {
+		// Each symbol present maximum once or too well distributed.
+		return nil, false, ErrIncompressible
+	}
+
+	if s.Reuse == ReusePolicyPrefer && canReuse {
+		keepTable := s.cTable
+		s.cTable = s.prevTable
+		s.Out, err = compressor(in)
+		s.cTable = keepTable
+		if err == nil && len(s.Out) < len(in) {
+			s.OutData = s.Out
+			return s.Out, true, nil
+		}
+		// Do not attempt to re-use later.
+		s.prevTable = s.prevTable[:0]
+	}
+
+	// Calculate new table.
+	s.optimalTableLog()
+	err = s.buildCTable()
+	if err != nil {
+		return nil, false, err
+	}
+
+	if false && !s.canUseTable(s.cTable) {
+		panic("invalid table generated")
+	}
+
+	if s.Reuse == ReusePolicyAllow && canReuse {
+		hSize := len(s.Out)
+		oldSize := s.prevTable.estimateSize(s.count[:s.symbolLen])
+		newSize := s.cTable.estimateSize(s.count[:s.symbolLen])
+		if oldSize <= hSize+newSize || hSize+12 >= len(in) {
+			// Retain cTable even if we re-use.
+			keepTable := s.cTable
+			s.cTable = s.prevTable
+			s.Out, err = compressor(in)
+			s.cTable = keepTable
+			if len(s.Out) >= len(in) {
+				return nil, false, ErrIncompressible
+			}
+			s.OutData = s.Out
+			return s.Out, true, nil
+		}
+	}
+
+	// Use new table
+	err = s.cTable.write(s)
+	if err != nil {
+		s.OutTable = nil
+		return nil, false, err
+	}
+	s.OutTable = s.Out
+
+	// Compress using new table
+	s.Out, err = compressor(in)
+	if err != nil {
+		s.OutTable = nil
+		return nil, false, err
+	}
+	if len(s.Out) >= len(in) {
+		s.OutTable = nil
+		return nil, false, ErrIncompressible
+	}
+	// Move current table into previous.
+	s.prevTable, s.cTable = s.cTable, s.prevTable[:0]
+	s.OutData = s.Out[len(s.OutTable):]
+	return s.Out, false, nil
+}
+
+func (s *Scratch) compress1X(src []byte) ([]byte, error) {
+	return s.compress1xDo(s.Out, src)
+}
+
+func (s *Scratch) compress1xDo(dst, src []byte) ([]byte, error) {
+	var bw = bitWriter{out: dst}
+
+	// N is length divisible by 4.
+	n := len(src)
+	n -= n & 3
+	cTable := s.cTable[:256]
+
+	// Encode last bytes.
+	for i := len(src) & 3; i > 0; i-- {
+		bw.encSymbol(cTable, src[n+i-1])
+	}
+	if s.actualTableLog <= 8 {
+		n -= 4
+		for ; n >= 0; n -= 4 {
+			tmp := src[n : n+4]
+			// tmp should be len 4
+			bw.flush32()
+			bw.encSymbol(cTable, tmp[3])
+			bw.encSymbol(cTable, tmp[2])
+			bw.encSymbol(cTable, tmp[1])
+			bw.encSymbol(cTable, tmp[0])
+		}
+	} else {
+		n -= 4
+		for ; n >= 0; n -= 4 {
+			tmp := src[n : n+4]
+			// tmp should be len 4
+			bw.flush32()
+			bw.encSymbol(cTable, tmp[3])
+			bw.encSymbol(cTable, tmp[2])
+			bw.flush32()
+			bw.encSymbol(cTable, tmp[1])
+			bw.encSymbol(cTable, tmp[0])
+		}
+	}
+	err := bw.close()
+	return bw.out, err
+}
+
+var sixZeros [6]byte
+
+func (s *Scratch) compress4X(src []byte) ([]byte, error) {
+	if len(src) < 12 {
+		return nil, ErrIncompressible
+	}
+	segmentSize := (len(src) + 3) / 4
+
+	// Add placeholder for output length
+	offsetIdx := len(s.Out)
+	s.Out = append(s.Out, sixZeros[:]...)
+
+	for i := 0; i < 4; i++ {
+		toDo := src
+		if len(toDo) > segmentSize {
+			toDo = toDo[:segmentSize]
+		}
+		src = src[len(toDo):]
+
+		var err error
+		idx := len(s.Out)
+		s.Out, err = s.compress1xDo(s.Out, toDo)
+		if err != nil {
+			return nil, err
+		}
+		// Write compressed length as little endian before block.
+		if i < 3 {
+			// Last length is not written.
+			length := len(s.Out) - idx
+			s.Out[i*2+offsetIdx] = byte(length)
+			s.Out[i*2+offsetIdx+1] = byte(length >> 8)
+		}
+	}
+
+	return s.Out, nil
+}
+
+// compress4Xp will compress 4 streams using separate goroutines.
+func (s *Scratch) compress4Xp(src []byte) ([]byte, error) {
+	if len(src) < 12 {
+		return nil, ErrIncompressible
+	}
+	// Add placeholder for output length
+	s.Out = s.Out[:6]
+
+	segmentSize := (len(src) + 3) / 4
+	var wg sync.WaitGroup
+	var errs [4]error
+	wg.Add(4)
+	for i := 0; i < 4; i++ {
+		toDo := src
+		if len(toDo) > segmentSize {
+			toDo = toDo[:segmentSize]
+		}
+		src = src[len(toDo):]
+
+		// Separate goroutine for each block.
+		go func(i int) {
+			s.tmpOut[i], errs[i] = s.compress1xDo(s.tmpOut[i][:0], toDo)
+			wg.Done()
+		}(i)
+	}
+	wg.Wait()
+	for i := 0; i < 4; i++ {
+		if errs[i] != nil {
+			return nil, errs[i]
+		}
+		o := s.tmpOut[i]
+		// Write compressed length as little endian before block.
+		if i < 3 {
+			// Last length is not written.
+			s.Out[i*2] = byte(len(o))
+			s.Out[i*2+1] = byte(len(o) >> 8)
+		}
+
+		// Write output.
+		s.Out = append(s.Out, o...)
+	}
+	return s.Out, nil
+}
+
+// countSimple will create a simple histogram in s.count.
+// Returns the biggest count.
+// Does not update s.clearCount.
+func (s *Scratch) countSimple(in []byte) (max int, reuse bool) {
+	reuse = true
+	for _, v := range in {
+		s.count[v]++
+	}
+	m := uint32(0)
+	if len(s.prevTable) > 0 {
+		for i, v := range s.count[:] {
+			if v > m {
+				m = v
+			}
+			if v > 0 {
+				s.symbolLen = uint16(i) + 1
+				if i >= len(s.prevTable) {
+					reuse = false
+				} else {
+					if s.prevTable[i].nBits == 0 {
+						reuse = false
+					}
+				}
+			}
+		}
+		return int(m), reuse
+	}
+	for i, v := range s.count[:] {
+		if v > m {
+			m = v
+		}
+		if v > 0 {
+			s.symbolLen = uint16(i) + 1
+		}
+	}
+	return int(m), false
+}
+
+func (s *Scratch) canUseTable(c cTable) bool {
+	if len(c) < int(s.symbolLen) {
+		return false
+	}
+	for i, v := range s.count[:s.symbolLen] {
+		if v != 0 && c[i].nBits == 0 {
+			return false
+		}
+	}
+	return true
+}
+
+// minTableLog provides the minimum logSize to safely represent a distribution.
+func (s *Scratch) minTableLog() uint8 {
+	minBitsSrc := highBit32(uint32(s.br.remain()-1)) + 1
+	minBitsSymbols := highBit32(uint32(s.symbolLen-1)) + 2
+	if minBitsSrc < minBitsSymbols {
+		return uint8(minBitsSrc)
+	}
+	return uint8(minBitsSymbols)
+}
+
+// optimalTableLog calculates and sets the optimal tableLog in s.actualTableLog
+func (s *Scratch) optimalTableLog() {
+	tableLog := s.TableLog
+	minBits := s.minTableLog()
+	maxBitsSrc := uint8(highBit32(uint32(s.br.remain()-1))) - 2
+	if maxBitsSrc < tableLog {
+		// Accuracy can be reduced
+		tableLog = maxBitsSrc
+	}
+	if minBits > tableLog {
+		tableLog = minBits
+	}
+	// Need a minimum to safely represent all symbol values
+	if tableLog < minTablelog {
+		tableLog = minTablelog
+	}
+	if tableLog > tableLogMax {
+		tableLog = tableLogMax
+	}
+	s.actualTableLog = tableLog
+}
+
+type cTableEntry struct {
+	val   uint16
+	nBits uint8
+	// We have 8 bits extra
+}
+
+const huffNodesMask = huffNodesLen - 1
+
+func (s *Scratch) buildCTable() error {
+	s.huffSort()
+	if cap(s.cTable) < maxSymbolValue+1 {
+		s.cTable = make([]cTableEntry, s.symbolLen, maxSymbolValue+1)
+	} else {
+		s.cTable = s.cTable[:s.symbolLen]
+		for i := range s.cTable {
+			s.cTable[i] = cTableEntry{}
+		}
+	}
+
+	var startNode = int16(s.symbolLen)
+	nonNullRank := s.symbolLen - 1
+
+	nodeNb := int16(startNode)
+	huffNode := s.nodes[1 : huffNodesLen+1]
+
+	// This overlays the slice above, but allows "-1" index lookups.
+	// Different from reference implementation.
+	huffNode0 := s.nodes[0 : huffNodesLen+1]
+
+	for huffNode[nonNullRank].count == 0 {
+		nonNullRank--
+	}
+
+	lowS := int16(nonNullRank)
+	nodeRoot := nodeNb + lowS - 1
+	lowN := nodeNb
+	huffNode[nodeNb].count = huffNode[lowS].count + huffNode[lowS-1].count
+	huffNode[lowS].parent, huffNode[lowS-1].parent = uint16(nodeNb), uint16(nodeNb)
+	nodeNb++
+	lowS -= 2
+	for n := nodeNb; n <= nodeRoot; n++ {
+		huffNode[n].count = 1 << 30
+	}
+	// fake entry, strong barrier
+	huffNode0[0].count = 1 << 31
+
+	// create parents
+	for nodeNb <= nodeRoot {
+		var n1, n2 int16
+		if huffNode0[lowS+1].count < huffNode0[lowN+1].count {
+			n1 = lowS
+			lowS--
+		} else {
+			n1 = lowN
+			lowN++
+		}
+		if huffNode0[lowS+1].count < huffNode0[lowN+1].count {
+			n2 = lowS
+			lowS--
+		} else {
+			n2 = lowN
+			lowN++
+		}
+
+		huffNode[nodeNb].count = huffNode0[n1+1].count + huffNode0[n2+1].count
+		huffNode0[n1+1].parent, huffNode0[n2+1].parent = uint16(nodeNb), uint16(nodeNb)
+		nodeNb++
+	}
+
+	// distribute weights (unlimited tree height)
+	huffNode[nodeRoot].nbBits = 0
+	for n := nodeRoot - 1; n >= startNode; n-- {
+		huffNode[n].nbBits = huffNode[huffNode[n].parent].nbBits + 1
+	}
+	for n := uint16(0); n <= nonNullRank; n++ {
+		huffNode[n].nbBits = huffNode[huffNode[n].parent].nbBits + 1
+	}
+	s.actualTableLog = s.setMaxHeight(int(nonNullRank))
+	maxNbBits := s.actualTableLog
+
+	// fill result into tree (val, nbBits)
+	if maxNbBits > tableLogMax {
+		return fmt.Errorf("internal error: maxNbBits (%d) > tableLogMax (%d)", maxNbBits, tableLogMax)
+	}
+	var nbPerRank [tableLogMax + 1]uint16
+	var valPerRank [tableLogMax + 1]uint16
+	for _, v := range huffNode[:nonNullRank+1] {
+		nbPerRank[v.nbBits]++
+	}
+	// determine stating value per rank
+	{
+		min := uint16(0)
+		for n := maxNbBits; n > 0; n-- {
+			// get starting value within each rank
+			valPerRank[n] = min
+			min += nbPerRank[n]
+			min >>= 1
+		}
+	}
+
+	// push nbBits per symbol, symbol order
+	// TODO: changed `s.symbolLen` -> `nonNullRank+1` (micro-opt)
+	for _, v := range huffNode[:nonNullRank+1] {
+		s.cTable[v.symbol].nBits = v.nbBits
+	}
+
+	// assign value within rank, symbol order
+	for n, val := range s.cTable[:s.symbolLen] {
+		v := valPerRank[val.nBits]
+		s.cTable[n].val = v
+		valPerRank[val.nBits] = v + 1
+	}
+
+	return nil
+}
+
+// huffSort will sort symbols, decreasing order.
+func (s *Scratch) huffSort() {
+	type rankPos struct {
+		base    uint32
+		current uint32
+	}
+
+	// Clear nodes
+	nodes := s.nodes[:huffNodesLen+1]
+	s.nodes = nodes
+	nodes = nodes[1 : huffNodesLen+1]
+
+	// Sort into buckets based on length of symbol count.
+	var rank [32]rankPos
+	for _, v := range s.count[:s.symbolLen] {
+		r := highBit32(v+1) & 31
+		rank[r].base++
+	}
+	for n := 30; n > 0; n-- {
+		rank[n-1].base += rank[n].base
+	}
+	for n := range rank[:] {
+		rank[n].current = rank[n].base
+	}
+	for n, c := range s.count[:s.symbolLen] {
+		r := (highBit32(c+1) + 1) & 31
+		pos := rank[r].current
+		rank[r].current++
+		prev := nodes[(pos-1)&huffNodesMask]
+		for pos > rank[r].base && c > prev.count {
+			nodes[pos&huffNodesMask] = prev
+			pos--
+			prev = nodes[(pos-1)&huffNodesMask]
+		}
+		nodes[pos&huffNodesMask] = nodeElt{count: c, symbol: byte(n)}
+	}
+	return
+}
+
+func (s *Scratch) setMaxHeight(lastNonNull int) uint8 {
+	maxNbBits := s.TableLog
+	huffNode := s.nodes[1 : huffNodesLen+1]
+	//huffNode = huffNode[: huffNodesLen]
+
+	largestBits := huffNode[lastNonNull].nbBits
+
+	// early exit : no elt > maxNbBits
+	if largestBits <= maxNbBits {
+		return largestBits
+	}
+	totalCost := int(0)
+	baseCost := int(1) << (largestBits - maxNbBits)
+	n := uint32(lastNonNull)
+
+	for huffNode[n].nbBits > maxNbBits {
+		totalCost += baseCost - (1 << (largestBits - huffNode[n].nbBits))
+		huffNode[n].nbBits = maxNbBits
+		n--
+	}
+	// n stops at huffNode[n].nbBits <= maxNbBits
+
+	for huffNode[n].nbBits == maxNbBits {
+		n--
+	}
+	// n end at index of smallest symbol using < maxNbBits
+
+	// renorm totalCost
+	totalCost >>= largestBits - maxNbBits /* note : totalCost is necessarily a multiple of baseCost */
+
+	// repay normalized cost
+	{
+		const noSymbol = 0xF0F0F0F0
+		var rankLast [tableLogMax + 2]uint32
+
+		for i := range rankLast[:] {
+			rankLast[i] = noSymbol
+		}
+
+		// Get pos of last (smallest) symbol per rank
+		{
+			currentNbBits := uint8(maxNbBits)
+			for pos := int(n); pos >= 0; pos-- {
+				if huffNode[pos].nbBits >= currentNbBits {
+					continue
+				}
+				currentNbBits = huffNode[pos].nbBits // < maxNbBits
+				rankLast[maxNbBits-currentNbBits] = uint32(pos)
+			}
+		}
+
+		for totalCost > 0 {
+			nBitsToDecrease := uint8(highBit32(uint32(totalCost))) + 1
+
+			for ; nBitsToDecrease > 1; nBitsToDecrease-- {
+				highPos := rankLast[nBitsToDecrease]
+				lowPos := rankLast[nBitsToDecrease-1]
+				if highPos == noSymbol {
+					continue
+				}
+				if lowPos == noSymbol {
+					break
+				}
+				highTotal := huffNode[highPos].count
+				lowTotal := 2 * huffNode[lowPos].count
+				if highTotal <= lowTotal {
+					break
+				}
+			}
+			// only triggered when no more rank 1 symbol left => find closest one (note : there is necessarily at least one !)
+			// HUF_MAX_TABLELOG test just to please gcc 5+; but it should not be necessary
+			// FIXME: try to remove
+			for (nBitsToDecrease <= tableLogMax) && (rankLast[nBitsToDecrease] == noSymbol) {
+				nBitsToDecrease++
+			}
+			totalCost -= 1 << (nBitsToDecrease - 1)
+			if rankLast[nBitsToDecrease-1] == noSymbol {
+				// this rank is no longer empty
+				rankLast[nBitsToDecrease-1] = rankLast[nBitsToDecrease]
+			}
+			huffNode[rankLast[nBitsToDecrease]].nbBits++
+			if rankLast[nBitsToDecrease] == 0 {
+				/* special case, reached largest symbol */
+				rankLast[nBitsToDecrease] = noSymbol
+			} else {
+				rankLast[nBitsToDecrease]--
+				if huffNode[rankLast[nBitsToDecrease]].nbBits != maxNbBits-nBitsToDecrease {
+					rankLast[nBitsToDecrease] = noSymbol /* this rank is now empty */
+				}
+			}
+		}
+
+		for totalCost < 0 { /* Sometimes, cost correction overshoot */
+			if rankLast[1] == noSymbol { /* special case : no rank 1 symbol (using maxNbBits-1); let's create one from largest rank 0 (using maxNbBits) */
+				for huffNode[n].nbBits == maxNbBits {
+					n--
+				}
+				huffNode[n+1].nbBits--
+				rankLast[1] = n + 1
+				totalCost++
+				continue
+			}
+			huffNode[rankLast[1]+1].nbBits--
+			rankLast[1]++
+			totalCost++
+		}
+	}
+	return maxNbBits
+}
+
+type nodeElt struct {
+	count  uint32
+	parent uint16
+	symbol byte
+	nbBits uint8
+}
--- a/vendor/github.com/klauspost/compress/huff0/decompress.go
+++ b/vendor/github.com/klauspost/compress/huff0/decompress.go
@@ -0,0 +1,401 @@
+package huff0
+
+import (
+	"errors"
+	"fmt"
+	"io"
+
+	"github.com/klauspost/compress/fse"
+)
+
+type dTable struct {
+	single []dEntrySingle
+	double []dEntryDouble
+}
+
+// single-symbols decoding
+type dEntrySingle struct {
+	byte  uint8
+	nBits uint8
+}
+
+// double-symbols decoding
+type dEntryDouble struct {
+	seq   uint16
+	nBits uint8
+	len   uint8
+}
+
+// ReadTable will read a table from the input.
+// The size of the input may be larger than the table definition.
+// Any content remaining after the table definition will be returned.
+// If no Scratch is provided a new one is allocated.
+// The returned Scratch can be used for decoding input using this table.
+func ReadTable(in []byte, s *Scratch) (s2 *Scratch, remain []byte, err error) {
+	s, err = s.prepare(in)
+	if err != nil {
+		return s, nil, err
+	}
+	if len(in) <= 1 {
+		return s, nil, errors.New("input too small for table")
+	}
+	iSize := in[0]
+	in = in[1:]
+	if iSize >= 128 {
+		// Uncompressed
+		oSize := iSize - 127
+		iSize = (oSize + 1) / 2
+		if int(iSize) > len(in) {
+			return s, nil, errors.New("input too small for table")
+		}
+		for n := uint8(0); n < oSize; n += 2 {
+			v := in[n/2]
+			s.huffWeight[n] = v >> 4
+			s.huffWeight[n+1] = v & 15
+		}
+		s.symbolLen = uint16(oSize)
+		in = in[iSize:]
+	} else {
+		if len(in) <= int(iSize) {
+			return s, nil, errors.New("input too small for table")
+		}
+		// FSE compressed weights
+		s.fse.DecompressLimit = 255
+		hw := s.huffWeight[:]
+		s.fse.Out = hw
+		b, err := fse.Decompress(in[:iSize], s.fse)
+		s.fse.Out = nil
+		if err != nil {
+			return s, nil, err
+		}
+		if len(b) > 255 {
+			return s, nil, errors.New("corrupt input: output table too large")
+		}
+		s.symbolLen = uint16(len(b))
+		in = in[iSize:]
+	}
+
+	// collect weight stats
+	var rankStats [tableLogMax + 1]uint32
+	weightTotal := uint32(0)
+	for _, v := range s.huffWeight[:s.symbolLen] {
+		if v > tableLogMax {
+			return s, nil, errors.New("corrupt input: weight too large")
+		}
+		rankStats[v]++
+		weightTotal += (1 << (v & 15)) >> 1
+	}
+	if weightTotal == 0 {
+		return s, nil, errors.New("corrupt input: weights zero")
+	}
+
+	// get last non-null symbol weight (implied, total must be 2^n)
+	{
+		tableLog := highBit32(weightTotal) + 1
+		if tableLog > tableLogMax {
+			return s, nil, errors.New("corrupt input: tableLog too big")
+		}
+		s.actualTableLog = uint8(tableLog)
+		// determine last weight
+		{
+			total := uint32(1) << tableLog
+			rest := total - weightTotal
+			verif := uint32(1) << highBit32(rest)
+			lastWeight := highBit32(rest) + 1
+			if verif != rest {
+				// last value must be a clean power of 2
+				return s, nil, errors.New("corrupt input: last value not power of two")
+			}
+			s.huffWeight[s.symbolLen] = uint8(lastWeight)
+			s.symbolLen++
+			rankStats[lastWeight]++
+		}
+	}
+
+	if (rankStats[1] < 2) || (rankStats[1]&1 != 0) {
+		// by construction : at least 2 elts of rank 1, must be even
+		return s, nil, errors.New("corrupt input: min elt size, even check failed ")
+	}
+
+	// TODO: Choose between single/double symbol decoding
+
+	// Calculate starting value for each rank
+	{
+		var nextRankStart uint32
+		for n := uint8(1); n < s.actualTableLog+1; n++ {
+			current := nextRankStart
+			nextRankStart += rankStats[n] << (n - 1)
+			rankStats[n] = current
+		}
+	}
+
+	// fill DTable (always full size)
+	tSize := 1 << tableLogMax
+	if len(s.dt.single) != tSize {
+		s.dt.single = make([]dEntrySingle, tSize)
+	}
+
+	for n, w := range s.huffWeight[:s.symbolLen] {
+		length := (uint32(1) << w) >> 1
+		d := dEntrySingle{
+			byte:  uint8(n),
+			nBits: s.actualTableLog + 1 - w,
+		}
+		for u := rankStats[w]; u < rankStats[w]+length; u++ {
+			s.dt.single[u] = d
+		}
+		rankStats[w] += length
+	}
+	return s, in, nil
+}
+
+// Decompress1X will decompress a 1X encoded stream.
+// The length of the supplied input must match the end of a block exactly.
+// Before this is called, the table must be initialized with ReadTable unless
+// the encoder re-used the table.
+func (s *Scratch) Decompress1X(in []byte) (out []byte, err error) {
+	if len(s.dt.single) == 0 {
+		return nil, errors.New("no table loaded")
+	}
+	var br bitReader
+	err = br.init(in)
+	if err != nil {
+		return nil, err
+	}
+	s.Out = s.Out[:0]
+
+	decode := func() byte {
+		val := br.peekBitsFast(s.actualTableLog) /* note : actualTableLog >= 1 */
+		v := s.dt.single[val]
+		br.bitsRead += v.nBits
+		return v.byte
+	}
+	hasDec := func(v dEntrySingle) byte {
+		br.bitsRead += v.nBits
+		return v.byte
+	}
+
+	// Avoid bounds check by always having full sized table.
+	const tlSize = 1 << tableLogMax
+	const tlMask = tlSize - 1
+	dt := s.dt.single[:tlSize]
+
+	// Use temp table to avoid bound checks/append penalty.
+	var tmp = s.huffWeight[:256]
+	var off uint8
+
+	for br.off >= 8 {
+		br.fillFast()
+		tmp[off+0] = hasDec(dt[br.peekBitsFast(s.actualTableLog)&tlMask])
+		tmp[off+1] = hasDec(dt[br.peekBitsFast(s.actualTableLog)&tlMask])
+		br.fillFast()
+		tmp[off+2] = hasDec(dt[br.peekBitsFast(s.actualTableLog)&tlMask])
+		tmp[off+3] = hasDec(dt[br.peekBitsFast(s.actualTableLog)&tlMask])
+		off += 4
+		if off == 0 {
+			s.Out = append(s.Out, tmp...)
+		}
+	}
+
+	s.Out = append(s.Out, tmp[:off]...)
+
+	for !br.finished() {
+		br.fill()
+		s.Out = append(s.Out, decode())
+	}
+	return s.Out, br.close()
+}
+
+// Decompress4X will decompress a 4X encoded stream.
+// Before this is called, the table must be initialized with ReadTable unless
+// the encoder re-used the table.
+// The length of the supplied input must match the end of a block exactly.
+// The destination size of the uncompressed data must be known and provided.
+func (s *Scratch) Decompress4X(in []byte, dstSize int) (out []byte, err error) {
+	if len(s.dt.single) == 0 {
+		return nil, errors.New("no table loaded")
+	}
+	if len(in) < 6+(4*1) {
+		return nil, errors.New("input too small")
+	}
+	// TODO: We do not detect when we overrun a buffer, except if the last one does.
+
+	var br [4]bitReader
+	start := 6
+	for i := 0; i < 3; i++ {
+		length := int(in[i*2]) | (int(in[i*2+1]) << 8)
+		if start+length >= len(in) {
+			return nil, errors.New("truncated input (or invalid offset)")
+		}
+		err = br[i].init(in[start : start+length])
+		if err != nil {
+			return nil, err
+		}
+		start += length
+	}
+	err = br[3].init(in[start:])
+	if err != nil {
+		return nil, err
+	}
+
+	// Prepare output
+	if cap(s.Out) < dstSize {
+		s.Out = make([]byte, 0, dstSize)
+	}
+	s.Out = s.Out[:dstSize]
+	// destination, offset to match first output
+	dstOut := s.Out
+	dstEvery := (dstSize + 3) / 4
+
+	const tlSize = 1 << tableLogMax
+	const tlMask = tlSize - 1
+	single := s.dt.single[:tlSize]
+
+	decode := func(br *bitReader) byte {
+		val := br.peekBitsFast(s.actualTableLog) /* note : actualTableLog >= 1 */
+		v := single[val&tlMask]
+		br.bitsRead += v.nBits
+		return v.byte
+	}
+
+	// Use temp table to avoid bound checks/append penalty.
+	var tmp = s.huffWeight[:256]
+	var off uint8
+
+	// Decode 2 values from each decoder/loop.
+	const bufoff = 256 / 4
+bigloop:
+	for {
+		for i := range br {
+			if br[i].off < 4 {
+				break bigloop
+			}
+			br[i].fillFast()
+		}
+		tmp[off] = decode(&br[0])
+		tmp[off+bufoff] = decode(&br[1])
+		tmp[off+bufoff*2] = decode(&br[2])
+		tmp[off+bufoff*3] = decode(&br[3])
+		tmp[off+1] = decode(&br[0])
+		tmp[off+1+bufoff] = decode(&br[1])
+		tmp[off+1+bufoff*2] = decode(&br[2])
+		tmp[off+1+bufoff*3] = decode(&br[3])
+		off += 2
+		if off == bufoff {
+			if bufoff > dstEvery {
+				return nil, errors.New("corruption detected: stream overrun")
+			}
+			copy(dstOut, tmp[:bufoff])
+			copy(dstOut[dstEvery:], tmp[bufoff:bufoff*2])
+			copy(dstOut[dstEvery*2:], tmp[bufoff*2:bufoff*3])
+			copy(dstOut[dstEvery*3:], tmp[bufoff*3:bufoff*4])
+			off = 0
+			dstOut = dstOut[bufoff:]
+			// There must at least be 3 buffers left.
+			if len(dstOut) < dstEvery*3+3 {
+				return nil, errors.New("corruption detected: stream overrun")
+			}
+		}
+	}
+	if off > 0 {
+		ioff := int(off)
+		if len(dstOut) < dstEvery*3+ioff {
+			return nil, errors.New("corruption detected: stream overrun")
+		}
+		copy(dstOut, tmp[:off])
+		copy(dstOut[dstEvery:dstEvery+ioff], tmp[bufoff:bufoff*2])
+		copy(dstOut[dstEvery*2:dstEvery*2+ioff], tmp[bufoff*2:bufoff*3])
+		copy(dstOut[dstEvery*3:dstEvery*3+ioff], tmp[bufoff*3:bufoff*4])
+		dstOut = dstOut[off:]
+	}
+
+	for i := range br {
+		offset := dstEvery * i
+		br := &br[i]
+		for !br.finished() {
+			br.fill()
+			if offset >= len(dstOut) {
+				return nil, errors.New("corruption detected: stream overrun")
+			}
+			dstOut[offset] = decode(br)
+			offset++
+		}
+		err = br.close()
+		if err != nil {
+			return nil, err
+		}
+	}
+
+	return s.Out, nil
+}
+
+// matches will compare a decoding table to a coding table.
+// Errors are written to the writer.
+// Nothing will be written if table is ok.
+func (s *Scratch) matches(ct cTable, w io.Writer) {
+	if s == nil || len(s.dt.single) == 0 {
+		return
+	}
+	dt := s.dt.single[:1<<s.actualTableLog]
+	tablelog := s.actualTableLog
+	ok := 0
+	broken := 0
+	for sym, enc := range ct {
+		errs := 0
+		broken++
+		if enc.nBits == 0 {
+			for _, dec := range dt {
+				if dec.byte == byte(sym) {
+					fmt.Fprintf(w, "symbol %x has decoder, but no encoder\n", sym)
+					errs++
+					break
+				}
+			}
+			if errs == 0 {
+				broken--
+			}
+			continue
+		}
+		// Unused bits in input
+		ub := tablelog - enc.nBits
+		top := enc.val << ub
+		// decoder looks at top bits.
+		dec := dt[top]
+		if dec.nBits != enc.nBits {
+			fmt.Fprintf(w, "symbol 0x%x bit size mismatch (enc: %d, dec:%d).\n", sym, enc.nBits, dec.nBits)
+			errs++
+		}
+		if dec.byte != uint8(sym) {
+			fmt.Fprintf(w, "symbol 0x%x decoder output mismatch (enc: %d, dec:%d).\n", sym, sym, dec.byte)
+			errs++
+		}
+		if errs > 0 {
+			fmt.Fprintf(w, "%d errros in base, stopping\n", errs)
+			continue
+		}
+		// Ensure that all combinations are covered.
+		for i := uint16(0); i < (1 << ub); i++ {
+			vval := top | i
+			dec := dt[vval]
+			if dec.nBits != enc.nBits {
+				fmt.Fprintf(w, "symbol 0x%x bit size mismatch (enc: %d, dec:%d).\n", vval, enc.nBits, dec.nBits)
+				errs++
+			}
+			if dec.byte != uint8(sym) {
+				fmt.Fprintf(w, "symbol 0x%x decoder output mismatch (enc: %d, dec:%d).\n", vval, sym, dec.byte)
+				errs++
+			}
+			if errs > 20 {
+				fmt.Fprintf(w, "%d errros, stopping\n", errs)
+				break
+			}
+		}
+		if errs == 0 {
+			ok++
+			broken--
+		}
+	}
+	if broken > 0 {
+		fmt.Fprintf(w, "%d broken, %d ok\n", broken, ok)
+	}
+}
--- a/vendor/github.com/klauspost/compress/huff0/huff0.go
+++ b/vendor/github.com/klauspost/compress/huff0/huff0.go
@@ -0,0 +1,241 @@
+// Package huff0 provides fast huffman encoding as used in zstd.
+//
+// See README.md at https://github.com/klauspost/compress/tree/master/huff0 for details.
+package huff0
+
+import (
+	"errors"
+	"fmt"
+	"math"
+	"math/bits"
+
+	"github.com/klauspost/compress/fse"
+)
+
+const (
+	maxSymbolValue = 255
+
+	// zstandard limits tablelog to 11, see:
+	// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#huffman-tree-description
+	tableLogMax     = 11
+	tableLogDefault = 11
+	minTablelog     = 5
+	huffNodesLen    = 512
+
+	// BlockSizeMax is maximum input size for a single block uncompressed.
+	BlockSizeMax = 1<<18 - 1
+)
+
+var (
+	// ErrIncompressible is returned when input is judged to be too hard to compress.
+	ErrIncompressible = errors.New("input is not compressible")
+
+	// ErrUseRLE is returned from the compressor when the input is a single byte value repeated.
+	ErrUseRLE = errors.New("input is single value repeated")
+
+	// ErrTooBig is return if input is too large for a single block.
+	ErrTooBig = errors.New("input too big")
+)
+
+type ReusePolicy uint8
+
+const (
+	// ReusePolicyAllow will allow reuse if it produces smaller output.
+	ReusePolicyAllow ReusePolicy = iota
+
+	// ReusePolicyPrefer will re-use aggressively if possible.
+	// This will not check if a new table will produce smaller output,
+	// except if the current table is impossible to use or
+	// compressed output is bigger than input.
+	ReusePolicyPrefer
+
+	// ReusePolicyNone will disable re-use of tables.
+	// This is slightly faster than ReusePolicyAllow but may produce larger output.
+	ReusePolicyNone
+)
+
+type Scratch struct {
+	count [maxSymbolValue + 1]uint32
+
+	// Per block parameters.
+	// These can be used to override compression parameters of the block.
+	// Do not touch, unless you know what you are doing.
+
+	// Out is output buffer.
+	// If the scratch is re-used before the caller is done processing the output,
+	// set this field to nil.
+	// Otherwise the output buffer will be re-used for next Compression/Decompression step
+	// and allocation will be avoided.
+	Out []byte
+
+	// OutTable will contain the table data only, if a new table has been generated.
+	// Slice of the returned data.
+	OutTable []byte
+
+	// OutData will contain the compressed data.
+	// Slice of the returned data.
+	OutData []byte
+
+	// MaxSymbolValue will override the maximum symbol value of the next block.
+	MaxSymbolValue uint8
+
+	// TableLog will attempt to override the tablelog for the next block.
+	// Must be <= 11.
+	TableLog uint8
+
+	// Reuse will specify the reuse policy
+	Reuse ReusePolicy
+
+	br             byteReader
+	symbolLen      uint16 // Length of active part of the symbol table.
+	maxCount       int    // count of the most probable symbol
+	clearCount     bool   // clear count
+	actualTableLog uint8  // Selected tablelog.
+	prevTable      cTable // Table used for previous compression.
+	cTable         cTable // compression table
+	dt             dTable // decompression table
+	nodes          []nodeElt
+	tmpOut         [4][]byte
+	fse            *fse.Scratch
+	huffWeight     [maxSymbolValue + 1]byte
+}
+
+func (s *Scratch) prepare(in []byte) (*Scratch, error) {
+	if len(in) > BlockSizeMax {
+		return nil, ErrTooBig
+	}
+	if s == nil {
+		s = &Scratch{}
+	}
+	if s.MaxSymbolValue == 0 {
+		s.MaxSymbolValue = maxSymbolValue
+	}
+	if s.TableLog == 0 {
+		s.TableLog = tableLogDefault
+	}
+	if s.TableLog > tableLogMax {
+		return nil, fmt.Errorf("tableLog (%d) > maxTableLog (%d)", s.TableLog, tableLogMax)
+	}
+	if s.clearCount && s.maxCount == 0 {
+		for i := range s.count {
+			s.count[i] = 0
+		}
+		s.clearCount = false
+	}
+	if cap(s.Out) == 0 {
+		s.Out = make([]byte, 0, len(in))
+	}
+	s.Out = s.Out[:0]
+
+	s.OutTable = nil
+	s.OutData = nil
+	if cap(s.nodes) < huffNodesLen+1 {
+		s.nodes = make([]nodeElt, 0, huffNodesLen+1)
+	}
+	s.nodes = s.nodes[:0]
+	if s.fse == nil {
+		s.fse = &fse.Scratch{}
+	}
+	s.br.init(in)
+
+	return s, nil
+}
+
+type cTable []cTableEntry
+
+func (c cTable) write(s *Scratch) error {
+	var (
+		// precomputed conversion table
+		bitsToWeight [tableLogMax + 1]byte
+		huffLog      = s.actualTableLog
+		// last weight is not saved.
+		maxSymbolValue = uint8(s.symbolLen - 1)
+		huffWeight     = s.huffWeight[:256]
+	)
+	const (
+		maxFSETableLog = 6
+	)
+	// convert to weight
+	bitsToWeight[0] = 0
+	for n := uint8(1); n < huffLog+1; n++ {
+		bitsToWeight[n] = huffLog + 1 - n
+	}
+
+	// Acquire histogram for FSE.
+	hist := s.fse.Histogram()
+	hist = hist[:256]
+	for i := range hist[:16] {
+		hist[i] = 0
+	}
+	for n := uint8(0); n < maxSymbolValue; n++ {
+		v := bitsToWeight[c[n].nBits] & 15
+		huffWeight[n] = v
+		hist[v]++
+	}
+
+	// FSE compress if feasible.
+	if maxSymbolValue >= 2 {
+		huffMaxCnt := uint32(0)
+		huffMax := uint8(0)
+		for i, v := range hist[:16] {
+			if v == 0 {
+				continue
+			}
+			huffMax = byte(i)
+			if v > huffMaxCnt {
+				huffMaxCnt = v
+			}
+		}
+		s.fse.HistogramFinished(huffMax, int(huffMaxCnt))
+		s.fse.TableLog = maxFSETableLog
+		b, err := fse.Compress(huffWeight[:maxSymbolValue], s.fse)
+		if err == nil && len(b) < int(s.symbolLen>>1) {
+			s.Out = append(s.Out, uint8(len(b)))
+			s.Out = append(s.Out, b...)
+			return nil
+		}
+		// Unable to compress (RLE/uncompressible)
+	}
+	// write raw values as 4-bits (max : 15)
+	if maxSymbolValue > (256 - 128) {
+		// should not happen : likely means source cannot be compressed
+		return ErrIncompressible
+	}
+	op := s.Out
+	// special case, pack weights 4 bits/weight.
+	op = append(op, 128|(maxSymbolValue-1))
+	// be sure it doesn't cause msan issue in final combination
+	huffWeight[maxSymbolValue] = 0
+	for n := uint16(0); n < uint16(maxSymbolValue); n += 2 {
+		op = append(op, (huffWeight[n]<<4)|huffWeight[n+1])
+	}
+	s.Out = op
+	return nil
+}
+
+// estimateSize returns the estimated size in bytes of the input represented in the
+// histogram supplied.
+func (c cTable) estimateSize(hist []uint32) int {
+	nbBits := uint32(7)
+	for i, v := range c[:len(hist)] {
+		nbBits += uint32(v.nBits) * hist[i]
+	}
+	return int(nbBits >> 3)
+}
+
+// minSize returns the minimum possible size considering the shannon limit.
+func (s *Scratch) minSize(total int) int {
+	nbBits := float64(7)
+	fTotal := float64(total)
+	for _, v := range s.count[:s.symbolLen] {
+		n := float64(v)
+		if n > 0 {
+			nbBits += math.Log2(fTotal/n) * n
+		}
+	}
+	return int(nbBits) >> 3
+}
+
+func highBit32(val uint32) (n uint32) {
+	return uint32(bits.Len32(val) - 1)
+}
--- a/vendor/github.com/klauspost/compress/snappy/.gitignore
+++ b/vendor/github.com/klauspost/compress/snappy/.gitignore
@@ -0,0 +1,16 @@
+cmd/snappytool/snappytool
+testdata/bench
+
+# These explicitly listed benchmark data files are for an obsolete version of
+# snappy_test.go.
+testdata/alice29.txt
+testdata/asyoulik.txt
+testdata/fireworks.jpeg
+testdata/geo.protodata
+testdata/html
+testdata/html_x_4
+testdata/kppkn.gtb
+testdata/lcet10.txt
+testdata/paper-100k.pdf
+testdata/plrabn12.txt
+testdata/urls.10K
--- a/vendor/github.com/klauspost/compress/snappy/AUTHORS
+++ b/vendor/github.com/klauspost/compress/snappy/AUTHORS
@@ -0,0 +1,15 @@
+# This is the official list of Snappy-Go authors for copyright purposes.
+# This file is distinct from the CONTRIBUTORS files.
+# See the latter for an explanation.
+
+# Names should be added to this file as
+#	Name or Organization <email address>
+# The email address is not required for organizations.
+
+# Please keep the list sorted.
+
+Damian Gryski <dgryski@gmail.com>
+Google Inc.
+Jan Mercl <0xjnml@gmail.com>
+Rodolfo Carvalho <rhcarvalho@gmail.com>
+Sebastien Binet <seb.binet@gmail.com>
--- a/vendor/github.com/klauspost/compress/snappy/CONTRIBUTORS
+++ b/vendor/github.com/klauspost/compress/snappy/CONTRIBUTORS
@@ -0,0 +1,37 @@
+# This is the official list of people who can contribute
+# (and typically have contributed) code to the Snappy-Go repository.
+# The AUTHORS file lists the copyright holders; this file
+# lists people.  For example, Google employees are listed here
+# but not in AUTHORS, because Google holds the copyright.
+#
+# The submission process automatically checks to make sure
+# that people submitting code are listed in this file (by email address).
+#
+# Names should be added to this file only after verifying that
+# the individual or the individual's organization has agreed to
+# the appropriate Contributor License Agreement, found here:
+#
+#     http://code.google.com/legal/individual-cla-v1.0.html
+#     http://code.google.com/legal/corporate-cla-v1.0.html
+#
+# The agreement for individuals can be filled out on the web.
+#
+# When adding J Random Contributor's name to this file,
+# either J's name or J's organization's name should be
+# added to the AUTHORS file, depending on whether the
+# individual or corporate CLA was used.
+
+# Names should be added to this file like so:
+#     Name <email address>
+
+# Please keep the list sorted.
+
+Damian Gryski <dgryski@gmail.com>
+Jan Mercl <0xjnml@gmail.com>
+Kai Backman <kaib@golang.org>
+Marc-Antoine Ruel <maruel@chromium.org>
+Nigel Tao <nigeltao@golang.org>
+Rob Pike <r@golang.org>
+Rodolfo Carvalho <rhcarvalho@gmail.com>
+Russ Cox <rsc@golang.org>
+Sebastien Binet <seb.binet@gmail.com>
--- a/vendor/github.com/klauspost/compress/snappy/LICENSE
+++ b/vendor/github.com/klauspost/compress/snappy/LICENSE
@@ -0,0 +1,27 @@
+Copyright (c) 2011 The Snappy-Go Authors. All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+   * Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+   * Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following disclaimer
+in the documentation and/or other materials provided with the
+distribution.
+   * Neither the name of Google Inc. nor the names of its
+contributors may be used to endorse or promote products derived from
+this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--- a/vendor/github.com/klauspost/compress/snappy/README
+++ b/vendor/github.com/klauspost/compress/snappy/README
@@ -0,0 +1,107 @@
+The Snappy compression format in the Go programming language.
+
+To download and install from source:
+$ go get github.com/golang/snappy
+
+Unless otherwise noted, the Snappy-Go source files are distributed
+under the BSD-style license found in the LICENSE file.
+
+
+
+Benchmarks.
+
+The golang/snappy benchmarks include compressing (Z) and decompressing (U) ten
+or so files, the same set used by the C++ Snappy code (github.com/google/snappy
+and note the "google", not "golang"). On an "Intel(R) Core(TM) i7-3770 CPU @
+3.40GHz", Go's GOARCH=amd64 numbers as of 2016-05-29:
+
+"go test -test.bench=."
+
+_UFlat0-8         2.19GB/s ± 0%  html
+_UFlat1-8         1.41GB/s ± 0%  urls
+_UFlat2-8         23.5GB/s ± 2%  jpg
+_UFlat3-8         1.91GB/s ± 0%  jpg_200
+_UFlat4-8         14.0GB/s ± 1%  pdf
+_UFlat5-8         1.97GB/s ± 0%  html4
+_UFlat6-8          814MB/s ± 0%  txt1
+_UFlat7-8          785MB/s ± 0%  txt2
+_UFlat8-8          857MB/s ± 0%  txt3
+_UFlat9-8          719MB/s ± 1%  txt4
+_UFlat10-8        2.84GB/s ± 0%  pb
+_UFlat11-8        1.05GB/s ± 0%  gaviota
+
+_ZFlat0-8         1.04GB/s ± 0%  html
+_ZFlat1-8          534MB/s ± 0%  urls
+_ZFlat2-8         15.7GB/s ± 1%  jpg
+_ZFlat3-8          740MB/s ± 3%  jpg_200
+_ZFlat4-8         9.20GB/s ± 1%  pdf
+_ZFlat5-8          991MB/s ± 0%  html4
+_ZFlat6-8          379MB/s ± 0%  txt1
+_ZFlat7-8          352MB/s ± 0%  txt2
+_ZFlat8-8          396MB/s ± 1%  txt3
+_ZFlat9-8          327MB/s ± 1%  txt4
+_ZFlat10-8        1.33GB/s ± 1%  pb
+_ZFlat11-8         605MB/s ± 1%  gaviota
+
+
+
+"go test -test.bench=. -tags=noasm"
+
+_UFlat0-8          621MB/s ± 2%  html
+_UFlat1-8          494MB/s ± 1%  urls
+_UFlat2-8         23.2GB/s ± 1%  jpg
+_UFlat3-8         1.12GB/s ± 1%  jpg_200
+_UFlat4-8         4.35GB/s ± 1%  pdf
+_UFlat5-8          609MB/s ± 0%  html4
+_UFlat6-8          296MB/s ± 0%  txt1
+_UFlat7-8          288MB/s ± 0%  txt2
+_UFlat8-8          309MB/s ± 1%  txt3
+_UFlat9-8          280MB/s ± 1%  txt4
+_UFlat10-8         753MB/s ± 0%  pb
+_UFlat11-8         400MB/s ± 0%  gaviota
+
+_ZFlat0-8          409MB/s ± 1%  html
+_ZFlat1-8          250MB/s ± 1%  urls
+_ZFlat2-8         12.3GB/s ± 1%  jpg
+_ZFlat3-8          132MB/s ± 0%  jpg_200
+_ZFlat4-8         2.92GB/s ± 0%  pdf
+_ZFlat5-8          405MB/s ± 1%  html4
+_ZFlat6-8          179MB/s ± 1%  txt1
+_ZFlat7-8          170MB/s ± 1%  txt2
+_ZFlat8-8          189MB/s ± 1%  txt3
+_ZFlat9-8          164MB/s ± 1%  txt4
+_ZFlat10-8         479MB/s ± 1%  pb
+_ZFlat11-8         270MB/s ± 1%  gaviota
+
+
+
+For comparison (Go's encoded output is byte-for-byte identical to C++'s), here
+are the numbers from C++ Snappy's
+
+make CXXFLAGS="-O2 -DNDEBUG -g" clean snappy_unittest.log && cat snappy_unittest.log
+
+BM_UFlat/0     2.4GB/s  html
+BM_UFlat/1     1.4GB/s  urls
+BM_UFlat/2    21.8GB/s  jpg
+BM_UFlat/3     1.5GB/s  jpg_200
+BM_UFlat/4    13.3GB/s  pdf
+BM_UFlat/5     2.1GB/s  html4
+BM_UFlat/6     1.0GB/s  txt1
+BM_UFlat/7   959.4MB/s  txt2
+BM_UFlat/8     1.0GB/s  txt3
+BM_UFlat/9   864.5MB/s  txt4
+BM_UFlat/10    2.9GB/s  pb
+BM_UFlat/11    1.2GB/s  gaviota
+
+BM_ZFlat/0   944.3MB/s  html (22.31 %)
+BM_ZFlat/1   501.6MB/s  urls (47.78 %)
+BM_ZFlat/2    14.3GB/s  jpg (99.95 %)
+BM_ZFlat/3   538.3MB/s  jpg_200 (73.00 %)
+BM_ZFlat/4     8.3GB/s  pdf (83.30 %)
+BM_ZFlat/5   903.5MB/s  html4 (22.52 %)
+BM_ZFlat/6   336.0MB/s  txt1 (57.88 %)
+BM_ZFlat/7   312.3MB/s  txt2 (61.91 %)
+BM_ZFlat/8   353.1MB/s  txt3 (54.99 %)
+BM_ZFlat/9   289.9MB/s  txt4 (66.26 %)
+BM_ZFlat/10    1.2GB/s  pb (19.68 %)
+BM_ZFlat/11  527.4MB/s  gaviota (37.72 %)
--- a/vendor/github.com/klauspost/compress/snappy/decode.go
+++ b/vendor/github.com/klauspost/compress/snappy/decode.go
@@ -0,0 +1,237 @@
+// Copyright 2011 The Snappy-Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package snappy
+
+import (
+	"encoding/binary"
+	"errors"
+	"io"
+)
+
+var (
+	// ErrCorrupt reports that the input is invalid.
+	ErrCorrupt = errors.New("snappy: corrupt input")
+	// ErrTooLarge reports that the uncompressed length is too large.
+	ErrTooLarge = errors.New("snappy: decoded block is too large")
+	// ErrUnsupported reports that the input isn't supported.
+	ErrUnsupported = errors.New("snappy: unsupported input")
+
+	errUnsupportedLiteralLength = errors.New("snappy: unsupported literal length")
+)
+
+// DecodedLen returns the length of the decoded block.
+func DecodedLen(src []byte) (int, error) {
+	v, _, err := decodedLen(src)
+	return v, err
+}
+
+// decodedLen returns the length of the decoded block and the number of bytes
+// that the length header occupied.
+func decodedLen(src []byte) (blockLen, headerLen int, err error) {
+	v, n := binary.Uvarint(src)
+	if n <= 0 || v > 0xffffffff {
+		return 0, 0, ErrCorrupt
+	}
+
+	const wordSize = 32 << (^uint(0) >> 32 & 1)
+	if wordSize == 32 && v > 0x7fffffff {
+		return 0, 0, ErrTooLarge
+	}
+	return int(v), n, nil
+}
+
+const (
+	decodeErrCodeCorrupt                  = 1
+	decodeErrCodeUnsupportedLiteralLength = 2
+)
+
+// Decode returns the decoded form of src. The returned slice may be a sub-
+// slice of dst if dst was large enough to hold the entire decoded block.
+// Otherwise, a newly allocated slice will be returned.
+//
+// The dst and src must not overlap. It is valid to pass a nil dst.
+func Decode(dst, src []byte) ([]byte, error) {
+	dLen, s, err := decodedLen(src)
+	if err != nil {
+		return nil, err
+	}
+	if dLen <= len(dst) {
+		dst = dst[:dLen]
+	} else {
+		dst = make([]byte, dLen)
+	}
+	switch decode(dst, src[s:]) {
+	case 0:
+		return dst, nil
+	case decodeErrCodeUnsupportedLiteralLength:
+		return nil, errUnsupportedLiteralLength
+	}
+	return nil, ErrCorrupt
+}
+
+// NewReader returns a new Reader that decompresses from r, using the framing
+// format described at
+// https://github.com/google/snappy/blob/master/framing_format.txt
+func NewReader(r io.Reader) *Reader {
+	return &Reader{
+		r:       r,
+		decoded: make([]byte, maxBlockSize),
+		buf:     make([]byte, maxEncodedLenOfMaxBlockSize+checksumSize),
+	}
+}
+
+// Reader is an io.Reader that can read Snappy-compressed bytes.
+type Reader struct {
+	r       io.Reader
+	err     error
+	decoded []byte
+	buf     []byte
+	// decoded[i:j] contains decoded bytes that have not yet been passed on.
+	i, j       int
+	readHeader bool
+}
+
+// Reset discards any buffered data, resets all state, and switches the Snappy
+// reader to read from r. This permits reusing a Reader rather than allocating
+// a new one.
+func (r *Reader) Reset(reader io.Reader) {
+	r.r = reader
+	r.err = nil
+	r.i = 0
+	r.j = 0
+	r.readHeader = false
+}
+
+func (r *Reader) readFull(p []byte, allowEOF bool) (ok bool) {
+	if _, r.err = io.ReadFull(r.r, p); r.err != nil {
+		if r.err == io.ErrUnexpectedEOF || (r.err == io.EOF && !allowEOF) {
+			r.err = ErrCorrupt
+		}
+		return false
+	}
+	return true
+}
+
+// Read satisfies the io.Reader interface.
+func (r *Reader) Read(p []byte) (int, error) {
+	if r.err != nil {
+		return 0, r.err
+	}
+	for {
+		if r.i < r.j {
+			n := copy(p, r.decoded[r.i:r.j])
+			r.i += n
+			return n, nil
+		}
+		if !r.readFull(r.buf[:4], true) {
+			return 0, r.err
+		}
+		chunkType := r.buf[0]
+		if !r.readHeader {
+			if chunkType != chunkTypeStreamIdentifier {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			r.readHeader = true
+		}
+		chunkLen := int(r.buf[1]) | int(r.buf[2])<<8 | int(r.buf[3])<<16
+		if chunkLen > len(r.buf) {
+			r.err = ErrUnsupported
+			return 0, r.err
+		}
+
+		// The chunk types are specified at
+		// https://github.com/google/snappy/blob/master/framing_format.txt
+		switch chunkType {
+		case chunkTypeCompressedData:
+			// Section 4.2. Compressed data (chunk type 0x00).
+			if chunkLen < checksumSize {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			buf := r.buf[:chunkLen]
+			if !r.readFull(buf, false) {
+				return 0, r.err
+			}
+			checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24
+			buf = buf[checksumSize:]
+
+			n, err := DecodedLen(buf)
+			if err != nil {
+				r.err = err
+				return 0, r.err
+			}
+			if n > len(r.decoded) {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			if _, err := Decode(r.decoded, buf); err != nil {
+				r.err = err
+				return 0, r.err
+			}
+			if crc(r.decoded[:n]) != checksum {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			r.i, r.j = 0, n
+			continue
+
+		case chunkTypeUncompressedData:
+			// Section 4.3. Uncompressed data (chunk type 0x01).
+			if chunkLen < checksumSize {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			buf := r.buf[:checksumSize]
+			if !r.readFull(buf, false) {
+				return 0, r.err
+			}
+			checksum := uint32(buf[0]) | uint32(buf[1])<<8 | uint32(buf[2])<<16 | uint32(buf[3])<<24
+			// Read directly into r.decoded instead of via r.buf.
+			n := chunkLen - checksumSize
+			if n > len(r.decoded) {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			if !r.readFull(r.decoded[:n], false) {
+				return 0, r.err
+			}
+			if crc(r.decoded[:n]) != checksum {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			r.i, r.j = 0, n
+			continue
+
+		case chunkTypeStreamIdentifier:
+			// Section 4.1. Stream identifier (chunk type 0xff).
+			if chunkLen != len(magicBody) {
+				r.err = ErrCorrupt
+				return 0, r.err
+			}
+			if !r.readFull(r.buf[:len(magicBody)], false) {
+				return 0, r.err
+			}
+			for i := 0; i < len(magicBody); i++ {
+				if r.buf[i] != magicBody[i] {
+					r.err = ErrCorrupt
+					return 0, r.err
+				}
+			}
+			continue
+		}
+
+		if chunkType <= 0x7f {
+			// Section 4.5. Reserved unskippable chunks (chunk types 0x02-0x7f).
+			r.err = ErrUnsupported
+			return 0, r.err
+		}
+		// Section 4.4 Padding (chunk type 0xfe).
+		// Section 4.6. Reserved skippable chunks (chunk types 0x80-0xfd).
+		if !r.readFull(r.buf[:chunkLen], false) {
+			return 0, r.err
+		}
+	}
+}
--- a/vendor/github.com/klauspost/compress/snappy/decode_amd64.go
+++ b/vendor/github.com/klauspost/compress/snappy/decode_amd64.go
@@ -0,0 +1,14 @@
+// Copyright 2016 The Snappy-Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !appengine
+// +build gc
+// +build !noasm
+
+package snappy
+
+// decode has the same semantics as in decode_other.go.
+//
+//go:noescape
+func decode(dst, src []byte) int
--- a/vendor/github.com/klauspost/compress/snappy/decode_amd64.s
+++ b/vendor/github.com/klauspost/compress/snappy/decode_amd64.s
@@ -0,0 +1,490 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !appengine
+// +build gc
+// +build !noasm
+
+#include "textflag.h"
+
+// The asm code generally follows the pure Go code in decode_other.go, except
+// where marked with a "!!!".
+
+// func decode(dst, src []byte) int
+//
+// All local variables fit into registers. The non-zero stack size is only to
+// spill registers and push args when issuing a CALL. The register allocation:
+//	- AX	scratch
+//	- BX	scratch
+//	- CX	length or x
+//	- DX	offset
+//	- SI	&src[s]
+//	- DI	&dst[d]
+//	+ R8	dst_base
+//	+ R9	dst_len
+//	+ R10	dst_base + dst_len
+//	+ R11	src_base
+//	+ R12	src_len
+//	+ R13	src_base + src_len
+//	- R14	used by doCopy
+//	- R15	used by doCopy
+//
+// The registers R8-R13 (marked with a "+") are set at the start of the
+// function, and after a CALL returns, and are not otherwise modified.
+//
+// The d variable is implicitly DI - R8,  and len(dst)-d is R10 - DI.
+// The s variable is implicitly SI - R11, and len(src)-s is R13 - SI.
+TEXT ·decode(SB), NOSPLIT, $48-56
+	// Initialize SI, DI and R8-R13.
+	MOVQ dst_base+0(FP), R8
+	MOVQ dst_len+8(FP), R9
+	MOVQ R8, DI
+	MOVQ R8, R10
+	ADDQ R9, R10
+	MOVQ src_base+24(FP), R11
+	MOVQ src_len+32(FP), R12
+	MOVQ R11, SI
+	MOVQ R11, R13
+	ADDQ R12, R13
+
+loop:
+	// for s < len(src)
+	CMPQ SI, R13
+	JEQ  end
+
+	// CX = uint32(src[s])
+	//
+	// switch src[s] & 0x03
+	MOVBLZX (SI), CX
+	MOVL    CX, BX
+	ANDL    $3, BX
+	CMPL    BX, $1
+	JAE     tagCopy
+
+	// ----------------------------------------
+	// The code below handles literal tags.
+
+	// case tagLiteral:
+	// x := uint32(src[s] >> 2)
+	// switch
+	SHRL $2, CX
+	CMPL CX, $60
+	JAE  tagLit60Plus
+
+	// case x < 60:
+	// s++
+	INCQ SI
+
+doLit:
+	// This is the end of the inner "switch", when we have a literal tag.
+	//
+	// We assume that CX == x and x fits in a uint32, where x is the variable
+	// used in the pure Go decode_other.go code.
+
+	// length = int(x) + 1
+	//
+	// Unlike the pure Go code, we don't need to check if length <= 0 because
+	// CX can hold 64 bits, so the increment cannot overflow.
+	INCQ CX
+
+	// Prepare to check if copying length bytes will run past the end of dst or
+	// src.
+	//
+	// AX = len(dst) - d
+	// BX = len(src) - s
+	MOVQ R10, AX
+	SUBQ DI, AX
+	MOVQ R13, BX
+	SUBQ SI, BX
+
+	// !!! Try a faster technique for short (16 or fewer bytes) copies.
+	//
+	// if length > 16 || len(dst)-d < 16 || len(src)-s < 16 {
+	//   goto callMemmove // Fall back on calling runtime·memmove.
+	// }
+	//
+	// The C++ snappy code calls this TryFastAppend. It also checks len(src)-s
+	// against 21 instead of 16, because it cannot assume that all of its input
+	// is contiguous in memory and so it needs to leave enough source bytes to
+	// read the next tag without refilling buffers, but Go's Decode assumes
+	// contiguousness (the src argument is a []byte).
+	CMPQ CX, $16
+	JGT  callMemmove
+	CMPQ AX, $16
+	JLT  callMemmove
+	CMPQ BX, $16
+	JLT  callMemmove
+
+	// !!! Implement the copy from src to dst as a 16-byte load and store.
+	// (Decode's documentation says that dst and src must not overlap.)
+	//
+	// This always copies 16 bytes, instead of only length bytes, but that's
+	// OK. If the input is a valid Snappy encoding then subsequent iterations
+	// will fix up the overrun. Otherwise, Decode returns a nil []byte (and a
+	// non-nil error), so the overrun will be ignored.
+	//
+	// Note that on amd64, it is legal and cheap to issue unaligned 8-byte or
+	// 16-byte loads and stores. This technique probably wouldn't be as
+	// effective on architectures that are fussier about alignment.
+	MOVOU 0(SI), X0
+	MOVOU X0, 0(DI)
+
+	// d += length
+	// s += length
+	ADDQ CX, DI
+	ADDQ CX, SI
+	JMP  loop
+
+callMemmove:
+	// if length > len(dst)-d || length > len(src)-s { etc }
+	CMPQ CX, AX
+	JGT  errCorrupt
+	CMPQ CX, BX
+	JGT  errCorrupt
+
+	// copy(dst[d:], src[s:s+length])
+	//
+	// This means calling runtime·memmove(&dst[d], &src[s], length), so we push
+	// DI, SI and CX as arguments. Coincidentally, we also need to spill those
+	// three registers to the stack, to save local variables across the CALL.
+	MOVQ DI, 0(SP)
+	MOVQ SI, 8(SP)
+	MOVQ CX, 16(SP)
+	MOVQ DI, 24(SP)
+	MOVQ SI, 32(SP)
+	MOVQ CX, 40(SP)
+	CALL runtime·memmove(SB)
+
+	// Restore local variables: unspill registers from the stack and
+	// re-calculate R8-R13.
+	MOVQ 24(SP), DI
+	MOVQ 32(SP), SI
+	MOVQ 40(SP), CX
+	MOVQ dst_base+0(FP), R8
+	MOVQ dst_len+8(FP), R9
+	MOVQ R8, R10
+	ADDQ R9, R10
+	MOVQ src_base+24(FP), R11
+	MOVQ src_len+32(FP), R12
+	MOVQ R11, R13
+	ADDQ R12, R13
+
+	// d += length
+	// s += length
+	ADDQ CX, DI
+	ADDQ CX, SI
+	JMP  loop
+
+tagLit60Plus:
+	// !!! This fragment does the
+	//
+	// s += x - 58; if uint(s) > uint(len(src)) { etc }
+	//
+	// checks. In the asm version, we code it once instead of once per switch case.
+	ADDQ CX, SI
+	SUBQ $58, SI
+	MOVQ SI, BX
+	SUBQ R11, BX
+	CMPQ BX, R12
+	JA   errCorrupt
+
+	// case x == 60:
+	CMPL CX, $61
+	JEQ  tagLit61
+	JA   tagLit62Plus
+
+	// x = uint32(src[s-1])
+	MOVBLZX -1(SI), CX
+	JMP     doLit
+
+tagLit61:
+	// case x == 61:
+	// x = uint32(src[s-2]) | uint32(src[s-1])<<8
+	MOVWLZX -2(SI), CX
+	JMP     doLit
+
+tagLit62Plus:
+	CMPL CX, $62
+	JA   tagLit63
+
+	// case x == 62:
+	// x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16
+	MOVWLZX -3(SI), CX
+	MOVBLZX -1(SI), BX
+	SHLL    $16, BX
+	ORL     BX, CX
+	JMP     doLit
+
+tagLit63:
+	// case x == 63:
+	// x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24
+	MOVL -4(SI), CX
+	JMP  doLit
+
+// The code above handles literal tags.
+// ----------------------------------------
+// The code below handles copy tags.
+
+tagCopy4:
+	// case tagCopy4:
+	// s += 5
+	ADDQ $5, SI
+
+	// if uint(s) > uint(len(src)) { etc }
+	MOVQ SI, BX
+	SUBQ R11, BX
+	CMPQ BX, R12
+	JA   errCorrupt
+
+	// length = 1 + int(src[s-5])>>2
+	SHRQ $2, CX
+	INCQ CX
+
+	// offset = int(uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24)
+	MOVLQZX -4(SI), DX
+	JMP     doCopy
+
+tagCopy2:
+	// case tagCopy2:
+	// s += 3
+	ADDQ $3, SI
+
+	// if uint(s) > uint(len(src)) { etc }
+	MOVQ SI, BX
+	SUBQ R11, BX
+	CMPQ BX, R12
+	JA   errCorrupt
+
+	// length = 1 + int(src[s-3])>>2
+	SHRQ $2, CX
+	INCQ CX
+
+	// offset = int(uint32(src[s-2]) | uint32(src[s-1])<<8)
+	MOVWQZX -2(SI), DX
+	JMP     doCopy
+
+tagCopy:
+	// We have a copy tag. We assume that:
+	//	- BX == src[s] & 0x03
+	//	- CX == src[s]
+	CMPQ BX, $2
+	JEQ  tagCopy2
+	JA   tagCopy4
+
+	// case tagCopy1:
+	// s += 2
+	ADDQ $2, SI
+
+	// if uint(s) > uint(len(src)) { etc }
+	MOVQ SI, BX
+	SUBQ R11, BX
+	CMPQ BX, R12
+	JA   errCorrupt
+
+	// offset = int(uint32(src[s-2])&0xe0<<3 | uint32(src[s-1]))
+	MOVQ    CX, DX
+	ANDQ    $0xe0, DX
+	SHLQ    $3, DX
+	MOVBQZX -1(SI), BX
+	ORQ     BX, DX
+
+	// length = 4 + int(src[s-2])>>2&0x7
+	SHRQ $2, CX
+	ANDQ $7, CX
+	ADDQ $4, CX
+
+doCopy:
+	// This is the end of the outer "switch", when we have a copy tag.
+	//
+	// We assume that:
+	//	- CX == length && CX > 0
+	//	- DX == offset
+
+	// if offset <= 0 { etc }
+	CMPQ DX, $0
+	JLE  errCorrupt
+
+	// if d < offset { etc }
+	MOVQ DI, BX
+	SUBQ R8, BX
+	CMPQ BX, DX
+	JLT  errCorrupt
+
+	// if length > len(dst)-d { etc }
+	MOVQ R10, BX
+	SUBQ DI, BX
+	CMPQ CX, BX
+	JGT  errCorrupt
+
+	// forwardCopy(dst[d:d+length], dst[d-offset:]); d += length
+	//
+	// Set:
+	//	- R14 = len(dst)-d
+	//	- R15 = &dst[d-offset]
+	MOVQ R10, R14
+	SUBQ DI, R14
+	MOVQ DI, R15
+	SUBQ DX, R15
+
+	// !!! Try a faster technique for short (16 or fewer bytes) forward copies.
+	//
+	// First, try using two 8-byte load/stores, similar to the doLit technique
+	// above. Even if dst[d:d+length] and dst[d-offset:] can overlap, this is
+	// still OK if offset >= 8. Note that this has to be two 8-byte load/stores
+	// and not one 16-byte load/store, and the first store has to be before the
+	// second load, due to the overlap if offset is in the range [8, 16).
+	//
+	// if length > 16 || offset < 8 || len(dst)-d < 16 {
+	//   goto slowForwardCopy
+	// }
+	// copy 16 bytes
+	// d += length
+	CMPQ CX, $16
+	JGT  slowForwardCopy
+	CMPQ DX, $8
+	JLT  slowForwardCopy
+	CMPQ R14, $16
+	JLT  slowForwardCopy
+	MOVQ 0(R15), AX
+	MOVQ AX, 0(DI)
+	MOVQ 8(R15), BX
+	MOVQ BX, 8(DI)
+	ADDQ CX, DI
+	JMP  loop
+
+slowForwardCopy:
+	// !!! If the forward copy is longer than 16 bytes, or if offset < 8, we
+	// can still try 8-byte load stores, provided we can overrun up to 10 extra
+	// bytes. As above, the overrun will be fixed up by subsequent iterations
+	// of the outermost loop.
+	//
+	// The C++ snappy code calls this technique IncrementalCopyFastPath. Its
+	// commentary says:
+	//
+	// ----
+	//
+	// The main part of this loop is a simple copy of eight bytes at a time
+	// until we've copied (at least) the requested amount of bytes.  However,
+	// if d and d-offset are less than eight bytes apart (indicating a
+	// repeating pattern of length < 8), we first need to expand the pattern in
+	// order to get the correct results. For instance, if the buffer looks like
+	// this, with the eight-byte <d-offset> and <d> patterns marked as
+	// intervals:
+	//
+	//    abxxxxxxxxxxxx
+	//    [------]           d-offset
+	//      [------]         d
+	//
+	// a single eight-byte copy from <d-offset> to <d> will repeat the pattern
+	// once, after which we can move <d> two bytes without moving <d-offset>:
+	//
+	//    ababxxxxxxxxxx
+	//    [------]           d-offset
+	//        [------]       d
+	//
+	// and repeat the exercise until the two no longer overlap.
+	//
+	// This allows us to do very well in the special case of one single byte
+	// repeated many times, without taking a big hit for more general cases.
+	//
+	// The worst case of extra writing past the end of the match occurs when
+	// offset == 1 and length == 1; the last copy will read from byte positions
+	// [0..7] and write to [4..11], whereas it was only supposed to write to
+	// position 1. Thus, ten excess bytes.
+	//
+	// ----
+	//
+	// That "10 byte overrun" worst case is confirmed by Go's
+	// TestSlowForwardCopyOverrun, which also tests the fixUpSlowForwardCopy
+	// and finishSlowForwardCopy algorithm.
+	//
+	// if length > len(dst)-d-10 {
+	//   goto verySlowForwardCopy
+	// }
+	SUBQ $10, R14
+	CMPQ CX, R14
+	JGT  verySlowForwardCopy
+
+makeOffsetAtLeast8:
+	// !!! As above, expand the pattern so that offset >= 8 and we can use
+	// 8-byte load/stores.
+	//
+	// for offset < 8 {
+	//   copy 8 bytes from dst[d-offset:] to dst[d:]
+	//   length -= offset
+	//   d      += offset
+	//   offset += offset
+	//   // The two previous lines together means that d-offset, and therefore
+	//   // R15, is unchanged.
+	// }
+	CMPQ DX, $8
+	JGE  fixUpSlowForwardCopy
+	MOVQ (R15), BX
+	MOVQ BX, (DI)
+	SUBQ DX, CX
+	ADDQ DX, DI
+	ADDQ DX, DX
+	JMP  makeOffsetAtLeast8
+
+fixUpSlowForwardCopy:
+	// !!! Add length (which might be negative now) to d (implied by DI being
+	// &dst[d]) so that d ends up at the right place when we jump back to the
+	// top of the loop. Before we do that, though, we save DI to AX so that, if
+	// length is positive, copying the remaining length bytes will write to the
+	// right place.
+	MOVQ DI, AX
+	ADDQ CX, DI
+
+finishSlowForwardCopy:
+	// !!! Repeat 8-byte load/stores until length <= 0. Ending with a negative
+	// length means that we overrun, but as above, that will be fixed up by
+	// subsequent iterations of the outermost loop.
+	CMPQ CX, $0
+	JLE  loop
+	MOVQ (R15), BX
+	MOVQ BX, (AX)
+	ADDQ $8, R15
+	ADDQ $8, AX
+	SUBQ $8, CX
+	JMP  finishSlowForwardCopy
+
+verySlowForwardCopy:
+	// verySlowForwardCopy is a simple implementation of forward copy. In C
+	// parlance, this is a do/while loop instead of a while loop, since we know
+	// that length > 0. In Go syntax:
+	//
+	// for {
+	//   dst[d] = dst[d - offset]
+	//   d++
+	//   length--
+	//   if length == 0 {
+	//     break
+	//   }
+	// }
+	MOVB (R15), BX
+	MOVB BX, (DI)
+	INCQ R15
+	INCQ DI
+	DECQ CX
+	JNZ  verySlowForwardCopy
+	JMP  loop
+
+// The code above handles copy tags.
+// ----------------------------------------
+
+end:
+	// This is the end of the "for s < len(src)".
+	//
+	// if d != len(dst) { etc }
+	CMPQ DI, R10
+	JNE  errCorrupt
+
+	// return 0
+	MOVQ $0, ret+48(FP)
+	RET
+
+errCorrupt:
+	// return decodeErrCodeCorrupt
+	MOVQ $1, ret+48(FP)
+	RET
--- a/vendor/github.com/klauspost/compress/snappy/decode_other.go
+++ b/vendor/github.com/klauspost/compress/snappy/decode_other.go
@@ -0,0 +1,101 @@
+// Copyright 2016 The Snappy-Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !amd64 appengine !gc noasm
+
+package snappy
+
+// decode writes the decoding of src to dst. It assumes that the varint-encoded
+// length of the decompressed bytes has already been read, and that len(dst)
+// equals that length.
+//
+// It returns 0 on success or a decodeErrCodeXxx error code on failure.
+func decode(dst, src []byte) int {
+	var d, s, offset, length int
+	for s < len(src) {
+		switch src[s] & 0x03 {
+		case tagLiteral:
+			x := uint32(src[s] >> 2)
+			switch {
+			case x < 60:
+				s++
+			case x == 60:
+				s += 2
+				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+					return decodeErrCodeCorrupt
+				}
+				x = uint32(src[s-1])
+			case x == 61:
+				s += 3
+				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+					return decodeErrCodeCorrupt
+				}
+				x = uint32(src[s-2]) | uint32(src[s-1])<<8
+			case x == 62:
+				s += 4
+				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+					return decodeErrCodeCorrupt
+				}
+				x = uint32(src[s-3]) | uint32(src[s-2])<<8 | uint32(src[s-1])<<16
+			case x == 63:
+				s += 5
+				if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+					return decodeErrCodeCorrupt
+				}
+				x = uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24
+			}
+			length = int(x) + 1
+			if length <= 0 {
+				return decodeErrCodeUnsupportedLiteralLength
+			}
+			if length > len(dst)-d || length > len(src)-s {
+				return decodeErrCodeCorrupt
+			}
+			copy(dst[d:], src[s:s+length])
+			d += length
+			s += length
+			continue
+
+		case tagCopy1:
+			s += 2
+			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+				return decodeErrCodeCorrupt
+			}
+			length = 4 + int(src[s-2])>>2&0x7
+			offset = int(uint32(src[s-2])&0xe0<<3 | uint32(src[s-1]))
+
+		case tagCopy2:
+			s += 3
+			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+				return decodeErrCodeCorrupt
+			}
+			length = 1 + int(src[s-3])>>2
+			offset = int(uint32(src[s-2]) | uint32(src[s-1])<<8)
+
+		case tagCopy4:
+			s += 5
+			if uint(s) > uint(len(src)) { // The uint conversions catch overflow from the previous line.
+				return decodeErrCodeCorrupt
+			}
+			length = 1 + int(src[s-5])>>2
+			offset = int(uint32(src[s-4]) | uint32(src[s-3])<<8 | uint32(src[s-2])<<16 | uint32(src[s-1])<<24)
+		}
+
+		if offset <= 0 || d < offset || length > len(dst)-d {
+			return decodeErrCodeCorrupt
+		}
+		// Copy from an earlier sub-slice of dst to a later sub-slice. Unlike
+		// the built-in copy function, this byte-by-byte copy always runs
+		// forwards, even if the slices overlap. Conceptually, this is:
+		//
+		// d += forwardCopy(dst[d:d+length], dst[d-offset:])
+		for end := d + length; d != end; d++ {
+			dst[d] = dst[d-offset]
+		}
+	}
+	if d != len(dst) {
+		return decodeErrCodeCorrupt
+	}
+	return 0
+}
--- a/vendor/github.com/klauspost/compress/snappy/encode.go
+++ b/vendor/github.com/klauspost/compress/snappy/encode.go
@@ -0,0 +1,285 @@
+// Copyright 2011 The Snappy-Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package snappy
+
+import (
+	"encoding/binary"
+	"errors"
+	"io"
+)
+
+// Encode returns the encoded form of src. The returned slice may be a sub-
+// slice of dst if dst was large enough to hold the entire encoded block.
+// Otherwise, a newly allocated slice will be returned.
+//
+// The dst and src must not overlap. It is valid to pass a nil dst.
+func Encode(dst, src []byte) []byte {
+	if n := MaxEncodedLen(len(src)); n < 0 {
+		panic(ErrTooLarge)
+	} else if len(dst) < n {
+		dst = make([]byte, n)
+	}
+
+	// The block starts with the varint-encoded length of the decompressed bytes.
+	d := binary.PutUvarint(dst, uint64(len(src)))
+
+	for len(src) > 0 {
+		p := src
+		src = nil
+		if len(p) > maxBlockSize {
+			p, src = p[:maxBlockSize], p[maxBlockSize:]
+		}
+		if len(p) < minNonLiteralBlockSize {
+			d += emitLiteral(dst[d:], p)
+		} else {
+			d += encodeBlock(dst[d:], p)
+		}
+	}
+	return dst[:d]
+}
+
+// inputMargin is the minimum number of extra input bytes to keep, inside
+// encodeBlock's inner loop. On some architectures, this margin lets us
+// implement a fast path for emitLiteral, where the copy of short (<= 16 byte)
+// literals can be implemented as a single load to and store from a 16-byte
+// register. That literal's actual length can be as short as 1 byte, so this
+// can copy up to 15 bytes too much, but that's OK as subsequent iterations of
+// the encoding loop will fix up the copy overrun, and this inputMargin ensures
+// that we don't overrun the dst and src buffers.
+const inputMargin = 16 - 1
+
+// minNonLiteralBlockSize is the minimum size of the input to encodeBlock that
+// could be encoded with a copy tag. This is the minimum with respect to the
+// algorithm used by encodeBlock, not a minimum enforced by the file format.
+//
+// The encoded output must start with at least a 1 byte literal, as there are
+// no previous bytes to copy. A minimal (1 byte) copy after that, generated
+// from an emitCopy call in encodeBlock's main loop, would require at least
+// another inputMargin bytes, for the reason above: we want any emitLiteral
+// calls inside encodeBlock's main loop to use the fast path if possible, which
+// requires being able to overrun by inputMargin bytes. Thus,
+// minNonLiteralBlockSize equals 1 + 1 + inputMargin.
+//
+// The C++ code doesn't use this exact threshold, but it could, as discussed at
+// https://groups.google.com/d/topic/snappy-compression/oGbhsdIJSJ8/discussion
+// The difference between Go (2+inputMargin) and C++ (inputMargin) is purely an
+// optimization. It should not affect the encoded form. This is tested by
+// TestSameEncodingAsCppShortCopies.
+const minNonLiteralBlockSize = 1 + 1 + inputMargin
+
+// MaxEncodedLen returns the maximum length of a snappy block, given its
+// uncompressed length.
+//
+// It will return a negative value if srcLen is too large to encode.
+func MaxEncodedLen(srcLen int) int {
+	n := uint64(srcLen)
+	if n > 0xffffffff {
+		return -1
+	}
+	// Compressed data can be defined as:
+	//    compressed := item* literal*
+	//    item       := literal* copy
+	//
+	// The trailing literal sequence has a space blowup of at most 62/60
+	// since a literal of length 60 needs one tag byte + one extra byte
+	// for length information.
+	//
+	// Item blowup is trickier to measure. Suppose the "copy" op copies
+	// 4 bytes of data. Because of a special check in the encoding code,
+	// we produce a 4-byte copy only if the offset is < 65536. Therefore
+	// the copy op takes 3 bytes to encode, and this type of item leads
+	// to at most the 62/60 blowup for representing literals.
+	//
+	// Suppose the "copy" op copies 5 bytes of data. If the offset is big
+	// enough, it will take 5 bytes to encode the copy op. Therefore the
+	// worst case here is a one-byte literal followed by a five-byte copy.
+	// That is, 6 bytes of input turn into 7 bytes of "compressed" data.
+	//
+	// This last factor dominates the blowup, so the final estimate is:
+	n = 32 + n + n/6
+	if n > 0xffffffff {
+		return -1
+	}
+	return int(n)
+}
+
+var errClosed = errors.New("snappy: Writer is closed")
+
+// NewWriter returns a new Writer that compresses to w.
+//
+// The Writer returned does not buffer writes. There is no need to Flush or
+// Close such a Writer.
+//
+// Deprecated: the Writer returned is not suitable for many small writes, only
+// for few large writes. Use NewBufferedWriter instead, which is efficient
+// regardless of the frequency and shape of the writes, and remember to Close
+// that Writer when done.
+func NewWriter(w io.Writer) *Writer {
+	return &Writer{
+		w:    w,
+		obuf: make([]byte, obufLen),
+	}
+}
+
+// NewBufferedWriter returns a new Writer that compresses to w, using the
+// framing format described at
+// https://github.com/google/snappy/blob/master/framing_format.txt
+//
+// The Writer returned buffers writes. Users must call Close to guarantee all
+// data has been forwarded to the underlying io.Writer. They may also call
+// Flush zero or more times before calling Close.
+func NewBufferedWriter(w io.Writer) *Writer {
+	return &Writer{
+		w:    w,
+		ibuf: make([]byte, 0, maxBlockSize),
+		obuf: make([]byte, obufLen),
+	}
+}
+
+// Writer is an io.Writer that can write Snappy-compressed bytes.
+type Writer struct {
+	w   io.Writer
+	err error
+
+	// ibuf is a buffer for the incoming (uncompressed) bytes.
+	//
+	// Its use is optional. For backwards compatibility, Writers created by the
+	// NewWriter function have ibuf == nil, do not buffer incoming bytes, and
+	// therefore do not need to be Flush'ed or Close'd.
+	ibuf []byte
+
+	// obuf is a buffer for the outgoing (compressed) bytes.
+	obuf []byte
+
+	// wroteStreamHeader is whether we have written the stream header.
+	wroteStreamHeader bool
+}
+
+// Reset discards the writer's state and switches the Snappy writer to write to
+// w. This permits reusing a Writer rather than allocating a new one.
+func (w *Writer) Reset(writer io.Writer) {
+	w.w = writer
+	w.err = nil
+	if w.ibuf != nil {
+		w.ibuf = w.ibuf[:0]
+	}
+	w.wroteStreamHeader = false
+}
+
+// Write satisfies the io.Writer interface.
+func (w *Writer) Write(p []byte) (nRet int, errRet error) {
+	if w.ibuf == nil {
+		// Do not buffer incoming bytes. This does not perform or compress well
+		// if the caller of Writer.Write writes many small slices. This
+		// behavior is therefore deprecated, but still supported for backwards
+		// compatibility with code that doesn't explicitly Flush or Close.
+		return w.write(p)
+	}
+
+	// The remainder of this method is based on bufio.Writer.Write from the
+	// standard library.
+
+	for len(p) > (cap(w.ibuf)-len(w.ibuf)) && w.err == nil {
+		var n int
+		if len(w.ibuf) == 0 {
+			// Large write, empty buffer.
+			// Write directly from p to avoid copy.
+			n, _ = w.write(p)
+		} else {
+			n = copy(w.ibuf[len(w.ibuf):cap(w.ibuf)], p)
+			w.ibuf = w.ibuf[:len(w.ibuf)+n]
+			w.Flush()
+		}
+		nRet += n
+		p = p[n:]
+	}
+	if w.err != nil {
+		return nRet, w.err
+	}
+	n := copy(w.ibuf[len(w.ibuf):cap(w.ibuf)], p)
+	w.ibuf = w.ibuf[:len(w.ibuf)+n]
+	nRet += n
+	return nRet, nil
+}
+
+func (w *Writer) write(p []byte) (nRet int, errRet error) {
+	if w.err != nil {
+		return 0, w.err
+	}
+	for len(p) > 0 {
+		obufStart := len(magicChunk)
+		if !w.wroteStreamHeader {
+			w.wroteStreamHeader = true
+			copy(w.obuf, magicChunk)
+			obufStart = 0
+		}
+
+		var uncompressed []byte
+		if len(p) > maxBlockSize {
+			uncompressed, p = p[:maxBlockSize], p[maxBlockSize:]
+		} else {
+			uncompressed, p = p, nil
+		}
+		checksum := crc(uncompressed)
+
+		// Compress the buffer, discarding the result if the improvement
+		// isn't at least 12.5%.
+		compressed := Encode(w.obuf[obufHeaderLen:], uncompressed)
+		chunkType := uint8(chunkTypeCompressedData)
+		chunkLen := 4 + len(compressed)
+		obufEnd := obufHeaderLen + len(compressed)
+		if len(compressed) >= len(uncompressed)-len(uncompressed)/8 {
+			chunkType = chunkTypeUncompressedData
+			chunkLen = 4 + len(uncompressed)
+			obufEnd = obufHeaderLen
+		}
+
+		// Fill in the per-chunk header that comes before the body.
+		w.obuf[len(magicChunk)+0] = chunkType
+		w.obuf[len(magicChunk)+1] = uint8(chunkLen >> 0)
+		w.obuf[len(magicChunk)+2] = uint8(chunkLen >> 8)
+		w.obuf[len(magicChunk)+3] = uint8(chunkLen >> 16)
+		w.obuf[len(magicChunk)+4] = uint8(checksum >> 0)
+		w.obuf[len(magicChunk)+5] = uint8(checksum >> 8)
+		w.obuf[len(magicChunk)+6] = uint8(checksum >> 16)
+		w.obuf[len(magicChunk)+7] = uint8(checksum >> 24)
+
+		if _, err := w.w.Write(w.obuf[obufStart:obufEnd]); err != nil {
+			w.err = err
+			return nRet, err
+		}
+		if chunkType == chunkTypeUncompressedData {
+			if _, err := w.w.Write(uncompressed); err != nil {
+				w.err = err
+				return nRet, err
+			}
+		}
+		nRet += len(uncompressed)
+	}
+	return nRet, nil
+}
+
+// Flush flushes the Writer to its underlying io.Writer.
+func (w *Writer) Flush() error {
+	if w.err != nil {
+		return w.err
+	}
+	if len(w.ibuf) == 0 {
+		return nil
+	}
+	w.write(w.ibuf)
+	w.ibuf = w.ibuf[:0]
+	return w.err
+}
+
+// Close calls Flush and then closes the Writer.
+func (w *Writer) Close() error {
+	w.Flush()
+	ret := w.err
+	if w.err == nil {
+		w.err = errClosed
+	}
+	return ret
+}
--- a/vendor/github.com/klauspost/compress/snappy/encode_amd64.go
+++ b/vendor/github.com/klauspost/compress/snappy/encode_amd64.go
@@ -0,0 +1,29 @@
+// Copyright 2016 The Snappy-Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build !appengine
+// +build gc
+// +build !noasm
+
+package snappy
+
+// emitLiteral has the same semantics as in encode_other.go.
+//
+//go:noescape
+func emitLiteral(dst, lit []byte) int
+
+// emitCopy has the same semantics as in encode_other.go.
+//
+//go:noescape
+func emitCopy(dst []byte, offset, length int) int
+
+// extendMatch has the same semantics as in encode_other.go.
+//
+//go:noescape
+func extendMatch(src []byte, i, j int) int
+
+// encodeBlock has the same semantics as in encode_other.go.
+//
+//go:noescape
+func encodeBlock(dst, src []byte) (d int)
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Aliaksandr Valialkin	0b488f1e37	lib/storage: do not change timestamps to constant rate if values are constant or have constant delta This breaks the original timestamps, which results in issues like https://github.com/VictoriaMetrics/VictoriaMetrics/issues/120 and https://github.com/VictoriaMetrics/VictoriaMetrics/issues/141 .	2019-08-06 15:40:07 +03:00
Aliaksandr Valialkin	b8bb74ffc6	app/vmstorage: add `vm_concurrent_addrows_*` metrics for tracking concurrency for Storage.AddRows calls Track also the number of dropped rows due to the exceeded timeout on concurrency limit for Storage.AddRows. This number is tracked in `vm_concurrent_addrows_dropped_rows_total`	2019-08-06 15:08:33 +03:00
Aliaksandr Valialkin	5c9e48417a	vendor: update github.com/VictoriaMetrics/metrics to v1.7.1	2019-08-05 19:21:36 +03:00
Aliaksandr Valialkin	5c83f8e203	app: add `vm_concurrent_` metrics for visibility in concurrency limiters for vminsert and vmselect	2019-08-05 18:30:57 +03:00
Aliaksandr Valialkin	05713469c3	vendor: `make vendor-update`	2019-08-05 10:33:21 +03:00
Aliaksandr Valialkin	8822079b77	lib/storage: properly reset `partSearch.fetchData` in `partSearch.reset`	2019-08-05 09:56:06 +03:00
Aliaksandr Valialkin	99e048c9df	app/vmselect: allow passing `match[]`, `start` and `time` to `/api/v1/label/<label_name>/values` `/api/v1/label/<label_name>/values?match[]=q` emulates emulates `label_values(q, <label_name>)` call in Grafana templating.	2019-08-04 23:09:21 +03:00
Aliaksandr Valialkin	47e4b50112	app/vmselect: optimize `/api/v1/series` by skipping storage data Fetch and process only time series metainfo.	2019-08-04 23:01:28 +03:00
Aliaksandr Valialkin	241170dc05	app/vmselect/prometheus: prevent from fetching and scanning all the data on `/api/v1/searies` call by default	2019-08-04 19:42:36 +03:00
Aliaksandr Valialkin	1c69f4eadc	app/vmselect/promql: tune automatic window adjustement Increase the windows adjustement for small scrape intervals, since they usually have higher jitter. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/139 Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134	2019-08-04 19:34:05 +03:00
Aliaksandr Valialkin	8d93b15b86	app/vmselect/promql: further increase the allowed jitter for scrape interval Real-world production data shows higher jitter than 1/8 of scrape interval. This may results in gaps on the graph. So increase the allowed jitter to 1/4 of scrape interval in order to reduce the probability of gaps on the graphs over time series with high jitter for scrape_interval.	2019-08-02 20:10:23 +03:00
Aliaksandr Valialkin	fcc166622a	README.md: mention that monitoring is recommended for VictoriaMetrics	2019-08-02 15:27:10 +03:00
Aliaksandr Valialkin	a9f39168d2	app/vminsert/influx: round automatically generated timestamp according to the given `precision` arg	2019-08-02 00:24:06 +03:00
Aliaksandr Valialkin	f090b2e917	app/vmselect/promql: tolerate higher jitter in scrape interval Allow jitter for up to 1/8 instead of 1/16 for the scrape interval. This should imrpove graphs when `step` is smaller than the `scrape_interval`.	2019-08-01 23:26:00 +03:00
Aliaksandr Valialkin	10caad4728	lib/decimal: modernize tests a bit	2019-07-31 21:10:03 +03:00
Aliaksandr Valialkin	3b90c2a99a	Add CODE_OF_CONDUCT.md	2019-07-31 15:44:26 +03:00
Aliaksandr Valialkin	57ec4f5f92	Update issue templates Add a template for feature request	2019-07-31 15:41:57 +03:00
Aliaksandr Valialkin	01cb15b6f5	Update issue templates Add a template for bug report.	2019-07-31 15:39:41 +03:00
Aliaksandr Valialkin	b9256511e8	README.md: add `join slack` badge	2019-07-31 15:27:11 +03:00
Aliaksandr Valialkin	3a38b23fa3	app/vmselect/promql: add `vm_slow_queries_total` metric for counting slow queries The query is slow if its execution time exceeds `-search.logSlowQueryDuration`	2019-07-31 03:36:37 +03:00
Aliaksandr Valialkin	8bd6f1f6df	app/vmselect/promql: return NaN from histogram_quantile if at least a single bucket is broken	2019-07-31 01:18:07 +03:00
Aliaksandr Valialkin	4aaa5c2efc	app/vmselect/promql: allow adjusting window for default rollup function Default rollup function is `last_over_time`. It must support adjusting the provided window in order to prevent from gaps on the graph for window values smaller than scrape interval. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134	2019-07-31 00:45:54 +03:00
Aliaksandr Valialkin	10f5a26bec	app/vmselect/promql: return NaN values if invalid bucket counts are passed to histogram_quantile Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/136	2019-07-30 22:05:10 +03:00
Aliaksandr Valialkin	c14fd6c43f	lib/storage: typo fixes after `a77e88db7d`	2019-07-30 15:38:52 +03:00
Aliaksandr Valialkin	a77e88db7d	lib/storage: fix matching against tag filter with empty name Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/137	2019-07-30 15:15:09 +03:00
Aliaksandr Valialkin	aad7236e5d	README.md: formatting fixes	2019-07-28 22:02:42 +03:00
Artem Navoiev	5e5de6be9a	Create CONTRIBUTING.md	2019-07-28 20:42:32 +03:00
Anton Patsev	90cf6f3fcb	change /usr/bin/victoriametrics to /usr/bin/victoria-metrics-prod (#132 )	2019-07-28 20:40:46 +03:00
Artem Navoiev	8e3d69219f	Add roadmap (#130 ) * Add roadmap * Fix typos	2019-07-28 18:39:39 +01:00
Aliaksandr Valialkin	b842a2eccc	README.md: mention that VictoriaMetrics needs free disk space for background merges	2019-07-28 12:26:16 +03:00
Aliaksandr Valialkin	afcc7fb167	app/vmselect/netstorage: improve error message when reading data blocks from storage Mention the block number in the error. This should simplify troubleshooting in this code.	2019-07-28 12:12:35 +03:00
Aliaksandr Valialkin	57a57c711a	package: changed the remaining `/usr/local/bin` to `/usr/bin` This is a follo-up after `68f260d878`	2019-07-28 11:08:07 +03:00
Anton Patsev	68f260d878	change /usr/local/bin to /usr/bin (#131 )	2019-07-28 11:06:24 +03:00
Aliaksandr Valialkin	1eade9b358	app/vminsert: add `vm_rows_per_insert` summary metric This metric should help tuning batch sizes on clients writing data to VictoriaMetrics	2019-07-27 13:21:46 +03:00
Aliaksandr Valialkin	7e8747f6ed	README.md: add a section for production ARM build	2019-07-26 22:34:31 +03:00
Aliaksandr Valialkin	0168a1b658	package: various fixes - Use `-prod` binaries instead of development binaries for both deb and rpm packages. - Fix binary directory from /usr/sbin to /usr/local/bin as outlined in package/victoria-metrics.service - Fix binary name from `victoriametrics` to `victoria-metrics-prod` in package/victoria-metrics.service	2019-07-26 22:31:04 +03:00
Aliaksandr Valialkin	bf6cbb762c	app/vminsert: improve error messages for Influx, OpenTSDB and Graphite parsing Include in the error message the line which failed to parse. Updates https://github.com/VictoriaMetrics/VictoriaMetrics/issues/127	2019-07-26 22:08:52 +03:00
Kostya Vasilyev	6aeac37fc5	pick up .service file from ./rpm (#126 ) * pick up .service file from ./rpm * feedback from @patsevanton * remove 'start' from ExecStart command	2019-07-26 21:56:30 +03:00
Aliaksandr Valialkin	c98725db55	app/vmstorage: consistency renaming for `ignored rows` metrics vm_too_big_timestamp_rows_total -> vm_rows_ignored_total{reason="big_timestamp"} vm_too_small_timestamp_rows_total -> vm_rows_ignored_total{reason="small_timestamp"}	2019-07-26 20:02:06 +03:00
Anton Patsev	d8043f7161	Change default value storageDataPath (#125 ) Fixes #124 .	2019-07-26 14:13:55 +03:00
Aliaksandr Valialkin	f586e1f83c	lib/storage: add metrics for calculating skipped rows outside the retention The metrics are: - vm_too_big_timestamp_rows_total - vm_too_small_timestamp_rows_total	2019-07-26 14:11:01 +03:00
Kostya Vasilyev	d1132bb188	deb packaging fixes: 1) stop the service in prerm 2) reload services in postrm (#123 )	2019-07-26 12:38:59 +03:00
Aliaksandr Valialkin	915fb6df79	README.md: mention that arm builds can run on Raspberry Pi	2019-07-26 12:28:40 +03:00
Kostya Vasilyev	89eb6d78a4	RPM packaging (#122 )	2019-07-25 23:47:41 +03:00
Aliaksandr Valialkin	17096b5750	app/vmselect/promql: return NaN from `count()` over zero time series This aligns `count` behavior with Prometheus.	2019-07-25 22:02:30 +03:00
Aliaksandr Valialkin	66efa5745f	app/vmselect/promql: properly calculate incremental aggregations grouped by `__name__` Previously the following query may fail on multiple distinct metric names match: sum(count_over_time{__name__!=''}) by (__name__)	2019-07-25 21:53:20 +03:00
Anton Patsev	106ab78a47	Add package/rpm/ (#121 )	2019-07-25 11:21:55 +03:00
Aliaksandr Valialkin	8aa474d685	README.md: move `how to build VictoriaMetrics` section to the bottom This streamlines `getting started` experience	2019-07-25 11:17:30 +03:00
Aliaksandr Valialkin	9e059bb330	README.md: add links to `ARM build` and `Pure Go build` in TOC	2019-07-25 11:05:35 +03:00
Aliaksandr Valialkin	2346335ea6	README.md: moved advanced topics to the bottom, so they don't clutter `getting started` workflow	2019-07-25 11:00:41 +03:00
Aliaksandr Valialkin	b339890dca	lib/encoding/zstd: `go fmt`	2019-07-25 01:37:16 +03:00
Aliaksandr Valialkin	6c4ca89d75	lib/encoding/zstd: disable CRC checks in `pure Go` build This should give slightly better compression and decompressions performance. Additionally this shaves off 4 bytes per each compressed block.	2019-07-24 19:17:16 +03:00
Roman Khavronenko	f0fe7b5ad6	fix typo (#117 )	2019-07-24 07:48:28 +01:00
Aliaksandr Valialkin	22ed4e7fd4	vendor: `make vendor-update`	2019-07-23 20:00:19 +03:00
Aliaksandr Valialkin	162f1fb1b7	all: small updates after PR #114	2019-07-23 19:54:50 +03:00
Aliaksandr Valialkin	d07f616609	lib/encoding: small fixes in tests after the PR #114	2019-07-23 19:37:51 +03:00
Roman Khavronenko	5bf4e5ffb5	all: add Pure Go build (pull request #114 ) Updates #94	2019-07-23 19:26:39 +03:00
Kostya Vasilyev	8c3629a892	Debian packaging (#116 ) * initial commit of deb packaging * Incorporated feedback from @valyala: - Put data directory under /var/lib - More beef in systemd file - Packaging for arm64 - Package all target which builds and packages both amd64 and arm64 * Remove PIDFile from systemd unit, useless * per PR feedback, move debian specific files into deb subdirectory Updates #107 .	2019-07-22 17:12:48 +03:00
Aliaksandr Valialkin	ea07cf68ba	README.md: add `querying Graphite data` section Mention that Graphite data may be read either via Prometheus querying API or via go-graphite/carbonapi. See https://github.com/go-graphite/carbonapi/blob/master/cmd/carbonapi/carbonapi.example.prometheus.yaml	2019-07-21 16:10:19 +03:00
Roman Khavronenko	4ee41bab43	add versioning to dashboard description (#113 )	2019-07-21 14:34:50 +03:00
Roman Khavronenko	1273f31f19	Add CPU usage panel; rename `Go runtime` to `Resource usage` (#112 ) * add CPU usage panel; rename `Go runtime` to `Resource usage` * rm irate from CPU usage panel Updates #92 .	2019-07-20 17:24:24 +03:00
Aliaksandr Valialkin	0f2ecde0e6	lib/encoding: improve gauge series detection - Series with negative values are always gauges - Counters may only have increasing values with possible counter resets This should improve compression ratio for gauge series which were previously mistakenly detected as counters.	2019-07-20 14:05:09 +03:00
Aliaksandr Valialkin	6cd77d4847	deployment: switch builder from go1.12.6 to go1.12.7	2019-07-20 12:15:05 +03:00
Roman Khavronenko	fb14f23532	mention docker-compose as option to spin up VM (#97 )	2019-07-16 00:45:21 +03:00