* WriteBatchWithIndex::DeleteRange returns Status::NotSupported. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings can no longer be used. In C, that includes rocksdb_writebatch_wi_delete_range, rocksdb_writebatch_wi_delete_range_cf, rocksdb_writebatch_wi_delete_rangev, and rocksdb_writebatch_wi_delete_rangev_cf. In Java, that includes WriteBatchWithIndex::deleteRange.
* New statistics were added for the new BlobDB garbage collection implementation: BLOB_DB_GC_NUM_FILES (number of blob files obsoleted during GC), BLOB_DB_GC_NUM_NEW_FILES (number of new blob files generated during GC), BLOB_DB_GC_FAILURES (number of failed GC passes), BLOB_DB_GC_NUM_KEYS_RELOCATED (number of blobs relocated during GC), and BLOB_DB_GC_BYTES_RELOCATED (total size of blobs relocated during GC). On the other hand, the following statistics, which are not relevant for the new GC implementation, are now deprecated: BLOB_DB_GC_NUM_KEYS_OVERWRITTEN, BLOB_DB_GC_NUM_KEYS_EXPIRED, BLOB_DB_GC_BYTES_OVERWRITTEN, BLOB_DB_GC_BYTES_EXPIRED, and BLOB_DB_GC_MICROS.
* db_bench now supports the value_size_distribution_type, value_size_min, and value_size_max options for generating random variable-sized values. Added the blob_db_compression_type option for BlobDB to enable blob compression.
* Added OptimisticTransactionDBOptions, an option that allows users to configure the OCC validation policy. The default policy changes from kValidateSerial to kValidateParallel to reduce mutex contention.
* max_background_jobs can now be changed dynamically through the SetDBOptions interface.
* BlobDB now performs garbage collection when enable_garbage_collection is set to true in BlobDBOptions. Garbage collection is performed during compaction: any valid blobs located in the oldest N files (where N is the number of non-TTL blob files multiplied by the value of BlobDBOptions::garbage_collection_cutoff) encountered during compaction get relocated to new blob files, and old blob files are dropped once they are no longer needed.
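The garbage collection options above can be set when opening BlobDB; a minimal sketch, assuming the BlobDB headers of this release (the helper name is illustrative):

```cpp
#include "rocksdb/utilities/blob_db/blob_db.h"

// Illustrative helper: configure BlobDB so compactions relocate valid blobs
// out of the oldest non-TTL blob files.
rocksdb::blob_db::BlobDBOptions MakeGcEnabledOptions() {
  rocksdb::blob_db::BlobDBOptions bdb_opts;
  bdb_opts.enable_garbage_collection = true;
  // Consider the oldest 25% of non-TTL blob files for relocation.
  bdb_opts.garbage_collection_cutoff = 0.25;
  return bdb_opts;
}
```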
  Note: we recommend enabling periodic compactions for the base DB when using this feature, to deal with the case when some old blob files are kept alive by SSTs that otherwise do not get picked for compaction.
* db_bench now supports the garbage_collection_cutoff option for BlobDB.
* … creation_time of new compaction outputs.
* RocksDB now checks the ColumnFamilyHandle pointers themselves instead of only the column family IDs when determining whether an API call uses the default column family or not.
* GetLiveFilesMetaData and GetColumnFamilyMetaData now expose the file number of SST files as well as the oldest blob file referenced by each SST.
* The sst_dump command line tool's recompress command now displays how many blocks were compressed and how many were not, in particular how many were not compressed because the compression ratio was not met (12.5% threshold for GoodCompressionRatio), as seen in the number.block.not_compressed counter stat since version 6.0.0.
* db_bench now supports, and by default issues, non-TTL Puts to BlobDB. TTL Puts can be enabled by explicitly specifying a non-zero value for the blob_db_max_ttl_range command line parameter.
* sst_dump now supports printing BlobDB blob indexes in a human-readable format. This can be enabled by specifying the decode_blob_index flag on the command line.
* The creation_time table property for compaction output files is now set to the minimum of the creation times of all compaction inputs.
* For an example, see LevelAndStyleCustomFilterPolicy in db_bloom_filter_test.cc. While most existing custom implementations of FilterPolicy should continue to work as before, those wrapping the return of NewBloomFilterPolicy will require overriding the new function GetBuilderWithContext(), because calling GetFilterBitsBuilder() on the FilterPolicy returned by NewBloomFilterPolicy is no longer supported.
* … snap_refresh_nanos option.
* The default value of periodic_compaction_seconds changed to UINT64_MAX - 1, which allows RocksDB to auto-tune periodic compaction scheduling. When using the default value, periodic compactions are now auto-enabled if a compaction filter is used.
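To opt out of the periodic-compaction auto-tuning default, the option can be set explicitly; a sketch (the helper name and the one-week value are illustrative):

```cpp
#include "rocksdb/options.h"

// Illustrative helper: replace the UINT64_MAX - 1 auto-tuning default with
// an explicit once-a-week periodic compaction schedule.
rocksdb::Options MakePeriodicCompactionOptions() {
  rocksdb::Options opts;
  opts.periodic_compaction_seconds = 7 * 24 * 60 * 60;  // one week
  return opts;
}
```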
  A value of 0 will turn off the feature completely.
* The default value of ttl changed to UINT64_MAX - 1, which allows RocksDB to auto-tune the TTL value. When using the default value, TTL will be auto-enabled to 30 days whenever the feature is supported. To revert to the old behavior, explicitly set it to 0.
* … snap_refresh_nanos is set to 0.
* Added memtable_insert_hint_per_batch to WriteOptions. If it is true, each WriteBatch will maintain its own insert hints for each memtable in concurrent writes. See include/rocksdb/options.h for more details.
* Added --secondary_path to ldb to open the database as a secondary instance. This keeps the original DB intact.
* Added a RegisterCustomObjects function. By linking the unit test binary with the static library, the unit test can execute this function.
* Added an option snap_refresh_nanos (default 0.1s) to periodically refresh the snapshot list in compaction jobs. Assign 0 to disable the feature.
* Added an option unordered_write which trades snapshot guarantees for higher write throughput. When used with WRITE_PREPARED transactions with two_write_queues=true, it offers higher throughput with no compromise on guarantees.
* Added an option failed_move_fall_back_to_copy (default true) for external SST ingestion. When move_files is true and hard linking fails, ingestion falls back to copy if failed_move_fall_back_to_copy is true. Otherwise, ingestion reports an error.
* Added list_file_range_deletes in ldb, which prints out tombstones in SST files.
* Fixed a bug that allowed Puts covered by range tombstones to reappear.
  Note that Puts may exist even if the user only ever called Merge(), due to an internal conversion during compaction to the bottommost level.
* Added an option strict_bytes_per_sync that causes a file-writing thread to block rather than exceed the limit on bytes pending writeback specified by bytes_per_sync or wal_bytes_per_sync.
* Fixed a bug involving IsFlushPending() == true caused by one bg thread releasing the db mutex in ~ColumnFamilyData and another thread clearing the flush_requested_ flag.
* When cache_index_and_filter_blocks == true, we now store dictionary data used for decompression in the block cache for better control over memory usage. For users of ZSTD v1.1.4+ who compile with -DZSTD_STATIC_LINKING_ONLY, this includes a digested dictionary, which is used to increase decompression speed.
* Added the GetStatsHistory API to retrieve periodically persisted stats snapshots.
* SstFileWriter will now use dictionary compression if it is configured in the file writer's CompressionOptions.
* TableProperties::num_entries and TableProperties::num_deletions now also account for the number of range tombstones.
* number.block.not_compressed now also counts blocks not compressed due to poor compression ratio.
* Removed the ttl option from CompactionOptionsFIFO. The option has been deprecated; ttl in ColumnFamilyOptions is used instead.
* Fixed an incorrect NotFound point lookup result when querying the endpoint of a file that has been extended by a range tombstone.
* Added the JemallocNodumpAllocator memory allocator. When in use, the block cache will be excluded from core dumps.
* Added PerfContextByLevel as part of PerfContext, which allows storing perf context at each level. Also replaced __thread with the thread_local keyword for perf_context. Added per-level perf context for bloom filter and Get queries.
* Added an option atomic_flush. If true, RocksDB supports flushing multiple column families and atomically committing the result to MANIFEST.
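Enabling atomic flush is a one-line options change; a minimal sketch (the helper name is illustrative):

```cpp
#include "rocksdb/options.h"

// Illustrative helper: flush all column families together and commit the
// result to the MANIFEST atomically.
rocksdb::Options MakeAtomicFlushOptions() {
  rocksdb::Options opts;
  opts.atomic_flush = true;
  return opts;
}
```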
  Useful when WAL is disabled.
* Added num_deletions and num_merge_operands members to TableProperties.
* Added MemoryAllocator, which lets the user specify a custom memory allocator for block-based tables.
* Improved DeleteRange to prevent read performance degradation. The feature is no longer marked as experimental.
* DBOptions::use_direct_reads now affects reads issued by BackupEngine on the database's SSTs.
* NO_ITERATORS is divided into two counters, NO_ITERATOR_CREATED and NO_ITERATOR_DELETED. Both of them are now only increasing, just as other counters.
* Fixed the NO_FILE_CLOSES ticker statistic, which was always zero previously.
* OnTableFileCreated will now be called for empty files generated during compaction. In that case, TableFileCreationInfo::file_path will be "(nil)" and TableFileCreationInfo::file_size will be zero.
* Added FlushOptions::allow_write_stall, which controls whether Flush calls start working immediately even if doing so causes user writes to stall, or wait until the flush can be performed without causing a write stall (similar to CompactRangeOptions::allow_write_stall). Note that the default value is false, meaning we delay Flush calls until stalling can be avoided when possible. This is a behavior change compared to previous RocksDB versions, where Flush calls didn't check whether they might cause a stall.
* … OnCompactionCompleted.
* … CompactFiles run with CompactionOptions::compression == CompressionType::kDisableCompressionOption. Now that setting causes the compression type to be chosen according to the column-family-wide compression options.
* Merge operands are now passed to MergeOperator::ShouldMerge in the reversed order relative to how they were merged (passed to FullMerge or FullMergeV2) for performance reasons.
* … max_num_ikeys.
* Using the ZSTD dictionary trainer (i.e., setting CompressionOptions::zstd_max_train_bytes to a nonzero value) now requires ZSTD version 1.1.3 or later.
* Added bottommost_compression_opts. To keep backward compatibility, a new boolean enabled is added to CompressionOptions. compression_opts will always be used regardless of the value of enabled.
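A sketch of per-level compression settings using this opt-in flag (the helper name and concrete values are illustrative):

```cpp
#include "rocksdb/options.h"

// Illustrative helper: light compression for upper levels, ZSTD with a
// dictionary for the bottommost level.
rocksdb::ColumnFamilyOptions MakeTieredCompression() {
  rocksdb::ColumnFamilyOptions cf_opts;
  cf_opts.compression = rocksdb::kLZ4Compression;   // non-bottommost levels
  cf_opts.bottommost_compression = rocksdb::kZSTD;  // coldest data
  // Per-algorithm options for the bottommost level are ignored unless
  // explicitly enabled, preserving backward compatibility.
  cf_opts.bottommost_compression_opts.max_dict_bytes = 1 << 14;
  cf_opts.bottommost_compression_opts.enabled = true;
  return cf_opts;
}
```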
  For bottommost_compression_opts, it will only be used when the user sets enabled=true; otherwise, compression_opts will be used for bottommost_compression by default.
* For Statistics objects created via CreateDBStatistics(), the format of the string returned by ToString() has changed.
* ColumnFamilyOptions::ttl can now be changed via SetOptions().
* Changed the default value of bytes_max_delete_chunk to 0 in NewSstFileManager(), as it doesn't work well with checkpoints.
* DBOptions::use_direct_io_for_flush_and_compaction only applies to background writes, and DBOptions::use_direct_reads applies to both user reads and background reads. This conforms with Linux's open(2) manpage, which advises against simultaneously reading a file in buffered and direct modes due to possibly undefined behavior and degraded performance.
* Added CompressionOptions::kDefaultCompressionLevel, which is a generic way to tell RocksDB to use the compression library's default level. It is now the default value for CompressionOptions::level. Previously the level defaulted to -1, which gave poor compression ratios in ZSTD.
* Added the Env::LowerThreadPoolCPUPriority(Priority) method, which lowers the CPU priority of background (esp.
  compaction) threads to minimize interference with foreground tasks.
* If threads are set for the bottom-priority pool via Env::SetBackgroundThreads(), compactions to the bottom level will be delegated to that thread pool.
* prefix_extractor has been moved from ImmutableCFOptions to MutableCFOptions, meaning it can be dynamically changed without a DB restart.
* Fixed handling of BackupableDBOptions::max_valid_backups_to_open to not delete backup files when refcounts cannot be accurately determined.
* Added BlockBasedTableConfig.setBlockCache to allow sharing a block cache across DB instances.
* The ignore_unknown_options argument will only be effective if the option file shows that it was generated using a higher version of RocksDB than the current version.
* Avoid unnecessarily flushing in CompactRange() when the range specified by the user does not overlap unflushed memtables.
* If ColumnFamilyOptions::max_subcompactions is set greater than one, we now parallelize large manual level-based compactions.
* Added an include_end option to make the range end exclusive when include_end == false in DeleteFilesInRange().
* Added CompactRangeOptions::allow_write_stall, which makes CompactRange start working immediately, even if it causes user writes to stall. The default value is false, meaning we delay CompactRange calls until stalling can be avoided when possible. Note this delay is not present in previous RocksDB versions.
* … now returns Status::InvalidArgument; previously, it returned Status::IOError.
* Added DeleteFilesInRanges() to delete files in multiple ranges at once for better performance.
* Fixed DisableFileDeletions() followed by GetSortedWalFiles() to not return obsolete WAL files that PurgeObsoleteFiles() is going to delete.
* Added autoTune and getBytesPerSecond() to the RocksJava RateLimiter.
* Running make with environment variable USE_SSE set and PORTABLE unset will use all machine features available locally.
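For example, a build invocation of this kind (the `static_lib` target and `-j` value are illustrative):

```shell
# Use all machine features available locally (not just SSE),
# since PORTABLE is left unset.
USE_SSE=1 make -j8 static_lib
```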
  Previously this combination only compiled SSE-related features.
* Added NUMBER_ITER_SKIP, which returns how many internal keys were skipped during iterations (e.g., due to being tombstones or duplicate versions of a key).
* Added key_lock_wait_count and key_lock_wait_time, which measure the number of times transactions wait on key locks and the total amount of time waiting.
* Fixed an issue in IngestExternalFile() affecting databases with a large number of SST files.
* Fixed possible corruption when DeleteFilesInRange() deletes a subset of files spanned by a DeleteRange() marker.
* BackupableDBOptions::max_valid_backups_to_open == 0 now means no backups will be opened during BackupEngine initialization. Previously this condition disabled limiting the number of backups opened.
* DBOptions::preserve_deletes is a new option that allows one to specify that the DB should not drop tombstones for regular deletes if they have a sequence number larger than what was set by the new API call DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum). Disabled by default.
* DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum) was added; users who wish to preserve deletes are expected to periodically call this function to advance the cutoff seqnum (all deletes made before this seqnum can be dropped by the DB). It is the user's responsibility to advance the seqnum so that tombstones are kept for the desired period of time, yet are eventually processed in time and don't take up too much space.
* ReadOptions::iter_start_seqnum was added;
  if set to something > 0, users will see two changes in iterator behavior: 1) only keys written with a sequence number larger than this parameter will be returned, and 2) the Slice returned by iter->key() now points to memory that keeps a user-oriented representation of the internal key, rather than the user key. A new struct, FullKey, was added to represent internal keys, along with a new helper function ParseFullKey(const Slice& internal_key, FullKey* result).
* RocksDB now uses crc32c_3way on supported platforms to improve performance. The system will choose this algorithm automatically whenever possible. If PCLMULQDQ is not supported, it will fall back to the old Fast_CRC32 algorithm.
* DBOptions::writable_file_max_buffer_size can now be changed dynamically.
* DBOptions::bytes_per_sync, DBOptions::compaction_readahead_size, and DBOptions::wal_bytes_per_sync can now be changed dynamically; changing DBOptions::wal_bytes_per_sync will flush all memtables and switch to a new WAL file.
* Rate limiter auto-tuning can be enabled by passing true to the auto_tuned parameter in NewGenericRateLimiter(). The value passed as rate_bytes_per_sec will still be respected as an upper bound.
* … ColumnFamilyOptions::compaction_options_fifo.
* Added the EventListener::OnStallConditionsChanged() callback.
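A minimal listener sketch, assuming the WriteStallInfo fields declared in rocksdb/listener.h (the class name and logging are illustrative):

```cpp
#include <iostream>

#include "rocksdb/listener.h"

// Illustrative listener: log transitions between write-stall states for a
// column family (normal / delayed / stopped).
class StallLogger : public rocksdb::EventListener {
 public:
  void OnStallConditionsChanged(const rocksdb::WriteStallInfo& info) override {
    std::cout << "column family " << info.cf_name << " stall state: "
              << static_cast<int>(info.condition.prev) << " -> "
              << static_cast<int>(info.condition.cur) << std::endl;
  }
};

// Registration: options.listeners.emplace_back(std::make_shared<StallLogger>());
```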
  Users can implement it to be notified when user writes are stalled, stopped, or resumed.
* Added ReadOptions::iterate_lower_bound.
* DB::Open() will abort if a column family inconsistency is found during PIT recovery.
* … DeleteRange().
* Users of Statistics::getHistogramString() will see fewer histogram buckets and different bucket endpoints.
* Slice::compare and BytewiseComparator::Compare no longer accept Slices containing nullptr.
* Transaction::Get and Transaction::GetForUpdate variants with PinnableSlice were added.
* … Env::SetBackgroundThreads(N, Env::Priority::BOTTOM), where N > 0.
* Added MergeOperator::AllowSingleOperand.
* Added DB::VerifyChecksum(), which verifies the checksums in all SST files of a running DB.
* Checksums can now be disabled by setting BlockBasedTableOptions::checksum = kNoChecksum.
* … rocksdb.db.get.micros, rocksdb.db.write.micros, and rocksdb.sst.read.micros.
* Added the EventListener::OnBackgroundError() callback. Users can implement it to be notified of errors causing the DB to enter read-only mode, and optionally override them.
* Fixed a bug occurring when DeleteRange() is used together with subcompactions.
* It is no longer possible to set max_background_flushes=0.
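A sketch of the replacement configuration (the helper name is illustrative):

```cpp
#include "rocksdb/env.h"

// Illustrative helper: with max_background_flushes=0 gone, disable dedicated
// flush threads by giving the high-priority pool zero threads.
void DisableFlushThreads(rocksdb::Env* env) {
  env->SetBackgroundThreads(0, rocksdb::Env::Priority::HIGH);
}
```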
  Instead, users can achieve this by configuring their high-pri thread pool to have zero threads.
* Replaced Options::max_background_flushes, Options::max_background_compactions, and Options::base_background_compactions with Options::max_background_jobs, which automatically decides how many threads to allocate towards flush/compaction.
* Replaced the global variable IOStatsContext iostats_context with IOStatsContext* get_iostats_context(); replaced the global variable PerfContext perf_context with PerfContext* get_perf_context().
* DB::IngestExternalFile() now supports ingesting files into a database containing range deletions.
* The max_open_files option can now be changed via SetDBOptions().
* Added GetAllKeyVersions to see internal versions of a range of keys.
* Added allow_ingest_behind.
* The stats_dump_period_sec option can now be changed via SetDBOptions().
* The delete_obsolete_files_period_micros option can now be changed via SetDBOptions().
* The delayed_write_rate and max_total_wal_size options can now be changed via SetDBOptions().
* The delayed_write_rate option can now be changed via SetDBOptions().
* … const WriteEntry&.
* … make rocksdbjavastatic.
* Renamed StackableDB::GetRawDB() to StackableDB::GetBaseDB().
* WriteBatch::Data() is now const std::string& Data() const.
* Renamed TableStats to TableProperties.
* Deprecated PrefixHashRepFactory. Please use NewHashSkipListRepFactory() instead.
* Added EnableFileDeletions() and DisableFileDeletions().
* Added DB::GetOptions().
* Added DB::GetDbIdentity().
* Added SliceParts, a variant of Put() that gathers output like writev(2).
* Optimized Get() (1fdb3f): 1.5x QPS increase for some workloads.
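The dynamically changeable options above are set through string key/value pairs; a sketch (the helper name and values are illustrative):

```cpp
#include "rocksdb/db.h"

// Illustrative helper: adjust mutable DB options on a live instance.
// Values are passed as strings and parsed by RocksDB.
rocksdb::Status TuneLiveDb(rocksdb::DB* db) {
  return db->SetDBOptions({{"max_open_files", "500"},
                           {"stats_dump_period_sec", "120"},
                           {"delayed_write_rate", "2097152"}});
}
```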