Client

Key	Default	isDynamic	Description	Since	Deprecated
celeborn.client.adaptive.optimizeSkewedPartitionRead.enabled	false	false	If this is true, Celeborn will adaptively split skewed partitions instead of reading them by Spark map range. Please note that this feature requires the `Celeborn-Optimize-Skew-Partitions-spark3_3.patch`.	0.6.0
celeborn.client.application.heartbeatInterval	10s	false	Interval for client to send heartbeat message to master.	0.3.0	celeborn.application.heartbeatInterval
celeborn.client.application.unregister.enabled	true	false	When true, the Celeborn client informs the Celeborn master that the application has shut down during client exit. This allows the cluster to release resources immediately, resulting in resource savings.	0.3.2
celeborn.client.application.uuidSuffix.enabled	false	false	Whether to add a UUID suffix to the application id to make it unique. Currently, this only applies to Spark and MR.	0.6.0
celeborn.client.chunk.prefetch.enabled	false	false	Whether to enable chunk prefetch when creating CelebornInputStream.	0.5.1
celeborn.client.closeIdleConnections	true	false	Whether client will close idle connections.	0.3.0
celeborn.client.commitFiles.ignoreExcludedWorker	false	false	When true, LifecycleManager will skip workers which are in the excluded list.	0.3.0
celeborn.client.eagerlyCreateInputStream.threads	32	false	Threads count for streamCreatorPool in CelebornShuffleReader.	0.3.1
celeborn.client.excludePeerWorkerOnFailure.enabled	true	false	When true, Celeborn will exclude the partition's peer worker when pushing data to the replica fails.	0.3.0
celeborn.client.excludedWorker.expireTimeout	180s	false	Timeout for LifecycleManager to clear reserved excluded workers. Defaults to 1.5 * `celeborn.master.heartbeat.worker.timeout` to cover the worker heartbeat timeout check period.	0.3.0	celeborn.worker.excluded.expireTimeout
celeborn.client.fetch.buffer.size	64k	false	Size of reducer partition buffer memory for shuffle reader. The fetched data will be buffered in memory before consuming. For performance consideration keep this buffer size not less than `celeborn.client.push.buffer.max.size`.	0.4.0
celeborn.client.fetch.dfsReadChunkSize	8m	false	Max chunk size for DfsPartitionReader.	0.3.1
celeborn.client.fetch.excludeWorkerOnFailure.enabled	false	false	Whether to enable shuffle client-side fetch exclude workers on failure.	0.3.0
celeborn.client.fetch.excludedWorker.expireTimeout	<value of celeborn.client.excludedWorker.expireTimeout>	false	ShuffleClient is a static object that is used for the whole lifecycle of the Executor. Excluded workers are given an expire time so that a transient worker issue does not exclude them permanently.	0.3.0
celeborn.client.fetch.maxReqsInFlight	3	false	Number of in-flight chunk fetch requests.	0.3.0	celeborn.fetch.maxReqsInFlight
celeborn.client.fetch.maxRetriesForEachReplica	3	false	Max retry times of fetch chunk on each replica	0.3.0	celeborn.fetch.maxRetriesForEachReplica,celeborn.fetch.maxRetries
celeborn.client.fetch.timeout	600s	false	Timeout for a task to open stream and fetch chunk.	0.3.0	celeborn.fetch.timeout
celeborn.client.flink.compression.enabled	true	false	Whether to compress data in Flink plugin.	0.3.0	remote-shuffle.job.enable-data-compression
celeborn.client.flink.inputGate.concurrentReadings	2147483647	false	Max concurrent reading channels for an input gate.	0.3.0	remote-shuffle.job.concurrent-readings-per-gate
celeborn.client.flink.inputGate.memory	32m	false	Memory reserved for an input gate.	0.3.0	remote-shuffle.job.memory-per-gate
celeborn.client.flink.inputGate.supportFloatingBuffer	true	false	Whether to support floating buffer in Flink input gates.	0.3.0	remote-shuffle.job.support-floating-buffer-per-input-gate
celeborn.client.flink.metrics.scope.shuffle	<host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>.<shuffle_id>	false	Defines the scope format string that is applied to all metrics scoped to a shuffle. Only effective when an identifier-based reporter is configured.	0.6.0
celeborn.client.flink.partitionConnectionException.enabled	false	false	If enabled, `org.apache.flink.runtime.io.network.partition.consumer.PartitionConnectionException` would be thrown when RemoteBufferStreamReader finds that the current exception is about connection failure, then Flink can be aware of the lost Celeborn server side nodes and be able to re-compute affected data.	0.6.0
celeborn.client.flink.resultPartition.memory	64m	false	Memory reserved for a result partition.	0.3.0	remote-shuffle.job.memory-per-partition
celeborn.client.flink.resultPartition.supportFloatingBuffer	true	false	Whether to support floating buffer for result partitions.	0.3.0	remote-shuffle.job.support-floating-buffer-per-output-gate
celeborn.client.flink.shuffle.fallback.policy	AUTO	false	Celeborn supports the following kinds of fallback policies. 1. ALWAYS: always use the Flink built-in shuffle implementation; 2. AUTO: prefer the Celeborn shuffle implementation, and fall back to the Flink built-in shuffle implementation based on certain factors, e.g. availability of enough workers and quota; 3. NEVER: always use the Celeborn shuffle implementation, and fail fast when it is concluded that fallback is required based on the factors above.	0.6.0
celeborn.client.inputStream.creation.window	16	false	Window size within which CelebornShuffleReader pre-creates CelebornInputStreams, for the coalesced scenario where multiple partitions are read.	0.5.1
celeborn.client.mr.pushData.max	32m	false	Max size for a push data sent from mr client.	0.4.0
celeborn.client.partition.reader.checkpoint.enabled	false	false	Whether to checkpoint reads when re-creating a partition reader. Setting this to true minimizes the amount of unnecessary reads during partition read retries.	0.6.0
celeborn.client.push.buffer.initial.size	8k	false		0.3.0	celeborn.push.buffer.initial.size
celeborn.client.push.buffer.max.size	64k	false	Max size of reducer partition buffer memory for shuffle hash writer. The pushed data will be buffered in memory before sending to Celeborn worker. For performance consideration keep this buffer size higher than 32K. Example: If reducer amount is 2000, buffer size is 64K, then each task will consume up to `64KiB * 2000 = 125MiB` heap memory.	0.3.0	celeborn.push.buffer.max.size
celeborn.client.push.excludeWorkerOnFailure.enabled	false	false	Whether to enable shuffle client-side push exclude workers on failures.	0.3.0
celeborn.client.push.limit.inFlight.sleepInterval	50ms	false	Sleep interval when checking whether Netty in-flight requests are done.	0.3.0	celeborn.push.limit.inFlight.sleepInterval
celeborn.client.push.limit.inFlight.timeout	<undefined>	false	Timeout for netty in-flight requests to be done. Default value should be `celeborn.client.push.timeout * 2`.	0.3.0	celeborn.push.limit.inFlight.timeout
celeborn.client.push.limit.strategy	SIMPLE	false	The strategy used to control the push speed. Valid strategies are SIMPLE and SLOWSTART. The SLOWSTART strategy usually works with congestion control mechanism on the worker side.	0.3.0
celeborn.client.push.maxReqsInFlight.perWorker	32	false	Amount of Netty in-flight requests per worker. Default max memory of in flight requests per worker is `celeborn.client.push.maxReqsInFlight.perWorker` * `celeborn.client.push.buffer.max.size` * compression ratio(1 in worst case): 64KiB * 32 = 2MiB. The maximum memory will not exceed `celeborn.client.push.maxReqsInFlight.total`.	0.3.0
celeborn.client.push.maxReqsInFlight.total	256	false	Amount of total Netty in-flight requests. The maximum memory is `celeborn.client.push.maxReqsInFlight.total` * `celeborn.client.push.buffer.max.size` * compression ratio(1 in worst case): 64KiB * 256 = 16MiB	0.3.0	celeborn.push.maxReqsInFlight
celeborn.client.push.queue.capacity	512	false	Push buffer queue size for a task. The maximum memory is `celeborn.client.push.buffer.max.size` * `celeborn.client.push.queue.capacity`, default: 64KiB * 512 = 32MiB	0.3.0	celeborn.push.queue.capacity
celeborn.client.push.replicate.enabled	false	false	When true, Celeborn worker will replicate shuffle data to another Celeborn worker asynchronously to ensure the pushed shuffle data won't be lost after the node failure. It's recommended to set `false` when `HDFS` is enabled in `celeborn.storage.availableTypes`.	0.3.0	celeborn.push.replicate.enabled
celeborn.client.push.retry.threads	8	false	Thread number to process shuffle re-send push data requests.	0.3.0	celeborn.push.retry.threads
celeborn.client.push.revive.batchSize	2048	false	Max number of partitions in one Revive request.	0.3.0
celeborn.client.push.revive.interval	100ms	false	Interval for client to trigger Revive to LifecycleManager. The number of partitions in one Revive request is `celeborn.client.push.revive.batchSize`.	0.3.0
celeborn.client.push.revive.maxRetries	5	false	Max retry times for reviving when celeborn push data failed.	0.3.0
celeborn.client.push.sendBufferPool.checkExpireInterval	30s	false	Interval to check expire for send buffer pool. If the pool has been idle for more than `celeborn.client.push.sendBufferPool.expireTimeout`, the pooled send buffers and push tasks will be cleaned up.	0.3.1
celeborn.client.push.sendBufferPool.expireTimeout	60s	false	Timeout before clean up SendBufferPool. If SendBufferPool is idle for more than this time, the send buffers and push tasks will be cleaned up.	0.3.1
celeborn.client.push.slowStart.initialSleepTime	500ms	false	The initial sleep time if the current max in flight requests is 0	0.3.0
celeborn.client.push.slowStart.maxSleepTime	2s	false	If `celeborn.client.push.limit.strategy` is set to SLOWSTART, the push side applies a sleep strategy for each batch of requests. This controls the max sleep time if the max in-flight requests limit is 1 for a long time.	0.3.0
celeborn.client.push.sort.randomizePartitionId.enabled	false	false	Whether to randomize partitionId in push sorter. If true, partitionId will be randomized when sort data to avoid skew when push to worker	0.3.0	celeborn.push.sort.randomizePartitionId.enabled
celeborn.client.push.stageEnd.timeout	<value of celeborn.<module>.io.connectionTimeout>	false	Timeout for waiting for StageEnd. During this process, committing files may be retried up to `celeborn.client.requestCommitFiles.maxRetries` times, and one request is made to release slots. Users can customize this value according to their settings. By default, the value is the max timeout value `celeborn.<module>.io.connectionTimeout`.	0.3.0	celeborn.push.stageEnd.timeout
celeborn.client.push.takeTaskMaxWaitAttempts	1	false	Max wait times if no task available to push to worker.	0.3.0
celeborn.client.push.takeTaskWaitInterval	50ms	false	Wait interval if no task available to push to worker.	0.3.0
celeborn.client.push.timeout	120s	false	Timeout for a task to push a data RPC message. This value should preferably be more than twice `celeborn.<module>.push.timeoutCheck.interval`.	0.3.0	celeborn.push.data.timeout
celeborn.client.readLocalShuffleFile.enabled	false	false	Enable reading local shuffle files for clusters that are co-deployed with the YARN NodeManager.	0.3.1
celeborn.client.readLocalShuffleFile.threads	4	false	Threads count for read local shuffle file.	0.3.1
celeborn.client.registerShuffle.maxRetries	3	false	Max retry times for client to register shuffle.	0.3.0	celeborn.shuffle.register.maxRetries
celeborn.client.registerShuffle.retryWait	3s	false	Wait time before next retry if register shuffle failed.	0.3.0	celeborn.shuffle.register.retryWait
celeborn.client.requestCommitFiles.maxRetries	4	false	Max retry times for requestCommitFiles RPC.	0.3.0
celeborn.client.reserveSlots.maxRetries	3	false	Max retry times for client to reserve slots.	0.3.0	celeborn.slots.reserve.maxRetries
celeborn.client.reserveSlots.rackaware.enabled	false	false	Whether to place different replicas on different racks when allocating slots.	0.3.1	celeborn.client.reserveSlots.rackware.enabled
celeborn.client.reserveSlots.retryWait	3s	false	Wait time before next retry if reserve slots failed.	0.3.0	celeborn.slots.reserve.retryWait
celeborn.client.rpc.cache.concurrencyLevel	32	false	The number of write locks to update rpc cache.	0.3.0	celeborn.rpc.cache.concurrencyLevel
celeborn.client.rpc.cache.expireTime	15s	false	The time before a cache item is removed.	0.3.0	celeborn.rpc.cache.expireTime
celeborn.client.rpc.cache.size	256	false	The max cache items count for rpc cache.	0.3.0	celeborn.rpc.cache.size
celeborn.client.rpc.commitFiles.askTimeout	<value of celeborn.rpc.askTimeout>	false	Timeout for CommitHandler commit files.	0.4.1
celeborn.client.rpc.getReducerFileGroup.askTimeout	<value of celeborn.rpc.askTimeout>	false	Timeout for ask operations while getting reducer file group information. During this process, committing files may be retried up to `celeborn.client.requestCommitFiles.maxRetries` times, and one request is made to release slots. Users can customize this value according to their settings.	0.2.0
celeborn.client.rpc.maxRetries	3	false	Max RPC retry times in client.	0.3.2
celeborn.client.rpc.registerShuffle.askTimeout	<value of celeborn.rpc.askTimeout>	false	Timeout for ask operations during register shuffle. During this process, requesting slots may be retried up to two times, one request is made to establish a connection with the Worker, and reserving slots may be retried up to `celeborn.client.reserveSlots.maxRetries` times. Users can customize this value according to their settings.	0.3.0	celeborn.rpc.registerShuffle.askTimeout
celeborn.client.rpc.requestPartition.askTimeout	<value of celeborn.rpc.askTimeout>	false	Timeout for ask operations when requesting a change of partition location, such as reviving or splitting a partition. During this process, reserving slots may be retried up to `celeborn.client.reserveSlots.maxRetries` times. Users can customize this value according to their settings.	0.2.0
celeborn.client.rpc.reserveSlots.askTimeout	<value of celeborn.rpc.askTimeout>	false	Timeout for LifecycleManager request reserve slots.	0.3.0
celeborn.client.rpc.retryWait	1s	false	Client-specified time to wait before next retry on RpcTimeoutException.	0.5.4
celeborn.client.rpc.shared.threads	16	false	Number of shared rpc threads in LifecycleManager.	0.3.2
celeborn.client.shuffle.batchHandleChangePartition.interval	100ms	false	Interval for LifecycleManager to schedule handling change partition requests in batch.	0.3.0	celeborn.shuffle.batchHandleChangePartition.interval
celeborn.client.shuffle.batchHandleChangePartition.partitionBuckets	256	false	Max number of change partition requests which can be concurrently processed.	0.5.0
celeborn.client.shuffle.batchHandleChangePartition.threads	8	false	Threads number for LifecycleManager to handle change partition request in batch.	0.3.0	celeborn.shuffle.batchHandleChangePartition.threads
celeborn.client.shuffle.batchHandleCommitPartition.interval	5s	false	Interval for LifecycleManager to schedule handling commit partition requests in batch.	0.3.0	celeborn.shuffle.batchHandleCommitPartition.interval
celeborn.client.shuffle.batchHandleCommitPartition.threads	8	false	Threads number for LifecycleManager to handle commit partition request in batch.	0.3.0	celeborn.shuffle.batchHandleCommitPartition.threads
celeborn.client.shuffle.batchHandleReleasePartition.interval	5s	false	Interval for LifecycleManager to schedule handling release partition requests in batch.	0.3.0
celeborn.client.shuffle.batchHandleReleasePartition.threads	8	false	Threads number for LifecycleManager to handle release partition request in batch.	0.3.0
celeborn.client.shuffle.batchHandleRemoveExpiredShuffles.enabled	false	false	Whether to batch remove expired shuffles. This is an optimization switch on removing expired shuffles.	0.6.0
celeborn.client.shuffle.checkWorker.enabled	true	false	When true, before registering a shuffle, LifecycleManager checks whether the current cluster has available workers. If the cluster has no available workers, it falls back to the default shuffle.	0.5.0	celeborn.client.spark.shuffle.checkWorker.enabled
celeborn.client.shuffle.compression.codec	LZ4	false	The codec used to compress shuffle data. By default, Celeborn provides three codecs: `lz4`, `zstd`, `none`. `none` means that shuffle compression is disabled. Since Flink version 1.16, zstd is supported for Flink shuffle client.	0.3.0	celeborn.shuffle.compression.codec,remote-shuffle.job.compression.codec
celeborn.client.shuffle.compression.zstd.level	1	false	Compression level for Zstd compression codec, its value should be an integer between -5 and 22. Increasing the compression level will result in better compression at the expense of more CPU and memory.	0.3.0	celeborn.shuffle.compression.zstd.level
celeborn.client.shuffle.decompression.lz4.xxhash.instance	<undefined>	false	Decompression XXHash instance for Lz4. Available options: JNI, JAVASAFE, JAVAUNSAFE.	0.3.2
celeborn.client.shuffle.dynamicResourceEnabled	false	false	When enabled, the ChangePartitionManager will obtain candidate workers from the availableWorkers pool during heartbeats when worker resources change.	0.6.0
celeborn.client.shuffle.dynamicResourceFactor	0.5	false	The ChangePartitionManager will check whether (unavailable workers / shuffle allocated workers) is more than the factor before obtaining candidate workers from the requestSlots RPC response when `celeborn.client.shuffle.dynamicResourceEnabled` is set to true.	0.6.0
celeborn.client.shuffle.expired.checkInterval	60s	false	Interval for client to check expired shuffles.	0.3.0	celeborn.shuffle.expired.checkInterval
celeborn.client.shuffle.manager.port	0	false	Port used by the LifecycleManager on the Driver.	0.3.0	celeborn.shuffle.manager.port
celeborn.client.shuffle.partition.type	REDUCE	false	Type of shuffle's partition.	0.3.0	celeborn.shuffle.partition.type
celeborn.client.shuffle.partitionSplit.mode	SOFT	false	soft: the shuffle file size might be larger than split threshold. hard: the shuffle file size will be limited to split threshold.	0.3.0	celeborn.shuffle.partitionSplit.mode
celeborn.client.shuffle.partitionSplit.threshold	1G	false	Shuffle file size threshold, if file size exceeds this, trigger split.	0.3.0	celeborn.shuffle.partitionSplit.threshold
celeborn.client.shuffle.rangeReadFilter.enabled	false	false	If a Spark application has skewed partitions, this value can be set to true to improve performance.	0.2.0	celeborn.shuffle.rangeReadFilter.enabled
celeborn.client.shuffle.register.filterExcludedWorker.enabled	false	false	Whether to filter excluded worker when register shuffle.	0.4.0
celeborn.client.shuffle.reviseLostShuffles.enabled	false	false	Whether to revise lost shuffles.	0.6.0
celeborn.client.slot.assign.maxWorkers	10000	false	Max workers that slots of one shuffle can be allocated on. Will choose the smaller positive one from Master side and Client side, see `celeborn.master.slot.assign.maxWorkers`.	0.3.1
celeborn.client.spark.fetch.cleanFailedShuffle	false	false	Whether to clean up the disk space occupied by shuffles which cannot be fetched.	0.6.0
celeborn.client.spark.fetch.cleanFailedShuffleInterval	1s	false	The interval at which to clean the failed-to-fetch shuffle files. Only valid when celeborn.client.spark.fetch.cleanFailedShuffle is enabled.	0.6.0
celeborn.client.spark.push.dynamicWriteMode.enabled	false	false	Whether to dynamically switch push write mode based on conditions. If true, shuffle mode will be determined only by partition count.	0.5.0
celeborn.client.spark.push.dynamicWriteMode.partitionNum.threshold	2000	false	Threshold of shuffle partition number for dynamically switching push writer mode. When the shuffle partition number is greater than this value, use the sort-based shuffle writer for memory efficiency; otherwise use the hash-based shuffle writer for speed. This configuration only takes effect when celeborn.client.spark.push.dynamicWriteMode.enabled is true.	0.5.0
celeborn.client.spark.push.sort.memory.maxMemoryFactor	0.4	false	The max portion of executor memory which can be used for the SortBasedWriter buffer (only valid when celeborn.client.spark.push.sort.memory.useAdaptiveThreshold is enabled).	0.5.0
celeborn.client.spark.push.sort.memory.smallPushTolerateFactor	0.2	false	Only takes effect when celeborn.client.spark.push.sort.memory.useAdaptiveThreshold is turned on. The larger this value is, the more aggressively Celeborn will enlarge the sort-based shuffle writer's memory threshold. Specifically, this config controls when to enlarge the sort shuffle writer's memory threshold. With N bytes of data in memory and V as the value of this config, if the number of pushes C when using the sort-based shuffle writer satisfies C >= (1 + V) * C', where C' is the number of pushes if we were using the hash-based writer, we will enlarge the memory threshold by 2X.	0.5.0
celeborn.client.spark.push.sort.memory.threshold	64m	false	When SortBasedPusher uses memory over the threshold, it will trigger a data push.	0.3.0	celeborn.push.sortMemory.threshold
celeborn.client.spark.push.sort.memory.useAdaptiveThreshold	false	false	Adaptively adjust sort-based shuffle writer's memory threshold	0.5.0
celeborn.client.spark.push.unsafeRow.fastWrite.enabled	true	false	This is Celeborn's optimization on UnsafeRow for Spark and it's true by default. If you have changed UnsafeRow's memory layout set this to false.	0.2.2
celeborn.client.spark.shuffle.fallback.numPartitionsThreshold	2147483647	false	Celeborn will only accept shuffle of partition number lower than this configuration value. This configuration only takes effect when `celeborn.client.spark.shuffle.fallback.policy` is `AUTO`.	0.5.0	celeborn.shuffle.forceFallback.numPartitionsThreshold,celeborn.client.spark.shuffle.forceFallback.numPartitionsThreshold
celeborn.client.spark.shuffle.fallback.policy	AUTO	false	Celeborn supports the following kinds of fallback policies. 1. ALWAYS: always use the Spark built-in shuffle implementation; 2. AUTO: prefer the Celeborn shuffle implementation, and fall back to the Spark built-in shuffle implementation based on certain factors, e.g. availability of enough workers and quota, shuffle partition number; 3. NEVER: always use the Celeborn shuffle implementation, and fail fast when it is concluded that fallback is required based on the factors above.	0.5.0
celeborn.client.spark.shuffle.forceFallback.enabled	false	false	Always use spark built-in shuffle implementation. This configuration is deprecated, consider configuring `celeborn.client.spark.shuffle.fallback.policy` instead.	0.3.0	celeborn.shuffle.forceFallback.enabled
celeborn.client.spark.shuffle.getReducerFileGroup.broadcast.enabled	false	false	Whether to leverage Spark broadcast mechanism to send the GetReducerFileGroupResponse. If the response size is large and Spark executor number is large, the Spark driver network may be exhausted because each executor will pull the response from the driver. With broadcasting GetReducerFileGroupResponse, it prevents the driver from being the bottleneck in sending out multiple copies of the GetReducerFileGroupResponse (one per executor).	0.6.0
celeborn.client.spark.shuffle.getReducerFileGroup.broadcast.miniSize	512k	false	The size at which we use Broadcast to send the GetReducerFileGroupResponse to the executors.	0.6.0
celeborn.client.spark.shuffle.writer	HASH	false	Celeborn supports the following kinds of shuffle writers. 1. hash: the hash-based shuffle writer works fine when the shuffle partition count is normal; 2. sort: the sort-based shuffle writer works fine when memory pressure is high or the shuffle partition count is huge. This configuration only takes effect when celeborn.client.spark.push.dynamicWriteMode.enabled is false.	0.3.0	celeborn.shuffle.writer
celeborn.client.spark.stageRerun.enabled	true	false	Whether to enable stage rerun. If true, client throws FetchFailedException instead of CelebornIOException.	0.4.0	celeborn.client.spark.fetch.throwsFetchFailure
celeborn.identity.provider	org.apache.celeborn.common.identity.DefaultIdentityProvider	false	IdentityProvider class name. Default class is `org.apache.celeborn.common.identity.DefaultIdentityProvider`. Optional values: org.apache.celeborn.common.identity.HadoopBasedIdentityProvider user name will be obtained by UserGroupInformation.getUserName; org.apache.celeborn.common.identity.DefaultIdentityProvider user name and tenant id are default values or user-specific values.	0.6.0	celeborn.quota.identity.provider
celeborn.identity.user-specific.tenant	default	false	Tenant id if celeborn.identity.provider is org.apache.celeborn.common.identity.DefaultIdentityProvider.	0.6.0	celeborn.quota.identity.user-specific.tenant
celeborn.identity.user-specific.userName	default	false	User name if celeborn.identity.provider is org.apache.celeborn.common.identity.DefaultIdentityProvider.	0.6.0	celeborn.quota.identity.user-specific.userName
celeborn.master.endpoints	<localhost>:9097	false	Endpoints of master nodes for Celeborn clients to connect to. The client uses the resolver provided by celeborn.master.endpoints.resolver to resolve the master endpoints. By default Celeborn uses `org.apache.celeborn.common.client.StaticMasterEndpointResolver`, which takes static master endpoints as input. Allowed pattern: `<host1>:<port1>[,<host2>:<port2>]*`, e.g. `clb1:9097,clb2:9098,clb3:9099`. If the port is omitted, 9097 will be used. If the master endpoints are not static, users can pass a custom resolver implementation to discover master endpoints actively using celeborn.master.endpoints.resolver.	0.2.0
celeborn.master.endpoints.resolver	org.apache.celeborn.common.client.StaticMasterEndpointResolver	false	Resolver class that can be used for discovering and updating the master endpoints. This allows users to provide a custom master endpoint resolver implementation. This is useful in environments where the master nodes might change due to scaling operations or infrastructure updates. Clients need to ensure that the provided resolver class is present in the classpath.	0.5.2
celeborn.quota.enabled	true	false	When set to true on the Master side, the master will check the quota via QuotaManager. When set to true on the Client side, LifecycleManager will request the Master to check whether the current user has enough quota before registering a shuffle. Falls back to the default shuffle service when the Master finds that there is not enough quota for the current user.	0.2.0
celeborn.quota.interruptShuffle.enabled	false	false	Whether to enable interrupt shuffle when quota exceeds.	0.6.0
celeborn.storage.availableTypes	HDD	false	Enabled storages. Available options: MEMORY,HDD,SSD,HDFS,S3,OSS. Note: HDD and SSD would be treated as identical.	0.3.0	celeborn.storage.activeTypes
celeborn.storage.hdfs.dir	<undefined>	false	HDFS base directory for Celeborn to store shuffle data.	0.2.0
celeborn.storage.oss.access.key	<undefined>	false	OSS access key for Celeborn to store shuffle data.	0.6.0
celeborn.storage.oss.dir	<undefined>	false	OSS base directory for Celeborn to store shuffle data.	0.6.0
celeborn.storage.oss.endpoint	<undefined>	false	OSS endpoint for Celeborn to store shuffle data.	0.6.0
celeborn.storage.oss.ignore.credentials	true	false	Whether to skip OSS credentials. Disable this config to support the Jindo SDK.	0.6.0
celeborn.storage.oss.secret.key	<undefined>	false	OSS secret key for Celeborn to store shuffle data.	0.6.0
celeborn.storage.s3.dir	<undefined>	false	S3 base directory for Celeborn to store shuffle data.	0.6.0
celeborn.storage.s3.endpoint.region	<undefined>	false	S3 endpoint for Celeborn to store shuffle data.	0.6.0
celeborn.tags.tagsExpr		true	Expression to filter workers by tags. The expression is a comma-separated list of tags. The expression is evaluated as a logical AND of all tags. For example, `prod,high-io` filters workers that have both the `prod` and `high-io` tags.	0.6.0
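
Several descriptions above state worst-case memory bounds as a product of `celeborn.client.push.buffer.max.size` and a count (reducers, in-flight requests, queue capacity). The sketch below reproduces that arithmetic with the table's default values so the effect of changing a setting can be estimated. It is only an illustration: `parse_size` and `push_memory_bounds` are hypothetical helpers, not part of Celeborn, and the size parser handles only the `k`, `m`, and `g` suffixes used in this table.

```python
KIB = 1024
MIB = 1024 * KIB

# Hypothetical helper: parses sizes such as "64k" or "32m" as binary units.
# Celeborn's own parser is not reproduced here.
def parse_size(text):
    units = {"k": KIB, "m": MIB, "g": 1024 * MIB}
    text = text.strip().lower()
    if text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

# Hypothetical helper: worst-case bounds in bytes, assuming a compression
# ratio of 1 as the descriptions above do. Defaults mirror the table.
def push_memory_bounds(buffer_max="64k", per_worker=32, total=256,
                       queue_capacity=512, reducers=2000):
    buf = parse_size(buffer_max)
    return {
        # celeborn.client.push.buffer.max.size * number of reducers
        "hash_writer_buffers": buf * reducers,
        # ... * celeborn.client.push.maxReqsInFlight.perWorker
        "in_flight_per_worker": buf * per_worker,
        # ... * celeborn.client.push.maxReqsInFlight.total
        "in_flight_total": buf * total,
        # ... * celeborn.client.push.queue.capacity
        "push_queue": buf * queue_capacity,
    }

if __name__ == "__main__":
    for name, size in push_memory_bounds().items():
        print(f"{name}: {size / MIB:g} MiB")
```

With the defaults, the results match the figures quoted in the descriptions: 125 MiB for the hash writer buffers with 2000 reducers, 2 MiB in flight per worker, 16 MiB in flight in total, and 32 MiB for the push queue.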