Release Notes - Spark - Version 2.3.4

Bug

  • [SPARK-21882] - OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function
  • [SPARK-23408] - Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
  • [SPARK-23416] - Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false
  • [SPARK-24211] - Flaky test: StreamingOuterJoinSuite
  • [SPARK-24239] - Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
  • [SPARK-24669] - Managed table was not cleared of path after drop database cascade
  • [SPARK-24935] - Problem with Executing Hive UDFs from Spark 2.2 Onwards
  • [SPARK-25139] - PythonRunner#WriterThread released block after TaskRunner finally block, which invokes BlockManager#releaseAllLocksForTask
  • [SPARK-25863] - java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
  • [SPARK-26082] - Misnaming of spark.mesos.fetch(er)Cache.enable in MesosClusterScheduler
  • [SPARK-26572] - Join on distinct column with monotonically_increasing_id produces wrong output
  • [SPARK-26606] - parameters passed in extraJavaOptions are not being picked up
  • [SPARK-26734] - StackOverflowError on WAL serialization caused by large receivedBlockQueue
  • [SPARK-26758] - Idle Executors are not getting killed after spark.dynamicAllocation.executorIdleTimeout value
  • [SPARK-26859] - Fix field writer index bug in non-vectorized ORC deserializer
  • [SPARK-26873] - FileFormatWriter creates inconsistent MR job IDs
  • [SPARK-26895] - When running Spark 2.3 as a proxy user (--proxy-user), SparkSubmit fails to resolve globs owned by target user
  • [SPARK-26927] - Race condition may cause dynamic allocation to stop working
  • [SPARK-26950] - Make RandomDataGenerator use Float.NaN or Double.NaN for all NaN values
  • [SPARK-26961] - Found Java-level deadlock in Spark Driver
  • [SPARK-26998] - spark.ssl.keyStorePassword in plaintext on 'ps -ef' output of executor processes in Standalone mode
  • [SPARK-27018] - Checkpointed RDD deleted prematurely when using GBTClassifier
  • [SPARK-27065] - Avoid more than one active task set manager for a stage
  • [SPARK-27080] - Read parquet file with merging metastore schema should compare schema field in uniform case.
  • [SPARK-27111] - A continuous query may fail with InterruptedException when Kafka consumer temporarily has 0 partitions
  • [SPARK-27112] - Spark Scheduler encounters two independent Deadlocks when trying to kill executors either due to dynamic allocation or blacklisting
  • [SPARK-27160] - Incorrect Literal Casting of DecimalType in OrcFilters
  • [SPARK-27216] - Upgrade RoaringBitmap to 0.7.45 to fix Kryo unsafe ser/dser issue
  • [SPARK-27244] - Redact Passwords While Using Option logConf=true
  • [SPARK-27275] - Potential corruption in EncryptedMessage.transferTo
  • [SPARK-27301] - DStreamCheckpointData failed to clean up because its FileSystem is cached
  • [SPARK-27338] - Deadlock between TaskMemoryManager and UnsafeExternalSorter$SpillableIterator
  • [SPARK-27347] - Fix supervised driver retry logic when agent crashes/restarts
  • [SPARK-27496] - RPC should send back the fatal errors
  • [SPARK-27577] - Wrong thresholds selected by BinaryClassificationMetrics when downsampling
  • [SPARK-27621] - Calling transform() method on a LinearRegressionModel throws NoSuchElementException
  • [SPARK-27624] - Fix CalendarInterval to show an empty interval correctly
  • [SPARK-27626] - Fix `docker-image-tool.sh` to be robust in non-bash shell env
  • [SPARK-27735] - Interval string in upper case is not supported in Trigger
  • [SPARK-27798] - ConvertToLocalRelation should tolerate expression reusing output object
  • [SPARK-27869] - Redact sensitive information in System Properties from UI
  • [SPARK-27907] - HiveUDAF should return NULL in case of 0 rows
  • [SPARK-28081] - word2vec 'large' count value too low for very large corpora
  • [SPARK-28156] - Join plan sometimes does not use cached query
  • [SPARK-28157] - Make SHS clear KVStore LogInfo for the blacklisted entries
  • [SPARK-28160] - TransportClient.sendRpcSync may hang forever
  • [SPARK-28164] - usage description does not match with shell scripts
  • [SPARK-28302] - SparkLauncher: The process cannot access the file because it is being used by another process
  • [SPARK-28308] - CalendarInterval sub-second part should be padded before parsing
  • [SPARK-28404] - Fix negative timeout value in RateStreamContinuousPartitionReader
  • [SPARK-28430] - Some stage table rows render wrong number of columns if tasks are missing metrics
  • [SPARK-28582] - PySpark daemon fails to exit when receiving SIGTERM on Python 3.7
  • [SPARK-28699] - Caching an indeterminate RDD could lead to incorrect results when a stage is rerun
  • [SPARK-28766] - Fix CRAN incoming feasibility warning on invalid URL
  • [SPARK-28775] - DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database
  • [SPARK-28780] - Delete the incorrect setWeightCol method in LinearSVCModel
  • [SPARK-28844] - Fix typo in SQLConf FILE_COMRESSION_FACTOR
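
As an illustration of the padding issue named in SPARK-28308 above: the digits after the decimal point of a seconds value only represent nanoseconds once they are right-padded to nine digits, so "5" (from "0.5") should become 500,000,000 rather than 5. The following is a minimal standalone sketch of that idea in Python, not Spark's implementation; the function name `fraction_to_nanos` is hypothetical.

```python
def fraction_to_nanos(fraction: str) -> int:
    """Convert the digits after the decimal point of a seconds value to nanoseconds.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # Accept only 1 to 9 ASCII digits; anything else cannot be a nanosecond fraction.
    if not (fraction.isascii() and fraction.isdigit()) or len(fraction) > 9:
        raise ValueError(f"invalid sub-second fraction: {fraction!r}")
    # Right-pad with zeros to nine digits so that "5" means 0.500000000 s.
    return int(fraction.ljust(9, "0"))


unpadded = int("5")                # 5: what parsing without padding would yield
padded = fraction_to_nanos("5")    # 500000000: the intended nanosecond count
print(unpadded, padded)
```

Parsing the unpadded string is only correct when all nine digits are already present, which is why padding has to happen before the integer conversion.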

Improvement

  • [SPARK-24898] - Adding spark.checkpoint.compress to the docs
  • [SPARK-26604] - Register channel for stream request
  • [SPARK-27358] - Update jquery to 1.12.x to pick up security fixes
  • [SPARK-27563] - automatically get the latest Spark versions in HiveExternalCatalogVersionsSuite
  • [SPARK-27672] - Add since info to string expressions
  • [SPARK-27673] - Add since info to random, regex, null expressions
  • [SPARK-27771] - Add SQL description for grouping functions (cube, rollup, grouping and grouping_id)
  • [SPARK-28545] - Add the hash map size to the directional log of ObjectAggregationIterator
  • [SPARK-28891] - do-release-docker.sh in master does not work for branch-2.3

Test

  • [SPARK-24352] - Flaky test: StandaloneDynamicAllocationSuite
  • [SPARK-28261] - Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
  • [SPARK-28335] - Flaky test: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.offset recovery from kafka
  • [SPARK-28357] - Fix Flaky Test - FileAppenderSuite.rolling file appender - size-based rolling compressed
  • [SPARK-28361] - Test equality of generated code with id in class name
  • [SPARK-28418] - Flaky Test: pyspark.sql.tests.test_dataframe: test_query_execution_listener_on_collect
  • [SPARK-28535] - Flaky test: JobCancellationSuite."interruptible iterator of shuffle reader"

Task

  • [SPARK-26897] - Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

Documentation

  • [SPARK-27800] - Example for xor function has a wrong answer
  • [SPARK-28777] - Pyspark sql function "format_string" has the wrong parameters in doc string
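
As a quick arithmetic check related to SPARK-27800, bitwise XOR sets each result bit where exactly one of the two operands has that bit set. The sketch below uses Python's `^` operator, which performs the same integer bitwise XOR; it does not call Spark, and the operands 3 and 5 are assumed here purely for illustration.

```python
# Operands are illustrative assumptions, not taken from the release notes.
a, b = 3, 5

# 011 XOR 101 = 110: a bit is set where exactly one operand has it set.
result = a ^ b

print(f"{a:03b} XOR {b:03b} = {result:03b} ({result})")  # prints "011 XOR 101 = 110 (6)"
```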
