Releases: NVIDIA/spark-rapids-tools

v24.04.0

07 May 21:20

Changes

User Tools

  • [FEA] Add CLI to run prediction on estimation_model (#961)
  • Add SHAP prediction values as a new output file (#982)
  • Update build docs to clarify that the build should run in a virtual environment (#976)

Core

  • [BUG] Catch Profiler error when app info is empty (#994)
  • Get stages from sqlId for collecting info for output writer functions (#996)
  • Account for job overhead time in qualification tool estimation (#992)
  • [Followup] Fix handling of clusterTags and SparkVersion in Q/P Tools (#993)
  • Fix handling of clusterTags and SparkVersion in Q/P Tools (#991)
  • Refactor AppBase to use common AppMetaData between Q/P tools (#983)
  • Refactor Stage info code between Q/P tools (#971)

v24.02.4

30 Apr 17:07

Changes

User Tools

  • Fix Hadoop Azure version to be compatible with Spark-3.5.0 (#975)
  • Add speedup categories in qualification summary output (#958)
  • Improve cluster node initialisation for CSPs (#964)

Core

  • Remove databricks profiling recommendation for dynamicFilePruning (#972)
  • Add AQEShuffleRead WriteFiles execs to the supportedOps and score files (#963)
  • [FEA] Automate appending new operators to the platform score sheets (#954)
  • Add support for InSubqueryExec Expression (#960)

Miscellaneous

  • Bump dev version to 24.02.4 (#968)
  • Revert versions back to 24.02.3 (#967)

v24.02.3

24 Apr 17:56

Changes

User Tools

  • Cache CLI calls for node instance description (#952)
  • Improve error handling in prediction code (#950)
  • Support dynamic calculation of JVM resources in CLI cmd (#944)
  • Syncup estimation model prediction logic updates (#946)
  • Cluster inference should not run for unsupported platform (#941)
  • Fix invalid values in cluster creation script (#935)
  • Fix core tool doc links and user qualification tool default argument values (#931)
  • Fix gpu cluster recommendation in user tools (#930)
  • Bump idna from 3.4 to 3.7 in /data_validation (#932)
  • Add cluster details in qualification summary output (#921)
  • Refactor find_matches_for_node return values (#920)
  • [FEA] Add and use g5 AWS instances as default for qualification tool output (#898)
  • Add jar argument to spark_rapids CLI (#902)
  • Support driverlog argument in profiler CLI (#897)

Core

  • Followups on handling Photon eventlogs (#953)
  • Sync operators support timestamped 24-04-16 (#951)
  • Add CheckOverflowInTableInsert support: verify absence from physical plan (#942)
  • Fix Notes column in the supported ops CSV files (#933)
  • Improve sync plugin supported CSV python script (#919)
  • Add cluster details in qualification summary output (#921)
  • Add support for unsupported expressions reasons per Exec (#923)
  • Adding more metrics and options for qual validation (#926)
  • Generate cluster details in JSON output (#912)
  • Add Divide and multiple interval expressions as supported (#917)
  • Add support for PythonMapInArrowExec and MapInArrowExec (#913)
  • Re-enable support for GetJsonObject by default (#916)
  • Add support for WindowGroupLimitExec (#906)
  • [FEA] Skip Spark Structured Streaming event logs for Qualification tool (#905)
  • [FEA] Add and use g5 AWS instances as default for qualification tool output (#898)
  • Initial version of qual tool validation script for classification metrics (#903)
  • Fix Delta-core dependency for Spark35+ (#904)
  • Add support for AtomicCreateTableAsSelectExec (#895)
  • Add support for KnownNullable and EphemeralSubstring expressions (#894)
  • Add Support for BloomFilterAggregate and BloomFilterMightContain exprs (#891)
  • [DOC] Update README for sync plugin supported ops script (#893)
  • Add operators to ignore list and update WindowExpr parser (#890)
  • Add support for RoundCeil and RoundFloor expressions (#889)

v24.02.2

27 Mar 20:55

Changes

User Tools

  • Override estimated speedups when estimation model is enabled (#885)
  • [FEA] Make top candidates view as the default view in user-tools (#879)
  • Introduce new csv file containing output for all apps before grouping (#875)
  • Fix calculation of unsupported operators stages duration and update output row (#874)
  • Implement top candidate filter for user tools CLI output (#866)

Core

  • [FEA] Skip Databricks Photon jobs at app level in Qualification tool (#886)
  • [FEA] Add Estimation Model to Qualification CLI (#870)
  • Add rootExecutionID to output csv files (#871)
  • [FEA] Generate updated supported CSV files from plugin repo (#847)
  • Add action column to qual execs output (#859)
  • Extend supportLevels in PluginTypeChecker (#863)
  • Propagate Reason/Notes for operators disabled by default from plugin to Qualification tool unsupported operators csv file (#850)

Miscellaneous

  • Bump default Spark-version to 3.5.0 (#877)
  • Update GitHub Actions version (#876)

v24.02.1

15 Mar 01:10

Changes

User Tools

  • Remove redundant initialization scripts from user tools output (#830)
  • [DOC] Update Databricks Azure user tool setup instructions for output format (#826)
  • Estimate cluster instances and generate cost savings (#803)

Core

  • Fix implementation of processSQLPlanMetrics in Profiler (#853)
  • Deduplicate SQL duration wallclock time for databricks eventlog (#810)
  • Consider additional factors in spark.sql.shuffle.partitions recommendation in Autotuner (#722)
  • Fix case-matching error in AutoTuner (#828)
  • Fix ReadSchema in Qualification tool and NPE in Profiling tool (#825)
  • AutoTuner does not process arguments skipList and limitedLogic (#812)

v24.02.0

24 Feb 20:42

Changes

User Tools

  • Fix missing config file for Dataproc GKE (#778)
  • [FEA] Qualification user_tools runs AutoTuner by default (#771)
  • [BUG] Fix databricks-aws user profiling tool error with --gpu_cluster argument (#707)

Core

  • [FEA] Qualification tool should mark WriteIntoDeltaCommand as supported (#801)
  • Qualification tool should mark SubqueryExec as IgnoreNoPerf (#798)
  • Generate cluster information from event logs in Qualification tool (#789)
  • Sync up supported ops for 24.02 plugin release (#796)
  • Qualification should mark empty2null as supported (#791)
  • Incorrect parsing of aggregates in DB queries (#790)
  • Qualification should mark WriteFiles as supported (#784)
  • Introduce GpuDevice abstraction and refactor AutoTuner (#740)
  • Consolidate unsupportedOperators into a single view (#766)
  • Speedup generator script fails after adding runtime_properties (#776)
  • Tools fail on DB10.4 clusters with IllegalArgException (#768)
  • Fix SparkPlanGraphCluster constructor for DB Platforms (#765)
  • Amendment to PR-763 (#764)
  • Fix SQLPlanMetric constructor for DB Platforms (#763)
  • Fix node constructor for DB platforms (#761)
  • Add penalty for stages with UDFs (#757)
  • Add support for appendDataExecV1 and overwriteByExprExecV1 (#756)
  • Qualification fails to detect sortMergeJoin with arguments (#754)
  • Fix Qualification crash during aggregation of stats (#753)
  • [FEA] Extend the list of operators to be ignored in Qualification (#745)
  • Remove ReusedSubquery from SparkPlanGraph construction (#741)
  • Update unsupported operator csv file's app duration column (#748)
  • [FEA] Qualification tool triggers the AutoTuner module (#739)
  • Disable support of GetJsonObject in Qualification tool (#737)
  • [FEA] AutoTuner warns that non-utf8 may not support some GPU expressions (#736)
  • [FEA] AutoTuner should not skip non-gpu eventlogs (#728)

Miscellaneous

  • Add auto-copyright for precommits (#732)

v23.12.3

12 Jan 21:20

Changes

Core

  • Add support of HiveTableScan and InsertIntoHive text-format (#723)
  • Fix compilation error with JDK11 (#720)
  • Generate an output file with runtime and build information (#705)
  • AutoTuner should poll maven-meta to retrieve the latest jar version (#711)
  • Profiling tool throws NPE when appInfo is null and unchecked (#640)
  • Add support for parse_url host and protocol (#708)
  • [FEA] Profiling tool auto-tuner should consider spark.databricks.adaptive.autoOptimizeShuffle.enabled (#710)
  • [FEA] Profiler autotuner should only specify standard Spark versions for shuffle manager setting (#662)

Miscellaneous

  • [FEA] Enable AQE related recommendations in Profiler Auto-tuner (#688)

v23.12.2

27 Dec 23:34

Changes

User Tools

  • Polling maven-metadata.xml to pull the latest tools jar (#703)

Core

  • Update pom to fail on warnings (#701)

v23.12.1

23 Dec 19:24

v23.12.0

20 Dec 18:55

Changes

User Tools

  • Fix user qualification tool runtime error in get_platform_name for onprem platform (#684)
  • [FEA] User tool should pass --platform option/argument to Profiling tool (#679)
  • Fix incorrect processing of short flags for user tools cli (#677)
  • Updating new CLI name from ascli to spark_rapids (#673)
  • Bump pyarrow version (#664)
  • Improve new CLI testing ensuring complete coverage of arguments cases (#652)

Core

  • Qualification tool: Add more information for unsupported operators (#680)
  • Sync Execs and Expressions from spark-rapids resources (#691)
  • Support parsing of inprogress eventlogs (#686)
  • Enable features via config that are off by default in the profiler AutoTuner (#668)
  • Fix platform names as string constants and reduce redundancy in unit tests (#667)
  • Unified platform handling and fetching of operator score files (#661)
  • Qualification tool: Ignore some of the unsupported Execs from output (#665)

Miscellaneous

  • Add markdown link checker (#672)