
Replace DispatchQueue and BookingQueue with HealthyThreadPool #1035

Merged

Conversation

DiegoTavares
Collaborator

Replace DispatchQueue and BookingQueue with HealthyThreadPool

Queues will no longer inherit from ThreadPoolExecutor; instead, they will manage an instance of HealthyThreadPool, a ThreadPoolExecutor that handles health checks, termination, and repeated tasks. With this, the BookingQueue should be able to self-heal when locked threads occur.
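
For reference, a minimal sketch of the delegation pattern described above, in plain Java. The class, fields, and health heuristic here are illustrative assumptions, not the actual Cuebot implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: the queue wraps (rather than extends) an executor and
// runs a periodic health check against it.
public class HealthyThreadPoolSketch {
    private final ThreadPoolExecutor pool;
    private final ScheduledExecutorService monitor =
            Executors.newSingleThreadScheduledExecutor();
    private final int healthThreshold;
    private long lastCompleted = 0;

    public HealthyThreadPoolSketch(int coreSize, int maxSize, int queueCapacity,
                                   int healthThreshold) {
        this.pool = new ThreadPoolExecutor(coreSize, maxSize,
                10, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueCapacity));
        this.healthThreshold = healthThreshold;
        // Periodic health check; a real implementation could also rebuild the
        // pool or cancel stuck tasks when it is found unhealthy.
        monitor.scheduleAtFixedRate(this::healthCheck, 1, 1, TimeUnit.MINUTES);
    }

    public void execute(Runnable task) {
        pool.execute(task);
    }

    private void healthCheck() {
        long completed = pool.getCompletedTaskCount();
        long progress = completed - lastCompleted;
        lastCompleted = completed;
        // Heuristic: work is queued but almost nothing completed since the
        // last check, so the worker threads are probably stuck.
        if (!pool.getQueue().isEmpty() && progress < healthThreshold) {
            System.err.println("Pool looks unhealthy: queued=" + pool.getQueue().size()
                    + ", completed since last check=" + progress);
        }
    }

    public void shutdown() {
        monitor.shutdownNow();
        pool.shutdown();
    }
}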

DiegoTavares and others added 9 commits September 30, 2020 09:51
Sorting jobs only by priority causes a situation where low-priority jobs can get starved by a constant flow of high-priority jobs.
The new formula adds a modifier to the sorting rank that takes into account the number of cores the job is requesting and the number of days the job has been waiting in the queue.
Priority values over 200 will mostly override the formula and behave as priority-only scheduling.
sort = priority + (100 * (1 - (job.cores/job.int_min_cores))) + (age in days)

Besides that, also take layer_int_cores_min into account when filtering folder_resourse limitations, to avoid allocating more cores than the folder limits allow.

(cherry picked from commit 566411aeeddc60983a30eabe121fd03263d05525)
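
For illustration, a worked instance of the formula above, assuming job.cores is the number of cores currently allocated to the job and job.int_min_cores is its requested minimum (all numbers are hypothetical):

// sort = priority + (100 * (1 - (job.cores / job.int_min_cores))) + (age in days)
// A job with priority 50, running 20 of its 80 minimum cores, waiting 2 days:
double sort = 50 + (100 * (1 - 20.0 / 80.0)) + 2;  // 50 + 75 + 2 = 127.0
// A new high-priority job (priority 90, 0 of 80 cores, 0 days) ranks 90 + 100 + 0 = 190
// and still wins, but the starved job keeps gaining roughly one point per waiting day.
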
Queues will no longer inherit from ThreadPoolExecutor; instead, they will manage an instance of HealthyThreadPool, a ThreadPoolExecutor that handles health checks, termination, and repeated tasks. With this, the BookingQueue should be able to self-heal when locked threads occur.
@DiegoTavares
Collaborator Author

I still have some merge issues causing test errors that I need to work on.

Use a guava cache to store only the last version of a HostReport per host.
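
A minimal sketch of that approach with a Guava cache, keeping at most one pending report per host so a newer report simply replaces the older one. The class name and the raw byte[] payload are illustrative, not the actual HostReportQueue code:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

// Illustrative only: one pending entry per host, newest report wins.
public class HostReportCacheSketch {
    private final Cache<String, byte[]> reports = CacheBuilder.newBuilder()
            .expireAfterWrite(10, TimeUnit.MINUTES)  // drop stale entries (example TTL)
            .build();

    public void offer(String hostname, byte[] serializedReport) {
        // put() overwrites any previous report queued for the same host.
        reports.put(hostname, serializedReport);
    }

    public byte[] poll(String hostname) {
        byte[] report = reports.getIfPresent(hostname);
        if (report != null) {
            reports.invalidate(hostname);
        }
        return report;
    }
}
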
@splhack
Contributor

splhack commented Nov 4, 2021

I'd like to clarify three things.

Handle Repeated Tasks

This PR will handle repeated host reports by always keeping the latest version.
However, the host report is sent every 30 seconds (the interval is randomized, but the average should be around 30 seconds). This means Cuebot received a new host report while the host-report queue was still holding the previous report from the same host for 30 seconds; in other words, Cuebot was unable to process the previous host report for 30 seconds and counting. IMHO, this should not happen.

The root cause of this situation is the host-report-queue contention. Quoting myself from #1008,

  • HostReportHandler ThreadPoolExecutor(reportQueue) only has 8 threads.
  • The default number of PostgreSQL connections is only 10 (HikariCP default).

With the default settings, if Cuebot is connected to several hundred to 1,000 RQDs,

  • The reportQueue fills up easily due to the very limited number of threads (8) and PostgreSQL connections (10).
  • Thus, Cuebot cannot handle host reports for over 30 seconds, or even over 5 minutes, and marks many RQDs as down because it detects that they have not sent a host report in over 5 minutes.

So the solution is obvious: prevent host-report-queue contention altogether.

  • Increase the number of host-report queue threads
  • Increase the number of HikariCP PostgreSQL connections

I posted the example settings in #1008.

If that is still not enough, the system needs an extra Cuebot instance to handle the number of RQDs. Eventually PostgreSQL would become the bottleneck; in that case, the system would need to adjust the RQD host-report interval to reduce the number of host reports.
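
For reference, raising both limits together programmatically might look like the sketch below; the numbers are purely illustrative and are not the settings posted in #1008:

import com.zaxxer.hikari.HikariConfig;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: scale the report-handling threads and the database
// connection pool together so report handlers are not starved of connections.
public class ContentionTuningSketch {
    public static void main(String[] args) {
        // More worker threads than the default 8 (example value).
        ThreadPoolExecutor reportQueue = new ThreadPoolExecutor(
                32, 32, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1000));

        // More PostgreSQL connections than the HikariCP default of 10 (example value).
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost/cuebot");  // placeholder URL
        config.setMaximumPoolSize(40);
        // A new HikariDataSource(config) would then back the JDBC layer.

        reportQueue.shutdown();
    }
}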

Health check

Handling repeated tasks and the health check cannot fix the root cause, queue contention. They might reduce some of the repeated host reports, or discard a filled queue, but only temporarily, for a very short time. If the host-report queue fills up once, it will keep happening in the system, because of the combination of the number of host-report-queue threads, the number of HikariCP PostgreSQL connections, and the number of RQDs.

Could you try #1008 with the example settings?

locked threads?

Could you elaborate on what the locked threads are? I'm not aware of anything like that with #1008.

should be able to self-heal when locked threads occur

@DiegoTavares
Collaborator Author

I'd like to clarify three things.

Handle Repeated Tasks

This PR will handle repeated host reports by always keeping the latest version. However, the host report is sent every 30 seconds (the interval is randomized, but the average should be around 30 seconds). This means Cuebot received a new host report while the host-report queue was still holding the previous report from the same host for 30 seconds; in other words, Cuebot was unable to process the previous host report for 30 seconds and counting. IMHO, this should not happen.

The root cause of this situation is the host-report-queue contention. Quoting myself from #1008,

  • HostReportHandler ThreadPoolExecutor(reportQueue) only has 8 threads.
  • The default number of PostgreSQL connections is only 10 (HikariCP default).

With the default settings, if Cuebot is connected to several hundred to 1,000 RQDs,

  • The reportQueue fills up easily due to the very limited number of threads (8) and PostgreSQL connections (10).
  • Thus, Cuebot cannot handle host reports for over 30 seconds, or even over 5 minutes, and marks many RQDs as down because it detects that they have not sent a host report in over 5 minutes.

So the solution is obvious: prevent host-report-queue contention altogether.

  • Increase the number of host-report queue threads
  • Increase the number of HikariCP PostgreSQL connections

This solution is only "obvious" if we assume tasks will always finish in a timely fashion, which was not the case in our situation (see the root cause explained below).

I posted the example settings in #1008.

If that is still not enough, the system needs an extra Cuebot instance to handle the number of RQDs. Eventually PostgreSQL would become the bottleneck; in that case, the system would need to adjust the RQD host-report interval to reduce the number of host reports.

Health check

Handling repeated tasks and the health check cannot fix the root cause, queue contention. They might reduce some of the repeated host reports, or discard a filled queue, but only temporarily, for a very short time. If the host-report queue fills up once, it will keep happening in the system, because of the combination of the number of host-report-queue threads, the number of HikariCP PostgreSQL connections, and the number of RQDs.

Could you try #1008 with the example settings?

Just to be clear, I don't think this PR should be used instead of #1008; in my view both are complementary. We're currently running more than 5k RQDs with this configuration:

booking_queue.threadpool.health_threshold=10
booking_queue.threadpool.core_pool_size=10
booking_queue.threadpool.max_pool_size=14
booking_queue.threadpool.queue_capacity=2000
dispatch.threadpool.core_pool_size=6
dispatch.threadpool.max_pool_size=8
dispatch.threadpool.queue_capacity=2000
healthy_threadpool.health_threshold=6
healthy_threadpool.min_unhealthy_period_min=3
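
These are plain opencue.properties entries. A minimal, illustrative way to read such values in standalone Java is sketched below; the real Cuebot wiring presumably goes through its Spring configuration rather than code like this:

import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

// Illustrative only: load thread-pool sizing values from opencue.properties.
public class PoolConfigSketch {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("opencue.properties")) {
            props.load(in);
        }
        int coreSize = Integer.parseInt(
                props.getProperty("booking_queue.threadpool.core_pool_size", "10"));
        int maxSize = Integer.parseInt(
                props.getProperty("booking_queue.threadpool.max_pool_size", "14"));
        int queueCapacity = Integer.parseInt(
                props.getProperty("booking_queue.threadpool.queue_capacity", "2000"));
        System.out.println("booking pool: core=" + coreSize
                + " max=" + maxSize + " queue=" + queueCapacity);
    }
}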

locked threads?

Could you elaborate on what the locked threads are? I'm not aware of anything like that with #1008.

should be able to self-heal when locked threads occur

My bad for mentioning locked threads without context. There was a condition in the server->RQD communication where threads would get locked waiting for a gRPC response without a timeout; it was fixed by #994. The changes in this PR, besides reorganizing the thread pools, worked as a temporary fix while we searched for the root cause, which in our case wasn't contention, since the number of threads and the Hikari limits had already been tuned to our needs, but the thread-starvation condition just described.
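
For context, the kind of fix #994 refers to, giving the blocking gRPC call a deadline so the calling thread cannot wait forever, looks roughly like the sketch below; the helper and its name are hypothetical, not the actual change:

import io.grpc.stub.AbstractBlockingStub;
import java.util.concurrent.TimeUnit;

// Illustrative only: any blocking RQD call made through the returned stub fails
// with DEADLINE_EXCEEDED instead of locking its thread indefinitely.
public final class GrpcDeadlineSketch {
    private GrpcDeadlineSketch() {}

    public static <T extends AbstractBlockingStub<T>> T withTimeout(T stub, long seconds) {
        return stub.withDeadlineAfter(seconds, TimeUnit.SECONDS);
    }
}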

The proposal of this PR is not only to have a "self-healing" thread pool as protection against future starvation conditions, but also to reorganize the thread pools to facilitate maintenance.

@splhack
Contributor

splhack commented Nov 12, 2021

We're currently running more than 5k RQDs

I'm very happy to hear that! That is pretty much the same as ours. 5 Cuebots + 5.5K RQDs, and counting.

@DiegoTavares DiegoTavares reopened this Dec 1, 2021
@DiegoTavares DiegoTavares self-assigned this Dec 8, 2021
@DiegoTavares
Collaborator Author

@bcipriano Can I get your review on this?

Collaborator

@bcipriano bcipriano left a comment


Looking good here -- a few minor comments, and this needs a merge/rebase from master.

The test doesn't make sense with the new thread pool and will also
cause problems whenever a user changes a config property.
@DiegoTavares
Collaborator Author

Alright, unit tests are passing now. @bcipriano Still waiting for your approval.

@DiegoTavares
Collaborator Author

@bcipriano Can we consider this approved?

@DiegoTavares
Collaborator Author

@bcipriano Can we have your final approval on this? This PR is causing several conflicts on our end, and it would be good to have it merged ASAP.

Collaborator

@bcipriano bcipriano left a comment


LGTM. Sorry for the delay.

@DiegoTavares DiegoTavares merged commit 027d853 into AcademySoftwareFoundation:master Mar 29, 2022
akim-ruslanov pushed a commit to akim-ruslanov/OpenCue that referenced this pull request Apr 1, 2022
…ySoftwareFoundation#1035)

* Update dispatchQuery to use min_cores

Sorting jobs only by priority causes a situation where low-priority jobs can get starved by a constant flow of high-priority jobs.
The new formula adds a modifier to the sorting rank that takes into account the number of cores the job is requesting and the number of days the job has been waiting in the queue.
Priority values over 200 will mostly override the formula and behave as priority-only scheduling.
sort = priority + (100 * (1 - (job.cores/job.int_min_cores))) + (age in days)

Besides that, also take layer_int_cores_min into account when filtering folder_resourse limitations, to avoid allocating more cores than the folder limits allow.

(cherry picked from commit 566411aeeddc60983a30eabe121fd03263d05525)

* Revert "Update dispatchQuery to use min_cores"

This reverts commit 2eb4936

* Replace DispatchQueue and BookingQueue with HealthyThreadPool

Queues will no longer inherit from ThreadPoolExecutor; instead, they will manage an instance of HealthyThreadPool, a ThreadPoolExecutor that handles health checks, termination, and repeated tasks. With this, the BookingQueue should be able to self-heal when locked threads occur.

* Remove trackit reference

* Refactor HostReportQueue to use guava Cache

Use a guava cache to store only the last version of a HostReport per host.

* Configure HostReportQueue on opencue.properties

* Fix unit tests

* Fix unit tests

* This unit test is not actually testing anything useful

The test doesn't make sense with the new thread pool and will also
cause problems whenever a user changes a config property.

Co-authored-by: Roula O'Regan <[email protected]>