Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add thread pool properties #1008

Merged

Conversation

splhack
Copy link
Contributor

@splhack splhack commented Aug 2, 2021

Problems

Cuebot can barely handle less than a thousand running frames with the default settings.

  • HostReportHandler ThreadPoolExecutor(reportQueue) only has 8 threads.
  • Default number of PostgreSQL connections is only 10 (HirakiCP default.)

It can be verified with this warning message.

if (reportQueue.getQueue().size() > 0 ||
System.currentTimeMillis() - startTime > 100) {
/*
* Write a log if the host report takes a long time to process.
*/
CueUtil.logDuration(startTime, "host report " +
report.getHost().getName() + " with " +
report.getFramesCount() +
" running frames, waiting: " +
reportQueue.getQueue().size());
}

Cuebot kills running frames even though there are no issues on the frames. The cause is

  • reportQueue filled up easily due to the very limited number of threads (8) and the PostgreSQL connections (10).
  • Thus Cuebot marks RQD down because it can't catch up to handle RQD ping message for 5 minutes, rejects too.

Cuebot also fails to dispatch frames from FrameCompleteHandler due to the very limited dispatch pool threads (4).

com.imageworks.spcue.dispatcher.RqdRetryReportException: error processing the frame complete report, sending retry message to RQD com.imageworks.spcue.dispatcher.DispatchQueueTaskRejectionException: Warning, dispatch queue [DispatchQueue rejected task #9611

Solution

This PR will add thread pool properties to allow increasing the number of threads without build.

  • This PR won't change the thread pools itself, behavior, or the default number of threads. It will only change the way to initialize the thread pools.
    • Added a unittest to ensure the default number of threads are not changed.
  • The reason of why it doesn't use ${dispatcher.dispatch_pool.core_pool_size} in applicationContext-service.xml, because --spring.config.additional-location can't override the value. Instead, it's using autowired env to read property values.
  • ThreadPoolTaskExecutorWrapper is just to initialize ThreadPoolTaskExecutor.

Example settings

To address the scalability issues.
Definitely YMMV, but some data point with these settings.
Maximum CPU usage: ~200% (2 cores out of 16 cores)
Maximum memory usage: ~24GB (out of 32GB)

datasource.cue-data-source.maximum-pool-size=100
dispatcher.dispatch_pool.core_pool_size=20000
dispatcher.dispatch_pool.max_pool_size=20000
dispatcher.dispatch_pool.queue_capacity=20000
dispatcher.report_queue.core_pool_size=20000
dispatcher.report_queue.max_pool_size=20000
dispatcher.report_queue.queue_capacity=20000
dispatcher.kill_queue.core_pool_size=20000
dispatcher.kill_queue.max_pool_size=20000
dispatcher.kill_queue.queue_capacity=20000
dispatcher.booking_queue.core_pool_size=20000
dispatcher.booking_queue.max_pool_size=20000
dispatcher.booking_queue.queue_capacity=20000

@splhack splhack changed the title Add HostReport thread pool properties Add thread pool properties Aug 2, 2021
@bcipriano
Copy link
Collaborator

@DiegoTavares @roulaoregan-spi Would love to get your feedback on this one.

@splhack
Copy link
Contributor Author

splhack commented Aug 2, 2021

I will force-update this to generalize the thread pool settings.

Comment on lines +55 to +59
dispatcher.dispatch_pool.core_pool_size=4
# Maximum number of threads to allow in the pool for various operation.
dispatcher.dispatch_pool.max_pool_size=4
# Queue capacity for various operation.
dispatcher.dispatch_pool.queue_capacity=500
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

<property name="corePoolSize" value="4" />
<property name="maxPoolSize" value="4" />
<property name="queueCapacity" value="500" />

Comment on lines +48 to +52
dispatcher.launch_queue.core_pool_size=1
# Maximum number of threads to allow in the pool for launching job.
dispatcher.launch_queue.max_pool_size=1
# Queue capacity for launching job.
dispatcher.launch_queue.queue_capacity=100
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

<property name="corePoolSize" value="1" />
<property name="maxPoolSize" value="1" />
<property name="queueCapacity" value="100" />

Comment on lines +62 to +66
dispatcher.manage_pool.core_pool_size=8
# Maximum number of threads to allow in the pool for management operation.
dispatcher.manage_pool.max_pool_size=8
# Queue capacity for management operation.
dispatcher.manage_pool.queue_capacity=250
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

<property name="corePoolSize" value="8" />
<property name="maxPoolSize" value="8" />
<property name="queueCapacity" value="250" />

Comment on lines +69 to +73
dispatcher.report_queue.core_pool_size=6
# Maximum number of threads to allow in the pool for handling Host Report.
dispatcher.report_queue.max_pool_size=8
# Queue capacity for handling Host Report.
dispatcher.report_queue.queue_capacity=1000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

private static final int THREAD_POOL_SIZE_INITIAL = 6;
private static final int THREAD_POOL_SIZE_MAX = 8;
private static final int QUEUE_SIZE_INITIAL = 1000;

Comment on lines +76 to +80
dispatcher.kill_queue.core_pool_size=6
# Maximum number of threads to allow in the pool for kill frame operation.
dispatcher.kill_queue.max_pool_size=8
# Queue capacity for kill frame operation.
dispatcher.kill_queue.queue_capacity=1000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

private static final int THREAD_POOL_SIZE_INITIAL = 6;
private static final int THREAD_POOL_SIZE_MAX = 8;
private static final int QUEUE_SIZE_INITIAL = 1000;

Comment on lines +83 to +87
dispatcher.booking_queue.core_pool_size=6
# Maximum number of threads to allow in the pool for booking.
dispatcher.booking_queue.max_pool_size=6
# Queue capacity for booking.
dispatcher.booking_queue.queue_capacity=1000
Copy link
Contributor Author

@splhack splhack Aug 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original values

private static final int INITIAL_QUEUE_SIZE = 1000;
private static final int THREADS_MINIMUM = 6;
private static final int THREADS_MAXIMUM = 6;

@splhack
Copy link
Contributor Author

splhack commented Aug 3, 2021

this is ready to review.

@DiegoTavares
Copy link
Collaborator

We have a very similar PR on the tube that handles the same issue, but also adds some functionality to the threadpools to avoid repeated tasks and allow being reset without downtime. Will try to get it pushed this week to evaluate how we can merge both changes.

@splhack
Copy link
Contributor Author

splhack commented Nov 4, 2021

Sorry for the delay, commented in #1035 (comment) to explain the detail of the root cause and the solution.

I think this #1008 is very safe to merge first because this won't change any behavior or the default settings, at all. But this unblocks users can adjust their Cuebot queue settings for their system without modifying source code.

If we still need the features from #1035, I can help to merge those onto master after merging #1008.

Does it sound reasonable? @bcipriano @DiegoTavares @roulaoregan-spi

@DiegoTavares
Copy link
Collaborator

Sorry for the delay, commented in #1035 (comment) to explain the detail of the root cause and the solution.

I think this #1008 is very safe to merge first because this won't change any behavior or the default settings, at all. But this unblocks users can adjust their Cuebot queue settings for their system without modifying source code.

If we still need the features from #1035, I can help to merge those onto master after merging #1008.

Does it sound reasonable? @bcipriano @DiegoTavares @roulaoregan-spi

Sounds reasonable to me. As I mentioned in my comment on #1035, IMO both are complimentary

@roulaoregan-spi
Copy link
Contributor

Sounds reasonable to me as well.

@splhack
Copy link
Contributor Author

splhack commented Dec 14, 2021

Could you merge this first? @bcipriano @DiegoTavares @roulaoregan-spi

@DiegoTavares
Copy link
Collaborator

Sounds good to me, let's merge it in.

@DiegoTavares DiegoTavares merged commit 7e05c4e into AcademySoftwareFoundation:master Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants