Add this module to your Puppetfile:
mod 'HEPPuppet-htcondor', '2.4.3'
Puppet module for HTCondor batch system
Latest stable version: https://github.com/HEP-Puppet/htcondor/releases/tag/v2.4.3
Development branch: https://github.com/HEP-Puppet/htcondor/tree/development
Puppet Forge: https://forge.puppetlabs.com/HEPPuppet/htcondor
Table of Contents
- Overview - What is the htcondor module?
- Module Description - What does the module do?
- Setup - The basics of getting started with htcondor
- Singularity container support
- Kerberos authentication support
- Additional logging parameters
- Additional custom parameters
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
Overview
The htcondor module allows you to set up an HTCondor cluster (https://research.cs.wisc.edu/htcondor/). It depends on several other modules, including puppetlabs/stdlib, puppetlabs/concat and puppetlabs/firewall. Please check metadata.json for detailed dependencies.
Module Description
An HTCondor cluster consists of at least three types of nodes:
- a worker for executing the jobs
- a scheduler for job submission
- a collector/negotiator to match jobs with workers
This Puppet module allows for the configuration of these three types of nodes plus a fourth one:
- a remote_submit node, where local users can log in and use a preconfigured condor_submit
Setup
What the htcondor module affects:
- configuration files and directories (/etc/condor/*)
- installation of htcondor software (condor* packages)
- a new fact for facter: condor_version
Beginning with HTCondor
Since admins might wish to run their own repository or disable repositories after installation, the HTCondor repository has not been included in the Puppet module since version 2.0.0. Therefore, the first step is to add the latest HTCondor repository for your OS (https://research.cs.wisc.edu/htcondor/yum/), for example on RHEL 6 (yum-config-manager is part of the yum-utils package):
yum-config-manager --add-repo https://research.cs.wisc.edu/htcondor/yum/repo.d/htcondor-stable-rhel6.repo
If you wish to use a pool password for authentication, you will need to create one first:
condor_store_cred -f <path_to_htcondor_module>/files/pool_password
Examples
hiera
Hiera config examples can be found in the examples folder. They describe a minimal example of
- settings shared across different node types: htcondor_common.yaml
- settings for managers (nodes that run collector & negotiator daemons): htcondor_manager.yaml
- settings for schedulers: htcondor_scheduler.yaml
- settings for worker nodes: htcondor_common.yaml
- settings for remote submit nodes: htcondor_remote_submit.yaml
The examples assume class management in hiera, enabled by adding hiera_include('classes') to site.pp.
Real-life examples can be found at https://github.com/uobdic/UKI-SOUTHGRID-BRIS-HEP.
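As a sketch of how hiera_include('classes') fits together with a node's hiera data: the file path below and the assumption that htcondor is the class to include are illustrative, not taken from the examples folder.

```yaml
# Hypothetical node data file, e.g. hieradata/nodes/worker01.example.org.yaml.
# site.pp calls hiera_include('classes'), which looks up the 'classes' key
# (merging the arrays found at every hierarchy level) and includes each class.
classes:
  - htcondor
# Role-specific module settings would follow here, as in the files of the
# examples folder (htcondor_common.yaml, htcondor_manager.yaml, ...).
```

The hiera_* functions are deprecated in newer Puppet releases; there, the equivalent site.pp line is lookup('classes', Array[String], 'unique').include.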
Custom machine/job attributes
Sometimes it is necessary to create custom attributes for HTCondor. Machine attributes can be used in job requirements (e.g. HasMatLab = True), and job attributes for job reporting/monitoring (e.g. HEPSPEC06 = 14.00).
To specify the attributes in hiera, simply add
htcondor::custom_attributes:
- HasMatLab: True
...
and for job attributes
htcondor::custom_job_attributes:
- HEPSPEC06: 14.00
- CPUScaling: 1.04
...
Although the usage is identical, the two end up in different places: custom_attributes are added to STARTD_ATTRS, and custom_job_attributes are added to STARTD_JOB_ATTRS.
Singularity
The module also provides support for Singularity containers, to the extent to which this is implemented in HTCondor. Compared to Docker containers, for example, Singularity containers are less isolated and can run without a privileged daemon, granting the user the same permissions inside the container as on the underlying host. Hence, they are well suited to running HPC jobs.
Example configuration parameters could be:
use_singularity => true,
force_singularity_jobs => true,
singularity_image_expr => '"/images/myimage.img"',
singularity_bind_paths => ['/some_shared_filesystem', '/pool', '/usr/libexec/condor/'],
singularity_target_dir => '/srv',
starter_job_environment => { 'SINGULARITY_HOME' => '/srv' },
mount_under_scratch_dirs => ['/tmp','/var/tmp'],
This forces all jobs to run inside Singularity containers, while offering /tmp and /var/tmp space inside the container and binding a shared filesystem mount point and HTCondor-specific directories into it.
Binding the two HTCondor-specific directories (/pool and /usr/libexec/condor/) is a workaround that allows interactive jobs to run; this will hopefully be fixed in a future HTCondor release.
The same holds for setting SINGULARITY_HOME: this ensures that non-interactive jobs start in the job's working directory instead of the user's home directory, which might not even be accessible from the worker.
The image in this example is a simple string, but the parameter accepts an HTCondor expression. Hence, a simple string needs explicit double quotes. A more complex example, relying on a custom job ClassAd attribute ContainerOS, would be:
singularity_image_expr => 'ifThenElse(TARGET.ContainerOS is "Ubuntu1604", "/somewhere/Ubuntu1604", "/somewhere/SL6")',
More details on that are provided in the HTCondor documentation.
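Since the examples in this README manage classes through hiera, the same Singularity settings can be written as hiera data. This is a sketch assuming each parameter is looked up as htcondor::<parameter>, like htcondor::custom_attributes above. Note the two layers of quoting in singularity_image_expr: the single quotes are YAML syntax, while the double quotes belong to the HTCondor expression.

```yaml
# Hypothetical worker-node hiera data enabling Singularity for all jobs.
htcondor::use_singularity: true
htcondor::force_singularity_jobs: true
# YAML single quotes wrap the value; the inner double quotes make it an
# HTCondor string literal.
htcondor::singularity_image_expr: '"/images/myimage.img"'
htcondor::singularity_bind_paths:
  - /some_shared_filesystem
  - /pool
  - /usr/libexec/condor/
htcondor::singularity_target_dir: /srv
htcondor::starter_job_environment:
  SINGULARITY_HOME: /srv
htcondor::mount_under_scratch_dirs:
  - /tmp
  - /var/tmp
```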
Kerberos
The module provides support for Kerberos auth, to the extent to which this is implemented in HTCondor.
Example configuration parameters could be:
use_kerberos_auth => true,
krb_srv_keytab => '/etc/condor/condor.keytab',
krb_srv_principal => 'condor-daemon/$(FULL_HOSTNAME)@MYREALM',
krb_srv_user => 'condor-daemon',
use_krb_map_file => true,
krb_mapfile_entries => {'REALM1' =>'realm1', 'REALM2' => 'realm2'},
This will deploy a map file containing the entries listed in the krb_mapfile_entries hash. The keytab, however, is not deployed through this module and has to be placed at the path given by krb_srv_keytab, with the appropriate owner and mode.
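The Kerberos parameters can likewise be expressed as hiera data; again this assumes htcondor::<parameter> lookup keys. Hiera only interpolates %{...} expressions, so the HTCondor macro $(FULL_HOSTNAME) is passed through unchanged and expanded by HTCondor itself.

```yaml
# Hypothetical hiera data enabling Kerberos authentication.
htcondor::use_kerberos_auth: true
# The keytab itself must be deployed separately (not by this module).
htcondor::krb_srv_keytab: /etc/condor/condor.keytab
# $(FULL_HOSTNAME) is an HTCondor macro, left untouched by hiera.
htcondor::krb_srv_principal: 'condor-daemon/$(FULL_HOSTNAME)@MYREALM'
htcondor::krb_srv_user: condor-daemon
htcondor::use_krb_map_file: true
htcondor::krb_mapfile_entries:
  REALM1: realm1
  REALM2: realm2
```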
Logging
If you want HTCondor to use custom logging parameters, you may set use_custom_logs and specify the logging_parameters hash in the form {parameter_name => desired_value}. For example:
use_custom_logs => true,
logging_parameters => { 'SCHEDD_DEBUG' => 'D_NETWORK,D_PROTOCOL', 'NEGOTIATOR_DEBUG' => 'D_FULLDEBUG', ... }
Please note that no verification is applied; check your syntax carefully to ensure the daemons restart correctly.
If you want HTCondor to log to syslog, there is a dedicated log_to_syslog boolean, which defaults to false. To enable it:
use_custom_logs => true,
log_to_syslog => true,
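A hiera version of the logging settings might look as follows. It assumes htcondor::<parameter> lookup keys and that log_to_syslog can be combined with logging_parameters; neither is confirmed here, so check the module's parameter list before relying on it.

```yaml
# Hypothetical hiera data for custom logging (not validated by the module).
htcondor::use_custom_logs: true
htcondor::log_to_syslog: true
htcondor::logging_parameters:
  SCHEDD_DEBUG: 'D_NETWORK,D_PROTOCOL'
  NEGOTIATOR_DEBUG: D_FULLDEBUG
```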
Custom parameters
If you want HTCondor to use custom parameters that are not managed elsewhere in the module, you may specify them in the custom_knobs hash, in the form {parameter_name => desired_value}. For example:
custom_knobs => { 'CLAIM_PARTITIONABLE_LEFTOVERS' => 'false', ... }
Please note that:
- no verification is applied; check your syntax carefully to ensure the daemons restart correctly
- these parameters will be deployed on all nodes (workers, schedulers and managers)
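The same knob as hiera data, sketched under the assumption that the lookup key is htcondor::custom_knobs. The value is quoted so that YAML yields the string 'false', as in the Puppet example, rather than a boolean.

```yaml
# Hypothetical hiera data; deployed to all nodes (workers, schedulers, managers).
htcondor::custom_knobs:
  # Quoted: passed through as the literal string 'false'.
  CLAIM_PARTITIONABLE_LEFTOVERS: 'false'
```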
Limitations
General
Development
Contributing
Running tests
Please run
bundle exec rake validate && bundle exec rake lint && bundle exec rake spec SPEC_OPTS='--format documentation'
and make sure no errors are present when submitting code.
Generating changelog
export CHANGELOG_GITHUB_TOKEN=<your github token>
github_changelog_generator -u hep-puppet -p htcondor
Release instructions
export CHANGELOG_GITHUB_TOKEN=<your github token>
export RELEASE=2.0.7
make release
# follow instructions
Change Log
v2.4.3
Merged pull requests:
v2.4.2 (2019-05-21)
Closed issues:
- Typo in params.pp #114
Merged pull requests:
v2.4.1 (2019-05-21)
v2.4.0 (2019-05-21)
Merged pull requests:
- Puppet 5 & Puppet 6 in CI #113 (kreczko)
- Add schedd_blocked_users and block message. #111 (olifre)
- Add dns_cache_refresh parameter. #110 (olifre)
- defrag params customizables #109 (ccnifo)
v2.3.1 (2019-05-21)
Merged pull requests:
v2.3.0 (2019-05-21)
Merged pull requests:
v2.2.0 (2019-05-21)
Closed issues:
- QUEUE_SUPER_USERS not set #97
Merged pull requests:
- Fix lint errors on Travis test #112 (ccnifo)
- History and logging #105 (ccnifo)
- fix requirements range error for travis test #104 (ccnifo)
- Make healthcheck_period configurable. #102 (olifre)
- Add a "remote_submit" role #101 (ccnifo)
- Make claim_worklife configurable. #100 (olifre)
- Queue super users #99 (olifre)
v2.1.0 (2018-07-17)
Implemented enhancements:
- Add general knob turning #57
Merged pull requests:
v2.0.9 (2018-07-17)
Merged pull requests:
- CGROUP_MEMORY_LIMIT_POLICY customizable #96 (ccnifo)
- parameterize healthcheck script path #93 (ccnifo)
v2.0.8 (2018-05-31)
Closed issues:
- [Regression] Unable to specify actual expression for SINGULARITY_IMAGE_EXPR since 2.0.7 #91
Merged pull requests:
v2.0.7 (2018-05-31)
Merged pull requests:
v2.0.6 (2018-05-02)
Merged pull requests:
v2.0.5 (2018-05-02)
Merged pull requests:
v2.0.4 (2018-02-05)
Fixed bugs:
- CGROUPS setup in 20_workernode.config.erb #78
Closed issues:
- Badly initialized variables #79
- healhcheck script mode #77
- Some changes aren't working #73
- Changes in HTCondor 8.6.1 #58
Merged pull requests:
- Fixing permissions for worker healthcheck script (issue #77) #87 (kreczko)
- config::worker Fix puppet lint warning. #86 (olifre)
- Repair metadata.json syntax. #85 (olifre)
- Remove MOUNT_UNDER_SCRATCH if no folders are defined #84 (kreczko)
- Add parameters gpgcheck and gpgkey #83 (wiene)
- Central manager HA with shared port #82 (wiene)
- Setting of correct SELinux context for pool directory if we create it. #81 (olifre)
v2.0.3 (2017-11-03)
Merged pull requests:
- Fixing CGroup issue and badly initialized variables #80 (kreczko)
- Add starter environment configuration #76 (olifre)
- init: Fixup parameter default values. #75 (olifre)
- Remove bad quotes from MOUNT_UNDER_SCRATCH variable. #74 (olifre)
- Allow to turn off debug notification. #72 (olifre)
- Add singularity support. #71 (olifre)
- security: Change CM authentication to use ALLOW instead of HOSTALLOW. #70 (olifre)
v2.0.2 (2017-07-17)
Closed issues:
- Changes in CentOS7 cgroup setup #59
Merged pull requests:
- updated to version (2.0.2) and added changelog #69 (kreczko)
- Implement SSL authentication #65 (olifre)
- htcondor::security Pull CERTIFICATE_MAPFILE out of krb-auth dependency. #64 (olifre)
- Allow to specify the source for certificate and kerberos map files. #63 (olifre)
- htcondor::sharedport: Add configuration for condor_shared_port daemon. #62 (olifre)
- Fix baseurl for yum repositories #61 (wiene)
v2.0.0 (2017-07-14)
Implemented enhancements:
Merged pull requests:
v1.3.1 (2017-05-18)
Implemented enhancements:
- Add profiles #46
- Simplify parameters #44
- Repository clean up #43
- Towards version 2.0 - part 3 #55 (kreczko)
- Step 2 towards version 2.0 #54 (kreczko)
- Big simplifications #53 (kreczko)
Closed issues:
Merged pull requests:
- Version 1.3.1 #52 (kreczko)
- [New feature] high-availability deployment for multiple managers #51 (kreczko)
- 2016 spring clean #50 (kreczko)
v1.3.0 (2016-01-29)
Implemented enhancements:
Closed issues:
Merged pull requests:
v1.2.0 (2014-11-12)
Merged pull requests:
- Added kerberos map file #37 (kreczko)
- setting default FILESYSTEM_DOMAIN to FQDN (not all WNs have shared FS) #36 (kreczko)
- Improving auth method setting #35 (kreczko)
- Condor id fix, better tests and puppet-lint fixes #33 (kreczko)
v1.1.0 (2014-10-16)
Closed issues:
- htcondor files ownership #14
Merged pull requests:
- Exclude condor-i686 and few other bits #32 (kashif74)
- A few small changes #31 (kreczko)
- New feature: ganglia #30 (kreczko)
- New feature: kerberos #29 (kreczko)
- Queues with hiera or config #28 (kreczko)
- Fairshares updated #27 (kreczko)
- email, default and filesystem domains #26 (kreczko)
- Request memory #25 (kreczko)
- adding option for DEV repositories #24 (kreczko)
- Defrag and partitionable slots #23 (kreczko)
- fix ":" -> "=" #22 (kreczko)
v1.0.0 (2014-08-07)
[Full Changelog](https://github.com/hep-puppet/htcondor/compare/New features...v1.0.0)
Closed issues:
Merged pull requests:
- Added defrag and healthcheck #19 (kashif74)
- make sure condor_reconfig is not run before service is up #18 (fschaer)
- allow user-defined templates to be specified #17 (fschaer)
- Fix3 #16 (fschaer)
- specify file ownership and allow for user (root) override, as this is #15 (fschaer)
- be librarian-puppet friendly #13 (fschaer)
- Changes for seperate scheduler configuartion #12 (kashif74)
- Fixes for Nagios tests #11 (kreczko)
- Fairshare fixes #10 (kreczko)
- Fixes for issues #4 and #8 + other stuff #9 (kreczko)
- Updating things for productin #7 (kreczko)
- new version #6 (kashif74)
- Added priority to repo #5 (kashif74)
- First working version of Puppet module for HTCondor #1 (kreczko)
Version 2.0.0
Version 2.0.0 brought big changes to the module. The biggest change is a structural one: htcondor::params.pp was added to set defaults for all the parameters. In addition, each parameter is first looked up via hiera, with full merge support for hashes and arrays.
With these changes, htcondor::config.pp was split into six pieces:
- the main config setting up the rest
- a common config part
- the security configuration
- separate configs for manager, scheduler & worker
The full details of these changes can be seen in PR 53.
New features
- configure connection broker for private workers (i.e. workers that cannot be reached from the manager or scheduler but can reach the manager).
- enabled ganglia daemon for schedulers (previously only possible on managers)
- flag to enable condor reporting, disabled by default
- added use_anonymous_auth
- added custom_machine_attributes and custom_job_attributes, which can be used to add classads for STARTD_ATTRS and STARTD_JOB_ATTRS
Bug fixes
- the daemon list would be incorrect for some versions of Ruby. This was due to the use of the 'and' and 'or' operators, which are incorrect for boolean comparisons.
- added missing cluster_has_multiple_domains parameter (w.r.t. 2.0.0 beta)
- removed repository dependency if it is disabled
Other
- changed config templates to ensure a newline at the end of the file and reduced the use of -%>
- workers are no longer able to write to schedulers by default
- new formatting for the security config: one line per entry for manager/scheduler/worker
- removed use_pkg_config parameter
- no longer modifying /etc/condor/condor_config or /etc/condor/condor_config.local, as recommended by the HTCondor team
- content previously in /etc/condor/condor_config.local is now in /etc/condor/config.d/00_config_local.config
* This Change Log was automatically generated by github_changelog_generator
Dependencies
- puppet/selinux (> 1.0.0)
- puppetlabs/apt (> 6.2.1)