alat error on claix18
Original problem description:
The 'alat error' sometimes occurs when submitting kkr_imp_wc. It is discernible by repeated output of
'[read_potential] error ALAT value in potential is does not match'
in the remote/retrieved file _scheduler-stderr.txt.
The error comes from the jukkr source code. The resulting exit_status pattern is:
kkr_imp_wc [144]
└── kkr_imp_sub_wc [130]
    └── KkrimpCalculation [302]
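To triage many failed workchains at once, the symptom can be checked in code. A minimal sketch, assuming the content of the retrieved _scheduler-stderr.txt is already available as a string (fetching it from the AiiDA repository is not shown). The helper names are hypothetical; they simply count lines that contain the message quoted above.

```python
# Hypothetical helpers for spotting the jukkr ALAT mismatch in scheduler stderr.
# Only the message prefix is matched, so the exact wording of the tail does not matter.
ALAT_ERROR_MARKER = "[read_potential] error ALAT value in potential"


def count_alat_errors(stderr_text: str) -> int:
    """Return the number of lines in stderr_text that contain the ALAT error."""
    return sum(1 for line in stderr_text.splitlines() if ALAT_ERROR_MARKER in line)


def has_alat_error(stderr_text: str) -> bool:
    """Return True if the ALAT error appears at least once."""
    return count_alat_errors(stderr_text) > 0


# Usage with a made-up stderr snippet
sample = (
    "[read_potential] error ALAT value in potential is does not match\n"
    "[read_potential] error ALAT value in potential is does not match\n"
    "unrelated scheduler output\n"
)
print(count_alat_errors(sample), has_alat_error(sample))  # 2 True
```

Matching only the prefix keeps the check robust if the tail of the message differs between jukkr versions.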
The 'alat error' first turned up around March while trying to run on iffslurm. See the respective iffchat discussion with ruess / mozumder. No solution was found there.
Then, long after the update of aiida-core to v1.6.1, and immediately after the 'recovery' from the hard-quota-limit crash on claix18 on 2021-04-28 (see card), the error suddenly appeared on claix18 as well. All recently submitted workchains failed with that error.
UPDATE 2021-08-06:
At the AI meeting, I showed my progress. Roman picked up on the fact that I had only achieved 70% completion of the 10k KKR workchains. I said this was due to the ALAT error. He said he had run into it as well. We met in PM afterwards to discuss it.
Roman explained that the value of the Angstrom-to-Bohr conversion in masci-tools, i.e. the function get_Ang2aBohr (which is used by aiida-kkr) and the underlying constant ANG_BOHR_KKR, had changed. This broke all his scripts: he could no longer restart KKR calculations.
Calculations performed before this change used the old value; those performed after it use the new value. So you can no longer use old calculations to restart, or as inputs, as I do in data generation Step 4.
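A small numerical illustration of why old and new results cannot be mixed. The sketch assumes a hypothetical lattice constant of 3.61 Å, converted to Bohr once with the 'old' and once with the 'new' Angstrom-to-Bohr factor from the summary table in this card. The tolerance-based check alat_matches is also hypothetical: it only illustrates that a tight comparison flags the difference; how jukkr's read_potential actually compares ALAT is not reproduced here.

```python
import math

# Angstrom -> Bohr factors from the summary table in this card
ANG_BOHR_OLD = 1.8897261254578281  # 'old' values (also via MASCI_TOOLS_USE_OLD_CONSTANTS=True)
ANG_BOHR_NEW = 1.8897261246257702  # 'new' default since 2021-04-28

alat_ang = 3.61  # hypothetical lattice constant in Angstrom

alat_bohr_old = alat_ang * ANG_BOHR_OLD  # ALAT as written by a calc using the old factor
alat_bohr_new = alat_ang * ANG_BOHR_NEW  # ALAT as expected by a calc using the new factor

diff = abs(alat_bohr_old - alat_bohr_new)


def alat_matches(a: float, b: float, tol: float = 1e-10) -> bool:
    """Hypothetical consistency check; the absolute tolerance is an assumption."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)


print(f"old: {alat_bohr_old:.12f} bohr")
print(f"new: {alat_bohr_new:.12f} bohr")
print(f"difference: {diff:.2e} bohr")
print("match:", alat_matches(alat_bohr_old, alat_bohr_new))  # False with tol=1e-10
```

The difference is only a few 1e-9 Bohr, physically irrelevant, but enough for an exact or tightly toleranced consistency check to fail.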
Source code links:
- masci-tools get_Ang2aBohr conversion function (also shows where it is used in aiida-kkr).
- masci-tools ANG_BOHR_KKR conversion constant.
(Off-topic background: Roman said the reason for the change was that they had agreed on a cutoff of 12 decimal places, because the last four of the 16 digits only contain numerical noise and are not worth the effort. I can't remember why he told me this; I guess because the new value changed in the significant digits, or because the KKR developers agreed on 12 while the Fleur developers agreed on something different.)
Roman says there are two solutions to the ALAT error problem.
Solution 1):
This is stated in constants.py:
For masci-tools versions after ``0.4.6`` the constants used in the KKR functions are replaced
by the NIST values by default. If you still want to use the old values
you can set the environment variable MASCI_TOOLS_USE_OLD_CONSTANTS to True
This is also what Roman told me. See also the respective code location of the corresponding commit.
Roman said that you only have to set the environment variable where AiiDA runs, i.e. in my case on my iff workstation, for example in my .bashrc. Then it should work out of the box.
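The switch described in the quote can be pictured with the sketch below. It is not the masci-tools source but a hypothetical re-implementation, assuming the variable is read from the process environment and compared case-insensitively with 'true'; the function name kkr_constants and the dictionary keys are made up for illustration, and the values come from the summary table in this card. The practical point is that the variable has to be visible to the Python process that imports masci-tools, which for submitted workchains is the AiiDA daemon. So after adding `export MASCI_TOOLS_USE_OLD_CONSTANTS=True` to .bashrc, the daemon presumably has to be stopped and started again (`verdi daemon stop`, then `verdi daemon start`) from a shell where the variable is set.

```python
import os
from typing import Dict, Mapping, Optional

# Conversion factors taken from the summary table in this card
OLD_CONSTANTS = {"ang_to_bohr": 1.8897261254578281, "ry_to_ev": 13.605693009}
NEW_CONSTANTS = {"ang_to_bohr": 1.8897261246257702, "ry_to_ev": 13.605693122994}


def kkr_constants(env: Optional[Mapping[str, str]] = None) -> Dict[str, float]:
    """Hypothetical stand-in for the old/new switch in masci-tools constants.py.

    Returns the 'old' set only if MASCI_TOOLS_USE_OLD_CONSTANTS equals 'true'
    (case-insensitive, an assumption); otherwise the 'new' default set.
    """
    if env is None:
        env = os.environ  # the real environment of the current process
    flag = env.get("MASCI_TOOLS_USE_OLD_CONSTANTS", "False")
    use_old = flag.strip().lower() == "true"
    return dict(OLD_CONSTANTS if use_old else NEW_CONSTANTS)


# Explicit mappings make the two branches easy to compare
old = kkr_constants({"MASCI_TOOLS_USE_OLD_CONSTANTS": "True"})
new = kkr_constants({})
print(old["ang_to_bohr"] - new["ang_to_bohr"])
```

Passing the environment as an argument keeps the sketch testable; called without arguments it reads the real environment, which is exactly why the daemon's environment matters.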
Finding out when exactly the values of the three constants (Angstrom to Bohr, Bohr to Angstrom, and Rydberg to electron volt) changed in masci-tools common_functions.py and constants.py: Up to Feb 16, 2021, they were defined by the three functions get_Ang2aBohr(), get_aBohr2Ang() and get_Ry2eV() in common_functions.py. On Feb 16, 2021, these were replaced by NIST constant definitions in constants.py with differing values. Starting from Apr 28, 2021, the old functions were reintroduced for the KKR-dependent modules in masci-tools. Now, however, the functions drew their values from underlying constants in constants.py (ANG_BOHR_KKR, 1.0 / ANG_BOHR_KKR, EV_TO_RY_KKR) instead of defining them themselves. In addition, starting Apr 28, 2021, there were two versions of these constants: 'new' is the default, and 'old' is available via the environment variable switch MASCI_TOOLS_USE_OLD_CONSTANTS='True'.
We see in the summary table below that the Ang to Bohr constant and the Bohr to Ang constant had three different values over that time, while the Ry to eV constant had two.
Summary Table
date | commit hash | type | Ang to Bohr constant | Bohr to Ang constant | Ry to eV constant |
---|---|---|---|---|---|
2018-10-26 | 04d55ea | 'old' | 1.8897261254578281 | 0.5291772106700000 | 13.605693009000000 |
2021-02-16 | c171563 | 'interim' | 1.8897261249935897 | 0.5291772108000000 | 13.605693122994000 |
2021-04-28 | 66953f8 | 'old' | 1.8897261254578281 | 0.5291772106700000 | 13.605693009000000 |
2021-04-28 | 66953f8 | 'new' | 1.8897261246257702 | 0.5291772109030000 | 13.605693122994000 |
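The table can be cross-checked with a few lines of code, using only its own numbers (the 'old' row that reappears on 2021-04-28 is listed once). Within each version the Angstrom-to-Bohr factor should be the reciprocal of the Bohr-to-Angstrom factor up to floating-point rounding, and counting distinct values per column reproduces the statement above the table. The dictionary layout is just for illustration.

```python
import math

# (ang_to_bohr, bohr_to_ang, ry_to_ev) per constant version, copied from the table
VERSIONS = {
    "old": (1.8897261254578281, 0.52917721067, 13.605693009),
    "interim": (1.8897261249935897, 0.5291772108, 13.605693122994),
    "new": (1.8897261246257702, 0.529177210903, 13.605693122994),
}
COLUMNS = ("ang_to_bohr", "bohr_to_ang", "ry_to_ev")

# 1) Within each version, ang_to_bohr should equal 1 / bohr_to_ang
for name, (ang2bohr, bohr2ang, _) in VERSIONS.items():
    ok = math.isclose(ang2bohr, 1.0 / bohr2ang, rel_tol=1e-12)
    print(f"{name}: ang_to_bohr == 1 / bohr_to_ang -> {ok}")

# 2) Number of distinct values per column across the three versions
distinct_counts = {
    col: len({row[i] for row in VERSIONS.values()}) for i, col in enumerate(COLUMNS)
}
print(distinct_counts)  # {'ang_to_bohr': 3, 'bohr_to_ang': 3, 'ry_to_ev': 2}
```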
Solution 2):
As Roman said, a completely new calculation should work fine with the new constant values. So if Solution 1) fails, I could run a completely new pipeline for the remaining 3000 workchains, starting from the host SCF, and then they should work.