...

Actually indicated a fatal bug in the Linux kernel InfiniBand layer (RDMA), not in Lustre itself. In these kinds of circumstances it is important to dig deeper and not assume that Lustre or LNet is the root cause.

4.1 Changelog reader

You can enable a Lustre changelog reader. Note that once a reader is registered, changelogs keep accumulating even if the service consuming them has been stopped. You have to register the changelog reader on all metadata servers. Note that a reader ID only exists once, e.g. you cannot create reader cl1, remove cl1 and then recreate it:

lctl --device lustre-MDT0000 changelog_register
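
Because registration has to be repeated for every metadata target, a small loop can help. The following is a minimal sketch, assuming a file system named lustre with two MDTs (FSNAME, MDT_COUNT and DRY_RUN are placeholders introduced here, not Lustre settings). With DRY_RUN=1 it only prints the commands; the real command has to run on the MDS where each MDT is mounted.

```shell
#!/bin/sh
# Sketch: register a changelog reader on every MDT of a file system.
# FSNAME and MDT_COUNT are assumptions; adjust them to your setup.
FSNAME="${FSNAME:-lustre}"
MDT_COUNT="${MDT_COUNT:-2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the MDT device names, e.g. lustre-MDT0000, lustre-MDT0001.
# The MDT index is a four-digit hexadecimal number.
mdt_names() {
    i=0
    while [ "$i" -lt "$MDT_COUNT" ]; do
        printf '%s-MDT%04x\n' "$FSNAME" "$i"
        i=$((i + 1))
    done
}

register_all() {
    for mdt in $(mdt_names); do
        if [ "$DRY_RUN" = "1" ]; then
            echo "lctl --device $mdt changelog_register"
        else
            # Must run on the MDS where this MDT is mounted.
            lctl --device "$mdt" changelog_register
        fi
    done
}

register_all
```

Each successful registration reports the reader ID (such as cl1) that was assigned, which is needed later for deregistration.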

Disable it with the following command. Disabling a changelog reader clears the stored logs. You have to deregister the changelog reader on all metadata servers.

lctl --device lustre-MDT0000 changelog_deregister cl1

List the current readers:

lctl get_param mdd.<lustre>-MDT0000.changelog_users
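
To judge whether a reader is falling behind, its index can be compared with the current index. The following is a rough sketch that parses the changelog_users output with awk. The output layout used in the sample is an assumption and may differ between Lustre versions, and the function name changelog_lag is made up for this example.

```shell
#!/bin/sh
# Sketch: report how far each changelog reader lags behind the current index.
# Input: output of `lctl get_param mdd.<lustre>-MDT0000.changelog_users`.
# The layout parsed here is an assumption; verify it on your version.
changelog_lag() {
    awk '
        # Remember the newest record index of this MDT.
        /^current index:/ { cur = $3; next }
        # Reader lines start with an ID such as cl1, followed by its index.
        $1 ~ /^cl[0-9]+/ && $2 ~ /^[0-9]+$/ { printf "%s lag=%d\n", $1, cur - $2 }
    '
}

# Hypothetical sample so the function can be tried without a Lustre system.
changelog_lag <<'EOF'
mdd.lustre-MDT0000.changelog_users=
current index: 1200
ID    index (idle seconds)
cl1   1150 (22)
cl2   300 (86400)
EOF
```

On a live system the function would be fed from the command above, for example by piping the lctl get_param output into changelog_lag. A lag that keeps growing suggests that the consumer of that reader has stopped.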

A clear indication that you are starting to run into a problem with accumulated changelogs can be found in dmesg:

LustreError: 5144:0:(mdd_dir.c:1061:mdd_changelog_ns_store()) lustre-MDD0001: cannot store changelog record: type = 1, name = 'sh-thd-1849266489385', t = [0x24000faf3:0x1d:0x0], p = [0x240000400:0xf:0x0]: rc = -28
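
The rc = -28 in this message corresponds to ENOSPC (no space left on device). If the reader is still needed, one way to release space is to acknowledge the records it has already processed with lfs changelog_clear. The following is a minimal sketch, assuming MDT lustre-MDT0001 and reader cl1 as placeholders; the clear_changelog wrapper and DRY_RUN are introduced here for illustration. An end record of 0 clears everything recorded so far for that reader, so any consumer that has not yet read those records will miss them.

```shell
#!/bin/sh
# Sketch: acknowledge (clear) changelog records for one reader on one MDT.
# MDT and reader names are placeholders; endrec 0 means "all records so far".
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command

clear_changelog() {
    mdt="$1"
    reader="$2"
    endrec="${3:-0}"
    cmd="lfs changelog_clear $mdt $reader $endrec"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        # Run on a Lustre client with the file system mounted.
        $cmd
    fi
}

clear_changelog lustre-MDT0001 cl1
```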

4.1.2 Changelog reader and orphan objects

...

Lustre lfsck has been rewritten. It can be run concurrently on a production system, and could be included in a cron job that runs periodically.
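
A periodic run could look roughly like the following sketch, meant to be called from cron on the MDS. The MDT name, the namespace check type, the script path and the schedule are all assumptions for illustration; DRY_RUN is a safety switch added here and defaults to only printing the command.

```shell
#!/bin/sh
# Sketch: start an LFSCK run, intended to be called from cron on the MDS.
# The MDT name and the check type are assumptions for illustration.
# Example crontab entry (hypothetical path, Sundays at 03:00):
#   0 3 * * 0 DRY_RUN=0 /usr/local/sbin/run_lfsck.sh
MDT="${MDT:-lustre-MDT0000}"
LFSCK_TYPE="${LFSCK_TYPE:-namespace}"

# Build the command line as a string so it can be printed or executed.
lfsck_cmd() {
    echo "lctl lfsck_start -M $MDT -t $LFSCK_TYPE"
}

if [ "${DRY_RUN:-1}" = "1" ]; then
    lfsck_cmd            # only print the command
else
    $(lfsck_cmd)         # run it; requires root on the MDS
fi
```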

...