Topic: Very slow content updating - edited

Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • Posts: 24908
    • Camera Bits, Inc.
Re: Very slow content updating - edited
« Reply #30 on: August 26, 2022, 09:34:31 AM »
David,

I can now remove the missing files, but it takes an impractically long time. The 280 files above took ~20 minutes and I have almost 100,000. At the same rate that would take 5 days. I've just tried another batch and the removal time seems consistent at ~4.5 seconds/file. Is that to be expected?

No.

I was hoping not. Any thoughts as to what I might try to speed this up?

I can't think of any reason why it would be so slow.  I do not have a solution for you at this time.

-Kirk

DavidHoffmanuk

  • Sr. Member
  • Posts: 313
Re: Very slow content updating - edited
« Reply #31 on: September 03, 2022, 02:05:45 AM »
I've made some progress on understanding the slow removal or deletion of files from the catalog.

Normally removing files from the catalog is fast: 100 files take a few seconds. I've reorganised my storage of a few hundred thousand images. After moving and renaming folders, changing drives and doing a full sync, the newly moved files are in the catalog as they should be. The offline and missing files in the sync folder are the previous versions 'lost' through moving, and I want to remove them from the catalog. When I try to do this, each missing file takes more than 5 seconds. I have more than 150,000 of these files, so this is hardly practical.
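A quick back-of-the-envelope check of those figures, written as a small Python helper. It assumes a constant cost per file, which is only an approximation (the per-file time is not necessarily fixed):

```python
def estimate_removal_time(file_count: int, seconds_per_file: float) -> tuple[float, float]:
    """Return (hours, days) to remove file_count items at a fixed cost per item."""
    total_seconds = file_count * seconds_per_file
    hours = total_seconds / 3600
    days = hours / 24
    return hours, days


# Figures from the post: more than 150,000 files at more than 5 seconds each.
hours, days = estimate_removal_time(150_000, 5.0)
print(f"{hours:.0f} hours, about {days:.1f} days")  # 208 hours, about 8.7 days
```

Since both inputs are lower bounds, the real total would be at least this long if the rate held.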

Looking at these files they all have a path similar to this:

Path amdoc///33e66c5f-889d-4857-5b29-c3b069931951/Archive%203%20TIFFs%20&%20PSDs/Pre%202019%20TIFFs/L53-30-1.tif

There is no folder named "amdoc" on my Mac, not even a hidden one. The next part of the path has also been generated by PM+ and looks similar to the naming of the proxies. (I'm wondering if these files are the previews relating to the files that are now offline/missing?) The rest of the path is the original (now removed) folder name with the spaces replaced by %20.

I have removed about 50,000 of the missing/offline files by leaving the process running overnight over the last week, but when I did another full sync the 'offline' collection had increased by another 50,000 files, all with an amdoc path. Searching PM+ for amdoc reveals nothing.

How can I get rid of these offline/missing files and prevent PM+ generating new ones?

David

Kirk Baker
Re: Very slow content updating - edited
« Reply #32 on: September 06, 2022, 09:07:59 AM »
David,

All paths in the catalog are stored as 'amdoc' paths, and the local state database is consulted to map them to their final location. When no mapping can be found, that stored path is displayed. It is URL-encoded (spaces become %20 in URLs).
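To make that lookup concrete, here is a minimal Python sketch of the behaviour described above. The path layout, the `resolve_amdoc` name and the dictionary standing in for the local state database are all hypothetical, inferred only from the example path quoted in this thread; this is not Photo Mechanic Plus's actual code or schema.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote

# Hypothetical layout, inferred only from the example path in this thread:
# "amdoc", one or more slashes, a 36-character ID, then a URL-encoded relative path.
AMDOC_RE = re.compile(r"^amdoc:?/+([0-9A-Fa-f-]{36})/(.+)$")


def resolve_amdoc(path: str, location_map: dict[str, str]) -> str:
    """Return a local path for a stored 'amdoc' path, or the stored path unchanged.

    location_map is a hypothetical stand-in for the local state database,
    mapping the ID segment to a folder on this machine.
    """
    match = AMDOC_RE.match(path)
    if match is None:
        return path  # not in the expected form; leave as-is
    location_id, encoded_rest = match.groups()
    base = location_map.get(location_id)
    if base is None:
        return path  # no mapping: the raw, URL-encoded path is what gets shown
    return str(PurePosixPath(base) / unquote(encoded_rest))  # %20 -> space


stored = ("amdoc///33e66c5f-889d-4857-5b29-c3b069931951/"
          "Archive%203%20TIFFs%20&%20PSDs/Pre%202019%20TIFFs/L53-30-1.tif")
print(resolve_amdoc(stored, {}))  # unmapped: printed exactly as stored
# "/Volumes/Archive" is a made-up mount point for illustration.
print(resolve_amdoc(stored, {"33e66c5f-889d-4857-5b29-c3b069931951": "/Volumes/Archive"}))
```

With an empty map the stored string comes back untouched, which matches what David sees for files whose old location no longer exists.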

Moving files/renaming folders outside of Photo Mechanic Plus has that effect.

How many are remaining to be processed at this time?

-Kirk

DavidHoffmanuk
Re: Very slow content updating - edited
« Reply #33 on: September 07, 2022, 02:58:45 AM »
Hi Kirk

Thanks for clarifying what's going on with the 'amdoc' paths.

I did try to use the Navigator for the moving and renaming process, but it's not really suitable for making numerous moves, renames and deletions of files and folders. It's fine for copying the occasional folder but far too slow and clunky to manage tens of thousands of files at a time, so I had no realistic option but to do the work outside of PM+. Maybe in the future...

There are still just under 60,000 missing/offline files with that path, so if there is a quicker way to get rid of them, that would be very welcome.

David

Kirk Baker
Re: Very slow content updating - edited
« Reply #34 on: September 07, 2022, 09:10:19 AM »
David,

It sounds like a lot of work is being done for each removed item that could likely wait until all of the items are removed. This would have to be improved in the software and a new build released. I have no solution for you at this time, but if you share your logs (with catalog logging turned on) while it is removing the items, that may help me figure out why it takes so long to remove an item.
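The general pattern behind that explanation, per-item work versus batched work, can be illustrated with Python's built-in sqlite3 module. The table, `refresh_views` and the other names are hypothetical, and nothing here reflects how Photo Mechanic Plus actually stores or updates its catalog:

```python
import sqlite3


def make_catalog(n: int) -> sqlite3.Connection:
    """Build a toy in-memory catalog; every odd-numbered item is flagged as missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, path TEXT, missing INTEGER)")
    conn.executemany(
        "INSERT INTO items (id, path, missing) VALUES (?, ?, ?)",
        ((i, f"file_{i}.tif", i % 2) for i in range(n)),
    )
    conn.commit()
    return conn


def refresh_views(conn: sqlite3.Connection) -> int:
    """Stand-in for follow-up work (re-counting, re-sorting, redrawing a collection)."""
    return conn.execute("SELECT COUNT(*) FROM items WHERE missing = 1").fetchone()[0]


def remove_one_at_a_time(conn: sqlite3.Connection, ids: list[int]) -> int:
    """Delete items individually, committing and refreshing after each one."""
    refreshes = 0
    for item_id in ids:
        conn.execute("DELETE FROM items WHERE id = ?", (item_id,))
        conn.commit()        # one transaction per item
        refresh_views(conn)  # follow-up work repeated for every item
        refreshes += 1
    return refreshes


def remove_in_batch(conn: sqlite3.Connection, ids: list[int]) -> int:
    """Delete all items in one transaction, then refresh once."""
    with conn:  # commits once when the block exits without error
        conn.executemany("DELETE FROM items WHERE id = ?", [(i,) for i in ids])
    refresh_views(conn)  # follow-up work done once, after everything is removed
    return 1


slow = make_catalog(1_000)
fast = make_catalog(1_000)
missing_ids = [row[0] for row in slow.execute("SELECT id FROM items WHERE missing = 1")]
refreshes_slow = remove_one_at_a_time(slow, missing_ids)
refreshes_fast = remove_in_batch(fast, missing_ids)
print(refreshes_slow, refreshes_fast)  # 500 1
```

Both approaches leave the same 500 items behind; the difference is that the first repeats the commit and the follow-up work 500 times instead of once.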

-Kirk

DavidHoffmanuk
Re: Very slow content updating - edited
« Reply #35 on: September 07, 2022, 10:25:21 AM »
Hi Kirk

I'm fairly sure that you're right about PM+ doing a great deal of work in this process. Activity Monitor supports that too.

I'm away on holiday until late next week so I'll start the logging and pick this up again when I'm back.

David

DavidHoffmanuk
Re: Very slow content updating - edited
« Reply #36 on: September 23, 2022, 04:09:46 AM »
Hi Kirk, sorry to be slow picking this up again.

In late August, after I'd changed some drives and moved or renamed many folders and their contents outside of PM, I found around 150,000 files following a 'select missing'. I made a collection with these files and tried to remove them all. Although Activity Monitor showed PM busy, nothing appeared to have happened after many hours. I now think that I should have been more patient, and the operation would probably have completed in a few days. Instead I started removing them in batches that would only take a few hours each. To begin with, this was taking 5-6 seconds per file. Batch removals from a much smaller test catalog were almost instantaneous.

As I proceeded, the time per file decreased gradually until, by the end, it was taking only a little over 0.4 seconds/file. I wondered if PM was reading the whole list of missing files and their metadata again and again as it continued the deletions.
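That hypothesis can be checked against a simple cost model. In the Python sketch below (a hypothetical model, not PM+'s actual behaviour), each removal re-reads every item still in the list. Under that assumption the per-item cost falls steadily as the list shrinks, and the total work grows roughly with the square of the list size rather than linearly:

```python
def rescan_costs(total: int, batch_size: int) -> list[int]:
    """Hypothetical cost model: each removal re-reads every item still in the list.

    Returns the number of item reads per batch, so the per-item cost of
    successive batches can be compared as the list shrinks.
    """
    costs = []
    remaining = total
    while remaining > 0:
        batch = min(batch_size, remaining)
        reads = 0
        for _ in range(batch):
            reads += remaining  # re-read the whole remaining list...
            remaining -= 1      # ...then remove one item
        costs.append(reads)
    return costs


costs = rescan_costs(total=10_000, batch_size=1_000)
per_item = [c / 1_000 for c in costs]
print(per_item[0], per_item[-1])  # 9500.5 500.5
```

The model only reproduces the shape of the slowdown (early batches far more expensive per file than late ones); the numbers are not fitted to the timings in this thread.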

Once I'd removed all the missing files (the current copies were now in their new locations and in the catalog), I copied the contents of an external drive to one of the internal drives and removed the external drive. After a full sync, the collection showed (I think) around 80,000 files offline, which seemed about right.

At this point I had to leave London for 10 days and the Mac was shut down. When I returned, I turned debugging on and selected batches of those offline files to remove. By the time I was removing the last files, it was taking only about 0.23 seconds/file.

I added a test batch of 1,000 files to the catalog from an external drive. Once they were processed, I deleted them using the Finder. A full sync showed 1,013 missing (I didn't follow up on that discrepancy), and deleting them took less than 2 minutes!

I don't have a debugging log file from the earlier period before I went away. Looking in the PM caches I see 3 PMLib files from that time but I've not seen anything else with a mod date around that time. I've attached those and the debugging report generated today. Is there anywhere else I should look for useful data?

Hoping you might make sense of some of this
David

DavidHoffmanuk
Re: Very slow content updating - edited
« Reply #37 on: September 23, 2022, 04:15:40 AM »
One more PMLib file from the start of this problem.
David

Kirk Baker
Re: Very slow content updating - edited
« Reply #38 on: September 23, 2022, 08:17:55 AM »
David,

No, those logs are all that is relevant.

-Kirk