Author Topic: Open: Catalog sometimes sluggish to respond after query change  (Read 865 times)

Offline Hayo Baan

  • Uber Member
  • ******
  • Posts: 2474
  • Professional Photographer & Software Developer
    • View Profile
    • Hayo Baan - Photography
(this was posted as part of a different discussion, I've now split it so the discussion can focus on this issue separately)

Changing catalog queries is sometimes sluggish: if thumbnail generation for the current search hasn't finished yet, PM waits for it to finish before updating the contact sheet with the new/updated search results. This is easy to observe with a contact sheet containing very large images (for which thumbnail generation really takes a while), and is confirmed by the experiments I performed in the browse mode of the catalog.

My catalog contains some 20K+ images (so not even overly many). I conducted the following experiment using the folder path to switch between sets of images.

1. Select a smallish folder; this is fairly quick.
2. Select another smallish folder. Small delay, but images start rendering (some still have red names showing).
3. Switch back almost directly; this actually takes noticeable time.
4. Wait a while and repeat switching. Now the switches are performed quite quickly (on par with Lightroom?).
5. Open a folder that I know contains some really big images (some even >2 GB).
6. Switch back to the previous smallish folder => tremendously long wait.
7. Go back to the big folder (which shows faster than before, due to caching of previews I guess).
8. Scroll a bit and make sure there are ample "red" images, then switch back to the small folder from before: again a long wait.
9. Go back to the big folder and scroll a bit, but now wait until all "red" images are black.
10. Switch back to the small folder: fairly quick again.

All in all I'm pretty sure the sluggishness is caused by the background rendering holding up things.

By the way, I never see PM use more than two cores of my 8-core system. Are things really performed in multiple threads?
Hayo Baan - Photography
Web: www.hayobaan.nl

Offline Hayo Baan

« Reply #1 on: May 25, 2019, 03:04:52 AM »
This still looks to be a problem in beta 3215. I also performed some tests while one of the catalogs was being re-indexed in the background; this showed what I already suspected: the catalog (database) operations themselves are performed in a single thread, even if they operate on different catalogs. This feels like something that could (should) be improved upon.

Offline esambo

  • Newcomer
  • *
  • Posts: 28
    • View Profile
« Reply #2 on: May 25, 2019, 10:15:07 AM »
The pm-task process and its open files indicate that it is using Ruby for the background processing, which isn't the best choice for fast distributed computing. The Elixir programming language is inspired by Ruby, but is a lot faster and is built to natively take advantage of multiple cores, processors and computers.

Offline Bill Kelly

  • Software Developer
  • Camera Bits Staff
  • Member
  • *****
  • Posts: 59
    • View Profile
    • Camera Bits, Inc.
« Reply #3 on: May 25, 2019, 02:49:10 PM »
Hi Hayo,

Quote
5. Open a folder that I know contains some really big images (some even >2 GB).
6. Switch back to the previous smallish folder => tremendously long wait.

Thanks, interesting.

I haven't done performance tuning on PM with very large images in a while. So I'll definitely be looking into this.


Quote
All in all I'm pretty sure the sluggishness is caused by the background rendering holding up things.

By the way, I never see PM use more than two cores of my 8-core system. Are things really performed in multiple threads?

PM's central image production pipeline is heavily multi-threaded, and we have some nice tools for analyzing and visualizing its performance.

It is usually I/O bound. Even on a fast SSD, PM's threads go idle waiting for data from the disk.

Consider the following results from my last round of performance tuning:
Quote
Test case is a folder of 1,000 JPEG images of varying sizes, totaling 5.2 GB, located on an SSD partition.

A separate test program that does nothing but spawn threads (6C+6HT=12 thread CPU here) and read these files as fast as possible, takes a certain baseline amount of time:

  found 1000 files
  using 12 threads, read 5.226 GB in 19.665125 sec (272.115 MB/sec)

In approx. the same amount of time, 19.966 seconds, PM is able to do the following with that same set of images:

  read and parse image metadata
  load and decode jpeg
  scale bitmap to thumbnail size
  encode thumbnail to jpeg
  write jpeg back to PM disk cache
  render images to contact sheet
  scroll from beginning to end of contact sheet as images become ready
In other words, we're processing the data at the limit of how fast we can get it off the SSD. A lot of CPU cores are idle because we just can't get the data any faster.
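
For readers who want to measure a similar read-only baseline on their own drive, here is a minimal sketch in Python. It is not the test program referred to above (that program was not posted); the function names, the 1 MiB chunk size and the default of 12 threads are assumptions chosen to mirror the description.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(path, chunk_size=1 << 20):
    """Read one file to the end and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def measure_read_throughput(folder, num_threads=12):
    """Read every regular file in `folder` using a pool of threads.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    paths = [entry.path for entry in os.scandir(folder) if entry.is_file()]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        total_bytes = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return len(paths), total_bytes, elapsed


def report(folder, num_threads=12):
    """Print a summary in roughly the same shape as the quoted output."""
    count, total_bytes, elapsed = measure_read_throughput(folder, num_threads)
    mb = total_bytes / (1024 * 1024)
    rate = mb / max(elapsed, 1e-9)  # guard against a zero-length interval
    print(f"found {count} files")
    print(f"using {num_threads} threads, read {mb:.1f} MB "
          f"in {elapsed:.3f} sec ({rate:.1f} MB/sec)")
```

Calling `report("/path/to/folder")` prints the summary. Keep in mind that a second run over the same files is likely to be served from the operating system's file cache, so only a cold run says much about the drive itself.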


That said, I can see how the processing of very large images could lead to stalls in terms of prompt delivery of images when switching between contact sheets.

I'll need to investigate, and consider mitigation strategies.


Regards,

Bill





Offline Bill Kelly

« Reply #4 on: May 25, 2019, 02:52:18 PM »
Quote
I also performed some tests while one of the catalogs was being re-indexed in the background; this showed what I already suspected: the catalog (database) operations themselves are performed in a single thread, even if they operate on different catalogs. This feels like something that could (should) be improved upon.

This is indeed on our to-do list for Catalog.


Offline Bill Kelly

« Reply #5 on: May 25, 2019, 03:00:07 PM »
Hi esambo,

Quote
The pm-task process and its open files indicate that it is using Ruby for the background processing, which isn't the best choice for fast distributed computing.

Fortunately, we don't use Ruby for the heavy lifting, but rather as glue for calling out to C and C++.

But indeed: There are some subsystems used by Catalog which, while currently single-threaded, are intended to be multi-threaded in future.

Regards,

Bill



Offline Hayo Baan

« Reply #6 on: June 21, 2019, 12:28:58 AM »
Though in the 3328 notes you mentioned that you are now multithreading catalog tasks, the sluggishness I mentioned (especially around steps 5 and 6) is still there. Looks like building the previews is still hogging the contact sheet switch, instead of being cancelled/put in the background.

Offline Bill Kelly

« Reply #7 on: June 21, 2019, 10:01:23 AM »
Hi Hayo,

Quote
Though in the 3328 notes you mentioned that you are now multithreading catalog tasks, the sluggishness I mentioned (especially around steps 5 and 6) is still there. Looks like building the previews is still hogging the contact sheet switch, instead of being cancelled/put in the background.

The catalog worker threads in question relate only to database operations, and don't process pixels.

Does the step 5-6 sluggishness you describe only happen on search result contact sheets? Or does it also happen on regular folder-based contact sheets, where catalog is not involved?



Offline Bill Kelly

« Reply #8 on: June 21, 2019, 05:54:27 PM »
Update: I've been able to reproduce the step 5-6 issue using 2GB+ files, without Catalog being in the mix.

On the one hand, there are the anticipated architectural constraints: If the cache workers are all busy loading/decoding/scaling/color managing huge images, switching contact sheet tabs does not dispatch a squad of machete-wielding maniacs to cut short the lives of the busy worker threads, so there's a delay before images can be produced for the new contact sheet.

On the other hand, I'm seeing some instances where a worker thread waits on an I/O lock associated with a file already being worked on by another thread, which was unexpected, and can/should be improved. (But this will only help a little with the overall issue.)


Offline Hayo Baan

« Reply #9 on: June 22, 2019, 12:24:34 AM »
Hi Bill, thanks for looking into this! What you describe is indeed what I thought was going on. Not wanting to kill the threads working on creating previews makes sense (in most cases), I think. Just the fact that they then basically hog the whole system is a bit annoying and is perhaps something that can be improved/changed, e.g. by spawning more threads?

If I understand you correctly, it was possible that new threads got started for images that were already being processed (e.g. when switching back to a query/contact sheet with those same images). This should, of course, be prevented since that would consume resources unnecessarily.
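
One common way to avoid starting duplicate work for an image that is already being processed is to keep a table of in-flight requests and hand out the existing one. The sketch below is only an illustration of that idea in Python; `ThumbnailCache`, `request` and the `render` callback are hypothetical names, and nothing here is taken from PM's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThumbnailCache:
    """Hands out one Future per image path, so a path is rendered only once."""

    def __init__(self, render, max_workers=4):
        self._render = render            # hypothetical callback: path -> thumbnail
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()    # protects the table below
        self._futures = {}               # path -> Future (pending or finished)

    def request(self, path):
        """Return the Future for `path`, submitting work only on first request."""
        with self._lock:
            future = self._futures.get(path)
            if future is None:
                future = self._pool.submit(self._render, path)
                self._futures[path] = future
            return future

    def shutdown(self):
        """Wait for outstanding work and stop the worker threads."""
        self._pool.shutdown(wait=True)
```

Two contact sheets that ask for the same path receive the same Future, so switching back and forth does not queue the image twice. The sketch never evicts entries and also keeps failed results, which a real cache would need to handle.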

Offline Bill Kelly

« Reply #10 on: June 23, 2019, 10:16:26 PM »
Quote
Just the fact that they then basically hog the whole system is a bit annoying and is perhaps something that can be improved/changed, e.g. by spawning more threads?

If I understand you correctly, it was possible that new threads got started for images that were already being processed (e.g. when switching back to a query/contact sheet with those same images). This should, of course, be prevented since that would consume resources unnecessarily.

There are multiple constraining factors, currently:

- Fixed pool of image worker threads (based on cpu cores)
- Workers not designed to be interruptible
- On-screen images have priority, but off-screen image precache is scheduled when otherwise idle
- Multiple threads competing for I/O on a non-SSD drive is slow, so exclusive locking is used to serialize I/O

So, for example, simply spawning more worker threads doesn't help if they must ultimately wait for the I/O lock, owned by another thread loading a huge file.
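
The effect can be reproduced with a small simulation. This is a sketch under stated assumptions, not PM's code: a single global `io_lock` stands in for the exclusive I/O lock, `time.sleep` stands in for the disk read, and all names and durations are made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: one exclusive lock serializes every simulated disk read.
io_lock = threading.Lock()


def load_image(name, read_seconds, started_at, finished):
    """Simulate loading one image; the 'read' happens under the shared lock."""
    with io_lock:
        time.sleep(read_seconds)  # stand-in for reading the file from disk
    finished[name] = time.perf_counter() - started_at


def run(num_threads, big_read=0.5, small_read=0.01, num_small=4):
    """Queue one huge image, then several small ones.

    Returns a dict mapping image name to seconds elapsed when it finished.
    """
    finished = {}
    started_at = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        pool.submit(load_image, "huge", big_read, started_at, finished)
        time.sleep(0.05)  # give the huge read time to take the lock first
        for i in range(num_small):
            pool.submit(load_image, f"small{i}", small_read,
                        started_at, finished)
    return finished
```

Comparing `run(2)` with `run(8)` shows the small images finishing only after the huge read has released the lock in both cases. The extra threads start sooner, but they spend that time waiting.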

That said, I can see ways to improve the existing system (making I/O serialization somewhat more granular; etc.)

Nevertheless, multiple constraints must be eased to solve this.
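
As an illustration of the third constraint in the list above (on-screen images first, off-screen precache when otherwise idle), here is a minimal priority-queue sketch in Python. The names `ImageQueue`, `ON_SCREEN` and `PRECACHE` are hypothetical and do not describe PM's scheduler.

```python
import heapq
import itertools

ON_SCREEN = 0  # lower number is served first
PRECACHE = 1   # off-screen images, handled only when nothing is on-screen


class ImageQueue:
    """Orders pending images by priority, FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, path, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), path))

    def pop(self):
        """Return the next path to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

A queue like this only decides which image a worker picks up next. It does nothing about a worker that is already busy with a huge file, which is why priority alone does not remove the delay discussed in this thread.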



Offline Hayo Baan

« Reply #11 on: June 23, 2019, 10:57:17 PM »
Hi Bill,

I fully understand all the constraints and also why more threads may not help (much). I do often find, though, that running more threads than cores is still faster, since while one thread waits on e.g. disk access, another can still perform computations.

That said, perhaps the only real improvement you can make here is making the worker threads interruptible so they can be stopped if no longer necessary (because the images they were working on are no longer there). This might not be the easiest thing to do well though because e.g. a different query might still have (some of) the images in view and then killing the thread would mean having to redo (part of) the killed computation.
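
A rough sketch of what such interruptible work could look like, written in Python purely for illustration: each image job remembers which contact sheets still want it, and is cancelled only when the last one lets go, which addresses the "another query might still show the image" concern. `ImageJob`, `build_thumbnail`, the stage names and the `run_stage` callback are all hypothetical.

```python
import threading


class ImageJob:
    """Tracks which contact sheets still want an image; cancels when none do."""

    def __init__(self, image_name):
        self.image_name = image_name
        self._lock = threading.Lock()
        self._interested = set()          # ids of sheets showing this image
        self._cancelled = threading.Event()

    def acquire(self, sheet_id):
        with self._lock:
            self._interested.add(sheet_id)

    def release(self, sheet_id):
        with self._lock:
            self._interested.discard(sheet_id)
            if not self._interested:
                self._cancelled.set()     # nobody wants the result any more

    @property
    def cancelled(self):
        return self._cancelled.is_set()


def build_thumbnail(job, run_stage=lambda stage: None,
                    stages=("load", "decode", "scale", "encode")):
    """Run the stages in order, checking for cancellation before each one.

    Returns the list of stages that were actually run.
    """
    completed = []
    for stage in stages:
        if job.cancelled:
            break
        run_stage(stage)  # hypothetical hook doing the real work for a stage
        completed.append(stage)
    return completed
```

Cancellation here is cooperative and only takes effect between stages, so a single long stage (such as reading a 2 GB file) would still run to completion unless that stage also checks the flag while it works.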

Hmmm, tough one to crack :(

Without catalog, this actually wouldn't be such an issue (if at all) since then you normally spend longer with a single contact sheet. But with the catalog, I expect frequent changes of the images shown, especially when building up queries, so then the problem becomes more apparent.