Author Topic: Request to add the capability to utilize an image's catalog status.  (Read 1040 times)

Offline ejhutch

  • Newcomer
  • *
  • Posts: 45
    • View Profile
I'm making the following related feature requests for PM6+

Add the capability to utilize an image's catalog status (online, offline, unknown) to sort, filter, or otherwise categorize images displayed in a contact sheet and/or elsewhere in PM+ that would be useful.
Add the capability to display with each image and each image's thumbnail (and/or elsewhere in PM+ wherever it might be useful) an indicator of which catalog or catalogs an image is included in.

I am purposefully leaving out of my request the actual how of implementing these features, because it requires further consideration and discussion, but have identified a non-comprehensive list of some possible methods:

  • add the catalog status to the contact sheet Sort button
  • add the catalog status to a contact sheet filter button
  • provide the capability to search and filter the current contact sheet through a search box and / or a filter box that could search for and / or filter on image metadata (including catalog status) for all images included in the contact sheet
  • provide the capability to access the image's catalog status through a variable which could then be displayed in the text below the image thumbnail on the contact sheet and/or in text about the image when it is displayed in the image Preview or edit window
  • provide the capability to access which catalog or catalogs an image is included in through a variable or variable construct which could then be displayed in the text below the image thumbnail on the contact sheet and/or in text about the image when it is displayed in the image Preview or edit window
  • provide some other visual indicator of which catalog or catalogs an image is included in
  • provide the ability for users to enable or disable the display and/or use of some/any/all of the implemented capabilities


A few points of information to clarify my understanding of the current functionality:
  • PM+ manages groups of images by storing data about them in one or more "Catalogs": databases that contain references to the images and to where on the computer system each image is stored. Unlike some other image catalog software, it does not store the actual images themselves.
  • When an image is added to a catalog, PM+ "knows" that that image has been added to a catalog.
  • PM+ uses this "knowledge" to display the image's catalog status in a contact sheet.
  • As long as the image is referenced in an active catalog that PM+ knows about, it will display the image's catalog status as either "online" or "offline" whether or not the catalog is enabled for searching or add/modify.
  • If the image is not referenced in a catalog that PM+ "knows about" (i.e., it is only in a catalog that does not appear in the list of Active Catalogs, or it is not in any PM+ catalog at all), then it displays the image's catalog status as "unknown".
  • Images' catalog status is currently displayed on the image thumbnail and identified by a small filled-in circle near the lower right-hand corner of the thumbnail.
  • An image's Catalog status is currently identified by the color fill of the circle:
    Unfilled (background color): completely unknown to any catalog.
    Yellow: known to a catalog but currently offline or missing.
    Green: known to a catalog and currently online.
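
To make the three states concrete, here is a minimal Python sketch. It is only an illustration and not Photo Mechanic Plus code; the names CatalogStatus and catalog_status, and the two boolean inputs, are assumptions made up for the example.

```python
from enum import Enum


class CatalogStatus(Enum):
    """The three indicator states, keyed to the dot fill described above."""
    UNKNOWN = "unfilled"  # completely unknown to any catalog
    OFFLINE = "yellow"    # known to a catalog but currently offline or missing
    ONLINE = "green"      # known to a catalog and currently online


def catalog_status(in_catalog: bool, file_reachable: bool) -> CatalogStatus:
    """Hypothetical helper: derive the status from two yes/no facts."""
    if not in_catalog:
        return CatalogStatus.UNKNOWN
    return CatalogStatus.ONLINE if file_reachable else CatalogStatus.OFFLINE
```

A sort or filter on catalog status would only need this one value per image, which is the information the dot already conveys.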

Users have made prior requests for:
  • a PM+ query to find every picture that is not a part of a catalog
  • a tool that would query a folder and let me know what photos are not in a catalog

I believe that those requests are well founded, and useful, but ambiguously worded.

While there are technical methods to query the PM+ databases and PM+ catalog databases for a list of images that PM+ knows about, and to cross-reference that list against every image file that exists in the filesystem, those methods are complicated. They require knowledge and capabilities far beyond what PM+ is currently designed to do and far beyond what a typical PM+ user should need in order to use the software.

However, since PM+ can "know" an image's catalog status without making any external queries, I do believe that utilizing this information more broadly than just in the current display indicator would be very useful to the entire PM+ user base.

Additionally, some background information is provided below:

In a support request:

http://forums.camerabits.com/index.php?topic=14653.msg72117#msg72117

User Indy wrote: http://forums.camerabits.com/index.php?topic=14653.msg72109#msg72109

Quote
Well, I would welcome a tool that would query a folder and let me know what photos are not in a catalog.

But, for now, since a contact sheet "knows" the catalog status of each photo it would be great to filter or sort by that status.  Manual and time consuming?  Yes, but my current method is scrolling through a contact sheet looking at the catalog indicator with my eyes >>> very manual, very time consuming AND very prone to errors.

Thanks,

And I included the following in my reply http://forums.camerabits.com/index.php?topic=14653.msg72113#msg72113 after asking a few clarifying questions about whether PM+ contact sheets "know" if an image in the filesystem is part of a catalog:

Quote
I can actually see how it would be useful, while using the navigator and contact sheets to browse a directory hierarchy, to be able to sort by the dot (catalog) status, and even be able to have the catalogs an image has been added to be displayed somehow (and their online/offline status per catalog because that could be different on a per catalog basis), perhaps as a variable or some other way.

Kirk answered some of Indy's questions, and some of mine as well, but didn't touch on sorting the contact sheet by the image dot status (whether the image is in a catalog, online or offline, or unknown to any currently active catalogs). Though he sort of did in another thread here: http://forums.camerabits.com/index.php?topic=14242.msg70550#msg70550:

Quote
There is no such query [a query that will allow [a user] to find every picture in [user's] collection that is not part of a catalog/Collection].  The Catalog system can only perform queries on images that it knows about.  Images that are not known to the catalog are not in its database and cannot be queried.

The color of the dots (the catalog status indicator) varies between:

Unfilled (background color): completely unknown to any catalog.
Yellow: known to a catalog but currently offline or missing.
Green: known to a catalog and currently online.

Since there were no prompts to make a feature request, I've taken the initiative and written this one up.  Hope it is for something that will be useful.

Offline Kirk Baker

  • Senior Software Engineer
  • Camera Bits Staff
  • Superhero Member
  • *****
  • Posts: 24217
    • View Profile
    • Camera Bits, Inc.
Re: Request to add the capability to utilize an image's catalog status.
« Reply #1 on: December 02, 2021, 09:34:08 AM »
Quote
I'm making the following related feature requests for PM6+

Add the capability to utilize an image's catalog status (online, offline, unknown) to sort, filter, or otherwise categorize images displayed in a contact sheet and/or elsewhere in PM+ that would be useful.
Add the capability to display with each image and each image's thumbnail (and/or elsewhere in PM+ wherever it might be useful) an indicator of which catalog or catalogs an image is included in.

I am purposefully leaving out of my request the actual how of implementing these features, because it requires further consideration and discussion, but have identified a non-comprehensive list of some possible methods:

  • add the catalog status to the contact sheet Sort button
  • add the catalog status to a contact sheet filter button
  • provide the capability to search and filter the current contact sheet through a search box and / or a filter box that could search for and / or filter on image metadata (including catalog status) for all images included in the contact sheet
  • provide the capability to access the image's catalog status through a variable which could then be displayed in the text below the image thumbnail on the contact sheet and/or in text about the image when it is displayed in the image Preview or edit window
  • provide the capability to access which catalog or catalogs an image is included in through a variable or variable construct which could then be displayed in the text below the image thumbnail on the contact sheet and/or in text about the image when it is displayed in the image Preview or edit window
  • provide some other visual indicator of which catalog or catalogs an image is included in
  • provide the ability for users to enable or disable the display and/or use of some/any/all of the implemented capabilities

I think a sort method would be most appropriate, but it will be a client-side sort and will be slow if the number of images is large.  The catalog database itself doesn't know if an image is online or not based on a catalog database query.  The filesystem itself must be queried for each file being tested.

As for the second request, right-clicking on a photo and looking at the Catalog Info submenu should reveal which catalog(s) the photo has been added to.

Quote
A few points of information to clarify my understanding of the current functionality:
PM+ manages groups of images by storing data about them in one or more "Catalogs": databases that contain references to the images and to where on the computer system each image is stored. Unlike some other image catalog software, it does not store the actual images themselves.

Close, but not quite.  The catalog has a UUID for each image known to the catalog.  The actual path is known to the local Catalog State database (the one that sometimes needs reintegration) and the UUID is mapped to a path when it's time to show the image.
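
As a rough illustration of that UUID-to-path mapping, here is a minimal Python sketch using SQLite. The table name catalog_state, its columns, and the helper functions are all hypothetical; the actual schema of the Catalog State database is not described in this thread.

```python
import sqlite3

# Hypothetical stand-in for a local "catalog state" database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog_state ("
    " image_uuid TEXT PRIMARY KEY,"  # PRIMARY KEY gives an indexed lookup
    " last_known_path TEXT NOT NULL)"
)


def remember_path(image_uuid: str, path: str) -> None:
    """Record (or update) the last known local path for an image UUID."""
    conn.execute(
        "INSERT OR REPLACE INTO catalog_state VALUES (?, ?)",
        (image_uuid, path),
    )


def resolve_path(image_uuid: str):
    """Map a UUID to its last known path, or None if the UUID is unknown."""
    row = conn.execute(
        "SELECT last_known_path FROM catalog_state WHERE image_uuid = ?",
        (image_uuid,),
    ).fetchone()
    return row[0] if row else None
```

In this sketch the catalog side would hold only the UUID, and the path is looked up locally when it is time to show the image.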

Quote
When an image is added to a catalog, PM+ "knows" that that image has been added to a catalog.

Correct.

Quote
PM+ uses this "knowledge" to display the image's catalog status in a contact sheet.

The catalog does not have a direct reference but in the end, yes, the status is known (see above about the UUID to local path mapping).

Quote
  • As long as the image is referenced in an active catalog that PM+ knows about, it will display the image's catalog status as either "online" or "offline" whether or not the catalog is enabled for searching or add/modify.
  • If the image is not referenced in a catalog that PM+ "knows about" (i.e., it is only in a catalog that does not appear in the list of Active Catalogs, or it is not in any PM+ catalog at all), then it displays the image's catalog status as "unknown".
  • Images' catalog status is currently displayed on the image thumbnail and identified by a small filled-in circle near the lower right-hand corner of the thumbnail.
  • An image's Catalog status is currently identified by the color fill of the circle:
    Unfilled (background color): completely unknown to any catalog.
    Yellow: known to a catalog but currently offline or missing.
    Green: known to a catalog and currently online.

All correct.

Quote
Users have made prior requests for:
  • a PM+ query to find every picture that is not a part of a catalog
  • a tool that would query a folder and let me know what photos are not in a catalog

I believe that those requests are well founded, and useful, but ambiguously worded.

While there are technical methods to query the PM+ databases and PM+ catalog databases for a list of images that PM+ knows about, and to cross-reference that list against every image file that exists in the filesystem, those methods are complicated. They require knowledge and capabilities far beyond what PM+ is currently designed to do and far beyond what a typical PM+ user should need in order to use the software.

However, since PM+ can "know" an image's catalog status without making any external queries, I do believe that utilizing this information more broadly than just in the current display indicator would be very useful to the entire PM+ user base.

I agree up to the point of knowing if a file is actually on-line or not.  The filesystem has to be queried for each file being tested.  On a local SSD, those queries are very fast (<1ms), but on a NAS, those queries will take many milliseconds each.  The time to complete these queries will contribute to the time to sort a given contact sheet.

Iterating through the entire filesystem (of drives that are currently mounted) and then checking to see if any of a number of catalogs knows about them is not something that we're going to do.  The sorting/filtering idea can be accomplished though it won't be as high performance as a query to the catalog database only.
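
A minimal Python sketch of such a client-side sort follows. It is an illustration under stated assumptions, not the Photo Mechanic Plus implementation: known_paths stands in for "paths some catalog knows about", and status_of and sort_by_status are hypothetical names. The point it shows is that every cataloged file costs one filesystem check before the sort can run.

```python
import os

# Assumed ordering for the example: online first, unknown last.
STATUS_ORDER = {"online": 0, "offline": 1, "unknown": 2}


def status_of(path, known_paths):
    """Classify one file; cataloged files need one filesystem query each."""
    if path not in known_paths:
        return "unknown"
    return "online" if os.path.exists(path) else "offline"


def sort_by_status(paths, known_paths):
    """Check every path once, then do a stable sort on the status."""
    statuses = {p: status_of(p, known_paths) for p in paths}
    return sorted(paths, key=lambda p: STATUS_ORDER[statuses[p]])
```

The cost of the checks scales linearly with the contact sheet: N files at t milliseconds each take about N × t milliseconds. For 10,000 files that is under 10 seconds at less than 1 ms per query on a local SSD, and about 50 seconds if a NAS query is assumed to take 5 ms (the 5 ms figure is only an example value).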

-Kirk


Offline ejhutch

Re: Request to add the capability to utilize an image's catalog status.
« Reply #2 on: December 02, 2021, 11:14:08 AM »
Quote
Close, but not quite.  The catalog has a UUID for each image known to the catalog.  The actual path is known to the local Catalog State database (the one that sometimes needs reintegration) and the UUID is mapped to a path when it's time to show the image.

So, when a contact sheet is displayed, how/when does the system determine if the image is online or not and map the path to the UUID?

It sounds to me like it makes that determination by querying the path to the image file for each image as it is initially displayed on the contact sheet when the contact sheet loads, as the contact sheet display process iterates through the list of images to be displayed.

So, is the process something like this:
  • I request to load a group of images into a contact sheet through one of multiple methods currently available
  • The DB or some other source (like the filesystem) is queried to build the list of images to be displayed
  • A list of images to be displayed is built
  • The contact sheet tab and UI elements are displayed, with placeholders for each image in the list of images to be displayed
  • As that happens, I can start scrolling through the contact sheet or not
  • As image placeholders are displayed in the viewable area of the contact sheet, sources including the DB and filesystem are queried about each image to be displayed in order to retrieve the data about the image that the user has requested to be displayed. Then, the thumbnail for the image is displayed, and then the relevant retrieved image data is displayed for that image thumbnail.
  • This process seems to continue until the entire list of images to be displayed has been iterated through
  • if, at any point I scroll past the images that have already been loaded, the system either:
    • immediately requests and displays the thumbnails for the images newly scrolled into the viewable area and then each image's relevant data
    or
    • waits till the entire list of images has been iterated through and all relevant data has been loaded and then displays the image thumbnails and data for the images newly scrolled to in the viewable area
    • or some combination or some other behavior

That would also explain why I see the dots change from unknown to green or yellow when scrolling through a newly opened contact sheet.

It seems to me that when displaying a contact sheet of images that are cataloged, you still pull some amount of metadata about the image from the filesystem while iterating through the list of images to display. I wonder how much of that information could be pulled into the catalog and indexed to cut down on filesystem calls during that iteration and speed up loading the contact sheet when a filesystem is unavailable.

No wonder you're concerned that sorting/filtering would be slow for a large contact sheet with files from many different locations.

That must already slow down contact sheet display for images on a NAS or some other slower storage location.

Also, when PM+ is running, is it holding in memory a list of all known UUIDs? How or when does it compare a displayed image to its known UUID to see if it is known or not? Or is it an indexed query that can still be run very quickly even when iterating through a large list of images?

Is there a way to, perhaps, remember (dynamically build a list of) the folderpath and parentfolders up to the root of the path to an image as each image is displayed on the contact sheet, and then compare each subsequent image to the dynamically built list of known "online" or "offline" paths, so that if an entire NAS or directory tree is unavailable, for example, any image under that path would very quickly be able to be marked as "offline" without having to be queried for each sort request?

Also, as the list of paths and path parts is built, any subsequent images that have a path that does not match any of the existing paths could also very quickly be marked as unknown as soon as the first image is processed for display -- in addition to comparing it with a list of known UUIDs.

Does all of that make sense?

Offline Kirk Baker

Re: Request to add the capability to utilize an image's catalog status.
« Reply #3 on: December 02, 2021, 11:53:58 AM »
Quote
Close, but not quite.  The catalog has a UUID for each image known to the catalog.  The actual path is known to the local Catalog State database (the one that sometimes needs reintegration) and the UUID is mapped to a path when it's time to show the image.

So, when a contact sheet is displayed, how/when does the system determine if the image is online or not and map the path to the UUID?

If the image came from the catalog, the mapping of the UUID to the last known path is always performed as the metadata for the image is streamed from the Catalog server to the Photo Mechanic Plus client (they usually both run on the same system.)

Quote
It sounds to me like it makes that determination by querying the path to the image file for each image as it is initially displayed on the contact sheet when the contact sheet loads, as the contact sheet display process iterates through the list of images to be displayed.

So, is the process something like this:
  • I request to load a group of images into a contact sheet through one of multiple methods currently available
  • The DB or some other source (like the filesystem) is queried to build the list of images to be displayed
  • A list of images to be displayed is built
  • The contact sheet tab and UI elements are displayed, with placeholders for each image in the list of images to be displayed
  • As that happens, I can start scrolling through the contact sheet or not
  • As image placeholders are displayed in the viewable area of the contact sheet, sources including the DB and filesystem are queried about each image to be displayed in order to retrieve the data about the image that the user has requested to be displayed. Then, the thumbnail for the image is displayed, and then the relevant retrieved image data is displayed for that image thumbnail.
  • This process seems to continue until the entire list of images to be displayed has been iterated through
  • if, at any point I scroll past the images that have already been loaded, the system either:
    • immediately requests and displays the thumbnails for the images newly scrolled into the viewable area and then each image's relevant data
    or
    • waits till the entire list of images has been iterated through and all relevant data has been loaded and then displays the image thumbnails and data for the images newly scrolled to in the viewable area
    • or some combination or some other behavior


You mostly have that right, some things happen in different order and some things happen differently depending on whether the image can be accessed locally or not.

Quote
That would also explain why I see the dots change from unknown to green or yellow when scrolling through a newly opened contact sheet.

It seems to me that when displaying a contact sheet of images that are cataloged, you still pull some amount of metadata about the image from the filesystem while iterating through the list of images to display. I wonder how much of that information could be pulled into the catalog and indexed to cut down on filesystem calls during that iteration and speed up loading the contact sheet when a filesystem is unavailable.

All of it is put into the catalog when the image is cataloged, and all of it is used from the catalog when the images aren't locally available.

Quote
No wonder you're concerned that sorting/filtering would be slow for a large contact sheet with files from many different locations.

That must already slow down contact sheet display for images on a NAS or some other slower storage location.

Indeed it does.

Quote
Also, when PM+ is running, is it holding in memory a list of all known UUIDs?

No.  They're in a database (the local catalog state database).

Quote
How or when does it compare a displayed image to its known UUID to see if it is known or not?

Only during the UUID to local path mapping that happens once per image as they're delivered.

Quote
Or is it an indexed query that can still be run very quickly even when iterating through a large list of images?

Definitely indexed.

Quote
Is there a way to, perhaps, remember (dynamically build a list of) the folderpath and parentfolders up to the root of the path to an image as each image is displayed on the contact sheet, and then compare each subsequent image to the dynamically built list of known "online" or "offline" paths, so that if an entire NAS or directory tree is unavailable, for example, any image under that path would very quickly be able to be marked as "offline" without having to be queried for each sort request?

Yes, that information could be cached, but then there could be a caching problem due to the dynamics of file systems.  Should the cache be the final arbiter of this information forever?  What if the NAS is brought online?  What if a folder on the NAS is moved or renamed outside the purview of Photo Mechanic Plus?

Cache coherency is one of the biggest issues in making fast and accurate applications.  Flush the cache unnecessarily and you've lost speed.  Keep the cache around too long and you've lost accuracy.
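
The trade-off can be seen in a small Python sketch of a time-limited cache for "is this storage root reachable?". It is only an illustration of the general technique; RootStatusCache, its ttl_seconds parameter, and the injectable probe and clock are assumptions made up for the example, not anything from Photo Mechanic Plus.

```python
import os
import time


class RootStatusCache:
    """Remembers whether a storage root was reachable, for at most ttl seconds."""

    def __init__(self, ttl_seconds=30.0, probe=os.path.isdir, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.probe = probe    # the real (possibly slow) filesystem check
        self.clock = clock    # injectable so the behaviour can be tested
        self._entries = {}    # root -> (reachable, time_checked)

    def is_reachable(self, root):
        now = self.clock()
        entry = self._entries.get(root)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]   # fast, but may be stale
        reachable = self.probe(root)  # slow, but accurate right now
        self._entries[root] = (reachable, now)
        return reachable
```

If the NAS comes online five seconds after it was recorded as unreachable, this cache keeps answering "offline" until the entry expires. A shorter ttl_seconds reduces that window at the price of more filesystem probes, and no fixed value is right for every setup.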

Quote
Does all of that make sense?

Yes, and now that you know all of that information, you'll be well on your way to writing your own catalog system. ;)

-Kirk

Offline ejhutch

Re: Request to add the capability to utilize an image's catalog status.
« Reply #4 on: December 03, 2021, 03:07:07 AM »
Kirk,
Quote
Yes, and now that you know all of that information, you'll be well on your way to writing your own catalog system. ;)

Certainly! And I'll have it completed and over to you for implementation after the weekend! ;)

Seriously, though, thanks for taking the time to understand and respond to my questions.  I had a feeling that your answers would be something like what they were.

Quote
I agree up to the point of knowing if a file is actually on-line or not.  The filesystem has to be queried for each file being tested.  On a local SSD, those queries are very fast (<1ms), but on a NAS, those queries will take many milliseconds each.  The time to complete these queries will contribute to the time to sort a given contact sheet [...] The sorting/filtering idea can be accomplished though it won't be as high performance as a query to the catalog database only.

Then, hopefully, when it can be implemented, it can be done in such a manner that a user is warned of the impact on large contact sheets with images residing on slow storage.  I can certainly see from your design the lengths you go to in order to keep the system as speedy as possible given the heterogeneous nature of the user systems that PM+ runs on, and I, like everyone else, appreciate them.

Quote
Yes, that information could be cached, but then there could be a caching problem due to the dynamics of file systems.  Should the cache be the final arbiter of this information forever?  What if the NAS is brought online?  What if a folder on the NAS is moved or renamed outside the purview of Photo Mechanic Plus?

Cache coherency is one of the biggest issues in making fast and accurate applications.  Flush the cache unnecessarily and you've lost speed.  Keep the cache around too long and you've lost accuracy.

Yes, and some filesystems and OSes have started to address those issues better, but it's definitely tough to make that call for applications like PM+.

Thanks again for your open and detailed responses, and now that I have a much better understanding of how PM+ works, I'll tailor future questions more appropriately so you won't have so much to read and answer from me. :)