I'll come forward and admit it: my process is one of those that would be considered very aggressive, but there is a good explanation for it. It takes the data in ComicRack that was scraped with the ComicVine scraper and queries ComicVine looking for missing issues in a collection. There are built-in scripts for ComicRack that will find gaps within volumes, but nothing that will find issues missing from the end of a volume, or cases where the "issue number" is something other than a number. This is especially useful for volumes that have been on hiatus for a while and suddenly become active again, and for those released infrequently or irregularly. For this you need to query ComicVine to get a current list of issue numbers in a given volume and compare it against the local collection.

It's an extremely useful tool for active collections, but it can generate A LOT of queries to the API, especially if a user is requerying the entire collection. Here's why: at the time I wrote it, the ComicVine volume ID was not stored in ComicRack in any usable way, so I had to query every issue just to find out which volume it belonged to. I store select volume information (basically the title, start year, and associated issue numbers) in local cache files, and the script defaults to incremental updates, which typically don't involve a huge amount of querying. I also suggested people only do full updates every 6 months or so, because doing it more often is simply overkill.
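To make the core idea concrete, here is a minimal sketch of the compare-and-cache step; it's not the actual script, and it assumes the `requests` library, a hypothetical `cv_cache` directory, and the public ComicVine volume endpoint:

```python
import json
import os

import requests

API_BASE = "https://comicvine.gamespot.com/api"
CACHE_DIR = "cv_cache"  # hypothetical location for the local cache files


def fetch_volume(volume_id, api_key):
    """One query per volume: pull the title, start year, and the
    current list of issue numbers from ComicVine."""
    resp = requests.get(
        "%s/volume/4050-%s/" % (API_BASE, volume_id),
        params={"api_key": api_key, "format": "json",
                "field_list": "name,start_year,issues"},
        headers={"User-Agent": "missing-issues-sketch"},
    )
    resp.raise_for_status()
    results = resp.json()["results"]
    return {"name": results["name"],
            "start_year": results.get("start_year"),
            "issues": [i["issue_number"] for i in results["issues"]]}


def missing_issues(volume_id, api_key, local_numbers):
    """Compare ComicVine's issue list against the local collection,
    reading from the cache when a cache file already exists."""
    cache_path = os.path.join(CACHE_DIR, "%s.json" % volume_id)
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            volume = json.load(fh)
    else:
        volume = fetch_volume(volume_id, api_key)
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(cache_path, "w") as fh:
            json.dump(volume, fh)
    # Set difference works even when "issue numbers" aren't numbers.
    return sorted(set(volume["issues"]) - set(local_numbers))
```

Treating issue numbers as opaque strings rather than integers is what lets this catch the "issue number is something other than a number" cases the built-in gap finders miss.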
Sometime over the last year, custom fields were implemented in ComicRack, and the scraper now stores the volume ID for each issue. Everything that doesn't have this field will unfortunately have to be rescraped to create it, but that's a one-time pain. For a large collection, this change will cut the number of queries required for a full build from tens of thousands down to potentially only hundreds (essentially, one per volume). I've been looking at reducing this further by using the last-modified date on the volume and skipping volume queries for volumes that haven't been updated within a given amount of time; this has proven problematic, but it's still on my radar. I've also implemented an internal speed limit so users can define a delay between queries, lessening the load on ComicVine, and users will now have to get their own API key. I'm hoping to complete the changes and release the new version within the next week or so, at which point I'll likely ask to have my current API key disabled and changed in order to force people to get their own; like previous incarnations of the scraper, my key is hard-coded into the process.
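The speed limit itself is simple. Here is a minimal sketch of the idea (the names `throttled_queries`, `fetch`, and `delay_seconds` are illustrative, not the script's actual API):

```python
import time


def throttled_queries(volume_ids, fetch, delay_seconds=2.0):
    """Run one query per volume, sleeping between requests so the load
    on ComicVine stays light; delay_seconds is the user-defined limit."""
    results = {}
    for count, volume_id in enumerate(volume_ids):
        if count:  # no delay needed before the very first query
            time.sleep(delay_seconds)
        results[volume_id] = fetch(volume_id)
    return results
```

With the volume ID now stored per issue, a full build becomes one throttled pass over the distinct volume IDs in the collection instead of one query per issue.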
I also support the 800/60 limit, as I think this would be easily sufficient for most of my users. Even the several thousand volumes that might be in an extremely large collection are manageable if users add a few seconds of delay between queries to lighten the load. While I don't use the script much myself anymore, and I have no idea how many people do use it (I don't think many), I continue to maintain it for those few who do.
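A back-of-the-envelope check, reading 800/60 as 800 queries per 60 minutes (my assumption) and using purely illustrative numbers:

```python
# Illustrative numbers only: a very large collection, modest delay.
volumes = 3000                 # one query per volume for a full build
delay = 5.0                    # seconds between queries
queries_per_hour = 3600.0 / delay                  # 720, safely under 800
hours_for_full_build = volumes / queries_per_hour  # ~4.2 hours, one time
print(queries_per_hour, hours_for_full_build)
```

And that's the worst case; incremental updates touch far fewer volumes.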