arkay74's forum posts

#1 Posted by arkay74 (100 posts)

If anyone would like to hack the scraper, commenting out one line of code makes the search work much better. It seems that putting an " AND " between each term is what causes the problems.

I commented out line 66 in C:\Users\<your username>\AppData\Roaming\cYo\ComicRack\Scripts\Comic Vine Scraper\cvconnection.py:

Changing from:

searchterm_s = " AND ".join( re.split(r'\s+', searchterm_s) ); # issue 349

to:

# searchterm_s = " AND ".join( re.split(r'\s+', searchterm_s) ); # issue 349

It doesn't work as well as it did before, but at least now I don't have to come here and do a query for every 3rd book or so.
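
To make the effect of that line easier to see, here is a minimal sketch of what the query string looks like with and without the " AND " join. The helper name `build_query` and the `join_with_and` flag are made up for illustration and are not part of the scraper; the `strip()` call is also my addition, and only the `re.split` / `join` expression comes from the line quoted above.

```python
import re

def build_query(searchterm_s, join_with_and=True):
    """Hypothetical helper, not scraper code: mimics the quoted line."""
    # Split on runs of whitespace, as the original line does.
    terms = re.split(r'\s+', searchterm_s.strip())
    if join_with_and:
        # Behaviour with line 66 active: every term is required.
        return " AND ".join(terms)
    # Behaviour with line 66 commented out: terms are sent as typed.
    return " ".join(terms)

print(build_query("amazing spider-man epic collection"))
# amazing AND spider-man AND epic AND collection
print(build_query("amazing spider-man epic collection", join_with_and=False))
# amazing spider-man epic collection
```

With the join active, a single term the search backend fails to match is presumably enough to return nothing, which would fit what I am seeing.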

#2 Posted by arkay74 (100 posts)

@pikahyper: It always depends on how deeply the new search engine is integrated. If it's being used by both the website and the API, then it makes sense :)

#3 Posted by arkay74 (100 posts)

@pikahyper: Thanks, that's good to know! I will take a look.

#4 Posted by arkay74 (100 posts)

#5 Edited by arkay74 (100 posts)

The same problem seems to occur with colons: if you include the term that follows the colon, you get no results back.

If you search for "Avengers: X-Sanction" and enter "Avengers X-Sanction", the search fails. With only "Avengers" it works.

A more complicated example, for "Angry Birds Quarterly: Monsters and Mistletoe":

The API query sent by the scraper:

angry AND birds AND comics AND quarterly AND monsters AND mistletoe -> 0 results

angry AND birds AND comics AND quarterly -> 1 result, wrong volume, not "Monsters And Mistletoe"

Through the wiki search:

Angry Birds Quarterly Monsters and Mistletoe -> correct entry found

#6 Posted by arkay74 (100 posts)

At the moment it doesn't seem to find half of the stuff, for example plenty of Spider-Man books (Amazing Spider-Man Epic Collection, etc.). When I enter the same search query on the webpage, I find those books without any problems.

So as a workaround I have to come here, retry the search on the webpage and then paste the resulting ID into the CV Scraper search dialog to find the entry.

Any ideas what changed?

#7 Posted by arkay74 (100 posts)

I am experiencing something much worse. Whenever I use the ComicVine scraper I can't use the site anymore. It seems like my IP address gets blacklisted immediately on the whole CV site, because nothing else works here anymore: I keep getting timeouts whenever I try to load the page or the forum. If I change my IP, the site works again until I try to scrape a comic book, then it stops working again. Really weird.

#8 Posted by arkay74 (100 posts)

It has been like that for months now. Usually it's off by 1, though I have seen cases where it's more.

#9 Edited by arkay74 (100 posts)

Limiting on the server side is doable; constantly asking the API users to change their software just makes no sense. Next week you are going to tell them that now it should be 2 seconds, or that something else needs to be done. Taking matters into your own hands makes it tweakable. You wouldn't tell the website visitors to only click on 3 links per minute, would you? Same thing. Limit or queue it on your side.

What do you expect to gain from the 1s delay? It is going to take us longer to get our books tagged. Hence more users will be online at the same time which--again--is going to increase your load. This isn't a solution, you are just shifting the problem. You need a good strategy and not trial-and-error development. Load & performance tests don't hurt.
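
To illustrate what I mean by "limit it on your side", here is a minimal sketch of a per-API-key token bucket. This is only an assumption about how such a limit could be enforced, not a description of how the Comic Vine servers actually work. All names (`TokenBucket`, `handle_request`, `buckets`) and the numbers are made up for the example.

```python
import time

class TokenBucket:
    """Hypothetical limiter: `rate` requests/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per API key (illustrative in-memory store)

def handle_request(api_key, rate=1.0, capacity=5):
    """Return an HTTP-style status: 200 if allowed, 429 if over the limit."""
    bucket = buckets.get(api_key)
    if bucket is None:
        bucket = buckets[api_key] = TokenBucket(rate, capacity)
    return 200 if bucket.allow() else 429
```

The point of the sketch is that `rate` and `capacity` live on the server: if the policy changes from 1 request per second to something else, nobody has to patch and redistribute a client.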

#10 Posted by arkay74 (100 posts)

@clearmist: That's how I would have implemented it as well. The resource restrictions are on the server side and therefore should also be enforced there _in a sane way_. Otherwise the client applications will have to be modified each time the policy changes (as it has been for years now) and that just doesn't seem right.