Avatar image for rick
#1 Posted by rick (119 posts) - - Show Bio

Many of you have noticed that our API rate limiting is stifling, to put it mildly. We heard you, and we have, yet again, changed the way we limit API use. You'll like this one, we're sure...

Previously:

There was a limit of 450 requests within a 15-minute window. If you went above that you were temporarily blocked. You could make all of those requests at any pace, anywhere from all within 1 second to spread across the full 15 minutes.

Now:

TL;DR: Space out your requests so AT LEAST one second passes between each and you can make requests all day. Go even a millisecond faster and you'll hit a brick wall REALLY HARD.

There is no limit on the number of requests; you are limited in how often you can make them. There are no hard numbers in this. It's more of a throttling algorithm that restricts aggressive apps and rewards those that are well behaved. If your app spreads out requests to at most one per second you will not have any problems and can make requests 24/7. If the time between requests is less than 1 second you will be restricted, and the more of these requests you make, the more likely you are to be blocked, and your allowed request rate will drop dramatically.
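To make the one-request-per-second rule concrete, here is a minimal client-side sketch of that pacing. This is my own illustration, not code from any Comic Vine tool; the class name and the commented usage URL are placeholders:

```python
import time

class MinIntervalPacer:
    """Ensures at least `interval` seconds pass between successive requests.

    The clock and sleep functions are injectable so the pacing logic
    can be tested without real waiting.
    """

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block just long enough to keep the minimum spacing, then record now."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()

# Usage sketch (the URL is a placeholder, not a real endpoint):
# pacer = MinIntervalPacer(1.0)
# for issue_id in issue_ids:
#     pacer.wait()
#     fetch("https://api.example.com/issue/%d" % issue_id)
```

The point of the injectable clock is that an app wrapping every API call in `pacer.wait()` can never go "even a millisecond faster" than the stated limit, regardless of how fast the surrounding code runs.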

Avatar image for arkay74
#2 Edited by arkay74 (92 posts) - - Show Bio

Yeah, that won't work too well at first, since the ComicVine Scraper makes more than 1 request per file as far as I know, and ComicRack only lets you set a delay between files (even 5 seconds seems to be too low). So code modifications are probably necessary before you see better behavior at your end.

Avatar image for arkay74
#3 Posted by arkay74 (92 posts) - - Show Bio

Even with a 15 second delay I am stuck on my last 10 files and keep getting the limit error message.

Would it have hurt to let the people using your API know beforehand that this was coming? As a software developer I find this very, very odd.

Avatar image for clearmist
#4 Posted by Clearmist (37 posts) - - Show Bio

I second your opinion, arkay74. However, I sympathize with how restricted CBS is in giving their developers enough resources and time to work on comicvine.com. In that situation I understand how edgework would jump into the latest task and try to complete it quickly without much thought to the proper way of doing things.

arkay, you are completely right about the ComicRack scraper performing multiple connections under one second. First comes the search, then a scrape for the currently selected cover, and third a scrape for the issue metadata (if you select the issue within one second of the initial search). cbanack said he's abandoned the project. I do not know if he'll do the required work and make his plugin wait one second between all connection requests.

Avatar image for rick
#5 Posted by rick (119 posts) - - Show Bio

@arkay74: Yes, and that's been a problem. Comic Vine's database usage is 3-5x that of GameSpot, and GameSpot has 10x the users. The reason for this is the scrapers. They are affecting the site quite negatively. The current algorithm will allow people to run these scrapers without affecting the rest of the site. We've got to find a balance here...

Avatar image for marv74
#6 Posted by Marv74 (7 posts) - - Show Bio

Will SCRAPE_DELAY=15 solve this? I've got over 60,000 comics to scrape. I tried to let it all run on auto during the night, only to find the computer had halted on it at about 200 comics :/

Avatar image for hyperspacerebel
#7 Posted by Hyperspacerebel (8 posts) - - Show Bio

When you say you'll hit a brick wall, does that mean that if a request is made < 1 second after the last you'll return some sort of access-denied response, or that the response will be delayed a second or two to keep clients in line? The latter would be wonderful, would work well with existing tools, and would save you guys the hits. The former, however, which is what I'm assuming is the case, still leaves users unable to do much until the tools are updated. And then when you guys update your policy again in 6 months, all the apps will have to be updated again. Whereas if it's on your end, you just set the delay to whatever you want, and everyone continues to automatically follow the rules.

Avatar image for marv74
#8 Posted by Marv74 (7 posts) - - Show Bio

SCRAPE_DELAY=8 seems to be okay. I was using 7 as instructed somewhere here on the forum, but that no longer works.

Avatar image for rick
#9 Posted by rick (119 posts) - - Show Bio

@hyperspacerebel: Think of this in terms of special relativity. The faster you go, the more effective mass you have, and the harder it becomes to go any faster. The algorithm is based on exactly that idea. You will not be allowed to go past the cosmic speed limit. (Sorry, I won't say what that is, so as not to feed into gaming the system.) Suffice to say, with a >= 1 second wait in between requests you will never have a problem.

Avatar image for hyperspacerebel
#10 Edited by Hyperspacerebel (8 posts) - - Show Bio

That is not what I'm asking. In any way shape or form.

Look, 99.99% of the people hitting your API are using a 3rd-party program, and have no control over whether the application follows your stated but not-programmatically-enforced rules. What I'm asking is whether I, or any other user of these applications, is going to get banned for using the programs without knowing whether they are hitting the API too many times or not.

If your webserver has a basic request limiter (which any competent nginx or apache person could set up in literally 2 minutes) that automatically keeps all requests from an IP address within the 1-second rule, then everything is fine and dandy: all of us people using 3rd-party scripts can rest easy knowing whatever limits you impose will just happen, regardless of how the app is programmed to work. If, on the other hand, you're simply tracking requests and summarily banning/restricting people who break the 1-second rule, then that's a bit of a problem for 99.99% of the people hitting your API, because they have NO IDEA whether they are breaking any rules or not (the code of the app is often gibberish to the person using it), and they won't be able to control, or even know to control, their usage.
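For what it's worth, the kind of server-side limiter described above really is only a few lines in nginx. This is a sketch, assuming the API is proxied through nginx; the zone name, burst size, and backend name are illustrative, not anything from Comic Vine's actual setup:

```nginx
# Allow at most 1 request/second per client IP. With burst (and no
# "nodelay"), short bursts are queued and released at 1 r/s instead
# of being rejected outright.
limit_req_zone $binary_remote_addr zone=api_per_ip:10m rate=1r/s;

server {
    location /api/ {
        limit_req zone=api_per_ip burst=5;
        limit_req_status 429;   # anything beyond the queue gets 429
        proxy_pass http://backend;
    }
}
```

With this in place, a client that never checks its own pacing still ends up obeying the rule: early requests are simply held back by the server, which is exactly the "delayed a second or two to keep them in line" behavior being asked for.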

I really want to stress that literally 99.99%+ of the people hitting the API are not programmers and don't know if the app they are using is breaking arbitrary rules or not. It's better for all parties involved if your system proactively rate-limits to whatever rule you want (1 r/s in this case) rather than relying on the script writers, who may or may not follow that rule and may just pass the penalties on to the users. If that is the case already, great.

Again, I'm going to reiterate it, because I keep getting the impression that you don't understand it: when you instruct us that "Suffice to say >= 1 second wait in between requests and you will never have a problem", that is useless advice for 99.99% of the people hitting your api. They went to a website, downloaded a program, and use it to tag their comics. They have no control over the way the program was written to hit the api, and most of them wouldn't know how to fix it if it was written in a way that was hurting you. So, what systems are in place for a normal user to not get banned/restricted? How can I, as a regular user of tagging programs, ensure that I am following your rules and making you happy and am not going to be punished?

Avatar image for johnkfisher
#11 Posted by JohnKFisher (1 posts) - - Show Bio

Hyperspacerebel is 100% correct. Is there not a way to rate-limit on your side so that everything just works for everyone, INCLUDING ComicVine?

Avatar image for cbanack
#12 Edited by cbanack (118 posts) - - Show Bio
@edgework said:

Suffice to say >= 1 second wait in between requests and you will never have a problem.

This does not appear to be working as you describe. I have adjusted my app to ensure that it NEVER talks to api.comicvine.com more than once every 2000 ms, and after a short while I still get blocked with the 'slow down cowboy' error message (i.e. the API-accessed-too-often problem). The only difference is that now I am not blocked for very long; if I wait a minute and try again, I am able to access the API again. But then a few minutes later, I am blocked again.

Several other contributors to the project have independently tried to make the same change that I made, and have had similar results (i.e. it doesn't work).

(And yes, I did search the code very carefully to make sure there isn't an API call I'm missing somewhere.)

Avatar image for cbanack
#13 Edited by cbanack (118 posts) - - Show Bio

FWIW, I also agree with the other commenters in this thread; evening out the load on your server(s) is your job, not mine. Using an arbitrary timeout that you expect API users to figure out and follow (on penalty of being beaten with the ban-hammer) is a very atypical way to offer a web API.

This isn't a matter of 'badly behaved' and 'well behaved' applications. When a typical software developer tries to use a web API conscientiously, he or she is worrying about the volume of requests that are being generated, not the timing of those requests. It is generally assumed that the server will queue up requests as necessary if too many happen to come in at (nearly) the same time.

Avatar image for clearmist
#14 Posted by Clearmist (37 posts) - - Show Bio

I second @hyperspacerebel's post; including his implication that @edgework is simply not reading or understanding the posts in this thread. Just look at post #5: edgeybaby replied to @arkay74's first reply, but not his second.

There are two long-term solutions: inserting a connection delay at the web server (Apache) level, or pestering Chris over at comicbookdb.com to finally write an API.

Avatar image for arkay74
#15 Posted by arkay74 (92 posts) - - Show Bio

@clearmist: That's how I would have implemented it as well. The resource restrictions are on the server side and therefore should also be enforced there _in a sane way_. Otherwise the client applications will have to be modified each time the policy changes (as it has been for years now) and that just doesn't seem right.

Avatar image for rick
#16 Posted by rick (119 posts) - - Show Bio

HTTP error codes 420 and 429 are meant specifically for this. (I don't know why we don't use them; that's on me, I thought we did.) Most API services rate limit requests, but it is up to the API user to limit themselves. For the server side to do it we'd need to keep a bunch of threads and TCP/IP connections stalled while we queue and process the requests. That's something we're not going to do...
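If those codes were actually sent, clients could handle them generically. A sketch, assuming only that the server may answer 420/429 and may include a Retry-After header; the function names are mine, not from any Comic Vine tooling:

```python
import time
import urllib.error
import urllib.request

def next_wait(retry_after_header, fallback_delay):
    """Pick how long to sleep after a 420/429: honor the server's
    Retry-After header if present, else use the caller's backoff value."""
    if retry_after_header is not None:
        return float(retry_after_header)
    return fallback_delay

def get_with_backoff(url, max_retries=5):
    """Fetch `url`, sleeping and retrying whenever the server answers
    with one of the rate-limit codes mentioned above."""
    delay = 1.0
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in (420, 429):
                raise  # not a rate-limit response; don't swallow it
            time.sleep(next_wait(err.headers.get("Retry-After"), delay))
            delay *= 2  # exponential backoff when the server gives no hint
    raise RuntimeError("gave up after %d rate-limited attempts" % max_retries)
```

This is the practical upside of a real 429: the client never has to guess the current policy, because every block comes with a machine-readable signal (and optionally a wait time) attached.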

BTW, I do read your posts; I just don't have the bandwidth to address everything specifically. I'm going to look over the logs this weekend to see if there's anything we can adjust here. But the current scheme will be the permanent scheme.

Avatar image for castleage1974
#17 Posted by castleage1974 (1 posts) - - Show Bio

I'm not a programmer, I'm just a user, but it's fairly obvious that "the current scheme will be the permanent scheme" is PR doublespeak. It's broken now. The current scheme can't be permanent if it's broken.

Avatar image for cncboy
#18 Posted by CnCBoy (9 posts) - - Show Bio

@marv74: I tried with 10 and got blocked again. I set it to 15 one hour later. I will see if it works correctly.

Avatar image for arkay74
#19 Edited by arkay74 (92 posts) - - Show Bio

Limiting on the server side is doable; constantly asking API users to change their software just makes no sense. Next week you'll be telling them it should be 2 seconds, or that something else needs to be done. Taking matters into your own hands makes it tweakable. You wouldn't tell website visitors to only click on 3 links per minute, would you? Same thing. Limit or queue it on your side.

What do you expect to gain from the 1s delay? It is going to take us longer to get our books tagged, hence more users will be online at the same time, which, again, is going to increase your load. This isn't a solution; you are just shifting the problem. You need a good strategy, not trial-and-error development. Load and performance tests don't hurt either.

Avatar image for marv74
#20 Posted by Marv74 (7 posts) - - Show Bio

@cncboy: I tried 7, 8, 10, 12, 15, 25... they all eventually fail. I don't know what to do anymore. This was a VERY useful resource and now it's ruined...

Avatar image for cbanack
#21 Posted by cbanack (118 posts) - - Show Bio

@marv74: @cncboy: Changing the scrape delay is not going to help you, because that is a per-comic delay, not a per-API-request delay. Just be patient. There will be a new version of Comic Vine Scraper that does not violate the new 1-second rule. I'm just waiting to hear back from @edgework about a bug that I am experiencing with it, and once it is working properly I'll release it for everyone. Keep an eye out for it over on the ComicRack forum.

Avatar image for marv74
#22 Posted by Marv74 (7 posts) - - Show Bio
Avatar image for roboman
#23 Posted by RoboMan (2 posts) - - Show Bio

Regardless of the rate limitations imposed, these limitations need to work. Although the comicvine scraper may well exceed the 1 request/sec limit imposed, the implementation of the rate limitation appears fundamentally broken.

Testing with a simple C# app suggests that keys are blocked after an unspecified amount of activity, and that blocks apply to specific requests rather than to the API in general.

e.g. calling http://www.comicvine.com/api/volume/4050-XXXX with a measured 4-second gap between requests resulted in a persistent 'slow down cowboy' error. The error only appears on this call; API requests for issue or issues are unaffected.

Avatar image for roboman
#24 Posted by RoboMan (2 posts) - - Show Bio

The new limits are MUCH more restrictive than before. It appears that on top of the 1/sec rate limit there is now an hourly limit of 200 requests per specific API resource. The hourly limit only resets after 60 minutes of inactivity on all API resources, i.e. a single call to Issues restarts the waiting period for every resource that has been used.

This is definitely not what was stated above. Although Comic Vine has the right to restrict access as they see fit, they should clearly state what these restrictions are.

Avatar image for cncboy
#25 Posted by CnCBoy (9 posts) - - Show Bio

@cbanack: Thanks for the info and for the work you put into this. I use APIs in my daily job and I know it can be really frustrating sometimes.

Avatar image for rick
#26 Edited by rick (119 posts) - - Show Bio

@cbanack @cncboy @roboman etc.

Sorry... At first I was all

No Caption Provided

...and then, while trying to prove a point, I found out that a developer (who I won't name so you guys don't string him up) had left a debug value of 5 seconds as the minimum space in between requests, and then I was all

No Caption Provided

We'll get the fix live on Monday. Sorry we can't do it now; there's a code freeze this week.

...and I'm sure many of you now are all

No Caption Provided

but when this goes live everything will be

No Caption Provided

and you'll all be

No Caption Provided

Just remember, there are no bugs in the Comic Vine API.

No Caption Provided

No Caption Provided

Avatar image for theotherjasonf
#27 Posted by theotherjasonf (1 posts) - - Show Bio

Great news! Thanks @edgework for taking the time to take a close look.

Avatar image for cncboy
#28 Posted by CnCBoy (9 posts) - - Show Bio

@edgework: There are no bugs, only undocumented features. :-)

I have a suggestion about the API. In an application of mine, we had a method in the API that returned a version number.

The implementing program could check that version to see if it still matches the hardcoded version it expects. Of course, the developer using the API could choose not to implement the check, but if he does, the program can offer the user the option to stop or to continue anyway. An API version check gives API users a chance to avoid hitting an API the wrong way. (Sorry if I butchered the English a little; it isn't my first language.)
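To illustrate the suggestion: a client-side version check might look like the sketch below. This is purely hypothetical — the Comic Vine API exposes no such version field or endpoint, and the URL in the comment is a placeholder:

```python
# Hypothetical version this client was written and tested against.
EXPECTED_API_VERSION = "1.0"

def version_matches(api_response):
    """Compare a hypothetical `version` field in a parsed JSON response
    against the version this client expects. Returns False if the field
    is missing, so an older server fails safe."""
    return api_response.get("version") == EXPECTED_API_VERSION

# Usage sketch (fetch_json and the URL are placeholders):
# resp = fetch_json("https://api.example.com/api/version/")
# if not version_matches(resp):
#     warn_user_and_ask_whether_to_continue()
```

The design point is exactly what cncboy describes: the check is optional, but an app that implements it can warn its users before it starts hitting an API whose rules have changed underneath it.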

Avatar image for fieldhouse
#29 Posted by fieldhouse (3 posts) - - Show Bio
Avatar image for rick
#30 Posted by rick (119 posts) - - Show Bio

Uh, the fix didn't make it into today's release but it is for sure queued up for tomorrow. Please refrain from stalking me over this...

No Caption Provided

Avatar image for marv74
#32 Posted by Marv74 (7 posts) - - Show Bio
No Caption Provided

And I still get the message and can't scrape... what's up?

Avatar image for solidus0079
#33 Posted by solidus0079 (3 posts) - - Show Bio

@marv74 said:
No Caption Provided

And I still get the message and can't scrape... what's up?

Are you using the latest Comicvine Scraper version? It was updated recently to work with these API changes.

Avatar image for tglass1976
#34 Posted by tglass1976 (15 posts) - - Show Bio

@marv74 said:

And I still get the message and can't scrape... what's up?

Did you install the new version of CVS that was released over the weekend?

Avatar image for marv74
#35 Edited by Marv74 (7 posts) - - Show Bio

@tglass1976 @solidus0079 Yes, I got it already. Oddly, overnight it went WAY over 200% and didn't stop... any relaxed rules for late-night scraping?

Avatar image for cncboy
#36 Posted by CnCBoy (9 posts) - - Show Bio

@marv74: I asked myself the same question. I check the API stats and stop when I see I've reached the limit, since I suppose I am expected to stop. Nothing appears to actually stop the scraping here.

Avatar image for marv74
#37 Posted by Marv74 (7 posts) - - Show Bio

@cncboy: That's fine by me. I left the computer on all night and had nearly 6,000 scrapes done when I woke up. Not bad :D

Avatar image for brokenxwing
#38 Edited by brokenxwing (1 posts) - - Show Bio

@rick: I realize it was a long time ago when you posted this, but it seems to be not true anymore. Also, the bans seem insanely long, like at least 24 hours if I make too many requests in a second. I've waited nearly 24 hours already and still NOTHING. Also, my API limit page says I'm fine and haven't hit any limit, so what's up with that? Why are you guys becoming so restrictive with this stuff? Over the last few years it seems you've gotten more and more restrictive. What's the deal?

Why are you guys so hell-bent on limiting the ability of users to EFFICIENTLY make use of your database? I mean, this guy's program is amazing; its abilities are just unparalleled. I've tried manually copying information from the site. Do you have ANY idea how bad that is? It took HOURS to update fewer than 100 issues. We're talking half a day, maybe. And it wasn't even able to get all the information, because it was too much work to do it.

Not to mention it would require me to literally open EVERY SINGLE issue in my browser, one by one, then go through the atrocious pages trying to copy the titles, which are hyperlinks and therefore not as easy to copy and paste. I just don't understand: why do you even have a database like this if you aren't going to allow the people who want to USE IT to do so in the best ways?

I'm completely ignorant about APIs. I know essentially nothing about them, but from reading the thread you had with the creator of CVS it seemed you had little interest in making your API better, let alone allowing more access to it. Why is it the consumer's job to worry about hitting your arbitrary new API limits, which get more and more restrictive and less and less usable? The fact that you haven't made ANY new posts about your updates in over a year is disconcerting as well...

So my point, I guess, is: what is the current API limit, how long is the ban if you exceed it, and why is it considered "malicious" in the first place if you accidentally do so? I was fine when I was using CVS, but I used a different program for a bit called ComicTagger, because it's better at remembering the thumbnails of covers after they're loaded once and better at changing file names based on the metadata of the files. But it seems not to have a setting for spacing scrapes at least 1 second apart. It got me banned MUCH quicker, and for at least 24 hours. Is there a way to get this ban removed now?

Avatar image for pikahyper
#39 Posted by pikahyper (16939 posts) - - Show Bio

@brokenxwing: rick hasn't worked here for a very long time, he is no longer employed by CBSi.

Moderator
Avatar image for imawindev
#40 Posted by imawindev (6 posts) - - Show Bio

@pikahyper: But who does then? Radio silence for months :(

Avatar image for pikahyper
#41 Posted by pikahyper (16939 posts) - - Show Bio

@imawindev: There's really only one engineer left and he has to work on multiple sites.

Moderator
Avatar image for imawindev
#42 Posted by imawindev (6 posts) - - Show Bio

@pikahyper: Thanks. So the API is abandoned. It doesn't make any sense to use it for my new project. Too bad!

Avatar image for pikahyper
#43 Posted by pikahyper (16939 posts) - - Show Bio

@imawindev: Oh, it's not abandoned. Most of the people that use the API don't seem to realize that this is a wiki; the site itself is the priority. The API is just something that we offer so that developers can utilize the wealth of data that is available on the site, and it is offered as-is. Right now the engineers might be short-handed, but they are still working on the site. There is a new version of the wiki platform in the works, and it is all new, from scratch I believe, but since they are short-handed it is taking longer. For Comic Vine specifically, the new platform and keeping the site running smoothly are the priority, but I know they still have more plans for the API; we just don't know when that will happen, unfortunately. For now the engineers are focused on their work instead of chatting on the site, and that's a good thing. Hopefully they will finally hire some more engineers and the updates can speed back up.

Moderator
Avatar image for imawindev
#44 Edited by imawindev (6 posts) - - Show Bio

@pikahyper: Thanks for the insight! The main problem is that the rate limiting is way too restrictive, there is no current information on how exactly it works, and there are no answers to questions about it.

The following is what I would like to do, but I have no idea if it's okay with the current rate limiting. According to postings in this forum it's not, but those were a long time ago, so I just don't know. Here we go:

I need to get the details for all issues in a volume. Sometimes the volume contains just one issue, but most times it's way more; e.g. The Walking Dead has 164 issues. First, rate limiting won't let me call /api/issue/ 164 times at once, and second, if there has to be a delay between the calls it will take forever. How am I supposed to do this?
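One way this could be much cheaper, assuming the issues list endpoint supports a volume filter and limit/offset paging as the public API docs describe (treat the URL shape, the parameter names, and the 100-item page size as assumptions to verify against the live documentation): The Walking Dead's 164 issues would then come back in two paged requests instead of 164 individual calls.

```python
import time

PAGE_SIZE = 100  # assumed maximum page size for list endpoints

def page_offsets(total_results, page_size=PAGE_SIZE):
    """Offsets needed to page through `total_results` items
    (always at least one page, even for an empty volume)."""
    return list(range(0, max(total_results, 1), page_size))

# Sketch of the fetch loop. fetch_json is a placeholder for your HTTP
# helper; the endpoint and parameters are assumptions from the docs:
#
# first = fetch_json("https://comicvine.gamespot.com/api/issues/",
#                    params={"api_key": api_key, "format": "json",
#                            "filter": "volume:%d" % volume_id,
#                            "limit": PAGE_SIZE, "offset": 0})
# total = first["number_of_total_results"]
# results = list(first["results"])
# for offset in page_offsets(total)[1:]:
#     time.sleep(1)  # stay under the 1 request/second rule
#     page = fetch_json(..., offset=offset)
#     results.extend(page["results"])
```

Under those assumptions, a 164-issue volume needs only offsets 0 and 100: two requests and about one second of delay, rather than the three minutes of one-per-issue calls discussed below.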

Avatar image for pikahyper
#45 Posted by pikahyper (16939 posts) - - Show Bio

@imawindev: I have no experience with APIs, but from what was laid out in the original post it seems like you just have to delay the requests so you don't make more than one per second. That doesn't seem like much of a problem.

Moderator
Avatar image for imawindev
#46 Posted by imawindev (6 posts) - - Show Bio

@pikahyper: But 164 calls will almost use up the rate limit of 200 calls per hour, and adding a delay of one second means getting the details for all issues will take almost 3 minutes. No user will accept that.

Avatar image for pikahyper
#47 Edited by pikahyper (16939 posts) - - Show Bio

@imawindev: Three minutes isn't that bad for that amount of data. The API is a free resource, and users of these third-party apps need to be grateful for what can be given and use it in moderation, as we don't have the resources to allow more than is available. Without these limitations, the API calls regularly take down the entire site, because too many third-party users take the service for granted and flood the API with requests. We tried it for a little while and it sucked.

Moderator
Avatar image for imawindev
#48 Edited by imawindev (6 posts) - - Show Bio

@pikahyper: A simple solution would be to add a method to the API that gets the details for all issues in a volume in a single request.

Avatar image for pikahyper
#49 Edited by pikahyper (16939 posts) - - Show Bio

@imawindev: That wouldn't be ideal, though, as it would run into timeout problems; we have volumes with thousands of issues in them. In its current form the API is fairly basic and can't handle complex or time/resource-intensive operations. It is still an information wiki; third-party apps have just chosen to use it as the backbone for their collection applications (as free alternatives to pricey collection apps that can afford more resources). The API may be able to handle small collections, but it can't handle large ones. I doubt the API was developed with collection software in mind, let alone multiple applications:

Now that you've found this motherload, do the right thing. Don't just steal it and build some crappy collection app.

- from the API page

Moderator