Cloud Four Blog

Technical notes, War stories and anecdotes

Mobile Device Detection Results

In a previous post I presented performance numbers for mobile device detection using WURFL and Device Atlas. Our numbers were generated on a data set of 1,572 unique user agents collected as part of our mobile device concurrency test. At the time I was looking for information about the quality of our results: did we have accurate classification of mobile devices for the purposes of our test? I unintentionally wandered into performance for a while, but now I’m back to the question of quality.

Here, once again, are the results for processing 1,572 unique user agents:

Method                  Time (seconds)   Mobile   Non-Mobile
WURFL Old API           1082             711      861
WURFL New API           20.8             1090     482
Device Atlas            1.2              527      1045
Mobile Device Detect    1.3              684      888

There are obviously some differences between the results for each detection method. Interestingly, the old and new WURFL APIs use the same device database (the wurfl.xml file), yet yield considerably different results. So what’s going on here? Is there a clear quality winner in the group?

For the time being, we are using the old WURFL API as our classifier. The new WURFL API is giving us a lot of false positives – many different user agents are mapping to the wireless device “amoi_e72_ver1,” which could be some kind of fall-through condition. It’s possible that I’m doing something wrong, but the API usage is pretty simple: instantiate a class and call a method with the user agent as a parameter.

Device Atlas is interesting to me and it’s quite possible we’ll be shifting to that for device detection. It’s certainly much faster than WURFL in our environment. While there are a few misses, there is really just one issue I’m seeing, and that may not be a real issue at all. The Opera Mini user agents are not being classified as mobile. Is this because we’re talking to a transcoder? Or is it a mistaken classification? I’m not sure, but the other methods all classify these user agents as mobile devices. Take this user agent for example. Should this be classified as a mobile device? Device Atlas says no; WURFL says yes.

Opera/9.50 (J2ME/MIDP; Opera Mini/4.0.10406/298; U; de)

The mobile device detection script actually does a pretty good job. Rather than use a comprehensive user agent database, this script looks for specific, known string fragments in the user agent string. That’s also a bit of its undoing though. For example, I’m seeing a bunch of false positives for user agents containing the word “Java.” I think these false positives could be eliminated by also checking the HTTP_ACCEPT header for mobile content types.

Java/1.6.0_05
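The fragment-matching approach, plus the HTTP_ACCEPT check suggested above, can be sketched in a few lines. This is a minimal illustration, not the actual script: the fragment and content-type lists are abbreviated examples, and a real detector would use a much longer list.

```python
# Minimal sketch of substring-based mobile detection, with an
# Accept-header check to weed out false positives like "Java/1.6.0_05".
# The fragment and content-type lists here are illustrative, not exhaustive.

MOBILE_FRAGMENTS = ("midp", "opera mini", "symbian", "blackberry",
                    "windows ce", "smartphone", "up.browser")
AMBIGUOUS_FRAGMENTS = ("java",)  # matches desktop JVMs too
MOBILE_ACCEPT_TYPES = ("application/vnd.wap.xhtml+xml", "text/vnd.wap.wml")

def looks_mobile(user_agent, http_accept=""):
    ua = user_agent.lower()
    if any(f in ua for f in MOBILE_FRAGMENTS):
        return True
    if any(f in ua for f in AMBIGUOUS_FRAGMENTS):
        # Only trust an ambiguous fragment if the client also
        # advertises a mobile content type in its Accept header.
        accept = http_accept.lower()
        return any(t in accept for t in MOBILE_ACCEPT_TYPES)
    return False
```

With this extra check, a bare `Java/1.6.0_05` agent is rejected unless the request also accepts WAP content types.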

The implication here is that device detection is not guaranteed to be 100% accurate. User agent strings are highly variable and non-standardized, so there’s a bit of artistry involved.


19 Comments on “Mobile Device Detection Results”

  1. Hi John,
    Good comparison, and of course I’m very pleased to see that you are confirming our speed tests of the PHP API. In fact, with compiled languages such as Java and .NET you can get even faster results.

    OK, enough of the sales pitch and self-promotion. ;)

    The Opera Mini is a tricky one. If you ask Opera, they will tell you that you should treat Opera Mini users as “desktop” users, because that’s the quality of the experience they will get thanks to the excellent browser they provide.
    If you ask someone else, I’m sure you will get mixed answers.

    My personal view is that it’s mobile, and it doesn’t matter if the quality of the experience is similar to what you can get on a desktop PC; what matters is the context, and when I’m using Opera Mini my context is different from when I’m sitting at my desk, as I am at this moment. Of course, what I just said contradicts what DeviceAtlas says; we’re working on it, and hopefully DeviceAtlas and I will soon agree. ;)

    Back to our API: I am not sure what you are logging in your test platform, but if you look at the phpDoc that comes with our API you will see early on that we recommend looking for other headers when you see Opera Mini.
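    The kind of header fallback being described here can be sketched roughly as follows. This is an illustrative sketch, not vendor code; X-OperaMini-Phone-UA is the header Opera Mini’s transcoder uses to carry the handset’s original user agent, and the handset UA in the example below is made up.

```python
# Sketch of falling back to the handset UA when the request comes
# through Opera Mini's transcoder proxy. X-OperaMini-Phone-UA carries
# the original device's user agent; if it is absent we keep the proxy UA.

def effective_user_agent(headers):
    ua = headers.get("User-Agent", "")
    if "opera mini" in ua.lower():
        return headers.get("X-OperaMini-Phone-UA", ua)
    return ua
```

    Running the device lookup against the effective UA, rather than the transcoder’s own UA, is what lets the detector identify the real handset.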

    We have also gone a step further. Since we understand developers don’t want to bother with the intricacies of HTTP request headers, we have a new API in the works that should take care of most if not all of these issues, while also increasing accuracy. You can get a preview on our Web site, here. It’s a beta and we know it’s not perfect; in fact we are preparing an update to it that is very interesting, I think. Of course, I’m just an HTTP headers junkie, so I can’t promise it will be AS exciting for everyone, but it will certainly relieve developers of some of the complexities and peculiarities of mobile.

  2. John Keith says:

    Hi Andrea,

    Thanks for the doc reference! I couldn’t remember where I’d read about the alternative headers for Opera Mini, but now I know where to go look again. I tried googling for it this week, so thanks for pointing me in the right direction :)

    And the beta idea sounds like something to follow up on too. I’ll take a look here in a couple days as we get ready to release everything.

  3. Luca Passani says:

    Keith, I think your test is flawed by some serious issues. Here they are:

    – the WURFL API (both the old and, even more so, the new one) does a lot of computation to avoid false positives. But those computations are *cached* (and I repeat, *cached*), which means they will perform blazingly fast from request number two onward, for the million requests that come after. Of course, this is not a bug. It’s how the API is built, in order to give developers the level of control and configurability they may need along the way without compromising performance. In short, your comparison is not fair. It’s more than unfair. It’s totally flawed.

    – as far as false positives are concerned, are you using the web patch when you run your tests? The web patch is an additional mini-WURFL which can extend WURFL’s knowledge and teach the framework how to recognize web browsers. Most people are not using WURFL to match web browsers, so they will use WURFL without the web patch. Those who do will just need to include the web patch. Of course, if you throw web browser UA strings at a WURFL installation without the web patch, you will get false positives. If you did not use the web patch, your statement about false positives is misguided.

    – Opera Mini is typically running on a mobile device. WURFL will treat Opera Mini as a mobile device by default, but will also tell you (through the “unique” capability) that the UA string is not telling you the whole story. This gives developers who care a chance to make the extra effort to figure out the real device (the “transcoder_ua_header” capability will give you a hint about where to look).

    Also, what about transcoders? Did you know that the WURFL API goes out of its way to protect your site from transcoders?

    Now, please re-run your tests after you have fixed them and blog about the results. The WURFL API contains years of experience in analysing UAs of different kinds; we spent a lot of time making it good, fast, and flexible for all needs (in addition to being open source). Your results are not a reflection of what the WURFL API can really deliver.

    Luca

  4. John,

    Nice to see the comparison, but I’d be more interested in accuracy and the methodology used. Also, it might be interesting to see how MDBF (http://mdbf.codeplex.com/) performs (MDBF is compiled from different sources, dotMobi and WURFL among them).

    I used to perform a lot of such tests in the past, but the main focus was quality: it doesn’t mean much if the API is brutally fast but gets the wrong data.

  5. Chris Abbott says:

    The speed comparison is pointless unless you’re comparing apples with apples. In this case I think you should go into much more detail about the deployment: for instance, which cache (if any) did you use for each of these? How much of the test time consisted of loading the initial file into cache? (WURFL takes a while to load into cache, but if it’s a persistent cache then your results would have been radically different the next time.) Were you using Multicache for the first WURFL result?

  6. John Keith says:

    Hi Luca,

    As usual, you make good points (I am subscribed to wmlprogramming for a reason – saw your note). You may have noticed that we are continuing to use WURFL for our tests, so this is not an indictment of WURFL – just some experience to share, rightly or wrongly.

    We are indeed using the web browser patch, and only see the false positives using the new API. We do not see the false positives using the old API. It was just a question for me as to what I may be doing wrong, as I said.

    WURFL correctly (or so I believe) classified Opera Mini as a mobile device; it is Device Atlas that made the decision the other way. So I think you may have misunderstood me there.

    And finally yes, running 1,500+ unique UAs through is something that will not take advantage of a cache. A more typical use-case will be to operate on repeated UAs, so I get that.

    I wasn’t seeking a speed test, but given what I encountered in this particular example I found it interesting to share.

    It’s funny because I’m a strong WURFL proponent, so my intention was to talk about what we’re doing, and not to attack WURFL!

    Peace.

  7. John Keith says:

    Hi Chris,

    I agree, a little more information would be a good idea. I did indeed spend some time looking at what was happening.

    First we are using the wurfl.xml file from 2009-03-12, with the web_browsers_patch.xml file from 2009-03-06.

    For the original WURFL API, we have multicache enabled, and we’d previously hit update_cache.php to create the 13,000+ files in the multicache directory. I did run the test several times and published the best number I got.

    For the new WURFL API, we are using file-based Cache and Persistence as configured in the wurfl_config.xml file. The FILE_PERSISTANCE_PROVIDER has 13,000+ files in it, and the FILE_CACHE_PROVIDER has about 3,800 files. Again, I made probably a dozen runs over the data as I tried to understand my results. The number I chose was the best run time. And the new API’s runtime wasn’t bad at all, really. It was much faster than the old API and totally usable.

    I did not use any persistent memory caching for either WURFL or Device Atlas, and I noted that. The test performed by operating on a batch of unique, non-repeating UAs is not typical (as per Luca’s comments), but the installation is typical of the increasing number of people who are beginning to perform device adaptation.

    Cheers,
    John

  8. Luca Passani says:

    John, I am happy to hear you are a WURFL supporter. As far as I can tell, this is not obvious from your blog post, though. What I see is:

    – a test which depicts the WURFL API as 20 to 1000 times slower than other APIs.

    – the suggestion that because of those results, you may be shifting to another API.

    not exactly what I would call a “strong proponent” of the WURFL API, don’t you think?

    Anyway, I hear your points about supporting WURFL, so I would love if you could:

    – re-run the tests with the second invocation of the API (in order to allow the built-in cache to kick in) and publish the result.

    – post the list of UAs you are using for your tests, so that I can see what those false positives are (if you don’t want to post it for any reason, feel free to send it to me privately so that I can see what is failing).

    The new WURFL API can fine tune device detection at multiple levels, so I am sure we can improve on what we already have.

    thank you

    Luca

  9. John Keith says:

    Hi Luca,

    Is there a real competition between the APIs that I need to understand here? I consider switching because, for this simple test I want fast, accurate mobile vs. non-mobile discrimination. For our purposes, WURFL with the old API gives the best quality results; Device Atlas was the fastest. I look to see which tool best fits my particular need and adapt accordingly.

    I have re-run the tests, many times, before posting, but in the interest of fairness I shall do so again.

    And I’ll post the UAs here as well.

    Thanks for talking about it here.

    – John

  10. Luca Passani says:

    What wasn’t clear to me until you posted your answer to Chris was that you were talking about the PHP API. This changes a few things (caching works differently with PHP as compared to Java). I imagine that running the tests with MEMCache would not be very significant (all APIs would perform the same).

    What is more interesting to me is understanding why the new API isn’t cutting it for you (compared to the old one) in terms of device detection.

    I would be grateful if you could help us get to the bottom of this. The new API has only recently been released, and it may need some fine tuning, yet I am confident that it will eventually surpass anything anyone has had before in terms of precision and control.

    Cheers

    Luca

  11. John Keith says:

    For anyone interested, here is a link to the user agents that have come to our mobile concurrency test:

    In the order we processed them while testing:
    http://www.cloudfour.com/mobile/user_agents.php

    Or sorted by UA:
    http://www.cloudfour.com/mobile/user_agents.php?sort=UA

    Regards,
    John

  12. Luca Passani says:

    John, I have asked my team to look into this. I am curious to see the results we get.
    After that, we will improve the web patch and re-run the test to see the difference.
    If necessary, we can also update the matcher strategies used by the API and go for 100% coverage. My take is that this is probably not necessary, because the bulk of unrecognised UAs is probably to be found among spiders and bots.

    Question: are you telling me that, in your application, you cannot use any caching? Why is that?

    Luca

  13. John Keith says:

    Hi Luca,

    I probably will end up adding some memory-based caching to our test environment, though with our low volume it hasn’t been the highest thing on my priority list. Now, of course, it becomes quite interesting to make the comparison. Due to the way we constructed our initial test environment, we ended up with batch processing of unique, non-repeating UAs, but going forward the plan is to perform lookups when the connection is established. I expect that caching makes much more sense now.

    The other thing we are trying to understand is what considerations, if any, need to be made by the many small website operators who are adding device adaptation to their sites. Our experience with these people is that they often do not have enough control over their server environment, or perhaps lack the technical capability, to implement an effective caching strategy. This is why we are looking at the PHP APIs (the most prevalent in our world), and why we are looking at constrained server environments as real-world examples.

    It’s probably true that beginners need to focus on functionality over performance, which would tend to push the performance aspects to the back burner. But we do see 1) some beginners that are high volume already and 2) people who have become successful with their mobile web application and are facing refactoring work to address subsequent performance issues. Server-side caching opportunities are part of that equation, as are improved techniques for delivering web content to mobile devices (our main focus at Cloud Four).

    Cheers,
    John

  14. Hi,
    About the 1,580 UAs you tested, how many are mobile?
    This is important for deciding which of the APIs you tested is best.
    If I want to choose an API, sure, I want the fastest, but I also need to know which one detects UAs with more accuracy.

    Idel

  15. John Keith says:

    Hi Idel,

    We currently have 641 UAs marked as mobile, using WURFL as the identifier. WURFL and Device Atlas both work well, with basically one difference around Opera Mini that is well understood and not an actual issue for us.

    Our results are freely available here:
    http://www.cloudfour.com/mobile/summary.php

    – John

  16. I must chime in here. We are using WURFL, currently the old API, and the speed is completely OK. Device detection takes on average 25ms per device; getting the right page etc. from the database takes another 25ms.

    I was trying out the new API and the average detection speed was 0.5ms, thanks to some zip-goodness and matching strategy improvements! Looking forward to the new .NET API!

  17. Andrew Deal says:

    I too noticed how slow the WURFL API was when I implemented it last year. I moved to the MySQL version (I forget the exact name) and was appalled at the lack of indexing on the DB.

    I wish I had an extra 20-30 hours to set up my own DB for managing WURFL data in MySQL; that way, I think I could add a lot of value to the tool for all to share.

  18. Alex Kerr says:

    In reply to Andrew Deal above: most people move straight to the MySQL implementation of WURFL, which is called TeraWURFL (unless they move to the new API). You’re right that the ordinary WURFL XML parser is way too slow for many. http://www.tera-wurfl.com/

    I just checked the DB that TeraWURFL automatically generates and it looks very well indexed to me, and performance matches that. So I’m not sure what you are referring to.

    In more general terms I am “betting the farm” on WURFL simply due to the years of input from all sorts of professional sources, and the fact it’s used to run many commercial systems, some very large. This counts for an awful lot in my book (and it’s free). Worth noting too that rival systems like Device Atlas in large part populate their DBs directly from WURFL.

  19. Hi John (and Luca and Alex), I see this post has attracted the lead developers and large clients :). I just wanted to add some insight to this discussion, albeit a little late. As performance is the primary objective of Tera-WURFL, I have run repeated tests on all the WURFL APIs, both from the WURFL project and from third parties. I run my DB of 45,000 unique UAs through the APIs to determine uncached and cached performance. One secondary goal for Tera-WURFL is fast determination of mobile vs. non-mobile, so the DB includes both. I would like to know whether, during your tests of the WURFL PHP APIs, you are reinstantiating the library between every iteration or simply running a new UA through the existing object. Due to the complex structure of the new WURFL PHP API, there is significant overhead involved in instantiation (upwards of 40 PHP class files may be included, not counting the cache).

    As the author of Tera-WURFL and a big supporter of the WURFL project, I spend a lot of time thinking about the process of device detection and the future of the mobile Internet experience. Here is what the focus should be right now: we need very accurate results *and* high performance! As Luca implied, a slower initial detection is worth the cost, since subsequent requests will be served very quickly. According to the data I have gathered from very large clients, the number of unique *mobile* user agents that hit your site follows an exponential decay. Some of my clients see tens of thousands of unique UAs within the first couple of weeks, then the number of new UAs starts dropping dramatically. My recommendation is that you keep the accuracy of the WURFL project and evaluate your requirements for performance. I am biased towards Tera-WURFL, but the PHP WURFL API is also very good. I would be interested to see how Tera-WURFL would fare in your results. Version 2.1.0 will be released on February 10, 2010 and features a new high-speed mobile/non-mobile engine that shows a detection rate of over 300 unique non-cached UAs per second, with a cached lookup speed of about 1000/sec on my laptop. I will probably conduct a test similar to yours and post it on http://www.tera-wurfl.com. Please let me know if you need a larger pool of UAs to test with.
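    The reinstantiation question raised here matters for any benchmark. A toy illustration (not the real WURFL API; the constructor stands in for heavy setup such as including dozens of class files) shows how paying the setup cost per request versus once changes what a timing actually measures:

```python
# Toy illustration of benchmark skew from per-request instantiation.
# The class counter tracks how many times the "expensive" setup runs.

class Detector:
    instances_created = 0  # stand-in for heavy library setup cost

    def __init__(self):
        Detector.instances_created += 1

    def match(self, ua):
        return "mobile" if "midp" in ua.lower() else "desktop"

def detect_per_request(uas):
    # Reinstantiates for every UA: pays the setup cost N times.
    return [Detector().match(ua) for ua in uas]

def detect_shared(uas):
    # Instantiates once and reuses the object: setup cost paid once.
    d = Detector()
    return [d.match(ua) for ua in uas]
```

    A benchmark using the first pattern measures mostly setup overhead; the second pattern measures the matching itself.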