Soooooo slooooow scan engine
alch
Site Admin
It takes that long because the virus database has grown to more than 100,000 signatures. The next point release will significantly improve database loading.

blgd
Great! I'll be waiting anxiously for it.
Thank you for letting us know.

al968
A donation to ClamWin can only help.
Al968

Point release didn't do much
kcbrown
I'm now at 0.90.2.1, and it takes well over 30 seconds just for the engine to load the database. This is consistent across machines of varying capabilities, including my 3 GHz P4 system with 2 GB of memory.
For a database of some 10 megabytes (main.cvd + daily.cvd, possibly highly compressed), this is jaw-droppingly slow.

It seems to me that clamscan should write a cache file directly from its in-memory structures after it has loaded the database from the database files. Have it compare the modification date of the cache against the database files and rebuild the cache if it's out of date. Have it checksum the cache, or even sign it if necessary, to guarantee integrity. The bottom line is that it needs a really fast format for loading the database directly into its internal structures. You'll probably cut the startup time to less than a second that way.

As it is, though, it has become nearly unusable. I've been using it in conjunction with Winpooch, but I'm now having to disable that on my systems just to keep them usable: Winpooch has a 30-second timeout, after which it aborts the scan and tells the user that the file is infected. PLEASE fix this.
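A minimal sketch of the cache idea above, written in Python for brevity (ClamAV itself is C). Everything here is hypothetical: the cache layout (a SHA-256 digest line followed by a JSON payload), the function names, and the `parse_databases` callback that stands in for the slow loader. A real cache would dump clamscan's internal structures in a binary format rather than JSON, and a plain checksum only catches accidental corruption; guarding against tampering would need a real signature.

```python
import hashlib
import json
import os

# Hypothetical cache layout: first line is the SHA-256 hex digest of the payload,
# the rest is a JSON-encoded list of signature records.

def cache_is_fresh(cache_path, db_paths):
    """True if the cache exists and is at least as new as every database file."""
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.path.getmtime(cache_path)
    return all(os.path.getmtime(p) <= cache_mtime for p in db_paths)

def write_cache(cache_path, signatures):
    """Serialize the loaded signatures and prepend an integrity checksum."""
    payload = json.dumps(signatures).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    with open(cache_path, "wb") as f:
        f.write(digest + b"\n" + payload)

def read_cache(cache_path):
    """Return the cached signatures, or None if the checksum does not match."""
    with open(cache_path, "rb") as f:
        digest, _, payload = f.read().partition(b"\n")
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        return None
    return json.loads(payload)

def load_signatures(db_paths, cache_path, parse_databases):
    """Use the cache when it is fresh and intact; otherwise parse and rebuild it."""
    if cache_is_fresh(cache_path, db_paths):
        cached = read_cache(cache_path)
        if cached is not None:
            return cached
    signatures = parse_databases(db_paths)  # the slow path
    write_cache(cache_path, signatures)
    return signatures
```

Only the first run after a database update pays for the full parse; later runs read the cache, and a stale or corrupted cache silently falls back to the slow path.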

blgd
Yes, I agree. When I posted this it was 20 seconds of waiting, but now it's 30 seconds and sometimes 40, and it's really becoming unusable.

Trigon
Can't you make ClamWin load the database when it first opens? Then, instead of loading the database every time you want to scan a file, it would be loaded once at program startup.
(How do other antivirus programs make loading the database fast?)

budtse
That is how version 1.0 will work. In the current program structure there's no (easy) way to load the database once and then use it when needed, because the current ClamWin uses the clamscan command-line utility, which loads the database, scans the file(s), and exits. We fixed the Outlook plugin to work this way (load the database once at the beginning).
Regards,
budtse
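The difference between the two models can be sketched as follows. This is a Python illustration with a hypothetical `SignatureDB` class whose "database" is just an MD5-to-name map (real ClamAV signatures are far richer); the point is only how often the expensive load step runs: once per clamscan invocation versus once for the lifetime of a resident process such as the Outlook plugin or clamd.

```python
import hashlib

class SignatureDB:
    """Hypothetical in-memory signature set keyed by MD5 hex digest."""
    loads = 0  # counts how many times the (expensive) load step ran

    def __init__(self, records):
        SignatureDB.loads += 1
        self.by_md5 = dict(records)

    def scan(self, data: bytes):
        """Return the signature name if the data's MD5 matches, else None."""
        return self.by_md5.get(hashlib.md5(data).hexdigest())

def scan_per_invocation(records, files):
    # Current ClamWin model: every clamscan run loads the database, scans, exits.
    return [SignatureDB(records).scan(data) for data in files]

def scan_with_resident_db(records, files):
    # Outlook plugin / clamd model: load once, then reuse it for every scan.
    db = SignatureDB(records)
    return [db.scan(data) for data in files]
```

Both functions return the same verdicts; only the number of loads differs, which is why a resident service is needed to get the Outlook plugin's behavior for on-demand scanning.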

GuitarBob
Is there any way you could take the Outlook code, adapt it as needed to work in the regular file scanner, and substitute it for the existing code?
Regards,

alch
Site Admin
No, we can't reuse the Outlook code. We load the database only once, when Outlook starts, and use it until Outlook is closed. With on-demand scanning you can't do that unless there is a system service (like clamd), but we already have that in the V1 code.

GuitarBob
Thanks for the info.
Regards,

kcbrown
But taking more than 30 seconds to load 130,000 records on a very fast computer is a bit long for the amount of data involved, don't you think? ClamWin is almost entirely CPU-bound during that entire period. On my 2.4 GHz P4 system with 2 GB of RAM, it takes 46 seconds to load the database with 133,000 virus entries and scan a zero-length file. This is immediately after scanning something else, so the virus database files are likely entirely in the buffer cache. The CPU specs on my system are the same as the original poster's.

For the original poster, it took 25 seconds to load a database with 115,000 entries. A 16% increase in the database size caused an 84% increase in the load time! Without knowing more about what it's trying to do, this strongly smells like an algorithmic efficiency issue. Based on the numbers above, my calculations put the algorithm at roughly O(n^4.2). That's really, really bad. You're likely to get the most bang for the buck by fixing the root cause rather than trying to work around it.
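The growth exponent can be reproduced from the two data points in the post (25 s for 115,000 entries, 46 s for 133,000). If load time grows like n^k, then k = ln(t2/t1) / ln(n2/n1). This is only a rough indicator: two points cannot reliably distinguish growth models, and both timings include fixed costs (process startup, the scan itself) and come from different machines.

```python
import math

def growth_exponent(n1, t1, n2, t2):
    """Exponent k such that t is proportional to n**k, fitted through two points."""
    return math.log(t2 / t1) / math.log(n2 / n1)

k = growth_exponent(115_000, 25.0, 133_000, 46.0)
print(f"size +{133_000 / 115_000 - 1:.0%}, time +{46 / 25 - 1:.0%}, k = {k:.1f}")
# prints: size +16%, time +84%, k = 4.2
```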

The Unix version has exactly the same problem
kcbrown
This issue is also a problem under Linux. It shows the very same performance once you account for the difference in CPU speed between my Windows box and my Linux box. Cool: that means it shouldn't be all that hard to track down. Just profile the code to start with and see what the profiler says is eating all the CPU.

Fixed it
kcbrown
Whoever wrote the code in libclamav/readdb.c has now officially failed data structures and/or algorithms.
Trying to maintain an ordered singly linked list at insertion time? For shame... The problem is the MD5 section linked-list insertion in cli_loadhdb(). For every MD5 section that it encounters in the virus database, it scans the entire list looking for the right place to put the new node. That is an O(n^2) algorithm. Like I said, for shame!

Anyway, I fixed it, and now it's just a question of how to get the fix to someone who can apply it. The fix takes the total scan runtime against an almost-empty text file on my Athlon64 Linux box from 33 seconds to 2 seconds. I can't tell how much of that time is spent reading the database versus other things; let's just say this is a major win and leave it at that. Depending on how that list is used, more performance might be had by replacing the linked list with some variant of a tree, but this fix takes care of the lowest-hanging fruit.

I'd attach the source patch here, but I don't see any way of attaching anything...

Edit: I've submitted bug 1749001 against this issue and supplied the patch there. Here's the link to the bug: https://sourceforge.net/tracker/index.php?func=detail&aid=1749001&group_id=105508&atid=641462
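A Python sketch of the pattern described above, assuming nodes are ordered by their MD5 string. `insert_sorted` walks the list for each new entry, which costs O(n) per insert and O(n^2) overall. `build_by_sorting` is one possible fix (collect the records, sort once, then link them), not necessarily what the submitted patch does. All names here are hypothetical, not libclamav's.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    md5: str
    name: str
    next: Optional["Node"] = None

def insert_sorted(head: Optional[Node], node: Node) -> Node:
    """Keep the list ordered by md5; may walk the whole list on every insert."""
    if head is None or node.md5 < head.md5:
        node.next = head
        return node
    cur = head
    # Skip past every node whose key is <= the new key (keeps ties in input order).
    while cur.next is not None and cur.next.md5 <= node.md5:
        cur = cur.next
    node.next = cur.next
    cur.next = node
    return head

def build_by_insertion(records: List[Tuple[str, str]]) -> Optional[Node]:
    head = None
    for md5, name in records:
        head = insert_sorted(head, Node(md5, name))
    return head

def build_by_sorting(records: List[Tuple[str, str]]) -> Optional[Node]:
    """Sort once (stable, O(n log n)), then link the nodes back to front in O(n)."""
    head = None
    for md5, name in reversed(sorted(records, key=lambda r: r[0])):
        head = Node(md5, name, head)  # prepend, so the smallest key ends up first
    return head

def to_list(head: Optional[Node]) -> List[Tuple[str, str]]:
    out = []
    while head is not None:
        out.append((head.md5, head.name))
        head = head.next
    return out
```

Both builders produce the same ordered list. For randomly ordered input the insertion version visits about n²/4 nodes in total, roughly four billion for 130,000 entries, while the sort-once version needs only O(n log n) comparisons.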

GuitarBob
Thanks for your work. I hope they can get it into ClamWin soon. Can ClamAV use this as well? They are really separate projects, and giving it to one project won't necessarily get it to the other. In fact, ClamAV would probably be the one to give it to, since at this point ClamWin essentially provides a Windows GUI for ClamAV, with Windows-specific code only where needed.
Regards,
Powered by phpBB © phpBB Group
Design by phpBBStyles.com | Styles Database.
Content © ClamWin Free Antivirus GNU GPL Free Software Open Source Virus Scanner. Free Windows Antivirus. Stay Virus Free with Free Software.


