Soooooo slooooow scan engine
blgd
Joined: 08 May 2007
Posts: 0
Posted: Tue May 08, 2007 10:24 pm
What is happening with the scan engine?
Since version 0.88.9 or so, I don't know whether the engine has run out of gasoline, but every file I try to scan takes at least 20 seconds.
Look at what happened today when I scanned a 757-byte file, less than 1 KB:
----------- SCAN SUMMARY -----------
Known viruses: 115406
Engine version: 0.90.2
Scanned directories: 0
Scanned files: 1
Skipped non-executable files: 0
Infected files: 0
Data scanned: 0.00 MB
Time: 24.875 sec (0 m 24 s)
--------------------------------------
Completed
--------------------------------------
The scan itself takes less than a second (probably about half of those 875 ms), but starting the engine takes so long you could go get a coffee while you wait...
I love ClamWin because it doesn't need services or run a computer out of memory, but this...
Can you fix it?
Thank you
My computer is a 2.4 GHz P4 with 768 MB of RAM, running Windows 2000.
alch
Site Admin
Joined: 27 Nov 2005
Posts: 0
Posted: Wed May 09, 2007 1:46 am
It takes that long because the virus database has grown to more than 100,000 signatures. The next point release will significantly improve database loading.
blgd
Joined: 08 May 2007
Posts: 0
Posted: Wed May 09, 2007 2:59 am
Great! I'll be waiting anxiously for it.
Thanks for letting me know.
Point release didn't do much
kcbrown
Joined: 05 Jul 2007
Posts: 0
Posted: Thu Jul 05, 2007 1:19 pm
I'm now at 0.90.2.1, and it takes well over 30 seconds just for the engine to load the database. This is consistent across machines of varying capabilities; it's true even of my 3 GHz P4 system with 2 GB of memory.
For a database of some 10 megabytes (main.cvd + daily.cvd, possibly highly compressed), this is jaw-droppingly slow.
Seems to me you need to have clamscan write a cache file directly from the in-memory structures after it has loaded the database from the database files. Have it check the modification date of the cache against the database files and rebuild the cache if it's out of date. Have it checksum the cache or even sign it if it has to in order to guarantee integrity, but the bottom line is that it needs a really fast format for loading the database directly into its internal structures. You'll probably cut the startup time to less than a second that way.
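A minimal C sketch of the freshness check described above, assuming a cache file written after a successful load. The helper names (`cache_times_fresh`, `cache_is_fresh`) are hypothetical, not clamscan functions, and the checksum/signature step and the cache format itself are left out. Any missing file counts as stale, so the caller would fall back to parsing the .cvd files and rewriting the cache.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical pure helper: 1 if the cache is at least as new as every
 * database file, 0 otherwise. Kept separate so it is easy to test. */
int cache_times_fresh(time_t cache_mtime, const time_t *db_mtimes, size_t n_db)
{
    for (size_t i = 0; i < n_db; i++) {
        if (db_mtimes[i] > cache_mtime)
            return 0; /* a database file changed after the cache was written */
    }
    return 1;
}

/* Hypothetical wrapper: stat() the cache and each database file.
 * A missing or unreadable file is treated as stale, so the caller
 * falls back to the slow path and rebuilds the cache. */
int cache_is_fresh(const char *cache_path, const char *const *db_paths, size_t n_db)
{
    struct stat st;
    if (stat(cache_path, &st) != 0)
        return 0;
    time_t cache_mtime = st.st_mtime;

    for (size_t i = 0; i < n_db; i++) {
        if (stat(db_paths[i], &st) != 0)
            return 0;
        if (st.st_mtime > cache_mtime)
            return 0;
    }
    return 1;
}
```

Comparing modification times is cheap, but it only detects updates; integrity still needs the checksum or signature the post mentions.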
But as it is, it has become nearly unusable. I've been using it in conjunction with Winpooch, but it has gotten to the point where I have to disable Winpooch on those systems just to keep them usable, because Winpooch has a 30-second timeout after which it aborts the scan and tells the user that the file is infected.
PLEASE fix this.
Trigon
Joined: 04 Jul 2007
Posts: 0
Posted: Thu Jul 05, 2007 1:57 pm
Can't you make ClamWin load the database when it first opens, so that instead of loading the database every time you want to scan a file, it is loaded once at program startup?
(How do other antivirus programs make database loading fast?)
budtse
Joined: 14 Jan 2006
Posts: 0
Location: Belgium
Posted: Thu Jul 05, 2007 6:30 pm
Trigon wrote:
Can't you make ClamWin load the database when it first opens, so that instead of loading the database every time you want to scan a file, it is loaded once at program startup?
That is how version 1.0 will work. In the current program structure there's no (easy) way to load the database once and then use it when needed. That is because the current ClamWin uses the Clamscan command line utility, which loads the database, scans the file(s), and exits.
We fixed the Outlook plugin to work this way (load database once at the beginning).
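To illustrate the difference described here, a hypothetical C sketch; `load_database` and `scan_file` are stand-ins, not libclamav calls. The clamscan-style path pays for a database load on every request, while the resident path (as in the Outlook plugin) loads once and reuses the result.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a loaded signature database. */
struct sigdb {
    size_t n_sigs;
};

/* Counts how many times the expensive load step has run. */
static int db_loads = 0;

/* Hypothetical: in the real program this is the slow step (parsing
 * main.cvd and daily.cvd). Here it only allocates and counts. */
static struct sigdb *load_database(void)
{
    struct sigdb *db = malloc(sizeof *db);
    if (db == NULL)
        return NULL;
    db->n_sigs = 133000; /* illustrative size only */
    db_loads++;
    return db;
}

/* Hypothetical scan against an already-loaded database; always "clean". */
static int scan_file(const struct sigdb *db, const char *path)
{
    (void)path;
    return db != NULL ? 0 : -1;
}

/* clamscan style: one process per request, so every scan reloads. */
int scan_per_process(const char *const *paths, size_t n)
{
    int before = db_loads;
    for (size_t i = 0; i < n; i++) {
        struct sigdb *db = load_database();
        scan_file(db, paths[i]);
        free(db);
    }
    return db_loads - before; /* loads performed by this call */
}

/* Resident style (Outlook plugin, or a service like clamd): load once. */
int scan_resident(const char *const *paths, size_t n)
{
    int before = db_loads;
    struct sigdb *db = load_database();
    for (size_t i = 0; i < n; i++)
        scan_file(db, paths[i]);
    free(db);
    return db_loads - before;
}
```

For three files the per-process style loads three times and the resident style once; with a 20 to 40 second load, that difference dominates everything else.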
regards,
budtse
GuitarBob
Joined: 09 Jul 2006
Posts: 9
Location: USA
Posted: Thu Jul 05, 2007 6:37 pm
Is there any way you could use the Outlook code, change it as needed to work under the regular file scanner, and then substitute it for the existing code?
Regards,
alch
Site Admin
Joined: 27 Nov 2005
Posts: 0
Posted: Thu Jul 05, 2007 10:07 pm
No, we can't reuse the Outlook code. We load the database only once, when Outlook starts, and use it until Outlook is closed. With on-demand scanning you can't do that unless there is a system service (like clamd), but we already have that in the V1 code.
GuitarBob
Joined: 09 Jul 2006
Posts: 9
Location: USA
Posted: Thu Jul 05, 2007 10:33 pm
Thanks for the info.
Regards,
kcbrown
Joined: 05 Jul 2007
Posts: 0
Posted: Fri Jul 06, 2007 5:24 am
budtse wrote:
Trigon wrote:
Can't you make ClamWin load the database when it first opens, so that instead of loading the database every time you want to scan a file, it is loaded once at program startup?
That is how version 1.0 will work. In the current program structure there's no (easy) way to load the database once and then use it when needed. That is because the current ClamWin uses the Clamscan command line utility, which loads the database, scans the file(s), and exits.
We fixed the Outlook plugin to work this way (load database once at the beginning).
regards,
budtse
But taking more than 30 seconds to load 130,000 records on a very fast computer is a bit long for the amount of data involved, don't you think? ClamWin is almost entirely CPU-bound during that entire period.
On my 2.4 GHz P4 system with 2 GB of RAM, it takes 46 seconds to load the database with 133,000 virus entries and scan a zero-length file. This is immediately after scanning something else, so the virus database is likely entirely in the buffer cache. The CPU specs on my system are the same as the original poster's.
For the original poster, it took 25 seconds to load a database with 115,000 entries. A roughly 15% increase in database size caused a roughly 85% increase in load time!
Without really knowing more about what it's trying to do, this strongly smells like an algorithmic efficiency issue. Assuming load time grows as n^k, the numbers above put k at roughly 4.2, i.e. O(n^4.2). That's really, really bad.
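Worked out, assuming load time scales as a power of the signature count and using the two measurements quoted above (25 s at about 115,000 entries, 46 s at about 133,000):

```latex
t \propto n^{k}
\quad\Longrightarrow\quad
k = \frac{\ln(t_2 / t_1)}{\ln(n_2 / n_1)}
  = \frac{\ln(46 / 25)}{\ln(133\,000 / 115\,000)}
  \approx \frac{0.610}{0.145}
  \approx 4.2
```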
You're likely to get the most bang for the buck on this problem by fixing the root cause rather than trying to work around it.
The Unix version has exactly the same problem
kcbrown
Joined: 05 Jul 2007
Posts: 0
Posted: Fri Jul 06, 2007 6:31 am
This issue is also a problem under Linux. After accounting for the difference in CPU speed between my Windows box and my Linux box, the performance is the same. Cool: that means it shouldn't be all that hard to figure out. Just profile the code to start with and see what the profiler says is eating all the CPU.
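As a coarse first step before reaching for a real profiler such as gprof, a hypothetical C harness like this one times each phase separately, which would also show how a run splits between loading the database and scanning. `fake_load` and `fake_scan` are placeholder busy-work, not ClamAV code. On POSIX systems `clock()` reports processor time, which suits a CPU-bound load.

```c
#include <stdio.h>
#include <time.h>

/* A phase is any function that does one chunk of work. */
typedef void (*phase_fn)(void);

/* Returns the processor time, in seconds, used by one call to fn. */
double cpu_seconds(phase_fn fn)
{
    clock_t start = clock();
    fn();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Times a load phase and a scan phase separately and prints the split. */
void report_phases(phase_fn load, phase_fn scan)
{
    double t_load = cpu_seconds(load);
    double t_scan = cpu_seconds(scan);
    printf("load: %.3f s, scan: %.3f s\n", t_load, t_scan);
}

/* Hypothetical placeholders: swap in the real load and scan calls. */
static volatile unsigned long sink;

static void fake_load(void)
{
    for (unsigned long i = 0; i < 10000000UL; i++)
        sink += i; /* busy-work standing in for parsing signatures */
}

static void fake_scan(void)
{
    sink += 1; /* trivial work standing in for scanning a tiny file */
}
```

Calling `report_phases(fake_load, fake_scan)` prints one line with both timings; a profiler then tells you which function inside the slow phase is responsible.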
Fixed it
kcbrown
Joined: 05 Jul 2007
Posts: 0
Posted: Fri Jul 06, 2007 10:50 am
Whoever wrote the code in libclamav/readdb.c has now officially failed data structures and/or algorithms.
Trying to maintain an ordered singly linked list at insertion time? For shame...
The problem is the MD5 section linked-list insertion in cli_loadhdb(). For every MD5 section it encounters in the virus database, it walks the list from the head looking for the right place to put the new node. Each insertion is O(n) in the current list length, so loading n entries is O(n^2) overall.
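A sketch of the pattern being described, assuming a plain singly linked list keyed by a 16-byte MD5 digest. The node type and function are hypothetical; the real structure in libclamav/readdb.c differs. The counter makes the quadratic cost visible: inserting n keys in ascending order visits n(n-1)/2 nodes.

```c
#include <string.h>

/* Hypothetical node type, not the libclamav definition. */
struct md5_node {
    unsigned char md5[16];
    struct md5_node *next;
};

static unsigned long steps; /* node visits, to expose the O(n^2) total */

/* Ordered insertion into a singly linked list: walk from the head until the
 * first node whose key is >= the new key, then splice the new node in.
 * Each call is O(n) in the current list length, so n calls are O(n^2). */
void sorted_insert(struct md5_node **head, struct md5_node *node)
{
    struct md5_node **link = head;
    while (*link != NULL && memcmp((*link)->md5, node->md5, 16) < 0) {
        steps++;
        link = &(*link)->next;
    }
    node->next = *link;
    *link = node;
}
```

With 100 ascending keys this performs 4,950 visits; at 133,000 entries the same pattern would be roughly 8.8 billion.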
Like I said, for shame!
Anyway, I fixed it, and it's now just a question of how I'm supposed to get the fix to someone who can apply it.
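The real patch is on the SourceForge tracker. What follows is one standard way to remove the quadratic step, not necessarily what that patch does: push each node onto the front of the list in O(1) while loading, then sort the whole list once with a merge sort, O(n log n). The types and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type, not the libclamav definition. */
struct md5_node {
    unsigned char md5[16];
    struct md5_node *next;
};

/* O(1) per entry: push onto the front; ordering is deferred. */
void push_front(struct md5_node **head, struct md5_node *node)
{
    node->next = *head;
    *head = node;
}

/* Merge two sorted lists into one sorted list (stable). */
static struct md5_node *merge(struct md5_node *a, struct md5_node *b)
{
    struct md5_node dummy;
    struct md5_node *tail = &dummy;
    dummy.next = NULL;
    while (a != NULL && b != NULL) {
        if (memcmp(a->md5, b->md5, 16) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return dummy.next;
}

/* Top-down merge sort: split with slow/fast pointers, sort each half,
 * merge. O(n log n) comparisons, done once after loading. */
struct md5_node *sort_list(struct md5_node *head)
{
    if (head == NULL || head->next == NULL)
        return head;
    struct md5_node *slow = head, *fast = head->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    struct md5_node *second = slow->next;
    slow->next = NULL;
    return merge(sort_list(head), sort_list(second));
}
```

Usage: call `push_front` for every entry read from the database, then `head = sort_list(head);` once at the end.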
The fix cuts the total scan runtime against an almost-empty text file on my Athlon64 Linux box from 33 seconds to 2 seconds. I can't tell how much of that time is spent reading the database versus other things.
Let's just say this is a major win and leave it at that. Depending on how that list is used, more performance might be had by changing the data structure from a linked list to some variant of a tree, but this seems to take care of the lowest-hanging fruit.
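For the "different data structure" idea, a balanced tree is one option; a simpler alternative for a load-once, look-up-many workload is a sorted array searched with the standard `qsort` and `bsearch`. This is a swapped-in technique shown as a hypothetical sketch, not something the thread or libclamav confirms.

```c
#include <stdlib.h>
#include <string.h>

#define MD5_LEN 16

/* Hypothetical flat record for one MD5 signature. */
struct md5_sig {
    unsigned char md5[MD5_LEN];
    const char *name; /* virus name */
};

/* Orders records by raw digest bytes. */
static int cmp_sig(const void *a, const void *b)
{
    const struct md5_sig *x = a, *y = b;
    return memcmp(x->md5, y->md5, MD5_LEN);
}

/* Sort once after loading all n signatures: O(n log n). */
void index_sigs(struct md5_sig *sigs, size_t n)
{
    qsort(sigs, n, sizeof *sigs, cmp_sig);
}

/* Look up a file's digest: O(log n) per scanned file.
 * Returns the matching record, or NULL if the digest is unknown. */
const struct md5_sig *find_sig(const struct md5_sig *sigs, size_t n,
                               const unsigned char digest[MD5_LEN])
{
    struct md5_sig key;
    memcpy(key.md5, digest, MD5_LEN);
    key.name = NULL;
    return bsearch(&key, sigs, n, sizeof *sigs, cmp_sig);
}
```

A contiguous array also avoids one allocation per signature, which may help load time further; whether it fits depends on how the rest of libclamav uses the list.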
I'd attach the source patch here but I don't see any way of attaching anything...
Edit: I've submitted bug 1749001 against this issue and supplied the patch there. Here's the link to the bug: https://sourceforge.net/tracker/index.php?func=detail&aid=1749001&group_id=105508&atid=641462
GuitarBob
Joined: 09 Jul 2006
Posts: 9
Location: USA
Posted: Fri Jul 06, 2007 3:13 pm
Thanks for your work. I hope they can get it into ClamWin soon. Can ClamAV use this as well? They are really separate projects, and giving it to one project won't necessarily get it to the other. In fact, ClamAV would probably be the one to give it to, since at this point ClamWin essentially provides a Windows GUI for ClamAV, with Windows-specific code only as needed.
Regards,
All times are GMT
Content © ClamWin Free Antivirus GNU GPL Free Software Open Source Virus Scanner. Free Windows Antivirus. Stay Virus Free with Free Software.