ClamWin Free Antivirus Forum Index
ClamWin Free Antivirus
Support and Discussion Forums
Soooooo slooooow scan engine
blgd


Joined: 08 May 2007
Posts: 0
What is happening with the scan engine?

Since version 0.88.9 or so, I don't know if the engine is out of gasoline, but every file you try to scan takes at least 20 seconds.

Look at what happened today when I scanned a 757-byte file, less than a KB:

----------- SCAN SUMMARY -----------
Known viruses: 115406
Engine version: 0.90.2
Scanned directories: 0
Scanned files: 1
Skipped non-executable files: 0
Infected files: 0

Data scanned: 0.00 MB
Time: 24.875 sec (0 m 24 s)

--------------------------------------
Completed
--------------------------------------

The scan itself takes less than a second (probably about half of those 875 ms), but starting the engine takes so long you could go get a coffee in the meantime...

I love ClamWin because it doesn't need services or run a computer out of memory, but this...

Can you fix it?

Thank you :)

My computer is a 2.4 GHz P4 with 768 MB of RAM running Windows 2000.
alch
Site Admin

Joined: 27 Nov 2005
Posts: 0
It takes that long because the virus database has grown to more than 100,000 signatures. The next point release will significantly improve database loading.
blgd


Joined: 08 May 2007
Posts: 0
Great! I'll be waiting anxiously for it.

Thank you for letting me know.
al968


Joined: 24 Feb 2007
Posts: 0
blgd wrote:
Great! I'll be waiting anxiously for it.

Thank you for letting me know.


A donation to ClamWin can only help :roll: :D

Al968
Point release didn't do much
kcbrown


Joined: 05 Jul 2007
Posts: 0
I'm now at 0.90.2.1, and it takes well over 30 seconds just for the engine to load the database. This is consistent across machines of varying capabilities, including my 3 GHz P4 system with 2 GB of memory.

For a database of some 10 megabytes (main.cvd + daily.cvd, possibly highly compressed), this is jaw-droppingly slow.

It seems to me that clamscan should write a cache file directly from its in-memory structures after it has loaded the database files. Have it compare the cache's modification date against the database files and rebuild the cache when it is out of date. Have it checksum the cache, or even sign it if that's what it takes to guarantee integrity. The bottom line is that it needs a really fast format for loading the database directly into its internal structures. You'd probably cut the startup time to less than a second that way.
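A minimal sketch of that cache idea, written in Python for brevity. Everything here is hypothetical: `load_signatures`, `cache_is_fresh`, the cache layout and the `parse_databases` callback are made up for illustration and are not ClamWin or ClamAV code. JSON stands in for the fast binary format the post asks for, and a SHA-256 digest stands in for the checksum or signature.

```python
import hashlib
import json
import os


def cache_is_fresh(cache_path, db_paths):
    """True if the cache exists and is at least as new as every database file."""
    if not os.path.exists(cache_path):
        return False
    cache_mtime = os.path.getmtime(cache_path)
    return all(os.path.getmtime(p) <= cache_mtime for p in db_paths)


def write_cache(cache_path, signatures):
    """Store a SHA-256 digest line followed by the serialized signatures."""
    payload = json.dumps(signatures, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    with open(cache_path, "wb") as f:
        f.write(digest + b"\n" + payload)


def read_cache(cache_path):
    """Return the cached signatures, or None if the digest does not match."""
    with open(cache_path, "rb") as f:
        digest, _, payload = f.read().partition(b"\n")
    if hashlib.sha256(payload).hexdigest().encode("ascii") != digest:
        return None
    return json.loads(payload)


def load_signatures(db_paths, cache_path, parse_databases):
    """Use the cache when it is fresh and intact; otherwise parse and rebuild it."""
    if cache_is_fresh(cache_path, db_paths):
        cached = read_cache(cache_path)
        if cached is not None:
            return cached
    signatures = parse_databases(db_paths)  # the slow path
    write_cache(cache_path, signatures)
    return signatures
```

Only the first run (or the first run after a database update, or after the cache is damaged) pays the full parsing cost; later runs just verify and read the cache.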

But as it is, it has become nearly unusable. I've been using it together with Winpooch, but it has gotten to the point where I have to disable Winpooch on these systems just to keep them usable: Winpooch has a 30-second timeout, after which it aborts the scan and tells the user that the file is infected.

PLEASE fix this.
blgd


Joined: 08 May 2007
Posts: 0
Yes, I agree. When I posted this it was 20 seconds of waiting, but now it's 30 and sometimes 40 seconds, and it's really becoming unusable :(
Trigon


Joined: 04 Jul 2007
Posts: 0
Can't you make ClamWin load the database when it first opens? Then, instead of loading the database every time you want to scan a file, it would be loaded once at program startup.

(How do other antivirus programs load their databases so quickly?)
budtse


Joined: 14 Jan 2006
Posts: 0
Location: Belgium
Trigon wrote:
Can't you make ClamWin load the database when it first opens? Then, instead of loading the database every time you want to scan a file, it would be loaded once at program startup.


That is how version 1.0 will work. In the current program structure there's no (easy) way to load the database once and then use it when needed, because the current ClamWin uses the clamscan command-line utility, which loads the database, scans the file(s), and exits.
We fixed the Outlook plugin to work this way (it loads the database once at startup).

regards,
budtse
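To make the difference concrete, here is a hypothetical Python sketch (not ClamWin code; `ResidentScanner`, `scan_once` and the MD5-keyed dictionary are assumptions for illustration). A clamscan-style run pays the database load on every scan, while a resident scanner, like the Outlook plugin described above, pays it once.

```python
import hashlib


class ResidentScanner:
    """Loads the signature set once and reuses it for every scan."""

    def __init__(self, load_database):
        self.signatures = load_database()  # slow step, paid once

    def scan(self, data):
        """Return the matching signature name, or None if the data looks clean."""
        return self.signatures.get(hashlib.md5(data).hexdigest())


def scan_once(load_database, data):
    """clamscan-style: load the database, scan one item, then discard everything."""
    return ResidentScanner(load_database).scan(data)
```

Scanning N files with `scan_once` calls `load_database` N times; a single long-lived `ResidentScanner` calls it once.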
GuitarBob


Joined: 09 Jul 2006
Posts: 9
Location: USA
Is there any way you could use the Outlook code, change it as needed to work under the regular file scanner, and then substitute it for the existing code?

Regards,
alch
Site Admin

Joined: 27 Nov 2005
Posts: 0
No, we can't reuse the Outlook code. We load the database only once, when Outlook starts, and use it until Outlook is closed. With on-demand scanning you can't do that without a system service (like clamd), but we already have that in the V1 code.
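For reference, a program talking to a resident daemon such as clamd only sends a short command over a socket, while the daemon keeps the database loaded between requests. The sketch below assumes clamd's TCP protocol as documented for later releases: a `z` prefix means the command and reply are NUL-terminated, `PING` is answered with `PONG`, and 3310 is the usual default port. `clamd_command` is a hypothetical helper, and none of this describes how the ClamWin V1 service is implemented.

```python
import socket


def clamd_command(command, host="127.0.0.1", port=3310, timeout=10.0):
    """Send one NUL-delimited ("z"-prefixed) command to clamd and return its reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"z" + command.encode("utf-8") + b"\0")
        reply = b""
        # Read until the NUL terminator or until the daemon closes the connection.
        while not reply.endswith(b"\0"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.rstrip(b"\0").decode("utf-8")
```

Against a running daemon, `clamd_command("PING")` should return `"PONG"`, and a `SCAN` command with a full path returns a line ending in `OK` or `FOUND`, with no database load on the client side.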
GuitarBob


Joined: 09 Jul 2006
Posts: 9
Location: USA
Thanks for the info.

Regards,
kcbrown


Joined: 05 Jul 2007
Posts: 0
budtse wrote:
Trigon wrote:
Can't you make ClamWin load the database when it first opens? Then, instead of loading the database every time you want to scan a file, it would be loaded once at program startup.


That is how version 1.0 will work. In the current program structure there's no (easy) way to load the database once and then use it when needed, because the current ClamWin uses the clamscan command-line utility, which loads the database, scans the file(s), and exits.
We fixed the Outlook plugin to work this way (it loads the database once at startup).

regards,
budtse


But taking more than 30 seconds to load 130,000 records on a very fast computer is a bit long for the amount of data involved, don't you think? ClamWin is almost entirely CPU-bound during that entire time.

On my 2.4 GHz P4 system with 2 GB of RAM, it takes 46 seconds to load the database with 133,000 virus entries and scan a zero-length file. This is immediately after scanning something else, so the virus database files are likely entirely in the buffer cache. The CPU specs on my system are the same as the original poster's.

For the original poster, it took 25 seconds to load a database with 115,000 entries. A roughly 15% increase in the database size caused a roughly 85% increase in the database load time!

Without really knowing more about what it's trying to do, this strongly smells like an algorithmic efficiency issue. Based on the numbers above, my calculations show the algorithm to be roughly O(n^4.2). That's really, really bad.
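The growth rate can be estimated from two measurements by assuming the load time follows t ≈ c·n^k and solving for k. Here is a quick Python check using the figures quoted in this post (about 25 s for 115,000 entries, 46 s for 133,000). With only two points, measured by different people, the estimate is rough; `growth_exponent` is just an illustrative helper.

```python
import math


def growth_exponent(n1, t1, n2, t2):
    """Solve t = c * n**k through two (n, t) points: k = ln(t2/t1) / ln(n2/n1)."""
    return math.log(t2 / t1) / math.log(n2 / n1)


size_growth = 133_000 / 115_000 - 1   # ~0.157: a 15.7% larger database
time_growth = 46.0 / 25.0 - 1         # 0.84: an 84% longer load
k = growth_exponent(115_000, 25.0, 133_000, 46.0)
print(f"size +{size_growth:.1%}, time +{time_growth:.0%}, k = {k:.1f}")
# size +15.7%, time +84%, k = 4.2
```

For comparison, a pure O(n^2) cost would predict only (133/115)^2 ≈ 1.34, i.e. about a 34% longer load, so the fitted exponent is inflated by noise or by other costs in the measurement.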

You're likely to get the most bang for the buck on this problem by fixing the root cause rather than trying to work around it.
The Unix version has exactly the same problem
kcbrown


Joined: 05 Jul 2007
Posts: 0
This issue also shows up under Linux, with the very same performance once you account for the difference in CPU speed between my Windows box and my Linux box. Cool. That means it shouldn't be all that hard to track down: just profile the code first and see what the profiler says is eating all the CPU.
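As a generic illustration of that workflow, here it is in Python with the standard `cProfile` and `pstats` modules; for the C library itself, a native profiler such as gprof would play the same role. `profile_report` is a hypothetical helper that runs a function under the profiler and returns the entries with the most time spent in their own code.

```python
import cProfile
import io
import pstats


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile; return the `top` entries sorted by own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(top)
    return out.getvalue()
```

Sorting by own time (rather than cumulative time) points straight at the function whose loop is burning the CPU, which is how a hot insertion routine would stand out.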
Fixed it
kcbrown


Joined: 05 Jul 2007
Posts: 0
Whoever wrote the code in libclamav/readdb.c has now officially failed data structures and/or algorithms. :)

Trying to maintain an ordered singly linked list at insertion time? For shame... :D

The problem is the MD5 section linked list insertion in cli_loadhdb(). For every MD5 section that it encounters in the virus database, it scans the entire list looking for the right place to put the new node. This is an O(n^2) algorithm.

Like I said, for shame! :D :D
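A Python sketch of the difference described above, assuming each MD5 entry can be reduced to a sortable key. `sorted_insert_all` mirrors the pattern described (walk the list on every insert to keep it ordered), while `prepend_then_sort` is one common way to remove the quadratic cost. It is not necessarily what the submitted patch does, and both function names are hypothetical.

```python
class Node:
    __slots__ = ("key", "next")

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


def sorted_insert_all(keys):
    """Keep the list ordered on every insert: an O(n) walk each time, O(n^2) total.

    Returns the head and the number of nodes walked past, to show the growth."""
    head, steps = None, 0
    for key in keys:
        if head is None or key < head.key:
            head = Node(key, head)
            continue
        cur = head
        while cur.next is not None and cur.next.key <= key:
            cur = cur.next
            steps += 1
        cur.next = Node(key, cur.next)
    return head, steps


def prepend_then_sort(keys):
    """Prepend each node in O(1), then sort and relink once: O(n log n) total."""
    head = None
    for key in keys:
        head = Node(key, head)
    nodes = []
    while head is not None:
        nodes.append(head)
        head = head.next
    nodes.sort(key=lambda node: node.key)
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    if not nodes:
        return None
    nodes[-1].next = None
    return nodes[0]


def to_list(head):
    keys = []
    while head is not None:
        keys.append(head.key)
        head = head.next
    return keys
```

Both functions build the same ordered list, but for keys arriving in random order `sorted_insert_all` walks about n²/4 nodes in total, so doubling the database roughly quadruples the load work.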


Anyway, I fixed it, and it's now just a question of how I'm supposed to get the fix to someone who can apply it.

The fix takes the entire scan runtime against an almost-empty text file on my Athlon64 Linux box from 33 seconds to 2 seconds. I can't tell how much of that time is spent reading the database versus other things.

Let's just say this is a major win and leave it at that. Depending on how that list is used, more performance might be gained by switching the data structure from a linked list to some kind of tree, but this seems to take care of the lowest-hanging fruit.
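If lookups matter too, a structure with O(log n) search helps. The hypothetical `Md5Index` below uses a sorted array with binary search (Python's `bisect`), which gives the same lookup bound as a balanced tree and is built with a single sort at load time. Since MD5 matching is an exact-key lookup, a hash table would be another reasonable choice. Nothing here reflects how libclamav actually stores these entries.

```python
import bisect


class Md5Index:
    """Sorted array of (digest, name) pairs with O(log n) binary-search lookup."""

    def __init__(self, entries):
        # One O(n log n) sort at load time instead of n ordered list inserts.
        pairs = sorted(entries)
        self._digests = [digest for digest, _ in pairs]
        self._names = [name for _, name in pairs]

    def lookup(self, digest):
        """Return the signature name for an MD5 hex digest, or None if absent."""
        i = bisect.bisect_left(self._digests, digest)
        if i < len(self._digests) and self._digests[i] == digest:
            return self._names[i]
        return None
```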

I'd attach the source patch here but I don't see any way of attaching anything...

Edit: I've submitted bug 1749001 against this issue and supplied the patch there. Here's the link to the bug: https://sourceforge.net/tracker/index.php?func=detail&aid=1749001&group_id=105508&atid=641462
GuitarBob


Joined: 09 Jul 2006
Posts: 9
Location: USA
Thanks for your work. I hope they can get it into ClamWin soon. Could ClamAV use this as well? They are really separate projects, and giving the fix to one project won't necessarily get it to the other. In fact, ClamAV would probably be the one to give it to, since at this point ClamWin essentially provides a Windows GUI for ClamAV, with Windows-specific code only where needed.

Regards,