Hello forum,
Being totally new to the Raspberry world, I hope to get a few comments and answers on my (maybe crazy) idea of using a Raspberry farm instead of Intel i7 systems as a crawling and parsing system.
Application background: I have developed my own crawling software (internally using wget) and very specialized parsing software. The goal is to continuously crawl and parse several tens of millions of websites for special purposes (very different from normal full-text searches). It currently runs on openSUSE Linux on two Intel i7 PCs (3770 and 4770) and is nearing the end of beta testing. In the end I will need about 7-10 such PCs to run all tasks continuously.
Since I must work on a self-financed, low budget, I had the idea of using a farm of Raspberry systems to handle these tasks. The software could be ported, although this would take some effort, which must be justified.
However, this would only make sense if most of the following expectations could be fulfilled by a Raspberry server farm:
1) The initial hardware investment for the same processing capacity must be less than 50% of what Intel-based PCs would cost, to justify the additional hassle, handling and software port.
2) The power consumption should be significantly lower.
3) Reliability should reach around 80% of that of the Intel systems (i.e. somewhat lower reliability would be acceptable).
4) Crawling / downloading could be separated from parsing. It would be sufficient if only the parsing was delegated to the Raspberries. Crawling / downloading uses a RAM disk as intermediate storage because the SSD drives quickly reach their limits, so this part might better stay on the Intel PCs with 32 GB RAM.
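To make expectations 1) and 2) a bit more concrete, here is a rough sketch of the comparison I have in mind. It is only an illustration: the `Node` class, the `compare` function and every price, wattage and throughput figure are placeholders I made up, not measurements. The real parsing throughput per device would have to be benchmarked first.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """One machine type. All values are hypothetical placeholders."""
    name: str
    price: float       # purchase price per unit (any currency)
    watts: float       # average power draw under load
    throughput: float  # parsed pages per second (must be benchmarked)


def compare(pc: Node, pi: Node, pc_count: int) -> dict:
    """Size a Raspberry farm that matches the total throughput of
    `pc_count` PCs, then compare hardware cost and power draw."""
    target = pc_count * pc.throughput
    pi_count = math.ceil(target / pi.throughput)
    pc_cost = pc_count * pc.price
    pi_cost = pi_count * pi.price
    return {
        "pi_count": pi_count,
        "cost_ratio": pi_cost / pc_cost,
        "power_ratio": (pi_count * pi.watts) / (pc_count * pc.watts),
        # Expectation 1): farm must cost less than 50% of the PCs.
        "meets_cost_goal": pi_cost < 0.5 * pc_cost,
    }


if __name__ == "__main__":
    # Placeholder figures only, chosen to show the arithmetic.
    i7 = Node("i7 PC", price=800.0, watts=100.0, throughput=100.0)
    rpi = Node("Raspberry", price=50.0, watts=4.0, throughput=5.0)
    for key, value in compare(i7, rpi, pc_count=8).items():
        print(f"{key}: {value}")
```

The key unknown is the throughput ratio between one Raspberry and one i7. With the placeholder numbers above, the farm would need so many boards that it fails the 50% cost goal even though it draws less power, so the answer depends entirely on real benchmark figures.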
What do the Raspberry experts say?
a) Forget about it!
b) A viable concept worth considering.
c) Cost saving will not be significant enough.
d) Or what else?
Thank you very much in advance for your comments.
Best regards
FrankB