Google is widely regarded as the world's best search engine. Beyond search, Google has also published many scholarly papers on how to manage massive server farms. Many start-ups now rely on open-source search engines that harness the power of multiple servers in the same way to collect and organize data. Instead of scouring the web, your IT department can use an open-source search engine to comb through mountains of its own data.
Apache Solr is often described as a quantum leap forward from Lucene, at least functionally. The tool boasts a feature-rich, dynamic developer ecosystem, and the two projects have since been merged. Built on Lucene, Solr provides true distributed operation: queries are distributed across multiple servers, and the results from each index are then aggregated. For the user, it enables faceted search, in which filters are applied to narrow down the results.
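To make the idea of faceted search concrete, here is a minimal sketch in plain Python. It does not use Solr or its API; the sample documents, the field names (`category`, `lang`), and the helper functions are all assumptions made up for illustration. It shows the two halves of faceting: counting the values of a field across the results, then applying one of those values as a filter.

```python
from collections import Counter

# Hypothetical sample documents; the field names are illustrative only.
DOCS = [
    {"id": 1, "title": "Solr in Action", "category": "book", "lang": "en"},
    {"id": 2, "title": "Lucene basics", "category": "article", "lang": "en"},
    {"id": 3, "title": "Solr et Lucene", "category": "article", "lang": "fr"},
    {"id": 4, "title": "Solr tuning", "category": "book", "lang": "en"},
]


def search(docs, term, filters=None):
    """Return documents whose title contains term and that satisfy every filter."""
    filters = filters or {}
    term = term.lower()
    return [
        d for d in docs
        if term in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(d[field] for d in results)


hits = search(DOCS, "solr")                            # documents 1, 3 and 4
facets = facet_counts(hits, "category")                # book: 2, article: 1
narrowed = search(DOCS, "solr", {"category": "book"})  # documents 1 and 4
```

A real engine computes these counts inside the index rather than by scanning result lists, but the user-facing behavior is the same: counts per value, then a click that becomes a filter.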
On the indexing side, Solr performs (almost) real-time incremental indexing. The engine can access databases and parse multiple types of documents. It can also perform geospatial searches on points or polygons. The engine has a web administration interface for essential maintenance, so there is no need to edit configuration files by hand. Finally, Solr has a caching mechanism to improve performance.
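The benefit of caching query results can be sketched in a few lines of Python. This is not how Solr implements its caches; it is only an illustration using the standard library's `functools.lru_cache`, with a made-up three-line corpus and a hypothetical `cached_search` function. The counter shows that a repeated query never reaches the expensive search code a second time.

```python
from functools import lru_cache

# Hypothetical corpus and call counter, for illustration only.
CORPUS = ("solr caches results", "sphinx indexes fast", "solr supports facets")
CALLS = {"count": 0}


@lru_cache(maxsize=128)
def cached_search(term):
    """Pretend-expensive search; repeated terms are answered from the cache."""
    CALLS["count"] += 1
    return tuple(i for i, text in enumerate(CORPUS) if term in text.split())


first = cached_search("solr")    # computed: matches lines 0 and 2
second = cached_search("solr")   # served from the cache, no new computation
```

The trade-off is the usual one: a cache must be invalidated or refreshed when the index changes, which is why cache settings tend to be tied to how often new documents are committed.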
Sphinx is, in other words, a tool built for indexing speed and well suited to indexing database content. Developed in C++, the search engine is available on many platforms, including Red Hat Enterprise Linux, Ubuntu, Debian, 32-bit and 64-bit Windows, and Mac OS X. Designed for pure performance, Sphinx can index billions of documents and terabytes of data, running on one or more commodity machines in distributed mode. The publisher claims an indexing speed of 10 to 15 MB/s per CPU core and hard drive.
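Running a search "in distributed mode" generally means sending the query to every machine and merging the partial results. The sketch below illustrates that scatter-gather pattern in plain Python; it is not Sphinx code. The shards are ordinary dictionaries, the scoring (a simple term count) is an assumption chosen for brevity, and `shard_search` and `distributed_search` are hypothetical names.

```python
import heapq
from itertools import islice

# Two hypothetical shards, each mapping a document id to its text.
SHARD_A = {"a1": "sphinx sphinx index", "a2": "fast index"}
SHARD_B = {"b1": "sphinx search", "b2": "sphinx sphinx sphinx"}


def shard_search(shard, term):
    """Return (score, doc_id) hits from one shard, best score first."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    return sorted((h for h in hits if h[0] > 0), reverse=True)


def distributed_search(shards, term, k=3):
    """Query every shard, then merge the sorted partial results into a top-k list."""
    per_shard = [shard_search(shard, term) for shard in shards]
    merged = heapq.merge(*per_shard, reverse=True)  # inputs are already sorted
    return list(islice(merged, k))


top = distributed_search([SHARD_A, SHARD_B], "sphinx", k=2)
# top == [(3, "b2"), (2, "a1")]
```

In a real deployment each call to `shard_search` would be a network request executed in parallel; the merge step stays cheap because every shard returns its hits already sorted.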
Created by Andrew Aksyonoff and Peter Zaitsev, the former head of the High Performance Group at MySQL AB, Sphinx works like a SQL database, with fixed schemas. It can natively index data from SQL databases such as MySQL, MariaDB, and PostgreSQL, or from any other database via ODBC. It can even act as a storage engine for MySQL or MariaDB. In addition to its API, Sphinx implements a subset of SQL, the SphinxQL language.
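As a rough illustration of what a SphinxQL statement looks like, the following Python sketch assembles a full-text query string. The index name `articles` and the helper `build_sphinxql` are assumptions invented for this example, and the escaping shown is deliberately minimal; a real application should send the statement through a MySQL-compatible client library and rely on its parameter handling instead.

```python
import re


def build_sphinxql(index, text, limit=10):
    """Build a SphinxQL full-text query string (illustrative helper, not a Sphinx API)."""
    # Only allow plain identifiers as index names to keep the sketch safe.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", index):
        raise ValueError("invalid index name")
    # Minimal escaping of backslashes and single quotes inside the search text.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT id FROM {index} WHERE MATCH('{escaped}') LIMIT {int(limit)}"


query = build_sphinxql("articles", "open source search")
# query == "SELECT id FROM articles WHERE MATCH('open source search') LIMIT 10"
```

The `MATCH()` clause is what distinguishes this from ordinary SQL: it hands the text to the full-text engine instead of comparing column values.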
Powerful tools for user searches, portability across multiple platforms, and a capacity of up to 4 billion indexed documents: all that and much more comes in the form of Xapian.
Descended from Muscat and Open Muscat, a project initiated at the University of Cambridge, Xapian is released under the GPL. The engine is popular in academia and widely used by libraries. Commercial support for the engine is available from three service companies, all of them English. Because it uses 32-bit document identifiers, Xapian can index more than 4 billion records per database.
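The "more than 4 billion" figure follows directly from the width of the identifier. The short Python calculation below makes the arithmetic explicit; the `fits` helper is hypothetical, and the sketch ignores any identifier values an implementation might reserve.

```python
ID_BITS = 32

# Number of distinct values a 32-bit identifier can take.
max_ids = 2 ** ID_BITS
print(f"{max_ids:,}")  # 4,294,967,296


def fits(document_count):
    """Return True if this many documents can each get a distinct 32-bit id."""
    return document_count <= max_ids
```

So a collection of 4 billion documents fits in one database, while one of 5 billion does not and would have to be split.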
That limit is why the engine is not suited to Big Data applications, which explains its absence among the web giants. The indexing module supports all the major file types, including office formats (OpenOffice/StarOffice, OpenDocument, and Microsoft Office) as well as HTML, PHP, PDF, and PostScript. The Perl DBI module gives the indexer access to the content of the common SQL databases. Xapian also offers some interesting linguistic features, including proximity search, synonym support, stemming for multiple languages (French among them), and the use of wildcards and Boolean operators in search queries. Finally, Xapian supports faceted search.
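Proximity search is easiest to understand with a small positional index, which records where each word occurs in each document. The Python sketch below is an illustration of the technique only, not Xapian's implementation or API; the documents are invented, the tokenizer is a plain whitespace split, and no stemming or synonym handling is attempted.

```python
from collections import defaultdict

# Hypothetical documents used only for this example.
DOCS = {
    1: "open source search engine",
    2: "search the open web with an engine",
    3: "source code for a search library",
}


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} so word distances can be compared."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def near(index, a, b, distance):
    """Return ids of documents where a and b occur within `distance` words."""
    common = index.get(a, {}).keys() & index.get(b, {}).keys()
    return {
        doc_id for doc_id in common
        if any(abs(pa - pb) <= distance
               for pa in index[a][doc_id] for pb in index[b][doc_id])
    }


index = build_positional_index(DOCS)
close = near(index, "open", "engine", 3)   # {1}
wider = near(index, "open", "engine", 4)   # {1, 2}
```

A Boolean AND is the same lookup without the distance test: it is simply the set of documents that both words have in common.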
A strong community with frequent updates, a low memory footprint, many open-source projects that complement the engine itself, and links to Big Data (including the Blur project, which weaves together Lucene and Hadoop): that is, quite simply, Apache Lucene.
A full-text search library, Lucene features a small memory footprint, the ability to serve a large number of queries per hour, and a compact index. On the downside, the library is functionally limited.
To get a site crawler or a document parser, you must extend the core of Apache Lucene with additional components. That is why Apache Lucene serves as the basis for many other open-source search solutions with broader functional coverage, such as Solr, Compass, and ElasticSearch. Written in Java, the latest version of Apache Lucene (4.8.1) runs on Java 7 and above. There are also many ports of Lucene to other languages, including C, C++, Objective-C, Perl, and Ruby. While Lucene's own functional coverage is limited, the number of available extensions is the platform's strength. Everything you can expect of a modern search engine is available; you just have to find the right extensions.
The last tool we present is ElasticSearch, notable for its simple installation process and modern distributed architecture.
The ElasticSearch index is fed through a plugin mechanism called rivers. Rivers let you index documents and the contents of NoSQL databases such as CouchDB, a RabbitMQ queue, Twitter, or Wikipedia. More conventionally, all databases are accessible to it through the JDBC connector. A river indexes its sources in near real time. On the query side, ElasticSearch supports faceted search and reverse lookup (percolation): a newly indexed document is pushed to users if it matches a query they have stored. This is the up-and-coming search engine.
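Percolation turns the usual search around: the queries are stored, and each incoming document is checked against them. The following Python sketch shows the principle with a hypothetical `Percolator` class; it is not the ElasticSearch API, and the matching rule (every query word must appear in the document) is an assumption chosen to keep the example short.

```python
class Percolator:
    """Store queries, then report which of them match a new document."""

    def __init__(self):
        self._queries = {}  # query id -> set of required words

    def register(self, query_id, text):
        """Save a query under an id chosen by the caller."""
        self._queries[query_id] = set(text.lower().split())

    def percolate(self, document):
        """Return the ids of all stored queries whose words all occur in the document."""
        words = set(document.lower().split())
        return sorted(qid for qid, terms in self._queries.items() if terms <= words)


percolator = Percolator()
percolator.register("alice-alerts", "solr release")
percolator.register("bob-alerts", "sphinx")

matches = percolator.percolate("New Solr release announced today")
# matches == ["alice-alerts"]
```

This is the mechanism behind alerting features: each user's saved search is a registered query, and every new document triggers a notification for the queries it matches.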
Its publisher has raised $104 million from New Enterprise Associates (NEA), Benchmark Capital, and Index Ventures, stars among Silicon Valley investment funds. Along with Solr, ElasticSearch is one of the major search engines built on Apache Lucene. It stands out for its clustered operation and its JSON interface. Its power is aimed at Big Data applications, notably searching very large volumes of log files. ElasticSearch is a foundational engine.
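Because the interface is JSON, a search request is just a JSON document sent over HTTP. The Python sketch below builds such a request body for a log search using a `match` query. The field name `message`, the index name `logs` mentioned afterwards, and the helper `build_log_query` are assumptions for illustration, and the code only builds the body; it does not contact a server.

```python
import json


def build_log_query(text, size=10):
    """Build a JSON search body matching `text` in a hypothetical `message` field."""
    body = {
        "size": size,                             # maximum number of hits to return
        "query": {"match": {"message": text}},    # full-text match on one field
    }
    return json.dumps(body)


request_body = build_log_query("connection timeout", size=5)
```

The resulting string would typically be posted to a search endpoint such as `/logs/_search` on a cluster node, and the response comes back as JSON as well.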
With so many options available on the Internet, it is sometimes difficult to decide what you need. In this article, we have presented a selection of the best open-source search engines. Read through them and see which one best meets your requirements.