This is the rough let’s-get-this-done search algorithm; it gets the work done, but it’s not very efficient.
def linear_search(array, key)
  index = array.index(key) # Array#index is itself a linear scan; call it once
  if index.nil?
    return -1
  else
    return "#{key} at index #{index}"
  end
end
arr = [7, 6, 25, 19, 8, 14, 3, 16, 2, 0]
key = 3
p linear_search(arr, key)
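Array#index already does a linear scan under the hood, so the version above is mostly delegating. For illustration, here is what that scan looks like written out by hand (a sketch, not the stdlib implementation):

```ruby
# Manual linear search: walk the array left to right, O(n) comparisons.
def linear_search_manual(array, key)
  array.each_with_index do |element, index|
    return index if element == key
  end
  -1 # conventional "not found" sentinel
end

arr = [7, 6, 25, 19, 8, 14, 3, 16, 2, 0]
p linear_search_manual(arr, 3)  # => 6
p linear_search_manual(arr, 42) # => -1
```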
def binary_search(array, key)
  low, high = 0, array.length - 1
  while low <= high
    mid = (low + high) >> 1
    case key <=> array[mid]
    when 1
      low = mid + 1
    when -1
      high = mid - 1
    else
      return mid
    end
  end
  -1 # not found (without this the method would return nil)
end
arr = [1,3,4,12,16,21,34,45,55,76,99,101]
key = 3
p binary_search(arr, key)
require 'benchmark'
require './searches'

# ruby 2.6+ (endless range syntax)
arr = (1..).step(5).take(1000000)
key = 1000

Benchmark.bm do |x|
  x.report('linear') { linear_search(arr, key) }
  x.report('binary') { binary_search(arr, key) }
end
# user system total real
# linear 0.006069 0.000000 0.006069 ( 0.006106)
# binary 0.000012 0.000000 0.000012 ( 0.000011)
/W[aeiou]rd/.match("Word")
# => #<MatchData "Word">
cat smt | rg "something useful"
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis.html
POST _analyze
{
"analyzer": "standard",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
# [ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html
POST _analyze
{
"tokenizer": "standard",
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
# [ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone ]
An analyzer changes text into a stream of tokens.
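A toy illustration of roughly what the standard analyzer did in the example above (split into terms, then lowercase). This is a deliberately simplified sketch, not the real Lucene implementation, which does proper Unicode text segmentation:

```ruby
# Simplified "standard-ish" analyzer: extract word-like runs, lowercase them.
def toy_analyze(text)
  text.scan(/[[:alnum:]']+/).map(&:downcase)
end

p toy_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
# => ["the", "2", "quick", "brown", "foxes", "jumped", "over", "the", "lazy", "dog's", "bone"]
```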
Distributed, RESTful search and analytics.
That means they both (Elasticsearch and Solr) use the same format for indexing, since both are built on top of Lucene.
Each cluster has a single master node which is chosen automatically by the cluster and which can be replaced if the current master node fails.
Each primary shard can have zero or more replicas. A replica is a copy of the primary shard, and has two purposes: it increases failover (a replica can be promoted to primary if the primary fails), and it increases read throughput (searches can be served by either primary or replica shards).
A shard is a single Lucene instance. It is a low-level “worker” unit which is managed automatically by Elasticsearch. An index is a logical namespace which points to primary and replica shards.
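Elasticsearch decides which primary shard a document lands on with a routing formula, roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document id. A sketch of the idea in Ruby (ES actually uses murmur3; CRC32 is just a stand-in here):

```ruby
require 'zlib'

# Route a document to a primary shard: hash its routing value and take
# the remainder by the shard count. CRC32 stands in for ES's murmur3.
def route_to_shard(doc_id, num_primary_shards)
  Zlib.crc32(doc_id.to_s) % num_primary_shards
end

num_shards = 5
%w[1 2 3].each do |id|
  puts "doc #{id} -> shard #{route_to_shard(id, num_shards)}"
end
```

This formula is also why the number of primary shards is fixed at index creation time: changing it would re-route (and thus invalidate the placement of) every existing document.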
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.tar.gz
$ tar xvf elasticsearch-6.5.4.tar.gz
$ cd elasticsearch-6.5.4 && ./bin/elasticsearch
Logstash is the central dataflow engine in the Elastic Stack for gathering, enriching, and unifying all of your data regardless of format or schema.
An event is just a JSON document.
A node is one instance of Elasticsearch.
A query is just JSON in the body of an HTTP request.
curl -X GET "localhost:9200/_cat/health?v"
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1475247709 17:01:49 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%
# Create
curl -X PUT "localhost:9200/customer?pretty"
# Show
curl -X GET "localhost:9200/_cat/indices?v"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer 95SQ4TSUT7mWBT7VNHH67A 5 1 0 0 260b 260b
# Delete
curl -X DELETE "localhost:9200/customer?pretty"
curl -X GET "localhost:9200/_cat/indices?v"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
# Add document
curl -X PUT "localhost:9200/customer/_doc/1?pretty" \
-H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}
'
# Show with id 1
curl -X GET "localhost:9200/customer/_doc/1?pretty"
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : { "name": "John Doe" }
}
curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" \
-H 'Content-Type: application/json' -d'
{
"doc": { "name": "Jane Doe", "age": 20 }
}
'
curl -X POST "localhost:9200/customer/_doc/1/_update?pretty" \
-H 'Content-Type: application/json' -d'
{
"script" : "ctx._source.age += 5"
}
'
curl -X POST "localhost:9200/customer/_doc/_bulk?pretty" \
-H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
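The _bulk endpoint takes newline-delimited JSON: one action line, optionally followed by its document line, and the body must end with a trailing newline. A small Ruby sketch that builds such a payload (the helper name is illustrative, not an ES client API):

```ruby
require 'json'

# Build an NDJSON body for the _bulk API: each hash becomes one JSON
# line; the whole body must end with a newline or ES rejects it.
def bulk_body(lines)
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

body = bulk_body([
  { update: { _id: "1" } },
  { doc: { name: "John Doe becomes Jane Doe" } },
  { delete: { _id: "2" } }
])
puts body
```

The resulting string can be sent as-is with curl -d or any HTTP client, with Content-Type: application/json.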
curl -X GET "localhost:9200/bank/_search" \
-H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
'
curl -X GET "localhost:9200/bank/_search" \
-H 'Content-Type: application/json' -d'
{
"query": { "match": { "account_number": 20 } }
}
'
curl -X GET "localhost:9200/bank/_search" \
-H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
'
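Query bodies like the bool + range filter above can also be built programmatically instead of pasted as string literals; a sketch in Ruby (the helper name is made up for illustration):

```ruby
require 'json'

# Build the bool query with a range filter on "balance" as a Ruby hash,
# then serialize it to the JSON body Elasticsearch expects.
def balance_range_query(min, max)
  {
    query: {
      bool: {
        must:   { match_all: {} },
        filter: { range: { balance: { gte: min, lte: max } } }
      }
    }
  }
end

puts JSON.pretty_generate(balance_range_query(20_000, 30_000))
```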
curl -XGET http://localhost:8983/solr/books/query -d '
{
  "query": {
    "bool": {
      "must_not": "{!frange u:3.0}ranking"
    }
  },
  "filter": [
    "title:solr",
    { "lucene": { "df": "content", "query": "lucene solr" } }
  ]
}'
https://lucene.apache.org/solr/guide/7_5/json-query-dsl.html
solr_collection/(create|update|delete)
https://github.com/tantivy-search/tantivy https://github.com/voloyev/actix_tantivy