Sistemas y Tecnologías Web: Servidor

Master's in Computer Engineering (MII), ULL. First semester, 2020/2021


Professor: Casiano

Elasticsearch

Elasticsearch is a search engine:

  • Written in Java
  • Open source
  • Distributed
  • Scalable
  • Built on Lucene

Lucene

Lucene is a library that implements a full-text search engine. It is not an application but an API that provides search capabilities.

  • Written in Java
  • Open source
  • Not distributed
  • Scalable
  • Based on inverted indexes

Inverted Indexes

An inverted index (also referred to as a postings file or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content). The purpose of an inverted index is to allow fast full-text searches, at a cost of increased processing when a document is added to the database.

indices-invertidos.png

The idea is similar to the cross-reference indexes that usually appear at the end of books.

  • Advantages
    • Search speed
    • Relevance algorithms are easy to apply
    • Text analyzers are easy to apply
  • Disadvantages
    • Indexing speed
    • Some searches are more expensive (for example, logical NOT)
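
The trade-offs listed above can be seen in a toy inverted index. This is only a minimal sketch, not how Lucene stores its postings: the sample documents and the function names `build_inverted_index`, `search_and` and `search_not` are made up for illustration. Building the index does extra work for every document added, an AND query is a cheap set intersection, and a NOT query needs the complete set of document ids.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def search_and(index, *words):
    """Ids of documents containing every word: an intersection of postings lists."""
    postings = [set(index.get(w.lower(), ())) for w in words]
    return sorted(set.intersection(*postings)) if postings else []

def search_not(index, all_ids, word):
    """Logical NOT needs the full set of document ids, hence its extra cost."""
    return sorted(set(all_ids) - set(index.get(word.lower(), ())))

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "the adventures of Tom Sawyer",
    2: "the adventures of Huckleberry Finn",
    3: "a tramp abroad",
}
index = build_inverted_index(docs)
print(index["adventures"])                        # [1, 2]
print(search_and(index, "adventures", "sawyer"))  # [1]
print(search_not(index, docs, "adventures"))      # [3]
```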

Features provided by Lucene and features provided by Elasticsearch

/assets/images/lucene-vs-elasticsearch.png

Common terms

  • Node: A single instance of Elasticsearch running on a machine. Several nodes, each on its own machine, can serve Elasticsearch together.
  • Cluster: A cluster is the single name under which one or more nodes/instances of Elasticsearch are connected to each other.
  • Document: A document is a JSON object that contains the actual data in key-value pairs. It is the smallest unit of information that can be indexed and retrieved.
  • Index: A logical namespace under which Elasticsearch stores data, and may be built with more than one Lucene index using shards and replicas. It is a collection of documents with similar characteristics.
  • Doc types: A doc type in Elasticsearch represents a class of similar documents. A type consists of a name, such as a user or a blog post, and a mapping, including data types and the Lucene configurations for each field. (An index can contain more than one type.) Types are being phased out over time.
  • Shard: A fragment of an index. An index is divided into one or more shards to make the data distributable. Shards can be stored on a single node or multiple nodes and are composed of Lucene segments.

  • Replica: A duplicate copy of the data living in a shard. Replicas provide high availability and scalability.
  • Settings: The configuration of an index and its specific characteristics (for example, the number of replicas and shards). Settings apply across the whole cluster, and some parameters can be changed after the index has been created. For example, if an index is defined with a replication factor of 2, that holds on every node of the cluster. Different indices can, however, have different replica counts. The number of replicas can be changed later, but the number of shards cannot (at least not easily).
  • Mappings: The definition of an index's data model (it can be defined explicitly or left for Elasticsearch to generate). For each field you can define its type, properties and analyzers.
  • Text analyzers: Text processors that transform the content of the different fields to enable additional search functionality.

    Notes

    • Character filters: The job of character filters is to do cleanup tasks such as stripping out HTML tags.
    • Tokenizers: The next step is to split the text into terms that are called tokens. This is done by a tokenizer. The splitting can be done based on any rule such as whitespace. More details about tokenizers can be found at this URL: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html.
    • Token filters: Once the tokens are created, they are passed to token filters that normalize the tokens. Token filters can change the tokens, remove the terms, or add terms to new tokens.
    • Examples:
      • Convert to lowercase
      • The ASCII folding token filter, which converts Unicode characters into their ASCII equivalent (removing accents, etc.)
      • Remove words that carry no meaning (in computing, stop words are words which are filtered out before processing of natural language data)
      • Keep only the root of each word (stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form)
    • Process:
      • It is important to apply the same transformations, in the same order, both at index time and at search time
  • Queries: Elasticsearch uses the Query DSL (a domain-specific language) to query the indexed documents. It is a flexible and powerful yet simple language for exploring the data. Because queries are expressed as JSON, they are easy to read and, most importantly, easy to debug.
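
The analysis chain described under text analyzers (character filter, tokenizer, token filters) can be imitated in a few lines. The following is a rough sketch and not Elasticsearch's implementation: the stop-word list, the crude `naive_stem` rule and all function names are assumptions made for illustration. It shows why running the same pipeline at index time and at search time matters: the query only matches because both sides end up as the same normalized tokens.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (an assumption, not a real analyzer's list).
STOP_WORDS = {"the", "a", "of", "and", "el", "la", "los", "de", "y"}

def strip_html(text):
    """Character filter: drop HTML tags."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    """Tokenizer: split on anything that is not a letter, digit or underscore."""
    return [t for t in re.split(r"[^\w]+", text) if t]

def ascii_fold(token):
    """Token filter: remove accents (é -> e)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def naive_stem(token):
    """Token filter: crude stemming that strips a trailing 's' (not a real stemmer)."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def analyze(text):
    """Run the filters in a fixed order: clean, split, lowercase, fold, stop words, stem."""
    tokens = tokenize(strip_html(text))
    tokens = [ascii_fold(t.lower()) for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

indexed = analyze("<p>Los Índices Invertidos de Lucene</p>")  # index time
queried = analyze("indices invertidos")                        # search time
print(indexed)                       # ['indice', 'invertido', 'lucene']
print(queried)                       # ['indice', 'invertido']
print(set(queried) <= set(indexed))  # True
```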

Elasticsearch is built on Java 8.

Instructions on how to install Java 8 are available on Oracle's website.

You can run java --version (java -version on Java 8 and earlier) from the command line to confirm that Java is installed and ready.

$ java --version
java 9.0.4
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

Installing the Elasticsearch version used in the book

One way to install Elasticsearch is to go to the downloads page.

The book uses version 5.2, which is available as a separate download.

A quick start guide is also available.

Installing version 6.4.2 (October 2018)

This is the version I used in my own installation, 6.4.2, to follow the book in late 2018 and early 2019:

$ elasticsearch --version
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 6.4.2, Build: default/tar/04711c2/2018-09-26T13:34:09.098244Z, JVM: 9.0.4

Once you download the archive, unzip it and run bin/elasticsearch from the command line.

You should see a lot of output containing something like the following (much of the output is omitted here for brevity).

$ bin/elasticsearch
[INFO ][o.e.n.Node ] [] initializing ...
... many lines omitted ...
[INFO ][o.e.h.HttpServer ] [kAh7Q7Z] publish_address {127.0.0.1:9200},
    bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node            ] [kAh7Q7Z] started
[INFO ][o.e.g.GatewayService  ] [kAh7Q7Z] recovered [0] indices into
    cluster_state

Notice the publish_address and bound_addresses listed toward the end of the output. By default, Elasticsearch binds TCP port 9200 for its HTTP endpoint.

You can specify a lot of settings when setting up an Elasticsearch cluster. By default, it runs in development mode.

For a full discussion of the Elasticsearch cluster settings for version 5.2, see Elastic's Important System Configuration 5.2 page. The same instructions are also available for the current version.

To have Elasticsearch in the PATH, I have added a small script in my ~/.bash_profile:

[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/.bash_profile | sed -ne '/elastic/,/^$/p'
source ~/bin/elasticsearch-set

With these contents:

[~/campus-virtual/1819/ca1819/practicas(master)]$ cat ~/bin/elastic-search-set
export ES_HOME=~/Applications/elasticsearch-6.4.2
export PATH=$ES_HOME/bin:$PATH

Installing version 7.5.0 (December 2019)

As of December 2019 the current version is 7.5.0.

Install Elasticsearch on macOS with Homebrew (December 2019)

Notes taken from https://www.elastic.co/guide/en/elasticsearch/reference/current/brew.html

Elastic publishes Homebrew formulae so you can install Elasticsearch with the Homebrew package manager.

To install with Homebrew, you first need to tap the Elastic Homebrew repository:

brew tap elastic/tap

Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Elasticsearch:

[~/.../transforming-data-and-testing-continuously-chapter-5/databases(master)]$ brew install elastic/tap/elasticsearch-full
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
allure              bedtools            c-blosc             convox              csvq                golang-migrate      helmfile            micronaut           mitmproxy

==> Installing elasticsearch-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%
==> codesign -f -s - /usr/local/Cellar/elasticsearch-full/7.5.0/libexec/modules/x-pack-ml/platform/darwin-x86_64/controller.app --deep
==> Caveats
Data:    /usr/local/var/lib/elasticsearch/elasticsearch_casiano/
Logs:    /usr/local/var/log/elasticsearch/elasticsearch_casiano.log
Plugins: /usr/local/var/elasticsearch/plugins/
Config:  /usr/local/etc/elasticsearch/

To have launchd start elastic/tap/elasticsearch-full now and restart at login:
  brew services start elastic/tap/elasticsearch-full
Or, if you don't want/need a background service you can just run:
  elasticsearch
==> Summary
🍺  /usr/local/Cellar/elasticsearch-full/7.5.0: 921 files, 451.1MB, built in 1 minute 44 seconds

Directory layout for Homebrew installs

Type    | Description                                                                                              | Default Location                                      | Setting
home    | Elasticsearch home directory or $ES_HOME                                                                 | /usr/local/var/homebrew/linked/elasticsearch-full     |
bin     | Binary scripts including elasticsearch to start a node and elasticsearch-plugin to install plugins       | /usr/local/var/homebrew/linked/elasticsearch-full/bin |
conf    | Configuration files including elasticsearch.yml                                                          | /usr/local/etc/elasticsearch                          | ES_PATH_CONF
data    | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | /usr/local/var/lib/elasticsearch                      | path.data
logs    | Log files location.                                                                                      | /usr/local/var/log/elasticsearch                      | path.logs
plugins | Plugin files location. Each plugin will be contained in a subdirectory.                                  | /usr/local/var/homebrew/linked/elasticsearch/plugins  |

This installs the most recently released default distribution of Elasticsearch. To install the OSS distribution, specify elastic/tap/elasticsearch-oss.

Running Elasticsearch 7.5.0

$ which elasticsearch
/usr/local/bin/elasticsearch
$ elasticsearch --version
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Version: 7.5.0, Build: default/tar/e9ccaed468e2fac2275a3761849cbee64b39519f/2019-11-26T01:06:52.518245Z, JVM: 13.0.1
$ elasticsearch
...
[2019-12-18T09:52:26,489][INFO ][o.e.t.TransportService   ] [sanclemente-2.local] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
...
[2019-12-18T09:52:28,853][INFO ][o.e.h.AbstractHttpServerTransport] [sanclemente-2.local] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
...
[2019-12-18T09:52:58,833][WARN ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] high disk watermark [90%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.3gb[8.2%], shards will be relocated away from this node

To fix the WARN I edited the configuration file elasticsearch.yml, adding the line cluster.routing.allocation.disk.watermark.high: 95%:

[.../etc/elasticsearch]$ sed -ne '/cluster\./p' elasticsearch.yml 
# the most important settings you may want to configure for a production cluster.
cluster.name: elasticsearch_casiano
cluster.routing.allocation.disk.watermark.high: 95%
#cluster.initial_master_nodes: ["node-1", "node-2"]

Now, however, other warnings show up, along with a complaining INFO message:

...
[2019-12-18T10:26:03,369][WARN ][o.e.b.BootstrapChecks    ] [sanclemente-2.local] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
... 
[2019-12-18T10:27:04,078][INFO ][o.e.c.r.a.DiskThresholdMonitor] [sanclemente-2.local] low disk watermark [85%] exceeded on [VK6QoFsVQeGBAcAKIC3vLA][sanclemente-2.local][/usr/local/var/lib/elasticsearch/nodes/0] free: 19.2gb[8.2%], replicas will not be assigned to this node
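
The arithmetic behind these messages is simple. The logs report about 8.2% of the disk free, that is, about 91.8% used: above the 90% high watermark, below the 95% it was raised to, and still above the 85% low watermark. The sketch below only reproduces that comparison; the function `exceeded_watermarks` is hypothetical and is a simplification, not the logic of Elasticsearch's DiskThresholdMonitor.

```python
def exceeded_watermarks(free_percent, low=85.0, high=90.0):
    """Return the names of the disk watermarks exceeded by the used-disk percentage."""
    used = 100.0 - free_percent
    hits = []
    if used > low:
        hits.append("low")
    if used > high:
        hits.append("high")
    return hits

# Values taken from the log lines above: 8.2% of the disk is free.
print(exceeded_watermarks(8.2))             # ['low', 'high']  (thresholds 85% and 90%)
print(exceeded_watermarks(8.2, high=95.0))  # ['low']          (after raising high to 95%)
```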

The Elasticsearch root route

If we visit http://localhost:9200 with the browser:

/assets/images/elasticsearch-root-page-9200.png

The _cat route

  • curl localhost:9200/_cat lists the available endpoints

      [.../etc/elasticsearch]$ curl localhost:9200/_cat
      =^.^=
      /_cat/allocation
      /_cat/shards
      /_cat/shards/{index}
      /_cat/master
      /_cat/nodes
      /_cat/tasks
      /_cat/indices
      /_cat/indices/{index}
      /_cat/segments
      /_cat/segments/{index}
      /_cat/count
      /_cat/count/{index}
      /_cat/recovery
      /_cat/recovery/{index}
      /_cat/health
      /_cat/pending_tasks
      /_cat/aliases
      /_cat/aliases/{alias}
      /_cat/thread_pool
      /_cat/thread_pool/{thread_pools}
      /_cat/plugins
      /_cat/fielddata
      /_cat/fielddata/{fields}
      /_cat/nodeattrs
      /_cat/repositories
      /_cat/snapshots/{repository}
      /_cat/templates
    
  • Verbose mode: $ curl localhost:9200/_cat/master?v

      id                     host      ip        node
      VK6QoFsVQeGBAcAKIC3vLA 127.0.0.1 127.0.0.1 sanclemente-2.local
    
  • Help: $ curl localhost:9200/_cat/master?help

      id   |   | node id    
      host | h | host name  
      ip   |   | ip address 
      node | n | node name 
    
  • Each of the commands accepts a query string parameter h which forces only those columns to appear: curl localhost:9200/_cat/nodes?h=ip,port,heapPercent,name

      [.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes
      127.0.0.1 28 99 24 2.56   dilm * sanclemente-2.local
      [.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes?help | head -n 5
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                      Dload  Upload   Total   Spent    Left  Speed
      100 17533  100 17533    0     0   903k      0 --:--:-- --:--:-- --:--:--  951k
      id                                 | id,nodeId                                   | unique node id                                                                                                   
      pid                                | p                                           | process id                                                                                                       
      ip                                 | i                                           | ip address                                                                                                       
      port                               | po                                          | bound transport port                                                                                             
      http_address                       | http                                        | bound http address                                                                                               
      [.../etc/elasticsearch]$ curl localhost:9200/_cat/nodes?h=ip,port,heapPercent,name
      127.0.0.1 9300 28 sanclemente-2.local
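
The _cat conventions above (the h parameter to select columns, v to print a header row) are easy to script. This sketch assumes a node at http://localhost:9200 only when building URLs and makes no network request. The helper names `cat_url` and `parse_cat` are hypothetical, and the parser naively splits on whitespace, so it would break on values that contain spaces.

```python
from urllib.parse import urlencode

def cat_url(endpoint, columns=None, verbose=False, base="http://localhost:9200"):
    """Build a _cat URL; `columns` becomes the h parameter, `verbose` adds v."""
    params = {}
    if columns:
        params["h"] = ",".join(columns)
    if verbose:
        params["v"] = "true"
    query = urlencode(params, safe=",")  # keep the commas of h readable
    return f"{base}/_cat/{endpoint}" + (f"?{query}" if query else "")

def parse_cat(text):
    """Turn verbose (?v) _cat output into a list of dicts keyed by column header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Output of /_cat/master?v as shown earlier in these notes.
sample = """\
id                     host      ip        node
VK6QoFsVQeGBAcAKIC3vLA 127.0.0.1 127.0.0.1 sanclemente-2.local
"""
print(cat_url("nodes", columns=["ip", "port", "heapPercent", "name"]))
print(parse_cat(sample))
```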
    

Running Elasticsearch 6.4.2

Let us see where elasticsearch 6.4.2 is installed:

[~]$ which elasticsearch
/Users/casiano/Applications/elasticsearch-6.4.2/bin/elasticsearch

Let us execute elasticsearch 6.4.2 in development mode. The flow of output when executed is overwhelming:

[~/sol-nodejs-the-right-way(master)]$ elasticsearch
Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
[2019-12-15T11:28:46,903][INFO ][o.e.n.Node               ] [] initializing ...
  ...
[2019-12-15T11:28:53,337][INFO ][o.e.p.PluginsService     ] [9jAGWs_] loaded module [aggs-matrix-stats]
[2019-12-15T11:28:53,338][INFO ][o.e.p.PluginsService     ] [9jAGWs_] loaded module [analysis-common]
  ...
  
[2019-12-15T11:29:10,938][INFO ][o.e.t.TransportService   ] [9jAGWs_] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
  ...
[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}

The last line shows that it is listening on port 9200:

[2019-12-15T11:29:14,175][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [9jAGWs_] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}

Now we can use Insomnia or any other HTTP REST client to make queries to the Elasticsearch server:

assets/images/insomnia-elasticsearch-1.png

Setup Kibana

Installing Kibana

Installing Kibana on MacOS with Homebrew

This text is a copy of https://www.elastic.co/guide/en/kibana/current/brew.html#brew.

Elastic publishes Homebrew formulae so you can install Kibana with the Homebrew package manager.

To install with Homebrew, you first need to tap the Elastic Homebrew repository:

brew tap elastic/tap

Once you’ve tapped the Elastic Homebrew repo, you can use brew install to install the default distribution of Kibana:

$ brew install elastic/tap/kibana-full
Updating Homebrew...
==> Installing kibana-full from elastic/tap
==> Downloading https://artifacts.elastic.co/downloads/kibana/kibana-7.5.0-darwin-x86_64.tar.gz?tap=elastic/homebrew-tap
######################################################################## 100.0%

==> Caveats
Config: /usr/local/etc/kibana/
If you wish to preserve your plugins upon upgrade, make a copy of
/usr/local/opt/kibana-full/plugins before upgrading, and copy it into the
new keg location after upgrading.

To have launchd start elastic/tap/kibana-full now and restart at login:
  brew services start elastic/tap/kibana-full
Or, if you don't want/need a background service you can just run:
  kibana
==> Summary
🍺  /usr/local/Cellar/kibana-full/7.5.0: 94,615 files, 633.7MB, built in 8 minutes 18 seconds

This installs the most recently released default distribution of Kibana. To install the OSS distribution, specify elastic/tap/kibana-oss.

Directory layout for Homebrew installs

When you install Kibana with brew install, the config files, logs, and data directory are stored in the following locations.

Type    | Description                                                                                              | Default Location                                   | Setting
home    | Kibana home directory or $KIBANA_HOME                                                                    | /usr/local/var/homebrew/linked/kibana-full         |
bin     | Binary scripts including kibana to start a node and kibana-plugin to install plugins                     | /usr/local/var/homebrew/linked/kibana-full/bin     |
conf    | Configuration files including kibana.yml                                                                 | /usr/local/etc/kibana                              |
data    | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | /usr/local/var/lib/kibana                          | path.data
logs    | Log files location.                                                                                      | /usr/local/var/log/kibana                          | path.logs
plugins | Plugin files location. Each plugin will be contained in a subdirectory.                                  | /usr/local/var/homebrew/linked/kibana-full/plugins |

Querying Elasticsearch with Kibana

For version 7.5 of Elasticsearch we create the index of Gutenberg books from the file we prepared in the previous assignment:

[~/.../t3-p8-commanding-databases-marreA/esclu(master)]$ ./esclu bulk ../t1-p7-transforming-data-and-testing-continuously-marreA/data/bulk_pg.ldj -i books -t book 

Once Kibana is installed, we start it:

[.../etc/elasticsearch]$ kibana
  log   [11:08:38.205] [info][plugins-system] Setting up [15] plugins: [timelion,features,code,security,licensing,spaces,uiActions,newsfeed,expressions,inspector,embeddable,advancedUiActions,data,eui_utils,translations]
  ...
  log   [11:08:38.225] [warning][config][plugins][security] Generating a random key for xpack.security.encryptionKey. To prevent sessions from being invalidated on restart, please set xpack.security.encryptionKey in kibana.yml
  log   [11:08:38.227] [warning][config][plugins][security] Session cookies will be transmitted over insecure connections. This is not recommended.
  ...
  log   [11:09:12.590] [warning][licensing][plugins] License information could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms
  log   [11:09:13.610] [warning][legacy-plugins] Skipping non-plugin directory at /usr/local/Cellar/kibana-full/7.5.0/libexec/src/legacy/core_plugins/visualizations
  ...
  log   [11:09:20.088] [warning][config][deprecation] Environment variable "DATA_PATH" will be removed.  It has been replaced with kibana.yml setting "path.data"
  ...
  log   [11:09:24.245] [warning][encrypted_saved_objects] Generating a random key for xpack.encrypted_saved_objects.encryptionKey. To be able to decrypt encrypted saved objects attributes after restart, please set xpack.encrypted_saved_objects.encryptionKey in kibana.yml
  ... from failing on restart, please set xpack.reporting.encryptionKey in kibana.yml
  log   [11:09:29.003] [info][status][plugin:reporting@7.5.0] Status changed from uninitialized to green - Ready
  log   [11:09:29.151] [info][listening] Server running at http://localhost:5601
  log   [11:09:29.917] [info][server][Kibana][http] http server running at http://localhost:5601

By default Kibana runs on port 5601.

We open the browser at http://localhost:5601 and click on the developer tools (the wrench icon) in the left-hand menu. This opens a panel like the following, from which we can send requests to the Elasticsearch server:

/assets/images/kibana-query-2-elastic-search.png

Some example queries:

GET _cat/indices?v


GET books/_search
{
  "query": {
    "match": { 
      "authors": "Twain" 
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "query": "authors:Twain AND subjects:Missouri AND title:Sawyer" 
    }
  }
}

GET books/_search
{
  "query": {
    "query_string": {
      "fields": ["authors", "subjects", "title"], 
      "query": "Twain AND Missouri AND Sawyer" 
    }
  }
}

POST test/test/1
{
  "title": "hello world"
}

GET test/test/1

POST test/_doc/2
{
  "title": "hola mundo"
}

GET test/_doc/2

PUT test/_doc/2
{
    "title" : "bonjour monde",
    "tags" : ["red", "blue"]
}

PUT test/_doc/2
{
    "tags" : ["green", "orange"]
}


POST test/_doc/3
{
    "title" : "SYTWS",
    "tags" : ["red", "blue"]
}

POST test/_update/3
{
    "script" : {
        "source": "ctx._source.tags = params.colors",
        "params" : {
            "colors" : ["green"]
        }
    }
}

GET test/_doc/3

DELETE test/
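
One detail in the requests above is worth spelling out: a PUT to test/_doc/2 replaces the whole stored document, so after the second PUT document 2 no longer has a title, whereas the scripted _update on document 3 changes only the tags field. The sketch below imitates that difference with a plain dictionary. The names `store`, `put` and `update` are hypothetical stand-ins and do not talk to Elasticsearch.

```python
store = {}  # hypothetical in-memory stand-in for the "test" index

def put(doc_id, source):
    """Like PUT test/_doc/<id>: the stored document is replaced as a whole."""
    store[doc_id] = dict(source)

def update(doc_id, mutate):
    """Like POST test/_update/<id> with a script: modify the existing source in place."""
    mutate(store[doc_id])

put(2, {"title": "bonjour monde", "tags": ["red", "blue"]})
put(2, {"tags": ["green", "orange"]})
print(store[2])  # {'tags': ['green', 'orange']}  (the title is gone)

put(3, {"title": "SYTWS", "tags": ["red", "blue"]})
update(3, lambda src: src.update(tags=["green"]))
print(store[3])  # {'title': 'SYTWS', 'tags': ['green']}
```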
