  1. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
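
    The "mantra" the snippet alludes to is most likely fit_on_texts() followed by texts_to_sequences(). A minimal sketch of that pattern, assuming the legacy keras.preprocessing.text module and a placeholder corpus:

        from keras.preprocessing.text import Tokenizer

        texts = ["the cat sat on the mat", "the dog sat on the log"]  # placeholder corpus
        my_max = 1000  # keep only the 1000 most frequent words

        tokenizer = Tokenizer(num_words=my_max)
        tokenizer.fit_on_texts(texts)                     # build the word -> index vocabulary
        sequences = tokenizer.texts_to_sequences(texts)   # map each text to a list of integer ids
        print(tokenizer.word_index)
        print(sequences)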

  2. How to do Tokenizer Batch processing? - HuggingFace

    Jun 7, 2023 · In the Tokenizer documentation from Hugging Face, the call function accepts List[List[str]] and says: text (str, List[str], List[List[str]], optional) — The sequence or batch of …
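
    A sketch of batch processing with a transformers tokenizer (the checkpoint name and example sentences are arbitrary): a List[str] is treated as a batch of texts, and a List[List[str]] as a batch of pre-tokenized sequences.

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # any checkpoint works
        batch = ["first sentence", "a second, slightly longer sentence"]

        # List[str]: tokenize the whole batch in one call; padding/truncation
        # make the sequences rectangular so they can be fed to a model
        encoded = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
        print(encoded["input_ids"].shape)

        # List[List[str]]: a batch of already-split words
        pre_split = [["first", "sentence"], ["already", "split", "words"]]
        encoded_words = tokenizer(pre_split, is_split_into_words=True, padding=True)
        print(encoded_words["input_ids"])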

  3. Looking for a clear definition of what a "tokenizer", "parser" and ...

    Mar 28, 2018 · A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context …
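
    A toy illustration (not from the thread) that makes the definition concrete, written in Python; a real lexer would additionally tag each token with a category such as keyword, identifier or operator.

        import re

        def tokenize(text):
            # break a stream of text into tokens by splitting on whitespace
            # (tabs, spaces, new lines)
            return re.split(r"\s+", text.strip())

        print(tokenize("if x > 1:\n\tprint(x)"))
        # -> ['if', 'x', '>', '1:', 'print(x)']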

  4. Huggingface saving tokenizer - Stack Overflow

    Oct 27, 2020 · Make sure that: - './models/tokenizer/' is a correct model identifier listed on 'https://huggingface.co/models' - or './models/tokenizer/' is the correct path to a directory …
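
    This error typically indicates that the directory does not (yet) contain the saved tokenizer files. A minimal save-then-load sketch, reusing the './models/tokenizer/' path from the question and an arbitrary base checkpoint:

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        tokenizer.save_pretrained("./models/tokenizer/")   # writes tokenizer_config.json, vocab files, etc.

        # reload from the local directory; if the files above are missing,
        # from_pretrained treats the path as a Hub model id and raises the error quoted above
        tokenizer = AutoTokenizer.from_pretrained("./models/tokenizer/")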

  5. How to add new tokens to an existing Huggingface tokenizer?

    May 8, 2023 · # add the tokens to the tokenizer vocabulary tokenizer.add_tokens(list(new_tokens)) # add new, random embeddings for the new tokens …
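
    The snippet's usual continuation is resizing the model's embedding matrix so the new ids get randomly initialised vectors. A sketch with placeholder tokens and an arbitrary checkpoint:

        from transformers import AutoTokenizer, AutoModel

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModel.from_pretrained("bert-base-uncased")

        new_tokens = {"<domain_term>", "<another_term>"}                # placeholder additions
        new_tokens = new_tokens - set(tokenizer.get_vocab().keys())     # skip tokens that already exist

        tokenizer.add_tokens(list(new_tokens))          # extend the tokenizer vocabulary
        model.resize_token_embeddings(len(tokenizer))   # add randomly initialised embedding rows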

  6. Unable to get the tokenizer of Gemma-3 - Stack Overflow

    Mar 22, 2025 · I am trying to get the tokenizer using the Hugging Face AutoTokenizer library, but I am unable to fetch it. Is there any other way to get it? Where am I going wrong?
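
    Gemma checkpoints are gated, so fetching the tokenizer generally requires accepting the model licence on the Hub, a recent transformers release, and an access token. A sketch under those assumptions (the checkpoint id and token are placeholders):

        from transformers import AutoTokenizer

        # "google/gemma-3-1b-it" is an assumed checkpoint id; replace "hf_..." with a real access token
        tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it", token="hf_...")
        print(tokenizer("Hello Gemma")["input_ids"])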

  7. How to load the saved tokenizer from pretrained model

    Oct 16, 2019 · I fine-tuned a pretrained BERT model in PyTorch using the Hugging Face transformers library. All the training/validation was done on a GPU in the cloud. At the end of the training, I save the model …
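
    The usual pattern is to save the tokenizer alongside the fine-tuned model so both can be restored from the same directory later. A sketch, assuming a BERT classification model and a placeholder output path:

        from transformers import BertTokenizer, BertForSequenceClassification

        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

        # ... fine-tuning happens here ...

        model.save_pretrained("./fine_tuned_bert/")       # weights + config
        tokenizer.save_pretrained("./fine_tuned_bert/")   # vocab + tokenizer config

        # later, e.g. on another machine, restore both from that directory
        model = BertForSequenceClassification.from_pretrained("./fine_tuned_bert/")
        tokenizer = BertTokenizer.from_pretrained("./fine_tuned_bert/")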

  8. How can I push a custom tokenizer to HuggingFace Hub?

    Oct 10, 2023 · I have a custom Tokenizer built & trained using HuggingFace Tokenizers functions. I can save & load the custom tokenizer to a JSON file without a problem. Here …
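
    One way to get a push_to_hub() method for a tokenizers-library JSON file is to wrap it in a transformers fast tokenizer. A sketch; the file name and repository id are placeholders, and it assumes you are already logged in via huggingface-cli login:

        from transformers import PreTrainedTokenizerFast

        # wrap the saved tokenizers JSON in a transformers fast tokenizer
        hf_tokenizer = PreTrainedTokenizerFast(tokenizer_file="my-tokenizer.json")

        # push to a (placeholder) repository on the Hub
        hf_tokenizer.push_to_hub("my-username/my-custom-tokenizer")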

  9. After creating a Custom Tokenizer using HF Tokenizers library, …

    Apr 2, 2024 · I followed this tutorial to create a custom Tokenizer based on the SentencePieceBPE class, with a custom pre-tokenizer class. The new Tokenizer was successfully trained …
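
    A sketch of training and saving a SentencePiece-BPE tokenizer with the tokenizers library (the corpus and file name are placeholders). Note that a pre-tokenizer implemented as a custom Python class generally cannot be serialised into the JSON file and has to be re-attached after loading.

        from tokenizers import SentencePieceBPETokenizer

        corpus = ["some training text", "more training text"]   # placeholder corpus

        tokenizer = SentencePieceBPETokenizer()
        tokenizer.train_from_iterator(corpus, vocab_size=1000,
                                      special_tokens=["<unk>", "<pad>"])
        tokenizer.save("custom-tokenizer.json")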

  10. pytorch - Tokens to Words mapping in the tokenizer decode step ...

    Jun 11, 2020 · Is there a way to know the mapping from the tokens back to the original words in the tokenizer.decode() function? For example: from transformers.tokenization_roberta import …
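
    With a fast tokenizer, the encoding itself carries the mapping: word_ids() gives, for every token position, the index of the pre-tokenized word it came from. A sketch with an arbitrary RoBERTa checkpoint and example sentence:

        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("roberta-base")   # fast tokenizer by default
        encoding = tokenizer("Tokenizers map words to sub-word units")

        tokens = tokenizer.convert_ids_to_tokens(encoding["input_ids"])
        # word_ids() returns None for special tokens such as <s> and </s>
        for token, word_id in zip(tokens, encoding.word_ids()):
            print(token, "->", word_id)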