Task Configuration
Text splitting divides the input text into fragments whose lengths fall within the configured range, producing suitably sized passages for subsequent processing.
Function: Sets the minimum length, in characters, of each text fragment after splitting. The current default is 1500. A segment shorter than this value is merged with adjacent segments until it reaches the minimum length.
Setting method: Enter the desired value (must be a positive integer) in the input box after "Minimum Length".
Avoid values that are too large, which produce too few fragments and reduce flexibility in subsequent processing, and values that are too small, which leave the text overly fragmented.
Function: Sets the maximum length, in characters, of each text fragment after splitting. The current default is 2000. Text longer than this value is split into multiple fragments.
Setting method: Enter an appropriate value (a positive integer greater than the minimum length) in the input box after "Maximum Split Length".
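The merging rule for the minimum length can be sketched in Python as follows. This is a minimal illustration, not the tool's actual splitter: `merge_short_segments` is a hypothetical helper, segments are concatenated without a separator, and a short leftover tail is folded into the previous fragment, which ignores the maximum length.

```python
from typing import List


def merge_short_segments(segments: List[str], min_len: int = 1500) -> List[str]:
    """Concatenate consecutive segments until each fragment has at least min_len characters.

    Hypothetical helper for illustration; the real splitter may behave differently.
    """
    if not isinstance(min_len, int) or min_len <= 0:
        raise ValueError("min_len must be a positive integer")
    merged: List[str] = []
    buffer = ""
    for seg in segments:
        buffer += seg  # keep absorbing neighbors while the fragment is too short
        if len(buffer) >= min_len:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A short leftover tail joins the previous fragment (this may exceed any
        # maximum length); if there is no previous fragment, it stands alone.
        if merged:
            merged[-1] += buffer
        else:
            merged.append(buffer)
    return merged


print(merge_short_segments(["aaa", "bb", "cccc", "d"], min_len=5))  # ['aaabb', 'ccccd']
```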
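A sketch of the maximum-length rule together with the input constraints above (both values positive integers, maximum greater than minimum). `validate_split_lengths` and `split_long_text` are hypothetical names, and cutting at fixed character offsets is a simplifying assumption; a real splitter would more likely break at paragraph or sentence boundaries.

```python
from typing import List


def validate_split_lengths(min_len: int, max_len: int) -> None:
    """Enforce the stated constraints: both positive integers, max_len > min_len."""
    for name, value in (("min_len", min_len), ("max_len", max_len)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if max_len <= min_len:
        raise ValueError("max_len must be greater than min_len")


def split_long_text(text: str, max_len: int = 2000) -> List[str]:
    """Cut text into consecutive chunks of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


validate_split_lengths(1500, 2000)  # the documented defaults pass
print([len(chunk) for chunk in split_long_text("x" * 4500)])  # [2000, 2000, 500]
```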
Function: Sets the maximum length, in characters, of generated questions. The current default is 240. This keeps generated questions short enough to read and understand easily.
Setting method: Enter the desired value (must be a positive integer) in the input box after "Question Generation Length".
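One possible way to enforce the question length limit is to discard over-long questions after generation, as in the hypothetical `filter_questions` below. Whether the tool filters, truncates, or simply instructs the model to stay within the limit is not stated here, so treat this as an assumption.

```python
from typing import List


def filter_questions(questions: List[str], max_len: int = 240) -> List[str]:
    """Keep only questions of at most max_len characters (surrounding whitespace ignored)."""
    return [q for q in questions if len(q.strip()) <= max_len]


questions = ["What does the minimum length setting control?", "Q" * 300]
print(filter_questions(questions))  # ['What does the minimum length setting control?']
```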
Function: Sets the probability that question marks are removed from generated questions. The current default is 60%. Adjust it to suit the question format your use case requires.
Setting method: Enter an integer between 0 and 100 (representing percentage probability) in the input box after "Removing Question Marks Probability".
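The removal probability can be read as an independent random draw per question. The sketch below assumes that only the trailing question mark (ASCII `?` or full-width `？`) is removed; `maybe_strip_question_mark` is a hypothetical helper, and the optional `rng` parameter exists only to make the behavior reproducible.

```python
import random
from typing import Optional


def maybe_strip_question_mark(
    question: str, probability_percent: int = 60, rng: Optional[random.Random] = None
) -> str:
    """With the given percentage probability, remove trailing '?' or full-width '？'."""
    if not 0 <= probability_percent <= 100:
        raise ValueError("probability_percent must be between 0 and 100")
    if rng is None:
        rng = random.Random()
    # rng.random() is uniform on [0, 1), so 0 never strips and 100 always strips.
    if rng.random() < probability_percent / 100:
        return question.rstrip("?？")
    return question


rng = random.Random(42)
samples = [maybe_strip_question_mark("Why?", 60, rng) for _ in range(10_000)]
print(sum(s == "Why" for s in samples) / len(samples))  # close to 0.6
```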
Function: Limits how many question generation and dataset generation tasks can run at the same time, so that too many tasks do not consume excessive system resources and cause degraded performance or task failures.
Setting method: Set an upper limit on concurrent tasks according to available system resources and task requirements, using the corresponding input box or slider in the settings interface (if available).
When choosing a value, consider factors such as server hardware performance and network bandwidth. Too many concurrent tasks can lead to long queue wait times or even task timeouts.
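A common way to cap concurrency in Python is `asyncio.Semaphore`. In the sketch below, `run_with_limit` and `fake_generation_task` are hypothetical stand-ins for the tool's task runner and a single generation task, and the limit of 3 is only an illustrative value; the function also reports the peak number of tasks that ran at once.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def run_with_limit(
    jobs: List[Callable[[], Awaitable[T]]], limit: int
) -> Tuple[List[T], int]:
    """Run every job, never more than `limit` at once.

    Returns the results (in job order) and the peak number of simultaneous jobs.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    semaphore = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        nonlocal active, peak
        async with semaphore:  # waits here while `limit` jobs are already running
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return list(results), peak


async def fake_generation_task(i: int) -> int:
    """Hypothetical stand-in for one question or dataset generation task."""
    await asyncio.sleep(0.01)
    return i


if __name__ == "__main__":
    jobs = [lambda i=i: fake_generation_task(i) for i in range(20)]
    results, peak = asyncio.run(run_with_limit(jobs, limit=3))
    print(results, peak)  # peak never exceeds 3
```

Tasks beyond the limit are not rejected; they wait on the semaphore, which corresponds to the queue wait times mentioned above.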
Function: The MinerU Token is used to authenticate and authorize PDF conversion requests made through the MinerU API.
Setting method: Enter a valid MinerU Token in the corresponding input box. Note that a MinerU Token is valid for only 14 days; after it expires, replace it promptly with a new one so that PDF conversion continues to work.
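Because the token expires after 14 days, it can help to track when it was issued. This sketch assumes you record the issue time yourself (the documentation does not say the tool stores it); `token_expires_at`, `needs_renewal`, and the one-day renewal margin are hypothetical. `issued_at` must be a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_LIFETIME = timedelta(days=14)  # validity period stated in the documentation


def token_expires_at(issued_at: datetime) -> datetime:
    """Expiry time of a MinerU Token issued at `issued_at` (timezone-aware)."""
    return issued_at + TOKEN_LIFETIME


def needs_renewal(
    issued_at: datetime,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(days=1),
) -> bool:
    """True once `now` is within `margin` of expiry, or past it."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= token_expires_at(issued_at) - margin


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(token_expires_at(issued))  # 2024-01-15 00:00:00+00:00
print(needs_renewal(issued, now=datetime(2024, 1, 10, tzinfo=timezone.utc)))  # False
```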
Function: Limits the number of concurrent tasks sent to custom large vision models, allocating system resources sensibly and keeping model processing stable and efficient.
Setting method: Set the concurrency limit based on the model's computational cost and available system resources. A limit that is too high may overload the system, while one that is too low may leave system resources underused.
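The same idea applies to vision model requests, shown here with a bounded thread pool rather than a semaphore. `process_pages` and `FakeVisionModel` are hypothetical; the fake model sleeps briefly and records how many calls overlap, standing in for a real vision model client.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_pages(
    pages: List[str], call_model: Callable[[str], str], max_concurrency: int = 2
) -> List[str]:
    """Send pages to the model with at most max_concurrency calls in flight."""
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be a positive integer")
    # max_workers caps how many call_model invocations run at the same time.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(call_model, pages))  # results keep input order


class FakeVisionModel:
    """Hypothetical stand-in for a vision model client; records peak concurrency."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, page: str) -> str:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # simulate model latency
        with self._lock:
            self.active -= 1
        return f"text of {page}"


model = FakeVisionModel()
print(process_pages([f"page{i}" for i in range(6)], model, max_concurrency=2))
print(model.peak)  # never more than 2
```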
Function: The Hugging Face Token is used for authentication when interacting with the Hugging Face platform, for features such as dataset uploading. Hugging Face integration is not yet implemented, so this Token setting is currently reserved.
Setting method: Enter the Token generated by the Hugging Face platform in the input box after "hf_".
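Since the Hugging Face feature is reserved, the only thing worth checking for now is the token's format. Hugging Face user access tokens begin with `hf_`; the hypothetical `check_hf_token` below performs a loose sanity check on that prefix and on whitespace, and does not verify the token with the platform.

```python
def check_hf_token(token: str) -> str:
    """Return the stripped token, or raise ValueError if it fails a loose format check."""
    token = token.strip()
    if not token.startswith("hf_") or len(token) <= len("hf_"):
        raise ValueError("expected a Hugging Face token starting with 'hf_'")
    if any(ch.isspace() for ch in token):
        raise ValueError("token must not contain whitespace")
    return token


print(check_hf_token("  hf_exampleToken123 "))  # hf_exampleToken123
```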