Strip romaji title and groups

AerialAtom
Posts: 20
Joined: Wed Aug 30, 2017 1:14 am

Post by AerialAtom » Wed Jun 23, 2021 9:22 am

So right now I'm downloading a huge number of doujins, and it's taking a while because I have to edit the metadata to remove the romaji titles and group names. Most of the titles I edit have this format: romaji title | english title. Some groups also add their names without the usual brackets, and the names show up in the title.

Is there a way to strip the romaji titles and leave only the english titles?

For example:
Scylla na Kanojo no Konkatsu Jijou | A Scylla's Journey to Marriage =Dark Mac + CW=
Last edited by AerialAtom on Thu Jun 24, 2021 6:56 am, edited 1 time in total.

Squidy
Site Admin
Posts: 1279
Joined: Fri Mar 10, 2017 9:28 pm

Re: Strip romaji title and groups

Post by Squidy » Thu Jun 24, 2021 1:09 am

It's possible: you can do pretty much anything you want with custom tokens under "Settings > Save to > General > Custom Tokens". The UI for creating them isn't great and I'd like to rework it, but it does the job. If you let me know which website it is, I can help you get started.

It might also be possible to add a setting for "cleaning up" titles, but it's difficult when there are weird edge cases like this one, where the group adds their name to the title in a nonstandard way. If the format is at least consistent across the site in question, it would be significantly easier: I could extract the English title and set the value of the %ALT_TITLE% token to it.
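A minimal sketch of the extraction described above, assuming titles follow the "romaji | english" pattern with an optional trailing =Group= tag. The function name english_title is hypothetical, and this is not how HDoujin Downloader implements %ALT_TITLE%; it simply takes the text after the last "|" and drops the group tag.

```python
import re

# Hypothetical helper, not part of HDoujin Downloader: pull the English title
# out of an E-Hentai style "romaji | english =Group=" title.
GROUP_SUFFIX = re.compile(r"\s*=[^=]*=\s*$")  # trailing "=Group=" tag, if present


def english_title(title: str) -> str:
    """Return the part after the last '|', minus a trailing =Group= tag."""
    without_group = GROUP_SUFFIX.sub("", title)
    # rpartition returns ('', '', s) when '|' is absent, so [2] is then the whole string
    return without_group.rpartition("|")[2].strip()


if __name__ == "__main__":
    sample = "Scylla na Kanojo no Konkatsu Jijou | A Scylla's Journey to Marriage =Dark Mac + CW="
    print(english_title(sample))
```

Titles without a "|" pass through unchanged apart from the group tag, so single-language titles are left alone.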
I'm the admin and developer of HDoujin Downloader.

AerialAtom
Posts: 20
Joined: Wed Aug 30, 2017 1:14 am

Re: Strip romaji title and groups

Post by AerialAtom » Thu Jun 24, 2021 4:35 am

Oh, I forgot the website: it's mostly Exhentai/E-hentai. Titles with an =NAME= aren't many, about 121 out of 2400, but there are a lot of split titles, 1014 out of 2400 (I just used the filter to check; artists are included too).

AerialAtom
Posts: 20
Joined: Wed Aug 30, 2017 1:14 am

Re: Strip romaji title and groups

Post by AerialAtom » Fri Jun 25, 2021 1:56 pm

OK, so I found a way to get rid of the group names and romaji titles, but I had to use Notepad++'s Find and Replace in regex mode.

I got the answer from here, plus some help on /spg/.
The expressions I used are:

For group names
Find: =.*?=
Replace: (blank)

For dual titles
Find: :.*?\|
Replace: :
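For reference, here are the same two expressions applied with Python's re module, assuming info.txt keeps the title on a line starting with "Title:" (an assumption about the file layout). As in Notepad++ with ". matches newline" unchecked, "." does not cross line breaks here, so each replacement stays within one line.

```python
import re

# The two Notepad++ expressions from the post.
# Assumption: info.txt stores the title on a line like "Title: ...".
GROUP_NAMES = re.compile(r"=.*?=")  # replace with "" (blank)
DUAL_TITLE = re.compile(r":.*?\|")  # replace with ":"


def clean_info_text(text: str) -> str:
    """Remove =Group= tags, then keep only the text after the '|' in split titles."""
    text = GROUP_NAMES.sub("", text)
    text = DUAL_TITLE.sub(":", text)
    return text


if __name__ == "__main__":
    sample = "Title: Scylla na Kanojo no Konkatsu Jijou | A Scylla's Journey to Marriage =Dark Mac + CW=\n"
    print(repr(clean_info_text(sample)))
```

The group-name replacement leaves the space before the tag behind (the sample becomes "Title: A Scylla's Journey to Marriage " with a trailing space). Both expressions also apply to every line of the file, so any other line containing two "=" signs, or a ":" followed later by "|", would be edited too.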

If I load all of the info.txt files, I can run both replacements across them at once and leave just the English title.
I added the group-name expression to HDoujin's Find and Replace and it removed the group names, but the dual-title expression didn't work.
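Loading every info.txt into Notepad++ works; a small script can do the same pass over a whole download folder. A minimal sketch, assuming the files are UTF-8 and named info.txt; the function name clean_all_info_files and the script name clean_info.py are hypothetical. It edits files in place, so run it on a copy first.

```python
import re
from pathlib import Path

# Same two expressions as in the post.
GROUP_NAMES = re.compile(r"=.*?=")
DUAL_TITLE = re.compile(r":.*?\|")


def clean_all_info_files(root: Path) -> int:
    """Rewrite every info.txt under root in place; return how many files changed.

    Assumption: the files are UTF-8 text. Back up the folder before running this.
    """
    changed = 0
    for path in root.rglob("info.txt"):
        # newline="" keeps the file's existing line endings (LF or CRLF) unchanged
        with open(path, encoding="utf-8", newline="") as f:
            original = f.read()
        cleaned = DUAL_TITLE.sub(":", GROUP_NAMES.sub("", original))
        if cleaned != original:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(cleaned)
            changed += 1
    return changed


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print("usage: python clean_info.py <download folder>")
    else:
        print(clean_all_info_files(Path(sys.argv[1])), "file(s) updated")
```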

I tried to do this within the program but couldn't figure it out. For the two titles separated by |, I tried to use Custom Tokens, but I ran into trouble setting the right bound.
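The idea behind a left/right bound extraction can be sketched outside the program. This is a hypothetical helper, not HDoujin's Custom Tokens code: it takes the text after the left bound ("|") and stops at the right bound ("="), falling back to the end of the title when there is no right bound. That fallback matters here, since only about 121 of 2400 titles carry an =NAME= tag. It works on the bare title, with no "Title:" prefix; one unconfirmed guess about why the dual-title expression didn't work in HDoujin's Find and Replace is that it sees the title without that prefix, leaving nothing for the ":" to match.

```python
def between(text: str, left: str, right: str) -> str:
    """Hypothetical left/right bound extraction (not HDoujin's implementation).

    - Start after the first `left`, or at the beginning if `left` is absent.
    - Stop at the next `right`, or at the end of the text if `right` is absent.
    """
    pos = text.find(left)
    start = 0 if pos == -1 else pos + len(left)
    end = text.find(right, start)
    if end == -1:
        end = len(text)  # no right bound in this title: run to the end
    return text[start:end].strip()


if __name__ == "__main__":
    sample = "Scylla na Kanojo no Konkatsu Jijou | A Scylla's Journey to Marriage =Dark Mac + CW="
    print(between(sample, "|", "="))
```

An English title that itself contains "=" would be cut short at that point, so this only fits titles where "=" appears solely in the group tag.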
https://exhentai.org/g/1941710/bc95d9d739/
https://e-hentai.org/g/1941458/73c1abc15a/